What has four letters, sometimes has nine letters, but never has five letters.
If you can't think of an answer, that's because the preceding sentence was not a question but a statement, as indicated by the period at the end. ("What" has four letters, "sometimes" has nine, and "never" has five.) This goes to show that the meaning of a word depends on its context, and sometimes that dependency spans a long distance.
Check out this sentence in Hebrew that has four different meanings:
" יש בבית הספר מורה לספרות"
The ambiguity arises because מורה and ספרות are homonyms: מורה can be read as a masculine or a feminine "teacher", and ספרות as either "literature" or "hairdressing", so the sentence can mean roughly "the school has a (male/female) teacher of literature/hairdressing". It is a perfectly valid sentence, yet it does not contain enough information to decide which sense of each word is intended.
Even a human couldn't look at that sentence in isolation and know which meaning the author intended. We call such a sentence "underspecified", and while underspecified sentences are ubiquitous in the written word, we barely notice, because we rarely read sentences on their own.
A pattern emerges here. A word's meaning can depend on the context it appears in, and a sentence's meaning can depend on its surrounding sentences. Sometimes we need the broad meaning of a sentence to understand a certain word in it, which in turn informs the meaning of the sentence.
If you think these are edge cases, you're right, they are. But human language is full of edge cases; if it weren't, we'd be solving all of NLP with regular expressions and a few if statements.
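The quip about regular expressions can be made concrete. Below is a minimal sketch of the kind of rule-based sense tagger being dismissed; the English homonym ("bank") and the keyword rules are illustrative choices of mine, not from the original, and any real system would need far more rules:

```python
import re

def bank_sense(sentence: str) -> str:
    """Toy word-sense tagger for the homonym "bank", built from
    regular expressions and a few if statements (illustrative only)."""
    # Rule 1: "bank" followed within a short window by a financial cue word.
    if re.search(r"\bbank\b.{0,30}\b(money|loan|deposit)\b", sentence):
        return "financial institution"
    # Rule 2: a water-related cue word shortly before "bank".
    if re.search(r"\b(river|water)\b.{0,30}\bbank\b", sentence):
        return "river bank"
    return "unknown"

print(bank_sense("The bank approved my loan."))  # financial institution
print(bank_sense("We sat on the river bank."))   # river bank

# The disambiguating cue can sit arbitrarily far away -- before the word,
# or in another sentence entirely -- and the fixed-window rules break down:
print(bank_sense("Money was tight, so the bank felt stressful."))  # unknown
```

The failure mode in the last line is exactly the distant-context problem described above: the cue word exists, but no finite window of hand-written patterns can reliably reach it.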
The evolution of architectures for deep NLP has largely been about handling the structure of language and the edge cases that structure creates.
As you uncover errors in your text annotations and NLP models, use tools like LightTag's analytics to review individual cases and notice when the context that defines a word's meaning is more involved than your current model can handle.