What is natural language processing?

Natural language processing (NLP) deals with text written for humans rather than for computers. Programming languages follow a strict grammar; human languages do not, and a single sentence can have more than one meaning. A classic example is “Time flies like an arrow, but fruit flies like a banana”. The second clause has two possible readings: that fruit, if you throw it, flies the way a banana does (that is, not very aerodynamically), or, more likely, that fruit flies like to eat bananas.

Natural language can be either spoken or written, and speech brings additional complications over text. Most natural language processing deals with converting input from a person into something the computer can understand, but there is a further aspect of natural language processing: the generation of text or speech.

Features of natural language processing

  • Phonology deals with the component sounds.
  • Morphology deals with the fundamental components of words.
  • Syntax refers to the structure of the sentence.
  • Semantics deals with the meaning of the words.
  • Pragmatics concerns how sentences are used in different contexts.
  • Discourse deals with how prior sentences can affect the current sentence.
  • World knowledge refers to general information that everyone knows and is implicit in sentences.

Ambiguity

The phrase “I made her duck” has a number of different interpretations:

  1. I cooked duck for her
  2. I cooked duck belonging to her
  3. I made a toy duck which she owns
  4. I made her quickly lower herself down
  5. I turned her into a duck using magic
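
The interpretations above mix two kinds of ambiguity: different sentence structures and different word senses. A minimal sketch of how a parser could count the structural readings is given here, using a CYK-style chart over a tiny grammar. The grammar, the lexicon and the category names (VD for a verb with two objects, VC for a causative verb, VB for a bare verb, OBJ2 and SC for the helper phrases) are all invented for this illustration and are not taken from any real NLP toolkit.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY_RULES = {
    ("NP", "VP"): ["S"],      # sentence = subject + verb phrase
    ("V", "NP"): ["VP"],      # made [her duck]        (one object)
    ("VD", "OBJ2"): ["VP"],   # made [her] [duck]      (two objects)
    ("NP", "NP"): ["OBJ2"],
    ("VC", "SC"): ["VP"],     # made [her duck (down)] (causative)
    ("NP", "VB"): ["SC"],
    ("Det", "N"): ["NP"],     # "her" as possessive + noun
}

# Hypothetical lexicon: each word lists every category it may take.
LEXICON = {
    "i": ["NP"],
    "made": ["V", "VD", "VC"],
    "her": ["NP", "Det"],
    "duck": ["N", "NP", "VB"],
}


def count_parses(sentence, root="S"):
    """Count the distinct parse trees of `sentence` rooted in `root`."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(start, end)][category] = number of ways to build that category
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                lefts = list(chart[(start, split)].items())
                rights = list(chart[(split, end)].items())
                for left, left_count in lefts:
                    for right, right_count in rights:
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(start, end)][parent] += left_count * right_count
    return chart[(0, n)][root]


print(count_parses("I made her duck"))  # 3
```

Under this toy grammar the five interpretations collapse onto three structures: readings 2 and 3 share one (“her duck” as a single noun phrase), readings 1 and 5 share another (“her” and “duck” as two separate objects), and reading 4 has its own (“duck” as a verb). Telling the remaining readings apart depends on the sense of “made” and “duck”, which is a matter of semantics and world knowledge rather than syntax.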

Language translation

The problem with language translation is that an accurate translation requires understanding the text. This is why Google Translate can make some terrible translation errors: it relies on statistical models rather than on understanding. It uses documents such as those of the UN, which human translators have translated into multiple languages, and finds words that are used in the same context in each language pair. There is a story (possibly apocryphal) about a phrase that was translated into Russian and back again.

“The spirit was willing, but the flesh was weak” was translated into Russian, and when the result was translated back into English it read “The vodka was good, but the meat was rotten”.
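
The statistical idea described above, pairing up words that keep appearing in the same translated sentences, can be sketched in a few lines. This is a deliberately simplified illustration and not how Google Translate is actually implemented: the four-sentence English/French corpus is made up for the example, the function names are hypothetical, and the Dice coefficient is just one simple choice of association score.

```python
from collections import Counter
from itertools import product

# Made-up miniature parallel corpus (English, French), for illustration only.
PARALLEL_CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def build_dictionary(corpus):
    """Map each source word to the target word it co-occurs with most."""
    source_count, target_count, pair_count = Counter(), Counter(), Counter()
    for source, target in corpus:
        source_words, target_words = set(source.split()), set(target.split())
        source_count.update(source_words)
        target_count.update(target_words)
        # Every source word co-occurs with every target word of the pair.
        pair_count.update(product(source_words, target_words))

    best = {}  # source word -> (target word, score)
    for (s, t), together in pair_count.items():
        # Dice coefficient: 1.0 when the two words always appear together.
        score = 2 * together / (source_count[s] + target_count[t])
        if score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


def translate(sentence, dictionary):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(dictionary.get(word, word) for word in sentence.split())


dictionary = build_dictionary(PARALLEL_CORPUS)
print(translate("the car", dictionary))      # la voiture
print(translate("a red house", dictionary))  # une red maison
```

The sketch never models what a sentence means. It only records which words tend to appear together, so a word with several senses, such as “spirit”, is mapped to whichever translation co-occurred with it most often in the training text.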