Natural Language Processing

What is natural language processing?

Natural language processing (NLP) deals with text written for humans rather than for computers. Programming languages follow a strict grammar, but human languages do not, and a sentence can often be read in more than one way. The classic example is “Time flies like an arrow but fruit flies like a banana”. The second clause has two possible meanings: either that fruit thrown through the air flies the way a banana does (that is, not very aerodynamically) or, more likely, that fruit flies like to eat bananas.

Natural language can be either spoken or written, and speech brings additional complications over writing. Most natural language processing deals with converting input from a person into something the computer can work with, but there is another side to the field: the generation of text or speech.

Features of natural language processing

  • Phonology deals with the component sounds.
  • Morphology deals with the fundamental components of words.
  • Syntax refers to the structure of the sentence.
  • Semantics deals with the meaning of the words.
  • Pragmatics concerns how sentences are used in different contexts.
  • Discourse deals with how prior sentences can affect the current sentence.
  • World knowledge refers to general information that everyone knows and is implicit in sentences.

Ambiguity

The phrase “I made her duck” has a number of different interpretations.

  1. I cooked duck for her
  2. I cooked duck belonging to her
  3. I made a toy duck which she owns
  4. I made her quickly lower herself down
  5. I turned her into a duck using magic

Language translation

The problem with language translation is that an accurate translation requires understanding the text. This is why Google Translate can make some terrible translation errors: it relies on statistical models rather than on understanding. For example, it uses UN documents, which human translators have translated into multiple languages, and finds words that are used in the same context in each language pair. There is a story (possibly apocryphal) about a phrase that was translated into Russian and back again.

“The spirit was willing, but the flesh was weak” was translated to Russian and then the English translation of the phrase was “The vodka was good, but the meat was rotten”.
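The statistical idea described above, matching words that keep turning up in the same aligned sentences, can be sketched in a few lines. This is only a toy illustration and not how Google Translate is actually implemented: the four-sentence English/French corpus, the names `PARALLEL`, `build_counts` and `best_match`, and the use of the Dice coefficient as the score are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical parallel corpus: each English sentence with its French translation.
PARALLEL = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("the blue flower", "la fleur bleue"),
]


def build_counts(pairs):
    """Count in how many sentence pairs each word, and each word pairing, appears."""
    source_counts, target_counts, joint_counts = Counter(), Counter(), Counter()
    for source, target in pairs:
        source_words = set(source.split())
        target_words = set(target.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        # Every source word co-occurs with every target word of the same pair.
        joint_counts.update(product(source_words, target_words))
    return source_counts, target_counts, joint_counts


def best_match(word, pairs=PARALLEL):
    """Return the target word that most consistently co-occurs with `word`."""
    source_counts, target_counts, joint_counts = build_counts(pairs)
    if word not in source_counts:
        return None  # the word never appeared, so there is nothing to go on

    def dice(candidate):
        # Dice coefficient: 1.0 when the two words always appear together.
        shared = joint_counts[(word, candidate)]
        return 2 * shared / (source_counts[word] + target_counts[candidate])

    return max(target_counts, key=dice)


if __name__ == "__main__":
    for english in ("the", "blue", "house", "spirit"):
        print(english, "->", best_match(english))
```

Nothing in the sketch knows what a house or a flower is. It only counts which words appear together, which is why this kind of approach can go badly wrong on idioms such as the one in the story above.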

Intro to Artificial Intelligence

What is Artificial Intelligence?

The term AI was coined by John McCarthy in 1956 and refers to behaviour that seems complex and would be regarded as intelligent. The best-known test of intelligence is the Turing test, named after Alan Turing. It is a subjective test based on the idea that if a machine can fool a person into thinking it is another person, then it has passed the test.

The Turing test takes place over two terminals: you type a message, read the reply on screen, and then try to determine whether the other participant is a human or a machine. There have been many attempts to pass the Turing test, and one of the first was called ELIZA, which simulated a psychiatrist. In the exchange below, ELIZA's replies are in capitals.

  Men are all alike.
IN WHAT WAY
  They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
  Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
  He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
  It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
  I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
  Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY
  My mother takes care of me.
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
  My Father.
YOUR FATHER
  You are like my father in some ways.
WHAT RESEMBLANCE DO YOU SEE
  You are not very aggressive but I think you don't want me to notice that.
WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
  You don't argue with me.
WHY DO YOU THINK I DON'T ARGUE WITH YOU
  You are afraid of me.
DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
  My father is afraid of everybody.
WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER
  Bullies.
DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE

This looks pretty reasonable. However, ELIZA works by spotting keywords such as “mother” and “father” and then responding with a remark about family. It has no understanding of context, so it can easily be led astray and come back with nonsensical or inappropriate responses.
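To see how little is going on behind a conversation like this, here is a minimal keyword responder in the same spirit. It is a sketch under assumptions, not the original ELIZA script: the three rules, the fallback reply and the names `RULES`, `DEFAULT_REPLY` and `respond` are all made up for illustration.

```python
import re

# Hypothetical keyword rules, checked in order; the first one that matches wins.
RULES = [
    (re.compile(r"\b(mother|father|family)\b", re.IGNORECASE),
     "TELL ME MORE ABOUT YOUR FAMILY"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE),
     "WHY DO YOU SAY YOU ARE {0}"),
    (re.compile(r"\balways\b", re.IGNORECASE),
     "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]
DEFAULT_REPLY = "PLEASE GO ON"


def respond(message: str) -> str:
    """Pick a canned reply purely by keyword; nothing here understands the message."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            # Echo any captured text back in capitals, as in the transcript.
            return template.format(*(group.upper() for group in match.groups()))
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(respond("My mother takes care of me."))
    print(respond("I am unhappy."))
    # No context: a sentence about a car still triggers the family rule.
    print(respond("My father's car is red."))
```

The last call shows the weakness described above: the word “father” is enough to trigger the family reply even though the sentence is about a car.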

No program has ever passed the Turing test, but some of the other problems that people used to think would require an intelligent computer have been solved. Chess was regarded this way, but ever since Deep Blue beat the reigning world champion it has been treated as a solved problem. Even more impressive is the game of Go, which has far more possible moves than chess: AlphaGo learned to play from records of human games and by playing against itself. Ironically, many things that even the least intelligent person can do are still far beyond the best computer programs, for example saying what is in a photograph, or explaining why The Phantom Menace is such a terrible film.

The term AI is overused, and many companies use it as a buzzword rather than implementing something that is actually smart. There are four approaches to AI.

  1. Strong AI tries to build machines that can reason and have knowledge of the real world. This approach was popular during the optimism of the 1950s and 1960s but turned out to be far harder than expected.
  2. Weak AI systems are not designed to actually reason but merely to act as though they are intelligent.
  3. Applied AI is AI with a specific purpose in mind, for example a door lock that opens when it recognises your face.
  4. Cognitive AI systems are designed to simulate experiments on how the mind works.

History of Artificial Intelligence

  • Aristotle (384-322 BC) developed a system of logic which was the first formal basis for deductive reasoning.
  • Pascal (1642) made one of the first mechanical calculating machines.
  • George Boole (19th century) developed Boolean algebra to express logical rules.
  • Charles Babbage and Ada Lovelace (née Byron) (19th century) worked on the first programmable machines.
  • Alan Turing (1950) wrote a paper asking whether machines can think, which introduced what is now called the Turing test.
  • Marvin Minsky (1951) built one of the first neural network machines.
  • MYCIN (1974) was one of the first expert systems and was used in the medical field.

Successes of AI

  1. Deep Blue, the IBM computer that beat Garry Kasparov at chess in 1997.
  2. Machine translation: systems such as Google Translate can translate text between multiple languages.
  3. Autonomous agents: NASA’s Mars rovers explored Mars for months at a time.

Machine Learning

What is machine learning?

For some problems the solution is not known in advance, so conventional programming techniques do not work. Instead, we get the computer to gather statistics from the data and determine the important factors itself. A computer can process vast amounts of data and detect significant patterns that no human would ever have found.

There are different types of machine learning, but most of them can be classified as either supervised or unsupervised learning. Supervised learning is where you have labelled data: for example, you could have past loan decisions (the label would be “loan granted” or “loan refused”) along with the application forms and other data (such as credit rating). Unsupervised learning is where you don’t even know what the answers look like: for example, you could find films that are likely to be popular with a particular person (Netflix recommendations).
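As a rough illustration of learning from labelled data, the sketch below predicts a loan decision by copying the label of the most similar past application (a nearest-neighbour classifier, chosen here only because it is short). The six past applications, the two features and the names `LABELLED` and `predict` are invented for the example.

```python
# Hypothetical labelled history:
# (credit_score, outstanding_debt_in_thousands) -> decision made at the time.
LABELLED = [
    ((720, 2), "granted"),
    ((680, 5), "granted"),
    ((750, 10), "granted"),
    ((540, 18), "refused"),
    ((580, 25), "refused"),
    ((500, 8), "refused"),
]


def squared_distance(a, b):
    """Squared straight-line distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(applicant, examples=LABELLED):
    """Return the label of the closest past application (1-nearest neighbour)."""
    # Note: the features are not rescaled, so credit score dominates the distance.
    # A real system would normalise them first.
    _, label = min(examples, key=lambda example: squared_distance(example[0], applicant))
    return label


if __name__ == "__main__":
    print(predict((700, 4)))   # close to the granted applications
    print(predict((560, 20)))  # close to the refused applications
```

The labels do all the work here: the program has no idea what makes a good loan, it only repeats whatever decision was made for the most similar earlier case.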

Supervised Learning

Supervised learning is generally done with data that has already been evaluated by a person or possibly by another algorithm. In the loan example we might take into account the type of loan (mortgage, credit card, overdraft), employment history, wages, outstanding debt, and credit score. Some of these values are categorical (the type of loan) and some are quantitative (the amount of outstanding debt). Sometimes it may be worth changing a quantitative value into a categorical one (debt of 0-5000, 5000-15000, or more than 15000). One problem with supervised learning is that it can reinforce biases in the original data: if loans were refused to African Americans, the algorithm could pick that up even if the applicant’s race is not stored directly in the data, because other fields can act as a stand-in for it. There is an article in Forbes on this problem.
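Turning a quantitative value into a categorical one is usually just a matter of picking cut-off points. Here is a minimal sketch for the debt bands mentioned above. The function name `debt_band` and the band labels are assumptions, as is the choice to put exactly 5000 and exactly 15000 into the higher band, since the ranges in the text overlap at those values.

```python
def debt_band(debt: float) -> str:
    """Map an outstanding debt amount onto one of three categories."""
    if debt < 0:
        raise ValueError("debt cannot be negative")
    if debt < 5000:
        return "0-5000"
    if debt < 15000:  # boundary choice: 5000 itself lands in this band
        return "5000-15000"
    return "15000+"


if __name__ == "__main__":
    for amount in (1200, 5000, 14999.99, 20000):
        print(amount, "->", debt_band(amount))
```

Banding like this throws away some detail, but it lets an algorithm that only handles categories treat debt in the same way as the type of loan.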

Unsupervised learning

One of the algorithms used for unsupervised learning is k-means, which tries to minimize the distance between each point and the center of its group in a multidimensional space. For the Netflix example, you would take the film ratings of the 5000 most prolific people (those with lots of rating information) and assign each person to a random group. You then find the center point of each group and move each person into the group with the closest center. You keep doing this until nothing changes, or until a certain number of loops has been reached. You can then match any other person to the closest group and use the ratings of the people in that group to work out which films they are likely to enjoy.
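The loop described above can be sketched in plain Python. This is a minimal version for illustration, not production code: the six viewers and their ratings of three films are made up, and the `seed` and `initial_groups` parameters, the re-seeding of empty groups and the helper `nearest_group` are assumptions added so that the example is reproducible.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Center of a group: the average of each coordinate."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def nearest_group(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda g: squared_distance(point, centers[g]))


def k_means(points, k, max_loops=100, seed=0, initial_groups=None):
    rng = random.Random(seed)
    # Step 1: put every point in a random group (or use the supplied start).
    if initial_groups is not None:
        groups = list(initial_groups)
    else:
        groups = [rng.randrange(k) for _ in points]
    centers = []
    for _ in range(max_loops):
        # Step 2: find the center of each group.
        centers = []
        for g in range(k):
            members = [p for p, label in zip(points, groups) if label == g]
            # Assumption of this sketch: an empty group restarts at a random point.
            centers.append(mean_point(members) if members else rng.choice(points))
        # Step 3: move every point to the group with the closest center.
        new_groups = [nearest_group(p, centers) for p in points]
        if new_groups == groups:  # nothing moved, so we are done
            break
        groups = new_groups
    return groups, centers


if __name__ == "__main__":
    # Hypothetical ratings (1-5) of three films by six viewers.
    ratings = [(5, 5, 1), (5, 4, 1), (4, 5, 2), (1, 1, 5), (2, 1, 5), (1, 2, 4)]
    groups, centers = k_means(ratings, k=2, initial_groups=[0, 1, 0, 1, 0, 1])
    print(groups)
    # Match a new viewer to the closest group.
    print(nearest_group((5, 5, 2), centers))
```

Because the starting groups are random, different runs can settle on different groupings, and some starting points lead to a poor split. A common workaround is to run the algorithm several times and keep the best result.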