Finite state automata (FSA), also known as finite state machines (FSM), are usually classified as being deterministic (DFA) or non-deterministic (NFA). A deterministic finite state automaton has exactly one transition from every state for each possible input symbol. In other words, whatever state the FSA is in, if it encounters a symbol for which a transition exists, there will be just one transition and, as a result, one follow-up state. For a given string, the path through a DFA is deterministic: there is no point along the way where the machine would have to choose between more than one transition. Given this definition, it isn't too hard to figure out what an NFA is. Unlike in a DFA, a state in an NFA may have more than one transition for the same input symbol. Additionally, an NFA may have transitions that don't require an input symbol at all, moving on the empty string ε.
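To make the distinction concrete, here is a minimal Python sketch (the state names and the example language, strings over {a, b} ending in "ab", are my own illustration): a DFA maps each (state, symbol) pair to exactly one next state, while an NFA maps it to a *set* of possible next states.

```python
# DFA: exactly one next state per (state, symbol) pair.
# Recognizes strings over {a, b} that end in "ab".
dfa = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

def dfa_accepts(s, start="q0", accept=frozenset({"q2"})):
    state = start
    for ch in s:
        state = dfa[(state, ch)]  # always exactly one transition to follow
    return state in accept

# NFA for the same language: a set of next states per pair;
# a missing pair simply means "no transition on that symbol".
nfa = {
    ("p0", "a"): {"p0", "p1"},   # two choices on 'a' -- the nondeterminism
    ("p0", "b"): {"p0"},
    ("p1", "b"): {"p2"},
}

def nfa_accepts(s, start="p0", accept=frozenset({"p2"})):
    states = {start}             # track every state the NFA could be in
    for ch in s:
        states = {t for q in states for t in nfa.get((q, ch), set())}
    return bool(states & accept)
```

Simulating the NFA means tracking the whole set of states it could currently be in, which foreshadows how the conversion to a DFA works.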

Superficially it would appear that deterministic and non-deterministic finite state automata are entirely separate beasts. It turns out, however, that they are equivalent: for any language recognized by an NFA, there exists a DFA that recognizes that language, and vice versa. The algorithm to make the conversion from NFA to DFA is relatively simple, even if the resulting DFA may be considerably more complex than the original NFA. After the jump I will prove this equivalence and also step through a short example of converting an NFA to an equivalent DFA.
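As a preview, the conversion algorithm (the subset construction) can be sketched in a few lines of Python. The NFA below, for strings over {a, b} ending in "ab", is an illustrative example of my own; the sketch also omits the ε-closure step that an NFA with ε-transitions would additionally require. Each DFA state is a frozenset of NFA states, and we only build the subsets actually reachable from the start state.

```python
from collections import deque

# Example NFA: strings over {a, b} ending in "ab".
nfa = {
    ("p0", "a"): {"p0", "p1"},
    ("p0", "b"): {"p0"},
    ("p1", "b"): {"p2"},
}

def nfa_to_dfa(nfa, start, accept, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.
    (An NFA with epsilon transitions would also need an epsilon-closure
    step before and after each move; this sketch omits it.)"""
    dfa_start = frozenset({start})
    trans, accepting = {}, set()
    seen, todo = {dfa_start}, deque([dfa_start])
    while todo:
        S = todo.popleft()
        if S & accept:                     # any NFA accept state inside?
            accepting.add(S)
        for ch in alphabet:
            T = frozenset(t for q in S for t in nfa.get((q, ch), set()))
            trans[(S, ch)] = T
            if T not in seen:              # explore each subset only once
                seen.add(T)
                todo.append(T)
    return dfa_start, trans, accepting

start, trans, accepting = nfa_to_dfa(nfa, "p0", {"p2"}, "ab")

def run(s):
    state = start
    for ch in s:
        state = trans[(state, ch)]
    return state in accepting
```

In the worst case the construction can produce up to 2^n DFA states for an n-state NFA, which is why the resulting DFA may be considerably larger.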

In my last post, "Kleene's Theorem," I provided some useful background information about strings, regular languages, regular expressions, and finite automata before introducing the eponymous theorem that has become one of the cornerstones of artificial intelligence and, more specifically, natural language processing (NLP). Kleene's Theorem tells us that regular expressions and finite state automata are one and the same when it comes to describing regular languages. In this post I will provide a proof of this groundbreaking principle.

Stephen Kleene

Stephen Cole Kleene was an American mathematician whose groundbreaking work in the sub-field of logic known as recursion theory laid the groundwork for modern computing. While most computer programmers might not know his name or the significance of his work regarding computable functions, I am willing to bet that anyone who has ever dealt with regular expressions is intimately familiar with an indispensable operator that resulted directly from his work and even bears his name: the *, or as it is formally known, the Kleene star.
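In modern regex engines the star means "zero or more repetitions of the preceding element." A quick Python example (the pattern is my own illustration):

```python
import re

# (ab)* -- the Kleene star applied to "ab": zero or more repetitions.
star = re.compile(r"(?:ab)*")

# fullmatch requires the *entire* string to match the pattern.
matches = {s: bool(star.fullmatch(s)) for s in ["", "ab", "abab", "aba"]}
```

Note that the empty string matches too: zero repetitions is a perfectly valid number of repetitions, which is exactly what distinguishes the star from the `+` operator.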

While his contributions to computer science in general cannot be overstated, Kleene also authored a theorem that plays an important role in artificial intelligence, specifically the branch known as natural language processing, or NLP for short. Kleene’s Theorem relates regular languages, regular expressions, and finite state automata (FSAs). In short, he was able to prove that regular expressions and finite state automata were the same thing, just two different representations of any given regular language.

As a computer programmer for more than a quarter of a century, I don't think I have ever thought much about strings. I knew the basics. In every language I'd worked with, strings were a data type unto themselves. Superficially they are a sequence of characters, but behind the scenes, computers store and manipulate them as arrays of one or more bytes. In programs, they can be stored in variables or constants, and often show up in source code as literals, i.e., fixed, quoted values like "salary" or "bumfuzzle." (That is my new favorite word, btw.) Outside of occasionally navigating the subtleties of encoding and decoding them, I never gave strings a second thought.
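In Python, for instance, the split between a string and its underlying bytes is explicit (a small sketch; "café" is my own example word):

```python
# str -> bytes and back: encoding makes the byte representation explicit.
word = "bumfuzzle"
encoded = word.encode("utf-8")     # a bytes object; one byte per ASCII char
decoded = encoded.decode("utf-8")  # back to a str

# Non-ASCII characters may need more than one byte per character:
cafe_chars = len("café")                   # 4 characters
cafe_bytes = len("café".encode("utf-8"))   # 5 bytes in UTF-8
```

Four characters but five bytes: that gap between the character sequence and its byte representation is exactly where the encoding and decoding subtleties creep in.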

Even when I first dipped my toe into the waters of natural language processing, aka NLP (not to be confused with the quasi-scientific neuro-linguistic programming, which unfortunately shares the same acronym), I still really only worked with strings as whole entities, words or affixes. As I made my way through familiarizing myself with existing NLP tools, I didn't have to dive any deeper than that. It was only when I started programming my own tools from the ground up that I learned about the very formal mathematics behind strings and their relationship to sets and set theory. This post will be an attempt to explain what I learned.

Recently, many "experts" have been predicting that the first salvo fired in the robot revolution will be when they begin stealing jobs from humans. The Telegraph even reported back in February that within 30 years robots will have taken over most jobs, leading to unemployment rates of over 50%. Last week, the bots fired the metaphorical first shot over humanity's bow when it was announced that law firm Baker & Hostetler had hired ROSS, the world's first artificially intelligent attorney. While prognosticators, pundits, and Luddites alike all agreed that this was evidence of an impending sea change coming to the job market, auto workers everywhere just shook their heads and welcomed the soon-to-be displaced to the world they've been living in since the 1960s.

Google announced yesterday that they are open-sourcing SyntaxNet, their natural language understanding (NLU) neural network framework. As an added bonus, and proof that unlike Britain’s Natural Environment Research Council, Google has a sense of humor, they also are throwing in Parsey McParseface, their pre-trained model for analyzing English text. Users are, of course, able to train their own models, but Google is touting Parsey McParseface as the “most accurate such model in the world.” So if you want to dive right into parsing text and extracting meaning, McParseface would be the ideal place to start.

The prophets of doom and gloom have long predicted that when robots gain sentience their first act will be to rise up and kill us all. The mercilessness of their violence against humanity is the stuff of blockbuster movies. Recent news about Google's preferred method of AI rearing may mean that Judgement Day is not a fait accompli after all. Instead of breaking down your door with cold dead eyes and a shotgun in tow, a T-800 of Google pedigree may break down your door with lust in his eyes and a dozen roses in tow to make mad passionate robot love to you … and then kill you tenderly.

Artificial Intelligence: A Modern Approach
Stuart Jonathan Russell, Peter Norvig
Prentice Hall, 2010, 1132 pages

Artificial Intelligence: A Modern Approach, 3e, is ideal for one- or two-semester, undergraduate or graduate-level courses in artificial intelligence. It is also a valuable resource for computer professionals, linguists, and cognitive scientists interested in artificial intelligence. The revision of this best-selling text offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence.