Google announced yesterday that they are open-sourcing SyntaxNet, their natural language understanding (NLU) neural network framework. As an added bonus, and proof that unlike Britain’s Natural Environment Research Council, Google has a sense of humor, they also are throwing in Parsey McParseface, their pre-trained model for analyzing English text. Users are, of course, able to train their own models, but Google is touting Parsey McParseface as the “most accurate such model in the world.” So if you want to dive right into parsing text and extracting meaning, McParseface would be the ideal place to start.
Built on top of Google’s TensorFlow numerical computation library, SyntaxNet is a syntactic parser. Often the first component in natural language systems, a syntactic parser deconstructs each sentence of the source text, performing part-of-speach (POS) tagging on each word and analyzing the syntactic relationships between the words. The results of this process are stored in what is known as a dependency parse tree.
The main problem facing SyntaxNet, and any attempt at natural language understanding for that matter, is the extraordinary amount of ambiguity present, sometimes in even short, superficially straightforward sentences. The example Google uses to illustrate this is, “Alice drove down the street in her car.” Is Alice driving on street that is in her car, or is Alice in her car which in turn driving down the street? For humans, this is an easy sentence to understand. The first option is absurd. But for computers doing probabilistic reasoning, anything with a non-zero probability of being correct must be considered before it can be ruled out.
Taking the lead from the effectiveness of the human brain to quickly disambiguate natural language, SyntaxNet leverages the non-linear classification capabilities to generate models like Parsey McParseface that is capable of 94% accuracy, a number fast approaching human-level performance on well formed text. Google’s goal is of course loftier, they “… to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts.” If I had to put my money on anyone, it would be them.
Read Google’s press release: “Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source”