
19.04.2018

AI BASICS: NATURAL LANGUAGE PROCESSING WITH NODE.JS

For a couple of years now, AI and Machine Learning have been taking over web forums as well as adding to the excitement of eager developers who are keen to give it a go in their projects. I’m by no means a Machine Learning expert, but I have brushed the surface of it in a couple of development projects which required a “smarter” way of doing things.

To be clear, what we’re going to explore today is only the tip of the iceberg when it comes to Machine Learning, and my examples are going to be quite rudimentary. However, Natural Language Processing is an exciting prospect with some creative use cases for your projects. Examples of those would be:

  • More intelligent search suggestions and search results
  • Chatbot integration for a better understanding of user input / conversation
  • Text to Speech integrations, similar to services like Amazon Polly
  • Content editing features for content producers like spell checks, syntax and more.

What is NLP?

Natural Language Processing, by definition as stated on Wikipedia, refers to “the application of computational techniques to the analysis and synthesis of natural language and speech.” Let’s break that down.

An NLP library will help you perform relatively complicated data extraction on strings. Ever searched on Google and spelled a word incorrectly? Ever noticed how Google then tells you, “Displaying results for x as well”? Well, that’s some NLP at work. The search form on Google’s home page is incredibly complex when it comes to analyzing what we, as humans, have typed into the form input.

NLP is usually performed on a string of words.

As a developer using an NLP library I can extract a ton of information that could help perform almost any task I like. For this project, we’ll be using Natural.

For the sake of brevity I’m just going to cover the most useful methods that would be quick to implement into your own projects and iterate on.

I found this article title on Web Designer Depot:

“The Secret Designer: First Job Horror”

Analyzing the entire string is one thing, but we want to be able to perform methods on individual words in order to extract more data from them. Luckily, we can use a tokenizer to do so. Have a look below:

TOKENS

var nlp = require('natural');
var tokenizer = new nlp.WordTokenizer();
console.log(tokenizer.tokenize("The Secret Designer: First Job Horror"));

This will return a simple JS array to our program:

[ 'The', 'Secret', 'Designer', 'First', 'Job', 'Horror' ]

The WordTokenizer simply breaks up the string into words that we can iterate and perform methods on. Interestingly, the Natural library comes with a few different tokenizers.
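To get a feel for what a word tokenizer does under the hood, here is a minimal sketch in plain JavaScript. It is not Natural’s implementation: simpleTokenize and countTokens are hypothetical helpers that I’ve made up for illustration, and the sketch assumes that splitting on anything other than letters, digits and underscores is good enough for English titles.

```javascript
// Hypothetical helper: split on runs of characters that are not
// letters, digits or underscores, then drop any empty strings.
function simpleTokenize(text) {
  return text.split(/[^A-Za-z0-9_]+/).filter(Boolean);
}

// Hypothetical helper: once we have tokens we can iterate over them,
// for example to count how often each lower-cased word appears.
function countTokens(tokens) {
  const counts = {};
  for (const token of tokens) {
    const key = token.toLowerCase();
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const tokens = simpleTokenize('The Secret Designer: First Job Horror');
console.log(tokens);
console.log(countTokens(tokens));
```

The colon after “Designer” disappears because it is treated as a separator, which mirrors the array shown above.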

Natural has collected a number of algorithms written by some very smart people to perform the functionality we will discuss. There are a few ways to skin a cat in these examples, so if you’d like to dive a bit deeper, all the information is on the Natural GitHub page.

STRING DISTANCE

Natural uses the Levenshtein distance algorithm as a way of measuring how closely two strings match:

var nlp = require('natural');
// The module was required as nlp, so call the method on nlp
console.log(nlp.LevenshteinDistance("Daine", "Dane"));

The above will log out 1, meaning that a single edit separates the two strings. The lower the number, the closer the match, and 0 means the strings are identical. As you can see, there are many ways of spelling “Daine”. The Levenshtein distance is the minimum number of single-character edits needed to turn one string into the other, where an edit is one of:

  • insertions
  • substitutions
  • deletions

In the example above, “Daine” differs from “Dane” by one inserted i, so the Levenshtein distance is 1. This method is very useful for providing suggestions based on bad spelling.
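If you’re curious how such a distance can be computed, here is a minimal sketch of the classic dynamic-programming approach in plain JavaScript. It assumes every insertion, substitution and deletion costs 1. The function name levenshtein is my own, and this is not necessarily how Natural implements it internally.

```javascript
// Sketch: Levenshtein distance with unit costs, keeping only two rows
// of the dynamic-programming table in memory.
function levenshtein(a, b) {
  // previous[j] = distance between the first i-1 characters of a
  // and the first j characters of b.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const current = [i]; // turning i characters into '' takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current[j] = Math.min(substitution, deletion, insertion);
    }
    previous = current;
  }
  return previous[b.length];
}

console.log(levenshtein('Daine', 'Dane'));     // 1
console.log(levenshtein('kitten', 'sitting')); // 3
```

A practical use is ranking candidate words by distance and suggesting the closest ones to the user.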

APPROXIMATE STRING MATCHING

Another great piece of functionality that could really spice up your apps is Approximate String Matching. It is similar to “String Distance” above and, in fact, builds on the Levenshtein algorithm. This method is better suited to finding some kind of entity (e.g., a city, country, person, etc.) that could be spelled wrong within a longer string.
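To illustrate the idea without depending on a particular version of Natural, here is a sketch of approximate substring matching in plain JavaScript. It uses a well-known variation of the Levenshtein table in which a match may start anywhere in the text (often called Sellers’ algorithm). The function name fuzzyFind and the shape of the returned object are assumptions of mine, not Natural’s API.

```javascript
// Sketch: find the substring of `text` that is closest to `pattern`.
// Same table as Levenshtein, except row 0 is all zeros so that a match
// can begin at any position in the text for free.
function fuzzyFind(pattern, text) {
  const n = text.length;
  let prevDist = new Array(n + 1).fill(0);
  let prevStart = Array.from({ length: n + 1 }, (_, j) => j);

  for (let i = 1; i <= pattern.length; i++) {
    const dist = [i]; // pattern prefix vs. empty text: i deletions
    const start = [0];
    for (let j = 1; j <= n; j++) {
      const cost = pattern[i - 1] === text[j - 1] ? 0 : 1;
      // Start with match/substitution, then see if another edit is cheaper.
      let best = prevDist[j - 1] + cost;
      let bestStart = prevStart[j - 1];
      if (prevDist[j] + 1 < best) {        // skip a pattern character
        best = prevDist[j] + 1;
        bestStart = prevStart[j];
      }
      if (dist[j - 1] + 1 < best) {        // absorb an extra text character
        best = dist[j - 1] + 1;
        bestStart = start[j - 1];
      }
      dist[j] = best;
      start[j] = bestStart;
    }
    prevDist = dist;
    prevStart = start;
  }

  // Pick the end position with the smallest distance (later one wins ties).
  let end = 0;
  for (let j = 1; j <= n; j++) {
    if (prevDist[j] <= prevDist[end]) end = j;
  }
  const offset = prevStart[end];
  return { substring: text.slice(offset, end), distance: prevDist[end], offset };
}

console.log(fuzzyFind('Londn', 'I flew to London last week'));
// { substring: 'London', distance: 1, offset: 10 }
```

Breaking ties in favour of the later end position is a design choice in this sketch. It tends to return the whole word rather than a shorter fragment that happens to be just as close.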

PHONETICS

For words that sound the same yet have different meanings, the metaphone.compare() method is incredibly useful.

var nlp = require('natural');
var metaphone = nlp.Metaphone;
if(metaphone.compare('see', 'sea')) {
  console.log('Phonetically they match!');
}
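Phonetic algorithms work by reducing a word to a short code that represents how it sounds, so that two words match when their codes match. Metaphone’s rules are fairly involved, so the sketch below uses the much simpler American Soundex scheme instead, purely to show the principle. The soundex function here is my own illustrative implementation, not Natural’s.

```javascript
// Sketch of American Soundex: keep the first letter, then encode the
// following consonants as digits, giving a four-character code.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6
};

function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';

  let result = letters[0].toUpperCase();
  let lastCode = SOUNDEX_CODES[letters[0]] || 0;

  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    const code = SOUNDEX_CODES[ch] || 0; // vowels, y, h and w have no code
    // Only emit a digit when it differs from the previous letter's code.
    if (code !== 0 && code !== lastCode) result += code;
    // h and w do not separate repeated codes; vowels do.
    if (ch !== 'h' && ch !== 'w') lastCode = code;
  }
  return result.padEnd(4, '0');
}

console.log(soundex('Robert'), soundex('Rupert')); // R163 R163
console.log(soundex('see') === soundex('sea'));    // true
```

Soundex is cruder than Metaphone and will group some words that do not really sound alike, which is one reason to prefer the library’s implementation in real projects.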

SPELLCHECK

Spellchecks can be used in a dynamic way. This kind of functionality is great if you are writing your own spellcheck feature for your app, or maybe you’re building some kind of word processing tool.

var nlp = require('natural');
var checks = ['something', 'soothing']; // Known as a corpus
var spellcheck = new nlp.Spellcheck(checks);

We could then run the following, where the second argument is the maximum edit distance to search within:

spellcheck.getCorrections('soemthing', 1); // ['something']
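Notice that swapping two neighbouring letters counts as a single edit here. A common way to build such a suggester is to generate every string that is one edit away from the misspelled word and keep the ones found in the corpus. The sketch below shows that idea in plain JavaScript. The names editsOf and suggest are hypothetical, it assumes lower-case English letters only, and it is not a copy of Natural’s internals.

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// All strings that are exactly one edit away from `word`.
function editsOf(word) {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    if (tail) results.add(head + tail.slice(1));                 // deletion
    if (tail.length > 1) {
      results.add(head + tail[1] + tail[0] + tail.slice(2));     // transposition
    }
    for (const c of ALPHABET) {
      if (tail) results.add(head + c + tail.slice(1));           // substitution
      results.add(head + c + tail);                              // insertion
    }
  }
  return results;
}

// Corpus words reachable from `word` within `maxDistance` edits.
function suggest(word, corpus, maxDistance = 1) {
  const known = new Set(corpus);
  const found = new Set();
  let frontier = new Set([word]);
  for (let d = 0; d < maxDistance; d++) {
    const next = new Set();
    for (const candidate of frontier) {
      for (const edit of editsOf(candidate)) {
        next.add(edit);
        if (known.has(edit)) found.add(edit);
      }
    }
    frontier = next;
  }
  return [...found];
}

console.log(suggest('soemthing', ['something', 'soothing'], 1)); // [ 'something' ]
```

The number of candidates grows quickly with each extra edit, so this brute-force sketch is only practical for small distances.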

DICTIONARY

WordNet is the latest integration in Natural. It’s a lexical database of English developed by Princeton University which allows for the quick lookup of words, including the metadata associated with each word. Examples of that metadata would be the part of speech (noun, verb, adjective), synonyms and a short definition.

WordNet requires that you install the wordnet-db NPM package in order to run keywords against it. You can install it into your projects by typing:

npm install wordnet-db

This bundled functionality has some big implications for standardizing the native dictionary lookup that most operating systems bake into browsers and software. Coupled with libraries like React, developers could take this pretty far.

Example

Okay! So how about a full example then? Let’s build a simple CLI tool that will prompt us for a word and then return a dictionary lookup. Please note that for the sake of brevity, I have left out checks and error handling. As Node.js has support for promises, it would be easy enough to add these into the logic flow.

First, create a folder whose name is whatever you would like to call your app. Next, cd into the folder and run:

npm init -y

This will create a package.json filled with default values. There is one property that is important here: main, which names the entry point of the Node.js application. Keep it as index.js and create the file in the same folder as the package.json. You can do that on the CLI by typing:

touch index.js

Next up, we’ll want to install a couple of dependencies:

npm install --save commander wordnet-db natural

Commander.js is a powerful package for NPM that makes writing CLI based applications a breeze. We’ve already covered Natural in this post.

Open up index.js and paste the following on the very first line of the file:

#!/usr/bin/env node

This line tells the shell to run the file with Node once we npm link our dictionary app into a folder on our PATH (typically /usr/local/bin), so that we can use it as a program with arguments, for instance:

dictionary lookup "HTML"

Let’s add two more properties to our package.json:

"preferGlobal": true,
"bin": "./index.js"

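preferGlobal signals that our package is meant to be installed globally, so that it can be run from anywhere in our OS. The bin property simply tells npm link which file to execute. Because bin is given as a plain string, the command takes its name from the name property in package.json, so that should be set to dictionary. Make sure you’re in your project folder and run: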

npm link
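Put together, the relevant parts of package.json might look something like the sketch below. The name value dictionary is an assumption chosen to match the dictionary lookup command used in this post, the other values are the usual npm init -y defaults, and npm adds a dependencies block of its own when you install packages. The completed file in the Gist may differ.

```json
{
  "name": "dictionary",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "preferGlobal": true,
  "bin": "./index.js"
}
```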

I’ve added the completed package.json and index.js as a Gist below:

https://gist.github.com/dainemawer/d4dc972fd2c0db5e58615c13c17ca8aa

Here’s an explanation:

  • First, we require commander, wordnet and natural from node_modules.
  • We’ll come back to the wordNetLookup function in a bit.
  • We can call program multiple times, in this application I’ve called it three times.
  • The first call sets up the version and description.
  • Next we set up a command. This command will take one required parameter <word>
  • We then provide it an alias in case you don’t want to type out the lookup every time you run the program.
  • We then add a description for what the command does and finally we run .action()
  • The action method is fed a callback function which takes the word the user typed on the terminal as a parameter. Within this callback, we run wordNetLookup(), which is passed that parameter and performs the lookup.
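As a rough sketch of what a lookup function has to deal with: Natural’s WordNet lookup hands its callback an array of result objects with fields such as lemma, pos, synonyms and gloss. The helper below shows one way of turning such an object into readable terminal output. formatEntry is a hypothetical name that is not taken from the Gist, and the sample object is hand-written for illustration rather than real WordNet output.

```javascript
// WordNet uses one-letter part-of-speech codes.
const POS_LABELS = { n: 'noun', v: 'verb', a: 'adjective', s: 'adjective', r: 'adverb' };

// Hypothetical helper (not from the Gist): format one WordNet-style
// result object as a few lines of printable text.
function formatEntry(result) {
  const pos = POS_LABELS[result.pos] || result.pos;
  // Leave the word itself out of its own synonym list.
  const synonyms = (result.synonyms || []).filter((s) => s !== result.lemma);
  const lines = [`${result.lemma} (${pos})`, `  ${result.gloss.trim()}`];
  if (synonyms.length > 0) lines.push(`  synonyms: ${synonyms.join(', ')}`);
  return lines.join('\n');
}

// Hand-written sample shaped like a lookup result; not real WordNet data.
const sample = {
  lemma: 'lookup',
  pos: 'n',
  synonyms: ['lookup', 'search'],
  gloss: 'an illustrative definition  '
};
console.log(formatEntry(sample));
```

Inside the lookup callback, each result could then be passed through formatEntry before being logged.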

We can now run dictionary lookup "word" and the program will perform the lookup of any word that’s currently in the WordNet database.

And that’s it! I hope you’ve enjoyed this crash course in NLP. In my next article we’ll look at NER, otherwise known as Named Entity Recognition, which will allow us to extract real-world entities like cities, people and countries from text. Coupled with NLP, NER provides some seriously powerful integrations into our Node-based applications.