Natural Language Processing in JavaScript with Natural

In this course we’ll work through Natural’s API for natural language processing in JavaScript. We’ll look at how to process text: learning how to break up language strings, find the word roots, work with inflectors, find sequences of words, and tag parts of speech. We’ll learn how to find important stats about a body of text: how to compare strings, how to classify text with machine learning, how to use the tf-idf tool to find relevant words. We’ll look at some of the extra tools Natural gives us, including the dictionary/thesaurus of WordNet, a phonetics comparer that lets us see if two words sound the same, and a spellcheck feature. We’ll also look at tries and digraphs, two data structures that help us better analyze bodies of text.

Break up language strings into parts using Natural

Find the roots of words using stemming in Natural

Pluralizing nouns and counting numbers with inflectors in Natural

Find sequences of words (n-grams) using Natural

Tag parts of speech using Natural

Compare similarity of strings through string distance in Natural

Classify text into categories with machine learning in Natural

Classify JSON text data with machine learning in Natural

Using machine learning classifiers in a new project

Identify the most important words in a document using tf-idf in Natural

Find a word’s definition using WordNet in Natural

Search more efficiently with tries using Natural

Include spell-check in text projects using Natural

Check if words sound alike using Natural

Break up language strings into parts using Natural

1:25 natural PRO

A part of Natural Language Processing (NLP) is processing text by “tokenizing” language strings. This means we can break up a string of text into parts by word, sentence, etc. In this lesson, we will use the natural library to tokenize a string. First, we will break the string into words using WordTokenizer, WordPunctTokenizer, and TreebankWordTokenizer. Then we will break the string into sentences using RegexpTokenizer.
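
To make the idea concrete, here is a minimal, dependency-free sketch of tokenizing in plain JavaScript. It does not call Natural; the helper names `tokenizeWords` and `tokenizeSentences` are hypothetical, and Natural's own tokenizers apply their own rules and may split the same text differently.

```javascript
// Dependency-free sketch of tokenizing. This does not call Natural.

// Split on runs of non-word characters and drop empty pieces.
// Note that this also splits contractions: "It's" becomes "It" and "s".
function tokenizeWords(text) {
  return text.split(/\W+/).filter((token) => token.length > 0);
}

// Split after ".", "!" or "?" when followed by whitespace.
// This is a rough heuristic and will mis-split abbreviations like "Dr. Smith".
function tokenizeSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

const sample = "The quick brown fox jumps. It's over the lazy dog!";
const words = tokenizeWords(sample);
const sentences = tokenizeSentences(sample);

console.log(words);
// [ 'The', 'quick', 'brown', 'fox', 'jumps', 'It', 's', 'over', 'the', 'lazy', 'dog' ]
console.log(sentences);
// [ 'The quick brown fox jumps.', "It's over the lazy dog!" ]
```

The way "It's" falls apart here is one reason a library offers several tokenizers: each makes a different choice about punctuation and contractions.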

Find the roots of words using stemming in Natural

1:33 natural PRO

We will learn about “stemming,” the process of reducing words to their root, often in order to group words that share a common root. We will look at the Porter and Lancaster Stemmers, briefly touch on Natural’s support for Russian and Spanish stemmers, and introduce the function to stem and tokenize at the same time.

Pluralizing nouns and counting numbers with inflectors in Natural

1:06 natural PRO

Inflection modifies a word to express a grammatical category such as number, and an inflector is a tool that applies those modifications. While Natural’s coverage of inflectors is not comprehensive, we will show how Natural can pluralize/singularize nouns and count numbers.

Find sequences of words (n-grams) using Natural

2:06 natural PRO

N-grams are sequences of words, where the 'n' stands for the number of words in the sequence. In this lesson, we will see how to find bigrams (2-grams), trigrams (3-grams), and any other length n-gram in a body of text.
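
As a rough illustration of the definition above, the following plain JavaScript sketch builds n-grams with a sliding window. It does not use Natural; the `ngrams` helper is hypothetical, and splitting on whitespace is a simplifying assumption.

```javascript
// Dependency-free sketch of n-gram extraction. This does not call Natural.

// Slide a window of n words across the text and collect each window.
function ngrams(text, n) {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const result = [];
  for (let i = 0; i + n <= words.length; i++) {
    result.push(words.slice(i, i + n));
  }
  return result;
}

const bigrams = ngrams("some words are here", 2);
const trigrams = ngrams("some words are here", 3);

console.log(bigrams);
// [ [ 'some', 'words' ], [ 'words', 'are' ], [ 'are', 'here' ] ]
console.log(trigrams);
// [ [ 'some', 'words', 'are' ], [ 'words', 'are', 'here' ] ]
```

A text with w words yields w - n + 1 n-grams, or none at all when n is larger than w.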

Tag parts of speech using Natural

2:16 natural PRO

An important component of many natural language processing projects is identifying the grammatical role of each word in a piece of text. We’ll learn how to do that with Natural’s part-of-speech (POS) tagger.

There are many tags, and it's worth looking them up online (search "POS tag symbols") to become familiar with them all.

The setup of the tagger may seem a little strange, but it allows you to replace the lexicon or the rules with a different lexicon or rule set of your choice.

Compare similarity of strings through string distance in Natural

3:32 natural PRO

We will learn how to measure how similar two strings are, examining three algorithms: Jaro-Winkler, Levenshtein, and Dice’s Coefficient.

Note that none of these algorithms is inherently better than the others. Instead, it's important to choose the one that best fits your text data.
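
To show how two of these measures behave differently, here is a plain JavaScript sketch of Levenshtein distance and Dice's coefficient. It does not call Natural, the function names are hypothetical, and two details are assumptions: every edit costs 1, and Dice's coefficient is computed over lowercased character bigrams. Jaro-Winkler is left out to keep the example short.

```javascript
// Dependency-free sketch of two string-distance measures.
// This does not call Natural.

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b (each costing 1).
function levenshtein(a, b) {
  // prev[j] is the distance between the processed prefix of a
  // and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character bigrams of a string, e.g. "night" -> ni, ig, gh, ht.
function bigramsOf(text) {
  const grams = [];
  for (let i = 0; i < text.length - 1; i++) {
    grams.push(text.slice(i, i + 2));
  }
  return grams;
}

// Dice's coefficient: twice the number of shared bigrams divided by the
// total number of bigrams in both strings. 1 means identical bigram sets.
function diceCoefficient(a, b) {
  const gramsA = bigramsOf(a.toLowerCase());
  const remaining = bigramsOf(b.toLowerCase());
  const total = gramsA.length + remaining.length;
  if (total === 0) return 0;
  let shared = 0;
  for (const gram of gramsA) {
    const index = remaining.indexOf(gram);
    if (index !== -1) {
      shared++;
      remaining.splice(index, 1); // each bigram in b can match only once
    }
  }
  return (2 * shared) / total;
}

const editDistance = levenshtein("kitten", "sitting");
const diceScore = diceCoefficient("night", "nacht");

console.log(editDistance); // 3
console.log(diceScore); // 0.25
```

Levenshtein returns a count of edits, where lower means more similar, while Dice returns a ratio between 0 and 1, where higher means more similar, so the two scores are not directly comparable.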

Classify text into categories with machine learning in Natural

3:42 natural PRO

In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression classifier, both basic machine learning algorithms, to classify text into categories.
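
For readers who want to see what a Naive Bayes text classifier does under the hood, here is a small plain JavaScript sketch. It is not Natural's implementation: the `NaiveBayes` class, its method names, the add-one smoothing, and the toy training sentences are all assumptions made for illustration.

```javascript
// Dependency-free sketch of a multinomial Naive Bayes text classifier.
// This does not call Natural and is not its implementation.
class NaiveBayes {
  constructor() {
    this.docCounts = new Map(); // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total word count
    this.vocabulary = new Set(); // every word seen in training
    this.totalDocs = 0;
  }

  static tokenize(text) {
    return text.toLowerCase().split(/\W+/).filter(Boolean);
  }

  // Record the word counts of one labeled training document.
  addDocument(text, label) {
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.docCounts.set(label, 0);
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs++;
    const counts = this.wordCounts.get(label);
    for (const word of NaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
  }

  // Pick the label with the highest log prior plus summed log likelihoods.
  // Add-one (Laplace) smoothing keeps unseen words from zeroing a score.
  classify(text) {
    const tokens = NaiveBayes.tokenize(text);
    let bestLabel = null;
    let bestScore = -Infinity;
    for (const [label, counts] of this.wordCounts) {
      let score = Math.log(this.docCounts.get(label) / this.totalDocs);
      const denominator = this.totalWords.get(label) + this.vocabulary.size;
      for (const word of tokens) {
        score += Math.log(((counts.get(word) || 0) + 1) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        bestLabel = label;
      }
    }
    return bestLabel;
  }
}

const classifier = new NaiveBayes();
classifier.addDocument("my unit tests failed", "software");
classifier.addDocument("tried the program but it was buggy", "software");
classifier.addDocument("the drive has a 2TB capacity", "hardware");
classifier.addDocument("i need a new power supply", "hardware");

const firstLabel = classifier.classify("did the tests pass");
const secondLabel = classifier.classify("did you buy a new drive");

console.log(firstLabel); // software
console.log(secondLabel); // hardware
```

With only four training sentences the result hinges on one or two shared words, which is why larger datasets give more dependable classifications.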

Classify JSON text data with machine learning in Natural

6:05 natural PRO

In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier, both basic machine learning algorithms, on JSON text data, and use them to classify that data into categories.

While this dataset is still considered small (only a couple hundred data points), we'll start to get better results.

The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.

This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization

Using machine learning classifiers in a new project

2:03 natural PRO

By this point we've seen that training a classifier can take a long time, and with more data, it would take even longer. Luckily, Natural provides support for saving your classifiers. In this lesson, we will learn how to save a trained classifier and load it into a new project in order to classify new data.

Identify the most important words in a document using tf-idf in Natural

5:15 natural PRO

Tf-idf, or term frequency-inverse document frequency, is a statistic that indicates how important a word is to a document within a collection of documents. This lesson will explain term frequency and inverse document frequency, and show how we can use tf-idf to identify the most relevant words in a body of text.
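
The two halves of the statistic can be sketched in a few lines of plain JavaScript. This does not call Natural, and the formulas are one common textbook variant chosen as an assumption: term frequency is the word's share of the document, and inverse document frequency is the natural log of (number of documents / documents containing the word). Libraries often add smoothing, so their scores may differ. All function names and the sample corpus are hypothetical.

```javascript
// Dependency-free sketch of tf-idf. This does not call Natural.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

const corpus = [
  "this document is about node",
  "this document is about ruby",
  "this document is about ruby and node",
  "this document is about node it has node examples",
].map(tokenize);

// Term frequency: the share of the document's words that are this term.
function termFrequency(term, tokens) {
  const count = tokens.filter((token) => token === term).length;
  return count / tokens.length;
}

// Inverse document frequency: terms that appear in fewer documents score higher.
// A term found in every document scores log(1) = 0.
function inverseDocumentFrequency(term, docs) {
  const containing = docs.filter((tokens) => tokens.includes(term)).length;
  return containing === 0 ? 0 : Math.log(docs.length / containing);
}

function tfidf(term, docIndex, docs) {
  return termFrequency(term, docs[docIndex]) * inverseDocumentFrequency(term, docs);
}

// Rank the distinct terms of one document from most to least relevant.
function rankTerms(docIndex, docs) {
  const uniqueTerms = [...new Set(docs[docIndex])];
  return uniqueTerms
    .map((term) => ({ term, score: tfidf(term, docIndex, docs) }))
    .sort((a, b) => b.score - a.score);
}

const rubyScore = tfidf("ruby", 1, corpus); // 1/5 * ln(4/2)
const thisScore = tfidf("this", 1, corpus); // 0, "this" is in every document
const topTerm = rankTerms(1, corpus)[0].term;

console.log(rubyScore.toFixed(4)); // 0.1386
console.log(thisScore); // 0
console.log(topTerm); // ruby
```

Words such as "this" that occur in every document score zero, which is how tf-idf filters out common words without a stop-word list.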

Find a word’s definition using WordNet in Natural

2:43 natural PRO

This lesson introduces WordNet, which is an important resource in natural language processing. With WordNet, we can look up a word’s definition, or find its synonyms.

Search more efficiently with tries using Natural

1:40 natural PRO

A trie is a data structure that provides an efficient way to check whether a word or phrase exists in a body of text, or to search by prefix.
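
To show why prefix search is cheap in this structure, here is a minimal trie in plain JavaScript. It does not call Natural; the `Trie` class and its method names (`add`, `contains`, `keysWithPrefix`) are hypothetical and chosen only for this sketch.

```javascript
// Dependency-free sketch of a trie. This does not call Natural.
class Trie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  // Walk or create one node per character, then mark the final node.
  add(word) {
    let node = this.root;
    for (const char of word) {
      if (!node.children.has(char)) {
        node.children.set(char, { children: new Map(), isWord: false });
      }
      node = node.children.get(char);
    }
    node.isWord = true;
  }

  // Return the node reached by following the prefix, or null if none exists.
  findNode(prefix) {
    let node = this.root;
    for (const char of prefix) {
      node = node.children.get(char);
      if (!node) return null;
    }
    return node;
  }

  // A word is present only if its final node was marked by add().
  contains(word) {
    const node = this.findNode(word);
    return node !== null && node.isWord;
  }

  // Collect every stored word below the prefix node, sorted alphabetically.
  keysWithPrefix(prefix) {
    const start = this.findNode(prefix);
    const results = [];
    const visit = (node, path) => {
      if (node.isWord) results.push(path);
      for (const [char, child] of node.children) {
        visit(child, path + char);
      }
    };
    if (start !== null) visit(start, prefix);
    return results.sort();
  }
}

const trie = new Trie();
for (const word of ["test", "team", "tea", "toast"]) {
  trie.add(word);
}

const hasTea = trie.contains("tea");
const hasTe = trie.contains("te");
const matches = trie.keysWithPrefix("te");

console.log(hasTea); // true
console.log(hasTe); // false, "te" is only a prefix
console.log(matches); // [ 'tea', 'team', 'test' ]
```

A lookup touches one node per character of the query, so its cost depends on the length of the word rather than on how many words are stored.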

Include spell-check in text projects using Natural

3:05 natural PRO

In this lesson, we’ll see how to use Natural’s probabilistic spell-checker, which uses the trie data structure.

Check if words sound alike using Natural

2:14 natural PRO

In this lesson, we’ll take a look at Natural’s phonetics feature. We’ll learn how to check whether two words sound alike, looking at both the SoundEx and Metaphone algorithms.
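
As an illustration of how a phonetic code lets two spellings compare as equal, here is a plain JavaScript sketch of the classic American Soundex rules. It does not call Natural, and Natural's SoundEx may differ in details; the `soundex` and `soundsAlike` names are hypothetical. Metaphone is more involved and is not shown.

```javascript
// Dependency-free sketch of American Soundex. This does not call Natural.

// Consonant groups that share a digit. Vowels, y, h, and w have no digit.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

// Encode a word as its first letter plus three digits, e.g. "Robert" -> "R163".
function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  let code = letters[0].toUpperCase();
  // The first letter's own digit still suppresses an adjacent duplicate.
  let previous = SOUNDEX_CODES[letters[0]] || 0;

  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const char = letters[i];
    // h and w are skipped without separating equal digits around them.
    if (char === "h" || char === "w") continue;
    const digit = SOUNDEX_CODES[char] || 0;
    // Vowels (digit 0) are not written, but they do separate repeated digits.
    if (digit !== 0 && digit !== previous) code += digit;
    previous = digit;
  }
  return code.padEnd(4, "0");
}

// Two words "sound alike" under this scheme when their codes match.
function soundsAlike(a, b) {
  return soundex(a) === soundex(b);
}

const robertCode = soundex("Robert");
const rupertCode = soundex("Rupert");
const alike = soundsAlike("Robert", "Rupert");
const notAlike = soundsAlike("Robert", "Rubin");

console.log(robertCode, rupertCode); // R163 R163
console.log(alike); // true
console.log(notAlike); // false
```

Because the code keeps only the first letter and three consonant classes, it tolerates vowel and spelling variation but can also group names that sound quite different.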
