In this lesson, we’ll see how to use Natural’s probabilistic spell-checker, which uses the trie data structure.
[00:00] First, import the "natural" library. We'll also import the "fs" module, because we'll be working with the file system. We will also need a "tokenizer." The spellcheck requires a large corpus. The more general you want your spellcheck to be, the larger the corpus you need.
[00:26] We'll be importing a corpus of about half a million words. To do that, we'll say "fs.readFileSync" and pass it the file name and an encoding. For the purpose of this lesson, we'll read it synchronously.
[00:45] Our corpus is an array of words that we'll get by saying "tokenizer.tokenize(text)." To make a spellchecker, we say "new natural.Spellcheck" and pass it the corpus. You can use the spellcheck in two ways.
[01:07] The first is to check if a word is spelled correctly. We can do that by saying, "spellcheck.isCorrect," a word.
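To make the trie idea concrete, here is a minimal sketch of how a word list can back an "isCorrect"-style lookup. The class name "TinyTrie" and its methods are hypothetical and written only for illustration; Natural's own internals may differ.

```javascript
// Hypothetical minimal trie, for illustration only.
// This is NOT Natural's implementation.
class TinyTrie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  // Walk down one node per character, creating nodes as needed.
  add(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true; // mark the end of a complete word
  }

  // A word is "correct" only if the walk ends on a node marked as a word.
  contains(word) {
    let node = this.root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (!node) return false;
    }
    return node.isWord;
  }
}

// A tiny stand-in corpus (assumed for this example).
const trie = new TinyTrie();
for (const w of ["they", "had", "trouble", "finding", "the", "thing"]) {
  trie.add(w);
}

console.log(trie.contains("trouble")); // a word in the corpus
console.log(trie.contains("truble"));  // a misspelling
console.log(trie.contains("troub"));   // a prefix, but not a whole word
```

The lookup cost depends on the length of the word rather than the size of the corpus, which is one reason a trie suits a large word list.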
[01:26] The other thing we can do is say "spellcheck.getCorrections" of a word. That will get us the corrections for a given word. Let's look at a slightly longer example.
[01:46] Here is a horribly spelled sentence. We're going to split that on the spaces. Here, we can go through each word and say "spellcheck.getCorrections" for each word. We'll wrap that so we can print it out.
[02:19] "getCorrections" shows us all of the words in the corpus that are one edit away from the given word. This captures 80 to 95 percent of spelling errors. If you feel like you need even more suggestions, you can pass in "2" as a second argument, which requests all words that are within two edits of the given word.
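To see what "one edit away" means, and why asking for two edits costs more, here is a rough sketch in plain JavaScript. It assumes an edit is a deletion, transposition, substitution, or insertion of a lowercase letter. The functions "editsOne" and "corrections" and the tiny dictionary are hypothetical, and this is not necessarily how Natural computes its results.

```javascript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// Every string that is exactly one simple edit away from `word`.
function editsOne(word) {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    if (tail.length > 0) results.add(head + tail.slice(1)); // deletion
    if (tail.length > 1) {
      results.add(head + tail[1] + tail[0] + tail.slice(2)); // transposition
    }
    for (const ch of ALPHABET) {
      if (tail.length > 0) results.add(head + ch + tail.slice(1)); // substitution
      results.add(head + ch + tail); // insertion
    }
  }
  results.delete(word); // replacing a letter with itself is not an edit
  return results;
}

// Dictionary words within `maxDistance` edits of `word`.
function corrections(word, dictionary, maxDistance = 1) {
  let frontier = new Set([word]);
  const found = new Set();
  for (let d = 0; d < maxDistance; d++) {
    const next = new Set();
    for (const candidate of frontier) {
      for (const edit of editsOne(candidate)) {
        next.add(edit);
        if (dictionary.has(edit)) found.add(edit);
      }
    }
    frontier = next; // the next round edits every string from this round
  }
  found.delete(word); // the word itself is not a correction
  return [...found];
}

// A tiny stand-in dictionary (assumed for this example).
const dictionary = new Set(["trouble", "double", "table"]);

console.log(corrections("truble", dictionary));    // within one edit
console.log(corrections("truble", dictionary, 2)); // within two edits
```

Each extra round re-edits every candidate from the previous round, so the number of strings to check grows very quickly. That is why a distance of 2 gives more options but takes longer.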
[02:45] This gives us many more options, but also takes longer. Finally, if you only want the first suggestion, you can take the first element (index 0) of the corrections array. For this sentence, that would give us, "They had trouble finding the thing."
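The "first suggestion per word" step can be sketched like this. The helper "correctSentence", the sample misspelled sentence, and the stand-in lookup table are all hypothetical. In the lesson's setup you would pass a function that calls "spellcheck.getCorrections" instead of the stand-in.

```javascript
// Replace each word with its first suggestion, if there is one.
// `getCorrections` is any function that returns an array of suggestions.
function correctSentence(sentence, getCorrections) {
  return sentence
    .split(" ")
    .map((word) => {
      const suggestions = getCorrections(word);
      // Keep the original word when there are no suggestions.
      return suggestions.length > 0 ? suggestions[0] : word;
    })
    .join(" ");
}

// Stand-in for a real spellchecker (assumed data, for illustration only).
const fakeCorrections = new Map([
  ["thay", ["they", "that"]],
  ["truble", ["trouble"]],
  ["findin", ["finding", "find"]],
  ["thnig", ["thing"]],
]);
const lookup = (word) => fakeCorrections.get(word) || [];

const fixed = correctSentence("thay had truble findin the thnig", lookup);
console.log(fixed); // prints "they had trouble finding the thing"
```

Falling back to the original word is a design choice for this sketch: it leaves words with no suggestions untouched instead of dropping them from the sentence.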