Include spell-check in text projects using Natural

Hannah Davis
Published 8 years ago
Updated 6 years ago

In this lesson, we’ll see how to use Natural’s probabilistic spell-checker, which uses the trie data structure.

[00:00] First, import the "natural" library. We'll also import the "fs" module, because we'll be working with the file system, and we'll need a "tokenizer." The spellcheck requires a large corpus: the more general you want your spellcheck to be, the larger the corpus you need.

[00:26] We'll be importing a corpus of about half a million words. To do that, we'll call "fs.readFileSync" with the file name and an encoding. For the purposes of this lesson, reading it synchronously is fine.

[00:45] Our corpus is an array of words that we'll get by saying "tokenizer.tokenize(text)." To make a spellchecker, we say "new natural.Spellcheck" and pass in the corpus. You can use the spellcheck in two ways.

[01:07] The first is to check if a word is spelled correctly. We can do that by calling "spellcheck.isCorrect" with a word.

[01:26] The other thing we can do is call "spellcheck.getCorrections" with a word. That will get us the corrections for the given word. Let's look at a slightly longer example.

[01:46] Here is a horribly spelled sentence. We're going to split that on the spaces. Here, we can go through each word and say "spellcheck.getCorrections" for each word. We'll wrap that so we can print it out.

[02:19] "getCorrections" shows us all of the words in the corpus that are one edit away from the given word. This captures 80 to 95 percent of spelling errors. If you feel like you need even more suggestions, you can pass in "2" as a second argument, which requests all words that are within two edits of the given word.
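To make "one edit away" concrete, here is a small standalone sketch of the idea. It is not Natural's actual implementation: it assumes the usual four edit operations (delete, transpose, replace, insert) over a lowercase English alphabet, and the function names and the tiny vocabulary are made up for illustration.

```javascript
// All strings one edit away from `word`. Illustrative sketch only.
function editsOneAway(word, alphabet = 'abcdefghijklmnopqrstuvwxyz') {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const left = word.slice(0, i);
    const right = word.slice(i);
    if (right.length > 0) results.add(left + right.slice(1)); // delete one letter
    if (right.length > 1) {
      results.add(left + right[1] + right[0] + right.slice(2)); // swap two neighbours
    }
    for (const letter of alphabet) {
      if (right.length > 0) results.add(left + letter + right.slice(1)); // replace one letter
      results.add(left + letter + right); // insert one letter
    }
  }
  results.delete(word); // the word itself is zero edits away
  return results;
}

// Keep only the candidates that are real words in a (hypothetical) vocabulary.
function candidates(word, vocabulary) {
  return [...editsOneAway(word)].filter((w) => vocabulary.has(w));
}

const vocabulary = new Set(['the', 'then', 'ten', 'thing']);
console.log(candidates('teh', vocabulary));
```

Here "the" and "ten" are each one edit from "teh", while "then" needs two edits. Running the generator again on every one-edit candidate multiplies the number of strings to check, which is why a distance of two costs noticeably more time.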

[02:45] This gives us many more options, but also takes longer. Finally, if you only want the first suggestion, you can take the first element (index 0) of the corrections array. For this sentence, that would give us, "They had trouble finding the thing."
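Putting the sentence example together, a helper along these lines could apply the first suggestion to each word. This is a sketch: "correctSentence" is a hypothetical name, and it accepts any object that exposes "isCorrect" and "getCorrections", such as the one created by "new natural.Spellcheck(corpus)".

```javascript
// Replace each misspelled word with its first suggestion, if there is one.
// `spellcheck` needs isCorrect(word) and getCorrections(word, maxDistance).
function correctSentence(spellcheck, sentence, maxDistance = 1) {
  return sentence
    .split(' ')
    .map((word) => {
      // Leave words that are already in the corpus untouched.
      if (spellcheck.isCorrect(word)) return word;
      const corrections = spellcheck.getCorrections(word, maxDistance);
      // Fall back to the original word when nothing is suggested.
      return corrections.length > 0 ? corrections[0] : word;
    })
    .join(' ');
}

// Usage with Natural (assumes `natural` is installed and `corpus` is an array of words):
//   const spellcheck = new natural.Spellcheck(corpus);
//   console.log(correctSentence(spellcheck, 'a misspelled sentence goes here'));
```

Checking "isCorrect" first is a design choice in this sketch: it avoids swapping a correctly spelled word for a neighbouring word that happens to be suggested first.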
