    Include spell-check in text projects using Natural

    Hannah Davis

    In this lesson, we’ll see how to use Natural’s probabilistic spell-checker, which uses the trie data structure.

    Transcript

    00:00 First, import the "natural" library. We'll also need the "fs" module, because we'll be working with the file system, and a "tokenizer." The spellcheck requires a large corpus. The more general you want your spellcheck to be, the larger the corpus you need.

    00:26 We'll be importing a corpus of about half a million words. To do that, we'll call "fs.readFileSync" with the file name and an encoding. For the purpose of this lesson, reading the file synchronously is fine.

    00:45 Our corpus is an array of words that we'll get by saying "tokenizer.tokenize(text)." To make a spellchecker, we say "new natural.Spellcheck" and pass it the corpus. You can use the spellcheck in two ways.

    01:07 The first is to check if a word is spelled correctly. We can do that by calling "spellcheck.isCorrect" with a word.

    01:26 The other thing we can do is say "spellcheck.getCorrections" of a word. That will get us the corrections for a given word. Let's look at a slightly longer example.

    01:46 Here is a horribly spelled sentence. We're going to split that on the spaces. Here, we can go through each word and say "spellcheck.getCorrections" for each word. We'll wrap that so we can print it out.

    02:19 "getCorrections" shows us all of the words that are one edit away from the given word. This captures 80 to 95 percent of spelling errors. If you feel like you need even more suggestions, you can pass in "2" as a second argument, which requests all words that are within two edits of the given word.
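To make "one edit away" concrete, here is a minimal, library-free sketch of generating every string that differs from a word by a single deletion, transposition, replacement, or insertion, then keeping only those found in a vocabulary. It is an illustration of the general idea and is not taken from Natural's source; the names `editsOne` and `candidates` are hypothetical, and the lowercase English alphabet is an assumption.

```javascript
// Generate every string that is exactly one edit away from `word`.
// An edit is a deletion, a transposition of adjacent letters,
// a replacement, or an insertion.
function editsOne(word, alphabet = 'abcdefghijklmnopqrstuvwxyz') {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    if (tail.length > 0) results.add(head + tail.slice(1)); // deletion
    if (tail.length > 1) {
      results.add(head + tail[1] + tail[0] + tail.slice(2)); // transposition
    }
    for (const letter of alphabet) {
      if (tail.length > 0) results.add(head + letter + tail.slice(1)); // replacement
      results.add(head + letter + tail); // insertion
    }
  }
  // Replacing a letter with itself reproduces the input, so drop it.
  results.delete(word);
  return results;
}

// Keep only the candidates that appear in a known vocabulary.
function candidates(word, vocabulary) {
  const known = new Set(vocabulary);
  return [...editsOne(word)].filter((w) => known.has(w));
}

console.log(candidates('soemthing', ['something', 'soothing']));
```

Candidates two edits away can be produced by applying the same function to every one-edit result. The candidate set grows very quickly when that is done, which is consistent with the two-edit search taking noticeably longer.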

    02:45 This gives us many more options, but also takes longer. Finally, if you only want the first suggestion, you can take the first element (index 0) of the corrections array. For this sentence, that would give us, "They had trouble finding the thing."
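The sentence walkthrough (split on spaces, look up each word, keep the first suggestion) can be sketched as a small helper. The `correctSentence` name, the stub lookup table, and the sample sentence are hypothetical; the stub stands in for a real spellchecker so the sketch runs without a corpus.

```javascript
// Replace each word with its first suggestion, keeping the word when there is none.
// `getCorrections` is any function that maps a word to an array of suggestions.
function correctSentence(sentence, getCorrections) {
  return sentence
    .split(' ')
    .map((word) => {
      const corrections = getCorrections(word);
      return corrections.length > 0 ? corrections[0] : word;
    })
    .join(' ');
}

// Stand-in lookup table (hypothetical data) in place of a real spellchecker.
const stubCorrections = { teh: ['the'], thnig: ['thing', 'thong'] };
const lookup = (word) => stubCorrections[word] || [];

console.log(correctSentence('find teh thnig', lookup)); // prints "find the thing"
```

With Natural, the second argument could be something like `(word) => spellcheck.getCorrections(word, 1)`. It may also be worth calling `spellcheck.isCorrect(word)` first and skipping words that are already in the corpus, so correctly spelled words are left untouched.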
