[00:00] First, import the natural library. We'll also make a test string here. To create a new tokenizer, the syntax is new natural.WordTokenizer. From there, all we need to do is call tokenizer.tokenize on our string.

[00:21] WordTokenizer splits text by spaces and punctuation. Note that contractions are split on their apostrophes. WordTokenizer also discards the punctuation. If you want to retain the punctuation, you can use another tokenizer called WordPunctTokenizer.
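To make the difference between discarding and retaining punctuation concrete, here is a minimal, dependency-free sketch. The functions wordTokens and wordPunctTokens are hypothetical stand-ins written for this illustration; they are not natural's implementations, and natural's real output may differ in edge cases. The lesson's own call is new natural.WordTokenizer() followed by tokenizer.tokenize on the string.

```javascript
// Hypothetical stand-ins for illustration only; NOT natural's implementations.

// Splits on any run of characters that are not letters, digits, or underscores,
// so spaces, punctuation, and apostrophes all act as separators and are dropped.
function wordTokens(text) {
  return text.split(/[^A-Za-z0-9_]+/).filter(Boolean);
}

// Matches either a run of word characters or a single punctuation character,
// so each punctuation mark survives as its own token.
function wordPunctTokens(text) {
  return text.match(/\w+|[^\w\s]/g) || [];
}

const sample = "I can't wait, really!";

console.log(wordTokens(sample));
// [ 'I', 'can', 't', 'wait', 'really' ]

console.log(wordPunctTokens(sample));
// [ 'I', 'can', "'", 't', 'wait', ',', 'really', '!' ]
```

Note how the contraction breaks at the apostrophe in both cases; the only difference is whether the apostrophe and other punctuation are kept.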

[00:47] This will retain the punctuation, putting it into its own tokens. Natural also has a TreebankWordTokenizer. This tries to preserve some of the semantics of the text. It splits contractions into their respective words. It also keeps the punctuation.
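As a rough sketch of what Treebank-style splitting looks like, the example below follows the Penn Treebank convention of separating a contraction into its word parts (for example, "can't" becomes "ca" and "n't"). The function treebankStyleTokens is a hypothetical name invented here, it only handles a few common lowercase English contractions, and it is not natural's TreebankWordTokenizer, whose output may differ.

```javascript
// Hypothetical illustration of Penn Treebank-style splitting; not natural's code.
function treebankStyleTokens(text) {
  return text
    // "can't" -> "ca n't": the negation is split off as its own token.
    .replace(/(\w)n't\b/g, "$1 n't")
    // "it's" -> "it 's", "we'll" -> "we 'll", and so on for common clitics.
    .replace(/(\w)'(s|re|ve|ll|d|m)\b/g, "$1 '$2")
    // Put spaces around remaining punctuation so each mark becomes a token.
    .replace(/([.,!?;:])/g, " $1 ")
    .trim()
    .split(/\s+/)
    .filter(Boolean);
}

console.log(treebankStyleTokens("I can't wait, it's great!"));
// [ 'I', 'ca', "n't", 'wait', ',', 'it', "'s", 'great', '!' ]
```

Unlike a plain split on apostrophes, each piece of the contraction stays recognizable as a word or clitic, which is what makes this style useful for later grammatical analysis.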

[01:06] Lastly, natural has a regular expression tokenizer. Here, you have to pass a regular expression pattern. In our case, we'll look for end-of-sentence punctuation. This splits the text into sentences.
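The idea of splitting on end-of-sentence punctuation can be sketched in plain JavaScript. The function splitSentences and its default pattern are assumptions made for this illustration, not natural's regular expression tokenizer or the exact pattern used in the lesson.

```javascript
// Hypothetical helper for illustration; the pattern is an assumption.
// The pattern describes the separators: runs of ".", "!" or "?".
function splitSentences(text, pattern = /[.!?]+/) {
  return text
    .split(pattern)          // cut the text at each separator
    .map((s) => s.trim())    // drop the whitespace left around each piece
    .filter(Boolean);        // discard the empty piece after the final mark
}

console.log(splitSentences("My dog is tired. Is he sick? No!"));
// [ 'My dog is tired', 'Is he sick', 'No' ]
```

Because the pattern matches the separators rather than the tokens, the punctuation itself is dropped from the result. A simple pattern like this will also split on periods inside abbreviations or decimals, so it suits clean prose best.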

Natural Language Processing in JavaScript with Natural

Break up language strings into parts using Natural
