Classify text into categories with machine learning in Natural

Instructor: Hannah Davis

In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression classifier, two basic machine learning algorithms, to classify text into categories.
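To make the idea of a Naive Bayes text classifier concrete, here is a minimal sketch in plain JavaScript with no dependencies. It is not Natural's implementation or API; the function names (`tokenize`, `trainNaiveBayes`, `classify`) and the tiny training set are hypothetical and chosen only for illustration. It assumes a multinomial model with add-one smoothing and a very simple tokenizer.

```javascript
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative only: this is NOT Natural's implementation or API.

// Lowercase the text and keep runs of letters, digits and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count documents per label and word occurrences per label.
function trainNaiveBayes(examples) {
  const model = {
    docCounts: {},   // label -> number of training documents
    wordCounts: {},  // label -> { word -> occurrences }
    totalWords: {},  // label -> total word occurrences
    vocab: new Set(),
    totalDocs: 0,
  };
  for (const { text, label } of examples) {
    model.docCounts[label] = (model.docCounts[label] || 0) + 1;
    model.wordCounts[label] = model.wordCounts[label] || {};
    model.totalWords[label] = model.totalWords[label] || 0;
    model.totalDocs += 1;
    for (const word of tokenize(text)) {
      model.wordCounts[label][word] = (model.wordCounts[label][word] || 0) + 1;
      model.totalWords[label] += 1;
      model.vocab.add(word);
    }
  }
  return model;
}

// Return the label with the highest log-probability score.
function classify(model, text) {
  const words = tokenize(text);
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.docCounts)) {
    // log P(label) + sum over words of log P(word | label)
    let score = Math.log(model.docCounts[label] / model.totalDocs);
    for (const word of words) {
      const count = model.wordCounts[label][word] || 0;
      score += Math.log((count + 1) / (model.totalWords[label] + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Hypothetical toy training data.
const model = trainNaiveBayes([
  { text: 'the match ended with a late goal', label: 'sports' },
  { text: 'the team won the cup final', label: 'sports' },
  { text: 'the new phone has a faster chip', label: 'tech' },
  { text: 'the laptop update fixes a software bug', label: 'tech' },
]);

console.log(classify(model, 'a goal in the final'));           // sports
console.log(classify(model, 'software update for the phone')); // tech
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that was never seen for a label from driving that label's score to negative infinity.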

Yonatan Shalev
~ a year ago

Should I split documents into single sentences, or use them as is, to train a text classification model? I was wondering what the best way is to feed the model with training data.

Can I just use each document as is? Like this:

{"phrase": "First long document with up to 30 sentences", "result": {"label 1": 1}},
{"phrase": "first long document with up to 30 sentences", "result": {"label 2": 1}},
{"phrase": "Second long document with up to 30 sentences", "result": {"label 2": 1}}, etc.

Or should I split all documents into sentences, so that the data looks something like this:

{"phrase": "Sentence 1 out of document 1", "result": {"label 1": 1}},
{"phrase": "Sentence 2 out of document 1", "result": {"label 2": 1}}, etc.
{"phrase": "Sentence 1 out of document 2", "result": {"label 5": 1}}, etc.
{"phrase": "Sentence X out of document X", "result": {"No labels at all": 1}}, etc.

Same question about using the model: should I apply it to the complete document, or split the document into separate sentences and then apply the model to each sentence?

What's the best practice?
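The two training-data layouts described in the question above can be sketched as follows. This is only an illustration of the data shapes, not a recommendation of either one. The helper names (`splitSentences`, `documentExamples`, `sentenceExamples`) and the input shape `{ text, labels }` are hypothetical. The sentence splitter is a naive regular expression, and in the sentence-level layout each sentence simply inherits its document's labels, which is an assumption: the question's example labels sentences individually, and that would need per-sentence annotation.

```javascript
// Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace.
// Assumption: good enough for illustration; it mishandles abbreviations.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Layout 1: one training example per whole document.
function documentExamples(docs) {
  return docs.map((doc) => ({ phrase: doc.text, result: { ...doc.labels } }));
}

// Layout 2: one training example per sentence.
// Assumption: every sentence inherits the labels of its document.
function sentenceExamples(docs) {
  return docs.flatMap((doc) =>
    splitSentences(doc.text).map((sentence) => ({
      phrase: sentence,
      result: { ...doc.labels },
    }))
  );
}

// Hypothetical input documents.
const docs = [
  { text: 'First sentence. Second sentence! Third one?', labels: { 'label 1': 1 } },
  { text: 'Only one sentence here.', labels: { 'label 2': 1 } },
];

console.log(JSON.stringify(documentExamples(docs), null, 2));
console.log(JSON.stringify(sentenceExamples(docs), null, 2));
```

Whichever layout is used for training, the same granularity would normally be used at prediction time, so the model sees inputs shaped like the ones it was trained on.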

Yonatan Shalev
~ a year ago

Also, how do I approach classification into multiple categories?