Classify JSON text data with machine learning in Natural

6:05
egghead.io

In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier, two basic machine learning algorithms, on JSON text data, and use them to classify text into categories.
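
To make the train-then-classify flow concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the Natural library's implementation. The `tokenize`, `trainNaiveBayes`, and `classifyText` helpers, the `{ label, text }` JSON shape, and the sample sentences are all assumptions made up for illustration.

```javascript
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative sketch only; not the Natural library's implementation.

// Hypothetical tokenizer: lowercase, then split on anything that is not a-z.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Training is a single counting pass over the examples.
function trainNaiveBayes(examples) {
  const model = {
    docCounts: new Map(),   // label -> number of documents
    wordCounts: new Map(),  // label -> Map(word -> count)
    totalWords: new Map(),  // label -> total number of tokens
    vocab: new Set(),       // every distinct word seen in training
    totalDocs: 0
  };
  for (const { label, text } of examples) {
    model.totalDocs += 1;
    model.docCounts.set(label, (model.docCounts.get(label) || 0) + 1);
    if (!model.wordCounts.has(label)) {
      model.wordCounts.set(label, new Map());
      model.totalWords.set(label, 0);
    }
    const counts = model.wordCounts.get(label);
    for (const word of tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      model.totalWords.set(label, model.totalWords.get(label) + 1);
      model.vocab.add(word);
    }
  }
  return model;
}

// Pick the label with the highest log prior + summed log likelihoods.
function classifyText(model, text) {
  const words = tokenize(text);
  let bestLabel = null;
  let bestScore = -Infinity;
  for (const [label, docCount] of model.docCounts) {
    let score = Math.log(docCount / model.totalDocs);
    const counts = model.wordCounts.get(label);
    const denominator = model.totalWords.get(label) + model.vocab.size;
    for (const word of words) {
      score += Math.log(((counts.get(word) || 0) + 1) / denominator);
    }
    if (score > bestScore) {
      bestScore = score;
      bestLabel = label;
    }
  }
  return bestLabel;
}

// Made-up training data in an assumed { label, text } JSON shape.
const nbTrainingData = JSON.parse(`[
  {"label": "space", "text": "The shuttle reached orbit after launch"},
  {"label": "space", "text": "NASA plans a new lunar orbit mission"},
  {"label": "politics", "text": "The senate passed the tax bill"},
  {"label": "politics", "text": "Congress will vote on the election law"}
]`);

const nbModel = trainNaiveBayes(nbTrainingData);
console.log(classifyText(nbModel, "a rocket launch into orbit"));
console.log(classifyText(nbModel, "the vote on the tax law"));
```

With real data, the inline JSON string would be replaced by the parsed contents of the training file, and accuracy would be measured on the separate test file.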

While this dataset is still considered small (only a couple hundred data points), we'll start to get better results.

As a general rule, Logistic Regression works better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Logistic Regression also generally takes longer to train.
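
One reason Logistic Regression tends to train more slowly is that it is fit iteratively, with many passes over the data, whereas Naive Bayes only needs to count words once. The sketch below shows a minimal binary logistic regression on word counts, trained with batch gradient descent. It is an illustration under stated assumptions, not how the Natural library trains its classifier; the helper names, the `epochs` and `learningRate` defaults, the `{ label, text }` shape, and the sample sentences are all hypothetical.

```javascript
// Minimal binary logistic regression on bag-of-words counts,
// trained with batch gradient descent. Illustrative sketch only.

// Hypothetical tokenizer: lowercase, then split on anything that is not a-z.
function lrTokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Assign each distinct training word a feature index.
function buildVocabulary(examples) {
  const vocab = new Map();
  for (const { text } of examples) {
    for (const word of lrTokenize(text)) {
      if (!vocab.has(word)) vocab.set(word, vocab.size);
    }
  }
  return vocab;
}

// Turn text into a word-count vector; words outside the vocabulary are ignored.
function vectorize(vocab, text) {
  const x = new Array(vocab.size).fill(0);
  for (const word of lrTokenize(text)) {
    if (vocab.has(word)) x[vocab.get(word)] += 1;
  }
  return x;
}

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

function linearScore(weights, bias, x) {
  let z = bias;
  for (let i = 0; i < x.length; i++) z += weights[i] * x[i];
  return z;
}

// Unlike Naive Bayes, every epoch is a full pass over the training rows.
function trainLogisticRegression(examples, positiveLabel, epochs = 200, learningRate = 0.5) {
  const vocab = buildVocabulary(examples);
  const rows = examples.map((e) => ({
    x: vectorize(vocab, e.text),
    y: e.label === positiveLabel ? 1 : 0
  }));
  const weights = new Array(vocab.size).fill(0);
  let bias = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW = new Array(vocab.size).fill(0);
    let gradB = 0;
    for (const { x, y } of rows) {
      const error = sigmoid(linearScore(weights, bias, x)) - y;
      for (let i = 0; i < x.length; i++) gradW[i] += error * x[i];
      gradB += error;
    }
    // Step against the average gradient of the log loss.
    for (let i = 0; i < weights.length; i++) {
      weights[i] -= (learningRate * gradW[i]) / rows.length;
    }
    bias -= (learningRate * gradB) / rows.length;
  }
  return { vocab, weights, bias, positiveLabel };
}

// Probability that the text belongs to the model's positive label.
function predictProbability(model, text) {
  const x = vectorize(model.vocab, text);
  return sigmoid(linearScore(model.weights, model.bias, x));
}

// Made-up training data in an assumed { label, text } JSON shape.
const lrTrainingData = [
  { label: "space", text: "The shuttle reached orbit after launch" },
  { label: "space", text: "NASA plans a new lunar orbit mission" },
  { label: "politics", text: "The senate passed the tax bill" },
  { label: "politics", text: "Congress will vote on the election law" }
];

const lrModel = trainLogisticRegression(lrTrainingData, "space");
console.log(predictProbability(lrModel, "a rocket launch into orbit") > 0.5 ? "space" : "politics");
console.log(predictProbability(lrModel, "the vote on the tax law") > 0.5 ? "space" : "politics");
```

The training cost here grows with the number of epochs times the dataset size, which is the kind of overhead a counting-based classifier avoids.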

This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization

John

The two files used in the code example, training_data.json and test_data.json, are not part of the dataset at http://ana.cachopo.org/datasets-for-single-label-text-categorization. It would be useful to know which of the 30 possible files were used for the example.

In reply to egghead.io
Hannah

In the "Newsgroups" section on that page, I pulled the "talk.politics.misc" and "sci.space" sections and created training_data.json and test_data.json from those. They are small datasets meant as examples for this course. Let me know if that clears things up!

In reply to John
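
For anyone who wants to rebuild similar files, here is a rough sketch of one way to do it. It is not necessarily how the course files were produced. It assumes the downloaded text files hold one document per line in the form class, TAB, text, and that the JSON should be an array of `{ label, text }` objects; the `linesToRecords` helper, that JSON shape, and the sample lines are all hypothetical.

```javascript
// Sketch: keep only two newsgroup classes and convert them to JSON records.
// Assumed input format: one document per line, "<class>\t<text>".

function linesToRecords(rawText, wantedLabels) {
  const wanted = new Set(wantedLabels);
  const records = [];
  for (const line of rawText.split(/\r?\n/)) {
    const tabIndex = line.indexOf("\t");
    if (tabIndex === -1) continue; // skip blank or malformed lines
    const label = line.slice(0, tabIndex);
    const text = line.slice(tabIndex + 1).trim();
    if (wanted.has(label) && text.length > 0) {
      records.push({ label, text });
    }
  }
  return records;
}

// Made-up sample lines standing in for the contents of a downloaded file.
const sampleFileContents = [
  "sci.space\tshuttle launch orbit",
  "rec.autos\tengine car dealer",
  "talk.politics.misc\ttax bill senate",
  ""
].join("\n");

const selectedRecords = linesToRecords(sampleFileContents, ["sci.space", "talk.politics.misc"]);

// This string is what would be saved as training_data.json or test_data.json.
console.log(JSON.stringify(selectedRecords, null, 2));
```

Running the same function over the site's training file and test file would give the two JSON files, with only the two chosen newsgroups kept.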