Classify Mystery Items with the K-Nearest Neighbors Algorithm in JavaScript

Tyler Clark
InstructorTyler Clark
Share this video with your friends

Social Share Links

Send Tweet

The k-nearest neighbors algorithm is used for classification of unknown items and involves calculating the distance of the unknown item's neighbors. We'll use Euclidean distance to determine the closest neighbor and make our best guess on what the mystery fruit is.

Instructor: [00:00] Let's say we needed to write a function that when given the features of a fruit, we need to make an educated guess on what this mystery fruit is. Let's make our function called determine fruit and destructure two fruit features that will get size and redness. Then we'll create our key, which is an array of currently known fruit and their attributes.

[00:20] Right now, let's say that we only know about grape, which is a size of one and a redness of zero and orange which is a size of two and a redness of one, and a grapefruit which is largest size of three, redness of two.

[00:33] Then we'll console log this function, passing in the features of a grape just to make sure we're getting a grape back from our function first. Perfect. The algorithm we'll use to figure out what fruit the provided fruit features most closely represents is called the K nearest neighbors algorithm, or K and N.

[00:50] The K and N algorithm changes the way we think about classifying items. It works by finding the shortest distance between the provided set of features and our currently known items. By finding the closest relatives to our provided features, we can make an educated guess on what this fruit is by these close neighbors.

[01:10] It might be easier to comprehend by placing this on a graph. Since we're only working with two features, this is a two-dimensional graph. Along the X axis is our size of the fruit and the Y axis is the redness. Grapes are small and not red at all, so it's towards the bottom left, while grapefruits are the opposite.

[01:28] When we want to classify mystery fruits, we can imagine placing it on this graph and then finding its closest neighbor. The closer the neighbor, the better chance we have that we are correct with our algorithm.

[01:41] Let's write our function that will calculate this distance between our mystery fruit and our known fruits. We'll call it calc distance and it's going to take one parameter.We're going to return the math.square root of data.reduce. Our accumulator will destructure to items, then we'll return our accumulator plus math.pal and init minus test squared and zero. Perfect.

[02:09] Now let's talk about this because that was probably pretty confusing. The metric we're using to calculate the distance between our items is called the Euclidean distance. This formula is to subtract the same features of our mystery item against the controlled fruit features. To then take this to the power of two and add it to the other features doing the same thing.

[02:31] When we're done doing this to each feature, we take the square root. The smaller the number we get against the neighbor, the closer it is to it. Meaning the more likely our mystery fruit is probably that closer smaller away distance fruit.

[02:45] Now let's actually apply this calc distance function against our controlled fruit. We'll destructure name from fruit.reduce, previous current. We'll say let current cal equals calc distance, passing in our two feature sets, size as well as redness. Then we're going to return whichever one is smallest, either the previous or our current calculated distance.

[03:10] We return a new object with a name property and a dist. We need to work with the initial object, so we'll say name, fruit at the first.name, and then we're going to do our calc distance as well. Again, on our two feature sets, size for the first one, as well as redness for that first one as well.

[03:33] Now that we have this, we can return a string that says this is most likely a name. A quick look at our console.log and it looks like it's working. We're passing in the features of a grape and that's what we're getting back.

[03:47] Let's walk through this again. A reduce loops through each controlled fruit. It applies our calc distance formula against each of the controlled fruit to find its distance from our mystery fruit. We check this value against the previous smallest value, initially starting with the first item in our fruit array, and the smallest one wins out.

[04:07] A console log shows us the distance we're getting back from our Euclidean distance function. As we can imagine, the zero distance is winning out because it has the same features as our controlled grape. We can play around with different scenarios and see what our algorithm tells us the best guess is.

[04:27] Let's wrap up with some final thoughts. Picking the right features is important when using the K and N algorithm to classify an item and our algorithm becomes smarter as we add more features to calculate distance. Also, ideally, we would not just use the one closest neighbor to guess what the mystery fruit is. The more controlled data points and features you have to work with the better.

[04:49] You might find the closest five neighbors and see which controlled fruit of those five appear the most. There's no magic number of neighbors, though a good rule of thumb is the square root of the number of controlled items you have.