Use Reduce to Filter and Map over Large Datasets

Share this video with your friends

Social Share Links

Send Tweet

Learn how two common array functions - map() and filter() - are syntactic sugar for reduce operations. Learn how to use them, how to compose them, and how using reduce can give you a big performance boost over composing filters and maps over a large data set.

[00:00] I'd like to talk a little bit about some different kinds of reduce operations that are pretty commonly used. In fact, some of them are so common and so everyday that they don't even use the reduce function. There are other ways to implement them, but I'd like you to still be thinking about these as reductions. Because what they do is they take an array and they transform it into something else.

[00:23] The most common one of these is one of the most common, is let's say you've got some array and you need to transform it into another array with the same size. It's got the same number of items, but some sort of transformation has happened to each item in that array.

[00:42] For instance, why don't we take our data array here. We're going to reduce it into another array where every value is doubled. We can say var doubled equals data.reduce. The initial value for our accumulator is going to be a new array. All we're going to do is say accumulator.push value times two. Then we need to make sure that we return the accumulator.

[01:15] If we say console.log my doubled data, we'll see that we've indeed turned every one of these values into its double. If you're paying attention, you might know that there's another name for this operation. This is called a map. We've mapped this array into this array. JavaScript lets you do this particular kind of reduction just by using a different function.

[01:43] We can say var double mapped equals data.map, function item, return item times two. When we run this we'll see that we get the same thing.

[02:04] Another very common kind of array reduction is one where you start with an array that has a bunch of values and you need to reduce this array into a shorter array which only contains some subset of those values. In this case, let's reduce this array into a smaller array which only contains the even numbers.

[02:38] Just like when we were implementing map by using reduce, we're going to pass in an empty array as our accumulator value. Now instead of taking every value and pushing it into our accumulator, we're going to do a little check here.

[02:52] We're going to say if value mod two equals zero, in other words if the value is divisible by two, then accumulator.push value, return the accumulator, so Console.log evens. It's going to spit out two, four, and six.

[03:18] Now if you're following along at home, you know that the name for this operation is actually filtering. Like mapping above, JavaScript gives you a way natively on the array object to filter. We can say var even filtered equals data two.filter function item. We say return item mod two equals zero. If we log even filtered out, we'll see that that worked the same way.

[03:57] Now I know what some of you are going to be saying right away. You're going to be saying, "Mick, why on earth would I use this reduce function with all of these moving parts when I could just use a filter function or when I could just use a map function?"

[04:09] You might go on to say, "Look, it's even better because filter and map and all of these various array operators, not only do they take a simpler function, but you can compose them." So if you wanted to take our dataset and you wanted to filter it to only return the even numbers and then you wanted to double those numbers, you could do that very elegantly. You could do something like this.

[04:37] First you filter. Then you just chain map right on top of that. If we console.log filter mapped, we get 8, 4, and 12. You're totally right. That is exactly the right way to think about this. If you have these tools that work like this, you should be using that.

[05:08] I still want you to be thinking in terms of reduction. I still want you to realize that what you've really done is reduce this array into another array. But keeping your code simple and concise and readable is really important, and I encourage that.

[05:22] Here's something that you might not have thought of. This is going to work fine. If you're running this on an array of six numbers here, do it this way, absolutely. What if you have slightly more information? What if you have a big array? Let's create a million items in this array.

[05:52] What if you wanted to perform the same kind of operation here? Do you know guys this cool trick where you can say console.time? Then we can do this same operation. We can say filter mapped big data. This is big data.filter.

[06:23] Then when we're done we say console.time and big data. We don't want to log out a million items into the console here. But we are going to log out how long it takes this thing to run. OK, it's not bad. 75 milliseconds.

[06:41] Look at this. We have the same logic, but we're going to use the reduce function. We're going to say, if value mod two equals zero, then accumulator.push value times two, return accumulator. We're going to start that off with an empty array. Now we're going to say console.time and big data reduce.

[07:34] Wow, look at that. This one took 79 milliseconds, this one took 54. Why is that? Well, that's because when you're composing, you're mapping your filter, first the filter's going to iterate through one million items, and it's going to return 500,000 items. Then your map needs to be applied to each of those 500,000 items.

[07:55] Contrast that to the reduce approach where we're filtering and mapping all in one step. We only need to iterate over those million items once and then we're done. I'm not saying that you should always be doing this, but I am saying that for complex chains like especially if you're doing things like filter this and then map to that and then like sort this and da da da.

[08:16] If you're dealing with a lot of information, then it can be very helpful to be aware of the way reduce can give you a huge performance boost by allowing you to merge all these operations together so you're not constantly iterating over the same dataset.