Count Words in a String with Ramda's countBy and invert

John Lindquist

You can really unlock the power of Ramda (and functional programming in general) when you combine functions. Counting words in a string may seem like a relatively difficult task, but Ramda makes it easy by providing a countBy function. This lesson walks through using countBy to count words in a string.

[00:00] The text here is the text from the end of "Cat in the Hat", so when we log that out, we get that text right now.

[00:06] To count these words, I'm going to start with the match function from Ramda. Match takes a regular expression. I'll pass it any word character, so that'd be any letter (word characters also include digits and underscores), and then a plus, meaning a letter followed by more letters, which ends up matching an entire word, then the global flag to match all of them and not just the first word.

[00:27] That's a basic, common regular expression. I'm going to call this matchWords and invoke matchWords on our text. When I save here, you'll see we have all the words and no punctuation. They aren't sorted, and you'll see multiple occurrences, like a "then" there and a "then" there.
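
As a rough sketch of this step: Ramda's match is curried, so passing only the regular expression gives back a function that waits for the string. The snippet uses a one-line stand-in for match so it runs without installing anything (with Ramda installed, `const { match } = require('ramda')` would take its place). The sample text is made up for illustration and is not the text from the lesson.

```javascript
// Stand-in that mimics Ramda's curried match(regex, string).
// With Ramda installed: const { match } = require('ramda');
const match = (regex) => (str) => str.match(regex) || [];

// Made-up sample text (an assumption, not the lesson's text).
const text = "The cat sat. Then the cat ran, and then the dog ran.";

// \w+ matches runs of word characters; the g flag returns every match.
const matchWords = match(/\w+/g);

const words = matchWords(text);
console.log(words);
// [ 'The', 'cat', 'sat', 'Then', 'the', 'cat', 'ran', 'and', 'then', 'the', 'dog', 'ran' ]
```

The punctuation is gone, and repeated words such as "then" still appear once per occurrence.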

[00:44] To count them, let's bring in countBy from Ramda. I can create a countWords function, which is countBy. Each item here is just a word, not an object with properties or anything.

[00:58] I can say identity, which is another function I'll bring in, which essentially tells it to count that word by itself. We're going to want to run through this function, then that function, which means we need to turn to compose to bring those two functions together.

[01:14] I can say compose countWords with matchWords. Compose runs right to left, so the text goes through matchWords first and then countWords. Now we have an object with all of the words on it as keys and the counts as values. You can see "and" has a count of 16, while many of these only have 1, and "did" has a count of 3. It counted each of these words.
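
Here is a minimal sketch of that composition. To keep it runnable on its own, it defines simplified stand-ins for match, identity, countBy, and compose that mimic Ramda's curried behavior; with Ramda installed, a single `require('ramda')` destructuring would replace them. The sample text is again made up.

```javascript
// Simplified stand-ins for Ramda's curried functions, so the sketch runs as-is.
// With Ramda installed: const { match, countBy, identity, compose } = require('ramda');
const match = (regex) => (str) => str.match(regex) || [];
const identity = (x) => x;
const countBy = (fn) => (list) => {
  const counts = {};
  for (const item of list) {
    const key = fn(item);
    const seen = Object.prototype.hasOwnProperty.call(counts, key);
    counts[key] = (seen ? counts[key] : 0) + 1;
  }
  return counts;
};
// compose applies its functions right to left.
const compose = (...fns) => (input) =>
  fns.reduceRight((value, fn) => fn(value), input);

// Made-up sample text (an assumption, not the lesson's text).
const text = "The cat sat. Then the cat ran, and then the dog ran.";

const matchWords = match(/\w+/g);
const countWords = countBy(identity); // each word is its own counting key

// matchWords runs first, then countWords.
const result = compose(countWords, matchWords)(text);
console.log(result);
// { The: 1, cat: 2, sat: 1, Then: 1, the: 2, ran: 2, and: 1, then: 1, dog: 1 }
```

Note that "The" and "the" are still counted separately at this point, because identity uses each word exactly as written.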

[01:35] You'll notice some of these are capitalized, like "Should" and "You." Instead of counting by identity, which is the actual word, we can bring in another function called toLower, which will lowercase the words. We'll countBy toLower.

[01:49] That will lower case all the words that we had. "Should" and "You" will be counted with the lower case "you" and the lower case "should."
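
A small sketch of why swapping identity for toLower merges the capitalized and lowercase forms: countBy groups items by whatever key the function returns. The stand-ins below mimic Ramda's toLower and countBy so the snippet runs without the library, and the word list is a made-up example.

```javascript
// Stand-ins that mimic Ramda's toLower and curried countBy.
// With Ramda installed: const { countBy, toLower } = require('ramda');
const toLower = (str) => str.toLowerCase();
const countBy = (fn) => (list) => {
  const counts = {};
  for (const item of list) {
    const key = fn(item);
    const seen = Object.prototype.hasOwnProperty.call(counts, key);
    counts[key] = (seen ? counts[key] : 0) + 1;
  }
  return counts;
};

// Made-up word list (an assumption for illustration).
const words = ["Should", "you", "should", "You", "we"];

// The counting key is the lowercased word, so "Should" and "should" share a key.
const lowerCounts = countBy(toLower)(words);
console.log(lowerCounts);
// { should: 2, you: 2, we: 1 }
```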

[01:58] A neat trick we can do here, because the keys are words and the values are numbers, is to invert the object. So, I'll bring in invert here, and in my compose, I'll say invert. Inverting an object uses the values as keys and the keys as values.

[02:16] When you do that with an object like this, you'll see that we have an object with a key of 1, 2, 3, 4, 5, 6, 15, and 16, where 16 has "and", 15 has "the", 6 has "that", 5 has "you", 4 has "I", then "we", "he", "what."

[02:33] We've inverted the keys and the values so that the keys are now the counts, and each value is an array of the words that had that count.
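
To make the inversion concrete, here is a sketch using a stand-in that mimics Ramda's invert, which collects every key that shares a value into an array. The counts object is a made-up example rather than the lesson's output.

```javascript
// Stand-in that mimics Ramda's invert: each value becomes a key
// holding an array of the original keys that had that value.
// With Ramda installed: const { invert } = require('ramda');
const invert = (obj) => {
  const out = {};
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (!Object.prototype.hasOwnProperty.call(out, value)) out[value] = [];
    out[value].push(key);
  }
  return out;
};

// Made-up word counts (an assumption for illustration).
const counts = { the: 3, cat: 2, sat: 1, then: 2, ran: 2, and: 1, dog: 1 };

const byCount = invert(counts);
console.log(byCount);
// { '1': [ 'sat', 'and', 'dog' ], '2': [ 'cat', 'then', 'ran' ], '3': [ 'the' ] }
```

The words inside each array keep the order in which they appeared in the counts object, which is why the next step sorts them.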

[02:42] Finally, you'll notice that the words in the one-count group are out of order, because we didn't specify any order for them. Let's go ahead and bring in sortBy and map, and map over this object, because in Ramda, we can map on objects. My mapping function is going to be a sortBy of identity.

[03:03] Map is going to go through all the values of the object, invoke a sorting function on them, and it's going to sort by the word in there. There's no need for any fancy sorting. Hit save.

[03:12] In our 1 array, we start with A and go all the way down in alphabetical order with our results for each word that had only one occurrence all the way down to "for," "he," "I," then "we," "what", and 16 still only has that 1 in there.
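
The sorting step can be sketched like this. The stand-ins mimic Ramda's sortBy and the object case of map (Ramda's map also handles arrays, which this simplified version does not), and the input object is made up.

```javascript
// Simplified stand-ins for Ramda's identity, sortBy, and map (object case only).
// With Ramda installed: const { identity, sortBy, map } = require('ramda');
const identity = (x) => x;
const sortBy = (fn) => (list) =>
  [...list].sort((a, b) => {
    const x = fn(a);
    const y = fn(b);
    return x < y ? -1 : x > y ? 1 : 0;
  });
const map = (fn) => (obj) =>
  Object.fromEntries(Object.entries(obj).map(([key, value]) => [key, fn(value)]));

// Made-up inverted counts (an assumption for illustration).
const byCount = { 1: ["sat", "and", "dog"], 2: ["cat", "then", "ran"], 3: ["the"] };

// map visits each value of the object; sortBy(identity) sorts each array by the word itself.
const sorted = map(sortBy(identity))(byCount);
console.log(sorted);
// { '1': [ 'and', 'dog', 'sat' ], '2': [ 'cat', 'ran', 'then' ], '3': [ 'the' ] }
```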

[03:26] Because these are tiny functions, I'd usually inline them. I'll take matchWords, copy and paste it in there, then countWords, copy and paste it in there. Delete these, and compose. I'll new-line these, countBy, match.

[03:46] Instead of this being just results, I can make this a reusable function called countWords, and I can use countWords, pass in the text, and get that same result. I could say, "I really, really love Ramda," hit save. You'll see I have a count of one for "I," "love," and "Ramda," and a count of two for "really."
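
Putting the steps together, the reusable function might look roughly like the sketch below. Every helper is a simplified stand-in that mimics the Ramda function of the same name so the snippet runs without the library; with Ramda installed, the stand-ins could be replaced by `const { compose, map, sortBy, identity, invert, countBy, toLower, match } = require('ramda')` and the composition itself would stay the same.

```javascript
// Simplified stand-ins for the Ramda functions used in the lesson.
const identity = (x) => x;
const toLower = (str) => str.toLowerCase();
const match = (regex) => (str) => str.match(regex) || [];
const compose = (...fns) => (input) =>
  fns.reduceRight((value, fn) => fn(value), input);
const countBy = (fn) => (list) => {
  const counts = {};
  for (const item of list) {
    const key = fn(item);
    const seen = Object.prototype.hasOwnProperty.call(counts, key);
    counts[key] = (seen ? counts[key] : 0) + 1;
  }
  return counts;
};
const invert = (obj) => {
  const out = {};
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (!Object.prototype.hasOwnProperty.call(out, value)) out[value] = [];
    out[value].push(key);
  }
  return out;
};
const sortBy = (fn) => (list) =>
  [...list].sort((a, b) => {
    const x = fn(a);
    const y = fn(b);
    return x < y ? -1 : x > y ? 1 : 0;
  });
const map = (fn) => (obj) =>
  Object.fromEntries(Object.entries(obj).map(([key, value]) => [key, fn(value)]));

// Read bottom to top: match words, count them case-insensitively,
// flip counts and words, then sort each group of words.
const countWords = compose(
  map(sortBy(identity)),
  invert,
  countBy(toLower),
  match(/\w+/g)
);

const counted = countWords("I really, really love Ramda");
console.log(counted);
// { '1': [ 'i', 'love', 'ramda' ], '2': [ 'really' ] }
```

Because the words are counted by toLower, they come back lowercased in the result.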