The anagram test is commonly used to demonstrate how an naive implementation can perform significant order of magnitudes slower than an efficient one. We’ll also briefly go over why each implementation is not as efficient as you could make it.
A word is an anagram of another if you can rearrange its characters to produce the second word. Here we’ll write multiple increasingly more efficient functions that given two strings determines if they are anagrams of each other.
[00:00] Here, we have a requirement to determine if two strings are anagrams of each other. For example, earth and heart are anagrams. Silent and listen are anagrams. However, foo and bar are not. We will start by creating a function called R anagrams. It'll take two strings, s1 and s2, and will return a Boolean based on s1 and s2.
[00:25] If you try to follow the given definition for an anagram, that is that the arrangement of the characters should match, we could do that by creating a loop over all the permutations of string s1, and for each permutation, if it is equal to the second string, then the two strings are an anagram, and we return "true."
[00:48] Otherwise, after the loop, we simply return false because the strings are not an anagram. Although such an implementation would work, in this case, the complexity of the algorithm will be equal to the possible permutations of s1, so of an order of n!, where n is the number of characters in the string.
[01:09] If you read into the requirements, you can realize that instead of doing actual rearrangements, you simply need to check if they have exactly the same characters. One quick way of checking the equality between two sets of characters in strings is simply to break the strings into their characters, sort them, and then join them again.
[01:30] We do the same for the second string. Finally, we check if the strings are equal. If these sorted character strings are equal, then the original strings were anagrams. The complexity in this case would be driven by the sort function, which is of the order n log(n). We can do better.
[01:55] A better way to make sure that the strings have the same characters is simply to use a hash map that counts the characters between the two strings, and makes sure that this count per character is the same between the strings.
[02:09] We start by creating a map to track this character count. We initialize it to a map with string keys and number values. We go ahead and iterate through all the characters in string1.
[02:25] For each character in string1, we set the character count for this character. We get the previous value for this character. If there was no previous value, we will initialize it to 0Finally, we increment it to 1.
[02:44] We repeat the process for the second string, iterating through all of its characters. For each character, if there's not already a key in the character count for this character, then the strings are definitely not anagrams, and we can immediately return false. Otherwise, we simply decrement the count for this particular character.
[03:16] Finally, we loop through all the values in the character count map and ensure that every value is zero. This ensures that we saw an equal number of character counts in string1 during incrementing and string2 during decrementing.
[03:35] Since we are simply looping through the characters in the two strings and doing a constant amount of work in each iteration, the time complexity of this version is of order n, where n is the number of characters in the strings.