In this lesson, you will learn how to view and interpret the garbage collection activity of your Node.js app via the console. You will also learn how to take heap dump snapshots and analyze them with Chrome Developer Tools to identify possible memory leaks. A sample app with a known memory leak is provided as part of the walkthrough to illustrate how to use the tools in your own environments.
[00:01] We're going to start a Node application server and pass the --trace_gc flag to it, which is going to cause it to output the garbage collection activity to the console. In my other console I'm going to use a for loop to hit that Node endpoint 10 times, and when we do that, we can see the garbage collection activity printed in the server's console. We have two things going on here.
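For context, a minimal sketch of that setup might look like the following; the file name, port, and route are assumptions for illustration, not the exact code from the video:

```js
// server.js: a minimal HTTP server, assumed for illustration.
// Start it with garbage collection tracing enabled:
//   node --trace_gc server.js
// Then, from another console, hit the endpoint 10 times, e.g.:
//   for i in $(seq 1 10); do curl http://localhost:3000/; done
const http = require('http');

http.createServer((req, res) => {
  res.end('ok\n');
}).listen(3000, () => {
  console.log('listening on http://localhost:3000');
});
```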
[00:26] We have scavenge operations and we have mark-sweep operations, and that's quite a bit of activity for only 10 GET requests. Before we dig too deep into that, let's cover what each of those operations means and how memory is handled inside of Node. The memory space managed by Node is known as the heap, and it's divided into several different spaces, also referred to as generations. The ones we're concerned with are called the new space and the old space.
[00:57] New space is where objects are initially created, and it's designed to be small and fast to garbage collect. Memory in new space is managed by a pointer. When a new object is created it's written at the location indicated by the pointer, and the pointer is moved to the next free slot. When the pointer reaches the end of the space allocated for new space, new space undergoes a scavenge operation.
[01:21] The scavenge operation identifies all of the objects in memory that are no longer referenced. It then removes them, compacts the memory, and resets the pointer for the next available memory allocation. If we examine the scavenge operations that happened in our application here, the first thing we see is the timestamp, which is the number of milliseconds that have elapsed since the Node server was started.
[01:46] This is useful whenever you're looking back through these to see how frequently garbage collections are occurring. Next we see the type of garbage collection that occurred; we have a scavenge operation here, and we have a mark-sweep operation here. Then we get to our memory statistics. These tell us that there were 12 megabytes allocated to new space when the garbage collection operation started, and the total heap size was 47.6 megabytes.
[02:14] After the garbage collection operation, the new space size was still 12 megabytes, and the total heap allocation increased to 48.6 megabytes. The final piece of information we can get from this is the garbage collection duration, which was 0.5 milliseconds. One thing to note here is that a single garbage collection line in isolation isn't really good or bad; it's simply a baseline. Every application has its own memory footprint, and it's only by comparing against that baseline over time that you can tell how the app is performing.
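To make those fields concrete, a --trace_gc line looks roughly like the following; the exact layout varies between Node/V8 versions, so treat this as an illustration of the format rather than a verbatim copy of the video's output:

```
[12345]    71500 ms: Scavenge 12.0 (47.6) -> 12.0 (48.6) MB, 0.5 ms [allocation failure].
```

Reading left to right: the process ID, the milliseconds since the server started, the collection type, the memory statistics before and after the collection (used and total heap, in megabytes), the pause duration, and the reason the collection was triggered.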
[02:48] In this case, as we look through the garbage collection cycles, we can see that the memory allocation, even after garbage collection, continues to increase with the number of GET requests. We can see the same thing happening with our total heap size as it continues to grow with every operation. As the scavenge garbage collection operations happen, objects that survive two scavenge operations are promoted into old space.
[03:15] Old space is much larger than new space, and as a result it employs a different garbage collection algorithm, the one you see here known as mark-sweep. You could really do a deep dive into how mark-sweep garbage collection works, but the main thing to know is that the garbage collector runs through the memory and marks all of the objects that are still referenced.
[03:37] This means anything that's not referenced is unreachable and is subject to garbage collection, so the sweep process then kicks in and removes those objects from memory, and then the process starts all over. One really important thing to note is that during this garbage collection cycle, your Node process is in a stop-the-world state. Because the garbage collector is marking everything that's reachable in memory, it can't allow anything to change; otherwise you risk removing data that is still actively in use.
[04:08] So it actually stops the Node process while this occurs. If you have a lot of garbage collection going on, this can really start to impact performance as it stops the Node process to do the garbage collection. You end up in a situation where you have an application that's having to garbage collect a lot, which is not very performant, and on top of that you're stopping the process to do that garbage collection, which creates a cascading negative effect on the performance of your server.
[04:35] One other difference that you can see in the console between the mark-sweep and the scavenge operations is that the mark-sweep was done in three steps since the start of marking. Because it's actively marking everything it knows is still in use, it doesn't necessarily have to sweep it all at the same time, so it will break the work up into steps to minimize the garbage collection impact on your application. So now we know how garbage collection works, but that leaves the question: what do we do about it?
[05:01] The answer lies in heap dumps. A heap dump is a snapshot of the memory allocation at the time of the dump, and much like garbage collection stats, a single data point may not reveal much; heap dumps work better when you compare multiple samples over time. So in my console window I'm going to find the process ID for my Node application. I get two results returned for that: one is the grep command that I used to find the process, and the other is my Node server itself.
[05:30] So I'm going to issue a kill -USR2 and then the process ID. If I do a directory listing, you can see that created a heap snapshot that we'll view in just a minute. Now I'm going to run a significant load against this server so I get a really nice comparison between the two snapshots. You can see, just as it starts to go here, garbage collection in the left-hand console is starting to go through the roof.
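For reference, the SIGUSR2 behavior comes from the heapdump npm module mentioned later in the lesson; a minimal sketch of wiring it up looks like this (the programmatic trigger is optional and shown only for completeness):

```js
// Requiring heapdump is enough: on platforms that support it, the module
// listens for SIGUSR2, so `kill -USR2 <pid>` writes a file named something
// like heapdump-<timestamp>.heapsnapshot to the current working directory.
const heapdump = require('heapdump');

// You can also trigger a snapshot from code instead of a signal:
heapdump.writeSnapshot((err, filename) => {
  if (err) console.error('heap snapshot failed:', err);
  else console.log('heap snapshot written to', filename);
});
```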
[05:55] One other thing to note here is that as the output scrolls past in the right window, you're starting to see empty replies from the server. That's a great leading indicator from your Node application that it's starting to fall over. My load test has completed, so I'm going to take one more heap snapshot. I've switched to the Chrome Developer Tools and, using the Profiles tab, I'm going to load those two snapshots I created.
[06:19] I can select the second one, and then I'm going to switch from the summary view to the comparison view, and if you happen to have multiple snapshots listed over here, you can choose which one you compare it to using this drop-down menu. If we sort by the number of new objects you can see that there were almost 10,000 new objects of a big data constructor created and none of them were deleted.
[06:44] So right away that's a pretty big indicator that that might be the source of our leak. If we switch over to our code view, you can see that with every GET request we're pushing a new instance of this big data function onto an array, and at the same time we're never actually using that array anywhere. So let's see how our performance changes if we stop doing that.
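The actual source isn't reproduced here, but the leaky pattern the video describes looks roughly like this; BigData, requests, and the buffer size are illustrative names and numbers, not the video's exact code:

```js
const http = require('http');

// Each instance retains a sizeable chunk of memory.
function BigData() {
  this.payload = Buffer.alloc(1024 * 1024); // ~1 MB, purely for illustration
}

const requests = []; // module-level array that nothing ever reads or empties

http.createServer((req, res) => {
  // The leak: every GET request pushes a new BigData instance onto an
  // array that stays referenced forever, so none of these objects can
  // ever be collected and old space keeps growing.
  requests.push(new BigData());
  res.end('ok\n');
}).listen(3000);
```

The fix described in the lesson is simply to stop retaining those objects: remove the requests.push(...) line (or clear the array), and the BigData instances become unreachable and are collected normally.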
[07:07] I'm going to restart the Node application, find my process ID again, take an initial snapshot, and now repeat my load test and already you can see a big difference.
[07:18] We have garbage collection happening in the left-hand console, but at a much less frequent rate. The load test has completed, so let's take one more snapshot. I've loaded the new snapshots up in the Chrome Developer Tools, and when we compare the two snapshots from the beginning and the end of the load test, you can see that there were some new objects created, but at the same time they were being destroyed once they were no longer being used, and there doesn't appear to be a clear runaway memory object like we had initially.
[07:48] One last thing I want to point out is that taking these snapshots requires the heapdump npm module, so you'll need to make sure that it is included in any apps that you will be profiling, which leads me to the key takeaways from this module. First, memory management is a key component of scaling. Without it, your only option is to throw hardware at the problem until someone accidentally finds the leak.
[08:13] Second, it takes planning. As you saw, a single garbage collection statement or a single heap dump doesn't necessarily show you where the problem is. It wasn't until we started comparing things over time that we got a clear indicator of what was going on. Finally, baselines are extremely helpful. Knowing how your app performs under a known load provides a great reference and can help you accurately forecast how your app is going to perform under future loads.
Yup. I was able to duplicate the results you show. The lesson is still accurate for versions of node 4.x and lower but I'll need to update it for v6. Thanks for pointing this out!
I see you are requiring a module heap but it's not been referenced anywhere in the code. Where is the heap dump file created?
My log says "allocation on failure" but yours says "allocation failure". Are they different?
I ran this lesson on Ubuntu 14.04, Node version 6.2.2. There was no memory leak activity reported.