Monitor Elasticsearch cluster health and status with the _cluster and _nodes APIs

Instructor: Will Button

Published 5 years ago
Updated 3 years ago

Using the _cat API is great for console-based, ad hoc queries of your cluster. To get even more detailed info on the health and performance of your cluster, or for programmatic access, the _cluster and _nodes endpoints may be your new best friends. There is a tremendous amount of information available about your Elasticsearch cluster via these APIs. This lesson doesn’t cover all of them exhaustively, but instead introduces you to the endpoints and the data returned, arming you with the skills you need to go deeper as needed using the Elasticsearch docs.

[00:01] We can use the Cluster API endpoint to get the health of our cluster, and we get information much like we saw using the cat endpoint, but formatted as JSON this time. Using this same endpoint, we can drill down and get the health of a specific index as well.
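As a rough sketch of the same calls made programmatically, the snippet below builds the `_cluster/health` URL with and without an index name and fetches it with Python's standard library. The base URL `http://localhost:9200`, the index name `my-index`, and the helper names `health_url` and `fetch_json` are assumptions for illustration, not part of the lesson.

```python
import json
import urllib.request

# Assumed address of a local, unsecured node; adjust for your own cluster.
BASE_URL = "http://localhost:9200"


def health_url(index=None, base_url=BASE_URL):
    """Build the _cluster/health URL, optionally scoped to a single index."""
    url = f"{base_url}/_cluster/health"
    if index:
        url += f"/{index}"
    return url


def fetch_json(url, timeout=5):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Against a running cluster, `fetch_json(health_url())["status"]` would give the overall cluster status, and `fetch_json(health_url("my-index"))["status"]` the status of that one (hypothetical) index.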

[00:17] Much like many of the Elasticsearch APIs, we can do a comma-separated list. We can do wildcards as well. I could do S* and F* to get any index in my system that starts with the letter S or starts with the letter F.
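One way to assemble such a request path is sketched below. The helper name `health_path` is hypothetical, and the patterns are written in lowercase because Elasticsearch index names are lowercase.

```python
from urllib.parse import quote


def health_path(*index_patterns):
    """Return the _cluster/health path for any number of index names or wildcards."""
    if not index_patterns:
        return "/_cluster/health"
    # Escape each pattern on its own, keeping '*' literal, then join with commas
    # so the separators are not percent-encoded.
    joined = ",".join(quote(pattern, safe="*") for pattern in index_patterns)
    return f"/_cluster/health/{joined}"
```

For example, `health_path("s*", "f*")` yields `/_cluster/health/s*,f*`, which covers every index whose name starts with s or f.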

[00:33] The stats endpoint is probably one of the most useful endpoints to collect data for monitoring. In the output, we get the overall cluster health. We get details on the indices and shard allocation, the number of docs, the size of our data store, and we get a ton of information about each of the nodes in the cluster.
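For monitoring, it can help to reduce the large `_cluster/stats` response to a handful of values. The sketch below assumes the response has already been fetched and decoded into a dictionary; the function name `summarize_cluster_stats` and the keys of the returned summary are my own, while the fields it reads come from the stats response.

```python
def summarize_cluster_stats(stats):
    """Pick a few headline values out of a decoded _cluster/stats response."""
    indices = stats.get("indices", {})
    nodes = stats.get("nodes", {})
    return {
        "status": stats.get("status"),
        "index_count": indices.get("count"),
        "shards_total": indices.get("shards", {}).get("total"),
        "docs": indices.get("docs", {}).get("count"),
        "store_bytes": indices.get("store", {}).get("size_in_bytes"),
        "node_count": nodes.get("count", {}).get("total"),
        "versions": nodes.get("versions", []),
    }
```

Missing sections come back as `None` rather than raising, so the same function can be pointed at responses from different Elasticsearch versions.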

[00:59] That includes not only the version of Elasticsearch each node is running, but also the processors, the total amount of memory, the amount of memory in use, and the CPU utilization. You can also use the _nodes endpoint to get more specific statistics on each of the nodes.

[01:18] This actually goes a lot deeper than the Cluster API. It provides you the name of your node, the IP address, the roles that the node is currently performing, and then this part is where you can really get into a lot of detail.
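A minimal sketch of pulling those identifying fields out of a decoded `_nodes/stats` response follows. The response keys nodes by an internal ID under a top-level `nodes` object; the function name `node_identities` is an assumption for illustration.

```python
def node_identities(nodes_stats):
    """List the name, IP address, and roles of each node in a _nodes/stats response."""
    rows = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        rows.append(
            {
                "id": node_id,
                "name": node.get("name"),
                "ip": node.get("ip"),
                "roles": sorted(node.get("roles", [])),
            }
        )
    # Sort by node name so the output is stable from one poll to the next.
    return sorted(rows, key=lambda row: row["name"] or "")
```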

[01:33] The indexing statistics show you stats related to the documents being indexed into or deleted from your indices. The get and search statistics show you the number of requests coming from get operations and from searches being performed on your cluster, or on this node in the cluster.

[01:52] Those three combined -- the indexing, get, and searches -- give you a great overview of what's going in and what's coming out of your cluster. Plotting them over time can help you identify the main usage patterns of your cluster, giving you the background knowledge you'll need to be able to effectively scale.
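These statistics are cumulative counters, so plotting them usually means turning two samples into a rate. The sketch below assumes each sample is one node's entry from a decoded `_nodes/stats` response; the helper names and the choice of counters are mine.

```python
# Label -> (section under "indices", counter field) in a node's stats.
COUNTERS = {
    "index": ("indexing", "index_total"),
    "delete": ("indexing", "delete_total"),
    "get": ("get", "total"),
    "search": ("search", "query_total"),
}


def read_counters(node):
    """Extract the cumulative indexing, get, and search counters for one node."""
    indices = node.get("indices", {})
    return {
        label: indices.get(section, {}).get(field, 0)
        for label, (section, field) in COUNTERS.items()
    }


def rates_per_second(earlier, later, elapsed_seconds):
    """Convert two counter samples into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Counters restart from zero when a node restarts; clamp negative deltas to 0.
    return {
        label: max(later[label] - earlier[label], 0) / elapsed_seconds
        for label in earlier
    }
```

Sampling every 30 seconds, for instance, an `index_total` that moves from 100 to 160 works out to 2 documents indexed per second.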

[02:11] As I scroll down here a little bit further, you get the JVM statistics. Not only the overall memory in use, but breakouts for each of the JVM memory pools -- the young, survivor, and old pools. These are key metrics in identifying the performance bottlenecks of your cluster.
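The pool breakdown lives under `jvm.mem.pools` in each node's stats. As a sketch, the function below (hypothetical name `jvm_pool_usage`) reports overall heap usage and a usage percentage per pool, assuming the node entry has already been decoded from a `_nodes/stats` response.

```python
def jvm_pool_usage(node):
    """Return heap usage and per-pool usage percentages for one node."""
    mem = node.get("jvm", {}).get("mem", {})
    usage = {"heap": mem.get("heap_used_percent")}
    for pool in ("young", "survivor", "old"):
        stats = mem.get("pools", {}).get(pool, {})
        used = stats.get("used_in_bytes")
        limit = stats.get("max_in_bytes")
        # Some JVM configurations report no fixed maximum for a pool;
        # leave the percentage undefined rather than divide by zero.
        if used is not None and limit:
            usage[pool] = round(100 * used / limit, 1)
        else:
            usage[pool] = None
    return usage
```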

[02:28] Like I mentioned in the last lesson, looking at these at a snapshot in time is not going to do you a lot of good. They're most helpful to you when you track them regularly, and you're able to plot them over time.