Elasticsearch provides a powerful API over HTTP for accessing its features. In this lesson, you will be introduced to the API and learn how to use it to get data from Elasticsearch using the browser and the popular curl command. We will see how adding query string parameters can exclude or include the properties that we desire from our data.
Setup is specified in the README.md of the repository linked below.
[00:02] I've got my Elasticsearch server running locally and I have some data imported into it. If you're following along on your own computer, make sure you check out the README doc in the lesson repo for instructions on getting Elasticsearch set up locally and importing the data if you don't have a sample dataset to play with.
[00:19] I'm going to use the curl command to execute an HTTP GET request against the Elasticsearch cluster to retrieve documents. The syntax is curl -s followed by the URL of my Elasticsearch cluster, a slash followed by the name of the index, another slash followed by the index type, and then a final slash and the ID of the document I want to retrieve.
[00:48] For now I'm just going to tell you the index name is simpsons and the index type is episode if you're using the sample dataset from the repo, but by the end of this course you're going to know how to identify those details on any Elasticsearch cluster for yourself.
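As a rough sketch of the URL pattern just described (cluster, index, type, ID), the request could be assembled like this. The address http://localhost:9200 is an assumption (Elasticsearch's default local port), and doc_url and get_doc are hypothetical helper names used only for illustration.

```shell
# Assumption: Elasticsearch is listening on its default local address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the document URL: <cluster>/<index>/<type>/<id>
doc_url() {
  printf '%s/%s/%s/%s' "$ES_URL" "$1" "$2" "$3"
}

# Fetch one document; -s hides curl's progress meter.
get_doc() {
  curl -s "$(doc_url "$1" "$2" "$3")"
}

# Show the URL that would be requested for episode 1 of the sample dataset.
doc_url simpsons episode 1; echo
# prints http://localhost:9200/simpsons/episode/1 when ES_URL is unset
```

With a cluster running, `get_doc simpsons episode 1` would issue the actual GET request.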
[01:04] If I run this command, it returns a JSON object with the result of the operation. That actually looks pretty horrible, so I'm going to rerun it and pipe it to the jq command. Now when we look at the results, we have our index and type specified, which we already knew. They're included here because later, when you start doing searches, you may get back results with different values.
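To illustrate what piping to jq does, here is a small sketch that runs without a cluster. The compact string is a made-up stand-in for a response body: the top-level field names follow the ones discussed in this lesson, the values are placeholders, and the _source object is left empty on purpose.

```shell
# Stand-in for a compact, single-line response body (values are placeholders).
compact='{"_index":"simpsons","_type":"episode","_id":"1","_version":1,"found":true,"_source":{}}'

# jq's identity filter "." re-emits its input, indented one field per line.
pretty="$(printf '%s' "$compact" | jq .)"
printf '%s\n' "$pretty"

# Against a live cluster the same pipe would be (assuming the default local address):
#   curl -s http://localhost:9200/simpsons/episode/1 | jq .
```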
[01:29] We have the ID of the document and its version, which gets automatically incremented each time the document is updated in Elasticsearch. We have a found field that is set to true; it would be false if we specified a document that doesn't exist in the cluster.
[01:45] Finally we have this _source object, which is really the heart of what you probably are looking for when requesting a document. In this case it has the details of this particular episode of "The Simpsons."
[01:58] To get any document from the Elasticsearch cluster, all I have to do is supply the document ID in the API endpoint. The first episode was episode one. I can replace that and get episode two, or I can replace it with a three and get episode three. As long as that document exists, Elasticsearch is going to return it. If I request episode 999, you can see that found is set to false because that episode doesn't exist.
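One way to act on the found flag in a script is sketched below. doc_found is a hypothetical helper name, and the two JSON strings are minimal stand-ins for real responses, which carry more fields.

```shell
# Reads a get-API response on stdin and succeeds only when "found" is true.
# jq -e derives its exit status from the filter result (false or null gives 1).
doc_found() {
  jq -e '.found' > /dev/null
}

# Minimal stand-ins for the two kinds of response.
hit='{"_id":"1","found":true}'
miss='{"_id":"999","found":false}'

for body in "$hit" "$miss"; do
  if printf '%s' "$body" | doc_found; then
    echo "document exists"
  else
    echo "document not found"
  fi
done
```

Against a running cluster, the same check could be fed from curl, for example `curl -s http://localhost:9200/simpsons/episode/999 | doc_found`, again assuming the default local address.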
[02:27] If I go back to episode one, we talked about the _source object being the part that I'm probably most interested in when requesting a document. I can retrieve just that directly by calling the _source endpoint for a given document, which returns only that value.
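A minimal sketch of the _source endpoint call, assuming the same local address and sample index as before. source_url and get_source are hypothetical helper names.

```shell
# Assumption: Elasticsearch is listening on its default local address.
ES_URL="${ES_URL:-http://localhost:9200}"

# Appending /_source to a document URL asks for the bare source object,
# without the _index, _type, _id, _version and found wrapper fields.
source_url() {
  printf '%s/%s/%s/%s/_source' "$ES_URL" "$1" "$2" "$3"
}

get_source() {
  curl -s "$(source_url "$1" "$2" "$3")"
}

# Show the URL that would be requested for episode 1.
source_url simpsons episode 1; echo
```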
[02:48] I can also add a query string parameter to exclude certain fields, like the video URL, using the _source_exclude parameter. By using comma-separated values I can add multiple excludes to this parameter. Alternatively, I can request only specific fields to be returned using the _source_include parameter.
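The filtering described here could look roughly like the sketch below. The field names video_url, image_url and title are hypothetical (check the _source of a real document for the actual names), and the parameter spellings _source_exclude and _source_include are the ones used by Elasticsearch versions that still have index types; later releases renamed them to _source_excludes and _source_includes.

```shell
# Assumption: Elasticsearch is listening on its default local address.
ES_URL="${ES_URL:-http://localhost:9200}"
DOC="$ES_URL/simpsons/episode/1"

# Exclude several fields with a comma-separated list (hypothetical field names).
exclude_url="$DOC?_source_exclude=video_url,image_url"

# Or return only the named fields (hypothetical field name).
include_url="$DOC?_source_include=title"

printf '%s\n' "$exclude_url" "$include_url"

# With a cluster running, keep the URL quoted so the shell does not treat "?" as a glob:
#   curl -s "$exclude_url" | jq .
#   curl -s "$include_url" | jq .
```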
[03:19] There are a bunch of other options available to the get API endpoint, but these are the ones that have covered a majority of my use cases. If you're interested in the other options, be sure to check out the Elasticsearch documentation using the URL shown on my browser here.