⚠️ This lesson is retired and might contain outdated information.

Add data to Elasticsearch

Instructor: Will Button

Published 7 years ago
Updated a year ago

In this lesson, you will learn how to add new documents to the Elasticsearch data store using HTTP methods and the Elasticsearch API. You will also learn how Elasticsearch uniquely identifies each document and what happens when you attempt to create a document that already exists.

[00:02] Adding or saving data in Elasticsearch is called indexing because it's a bit more involved than just writing a document to a database. The document has to be broken down, stored, and made searchable. All of this happens when you post a document to the index endpoint.

[00:16] If you watched the previous lesson on getting data from Elasticsearch, you learned that documents follow an index/type/ID format, and indexing a new document works the same way. You can either supply your own ID or allow Elasticsearch to assign one for you.

[00:32] Let's create a new document. We'll give it a title of "Add Data to Elasticsearch" and a summary that just says, "Learn to index into Elasticsearch."

[00:47] Finally, we'll give it a views field and set that to 1,000. You can see that with curl, I'm using a PUT, and we're going to send that to localhost port 9200, which is where my Elasticsearch server is running. We put it to an index named egghead and a type called lessons, and give it an ID of 3.
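The PUT described above can be sketched as a small shell function. This is a minimal sketch, not the exact command from the video: the function name index_lesson and the ES_URL variable are made up for illustration, and the Content-Type header is an assumption that newer Elasticsearch versions require but the version used in the lesson may not. The index/type/ID URL follows the lesson; newer versions of Elasticsearch have removed types, so the path may need adjusting there.

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the lesson.
ES_URL="${ES_URL:-localhost:9200}"

# Index a lesson under an ID we choose (hypothetical helper).
# Usage: index_lesson <id> <json-body>
index_lesson() {
  # PUT to index "egghead", type "lessons", and the caller-supplied ID.
  curl -s -XPUT "$ES_URL/egghead/lessons/$1" \
    -H 'Content-Type: application/json' \
    -d "$2"
}
```

For the document in this lesson, the call would look like: index_lesson 3 '{"title": "Add Data to Elasticsearch", "summary": "Learn to index into Elasticsearch", "views": 1000}'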

[01:10] The response we get back contains the index that we wrote to, the type, the ID we specified, and a version number, which Elasticsearch tracks itself. Finally, at the end is a created Boolean flag indicating that the document was created. I'll show you why that's important to point out in just a few minutes.

[01:31] That's if we want to specify the ID ourselves. If we don't care and we want Elasticsearch to manage that, instead of doing a PUT, we're going to do a POST. Again, I'll post it to localhost port 9200, the egghead index, and the lessons type, and then I'm going to stop there because I'm not going to specify the ID.

[01:52] I'll write that. We get a similar result back. This time, instead of the ID we specified, you can see that the ID has been generated for us by Elasticsearch. This ID that you get back is guaranteed to be unique. It's auto-generated, 20 characters long, and URL-safe. It's just a Base64-encoded GUID string.
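The auto-ID variant differs from the PUT only in the HTTP method and the missing ID at the end of the URL. A minimal sketch, using the same assumptions as the lesson's setup (server on localhost:9200, egghead index, lessons type); the function name index_lesson_auto_id and the ES_URL variable are hypothetical, and the Content-Type header is an assumption for newer Elasticsearch versions:

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the lesson.
ES_URL="${ES_URL:-localhost:9200}"

# Index a lesson and let Elasticsearch assign the ID (hypothetical helper).
# Usage: index_lesson_auto_id <json-body>
index_lesson_auto_id() {
  # POST to the index and type only; no ID appears in the URL.
  curl -s -XPOST "$ES_URL/egghead/lessons/" \
    -H 'Content-Type: application/json' \
    -d "$1"
}
```

The generated ID comes back in the _id field of the JSON response, so the caller has to read it from there if it needs to fetch the document later.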

[02:16] There's a problem with that, though. Look at this. We've already written a document to the egghead index, lessons type, with an ID of 3. If I write a new document to that same endpoint, I still get back a successful response. If we do a quick curl, you can see that the document now contains the updated data that I posted to it.

[02:41] If we look at our response back here, you can also see that the version number was incremented to 2 and created was set to false. That's cool because you can use that to discover that you overwrote an existing document, but it really doesn't help you because you can no longer access the original document.

[02:59] Your document is gone forever, and you're at the point where you know you just did something painful, but you don't have a way out of it.

[03:07] You can do this. I can do a curl to get that lesson 3 document again. I get an HTTP 200 response back as well as the document itself, so I can check that it does exist. Just to show you, if I try to access a document that doesn't exist, I get an HTTP 404.
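The existence check can be reduced to looking at the status code alone. The sketch below uses curl's -o and -w options to discard the body and print only the status; the helper names lesson_status and lesson_exists and the ES_URL variable are made up for illustration.

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the lesson.
ES_URL="${ES_URL:-localhost:9200}"

# Print only the HTTP status code of a GET for the given lesson ID.
# -o /dev/null discards the body; -w '%{http_code}' writes the status code.
lesson_status() {
  curl -s -o /dev/null -w '%{http_code}' "$ES_URL/egghead/lessons/$1"
}

# Succeeds (exit status 0) only when the GET returned 200.
lesson_exists() {
  [ "$(lesson_status "$1")" = "200" ]
}
```

A script could then write: if lesson_exists 3; then echo "already there"; fi. This is only a check at one moment in time, so it does not protect against another writer creating the document right afterwards.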

[03:26] You can use that to check, but in a high-volume system, there's the possibility that between the time you check for this document and the time you write it, something else could have created the document. You still have the possibility of overwriting something that should be there.

[03:44] To get around this, we can use one of two different methods. I can do the same PUT operation, and then I can use the op_type query parameter and set it equal to create. I also added -i so that we get the response headers returned. If I am, in fact, trying to create a duplicate document, I get an HTTP 409.

[04:05] In the body, it tells me that the document already exists. Otherwise, if I specify an ID that isn't already in use, I get an HTTP 201 and the same response message that we saw earlier in this lesson.
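A sketch of the op_type approach: the same PUT as before with ?op_type=create appended to the URL, and -i so the status line and headers are printed along with the body. The function name create_lesson and the ES_URL variable are hypothetical, and the Content-Type header is an assumption for newer Elasticsearch versions.

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the lesson.
ES_URL="${ES_URL:-localhost:9200}"

# Create a lesson only if the ID is not already taken (hypothetical helper).
# Usage: create_lesson <id> <json-body>
create_lesson() {
  # op_type=create makes the request fail instead of overwriting;
  # -i includes the HTTP status line and headers in the output.
  curl -s -i -XPUT "$ES_URL/egghead/lessons/$1?op_type=create" \
    -H 'Content-Type: application/json' \
    -d "$2"
}
```

Because the check and the write happen in a single request on the server, this avoids the gap between a separate GET and PUT.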

[04:20] The other way of doing this is, instead of using a query parameter, to use the _create API endpoint. Again, if that document exists, we get an HTTP 409.

[04:34] If it doesn't exist, then we get our HTTP 201. That allows you to use the HTTP status in the response headers or the response body to check for the success of your operation and then handle that properly in your application logic.
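Handling the two outcomes in a script might look like the sketch below. It appends /_create to the lesson's index/type/ID URL, which matches the URL format used in this lesson; newer Elasticsearch versions place _create differently in the path, so treat the URL as an assumption to verify against your version. The function name create_lesson_strict, its messages, and its exit codes are made up for illustration.

```shell
# Assumption: Elasticsearch is listening on localhost:9200, as in the lesson.
ES_URL="${ES_URL:-localhost:9200}"

# Create a lesson via the _create endpoint and report the outcome.
# Usage: create_lesson_strict <id> <json-body>
create_lesson_strict() {
  # Capture only the HTTP status code; the response body is discarded.
  http_code="$(curl -s -o /dev/null -w '%{http_code}' -XPUT \
    "$ES_URL/egghead/lessons/$1/_create" \
    -H 'Content-Type: application/json' \
    -d "$2")"
  case "$http_code" in
    201) echo "created" ;;
    409) echo "conflict: document $1 already exists"; return 1 ;;
    *)   echo "unexpected status: $http_code"; return 2 ;;
  esac
}
```

The function returns a non-zero exit status on a conflict, so a calling script can branch on it, for example by retrying with a different ID.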

Adam Kucharczyk
~ 6 years ago

curl -XPUT -H'Content-type: application/json' -d '{"title": "Add data to elasticsearch", "summary": "Learn to index into es", "views":"10000"}' localhost:9200/egghead/lessons/3