1×
Become a member
to unlock all features
Autoplay

    Convert JSON to CSV with jq

    Brett CassetteBrett Cassette
    jqjq

    jq can help you quickly filter massive amounts of JSON data, but how can you turn it into a CSV, so you can process it quickly with a database or spreadsheet application?

    This lesson shows how to select the data you want for your CSV with jq, and uses the helper utility linked below to output as CSV.

    https://github.com/joelpurra/jq-hopkok/tree/master/src/tabular

    Code

    Code

    Become a Member to view code

    You must be a Member to view code

    Access all courses and lessons, track your progress, gain confidence and expertise.

    Become a Member
    and unlock code for this lesson
    Transcript

    Transcript

    Instructor: 0:00 Hey everybody. Today I'm going to be using JQ to show you how to slice-and-dice JSON and turn it into a CSV. CSVs are really useful because we can just import them into a spreadsheet application and perform high-level aggregations really quickly and learn a lot about the underlying data.

    0:17 To kick things off with JQ, we need to grab a JSON data source, so let's hop over to Egghead. We'll use the search query and we'll just copy it as a QRL request. Then we can head into our terminal and paste that. We'll see what it looks like.

    0:34 Here we can see that there's going to be a parameter with hits per page. Let's just bump that up to a thousand so we have a lot of data to work with.

    0:44 Great, so that's a lot of data. Obviously, we're going to have to pipe this somewhere. Let's just rerun our request, but we'll pump this into egghead.json.

    0:53 We can go ahead and pop open an editor so that we can look at it in a little more detail. We'll pop in here, open egghead.json and make sure we format it.

    1:06 When it's formatted, we can see a little bit better how we're going to drill into this and pull off the information that we're interested in. We'll probably be interested in the primary tag name. We'll want to know its popularity order. We'll want to know its rank, maybe its slug, and that should be enough to get started.

    1:24 Then we can return to the terminal and we can cat our file, which means it's just going to be fed into the terminal here. We can use that output with this pipe character and feed it into JQ.

    1:37 We're going to use the -r flag, which is going to quote our strings properly when we output it as CSV. Then with JQ, the way that we tell it how to filter out what we want is with this .syntax.

    1:51 We'll say we want .results since we know that contains all of the data. We say .results. Then results is an array, which we saw, so we want the zeroth element because that array is all of the data.

    2:06 Inside that we want .hits because everything else is just metadata on top of that. .hits is what we're interested in. We can output that to out terminal and see that it is in fact all of the data.

    2:23 With that, we know that we have an array of elements. Each one of these elements contains the same amount of information. What we want is to map each of these and just pull off the elements that we're interested in.

    2:39 I'm going to use the map command. We can actually use the pipe character inside a JQ string in the same way we would in a terminal, so we'll say map. We're going to map it as a new hash. We'll say the name is the .primarytag.name.

    2:57 We can come back in here and see what else we want. We wanted our popularity order. We can say that our popularity is going to be .primarytag.popularityorder. We can add in a few more things here.

    3:16 We want our rank as our .rank and our duration as our .duration, and then we can output that. We'll see that we have this array of hashes which contains only the data that we wanted to select.

    3:31 Then what we're going to do is we're going to fork this utility that's listed in the show notes. This is an output to CSV utility that's really helpful. I'm going to go ahead and fork this, and we'll fork it to my repo. When that's done, you just make sure that you get clone it.

    3:54 Now we're going to rerun this previous command, but that's going to pipe into the utility that we just cloned. It's this JQ hopcac, and we're going to use tabular array of objects to CSV. We're going to pipe that into, let's just call it egghead.csv.

    4:18 You're ready to open your CSV, and then you can slice and dice the data however you want to. You can use a spreadsheet application or a database. You can import this easily into a database. I like Airtable. It's pretty straightforward.

    4:36 Once everything is imported, we can group by our topic and then learn things like our average popularity score -- these are relative rankings obviously -- average rank, maybe the total duration in seconds that we have recorded on topics and get a high-level overview of our data.

    Discuss

    Discuss