    Intro to Web Scraping with Node and X-ray

    John Lindquist

    Node and X-ray have made web scraping a really simple affair. This video introduces you to the process of scraping all of the "a" tags from a URL and saving them to a .json file.

    Transcript

    00:01 I've already npm-installed x-ray. I'm just requiring x-ray here. Make sure to include the dash when you install it, or else you'll get a different package entirely. I'll go ahead and create a new x-ray instance.

    00:13 To scrape something, I need a URL I want to scrape. I'll say http://google.com, if you've heard of that site before, and then a selector that I want to grab from that site. I'll go ahead and grab the title. X-ray allows me to write this out to a file. I'll say, write to results.json, which is this file over here. When I run this, you can see that the title element of Google.com is Google.
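
The first two steps of the transcript can be sketched roughly as follows. This is a minimal sketch rather than the lesson's own code (which is not shown on this page): the function name `scrapeTitle` is made up for illustration, and the package is assumed to have been installed with `npm install x-ray`.

```javascript
// Illustrative helper (hypothetical name): scrape one selector from a page
// and write the result to a JSON file.
function scrapeTitle(url, outFile) {
  // Third-party package; note the dash in the name: npm install x-ray
  const Xray = require('x-ray');
  const x = Xray();

  // First argument: the URL to scrape. Second argument: the selector to grab.
  // .write(path) sends the scraped result to a file instead of a callback.
  return x(url, 'title').write(outFile);
}
```

Calling `scrapeTitle('http://google.com', 'results.json')` needs network access. What ends up in the file depends on the page as it is served at the time.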

    00:45 To make this a little more interesting, let's go ahead and grab all the A tags instead of the title. I'll run this. You see it gives me "Images". There's an "Images" A tag in there somewhere, but that's not all the information I want. I want all of the A tags, and I want them formatted nicely.

    01:03 We can do that with a third parameter here, which will be an array containing an object that describes what we want to pull out. You can name the keys whatever you want. I'm going to name this key a and give it an empty string as its selector. That's going to give me all of the content of each of the A tags. I'll run that. You can see that now I have an array with objects inside of it, with keys of a mapping to the content of each of those A tags.
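
As a rough illustration of the third parameter described here, the sketch below passes an array containing one object, so that every matching A tag produces its own entry. The names `linkTextSelector` and `scrapeLinkText` are hypothetical and chosen only for this example.

```javascript
// An array wrapping a single object asks for every match, not just the first.
// The key name `a` is arbitrary; per the transcript, an empty-string selector
// yields the content of the tag itself.
const linkTextSelector = [{ a: '' }];

// Illustrative helper (hypothetical name).
function scrapeLinkText(url, outFile) {
  const Xray = require('x-ray'); // third-party: npm install x-ray
  const x = Xray();

  // The second argument 'a' scopes the object above to each <a> tag on the page.
  return x(url, 'a', linkTextSelector).write(outFile);
}
```

With this shape, the output file should hold an array of objects, one per A tag, each with a single `a` key.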

    01:31 Now, you probably also want the link inside of it. I'll grab the href, and the selector for the href is going to be @href, with the @ standing for attribute, meaning just grab the href attribute off of the A tag that we found. I'll run this. You can see that Images would go here, Maps would go there, Play would go there.

    01:51 Just to give us a little bit more information, we'll grab something like the CSS. I'll say, give me the class attribute. Run it again. You can see the various class names used on each A tag.
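
Putting the last two steps together, the object can carry attribute selectors alongside the text. In X-ray's selector syntax a leading `@` reads an attribute from the matched element. The key names `href` and `css` follow the transcript's wording, while `linkDetailsSelector` and `scrapeLinkDetails` are made-up names for this sketch.

```javascript
// One object per <a> tag: its text, its link target, and its class names.
const linkDetailsSelector = [{
  a: '',          // content of the tag itself
  href: '@href',  // '@' means "read this attribute" from the matched tag
  css: '@class'   // class attribute, stored under an arbitrary key name
}];

// Illustrative helper (hypothetical name).
function scrapeLinkDetails(url, outFile) {
  const Xray = require('x-ray'); // third-party: npm install x-ray
  const x = Xray();
  return x(url, 'a', linkDetailsSelector).write(outFile);
}
```

A call such as `scrapeLinkDetails('http://google.com', 'results.json')` would then write one object per link, assuming the page still exposes plain A tags when fetched without a browser.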
