    Scraping Dynamic JavaScript Websites with Nightmare

    John Lindquist

    Many websites have more than just simple static content. Content that is rendered dynamically by JavaScript requires a browser to scrape. This video demonstrates how to use Nightmare (a wrapper around Electron) to load a URL and scrape dynamic data.

    Nightmare
    Node.js
    Transcript

    00:00 Many sites, like weather.com, require JavaScript to execute before they can render something like the temperature. For example, if I search for `temperature` as a selector, you can see it finds my temperature, but if I drill into it, you can see `ng-isolate-scope`, which means someone is using Angular to render out this 76 degrees.

    00:22 If I try to scrape the temperature without running that JavaScript, I only get an empty HTML tag right there. I don't get the actual degrees, because you need a browser to run the JavaScript.
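To make the "empty tag" problem concrete, here is a minimal sketch of a static scrape. The markup and the `temperature` class are hypothetical stand-ins (not weather.com's real HTML) for what a server might send before a framework fills in the value on the client; the regex stands in for any static HTML parser that never executes the page's JavaScript.

```javascript
// Hypothetical server response: the element exists, but the client-side
// framework has not yet rendered the temperature into it.
const staticHtml = `
  <div class="today">
    <span class="temperature"></span>
  </div>
`;

// Naive static scrape: read the text between the span's opening and
// closing tags. No JavaScript runs, so only the raw markup is visible.
function scrapeTemperature(html) {
  const match = html.match(/<span class="temperature">([^<]*)<\/span>/);
  return match ? match[1].trim() : null;
}

// Logs an empty string: the tag is there, the degrees are not.
console.log(JSON.stringify(scrapeTemperature(staticHtml)));
```

The same function would return the value if it were handed markup captured after the page's scripts had run, which is exactly what a headless browser provides.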

    00:33 So I'm going to leverage a project called Nightmare, which in the version used here is a wrapper around PhantomJS (Nightmare 2.x and later wrap Electron instead). PhantomJS is a headless browser, meaning it doesn't have any UI.

    00:42 It just launches in the background, renders a page, and executes its JavaScript. Nightmare makes it much easier to work with. I've already npm installed Nightmare and PhantomJS, so I can say `import Nightmare from "nightmare"`.

    01:01 Then I create a new Nightmare instance and use the Nightmare API to do what I want. To scrape the temperature from weather.com, I'll call `goto`, chain on an `evaluate`, and then tell the whole chain to execute with `run`.
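The chain described above records steps first and only executes them when `run` is called. The sketch below illustrates that queue-then-run pattern with a hypothetical `LazyBrowser` class; it is not Nightmare's source, loads no real page, and its `evaluate` simply passes the current URL to the page-side function.

```javascript
// Hypothetical illustration, not Nightmare's implementation: each chained
// call only records a step, and nothing runs until run() is called.
class LazyBrowser {
  constructor() {
    this.steps = [];
    this.currentUrl = null;
  }

  goto(url) {
    // Record the navigation; a real browser would load the page here.
    this.steps.push(() => {
      this.currentUrl = url;
    });
    return this; // returning `this` is what makes the calls chainable
  }

  evaluate(pageFn, nodeCallback) {
    // Record the evaluation; in this toy, pageFn receives the current URL.
    this.steps.push(() => {
      nodeCallback(pageFn(this.currentUrl));
    });
    return this;
  }

  run(done) {
    // Only now do the queued steps execute, in the order they were chained.
    for (const step of this.steps) {
      step();
    }
    if (done) done();
  }
}

const chain = new LazyBrowser()
  .goto("https://weather.com")
  .evaluate(
    (url) => `evaluated on ${url}`,
    (result) => console.log(result)
  );

// Nothing has been logged yet; the queued steps execute here.
chain.run();
```

Returning `this` from every builder method is what lets `goto`, `evaluate`, and `run` read as one fluent chain.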

    01:20 I want to go to weather.com. `evaluate` takes two functions. The first one runs in the scope of the browser, so I can actually access `document` there. The second one handles the result that I return from that browser scope.

    01:45 What I mean by that is: I return `document.querySelector`, where my query is just the temperature selector, and grab its `innerText`. That return value is passed in as an argument to the second function, which I'll call `temperature`, and then I log out `temperature`.
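The two-scope idea can be simulated without a browser. In this minimal sketch, a hypothetical `evaluate(browserFn, nodeCallback)` helper turns the first function into source text and runs it inside a separate context created with Node's built-in `vm` module, using a fake, already-rendered `document`. The `.temperature` selector and the fake page object are assumptions for illustration; this is not how Nightmare or PhantomJS are implemented internally.

```javascript
const vm = require("vm");

// Hypothetical stand-in for a page after its JavaScript has run:
// the temperature element now contains text.
function createRenderedPage() {
  const elements = {
    ".temperature": { innerText: "76°" },
  };
  return {
    document: {
      querySelector: (selector) => elements[selector] || null,
    },
  };
}

// Hypothetical helper: the browser-side function is serialized and run in
// its own context, so it can see `document` but not any Node variables.
function evaluate(browserFn, nodeCallback) {
  const context = vm.createContext(createRenderedPage());
  const result = vm.runInContext(`(${browserFn.toString()})()`, context);
  nodeCallback(result); // back in Node with only the returned value
}

evaluate(
  // "Browser" scope: runs inside the simulated page context.
  () => document.querySelector(".temperature").innerText,
  // Node scope: receives whatever the first function returned.
  (temperature) => console.log(temperature)
);
```

Because the first function is run from its source text, it cannot close over Node variables; only its return value crosses back to the second function.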

    02:12 I'll go ahead and run this. It takes a while, but it logs out 76 degrees. Nightmare isn't doing anything too fancy for us. It just gives us a convenient API for working with PhantomJS: we go to a URL, and we evaluate what's actually in the browser.

    02:31 Inside the first function, we have the browser scope. In the second function, we take what the first one returned, and we're back in the scope of Node. Then we run it.