Scraping Dynamic JavaScript Websites with Nightmare

Instructor: John Lindquist
Published 9 years ago
Updated 6 years ago

Many websites have more than just simple static content. Dynamic content that is rendered by JavaScript requires a browser before you can scrape the data. This video demonstrates how to use Nightmare (which is a wrapper around Electron) to load a URL and scrape dynamic data.

[00:00] Many sites like weather.com require JavaScript to execute before they do something like render the temperature. For example, if I search for temperature as a selector, you can see it finds my temperature, but if I drill into it, you can see ng-isolate-scope, which means that someone is using Angular to render this 76 degrees.

[00:22] If I try to scrape the temperature, I would only get a blank HTML tag right there. I wouldn't get the actual degrees, because you need a browser to run and execute the JavaScript.
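To make the problem concrete, here is a minimal sketch of what a plain HTTP scrape would see compared with what a browser shows. Both markup strings are made-up stand-ins rather than weather.com's actual HTML, and the `temperature` class name and the `textOf` helper are assumptions for illustration only.

```javascript
// Hypothetical server response: the element exists, but Angular has not
// run yet, so there is no text inside it.
const rawHtml = '<span class="temperature ng-isolate-scope"></span>';

// Hypothetical markup after a browser has executed the page's JavaScript.
const renderedHtml = '<span class="temperature ng-isolate-scope">76°</span>';

// Hypothetical helper: return the text between the opening and closing tag
// of the first element whose class list contains the given class name,
// or null when no such element is found.
function textOf(html, className) {
  const pattern = new RegExp(
    '<(\\w+)[^>]*class="[^"]*\\b' + className + '\\b[^"]*"[^>]*>([^<]*)</\\1>'
  );
  const match = html.match(pattern);
  return match ? match[2] : null;
}

console.log(JSON.stringify(textOf(rawHtml, 'temperature')));      // prints ""
console.log(JSON.stringify(textOf(renderedHtml, 'temperature'))); // prints "76°"
```

The empty string in the first case is the "blank HTML tag": the element is there, but the value only appears once something has executed the page's scripts.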

[00:33] What I'm going to do is leverage a project called "Nightmare," which is a wrapper around PhantomJS, a headless browser, meaning it doesn't have any UI.

[00:42] It just launches in the background and can render a page and execute JavaScript. This Nightmare project makes it much easier to work with. I've already npm installed nightmare and phantomjs. I can say, "import Nightmare from 'nightmare'."

[01:01] Then, I just create a new Nightmare and leverage the Nightmare API to achieve what I want to achieve. To scrape the temperature from weather.com, I'll just say, "goto." Then, I'll chain on an evaluate, and then basically tell all of that to execute with a run.

[01:20] I want to go to weather.com. Evaluate is going to take two functions, the first one being in the scope of the browser so I can actually access the document here. The second function is going to handle the result that I return from that scope of the browser.

[01:45] What I mean by that is, if I return document.querySelector, where my query is just going to be temperature, I'll grab the innerText. That value is returned and then passed in as an argument here, which I'll just call temperature, and then I log out temperature.
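Put together, the chain described so far might look like the sketch below. It assumes the callback-style API of the PhantomJS-era Nightmare 1.x releases used in the recording (`goto`, `evaluate` with a page function and a result callback, then `run`). The `.temperature` selector, the exact URL, and the `scrapeTemperature` wrapper are assumptions, not the video's exact code. The constructor is passed in as a parameter so the sketch does not depend on a particular install.

```javascript
// Hypothetical wrapper around the chain from the lesson.
// `Nightmare` is the constructor exported by the nightmare package
// (1.x callback-style API assumed).
function scrapeTemperature(Nightmare, onTemperature) {
  new Nightmare()
    .goto('https://weather.com')
    .evaluate(
      // First function: runs inside the page, so `document` is available.
      function () {
        // '.temperature' is an assumed selector; adjust it to the real markup.
        return document.querySelector('.temperature').innerText;
      },
      // Second function: runs back in Node and receives the returned value.
      function (temperature) {
        onTemperature(temperature);
      }
    )
    // run() executes the steps queued above.
    .run(function (err) {
      if (err) console.error(err);
    });
}

// Usage, assuming nightmare 1.x is installed:
// scrapeTemperature(require('nightmare'), (t) => console.log(t));
```

On the later Electron-based releases the shape should be a little different: `evaluate` takes the page function plus any arguments for it, and the result comes back through the promise the chain returns instead of a second callback.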

[02:12] I'll go ahead and run this. This will take a while, but it logs out 76 degrees. Nightmare is not doing anything too fancy for us. It's just giving us a convenient API, or way to work with PhantomJS, where we go to a URL and evaluate what's actually in the browser.

[02:31] Inside this function, we have the browser scope. Then in the second function, we can take what we return from this first one, and we're back into the scope of Node. Then, we run it.
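One practical consequence of the two scopes: the first function is shipped to the page as source text, so it cannot see variables from the surrounding Node code. The sketch below only simulates that boundary with `toString` and `new Function`. It is an illustration of the idea, not Nightmare's actual mechanism, and `runInOtherScope` and `demo` are hypothetical names.

```javascript
// Hypothetical helper: rebuild a function from its source text, which is
// roughly how a function loses its closure when it is sent to another
// JavaScript context.
function runInOtherScope(fn) {
  const rebuilt = new Function('return (' + fn.toString() + ')')();
  return rebuilt();
}

function demo() {
  const selector = '.temperature'; // lives only in this (Node-side) scope

  // Called directly, the closure is intact and `selector` is visible.
  const direct = (function () { return typeof selector; })();

  // After the round trip through source text, `selector` is gone.
  const shipped = runInOtherScope(function () { return typeof selector; });

  return { direct, shipped };
}

console.log(demo()); // prints { direct: 'string', shipped: 'undefined' }
```

So anything the page function needs from Node has to be hardcoded inside it or handed over as an argument to `evaluate`, and anything Node needs from the page has to come back as the return value.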