Scraping Dynamic JavaScript Websites with Nightmare

Instructor: John Lindquist

Many websites have more than simple static content. Dynamic content rendered by JavaScript requires a browser before it can be scraped. This video demonstrates how to use Nightmare (a wrapper around Electron) to load a URL and scrape dynamic data.

aaawtest
~ 6 years ago

Nice tutorial. Unfortunately I tried to use Nightmare to crawl an AJAX site and it didn't work.

var Nightmare = require('nightmare');

new Nightmare()
    .goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
    .evaluate(function () {
        var links = document.querySelectorAll('th a');
        return links;
    }, function (links) {
        console.log(links);
    })
    .run();
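
One thing that may be contributing here (an assumption, not a confirmed diagnosis): the function passed to .evaluate() runs inside the page, and its return value has to be serialized on the way back to Node, so DOM nodes such as the NodeList above generally do not survive the trip. A minimal sketch of returning plain data instead follows. The collectLinks name and the text/href fields are made up for illustration; the 'th a' selector is the one from the snippet above.

```javascript
// Intended to run inside the page via .evaluate(); it relies on the
// page's global `document`, not on anything from the Node side.
function collectLinks() {
  var anchors = document.querySelectorAll('th a');
  // Copy only plain strings so the result can be serialized back to Node.
  return Array.prototype.map.call(anchors, function (a) {
    return { text: a.textContent.trim(), href: a.href };
  });
}
```

In the snippet above, collectLinks could be passed as the first argument to .evaluate() in place of the inline function, and the callback would then receive an array of plain objects.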

Willy
~ 6 years ago

It looks like Nightmare doesn't support HTTPS... or how do I configure it for HTTPS?

Thanks!

John Lindquist (instructor)
~ 6 years ago

When googling for answers, try looking for "PhantomJS https" (because Nightmare 1.x is just a wrapper around PhantomJS; later versions run on Electron).

So add the following config when you run your script: "--ssl-protocol=any". I also tossed this together as a bonus (I'll be talking more about "cheerio" in future videos):


import Nightmare from "nightmare";
import cheerio from "cheerio";

new Nightmare()
    .goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
    .evaluate(function(){
        return document.documentElement.innerHTML; //pass all of the html as text
    }, function(html){
        let $ = cheerio.load(html); //use cheerio for jQuery-style selectors in Node
        let titles = $('#jobs .absolute>span>a').map(function(){
            return $(this).text();
        }).get();
        console.log(titles); //log out the array of job titles
    })
    .run();

BoomTown
~ 6 years ago

How does it compare to CasperJS? I've spent a lot of time with Casper, and I'm curious if someone out there is familiar enough with both APIs to have an opinion on the two.

Baskin Tapkan
~ 6 years ago

Thanks for the video. However, I could not get the current script to work. It looks like Nightmare has changed its syntax. Borrowing from their posted example, I came up with this, and it worked for me after installing 'vo'.

var Nightmare = require('nightmare');
var vo = require('vo');

vo(function* () {
  var nightmare = Nightmare({ show: true });
  var link = yield nightmare
    .goto('http://weather.com')
    .evaluate(function () {
      return document.querySelector('.temperature').innerText;
    });
  yield nightmare.end();
  return link;
})(function (err, result) {
  if (err) return console.log(err);
  console.log(result);
});

BrightPixels
~ 5 years ago

Thanks Baskin, your code worked for me. This video should be updated with clearer notes. I don't understand why vo is being used. Could you explain?
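
On the vo question, as far as I understand it: vo is a flow-control helper that runs a generator function and pauses at each `yield` until the yielded asynchronous value has finished, which lets the Nightmare chain be written top to bottom. The sketch below shows that general idea only. It is not vo's actual source, and runGenerator and fakeAsyncRead are hypothetical names standing in for vo and for a Nightmare call.

```javascript
// Minimal sketch of a generator runner (NOT vo's real implementation).
// It steps through a generator, waiting for each yielded promise or
// thenable before resuming the generator with the resolved value.
function runGenerator(generatorFn) {
  return new Promise(function (resolve, reject) {
    var iterator = generatorFn();

    function step(method, input) {
      var result;
      try {
        result = iterator[method](input); // resume until the next yield
      } catch (err) {
        return reject(err); // error not handled inside the generator
      }
      if (result.done) return resolve(result.value);
      // Treat the yielded value as a promise, then resume with its result.
      Promise.resolve(result.value).then(
        function (value) { step('next', value); },
        function (err) { step('throw', err); }
      );
    }

    step('next');
  });
}

// Hypothetical stand-in for an asynchronous Nightmare call.
function fakeAsyncRead(value) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(value); }, 10);
  });
}

var demo = runGenerator(function* () {
  var temperature = yield fakeAsyncRead('72');
  var unit = yield fakeAsyncRead('F');
  return temperature + unit;
});

demo.then(function (result) {
  console.log(result); // prints "72F"
});
```

The Electron-based Nightmare releases also expose a .then() on the chain, so ending the chain with .end().then(...) should work without vo if generators are not wanted.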

coolsid
~ 5 years ago

Hi, I need to send a custom header in goto. How can I achieve this?
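
For custom headers, the Electron-based Nightmare releases document an optional second argument to .goto(), a hash of request headers. A rough sketch under that assumption follows. The loadWithHeaders name, the example URL, and the header name are made up for illustration, and the Nightmare instance is passed in so the function can be tried against a stub.

```javascript
// Sketch: load a page with extra request headers, then read its title.
// `nightmare` is expected to be a Nightmare instance (or a stub with the
// same goto/evaluate/end methods).
function loadWithHeaders(nightmare, url, headers) {
  return nightmare
    .goto(url, headers) // second argument: hash of request headers
    .evaluate(function () {
      return document.title; // runs inside the page
    })
    .end();
}

// Possible usage with the real library (not executed here):
//   var Nightmare = require('nightmare');
//   loadWithHeaders(Nightmare({ show: false }), 'http://example.com', {
//     'X-Custom-Header': 'some-value'
//   }).then(console.log);
```

If a header needs to apply to every request rather than one navigation, the README also lists a .header(name, value) method that may be worth checking.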