Scraping Dynamic JavaScript Websites with Nightmare

Many websites have more than just simple static content. Dynamic content rendered by JavaScript requires a browser in order to scrape the data. This video demonstrates how to use Nightmare (a wrapper around Electron) to load a URL and scrape dynamic data.
egghead.io

Many websites have more than just simple static content. Dynamic content rendered by JavaScript requires a browser in order to scrape the data. This video demonstrates how to use Nightmare (a wrapper around PhantomJS) to load a URL and scrape dynamic data.

aaawtest

Nice tutorial. Unfortunately I tried to use Nightmare to crawl an AJAX site and it didn't work.

var Nightmare = require('nightmare');
new Nightmare()
.goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
.evaluate(function () {
var links = document.querySelectorAll('th a');
return links
}, function (links) {
console.log(links);
})
.run();

Willy

It looks like Nightmare doesn't support HTTPS...
Or is there a way to configure it for HTTPS?

Thanks!

In reply to aaawtest
John

When googling for answers, try looking for "PhantomJS https" (because Nightmare is just a wrapper around PhantomJS).

So add the following config when you run your script:
"--ssl-protocol=any"
And I tossed together this as a bonus (I'll be talking more about "cheerio" in future videos):

import Nightmare from "nightmare";
import cheerio from "cheerio";

new Nightmare()
    .goto('https://l3com.taleo.net/careersection/l3_ext_us/jobsearch.ftl')
    .evaluate(function(){
        return document.documentElement.innerHTML; //pass all of the html as text
    }, function(html){
        let $ = cheerio.load(html); //use cheerio for jQuery-style selectors in Node
        let titles = $('#jobs .absolute>span>a').map(function(){
            return $(this).text();
        }).get();
        console.log(titles); //log out the array of job titles
    })
    .run();
In reply to Willy
BoomTown

How does it compare to CasperJS? I've spent a lot of time with Casper, and I'm curious if someone out there is familiar enough with both APIs to have an opinion on the two.

Baskin Tapkan

Thanks for the video. However, I could not get the current script to work. It looks like Nightmare has changed its syntax. Borrowing from their posted example, I came up with this, and it worked for me after installing 'vo'.

var Nightmare = require('nightmare');
var vo = require('vo');

vo(function* () {
  var nightmare = Nightmare({ show: true });
  var link = yield nightmare
    .goto('http://weather.com')
    .evaluate(function () {
      return document.querySelector('.temperature').innerText;
    });
  yield nightmare.end();
  return link;
})(function (err, result) {
  if (err) return console.log(err);
  console.log(result);
});
Abdi

Thanks Baskin, your code worked for me. This video should be updated with clearer notes. I don't understand why vo is being used. Could you explain?

In reply to Baskin Tapkan
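
On the question of why vo is used: as far as I can tell, it acts as a generator runner. The newer Nightmare chain is asynchronous, and vo resumes the generator each time a yielded step finishes, so the script reads top to bottom without nested callbacks. Below is a minimal sketch of that idea, not vo's actual implementation. Both runGenerator and fakeScrape are hypothetical names made up for this illustration, and fakeScrape just stands in for a Nightmare chain so the snippet runs without any dependencies.

```javascript
// Hypothetical stand-in for vo: drives a generator and resumes it
// each time a yielded promise settles.
function runGenerator(genFn) {
  return new Promise(function (resolve, reject) {
    var gen = genFn();

    function step(method, value) {
      var result;
      try {
        // method is 'next' (pass a value in) or 'throw' (raise an error inside the generator)
        result = gen[method](value);
      } catch (err) {
        return reject(err);
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        function (v) { step('next', v); },
        function (e) { step('throw', e); }
      );
    }

    step('next', undefined);
  });
}

// Hypothetical async step standing in for nightmare.goto(...).evaluate(...)
function fakeScrape(text) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(text); }, 10);
  });
}

runGenerator(function* () {
  var temperature = yield fakeScrape('72 F'); // pauses here until the promise resolves
  return temperature;
}).then(function (result) {
  console.log(result); // prints "72 F"
});
```

In Baskin's script, vo plays the role of runGenerator, and the final callback receiving (err, result) corresponds to the promise's rejection and resolution here.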
coolsid

Hi,
I need to send a custom header in goto.
How can I achieve this?
