Puppeteer

A web scraping and UI testing tool

Who am I?

Miki Lombardi

Full Stack Developer at Plansoft s.r.l

Endlessly curious, computer science lover, cinema addict, sports maniac | Married w/ Pizza

@thejoin95

What is it?

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

What can I do?

  • Scrape a website in headless mode
  • Page automation
  • UI Testing environment
  • Timeline trace (performance)
  • CSS & JS Usage stats
  • Most things that you can do manually in the browser and more!

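Wow! But... what does "headless" mean?

A headless browser runs without a visible window: it loads and renders pages in the background and is driven entirely from code.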

Why Puppeteer?

Google vs Other

Advantages

  • Powered by Google
  • JavaScript & npm (easy to set up)
  • Easy to use/learn APIs
  • Fast and highly reliable
  • Easy to integrate with test frameworks
  • Visual Regression Test compatible
  • Offline Mode
  • Network request interceptor
  • Debuggable
  • Multi platform
  • Puppeteer Recorder
  • Puppeteer Extra (plugins)

Disadvantages

  • Just one language
  • Just one browser (puppeteer-firefox in beta)
  • No native video recording in non-headless mode

Why Puppeteer?

Google vs Other

Selenium & PhantomJS

Disadvantages

  • Maintenance issues
  • Steep learning curve
  • Setup issues

Advantages

  • Selenium is multi-language and multi-platform
  • Great user community

Cypress & Scrapy

Disadvantages

  • Single language
  • Only for testing or scraping
  • Few system integrations

Advantages

  • Cypress is a great testing service
  • Great user community
  • (Scrapy) Python language

Getting started

Install & Setup Puppeteer

npm install puppeteer

Take a screenshot of a webpage

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.plansoft.it/');
  await page.screenshot({path: 'hp.png'});

  await browser.close();
})();

Ok, cool. Let's get serious

Testing & Scraping

We have to test the product page of eBay.it (and maybe track its price changes).

How do you do it? First of all... get that price!

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  // Read the text of the first element matching the price selector
  const price = await page.$eval('.vi-price', div => div.textContent);
  console.log(price);
  await browser.close();
})();

/**
* Prints: EUR 79,99
*/
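To trace price changes, the scraped text has to become a number first. Here is a minimal sketch, assuming the Italian format seen in the printed output (dot for thousands, comma for decimals). `parsePrice` is a hypothetical helper, not part of Puppeteer:

```javascript
// Hypothetical helper: turn a scraped price string into a number.
// Assumes Italian formatting: "." as thousands separator, "," as decimal separator.
function parsePrice(text) {
  const match = text.match(/\d[\d.]*(?:,\d+)?/);
  if (!match) return null; // no number found in the text
  return Number(match[0].replace(/\./g, '').replace(',', '.'));
}

console.log(parsePrice('EUR 79,99')); // 79.99
```

Storing the parsed number after each run makes it easy to compare it with the previous value and spot a change.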

Testing & Scraping

Example: price loaded by an AJAX request

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  // Wait until the selector matches an element in the page
  await page.waitForSelector('.vi-price');
  const price = await page.$eval('.vi-price', div => div.textContent);
  console.log(price);
  await browser.close();
})();

/**
* If the price is loaded via AJAX we can get these results:
*
* Without waitForSelector: null or a thrown error
*
* With waitForSelector it prints: EUR 79,99
*/
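When an element may never appear, a scraper often prefers a missing value over a crash. The sketch below wraps the two calls used in this example; `readTextWhenReady` and its default timeout are assumptions made for illustration:

```javascript
// Hypothetical helper: wait for an element, then read its text.
// Returns null instead of throwing when waitForSelector rejects.
async function readTextWhenReady(page, selector, timeoutMs = 5000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    return null; // typically a timeout: the element did not appear in time
  }
  const text = await page.$eval(selector, el => el.textContent);
  return text.trim();
}

// Usage (page comes from Puppeteer):
//   const price = await readTextWhenReady(page, '.vi-price');
```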

Testing & Scraping

Test that the "Buy Now" button exists:

// Dependencies
const puppeteer = require('puppeteer');
// More tests

it('should show the buy now button', async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  expect(await page.$('#binBtn_btn')).toBeTruthy();
  await browser.close();
});

// More tests

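// Dependencies
const puppeteer = require('puppeteer');

// Launch the browser once and share it across the tests
// (assumes a Jest-style runner that provides beforeAll/afterAll)
let browser;
beforeAll(async () => {
  browser = await puppeteer.launch();
});

// More tests

it('should show the buy now button', async () => {
  // Each test gets its own isolated incognito context
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto('https://www.ebay.it/.....');
  expect(await page.$('#binBtn_btn')).toBeTruthy();
  await context.close();
});

// More tests

afterAll(async () => {
  await browser.close();
});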

Testing: request & async

it('should pay', async () => {
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.emulate(puppeteer.devices['iPhone 8']);
  await page.goto('https://www.ebay.it/.....');

  // Start waiting before the click so the payment response cannot be missed
  const responsePromise = page.waitForResponse(res => res.url().endsWith('paypal'));
  await page.click('#binBtn_btn');

  // Wait to catch the response of the payment request
  const response = await responsePromise;
  console.log(await response.json()); // {success: true, etc...}

  expect(await page.$('.payment-success-popup')).toBeTruthy();
  await context.close();
});

We can also test the payment feature by clicking the "Buy now" button and waiting for the response to the payment request.

If the request fails or the success popup does not show, the test fails.
Note the emulate method, which emulates an iPhone 8.

The waitFor* methods are very powerful and useful for e2e testing.

Methods:

waitForSelector
waitForRequest
waitForResponse
waitForFunction
waitForNavigation
waitForTarget
waitForXPath

note: this is just an example. eBay works differently
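The predicate given to waitForResponse can live in a small named function, which keeps the test readable and lets the matching rule be checked without a browser. A sketch, where `successfulResponseTo` is a hypothetical helper name:

```javascript
// Hypothetical helper: build a predicate for page.waitForResponse().
// It matches a successful (2xx) response whose URL ends with `suffix`.
function successfulResponseTo(suffix) {
  return response => response.url().endsWith(suffix) && response.ok();
}

// Usage inside a test (page comes from Puppeteer):
//   const response = await page.waitForResponse(successfulResponseTo('paypal'));
```

Requiring `response.ok()` means a failed payment request never satisfies the wait, so the test ends with a timeout instead of a false positive.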

Service Worker

Test the service worker with Puppeteer:

it('should cache the logo image', async () => {
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto('https://pptr.dev');

  // Wait until the page has registered its service worker
  const swTarget = await context.waitForTarget(target => {
    return target.type() === 'service_worker';
  });
  const serviceWorker = await swTarget.worker();
  // Run inside the service worker: is the logo in the cache?
  const isCached = await serviceWorker.evaluate(async () => {
    return !!(
      await caches.match(
        'https://user-images.githubusercontent.com/10379601/29446482-04f7036a-841f-11e7-9872-91d1fc2ea683.png'
      )
    );
  });

  expect(isCached).toBe(true);
  await context.close();
});

Testing: geolocation

We can also set the geolocation to test the application with a different language and currency:

it('should have the price in pounds', async () => {
  // ...
  // Grant the geolocation permission to the site under test
  await context.overridePermissions('https://www.ebay.it', [
    'geolocation'
  ]);

  const page = await context.newPage();
  // Pretend to be in London
  await page.setGeolocation({
    latitude: 51.50,
    longitude: 0
  });

  await page.goto('https://www.ebay.it/.....');
  const price = await page.$eval('.vi-price', div => div.textContent);

  expect(price).toBe('£79.99');
});

note: this is just an example. eBay works differently

Testing: request interceptor

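We can intercept requests and change their content, or abort them, as in the following example.
Imagine an eBay of cats:

...

// Load the replacement image once (cat.jpg is read from the current working directory)
const fs = require('fs');
const catImage = fs.readFileSync('cat.jpg');

await page.setRequestInterception(true);

page.on('request', request => {
  if (request.resourceType() === 'image') {
    // Answer every image request with the cat picture
    request.respond({contentType: 'image/jpeg', body: catImage});
  } else {
    request.continue();
  }
});

...

note: this is just an example. eBay does not sell cats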
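Aborting is the other option mentioned here. A minimal sketch of a handler that drops image requests, for instance to speed up a scraper; `blockImages` is a hypothetical name:

```javascript
// Request handler that aborts image requests and lets everything else through.
function blockImages(request) {
  if (request.resourceType() === 'image') {
    request.abort();
  } else {
    request.continue();
  }
}

// Usage (page comes from Puppeteer):
//   await page.setRequestInterception(true);
//   page.on('request', blockImages);
```

Once interception is enabled, every request must be continued, answered or aborted, otherwise it stays pending.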

Performance Testing

Using the DevTools backend, it is possible to retrieve the metrics, the CSS & JS coverage and all the other useful data available in Chrome DevTools.

The following example prints out the page metrics:

....
const metrics = await page.metrics();
console.log(metrics);

{
	...
	Frames: 4,
	JSEventListeners: 353,
	Nodes: 160,
	LayoutCount: 7,
	ScriptDuration: 0.20300213,
	JSHeapUsedSize: 78309013,
	JSHeapTotalSize: 11689984
	...
}
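A single snapshot says little on its own; comparing two snapshots shows what one action costs. A sketch, assuming both arguments are objects returned by page.metrics(); `diffMetrics` is a hypothetical helper:

```javascript
// Hypothetical helper: subtract two page.metrics() snapshots field by field.
// Only fields that are numbers in both snapshots are compared.
function diffMetrics(before, after) {
  const diff = {};
  for (const key of Object.keys(after)) {
    if (typeof after[key] === 'number' && typeof before[key] === 'number') {
      diff[key] = after[key] - before[key];
    }
  }
  return diff;
}

// Usage (page comes from Puppeteer):
//   const before = await page.metrics();
//   await page.click('#element');
//   const after = await page.metrics();
//   console.log(diffMetrics(before, after));
```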

Performance Testing

The page trace follows the whole navigation of the browser from the point where tracing was started. This is very useful when you need to record the performance of a single page or a single feature instead of the whole application.

The following example records the JS and CSS coverage and the navigation trace:

// Start recording the trace and both coverages
await page.tracing.start({path: '/metrics/trace.json'});
await page.coverage.startJSCoverage();
await page.coverage.startCSSCoverage();

await page.goto('https://www.ebay.it/...');

// Some actions here

// Stop recording: the trace is written to the path given above,
// the coverages are returned as arrays
await page.tracing.stop();
const jsCoverage = await page.coverage.stopJSCoverage();
const cssCoverage = await page.coverage.stopCSSCoverage();

The trace and the JS & CSS coverage are the same as in Chrome DevTools, but in JSON.

Performance Testing

[
  {
    "url": "some file .css",
    "ranges": [
      {
        "start": 4,
        "end": 58
      },
      {
        "start": 102,
        "end": 130
      }
    ],
    "text": "    div {\n      background: red;\n      color: white;\n    }\n\n    .aqua {\n      color: aqua;\n    }\n\n"
  }
]

The JSON above is an example of the CSS coverage output.
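Each `ranges` item marks a span of `text` that was actually used, so the share of used code can be derived from the JSON. A sketch, assuming entries shaped like the output shown here; `coverageSummary` is a hypothetical helper:

```javascript
// Hypothetical helper: summarize entries returned by stopCSSCoverage() / stopJSCoverage().
// Each entry has `url`, `text` and `ranges` ([{ start, end }, ...]).
function coverageSummary(entries) {
  return entries.map(entry => {
    // Sum the lengths of all used spans
    const usedChars = entry.ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
    const totalChars = entry.text.length;
    const usedPercent = totalChars ? (usedChars / totalChars) * 100 : 0;
    return { url: entry.url, usedChars, totalChars, usedPercent };
  });
}

// Usage (page comes from Puppeteer):
//   const cssCoverage = await page.coverage.stopCSSCoverage();
//   console.table(coverageSummary(cssCoverage));
```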

Mouse & Keyboard event

It's possible to emulate keyboard and mouse events by using the following methods:

await page.click('#element');

await page.type('#element', 'some text');

await page.select('#element', 'value');

await page.keyboard.press('Enter');

await page.keyboard.press('Shift');
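Combined, these methods can drive a whole form. The sketch below fills and submits a login form; the selectors (`#remember`, `#username`, `#password`) and the `login` function are placeholders invented for illustration, not taken from a real site:

```javascript
// Hypothetical login flow built from click, type and keyboard.press.
async function login(page, username, password) {
  await page.click('#remember');        // tick a "remember me" checkbox
  await page.type('#username', username);
  await page.type('#password', password);
  await Promise.all([
    page.waitForNavigation(),           // start waiting before submitting
    page.keyboard.press('Enter'),       // submit from the focused password field
  ]);
}

// Usage (page comes from Puppeteer):
//   await login(page, 'user@example.com', 'secret');
```

Starting waitForNavigation together with the key press avoids missing a navigation that begins immediately after the form is submitted.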

Skyscanner Scraper

I made a simple Skyscanner scraper that uses Puppeteer, so you can see a real-world usage.
The code is available at:
https://github.com/TheJoin95/skyscanner-flights-scraping

Here is an implementation of some of the features shown in this presentation:

....
await this.page.click('#departure-fsc-datepicker-button');
// The date picker is a popup element loaded by an AJAX request
await this.page.waitForSelector('.departure-datepicker');

await this.page.select('select[name="months"]', '2019-10');

let day = '20';

// Click the requested day, skipping the cells that belong to other months
await this.page.evaluate(day => {
  document.querySelectorAll(
    '[class*="BpkCalendarDate_bpk-calendar-date"]:not([class*="BpkCalendarDate_bpk-calendar-date--outside"])'
  )[parseInt(day, 10) - 1].click();
}, day);
...

Here is another great project: https://github.com/social-manager-tools

Example: Login & Auth

Questions?

Thank you

Puppeteer | Web scraping and UI testing tool

By Miki Lombardi
