Puppeteer

A web scraping and UI testing tool


Who am I?

Miki Lombardi

Full Stack Developer at Plansoft s.r.l

Endlessly curious, computer science lover, cinema addict, sports maniac | Married w/ Pizza

@thejoin95

What is it?

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

What can I do?

Scrape a website in headless mode
Page automation
UI Testing environment
Timeline trace (performance)
CSS & JS Usage stats
Most things that you can do manually in the browser and more!

Wow! But... what does "headless" mean?

Why Puppeteer?

Google vs Other

Advantages

Powered by Google
JavaScript & NPM (easy to set up)
Easy to use/learn APIs
Fast and highly reliable
Easy to integrate with test frameworks
Visual Regression Test compatible
Offline Mode
Network request interceptor
Debuggable
Multi-platform
Puppeteer Recorder
Puppeteer Extra (plugins)

Disadvantages

Just one language
Just one platform (puppeteer-firefox in beta)
No native video recording in non-headless mode

Why Puppeteer?

Google vs Other

Selenium & PhantomJS

Advantages

Selenium is multi-language and multi-platform
Great user community

Disadvantages

Maintenance issues
Low learning curve
Setup issues

Cypress & Scrapy

Advantages

Cypress is a great test service
Great user community
(Scrapy) Python language

Disadvantages

Single language
Only for testing or scraping purposes
Few system integrations

Getting started

Install & Setup Puppeteer

npm install puppeteer

Take a screenshot of a webpage

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.plansoft.it/');
  await page.screenshot({path: 'hp.png'});

  await browser.close();
})();

Demo console: https://try-puppeteer.appspot.com/

Ok, cool. Let's get serious

Testing & Scraping

We have to test the product page of Ebay.it (and maybe track its price changes).

How do you do it? First of all... get that price!

const puppeteer = require('puppeteer');

(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.goto('https://www.ebay.it/.....');
	// Read the text content of the element that holds the price
	const price = await page.$eval('.vi-price', div => div.textContent);
	console.log(price);
	await browser.close();
})();

/**
 * Prints out: EUR 79,99
 */
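To track price changes, the scraped string has to become a number first. The following is a minimal sketch of that step. The helpers `parsePrice` and `priceChange` are hypothetical names, not part of Puppeteer, and the parser assumes the "EUR 79,99" format shown in this example (a three-letter currency code, then an amount with no thousands separator).

```javascript
// Hypothetical helper (not part of Puppeteer): turn a scraped price
// string such as "EUR 79,99" into a currency code and a number.
// Assumes a 3-letter currency code and no thousands separator.
function parsePrice(text) {
  const match = text.trim().match(/^([A-Z]{3})\s+(\d+(?:[.,]\d+)?)$/);
  if (!match) {
    throw new Error(`Unrecognized price format: ${text}`);
  }
  return {
    currency: match[1],
    // Accept both "79,99" and "79.99" as the decimal notation
    amount: Number(match[2].replace(',', '.')),
  };
}

// Hypothetical helper: compare the current price with the previous one.
function priceChange(previousAmount, currentAmount) {
  if (currentAmount > previousAmount) return 'up';
  if (currentAmount < previousAmount) return 'down';
  return 'same';
}

const parsed = parsePrice('EUR 79,99');
console.log(parsed.currency, parsed.amount);
console.log(priceChange(89.99, parsed.amount));
```

In a real scraper the previous amount would come from wherever the earlier runs were stored, for example a file or a database.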

Testing & Scraping

Price loaded over an AJAX request - Example

const puppeteer = require('puppeteer');

(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.goto('https://www.ebay.it/.....');
	// Wait until an element matching the selector appears in the page
	await page.waitForSelector('.vi-price');
	const price = await page.$eval('.vi-price', div => div.textContent);
	console.log(price);
	await browser.close();
})();

/**
 * If the price is loaded via AJAX we can get these results:
 *
 * Without waitForSelector: $eval throws an error, because no element matches yet
 * (page.$ would return null)
 *
 * With waitForSelector it prints out: EUR 79,99
 */

Testing & Scraping

Test whether the "Buy Now" button exists

// Dependencies
// More tests

it('should show the buy now button', async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();
	await page.goto('https://www.ebay.it/.....');
	expect(await page.$('#binBtn_btn')).toBeTruthy();
	await browser.close();
});

// More tests

// Dependencies
// More tests
let browser;

// Launch one shared browser for the whole suite
beforeAll(async () => {
	browser = await puppeteer.launch();
});

it('should show the buy now button', async () => {
	// Each test gets its own isolated incognito context
	const context = await browser.createIncognitoBrowserContext();
	const page = await context.newPage();
	await page.goto('https://www.ebay.it/.....');
	expect(await page.$('#binBtn_btn')).toBeTruthy();
	await context.close();
});

// More tests
afterAll(async () => {
	await browser.close();
});

Testing: request & async

We can also test the payment feature by clicking the "Buy now" button and waiting for the response to the payment request.

If the request fails or the success popup does not show up, the test fails.
Note the emulate method, which emulates an iPhone 8.

it('should pay', async () => {
	const context = await browser.createIncognitoBrowserContext();
	const page = await context.newPage();
	await page.emulate(puppeteer.devices['iPhone 8']);
	await page.goto('https://www.ebay.it/.....');

	// Start listening before the click, so the response cannot be missed
	const responsePromise = page.waitForResponse(res => res.url().endsWith('paypal'));
	await page.click('#binBtn_btn');

	// Wait for the response to the payment request
	const response = await responsePromise;
	console.log(await response.json()); // {success: true, etc...}

	expect(await page.$('.payment-success-popup')).toBeTruthy();
	await context.close();
});

The waitFor* methods are very powerful and useful for e2e testing.

Methods:

waitForSelector

waitForRequest

waitForResponse

waitForFunction

waitForNavigation

waitForTarget

waitForXPath

Note: this is just an example. Ebay works differently.

Service Worker

Test the service worker with Puppeteer:

it('should cache the logo image', async () => {
	const context = await browser.createIncognitoBrowserContext();
	const page = await context.newPage();
	await page.goto('https://pptr.dev');

	// Wait for the service worker target to show up
	const swTarget = await context.waitForTarget(target => {
		return target.type() === 'service_worker';
	});
	const serviceWorker = await swTarget.worker();
	// Runs inside the worker: check that the logo is in the cache
	const isCached = await serviceWorker.evaluate(async () => {
		return !!(
			await caches.match(
				'https://user-images.githubusercontent.com/10379601/29446482-04f7036a-841f-11e7-9872-91d1fc2ea683.png'
			)
		);
	});

	expect(isCached).toBe(true);
	await context.close();
});

Testing: geolocation

We can also set the geolocation to test the application in a different language and currency:

it('should have the price in pound', async () => {
	...
	// Grant the geolocation permission to the site under test
	await context.overridePermissions('https://www.ebay.it', [
		'geolocation'
	]);

	const page = await context.newPage();
	await page.setGeolocation({
		latitude: 51.50,
		longitude: 0
	});

	await page.goto('https://www.ebay.it/.....');
	const price = await page.$eval('.vi-price', div => div.textContent);

	expect(price).toBe('£79.99');
});

Note: this is just an example. Ebay works differently.

Testing: request interceptor

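We can intercept requests and change their content, or abort them, as in the following example.
Imagine an Ebay of cats:

...

await page.setRequestInterception(true);

page.on('request', request => {
	if (request.resourceType() === 'image') {
		// Answer every image request with a local cat picture
		// (assumes fs = require('fs') and a cat.jpg file next to the script)
		request.respond({contentType: 'image/jpeg', body: fs.readFileSync('cat.jpg')});
	} else {
		request.continue();
	}
});

...

Note: this is just an example. Ebay does not sell cats.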
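Aborting requests is also a common way to speed up a scraper. The sketch below shows one possible handler that drops heavy resources and lets everything else through. The list of blocked types is an illustrative choice, and `fakeRequest` is a hypothetical stand-in that only exists so the handler can be tried without launching a browser. The handler itself relies only on `resourceType()`, `abort()` and `continue()` of the intercepted request.

```javascript
// Resource types to drop; an illustrative choice, adjust as needed.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);

// Handler meant for page.on('request', ...): abort heavy resources,
// let every other request continue untouched.
function blockHeavyResources(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    return request.abort();
  }
  return request.continue();
}

// Usage with Puppeteer (interception must be enabled first):
//   await page.setRequestInterception(true);
//   page.on('request', blockHeavyResources);

// Hypothetical stand-in for an intercepted request. It records which
// method the handler called, so no browser is needed to try it out.
function fakeRequest(type) {
  const calls = [];
  return {
    calls,
    resourceType: () => type,
    abort: () => calls.push('abort'),
    continue: () => calls.push('continue'),
  };
}

const imageRequest = fakeRequest('image');
blockHeavyResources(imageRequest);
console.log(imageRequest.calls); // [ 'abort' ]
```

Keeping the decision in a named function, instead of an inline callback, makes it possible to unit test the interception rules separately from the browser.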

Performance Testing

Using the DevTools backend, it is possible to retrieve the metrics, the CSS & JS coverage and all the other useful data available in the Chrome developer console.

The following example will print out the page metrics:

....
const metrics = await page.metrics();
console.log(metrics);

{
	...
	Frames: 4,
	JSEventListeners: 353,
	Nodes: 160,
	LayoutCount: 7,
	ScriptDuration: 0.20300213,
	JSHeapUsedSize: 78309013,
	JSHeapTotalSize: 11689984
	...
}
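In a test suite, the metrics object can be compared with a performance budget instead of just being printed. This is a minimal sketch of that idea: the function `findBudgetViolations` and the budget values are hypothetical, while the metric names and sample values are taken from the output in this section.

```javascript
// Hypothetical budget: maximum allowed value for each metric name.
const budget = { Nodes: 1500, JSEventListeners: 500, ScriptDuration: 0.5 };

// Return one entry for every metric that exceeds its budget.
// Metrics that are missing from the object are ignored.
function findBudgetViolations(metrics, limits) {
  return Object.entries(limits)
    .filter(([name, max]) => name in metrics && metrics[name] > max)
    .map(([name, max]) => ({ name, value: metrics[name], max }));
}

// Sample values copied from the page.metrics() output in this section.
const sampleMetrics = {
  Frames: 4,
  JSEventListeners: 353,
  Nodes: 160,
  LayoutCount: 7,
  ScriptDuration: 0.20300213,
};

console.log(findBudgetViolations(sampleMetrics, budget)); // []
```

Inside a test, an assertion such as `expect(findBudgetViolations(metrics, budget)).toEqual([])` would make the test fail as soon as a page grows past the budget.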

Performance Testing

The following example will record the JS and CSS coverage and the navigation trace:

await page.tracing.start({path: '/metrics/trace.json'});
await page.coverage.startJSCoverage();
await page.coverage.startCSSCoverage();

await page.goto('https://www.ebay.it/...');

// Some actions here

await page.tracing.stop();
// The stop methods resolve with the collected coverage entries
const jsCoverage = await page.coverage.stopJSCoverage();
const cssCoverage = await page.coverage.stopCSSCoverage();

The page trace follows the whole navigation of the browser, from the point where the tracing was started. This is very useful when you need to record the performance of a single page or a single feature instead of the whole site.

The trace and the JS & CSS coverage are the same as in the Chrome developer console, but in JSON.

Performance Testing

Here is an example of the CSS coverage:

[
  {
    "url": "some file .css",
    "ranges": [
      {
        "start": 4,
        "end": 58
      },
      {
        "start": 102,
        "end": 130
      }
    ],
    "text": "    div {\n      background: red;\n      color: white;\n    }\n\n    .aqua {\n      color: aqua;\n    }\n\n"
  }
]
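Coverage entries become more useful once they are reduced to a single number. The sketch below sums the used ranges and compares them with the total length of the text. It assumes that every entry has the `url`, `text` and `ranges` fields shown in this section and that `end` is an exclusive offset. The function name `summarizeCoverage` and the sample entry are made up for illustration.

```javascript
// Reduce coverage entries to totals and a usage percentage.
// Assumes entries shaped like { url, text, ranges: [{ start, end }] }
// where `end` is an exclusive offset into `text`.
function summarizeCoverage(entries) {
  let totalChars = 0;
  let usedChars = 0;
  for (const entry of entries) {
    totalChars += entry.text.length;
    for (const range of entry.ranges) {
      usedChars += range.end - range.start;
    }
  }
  const usedPercent = totalChars === 0 ? 0 : (usedChars * 100) / totalChars;
  return { totalChars, usedChars, usedPercent };
}

// Made-up entry: only the first rule (12 of 25 characters) is used.
const sampleCoverage = [
  {
    url: 'example.css',
    text: 'a{color:red}b{color:blue}',
    ranges: [{ start: 0, end: 12 }],
  },
];

console.log(summarizeCoverage(sampleCoverage));
```

The same function works for both the JS and the CSS coverage, since the two share the same entry shape.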

Mouse & Keyboard event

It's possible to emulate keyboard and mouse events by using the following methods:

await page.click('#element');

await page.type('#element', 'some text');

await page.select('#element', 'value');

await page.keyboard.press('Enter');

await page.keyboard.press('Shift');

Skyscanner Scraper

I made a simple Skyscanner scraper that uses Puppeteer, so you can see a "real usage".
The code is available at: https://github.com/TheJoin95/skyscanner-flights-scraping

Here is an implementation of some of the features shown in this presentation:

....
await this.page.click('#departure-fsc-datepicker-button');
await this.page.waitForSelector('.departure-datepicker'); // a popup element loaded by an AJAX request

await this.page.select('select[name="months"]', '2019-10');

let day = '20';

// Click the day inside the browser context; `day` is passed in as an argument
await this.page.evaluate(day => {
    document.querySelectorAll(
        '[class*="BpkCalendarDate_bpk-calendar-date"]:not([class*="BpkCalendarDate_bpk-calendar-date--outside"])'
    )[(parseInt(day, 10) - 1)]
        .click();
}, day);
...

Here is another great project: https://github.com/social-manager-tools

Example: Login & Auth

Questions?

Thank you
