Puppeteer
A web scraping and UI testing tool
Who am I?
Miki Lombardi
Full Stack Developer at Plansoft s.r.l
Endlessly curious, computer science lover, cinema addict, sports maniac | Married w/ Pizza
@thejoin95
What is it?
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
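As a minimal sketch of that configuration, the helper below builds the options object that would be passed to puppeteer.launch(). The helper name buildLaunchOptions and its debug flag are hypothetical; headless and slowMo are Puppeteer launch options.

```javascript
// Hypothetical helper: returns the options object for puppeteer.launch().
// headless: false opens a visible browser window; slowMo delays each
// Puppeteer operation by the given number of milliseconds.
function buildLaunchOptions({ debug = false } = {}) {
  if (debug) {
    return { headless: false, slowMo: 100 };
  }
  return { headless: true };
}

// Usage (assumes puppeteer is installed):
//   const browser = await puppeteer.launch(buildLaunchOptions({ debug: true }));
```

Keeping the options in one place makes it easy to switch to a visible browser while debugging a script.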
What can I do?
- Scrape a website in headless mode
- Page automation
- UI Testing environment
- Timeline trace (performance)
- CSS & JS Usage stats
- Most things that you can do manually in the browser and more!
Wow! But... what does "headless" mean?
Why Puppeteer?
Google vs Other
Advantages
- Powered by Google
- JavaScript & npm (easy to set up)
- APIs that are easy to use and learn
- Fast and highly reliable
- Easy to integrate with test frameworks
- Compatible with visual regression testing
- Offline mode
- Network request interceptor
- Debuggable
- Multi-platform
- Puppeteer Recorder
- Puppeteer Extra (plugins)
Disadvantages
- Just one language
- Just one browser platform (puppeteer-firefox is in beta)
- No native video recording in non-headless mode
Why Puppeteer?
Google vs Other
Selenium & PhantomJS
Advantages
- Selenium is multi-language and multi-platform
- Great user community
Disadvantages
- Maintenance issues
- Steep learning curve
- Setup issues
Cypress & Scrapy
Advantages
- Cypress is a great test service
- Great user community
- (Scrapy) Python language
Disadvantages
- Single language
- Only for testing or only for scraping
- Few system integrations
Getting started
Install & set up Puppeteer
npm install puppeteer
Take a screenshot of a webpage
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.plansoft.it/');
  await page.screenshot({path: 'hp.png'});
  await browser.close();
})();
Demo console: https://try-puppeteer.appspot.com/
Ok, cool. Let's get serious
Testing & Scraping
We have to test the eBay.it product page (and maybe track its price changes).
How do you do it? First of all... get that price!
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  // Read the text content of the price element
  const price = await page.$eval('.vi-price', div => div.textContent);
  console.log(price);
  await browser.close();
})();
/**
 * Prints: EUR 79,99
 */
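To track a price change, the scraped string first has to become a number. The following is a minimal sketch under stated assumptions: parsePrice and priceChanged are hypothetical helper names, and the format is assumed to be a three-letter currency code followed by an Italian-style amount, as in "EUR 79,99".

```javascript
// Hypothetical helper: turns a scraped price such as "EUR 79,99"
// into { currency, amount } so that two scrapes can be compared.
// Assumes the Italian format: "." groups thousands, "," marks decimals.
function parsePrice(text) {
  const match = text.trim().match(/^([A-Z]{3})\s*([\d.]+(?:,\d+)?)$/);
  if (!match) {
    throw new Error(`Unrecognized price format: ${text}`);
  }
  const amount = Number(match[2].replace(/\./g, '').replace(',', '.'));
  return { currency: match[1], amount };
}

// Hypothetical helper: true when the amount differs between two scrapes.
function priceChanged(previousText, currentText) {
  return parsePrice(previousText).amount !== parsePrice(currentText).amount;
}
```

Throwing on an unknown format, rather than returning NaN, makes a changed page layout show up as a failing test instead of a silent wrong value.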
Testing & Scraping
Example: price loaded by an AJAX request
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  // Wait until an element matching the selector is present in the DOM
  await page.waitForSelector('.vi-price');
  const price = await page.$eval('.vi-price', div => div.textContent);
  console.log(price);
  await browser.close();
})();
/**
 * If the price is loaded via AJAX, we can get these results:
 *
 * Without waitForSelector: $eval throws an error because no element matches yet
 * (page.$ would return null)
 *
 * With waitForSelector, it prints: EUR 79,99
 */
Testing & Scraping
Test that the "Buy Now" button exists
// Dependencies
// More tests
it('should show the Buy Now button', async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.ebay.it/.....');
  // page.$ resolves to null when no element matches the selector
  expect(await page.$('#binBtn_btn')).toBeTruthy();
  await browser.close();
});
// More tests
// Dependencies
// Launch one browser for the whole suite (assumes Jest-style hooks)
let browser;
beforeAll(async () => {
  browser = await puppeteer.launch();
});
// More tests
it('should show the Buy Now button', async () => {
  // Each test gets its own isolated incognito context
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto('https://www.ebay.it/.....');
  expect(await page.$('#binBtn_btn')).toBeTruthy();
  await context.close();
});
// More tests
afterAll(async () => {
  await browser.close();
});
Testing: request & async
We can also test the payment feature by clicking the "Buy now" button and waiting for the response to the payment request.
If the request fails or the success popup does not show, the test fails.
Note the emulate method, which emulates an iPhone 8.
it('should pay', async () => {
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.emulate(puppeteer.devices['iPhone 8']);
  await page.goto('https://www.ebay.it/.....');
  // Start waiting before the click, so that the response cannot be missed
  const responsePromise = page.waitForResponse(res => res.url().endsWith('paypal'));
  await page.click('#binBtn_btn');
  // Wait to catch the response of the payment request
  const response = await responsePromise;
  console.log(await response.json()); // {success: true, etc...}
  assert(await page.$('.payment-success-popup'));
  await context.close();
});
note: this is just an example. eBay works differently
The waitFor* methods are very powerful and useful for e2e testing.
Methods:
- waitForSelector
- waitForRequest
- waitForResponse
- waitForFunction
- waitForNavigation
- waitForTarget
- waitForXPath
Service Worker
Test the service worker with Puppeteer:
it('should cache the logo image', async () => {
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto('https://pptr.dev');
  // Wait for the service worker target registered by the page
  const swTarget = await context.waitForTarget(target => {
    return target.type() === 'service_worker';
  });
  const serviceWorker = await swTarget.worker();
  // Runs inside the worker: check whether the logo is in the cache
  const isCached = await serviceWorker.evaluate(async () => {
    return !!(
      await caches.match(
        'https://user-images.githubusercontent.com/10379601/29446482-04f7036a-841f-11e7-9872-91d1fc2ea683.png'
      )
    );
  });
  expect(isCached).toBe(true);
  await context.close();
});
Testing: geolocation
We can also set the geolocation to test the application in a different language and currency:
it('should show the price in pounds', async () => {
  ...
  // Grant the geolocation permission to the site before opening the page
  await context.overridePermissions('https://www.ebay.it', [
    'geolocation'
  ]);
  const page = await context.newPage();
  await page.setGeolocation({
    latitude: 51.50,
    longitude: 0
  });
  await page.goto('https://www.ebay.it/.....');
  const price = await page.$eval('.vi-price', div => div.textContent);
  expect(price).toBe('£79.99');
});
note: this is just an example. eBay works differently
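Testing: request interceptor
We can intercept requests and then change their content or abort them, as in the following example.
Imagine an eBay of cats:
...
await page.setRequestInterception(true);
page.on('request', request => {
  if (request.resourceType() === 'image') {
    // Answer every image request with the bytes of a local cat picture
    // (assumes cat.jpg exists locally and const fs = require('fs');)
    request.respond({
      contentType: 'image/jpeg',
      body: fs.readFileSync('cat.jpg')
    });
  } else {
    request.continue();
  }
});
...
note: this is just an example. eBay does not sell cats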
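The "abort" side of interception can be sketched in the same way. This is a minimal example under stated assumptions: the list of blocked resource types and the names handleRequest and blockHeavyResources are hypothetical, chosen for a scraper that only needs the page text.

```javascript
// Resource types to block; a hypothetical choice for a text-only scraper.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);

// Handler for the 'request' event: abort blocked resource types and
// let everything else through. Every intercepted request must be
// aborted, continued, or responded to, otherwise it stalls.
function handleRequest(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
}

// Wiring, given a Puppeteer page object:
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', handleRequest);
}
```

Keeping the handler as a separate function makes it testable with a fake request object, without launching a browser.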
Performance Testing
Using the DevTools backend, it is possible to retrieve the metrics, the CSS & JS coverage, and all the other useful data available in Chrome DevTools.
The following example prints the page metrics:
....
const metrics = await page.metrics();
console.log(metrics);
{
  ...
  Frames: 4,
  JSEventListeners: 353,
  Nodes: 160,
  LayoutCount: 7,
  ScriptDuration: 0.20300213,
  JSHeapUsedSize: 78309013,
  JSHeapTotalSize: 11689984
  ...
}
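One way to turn these metrics into a test is to compare them against a budget. The sketch below is only an illustration: findBudgetViolations is a hypothetical helper, and the budget values are arbitrary examples rather than recommendations.

```javascript
// Hypothetical helper: compares the object returned by page.metrics()
// against upper limits and returns the names of the metrics that exceed them.
function findBudgetViolations(metrics, budget) {
  return Object.keys(budget).filter(
    name => typeof metrics[name] === 'number' && metrics[name] > budget[name]
  );
}

// Budget values are arbitrary examples, not recommendations.
const budget = { Nodes: 1500, JSEventListeners: 300, ScriptDuration: 0.5 };

// Sample values taken from the metrics output above.
const sample = { Nodes: 160, JSEventListeners: 353, ScriptDuration: 0.20300213 };

console.log(findBudgetViolations(sample, budget)); // [ 'JSEventListeners' ]
```

In a test, the returned array can be asserted to be empty, so the failure message names the metric that went over budget.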
Performance Testing
The page trace follows the whole navigation of the browser from the point where tracing was started. This is very useful when you need to record the performance of a single page or a single feature instead of the whole site.
The following example records the JS and CSS coverage and the navigation trace:
await page.tracing.start({path: '/metrics/trace.json'});
await page.coverage.startJSCoverage();
await page.coverage.startCSSCoverage();
await page.goto('https://www.ebay.it/...');
// Some actions here
await page.tracing.stop();
// The stop methods resolve to the coverage reports
const jsCoverage = await page.coverage.stopJSCoverage();
const cssCoverage = await page.coverage.stopCSSCoverage();
The trace and the JS & CSS coverage are the same as in Chrome DevTools, but in JSON.
Performance Testing
Here is an example of the CSS coverage:
[
  {
    "url": "some file .css",
    "ranges": [
      {
        "start": 4,
        "end": 58
      },
      {
        "start": 102,
        "end": 130
      }
    ],
    "text": " div {\n background: red;\n color: white;\n }\n\n .aqua {\n color: aqua;\n }\n\n"
  }
]
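A coverage entry like this one can be reduced to a "used" percentage by summing its ranges. The following is a minimal sketch; summarizeCoverage is a hypothetical helper name, and it assumes each range is a start offset (inclusive) and an end offset (exclusive) into the text field.

```javascript
// Sums the used ranges of one coverage entry and relates them to the
// total length of the file text. Offsets index into entry.text, so the
// "bytes" here are really character counts.
function summarizeCoverage(entry) {
  const usedBytes = entry.ranges.reduce(
    (sum, range) => sum + (range.end - range.start),
    0
  );
  const totalBytes = entry.text.length;
  return {
    url: entry.url,
    usedBytes,
    totalBytes,
    usedPercent: totalBytes === 0 ? 0 : (usedBytes / totalBytes) * 100
  };
}

// Usage with the array resolved by page.coverage.stopCSSCoverage():
//   const report = cssCoverage.map(summarizeCoverage);
```

For the two ranges shown in the example, the used part adds up to (58 - 4) + (130 - 102) = 82 characters.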
Mouse & Keyboard event
It's possible to emulate keyboard and mouse events with the following methods:
await page.click('#element');
await page.type('#element', 'some text');
await page.select('#element', 'value');
await page.keyboard.press('Enter');
await page.keyboard.press('Shift');
Skyscanner Scraper
I made a simple Skyscanner scraper that uses Puppeteer, so you can see a real-world use.
The code is available at: https://github.com/TheJoin95/skyscanner-flights-scraping
Here is an implementation of some of the features shown in this presentation:
....
await page.click('#departure-fsc-datepicker-button');
await page.waitForSelector('.departure-datepicker'); // a popup element loaded by an AJAX request
await page.select('select[name="months"]', '2019-10');
let day = '20';
// Click the day cell inside the selected month (cells of adjacent months are excluded)
await page.evaluate(day => {
  document.querySelectorAll(
    '[class*="BpkCalendarDate_bpk-calendar-date"]:not([class*="BpkCalendarDate_bpk-calendar-date--outside"])'
  )[parseInt(day, 10) - 1].click();
}, day);
...
Here is another great project: https://github.com/social-manager-tools
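The month value and the day index used in the Skyscanner date picker snippet can both be derived from a single date string. This is a minimal sketch, not code from the scraper repository: toDatepickerParts is a hypothetical helper, and it assumes the month select uses "YYYY-MM" values and that day cells are indexed from zero, as in the snippet.

```javascript
// Hypothetical helper: splits an ISO date ("YYYY-MM-DD") into the value
// for the month <select> and the zero-based index of the day cell.
function toDatepickerParts(isoDate) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got: ${isoDate}`);
  }
  const [, year, month, day] = match;
  return { month: `${year}-${month}`, dayIndex: parseInt(day, 10) - 1 };
}

// Usage, given a Puppeteer page object:
//   const { month, dayIndex } = toDatepickerParts('2019-10-20');
//   await page.select('select[name="months"]', month);
```

Parsing the string directly, instead of going through a Date object, avoids time zone shifts that could move the day by one.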
Questions?
Thank you
Puppeteer | Web scraping and UI testing tool
By Miki Lombardi