Levelling Up Your Web Scraping Game

PHP UK Conference 2022

Ian Littman (CTO @ Covie) / @iansltx

What we'll cover

Principle #1: It's all about the data

That data gets to a normal user over HTTPS (at least in most cases)
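The data reaches the browser as an ordinary HTTP response. That means a scraper can often request the same response itself and parse it directly. The sketch below covers only the parsing half. It assumes the data sits in an HTML table and uses PHP's bundled DOM extension. The `extractRows()` helper and the sample markup are hypothetical. The HTTPS fetch is left out so the example runs offline.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: return the trimmed text of every table cell,
 * grouped row by row, using only PHP's bundled DOM extension.
 *
 * @return list<list<string>>
 */
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Header and data cells, in document order, relative to this row.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Stand-in for a response body you would normally fetch over HTTPS.
$html = '<table><tr><th>Name</th><th>Value</th></tr>'
      . '<tr><td> widget </td><td>42</td></tr></table>';

print_r(extractRows($html));
```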

Principle #2: Choose runtime tradeoffs wisely

PHP

Headless Browser

Principle #3: Try to hit edge cases during dev

Tool: Firefox

Demo #1: LocalCallingGuide

Demo #2: Hippo

...but you probably won't use an interactive CLI...

...so you'll need to save/restore state between requests
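Outside an interactive session, each step of a multi-page flow may run in a different request, job, or process. Anything the site handed back, such as cookies or hidden form tokens, therefore has to be written somewhere and read back later. The sketch below assumes JSON in a temp file is an acceptable store. The `ScrapeState` class, its fields, and the file path are all hypothetical. An HTTP client's own cookie-jar type could stand in for the plain array.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical container for what a multi-step scrape must carry
 * between requests: cookies plus any tokens scraped from pages.
 */
final class ScrapeState
{
    /**
     * @param array<string, string> $cookies cookie name => value
     * @param array<string, string> $values  e.g. a CSRF token found in a form
     */
    public function __construct(
        public array $cookies = [],
        public array $values = [],
    ) {}

    public function toJson(): string
    {
        return json_encode(
            ['cookies' => $this->cookies, 'values' => $this->values],
            JSON_THROW_ON_ERROR
        );
    }

    public static function fromJson(string $json): self
    {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        return new self($data['cookies'] ?? [], $data['values'] ?? []);
    }

    /** Render the stored cookies as a Cookie request-header value. */
    public function cookieHeader(): string
    {
        $pairs = [];
        foreach ($this->cookies as $name => $value) {
            $pairs[] = $name . '=' . $value;
        }
        return implode('; ', $pairs);
    }
}

// Step 1 (first request/job): remember what the site handed back.
$state = new ScrapeState(['session' => 'abc123'], ['csrf' => 'tok-1']);
$path = sys_get_temp_dir() . '/scrape-state.json'; // hypothetical location
file_put_contents($path, $state->toJson());

// Step 2 (a later request/job, possibly another process): pick up where we left off.
$restored = ScrapeState::fromJson((string) file_get_contents($path));
echo $restored->cookieHeader(), PHP_EOL; // session=abc123
```

In practice the JSON could just as well live in a database row or cache entry keyed by the user or job.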

If you need to use a real browser...

Sometimes sites don't want you to scrape

Principle #4: Try the mobile app

Tool: Charles Proxy for iOS

Caveat: Certificate pinning

Sometimes your data isn't in HTML/JS/XML

Thanks! Questions?
