Creating Web Scrapers
Tim Bond
Seattle PHP Meetup - July 11, 2017
What is a Scraper?
Things I'm Scraping Right Now
- Apartment listings
- HTC's Repair Status Page
- My eBay Feedback
- TV Schedules
- Podcasts
- Wait times
- Tons more
Get the Raw Data
- Look for AJAX requests
- View source
- HTML Tags
- Embedded JSON
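When the data is embedded in the page as JSON, it can often be lifted out and decoded directly, without parsing any HTML tags. The following is a minimal sketch under assumptions: the sample page, the `listing-data` id, and the `extractEmbeddedJson()` helper are all hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical page: the data we want sits in a <script> tag as JSON.
$html = <<<'HTML'
<html><body>
<script id="listing-data" type="application/json">{"price": 1450, "beds": 2}</script>
</body></html>
HTML;

/**
 * Find the <script> tag with the given id and decode its JSON body.
 * Returns null when the tag is missing or the JSON is invalid.
 * (Hypothetical helper, not part of any library.)
 */
function extractEmbeddedJson(string $html, string $scriptId): ?array
{
    $pattern = '~<script[^>]*\bid="' . preg_quote($scriptId, '~') . '"[^>]*>(.*?)</script>~s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

$listing = extractEmbeddedJson($html, 'listing-data');
```

The response of an AJAX request is usually even simpler: if it is already JSON, `json_decode()` on the raw body is enough.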
Extract the Data
- explode()
- SimpleXML
- DOMDocument
- regex

Often a combination of two or more
Explode Example
<body>
<div>Irrelevant data</div>
<!-- Details -->
<div>Important data</div>
<!-- Details End -->
</body>

// Keep everything after the opening marker...
$html = explode('<!-- Details -->', $html)[1];
// ...then drop everything from the closing marker onward.
$html = explode('<!-- Details End -->', $html)[0];
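SimpleXML Example
<div>
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
</div>

// $string holds the markup above; it must be well-formed XML.
$xml = simplexml_load_string($string);
// Print the text of each <li> on its own line.
foreach ($xml->ul->li as $li) {
    echo "$li\n";
}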
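DOMDocument, also listed under Extract the Data, is the more forgiving option when the page is not well-formed XML. A rough sketch, assuming the items we want are marked with a hypothetical `item` class; the `extractItems()` helper and the sample markup are illustrative only.

```php
<?php
// Sample markup; the "item" class is an assumption for this example.
$html = '<div><ul><li class="item">One</li><li class="item">Two</li><li>Other</li></ul></div>';

/**
 * Return the text of every <li class="item"> in the given HTML.
 * (Hypothetical helper.)
 */
function extractItems(string $html): array
{
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath selects nodes by tag and attribute.
    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//li[@class="item"]') as $node) {
        $items[] = trim($node->textContent);
    }
    return $items;
}

$items = extractItems($html);
```

In practice this combines well with explode(): cut the page down to the interesting region first, then hand only that fragment to DOMDocument.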
Warnings
- Act like a browser
- Cache
- Ongoing development
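The first two warnings can be sketched in code. This is one possible approach, not a definitive implementation: `fetchLikeBrowser()` and `cachedFetch()` are hypothetical helpers, the User-Agent string is just an example value, and the sketch assumes `allow_url_fopen` is enabled.

```php
<?php
/**
 * Fetch a URL while sending browser-like headers.
 * The header values are examples; copy the ones your own browser sends.
 */
function fetchLikeBrowser(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0 Safari/537.36',
            'header'     => "Accept-Language: en-US,en;q=0.8\r\n",
            'timeout'    => 30,
        ],
    ]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    return $body;
}

/**
 * Return a cached copy of the page if it is younger than $ttl seconds;
 * otherwise call $fetch and store the result on disk.
 */
function cachedFetch(string $url, string $cacheDir, int $ttl, callable $fetch): string
{
    $file = $cacheDir . '/' . sha1($url) . '.html';
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }
    $body = $fetch($url);
    file_put_contents($file, $body);
    return $body;
}

// Usage (not executed here, since it would hit the network):
// $html = cachedFetch('https://example.com/', sys_get_temp_dir(), 3600, 'fetchLikeBrowser');
```

Caching keeps the load on the remote site down and makes it possible to keep working on the extraction code without re-requesting the page on every run.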

Non-Published APIs


Packet Capture for Android
Questions
