"Crawling" JS Pages
Andrei Demian @ Software Engineer OPSWAT. TimJS - 2015
the requirements
- make JS content available to crawlers
- make JS content shareable
prerender.io
A Node server that uses PhantomJS to render a JavaScript page as HTML.
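The core mechanic is simple enough to sketch in a few lines: a proxy forwards a crawler's request to the Prerender server by appending the full original URL to the server's own address, and gets rendered HTML back. The host names below are placeholders, and `prerenderUrl` is an illustrative helper, not part of any package.

```javascript
// Build the URL a proxy would forward a crawler request to.
// The Prerender server expects the full original URL appended to its own.
function prerenderUrl(serviceHost, originalUrl) {
  return serviceHost.replace(/\/$/, '') + '/' + originalUrl;
}

prerenderUrl('http://prerender.local/', 'http://yourdomain.com/user/1');
// 'http://prerender.local/http://yourdomain.com/user/1'
```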
why prerender.io
- open source
- easy to set up
- well documented
- lots of plugins & middlewares
- paid service (full compatibility)
supported middlewares
- Apache
- nginx
- JavaScript
- PHP
...
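All of these middlewares make the same decision: is this request from a crawler, or does it carry the `_escaped_fragment_` marker? A minimal hand-rolled sketch of that check (not the actual prerender-node package; the bot list mirrors the one used later in the slides):

```javascript
// User agents that should receive pre-rendered HTML.
const CRAWLER_UA = /(Google|Facebot|Googlebot|bingbot)/i;

// Decide whether a request should be proxied to the Prerender server.
// `userAgent` and `url` come from the incoming request.
function shouldPrerender(userAgent, url) {
  return CRAWLER_UA.test(userAgent || '') || url.includes('_escaped_fragment_');
}

shouldPrerender('Googlebot/2.1', '/user/1');             // true
shouldPrerender('Mozilla/5.0', '/user/1');               // false
shouldPrerender('Mozilla/5.0', '/?_escaped_fragment_='); // true
```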
available plugins
- basic auth
- cache (S3, mongo, memory, level)
- whitelist / blacklist requests
- access logger (me)
...

escaped fragment
If you use HTML5 pushState (recommended):
<meta name="fragment" content="!">
http://www.example.com/user/1 →
http://www.example.com/user/1?_escaped_fragment_=
If you use the hashbang (#!):
http://www.example.com/#!/user/1 →
http://www.example.com/?_escaped_fragment_=/user/1
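The mapping crawlers apply can be expressed directly. A small sketch covering both cases above (`toEscapedFragment` is an illustrative name):

```javascript
// Convert a URL into the _escaped_fragment_ form crawlers request.
// Hashbang URLs move the fragment into the query string; pushState
// URLs (with the meta fragment tag) just get an empty marker appended.
function toEscapedFragment(url) {
  const [base, fragment] = url.split('#!');
  if (fragment === undefined) return url + '?_escaped_fragment_=';
  return base + '?_escaped_fragment_=' + fragment;
}

toEscapedFragment('http://www.example.com/#!/user/1');
// 'http://www.example.com/?_escaped_fragment_=/user/1'
toEscapedFragment('http://www.example.com/user/1');
// 'http://www.example.com/user/1?_escaped_fragment_='
```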
.htaccess magic
RewriteCond %{HTTP_USER_AGENT} (Google|Facebot|Googlebot|bingbot) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteRule ^ http://prerender.local/http://yourdomain.com%{REQUEST_URI} [P,L]
tricks
// Tell the Prerender server the page is ready to snapshot
window.prerenderReady = true;
// Replace #! in shared URLs
www.mydomain.com/#!/stats => www.mydomain.com/sharer/stats
// Hashbang URLs in Angular
$locationProvider.hashPrefix('!');
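The #! → /sharer/ replacement for shared URLs can be sketched as a one-line helper (`toShareUrl` is an illustrative name):

```javascript
// Turn a hashbang URL into a crawler-friendly /sharer/ URL for sharing.
function toShareUrl(url) {
  return url.replace('/#!/', '/sharer/');
}

toShareUrl('http://www.mydomain.com/#!/stats');
// 'http://www.mydomain.com/sharer/stats'
```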
tricks
// Detect crawlers and redirect them to the prerender server
RewriteCond %{HTTP_USER_AGENT} (Google|Facebot|Googlebot|bingbot) [NC]
RewriteRule ^sharer/(.*) /?_escaped_fragment_=/$1 [P,L]
// Detect humans and redirect them to #!
RewriteCond %{HTTP_USER_AGENT} !(Google|Facebot|Googlebot|bingbot) [NC]
RewriteRule ^sharer/(.*) /#!/$1 [NE,L,R=301]
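The same split could live in JavaScript middleware instead of .htaccess. A sketch mirroring the two rules above: crawlers hitting /sharer/ get the escaped-fragment URL, humans get a 301 back to the hashbang route (`routeSharer` and its return shape are illustrative, not a real API):

```javascript
const BOTS = /(Google|Facebot|Googlebot|bingbot)/i;

// Mirror the .htaccess rules: crawlers -> escaped-fragment proxy URL,
// humans -> permanent redirect to the hashbang route.
function routeSharer(userAgent, path) {
  const route = path.replace(/^\/sharer\//, '');
  return BOTS.test(userAgent)
    ? { proxy: '/?_escaped_fragment_=/' + route }
    : { redirect: '/#!/' + route, status: 301 };
}

routeSharer('Facebot', '/sharer/stats');
// { proxy: '/?_escaped_fragment_=/stats' }
routeSharer('Mozilla/5.0', '/sharer/stats');
// { redirect: '/#!/stats', status: 301 }
```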
?
Thanks!