Browser for Python


Pavel Tyslacki

What we need

  • execute JS
  • run JS tests
  • automation
  • create PDFs
  • create screenshots
  • measure web page load time
  • get the HTML of a dynamic web page

What we have

Browsers (renderer + JS engine + networking + UI)

  • normal browsers
  • web views (Qt, GTK, Tk, wx)
  • phantomjs/slimerjs/htmlunit


JS Engines

  • node.js (V8)

Execute JS

Just execute JS, without the DOM or other standard browser APIs

  • pyexecjs
  • pyv8


>>> import execjs


>>> print(execjs.eval('[1, 2, 4].indexOf(4)'))
2

>>> print(execjs.eval('[1, 2, 4].indexOf(3)'))
-1


Run JS tests + Automation

  • grunt/gulp
  • spynner
  • phantomjs/slimerjs/casperjs
  • selenium

Gulp

JS task runner / build system


var gulp = require('gulp'),
    eslint = require('gulp-eslint'),
    qunit = require('gulp-qunit');

// lint tests.js and print the report
gulp.task('lint', function () {
    return gulp.src('./tests.js')
        .pipe(eslint({
            rules: {'quotes': 1}, globals: {'QUnit': 1}, env: {browser: 1}
        }))
        .pipe(eslint.format());
});

// run the QUnit tests in tests.html in a headless browser
gulp.task('test', function () {
    return gulp.src('./tests.html')
        .pipe(qunit());
});

Spynner

Web browser module for Python

Based on PyQt WebKit


import spynner

browser = spynner.Browser()             # PyQt WebKit browser
browser.load('http://www.python.org/')  # load the page, running its JS
browser.snapshot().save('file.png')     # render the page to a PNG file
browser.close()

PhantomJS

Headless WebKit, scriptable with a JS API

  • ghost.py


  • DOM and other APIs
  • JS
  • webkit renderer
  • networking
  • PDF/PNG

Selenium

Testing and automation framework for web applications

  • Selenium WebDriver
  • Selenium IDE
  • Selenium Grid

Selenium

languages:
  • Python
  • Ruby
  • JS (node.js)
  • Java
  • C#

operating systems:
  • Linux
  • Mac OS X
  • Windows

browsers and mobile platforms:
  • Firefox
  • Chrome
  • Safari
  • IE
  • PhantomJS
  • Android
  • iOS
  • Blackberry
  • Windows Phone

Selenium

Log in to python.org and go to the public profile


from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.implicitly_wait(0.5)  # wait up to 0.5 s for elements to appear

# fill in and submit the login form
driver.get('https://www.python.org/accounts/login/')
driver.find_element(By.ID, 'id_login').send_keys('user')
driver.find_element(By.ID, 'id_password').send_keys('password')
driver.find_element(By.CSS_SELECTOR, '.primaryAction').click()

# hover over 'Sign Out' to reveal the account drop-down, then click the profile link
sign_out = driver.find_element(By.LINK_TEXT, 'Sign Out')
ActionChains(driver).move_to_element(sign_out).perform()
profile = driver.find_element(By.LINK_TEXT, 'View your public profile')
ActionChains(driver).move_to_element(profile).click().perform()

driver.quit()

PDF

  • weasyprint
  • wkhtmltopdf based: wkhtmltopdf/pdfkit
  • phantomjs based: ghost.py
  • pisa based: pisa/xhtml2pdf


Result groups

  • original (Firefox)
  • weasyprint
  • webkit based (wkhtmltopdf and phantomjs based)
  • pisa based

Bender

  • original
  • weasyprint
  • webkit based

kyle

  • weasyprint
  • webkit based
  • original

homer

  • webkit based
  • weasyprint
  • original

python.org

  • pisa based
  • webkit based
  • weasyprint
  • original

Python.su

  • webkit based
  • weasyprint
  • original

PDF Results

  1. original browsers
  2. webkit based
    1. wkhtmltopdf
    2. phantomjs
  3. weasyprint

pisa-based tools work badly

Screenshots

  • phantomjs based: ghost.py
  • web view based: webkit2png/spynner
  • selenium

no wkhtmltoimage :(

Page Load Time

  • phantomjs
  • webpagetest
  • browsermob-proxy

  • HAR: HTTP Archive format

WebPageTest

Measures and analyzes the performance of web pages

  • captures HAR
  • captures video

instances run on Windows only :(

BrowserMob Proxy

Captures performance data from browsers

  • browsermob-proxy: the proxy
  • browsermob-proxy-py: Python client

from browsermobproxy import Server
from selenium import webdriver

server = Server('path/to/browsermob-proxy')
server.start()
proxy = server.create_proxy()

# route Firefox through the proxy
options = webdriver.FirefoxOptions()
options.proxy = proxy.selenium_proxy()
driver = webdriver.Firefox(options=options)

proxy.new_har('python')  # start recording a HAR named 'python'
driver.get('http://www.google.co.uk')
proxy.har  # returns a HAR JSON blob

server.stop()
driver.quit()
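The HAR returned by proxy.har is parsed JSON (a plain dict), so basic load-time numbers can be read out of it with the standard library alone. A minimal sketch, assuming the HAR 1.2 layout (log.pages[].pageTimings.onLoad, log.entries[].startedDateTime, time, request.url, response.bodySize); summarize_har and SAMPLE_HAR are hypothetical names, and the sample numbers are made up for illustration. Needs Python 3.7+ for datetime.fromisoformat.

```python
import json
from datetime import datetime, timedelta

# Hypothetical two-request HAR, reduced to the fields used below (made-up numbers).
SAMPLE_HAR = {
    'log': {
        'version': '1.2',
        'pages': [{'id': 'python', 'startedDateTime': '2015-01-01T00:00:00.000Z',
                   'title': 'example', 'pageTimings': {'onContentLoad': 250, 'onLoad': 400}}],
        'entries': [
            {'pageref': 'python', 'startedDateTime': '2015-01-01T00:00:00.000Z', 'time': 120,
             'request': {'method': 'GET', 'url': 'http://example.com/'},
             'response': {'status': 200, 'bodySize': 1000}},
            {'pageref': 'python', 'startedDateTime': '2015-01-01T00:00:00.100Z', 'time': 300,
             'request': {'method': 'GET', 'url': 'http://example.com/app.js'},
             'response': {'status': 200, 'bodySize': -1}},
        ],
    }
}


def parse_time(stamp):
    """Parse a HAR ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(stamp.replace('Z', '+00:00'))


def summarize_har(har):
    """Summarize request count, body bytes and timings of a HAR 1.2 dict."""
    log = har['log']
    entries = log.get('entries', [])
    starts = [parse_time(e['startedDateTime']) for e in entries]
    ends = [s + timedelta(milliseconds=e['time']) for s, e in zip(starts, entries)]
    slowest = max(entries, key=lambda e: e['time'], default=None)
    return {
        'requests': len(entries),
        # bodySize is -1 when unknown, so negative values count as 0
        'bytes': sum(max(e['response'].get('bodySize', 0), 0) for e in entries),
        # wall-clock window from the first request start to the last response end
        'span_ms': (max(ends) - min(starts)) / timedelta(milliseconds=1) if entries else 0.0,
        # the browser's own load-event time per page, as recorded in the HAR
        'onload_ms': {p['id']: p.get('pageTimings', {}).get('onLoad')
                      for p in log.get('pages', [])},
        'slowest_url': slowest['request']['url'] if slowest else None,
    }


if __name__ == '__main__':
    print(json.dumps(summarize_har(SAMPLE_HAR), indent=2))
```

onload_ms is what the browser reported for the load event; span_ms can be larger when requests keep running after onload.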

Get the HTML of a Dynamic Page

  • phantomjs
  • selenium
  • splash
  • spynner

Splash

JS rendering service with an HTTP API

Runs on top of Twisted and PyQt WebKit


Run the server

$ python -m splash.server

Render HTML

$ curl 'http://localhost:8050/render.html?url=http://domain.com/page-with-javascript.html&timeout=10&wait=0.5'
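The same render.html request can be made from Python. A minimal sketch using only the standard library, assuming Splash listens on localhost:8050 as in the curl call; render_html_url and fetch_rendered_html are hypothetical helper names. urlencode escapes the target URL, which matters once that URL has its own query string: an unescaped & would be read as a separate Splash parameter.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SPLASH = 'http://localhost:8050'  # assumption: local Splash server on the default port


def render_html_url(page_url, timeout=10, wait=0.5, splash=SPLASH):
    """Build a render.html request URL with the target URL escaped."""
    query = urlencode({'url': page_url, 'timeout': timeout, 'wait': wait})
    return f'{splash}/render.html?{query}'


def fetch_rendered_html(page_url, **params):
    """Return the HTML after JS has run (needs a running Splash; assumes UTF-8)."""
    with urlopen(render_html_url(page_url, **params)) as response:
        return response.read().decode('utf-8')


if __name__ == '__main__':
    print(render_html_url('http://domain.com/page-with-javascript.html'))
```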

Email markup

table and inline-style hell

  • premailer

premailer input:

<html>
  <style type="text/css">
    h1 { border:1px solid black }
    p { color:red;}
  </style>
  <h1 style="font-weight:bolder">Peter</h1>
  <p>Hej</p>
</html>

output:

<html>
  <h1 style="font-weight:bolder; border:1px solid black">Peter</h1>
  <p style="color:red">Hej</p>
</html>
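premailer does this transformation properly (class and id selectors, specificity, and so on). To show only the idea, here is a toy inliner: a sketch that understands plain tag selectors such as h1 and p and nothing else. inline_css is a hypothetical name, not premailer's API. Declarations already present in a style attribute win over the stylesheet and are listed first, which reproduces the output above.

```python
import re

# <style>...</style> blocks, plus any whitespace after them
STYLE_BLOCK_RE = re.compile(r'<style[^>]*>(.*?)</style>\s*', re.S | re.I)
# rules with a plain tag selector only, e.g. "h1 { border:1px solid black }"
RULE_RE = re.compile(r'([a-zA-Z][\w-]*)\s*\{([^}]*)\}')
# opening tags with optional attributes, e.g. <p> or <h1 style="...">
OPEN_TAG_RE = re.compile(r'<([a-zA-Z][\w-]*)(\s[^>]*)?>')
STYLE_ATTR_RE = re.compile(r'\sstyle="([^"]*)"')


def parse_declarations(text):
    """Turn 'a:b; c:d' into {'a': 'b', 'c': 'd'}, keeping the order."""
    decls = {}
    for part in text.split(';'):
        if ':' in part:
            name, value = part.split(':', 1)
            decls[name.strip()] = value.strip()
    return decls


def inline_css(html):
    """Toy CSS inliner: copy tag-selector rules into style attributes."""
    rules = {}
    for block in STYLE_BLOCK_RE.findall(html):
        for tag, body in RULE_RE.findall(block):
            rules.setdefault(tag.lower(), {}).update(parse_declarations(body))
    html = STYLE_BLOCK_RE.sub('', html)  # the <style> block is no longer needed

    def add_style(match):
        tag, attrs = match.group(1), match.group(2) or ''
        sheet = rules.get(tag.lower())
        if not sheet:
            return match.group(0)  # no rule for this tag: leave it untouched
        existing = STYLE_ATTR_RE.search(attrs)
        merged = parse_declarations(existing.group(1)) if existing else {}
        for name, value in sheet.items():
            merged.setdefault(name, value)  # inline declarations win
        style = '; '.join(f'{name}:{value}' for name, value in merged.items())
        attrs = STYLE_ATTR_RE.sub('', attrs)
        return f'<{tag}{attrs} style="{style}">'

    return OPEN_TAG_RE.sub(add_style, html)


SAMPLE = '''<html>
  <style type="text/css">
    h1 { border:1px solid black }
    p { color:red;}
  </style>
  <h1 style="font-weight:bolder">Peter</h1>
  <p>Hej</p>
</html>'''

if __name__ == '__main__':
    print(inline_css(SAMPLE))
```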

Links


Questions

pavel.tyslyatsky@gmail.com

https://github.com/tbicr/browser_for_python
