Advanced R programming

Lecture 5

Måns Magnusson

Statistics and Machine learning

Department of computer and information science

Since last time?

Input and output

Advanced R Programming

Måns Magnusson

Input

Output

Format and localization

"Formats"

csv files

excel files

documents

sas, spss, R,  ... files

html

json

xml

SQL

MongoDB

data files (see input)

reports (dyn)

update DB

html

graphs

"Shiny"

maps

More?

pictures

Localization

own computer

local network

local database

cloud storage

web pages

web scraping

web APIs

remote database

Basic I/O

Files on your computer

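# Input simple data
read.table()   # general delimited text files
read.csv()     # comma-separated, "." as decimal mark
read.csv2()    # semicolon-separated, "," as decimal mark
load()         # R objects stored with save()

# Output simple data
write.table()
write.csv()
write.csv2()
save()         # one or more R objects in a binary file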
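
As a minimal sketch of how these functions pair up, the round trip below writes a small data frame to disk and reads it back. The data frame `scores` and the temporary file names are made up for illustration and are not part of the lecture.

```r
# Hypothetical example data (not from the lecture)
scores <- data.frame(name = c("Anna", "Bo"), score = c(4.5, 3.25),
                     stringsAsFactors = FALSE)

csv_file   <- tempfile(fileext = ".csv")
csv2_file  <- tempfile(fileext = ".csv")
rdata_file <- tempfile(fileext = ".RData")

# write.csv(): comma separator, "." as decimal mark
write.csv(scores, csv_file, row.names = FALSE)
# write.csv2(): semicolon separator, "," as decimal mark
write.csv2(scores, csv2_file, row.names = FALSE)

# Read each file back with the matching reader
from_csv  <- read.csv(csv_file, stringsAsFactors = FALSE)
from_csv2 <- read.csv2(csv2_file, stringsAsFactors = FALSE)

# save() stores R objects; load() restores them under their original names
save(scores, file = rdata_file)
rm(scores)
load(rdata_file)
```

Reading a `write.csv2()` file with `read.csv()` (or the other way round) is a common source of garbled columns, so it helps to keep the pairs together.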

More complex formats

Software/data            R package
Excel                    XLConnect
SAS, SPSS, Stata, ...    foreign
XML                      XML
JSON (GeoJSON)           RJSONIO, rjson
Documents                tm
Maps                     sp
Images                   raster

Cloud storage

Why?

Robust

Backups

Cloud computing

... but how about safety?

... and can be tricky in the beginning

Arbitrary data

Structured data

Packages

Remote source    R package
General          downloader
GitHub           repmis, downloader
Dropbox          rdrop2
Amazon           RAmazonS3
Google Docs      googlesheets
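
For arbitrary remote files, base R's `download.file()` is often enough before reaching for one of the packages above. The sketch below uses a hypothetical helper name, `fetch_csv()`, and assumes the remote file is a plain CSV; it stores a local copy first and then reads it.

```r
# Hypothetical helper: download a remote CSV to a local file, then read it.
# `url` can be any URL that download.file() understands.
fetch_csv <- function(url, destfile = tempfile(fileext = ".csv"), ...) {
  status <- utils::download.file(url, destfile, quiet = TRUE, mode = "wb")
  if (status != 0) stop("Download failed for ", url)
  read.csv(destfile, ...)
}

# dat <- fetch_csv("https://example.com/data.csv")  # placeholder URL, not run
```

Keeping the downloaded copy on disk means the analysis can be rerun without contacting the remote source again.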

Web APIs

using httr

Application programming interface (API) over HTTP

"contract to 'get data' online"

more and more common

examples:
GitHub

Riksdagen

Statistics Sweden

RESTful APIs

Basic principles:

Data is returned (JSON / XML)

Each specific data has its own URI

Communication is based on HTTP verbs
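
The principles above can be sketched with httr against the GitHub API, one of the examples mentioned earlier. The function names `repo_uri()` and `get_repo()` are made up for illustration; the sketch assumes httr is installed and that network access is available when `get_repo()` is actually called.

```r
# Each resource has its own URI; here the GitHub endpoint
# /repos/{owner}/{repo} is assembled from its parts.
repo_uri <- function(owner, repo) {
  paste0("https://api.github.com/repos/", owner, "/", repo)
}

# Sketch of a GET request with httr; needs the httr package and network access.
get_repo <- function(owner, repo) {
  resp <- httr::GET(repo_uri(owner, repo))
  httr::stop_for_status(resp)          # turn 4xx/5xx responses into an R error
  httr::content(resp, as = "parsed")   # JSON body parsed into an R list
}

repo_uri("tidyverse", "rvest")
# info <- get_repo("tidyverse", "rvest")  # not run: requires network
```

Separating the URI construction from the request makes the part that does not need a network easy to check on its own.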

Hypertext Transfer Protocol (http)

Verbs

Verb      Description
GET       Get "data" from the server
POST      Post "data" to the server (often to get something back)
PUT       Update "data" on the server
DELETE    Delete data on the server

Status codes

Code    Description
1xx     Information from server
2xx     Success ("Yay! Gimme' data!")
3xx     Redirections
4xx     Client error (you failed)
5xx     Server error (server failed)
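
The first digit of the status code carries the class, which is easy to express in code. The helper below, `status_class()`, is a hypothetical name used only for this sketch; with httr, the numeric code of a response would typically come from `httr::status_code()`.

```r
# Hypothetical helper mapping a status code to the classes in the table above.
status_class <- function(code) {
  classes <- c("information", "success", "redirection",
               "client error", "server error")
  idx <- code %/% 100   # 404 %/% 100 gives 4
  if (is.na(idx) || idx < 1 || idx > 5) {
    stop("Not a valid HTTP status code: ", code)
  }
  classes[idx]
}

status_class(200)  # "success"
status_class(404)  # "client error"
```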

Common API formats

JavaScript Object Notation (JSON)

Think of named lists in R

R Packages: RJSONIO, jsonlite

Extensible Markup Language (XML)

Older format (using nodes)

xpath

R Packages: XML 

JSON

{
     "firstName": "John",
     "lastName": "Smith",
     "age": 25,
     "address": {
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": "10021"
     },
     "phoneNumber": [
         { "type": "home", "number": "212 555-1234" },
         { "type": "fax", "number": "646 555-4567" }
     ],
     "newSubscription": false,
     "companyName": null
 }
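
To make the "named lists in R" analogy concrete, the JSON object above is written out by hand below as an R list. This is an illustration of the correspondence, not the output of any particular parser; the guarded call at the end assumes the jsonlite package is installed and parses only a shortened version of the object.

```r
# The JSON object above written by hand as an R named list.
person <- list(
  firstName = "John",
  lastName  = "Smith",
  age       = 25,
  # A nested JSON object becomes a nested named list
  address   = list(streetAddress = "21 2nd Street", city = "New York",
                   state = "NY", postalCode = "10021"),
  # A JSON array of objects becomes an unnamed list of named lists
  phoneNumber = list(
    list(type = "home", number = "212 555-1234"),
    list(type = "fax",  number = "646 555-4567")
  ),
  newSubscription = FALSE,
  companyName     = NULL   # JSON null
)

person$address$city              # nested access with $
person$phoneNumber[[2]]$number   # array elements are accessed with [[ ]]

# With jsonlite (if installed), a parser builds a similar structure
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- jsonlite::fromJSON('{"firstName": "John", "age": 25}',
                               simplifyVector = FALSE)
  parsed$firstName
}
```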

XML

<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
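
A short sketch of querying such a document with XPath, assuming the XML package is installed. The function name `editions_for()` is made up, and `wikimedia_xml` is a shortened copy of the document above.

```r
# Sketch using the XML package (assumed installed); the function name is made up.
editions_for <- function(xml_text, language) {
  doc   <- XML::xmlParse(xml_text, asText = TRUE)
  # XPath: every <edition> node whose language attribute matches
  xpath <- sprintf("//edition[@language='%s']", language)
  XML::xpathSApply(doc, xpath, XML::xmlValue)
}

# Shortened version of the document above
wikimedia_xml <- '<wikimedia><projects>
  <project name="Wikipedia" launch="2001-01-05"><editions>
    <edition language="English">en.wikipedia.org</edition>
    <edition language="German">de.wikipedia.org</edition>
  </editions></project>
  <project name="Wiktionary" launch="2002-12-12"><editions>
    <edition language="English">en.wiktionary.org</edition>
  </editions></project>
</projects></wikimedia>'

if (requireNamespace("XML", quietly = TRUE)) {
  print(editions_for(wikimedia_xml, "English"))
}
```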

web scraping

using rvest

Unstructured data over HTTP(S)

Often HTML format

Spiders / scrapers / web crawlers

Basics behind search engines

HTML

<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>

(har)rvest

Simplify spider activity

Download data

Parse data

Follow links

Fill out forms

Store crawling history
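
The "parse data" step can be sketched on the small HTML page shown earlier. This assumes rvest is installed (version 1.0.0 or later for `html_elements()`); `extract_text()` is a hypothetical helper name.

```r
# Sketch using rvest (assumed installed, version >= 1.0.0 for html_elements()).
extract_text <- function(html, css) {
  page <- rvest::read_html(html)   # accepts a URL, a file path or an HTML string
  rvest::html_text(rvest::html_elements(page, css))
}

hello_html <- "<!DOCTYPE html>
<html>
  <head><title>This is a title</title></head>
  <body><p>Hello world!</p></body>
</html>"

if (requireNamespace("rvest", quietly = TRUE)) {
  print(extract_text(hello_html, "p"))       # text of all <p> elements
  print(extract_text(hello_html, "title"))   # text of the <title> element
}
```

Passing a URL instead of the string would also cover the "download data" step, since `read_html()` fetches the page before parsing it.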

Scraping is fragile

Difficulties and bad spiders

www.domain.se/robots.txt

robot traps

JavaScript

delays

Politeness
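
Delays and fragility can both be handled in the crawl loop itself. The sketch below is one possible shape, not a complete crawler: `crawl_politely()` is a hypothetical function, `fetch` stands for whatever function downloads a single page, and the one-second default delay is an arbitrary assumption.

```r
# Hypothetical crawler loop: `fetch` is any function that takes a URL and
# returns its content; `delay` is the pause in seconds between requests.
crawl_politely <- function(urls, fetch, delay = 1) {
  pages <- vector("list", length(urls))
  names(pages) <- urls
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay)   # wait before every request except the first
    # Scraping is fragile: a failing page yields NULL instead of stopping the crawl.
    # pages[i] <- list(...) keeps the NULL as an element of the list.
    pages[i] <- list(tryCatch(fetch(urls[i]), error = function(e) NULL))
  }
  pages
}
```

A polite spider would also read the site's robots.txt before crawling; that check is left out of this sketch.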

Shiny

Shiny?

Interactive dashboards made easy

online or local

R as "backend"

How it works

Application

Reactive

MyAppName/server.R
MyAppName/ui.R

customise the UI using HTML

the directory containing server.R is the app's working directory

library(shiny)
# Examples with code
runExample("01_hello")
runExample("03_reactivity")
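
The ui.R / server.R split can also be sketched in a single file. The app below is a made-up minimal example (a slider driving a histogram), assuming shiny is installed; `make_hello_app()` is a hypothetical wrapper so that nothing is started until the app is run explicitly.

```r
# Single-file sketch of the ui.R / server.R split (assumes shiny is installed).
make_hello_app <- function() {
  # What would go in ui.R: the layout, inputs and outputs
  ui <- shiny::fluidPage(
    shiny::sliderInput("n", "Number of observations",
                       min = 10, max = 500, value = 100),
    shiny::plotOutput("hist")
  )
  # What would go in server.R: how outputs are computed from inputs
  server <- function(input, output) {
    # Reactive: re-runs whenever input$n changes
    output$hist <- shiny::renderPlot(
      hist(rnorm(input$n), main = "Random normal draws")
    )
  }
  shiny::shinyApp(ui = ui, server = server)
}

# shiny::runApp(make_hello_app())  # not run: starts a local web server
```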

Publish

locally

zip-file in cloud

GitHub (see runGitHub() )

your own server

shinyapps.io

Relational Databases

Structured database in tables

local or online

query language for I/O

efficient for large data

difficult to design

A good database

"Normal forms"

No duplicates

No redundancies

Easy to query

Easy to update

Can be difficult to design
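
The points above can be illustrated with plain data frames standing in for database tables. All data below is made up: a wide table that repeats each customer's city on every order is split into two tables, so that each fact is stored once.

```r
# Made-up data: one wide table that repeats the customer's city on every order
orders_wide <- data.frame(
  order_id = 1:4,
  customer = c("Anna", "Anna", "Bo", "Bo"),
  city     = c("Uppsala", "Uppsala", "Lund", "Lund"),
  amount   = c(100, 250, 80, 40),
  stringsAsFactors = FALSE
)

# No redundancies: each customer's city is stored once
customers <- unique(orders_wide[, c("customer", "city")])
orders    <- orders_wide[, c("order_id", "customer", "amount")]

# Easy to update: a move is changed in exactly one row
customers$city[customers$customer == "Bo"] <- "Malmo"

# Easy to query: join the tables back together when needed
joined <- merge(orders, customers, by = "customer")
```

In the wide table the same update would have to touch every order row for that customer, which is how inconsistencies creep in.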

Using R

Database system            R package
ODBC (Microsoft Access)    RODBC
PostgreSQL                 RPostgreSQL
Oracle                     ROracle
MySQL                      RMySQL
MongoDB                    rmongodb
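
Several of these packages (for example RMySQL and RPostgreSQL) follow the common DBI interface, so the basic workflow looks much the same across systems. The sketch below uses an in-memory SQLite database through RSQLite, which is an assumption made here so that no database server is needed; RSQLite is not in the table above, and `query_in_memory()` is a hypothetical helper name.

```r
# Sketch with DBI and an in-memory SQLite database. RSQLite is an assumption
# made here so that no server is needed; it is not in the lecture's table.
query_in_memory <- function(df, sql) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con))     # always close the connection
  DBI::dbWriteTable(con, "tbl", df)   # output: write a data frame to a table
  DBI::dbGetQuery(con, sql)           # input: run SQL, get a data frame back
}

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  print(query_in_memory(mtcars,
                        "SELECT cyl, COUNT(*) AS n FROM tbl GROUP BY cyl"))
}
```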

Advanced R - Lecture 5

By monsmagn

Lecture 5 in the course Advanced R programming at Linköping University.
