Advanced programming
Lecture 5


Måns Magnusson
Statistics and Machine learning
Department of computer and information science
Since last time?

Input and output


Advanced R Programming
Måns Magnusson
Input
Output
Format and localization
"Formats"

Input:
csv files
Excel files
documents
SAS, SPSS, R, ... files
html
json
xml
SQL
MongoDB
maps
pictures

Output:
data files (see input)
reports (dynamic)
update DB
html
graphs
"Shiny"
maps
pictures
More?
Localization

own computer
local network
local database
cloud storage
web pages
web scraping
web APIs
remote database
Basic I/O

Files on your computer
# Input simple data
read.table()
read.csv()
read.csv2()
load()
# Output simple data
write.table()
write.csv()
write.csv2()
save()
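
A minimal sketch of the functions above, round-tripping a small data frame through temporary files. The data frame `scores` and the file paths are made up for illustration.

```r
# Hypothetical example data (not from the lecture)
scores <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))

# CSV with "," as separator and "." as decimal mark
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# CSV with ";" as separator and "," as decimal mark
csv2_path <- tempfile(fileext = ".csv")
write.csv2(scores, csv2_path, row.names = FALSE)
from_csv2 <- read.csv2(csv2_path)

# save() stores R objects by name; load() restores them under the same name
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)
rm(scores)
load(rdata_path)  # `scores` exists again
```

read.table() and write.table() are the general versions; the csv variants only preset the separator and decimal mark.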
More complex formats

software/data | package |
---|---|
Excel | XLConnect |
SAS, SPSS, STATA, ... | foreign |
XML | XML |
JSON (GeoJSON) | RJSONIO, rjson |
Documents | tm |
Maps | sp |
Images | raster |
Cloud storage

Why?

Robust
Backups
Cloud computing
... but how about safety?
... and can be tricky in the beginning




Arbitrary data
Structured data
Packages

Remote | package |
---|---|
General | downloader |
GitHub | repmis, downloader |
Dropbox | rdrop2 |
Amazon | RAmazonS3 |
Google Docs | googlesheets |
Web APIs
using httr

application programming interface (API) over HTTP
"contract to 'get data' online"
more and more common
examples:
GitHub
Riksdagen
Statistics Sweden
RESTful APIs
RESTful

Basic principles:
Data is returned (JSON / XML)
Each specific data has its own URI
Communication is based on HTTP verbs
Hypertext Transfer Protocol (http)

Verbs

Verb | Description |
---|---|
GET | Get "data" from server. |
POST | Post "data" to server (to get something) |
PUT | Update "data" on server |
DELETE | Delete posted data on server |
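
As a rough sketch of the GET verb in practice, the helper below requests a URI with httr and parses a JSON body into R lists. The name `fetch_json` is hypothetical, the helper assumes the httr and jsonlite packages are installed, and the GitHub URL in the usage comment is only an illustration.

```r
# Sketch: GET a resource and parse its JSON body.
# `fetch_json` is a hypothetical helper, not part of httr.
fetch_json <- function(url, query = NULL) {
  resp <- httr::GET(url, query = query)   # verb GET: ask the server for data
  if (httr::http_error(resp)) {           # TRUE for 4xx and 5xx status codes
    stop("Request failed with status ", httr::status_code(resp))
  }
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body, simplifyVector = FALSE)  # JSON -> nested R lists
}

# Usage (needs network access, so not run here):
# repo <- fetch_json("https://api.github.com/repos/rstudio/shiny")
# repo$full_name
```

httr has matching functions for the other verbs: POST(), PUT() and DELETE().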
Status codes

Code | Description |
---|---|
1xx | Information from server |
2xx | Yay! Gimme' data! |
3xx | Redirections |
4xx | You failed |
5xx | Server failed |
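
The first digit decides the class of a status code. A small base R sketch of that rule, using a hypothetical helper `status_class` that is not part of any package:

```r
# Map an HTTP status code to its class, following the table above.
status_class <- function(code) {
  stopifnot(is.numeric(code), length(code) == 1, code >= 100, code < 600)
  switch(as.character(code %/% 100),
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",
    "5" = "server error"
  )
}

status_class(200)  # "success"
status_class(404)  # "client error"
```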

Common API formats

JavaScript Object Notation (JSON)
Think of named lists in R
R Packages: RJSONIO, jsonlite
Extensible Markup Language (XML)
Older format (using nodes)
xpath
R Packages: XML
JSON

{
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
},
"phoneNumber": [
{ "type": "home", "number": "212 555-1234" },
{ "type": "fax", "number": "646 555-4567" }
],
"newSubscription": false,
"companyName": null
}
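
To see the "named lists" idea, a shortened version of the document above can be parsed with jsonlite (assumed to be installed). The variable names are made up for illustration.

```r
library(jsonlite)

# A shortened version of the JSON document above
json_text <- '{
  "firstName": "John",
  "age": 25,
  "address": { "city": "New York", "state": "NY" },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax",  "number": "646 555-4567" }
  ],
  "newSubscription": false
}'

# simplifyVector = FALSE keeps everything as (nested) lists
person <- fromJSON(json_text, simplifyVector = FALSE)

person$firstName                 # JSON string  -> character
person$address$city              # JSON object  -> named list
person$phoneNumber[[2]]$number   # JSON array   -> unnamed list
person$newSubscription           # JSON boolean -> logical
```

With the default `simplifyVector = TRUE`, jsonlite would instead turn the `phoneNumber` array into a data frame.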
XML

<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
<projects>
<project name="Wikipedia" launch="2001-01-05">
<editions>
<edition language="English">en.wikipedia.org</edition>
<edition language="German">de.wikipedia.org</edition>
<edition language="French">fr.wikipedia.org</edition>
<edition language="Polish">pl.wikipedia.org</edition>
<edition language="Spanish">es.wikipedia.org</edition>
</editions>
</project>
<project name="Wiktionary" launch="2002-12-12">
<editions>
<edition language="English">en.wiktionary.org</edition>
<edition language="French">fr.wiktionary.org</edition>
<edition language="Vietnamese">vi.wiktionary.org</edition>
<edition language="Turkish">tr.wiktionary.org</edition>
<edition language="Spanish">es.wiktionary.org</edition>
</editions>
</project>
</projects>
</wikimedia>
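
A short sketch of node selection with xpath, using the XML package (assumed to be installed) on a shortened version of the document above. The variable names are made up for illustration.

```r
library(XML)

# A shortened version of the XML document above
xml_text <- '<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>'

doc <- xmlParse(xml_text, asText = TRUE)

# Attribute values of all <project> nodes
project_names <- xpathSApply(doc, "//project", xmlGetAttr, "name")

# Text of every English <edition>, in any project
english <- xpathSApply(doc, "//edition[@language='English']", xmlValue)
```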
Web scraping
using rvest

Unstructured data fetched over HTTP(S)
Often HTML format
Spiders / scraping / web crawlers
Basics behind search engines
HTML

<!DOCTYPE html>
<html>
<head>
<title>This is a title</title>
</head>
<body>
<p>Hello world!</p>
</body>
</html>
(har)rvest

Simplify spider activity
Download data
Parse data

Follow links
Fill out forms
Store crawling history
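
A minimal sketch of the parse step, assuming rvest version 1.0.0 or later is installed. It parses the small HTML page shown earlier from a string; when scraping, `read_html()` would normally be given a URL instead. The variable names are made up for illustration.

```r
library(rvest)

# The small HTML example page, as a string
html_source <- "<!DOCTYPE html>
<html>
<head>
<title>This is a title</title>
</head>
<body>
<p>Hello world!</p>
</body>
</html>"

page <- read_html(html_source)

# Select nodes with CSS selectors, then extract their text
page_title <- html_text(html_element(page, "title"))
paragraphs <- html_text(html_elements(page, "p"))
```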


Scraping is fragile
Difficulties and bad spiders
www.domain.se/robots.txt
robot traps
javascript
delays
Politeness
Shiny

Shiny?

Interactive dashboards made easy

online or local
R as "backend"
How it works

Application
Reactive
MyAppName/server.R
MyAppName/ui.R
modify using HTML
the directory containing server.R is the app's working directory
How it works

library(shiny)
# Examples with code
runExample("01_hello")
runExample("03_reactivity")
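
A minimal sketch of what the two files might contain, written here as one script. The input id `n`, the output id `hist` and the texts are made up for illustration.

```r
library(shiny)

# Contents that would go in MyAppName/ui.R: the page layout
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Contents that would go in MyAppName/server.R: the reactive logic
server <- function(input, output) {
  # Re-runs automatically whenever input$n changes
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste(input$n, "random normal draws"))
  })
}

# Single-file equivalent of the two-file layout; runApp(app) starts it
app <- shinyApp(ui = ui, server = server)
```

With the two-file layout, `runApp("MyAppName")` starts the same application from its directory.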
Publish

locally
zip-file in cloud
GitHub (see runGitHub())
your own server
shinyapps.io
Relational Databases

Structured database organized in tables
local or online
query language for I/O
effective for big data
difficult to design


A good database

"Normal forms"
No duplicates
No redundancies
Easy to query
Easy to update
Can be difficult to design
Using R

Database system | R package |
---|---|
ODBC (Microsoft Access) | RODBC |
PostgreSQL | RPostgreSQL |
Oracle | ROracle |
MySQL | RMySQL |
MongoDB | rmongodb |
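
A small sketch of querying a relational database from R. It uses DBI with an in-memory SQLite database through RSQLite, which is an assumption made so the example runs without a server and is not one of the packages in the table above. The tables and their contents are made up.

```r
library(DBI)

# In-memory SQLite database (swapped in for illustration)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two normalized tables: each customer is stored once and referenced by id
customers <- data.frame(id = 1:2, name = c("Anna", "Bertil"),
                        stringsAsFactors = FALSE)
orders <- data.frame(id = 1:3, customer_id = c(1L, 1L, 2L),
                     amount = c(100, 50, 70))
dbWriteTable(con, "customers", customers)
dbWriteTable(con, "orders", orders)

# Query with SQL: total order amount per customer
totals <- dbGetQuery(con, "
  SELECT c.name, SUM(o.amount) AS total
  FROM customers AS c
  JOIN orders AS o ON o.customer_id = c.id
  GROUP BY c.name
  ORDER BY c.name
")

dbDisconnect(con)
```

The same DBI calls work with other backends by changing the driver passed to `dbConnect()`.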
Advanced R - Lecture 5
By monsmagn
Lecture 5 in the course Advanced R programming at Linköping University.