Måns Magnusson
Statistics and Machine Learning
Department of Computer and Information Science
Advanced R Programming
Input
Output
Format and localization
csv files
excel files
documents
sas, spss, R, ... files
html
json
xml
SQL
MongoDB
data files (see input)
reports (dynamic)
update DB
html
graphs
"Shiny"
maps
maps
More?
pictures
pictures
own computer
local network
local database
cloud storage
web pages
web scraping
web APIs
remote database
Files on your computer
# Input simple data
read.table()
read.csv()
read.csv2()
load()
# Output simple data
write.table()
write.csv()
write.csv2()
save()
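The functions above can be tried out with a small round trip. This is a minimal sketch using only base R; the data frame `scores` and the temporary file paths are made up for illustration.

```r
# A tiny, hypothetical data set to write and read back.
scores <- data.frame(
  name  = c("Ada", "Grace", "Linus"),
  score = c(9.5, 8.0, 7.25),
  stringsAsFactors = FALSE
)

# Text round trip: write.csv() followed by read.csv().
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_back <- read.csv(csv_path, stringsAsFactors = FALSE)

# Binary round trip: save() stores the R object under its own name,
# and load() restores it into the chosen environment.
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)
restored <- new.env()
load(rdata_path, envir = restored)
```

`read.csv2()` and `write.csv2()` work the same way but use a semicolon as separator and a comma as decimal mark, which matches many European locales.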
software/data | package |
---|---|
Excel | XLConnect |
SAS, SPSS, STATA, ... | foreign |
XML | XML |
JSON (GeoJSON) | RJSONIO, rjson |
Documents | tm |
Maps | sp |
Images | raster |
Robust
Backups
Cloud computing
... but how about safety?
... and can be tricky in the beginning
Arbitrary data
Structured data
Remote | package |
---|---|
General | downloader |
GitHub | repmis, downloader |
Dropbox | rdrop2 |
Amazon | RAmazonS3 |
Google Docs | googlesheets |
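For a plain file behind a URL, base R's `download.file()` is often enough before reaching for one of the packages above. The sketch below is an assumption-laden illustration: `fetch_remote_csv()` is a hypothetical helper, and the demonstration uses a `file://` URL (with a Unix-style path) so that it runs without network access.

```r
# Hypothetical helper: download a CSV from a URL and read it into a data frame.
fetch_remote_csv <- function(url, dest = tempfile(fileext = ".csv")) {
  status <- download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed for ", url)
  read.csv(dest, stringsAsFactors = FALSE)
}

# Offline demonstration: a local file addressed through a file:// URL.
local_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), local_csv,
          row.names = FALSE)
demo <- fetch_remote_csv(paste0("file://", local_csv))
```

In real use the argument would be an `http://` or `https://` address; services such as Dropbox or Amazon S3 usually need the authentication handled by their dedicated packages.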
application programming interface (API) over HTTP
"contract to 'get data' online"
more and more common
examples:
github
Riksdagen
Statistics Sweden
RESTful APIs
Basic principles:
Data is returned as JSON or XML
Each resource (piece of data) has its own URI
Communication is based on HTTP verbs
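The principles above can be sketched in a few lines of R. Everything named here is an assumption for illustration: `build_uri()` and `fetch_json()` are hypothetical helpers, `api.example.org` is a placeholder host, and the packages httr and jsonlite are one possible choice for the HTTP call and the JSON parsing.

```r
# Hypothetical helper: each resource gets its own URI, built from a base
# address, path segments and optional query parameters.
build_uri <- function(base, ..., query = list()) {
  path <- paste(c(...), collapse = "/")
  uri <- paste(sub("/+$", "", base), path, sep = "/")
  if (length(query) > 0) {
    values <- vapply(
      query,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
    uri <- paste0(uri, "?", paste(paste0(names(query), "=", values),
                                  collapse = "&"))
  }
  uri
}

# Hypothetical helper: GET the resource and parse the JSON body.
# Defined but not called here, since it needs a live server.
fetch_json <- function(uri) {
  resp <- httr::GET(uri)
  httr::stop_for_status(resp)  # turn 4xx/5xx responses into R errors
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

uri <- build_uri("https://api.example.org/", "datasets", "population",
                 query = list(year = 2015, format = "json"))
```

Keeping the URI construction separate from the request makes it easy to inspect exactly what will be asked of the server before any call is made.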
Verb | Description |
---|---|
GET | Get "data" from server. |
POST | Post "data" to server (to get something) |
PUT | Update "data" on server |
DELETE | Delete posted data on server |
Code | Description |
---|---|
1xx | Information from server |
2xx | Yay! Gimme' data! |
3xx | Redirections |
4xx | You failed |
5xx | Server failed |
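Since the class of a status code is given by its first digit, client code can branch on it with integer division. A minimal base R sketch; `describe_status()` is a hypothetical helper name, and the labels are the conventional names for the five classes.

```r
# Map HTTP status codes to their class using the leading digit.
describe_status <- function(code) {
  classes <- c(
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",
    "5" = "server error"
  )
  key <- as.character(code %/% 100)
  out <- unname(classes[key])
  out[is.na(out)] <- "unknown"  # anything outside 1xx to 5xx
  out
}

status_classes <- describe_status(c(200, 404, 503))
```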
JavaScript Object Notation (JSON)
Think of named lists in R
R Packages: RJSONIO, jsonlite
Extensible Markup Language (XML)
Older format (using nodes)
xpath
R Packages: XML
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ],
  "newSubscription": false,
  "companyName": null
}
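To see why JSON maps naturally onto named lists, a shortened version of the record above can be parsed in R. This sketch assumes the jsonlite package is installed; `simplifyVector = FALSE` is used so that the result stays a plain nested list rather than being simplified into vectors or data frames.

```r
library(jsonlite)

# A shortened copy of the example record.
json_text <- '{
  "firstName": "John",
  "age": 25,
  "address": { "city": "New York", "state": "NY" },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ],
  "newSubscription": false,
  "companyName": null
}'

person <- fromJSON(json_text, simplifyVector = FALSE)

# JSON objects become named lists, JSON arrays become unnamed lists.
first_name <- person$firstName
city       <- person$address$city
fax_number <- person$phoneNumber[[2]]$number
```

With the default `simplifyVector = TRUE`, the `phoneNumber` array would instead come back as a data frame with columns `type` and `number`, which is often more convenient for analysis.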
<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
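Nodes and attributes in a document like this are typically picked out with XPath. The sketch below assumes the XML package is installed and works on a shortened copy of the document, passed as a string.

```r
library(XML)

# A shortened copy of the example document.
xml_text <- '<wikimedia><projects>
  <project name="Wikipedia" launch="2001-01-05"><editions>
    <edition language="English">en.wikipedia.org</edition>
    <edition language="German">de.wikipedia.org</edition>
  </editions></project>
  <project name="Wiktionary" launch="2002-12-12"><editions>
    <edition language="English">en.wiktionary.org</edition>
  </editions></project>
</projects></wikimedia>'

doc <- xmlParse(xml_text, asText = TRUE)

# XPath: all <edition> nodes whose language attribute is "English".
english <- xpathSApply(doc, "//edition[@language='English']", xmlValue)

# XPath: the name attribute of every <project> node.
project_names <- xpathSApply(doc, "//project", xmlGetAttr, "name")
```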
Unstructured data served over HTTP(S)
Often HTML format
Spiders / scraping / web crawlers
Basics behind search engines
<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>
Simplify spider activity
Download data
Parse data
Follow links
Fill out forms
Store crawling history
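The "parse data" and "follow links" steps can be illustrated without touching the network. This sketch assumes the XML package is installed; the page is given as a string, and the `<a>` link in it is a made-up addition so that there is something to follow.

```r
library(XML)

# A small page as a string; the link is hypothetical, added for illustration.
html_text <- '<!DOCTYPE html>
<html>
  <head><title>This is a title</title></head>
  <body>
    <p>Hello world!</p>
    <a href="/about.html">About</a>
  </body>
</html>'

page <- htmlParse(html_text, asText = TRUE)

# Parse data: pull out text by tag.
page_title <- xpathSApply(page, "//title", xmlValue)
paragraphs <- xpathSApply(page, "//p", xmlValue)

# Follow links: collect the href attributes a spider would visit next.
links <- xpathSApply(page, "//a", xmlGetAttr, "href")
```

A real spider would resolve relative links against the page's own address and keep a record of visited URLs so that no page is fetched twice.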
Scraping is fragile
www.domain.se/robots.txt
robot traps
javascript
delays
Politeness
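Delays and politeness can be built into the fetching loop itself. In this sketch, `polite_fetch()` is a hypothetical helper: it pauses between requests and records a failure as `NA` instead of aborting, which also helps with fragile pages. The download function is passed in as an argument, so the example can run offline with a stub.

```r
# Hypothetical helper: fetch URLs one by one with a pause between requests.
polite_fetch <- function(urls, fetch, delay = 1) {
  results <- vector("list", length(urls))
  names(results) <- urls
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay)  # wait before every request except the first
    results[[i]] <- tryCatch(
      fetch(urls[[i]]),
      error = function(e) NA_character_  # a broken page must not stop the crawl
    )
  }
  results
}

# Offline stub standing in for a real download function.
fake_fetch <- function(url) {
  if (grepl("bad", url)) stop("page not available")
  paste("content of", url)
}

pages <- polite_fetch(c("page1", "bad-page", "page2"), fake_fetch, delay = 0)
```

In real use `fetch` would be a function that downloads a page, and `delay` would be at least a second or whatever the site's robots.txt requests.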
Interactive dashboards made easy
online or local
R as "backend"
Application
Reactive
MyAppName/server.R
MyAppName/ui.R
modify the UI using HTML
server.R defines the working directory
library(shiny)
# Examples with code
runExample("01_hello")
runExample("03_reactivity")
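Besides the bundled examples, a complete app fits in a few lines. The sketch below is a single-file variant, an alternative to the separate `ui.R` and `server.R` files; the input id `bins` and output id `histogram` are arbitrary names chosen for illustration, and `faithful` is a data set that ships with R.

```r
library(shiny)

# User interface: one slider and one plot.
ui <- fluidPage(
  titlePanel("Hello Shiny"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("histogram")
)

# Server logic: the plot is reactive and is redrawn whenever input$bins changes.
server <- function(input, output) {
  output$histogram <- renderPlot({
    hist(faithful$waiting, breaks = input$bins,
         main = "Waiting times", xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)

# Start the app only in an interactive session.
if (interactive()) runApp(app)
```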
locally
zip-file in cloud
github (see runGitHub() )
your own server
shinyapps.io
Structured database in tables
local or online
query language for I/O
effective for big data
difficult to design
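Using a query language for input and output can be sketched with an in-memory database. This assumes the DBI and RSQLite packages are installed; SQLite is chosen here only because it needs no server, and the table name `cars` is arbitrary.

```r
library(DBI)

# An in-memory SQLite database: nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Output: write a data frame to a table.
dbWriteTable(con, "cars", mtcars[, c("mpg", "cyl", "hp")])

# Input: let the database do the aggregation and return only the result.
by_cyl <- dbGetQuery(con, "
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
  FROM cars
  GROUP BY cyl
  ORDER BY cyl
")

dbDisconnect(con)
```

Because the aggregation runs inside the database, only the small summary table travels back to R, which is what makes this approach attractive for large data.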
"Normal forms"
No duplicates
No redundancies
Easy to query
Easy to update
Can be difficult to design
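A small example shows what the points above buy in practice. This sketch assumes the DBI and RSQLite packages are installed; the tables `customers` and `orders` and their contents are invented for illustration. Each customer's city is stored once, so an update touches a single row and a join brings the pieces back together.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Customer details live in one table, with one row per customer.
dbWriteTable(con, "customers", data.frame(
  customer_id = 1:2,
  city = c("Uppsala", "Lund"),
  stringsAsFactors = FALSE
))

# Orders refer to customers by id instead of repeating the city.
dbWriteTable(con, "orders", data.frame(
  order_id = 1:3,
  customer_id = c(1L, 1L, 2L),
  amount = c(100, 250, 80)
))

# Easy to update: the city changes in exactly one place.
dbExecute(con, "UPDATE customers SET city = 'Kalmar' WHERE customer_id = 1")

# Easy to query: a join combines the two tables again.
joined <- dbGetQuery(con, "
  SELECT o.order_id, c.city, o.amount
  FROM orders AS o
  JOIN customers AS c ON o.customer_id = c.customer_id
  ORDER BY o.order_id
")

dbDisconnect(con)
```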
Database system | R package |
---|---|
ODBC (Microsoft Access) | RODBC |
PostgreSQL | RPostgreSQL |
Oracle | ROracle |
MySQL | RMySQL |
MongoDB | rmongodb |