Advanced R programming
Lecture 5
Måns Magnusson
Statistics and Machine learning
Department of computer and information science
Since last time?
Input and output
Advanced R Programming
Måns Magnusson
Input
Output
Format and localization
"Formats"
Input:
csv files
excel files
documents
sas, spss, R, ... files
html
json
xml
SQL
MongoDB
Output:
data files (see input)
reports (dynamic)
update DB
html
graphs
"Shiny"
maps
pictures
More?
Localization
own computer
local network
local database
cloud storage
web pages
web scraping
web APIs
remote database
Basic I/O
Files on your computer
# Input simple data
read.table()
read.csv()
read.csv2()
load()
# Output simple data
write.table()
write.csv()
write.csv2()
save()
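The functions listed above can be tried out with a small round trip. This is a minimal sketch using only base R; the data frame and the temporary file paths are made up for illustration.

```r
# Write a small data frame to a temporary csv file and read it back.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# save()/load() store R objects in R's own binary format.
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)
rm(df)
load(rda_path)  # restores the object under its original name, `df`

# Compare the csv round trip with the restored object.
all.equal(df, df_csv)
```

read.csv2() and write.csv2() work the same way but use a semicolon as field separator and a comma as decimal mark, which matches the csv files produced by spreadsheet software in many European locales.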
More complex formats
Software/data | Package |
---|---|
Excel | XLConnect |
SAS, SPSS, Stata, ... | foreign |
XML | XML |
JSON (GeoJSON) | RJSONIO, rjson |
Documents | tm |
Maps | sp |
Images | raster |
Cloud storage
Why?
Robust
Backups
Cloud computing
... but how about safety?
... and can be tricky in the beginning
Arbitrary data
Structured data
Packages
Remote | package |
---|---|
General | downloader |
GitHub | repmis, downloader |
Dropbox | rdrop2 |
Amazon | RAmazonS3 |
Google Docs | googlesheets |
Web APIs
using httr
application programming interface (API) accessed over HTTP
"contract to 'get data' online"
more and more common
examples:
GitHub
Riksdagen
Statistics Sweden
RESTful APIs
Basic principles:
Data is returned (JSON / XML)
Each resource (piece of data) has its own URI
Communication is based on HTTP verbs
Hypertext Transfer Protocol (HTTP)
Verbs
Verb | Description |
---|---|
GET | Get "data" from server. |
POST | Post "data" to server (to get something) |
PUT | Update "data" on server |
DELETE | Delete posted data on server |
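In httr each verb in the table has a function of the same name (GET(), POST(), PUT(), DELETE()). The sketch below shows the GET case only. The GitHub repository path and the helper name `get_resource` are assumptions chosen for illustration, and the request itself is left commented out because it needs network access.

```r
library(httr)

# Each resource has its own URI: build one from a base URL and a path.
# The repository path is only an example.
repo_url <- modify_url("https://api.github.com", path = "repos/rstudio/shiny")

# GET a resource and parse the JSON body into an R list.
# Hypothetical helper, not part of httr.
get_resource <- function(url) {
  response <- GET(url)
  stop_for_status(response)  # turn 4xx/5xx responses into R errors
  content(response, as = "parsed", type = "application/json")
}

# Interactive use (requires a network connection):
# repo <- get_resource(repo_url)
# repo$full_name
```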
Status codes
Code | Description |
---|---|
1xx | Information from server |
2xx | Yay! Gimme' data! |
3xx | Redirections |
4xx | You failed |
5xx | Server failed |
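The first digit of the status code decides which row of the table applies. A minimal base R sketch of that lookup follows; the function name `status_class` and the class labels are made up for illustration.

```r
# Map an HTTP status code to the class it belongs to (1xx to 5xx).
status_class <- function(code) {
  stopifnot(is.numeric(code), code >= 100, code < 600)
  classes <- c("information", "success", "redirection",
               "client error", "server error")
  classes[code %/% 100]  # integer division picks out the first digit
}

status_class(200)
status_class(404)
status_class(503)
```

With httr, the code of a response can be read with status_code(), and stop_for_status() raises an R error for the 4xx and 5xx classes.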
Common API formats
JavaScript Object Notation (JSON)
Think of named lists in R
R Packages: RJSONIO, jsonlite
Extensible Markup Language (XML)
Older format (using nodes)
xpath
R Packages: XML
JSON
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ],
  "newSubscription": false,
  "companyName": null
}
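To see the "named lists" analogy in practice, a shortened version of the JSON document above can be parsed in R. This sketch assumes the jsonlite package is installed; `simplifyVector = FALSE` is used so that JSON objects stay as named lists and JSON arrays stay as unnamed lists.

```r
library(jsonlite)

# A shortened copy of the JSON document, kept as a string.
json_string <- '{
  "firstName": "John",
  "age": 25,
  "address": { "city": "New York", "state": "NY" },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax",  "number": "646 555-4567" }
  ],
  "newSubscription": false
}'

person <- fromJSON(json_string, simplifyVector = FALSE)

# JSON objects are accessed by name, JSON arrays by position.
person$firstName
person$address$city
person$phoneNumber[[2]]$number
```

Without `simplifyVector = FALSE`, jsonlite tries to simplify arrays of similar objects, such as `phoneNumber`, into a data frame, which is often more convenient for analysis.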
XML
<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
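Nodes in a document like this one are selected with XPath expressions. The sketch below uses the xml2 package rather than the XML package named earlier; that choice is an assumption made for brevity, and the XML package offers equivalent XPath functions. The document is shortened to a few editions.

```r
library(xml2)

# A shortened copy of the XML document, kept as a string.
wiki_xml <- '<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>'

doc <- read_xml(wiki_xml)

# XPath: all <edition> nodes whose language attribute is "English".
english_nodes <- xml_find_all(doc, "//edition[@language='English']")
english_sites <- xml_text(english_nodes)

# Attributes are read with xml_attr().
project_names <- xml_attr(xml_find_all(doc, "//project"), "name")
```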
web scraping
using rvest
Unstructured data fetched over HTTP(S)
Often HTML format
Spiders / scraping / web crawlers
Basics behind search engines
HTML
<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>
(har)rvest
Simplify spider activity
Download data
Parse data
Follow links
Fill out forms
Store crawling history
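The "parse data" and "follow links" steps can be sketched on a page held in a string, so no download is needed. This assumes rvest 1.0.0 or later (for html_element() and html_elements()); the link to /next.html is a made-up addition to the HTML example so that there is something to follow.

```r
library(rvest)

# A small page kept as a string; a real spider would pass a URL
# to read_html() instead.
page_source <- '<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
    <a href="/next.html">Next page</a>
  </body>
</html>'

page <- read_html(page_source)

# Parse data: select nodes with CSS selectors and extract their text.
page_title <- html_text(html_element(page, "title"))
paragraphs <- html_text(html_elements(page, "p"))

# Follow links: collect the href attributes to visit next.
links <- html_attr(html_elements(page, "a"), "href")
```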
Scraping is fragile
Difficulties and bad spiders
www.domain.se/robots.txt
robot traps
JavaScript
delays
Politeness
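A polite spider checks robots.txt before fetching a page and pauses between requests. The following base R sketch illustrates both ideas under simplifying assumptions: the robots.txt content and the paths are made up, per-agent sections and Allow rules are ignored, and the actual download is left out.

```r
# Hypothetical robots.txt content; a real spider would read it from
# the site root before crawling.
robots_txt <- c(
  "User-agent: *",
  "Disallow: /private/",
  "Disallow: /tmp/"
)

# Collect the disallowed path prefixes, dropping empty rules.
disallowed <- sub("^Disallow:\\s*", "",
                  grep("^Disallow:", robots_txt, value = TRUE))
disallowed <- disallowed[nzchar(disallowed)]

# A path is allowed if it starts with none of the disallowed prefixes.
is_allowed <- function(path) {
  !any(startsWith(path, disallowed))
}

paths <- c("/index.html", "/private/data.html")
visited <- character(0)
for (p in paths) {
  if (is_allowed(p)) {
    visited <- c(visited, p)  # the page would be downloaded here
    Sys.sleep(0.1)            # delay between requests; use more on real sites
  }
}
```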
Shiny
Shiny?
Interactive dashboards made easy
online or local
R as "backend"
How it works
Application
Reactive
MyAppName/server.R
MyAppName/ui.R
modify using HTML
server.R defines the working directory
library(shiny)
# Examples with code
runExample("01_hello")
runExample("03_reactivity")
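Besides the bundled examples, a complete app is small enough to write by hand. The sketch below puts the UI and the server function in one script instead of the separate ui.R and server.R files; the input and output ids (`n`, `hist`) are made up. The app object is only created here, and runApp() is left commented out because it blocks the R session until the app is closed.

```r
library(shiny)

# UI: what the user sees (corresponds to ui.R).
ui <- fluidPage(
  titlePanel("Hello Shiny"),
  sliderInput("n", "Number of observations:",
              min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: R as "backend" (corresponds to server.R).
# renderPlot() is reactive: it reruns whenever input$n changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste(input$n, "random draws"))
  })
}

app <- shinyApp(ui = ui, server = server)

# Start the app in an interactive session:
# runApp(app)
```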
Publish
locally
zip-file in cloud
GitHub (see runGitHub())
your own server
shinyapps.io
Relational Databases
Structured database organized in tables
local or online
query language for I/O
effective for big data
difficult to design
A good database
"Normal forms"
No duplicates
No redundancies
Easy to query
Easy to update
Can be difficult to design
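The idea behind "no redundancies" can be illustrated with two base R data frames standing in for database tables. All table names, column names and values below are made up. Each customer's city is stored once, and the combined view is produced by a join when needed.

```r
# One row per customer: the city is stored in exactly one place.
customers <- data.frame(
  customer_id = c(1, 2),
  name = c("Anna", "Bo"),
  city = c("Lund", "Uppsala"),
  stringsAsFactors = FALSE
)

# One row per order: refers to the customer by key only.
orders <- data.frame(
  order_id = c(101, 102, 103),
  customer_id = c(1, 1, 2),
  amount = c(250, 40, 99)
)

# Easy to query: join the tables on the shared key.
report <- merge(orders, customers, by = "customer_id")
total_anna <- sum(report$amount[report$name == "Anna"])

# Easy to update: a change of city touches a single row,
# no matter how many orders the customer has.
customers$city[customers$customer_id == 1] <- "Malmo"
```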
Using R
Database system | R package |
---|---|
ODBC (Microsoft Access) | RODBC |
PostgreSQL | RPostgreSQL |
Oracle | ROracle |
MySQL | RMySQL |
MongoDB | rmongodb |
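The packages above all need a running database server. To try the query workflow without one, the sketch below uses an in-memory SQLite database through the DBI and RSQLite packages; both are assumptions that do not appear in the table, chosen because SQLite needs no server. The table name `cars` is made up, and the data comes from R's built-in mtcars data set.

```r
library(DBI)

# Connect to a temporary in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Output: write a data frame to a database table.
dbWriteTable(con, "cars", mtcars[, c("mpg", "cyl", "hp")])

# Input: let the database do the aggregation with SQL.
res <- dbGetQuery(con, "
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
  FROM cars
  GROUP BY cyl
  ORDER BY cyl
")

dbDisconnect(con)
res
```

Several of the server-based packages, such as RPostgreSQL and RMySQL, also implement the DBI interface, so for those mostly the dbConnect() call would differ.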
Advanced R - Lecture 5
By monsmagn
Lecture 5 in the course Advanced R programming at Linköping University.