Data Scraping & Visualization on Facebook Reactions
using AWS Lambda
Vincent from Hong Kong 🇭🇰 (you can hear it even if I don't say so)
In case you can't follow my accent, I've prepared subtitles 😅
(@F8, May, 2016)
蘋果日報 東森新聞 ETtoday新聞雲 自由時報 快點報報 The News Lens 關鍵評論網 udn.com 聯合新聞網
蘋果日報 South China Morning Post 香港獨立媒體網
熱血時報 明報即時新聞 信報財經新聞
香港經濟日報 am730 香港01 頭條日報 852郵報 都市日報
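By traffic, it is basically all Apple Daily (蘋果日報)
Taiwan
Hong Kong
Is the mood in Taiwan better than in Hong Kong?
Or do Taiwanese people just not use Reactions much?
Side note: his annual salary is NT$17 million
FB Taiwan is working its magic in Hong Kong too...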
Challenges
FB API as bottleneck
Even just scraping every page's latest posts needs
a separate /reactions request per Post ID
No /_changes feed or time cursor in the FB API
Effective scraping: poll old posts less often
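One way to read "less for old posts" is a polling interval that grows with post age. The sketch below is only an illustration of that idea: the age thresholds, the intervals, and the names `SCHEDULE`, `poll_interval` and `is_due` are all assumptions, not values from the talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule: (maximum post age, polling interval).
# All values are illustrative assumptions.
SCHEDULE = [
    (timedelta(hours=1), timedelta(minutes=1)),
    (timedelta(hours=6), timedelta(minutes=10)),
    (timedelta(days=1), timedelta(hours=1)),
]
FALLBACK_INTERVAL = timedelta(hours=6)  # assumed interval for anything older


def poll_interval(age):
    """Return how long to wait between /reactions requests for a post of this age."""
    for max_age, interval in SCHEDULE:
        if age <= max_age:
            return interval
    return FALLBACK_INTERVAL


def is_due(created_at, last_polled_at, now):
    """True when the post has waited at least its age-dependent interval."""
    return now - last_polled_at >= poll_interval(now - created_at)


if __name__ == "__main__":
    now = datetime(2016, 6, 1, 12, 0, tzinfo=timezone.utc)
    fresh_post = now - timedelta(minutes=20)
    old_post = now - timedelta(days=3)
    last_polled = now - timedelta(minutes=5)
    print(is_due(fresh_post, last_polled, now))  # fresh post, polled 5 min ago
    print(is_due(old_post, last_polled, now))    # old post, polled 5 min ago
```

With a schedule like this, most of the request budget goes to fresh posts, where reaction counts are still changing quickly.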
[Pipeline diagram. Labels: /page, Post Ids, /reactions, Aggregated Reactions by Post, count_store, Redis, analyze, analyzed result, API, Text]
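The "Aggregated Reactions by Post" step can be sketched as a per-post tally of reaction types. This is a guess at the shape of that step, not the talk's code: the input format (a list of dicts with a `type` key) is assumed, and a plain dict stands in for the Redis-backed count store.

```python
from collections import Counter


def aggregate_reactions(post_id, reactions, count_store):
    """Tally reaction types for one post and save the counts under its Post ID.

    `reactions` is assumed to be a list like [{"type": "LIKE"}, {"type": "ANGRY"}].
    `count_store` is a plain dict here, a stand-in for the real store.
    """
    counts = Counter(r["type"] for r in reactions)
    count_store[post_id] = dict(counts)
    return count_store[post_id]


def totals(count_store):
    """Sum the per-post counts into one overall tally, e.g. per page or region."""
    total = Counter()
    for counts in count_store.values():
        total.update(counts)
    return dict(total)


if __name__ == "__main__":
    store = {}
    aggregate_reactions("post_1", [{"type": "LIKE"}, {"type": "ANGRY"}, {"type": "LIKE"}], store)
    aggregate_reactions("post_2", [{"type": "HAHA"}, {"type": "LIKE"}], store)
    print(store)
    print(totals(store))
```

Overwriting the counts per Post ID keeps the step idempotent, so re-scraping a post replaces its tally instead of double counting.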
with AWS Lambda
modularized
Logic
Execution
👍 Stateless
👍 Short Job
👍 Event-driven stuff, e.g. Time-based / webhook
👍 Spike Patterns of usage
👍 Simple programs with few dependencies
👍 Ad-hoc scripts that need to be modified and versioned
👍 Need a really quick setup, quicker than Kubernetes
🙅 Stateful e.g. Session servers
🙅 Long running (=>$$)
🙅 Cache is important -> can use external ElastiCache
🙅 High CPU/Memory Usage (Max 1.5GB per process)
🙅 Complex gcc / machine drivers etc
🙅 High-frequency self-triggering (scheduled events fire at most once per minute for now) -> Use SNS/Celery!
🙅 Need DAG job dependencies -> Use Airflow!
🙅 Strong fault-tolerance / monitoring requirements -> Use Celery / Gearman!
🙅 Layman stuff -> Use Zapier / IFTTT!
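For the good-fit cases above (stateless, short, event-driven), a Lambda function can be as small as the sketch below. The `handler(event, context)` signature is the standard Python entry point for Lambda. Everything else is an assumption for illustration: the `page_ids` key in the event, the `scrape_page` placeholder, and the response shape.

```python
import json
import time


def scrape_page(page_id):
    """Hypothetical placeholder for the real scraping logic."""
    return {"page_id": page_id, "scraped_at": int(time.time())}


def handler(event, context):
    """Stateless entry point: everything it needs arrives in `event`.

    Nothing is kept between invocations, so a timer or webhook can trigger
    any number of copies in parallel.
    """
    page_ids = event.get("page_ids", [])  # assumed event shape
    results = [scrape_page(page_id) for page_id in page_ids]
    return {"statusCode": 200, "body": json.dumps(results)}


if __name__ == "__main__":
    # Local smoke test; in Lambda the runtime supplies `event` and `context`.
    print(handler({"page_ids": ["page_a", "page_b"]}, None))
```

Because the handler holds no state, backfilling is just invoking it again with older inputs.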
Simple API, Slackbot etc..
Auth
Scraping
IoT
Point to point Data Sync among services (continuous / ad-hoc)
E.g. App, Logs -> Salesforce / ReportingDB
Marshalling / Aggregation for Analytics
Integrate with Amazon Kinesis
Simple Enough
Stateless => Stable
Easy to Scale up / Down
Easy to backfill
Limited Capacity Required
dumb API
(cost = memory tier × execution time;
roughly 3M calls × 1 s each < $20 USD)
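The cost estimate can be checked with a few lines of arithmetic. The unit prices below are assumptions (about $0.00001667 per GB-second and $0.20 per million requests, with no free tier applied), as are the memory tiers tried. The slide does not say which tier was used.

```python
# Assumed unit prices; check the current AWS pricing page before relying on them.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_cost_usd(calls, seconds_per_call, memory_mb):
    """Estimate cost as compute (GB-seconds) plus a per-request charge."""
    gb_seconds = calls * seconds_per_call * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = calls / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


if __name__ == "__main__":
    for memory_mb in (128, 256, 512):  # illustrative memory tiers
        cost = lambda_cost_usd(3_000_000, 1.0, memory_mb)
        print(f"{memory_mb} MB: ${cost:.2f}")
```

Under these assumptions, 3M one-second calls stay below $20 at the 128 MB and 256 MB tiers, which is consistent with the figure above.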
Basically easier to use than AWS Lambda... (Node.js only)
2D clustering and visualization
Tune for better API throughput
Real-time streaming
(and bug fixes)
Data Science for Recruitment
We're Hiring
JS Developer | Designer
Slides: bit.ly/fbreactionsio
Email: lauchunyin@gmail.com
X over-interpretation
X reason about detailed figures
Inspired by:
http://emojitracker.com/
simple rules but beautiful emergent behaviour
deal with boundaries and different sizes
Identify the relationship
(even when drunk)
especially at Data Viz...
Hard to transpose the table
Hack: overflow-x:scroll;
Even Better: blur to suggest scrollable
D3.js output is Scalable Vector Graphics (SVG)