Data Scraping & Visualization on Facebook Reactions using AWS Lambda
fbreactions.io
Analysis of Facebook reaction data from Hong Kong and Taiwan media
This is for you - Subtitles in English
Vincent from Hong Kong 🇭🇰 (you can tell without me saying so)
Worried you might not understand me, so I've prepared subtitles 😅
fbreactions.io - Scraping and Visualizing Data with AWS Lambda
1. What to build
2. Problem
3. AWS Lambda
4. Others
API Released
for FB Reactions
(@F8, May, 2016)
Scraping Facebook Pages of High-Traffic Media
Taiwan: 蘋果日報, 東森新聞, ETtoday新聞雲, 自由時報, 快點報報, The News Lens 關鍵評論網, udn.com 聯合新聞網
Hong Kong: 蘋果日報, South China Morning Post, 香港獨立媒體網, 熱血時報, 明報即時新聞, 信報財經新聞, 香港經濟日報, am730, 香港01, 頭條日報, 852郵報, 都市日報
By traffic, Apple Daily (蘋果日報) dominates in both
Current Affairs Mood Index: Hong Kong vs Taiwan
Honestly, I didn't have time to make slides
So let's go straight to the demo
Live DEMO
Findings
Taiwan has a better mood?
Taiwanese users don't use reactions much?
Theory 1
Seriously
Taiwan really is a good place!
Theory 2
Seriously
Hong Kong's Chief Executive is a Facebook spy
Side note: his annual salary is NT$17 million
Theory 2.1
Some media use reactions for polls
Facebook Taiwan is working its magic in Hong Kong too...
Theory 3
Too serious?
Time to reflect on our media?
Scraping
Challenges
The FB API is the bottleneck
Even when scraping only the latest posts of every page, /reactions needs a separate request per post ID
The FB API has no /_changes feed or time cursor
Efficient scraping: poll old posts less often
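The "poll old posts less often" idea can be sketched as an age-based refresh schedule: recent posts are re-fetched often, older ones rarely, and very old ones not at all. This is a minimal sketch under assumptions, not the author's code: the tier boundaries, the `refresh_interval` / `posts_due` helpers and the post dict fields are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiers: (maximum post age, how often to re-fetch its reactions).
REFRESH_TIERS = [
    (timedelta(hours=6), timedelta(minutes=10)),
    (timedelta(days=1), timedelta(hours=1)),
    (timedelta(days=7), timedelta(hours=12)),
]


def refresh_interval(post_age):
    """Return how often to re-fetch a post of this age, or None to stop polling it."""
    for max_age, interval in REFRESH_TIERS:
        if post_age <= max_age:
            return interval
    return None  # older than the last tier: reaction counts are considered settled


def posts_due(posts, now):
    """Return ids of posts whose /reactions should be requested on this run.

    posts: iterable of dicts with 'id', 'created_time' and 'last_fetched'
    (timezone-aware datetimes; 'last_fetched' is None if never fetched).
    """
    due = []
    for post in posts:
        interval = refresh_interval(now - post["created_time"])
        if interval is None:
            continue
        last = post["last_fetched"]
        if last is None or now - last >= interval:
            due.append(post["id"])
    return due
```

Only the ids returned by `posts_due` would trigger per-post /reactions requests, which keeps the request budget focused on posts whose counts are still changing.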
Architecture
[Architecture diagram; components: /page, Post Ids, /reactions, Aggregated Reactions by Post, count_store, Redis, analyze, analyzed result, API]
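A rough sketch of the count_store and analyze steps as two Python Lambda handlers. Assumptions, not taken from the slides: a plain dict stands in for Redis, the event shape and handler names are hypothetical, and the reaction types are the standard six (LIKE, LOVE, HAHA, WOW, SAD, ANGRY).

```python
from collections import Counter, defaultdict

REACTION_TYPES = ("LIKE", "LOVE", "HAHA", "WOW", "SAD", "ANGRY")

# Stand-in for Redis: "post:<id>" -> Counter of reaction types (assumption).
STORE = {}


def count_store_handler(event, context=None):
    """Hypothetical Lambda: aggregate raw reactions by post and store the counts.

    event: {"reactions": [{"post_id": ..., "type": ...}, ...]}
    """
    per_post = defaultdict(Counter)
    for reaction in event["reactions"]:
        per_post[reaction["post_id"]][reaction["type"]] += 1
    for post_id, counts in per_post.items():
        STORE[f"post:{post_id}"] = counts  # overwrite with the latest snapshot
    return {"posts_updated": len(per_post)}


def analyze_handler(event, context=None):
    """Hypothetical Lambda: share of each non-LIKE reaction across all stored posts."""
    totals = Counter()
    for counts in STORE.values():
        totals.update(counts)
    emotional = {t: totals[t] for t in REACTION_TYPES if t != "LIKE"}
    n = sum(emotional.values())
    return {t: (c / n if n else 0.0) for t, c in emotional.items()}
```

Splitting storing and analyzing into separate functions keeps each one stateless; all state lives in the external store, which is what lets both run on a schedule independently.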
Serverless Architecture with AWS Lambda
Background and Motivation
Why Serverless?
Modularized
Atomic unit of a running script
Logic + Execution
API Gateway
AWS Lambda
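API Gateway in front of AWS Lambda is the usual way to expose a function as an HTTP endpoint. Below is a minimal sketch of a Python handler for an API Gateway Lambda proxy integration, which passes the request in `event` and expects a dict with `statusCode`, `headers` and a string `body` back. The `region` query parameter and the payload are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Minimal handler for an API Gateway Lambda proxy integration."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    region = params.get("region", "hk")  # hypothetical parameter
    # Hypothetical payload: a real endpoint would read analyzed results from the store.
    payload = {"region": region, "status": "ok"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # proxy integration requires a string body
    }
```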
Features
- Python / Node.js / Java 8 runtimes
- 🕒 Cron-like execution (CloudWatch Events)
- Pay-as-you-go
- Can finally access resources inside a VPC
- Auto scaling
- Latency: so far so good
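Cron-like execution works by attaching a CloudWatch Events schedule rule (for example `rate(5 minutes)` or `cron(0/5 * * * ? *)`) to the function. A minimal sketch of the receiving handler, assuming the standard scheduled-event fields `source` (`aws.events`) and `time` (UTC, ISO 8601); the `PAGES` list and the task format are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical list of Facebook page ids to scrape on every tick.
PAGES = ["page_a", "page_b", "page_c"]


def scheduled_handler(event, context=None):
    """Entry point for a CloudWatch Events schedule rule.

    The schedule itself is configured on the rule, not in code. Scheduled
    events arrive with source 'aws.events' and a UTC timestamp in 'time'.
    """
    if event.get("source") != "aws.events":
        return {"skipped": True}
    fired_at = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    fired_at = fired_at.replace(tzinfo=timezone.utc)
    # One task per page; a real scraper would fetch or fan these out here.
    tasks = [{"page": page, "as_of": fired_at.isoformat()} for page in PAGES]
    return {"skipped": False, "tasks": tasks}
```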
When to use 💚
👍 Stateless
👍 Short jobs
👍 Event-driven work, e.g. time-based triggers or webhooks
👍 Spiky usage patterns
👍 Simple programs with few dependencies
👍 Ad-hoc scripts that you need to modify and version
👍 Need a really quick setup, quicker than Kubernetes
When NOT to use 💔
🙅 Stateful, e.g. session servers
🙅 Long-running (=> $$)
🙅 Cache is important -> use an external ElastiCache
🙅 High CPU/memory usage (max 1.5 GB per process)
🙅 Complex gcc builds / machine drivers, etc.
🙅 High-frequency self-triggering (scheduled events currently fire at most once per minute) -> Use SNS / Celery!
🙅 Need DAG job dependencies -> Use Airflow!
🙅 Strong fault-tolerance / monitoring requirements -> Use Celery / Gearman!
🙅 Layman stuff -> Use Zapier / IFTTT!
🙋 Use Cases
- Simple APIs, Slack bots, etc.
- Auth
- Scraping
- IoT
- CI: LambCI
- Point-to-point data sync among services (continuous / ad-hoc)
  - e.g. app, logs -> Salesforce / reporting DB
- Marshalling / aggregation for analytics
  - Integrate with Amazon Kinesis
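For the Kinesis use case, Lambda receives stream records in batches under `event["Records"]`, with each payload base64-encoded in `record["kinesis"]["data"]`. A minimal aggregation sketch, assuming JSON payloads with hypothetical `page` and `reaction` fields:

```python
import base64
import json
from collections import Counter


def kinesis_handler(event, context=None):
    """Count (page, reaction) pairs in one batch of Kinesis records."""
    counts = Counter()
    for record in event["Records"]:
        # Kinesis delivers the record payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Hypothetical record schema: {"page": ..., "reaction": ...}
        counts[(payload["page"], payload["reaction"])] += 1
    return {f"{page}:{reaction}": n for (page, reaction), n in counts.items()}
```

Each invocation only aggregates its own batch; merging batch results into running totals would be done in an external store.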
Yet some other pain points
- VPC setup: overhead of setting up proper VPC subnets
- Packaging: tedious to package code and dependencies
- Some libraries don't work: some modules fail at runtime
  - (Python) Libraries like pg8000 (Postgres driver) tested well on local/EC2 but failed on Lambda
  - Hard to reproduce the environment
  - Just give up and switch libraries
- Old-school CloudWatch Events and logging
- AWS UI 🤔
to the rescue 🙌
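The packaging pain comes from Lambda's .zip deployment format: for the Python runtime the handler module sits at the zip root with third-party packages alongside it (for example installed with `pip install -t deps_dir`). A minimal sketch of building such a zip by hand; the `build_lambda_zip` helper and directory layout are hypothetical.

```python
import os
import zipfile


def build_lambda_zip(src_dir, deps_dir, out_path):
    """Bundle handler code and vendored dependencies into one deployment zip."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for base in (src_dir, deps_dir):
            for root, _dirs, files in os.walk(base):
                for name in files:
                    if name.endswith(".pyc"):
                        continue  # skip bytecode compiled for the local interpreter
                    full = os.path.join(root, name)
                    # Paths relative to each base dir, so both trees land at the zip root.
                    zf.write(full, os.path.relpath(full, base))
    return out_path
```

This does not solve the "works locally, fails on Lambda" problem: packages with compiled extensions still need to be built for Lambda's Linux environment.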
Motivation: Why AWS Lambda
Simple enough
Stateless => stable
Easy to scale up / down
Easy backfilling
Limited capacity required
Dumb API
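Because each invocation is stateless, a backfill can be split into independent chunks and fired in parallel. A minimal sketch, assuming a hypothetical payload schema with `since` / `until` dates; the `invoke` callable is injected so the logic can run without AWS.

```python
import json
from datetime import date, timedelta


def backfill_payloads(start, end, days_per_chunk=1):
    """Split [start, end) into date chunks, one JSON payload per invocation."""
    payloads = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days_per_chunk), end)
        payloads.append(json.dumps({"since": cur.isoformat(), "until": nxt.isoformat()}))
        cur = nxt
    return payloads


def run_backfill(invoke, start, end, days_per_chunk=1):
    """Fire one invocation per chunk through the supplied `invoke` callable."""
    for payload in backfill_payloads(start, end, days_per_chunk):
        invoke(payload)
```

With boto3, `invoke` could be `lambda p: client.invoke(FunctionName="scraper", InvocationType="Event", Payload=p)` with `client = boto3.client("lambda")`, where `"scraper"` is a hypothetical function name; `InvocationType="Event"` makes each call asynchronous.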
Cost Structure
(memory size × execution time,
roughly 3M calls × 1 s < $20 USD)
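The "3M × 1 s < $20" estimate can be checked with cost = requests × request price + GB-seconds × GB-second price. The sketch assumes the long-standing list prices of $0.20 per million requests and about $0.0000166667 per GB-second, and ignores the free tier; check current regional pricing before relying on it.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # USD, assumed list price
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second, assumed list price


def lambda_cost(calls, seconds_per_call, memory_mb):
    """Estimate cost in USD for a number of calls, ignoring the free tier."""
    request_cost = calls / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = calls * seconds_per_call * (memory_mb / 1024)
    return request_cost + gb_seconds * GB_SECOND_PRICE
```

Under these assumptions, 3M one-second calls cost about $6.85 at 128 MB and about $19.35 at 384 MB, so the < $20 figure holds only for fairly small memory settings.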
Other Choices
Check it out before you use AWS Lambda
Basically easier to use than AWS Lambda... (supports Node.js only)
More TODOs
- 2D clustering and visualization
- Tune for better API throughput
- Real-time streaming
(and bug fixes)
Open source, looking for a designer
(any gender welcome, seriously)
Terminal 1
Data Science for Recruitment
We're Hiring
JS Developer | Designer
Slides: bit.ly/fbreactionsio
Email: lauchunyin@gmail.com
(Extra Slides)
I'm not a graphic designer....
Design
Telling a story with
emojis
X Over-interpretation
X Reasoning about detailed figures
Inspired by:
http://emojitracker.com/
d3.js Force Layout
Simple rules, but beautiful emergent behaviour
Caveats:
dealing with boundaries and different node sizes
animations have a huge effect
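To make "simple rules, beautiful emergent behaviour" and the two layout caveats concrete, here is a toy force simulation. It is not d3's implementation; it only applies three hypothetical rules per tick: pull every circle toward the centre, push apart overlapping circles using their individual radii, and clamp each circle inside the box.

```python
import math


def step(nodes, width, height, strength=0.05, iterations=1):
    """Run toy force-layout ticks in place on nodes (dicts with x, y, r)."""
    cx, cy = width / 2, height / 2
    for _ in range(iterations):
        for n in nodes:  # rule 1: gentle pull toward the centre
            n["x"] += (cx - n["x"]) * strength
            n["y"] += (cy - n["y"]) * strength
        for i, a in enumerate(nodes):  # rule 2: resolve overlaps, respecting sizes
            for b in nodes[i + 1:]:
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy)
                overlap = a["r"] + b["r"] - dist
                if overlap > 0:
                    # Coincident centres have no direction; push apart along x.
                    ux, uy = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
                    a["x"] -= ux * overlap / 2
                    a["y"] -= uy * overlap / 2
                    b["x"] += ux * overlap / 2
                    b["y"] += uy * overlap / 2
        for n in nodes:  # rule 3: keep every circle fully inside the box
            n["x"] = min(max(n["x"], n["r"]), width - n["r"])
            n["y"] = min(max(n["y"], n["r"]), height - n["r"])
    return nodes
```

The boundary clamp runs after collision resolution, so circles pushed against a wall can still overlap slightly; handling that interaction well is exactly the boundary caveat above.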
Need a
conceptual model
Association:
identify the relationship
(even when drunk)
Need to...
highlight the story
start a conversation
Responsive design is hard...
especially for data viz...
Hard to transpose
the table
Hack: overflow-x: scroll;
Even better: blur the edge to suggest it's scrollable
d3.js draws with Scalable Vector Graphics (SVG)
fbreactions.io - Scraping and Visualizing Data with AWS Lambda
By Chun Yin Vincent Lau