Scraping and Visualizing Data with AWS Lambda

Data Scraping & Visualization on Facebook Reactions

using AWS Lambda

fbreactions.io

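Analyzing Facebook Reactions data from Hong Kong and Taiwan media

This is for you - Subtitles in English

Vincent from Hong Kong 🇭🇰 (you could tell even if I didn't say so)

In case you can't follow me, I've already prepared subtitles 😅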

fbreactions.io

Scraping and Visualizing Data with AWS Lambda

1. What to build
2. Problems
3. AWS Lambda
4. Others


API released for FB Reactions

(@F8, April 2016)

Scraping the Facebook Pages of High-Traffic Media

 

Taiwan: 蘋果日報, 東森新聞, ETtoday新聞雲, 自由時報, 快點報報, The News Lens 關鍵評論網, udn.com 聯合新聞網

Hong Kong: 蘋果日報, South China Morning Post, 香港獨立媒體網, 熱血時報, 明報即時新聞, 信報財經新聞, 香港經濟日報, am730, 香港01, 頭條日報, 852郵報, 都市日報

By traffic, it is basically all 蘋果日報 (Apple Daily)

A current-affairs emotion index: Hong Kong vs Taiwan

Actually, I had no time to make slides

Let's go straight to the demo

Live DEMO

Findings

Is Taiwan in a better mood than Hong Kong?

Do people in Taiwan not use Reactions much?

Theory 1

(Seriously)

Taiwan really is a good place!

Theory 2

(Seriously)

The HK Chief Executive is an undercover agent for FB

Side note: his annual salary is about NT$17 million

Theory 2.1

Some media use Reactions for polls

FB Taiwan is also working its magic in Hong Kong...

Hong Kong uses Reactions for voting

Source: 立場新聞 (Stand News)

Theory 3

(Maybe too serious?)

Time to reflect on our media?


Scraping: Challenges

  • The FB API is the bottleneck, even when scraping only the latest posts of every page
  • Each post needs a separate request to /reactions by Post ID
  • The FB API has no /_changes feed and no time-based cursor
  • Effective scraping: poll old posts less often

Architecture

[Architecture diagram. Labels: /page, Post IDs, /reactions, count_store (aggregated reactions by post), analyze (analyzed result), Redis, API]
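
The two named steps in the diagram, count_store and analyze, could look roughly like the sketch below. It is a guess at their shape, not the project's code: an in-memory dict stands in for Redis, each raw reaction is assumed to be a dict with a "type" key, and the "share of each reaction type" metric is a hypothetical choice of analysis.

```python
from collections import Counter, defaultdict

# Stand-in for the Redis-backed store in the diagram: post_id -> Counter.
count_store = defaultdict(Counter)


def store_counts(post_id, reactions):
    """Aggregate the raw reaction records of one post into counts per type."""
    count_store[post_id] = Counter(r["type"] for r in reactions)


def analyze(post_ids):
    """Return each reaction type's share of all reactions across the posts."""
    total = Counter()
    for post_id in post_ids:
        total.update(count_store[post_id])
    n = sum(total.values())
    return {kind: count / n for kind, count in total.items()} if n else {}


# Usage with made-up data for two posts.
store_counts("post_1", [{"type": "LIKE"}, {"type": "ANGRY"}])
store_counts("post_2", [{"type": "LIKE"}, {"type": "LIKE"}])
print(analyze(["post_1", "post_2"]))
```

Keeping the per-post counts separate from the analysis step means a post can be re-scraped and overwritten without recomputing anything else until analyze runs again.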


Serverless Architecture with AWS Lambda

Background and Motivation

Why Serverless?

Modularized

Atomic unit of a running script: Logic + Execution

API Gateway

AWS Lambda

Features

  1. Python / Node.js / Java 8 runtimes
  2. 🕒 Cron-like execution (CloudWatch Events)
  3. Pay-as-you-go
  4. Can finally access a VPC
  5. Auto Scaling
  6. Latency: so far so good
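
For the cron-like case, a Python Lambda function is just a handler that takes `event` and `context`. The sketch below shows the general shape for a CloudWatch Events schedule; `PAGE_IDS` and `scrape_page` are hypothetical placeholders, and the real scraper's handler is not shown in the slides.

```python
# Hypothetical list of pages to scrape; the real list is not in the slides.
PAGE_IDS = ["example_page_a", "example_page_b"]


def scrape_page(page_id):
    """Placeholder for the real scraping step (which would call the FB API)."""
    return {"page": page_id, "status": "queued"}


def handler(event, context):
    """Lambda entry point.

    `event` is the trigger payload and `context` carries runtime metadata.
    A CloudWatch Events schedule delivers an event whose source is
    'aws.events'; anything else is skipped here.
    """
    if event.get("source") != "aws.events":
        return {"skipped": True, "scraped": []}
    return {"skipped": False, "scraped": [scrape_page(p) for p in PAGE_IDS]}
```

Because the handler is a plain function, it can be exercised locally with a hand-written event such as `handler({"source": "aws.events"}, None)` before it is packaged and uploaded.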

When to use 💚

👍 Stateless

👍 Short jobs

👍 Event-driven stuff, e.g. time-based / webhook

👍 Spiky usage patterns

👍 Simple programs with few dependencies

👍 Ad-hoc scripts that need to be modified and versioned

👍 Need a really quick setup, quicker than Kubernetes

When NOT to use 💔

🙅 Stateful, e.g. session servers

🙅 Long running (=> $$)

🙅 Cache is important -> can use external ElastiCache

🙅 High CPU/memory usage (max 1.5 GB per process)

🙅 Complex gcc builds / machine drivers, etc.

🙅 High-frequency self-trigger (scheduled events fire at most once per minute for now) -> Use SNS/Celery!

🙅 Need DAG job dependencies -> Use Airflow!

🙅 Strong fault-tolerance / monitoring requirements -> Use Celery / Gearman!

🙅 Layman stuff -> Use Zapier / IFTTT!

🙋 Use Cases

  • Simple API, Slackbot, etc.
  • Auth
  • Scraping
  • IoT
  • CI - LambCI
  • Point-to-point data sync among services (continuous / ad-hoc)
    • E.g. App, Logs -> Salesforce / ReportingDB
  • Marshalling / aggregation for analytics
    • Integrate with Amazon Kinesis

Yet some other pain points

  • VPC setup: overhead of setting up a proper VPC subnet
  • Packaging: tedious to package the code
  • Some libraries don't work: some modules fail at runtime
    • (Python) Libraries like Pg8000 (Postgres driver) tested fine on local/EC2 but failed on Lambda
      • Hard to reproduce the environment
      • Just give up and switch libraries
  • Old-school CloudWatch events and logging
  • AWS UI 🤔

Apex / serverless/serverless

to the rescue 🙌


Why AWS Lambda

  • Simple enough
  • Stateless => stable
  • Easy to scale up / down
  • Easy to backfill
  • Limited capacity required
  • Dumb API

Cost Structure

(memory size × execution time; roughly 3M calls × 1 s each comes to < US$20)
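
The estimate above can be checked with a few lines of arithmetic. The prices in this sketch are assumptions (USD 0.20 per million requests and USD 0.00001667 per GB-second, free tier ignored), and `lambda_cost` is a hypothetical helper; check the current AWS price list before relying on the numbers.

```python
# Assumed prices, not taken from the slides; the free tier is ignored.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.00001667       # USD per GB-second of execution


def lambda_cost(calls, seconds_per_call, memory_mb):
    """Estimate the bill: request charge plus memory x duration charge."""
    gb_seconds = calls * seconds_per_call * (memory_mb / 1024)
    return calls * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# 3M calls of one second each at a few memory sizes.
for memory_mb in (128, 256, 384, 512):
    cost = lambda_cost(3_000_000, 1.0, memory_mb)
    print(f"{memory_mb} MB: {cost:.2f} USD")
```

Under these assumed prices, 3M one-second calls stay below USD 20 for memory sizes up to about 384 MB, which is consistent with the figure on the slide.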

Other Choices

Check these out before you use AWS Lambda

Basically easier to use than AWS Lambda... (Node.js only)

More TODOs

  • 2D clustering and visualization

  • Tune for better API throughput

  • Real-time streaming

(and bug fixes)

Looking for a designer (any gender, seriously)

Open source, looking for a designer

Terminal 1

Data Science for Recruitment

We're Hiring

 JS Developer | Designer

Slides: bit.ly/fbreactionsio

Email: lauchunyin@gmail.com

(Extra Slides)

I'm not a graphic designer...

Design

Telling a story with emojis

X Over-interpretation

X Reasoning about detailed figures

Inspired by: http://emojitracker.com/

D3.js Force Layout (Mike Bostock)

Simple rules but beautiful emergent behaviour

Caveats:

Need to deal with boundaries and different sizes

Animations have a huge effect

Need for a conceptual model

Association: identify the relationship

(even when drunk)

Need to...

highlight the story

start a conversation

Responsive design is hard...

especially for data viz...

Hard to transpose the table

Even better: blur the edge to suggest it is scrollable

D3.js draws with Scalable Vector Graphics (SVG)

fbreactions.io - Scraping and Visualizing Data with AWS Lambda

By Chun Yin Vincent Lau
