task manager
background
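Build a platform that delivers the right ad to the right person, at the right time and in the right placement
tagtoo
The goal is a three-way win for publishers, advertisers, and users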
background
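- A/B testing
- User analysis
- Customized online ad delivery
- Report processing
What we need to handle
- Nearly 100 million log records (ad impressions and user logs)
- Analysis of user tags, product content, and media text
- Ad status logs
Large volumes of data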
Task
Task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459645/_______012.png)
Example: user analysis
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459686/_______013.png)
Example: user analysis
Traditional approach
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459725/_______016.png)
process server
Example: user analysis
Traditional approach
process server
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459788/_______016.png)
Example: user analysis
Traditional approach
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459760/_______019.png)
Tasks depend on one another

    # Each task reads the output of the previous one.
    filter_user_log(input="raw_log", output="user_log")
    aggregate_user_log(input="user_log", output="user_cursor")
    analytics_user_log(input="user_cursor", output="user_info")
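The dependency between these three calls comes only from their input and output names. As a minimal sketch (the `TASKS` table and `execution_order` helper are illustrative assumptions, not part of the original code), the run order can be derived from those names instead of being written by hand:

```python
# Each task is described only by what it reads and what it writes.
# The table is deliberately listed out of order.
TASKS = {
    "analytics_user_log": {"input": "user_cursor", "output": "user_info"},
    "filter_user_log": {"input": "raw_log", "output": "user_log"},
    "aggregate_user_log": {"input": "user_log", "output": "user_cursor"},
}


def execution_order(tasks, available):
    """Order tasks so that each one runs only after its input exists."""
    available = set(available)
    pending = dict(tasks)
    order = []
    while pending:
        ready = [name for name, t in pending.items() if t["input"] in available]
        if not ready:
            raise ValueError("missing inputs for: {}".format(sorted(pending)))
        for name in sorted(ready):
            order.append(name)
            # Running a task makes its output available to later tasks.
            available.add(pending.pop(name)["output"])
    return order


print(execution_order(TASKS, available={"raw_log"}))
# → ['filter_user_log', 'aggregate_user_log', 'analytics_user_log']
```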
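Tasks need error handling

    from os import remove  # assumes outputs are local files

    try:
        aggregate_user_log(input="user_log", output="user_cursor")
    except Exception as e:
        # handle the task exception ...
        # then delete the partial output so it is not mistaken for a result
        remove("user_cursor")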
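The cleanup above deletes a half-written output after a failure. A common alternative, not shown in the slides, is to write to a temporary file and rename it only on success, so a partial output never appears under the final name. A minimal sketch, assuming local files; `write_atomically` and `produce` are hypothetical names:

```python
import os
import tempfile


def write_atomically(output_path, produce):
    """Call produce(file_obj); publish output_path only if it succeeds."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp_file:
            produce(tmp_file)
        os.replace(tmp_path, output_path)
    except Exception:
        # Drop the partial temp file, then let the caller see the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A task would then pass a function that writes its result, for example `write_atomically("user_cursor", lambda f: f.write(data))`.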
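Tasks must not run twice

    from os import remove
    from os.path import exists  # assumes outputs are local files

    # Skip the task when its output already exists.
    if not exists("user_cursor"):
        try:
            aggregate_user_log(input="user_log", output="user_cursor")
        except Exception as e:
            # handle the task exception ...
            remove("user_cursor")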
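Tasks must accept parameters from the shell

    from os import remove
    from os.path import exists  # assumes outputs are local files

    def run(date):
        input_file = "user_log_{}".format(date)
        output_file = "user_cursor_{}".format(date)
        # Check and clean up the dated output file, not a fixed name.
        if not exists(output_file):
            try:
                aggregate_user_log(input=input_file, output=output_file)
            except Exception as e:
                # handle the task exception ...
                remove(output_file)

    # clime exposes run() as a command-line command
    import clime.now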
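For comparison, the same date parameter can be read with only the standard library. This is a sketch using `argparse` rather than clime; `parse_args` and `file_names` are illustrative helper names:

```python
import argparse
import datetime


def parse_args(argv=None):
    """Parse --date (YYYY-MM-DD) from argv, or from sys.argv when argv is None."""
    parser = argparse.ArgumentParser(description="Aggregate user logs for one day")
    parser.add_argument(
        "--date",
        required=True,
        type=lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date(),
        help="day to process, formatted as YYYY-MM-DD",
    )
    return parser.parse_args(argv)


def file_names(date):
    """Build the dated input and output names used by the task."""
    return "user_log_{}".format(date), "user_cursor_{}".format(date)


# Passing argv explicitly keeps the example independent of the real command line.
args = parse_args(["--date", "2015-06-01"])
print(file_names(args.date))
# → ('user_log_2015-06-01', 'user_cursor_2015-06-01')
```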
Repeat this for every task

    from os import remove
    from os.path import exists  # assumes outputs are local files

    def run(date):
        # Step 1: filter the raw log
        input_file = "raw_log_{}".format(date)
        output_file = "user_log_{}".format(date)
        if not exists(output_file):
            try:
                filter_log(input=input_file, output=output_file)
            except Exception as e:
                # handle the task exception ...
                remove(output_file)

        # Step 2: aggregate the filtered log
        input_file = output_file
        output_file = "user_cursor_{}".format(date)
        if not exists(output_file):
            try:
                aggregate_user_log(input=input_file, output=output_file)
            except Exception as e:
                # handle the task exception ...
                remove(output_file)
        # .....

    import clime.now
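The exists / try / remove pattern above is identical for every step, so it can be factored into one loop. A minimal sketch under the same assumption of local files; `run_steps` and the `(task, input_prefix, output_prefix)` layout are hypothetical, not from the original deck:

```python
import os


def run_steps(steps, date):
    """Run each (task, input_prefix, output_prefix) step whose output is missing.

    Returns the names of the tasks that actually ran.
    """
    ran = []
    for task, in_prefix, out_prefix in steps:
        input_file = "{}_{}".format(in_prefix, date)
        output_file = "{}_{}".format(out_prefix, date)
        if os.path.exists(output_file):
            continue  # already produced by an earlier run
        try:
            task(input=input_file, output=output_file)
        except Exception:
            # Remove a partial output, then stop the pipeline.
            if os.path.exists(output_file):
                os.remove(output_file)
            raise
        ran.append(task.__name__)
    return ran


# Usage, with the task functions from the slides:
# run_steps([(filter_log, "raw_log", "user_log"),
#            (aggregate_user_log, "user_log", "user_cursor")], "2015-06-01")
```

This removes the copy-paste, but ordering, parameters, and scheduling still have to be handled by hand, which is the gap a framework is meant to fill.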
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1459953/wximg.jpg)
Luigi
A Python framework for data flow definition and execution
Luigi dataflow
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460030/_______024.png)

    import luigi

    class MyTask(luigi.Task):
        input_param1 = luigi.Parameter()
        ...

        def output(self):
            ...

        def requires(self):
            ...

        def run(self):
            ...

    if __name__ == '__main__':
        luigi.run()

Luigi Task

    import luigi

    class TagtooUserAnalytics(luigi.Task):
        date = luigi.DateParameter()

        def output(self):
            ...

        def requires(self):
            # Return a single task so that self.input() is a single target;
            # returning a list would make self.input() a list with no open().
            return AggregateUserTask(self.date)

        def run(self):
            with self.input().open('r') as aggregate_in:
                ...
            with self.output().open('w') as out_file:
                ...

    if __name__ == '__main__':
        luigi.run()

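To show why declaring `requires()` and `output()` is enough to get ordering and skip-if-done behaviour, here is a toy resolver written with plain Python classes. It is a sketch of the idea only, not Luigi's implementation; `ToyTask`, `build`, and the three task classes are hypothetical names:

```python
class ToyTask:
    """Minimal stand-in for a Luigi-style task (not Luigi's real classes)."""
    done = set()  # names of outputs that already "exist"
    log = []      # order in which run() was called

    def output(self):
        return type(self).__name__

    def requires(self):
        return []

    def complete(self):
        return self.output() in ToyTask.done

    def run(self):
        ToyTask.log.append(self.output())
        ToyTask.done.add(self.output())


class FilterLog(ToyTask):
    pass


class AggregateUser(ToyTask):
    def requires(self):
        return [FilterLog()]


class UserAnalytics(ToyTask):
    def requires(self):
        return [AggregateUser()]


def build(task):
    """Run missing dependencies first, then the task itself."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep)
    task.run()


ToyTask.done.add("FilterLog")  # pretend the filtered log already exists
build(UserAnalytics())
print(ToyTask.log)
# → ['AggregateUser', 'UserAnalytics']
```

The filter step is skipped because its output is already present, and calling `build(UserAnalytics())` a second time runs nothing.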
Luigi analytics
Luigi analytics
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460056/_______027.png)
$ python test2.py TagtooUserAnalytics --date=2015-06-01
Run on the command line
luigid
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460087/_______028.png)
luigid
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460147/user_recs.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1419661/docker.jpg)
Task
+
+
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1431329/server.png)
Task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1417200/complex_t.png)
Task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1416971/config.png)
Redefining the Task
+
+
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429902/bash.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1419792/Flowchart__6_.png)
Redefining the Task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1417147/taskmanager.png)
Redefining the Task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1430768/Flowchart__8_.png)
Requirements
- Easier configuration
- A Travis-style log view
task manager
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460141/_______029.png)
Requirements
- Easier configuration
- A Travis-style log view
- Luigi-style task dependencies
- A task graph view similar to Luigi's
- Can also be triggered from the web
- Integrated with CI/CD management
task manager
Design
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460744/_______047.png)
Design
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460741/_______045.png)
Task
Design
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460735/_______044.png)
Task
task manager
define project
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460172/_______033.png)
task manager
define task
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460195/_______035.png)
task manager
define trigger
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460212/_______037.png)
task manager
run
After the trigger fires
task manager
job
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460221/_______038.png)
task manager
job
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1460299/_______040.png)
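The project, task, trigger, and job concepts above are only shown as screenshots. As a rough sketch of how they might relate (every class, field, and method name here is an assumption made for illustration, not the task manager's real data model), firing a trigger can create a job that runs the target task after its dependencies and collects a log:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    action: Callable[[], str]  # returns one log line
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Job:
    trigger: str
    log: List[str] = field(default_factory=list)


@dataclass
class Project:
    name: str
    tasks: Dict[str, Task] = field(default_factory=dict)
    triggers: Dict[str, str] = field(default_factory=dict)  # trigger name -> task name

    def add_task(self, task):
        self.tasks[task.name] = task

    def fire(self, trigger_name):
        """Create a job for the trigger and run its task with dependencies first."""
        job = Job(trigger=trigger_name)
        self._run(self.triggers[trigger_name], job, seen=set())
        return job

    def _run(self, task_name, job, seen):
        if task_name in seen:
            return  # each task runs at most once per job
        seen.add(task_name)
        task = self.tasks[task_name]
        for dep in task.depends_on:
            self._run(dep, job, seen)
        job.log.append("{}: {}".format(task.name, task.action()))


project = Project("user-analytics")
project.add_task(Task("filter", lambda: "ok"))
project.add_task(Task("aggregate", lambda: "ok", depends_on=["filter"]))
project.triggers["daily"] = "aggregate"

job = project.fire("daily")
print(job.log)
# → ['filter: ok', 'aggregate: ok']
```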
summary
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1421154/Information_Systems_Help_Desk__2_.png)
summary
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1421123/Information_Systems_Help_Desk.png)
summary
![](https://s3.amazonaws.com/media-p.slid.es/uploads/141968/images/1429493/Wireframe__1_.png)
task manager
By georgefs