Chien-Hsun Chen
Drink Coffee in Whitehorse
Build a platform that delivers the right ad to the right person, at the right time and in the right placement
tagtoo
The goal is a three-way win for publishers, advertisers, and users
What we need to do
Large volumes of data
The traditional approach
process server
Tasks depend on each other
filter_user_log(input="raw_log", output="user_log")
aggregate_user_log(input="user_log", output="user_cursor")
analytics_user_log(input="user_cursor", output="user_info")
Tasks need error handling
try:
    aggregate_user_log(input="user_log", output="user_cursor")
except Exception as e:
    # handle the task exception ...
    remove("user_cursor")
Tasks must not run twice
if not exists("user_cursor"):
    try:
        aggregate_user_log(input="user_log", output="user_cursor")
    except Exception as e:
        # handle the task exception ...
        remove("user_cursor")
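The two requirements so far (clean up on failure, skip work that is already done) can be folded into one helper. The sketch below is only an illustration under assumptions: `run_once` and `copy_upper` are hypothetical names that do not come from the slides, and `copy_upper` merely stands in for a real step such as `aggregate_user_log`.

```python
import os


def run_once(step, input_path, output_path):
    """Run step(input_path, output_path) unless output_path already exists.

    Returns True if the step ran, False if it was skipped.
    """
    if os.path.exists(output_path):
        return False
    try:
        step(input_path, output_path)
    except Exception:
        # Remove any partial output so that a later retry starts clean.
        if os.path.exists(output_path):
            os.remove(output_path)
        raise
    return True


def copy_upper(input_path, output_path):
    # Hypothetical stand-in for a real step such as aggregate_user_log.
    with open(input_path) as src, open(output_path, "w") as dst:
        dst.write(src.read().upper())
```

Unlike the snippet on the slide, this helper re-raises the exception after cleaning up, so the caller still learns that the step failed.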
Tasks must accept parameters from the shell
from os import remove
from os.path import exists
def run(date):
    input_file = "user_log_{}".format(date)
    output_file = "user_cursor_{}".format(date)
    if not exists(output_file):
        try:
            aggregate_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)
# clime exposes run() as a command-line program
import clime.now
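For comparison, the same shell interface can be sketched with only the standard library. This is not what the slides use (they rely on clime); `parse_args` and `file_names` are hypothetical helpers, and the `YYYY-MM-DD` date format is an assumption.

```python
import argparse
import datetime


def parse_date(text):
    # Assumed format: YYYY-MM-DD, e.g. 2015-06-01.
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Aggregate one day of user logs.")
    parser.add_argument("--date", required=True, type=parse_date,
                        help="day to process, as YYYY-MM-DD")
    return parser.parse_args(argv)


def file_names(date):
    # Same naming scheme as run(date): one input and one output file per day.
    return "user_log_{}".format(date), "user_cursor_{}".format(date)
```

Calling `parse_args(["--date", "2015-06-01"])` yields a namespace whose `date` attribute is a `datetime.date`, which `file_names` then turns into the per-day file names.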
And repeat for every task
from os import remove
from os.path import exists
def run(date):
    input_file = "raw_log_{}".format(date)
    output_file = "user_log_{}".format(date)
    if not exists(output_file):
        try:
            filter_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)
    input_file = output_file
    output_file = "user_cursor_{}".format(date)
    if not exists(output_file):
        try:
            aggregate_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)
    .....
# clime exposes run() as a command-line program
import clime.now
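The repetition above follows one pattern per step, so it can be written once as a loop over a list of steps. The sketch below is a hypothetical refactoring, not code from the slides: `run_pipeline` and the `(function, input_prefix, output_prefix)` tuple layout are assumptions made for illustration.

```python
import os


def run_pipeline(steps, date):
    """Run (function, input_prefix, output_prefix) steps in order for one date.

    Returns the list of output files produced by this call.
    """
    produced = []
    for func, input_prefix, output_prefix in steps:
        input_file = "{}_{}".format(input_prefix, date)
        output_file = "{}_{}".format(output_prefix, date)
        if os.path.exists(output_file):
            continue  # already finished in an earlier run
        try:
            func(input_file, output_file)
        except Exception:
            # Drop partial output, then stop: later steps depend on this one.
            if os.path.exists(output_file):
                os.remove(output_file)
            raise
        produced.append(output_file)
    return produced
```

A loop like this removes the copy-and-paste, but it still only handles a linear chain of steps. Describing general dependencies between tasks is what a workflow framework is for.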
Luigi
A Python framework for data flow definition and execution
Luigi dataflow
import luigi
class MyTask(luigi.Task):
    input_param1 = luigi.Parameter()
    ...
    def output(self):
        ...
    def requires(self):
        ...
    def run(self):
        ...
if __name__ == '__main__':
    luigi.run()
Luigi Task
import luigi
class TagtooUserAnalytics(luigi.Task):
    date = luigi.DateParameter()
    def output(self):
        ...
    def requires(self):
        # Return the task itself, not a list, so that self.input() is a single target
        return AggregateUserTask(self.date)
    def run(self):
        with self.input().open('r') as aggregate_in:
            ...
        with self.output().open('w') as out_file:
            ...
if __name__ == '__main__':
    luigi.run()
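To make the `requires` / `output` / `run` contract concrete, here is a toy resolver written with the standard library only. It is not Luigi's implementation and does not use Luigi's API: `ToyTask` and `build` are hypothetical names, and `output()` returns a plain file path here rather than a Luigi target.

```python
import os


class ToyTask(object):
    """Hypothetical stand-in for a Luigi-style task; not part of Luigi."""

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError


def build(task, log=None):
    """Run task after its requirements, skipping tasks whose output exists.

    Returns the class names of the tasks that ran, in execution order.
    """
    if log is None:
        log = []
    if os.path.exists(task.output()):
        return log  # already complete, so nothing to do for this branch
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)
    return log
```

The point of the sketch is that each task only declares what it needs and what it produces. The ordering, and the decision to skip work that is already done, live in one place instead of being repeated in every script.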
Luigi analytics
$ python test2.py TagtooUserAnalytics --date=2015-06-01
Run on the command line
task manager
Task
define project
define task
define trigger
run
After a trigger fires
job
By Chien-Hsun Chen