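Build a platform that delivers the right ad to the right person, at the right time and in the right placement
Tagtoo
The goal is a three-way win for publishers, advertisers, and users
What we need to do
Large volumes of data
The traditional approach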
process server
Tasks depend on one another
filter_user_log(input="raw_log", output="user_log")
aggregate_user_log(input="user_log", output="user_cursor")
analytics_user_log(input="user_cursor", output="user_info")
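To make the chain concrete, here is a minimal sketch in which each step reads the file the previous step wrote. The three function names come from the slide; their bodies, the tab-separated log format, and the run_pipeline helper are assumptions made up for illustration.

```python
import os
import tempfile
from collections import Counter


def filter_user_log(input, output):
    """Keep only lines whose first tab-separated field (assumed: user id) is non-empty."""
    with open(input) as src, open(output, "w") as dst:
        for line in src:
            if line.split("\t")[0].strip():
                dst.write(line)


def aggregate_user_log(input, output):
    """Count events per user and write `user<TAB>count` lines."""
    counts = Counter()
    with open(input) as src:
        for line in src:
            counts[line.split("\t")[0]] += 1
    with open(output, "w") as dst:
        for user, n in sorted(counts.items()):
            dst.write("{}\t{}\n".format(user, n))


def analytics_user_log(input, output):
    """Write the most active user (a stand-in for real analytics)."""
    with open(input) as src:
        rows = [line.rstrip("\n").split("\t") for line in src]
    top = max(rows, key=lambda row: int(row[1]))
    with open(output, "w") as dst:
        dst.write(top[0] + "\n")


def run_pipeline(workdir, raw_lines):
    """Hypothetical driver: each step consumes the previous step's output file."""
    def path(name):
        return os.path.join(workdir, name)

    with open(path("raw_log"), "w") as f:
        f.writelines(raw_lines)
    filter_user_log(input=path("raw_log"), output=path("user_log"))
    aggregate_user_log(input=path("user_log"), output=path("user_cursor"))
    analytics_user_log(input=path("user_cursor"), output=path("user_info"))
    with open(path("user_info")) as f:
        return f.read().strip()
```

The order of the three calls is the dependency graph: swapping any two breaks the pipeline, and nothing in the code states that explicitly.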
Tasks need error handling
try:
    aggregate_user_log(input="user_log", output="user_cursor")
except Exception as e:
    # handle the task exception ...
    remove("user_cursor")  # discard the partial output
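The cleanup pattern above can be wrapped in a small helper. This is only a sketch: run_task and failing_task are hypothetical names, and unlike the slide it re-raises the exception after cleaning up so the caller still learns that the task failed.

```python
import os
import tempfile


def run_task(task, input, output):
    """Run task(input=..., output=...); delete any partial output if it fails."""
    try:
        task(input=input, output=output)
    except Exception:
        # The task may have crashed before creating the file, so check first.
        if os.path.exists(output):
            os.remove(output)
        raise


def failing_task(input, output):
    """Hypothetical task that crashes after writing part of its output."""
    with open(output, "w") as f:
        f.write("partial")
    raise RuntimeError("task failed midway")
```

Removing the partial file matters because later steps decide whether to run by checking whether the output exists.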
Tasks must not run twice
if not exists("user_cursor"):  # skip the task when its output already exists
    try:
        aggregate_user_log(input="user_log", output="user_cursor")
    except Exception as e:
        # handle the task exception ...
        remove("user_cursor")  # discard the partial output
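One way to make the "output exists" check trustworthy is to write to a temporary name and rename it only on success, so a file at the final path always means a finished task. The sketch below assumes that approach; run_once, copy_task, and the ".tmp" suffix are hypothetical.

```python
import os
import tempfile


def run_once(task, input, output):
    """Run task unless output already exists; return True if the task ran."""
    if os.path.exists(output):
        return False
    tmp = output + ".tmp"  # hypothetical suffix for in-progress output
    try:
        task(input=input, output=tmp)
        os.replace(tmp, output)  # atomic rename on the same filesystem
    except Exception:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True


calls = []  # records each real execution, for demonstration only


def copy_task(input, output):
    """Hypothetical task: copy input to output."""
    calls.append(input)
    with open(input) as src, open(output, "w") as dst:
        dst.write(src.read())
```

Calling run_once twice with the same paths executes copy_task only the first time; the second call returns False without touching the files.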
Tasks must accept parameters from the shell
def run(date):
    input_file = "user_log_{}".format(date)
    output_file = "user_cursor_{}".format(date)
    if not exists(output_file):
        try:
            aggregate_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)

import clime.now  # clime turns run() into a command-line program
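For comparison, the same date parameter can be read with the standard library's argparse instead of clime. This is a sketch of that alternative, not what the slide uses; parse_args and file_names are hypothetical helpers.

```python
import argparse
import datetime


def parse_date(text):
    """Parse a YYYY-MM-DD string into a datetime.date."""
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Aggregate one day of user logs.")
    parser.add_argument("date", type=parse_date, help="day to process, as YYYY-MM-DD")
    return parser.parse_args(argv)


def file_names(date):
    """Derive the per-day input and output file names used by run(date)."""
    return "user_log_{}".format(date), "user_cursor_{}".format(date)
```

A script built this way would call parse_args() under an `if __name__ == "__main__":` guard and pass args.date on to run. Parsing into a real date also rejects malformed input before any file is touched.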
Repeat this for every task
def run(date):
    input_file = "raw_log_{}".format(date)
    output_file = "user_log_{}".format(date)
    if not exists(output_file):
        try:
            filter_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)
    # the output of one step becomes the input of the next
    input_file = output_file
    output_file = "user_cursor_{}".format(date)
    if not exists(output_file):
        try:
            aggregate_user_log(input=input_file, output=output_file)
        except Exception as e:
            # handle the task exception ...
            remove(output_file)
    # ...

import clime.now  # clime turns run() into a command-line program
Luigi
A Python framework for data flow definition and execution
Luigi dataflow
import luigi

class MyTask(luigi.Task):
    input_param1 = luigi.Parameter()
    ...

    def output(self):
        # the target(s) this task produces
        ...

    def requires(self):
        # the task(s) that must complete first
        ...

    def run(self):
        # the actual work
        ...

if __name__ == '__main__':
    luigi.run()
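The requires/output/run contract can be illustrated without Luigi at all. The toy below is not Luigi's implementation; ToyTask, build, Filter, and Aggregate are hypothetical names. It only shows the idea that a task counts as complete when its output exists, and that dependencies are resolved before a task runs.

```python
import os
import tempfile


class ToyTask:
    """Minimal stand-in for the requires/output/run contract (not Luigi itself)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task is considered done when its output file exists.
        return os.path.exists(self.output())


def build(task, log):
    """Depth-first: satisfy dependencies, then run the task if it is not complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class Filter(ToyTask):
    def output(self):
        return os.path.join(self.workdir, "user_log")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("u1\nu1\nu2\n")


class Aggregate(ToyTask):
    def requires(self):
        return [Filter(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "user_cursor")

    def run(self):
        with open(self.requires()[0].output()) as src:
            users = sorted(set(src.read().split()))
        with open(self.output(), "w") as dst:
            dst.write("\n".join(users))
```

Asking for Aggregate runs Filter first; asking again does nothing, because both outputs already exist. The hand-written existence checks, ordering, and cleanup from the earlier slides move out of each task and into one shared driver.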
Luigi Task
import luigi

class TagtooUserAnalytics(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        ...

    def requires(self):
        # return a single task (not a list) so that self.input() is a single target
        return AggregateUserTask(self.date)

    def run(self):
        with self.input().open('r') as aggregate_in:
            ...
        with self.output().open('w') as out_file:
            ...

if __name__ == '__main__':
    luigi.run()
Luigi analytics
Run on the command line
$ python test2.py TagtooUserAnalytics --date=2015-06-01
task manager
Task
define project
define task
define trigger
run
After the trigger fires
job