Django Q

Task Queue, Scheduler, Worker Application

Jason

Outline

  • Schedule, Task
  • General practice
  • Django Q
  • Demo

Schedule

  • Common examples: Task Scheduler, crontab
  • Runs jobs at set times

Task

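  • A task: one unit of work
  • Example: periodically fetching data from a folder

Example: putting on socks and putting on shoes are two tasks

Tasks may have to run in a particular order; that ordering forms a DAG (directed acyclic graph)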
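The ordering idea can be sketched with Python's standard-library `graphlib` (Python 3.9+). The task names below are made up from the socks-and-shoes example; this only illustrates a DAG and is not part of Django Q.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it can start.
# The names are hypothetical, taken from the socks-and-shoes example.
dependencies = {
    "put_on_shoes": {"put_on_socks"},
    "put_on_socks": set(),
}

# static_order() yields the tasks so that every dependency comes first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With only one dependency there is a single valid order: socks first, then shoes.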

Django Q

Django Q is a native Django task queue, scheduler and worker application using Python multiprocessing.

Features

  • Multiprocessing worker pools
  • Asynchronous tasks
  • Encrypted and compressed packages
  • Scheduled and repeated tasks
  • Failure and success database or cache
  • Result hooks, groups, and chains
  • Django Admin integration
  • PaaS compatible with multiple instances
  • Multi-cluster monitor
  • Redis, Disque, IronMQ, SQS, MongoDB or ORM

Installation

  • Install the latest version with pip:
$ pip install django-q
  • Add django_q to INSTALLED_APPS in your project's settings.py:
INSTALLED_APPS = (
    # other apps
    'django_q',
)
  • Run Django migrations to create the database tables:
$ python manage.py migrate

Installation

  • Choose a message broker, configure it, and install the appropriate client library.

  • Run the Django Q cluster to process tasks asynchronously:

$ python manage.py qcluster
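For the broker step above, here is a minimal sketch of a settings.py fragment. It assumes the Django ORM is used as the broker, so no extra client library is needed. The cluster name and all numbers are illustrative, not recommendations.

```python
# settings.py (fragment)
Q_CLUSTER = {
    "name": "myproject",   # hypothetical cluster name
    "workers": 4,          # number of worker processes
    "timeout": 90,         # seconds a task may run before it is terminated
    "retry": 120,          # seconds before an unfinished task is offered again
    "queue_limit": 50,     # max tasks the cluster keeps in memory
    "bulk": 10,            # tasks fetched from the broker per call
    "orm": "default",      # use the 'default' database connection as the broker
}
```

Keeping `retry` larger than `timeout` avoids a task being handed out again while it is still running.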

General practice

  • Write a script (one task or many)
  • Schedule it with crontab
  • Run it repeatedly

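Problems

  • A script may do many things (it is both the task and the flow)
  • Log files pile up; they are tedious to maintain and hard to browse
  • Dependencies between tasks make cascading failures likely
  • Running many jobs at the same time from crontab can cause trouble

Problems

  • The script must be rerun whenever an input or an intermediate file changes
  • Parallelism has to be set up by hand
  • Scripts are hard to maintain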
port=3128
BUILD="PROTO_2"
DOCUMENT="./index/"
DOCUMENT_PATH="$DOCUMENT$BUILD"
CSV_PATH="./CSV/"
TITLE="Script:"

# CSV file names:
PANTHER="PANTHER.csv"
PANTHER_SKU="PANTHER_SKU.csv"

# Python script paths:
crawler="./crawler/craw_Nimbus.py"
transfer="./PDBOM/BOM_PD.py"
compare="./PDBOM/PD_CK.py"

Shell Script

function python_exec(){
  sudo python "$path"
  rc=$?  # save the exit code now; every later test overwrites $?
  if [ "$rc" -eq 254 ]; then
     echo "$TITLE The BM VERSION is the same as the server record. (ErrorLevel = $rc)"
     echo -e "\n"
     exit
  fi

  if [ "$rc" -ne 0 ]; then
     echo "$TITLE Error executing $path; script stopped. (ErrorLevel = $rc)"
     echo -e "\n"
     exit
  fi
}

ckport=$(netstat -tuln | grep ":$port ")
if [ "$ckport" == "" ]; then
    echo "$TITLE Starting the proxy automatically: 'cntlm -v'."
    sudo cntlm -v &
    sleep 1
else
    echo "$TITLE The proxy is already running."
fi

Shell Script

echo "$TITLE Start Download and Transfer Nimbus .xlsx file."
path=$crawler
python_exec

path=$transfer
python_exec

# Default both flags to 0 so the numeric tests below never see an empty value.
ret_d=0
ret_f=0
test -d "$DOCUMENT_PATH" && echo "$TITLE exist $DOCUMENT_PATH" && ret_d=1 || echo "$TITLE Not exist $DOCUMENT_PATH"
if [ "$ret_d" -eq 1 ]; then
    test -f "$PANTHER" -a -f "$PANTHER_SKU" && echo "$TITLE exist $PANTHER and $PANTHER_SKU" && ret_f=1 || echo "$TITLE not exist $PANTHER and $PANTHER_SKU"
    if [ "$ret_f" -eq 1 ]; then
        echo "$TITLE Compare new and old files."
        path=$compare
        python_exec
    else
        echo "$TITLE Script stop."
        echo -e "\n"
        exit
    fi
else
    echo "$TITLE test command return code ret_d variable = $ret_d"
    echo "$TITLE create new build document $BUILD at $DOCUMENT"
    mkdir -p "$DOCUMENT_PATH"
    test -f "$PANTHER" -a -f "$PANTHER_SKU" && echo "$TITLE exist $PANTHER and $PANTHER_SKU" && ret_f=1 || echo "$TITLE not exist $PANTHER and $PANTHER_SKU"
    if [ "$ret_f" -eq 1 ]; then
        echo "$TITLE Compare new and old files."
        path=$compare
        python_exec
    else
        echo "$TITLE Script stop."
        echo -e "\n"
        exit
    fi
fi

Shell Script

echo "$TITLE copy $PANTHER and $PANTHER_SKU to $DOCUMENT_PATH"
sudo cp $PANTHER $PANTHER_SKU $DOCUMENT_PATH
echo "$TITLE move $PANTHER and $PANTHER_SKU to $CSV_PATH"
sudo mv $PANTHER $PANTHER_SKU $CSV_PATH

echo "$TITLE ok"
echo -e "\n"
exit

Shell Script

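Lessons learned

  • Shell scripts are a very handy tool
  • But they cannot manage a workflow effectively
  • It is easy to lose your way during development (projects tend to be goal-driven)
  • Performance is poor, and cleaning up after a crash is hard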

Task and flow

[Diagram: tasks "get version", "Clean Docum", "get files", "Check"]

DRY (Don't repeat yourself)

  • Each piece of code implements exactly one function
  • No repetition

 

Get the version

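from selenium.webdriver.support.ui import WebDriverWait  # needed by the method below

@property
def get_version(self):
    """
    get version
    """
    driver = self._nimbus_login
    wait = WebDriverWait(driver, 3600)  # wait up to one hour
    try:
        self.version = self.logged_status(driver=driver, wait=wait)
    finally:
        # Always release the browser, and the virtual display if one is in use.
        driver.close()
        if self.virtual:
            self.display.stop()
    return self.version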

Download the file

from collections import OrderedDict

from selenium.webdriver.support.ui import WebDriverWait

def download_file(self, version):
    """Process:
        1. Check that the document folder is empty (clean it)
        2. Download
        3. When finished, check that the file exists
        4. Close the browser
    ret: OrderedDict()
    """
    self.remove_file()  # clean out any old .xlsx file

    driver = self._nimbus_login
    wait = WebDriverWait(driver, 60)

    try:
        self.version = self.logged_status(driver=driver, wait=wait)
        # Only the two early-exit branches appear here; ret is not set
        # when the version has changed.
        if self.version == version:
            logger.info('version not change')
            ret = OrderedDict((('ret', 255), ('status', 'version not change.'), ('version', self.version)))
        elif self.version is None:
            msg = 'version is None, maybe Build matrix not online.'
            logger.info(msg)
            ret = OrderedDict((('ret', 255), ('status', msg), ('version', self.version)))
        return ret
    finally:
        # Release the virtual display and the browser even if an error occurs.
        if self.virtual:
            self.display.stop()
        driver.close()
        driver.quit()

Broker

  • A queue that sits between the Django instance and Django Q
  • Supported backends: Redis, MongoDB, Disque, Django ORM
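To make the broker's role concrete, here is a toy in-process sketch using only the standard library. A `queue.Queue` stands in for the broker, the main thread plays the Django instance that enqueues work, and a second thread plays the worker. The names `broker`, `results` and `worker` are invented for this illustration; it is not how Django Q is implemented.

```python
import math
import queue
import threading

broker = queue.Queue()   # stands in for Redis, the ORM, etc.
results = {}

def worker():
    """Pull (task_id, func, args) tuples off the broker until a None sentinel arrives."""
    while True:
        item = broker.get()
        if item is None:
            break
        task_id, func, args = item
        results[task_id] = func(*args)

t = threading.Thread(target=worker)
t.start()

# The "web" side only enqueues and moves on; it never runs the task itself.
broker.put(("t1", math.copysign, (2, -2)))
broker.put(("t2", math.floor, (1.5,)))
broker.put(None)   # tell the worker to stop
t.join()

print(results)
```

Because the producer and the worker share only the queue, either side can be scaled or restarted without the other knowing.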

Tasks

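  • async_task (function; named async in older Django Q releases)
  • AsyncTask (class; named Async in older Django Q releases)
  • Groups
  • Iterables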

Chains

  • run tasks sequentially
  • A → B → C → D
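The sequential idea can be sketched in plain Python. The helper `run_chain` is hypothetical and only mirrors the A → B → C → D picture; Django Q offers its own chain helpers, and how this sketch reacts to a failing step (the exception propagates and later steps never run) is an assumption of the sketch.

```python
import math

def run_chain(steps):
    """Run (func, args) pairs strictly in order and collect their results.

    An exception in any step propagates, so later steps never run.
    """
    results = []
    for func, args in steps:
        results.append(func(*args))
    return results

chain_results = run_chain([
    (math.copysign, (1, -1)),   # step A
    (math.floor, (1,)),         # step B, starts only after A has returned
])
print(chain_results)
```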

Schedule

  • Django Admin
  • Command

Architecture

  • Signed task
  • Broker
  • Pusher
  • Worker
  • Monitor
  • Sentinel
  • Timeouts
  • Scheduler
  • Stop procedure

DEMO

 async_task, AsyncTask

# Older Django Q releases named these async and Async. async has been a
# reserved word since Python 3.7, so current releases use the names below.
from django_q.tasks import async_task, result

from math import copysign

# create the task
async_task('math.copysign', 2, -2)

# or with import and storing the id
task_id = async_task(copysign, 2, -2)

# get the result
task_result = result(task_id)

# result returns None if the task has not been executed yet
# you can wait for it (here: up to 200 milliseconds)
task_result = result(task_id, 200)

# but in most cases you will want to use a hook:
async_task('math.modf', 2.5, hook='radars.hooks.print_result')
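The `hook` argument above is a dotted path to a function that is called with the finished task. A minimal sketch of such a hook follows. The module path `radars.hooks` comes from the slide; the function body and the stand-in task object are assumptions for illustration, relying only on the task exposing a `result` attribute.

```python
from types import SimpleNamespace

def print_result(task):
    """Hook: receives the finished task and prints its result."""
    print(task.result)

# Stand-in for a finished task, so the sketch runs without a cluster.
# math.modf(2.5) returns (0.5, 2.0).
fake_task = SimpleNamespace(result=(0.5, 2.0), success=True)
print_result(fake_task)
```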


 async_task, AsyncTask

# Older Django Q releases named this class Async.
from django_q.tasks import AsyncTask

# instantiate an async task
a = AsyncTask('math.floor', 1.5, group='math')

# you can set or change keywords afterwards
a.cached = True

# run it
a.run()

# change the args
a.args = (2.5,)

# run it again
a.run()

# wait max 10 seconds for the result and print it (wait is in milliseconds)
print(a.result(wait=10000))

Cluster

Admin interface

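Conclusion

  • A concise task-scheduling tool that is easy to manage
  • Integrates with Django, so information is easy to present as web pages
  • qinfo and qmonitor can be combined with systemd
  • Give every task a timeout to guard against infinite loops
  • No built-in parallel workflows; dependencies between tasks must be handled yourself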
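The timeout advice can be illustrated in plain Python. This is not Django Q's own mechanism, which is driven by the cluster's timeout setting. The hypothetical helper `run_with_timeout` runs a snippet in a child process and gives up after a deadline; the runaway loop is deliberately contrived.

```python
import subprocess
import sys

def run_with_timeout(code, seconds):
    """Run a Python snippet in a child process; return False if it exceeds the deadline."""
    try:
        subprocess.run([sys.executable, "-c", code], timeout=seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False

print(run_with_timeout("while True: pass", 1))   # runaway task, stopped after 1 second
print(run_with_timeout("print('done')", 10))     # well-behaved task
```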

Thanks
