Sirius & Storm

 

Agenda

  • Apache Storm
    • Storm Cluster (2 min)
    • Storm Components (5 min)
  • Sirius Deploy (2 min)
  • Team City (2 min)
  • Ansible (2 min)
  • Sirius Config (2 min)

Apache Storm

Apache Storm

A distributed cluster for transforming streaming data

Storm Components

  • Apache Storm JVMs
    • Scheduling: Nimbus
    • Orchestration: Supervisor
      • Worker (four per supervisor)
    • Visual Logs: Logviewer
    • Front Panel: UI
    • Client: Jar
  • Nginx (reverse proxy)
  • Supervisor (OS-level service supervisor, not the Storm Supervisor daemon)
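
Each of these JVMs is started via the storm launcher script; a minimal sketch of bringing the daemons up by hand (in practice they are kept running by the service supervisor):

# On a Nimbus host: scheduler, web UI, log viewer
storm nimbus &
storm ui &
storm logviewer &

# On a Supervisor host: orchestration plus its own log viewer
storm supervisor &
storm logviewer &
# Worker JVMs are not started by hand; the Supervisor spawns them
# on the ports listed in supervisor.slots.ports (four per supervisor)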

Configuration

---
storm_clusters:
  stormcluster-01:
    nimbus-stormcluster.gobalto.com:
      inet: 10.110.20.7
      roles: [nimbus, logviewer, ui]
    nimbus2-stormcluster.gobalto.com:
      inet: 10.110.20.146
      roles: [nimbus, logviewer, ui]
    supervisor1-stormcluster.gobalto.com:
      inet: 10.110.20.10
      roles: [supervisor, logviewer]
    supervisor2-stormcluster.gobalto.com:
      inet: 10.110.20.11
      roles: [supervisor, logviewer]
    supervisor3-stormcluster.gobalto.com:
      inet: 10.110.20.9
      roles: [supervisor, logviewer]
    zookeeper-stormcluster.gobalto.com:
      inet: 10.110.20.8
      roles: [zookeeper, proxy]

storm cluster group_var
  • Cluster Systems
    • 2 Nimbus (3 JVMs)
      • nimbus JVM
      • logviewer JVM
      • ui JVM
    • 3 Supervisor (6 JVMs)
      • supervisor JVM
      • logviewer JVM
      • 4 worker JVMs
    • 1 Zookeeper (1 JVM)
      • zookeeper JVM
      • nginx reverse-proxy
  • Dev System (or container)
    • 1 Client (jar jvm)

Note: This is the current configuration for the deploy playbooks and future cluster playbooks. It should be refactored to pull the configuration from AWS tags.
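
A rough sketch of that refactor, assuming the instances carry a (hypothetical) storm_role tag, would pull the host data from EC2 instead of hard-coding it:

# Hypothetical tag name; lists the private IPs of all supervisor nodes
aws ec2 describe-instances \
  --filters "Name=tag:storm_role,Values=supervisor" \
  --query 'Reservations[].Instances[].PrivateIpAddress' \
  --output text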

Nginx Reverse Proxy pt 1

.
├── sites-available
│   ├── default
│   ├── nimbus1-stormcluster.gobalto.com
│   ├── nimbus2-stormcluster.gobalto.com
│   ├── storm-cluster.gobalto.com
│   ├── supervisor1-stormcluster.gobalto.com
│   ├── supervisor2-stormcluster.gobalto.com
│   └── supervisor3-stormcluster.gobalto.com
└── sites-enabled
    ├── nimbus1-stormcluster.gobalto.com -> /etc/nginx/sites-available/nimbus1-stormcluster.gobalto.com
    ├── nimbus2-stormcluster.gobalto.com -> /etc/nginx/sites-available/nimbus2-stormcluster.gobalto.com
    ├── storm-cluster.gobalto.com -> /etc/nginx/sites-available/storm-cluster.gobalto.com
    ├── supervisor1-stormcluster.gobalto.com -> /etc/nginx/sites-available/supervisor1-stormcluster.gobalto.com
    ├── supervisor2-stormcluster.gobalto.com -> /etc/nginx/sites-available/supervisor2-stormcluster.gobalto.com
    └── supervisor3-stormcluster.gobalto.com -> /etc/nginx/sites-available/supervisor3-stormcluster.gobalto.com

An Nginx reverse proxy is required to provide web access to all of the systems. Additionally, /etc/hosts must be configured on every system so that all the components can communicate with each other.
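
Rendered against the group_var above, the /etc/hosts entries on each system would look roughly like this (the real file is generated from the hosts.j2 template later in the deck):

10.110.20.7    nimbus-stormcluster.gobalto.com
10.110.20.146  nimbus2-stormcluster.gobalto.com
10.110.20.10   supervisor1-stormcluster.gobalto.com
10.110.20.11   supervisor2-stormcluster.gobalto.com
10.110.20.9    supervisor3-stormcluster.gobalto.com
10.110.20.8    zookeeper-stormcluster.gobalto.com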

Nginx Reverse Proxy pt 2

# LogViewer example
server {
  listen 8000;
  server_name supervisor1-stormcluster.gobalto.com;
  location / {
    proxy_pass http://10.110.20.10:8000;
  }
}

Two examples of proxy configuration are shown. Each Logviewer and UI endpoint must have its own configuration.

  • Prerequisite Knowledge: hostnames, DNS, hosts file, web virtual host, reverse proxy routing
# UI example
server {
  listen 80;
  server_name storm-cluster.gobalto.com nimbus-stormcluster.gobalto.com;
  location / {
    proxy_pass http://10.110.20.7:8080;
  }
}
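
Enabling one of these vhosts follows the usual sites-available/sites-enabled pattern; a minimal sketch:

ln -s /etc/nginx/sites-available/supervisor1-stormcluster.gobalto.com \
      /etc/nginx/sites-enabled/
nginx -t               # validate the configuration
service nginx reload   # pick up the new vhost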

Apache Storm Config

storm.zookeeper.servers:
    - "zookeeper-stormcluster.gobalto.com"

nimbus.seeds: ["nimbus-stormcluster.gobalto.com", "nimbus2-stormcluster.gobalto.com"]
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

storm.local.dir: "/app/storm"

Apache Storm 1.x (storm.yaml)

The cluster needs a list of all ZooKeeper and Nimbus servers.
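
Once the daemons are up with this storm.yaml, a quick sanity check from any host with the Storm client installed might look like:

# Confirm the client can reach the nimbus seeds
storm list
# Confirm ZooKeeper answers the "ruok" four-letter-word check (expect "imok")
echo ruok | nc zookeeper-stormcluster.gobalto.com 2181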

Sirius Topology

Topology

A topology is a blueprint of the actual transformation. It is written in code and packaged as a Java JAR.

 

This JAR is what is submitted to the cluster.
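
Submission itself is a storm jar call; a sketch with an illustrative JAR path and main class (the real names come from the Sirius build):

# JAR path and main class shown here are placeholders
storm jar target/sirius-topology.jar com.gobalto.sirius.SiriusTopology
storm list                   # verify the topology is ACTIVE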

Sirius Topology

Sirius Topology transforms data from Activate (Source) to a Star Schema (Star).

 

Therefore you need to create database schemas (tables) for the audit logs on the source and for the star schema. This process is called migration, and there are two configs.

 

The Spouts and Bolts will use tenantConfig.

Sirius Migration Config

{
  "dev": {
    "username": "storm_user",
    "password": null,
    "database": "star_covance",
    "host": "localhost",
    "dialect": "postgres"
  },
  "test": {
    "username": "storm_user",
    "password": null,
    "database": "star_covance",
    "host": "localhost",
    "dialect": "postgres"
  }
}

Sequelize (config.json)

There are two migration operations: source and star. The environment variable NODE_ENV determines which configuration to use.
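
With sequelize-cli, a migration run would look roughly like this; the exact flags that select the source vs. star migrations are not shown here:

# NODE_ENV selects the matching block ("test") from config.json
NODE_ENV=test ./node_modules/.bin/sequelize db:migrate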

Sirius Tenant Config

{
  "covance" : {
    "sourceDBInfo": {
      "database": "storm_togo_dev",
      "host": "REDACTED",
      "port": 5432,
      "username": "storm_togo_test",
      "password": "REDACTED"
    },
    "starDBInfo": {
      "database": "storm_dev",
      "host": "REDACTED",
      "port": 5432,
      "username": "storm_dev_user",
      "password": "REDACTED"
    }
  }
}

Sirius (tenantconfig.json)

The topology job running in the Apache Storm cluster has an embedded configuration.

It is configured per tenant (customer).

Note that NODE_ENV cannot be used in a cluster because there is no practical way to set the environment variable for each worker JVM; this is controlled by the Apache Storm project developers.

Sirius Deploy

Deploy Process Overview

  1. Team City Build/Deploy
  2. Docker Build Container
  3. Docker Ship Container
  4. Ansible Configure Container
  5. Ansible Launch Container
  6. Ansible Orchestrate Container
    • Configure DB Sources
    • Build JAR
    • Deactivate/Kill Topology
    • DB Migrate Source & Star
    • Submit Topology
  7. Monitor Progress

Life Cycle

  1. Build
  2. Ship
  3. Deploy
  4. Pull
  5. Run
  6. Orchestrate

Orchestrate

Configure, Build, Stop, Migrate, Deploy (Submit Topology), Status (load_sql)
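
These steps map directly onto the container's sirius helper script, which the sirius_deploy role (shown later) invokes in order; run by hand the sequence would look like:

sirius config <envfile>        # configure DB sources
sirius build <git_hash_short>  # build the topology JAR
sirius stop                    # deactivate/kill active topologies
sirius migrate source          # migrate the Activate (source) schema
sirius migrate star            # migrate the star schema
sirius deploy <git_hash_short> # submit the new topology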

Base Docker Images

  • storm base build
    • storm 1.0.2
  • maven base build
    • maven 3.3.9
  • java base build
    • oracle jdk 8
  • ubuntu 14 trusty
 
DOCKER_REPO="gobaltoops/sirius"
docker build -t=${DOCKER_REPO}:base .
docker push ${DOCKER_REPO}:base
DOCKER_REPO="gobaltoops/sirius"
docker build -t=${DOCKER_REPO}:maven --no-cache=true .
docker push ${DOCKER_REPO}:maven
DOCKER_REPO="gobaltoops/sirius"
docker build -t=${DOCKER_REPO}:storm --no-cache=true .
docker push ${DOCKER_REPO}:storm

Three base images are required; they are built and pushed as the base, maven, and storm tags.

Sirius Container

Unlike a self-contained web application, Apache Storm only accepts a JAR.

 

This container is therefore only used for build, configuration, and deploy. It has a self-contained (segregated) Apache Storm, Java JDK 8, Maven build system, and Node.js environment.


This is used to build, configure, and deploy a topology to a cluster.
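
Because the sirius helper script is symlinked into the container's PATH (see the Dockerfile), the Ansible tasks can also be reproduced by hand against the running container, which the deploy playbook names sirius:

docker exec -it sirius bash                       # inspect the build environment
docker exec sirius sirius build <git_hash_short>  # same command the Ansible task runs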
 

Sirius Dockerfile pt 1

FROM gobaltoops/sirius:maven

ENV APP_ROOT /gobalto
ENV NODE_ROOT ${APP_ROOT}/src/main/resources/resources/
ENV TEST_ROOT ${APP_ROOT}/test/

WORKDIR ${APP_ROOT}

RUN mkdir -p ${TEST_ROOT} && \
    mkdir -p ${NODE_ROOT} && \
    mkdir -p ${APP_ROOT}/output

#### UNIT TESTS
COPY test/package.json ${TEST_ROOT}
RUN npm -g install mocha istanbul && \
    cd ${TEST_ROOT} && \
    npm install

#### NODE LIBRARY SUPPORT
COPY src/main/resources/resources/package.json ${NODE_ROOT}
RUN cd ${NODE_ROOT} && \
    npm -g install sequelize@3.23 sequelize-cli pg pg-hstore && \
    npm install

VOLUME ${APP_ROOT}/logs/
VOLUME ${APP_ROOT}/output/

Sirius Dockerfile pt 2


#### COPY REST OF CODE
COPY . ${APP_ROOT}/

#### SIRIUS CONFIG SCRIPT SUPPORT
RUN apt-get update && \
    apt-get install -y libpq-dev python3-pip
RUN pip3 install --upgrade pip setuptools wheel && \
    pip3 install psycopg2
#### Needed for Psycopg2 output from UTF-8 Postgres database 
ENV PYTHONIOENCODING=utf-8
#### LINK SIRIUS CONFIG SCRIPT
ENV COMMON_SCRIPTS ${APP_ROOT}/ci/docker/configs/common/
RUN ln -sf ${COMMON_SCRIPTS}/sirius_cfg.py /usr/local/bin/sirius

#### KEEP ALIVE
CMD while :; do sleep 1; done

Team City Deploy

#!/bin/sh

# Skip Build if no change and Redeploy is 0
#[ $SKIP_BUILD -eq 1 -a $REDEPLOY -eq 1 ] || exit 0

CUSTOMER=%customer%
GIT_HASH_SHORT=$(/usr/bin/git log --abbrev-commit --abbrev=8 --max-count=1 --format=%h)
ssh storm-dev-01 sirius_deploy ${CUSTOMER} dev ${GIT_HASH_SHORT}

Team City Build

# BUILD AND SHIP
/usr/bin/docker login -u ${DOCKER_HUB_USER} -p ${DOCKER_HUB_PSSWD} -e ${DOCKER_HUB_EMAIL}
/usr/bin/docker build -t ${DOCKER_REPO}:${GIT_HASH_SHORT} .
/usr/bin/docker push ${DOCKER_REPO}:${GIT_HASH_SHORT}

# DEPLOY PROCESS
CUSTOMER=%customer%
GIT_HASH_SHORT=$(/usr/bin/git log --abbrev-commit --abbrev=8 --max-count=1 --format=%h)
ssh storm-dev-01 sirius_deploy ${CUSTOMER} dev ${GIT_HASH_SHORT}

There are currently two steps in the build process:

  1. Create the build container and ship it to DockerHub.
  2. Remotely run the Ansible script on the storm-dev-01 system.

Sirius Deploy

#!/bin/bash

CUST=$1
ENV=$2
HASH=$3

SCRIPT=$(echo $0 | awk -F/ '{ print $NF }')

if [ $# -lt 3 ]; then
  echo 1>&2 "$0: not enough arguments, usage is '$SCRIPT CUST ENV HASH'"
  exit 2
elif [ $# -gt 3 ]; then
  echo 1>&2 "$0: too many arguments, usage is '$SCRIPT CUST ENV HASH'"
  exit 2
fi
# The three arguments are available as "$1", "$2", "$3"

ansible-playbook -v -e "git_hash_short=${HASH} customer=${CUST} env=${ENV}" \
                       /etc/ansible/playbooks/sirius_deploy.yml

sirius_deploy playbook

---
- hosts: all
  tasks:
    - name: Add sirius_container to storm_clusters group
      add_host: name=sirius_container groups=storm_clusters
    - name: Add sirius (docker) to storm_clusters group
      add_host:
        name: sirius
        groups: storm_clusters
        ansible_connection: docker
        ansible_ssh_user: root
        ansible_become_user: root
        ansible_become: yes

- hosts: sirius_container
  connection: local
  roles:
    - sirius_container

- hosts: sirius
  roles:
    - sirius_deploy

sirius_container pt 1

---
- name: Test External Variables
  fail: msg="Bailing out. This role requires '{{ item }}'"
  when: "{{ item }} is not defined"
  with_items: "{{ required_vars }}"

- include: setup.yml
- include: config.yml

sirius_container pt 2

 

---
# tasks to configure
- name: Include customers variables
  include_vars: customers.yml

- name: Configure envfile
  template:
    src: dev.env.j2
    dest: "{{host_staging_dir}}/templates/dev.env"

- name: Configure storm configuration (storm.yaml)
  template:
    src: storm.yaml.j2
    dest: "{{host_staging_dir}}/templates/storm.yaml"

- name: Configure storm hosts environment
  template:
    src: hosts.j2
    dest: "{{host_staging_dir}}/templates/hosts"

sirius_container templates

 

# dev.env.j2
{% for key, val in customers[customer].iteritems() %}
{{ key }}="{{ val }}"
{% endfor %}

# hosts.j2
{% for key, val in storm_clusters[storm_cluster].iteritems() %}
{{ val['inet'] }} {{ key }}
{% endfor %}

# storm.yaml.j2
nimbus.seeds: [{{ storm_clusters[storm_cluster].keys() | 
                       select('search', 'nimbus') | 
                       join(", ") }}]
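
Rendered against the stormcluster-01 group_var, the storm.yaml fragment comes out roughly as follows; the select('search', 'nimbus') filter keeps only the hostnames containing "nimbus":

# Sketch of the rendered output
nimbus.seeds: [nimbus-stormcluster.gobalto.com, nimbus2-stormcluster.gobalto.com]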

sirius_container pt 3

---
# task to setup container build environment
# these task can be run on localhost or remote system

- name: Include docker variables
  include_vars: docker.yml

- name: Log into DockerHub
  docker_login:
    username: "{{ docker_hub_username }}"
    password: "{{ docker_hub_password }}"
    email: "{{ docker_hub_email }}"

- name: Make dirs
  file: path={{item}} state=directory mode=0755
  with_items:
    - "{{ host_staging_dir }}/logs"
    - "{{ host_staging_dir }}/templates"
    - "{{ host_staging_dir }}/target"

- name: Check if the container already exists
  shell: docker ps -a | grep -q '{{ name_app }}$'
  register: sirius_container
  ignore_errors: true

- name: Stop and remove the container if it already exists
  shell: 'docker stop {{ name_app }} && docker rm {{ name_app }}'
  when: sirius_container.rc == 0

sirius_container pt 4

- name: Build Sirius Container
  docker:
    name: "{{ name_app }}"
    image: gobaltoops/sirius:{{ git_hash_short }}
    state: reloaded
    pull: always
    command: bash "{{ app_root }}"/"{{ config_path }}"/"{{ app_env }}"/wrapper.sh
    volumes:
         - "{{ host_staging_dir }}/templates:/templates"
         - "{{ host_staging_dir}}/logs:{{ app_root }}/logs"
         - "{{ host_staging_dir }}/target:{{ app_root }}/output"
    env:
        APP_ROOT: "{{ app_root }}"
        CUSTOMER: "{{ customer }}"
        NODE_ENV: "{{ app_env }}"

- name: Add Docker Connection
  add_host:
    name: "{{ name_app }}"
    groups: storm_clusters
    ansible_connection: docker
    ansible_ssh_user: root
    ansible_become_user: root
    ansible_become: yes

sirius_deploy role

---
# tasks file for sirius_deploy
- name: Test External Variables
  fail: msg="Bailing out. This role requires '{{ item }}'"
  when: "{{ item }} is not defined"
  with_items: "{{ required_vars }}"

- name: Config db connections from envfile
  command: sirius config {{ envfile }}

- name: Build topology jar
  command: sirius build {{ git_hash_short }}

- name: Stop active topologies
  command: sirius stop

- name: Migrate source Activate db
  command: sirius migrate source

- name: Migrate destination star db
  command: sirius migrate star

- name: Deploy topology
  command: sirius deploy {{ git_hash_short }}