Hold my beer and watch this!

Upgrading OpenStack from Havana to Juno

Jesse Keating

  • @iamjkeating
  • Lover of complicated challenges
  • Upgrading OpenStack since Grizzly
  • Developed an appreciation for dark heavy beer
  • .... since Grizzly

OpenStack Upgrades

+5 Sword of Ansible

[Image: Cow Parade – Braveheart, by Richard Cross]

Upgrade Styles

  • Micro upgrades
  • Macro upgrades

Orchestration

  • Eventual consistency need not apply
  • Ordered set of actions to accomplish
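The ordering requirement maps naturally onto Ansible: plays run top to bottom, `serial` limits how many hosts change at once, and `max_fail_percentage: 1` aborts at the first failure. A minimal sketch (host groups and commands here are illustrative placeholders, not from the actual Ursula playbooks):

```yaml
# Hypothetical sketch: ordered, one-host-at-a-time orchestration.
# Plays execute top to bottom; within a play, serial: 1 walks
# hosts one by one, and max_fail_percentage: 1 stops the run on
# the first failure instead of limping along.
- name: upgrade the first tier
  hosts: tier_one
  serial: 1
  max_fail_percentage: 1
  tasks:
    - name: do the first ordered action
      command: /usr/local/bin/step-one   # placeholder command

- name: upgrade the second tier only after tier one finishes
  hosts: tier_two
  max_fail_percentage: 1
  tasks:
    - name: do the next ordered action
      command: /usr/local/bin/step-two   # placeholder command
```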

But first,

Database upgrade!

Percona XtraDB Cluster(****)

  • Two node cluster
  • One arbiter
  • Based on MySQL 5.5
  • MySQL 5.5 can't handle Neutron migration
  • Upgrade to 5.6

DB Hosts (2)

  • Stop DB
  • Remove packages
  • Put updated config in
  • Modify compat settings
  • Turn off replication
  • Install new packages
  • Run upgrade migration
  • Restore replication
  • Restart DB (again)
  • Repeat on other host
  • Remove compat settings
  • Restart DB (again again)

Arbiter

  • Purge old package/config
  • Fix filesystem perms
  • Run role as if new
- name: upgrade percona cluster in compat mode
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: check db version
      command: mysql -V
      changed_when: False
      register: mysqlver

    - include: upgrade-db-cluster.yml
      when: not mysqlver.stdout|search('Distrib 5\.6')

- name: upgrade percona arbiter
  hosts: db_arbiter
  max_fail_percentage: 1
  tags: dbupgrade

  pre_tasks:
    - name: purge old garbd and configs
      apt: name=percona-xtradb-cluster-garbd-2.x state=absent
           purge=true

    - name: remove old garb config
      file: path=/etc/default/garb state=absent

    - name: garbd.log permissions
      file: path=/var/log/garbd.log owner=nobody state=touch

  roles:
    - role: percona-common

    - role: percona-arbiter

- name: remove percona compat settings
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: remove compat settings
      lineinfile: regexp="{{ item }}" state=absent
                  dest=/etc/mysql/conf.d/replication.cnf
      with_items:
        - '^wsrep_provider_options\s*='
        - '^log_bin_use_v1_row_events\s*='
        - '^gtid_mode\s*='
        - '^binlog_checksum\s*='
        - '^read_only\s*='
      notify: restart mysql

  handlers:
    - name: restart mysql
      service: name=mysql state=restarted
# stop all the dbs to prevent writes
- name: stop databases
  service: name=mysql state=stopped

- name: remove old packages
  apt: name={{ item }} state=absent
  with_items:
    - percona-xtradb-cluster-server-5.5
    - percona-xtradb-cluster-galera-2.x
    - percona-xtradb-cluster-common-5.5
    - percona-xtradb-cluster-client-5.5

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/my.cnf mode=0644
  when: ansible_distribution_version == "12.04"
  notify:
    - restart mysql server

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/mysql/my.cnf mode=0644
  when: ansible_distribution_version != "12.04"
  notify:
    - restart mysql server

- name: install mysql config files
  template: src=roles/percona-server/templates/etc/mysql/conf.d/{{ item }}
            dest=/etc/mysql/conf.d/{{ item }}
            mode=0644
  with_items:
    - bind-inaddr-any.cnf
    - tuning.cnf
    - utf8.cnf

- name: adjust replication for compatibility and new features
  lineinfile: regexp="{{ item.value.regexp }}"
              line="{{ item.value.line }}"
              dest=/etc/mysql/conf.d/replication.cnf state=present
  with_dict:
    provider:
      regexp: '^wsrep_provider\s*='
      line: "wsrep_provider = none"
    provider_options:
      regexp: '^wsrep_provider_options\s*='
      line: 'wsrep_provider_options="socket.checksum=1"'
    log_bin_v1:
      regexp: '^log_bin_use_v1_row_events\s*='
      line: 'log_bin_use_v1_row_events=1'
    gtid:
      regexp: '^gtid_mode\s*='
      line: 'gtid_mode=0'
    binlog:
      regexp: '^binlog_checksum\s*='
      line: 'binlog_checksum=None'
    wsrep_method:
      regexp: '^wsrep_sst_method\s*='
      line: "wsrep_sst_method = xtrabackup-v2"
    read_only:
      regexp: '^read_only\s*='
      line: "read_only = ON"

- name: install new packages
  apt: name=percona-xtradb-cluster-56

- name: run mysql_upgrade
  command: mysql_upgrade

- name: restore galera wsrep provider
  lineinfile: regexp='^wsrep_provider\s*='
              line="wsrep_provider = /usr/lib/libgalera_smm.so"
              dest=/etc/mysql/conf.d/replication.cnf

- name: restart mysql to rejoin the cluster
  service: name=mysql state=restarted

Upgrade Rabbit?

On to OpenStack!

Repeating Pattern

  • New code + config
  • Stop old code
  • Migrate database
  • Start new code
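Each service upgrade below follows this same shape. As a generic sketch (the service and command names are placeholders; the real plays appear in the following sections):

```yaml
# Hypothetical sketch of the repeating per-service pattern:
# lay down new code + config, stop the old services, migrate
# the database once, then start the new code.
- name: upgrade someservice
  hosts: controller
  max_fail_percentage: 1
  tasks:
    - name: install new code and config
      apt: name=someservice state=latest

    - name: stop old code before the schema migration
      service: name=someservice state=stopped

    - name: migrate the database
      command: someservice-manage db_sync
      run_once: true

    - name: start new code
      service: name=someservice state=started
```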

Order

  • Matters if you care
  • Seems to work in our order
  • Minimizes disruption
  • Avoid inter-project version deps

Our Order

  • glance
  • cinder
  • nova
  • neutron
  • swift
  • keystone
  • horizon

Shortcuts

  • Neutron
  • ml2
  • Linux bridge
  • Newer kernel

Strategy

  • Reuse deployment code
  • Delay restarts
  • Fail immediately
  • Non-destructive re-runs
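In Ansible terms, delayed restarts fall out of handlers: tasks notify a restart, but it only fires when handlers are flushed, and re-runs stay non-destructive because tasks only notify on change. A minimal illustration (the service name is a placeholder):

```yaml
# Hypothetical sketch: restarts are delayed by routing them
# through a handler, which runs only at flush time; re-running
# the play is harmless because an unchanged template notifies
# nothing.
- name: demonstrate delayed restarts
  hosts: controller
  max_fail_percentage: 1   # fail immediately rather than limp on
  tasks:
    - name: update config (notifies only if the file changed)
      template: src=service.conf.j2 dest=/etc/service.conf
      notify: restart service

    # ... more tasks; the restart has not happened yet ...

    - meta: flush_handlers   # now any queued restart fires

  handlers:
    - name: restart service
      service: name=service state=restarted
```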

Glance

No surprises

- name: upgrade glance
  hosts: controller
  max_fail_percentage: 1
  tags: glance

  roles:
    - role: glance
      force_sync: true
      restart: False
      database_create:
        changed: false
- name: glance config
  template: src={{ item }} dest=/etc/glance mode=0644
  with_fileglob: ../templates/etc/glance/*
  notify:
    - restart glance services

- name: stop glance services before db sync
  service: name={{ item }} state=stopped
  with_items:
    - glance-api
    - glance-registry
  when: database_create.changed or force_sync|default('false')|bool

- name: sync glance database
  command: glance-manage db_sync
  when: database_create.changed or force_sync|default('false')|bool
  run_once: true
  changed_when: true
  notify:
    - restart glance services
  # we want this to always be changed so that it can notify the service restart

- meta: flush_handlers

- name: start glance services
  service: name={{ item }} state=started
  with_items:
    - glance-api
    - glance-registry

Cinder

More complicated

# Cinder block
- name: stage cinder data software
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  roles:
    - role: cinder-data
      restart: False

    - role: stop-services
      services:
        - cinder-volume

- name: stage cinder control software and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-control

  roles:
    - role: cinder-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start cinder data services
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  tasks:
    - name: start cinder data services
      service: name=cinder-volume state=started

- name: ensure cinder v2 endpoint
  hosts: controller[0]
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-endpoint

  tasks:
    - name: cinder v2 endpoint
      keystone_service: name={{ item.name }}
                        type={{ item.type }}
                        description='{{ item.description }}'
                        public_url={{ item.public_url }}
                        internal_url={{ item.internal_url }}
                        admin_url={{ item.admin_url }}
                        region=RegionOne
                        auth_url={{ endpoints.auth_uri }}
                        tenant_name=admin
                        login_user=provider_admin
                        login_password={{ secrets.provider_admin_password }}
      with_items: keystone.services
      when: endpoints[item.name] is defined and endpoints[item.name]
            and item.name == 'cinderv2'
- name: stop services
  service: name={{ item }} state=stopped
  with_items: services

Nova

Pretty straightforward

# Nova block
- name: stage nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  roles:
    - role: nova-data
      restart: False
      when: ironic.enabled == False

    - role: stop-services
      services:
        - nova-compute
      when: ironic.enabled == False

- name: stage nova control and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - nova
    - nova-control

  roles:
    - role: nova-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  tasks:
    - name: start nova compute
      service: name=nova-compute state=started
      when: ironic.enabled == False

Neutron

DB Stamp

# Neutron block
- name: stage neutron core data
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  roles:
    - role: neutron-data
      restart: False

- name: stage neutron network
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  roles:
    - role: neutron-data-network
      restart: False

- name: upgrade neutron control plane
  hosts: controller
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-control

  pre_tasks:
    - name: check db version
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               current
      register: neutron_db_ver
      run_once: True

    - name: stamp neutron to havana
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               stamp havana
      when: not neutron_db_ver.stdout|search('juno')
      run_once: True

  roles:
    - role: neutron-control
      force_sync: true
      restart: False
      database_created:
        changed: false

- name: restart neutron data service
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  tasks:
    - name: restart neutron data service
      service: name=neutron-linuxbridge-agent state=restarted

- name: restart neutron data network services
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  tasks:
    - name: restart neutron data network agent services
      service: name={{ item }} state=restarted
      with_items:
        - neutron-l3-agent
        - neutron-dhcp-agent
        - neutron-metadata-agent

Swift

Even easier!

- name: upgrade swift
  hosts: swiftnode
  any_errors_fatal: true
  tags: swift

  roles:
    - role: haproxy
      haproxy_type: swift
      tags: ['openstack', 'swift', 'control']

    - role: swift-object
      tags: ['openstack', 'swift', 'data']

    - role: swift-account
      tags: ['openstack', 'swift', 'data']

    - role: swift-container
      tags: ['openstack', 'swift', 'data']

    - role: swift-proxy
      tags: ['openstack', 'swift', 'control']

Keystone

No sweat

- name: upgrade keystone
  hosts: controller
  max_fail_percentage: 1
  tags: keystone

  roles:
    - role: keystone
      force_sync: true
      restart: False
      database_create:
        changed: False

Horizon

It's just a webapp!

- name: upgrade horizon
  hosts: controller
  max_fail_percentage: 1
  tags: horizon

  roles:
    - role: horizon

Gotchas

Keystone PKI tokens

  • Not actually faster
  • Break services until restart

Neutron / Nova vif_plugging_is_fatal

  • Version dependency
  • Breaks builds until both upgraded

Bloated Nova

Deleted Instances

  • Data still there
  • Migrations take longer
  • No supported tool to trim

Resources

  • Ursula https://github.com/blueboxgroup/ursula/
  • Twitter @iamjkeating
  • IRC #openstack-operators

Questions?

Blue Box Booth #T5!

Come by our booth, get bling, see our schedule of sessions, chat with awesome people!
