Hold my beer and watch this!

Upgrading OpenStack from Havana to Juno

Jesse Keating

  • @iamjkeating
  • Lover of complicated problems, er, challenges
  • Upgrading OpenStack since Grizzly
  • Developed an appreciation for dark heavy beer
  • .... since Grizzly

OpenStack Upgrades

+5 Sword of Ansible

Image: COW PARADE – BRAVEHEART, by Richard Cross

Upgrade Styles

  • Micro upgrades
  • Macro upgrades

Orchestration

  • Eventual consistency need not apply
  • Ordered set of actions to accomplish

But first,

Database upgrade!

Percona XtraDB Cluster (****)

  • Two node cluster
  • One arbiter
  • Based on MySQL 5.5
  • MySQL 5.5 can't handle Neutron migration
  • Upgrade to 5.6

DB Hosts (2)

  • Stop DB
  • Remove packages
  • Put updated config in
  • Modify compat settings
  • Turn off replication
  • Install new packages
  • Run upgrade migration
  • Restore replication
  • Restart DB (again)
  • Repeat on other host
  • Remove compat settings
  • Restart DB (again again)

Arbiter

  • Purge old package/config
  • Fix filesystem perms
  • Run role as if new

- name: upgrade percona cluster in compat mode
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: check db version
      command: mysql -V
      changed_when: False
      register: mysqlver

    - include: upgrade-db-cluster.yml
      when: not mysqlver.stdout|search('Distrib 5\.6')

- name: upgrade percona arbiter
  hosts: db_arbiter
  max_fail_percentage: 1
  tags: dbupgrade

  pre_tasks:
    - name: purge old garbd and configs
      apt: name=percona-xtradb-cluster-garbd-2.x state=absent
           purge=true

    - name: remove old garb config
      file: path=/etc/default/garb state=absent

    - name: garbd.log permissions
      file: path=/var/log/garbd.log owner=nobody state=touch

  roles:
    - role: percona-common

    - role: percona-arbiter

- name: remove percona compat settings
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: remove compat settings
      lineinfile: regexp="{{ item }}" state=absent
                  dest=/etc/mysql/conf.d/replication.cnf
      with_items:
        - '^wsrep_provider_options\s*='
        - '^log_bin_use_v1_row_events\s*='
        - '^gtid_mode\s*='
        - '^binlog_checksum\s*='
        - '^read_only\s*='
      notify: restart mysql

  handlers:
    - name: restart mysql
      service: name=mysql state=restarted

# upgrade-db-cluster.yml (included by the first play above)

# stop all the dbs to prevent writes
- name: stop databases
  service: name=mysql state=stopped

- name: remove old packages
  apt: name={{ item }} state=absent
  with_items:
    - percona-xtradb-cluster-server-5.5
    - percona-xtradb-cluster-galera-2.x
    - percona-xtradb-cluster-common-5.5
    - percona-xtradb-cluster-client-5.5

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/my.cnf mode=0644
  when: ansible_distribution_version == "12.04"
  notify:
    - restart mysql server

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/mysql/my.cnf mode=0644
  when: ansible_distribution_version != "12.04"
  notify:
    - restart mysql server

- name: install mysql config files
  template: src=roles/percona-server/templates/etc/mysql/conf.d/{{ item }}
            dest=/etc/mysql/conf.d/{{ item }}
            mode=0644
  with_items:
    - bind-inaddr-any.cnf
    - tuning.cnf
    - utf8.cnf

- name: adjust replication for compatibility and new features
  lineinfile: regexp="{{ item.value.regexp }}"
              line="{{ item.value.line }}"
              dest=/etc/mysql/conf.d/replication.cnf state=present
  with_dict:
    provider:
      regexp: '^wsrep_provider\s*='
      line: "wsrep_provider = none"
    provider_options:
      regexp: '^wsrep_provider_options\s*='
      line: 'wsrep_provider_options="socket.checksum=1"'
    log_bin_v1:
      regexp: '^log_bin_use_v1_row_events\s*='
      line: 'log_bin_use_v1_row_events=1'
    gtid:
      regexp: '^gtid_mode\s*='
      line: 'gtid_mode=0'
    binlog:
      regexp: '^binlog_checksum\s*='
      line: 'binlog_checksum=None'
    wsrep_method:
      regexp: '^wsrep_sst_method\s*='
      line: "wsrep_sst_method = xtrabackup-v2"
    read_only:
      regexp: '^read_only\s*='
      line: "read_only = ON"

- name: install new packages
  apt: name=percona-xtradb-cluster-56

- name: run mysql_upgrade
  command: mysql_upgrade

- name: restore galera wsrep provider
  lineinfile: regexp='^wsrep_provider\s*='
              line="wsrep_provider = /usr/lib/libgalera_smm.so"
              dest=/etc/mysql/conf.d/replication.cnf

- name: restart mysql to rejoin the cluster
  service: name=mysql state=restarted
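
The DB plays run with `serial: 1`, so it can help to block until a restarted node actually reports itself synced before the next host gets touched. A minimal sketch of tasks that could be appended to the end of this task file, assuming the `mysql` client logs in without a password prompt (for example via root's `~/.my.cnf`); the retry and delay values are arbitrary:

```yaml
# Hypothetical addition: wait until this node finishes state transfer and
# reports Synced before serial: 1 moves on to the next DB host.
- name: wait for galera to report Synced
  command: mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
  register: wsrep_state
  changed_when: False
  until: "'Synced' in wsrep_state.stdout"
  retries: 30
  delay: 10
```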

Upgrade Rabbit?

On to OpenStack!

Repeating Pattern

  • New code + config
  • Stop old code
  • Migrate database
  • Start new code

Order

  • Matters if you care
  • Our order seems to work
  • Minimizes disruption
  • Avoid inter-project version deps

Our Order

  • glance
  • cinder
  • nova
  • neutron
  • swift
  • keystone
  • horizon
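
One way to pin that order is a top-level playbook that includes one file per project, since plays run in the order they appear. This is a sketch: the file names are made up for illustration, and playbook-level `include` is the Ansible 1.x spelling (newer releases use `import_playbook`). Because each play below carries per-project tags, a single project can presumably be re-run with `--tags`.

```yaml
# Hypothetical upgrade.yml: play order in this file is execution order.
# File names are illustrative only.
- include: upgrade-glance.yml
- include: upgrade-cinder.yml
- include: upgrade-nova.yml
- include: upgrade-neutron.yml
- include: upgrade-swift.yml
- include: upgrade-keystone.yml
- include: upgrade-horizon.yml
```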

Shortcuts

  • Neutron
  • ml2
  • Linux bridge
  • Newer kernel

Strategy

  • Reuse deployment code
  • Delay restarts
  • Fail immediately
  • Non-destructive re-runs
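
The roles below get called with `restart: False`, but the deck doesn't show how a role consumes that flag. One plausible shape for "delay restarts", assuming the role's handler simply checks the variable (the handler name matches the glance role's notify target; the `true` default is an assumption):

```yaml
# Hypothetical roles/glance/handlers/main.yml: restart only when the caller
# has not asked to delay restarts (restart defaults to true).
- name: restart glance services
  service: name={{ item }} state=restarted
  with_items:
    - glance-api
    - glance-registry
  when: restart|default(true)|bool
```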

Glance

No surprises

- name: upgrade glance
  hosts: controller
  max_fail_percentage: 1
  tags: glance

  roles:
    - role: glance
      force_sync: true
      restart: False
      database_create:
        changed: false

# glance role tasks (excerpt)
- name: glance config
  template: src={{ item }} dest=/etc/glance mode=0644
  with_fileglob: ../templates/etc/glance/*
  notify:
    - restart glance services

- name: stop glance services before db sync
  service: name={{ item }} state=stopped
  with_items:
    - glance-api
    - glance-registry
  when: database_create.changed or force_sync|default('false')|bool

- name: sync glance database
  command: glance-manage db_sync
  when: database_create.changed or force_sync|default('false')|bool
  run_once: true
  # always report changed so that this task can notify the service restart
  changed_when: true
  notify:
    - restart glance services

- meta: flush_handlers

- name: start glance services
  service: name={{ item }} state=started
  with_items:
    - glance-api
    - glance-registry

Cinder

More complicated

# Cinder block
- name: stage cinder data software
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  roles:
    - role: cinder-data
      restart: False

    - role: stop-services
      services:
        - cinder-volume

- name: stage cinder control software and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-control

  roles:
    - role: cinder-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start cinder data services
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  tasks:
    - name: start cinder data services
      service: name=cinder-volume state=started

- name: ensure cinder v2 endpoint
  hosts: controller[0]
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-endpoint

  tasks:
    - name: cinder v2 endpoint
      keystone_service: name={{ item.name }}
                        type={{ item.type }}
                        description='{{ item.description }}'
                        public_url={{ item.public_url }}
                        internal_url={{ item.internal_url }}
                        admin_url={{ item.admin_url }}
                        region=RegionOne
                        auth_url={{ endpoints.auth_uri }}
                        tenant_name=admin
                        login_user=provider_admin
                        login_password={{ secrets.provider_admin_password }}
      with_items: keystone.services
      when: endpoints[item.name] is defined and endpoints[item.name]
            and item.name == 'cinderv2'

# stop-services role tasks
- name: stop services
  service: name={{ item }} state=stopped
  with_items: services
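
The `cinder v2 endpoint` task above only fires when `endpoints.cinderv2` is set and truthy. A hypothetical vars excerpt showing the data shape that task iterates over; hostnames, ports and the auth URL are placeholders, not values from the deck:

```yaml
# Hypothetical group_vars excerpt: a keystone.services list plus an
# endpoints map keyed by service name. All hostnames are placeholders.
endpoints:
  auth_uri: https://cloud.example.com:5000/v2.0
  cinderv2: cloud.example.com

keystone:
  services:
    - name: cinderv2
      type: volumev2
      description: 'Cinder Volume Service v2'
      public_url: https://cloud.example.com:8776/v2/%(tenant_id)s
      internal_url: https://cloud.example.com:8776/v2/%(tenant_id)s
      admin_url: https://cloud.example.com:8776/v2/%(tenant_id)s
```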

Nova

Pretty straightforward

# Nova block
- name: stage nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  roles:
    - role: nova-data
      restart: False
      when: ironic.enabled == False

    - role: stop-services
      services:
        - nova-compute
      when: ironic.enabled == False

- name: stage nova control and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - nova
    - nova-control

  roles:
    - role: nova-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  tasks:
    - name: start nova compute
      service: name=nova-compute state=started
      when: ironic.enabled == False
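
Since nova-compute sits stopped while the control plane migrates, a quick post-check that everything came back up can save a surprise. A sketch that polls `nova service-list`, reusing the admin credentials the cinder endpoint task already uses; it assumes the nova CLI is installed on the first controller, and the `' down '` match is a crude text check on the table output:

```yaml
# Hypothetical post-check: poll "nova service-list" until no service row
# shows a "down" state. Credentials mirror the cinder endpoint task.
- name: verify nova services came back
  hosts: controller[0]
  max_fail_percentage: 1
  tags:
    - nova
    - nova-verify

  tasks:
    - name: wait for every nova service to report up
      command: nova service-list
      environment:
        OS_AUTH_URL: "{{ endpoints.auth_uri }}"
        OS_USERNAME: provider_admin
        OS_PASSWORD: "{{ secrets.provider_admin_password }}"
        OS_TENANT_NAME: admin
      register: nova_services
      changed_when: False
      until: "' down ' not in nova_services.stdout"
      retries: 12
      delay: 10
```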

Neutron

DB Stamp

# Neutron block
- name: stage neutron core data
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  roles:
    - role: neutron-data
      restart: False

- name: stage neutron network
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  roles:
    - role: neutron-data-network
      restart: False

- name: upgrade neutron control plane
  hosts: controller
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-control

  pre_tasks:
    - name: check db version
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               current
      register: neutron_db_ver
      run_once: True

    - name: stamp neutron to havana
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               stamp havana
      when: not neutron_db_ver.stdout|search('juno')
      run_once: True

  roles:
    - role: neutron-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: restart neutron data service
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  tasks:
    - name: restart neutron data service
      service: name=neutron-linuxbridge-agent state=restarted

- name: restart neutron data network services
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  tasks:
    - name: restart neutron data network agent services
      service: name={{ item }} state=restarted
      with_items:
        - neutron-l3-agent
        - neutron-dhcp-agent
        - neutron-metadata-agent
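
The stamp only happens when `current` doesn't mention juno, so a matching post-check confirms the migration actually landed. A sketch that reuses the same `neutron-db-manage ... current` query as the pre_task and fails the run otherwise; like the original, it assumes the revision shows up on stdout:

```yaml
# Hypothetical post-check: confirm the neutron DB now reports the juno
# revision, using the same "current" query as the stamp pre_task.
- name: verify neutron db revision
  hosts: controller[0]
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-verify

  tasks:
    - name: read neutron db revision
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               current
      register: neutron_db_after
      changed_when: False

    - name: fail unless the revision mentions juno
      assert:
        that:
          - "'juno' in neutron_db_after.stdout"
```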

Swift

Even easier!

- name: upgrade swift
  hosts: swiftnode
  any_errors_fatal: true
  tags: swift

  roles:
    - role: haproxy
      haproxy_type: swift
      tags: ['openstack', 'swift', 'control']

    - role: swift-object
      tags: ['openstack', 'swift', 'data']

    - role: swift-account
      tags: ['openstack', 'swift', 'data']

    - role: swift-container
      tags: ['openstack', 'swift', 'data']

    - role: swift-proxy
      tags: ['openstack', 'swift', 'control']

Keystone

No sweat

- name: upgrade keystone
  hosts: controller
  max_fail_percentage: 1
  tags: keystone

  roles:
    - role: keystone
      force_sync: true
      restart: False
      database_create:
        changed: False

Horizon

It's just a webapp!

- name: upgrade horizon
  hosts: controller
  max_fail_percentage: 1
  tags: horizon

  roles:
    - role: horizon

Gotchas

Keystone PKI tokens

  • Not actually faster
  • Break services until restart
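
If PKI tokens aren't buying you speed, one option is to pin Keystone to UUID tokens explicitly so the token format doesn't ride along with the upgrade. A sketch using `ini_file`; the provider class is the Havana-to-Juno era UUID provider, and the `keystone` service name assumes Keystone runs under its own init script rather than behind Apache:

```yaml
# Hypothetical play: set [token] provider to UUID, then restart keystone.
- name: pin keystone to uuid tokens
  hosts: controller
  max_fail_percentage: 1
  tags: keystone

  tasks:
    - name: use the uuid token provider
      ini_file: dest=/etc/keystone/keystone.conf section=token
                option=provider
                value=keystone.token.providers.uuid.Provider
      notify: restart keystone

  handlers:
    - name: restart keystone
      service: name=keystone state=restarted
```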

Neutron / Nova vif_plugging_is_fatal

  • Version dependency
  • Breaks builds until both upgraded
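
A possible stopgap while Nova and Neutron sit at different versions is to stop nova-compute from failing builds while it waits for Neutron's vif-plugged event. A sketch, assuming the options live in `[DEFAULT]` of `/etc/nova/nova.conf` on the compute hosts; the settings should be reverted once both projects are upgraded:

```yaml
# Hypothetical temporary workaround for the upgrade window: don't treat a
# missing vif-plugged event as fatal, and don't wait for one.
- name: relax vif plugging during the upgrade window
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - vif-workaround

  tasks:
    - name: set vif plugging options
      ini_file: dest=/etc/nova/nova.conf section=DEFAULT
                option={{ item.option }} value={{ item.value }}
      with_items:
        - { option: vif_plugging_is_fatal, value: 'False' }
        - { option: vif_plugging_timeout, value: '0' }
      notify: restart nova-compute

  handlers:
    - name: restart nova-compute
      service: name=nova-compute state=restarted
```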

Bloated Nova

Deleted Instances

  • Data still there
  • Migrations longer
  • No supported tool to trim
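
Since soft-deleted rows stretch the Nova migrations, it can be worth sizing them up before you start. A read-only pre-flight sketch, assuming the database is named `nova`, soft-deleted rows have a non-zero `deleted` column, and the `mysql` client authenticates via root's `~/.my.cnf`:

```yaml
# Hypothetical pre-flight: count soft-deleted instance rows to get a rough
# feel for how long the nova migrations may run. Read-only.
- name: size up deleted nova instances
  hosts: db[0]
  tags:
    - nova
    - nova-preflight

  tasks:
    - name: count soft-deleted instances
      command: mysql -N -e "SELECT COUNT(*) FROM nova.instances WHERE deleted != 0"
      register: deleted_instances
      changed_when: False

    - name: report the count
      debug: msg="{{ deleted_instances.stdout }} soft-deleted rows in nova.instances"
```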

Resources

  • Ursula https://github.com/blueboxgroup/ursula/
  • Twitter @iamjkeating
  • IRC #openstack-operators

Questions?

Blue Box Booth #T5!

Come by our booth, get bling, see our schedule of sessions, chat with awesome people!
