Hold my beer and watch this!

Upgrading OpenStack from Havana to Juno

Jesse Keating

  • @iamjkeating
  • Lover of complicated challenges
  • Upgrading OpenStack since Grizzly
  • Developed an appreciation for dark heavy beer
  • .... since Grizzly

OpenStack Upgrades

+5 Sword of Ansible

COW PARADE – BRAVEHEART

Image by Richard Cross

Upgrade Styles

  • Micro upgrades
  • Macro upgrades

Orchestration

  • Eventual consistency need not apply
  • Ordered set of actions to accomplish

But first,

Database upgrade!

Percona XtraDB Cluster

  • Two node cluster
  • One arbiter
  • Based on MySQL 5.5
  • MySQL 5.5 can't handle Neutron migration
  • Upgrade to 5.6

DB Hosts (2)

  • Stop DB
  • Remove packages
  • Put updated config in
  • Modify compat settings
  • Turn off replication
  • Install new packages
  • Run upgrade migration
  • Restore replication
  • Restart DB (again)
  • Repeat on other host
  • Remove compat settings
  • Restart DB (again again)

Arbiter

  • Purge old package/config
  • Fix filesystem perms
  • Run role as if new
- name: upgrade percona cluster in compat mode
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: check db version
      command: mysql -V
      changed_when: False
      register: mysqlver

    - include: upgrade-db-cluster.yml
      when: not mysqlver.stdout|search('Distrib 5\.6')

- name: upgrade percona arbiter
  hosts: db_arbiter
  max_fail_percentage: 1
  tags: dbupgrade

  pre_tasks:
    - name: purge old garbd and configs
      apt: name=percona-xtradb-cluster-garbd-2.x state=absent
           purge=true

    - name: remove old garb config
      file: path=/etc/default/garb state=absent

    - name: garbd.log permissions
      file: path=/var/log/garbd.log owner=nobody state=touch

  roles:
    - role: percona-common

    - role: percona-arbiter

- name: remove percona compat settings
  hosts: db
  max_fail_percentage: 1
  tags: dbupgrade
  serial: 1

  tasks:
    - name: remove compat settings
      lineinfile: regexp="{{ item }}" state=absent
                  dest=/etc/mysql/conf.d/replication.cnf
      with_items:
        - '^wsrep_provider_options\s*='
        - '^log_bin_use_v1_row_events\s*='
        - '^gtid_mode\s*='
        - '^binlog_checksum\s*='
        - '^read_only\s*='
      notify: restart mysql

  handlers:
    - name: restart mysql
      service: name=mysql state=restarted
# stop all the dbs to prevent writes
- name: stop databases
  service: name=mysql state=stopped

- name: remove old packages
  apt: name={{ item }} state=absent
  with_items:
    - percona-xtradb-cluster-server-5.5
    - percona-xtradb-cluster-galera-2.x
    - percona-xtradb-cluster-common-5.5
    - percona-xtradb-cluster-client-5.5

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/my.cnf mode=0644
  when: ansible_distribution_version == "12.04"
  notify:
    - restart mysql server

- name: configure my.cnf
  template: src=roles/percona-server/templates/etc/my.cnf
            dest=/etc/mysql/my.cnf mode=0644
  when: ansible_distribution_version != "12.04"
  notify:
    - restart mysql server

- name: install mysql config files
  template: src=roles/percona-server/templates/etc/mysql/conf.d/{{ item }}
            dest=/etc/mysql/conf.d/{{ item }}
            mode=0644
  with_items:
    - bind-inaddr-any.cnf
    - tuning.cnf
    - utf8.cnf

- name: adjust replication for compatibility and new features
  lineinfile: regexp="{{ item.value.regexp }}"
              line="{{ item.value.line }}"
              dest=/etc/mysql/conf.d/replication.cnf state=present
  with_dict:
    provider:
      regexp: '^wsrep_provider\s*='
      line: "wsrep_provider = none"
    provider_options:
      regexp: '^wsrep_provider_options\s*='
      line: 'wsrep_provider_options="socket.checksum=1"'
    log_bin_v1:
      regexp: '^log_bin_use_v1_row_events\s*='
      line: 'log_bin_use_v1_row_events=1'
    gtid:
      regexp: '^gtid_mode\s*='
      line: 'gtid_mode=0'
    binlog:
      regexp: '^binlog_checksum\s*='
      line: 'binlog_checksum=None'
    wsrep_method:
      regexp: '^wsrep_sst_method\s*='
      line: "wsrep_sst_method = xtrabackup-v2"
    read_only:
      regexp: '^read_only\s*='
      line: "read_only = ON"

- name: install new packages
  apt: name=percona-xtradb-cluster-56

- name: run mysql_upgrade
  command: mysql_upgrade

- name: restore galera wsrep provider
  lineinfile: regexp='^wsrep_provider\s*='
              line="wsrep_provider = /usr/lib/libgalera_smm.so"
              dest=/etc/mysql/conf.d/replication.cnf

- name: restart mysql to rejoin the cluster
  service: name=mysql state=restarted

Upgrade Rabbit?

On to OpenStack!

Repeating Pattern

  • New code + config
  • Stop old code
  • Migrate database
  • Start new code
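
The four steps above map naturally onto a single Ansible play. A minimal sketch (the "example-*" service, package, and command names are placeholders, not real Ursula roles):

```yaml
# Sketch of the repeating upgrade pattern; "example-*" names are hypothetical
- name: upgrade an openstack service (sketch)
  hosts: controller
  max_fail_percentage: 1

  tasks:
    - name: new code + config
      apt: name=example-service state=latest

    - name: stop old code before the schema migration
      service: name=example-service state=stopped

    - name: migrate database
      command: example-manage db_sync
      run_once: true

    - name: start new code
      service: name=example-service state=started
```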

Order

  • Matters if you care
  • Seems to work in our order
  • Minimizes disruption
  • Avoid inter-project version deps

Our Order

  • glance
  • cinder
  • nova
  • neutron
  • swift
  • keystone
  • horizon

Shortcuts

  • Neutron
  • ml2
  • Linux bridge
  • Newer kernel

Strategy

  • Reuse deployment code
  • Delay restarts
  • Fail immediately
  • Non-destructive re-runs

Glance

No surprises

- name: upgrade glance
  hosts: controller
  max_fail_percentage: 1
  tags: glance

  roles:
    - role: glance
      force_sync: true
      restart: False
      database_create:
        changed: false
- name: glance config
  template: src={{ item }} dest=/etc/glance mode=0644
  with_fileglob: ../templates/etc/glance/*
  notify:
    - restart glance services

- name: stop glance services before db sync
  service: name={{ item }} state=stopped
  with_items:
    - glance-api
    - glance-registry
  when: database_create.changed or force_sync|default('false')|bool

- name: sync glance database
  command: glance-manage db_sync
  when: database_create.changed or force_sync|default('false')|bool
  run_once: true
  changed_when: true
  notify:
    - restart glance services
  # we want this to always be changed so that it can notify the service restart

- meta: flush_handlers

- name: start glance services
  service: name={{ item }} state=started
  with_items:
    - glance-api
    - glance-registry

Cinder

More complicated

# Cinder block
- name: stage cinder data software
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  roles:
    - role: cinder-data
      restart: False

    - role: stop-services
      services:
        - cinder-volume

- name: stage cinder control software and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-control

  roles:
    - role: cinder-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start cinder data services
  hosts: cinder_volume
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-volume

  tasks:
    - name: start cinder data services
      service: name=cinder-volume state=started

- name: ensure cinder v2 endpoint
  hosts: controller[0]
  max_fail_percentage: 1
  tags:
    - cinder
    - cinder-endpoint

  tasks:
    - name: cinder v2 endpoint
      keystone_service: name={{ item.name }}
                        type={{ item.type }}
                        description='{{ item.description }}'
                        public_url={{ item.public_url }}
                        internal_url={{ item.internal_url }}
                        admin_url={{ item.admin_url }}
                        region=RegionOne
                        auth_url={{ endpoints.auth_uri }}
                        tenant_name=admin
                        login_user=provider_admin
                        login_password={{ secrets.provider_admin_password }}
      with_items: keystone.services
      when: endpoints[item.name] is defined and endpoints[item.name]
            and item.name == 'cinderv2'
- name: stop services
  service: name={{ item }} state=stopped
  with_items: services

Nova

Pretty straightforward

# Nova block
- name: stage nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  roles:
    - role: nova-data
      restart: False
      when: ironic.enabled == False

    - role: stop-services
      services:
        - nova-compute
      when: ironic.enabled == False

- name: stage nova control and stop services
  hosts: controller
  max_fail_percentage: 1
  tags:
    - nova
    - nova-control

  roles:
    - role: nova-control
      force_sync: true
      restart: False
      database_create:
        changed: false

- name: start nova compute
  hosts: compute
  max_fail_percentage: 1
  tags:
    - nova
    - nova-data

  tasks:
    - name: start nova compute
      service: name=nova-compute state=started
      when: ironic.enabled == False

Neutron

DB Stamp

# Neutron block
- name: stage neutron core data
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  roles:
    - role: neutron-data
      restart: False

- name: stage neutron network
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  roles:
    - role: neutron-data-network
      restart: False

- name: upgrade neutron control plane
  hosts: controller
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-control

  pre_tasks:
    - name: check db version
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               current
      register: neutron_db_ver
      run_once: True

    - name: stamp neutron to havana
      command: neutron-db-manage --config-file /etc/neutron/neutron.conf
               --config-file /etc/neutron/plugins/ml2/ml2_plugin.ini
               stamp havana
      when: not neutron_db_ver.stdout|search('juno')
      run_once: True

  roles:
    - role: neutron-control
      force_sync: true
      restart: False
      database_created:
        changed: false

- name: restart neutron data service
  hosts: compute:network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-data

  tasks:
    - name: restart neutron data service
      service: name=neutron-linuxbridge-agent state=restarted

- name: restart neutron data network services
  hosts: network
  max_fail_percentage: 1
  tags:
    - neutron
    - neutron-network

  tasks:
    - name: restart neutron data network agent services
      service: name={{ item }} state=restarted
      with_items:
        - neutron-l3-agent
        - neutron-dhcp-agent
        - neutron-metadata-agent

Swift

Even easier!

- name: upgrade swift
  hosts: swiftnode
  any_errors_fatal: true
  tags: swift

  roles:
    - role: haproxy
      haproxy_type: swift
      tags: ['openstack', 'swift', 'control']

    - role: swift-object
      tags: ['openstack', 'swift', 'data']

    - role: swift-account
      tags: ['openstack', 'swift', 'data']

    - role: swift-container
      tags: ['openstack', 'swift', 'data']

    - role: swift-proxy
      tags: ['openstack', 'swift', 'control']

Keystone

No sweat

- name: upgrade keystone
  hosts: controller
  max_fail_percentage: 1
  tags: keystone

  roles:
    - role: keystone
      force_sync: true
      restart: False
      database_create:
        changed: False

Horizon

It's just a webapp!

- name: upgrade horizon
  hosts: controller
  max_fail_percentage: 1
  tags: horizon

  roles:
    - role: horizon

Gotchas

Keystone PKI tokens

  • Not actually faster
  • Break services until restart
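
One way around this gotcha is to switch back to UUID tokens before the upgrade. A sketch using Ansible's ini_file module — the provider class path matches Juno-era Keystone, but the handler name is an assumption:

```yaml
# Assumption: keystone.conf at the default path; "restart keystone" handler
# is hypothetical and must exist in the including role
- name: use uuid tokens instead of pki
  ini_file: dest=/etc/keystone/keystone.conf
            section=token
            option=provider
            value=keystone.token.providers.uuid.Provider
  notify: restart keystone
```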

Neutron / Nova vif_plugging_is_fatal

  • Version dependency
  • Breaks builds until both upgraded
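
Until both sides are upgraded, event-based VIF plugging can be relaxed on the Nova side. A sketch (these are real Juno-era nova.conf options; the handler name is an assumption):

```yaml
# Relax vif plugging so instance builds don't fail while Neutron lags Nova;
# "restart nova services" handler is hypothetical
- name: tolerate missing vif plug events during the upgrade window
  ini_file: dest=/etc/nova/nova.conf
            section=DEFAULT
            option={{ item.option }}
            value={{ item.value }}
  with_items:
    - { option: 'vif_plugging_is_fatal', value: 'False' }
    - { option: 'vif_plugging_timeout', value: '0' }
  notify: restart nova services
```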

Bloated Nova

Deleted Instances

  • Data still there
  • Migrations longer
  • No supported tool to trim
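
The bloat can at least be measured before migrating. A read-only sketch, assuming the standard Nova schema where soft-deleted rows carry a non-zero `deleted` column:

```yaml
# Count soft-deleted instance rows; read-only and safe to re-run
- name: count soft-deleted instances
  command: mysql nova -N -e "SELECT COUNT(*) FROM instances WHERE deleted != 0"
  register: deleted_instances
  changed_when: false
```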

Resources

  • Ursula https://github.com/blueboxgroup/ursula/
  • Twitter @iamjkeating
  • IRC #openstack-operators

Questions?

Blue Box Booth #T5!

Come by our booth, get bling, see our schedule of sessions, chat with awesome people!
