Managing PostgreSQL with Ansible

Gülçin Yıldırım

PGConf EU 2015, Vienna

select * from me;

Postgres DBA @ 2ndQuadrant

Studying MSc Comp. & Systems Eng. @ Tallinn University of Technology

Studied BSc Maths Eng. @ Yildiz Technical University

Writes blog on 2ndQuadrant blog

Does some childish paintings

Loves independent films

@apatheticmagpie

Skype: gulcin2ndq

Github: gulcin

Agenda

What is Ansible?

Simple, agentless and powerful open source IT automation tool

Provisioning
Configuration Management
Application Deployment
Continuous Delivery
Security & Compliance
Orchestration

Why so popular?

Why now?

The technology: APIs, Cloud Management

Nearly every major service has an API these days.

The need: More and more products are now composed of distributed components. More moving parts. They need to be managed.

The practise: It is now common to automate builds, tests, deployments, integration. Infrastructure is the next thing to automate.

What are the options?

Why Ansible?

Agent-less architecture (no agent is required, everything is done by using SSH, ssh for communication)
No centralised server, no client-side agents
SSH based, no PKI
Configuration as data, not code (YAML files)
Batteries included (community, playbooks, modules, roles, tasks, documentation)
Full configuration management, deployment and orchestration
Custom modules can be written in any programming language.
JSON input/output is sufficient to integrate a module into Ansible.

Life before Ansible

Multiple ssh, control panels, editing config files manually.

Let's install Ansible

Debian or Ubuntu:


apt-get install python-dev python-setuptools
easy_install pip
pip install ansible boto

Mac OS X:


sudo easy_install pip
sudo pip install ansible boto

Code of this talk

Before we start discussing Ansible in more detail,
please check the repo that I created for this talk:

github.com/gulcin/pgconfeu2015

In this example, we will:

Provision Amazon VPC and EC2 instances
Install latest PostgreSQL packages
Setup a streaming replication with 1 master and 2 standbys
Add a new standby server

Configuration management

Why do we need a better & modern approach?

Manual tasks are error prone
Very difficult to setup test environments before changing/implementing something big
Too many servers require too many tasks to be repeated on each of them
Since it is a manual process, some servers may go out of sync, they may become outdated or even misconfigured
Leads to heterogenous environment instead of a homogenous & more managable one

Provisioning

Automated:
Simplifies system administrators job by freeing them from most of manual interventions.
Idempotent:
Ensures that the system is in the state we expect it to be.
Agentless:
Every resource is managed by SSH or APIs (like the Amazon API).
Feature-rich:
Provides lots of modules from file handling to virtual machine provisioning.

Application deployment

Time & Bandwidth Efficient:
Synchronizes only new or changed files.
Full Process Handling:
Ansible can remove servers from load-balancers and stop services before transfers. Then it can restore them all back.
Zero-Downtime Deployment:
It can deploy on a subset of servers and then move on to the next set to ensure that the system is always up and running.
Same solution can be run on developer machines, production environment, staging environment, testing environment, etc...

Continuous delivery

Continuous delivery means adopting a fast and simple approach for releases.

Ansible helps us with two main aspects:

Enables frequent releases without downtimes
Requires as little human intervention as possible

It can automate any task from provisioning servers to configuring services to become production ready.
Works very well in-front of a continuous integration system.

Security and compliance

You need to be sure about security of your:

Systems / Services / Software
Data
Customer's data

Once the security policy has been set, Ansible can be used to turn the system back into the compliance mode, instantly.

Ansible is easy-to-use whether it’s about:

Setting up firewall rules
Locking down users and groups
Applying custom security policies

Orchestration

Ansible ensures that all tasks are executed in the proper order.

This makes orchestration of complex multi-tier deployments easier.

For example, considering the deployment of a software stack, Ansible makes sure that:

The networking is setup before resources are communicating
PostgreSQL servers are up and running before app deployment
Containers are launched before the services are taken online
Application is deployed before the load balancers are activated

We will see this in action with our demo.

Building Blocks

An Ansible solution is composed of one or more items listed on the left side.

Typically, our solutions executes tasks for an inventory, utilizing some modules, using or populating some variables, processing some file templates, in a playbook, which can be organized in roles.

Let's see each of them in detail.

Building Blocks - Inventory

Tells Ansible about hosts it should manage
- Hostnames, IPs, ports, SSH parameters
- Server specific variables
2 common ways to provide Ansible an inventory:
- Static inventory: A flat INI file (e.g., hosts.ini)
- Dynamic inventory: An application returning a JSON data (e.g.,: ec2.py for Amazon EC2)
Hosts are grouped. They can belong to multiple groups
Groups can also belong to multiple groups

Building Blocks - Inventory

Below inventory file in INI format contains 3 hosts under 2 groups.

There is also a 3rd group which contains other groups and all of the hosts.

# "master" group with 1 host
[master]
postgresql-master        ansible_ssh_host=10.0.0.5     ansible_ssh_user=ubuntu

# "standby" group with 2 hosts
[standbys]
postgresql-standby-01    ansible_ssh_host=10.0.0.10    ansible_ssh_user=ubuntu
postgresql-standby-02    ansible_ssh_host=10.0.0.11    ansible_ssh_user=ubuntu

# the "replication" group contains both "master" and "standbys" groups
[replication:children]
master
standbys

Building Blocks - Module

Modules provide Ansible means to control or manage resources on local or remote servers.
They perform a variety of functions. For example a module may be responsible for rebooting a machine or it can simply display a message on the screen.
Ansible allows users to write their own modules and also provides out-of-the-box core or extras modules.
- Core: Maintained by the core Ansible team and always shipped with Ansible.
- Extras: Maintained by the community. Might be shipped separately in the future.

Building Blocks - Module

Some of the most commonly used modules are:

File handling: file, stat, copy, template
Remote execution: command, shell
Service management: service
Package management: apt, yum, bsd, ports
Source control systems: git, subversion

Building Blocks - Task

Tasks are responsible for calling a module with a specific set of parameters.

Each Ansible task contains:

a descriptive name [optional]
a module to be called
module parameters
pre/post-conditions [optional]
processing directives [optional]

They allow us to call Ansible modules and pass information to consecutive tasks.

Building Blocks - Task

Below task invokes the "file" module by providing 4 parameters. It ensures 3 conditions are true:

/var/lib/postgresql exists as a directory
owner of /var/lib/postgresql is "postgres"
group of /var/lib/postgresql is "postgres"

If it doesn't exist, Ansible creates the directory and assigns owner & group. If only the owner is different, Ansible makes it "postgres".


- name: Ensure the data folder has right ownership
  file: path="/var/lib/postgresql" state=directory owner=postgres group=postgres

Building Blocks - Task

Following example shows relationships between tasks. The first task checks if a device exists and the second task mounts the device depending on the result from the first task.

Please note "register" and "when" keywords.


- name: Check if the data volume partition exists
  stat: path=/dev/sdc1
  register: partition

- name: Ensure the PostgreSQL data volume is mounted
  mount: src=/dev/sdc1 name="/var/lib/postgresql/9.4" fstype=ext4 state=mounted
  when: partition.stat.exists is defined and partition.stat.exists

Building Blocks - Variable

Variables in Ansible are very useful for reusing information. Sources for variables are:

Inventory: We can assign variables to hosts or groups (group vars, host vars).
YAML files: We can include files containing variables.
Task results: Result of a task can be assigned to a variable using the register keyword as shown in the previous slide.
Playbooks: We can define variables in Ansible playbooks (more on that later).
Command line: (-e means extra variable // -e "uservar=gulcin")

Building Blocks - Variable

There is also discovered variables (facts) and they can be found by using setup module:

All of the output in json: bios_version, architecture, default_ipv4_address, ansible_os_family, etc.

You can use variables in tasks, templates themselves and you can iterate over using with_type functions.

ansible -i hosts.ini -m setup hostname

- name: Ensure PostgreSQL users are present
  postgresql_user:
    state: present
    name: "{{ item.name }}"
    password: "{{ item.password }}"
    role_attr_flags: "{{ item.roles }}"
  with_items: postgresql_users

Building Blocks - Template

We can think templates as our configuration files. Ansible lets us use the Jinja2 template engine for reforming and parameterising our files.

The Jinja2 templating engine offers a wide range of control structures, functions and filters..

Some of useful capabilities of Jinja2:

for-loops
join(), default(), range(), format()
union(), intersect(), difference()
| to_json, | to_nice_yaml, | from_json, | from_yml
| min, | max, | unique, | version_compare, | random

Building Blocks - Template


*:*:*:{{ postgresql_replication_user.name }}:{{ postgresql_replication_user.password }}

Let's check our pgpass.j2 template:

Let's have a look at pg_hba.conf.j2 file, too:


# Access to user from local without password, from subnet with password
{% for user in postgresql_users %}
local  all         {{ user.name }}                             trust
host   all         {{ user.name }}    {{ ec2_subnet }}         md5
{% endfor %}

# Replication user for standby servers access
{% for standby_ip in postgresql_standby_ips %}
host   replication {{ postgresql_replication_user.name }}  {{ standby_ip }}/32  md5
{% endfor %}

Building Blocks - Playbook

Playbooks contains Plays
- Plays contain Tasks
  - Tasks call Modules and may (optionally) trigger handlers (run once, run at the end)

Building Blocks - Playbook

Ansible playbooks are written using the YAML syntax.
Playbooks may contain more than one plays
Each play contains:
- name of host groups to connect to
- tasks it needs to perform.
A play may also contain variables/roles/handlers, if defined.

Strict dependency ordering: everything in file performs in a sequential order.

Building Blocks - Playbook

Let's look at our playbook example (main.yml):

---
- name: Ensure all virtual machines are ready
  hosts: 127.0.0.1
  connection: local
  vars_files: # load default variables from YAML files below
    - 'defaults/postgresql.yml'
    - 'defaults/aws.yml'
  tasks:
    - include: 'tasks/provision.yml' # load infrastructure setup tasks

- name: Ensure all required PostgreSQL dependencies ready
  hosts: postgresql-all # manage all PostgreSQL servers
  sudo: yes
  sudo_user: root
  vars_files:
    - 'defaults/postgresql.yml'
    - 'defaults/aws.yml'
  tasks:
    - include: 'tasks/postgresql.yml' # load PostgreSQL setup tasks

...

Building Blocks - Role

You absolutely should be using roles. Roles are great. Use roles. Roles! Did we say that enough? Roles are great.

In Ansible,

playbooks organize tasks
roles organize playbooks

Imagine that we have lots of independent resources to manage (e.g., web servers, PostgreSQL servers, logging, monitoring, AWS, ...).

Putting everything in a single playbook may result in an unmaintainable solution.

Building Blocks - Role

To reduce such complexity, roles help us with:

Splitting tasks into much smaller playbooks
This allows us to focus on resources, independently. That makes it simpler to maintain and debug. Also it will be much easier to understand the structure.
Reusing configs, files, templates, tasks
This way we can easily share those components between playbooks, without rewriting over and over.
Handling playbook dependencies
When we execute a role, we can be sure that all of preconditions are satisfied for that role.

Building Blocks - Role

Here you can see a dependency graph and the corresponding role directory structure:

How to Invoke Ansible?

To work with Ansible, we have 2 main alternatives;

Running ad-hoc commands
Running playbooks

Let's check them out one by one.

Ad-hoc Commands

We can call any Ansible module from the command line, anytime.

The ansible CLI tool works like a single task. It requires an inventory, a module name, and module parameters.

For example, given an inventory file like:

     [dbservers]
     db.example.com

Now we can call any module.

Ad-hoc Commands

We can check uptimes of all hosts in dbservers using:

ansible dbservers -i hosts.ini -m command -a "uptime"

Here we can see the Ansible output:


gulcin@apathetic ~ # ansible dbservers -i hosts.ini -m command -a "uptime"
db.example.com | success | rc=0 >>
21:16:24 up 93 days,  9:17,  4 users,  load average: 0.08, 0.03, 0.05

How to Run Playbooks?

For more complex scenarios, we can create playbooks or roles and ask Ansible to run them.

When we run a playbook or a role, Ansible first gathers a lot of useful facts about remote systems it manages. These facts can later be used in playbooks, templates, config files, etc.

We can use the ansible-playbook CLI tool to run playbooks.

How to Run Playbooks?

Given an inventory file like this:

     [dbservers]

     db.example.com

We may have a playbook that connects to hosts in dbservers group, executes the uptime command, and then displays that command's output.

Now let's create a simple playbook to see how it can be ran.

How to Run Playbooks?

Here is the main.yml file for the playbook we just described:


---

- hosts: dbservers

  tasks:
  - name: retrieve the uptime
    command: uptime
    register: command_result # Store this command's result in this variable

  - name: Display the uptime
    debug: msg="{{ command_result.stdout }}" # Display command output here

How to Run Playbooks?

Now we can run the playbook and see it's output here:


gulcin@apathetic ~ $ ansible-playbook -i hosts.ini main.yml

PLAY [dbservers] **************************************************************

GATHERING FACTS ***************************************************************
ok: [db.example.com]

TASK: [retrieve the uptime] ***************************************************
changed: [db.example.com]

TASK: [Display the uptime] ****************************************************
ok: [db.example.com] => {
    "msg": " 15:54:47 up 3 days, 14:32,  2 users,  load average: 0.00, 0.01, 0.05"
}

PLAY RECAP ********************************************************************
db.example.com             : ok=3    changed=1    unreachable=0    failed=0

Playbook control

--tags / --skip-tags
Runs or skips tasks with specified tags
--limit
Manages only specified hosts or groups
--start-at-task
Start execution from a specific task
--step
Executes step-by-step and asks for confirmation to continue
--check / --diff / --syntax-check
Runs a playbook without actually executing anything

Playbook loops

N.B! register assigns result of a task to a variable.

Ansible supports iterating over facts with loop statements:

with_items: runs the task using the provided array variable
with_indexed_items: runs the task using the provided array variable and adds item index
with_flattened: runs the task using merged variables
with_file: runs the task using given file's contents

Some other with_type functions:
with_together, with_nested, with_subelements, with_sequence,
with_random_choice, with_first_found, with_lines..

with_items


- vars:
    postgresql_users:
     - { name: gulcin, password: apathetic, roles: "SUPERUSER,LOGIN" }
     - { name: foo, password: foobarbaz, roles: "CREATEDB,LOGIN" }

- name: Ensure PostgreSQL users are present
  postgresql_user:
    state: present
    name: "{{ item.name }}"
    password: "{{ item.password }}"
    role_attr_flags: "{{ item.roles }}"
  with_items: postgresql_users

Executes the task for each element in given array
Current element value can be accessed using {{ item }}

with_indexed_items


- name: Ensure standby hosts have updated IP addresses
  add_host:
    name: "postgresql-standby-{{ item.0 + 1 }}" # e.g., postgresq-standby-1
    groups: "postgresql-standbys"
    ansible_ssh_host: "{{ item.1.tagged_instances[0].public_ip }}"
    ansible_ssh_user: "{{ ec2_ssh_user }}"
  with_indexed_items: ec2_standbys.results

Executes the task for each element in given array
Current element index can be accessed using {{ item.0 }}
Current element value can be accessed using {{ item.1 }}

with_flattened


- name: Wait for SSH to become ready
  wait_for: host="{{ item.tagged_instances[0].public_ip }}" port=22 timeout=320 state=started
  with_flattened:
    - [ ec2_master ] # ec2_master is a single item. 
                     # By putting [ and ], we convert it into an array
    - ec2_standbys.results

Merges given arrays into a single one
Executes the task for each element in merged array
Current element value can be accessed using {{ item }}

with_file

Executes the task for the given file
Contents of the file can be accessed using {{ item }}


- name: Ensure the SSH key is present
  ec2_key:
    state: present
    region: "eu-central-1"      # Amazon's Frankfurt Datacenter
    name: "postgresql-key"      # we will use this name later
    key_material: "{{ item }}"
  with_file: "id_rsa.pub"       # reads contents of this file to use in "key_material"

Advanced playbook features

For some advanced operations, we can use following in our tasks:

failed_when (condition) / changed_when (condition):
Lets tasks to know when they're going to fail or change based on the given condition. It might be stdout or stderr, or text to log.
until (condition):
Repeats the task until the given condition is satisfied.
ignore_errors:
Doesn't fail on errors
run_always:
Runs even in dry-run / check / diff modes

Advanced playbook features

serial:
Runs the task on a single host at a single time
pre_tasks / post_tasks:
Defines names of tasks that are executed before or after the current task. These referenced tasks are not either main tasks or roles, but they setup the environment to be ready for that.
delegate_to (host):
This tells Ansible to execute the current task on the given host instead of the current host that is defined by the inventory.

For example, if we want to start a virtual machine, we can't do that on that machine, because it is not running. We can simply delegate this task to localhost (127.0.0.1).

Postgres modules

My blog post covers Ansible PostgreSQL modules: Ansible Loves PostgreSQL

You can find the related Postgres modules' examples on my github repo.

postgresql_db: Creates/removes a given db.
postgresql_ext: Adds/removes extensions from a db.
postgresql_user: Adds/removes users and roles from a db.
postgresql_privs: Grants/revokes privileges on db objects (table, sequence, function, db, schema, language, tablespace, group).
postgresql_lang: Adds, removes or changes procedural languages with a db.

postgresql_db

Creates/removes a given db. In Ansible terminology, it ensures that a given db is present or absent.

Required param: name
Connection params: login_host, port,login_user, login_password
Important param: state (present or absent)

Create db module_test

Remove db module_test

postgresql_ext

Adds/removes extensions from a db. PostgreSQL offers a wide range of extensions which are extremely helpful.

Mandatory params: db and name (for extension)
Other params: state and connection params (login_host, port) as in postgres_db

postgresql_user

Adds/removes users and roles from a db.

Parameters: name (mandatory), state, connection params, db, password, priv, role_attr_flags ([NO]SUPERUSER, [NO]CREATEDB..)

Creates user

Alters role and gives login and createdb

postgresql_privs

Grants/revokes privileges on db objects (table, sequence, function, db, schema, language, tablespace, group).

Required params: database, roles
Important opt. params: type (i.e. table,sequence,function), objs, privs (i.e. ALL; SELECT, UPDATE, INSERT)

postgresql_lang

Adds, removes or changes procedural languages with a db.

One of the very powerful features of PostgreSQL is its support for virtually any language to be used as a procedural language.

Params: lang (mandatory), db, connection params, state, cascade, trust

AWS modules

Ansible provides more than 50 modules for provisioning and managing Amazon Web Services resources.

http://docs.ansible.com/ansible/list_of_cloud_modules.html

We have modules for:

EC2 (Virtual machine instances)
AMI (Virtual machine images)
ELB (Load balancers)
VPC (Networking infrastructure)
EBS (Storage infrastructure)
...

AWS modules

Ansible AWS modules are developed using Boto.

Boto is a Python package that provides interfaces to Amazon Web Services.

For more complex workflows, we can use the ec2.py script to discover dynamic inventories for our playbooks:

Download ec2.py and ec2.ini from github.com/ansible/ansible/tree/devel/contrib/inventory
cp ec2.py /etc/ansible/hosts
cp ec2.ini /etc/ansible
chmod +x /etc/ansible/hosts

ec2_key

Manages SSH key pairs that we use to connect to our instances.

Parameters:

state: present or absent
name: name of the SSH key
key_material: contents of the public SSH key
region: AWS region to deploy key to (Ireland, Frankfurt, etc)
wait: wait for the state 'available' before returning

- name: Ensure the SSH key is present
  ec2_key:
    state: present
    region: "eu-central-1"      # Amazon's Frankfurt Datacenter
    name: "postgresql-key"      # we will use this name later
    key_material: "{{ item }}"
  with_file: "id_rsa.pub"       # reads contents of this file to use in "key_material"

ec2_vpc

Manages virtual private clouds & networking.

state, region, wait: ... same as previous module ...
cidr_block: CIDR representing this cloud (e.g., 10.0.0.0/20)
internet_gateway: is internet access enabled?
resource_tags: A list of tags in form:
{ tag1: value1, tag2: value2, ... }
subnets: An array of subnets in form:
[{ cidr: BLOCK, az: ZONE, resource_tags: [ TAGS ] }]
route_tables: An array of routes in form:
[{ subnets: [ BLOCKS ], routes: [ ROUTES ], resource_tags: [ TAGS ] }]

ec2_vpc


- name: Ensure VPC is present
  ec2_vpc:
    state: present           # Create a VPC
    region: "eu-central-1"   # Amazon's Frankfurt Datacenter
    cidr_block: "10.0.0.0/20"
    subnets:
      - cidr: "10.0.0.0/20"
    route_tables:
      - subnets: [ "10.0.0.0/20" ]
        routes:
          - dest: 0.0.0.0/0  # Everywhere
            gw: igw          # Name of Amazon's default gateway
    internet_gateway: yes    # We want our instances to connect to internet
    wait: yes                # Wait until the VPC is ready
    resource_tags: { environment: "production", tier: "DB" }
  register: vpc              # Store task results in this variable to use in later tasks

ec2_group

Manages firewalls.

state, region, name: ... same as previous module ...
vpc_id: identifier of the VPC
description: Text for purpose of this security group
rules: An array of security rules in form:
[{ proto: protocol, from_port: port, to_port: port, cidr_ip: block }]

protocol can be tcp, udp, icmp, any, ...
port can be a port number or -1 (any port)
block should be a CIDR block (e.g., 10.0.0.4/32)

ec2_group


- name: Ensure the PostgreSQL security group is present
  ec2_group:
    state: present
    vpc_id: "{{ vpc.vpc_id }}" # Obtain recently provisioned VPC's ID
    region: "eu-central-1"     # Amazon's Frankfurt Datacenter
    name: "PostgreSQL"
    description: "Security group for PostgreSQL database servers"
    rules:
      - proto: tcp             # Allow SSH access from anywhere
        from_port: 22
        to_port: 22
        cidr_ip: 0.0.0.0/0
      - proto: all             # Allow everything from machines in the subnet
        from_port: -1          # Any port
        to_port: -1            # Any port
        cidr_ip: "10.0.0.0/20"
  register: security_group     # Use results of this task later

ec2

Creates, deletes, starts, stops, restarts virtual machine instances.

state, region, name, wait: ... same as previous modules ...
instance_tags: same as resource_tags
group_id: identifier of the security group
vpc_subnet_id: identifier of the VPC subnet
key_name: SSH key name
instance_type: type of the instance (e.g., c3.large, t2.micro, etc.)
private_ip: IP address in the subnet we provided
assign_public_ip: should the instance have a public IP address?
volumes: an array of disk volumes in form:
{[ device_name: NAME, volume_size: SIZE, delete_on_termination: FLAG ]}

ec2


- name: Ensure master EC2 instances & volumes are present
  ec2:
    assign_public_ip: yes # our machines should access internet
    instance_tags: { Name: "pg-master {{ postgresql_master_ip }}", environment: "production" }
    exact_count: 1
    count_tag:
      Name: "pg-master {{ postgresql_master_ip }}"
    image: "ami-accff2b1"                     # Ubuntu Server 14.04
    instance_type: "t2.micro"
    group_id: "{{ security_group.group_id }}"
    key_name: "postgresql-key"
    private_ip: "{{ postgresql_master_ip }}"  # e.g., 10.0.0.5
    region: "eu-central-1"                    # Amazon's Frankfurt Datacenter
    volumes:
      - device_name: /dev/sdc                                                                                                       
        volume_size: "50"                     # 50GB disk volume
        delete_on_termination: false          # Volume should remain after instance is deleted
    vpc_subnet_id: "{{ vpc.subnets[0].id }}"
    wait: yes                                 # Wait until the instance becomes running
  register: ec2_master                        # Register results here so we can use them later