DevOps

Technologies for Tomorrow

Rúben dos Santos Barros

Summary

  • Traditional Software Development
  • DevOps
  • Research Question
  • Goals
  • The Road So Far
  • Conclusions

Traditional Software Development

  • Development
  • Test/QA
  • Operations

Traditional Software Development

Development goals (Need for Change):

  • Faster development.

Operations goals (Fear of Change):

  • Server uptime;
  • Application response time.

Traditional Software Development

  • 75% say that Ops is a roadblock;
  • 72% say that Dev is not supportive.

Source: ITSM/Serena.com 2012 study of IT professionals

What if we...

  • Had a faster time-to-market by deploying more often?
  • Didn't have to choose between stability and new features?
  • Could increase our effectiveness?

DevOps

  • August 2008, Toronto, Canada: Agile Infrastructure;
  • October 2009, Ghent, Belgium: Developers + System Administrators.

Andrew Shafer, Patrick Debois

DevOps (Development and Operations) describes a culture in which business owners and the development, operations, and quality assurance departments collaborate to deliver software in a continuous manner, and encourages practices to evolve to meet that culture, focusing on business instead of departmental objectives [Sharma and Coyne, 2015].

DevOps

DevOps = New Mindset + New Tools + New Skills

DevOps

Automation

Automate Everything:

  • Automate Code Testing;
  • Automate Workflows;
  • Automate Infrastructure.

IT operations statistics

DevOps metrics from 20+ Fortune 1000 organizations:

  • Infrastructure failure costs $100,000 per hour;
  • Critical application failure costs $500,000 to $1 million per hour;
  • The number of deployments per month is expected to double in two years;
  • During an application’s development, testing, deployment, and operations life cycle, 25% of the time spent is considered wasteful and unnecessary;
  • DevOps-led projects will accelerate the delivery of functionality to the customer by 15–20%.

Source: Stephen Elliot, 2014 - DevOps and the Cost of Downtime: Fortune 1000 Best Practice Metrics Quantified
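The hourly figures quoted on this slide turn into an outage estimate with simple arithmetic. The sketch below is only an illustration: the 90-minute outage is a hypothetical value, and the function and constant names are not from the cited study.

```python
# Hourly cost figures quoted on the slide (US dollars).
INFRASTRUCTURE_FAILURE_PER_HOUR = 100_000
CRITICAL_APP_FAILURE_PER_HOUR = (500_000, 1_000_000)  # (low, high) estimate


def downtime_cost(hours: float, cost_per_hour: float) -> float:
    """Return the cost of an outage lasting `hours` at a given hourly rate."""
    return hours * cost_per_hour


# Hypothetical outage length, chosen only for the example.
outage_hours = 1.5

infrastructure_cost = downtime_cost(outage_hours, INFRASTRUCTURE_FAILURE_PER_HOUR)
application_cost_range = tuple(
    downtime_cost(outage_hours, rate) for rate in CRITICAL_APP_FAILURE_PER_HOUR
)

print(f"Infrastructure outage: ${infrastructure_cost:,.0f}")
print(f"Critical application outage: ${application_cost_range[0]:,.0f} "
      f"to ${application_cost_range[1]:,.0f}")
```

Under these assumptions, the estimate scales linearly with outage length, which is why shortening the time to notice and roll back an error matters.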

Research Question

What information can we capture and how can we formalize it so that we can improve how software teams practice DevOps?

Goals

  • Research DevOps and DevOps tools;
  • Identify categories in which to group DevOps tools;
  • Identify key functionalities for each category;
  • Discuss the forces influencing the adoption of the available tools;
  • Identify the requirements of a new project that will influence technology decisions;
  • Cooperate with Software Development Teams;
  • Elaborate a DevOps Knowledge Map;
  • Validate the captured information;
  • Publish conclusions.

The Road So Far

DevOps Categories

Cloud

  • Cloud Computing allows companies to consume computing resources as a utility;
  • Companies no longer need to invest in hardware or in the people who operate it;
  • The same server can serve multiple applications, depending on the working hours of a country or continent.

AWS Azure Google CP Digital Ocean
Europe ✔ * 2 ✔ * 2 ✔ * 3
Asia ✔ * 4 ✔ * 7
North America ✔ * 4 ✔ * 6 ✔ * 6 ✔ * 3
South America
Africa
Oceania ✔ * 2
MySQL
PostgreSQL
SQL Server
MongoDB ✘ (DynamoDB) ✘ (Cloud BigTable)

About Cloud Providers

                           | AWS               | Azure                 | Google CP                      | Digital Ocean
Free                       | 1 year (t2.micro) | 60 minutes/CPU daily  |                                |
Linux VM (1GHz, 1GB RAM)   | $10.08 (monthly)  | $14.40 (0.75GB RAM)   | $4.03 (shared CPU, 0.6GB RAM)  | $10.80
Linux VM (2GHz, 4GB RAM)   | $40.32 (monthly)  | $86.40 (3.5GB RAM)    | $50.40 (7.5GB RAM)             | $43.20
Linux VM (4GHz, 16GB RAM)  | $190.08 (monthly) | $210.24 (14GB RAM)    | $100.80 (15GB RAM)             | $85.68 (8GB RAM)
Windows VM (1GHz, 1GB RAM) | $13.68            | $12.96 (0.75GB RAM)   | $18.43 (shared CPU, 0.6GB RAM) |
Windows VM (2GHz, 4GB RAM) | $54.72            | $108 (3.5GB RAM)      | $108 (7.5GB RAM)               |
Windows VM (4GHz, 16GB RAM)| $351.36           | $371.52 (14GB RAM)    | $216 (15GB RAM)                |
Object Storage             | S3 - $0.03/GB     | Blob - $0.08/GB       | Disk - $0.04/GB                |

About Cloud Providers

Infrastructure Management

Xen KVM VMware vSphere
x86
x86_64
ARM
Linux ✔ (Host - special distro/XenServer)
Windows ✔ (Guest)
Solaris ✔ (Guest)
Full Virtualization
Hypervisor ✔ (vSphere)
Paravirtualization ✔ (ESXi Hypervisor)
Live Migration
Used by Amazon, Linode ? Adobe, Vodafone

Scheduling

  • Cron
  • Chronos (Mesos)

UNIX's Cron Example:

0 6 * * 1-5
"At 06:00 on Mon, Tue, Wed, Thu and Fri."

5 0 * 8 *
"At 00:05 every day in Aug."
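The two cron expressions on this slide can be checked with a small matcher. This is a minimal sketch, not a full cron implementation: it assumes the five standard fields (minute, hour, day of month, month, day of week) and supports only `*`, single numbers, ranges such as `1-5`, and comma-separated lists. The function names are illustrative.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one cron field ("*", "5", "1-5", "1,3,5") matches value."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a minute selected by `expression`."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = moment.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


# 14 June 2016 was a Tuesday, so the weekday rule applies at 06:00.
print(cron_matches("0 6 * * 1-5", datetime(2016, 6, 14, 6, 0)))
# 3 August 2016 at 00:05 falls inside the "every day in Aug." rule.
print(cron_matches("5 0 * 8 *", datetime(2016, 8, 3, 0, 5)))
```

Real cron treats the case where both day-of-month and day-of-week are restricted differently (either may match); the sketch ignores that case because neither example uses it.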

Automation

Automation - Provisioning

The development environment should be as similar as possible to the production environment

  • Ubuntu 16.04 LTS => python 3.5
  • > Ubuntu 15.10  =>  python >3.4

Automation - Provisioning

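# Vagrantfile: describes a development VM so every developer gets the same environment.
Vagrant.configure(2) do |config|
  # Base image: Ubuntu 12.04 LTS (Precise), 64-bit.
  config.vm.box = "hashicorp/precise64"
  # Run a shell script when the machine is provisioned.
  config.vm.provision :shell, path: "bootstrap.sh"
  # Expose the guest's port 80 on the host's port 4567.
  config.vm.network :forwarded_port, guest: 80, host: 4567

  # VirtualBox-specific limits: 50% of a host CPU and 256 MB of RAM.
  config.vm.provider :virtualbox do |vb|
    vb.customize [
      "modifyvm", :id,
      "--cpuexecutioncap", "50",
      "--memory", "256",
    ]
  end

  # Apply puppet/manifests/site.pp using the modules in puppet/modules.
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "puppet/manifests"
    puppet.manifest_file = "site.pp"
    puppet.module_path = "puppet/modules"
  end
end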

Automation - Test

E2E Tools

Unit Testing Tools

TestNG

Automation - Test

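import static org.junit.Assert.assertEquals;

import org.junit.Before;
import org.junit.Test;

public class CalculatorTest {

  private Calculator classUnderTest;

  // Create a fresh Calculator before each test (assumes a no-argument constructor).
  @Before
  public void setUp() {
    classUnderTest = new Calculator();
  }

  @Test
  public void testSubstract() {
    assertEquals("substract", 2, classUnderTest.substract(5, 3));
  }

  @Test
  public void testMultiply() {
    assertEquals("multiply", 56, classUnderTest.multiply(7, 8));
  }
}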

Automation - Test

describe('adding user to application', function() {

  // Random suffix supplied through the Protractor params, so each run creates a new user.
  var random = browser.params.random;

  it('should fill and submit a user', function() {
    // Open the user menu and go to the "add user" form.
    element(by.id('a_user_dropdown')).click();
    element(by.id('a_user_add')).click();

    expect(browser.getCurrentUrl()).toContain('users/add');

    // Fill in the form fields bound to the form_user model.
    element(by.model('form_user.name')).sendKeys('user' + random);
    element(by.model('form_user.username')).sendKeys('user' + random);
    browser.params.handleSelect2('form_user.country', 'Portugal');
    element(by.model('form_user.email')).sendKeys('user' + random + '@gmail.com');
    element(by.model('form_user.password')).sendKeys('user' + random);

    element(by.id('btn_user_submit')).click();

    // After submitting, the application should return to the user list.
    expect(browser.getCurrentUrl()).toEqual(browser.params.url + '/#/users');
  });
});

Automation - Deploy

"If anything can go wrong, it will go wrong." - Murphy's Law

Manual deployments are error-prone.

With automated deployments:

  • Anyone on the team is able to deploy software;
  • Engineers spend more time developing;
  • Deploying to somewhere new is a matter of configuration.

This way we can release more often!

Automation - Deploy

---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum: name=httpd state=latest
  - name: write the apache config file
    template: src=/srv/httpd.j2 dest=/etc/httpd.conf
    notify:
    - restart apache
  - name: ensure apache is running (and enable it at boot)
    service: name=httpd state=started enabled=yes
  handlers:
    - name: restart apache
      service: name=httpd state=restarted

site.yml - Ansible Script

[webservers]
webserver1
webserver2
webserver3

[dbservers]
dbserver1
dbserver2

hosts.example

Automation - Roll Back

puppi rollback <project_name>

Puppi command - Puppet Module

deploy_revision 'name' do
  ...
  action :rollback
end

Chef Recipe
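The idea behind a rollback command can be sketched independently of any tool: keep the history of deployed releases and, on rollback, make the previous one current again. The class below is a hypothetical illustration and does not describe how Puppi or Chef implement rollback internally.

```python
class ReleaseHistory:
    """Keeps deployed versions in order so the previous one can be restored."""

    def __init__(self):
        self._releases = []

    def deploy(self, version: str) -> None:
        """Record `version` as the release that is now live."""
        self._releases.append(version)

    @property
    def current(self) -> str:
        """Return the version that is currently live."""
        if not self._releases:
            raise RuntimeError("nothing has been deployed yet")
        return self._releases[-1]

    def rollback(self) -> str:
        """Discard the current release and return the one that is live again."""
        if len(self._releases) < 2:
            raise RuntimeError("no previous release to roll back to")
        self._releases.pop()
        return self.current


history = ReleaseHistory()
history.deploy("v1")
history.deploy("v2")
print(history.rollback())  # "v1" is live again
```

Keeping the previous release available is what makes the rollback fast: nothing has to be rebuilt, only re-activated.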

Automation - Applications

Chef Puppet Ansible Atlas
AWS
Azure
Google CP
OpenStack
Script Language Ruby Puppet Specific YAML K\V (Json)
Node Agent ✘ (SSH)
Middleman Server ?
Push Commands
Free plan ✔ (5 nodes) ✔ (10 nodes) ✔ (10 nodes) - no WebUI ✘ (Packer + Terraform + Consul)
Basic Plan (monthly) 1,440$ (20 nodes) 120$ (node) 5,000$ (100 nodes) ?
Used by Facebook Google, Harvard Apple, NASA

About Automation Tools

Monitoring

Monitoring - Servers/VMs

Information about:

  • CPU;
  • RAM;
  • Disk space;
  • Network traffic;
  • Processes;
  • Services.
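A few of the server metrics listed above can be read with the Python standard library alone. This is a minimal sketch assuming a single disk path is of interest; RAM, network traffic, processes, and services need platform-specific sources or a monitoring agent and are left out. The function name and dictionary keys are illustrative.

```python
import os
import shutil


def server_snapshot(path: str = "/") -> dict:
    """Collect basic CPU and disk figures for the machine running this code."""
    usage = shutil.disk_usage(path)
    used_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    metrics = {
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
        "disk_used_percent": used_percent,
    }
    # The load average is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_average_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


for name, value in server_snapshot().items():
    print(f"{name}: {value}")
```

A monitoring tool would run a collector like this periodically and send the values to a central store for graphing and alerting.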

Monitoring - Applications

  • Application performance impacts:
    • SQL statements;
    • Specific code segments;
  • Flag key transactions.

Monitoring - Real User Monitoring (R.U.M.)

  • Geolocation and load times of users;
  • Users' movement through the application;
  • Find application bottlenecks;
  • Application performance in real time;
  • JavaScript errors.

Monitoring - Applications

RUXIT New Relic Status Cake Pingdom
North America ✔ * 7 ✔ * 7 ✔ * 9 ✔ * 3
South America ✔ * 3
Europe ✔ * 3 ✔ * 3 ✔ * 17 ✔ * 2
Africa
Asia ✔ * 2 ✔ * 4
Oceania ✔ * 2
Application Monitoring ≃ ✔
Mobile Monitoring
Node Monitoring
R.U.M.
Free account ✔ (1 day rec's) 
HTTP Health Check 0.20$ (100 checks) 99$ (10k checks) 20€ (43k checks) 13€ (430k checks)
R.U. sessions 0.20$ (500 sessions) 200$ (500k pageviews) Free  (100k pageviews)
Node/App Pricing =0.20$ (node/hour)= Free/0.20$(hour)
Mobile Pricing 1499$ (p/ app)

About Monitoring Tools

Supervision

"In the early days of Reddit, we didn’t really have any crash protection. I used to have to sleep with my laptop and I would wake up every couple of hours and see if Reddit was working, and restart it. It was the worst feeling in the world." - Steve Huffman, Reddit co-founder

  • Start Application/Service at Boot;
  • Ensure Application/Service is Running;
  • Restart Application/Service if it fails.
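The "restart if it fails" behaviour can be sketched as a loop around a child process. This is a simplified illustration, not how Upstart, systemd, Supervisor, or Circus are implemented: the function name and the `max_restarts` and `delay_seconds` parameters are assumptions, and the loop stops on a clean exit, whereas a real supervisor usually keeps a service alive indefinitely.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay_seconds=0.0):
    """Run `command`, restarting it after a failure up to `max_restarts` times.

    Returns (number of starts, last exit code).
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(command)
        # Stop on a clean exit or when the restart budget is used up.
        if exit_code == 0 or starts > max_restarts:
            return starts, exit_code
        time.sleep(delay_seconds)


# A child process that always fails, used only to exercise the restart loop.
failing_command = [sys.executable, "-c", "import sys; sys.exit(1)"]
starts, exit_code = supervise(failing_command, max_restarts=2)
print(f"started {starts} times, last exit code {exit_code}")
```

Capping the number of restarts avoids a tight crash loop when the failure is permanent, for example a broken configuration file.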

Supervision

Upstart systemd Supervisor Circus
Act as UNIX's init
Log rotation ✔ (logrotate + copytruncate)
Host Ubuntu Redhat/Fedora
Start several instances of a program
Script Language Configuration + Shell Configuration+ Shell Configuration Python (WSGI)
HTTP Server

Loggers

2016-06-14 17:44:15,814 DEBUG TcpListener - New connection accepted
2016-06-14 17:44:15,820 ERROR HttpServerConnection - Aborting encrypted 
connection to hostname.pt/192.168.1.1:46650 due to 
[SSLHandshakeException:Client requested protocol SSLv3 not enabled or not supported] -> 
[SSLHandshakeException:Client requested protocol SSLv3 not enabled or not supported]
2016-06-14 17:44:15,820 DEBUG HttpServerConnection - Connection was Aborted, awaiting TcpConnection termination...
2016-06-14 17:44:15,820 DEBUG HttpServerConnection - TcpConnection terminated, stopping
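Log management tools start by splitting lines like these into fields. The sketch below assumes the layout visible in the sample (timestamp, level, component, a dash, then the message); the regular expression and function name are illustrative and not taken from any of the tools compared on the next slide.

```python
import re
from datetime import datetime

# timestamp, level, component, " - ", message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<component>\S+) - "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # e.g. the continuation of a wrapped message
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%d %H:%M:%S,%f"
    )
    return fields


entry = parse_log_line(
    "2016-06-14 17:44:15,814 DEBUG TcpListener - New connection accepted"
)
print(entry["level"], entry["component"], entry["message"])
```

Once lines are structured this way, filtering by level (for example, only `ERROR`) or by component becomes a simple field comparison.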

Loggers

Splunk Loggly LogStash (ELK)
Free Plan ✔ (500Mb/day) ✔ (200Mb/day + 7 days retention) Open Source
Basic Plan $170/mo (1GB/day) $55/mo (1GB/day + 7 days retention) $30 (1GB/day + 7 days retention - logit.io)
In-House ✔ (Splunk Enterprise)
Cloud based ✔ (Splunk Cloud) ✔ (Outsource)
Target Medium/Big Enterprise Find and fix operational problems Small Companies

Service Discovery

ZooKeeper etcd Consul
Depends On Third-party tools (Registrator + confd)
Client-side Server Active Connection + Keep-Alive Gossip Protocol
Node Health Check Ping HTTP 200, RAM and Disk Check
Built with Java Go Go
Embedded Service Discovery System
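The core of service discovery, registering instances and returning only the healthy ones, can be sketched in memory. This is a toy illustration and not how ZooKeeper, etcd, or Consul work: the class, the `http_200_check` helper, and the `/health` path are hypothetical, and a real system replicates this state across several nodes.

```python
import urllib.request
from typing import Callable, Dict, List


def http_200_check(address: str, timeout: float = 2.0) -> bool:
    """Treat an instance as healthy if GET http://<address>/health returns 200."""
    try:
        url = f"http://{address}/health"  # hypothetical health endpoint
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection errors, timeouts, and HTTP errors
        return False


class ServiceRegistry:
    """Maps a service name to the addresses of its registered instances."""

    def __init__(self, is_healthy: Callable[[str], bool] = http_200_check):
        self._is_healthy = is_healthy
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[str]:
        """Return only the instances that currently pass the health check."""
        return [a for a in self._instances.get(service, []) if self._is_healthy(a)]


# A fake health check keeps the example independent of the network.
registry = ServiceRegistry(is_healthy=lambda address: address != "10.0.0.2:80")
registry.register("web", "10.0.0.1:80")
registry.register("web", "10.0.0.2:80")
print(registry.lookup("web"))
```

Passing the health check in as a function keeps the registry testable and lets the same structure use a ping, an HTTP 200 check, or a resource check.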

Validation

Evaluated Metrics

  • Time spent testing the product?
  • Time to go from development to deployment?
  • Time to deploy the product?
  • Downtime for the update?
  • Deployments per week/month?
  • Time to notice an error?
  • Time to roll back an update?
  • Configuration issues in production?
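Two of these metrics, deployments per week and the average time to notice an error, can be computed from timestamps a team already has. The sketch below uses made-up sample data, and the function names are illustrative; it is one possible way to quantify the questions above, not the evaluation method used in this work.

```python
from datetime import datetime, timedelta
from statistics import mean


def deploys_per_week(deploy_times, period_start, period_end):
    """Average number of deployments per week inside [period_start, period_end)."""
    in_period = [t for t in deploy_times if period_start <= t < period_end]
    weeks = (period_end - period_start) / timedelta(weeks=1)
    return len(in_period) / weeks


def mean_minutes(intervals):
    """Mean length, in minutes, of (start, end) intervals.

    Usable for time to notice an error or time to roll back an update.
    """
    return mean((end - start).total_seconds() / 60 for start, end in intervals)


# Hypothetical sample data: six deployments over two weeks.
start = datetime(2016, 6, 1)
end = datetime(2016, 6, 15)
deploys = [start + timedelta(days=d) for d in (0, 2, 5, 7, 9, 12)]
print(deploys_per_week(deploys, start, end))

# Hypothetical incidents: (error introduced, error noticed).
incidents = [
    (datetime(2016, 6, 3, 10, 0), datetime(2016, 6, 3, 10, 10)),
    (datetime(2016, 6, 9, 15, 0), datetime(2016, 6, 9, 15, 20)),
]
print(mean_minutes(incidents))
```

Measuring the same quantities before and after a team adopts a tool gives a simple basis for comparing the two situations.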
