Next Generation Data Center

Agenda

  • About Me
  • Research Notes
  • Overview
  • Virtualization
  • Cloud Computing
  • Buzzword Primer
  • Solution Proposal
  • OpenStack/Ceph

About Me

  • 10+ years of experience in systems engineering automation
  • 5+ years of experience in full stack web development
  • for 5 years part of the systems engineering team at Puzzle ITC, maintaining 200+ bare metal and virtual servers across three datacenters
  • for 2 years part of the development team at Atizo, a community driven, crowdsourcing platform with 15k users running on multiple cloud providers
  • for 1 year part of the private cloud engineering team at SWISS TXT, building a platform for SRF/RTS/RSI across two datacenters

Research Notes

  • 40h Research
  • 12h Presentation preparation
  • 12h Presentation design
  • Informal exchange with lead architect of new Swisscom Application Cloud

Overview

  • Workload: highly available, highly automated, stateless, stateful, immutable, ephemeral, shift & load
  • Compute: virtualization, containers, cloud computing, self-service, multi-tenancy, hyperconvergence
  • Storage: SAN/NAS, virtualization, shared storage, local storage, replication, scale-out storage
  • Network: network function virtualization, virtual network function, software defined networking, converged network, leaf-spine architecture, QoS, global load balancing, anycast/DNS
  • Resiliency: scalability, availability, redundancy, disaster recovery, RTO/RPO, distributed systems, quorum, split-brain, CAP, partition tolerance, active/active, active/passive, metro/stretch-cluster
  • Automation: configuration management, software defined, IaaS, PaaS

Virtualization

[Figure: diagram with the labels COMPUTE, STORAGE, NETWORK and VIRTUALIZATION]

Compute Virtualization

Compute Virtualization Hypervisors

Compute Virtualization Container Engines

Storage Virtualization

Storage Virtualization Solutions

Network Virtualization

Network Virtualization Solutions

OpenStack Neutron

Cloud Computing

Cloud Computing Service Models

IaaS

[Figure: diagram with the labels COMPUTE, STORAGE, NETWORK, VIRTUALIZATION, IAAS and AUTOMATION]

  • unified management of virtual resources
    • compute, storage and network
  • acquire resources through a single API or UI
  • highly automated resource acquisition
  • highly abstracted
  • very short provisioning times
  • multi-tenancy
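
The bullets above can be illustrated with a toy, in-memory model. This is only a sketch: `ToyCloud`, `create_server` and `list_servers` are hypothetical names invented for illustration and do not correspond to any real IaaS product's API. It shows one call acquiring compute, storage and network together, with each tenant seeing only its own resources.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCloud:
    """In-memory stand-in for an IaaS API (hypothetical, for illustration only)."""
    servers: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def create_server(self, tenant, name, vcpus, disk_gb, network):
        # One request describes compute, storage and network at once.
        server = {
            "id": next(self._ids),
            "name": name,
            "compute": {"vcpus": vcpus},
            "storage": {"disk_gb": disk_gb},
            "network": {"name": network},
        }
        # Resources are recorded per tenant, so tenants stay isolated.
        self.servers.setdefault(tenant, []).append(server)
        return server

    def list_servers(self, tenant):
        # A tenant only ever sees its own servers.
        return list(self.servers.get(tenant, []))


cloud = ToyCloud()
web = cloud.create_server("tenant-a", "web-1", vcpus=2, disk_gb=20, network="frontend")
```

A real platform would do the same thing behind an authenticated HTTP API; the point here is only the shape of the request and the tenant boundary.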

IaaS Solutions

for private clouds

PaaS

[Figure: diagram with the labels COMPUTE, STORAGE, NETWORK, VIRTUALIZATION, IAAS, PAAS and AUTOMATION]

  • Platform Services as Resources
  • Application Servers
    • php, java, python, ...
  • Databases
    • mysql, postgresql, ...
  • Queues and Indexes
    • RabbitMQ, Elasticsearch, ...

PaaS Solutions

for private clouds

Buzzword Primer

a.k.a. Buzzword-Bingo a.k.a. Bullshit-Bingo

Workload

  • highly available
  • highly automated
  • immutable
  • ephemeral
  • persistent
  • stateful
  • stateless
  • shift & load

Compute

  • virtualization
  • cloud computing
  • multi-tenancy
  • self-service
  • hyperconvergence

Storage

  • SAN/NAS
  • virtualization
  • shared storage
  • local storage
  • replication
  • scale-out storage

Scale-out Storage

Network

  • network function virtualization
  • virtual network function
  • software defined networking
  • converged network
  • leaf-spine architecture
  • QoS
  • global load balancing
  • BGP anycast

Leaf-Spine Architecture

Global Load Balancing

  • DNS
  • BGP anycast

Automation

  • configuration management
  • software defined
  • IaaS
  • PaaS

Resiliency

  • scalability
  • availability
  • redundancy
  • disaster recovery
  • RTO/RPO
  • distributed systems
  • quorum
  • split-brain
  • CAP
  • partition tolerance
  • active/active
  • active/passive
  • metro/stretch-cluster

distributed systems are hard!

Resiliency

where is the state?

Resiliency

only distributed state is hard!

CAP

RPO/RTO

  • Disaster Recovery Metrics
  • Recovery Point Objective
    • How much data is lost?
  • Recovery Time Objective
    • How long does it take to recover?
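
As a worked example of the two metrics, the sketch below measures one incident against its targets. The timestamps and the target values are made up for illustration; only the two definitions come from the slide.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, failure, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    data_loss_window = failure - last_backup   # compared against the RPO
    downtime = service_restored - failure      # compared against the RTO
    return data_loss_window, downtime


# Hypothetical targets and incident timeline.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)
last_backup = datetime(2015, 1, 1, 2, 0)
failure = datetime(2015, 1, 1, 2, 45)
service_restored = datetime(2015, 1, 1, 5, 15)

loss, downtime = recovery_metrics(last_backup, failure, service_restored)
rpo_met = loss <= rpo      # 45 minutes of data lost, within a 1 hour RPO
rto_met = downtime <= rto  # 2.5 hours of downtime, within a 4 hour RTO
```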

Quorum/Split-Brain/Partition

  • Metrics to determine if a distributed system is healthy
  • What happens if a distributed system falls apart?
    • How does operation continue?
    • What strategies exist to rebuild the system?
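
One common answer to the partition question is a majority quorum: only the side that can still see more than half of the cluster keeps accepting writes. The sketch below is a minimal illustration of that rule, not the behaviour of any specific product; the function names are made up.

```python
def has_quorum(reachable_members, cluster_size):
    """True if this partition holds a strict majority of the cluster."""
    return reachable_members > cluster_size // 2


def partitions_allowed_to_write(partition_sizes):
    """For each partition of a split cluster, decide whether it may continue."""
    cluster_size = sum(partition_sizes)
    return [has_quorum(size, cluster_size) for size in partition_sizes]


# Five nodes split 3/2: exactly one side continues, so no split-brain.
odd_split = partitions_allowed_to_write([3, 2])

# Four nodes split 2/2: neither side has a majority, so both stop.
even_split = partitions_allowed_to_write([2, 2])
```

The even split shows why clusters are usually sized with an odd number of voting members: at most one partition can ever hold a majority, but with an even size a clean half/half split leaves no partition able to continue.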

metro/stretch-cluster

  • distributed systems that span datacenters
  • distributed systems are already hard
  • even harder across datacenters

Solution Proposal

Technologies

  • OpenStack Cloud Computing Platform
  • Ceph Storage Platform

Architecture

  • One independent OpenStack platform per datacenter
  • 3 types of nodes: mgmt/compute/storage
  • Platform is built with the same automation principles as the new MSP environment

Hardware Options

  • Three types of nodes: mgmt/compute/storage
    • mgmt: openstack admin nodes
    • mgmt: openstack network nodes
    • mgmt: ceph admin nodes
    • compute: openstack compute nodes
    • storage: shared storage cluster nodes

Storage Strategy

  • Options for vm storage:
    • shared storage & attached ephemeral local storage
      • automated provisioning & workload resiliency
    • shared storage
      • automated provisioning; no workload resiliency
      • manual provisioning

Resiliency Strategies

  • Resiliency handled in workload
    • phase one: enable fast disaster recoveries
    • phase two: enable active/passive modes
    • phase three: enable live traffic handling across datacenters
  • implement as few distributed systems as possible
    • especially across datacenters
  • strategies on how to handle distributed system partitions must be defined and tested during implementation

OpenStack

OpenStack Statistics

OpenStack Architecture

OpenStack Multi-Tenancy

OpenStack Networking

OpenStack and the Transformation of the Data Center

http://www.slideshare.net/lewtucker/open-stack-atlanta-2014tucker

Ceph

  • completely distributed storage cluster
  • interfaces for object, block and file-level storage
  • clients talk directly to storage nodes
  • no single point of failure
  • fault-tolerant
  • self-healing
  • self-managing
  • runs on commodity hardware
  • single cluster: just add disks and nodes to scale out
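
Clients can talk directly to storage nodes because object placement is computed rather than looked up in a central table. Ceph uses its own CRUSH algorithm for this; the sketch below substitutes plain rendezvous hashing to show the general idea. The node and object names are hypothetical.

```python
import hashlib


def _score(object_name, node):
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, nodes, replicas=3):
    """Pick the nodes that should hold an object, highest score first.

    Every client that knows the node list computes the same answer,
    so no central lookup service is needed.
    """
    ranked = sorted(nodes, key=lambda node: _score(object_name, node), reverse=True)
    return ranked[:replicas]


nodes = ["osd-a", "osd-b", "osd-c", "osd-d"]
before = place("vm-disk-1", nodes)

# Scaling out: adding a node never reshuffles the existing nodes among
# themselves; the new node either takes over a replica slot or it does not.
after = place("vm-disk-1", nodes + ["osd-e"])
```

This stand-in ignores failure domains, weights and placement groups, which a production placement algorithm has to handle.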

Ceph Architecture

Ceph Dashboard

Ceph Intro

Next Generation Data Center

By Simon Josi
