Next Generation Data Center
Agenda
- About Me
- Research Notes
- Overview
- Virtualization
- Cloud Computing
- Buzzword Primer
- Solution Proposal
- OpenStack/Ceph
About Me
- 10+ years of experience in systems engineering automation
- 5+ years of experience in full stack web development
- for 5 years part of the systems engineering team at Puzzle ITC, maintaining 200+ bare metal and virtual servers across three datacenters
- for 2 years part of the development team at Atizo, a community driven, crowdsourcing platform with 15k users running on multiple cloud providers
- for 1 year part of the private cloud engineering team at SWISS TXT for building a platform for SRF/RTS/RSI across two datacenters
Research Notes
- 40h Research
- 12h Presentation preparation
- 12h Presentation design
- Informal exchange with the lead architect of the new Swisscom Application Cloud
Overview
WORKLOAD
COMPUTE
STORAGE
NETWORK
RESILIENCY
AUTOMATION
scalability
availability
redundancy
disaster recovery
RTO/RPO
distributed systems
quorum
split-brain
CAP
partition tolerance
active/active
active/passive
metro/stretch-cluster
highly available
highly automated
stateless
stateful
immutable
ephemeral
shift & load
network function virtualization
virtual network function
software defined networking
converged network
leaf-spine architecture
QoS
global loadbalancing
anycast/dns
virtualization
containers
cloud computing
self-service
multi-tenancy
hyperconvergence
SAN/NAS
virtualization
shared storage
local storage
replication
scale-out storage
configuration management
software defined
IaaS
PaaS
Virtualization
COMPUTE
NETWORK
VIRTUALIZATION
VIRTUALIZATION
VIRTUALIZATION
STORAGE
Compute Virtualization
Compute Virtualization Hypervisors
Compute Virtualization Container Engines
Storage Virtualization
Storage Virtualization Solutions
Network Virtualization
Network Virtualization Solutions
OpenStack Neutron
WORKLOAD
COMPUTE
STORAGE
NETWORK
RESILIENCY
AUTOMATION
virtualization
cloud computing
self-service
multi-tenancy
hyperconvergence
configuration management
software defined
IaaS
PaaS
Cloud Computing
Cloud Computing
Service Models
IaaS
COMPUTE
NETWORK
VIRTUALIZATION
VIRTUALIZATION
VIRTUALIZATION
IAAS
STORAGE
AUTOMATION
IaaS
- unified management of virtual resources
- compute, storage and network
- acquire resources through a single API or UI
- highly automated resource acquisition
- highly abstracted
- very short provisioning times
- multi-tenancy
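The IaaS properties above (one API for compute, storage and network, with tenants isolated from each other) can be illustrated with a toy in-memory model. This is only a sketch: `ToyCloud`, its methods and the tenant names are hypothetical and are not the API of OpenStack or any other real platform.

```python
import itertools


class ToyCloud:
    """In-memory stand-in for an IaaS API (hypothetical, for illustration only)."""

    KINDS = {"compute", "storage", "network"}

    def __init__(self):
        self._next_id = itertools.count(1)
        self._resources = {}  # resource id -> (tenant, kind, spec)

    def create(self, tenant, kind, **spec):
        """Acquire any kind of virtual resource through the same call."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown resource kind: {kind}")
        resource_id = next(self._next_id)
        self._resources[resource_id] = (tenant, kind, spec)
        return resource_id

    def list(self, tenant):
        """Multi-tenancy: a tenant only ever sees its own resources."""
        return {
            rid: (kind, spec)
            for rid, (owner, kind, spec) in self._resources.items()
            if owner == tenant
        }


cloud = ToyCloud()
cloud.create("tenant-a", "network", cidr="10.0.0.0/24")
cloud.create("tenant-a", "storage", size_gb=50)
cloud.create("tenant-a", "compute", vcpus=2, ram_gb=4)
cloud.create("tenant-b", "compute", vcpus=8, ram_gb=32)

print(len(cloud.list("tenant-a")))  # 3
print(len(cloud.list("tenant-b")))  # 1
```

In a real platform the same idea sits behind an authenticated HTTP API, which is what makes short provisioning times and full automation possible.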
IaaS Solutions
for private clouds
PaaS
COMPUTE
NETWORK
VIRTUALIZATION
VIRTUALIZATION
VIRTUALIZATION
IAAS
PAAS
STORAGE
AUTOMATION
AUTOMATION
PaaS
- Platform Services as Resources
- Application Servers
- php, java, python, ...
- Databases
- mysql, postgresql, ...
- Queues and Indexes
- RabbitMQ, Elasticsearch, ...
PaaS Solutions
for private clouds
Buzzword Primer
a.k.a. Buzzword-Bingo a.k.a. Bullshit-Bingo
Workload
- highly available
- highly automated
- immutable
- ephemeral
- persistent
- stateful
- stateless
- shift & load
Compute
- virtualization
- cloud computing
- multi-tenancy
- self-service
- hyperconvergence
Storage
- SAN/NAS
- virtualization
- shared storage
- local storage
- replication
- scale-out storage
Scale-out Storage
Network
- network function virtualization
- virtual network function
- software defined networking
- converged network
- leaf-spine architecture
- QoS
- global loadbalancing
- bgp anycast
Leaf-Spine Architecture
Global Load Balancing
- DNS
- BGP anycast
Automation
- configuration management
- software defined
- IaaS
- PaaS
Resiliency
- scalability
- availability
- redundancy
- disaster recovery
- RTO/RPO
- distributed systems
- quorum
- split-brain
- CAP
- partition tolerance
- active/active
- active/passive
- metro/stretch-cluster
distributed systems are hard!
Resiliency
where is the state?
Resiliency
only distributed state is hard!
CAP
RPO/RTO
- Disaster Recovery Metrics
- Recovery Point Objective
- How much data is lost?
- Recovery Time Objective
- How long does it take to recover?
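As a worked example, the two questions above can be answered for a single incident by subtracting timestamps. This is a minimal sketch: the incident timeline and the 4h/2h targets are invented for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_copy: datetime, failure: datetime) -> timedelta:
    """How much data is lost: everything written after the last good copy."""
    return failure - last_good_copy


def downtime(failure: datetime, service_restored: datetime) -> timedelta:
    """How long it takes to recover: from failure until service is back."""
    return service_restored - failure


# Hypothetical incident timeline
last_backup = datetime(2015, 1, 1, 2, 0)
failure = datetime(2015, 1, 1, 9, 30)
restored = datetime(2015, 1, 1, 11, 0)

# Hypothetical objectives agreed with the business
rpo_target = timedelta(hours=4)
rto_target = timedelta(hours=2)

lost = data_loss_window(last_backup, failure)  # 7:30:00
down = downtime(failure, restored)             # 1:30:00

print("RPO met:", lost <= rpo_target)  # RPO met: False
print("RTO met:", down <= rto_target)  # RTO met: True
```

In this example recovery was fast enough, but a nightly backup alone cannot meet a 4h RPO; that would need more frequent copies or replication.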
Quorum/Split-Brain/Partition
- Metrics to determine if a distributed system is healthy
- What happens if a distributed system falls apart?
- How does operation continue?
- What strategies exist to rebuild the system?
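One common answer to "how does operation continue?" is a strict-majority quorum: after a partition, only a side that still sees more than half of all nodes may keep operating, so two sides can never both continue (no split-brain). The sketch below assumes this simple majority rule; real systems add details such as weighted votes or tie-breaker nodes, and the function names are hypothetical.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def sides_allowed_to_continue(partition_sizes):
    """For each side of a network partition, may it keep operating?"""
    total = sum(partition_sizes)
    needed = quorum_size(total)
    return [size >= needed for size in partition_sizes]


# 5 nodes split 3 | 2: the larger side keeps quorum and continues.
print(sides_allowed_to_continue([3, 2]))  # [True, False]

# 4 nodes split 2 | 2: neither side has a majority, so everything stops.
print(sides_allowed_to_continue([2, 2]))  # [False, False]
```

The second case shows why clusters are usually built with an odd number of voting members, and why a cluster stretched evenly across two datacenters needs a third location for a tie-breaker.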
metro/stretch-cluster
- distributed systems that span datacenters
- distributed systems are already hard
- even harder across datacenters
Solution Proposal
Technologies
- OpenStack Cloud Computing Platform
- Ceph Storage Platform
Architecture
- One independent OpenStack platform per datacenter
- 3 types of nodes: mgmt/compute/storage
- Platform is built with the same automation principles as the new MSP environment
Hardware Options
- Three types of nodes: mgmt/compute/storage
- mgmt: openstack admin nodes
- mgmt: openstack network nodes
- mgmt: ceph admin nodes
- compute: openstack compute nodes
- storage: shared storage cluster nodes
Storage Strategy
- Options for vm storage:
- shared storage & attached ephemeral local storage
- automated provisioning & workload resiliency
- shared storage
- automated provisioning; no workload resiliency
- manual provisioning
- shared storage & attached ephemeral local storage
Resiliency Strategies
- Resiliency handled in workload
- phase one: enable fast disaster recoveries
- phase two: enable active/passive modes
- phase three: enable live traffic handling across datacenters
- implement as few distributed systems as possible
- especially across datacenters
- strategies on how to handle distributed system partitions must be defined and tested during implementation
OpenStack
OpenStack Statistics
OpenStack Architecture
OpenStack Multi-Tenancy
OpenStack Networking
OpenStack and the Transformation of the Data Center
http://www.slideshare.net/lewtucker/open-stack-atlanta-2014tucker
Ceph
Ceph
- completely distributed storage cluster
- interfaces for object, block and file-level storage
- clients talk directly to storage nodes
- no single point of failure
- fault-tolerant
- self-healing
- self-managing
- runs on commodity hardware
- single cluster: just add disks and nodes to scale out
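Clients can talk directly to storage nodes because every client computes an object's location itself instead of asking a central lookup service. Ceph does this with its CRUSH algorithm; the sketch below is not CRUSH but a much simpler stand-in (rendezvous hashing) that shows the same idea. Node names, the object name and the replica count are assumptions for illustration.

```python
import hashlib


def score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes, replicas: int = 3):
    """Pick the nodes that store an object: the highest-scoring ones.

    Any client that knows the node list computes the same answer,
    so no central directory sits in the data path.
    """
    ranked = sorted(nodes, key=lambda node: score(object_name, node), reverse=True)
    return ranked[:replicas]


nodes = ["node-a", "node-b", "node-c", "node-d"]
before = place("vm-disk-001", nodes)

# Scale out: add a node. Existing replicas only move if the new node
# takes one of the top places for this object.
after = place("vm-disk-001", nodes + ["node-e"])

print(before)
print(after)
```

Adding a node moves only a fraction of the objects, which is what makes "just add disks and nodes" practical; the actual rebalancing and self-healing in Ceph are considerably more involved than this sketch.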
Ceph Architecture
Ceph Dashboard
Ceph Intro
Next Generation Data Center
By Simon Josi