Long-time software developer.
We'll be looking at Rancher, a recently released Docker deployment tool. We'll also be briefly looking at RancherOS, one of the operating systems you can use to run Rancher.
We need a tool that can help with our Docker deployments which accommodates our scheduling, monitoring and self-service needs.
We'll have some slides that provide an overview but most of the presentation will be done via a live demonstration.
What It Does
All The Pieces
RancherOS is the smallest, easiest way to run Docker in production. Everything in RancherOS is a container managed by Docker. This includes system services such as udev and rsyslog. RancherOS is dramatically smaller than most traditional operating systems, because it only includes the services necessary to run Docker. This keeps the binary download of RancherOS to less than 30 MB. The size may fluctuate as we adapt to Docker. By removing unnecessary libraries and services, requirements for security patches and other maintenance are dramatically reduced. This is possible because with Docker, users typically package all necessary libraries into their containers.
Another way in which RancherOS is designed specifically for running Docker is that it always runs the latest version of Docker. This allows users to take advantage of the latest Docker capabilities and bug fixes.
Everything in RancherOS is a Docker container. We accomplish this by launching two instances of Docker. One is what we call System Docker, which runs the latest Docker daemon as PID 1, the first process on the system. All other system services, like ntpd, rsyslog, and console, are running in Docker containers. System Docker replaces traditional init systems like systemd, and can be used to launch additional system services.
System Docker runs a special container called User Docker, which is another Docker daemon responsible for managing all of the user’s containers. Any containers that you launch as a user from the console will run inside this User Docker. This creates isolation from the System Docker containers, and ensures normal user commands don’t impact system services.
We created this separation because it is logical, and because it would be disastrous if somebody ran docker rm -f $(docker ps -qa) and deleted the entire OS.
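On a RancherOS console, the split between the two daemons is visible directly (a sketch; commands are per the RancherOS docs, output omitted):

```shell
# System Docker: runs as PID 1 and owns system services
# (udev, rsyslog, console, ...) -- requires root on RancherOS
sudo system-docker ps

# User Docker: the daemon the ordinary docker client talks to
docker ps

# Wiping everything User Docker knows about leaves the OS running,
# because the system services live in System Docker:
docker rm -f $(docker ps -qa)
```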
- Active Directory or OpenLDAP integration
- MySQL for persistence
- HA Configuration
- Shared MySQL DB instance
- Load balancer to spread traffic across the Rancher instances
- A host to run the websocket-proxy on.
All hosts and any Rancher resources, such as containers, load balancers, and so on are created in and belong to an environment. Access control permissions for viewing and managing these resources are then defined by the owner of the environment. Rancher currently supports the capability for each user to manage and invite other users to their environment and allows for the ability to create multiple environments for different workloads. For example, you may want to create a “dev” environment and a separate “production” environment with its own set of resources and limited user access for your application deployment.
Users govern who has the access rights to view and manage Rancher resources within their Environment. Rancher allows access for a single tenant by default. However, multi-user support can also be enabled.
- Hosts are the most basic unit of resource within Rancher and are represented as any Linux server, virtual or physical
- Any modern Linux distribution that supports Docker 1.9.1+
- Ability to communicate with the Rancher server via HTTP or HTTPS through the pre-configured port (default 8080)
- Ability to be routed to any other hosts under the same environment to leverage Rancher’s cross-host networking for Docker containers
- Rancher also supports Docker Machine and allows you to add your host via any of its supported drivers.
- Rancher supports cross-host container communication by implementing a simple and secure overlay network using IPsec tunneling. Most of Rancher’s network features, such as load balancer or DNS service, require the container to be in the managed network.
- Under Rancher’s network, a container will be assigned both a Docker bridge IP (172.17.0.0/16) and a Rancher managed IP (10.42.0.0/16) on the default docker0 bridge. Containers within the same environment are then routable and reachable via the managed network.
Rancher adopts the standard Docker Compose terminology for services and defines a basic service as one or more containers created from the same Docker image. Once a service (consumer) is linked to another service (producer) within the same stack, a DNS record mapped to each container instance is automatically created and discoverable by containers from the “consuming” service.
- Service High Availability (HA) - the ability to have Rancher automatically monitor container states and maintain a service’s desired scale.
- Health Monitoring - the ability to set basic monitoring thresholds for container health.
- Add Load Balancers - the ability to add a simple load balancer for your services using HAProxy.
- Add External Services - the ability to add any-IP as a service to be discovered.
- Add Service Alias - the ability to add a DNS record for your services to be discovered.
Rancher implements a managed load balancer using HAProxy that can be manually scaled to multiple hosts. A load balancer can be used to distribute network and application traffic to individual containers, either by adding them directly or by linking a basic service. A linked basic service will have all of its underlying containers automatically registered as load balancer targets by Rancher.
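As a concrete sketch, a hypothetical compose pair wiring an nginx "web" service behind Rancher's managed HAProxy load balancer (service names, ports, and scales are invented; rancher/load-balancer-service is the image Rancher 1.x recognizes as a load balancer):

```shell
# Write the two compose files a load-balanced Rancher stack needs.
cat > docker-compose.yml <<'EOF'
web:
  image: nginx
lb:
  # Image Rancher treats as a managed HAProxy load balancer
  image: rancher/load-balancer-service
  ports:
    - 80:80
  links:
    # Every "web" container is auto-registered as a target
    - web:web
EOF

cat > rancher-compose.yml <<'EOF'
lb:
  scale: 2   # one HAProxy instance on each of two hosts
web:
  scale: 3
EOF

# Deploy against a running Rancher server (credentials assumed):
# rancher-compose -p lb-demo up -d
```

Scaling the lb service simply places additional HAProxy instances on additional hosts.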
- Rancher implements a health monitoring system by running managed network agents across its hosts to coordinate the distributed health checking of containers and services. These network agents internally use HAProxy to validate the health status of your applications. When health checks are enabled on either an individual container or a service, each container is then monitored by up to three network agents running on hosts separate from that container's parent host. The container is considered healthy if at least one HAProxy instance reports a "passed" health check.
- Rancher handles network partitions and is more efficient than client-based health checks. By using HAProxy to perform health checks, Rancher enables users to specify the same health check policy across applications and load balancers.
- Rancher constantly monitors the state of the containers within a service and actively reconciles them to the service's desired scale. Reconciliation is triggered when there are fewer (or more) healthy containers than the desired scale, when a host becomes unavailable, or when a container fails or cannot pass a health check.
- Rancher supports the notion of service upgrades by allowing users to either load balance or apply a service alias for a given service. Leveraging either feature creates a static destination for existing workloads that require that service. Once this is established, the underlying service can be cloned from Rancher as a new service, validated through isolated testing, and added to either the load balancer or service alias when ready. The existing service can be removed when obsolete, at which point all network and application traffic is automatically distributed to the new service.
- Rancher implements and ships a command-line tool called rancher-compose that is modeled after docker-compose. It takes in the same docker-compose.yml templates and deploys the Stacks onto Rancher. The rancher-compose tool additionally takes in a rancher-compose.yml file which extends docker-compose.yml to allow specification of attributes such as scale, load balancing rules, health check policies, and external links not yet supported by docker-compose.
- A Rancher stack mirrors the same concept as a docker-compose project. It represents a group of services that make up a typical application or workload.
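A hypothetical rancher-compose.yml extending an "api" service defined in a sibling docker-compose.yml illustrates the extra attributes (all names and values are invented):

```shell
# Rancher-only attributes live in rancher-compose.yml, keyed by the
# same service names as docker-compose.yml.
cat > rancher-compose.yml <<'EOF'
api:
  scale: 3                    # desired container count; Rancher reconciles to it
  health_check:
    port: 8080
    request_line: GET /health HTTP/1.0
    interval: 2000            # ms between checks
    response_timeout: 2000    # ms
    healthy_threshold: 2      # consecutive passes before "healthy"
    unhealthy_threshold: 3    # consecutive failures before "unhealthy"
EOF

# rancher-compose -p mystack up -d   # reads both YAML files
```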
- Rancher supports container scheduling policies that are modeled closely after Docker Swarm.
- In addition, Rancher supports scheduling service triggers that allow users to specify rules, such as on “host add” or “host label”, to automatically scale services onto hosts with specific labels.
- Rancher supports the colocation, scheduling, and lock-step scaling of a set of services by allowing users to group these services using the notion of sidekicks. A service with one or more sidekicks is typically created to support shared volumes (e.g. --volumes-from) and networking (e.g. --net=container) between containers.
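A sketch of a sidekick pairing in a v1 docker-compose.yml (the io.rancher.sidekicks label is Rancher's; the image names are invented):

```shell
cat > docker-compose.yml <<'EOF'
app:
  image: myorg/app              # hypothetical application image
  labels:
    # Declares "data" as a sidekick: always co-located with "app"
    # and scaled in lock step with it
    io.rancher.sidekicks: data
  volumes_from:
    - data                      # shared volumes between the pair
data:
  image: busybox
  command: "true"
EOF
```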
- Rancher exposes a metadata service, accessed through an HTTP-based API, that offers data about your services and containers and can be used to manage your running Docker instances. This data includes static information supplied when creating your Docker containers or Rancher services, as well as runtime data such as discovery information about peer containers within the same service.
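From inside any container on the managed network, the metadata service answers plain HTTP (a sketch; the rancher-metadata name is resolved by Rancher's DNS service, and the paths shown are illustrative of the latest API version):

```shell
# Static and runtime facts about this container and its service
curl http://rancher-metadata/latest/self/container/name
curl http://rancher-metadata/latest/self/service/scale

# Discover peer containers within the same service
curl http://rancher-metadata/latest/self/service/containers
```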
- Try to simulate a TLO and Mold-E deployment
- Running in the Amazon cloud
- Services locked down by IT, such as MySQL, RabbitMQ, Graylog and MongoDB, are not managed by Rancher
- Infrastructure services that are under our control are deployed to all containers as System Services
- Alpha host runs the Rancher Server
- Bravo host runs the locked down services
- Charlie and Delta hosts run the simulation workload
- Create a new environment
- Talk about Docker Machine
- Show Infrastructure->Hosts
- Show monitoring console for a host
- Talk about the Catalog
- Show the contents of the Prometheus stack
- Add the TL Catalog
- Deploy Configuration Service (system service)
- Show containers spinning up
- Show container logs
- Show container exec
- Show hosts
- Deploy Reporting Service
- Show logs
- Deploy TLO
- Show logs
- Increase scaling
- Deploy load balancer
- Show containers
- Get Charlie's IP address
- Watch Reporting's log
- Send PUT request
- Show TLO API view
- Show the TLO upgrade
- Show rollback
- Show containers
- Note that the scale is back down to 1
- Poke around console
- minor UI bugs, such as rollback glitches and images in the catalog not getting refreshed
- to be safe, might want to run the Server in high availability mode, which means more infrastructure. At the very least, point it to a stable MySQL instance that is backed up regularly
- testing was done with the proprietary Cattle scheduling module. Unsure how solid the Swarm or Kubernetes support is.
- does not seem to support alerting when something is seriously wrong. Still need an alerting solution. DataDog?
- Not sure how we should handle an ever growing list of releases. Maybe only keep N of releases in Git and prune the rest?
- puts lots of capabilities under one roof -- monitoring, scheduling, self-service deployment, integration with CI stream -- things we have to solve anyway
- supports non-Rancher schedulers. We might want to use Swarm now and move to Kubernetes when we hit Netflix scale. Cattle works but might be vendor lock-in?
- Authentication and auditing support should make operations happy
- Possible to deploy to developer laptops, providing the same convenience and rollback capabilities
- Template Catalog is an awesome feature. Pick what you want to deploy and click. CLI is also possible.
- Rancher load balancing is cheaper than Amazon's. Maybe AL can get rid of its internal proxies?
- Can integrate with Amazon's Route53 for DNS resolution
- Open source and the support community seems responsive
- Thoughtworks suggests we trial it
One of the beautiful things about open source is that, when all is said and done, the project's control is in the hands of the community. The data remains your data, free from the lock-in of proprietary solutions and built on repeatable standards (Apache License 2.0 / https://github.com/rancher/rancher ). Rancher is just that: a completely open source platform with over a million downloads, in production with enterprises (including Federal agencies) all over the world. (I can send you some examples if you wish.)
Additionally, we of course provide Enterprise Support/licensing (exact same code base) and I’m sorry that this was not very clear on the site.
- Rancher is licensed based on the number of logical CPUs (LCPU) on Rancher hosts that are in use by a customer. An LCPU includes a processor in a single core processor, a core in a multi-core processor, or a hyperthreading sibling. The total number of logical CPUs is determined by how they are reported by Linux in /proc/cpuinfo, on all hosts under the management of the Rancher server.
- There is a minimum purchase commitment of 2,000 LCPUs across two support levels:
- License + Standard support: $50,000/year ($25/LCPU)
- License + Platinum support: $90,000/year ($45/LCPU)
- Additional discounts available for higher LCPU tiers and multi-year terms.
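For reference, the LCPU count the licensing is based on can be read straight from Linux, per the /proc/cpuinfo reporting described above:

```shell
# Linux emits one "processor" line per logical CPU
# (each core and each hyperthread sibling counts as one)
grep -c '^processor' /proc/cpuinfo
```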
We can of course tailor this for you, so please do use me as a point of contact moving forward. I’d be happy to set up a call to talk financials, our funding, etc…with one of the founders and myself. Would Friday work by chance?
- Do we continue with the current plan of using Rundeck + Amazon ECS to manage our deployments or do we want to course correct and trial Rancher?
- Both systems use Docker Compose files
- Using Rancher means we don't have to write the Rundeck/AWS integration piece
- Using Rancher/Swarm gives us one less vendor-specific piece. Should be possible to switch to an entirely Swarm-based solution if Rancher doesn't work out
- Rundeck/ECS doesn't give us the integrated feedback loop that Rancher does. Once you deploy, you have to switch to another tool to see if your deployment "took".
- Neither ECS nor Rancher has specific support for rollbacks that involve database changes -- nobody seems to want to tackle that problem!
By Ronald Kurr