Kirk Haines
wyhaines@gmail.com
kirk-haines@cookpad.com
• Rubyist since 2001
• First professional Ruby web app in 2002
• Dozens of sites and apps, and several web servers since
@wyhaines
The "Stack" is the set of software components that live around our application, enabling it to be accessed by end users and enabling its own function.
Traditionally, only some of this is done with Ruby
I've done other parts in Ruby, for Production apps, for a long time. So, let's see how we got to where we are with the modern Ruby stack, and then see how far we can push it in doing those other parts with Ruby.
https://www.ruby-lang.org/en/downloads/releases/
I started here....
There were no real "frameworks".
There were a few tools.
Stack options were limited.
Amrita
CGI
mod_ruby
Apache based stack
Amrita: XHTML/HTML templating
CGI: web server <-> executable protocol
mod_ruby: embeds Ruby inside Apache
Author: seki <seki@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Sun Nov 17 16:11:40 2002 +0000
* add ERB
Before Rails
Before ERB
[Diagram: Apache, CGI, Ruby script]
[Diagram: Apache, mod_ruby, your app]
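To make the CGI half of that picture concrete, here is a minimal sketch of what a Ruby CGI script does: the web server puts the request into environment variables, and the script writes headers, a blank line, and a body to standard output. The method name `cgi_response` and the response text are made up for illustration; only the `REQUEST_METHOD`, `PATH_INFO`, and `QUERY_STRING` variables come from the CGI convention.

```ruby
require 'stringio'

# Illustrative sketch only: build a CGI-style response from the request
# environment and write it to +out+ (standard output when run under a server).
def cgi_response(env, out = $stdout)
  method = env.fetch('REQUEST_METHOD', 'GET')
  path   = env.fetch('PATH_INFO', '/')
  query  = env.fetch('QUERY_STRING', '')

  body = "Hello from Ruby: #{method} #{path}"
  body += "?#{query}" unless query.empty?
  body += "\n"

  # A CGI response is header lines, one blank line, then the body.
  out.print "Content-Type: text/plain\r\n"
  out.print "Content-Length: #{body.bytesize}\r\n"
  out.print "\r\n"
  out.print body
  out
end

# Under a real web server the script would simply call: cgi_response(ENV.to_h)
```

Every request starts a fresh Ruby process in this model, which is a large part of why mod_ruby and, later, long-running application processes were attractive.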
Architecture was much nicer than other Ruby options
Listening on sockets
Attaching to a web server directly
I took IOWA over in 2002, hacked on it, and released a commercial application running on Ruby that same year
The Ruby "stack" was evolving
[Diagram: Apache w/ mod_iowa, with two IOWA apps on sockets]
[Diagram: Apache w/ proxypass, with two IOWA apps + WEBrick]
Early Rails stacks still Apache centric (sometimes lighttpd)
Rails apps did run as discrete processes
FCGI
MySQL
FCGI and friends (SCGI, for example) were dominant for a while, but there was a lot of unhappiness with them
Surely there must be a better way?
https://groups.google.com/forum/#!msg/comp.lang.ruby/-cI5PtqDc1o/FXRdN9UKPE0J
Announced Jan 19, 2006
Web/Application server written in Ruby
Used a C/Ragel derived HTTP parser
Promised Speed, and delivered
Quickly became the new paradigm
[Diagram: Apache w/ proxypass, with two Rails + Mongrel processes]
[Diagram: NGINX + proxy, with two Rails + Mongrel processes]
IOWA supported multiple deployment schemes
IOWA implemented a common request specification
Not Rack; predated Rack, but similar goals
Some IOWA code ended up in Rack
Agnostic interface between Ruby apps and the outside world.
It didn't change the Ruby deployment landscape immediately, but it was a key enabler
Mongrel supported Rack
Other app servers (like Thin) started being developed
Ruby devs were getting more options for their stack
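The "agnostic interface" idea is small enough to show without any server at all. A Rack application is an object that responds to `call(env)` and returns a status, a headers hash, and a body that responds to `each`. The sketch below uses made-up names (`HELLO_APP`, `ServerHeader`, `STACK`) and calls the app directly with a hand-built `env`, so it does not need the rack gem.

```ruby
# A Rack-style application: call(env) -> [status, headers, body].
HELLO_APP = lambda do |env|
  path = env.fetch('PATH_INFO', '/')
  [200, { 'Content-Type' => 'text/plain' }, ["You asked for #{path}\n"]]
end

# Middleware wraps another app and exposes the same interface, which is
# what lets servers, frameworks, and filters be mixed freely.
class ServerHeader
  def initialize(app, name)
    @app = app
    @name = name
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge('Server' => @name), body]
  end
end

STACK = ServerHeader.new(HELLO_APP, 'any-ruby-server')

# A server's whole job, from the app's point of view, is to build env
# from the HTTP request and make this call:
status, headers, body = STACK.call('REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/talks')
```

Because the app only sees `env`, the same object can sit behind Mongrel, Thin, WEBrick, or anything else that speaks this convention.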
IOWA has server side state management
REALLY nice to eliminate programmer work
However, state was isolated to a single process
All requests for a given session need to go to the same backend process
This requires sticky sessions
There was no good solution
So, the answer? Write it myself! A sticky session supporting load balancing reverse proxy, in Ruby!
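The core of sticky routing fits in a few lines. This is a simplified sketch, not Swiftiply's actual implementation: the `StickyBalancer` class and the idea of hashing the session id with CRC32 are assumptions chosen for illustration.

```ruby
require 'zlib'

# Sketch: route every request for a session to the same backend process.
class StickyBalancer
  def initialize(backends)
    @backends = backends.dup.freeze
    @next = 0
  end

  # Requests that carry a session id always map to the same backend.
  # Requests without one (new visitors) are spread round-robin.
  def backend_for(session_id)
    if session_id.nil? || session_id.empty?
      backend = @backends[@next % @backends.size]
      @next += 1
      backend
    else
      @backends[Zlib.crc32(session_id) % @backends.size]
    end
  end
end

balancer = StickyBalancer.new(%w[app1:9001 app2:9002 app3:9003])
balancer.backend_for('session-abc') # same backend on every call
```

One design caveat of plain modulo hashing: adding or removing a backend remaps most sessions, so a production proxy would need either an explicit session table or a more stable hashing scheme.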
You can't do that with Ruby
Ruby is too slow
You need C/C++/Erlang/other-fast-thing
10k 1020 byte files, concurrency of 25; with KeepAlive
Requests per second: 26245.09 [#/sec] (mean)
10k 78000 byte files, concurrency of 25; with KeepAlive
Requests per second: 8273.98 [#/sec] (mean)
10k 78000 byte files with etag, concurrency of 25; with KeepAlive
Requests per second: 25681.19 [#/sec] (mean)
Disclosure: This was timed on a $10/month Digital Ocean instance running Ruby 2.5.1. The original benchmarks, using Ruby 1.8.7 on cheap dedicated 2008 hardware, topped out at about 15000/second.
A pure Ruby stack can do everything I need
Rack got support for Swiftiply around version 0.9
One simple, consistent stack, whether IOWA or Rails
[Diagram: Swiftiply, with three IOWA apps, two Rails apps, and Analogger]
Analogger Speedtest -- larger messages
Testing 100000 messages of 100 bytes each.
Message rate: 130665/second (0.765315254)
Analogger Speedtest -- Fail Analogger, continue logging locally, and monitor for Analogger return, then drain queue of local logs
Testing 100000 messages of 100 bytes each.
Message rate: 61992/second (1.613111685)
Ruby Logger Speedtest (local file logging only) -- larger messages
Testing 100000 messages of 100 bytes each.
Message rate: 72811/second (1.37341377)
Performance tested on $10/month Digital Ocean instance
Inexpensive dedicated 2007 era hardware, with Ruby 1.8.6/7,
delivered about 60% of those numbers. i.e. even a decade ago it was fast enough.
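The failover case in the second speed test (keep logging locally while the log server is away, then drain the backlog when it returns) can be sketched as follows. This is not Analogger's client code: `FailoverLogger` is a hypothetical class, the "local log" is an in-memory array rather than a file, and the transport is any callable that raises when the server is unreachable.

```ruby
# Sketch of a log client that survives the log server going away.
class FailoverLogger
  attr_reader :queue

  # transport: any object with #call(message) that raises IOError or a
  # SystemCallError (e.g. Errno::ECONNREFUSED) when the server is down.
  def initialize(transport)
    @transport = transport
    @queue = []
  end

  # Returns true if the message reached the server, false if it was queued.
  def log(message)
    drain # keep ordering: older queued messages go out first
    if @queue.empty? && deliver(message)
      true
    else
      @queue << message
      false
    end
  end

  # Flush queued messages in order; stop at the first failure.
  def drain
    @queue.shift while !@queue.empty? && deliver(@queue.first)
    @queue.empty?
  end

  private

  def deliver(message)
    @transport.call(message)
    true
  rescue IOError, SystemCallError
    false
  end
end
```

A real client would also probe for the server's return on a timer instead of only on the next `log` call, and would persist the backlog to disk.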
Dozens of sites and apps, large and small, running Ruby-dominated production stacks for the last decade.
You CAN do most of this with Ruby in 2018
With comments, and an example/test, it's less than 200 lines to write a multi-strategy load balancing proxy with Ruby.
# Wrapping the proxy server
#
module Server
  def run(host='0.0.0.0', port=9999)
    puts ANSI::Code.bold { "Launching proxy at #{host}:#{port}...\n" }

    Proxy.start(:host => host, :port => port, :debug => false) do |conn|
      Backend.select do |backend|
        conn.server backend, :host => backend.host, :port => backend.port

        conn.on_connect  &Callbacks.on_connect
        conn.on_data     &Callbacks.on_data
        conn.on_response &Callbacks.on_response
        conn.on_finish   &Callbacks.on_finish
      end
    end
  end

  module_function :run
end
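The slide code calls `Backend.select` without showing it. As a rough idea of what the "multi-strategy" part could look like, here is a self-contained sketch; the `BackendPool` class, its strategy names, and the load counter are assumptions for illustration, not the code behind the slide.

```ruby
# One upstream server plus a count of requests currently in flight.
Backend = Struct.new(:host, :port, :load)

# Hypothetical pool that picks a backend using a named strategy.
class BackendPool
  def initialize(addresses)
    @backends = addresses.map do |addr|
      host, port = addr.split(':')
      Backend.new(host, Integer(port), 0)
    end
    @cursor = 0
  end

  # Yields the chosen backend and tracks its load while the block runs.
  def select(strategy = :least_loaded)
    backend = pick(strategy)
    backend.load += 1
    yield backend
  ensure
    backend.load -= 1 if backend
  end

  private

  def pick(strategy)
    case strategy
    when :round_robin
      chosen = @backends[@cursor % @backends.size]
      @cursor += 1
      chosen
    when :random
      @backends.sample
    when :least_loaded
      @backends.min_by(&:load)
    else
      raise ArgumentError, "unknown strategy: #{strategy}"
    end
  end
end

pool = BackendPool.new(%w[127.0.0.1:3000 127.0.0.1:3001])
pool.select(:round_robin) { |backend| "#{backend.host}:#{backend.port}" }
```

Keeping the strategy as a plain symbol makes it easy to expose as a command line or configuration option.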
Really
WEBrick is powerful, but often ignored
require 'webrick'
require 'webrick/httpproxy'
It's capable, and it's been a part of Ruby since 2002.
$ bundle exec bin/shellac -s hash -b 0.0.0.0:8080 \
-r 'rubykaigi2018.demos4.us::\?(.*)$::https://github.com/#{$1}' \
-t 2:20 -w 1
Requests per second: 1861.69 [#/sec] (mean)
Time per request: 53.715 [ms] (mean)
Time per request: 0.537 [ms] (mean, across all concurrent requests)
Transfer rate: 175182.14 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       9
Processing:     2   53   3.4     53      65
Waiting:        1   53   3.8     53      65
Total:         10   53   3.0     53      65
It's probably fast enough! (Ruby 2.5.1; $10 Digital Ocean instance)
Shellac was knocked together as a proof of concept demo for this talk, and ABSOLUTELY needs more work and more features, and more than a few bug fixes before I'd ever use it for any real task.
#! /bin/sh
read get docid junk
cat `echo "$docid" | \
ruby -p -e '$_=$_.gsub(/^\//,"").gsub(/[\r\n]/,"").chomp'`
while :; do netcat -l -p 5000 -e ./super_simple.sh; done
Just kidding! Don't do this. Please! (It is, more or less, HTTP 0.9 compliant, though...)
$ ruby -run -e httpd -- -p 8080 .
This actually runs a WEBrick server, and it's reasonably quick.
Requests per second: 2732.33 [#/sec] (mean)
Time per request: 3.660 [ms] (mean)
Time per request: 0.366 [ms] (mean, across all concurrent requests)
Transfer rate: 717.77 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:     1    4   0.5      3      17
Waiting:        0    2   0.6      2      16
Total:          1    4   0.6      4      18
use Rack::Static,
  :urls => ["/"],
  :root => "."

run lambda { |env|
  [
    200,
    {
      'Content-Type'  => 'text/html',
      'Cache-Control' => 'public, max-age=86400'
    },
    ( File.open('index.html', File::RDONLY) rescue nil )
  ]
}
Requests per second: 3075.24 [#/sec] (mean)
Time per request: 3.252 [ms] (mean)
Time per request: 0.325 [ms] (mean, across all concurrent requests)
Transfer rate: 12237.88 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     1    3   0.4      3      23
Waiting:        0    3   0.3      3      12
Total:          1    3   0.4      3      23
puma -b tcp://127.0.0.1:5000 -w 1 -t 2:16 ../config.ru
Puma
Requests per second: 4545.14 [#/sec] (mean)
Time per request: 2.200 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 18730.94 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       4
Processing:     0    2   0.4      2       7
Waiting:        0    2   0.4      2       7
Total:          1    2   0.4      2       7
NGINX
To my knowledge, nobody has used Ruby to implement a database (yet?)
I have a crazy idea (that I might hack on the flight back home)
Ruby +
distributed gossip type protocol +
SQLite ==
ROMA is a distributed key value store, written in Ruby
ROMA-1
ROMA-2
ROMA-3
$ simple_bench -t 8 -r -n 1000 10.136.73.13:11211
qps=556 max=0.151159 min=0.000541 ave=0.001797
qps=637 max=0.196910 min=0.000487 ave=0.001569
qps=605 max=0.119492 min=0.000540 ave=0.001651
qps=446 max=0.527537 min=0.000553 ave=0.002238
qps=535 max=1.563985 min=0.000483 ave=0.001866
qps=531 max=0.879976 min=0.000606 ave=0.001883
qps=599 max=0.231225 min=0.000534 ave=0.001668
Each node is a $5 Digital Ocean instance.
Writing persistent data (survives restart/reboot)
across at least 2 nodes.
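"Across at least 2 nodes" means every key has more than one owner. One common way to choose those owners is a hash ring, sketched below. This is a generic illustration and makes no claim about how ROMA itself routes keys; `ReplicaRing`, the MD5-based `point` function, and the virtual node count are all assumptions.

```ruby
require 'digest'

# Sketch: map each key to +replicas+ distinct nodes using a hash ring.
class ReplicaRing
  def initialize(nodes, replicas: 2, vnodes: 64)
    raise ArgumentError, 'fewer nodes than replicas' if nodes.size < replicas

    @replicas = replicas
    # Each node owns several points on the ring to even out the key spread.
    points = nodes.flat_map do |node|
      (0...vnodes).map { |i| [point("#{node}##{i}"), node] }
    end
    @ring = points.sort_by(&:first)
  end

  # Walk clockwise from the key's position, collecting distinct nodes.
  def nodes_for(key)
    target = point(key)
    start = @ring.bsearch_index { |pt, _node| pt >= target } || 0
    owners = []
    @ring.size.times do |offset|
      node = @ring[(start + offset) % @ring.size][1]
      owners << node unless owners.include?(node)
      break if owners.size == @replicas
    end
    owners
  end

  private

  # Position on the ring: the first 32 bits of the MD5 digest.
  def point(string)
    Digest::MD5.hexdigest(string)[0, 8].to_i(16)
  end
end

ring = ReplicaRing.new(%w[ROMA-1 ROMA-2 ROMA-3])
ring.nodes_for('user:42') # two distinct node names, stable across calls
```

With every key held by two nodes, any single node can be restarted while the other owner keeps serving that key.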
Rakuten claims performance > 10000 queries per second on in-memory data stores, using AWS instances for nodes (2015 slides)
Distributed nature and redundancy supports rolling restarts
Ships with support for multiple storage engines:
Extending it is as simple as writing a Ruby module to provide the implementation
Capistrano is a framework for building automated deployment scripts.
kirkhaines@MacBook-Pro ~/.ghq/github.com/wyhaines/demo-website (staging) $ git push
[Diagram: Git Repo, Production Server, Production Server, Staging Server]
kirkhaines@MacBook-Pro ~/.ghq/github.com/wyhaines/demo-website (production) $ git push
[Diagram: Git Repo, Production Server, Production Server, Staging Server]
Requires explicit specification of push target:
git remote add production demo@server_domain_or_IP:proj
By Hashicorp
$ serf members -rpc-auth="blahblahblah"
db-1-nyc1 10.136.13.216:7946 alive role=database
app-demo-1-nyc1 10.81.218.50:7946 alive role=app
app-demo-2-nyc1 10.236.115.112:7946 alive role=app
http-1-nyc1 10.205.169.211:7946 alive role=nginx
$ serf query -rpc-auth="blahblahblah" load-average
Query 'load-average' dispatched
Ack from 'db-1-nyc1'
Ack from 'app-1-nyc1'
Ack from 'http-1-nyc1'
Ack from 'app-2-nyc1'
Response from 'app-2-nyc1': 16:54:16,up,105,days,,20:48,,0,users,,load,average:,0.00,,0.00,,0.00
Response from 'app-1-nyc1': 16:54:16,up,108,days,,1:37,,1,user,,load,average:,0.00,,0.00,,0.00
Response from 'http-1-nyc1': 16:54:16,up,111,days,,1:39,,0,users,,load,average:,0.00,,0.00,,0.00
Response from 'db-1-nyc1': 16:54:16,up,105,days,,23:39,,1,user,,load,average:,0.00,,0.00,,0.00
Total Acks: 4
Total Responses: 4
$ serf query -rpc-auth="blahblahblah" list-handlers
Query 'list-handlers' dispatched
Ack from 'deploy-1-nyc1'
Ack from 'app-1-nyc1'
Ack from 'app-2-nyc1'
Ack from 'http-1-nyc1'
Response from 'app-1-nyc1': query: list-handlers
query: describe-handler
query: load-average
query: df
query: mpstat
query: ping
event: git-index-deploy
Response from 'db-1-nyc1': query: list-handlers
query: describe-handler
query: load-average
query: df
query: mpstat
query: ping
Response from 'app-2-nyc1': query: list-handlers
query: describe-handler
query: load-average
query: df
query: mpstat
query: ping
event: git-index-deploy
Response from 'http-1-nyc1': query: list-handlers
query: describe-handler
query: load-average
query: df
query: mpstat
query: ping
Total Acks: 4
Total Responses: 4
$ serf query -rpc-auth="blahblahblah" describe-handler git-index-deploy
Query 'describe-handler' dispatched
Ack from 'deploy-1-nyc1'
Ack from 'app-1-nyc1'
Ack from 'http-1-nyc1'
Ack from 'app-2-nyc1'
Response from 'app-1-nyc1': Expects a hash code in the payload which will
be queried using git-index. A 'git fetch --all && git pull' will be
executed on all matching repositories.
Response from 'app-2-nyc1': Expects a hash code in the payload which will
be queried using git-index. A 'git fetch --all && git pull' will be
executed on all matching repositories.
Total Acks: 4
Total Responses: 2
{
  "protocol": 5,
  "bind": "0.0.0.0",
  "advertise": "10.136.13.216",
  "rpc_addr": "0.0.0.0:7373",
  "rpc_auth": "blahblahblah",
  "enable_syslog": true,
  "log_level": "info",
  "replay_on_join": true,
  "snapshot_path": "/etc/serf/snapshot",
  "tags": {
    "role": "app"
  },
  "retry_join": [
    "db-1-nyc1",
    "app-2-nyc1",
    "http-1-nyc1"
  ],
  "event_handlers": [
    "/usr/share/rvm/wrappers/ruby-2.5.1/serf-handler"
  ]
}
require 'serf/handler/events/load_average'
require 'serf/handler/events/df'
require 'serf/handler/events/mpstat'
require 'serf/handler/events/ping'
require 'serf/handler/events/git-index-deploy'
Wrote it on the airplane coming home from the Cookpad Bristol Office
Simple configuration - Just require something that uses the DSL
Several bundled handlers come with the gem
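For context on what such a DSL has to do underneath: Serf runs the configured handler executable for each event, passing the event type in the `SERF_EVENT` environment variable, the name in `SERF_QUERY_NAME` (queries) or `SERF_USER_EVENT` (user events), and the payload on standard input; whatever a query handler prints becomes the response. The dispatcher below is a hypothetical minimal sketch of that routing, not the serf-handler gem's implementation; `TinySerfDispatcher` and its methods are invented names.

```ruby
# Sketch: route a Serf invocation to a registered Ruby block.
class TinySerfDispatcher
  def initialize
    @handlers = {}
  end

  # Register a block for a query (:query) or a user event (:event).
  def on(type, name, &block)
    @handlers[[type.to_s, name]] = block
  end

  # env is the process environment Serf provides; payload is what arrived
  # on standard input. Returns the handler's reply, or nil if none matched.
  def dispatch(env, payload)
    type, name =
      case env['SERF_EVENT']
      when 'query' then ['query', env['SERF_QUERY_NAME']]
      when 'user'  then ['event', env['SERF_USER_EVENT']]
      end
    handler = @handlers[[type, name]]
    handler ? handler.call(payload).to_s : nil
  end
end

dispatcher = TinySerfDispatcher.new
dispatcher.on(:query, 'ping') { |_payload| 'pong' }
# In a real handler executable: print dispatcher.dispatch(ENV.to_h, $stdin.read)
```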
require 'serf/handler' unless Object.const_defined?(:Serf) &&
                              Serf.const_defined?(:Handler)
include Serf::Handler

describe "Return the 1-minute, 5-minute, and 15-minute load averages as a",
         "comma separated list of values."
on :query, 'load-average' do |event|
  # uptime prints "load average:" on Linux and "load averages:" on macOS/BSD,
  # with the values separated by ", " or by spaces respectively.
  `/usr/bin/uptime`.gsub(/^.*load\s+averages?:\s+/,'').split(/[,\s]+/).join(',')
end
load_average.rb
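The text of `uptime` differs between platforms: Linux prints `load average: 0.00, 0.00, 0.00`, while macOS and the BSDs print `load averages: 1.52 1.60 1.71`. A small sketch of the parsing step as a pure function that tolerates both, which also makes it easy to test without shelling out. The name `load_averages` is made up for this example.

```ruby
# Extract the three load averages from a line of uptime output and return
# them as a comma separated string.
def load_averages(uptime_output)
  uptime_output
    .sub(/\A.*load\s+averages?:\s+/m, '') # drop everything up to the label
    .split(/[,\s]+/)                      # values are split by ", " or " "
    .join(',')
end

linux = " 16:54:16 up 105 days, 20:48,  0 users,  load average: 0.00, 0.01, 0.05\n"
macos = "16:54  up 3 days,  2:10, 4 users, load averages: 1.52 1.60 1.71\n"

load_averages(linux) # => "0.00,0.01,0.05"
load_averages(macos) # => "1.52,1.60,1.71"
```

Locales that print a decimal comma (`0,00`) would need extra handling; running `uptime` with `LC_ALL=C` is one way to sidestep that.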
require 'serf/handler' unless Object.const_defined?(:Serf) && Serf.const_defined?(:Handler)
include Serf::Handler

describe "Expects a hash code in the payload which will be queried using",
         "git-index. A 'git fetch --all && git pull' will be executed on all",
         "matching repositories. Deployment hooks are available in order to",
         "execute arbitrary code during the deploy process. If any of the",
         "following files are found in the REPO root directory, they will be",
         "executed in the order described by their name.\n",
         ".serf-before-deploy\n",
         ".serf-after-deploy\n",
         ".serf-on-deploy-failure\n",
         ".serf-on-deploy-success\n"
on :event, 'git-index-deploy' do |event|
  # Serf's executable environments are stripped of even basic information like HOME
  user = `whoami`.strip
  dir = `eval echo "~#{user}"`.strip
  `git-index -d #{dir}/.git-index.db -q #{event.payload}`.split(/\n/).each do |match|
    hash, data = match.split(/:\s+/, 2)
    path, url = data.split(/\|/, 2)
    ENV['SERF_DEPLOY_PAYLOAD'] = event.payload
    ENV['SERF_DEPLOY_HASH'] = hash
    ENV['SERF_DEPLOY_PATH'] = path
    ENV['SERF_DEPLOY_URL'] = url
    success = Dir.chdir(path)
    break unless success
    if FileTest.exist?(File.join(path, ".serf-before-deploy"))
      system(File.join(path, ".serf-before-deploy"))
    end
    success = system("git fetch --all && git pull")
    if success && FileTest.exist?(File.join(path, ".serf-on-deploy-success"))
      system(File.join(path, ".serf-on-deploy-success"))
    elsif !success && FileTest.exist?(File.join(path, ".serf-on-deploy-failure"))
      system(File.join(path, ".serf-on-deploy-failure"))
    end
    if FileTest.exist?(File.join(path, ".serf-after-deploy"))
      system(File.join(path, ".serf-after-deploy"))
    end
  end
end
Simple tool to make/maintain a database of git repositories
Calculates a "fingerprint" using the combination of the hashes of the first two commits in the repo -- it's enough to be reliably unique.
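The fingerprint idea works because the earliest commits of a repository never change, no matter how many clones exist or how far each has advanced. A sketch of the calculation, with assumptions stated plainly: "first two commits" is taken to mean the two oldest, and SHA-256 over the two hashes joined by a colon is an illustrative choice, not necessarily what git-index does.

```ruby
require 'digest'

# commit_hashes: the repository's commit ids, oldest first, for example the
# output lines of `git rev-list --reverse HEAD`.
def repo_fingerprint(commit_hashes)
  first_two = commit_hashes.first(2)
  raise ArgumentError, 'repository has no commits' if first_two.empty?

  Digest::SHA256.hexdigest(first_two.join(':'))
end

# Placeholder commit ids, not from a real repository.
old_clone = %w[1111111 2222222 3333333]
new_clone = %w[1111111 2222222 3333333 4444444 5555555]

repo_fingerprint(old_clone) == repo_fingerprint(new_clone) # => true
```

That stability is what lets a deploy event carry one fingerprint and have every server find its own checkout of the same project.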
kirkhaines@MacBook-Pro ~/.ghq/github.com/wyhaines/demo-website (production) $ git push
[Diagram: Git Repo, Production Server, Production Server, Staging Server]
This doesn't need much comment. Chef and Itamae are both Ruby based and capable.
I still think Analogger is pretty sweet if you just need something fast and reliable to get all of your logs into one place.
Fluentd is Ruby (Thanks Treasure Data!) and it's great
A tool to keep a folder or folders synchronized between multiple discrete systems, in a few hundred lines of Ruby.
At EngineYard I wrote essentially an SSH proxy with full stream logging and 2-factor authentication, implemented in Ruby
Still a private project, but hopefully it'll land on Github soon
This chain of thought was inspired by a couple of real-world, mostly Ruby stacks that have been in production with few changes for 7+ years.
With some work, one could conceivably fill all the niches with Ruby:
For everyone who signed up, come to our party tonight!
https://www.eventbrite.com.au/e/cookpad-x-rubykaigi-2018-day-2-party-tickets-46009089425
We are hiring! Find one of us, or come to our booth
kirk-haines@cookpad.com / wyhaines@gmail.com / @wyhaines everywhere