Supercharging ActiveRecord with PostgreSQL
Who am I?
Partner at DockYard
GitHub: danmcclain
Twitter: @_danmcclain
PostgreSQL
-
Advanced Data Types
- Arrays
- Hstore (Hash Store, aka Key-Value store)
- JSON
- Ranges
- Powerful Features
- windowing functions
- table inheritance
- Schemas - Separate silos within a single database
What We'll Explore:
Arrays
- PostgreSQL has built in support for arrays of any datatype
- That includes user-defined types as well
- Can be multi-dimensional
- Child arrays must have the same number of elements as their siblings
What We'll Expore:
HStore
- Hash Store
- Key-Value store
- Values are stored as strings, so plan on converting values if you want to store something else
What We'll Explore
Rank and CTEs
- PostgreSQL has window functions
- perform a calculation over a set of records
- rank - determines a record's rank over a specific order
- takes ties into account
- CTEs or Common Table Expressions allow you to declare a specific query as a temporary table to be used in a complex query
Patterns we'll examine: Tagging
Patterns we'll examine: Similar Records
Patterns we'll examine: Single Table Inheritance
Patterns we'll examine: Leaderboards
Tagging
-
We want to apply tags to notes entered into our system
- We don't care about generating a tag cloud
- We want to be able to find all notes with a given tag
Many to Many relation
- Usually, we'll generate a many-to-many relation between notes and tags
- We'll have a tag record for each tag applied to a note
- We'll have to generate a new tag when a user wants to use one we have not previously encountered
Using Arrays for tags
- PostgreSQL has native support for array columns
- We can store the tags as an array column on our notes
- We no longer have to worry about creating tag records when we encounter a new one
Adding our tags array to the model
class AddTagsToNotes < ActiveRecord::Migration
def change
add_column :notes, :tags, :string, array: true
end
end
Finding Tagged Notes
- Now that we are tagging our notes, we need a way of finding them
- PostgreSQL has multiple ways of checking if an array contains a value
- ANY(...)
- @> - contains operator
ANY
- We can use the ANY() function to see if a value is stored in any element of the array
SELECT notes.*
FROM notes
WHERE 'useful' = ANY(notes.tags)
- This won't use any indexes
@> - Contains
- We can use the @> operator to see if an array on the left contains the array on the right of the operator (order matters)
SELECT notes.*
FROM notes
WHERE notes.tags @> '{useful}'
- Without an index, this has the same performance as ANY()
- When we index the array with GIN (Generalized Inverted Index), we can see up to ~50 X speed improvement
GIN indexing our Array
class AddTagsToNotes < ActiveRecord::Migration
def change
add_index :notes, :tags, using: :gin
end
end
Some Benchmarking
- 100,000 notes, each with 100 tags
- 10,000 possible tags
- Table was cached by scanning it once
- Run on a MacBook Pro Retina with 16GB of RAM and SSD
- Noticeably slower reads on a Mac Mini with spinning metal
Finding Via Has_Many
First Run - 370 ms
Nested Loop -> Bitmap Heap Scan on post_tags Recheck Cond: (tag_id = 1) -> Bitmap Index Scan on index_post_tags_on_tag_id Index Cond: (tag_id = 1) -> Index Scan using posts_pkey on posts Index Cond: (id = post_tags.post_id) Total runtime: 370.139 ms
Subsequent calls - 13 ms
Nested Loop -> Bitmap Heap Scan on post_tags Recheck Cond: (tag_id = 2) -> Bitmap Index Scan on index_post_tags_on_tag_id Index Cond: (tag_id = 2) -> Index Scan using posts_pkey on posts Index Cond: (id = post_tags.post_id) Total runtime: 13.511 ms
Array: Finding via ANY
First run: 503 ms
Aggregate -> Seq Scan on posts Filter: ('Tag #1'::text = ANY ((array_tags)::text[])) Rows Removed by Filter: 98989 Total runtime: 503.517 ms
Subsequent runs: 500 ms
Aggregate -> Seq Scan on posts Filter: ('Tag #1'::text = ANY ((array_tags)::text[])) Rows Removed by Filter: 98989 Total runtime: 499.618 ms
Array: Finding via @>
First run - Pre index: 536 ms
Aggregate -> Seq Scan on posts Filter: (array_tags @> '{"Tag #1"}'::character varying[]) Rows Removed by Filter: 98989 Total runtime: 536.183 msFirst run - Post index: 8ms
Aggregate -> Bitmap Heap Scan on posts Recheck Cond: (gin_tags @> '{"Tag #1"}'::character varying[]) -> Bitmap Index Scan on index_posts_on_gin_tags Index Cond: (gin_tags @> '{"Tag #1"}'::character varying[]) Total runtime: 8.056 ms
Results
Method | Run 1 | Run 2 |
---|---|---|
has_many | 370 ms | 13 ms |
Array: ANY | 503 ms | 500 ms |
Array: @< (no index) | 499 ms | 500 ms |
Array: @< (indexed) | 8 ms | 5 ms |
Drawbacks
- You lose the ability to efficiently create a list of tags
- Getting a distinct list of tags from a series of arrays is slow
- It's also fairly complex
- GIN can be slow to update
- Multiple inserts into index (one for each key extracted)
- Can be mitigate somewhat using FASTUPDATE
Notes
- Database-side Cache matters
- You will see a performance increase on subsequent runs, as the results will be in memory
- Results across the board were quicker after the first table scan, as the table was in memory after that
Similar Records
- We want to build out the ability to have similar posts listed for a given post
- We don't care about mapping a similar post back to the source item (don't need belongs_to)
Old Way
- Create a similar_posts model
- post_id: id of the parent post (belongs_to)
- similar_post_id: id of the child post (belongs_to)
- Wire up a has_many though: relation to create a similar_posts attribute on the post model
Using Arrays
- We can build has_many-type functionality using arrays
- This allows us to retrieve all the similar item ids with a single call for the item record
Migration
- We want an attribute on our model to store the ids
class CreateMembers < ActiveRecord::Migration
def change
add_column :posts, :similar_post_ids, :integer, array: true
end
end
has_many like methods
- We can add a method
Post#similar_posts
which will retrieve our item's by their ids stored inPost#similar_post_ids
-
We can also add a
Post#similar_posts=
method that will take the array of items passed to it and store their ids inPost#similar_post_ids
has_many like methods
class Post < ActiveRecord::Base
def similar_posts
self.class.where(id: similar_posts_ids)
end
def similar_posts=(children)
self.similar_posts_ids = children.map(&:id)
end
def similar_posts_ids=(ids)
write_attribute :similar_posts_ids, ids.reject{ |id| id.to_i <= 0 }
end
end
Using our fake has_many
post = Post.find(1)
post.similar_posts # => [Post, Post, Post]
post.similar_posts = [Post, Post] # Updates the similar_post_ids attribute
# the following won't work
post.similar_posts << Post # We aren't observing this array,
# so it won't update the ids attribute
# we can work around it with the following
post.similar_posts += [Post] # calls the setter on the array
Benefits
-
Alternate way of providing has_many
- Avoids additional records to maintain relations
Next Steps
- This example could be built out further using a proxy class in the has_many-like methods
- This would allow us to perform << operations and have ActiveRecord track the changes
Single Table Inheritence
- We are using single table inheritance to differentiate between multiple user types
- We want to add a set of object specific attributes for each user type
- Previously, we could move these attributes to a profile object and that profile would belong to a class
HStore
- PostgreSQL has a Hstore or Hash Store column type
- We can store a set of key-value properties on this column
Enabling Hstore
- Hstore was added to PostgreSQL 9.2 as an extension
- Rails 4.0 added
enable_extension
which will allow you to enable an extension via a migration -
enable_extension 'hstore'
will enable hstore in your database
An Example
- Joe Him at DevMynd details using Hstore with Rails 3.2 via activerecord-postgres-hstore [1]
- He extends ActiveRecord::Base to create an hstore_attribute for his models
- We can apply the concepts to Rails 4
[1]: http://www.devmynd.com/blog/2013-3-single-table-inheritance-hstore-lovely-combination
Our Member Class
- The Member class will serve as the base class for our STI infrastructure
class CreateMembers < ActiveRecord::Migration
def change
enable_extension 'hstore' # Let's enable Hstore
create_table :members do |t|
t.string :username
t.string :type
t.hstore :data # We'll store all our attributes in a column
# called data
t.timestamps
end
end
end
Extending activeRecord
module HstoreAccessor
def hstore_accessor(hstore_attribute, *keys)
Array(keys).flatten.each do |key|
define_method("#{key}=") do |value|
send("#{hstore_attribute}=", (send(hstore_attribute) || {}).merge(key.to_s => value))
send("#{hstore_attribute}_will_change!")
end
define_method(key) do
send(hstore_attribute) && send(hstore_attribute)[key.to_s]
end
end
end
end
end
ActiveRecord::Base.send(:extend, HstoreAccessor)
- We define hstore_accessor, which will define a getter and setter against our hstore column
Creating User class
- Users have an email and age
class User < Member
hstore_accessor :data, :email, :age
end
- Now we can access User#email, and User#age
- We can also validate these attributes
class User < Member
hstore_accessor :data, :email, :age
validates :email, :age, presence: true
end
User.new.email #=> nil
User.new.valid? #=> false
Creating another class
class TwitterUser < Member
hstore_accessor :data, :twitter_handle
end
- Since we only set up twitter_handle, we don't have access to email, which we set up on our User model
twitter_user = TwitterUser.new twitter_handle: '_danmcclain', username: 'dan'
twitter_user.twitter_handle #=> "_danmcclain"
twitter_user.username #=> "dan"
twitter_user.email #=> NoMethodError
Limitations
- HStore stores values as strings, so you have to convert values yourself
- We could extend hstore_attribute so that it performs type casting
Leaderboards
- Need a leaderboard for each game
- Need the ability to return a user's best score for a given game
- Want to page through our scores, but maintain ranking
Rank()
- PostgreSQL provides the Rank windowing function
- Takes ties into account when ranking
SELECT rank() OVER(ORDER BY points DESC), username, points
FROM scores
WHERE game = 'Barking Bostons'
- Rank will be scoped to the current query
┌──────┬──────────┬────────┐ │ rank │ username │ points │ ├──────┼──────────┼────────┤ │ 1 │ Brian │ 300 │ │ 2 │ Dan │ 200 │ │ 3 │ Dan │ 100 │ │ 4 │ Doug │ 30 │ └──────┴──────────┴────────┘
First attempt at getting a user's rank
SELECT rank() OVER (ORDER BY scores.points DESC), username, points
FROM scores
WHERE game = 'Barking Bostons' AND username = 'Dan'
- We get back the user's scores for the game
- BUT the ranks are only for the user's scores, it does not take into account other users
┌──────┬──────────┬────────┐ │ rank │ username │ points │ ├──────┼──────────┼────────┤ │ 1 │ Dan │ 200 │ │ 2 │ Dan │ 100 │ └──────┴──────────┴────────┘
Using Common Table Expressions to get the ranks of a User's Scores
- Common Table Expressions (CTEs) allow you to alias a subquery
WITH top_scores_for_game AS (
SELECT rank() OVER (ORDER BY points DESC), username, points
FROM scores
WHERE scores.game = 'Barking Bostons'
)
SELECT top_scores_for_game.*
FROM top_scores_for_game
- The above query will return all the results from the
WITH
statement
Attempt #2
- We'll use a CTE to get all the scores for a game, then find our user within those results
WITH top_scores_for_game AS (
SELECT rank() OVER (ORDER BY points DESC), username, points
FROM scores
WHERE scores.game = 'Barking Bostons'
)
SELECT top_scores_for_game.*
FROM top_scores_for_game
WHERE top_scores_for_game.username = 'Dan'
┌──────┬──────────┬────────┐ │ rank │ username │ points │ ├──────┼──────────┼────────┤ │ 2 │ Dan │ 200 │ │ 3 │ Dan │ 100 │ └──────┴──────────┴────────┘
Getting ranked results in ActiveRecord
- We can drop this whole block of SQL into a find_by_sql call to get models back
query = <<-SQL
WITH top_scores_for_game AS (
SELECT rank() OVER (ORDER BY points DESC), username, points
FROM scores
WHERE scores.game = 'Barking Bostons'
)
SELECT top_scores_for_game.*
FROM top_scores_for_game
WHERE top_scores_for_game.username = 'Dan'
SQL
Score.find_by_sql query
=> [#<Score ... >, #<Score ... >]
Using postgres_ext for rank and CTE
- postgres_ext has support for ranking results and querying against CTEs
- Return the scores ranked over their points for the game named 'Barking Bostons'
Score.ranked(points: :desc).where(game: 'Barking Bostons')
- Now lets get the user's ranked scores for the game using a CTE
Score.from_cte('top_scores_for_game',
Score.ranked(points: :desc).where(game: 'Barking Bostons')).where(username: 'Dan')
Wrapping Up
- PostgreSQL has some really powerful features baked in
- Your situation ultimately dictates the best approach
- Play with different solutions to see what is the best performance/maintainability trade off for your specific use case
- Don't just use it because "It's new/cool"
Other PostgreSQL features to explore
- JSON
- Views
- Schemas
- Network Types (MAC address, IP Address)
- Ranges
- User defined types
- PostGIS
Questions?
Thank You
- Twitter: @_danmcclain
- GitHub: github.com/danmcclain
- Postgres_ext: github.com/dockyard/postgres_ext
Supercharging ActiveRecord with PostgreSQL
By Dan McClain
Supercharging ActiveRecord with PostgreSQL
We'll go over some common patterns in Rails and how we can rebuild them by off loading the work onto PostgreSQL. This talk was given at Wicked Good Ruby on October 12, 2013
- 14,132