Supercharging ActiveRecord with PostgreSQL

Who am I?


Partner at DockYard
GitHub: danmcclain
Twitter: @_danmcclain

PostgreSQL

  • Advanced Data Types
    • Arrays
    • Hstore (Hash Store, aka Key-Value store)
    • JSON
    • Ranges
  • Powerful Features
    • windowing functions
    • table inheritance
    • Schemas - Separate silos within a single database

What We'll Explore:
Arrays

  • PostgreSQL has built in support for arrays of any datatype
    • That includes user-defined types as well
  • Can be multi-dimensional
    • Child arrays must have the same number of elements as their siblings

What We'll Expore:

HStore

  • Hash Store
  • Key-Value store
  • Values are stored as strings, so plan on converting values if you want to store something else

What We'll Explore 

Rank and CTEs

  • PostgreSQL has window functions
    • perform a calculation over a set of records
  • rank - determines a record's rank over a specific order
    • takes ties into account
  • CTEs or Common Table Expressions allow you to declare a specific query as a temporary table to be used in a complex query

Patterns we'll examine: Tagging

Patterns we'll examine: Similar Records 

Patterns we'll examine: Single Table Inheritance


Patterns we'll examine: Leaderboards

Tagging


  • We want to apply tags to notes entered into our system
  • We don't care about generating a tag cloud
  • We want to be able to find all notes with a given tag

Many to Many relation

  • Usually, we'll generate a many-to-many relation between notes and tags
  • We'll have a tag record for each tag applied to a note
  • We'll have to generate a new tag when a user wants to use one we have not previously encountered

Using Arrays for tags

  • PostgreSQL has native support for array columns
  • We can store the tags as an array column on our notes
  • We no longer have to worry about creating tag records when we encounter a new one

Adding our tags array to the model

class AddTagsToNotes < ActiveRecord::Migration
  def change
    add_column :notes, :tags, :string, array: true
  end
end

Finding Tagged Notes

  • Now that we are tagging our notes, we need a way of finding them
  • PostgreSQL has multiple ways of checking if an array contains a value
    • ANY(...)
    • @> - contains operator

ANY

  • We can use the ANY() function to see if a value is stored in any element of the array

SELECT notes.*
FROM notes
WHERE 'useful' = ANY(notes.tags)
  • This won't use any indexes 

@> - Contains

  • We can use the @> operator to see if an array on the left contains the array on the right of the operator (order matters)

SELECT notes.*
FROM notes
WHERE notes.tags @> '{useful}'
  • Without an index, this has the same performance as ANY()
  • When we index the array with GIN (Generalized Inverted Index), we can see up to ~50 X speed improvement

GIN indexing our Array

class AddTagsToNotes < ActiveRecord::Migration
  def change
    add_index :notes, :tags, using: :gin
  end
end

Some Benchmarking

  • 100,000 notes, each with 100 tags
  • 10,000 possible tags
  • Table was cached by scanning it once
  • Run on a MacBook Pro Retina with 16GB of RAM and SSD
    • Noticeably slower reads on a Mac Mini with spinning metal

Finding Via Has_Many

First Run - 370 ms

Nested Loop
   ->  Bitmap Heap Scan on post_tags
         Recheck Cond: (tag_id = 1)
         ->  Bitmap Index Scan on index_post_tags_on_tag_id
               Index Cond: (tag_id = 1)
   ->  Index Scan using posts_pkey on posts
         Index Cond: (id = post_tags.post_id) 
 Total runtime: 370.139 ms

Subsequent calls - 13 ms

Nested Loop
  ->  Bitmap Heap Scan on post_tags
        Recheck Cond: (tag_id = 2)
        ->  Bitmap Index Scan on index_post_tags_on_tag_id
              Index Cond: (tag_id = 2)
  ->  Index Scan using posts_pkey on posts
        Index Cond: (id = post_tags.post_id)
Total runtime: 13.511 ms

Array: Finding via ANY

First run: 503 ms

Aggregate  
  ->  Seq Scan on posts
        Filter: ('Tag #1'::text = ANY ((array_tags)::text[]))
        Rows Removed by Filter: 98989
Total runtime: 503.517 ms

Subsequent runs: 500 ms

Aggregate
  ->  Seq Scan on posts
        Filter: ('Tag #1'::text = ANY ((array_tags)::text[]))
        Rows Removed by Filter: 98989
Total runtime: 499.618 ms

Array: Finding via @>

First run - Pre index: 536 ms

Aggregate
  ->  Seq Scan on posts
        Filter: (array_tags @> '{"Tag #1"}'::character varying[])
        Rows Removed by Filter: 98989
Total runtime: 536.183 ms
First run - Post index: 8ms
 Aggregate
  ->  Bitmap Heap Scan on posts
       Recheck Cond: (gin_tags @> '{"Tag #1"}'::character varying[])
       ->  Bitmap Index Scan on index_posts_on_gin_tags
             Index Cond: (gin_tags @> '{"Tag #1"}'::character varying[])
 Total runtime: 8.056 ms

Results

Method Run 1 Run 2
has_many 370 ms 13 ms
Array: ANY 503 ms 500 ms
Array: @< (no index) 499 ms 500 ms
Array: @< (indexed) 8 ms 5 ms

Drawbacks

  • You lose the ability to efficiently create a list of tags
    • Getting a distinct list of tags from a series of arrays is slow
    • It's also fairly complex
  • GIN can be slow to update
    • Multiple inserts into index (one for each key extracted)
    • Can be mitigate somewhat using FASTUPDATE

Notes

  • Database-side Cache matters
    • You will see a performance increase on subsequent runs, as the results will be in memory
    • Results across the board were quicker after the first table scan, as the table was in memory after that

Similar Records

  • We want to build out the ability to have similar posts listed for a given post
  • We don't care about mapping a similar post back to the source item (don't need belongs_to)

Old Way

  • Create a similar_posts model
    • post_id: id of the parent post (belongs_to)
    • similar_post_id: id of the child post (belongs_to)
  • Wire up a has_many though: relation to create a similar_posts attribute on the post model

Using Arrays

  • We can build has_many-type functionality using arrays
  • This allows us to retrieve all the similar item ids with a single call for the item record

Migration

  • We want an attribute on our model to store the ids

class CreateMembers < ActiveRecord::Migration
  def change
    add_column :posts, :similar_post_ids, :integer, array: true
  end
end

has_many like methods

  • We can add a method Post#similar_posts which will retrieve our item's by their ids stored in Post#similar_post_ids
  • We can also add a Post#similar_posts= method that will take the array of items passed to it and store their ids in Post#similar_post_ids

has_many like methods

class Post < ActiveRecord::Base
  def similar_posts
    self.class.where(id: similar_posts_ids)
  end
 
  def similar_posts=(children)
    self.similar_posts_ids = children.map(&:id)
  end
 
  def similar_posts_ids=(ids)
    write_attribute :similar_posts_ids, ids.reject{ |id| id.to_i <= 0 }
  end
end

Using our fake has_many

post = Post.find(1)
post.similar_posts # => [Post, Post, Post]
post.similar_posts = [Post, Post] # Updates the similar_post_ids attribute

# the following won't work
post.similar_posts << Post # We aren't observing this array, 
                           # so it won't update the ids attribute

# we can work around it with the following
post.similar_posts += [Post] # calls the setter on the array

Benefits

  • Alternate way of providing has_many
  • Avoids additional records to maintain relations

Next Steps

  • This example could be built out further using a proxy class in the has_many-like methods
    • This would allow us to perform << operations and have ActiveRecord track the changes

Single Table Inheritence

  • We are using single table inheritance to differentiate between multiple user types
  • We want to add a set of object specific attributes for each user type
  • Previously, we could move these attributes to a profile object and that profile would belong to a class

HStore

  • PostgreSQL has a Hstore or Hash Store column type
  • We can store a set of key-value properties on this column

Enabling Hstore

  • Hstore was added to PostgreSQL 9.2 as an extension
  • Rails 4.0 added enable_extension which will allow you to enable an extension via a migration
  • enable_extension 'hstore' will enable hstore in your database

An Example

  • Joe Him at DevMynd details using Hstore with Rails 3.2 via activerecord-postgres-hstore [1]
    • He extends ActiveRecord::Base to create an hstore_attribute for his models
  • We can apply the concepts to Rails 4




[1]: http://www.devmynd.com/blog/2013-3-single-table-inheritance-hstore-lovely-combination

    Our Member Class

    • The Member class will serve as the base class for our STI infrastructure
    
    

    class CreateMembers < ActiveRecord::Migration
      def change
        enable_extension 'hstore' # Let's enable Hstore
    
        create_table :members do |t|
          t.string :username
          t.string :type
          t.hstore :data # We'll store all our attributes in a column
                         # called data
          t.timestamps
        end
      end
    end

    Extending activeRecord

    module HstoreAccessor
        def hstore_accessor(hstore_attribute, *keys)
          Array(keys).flatten.each do |key|
            define_method("#{key}=") do |value|
              send("#{hstore_attribute}=", (send(hstore_attribute) || {}).merge(key.to_s => value))
              send("#{hstore_attribute}_will_change!")
            end
            define_method(key) do
              send(hstore_attribute) && send(hstore_attribute)[key.to_s]
            end
          end
        end
      end
    end
    ActiveRecord::Base.send(:extend, HstoreAccessor)

    • We define hstore_accessor, which will define a getter and setter against our hstore column

    Creating User class

    • Users have an email and age

    class User < Member
      hstore_accessor :data, :email, :age
    end
    • Now we can access User#email, and User#age
    • We can also validate these attributes
    class User < Member
      hstore_accessor :data, :email, :age
      validates :email, :age, presence: true
    end
    
    User.new.email #=> nil
    
    User.new.valid? #=> false
    

    Creating another class

    class TwitterUser < Member
      hstore_accessor :data, :twitter_handle
    end
    • Since we only set up twitter_handle, we don't have access to email, which we set up on our User model
    twitter_user = TwitterUser.new twitter_handle: '_danmcclain', username: 'dan'
    twitter_user.twitter_handle   #=> "_danmcclain"
    twitter_user.username         #=> "dan"
    twitter_user.email            #=> NoMethodError
    

    Limitations

    • HStore stores values as strings, so you have to convert values yourself
      • We could extend hstore_attribute so that it performs type casting

    Leaderboards


    • Need a leaderboard for each game
    • Need the ability to return a user's best score for a given game
    • Want to page through our scores, but maintain ranking

    Rank()

    • PostgreSQL provides the Rank windowing function
    • Takes ties into account when ranking
    SELECT rank() OVER(ORDER BY points DESC), username, points
    FROM scores
    WHERE game = 'Barking Bostons'
    • Rank will be scoped to the current query
    ┌──────┬──────────┬────────┐
    │ rank │ username │ points │
    ├──────┼──────────┼────────┤
    │    1 │ Brian    │    300 │
    │    2 │ Dan      │    200 │
    │    3 │ Dan      │    100 │
    │    4 │ Doug     │     30 │
    └──────┴──────────┴────────┘
    

    First attempt at getting a user's rank

    SELECT rank() OVER (ORDER BY scores.points DESC), username, points
    FROM scores
    WHERE game = 'Barking Bostons' AND username = 'Dan'
    • We get back the user's scores for the game
    • BUT the ranks are only for the user's scores, it does not take into account other users
    ┌──────┬──────────┬────────┐
    │ rank │ username │ points │
    ├──────┼──────────┼────────┤
    │    1 │ Dan      │    200 │
    │    2 │ Dan      │    100 │
    └──────┴──────────┴────────┘
    

    Using Common Table Expressions to get the ranks of a User's Scores

    • Common Table Expressions (CTEs) allow you to alias a subquery
    WITH top_scores_for_game AS (
      SELECT rank() OVER (ORDER BY points DESC), username, points
      FROM scores
      WHERE scores.game = 'Barking Bostons'
    )
    SELECT top_scores_for_game.*
    FROM top_scores_for_game
    • The above query will return all the results from the WITH statement

    Attempt #2

    • We'll use a CTE to get all the scores for a game, then find our user within those results

    WITH top_scores_for_game AS (
      SELECT rank() OVER (ORDER BY points DESC), username, points
      FROM scores
      WHERE scores.game = 'Barking Bostons'
    )
    SELECT top_scores_for_game.*
    FROM top_scores_for_game
    WHERE top_scores_for_game.username = 'Dan'
    ┌──────┬──────────┬────────┐
    │ rank │ username │ points │
    ├──────┼──────────┼────────┤
    │    2 │ Dan      │    200 │
    │    3 │ Dan      │    100 │
    └──────┴──────────┴────────┘
    

    Getting ranked results in ActiveRecord

    • We can drop this whole block of SQL into a find_by_sql call to get models back
    query = <<-SQL
    WITH top_scores_for_game AS (
      SELECT rank() OVER (ORDER BY points DESC), username, points
      FROM scores
      WHERE scores.game = 'Barking Bostons'
    )
    SELECT top_scores_for_game.*
    FROM top_scores_for_game
    WHERE top_scores_for_game.username = 'Dan'
    SQL
    
    Score.find_by_sql query
    => [#<Score ... >, #<Score ... >]
    

    Using postgres_ext for rank and CTE

    • postgres_ext has support for ranking results and querying against CTEs
    • Return the scores ranked over their points for the game named 'Barking Bostons'
    Score.ranked(points: :desc).where(game: 'Barking Bostons')

    • Now lets get the user's ranked scores for the game using a CTE

    Score.from_cte('top_scores_for_game',
      Score.ranked(points: :desc).where(game: 'Barking Bostons')).where(username: 'Dan')

    Wrapping Up

    • PostgreSQL has some really powerful features baked in
    • Your situation ultimately dictates the best approach
    • Play with different solutions to see what is the best performance/maintainability trade off for your specific use case
    • Don't just use it because "It's new/cool"

    Other PostgreSQL features to explore

    • JSON
    • Views
    • Schemas
    • Network Types (MAC address, IP Address)
    • Ranges
    • User defined types
    • PostGIS

    Questions?

    Thank You


    • Twitter: @_danmcclain
    • GitHub: github.com/danmcclain
    • Postgres_ext: github.com/dockyard/postgres_ext
    Made with Slides.com