Sql Performance for Rails

@h6165

Abhishek Yadav

ரூபீ ப்ரோக்ராமர்

Co-organizer: Chennai.rb

Topics covered

Indexing
N+1s
Joins
Pagination and Reports
Geoquerying
Serialization

Indexing

Indexing - the goal

The goal is the avoid full table scans

Indexing - the basics

An index is a on disk data structure
Like the index in a book, it can point to the row where our data sits
We have to create indexes manually, just like we create tables
An index is created for a table and a column or columns

Indexing - the basics

Example (Rails and mysql) -


-- Create an index

CREATE INDEX users_on_email ON users (email);


-- It is used for this query
select * from users where email="abhishek@example.com";


ActiveRecord::Migration.add_index :users, :email

User.where(email: "abhishek@example.com")

Indexing - the essentials

We need indexes for WHERE clause
We need indexes for ORDER BY clause
We need indexes for reference columns
We need compound indexes sometimes
We don't need low selectivity indexes
We need function indexes sometimes
We need partial indexes sometimes

Indexing - ORDER BY clause

A query requesting less data will also do a full table scan if the ORDER BY clause has no index


-- Full table scan if there is no index on created_at

EXPLAIN SELECT * from users ORDER BY created_at DESC LIMIT 10;

+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | users | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 50019 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+

Indexing - ORDER BY clause - composite index

A query with multiple columns in ORDER BY needs a composite index


-- Full table scan if there is no composite index

explain SELECT * from users ORDER BY created_at, first_name LIMIT 10;

+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | users | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 50019 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+

Indexing - ORDER BY clause - composite index

For a composite index sequence is also important
Best to keep high-cardinalty column first


-- create the compound index

CREATE INDEX users_on_created_at_first_name ON users (created_at, first_name);


-- Check explain

explain SELECT * from users ORDER BY created_at, first_name LIMIT 10;
+----+-------------+-------+------------+-------+---------------+--------------------------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key                            | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------------------------+---------+------+------+----------+-------+
|  1 | SIMPLE      | users | NULL       | index | NULL          | users_on_created_at_first_name | 773     | NULL |   10 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+--------------------------------+---------+------+------+----------+-------+

Indexing - Pattern match

A %abhishek% will not be able to use any index
A abhishek% will be able to use an index

Indexing - Polymorphic associations - composite index

For a polymorphic association, we need a composite index on the (id,type) combination

Indexing - Reference IDs (foreign-keys)

The foreign-key ids can use indexes (belongs_to, habtm)
Need not add index if a foreign key constraint is present

Indexing - Function indexes

SQL functions applied on a column will not be able to use indexes
Mysql (pre 5.7) doesn't have function indexes
Postgresql has them


-- Does not use the index on email (lower is not needed in mysql)
select * from users where lower(email)="abhishek@example.com";

Indexing - Partial indexes

Partial indexes allow us to index part a subset of a table
Useful when we have a pseudo-delete implementation
Available in Postgres

Indexing - Low selectivity indexes

Indexes on columns that have few distinct values are useless
They also add to write cost
Example - gender

N+1s

Handling N+1s

Are the biggest cause of performance problems in Rails apps
Problem much bigger with APIs and reports
Easy to introduce, not as easy to fix

Handling N+1s - the basics

Happen when we run a query inside a loop
Can be fixed by eager-loading


# Leads to N+1
@users = User.featured.order(:rating).limit(20)

<% @users.each do |user| %>
  <%= user.name %> has <%= user.posts.size %> posts
<% end %>


# Eager loading
@users = User.featured.order(:rating).limit(20).includes(:posts)
@users = User.featured.order(:rating).limit(20).eager_load(:posts)

Handling N+1s - eager loading

includes fires another query, eager_load does an OUTER JOIN
includes generally sufficient 80% of times
eager_load can lead to a slow query


# Leads to N+1
@users = User.featured.order(:rating).limit(20)

<% @users.each do |user| %>
  <%= user.name %> has <%= user.posts.size %> posts
<% end %>


# Eager loading
@users = User.featured.order(:rating).limit(20).includes(:posts)
@users = User.featured.order(:rating).limit(20).eager_load(:posts)

Handling N+1s - unintended

Api building library generally require us to specify methods that get the needed data. And then the methods are run over a collection or records
If any of the methods contain a query, N+1s happen
Eager loading doesn't help

# This method will always fire a query. Eager loading or not
def ratings_count
  self.ratings.count
end

Handling N+1s - unintended

The only solution is to not use any query methods in such scenario (API building, reports etc)
Better to use Ruby code based alternatives in some places
Eg1 - count

# counting in Ruby is better than using SQL count

def ratings_count
  self.ratings.to_a.count
end

Handling N+1s - unintended

Eg2 - overuse of scopes


class User

  # Leads N+1s, despite the eager-loading,
  # because Rating.active is a scope that fires another query
  def active_ratings
    self.ratings.active
  end

  # A simple alternative, doesn't cause N+1
  def active_ratings2
    self.ratings{ |r| r.active? }
  end

end

Handling N+1s - unintended

Eg2 - avoid the pluck



# Pluck will always fire a query
def ratings_sum
  ratings.pluck(:stars).sum
end

# Here if ratings is eager-loaded, we avoid the extra query
def ratings_sum2
  ratings.select(&:stars).sum
end

Handling N+1s - eager-load overdone

Eager load calls can end-up loading too much data



# Restaurants in the city may be few
# but some of them may have too many comments or check-ins

Restaurant.in_city(city)
  .includes(:ratings => [:rater])
  .includes(:comments => [:commenter, :comment_ratings])
  .includes(:check_ins => [:customer])

Joins

Joins can easily become bottlenecks if not done right
Outer joins and inner joins work differently
Use of indexes is confusing
De-normalizing can help

Joins - the risk

The biggest risk is when we request a join of two large tables
Database may have to do MxN amount of work
Indexes can help somewhat, but at least one table will undergo full scan
With habtms, we might be joining three tables


# Get top twenty restaurants by ratings
# Doest show those without ratings

Restaurant.joins(:ratings)
    .select("restaurants.id, count(ratings.restaurant_id) as ratings_count")
    .group("ratings.restaurant_id")
    .order("ratings_count desc")
    .limit(20)

Joins - recommendations

Inner Joins work best when one set is small
We can unsure that by introducing some kind of scope, on at least one side


# Get top twenty Restaurants by recent ratings
# Doest show restaurants without ratings

Restaurant.joins(:ratings)
    .where("ratings.created_at > ?", 3.month_ago)
    .select("restaurants.id, count(ratings.restaurant_id) as ratings_count")
    .group("ratings.restaurant_id")
    .order("ratings_count desc")
    .limit(20)

Joins - sub-queries

Inner Joins can sometimes be avoided by sub-queries


# Get restaurants that were rated by the user'r friends

rating_ids = Rating.where(user_id: users.friend_ids)
                   .select(:restaurant_id)
Restaurant.where(id: rating_ids)

# The above goes as one query

Joins - denormalize

Add columns that keep the data we are getting from the joins


# Add a column ratings_count that keeps the count
# Can use Rails's counter-cache

User.order("ratings_count desc")
    .limit(20)

Pagination and reports

Pagination - exports

In some scenarios, like csv exports, pagination is not naturally required. This can lead to query performance degradation
Sometimes the degradation is noticed after a duration of use.
Also happened to me for non csv exports exports (API, we-report)

Pagination - exports

Solution-1: always have some kind of limits (like data from last one year). Product owners don't always agree to this, so having something arbitrary and sizeable is also ok.
Solution-2: do reports in background, send them via email

Pagination - pages

Page based pagination also suffers from performance problem on larger offsets (when page nos are high)
This is because the sql offset can't work without full table scan
When data-set grows in size, use alternatives like window-functions
Infinite scroll type pagination always performs fine, should be chosen when possible.

Geo-quering

Geo-querying - closest to given location

'Show me ten restaurants closest to my location' is a common query, specially in mobile applications
Simplest way of doing this is to calculate distance in a sql query

SELECT
restaurants.*,
  ROUND(
    3956 * 2 * ASIN ( SQRT (
      POWER( SIN((13.14 - restaurants.latitude)*pi()/180 / 2),2)
      + COS(orig_lat * pi()/180) 
      * COS(restaurants.latitude *pi()/180) 
      * POWER(SIN((80.12 - restaurants.longitude) *pi()/180 / 2), 2) 
    ) ),2) as distance
FROM
 restaurants;

Geo-querying - closest to given location

The calculation is slow on large datasets as it performs full table scan
Indexing is difficult, and database dependent

SELECT
restaurants.*,
  ROUND(
    3956 * 2 * ASIN ( SQRT (
      POWER( SIN((13.14 - restaurants.latitude)*pi()/180 / 2),2)
      + COS(orig_lat * pi()/180) 
      * COS(restaurants.latitude *pi()/180) 
      * POWER(SIN((80.12 - restaurants.longitude) *pi()/180 / 2), 2) 
    ) ),2) as distance
FROM
 restaurants;

Geo-querying - possible alternatives - 1

Add a max distance clause

SELECT
restaurants.*,
  ROUND(
    3956 * 2 * ASIN ( SQRT (
      POWER( SIN((13.14 - restaurants.latitude)*pi()/180 / 2),2)
      + COS(orig_lat * pi()/180) 
      * COS(restaurants.latitude *pi()/180) 
      * POWER(SIN((80.12 - restaurants.longitude) *pi()/180 / 2), 2) 
    ) ),2) as distance
FROM
 restaurants;
WHERE 
 distance < 100

Geo-querying - possible alternatives - 2

Use geo-spacial indexes
Postgres has them for a while
Mysql-5.7 has them

Geo-querying - possible alternatives - 3

Use Elasticsearch or similar

rails-sql-perf

By Abhishek Yadav

rails-sql-perf

1,126

Abhishek Yadav

h6165

Sql Performance for Rails

@h6165

Topics covered

Indexing

Indexing - the goal

Indexing - the basics

Indexing - the basics

Indexing - the essentials

Indexing - ORDER BY clause

Indexing - ORDER BY clause - composite index

Indexing - ORDER BY clause - composite index

Indexing - Pattern match

Indexing - Polymorphic associations - composite index

Indexing - Reference IDs (foreign-keys)

Indexing - Function indexes

Indexing - Partial indexes

Indexing - Low selectivity indexes

N+1s

Handling N+1s

Handling N+1s - the basics

Handling N+1s - eager loading

Handling N+1s - unintended

Handling N+1s - unintended

Handling N+1s - unintended

Handling N+1s - unintended

Handling N+1s - eager-load overdone

Joins

Joins

Joins - the risk

Joins - recommendations

Joins - sub-queries

Joins - denormalize

Pagination and reports

Pagination - exports

Pagination - exports

Pagination - pages

Geo-quering

Geo-querying - closest to given location

Geo-querying - closest to given location

Geo-querying - possible alternatives - 1

Geo-querying - possible alternatives - 2

Geo-querying - possible alternatives - 3

rails-sql-perf

More from Abhishek Yadav