Trending algorithms

by: Zlatko Đurić (@zladuric)

The Problem

The problem: what's the most popular stuff in the app?

  db.posts.find().sort({ views: -1 });

"No, no, we want, like stuff that's recent!"

  db.posts.find({ ts: { $gt: YESTERDAY } }).sort({ views: -1 });

"Hey, my super-funny joke from yesterday isn't showing?!"

  (Timestamp * TS_VALUE) + VIEWS / 3 + VIEWS_SINCE_YESTERDAY * 5
      + VIEWS_IN_LAST_HOUR * 100 + ....

Define what's "hot"?

  • popular == trending?
  • decay/gravity
  • weight, affinity


  • Hacker News ranking
  • Reddit ranking
  • Some other mentions
  • How it works with MongoDB
  • "Simple popularity algorithm"

Hacker News Ranking Algorithm

  popularity = (p-1) / (t+2)^g

p - points

g - gravity

t - age in hours
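
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `hn_popularity` is hypothetical, and the default gravity of 1.8 is borrowed from the `gravity*` constant in the Arc source quoted in this deck. It ignores the extra penalty factors that the real code applies.

```python
def hn_popularity(points, age_hours, gravity=1.8):
    """Simplified score: (p - 1) / (t + 2) ** g."""
    return (points - 1) / (age_hours + 2) ** gravity

# A fresh story with few points can outrank an older story with many points,
# because the denominator grows quickly with age.
fresh = hn_popularity(points=20, age_hours=1)
old = hn_popularity(points=200, age_hours=24)
print(fresh > old)
```

Raising `gravity` makes stories fall off the front page faster; lowering it lets high-scoring stories linger.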

(= gravity* 1.8 timebase* 120 front-threshold* 1
       nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                   (lightweight s)

Reddit ranking algorithm

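from datetime import datetime
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    # Seconds elapsed between the Unix epoch and `date`.
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def score(ups, downs):
    # Net votes.
    return ups - downs

def hot(ups, downs, date):
    s = score(ups, downs)
    # Votes count logarithmically: each tenfold increase in net votes adds 1.
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    # Age is measured from a fixed reference timestamp.
    seconds = epoch_seconds(date) - 1134028003
    # 45000 seconds (12.5 hours) of recency is worth one factor of ten in votes.
    return round(sign * order + seconds / 45000, 7)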
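
To make the trade-off in `hot` concrete, the sketch below restates the function in a self-contained form and compares two posts. The example date and vote counts are made up for illustration. The point is that a post with ten times the net votes scores about the same as a post that is 45000 seconds (12.5 hours) newer.

```python
from datetime import datetime, timedelta
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    return (date - epoch).total_seconds()

def hot(ups, downs, date):
    s = ups - downs
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    return round(sign * order + seconds / 45000, 7)

posted = datetime(2014, 1, 1, 12, 0, 0)     # arbitrary example date
later = posted + timedelta(seconds=45000)   # 12.5 hours newer

older_popular = hot(100, 0, posted)  # net score 100
newer_modest = hot(10, 0, later)     # net score 10, but posted later

# The two scores should be (almost exactly) equal.
print(older_popular, newer_modest)
```

Because the time term is never decayed, old posts are not pushed down; newer posts simply start with a higher baseline.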

Other notable mentions

  • PageRank, EdgeRank
  • Bayesian average
  • Wilson score
  • Britney Spears problem
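
Two of the items above have compact closed forms. The following is a minimal sketch of a Bayesian average and of the lower bound of the Wilson score interval. The function names, the `prior_mean`/`prior_weight` parameters, and the default `z=1.96` (roughly 95% confidence) are assumptions chosen for illustration.

```python
from math import sqrt

def bayesian_average(ratings, prior_mean, prior_weight):
    """Pull the mean of `ratings` toward `prior_mean`.

    `prior_weight` acts like a number of imaginary votes cast at the prior mean.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))

def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of upvotes."""
    n = ups + downs
    if n == 0:
        return 0.0
    phat = ups / n
    centre = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)

# One perfect vote should not beat a long, mostly positive track record.
print(wilson_lower_bound(1, 0) < wilson_lower_bound(90, 10))
print(bayesian_average([5], 3, 10) < bayesian_average([4] * 100, 3, 10))
```

Both techniques address the same issue: an item with very few votes should not rank as if its score were certain.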

Britney Spears

how many items - x = x + 1
sum of items - x = x + n
average - 3 counters + 3 ops
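
The three statistics above can be kept up to date over a stream without storing the items. A minimal sketch, with a hypothetical `RunningStats` class holding the three counters:

```python
class RunningStats:
    """Constant-memory count, sum and average over a stream of numbers."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.average = 0.0

    def add(self, n):
        self.count += 1                         # how many items
        self.total += n                         # sum of items
        self.average = self.total / self.count  # average

stats = RunningStats()
for n in [3, 5, 10]:
    stats.add(n)
print(stats.count, stats.total, stats.average)
```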

Most frequent item?

Majority rules
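
Assuming "Majority rules" refers to the Boyer-Moore majority vote algorithm, here is a minimal sketch. It finds a candidate in one pass with a single counter. The candidate is only guaranteed to be correct if some item really occupies more than half of the stream, so a second pass (here the hypothetical helper `is_majority`) is needed to confirm it.

```python
def majority_candidate(stream):
    """One pass, one candidate, one counter."""
    candidate, counter = None, 0
    for item in stream:
        if counter == 0:
            candidate, counter = item, 1
        elif item == candidate:
            counter += 1
        else:
            counter -= 1
    return candidate

def is_majority(items, candidate):
    """Second pass: check that the candidate occupies more than half the items."""
    return items.count(candidate) > len(items) / 2

votes = ["a", "b", "a", "c", "a", "a", "b"]
winner = majority_candidate(votes)
print(winner, is_majority(votes, winner))
```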

Ant algorithm

How it works with MongoDB


mapReduce - slow for real-time querying

aggregation - has other limitations

Redis - helps caching

Simple popularity algorithm

  p = (p + t) / 2

p - popularity

t - timestamp of the current action
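
A minimal sketch of how this update behaves, under two assumptions that the slide does not state: popularity starts at the item's creation timestamp, and items are ranked by `p` in descending order. The function name `bump` and all timestamps are hypothetical.

```python
def bump(popularity, action_ts):
    """Move popularity halfway toward the timestamp of the current action."""
    return (popularity + action_ts) / 2

now = 1_400_000_000  # hypothetical "current" Unix time
day = 86400

# Item A: created a week ago, three actions in the last hour.
a = now - 7 * day
for ts in (now - 3600, now - 1800, now - 60):
    a = bump(a, ts)

# Item B: created yesterday, one action a minute after creation.
b = now - day
b = bump(b, now - day + 60)

# Recent, repeated activity pulls A ahead of the newer but idle B.
print(a > b)
```

Each action halves the distance between `p` and the present, so only one number per item has to be stored and indexed, and a plain sort on `p` gives the ranking.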

Time for a beer
