Trending algorithms

by: Zlatko Đurić (@zladuric)

The Problem

What's the most popular stuff in the app?


  db.posts.find().sort({ views: -1 });

"No, no, we want, like, stuff that's recent!"


  db.posts.find({ ts: { $gt: YESTERDAY } }).sort({ views: -1 });

"Hey, my super-funny joke from yesterday isn't showing?!"


  (Timestamp * TS_VALUE) + VIEWS / 3 + VIEWS_SINCE_YESTERDAY * 5
    + VIEWS_IN_LAST_HOUR * 100 + ....

How do we define what's "hot"?

  • popular == trending?
  • decay/gravity
  • weight, affinity

Agenda

  • Hacker News ranking
  • Reddit ranking
  • Some other mentions
  • How it works with MongoDB
  • "Simple popularity algorithm"

Hacker News Ranking Algorithm


  popularity = (p-1) / (t+2)^g

p - points (minus 1 to discount the submitter's own vote)

g - gravity (1.8 in the code below)

t - age in hours

https://medium.com/hacking-and-gonzo/how-hacker-news-ranking-algorithm-works-1d9b0cf2c08d
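The simplified formula above can be tried out directly. This is a minimal Python sketch of that formula only, not of the full Arc code; the function name hn_score is made up for illustration, and the default gravity of 1.8 is taken from the gravity* setting in the Arc source.

```python
def hn_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Simplified Hacker News score: (p - 1) / (t + 2)^g."""
    return (points - 1) / (age_hours + 2) ** gravity


# Same number of points, different ages: the older story ranks lower.
fresh = hn_score(points=101, age_hours=1)
stale = hn_score(points=101, age_hours=24)
print(fresh > stale)
```

A higher gravity makes scores fall off faster with age, so the front page turns over more quickly.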

(= gravity* 1.8 timebase* 120 front-threshold* 1
       nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                    gag-factor*
                                                   (lightweight s)
                                                    lightweight-factor*
                                                   1)))))

Reddit ranking algorithm

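from datetime import datetime
from math import log

# Unix epoch, used to turn a datetime into seconds
epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    # Seconds (with fractional part) between the Unix epoch and `date`
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def score(ups, downs):
    # Net votes
    return ups - downs

def hot(ups, downs, date):
    s = score(ups, downs)
    # Log scale: the first 10 net votes count as much as the next 90
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    # Age relative to a fixed reference timestamp, so newer posts score higher
    seconds = epoch_seconds(date) - 1134028003
    # 45000 seconds (12.5 hours) of recency is worth one order of magnitude of votes
    return round(sign * order + seconds / 45000, 7)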

https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
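To get a feel for the trade-off between votes and time in the hot function, here is a small self-contained sketch. It restates the function in compact form (using total_seconds() and log10, which should behave the same as the code above) and compares two hypothetical posts; the dates and vote counts are invented for illustration.

```python
from datetime import datetime, timedelta
from math import log10

EPOCH = datetime(1970, 1, 1)


def hot(ups, downs, date):
    # Compact restatement of the Reddit hot score shown above.
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = (s > 0) - (s < 0)
    seconds = (date - EPOCH).total_seconds() - 1134028003
    return round(sign * order + seconds / 45000, 7)


# Hypothetical posts: the newer one is 45000 seconds (12.5 hours) younger
# but has ten times fewer net votes.
older = datetime(2015, 1, 1)
newer = older + timedelta(seconds=45000)

print(hot(100, 0, older), hot(10, 0, newer))
```

The two scores should match up to rounding: a post needs ten times the net votes to keep pace with one submitted 12.5 hours later.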

Other notable mentions

  • PageRank, EdgeRank
  • Bayesian average
  • Wilson score
  • Britney Spears problem
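Two of the scores mentioned above are short enough to sketch. The following Python is an illustration under assumptions, not code from any of the sites discussed: the function and parameter names are made up, z = 1.96 assumes a 95% confidence level, and prior_mean / prior_weight are tuning values you would have to pick for your own data.

```python
from math import sqrt


def wilson_lower_bound(ups: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    if total == 0:
        return 0.0
    phat = ups / total
    centre = phat + z * z / (2 * total)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)


def bayesian_average(rating_sum: float, rating_count: int,
                     prior_mean: float, prior_weight: float) -> float:
    """Pull an item's mean rating towards prior_mean when it has few ratings."""
    return (prior_weight * prior_mean + rating_sum) / (prior_weight + rating_count)


# One upvote out of one is less convincing than 100 out of 100.
print(wilson_lower_bound(1, 1) < wilson_lower_bound(100, 100))
```

Both approaches address the same issue: an item with very few votes should not outrank one with many votes just because its raw average is higher.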

Britney Spears

how many items: x = x + 1
sum of items: x = x + n
average: 3 counters + 3 ops
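The counters on this slide can be maintained in a single pass over a stream, without storing the items. A minimal Python sketch, with a hypothetical class name, using three fields and three operations per item:

```python
class RunningStats:
    """Count, sum and average of a stream, updated one item at a time."""

    def __init__(self) -> None:
        self.count = 0      # how many items
        self.total = 0      # sum of items
        self.average = 0.0  # running average

    def add(self, n: float) -> None:
        self.count += 1                         # x = x + 1
        self.total += n                         # x = x + n
        self.average = self.total / self.count  # one division per item


stats = RunningStats()
for value in [4, 8, 6]:
    stats.add(value)
print(stats.count, stats.total, stats.average)
```

Memory use stays constant no matter how long the stream is, which is the point of the exercise.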

Most frequent item?

Majority rules

 http://www.americanscientist.org/issues/pub/the-britney-spears-problem
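A common single-pass way to find a majority item is the Boyer-Moore majority vote, sketched below in Python. It is assumed here to be the kind of algorithm this slide refers to; the function name is made up. It keeps one candidate and one counter, and its answer is only guaranteed to be correct if some item really does make up more than half of the stream, so a second pass is needed to verify the candidate.

```python
from typing import Hashable, Iterable, Optional


def majority_candidate(stream: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the only possible majority item of the stream (or None if empty)."""
    candidate = None
    count = 0
    for item in stream:
        if count == 0:
            # No current candidate: adopt this item.
            candidate = item
            count = 1
        elif item == candidate:
            count += 1
        else:
            # A different item cancels out one vote for the candidate.
            count -= 1
    return candidate


searches = ["britney", "cats", "britney", "news", "britney"]
print(majority_candidate(searches))
```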

Ant algorithm 

 http://www.americanscientist.org/issues/pub/the-britney-spears-problem

MongoDB

mapReduce - too slow for real-time querying

aggregation - has other limitations

Redis - helps with caching

Simple popularity algorithm


  p = (p + t) / 2

p - popularity

t - timestamp of the current action

http://stackoverflow.com/questions/11128086/simple-popularity-algorithm

Time for a beer

How to show "What's hot" in the news feed? What makes an item popular? How to customize a news feed per user?