by: Zlatko Đurić (@zladuric)
The problem: what's the most popular stuff in the app?
db.posts.find().sort({ views: -1 });
"No, no, we want, like, stuff that's recent!"
db.posts.find({ ts: { $gt: YESTERDAY } }).sort({ views: -1 });
"Hey, my super-funny joke from yesterday isn't showing?!"
(Timestamp * TS_VALUE) + VIEWS / 3 + VIEWS_SINCE_YESTERDAY * 5
+ VIEWS_IN_LAST_HOUR * 100 + ....
popularity = (p-1) / (t+2)^g
p - points
g - gravity
t - age in hours
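A minimal Python sketch of the formula above, to make the effect of gravity concrete. The function name `popularity` and its parameter names are illustrative only; the default gravity of 1.8 is taken from the `gravity*` value in the Arc source quoted in this deck.

```python
def popularity(points, age_hours, gravity=1.8):
    """Simplified ranking: (p - 1) / (t + 2) ** g."""
    # Subtracting 1 discounts the submitter's own vote; adding 2 hours
    # keeps the denominator away from zero for brand-new items.
    return (points - 1) / (age_hours + 2) ** gravity


# A fresh post with few points can outrank an older post with many.
fresh = popularity(points=20, age_hours=1)
old = popularity(points=100, age_hours=24)
print(fresh > old)
```

Raising `gravity` makes items sink faster as they age; lowering it lets high-scoring items linger.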
https://medium.com/hacking-and-gonzo/how-hacker-news-ranking-algorithm-works-1d9b0cf2c08d
(= gravity* 1.8 timebase* 120 front-threshold* 1
   nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

(def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
  (* (/ (let base (- (scorefn s) 1)
          (if (> base 0) (expt base .8) base))
        (expt (/ (+ (item-age s) timebase*) 60) gravity))
     (if (no (in s!type 'story 'poll)) .8
         (blank s!url) nourl-factor*
         (mem 'bury s!keys) .001
         (* (contro-factor s)
            (if (mem 'gag s!keys)
                gag-factor*
                (lightweight s)
                lightweight-factor*
                1)))))
from datetime import datetime, timedelta
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    # Seconds (with fractional part) between the Unix epoch and `date`.
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def score(ups, downs):
    # Net votes.
    return ups - downs

def hot(ups, downs, date):
    s = score(ups, downs)
    # log10 of the vote magnitude: each 10x increase in votes adds 1.
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    # 45000 seconds = 12.5 hours, so 12.5 hours of age offsets a 10x vote lead.
    return round(sign * order + seconds / 45000, 7)
https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
how many items - x = x + 1
sum of items - x = x + n
average - 3 counters + 3 ops
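The counters above can be sketched as a small streaming accumulator. This is one reading of "3 counters + 3 ops" (a count, a sum and the current average, each updated once per item); the class name `RunningStats` and its fields are hypothetical, not something from the talk.

```python
class RunningStats:
    """Constant-memory statistics over a stream of numbers."""

    def __init__(self):
        self.count = 0      # how many items seen so far
        self.total = 0      # sum of the items seen so far
        self.average = 0.0  # current mean

    def add(self, n):
        self.count += 1                         # x = x + 1
        self.total += n                         # x = x + n
        self.average = self.total / self.count  # one division per item


stats = RunningStats()
for views in [3, 5, 10]:
    stats.add(views)
print(stats.count, stats.total, stats.average)  # 3 18 6.0
```

Nothing from the stream is stored, so memory use stays fixed no matter how many items arrive.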
Most frequent item?
Majority rules
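Assuming "majority rules" refers to the majority-vote algorithm (Boyer-Moore) discussed in the linked article, a minimal sketch could look like this. The function name `majority_candidate` is illustrative.

```python
def majority_candidate(stream):
    """Return the only possible majority item, using one slot and one counter."""
    candidate = None
    count = 0
    for item in stream:
        if count == 0:
            # No current candidate: adopt this item.
            candidate = item
            count = 1
        elif item == candidate:
            count += 1
        else:
            # A different item cancels out one vote for the candidate.
            count -= 1
    return candidate


print(majority_candidate(["a", "b", "a", "c", "a"]))  # a
```

The result is guaranteed correct only if some item makes up more than half of the stream. Otherwise the returned candidate is arbitrary, and a second pass is needed to confirm it.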
http://www.americanscientist.org/issues/pub/the-britney-spears-problem
mapReduce - slow for real-time querying
aggregation - has other limitations
Redis - helps caching
p = (p + t) / 2
p - popularity
t - timestamp of the current action
http://stackoverflow.com/questions/11128086/simple-popularity-algorithm
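A minimal sketch of the averaging formula above. It assumes a post's popularity starts at its creation timestamp and that posts are listed by `p` in descending order; both are assumptions for illustration, and the name `bump` is hypothetical.

```python
def bump(popularity, action_ts):
    """Move popularity halfway toward the timestamp of the latest action."""
    return (popularity + action_ts) / 2


# Post A: created at t=1000, then viewed at t=2000, 3000 and 4000.
a = 1000
for ts in (2000, 3000, 4000):
    a = bump(a, ts)

# Post B: created at t=2500, never viewed (assumed to start at its creation time).
b = 2500

print(a, b)  # 3125.0 2500
```

Each action costs one read and one write of a single number, recent actions weigh more than old ones, and a post nobody touches drifts down the list as newer activity arrives elsewhere.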