Beyond Worst-case Analysis Reading Group Lecture 3
Victor Sanches Portella
Computer Science @ UBC
May, 2020
Main/Slow Memory with \(N\) pages
Fast Memory (Cache) with \(k\) pages
Page requests arrive online
0 cost if in cache
1 cost if not in cache
Move to cache, may evict another page
First In, First Out (FIFO)
Least Recently Used (LRU)
Furthest In the Future (FIF)
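The two online policies above are easy to simulate. A minimal Python sketch (the function names `misses_fifo` and `misses_lru` are mine, not from the lecture):

```python
from collections import OrderedDict, deque

def misses_fifo(requests, k):
    """FIFO eviction: evict the page that entered the cache first."""
    cache, order, misses = set(), deque(), 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                cache.remove(order.popleft())  # evict oldest arrival
            cache.add(p)
            order.append(p)
    return misses

def misses_lru(requests, k):
    """LRU eviction: evict the least recently used page."""
    cache, misses = OrderedDict(), 0  # keys ordered least -> most recent
    for p in requests:
        if p in cache:
            cache.move_to_end(p)           # hit: refresh recency
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[p] = None
    return misses
```

Note that the two policies already differ on small inputs: on \(1, 2, 1, 3, 1, 2\) with \(k = 2\), LRU keeps the hot page \(1\) while FIFO evicts it.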
FIFO and LRU perform equally badly in worst-case analysis
Idea: Compare with algorithms that know the entire sequence beforehand
Optimal Offline Algorithm (OPT)
Competitive Ratio of \(A\): the smallest \(\alpha\) such that, for every sequence of requests \(\sigma\), the # of cache misses of \(A\) on \(\sigma\) is at most \(\alpha\) times the # of cache misses of OPT on \(\sigma\)
If, for any sequence \(\sigma\) and any algorithm \(B\), the # of cache misses of \(A\) on \(\sigma\) is at most \(\alpha\) times that of \(B\), then \(A\) is \(\alpha\)-instance optimal
... this is really strong
We want to compare online algorithms with an optimal offline algorithm. Do we know any optimal offline algorithm?
Theorem 2.1 (Belady's Theorem) The Furthest In the Future (FIF) algorithm is offline optimal
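Belady's theorem can be sanity-checked on small instances by comparing FIF against a brute-force search over all possible eviction choices. A sketch, assuming the standard cost model above (the names `fif_misses` and `opt_misses` are mine):

```python
import random
from functools import lru_cache

def fif_misses(requests, k):
    """Furthest-In-the-Future: evict the cached page requested furthest ahead."""
    cache, misses = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        misses += 1
        if len(cache) == k:
            rest = requests[i + 1:]
            # pages never requested again count as "infinitely far"
            far = max(cache, key=lambda q: rest.index(q) if q in rest else len(rest))
            cache.remove(far)
        cache.add(p)
    return misses

def opt_misses(requests, k):
    """Brute-force optimal offline cost: try every possible eviction."""
    reqs = tuple(requests)

    @lru_cache(maxsize=None)
    def go(i, cache):
        if i == len(reqs):
            return 0
        p = reqs[i]
        if p in cache:
            return go(i + 1, cache)
        if len(cache) < k:
            return 1 + go(i + 1, tuple(sorted(cache + (p,))))
        return 1 + min(go(i + 1, tuple(sorted((set(cache) - {q}) | {p})))
                       for q in cache)
    return go(0, ())

# Belady's theorem, empirically: FIF matches the brute-force optimum
random.seed(1)
for _ in range(20):
    seq = [random.randint(1, 5) for _ in range(12)]
    assert fif_misses(seq, 3) == opt_misses(seq, 3)
```

The brute force is exponential in the number of misses, so it only works on tiny instances, but it is enough to catch an incorrect eviction rule.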
[Example: request sequence 1, 5, 3, 2, 4, 5, 3, 5, 1, 2, 4 over the pages \(1, 2, \dotsc, k, k+1\), illustrating FIF's evictions]
Proof (sketch): an exchange argument — any optimal schedule can be modified, one request at a time, to agree with FIF's evictions without increasing the number of cache misses.
Theorem 4.1 Every deterministic algorithm has competitive ratio at least the cache size \(k\).
Cache size \(k\). Main memory size \(N = k+1\).
Online algorithm \(A\).
Idea: Construct (inductively) a seq. where \(A\) misses all the time
\(A\) evicts some page. Since \(N = k+1\), exactly one page is outside \(A\)'s cache at any moment; the adversary always requests that page next, so \(A\) misses on every request.
Cost of FIF?
Warm-up: first request \(k\) distinct pages to fill the cache.
When FIF faults and evicts a page \(q\) (post warm-up), \(q\) is the only one of the \(N = k+1\) pages out of the cache of size \(k\): the \(N - 1 = k\) pages distinct from \(q\) all fit. Since \(q\)'s next request is furthest in the future, the next fault (necessarily on \(q\)) comes only after the other cached pages are requested, so each stretch between faults has length \(\geq k\). Hence FIF faults at most once every \(k\) requests, while \(A\) faults on every request.
Every deterministic algorithm performs at least \(k\) times worse than OPT: horrible performance in theory.
In practice, algorithms such as LRU do not suffer from this problem.
This lower bound holds even for algorithms that can look a finite amount into the future.
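The adversary from the proof can be generated explicitly: simulate LRU's cache over \(N = k+1\) pages and always request the unique page it does not hold. A sketch under that construction (the function name `adversarial_sequence` is mine):

```python
from collections import OrderedDict

def adversarial_sequence(k, T):
    """With N = k + 1 pages, always request the one page that LRU
    currently does not hold, so LRU misses on every request."""
    pages = set(range(1, k + 2))        # main memory: pages 1, ..., k+1
    cache = OrderedDict()                # simulate LRU's cache
    seq = []
    for p in range(1, k + 1):            # warm-up: fill the cache with 1..k
        cache[p] = None
        seq.append(p)
    for _ in range(T):
        p = (pages - set(cache)).pop()   # the unique page outside the cache
        seq.append(p)
        cache.popitem(last=False)        # LRU evicts its oldest page ...
        cache[p] = None                  # ... and loads p
    return seq
```

Every request in the output misses for LRU (no request appears among the \(k\) requests before it), while FIF on the same sequence would miss only about once every \(k\) requests.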
Maybe competitive ratio doesn't reflect real-world performance
OPT is incredibly powerful
Is competitive ratio useful to compare algorithms?
Theorem 4.3 LRU has competitive ratio \(k\)
Proof:
Let \(\sigma\) be a request sequence.
Divide \(\sigma\) into blocks \(\sigma_1, \dotsc, \sigma_b\), where each \(\sigma_i\) is a maximal interval with at most \(k\) distinct pages (\(\sigma_1\) is the maximal prefix with \(k\) distinct pages, and so on)
Claim 1:
LRU suffers at most \(k\) cache misses in each block.
Claim 2:
FIF suffers at least \(b\) faults.
Proof of Claim 1:
After a page \(p\) has been placed in (or hit in) the cache, requests to \(k\) distinct other pages must occur before LRU evicts \(p\).
Each block has \(\leq k\) distinct pages
\(\implies\) a page placed in cache during a block is not evicted within the same block, so LRU misses at most once per distinct page: at most \(k\) misses per block.
Proof of Claim 2:
By maximality of \(\sigma_i\), the first page of \(\sigma_{i+1}\) is distinct from the \(k\) pages of \(\sigma_i\). So from the second request of \(\sigma_i\) up to the first request of \(\sigma_{i+1}\), \(k\) distinct pages other than the first page of \(\sigma_i\) appear, and FIF can hold at most \(k - 1\) of them alongside that first page: at least one fault in each of these \(b - 1\) disjoint intervals. Adding the fault on the very first request
\(\implies\) FIF faults at least \(b\) times
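The block decomposition used in the proof is easy to compute. A minimal sketch (the helper name `blocks` is mine):

```python
def blocks(requests, k):
    """Split a request sequence into maximal blocks, each containing
    at most k distinct pages (the decomposition from the proof)."""
    out, cur, distinct = [], [], set()
    for p in requests:
        if p not in distinct and len(distinct) == k:
            out.append(cur)              # p would be the (k+1)-st distinct page
            cur, distinct = [], set()
        cur.append(p)
        distinct.add(p)
    if cur:
        out.append(cur)
    return out
```

By construction, each block has at most \(k\) distinct pages and is maximal: the first request after a block is always a new distinct page, which is exactly what Claim 2 exploits.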
Competitive analysis shows that the famous LRU has the optimal competitive ratio (among deterministic algorithms).
So the ratio may be good for comparing algorithms?
Not quite. Another "optimal" algorithm: Flush-When-Full (FWF)
On a page fault with a full cache, FWF evicts every page.
By the block argument in the previous proof, FWF still suffers at most \(k\) misses per block
Comp. ratio \(k\), but this algorithm is HORRIBLE
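The gap between "same competitive ratio" and "same real performance" shows up immediately in simulation. A sketch, assuming a workload with a small working set plus a rare cold page (the sequence and function names are mine):

```python
from collections import OrderedDict

def misses_fwf(requests, k):
    """Flush-When-Full: on a fault with a full cache, evict everything."""
    cache, misses = set(), 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                cache.clear()            # flush the entire cache
            cache.add(p)
    return misses

def misses_lru(requests, k):
    """Standard LRU simulator, for comparison."""
    cache, misses = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)
            cache[p] = None
    return misses

# Ten phases: a working set {1,2,3,4} that fits in cache, plus a rare page 5
seq = ([1, 2, 3, 4] * 25 + [5]) * 10
```

On this sequence with \(k = 4\), both are \(k\)-competitive, yet FWF throws away the whole working set every time page \(5\) appears and pays noticeably more misses than LRU.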
Explains performance of algorithms in practice? No. Being a factor of \(k\) away from OPT is not what happens in practice.
Useful to compare algorithms? Kind of. It shows that LRU is optimal, but stupid algorithms such as FWF are also "optimal".
Useful to design new algorithms? Not really. FWF shows that competitive ratio ignores some things that are really important.
Competitive analysis is an incredibly useful technique, but in this case it does not work well out-of-the-box.
OPT is too powerful; it is unfair to compare it with online algorithms.
Idea: Compare an algorithm with a handicapped OPT
# Faults of \(A\) with cache \(k\) on the seq. \(\sigma\)
Compare \(A\) with cache \(k\) against OPT with cache \(h \leq k\)
This is still a worst-case analysis
Theorem 4.3 LRU has competitive ratio \(k\)
Claim 1:
LRU suffers at most \(k\) cache misses in each block.
Claim 2:
FIF suffers at least \(b\) faults.
What if FIF has a cache of only \(h \leq k\) pages?
Let \(p\) be the first page of block \(\sigma_i\). From just after \(p\) up to (and including) the first request of \(\sigma_{i+1}\), \(k\) distinct pages \(\neq p\) are requested. Right after serving \(p\), FIF holds \(p\) plus only \(h - 1\) of those pages, so it faults at least \(k - (h - 1) = k - h + 1\) times per block:
\((k - h + 1)b\) faults in total
Combining the claims: LRU with cache \(k\) suffers at most \(kb\) misses, while FIF with cache \(h\) suffers at least \((k - h + 1)b\) faults, so LRU is \(\frac{k}{k - h + 1}\)-competitive against OPT with cache \(h\).
For \(h = k/2\), this ratio is \(\frac{k}{k/2 + 1} < 2\).
How to use this "in practice"?
Estimate how much cache you need based on the performance of OPT.
Double the amount of cache, and LRU performs almost as well as OPT.
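The resource-augmentation bound can be checked numerically: with a doubled cache \(k = 2h\), the block argument gives \(\mathrm{cost}_{\mathrm{LRU}} \leq kb\) and \(\mathrm{cost}_{\mathrm{FIF}} \geq (h+1)(b-1)\), hence LRU's misses are at most \(2 \cdot \mathrm{cost}_{\mathrm{FIF}} + 2h\). A sketch with minimal reimplementations of the two simulators (not from the lecture):

```python
import random
from collections import OrderedDict

def misses_lru(requests, k):
    """LRU with cache size k."""
    cache, misses = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)
            cache[p] = None
    return misses

def misses_fif(requests, h):
    """Furthest-In-the-Future with cache size h (offline optimal)."""
    cache, misses = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        misses += 1
        if len(cache) == h:
            rest = requests[i + 1:]
            far = max(cache, key=lambda q: rest.index(q) if q in rest else len(rest))
            cache.remove(far)
        cache.add(p)
    return misses

random.seed(0)
h = 4
seq = [random.randint(1, 10) for _ in range(500)]
# LRU with twice the cache vs. OPT, with the additive slack from the last block
assert misses_lru(seq, 2 * h) <= 2 * misses_fif(seq, h) + 2 * h
```

The assertion is guaranteed by the theorem, so it holds for any request sequence, not just this random one.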
Any theoretical use for resource augmentation?
Theorem 6.1 For all \(\varepsilon, \delta > 0\), all \(n\), and all request sequences \(\sigma\), there is \(S \subseteq \{1, \dotsc, n\}\) with \(|S| \geq (1 - \varepsilon)n\) such that, for all \(k \in S\), either
LRU with cache size \(k\) is within a factor (depending only on \(\varepsilon\) and \(\delta\)) of OPT on \(\sigma\) — good relative performance —
or
LRU with cache size \(k\) faults on at most a \(\delta\) fraction of the requests of \(\sigma\) — good absolute performance.
Caveat: Order of quantifiers
Fix \(\varepsilon, \delta, n, \sigma\). Let \(k, g \in \mathbb{N}\).
By the resource augmentation theorem (with \(h = k - g\)), LRU with cache \(k\) is \(\frac{k}{g+1}\)-competitive against OPT with cache \(k - g\).
When removing \(g\) pages from LRU's cache, one of two things happens for each cache size: \(k\) is good or \(k\) is bad.
If # bad sizes \(\leq \varepsilon n\), we are done.
Suppose \(> \varepsilon n\) bad k's. Idea:
[Diagram over the cache sizes \(1, \dotsc, n\): \(\sim \varepsilon n\) bad \(k\)'s; a region where all sizes are absolutely good; \(g\) bad sizes; \(\leq \varepsilon n\)]
I will use without proof:
Pick \(t \in \{1, \dotsc, n\}\) such that we have \(\sim \varepsilon n\) bad k's between \(1\) and \(t - g\)
This gives at most \(2 \varepsilon n\) bad \(k\)'s in total; run the argument with \(\varepsilon/2\) instead.
Resource augmentation is a way to make a more fine-grained analysis when the problem is parameterized by some kind of resource.
The technique can yield hard-to-interpret claims, but it was very useful for deriving good results (loose competitiveness).
This is still a worst-case analysis: we ignore the form of the input.
LRU works well in practice because of the structure (locality) of real-world data.