Beyond Worst-case Analysis Reading Group Lecture 3

Victor Sanches Portella

Computer Science @ UBC

May, 2020

Main/Slow Memory with **\(N\)** pages

Fast Memory (Cache) with **\(k\)** pages

Page requests arrive **online**

Cost 0 if the page is in cache (hit)

Cost 1 if it is not in cache (miss)

On a miss, the page is moved to the cache, possibly evicting another page

**F**irst **I**n, **F**irst **O**ut** (FIFO)**

**L**east **R**ecently **U**sed** (LRU)**

**F**urthest **I**n the **F**uture** (FIF)**
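The three eviction rules can be sketched as miss counters. This is a minimal Python illustration, assuming the cache starts empty and pages are hashable ids; the function names are hypothetical. FIF is offline, so it receives the whole request sequence up front and looks ahead in it.

```python
from collections import OrderedDict, deque


def fifo_misses(requests, k):
    """First In, First Out: evict the page that was loaded earliest."""
    cache, order, misses = set(), deque(), 0
    for page in requests:
        if page in cache:
            continue  # hit: FIFO ignores hits
        misses += 1
        if len(cache) == k:
            cache.remove(order.popleft())  # evict the oldest loaded page
        cache.add(page)
        order.append(page)
    return misses


def lru_misses(requests, k):
    """Least Recently Used: evict the page whose last request is oldest."""
    cache, misses = OrderedDict(), 0  # ordered from least to most recently used
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: page becomes most recently used
            continue
        misses += 1
        if len(cache) == k:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = True
    return misses


def fif_misses(requests, k):
    """Furthest In the Future: offline, so it may look at later requests."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) == k:
            future = list(requests[i + 1:])

            def next_use(q):
                # position of q's next request; pages never requested again come last
                return future.index(q) if q in future else len(future)

            cache.remove(max(cache, key=next_use))  # evict furthest next use
        cache.add(page)
    return misses


sigma = [1, 2, 1, 3, 1]
print(fifo_misses(sigma, 2), lru_misses(sigma, 2), fif_misses(sigma, 2))  # 4 3 3
```

FIFO and LRU differ only on hits: LRU refreshes the page's position, FIFO does not, which is why FIFO evicts page 1 here and then misses on it.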

FIFO and LRU perform equally badly in a worst-case analysis

**Idea:** Compare with algorithms that know the entire sequence beforehand

**Opt**imal Offline Algorithm **(OPT)**

**Competitive Ratio of \(A\):**

\(\mathrm{cost}(A, \sigma)\) = number of cache misses of \(A\) on the sequence of requests \(\sigma\)

\displaystyle \alpha = \sup_{\sigma } \frac{\mathrm{cost}(A, \sigma)}{\mathrm{cost}(\text{OPT}, \sigma)} \geq 1

So for **any** sequence \(\sigma\) and **any** algorithm \(B\),

\displaystyle \mathrm{cost}(A, \sigma) \leq \alpha \;\mathrm{cost}(\text{OPT}, \sigma) \leq \alpha \;\mathrm{cost}(B, \sigma)

**\(A\) is \(\alpha\)-instance optimal**

... this is really strong
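The supremum ranges over infinitely many sequences, but brute-forcing all short ones gives a lower bound on it. A minimal sketch, assuming cold-start caches and using FIF in place of OPT (justified by Belady's theorem); `empirical_ratio` is a hypothetical helper, and exact fractions avoid rounding.

```python
from collections import OrderedDict
from fractions import Fraction
from itertools import product


def lru_misses(requests, k):
    """Misses of LRU with k slots, starting from an empty cache."""
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: becomes most recently used
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


def fif_misses(requests, k):
    """Misses of Furthest In the Future (offline optimum)."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                # never-requested pages get len(future), i.e. "furthest"
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


def empirical_ratio(alg, k, pages, max_len):
    """max cost(alg)/cost(FIF) over all sequences of length 1..max_len.

    Only a lower bound on the sup over *all* sequences."""
    best = Fraction(0)
    for length in range(1, max_len + 1):
        for sigma in product(pages, repeat=length):
            best = max(best, Fraction(alg(sigma, k), fif_misses(sigma, k)))
    return best


print(empirical_ratio(lru_misses, k=2, pages=range(3), max_len=7))
```

Longer sequences can only raise this estimate; the theorems below pin the true value of the sup for LRU at \(k\).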

We want to compare online algorithms with an optimal offline algorithm. Do we know any optimal offline algorithm?

**Theorem 2.1 (Belady's Theorem)** The Furthest In the Future (FIF) algorithm is offline optimal

[Figure: request sequence \(\sigma = 1, 5, 3, 2, 4, 5, 3, 5, 1, 2, 4\); labels \(1, 2, \dotsc, k, k+1\)]
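Belady's theorem can be sanity-checked on small inputs by comparing FIF against an exhaustive offline optimum. The sketch assumes demand paging (a page is loaded only when it is requested and missing), as in the model above; `opt_misses` is a hypothetical brute-force helper that tracks the fewest misses for every reachable cache content.

```python
from itertools import product


def fif_misses(requests, k):
    """Furthest In the Future, starting from an empty cache."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


def opt_misses(requests, k):
    """Exact offline optimum for demand paging, by DP over cache contents."""
    best = {frozenset(): 0}  # cache contents -> fewest misses to reach them
    for page in requests:
        nxt = {}
        for cache, cost in best.items():
            if page in cache:
                moves = [(cache, cost)]  # hit: nothing changes
            elif len(cache) < k:
                moves = [(cache | {page}, cost + 1)]  # miss, free slot
            else:
                # miss with a full cache: try every possible victim q
                moves = [((cache - {q}) | {page}, cost + 1) for q in cache]
            for state, c in moves:
                if c < nxt.get(state, float("inf")):
                    nxt[state] = c
        best = nxt
    return min(best.values())


def check_belady(num_pages, k, max_len):
    """True if FIF matches the brute-force optimum on every short sequence."""
    for length in range(1, max_len + 1):
        for sigma in product(range(num_pages), repeat=length):
            if fif_misses(sigma, k) != opt_misses(sigma, k):
                return False
    return True


print(check_belady(num_pages=4, k=2, max_len=6))  # True
```

The DP is exponential in \(k\), so it only serves as a checker; FIF itself needs no search.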

**Theorem 4.1** Every deterministic algorithm has competitive ratio at least the cache size \(k\).

**Proof:**

Cache size **\(k\)**. Main memory size **\(N = k+1\)**.

Online algorithm **\(A\)**.

**Idea:** Construct (inductively) a sequence where \(A\) misses on every request: since \(N = k + 1\), some page is always outside \(A\)'s cache, and the next request is always such a page

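\(\sigma\): whenever \(A\) evicts a page, the next request is that page (\(A\) evicts \(p_1\), then \(p_1\) is requested; \(A\) evicts \(p_2\), then \(p_2\) is requested; and so on)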

\text{cost}(A, \sigma) = |\sigma|
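The adversary only needs to observe the online algorithm's cache. A sketch under a hypothetical interface: an algorithm object exposes `serve(page)` (returning `True` on a miss) and `contents()`; LRU and FIFO are implemented against it, and the adversary requests the smallest page they do not hold.

```python
from collections import OrderedDict, deque


class LRU:
    """Online LRU; serve() returns True on a miss (hypothetical interface)."""

    def __init__(self, k):
        self.k, self.cache = k, OrderedDict()

    def serve(self, page):
        if page in self.cache:
            self.cache.move_to_end(page)
            return False
        if len(self.cache) == self.k:
            self.cache.popitem(last=False)  # evict least recently used
        self.cache[page] = True
        return True

    def contents(self):
        return set(self.cache)


class FIFO:
    """Online FIFO with the same hypothetical interface."""

    def __init__(self, k):
        self.k, self.cache, self.order = k, set(), deque()

    def serve(self, page):
        if page in self.cache:
            return False
        if len(self.cache) == self.k:
            self.cache.remove(self.order.popleft())  # evict oldest load
        self.cache.add(page)
        self.order.append(page)
        return True

    def contents(self):
        return set(self.cache)


def adversarial_sequence(alg, k, length):
    """Build sigma over N = k + 1 pages so that alg misses on every request."""
    sigma, misses = [], 0
    for _ in range(length):
        held = alg.contents()
        # alg holds at most k of the k + 1 pages, so some page is missing
        page = min(p for p in range(k + 1) if p not in held)
        sigma.append(page)
        misses += alg.serve(page)  # True counts as 1
    return sigma, misses


print(*adversarial_sequence(LRU(3), 3, 12))  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3] 12
```

Against LRU the construction degenerates into the cyclic sequence over all \(k + 1\) pages; other algorithms get other sequences, but the miss count is always \(|\sigma|\).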

**Cost of FIF?**

After the warm-up (the first \(k\) requests, which fill the cache), only one page is outside FIF's cache at any time. Suppose FIF faults on a request for \(p\) and evicts \(q\), the cached page requested furthest in the future. If \(q\) is requested again, then all \(N - 1 = k\) pages other than \(q\) (\(p\) and the other cached pages) are requested before it, otherwise one of them would be further in the future than \(q\). So this stretch has length \(\geq k\), and since only \(q\) is out of cache during it, FIF does not fault again until \(q\) is requested. Hence FIF faults at most once every \(k\) requests after the warm-up:

\text{cost}(\text{FIF}, \sigma) \leq k + \frac{|\sigma|}{k}


\displaystyle \frac{\text{cost}(A, \sigma)}{\text{cost}(\text{FIF}, \sigma)} \geq \frac{|\sigma|}{k + \frac{|\sigma|}{k}} \longrightarrow k \quad \text{as}~|\sigma| \to \infty
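For LRU the adversarial sequence is the cyclic sequence \(0, 1, \dotsc, k, 0, 1, \dotsc\) over \(N = k + 1\) pages (each request is the page LRU just evicted). The sketch below, with hypothetical helper names, measures both costs on it and prints the lower bound \(|\sigma| / (k + |\sigma|/k)\) next to the observed ratio.

```python
from collections import OrderedDict


def lru_misses(requests, k):
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


def fif_misses(requests, k):
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


k = 4
for length in (20, 200, 1000):
    sigma = [i % (k + 1) for i in range(length)]  # cyclic over N = k + 1 pages
    lru, fif = lru_misses(sigma, k), fif_misses(sigma, k)
    lower = length / (k + length / k)  # lower bound on the ratio from above
    print(length, lru, fif, round(lru / fif, 3), round(lower, 3))
```

As \(|\sigma|\) grows, both printed ratios approach \(k = 4\).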

Every deterministic online algorithm can be forced to perform (asymptotically) \(k\) times worse than OPT

In practice, algorithms such as LRU do not suffer from this problem

Horrible performance

This lower bound holds even for algorithms that can look a finite amount into the future.

Maybe competitive ratio doesn't reflect real-world performance

OPT is incredibly powerful

Is competitive ratio useful to compare algorithms?

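**Theorem 4.3** LRU has competitive ratio \(k\)

**Proof:**

Let \(\sigma\) be a request sequence.

Divide \(\sigma\) into blocks \(\sigma_1, \dotsc, \sigma_b\), where \(\sigma_1\) is the maximal prefix of \(\sigma\) with at most \(k\) distinct pages, \(\sigma_2\) is the maximal prefix of the remainder with at most \(k\) distinct pages, and so on:

\sigma = \underbrace{\sigma_1}_{k~\text{distinct}} \, \underbrace{\sigma_2}_{k~\text{distinct}} \, \underbrace{\sigma_3}_{k~\text{distinct}} \dotsm \underbrace{\sigma_b}_{\leq k~\text{distinct}}

By maximality, the first request of \(\sigma_{i+1}\) is the \((k+1)\)-th distinct page counted from the start of \(\sigma_i\).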

**Claim 1:**

LRU suffers at most \(k\) cache misses in each block.

**Claim 2:**

FIF suffers at least \(b\) faults.

\displaystyle \frac{\text{cost}(\text{LRU}, \sigma)}{\text{cost}(\text{FIF}, \sigma)} \leq \frac{k b}{b} = k

**Proof of Claim 1:**

After a page **\(p\)** is requested, LRU evicts it only once **\(k\)** distinct pages other than **\(p\)** have been requested since.

Each block has \(\leq k\) distinct pages (\(p\) included)

\(\implies\) A page placed in cache during a block is not evicted during that same block, so LRU misses at most once per distinct page: at most \(k\) misses per block.

**Proof of Claim 2:**

Let \(p\) be the first request of \(\sigma_i\), for \(i < b\). Right after it is served, \(p\) is in FIF's cache. From the second request of \(\sigma_i\) up to and including the first request of \(\sigma_{i+1}\), there are \(k\) distinct pages \(\neq p\), but only \(k - 1\) other cache slots.

\(\implies\) FIF faults at least once in each of these \(b - 1\) disjoint stretches, and once more on the very first request (the cache starts empty): at least \(b\) faults
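Both claims can be checked empirically. A sketch with hypothetical helpers: `split_blocks` implements the maximal-block decomposition, `lru_misses_per_block` runs a single LRU pass (no reset between blocks) and tallies misses per block, and FIF stands in for OPT.

```python
import random
from collections import OrderedDict


def split_blocks(requests, k):
    """Greedy split into maximal blocks with at most k distinct pages each."""
    blocks, current, distinct = [], [], set()
    for page in requests:
        if page not in distinct and len(distinct) == k:
            blocks.append(current)  # page would be the (k+1)-th distinct one
            current, distinct = [], set()
        current.append(page)
        distinct.add(page)
    if current:
        blocks.append(current)
    return blocks


def lru_misses_per_block(requests, k):
    """One LRU run over the whole sequence; misses are tallied per block."""
    cache, counts = OrderedDict(), []
    for block in split_blocks(requests, k):
        misses = 0
        for page in block:
            if page in cache:
                cache.move_to_end(page)
            else:
                misses += 1
                if len(cache) == k:
                    cache.popitem(last=False)
                cache[page] = True
        counts.append(misses)
    return counts


def fif_misses(requests, k):
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


print(split_blocks([1, 2, 1, 3, 1, 2], 2))          # [[1, 2, 1], [3, 1], [2]]
print(lru_misses_per_block([1, 2, 1, 3, 1, 2], 2))  # [2, 1, 1]

rng = random.Random(0)
for _ in range(200):
    k = rng.randint(1, 4)
    sigma = [rng.randrange(6) for _ in range(40)]
    b = len(split_blocks(sigma, k))
    assert all(m <= k for m in lru_misses_per_block(sigma, k))  # Claim 1
    assert fif_misses(sigma, k) >= b                            # Claim 2
```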

Competitive analysis shows that the famous LRU has the optimal competitive ratio (\(k\), matching Theorem 4.1).

So the ratio may be good for comparing algorithms?

Not quite. Another "optimal" algorithm: **Flush-When-Full (FWF)**

On a page fault with a full cache, it clears the entire cache.

FWF flushes exactly when a new block (from the previous proof) starts, so it also suffers at most \(k\) misses per block

Comp. ratio \(k\), but this algorithm is **HORRIBLE**
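A small sketch makes the gap concrete. `fwf_misses` is a hypothetical helper following the rule above (flush on a miss with a full cache); the sequence is an illustrative example, not taken from the lecture.

```python
from collections import OrderedDict


def fwf_misses(requests, k):
    """Flush-When-Full: on a miss with a full cache, empty the whole cache."""
    cache, misses = set(), 0
    for page in requests:
        if page in cache:
            continue
        misses += 1
        if len(cache) == k:
            cache.clear()  # flush everything, not just one victim
        cache.add(page)
    return misses


def lru_misses(requests, k):
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


sigma = [1, 2, 1, 3, 1, 2, 1, 3]
print(fwf_misses(sigma, 2), lru_misses(sigma, 2))  # 7 5
```

Page 1 is requested every other time, yet FWF keeps throwing it away; LRU keeps it and only misses on 2 and 3.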

**Explains performance of algorithms in practice?**

No. Being \(k\) times worse than OPT is not what happens in practice.

**Useful to compare algorithms?**

Kind of. It shows that LRU is optimal, but clearly bad algorithms are also optimal.

**Useful to design new algorithms?**

Not really. FWF shows that the competitive ratio ignores some things that are really important.

**Competitive analysis is an incredibly useful technique, but in this case it does not work well out of the box.**

OPT is too powerful, it is unfair to compare it with online algorithms

**Idea:** Compare an algorithm with a *handicapped* OPT

\(\mathrm{cost}(A, k, \sigma)\) = number of faults of \(A\) with **cache size \(k\)** on the sequence \(\sigma\)

Compare \(A\) with cache \(k\) against OPT with cache \(h \leq k\):

\displaystyle \sup_{\sigma } \frac{\mathrm{cost}(A, k, \sigma)}{\mathrm{cost}(\text{OPT}, h, \sigma)}

This is still a worst-case analysis

Revisit the proof of **Theorem 4.3** (LRU has competitive ratio \(k\)). **Claim 1** (LRU suffers at most \(k\) cache misses in each block) does not involve FIF, so it still holds. **Claim 2** (FIF suffers at least \(b\) faults) changes:

What if FIF has only \(h \leq k\) cache slots?

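Let \(p\) be the first request of \(\sigma_i\). Right after it is served, \(p\) is in FIF's cache, which leaves only \(h - 1\) slots for other pages. From the second request of \(\sigma_i\) up to and including the first request of \(\sigma_{i+1}\), there are \(k\) distinct pages \(\neq p\), and at most \(h - 1\) of them can be in cache at the start of this stretch

\implies

at least \(k - (h - 1) = k - h + 1\) faults per block, i.e. \(\approx (k - h + 1)b\) faults in total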

**Claim 1:**

LRU suffers at most \(k\) cache misses in each block.

**Claim 2 (FIF with cache \(h\)):**

FIF suffers at least \(\approx (k - h + 1)b\) faults.

\displaystyle \sup_{\sigma } \frac{\mathrm{cost}(\text{LRU}, k, \sigma)}{\mathrm{cost}(\text{OPT}, h, \sigma)}
\leq \frac{k}{k - h + 1}
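A randomized check of this bound, using FIF with cache \(h\) as OPT. The claims give \(\mathrm{cost}(\text{LRU}, k) \leq kb\) and \(\mathrm{cost}(\text{OPT}, h) \geq (k - h + 1)(b - 1)\) (one stretch per block boundary), so the check below allows an additive slack of \(k\) faults for the boundary. The helper names and the random test setup are assumptions for illustration.

```python
import random
from collections import OrderedDict


def lru_misses(requests, k):
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


def fif_misses(requests, k):
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


def within_bound(sigma, k, h):
    """cost(LRU, k) <= k/(k-h+1) * cost(OPT, h) + k, with FIF as OPT."""
    lru, opt = lru_misses(sigma, k), fif_misses(sigma, h)
    return lru * (k - h + 1) <= k * opt + k * (k - h + 1)  # integer arithmetic


rng = random.Random(1)
trials = []
for _ in range(300):
    k = rng.randint(1, 5)
    h = rng.randint(1, k)
    trials.append(within_bound([rng.randrange(8) for _ in range(60)], k, h))
print(all(trials))  # True
```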

Give OPT half of LRU's cache (\(h = k/2\), then rename \(k/2\) to \(k\)): the ratio becomes \(\frac{2k}{2k - k + 1} = \frac{2k}{k + 1} \leq 2\), so

\displaystyle
\mathrm{cost}(\text{LRU}, 2k, \sigma) \leq 2 \mathrm{cost}(\text{OPT}, k, \sigma)

**How to use this "in practice"**

Determine how much cache OPT would need to reach acceptable performance

Give LRU twice that amount of cache, so it incurs at most about twice as many faults as OPT
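The recipe can be written as a small helper. This is a hypothetical sketch: `choose_lru_cache` assumes a representative request trace is available in advance (so FIF can be run on it) and a user-chosen fault budget; the trace below is made up for illustration.

```python
from collections import OrderedDict


def lru_misses(requests, k):
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


def fif_misses(requests, k):
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


def choose_lru_cache(requests, fault_budget, max_k):
    """Hypothetical sizing helper: find the smallest k for which OPT (= FIF)
    stays within the fault budget, then give LRU twice that many slots."""
    for k in range(1, max_k + 1):
        if fif_misses(requests, k) <= fault_budget:
            return 2 * k
    return None  # even max_k slots are not enough for OPT


trace = [i % 7 for i in range(70)] + [0, 1, 2] * 10
K = choose_lru_cache(trace, fault_budget=20, max_k=16)
print(K, lru_misses(trace, K), fif_misses(trace, K // 2))
```

By the bound above, LRU with \(K = 2k\) slots incurs at most about twice the faults of OPT with \(k\) slots (up to an additive boundary term of \(K\)).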

**Any theoretical use for resource augmentation?**

**Theorem 6.1**

For all \(\varepsilon, \delta > 0\), all \(n\), and all request sequences \(\sigma\), there is \(S \subseteq \{1, \dotsc, n\}\) with \(|S| \geq (1 - \varepsilon)n\) such that, for all \(k \in S\), either

\displaystyle
\text{cost}(\text{LRU}, k, \sigma) \leq O\left(\frac{1}{\varepsilon} \log \frac{1}{\delta}\right) \text{cost}(\text{OPT}, k, \sigma) \qquad \text{(good relative performance)}

or

\displaystyle
\text{cost}(\text{LRU}, k, \sigma) \leq \delta |\sigma| \qquad \text{(good absolute performance)}

**Caveat:** order of quantifiers: the set \(S\) of good cache sizes may depend on \(\sigma\).

Fix \(\varepsilon, \delta, n, \sigma\). Let \(k, g \in \mathbb{N}\).

By the resource augmentation bound (LRU with cache \(k + g\) against OPT with cache \(k\)),

\displaystyle
\mathrm{cost}(\text{LRU}, k + g, \sigma) \leq \frac{k + g}{g + 1} \mathrm{cost}(\text{OPT}, k, \sigma)

When removing \(g\) pages from LRU's cache (going from \(k + g\) to \(k\)), exactly one of these happens:

**Good \(k\):**

\displaystyle
\mathrm{cost}(\text{LRU}, k, \sigma) \leq 2 \mathrm{cost}(\text{LRU}, k + g, \sigma)

**Bad \(k\):**

\displaystyle
\mathrm{cost}(\text{LRU}, k, \sigma) > 2 \mathrm{cost}(\text{LRU}, k + g, \sigma)
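The good/bad classification only needs LRU costs for every cache size. A sketch with hypothetical names that labels each \(k\) for a given gap \(g\), and checks two facts on random sequences: LRU's cost never increases with more cache (the monotonicity used in the proof), and every good \(k\) satisfies the combined bound, again with an additive boundary slack of \(k + g\) per LRU run.

```python
import random
from collections import OrderedDict


def lru_misses(requests, k):
    cache, misses = OrderedDict(), 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses


def fif_misses(requests, k):
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page not in cache:
            misses += 1
            if len(cache) == k:
                future = list(requests[i + 1:])
                cache.remove(max(cache, key=lambda q: (
                    future.index(q) if q in future else len(future))))
            cache.add(page)
    return misses


def classify_sizes(requests, n, g):
    """Hypothetical helper: LRU cost per cache size, and a good/bad label per k."""
    cost = {k: lru_misses(requests, k) for k in range(1, n + g + 1)}
    labels = {k: ("good" if cost[k] <= 2 * cost[k + g] else "bad")
              for k in range(1, n + 1)}
    return cost, labels


rng = random.Random(2)
sigma = [rng.randrange(12) for _ in range(80)]
cost, labels = classify_sizes(sigma, n=8, g=2)
print(labels)
```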

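If the number of bad sizes is at most \(\varepsilon n\), we are done.

Otherwise, suppose there are more than \(\varepsilon n\) bad \(k\)'s. **Idea:** pick \(t \in \{1, \dotsc, n\}\) such that there are \(\sim \varepsilon n\) bad \(k\)'s in \(\{1, \dotsc, t - g\}\), and show that every \(k \geq t\) has good absolute performance. Then the only sizes that are neither good nor absolutely good are the \(\sim \varepsilon n\) bad \(k\)'s up to \(t - g\) and at most \(g \leq \varepsilon n\) bad \(k\)'s between \(t - g\) and \(t\).

I will use without proof that LRU never gets worse with more cache:

\displaystyle
\mathrm{cost}(\text{LRU}, k + 1, \sigma) \leq \mathrm{cost}(\text{LRU}, k, \sigma)

Among the bad \(k\)'s in \(\{1, \dotsc, t - g\}\), greedily choose \(k_1 > k_2 > k_3 > \dotsb\) spaced at least \(g\) apart:

t \geq k_1 + g, \quad k_1 \geq k_2 + g, \quad k_2 \geq k_3 + g, \quad \dotsc

This gives \(\sim \frac{\varepsilon n}{g}\) bad \(k\)'s. Using monotonicity and then the fact that each \(k_i\) is bad:

\displaystyle
\mathrm{cost}(\text{LRU}, t, \sigma) \leq \mathrm{cost}(\text{LRU}, k_1 + g, \sigma) < \frac{1}{2} \mathrm{cost}(\text{LRU}, k_1, \sigma)

\displaystyle
\leq \frac{1}{2} \mathrm{cost}(\text{LRU}, k_2 + g, \sigma) < \frac{1}{2^2} \mathrm{cost}(\text{LRU}, k_2, \sigma) < \frac{1}{2^3} \mathrm{cost}(\text{LRU}, k_3, \sigma) < \dotsm

\displaystyle
< 2^{-\varepsilon n/g} \mathrm{cost}(\text{LRU}, 1, \sigma) \leq 2^{-\varepsilon n/g} |\sigma|

Choose \(g\) so that this factor is \(\delta\):

\displaystyle
\delta = 2^{-\varepsilon n/g} \iff g = \frac{\varepsilon n}{\log_2(1/\delta)} \leq \varepsilon n \quad (\text{for } \delta \leq 1/2)

Then every \(k \geq t\) satisfies \(\mathrm{cost}(\text{LRU}, k, \sigma) \leq \mathrm{cost}(\text{LRU}, t, \sigma) < \delta |\sigma|\): all absolutely good.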


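For every good \(k\):

\displaystyle
\mathrm{cost}(\text{LRU}, k, \sigma) \leq 2 \mathrm{cost}(\text{LRU}, k + g, \sigma) \leq 2 \frac{k + g}{g + 1} \mathrm{cost}(\text{OPT}, k, \sigma)

\displaystyle
\leq 2 \frac{n + g}{g} \mathrm{cost}(\text{OPT}, k, \sigma) = 2\left(1 + \frac{\log_2(1/\delta)}{\varepsilon}\right) \mathrm{cost}(\text{OPT}, k, \sigma)

\displaystyle
\leq O\left(\frac{1}{\varepsilon} \log \frac{1}{\delta}\right) \mathrm{cost}(\text{OPT}, k, \sigma)

In total there are at most \(\varepsilon n + g \leq 2 \varepsilon n\) bad \(k\)'s that are not absolutely good. Running the argument with \(\varepsilon/2\) instead of \(\varepsilon\) gives \(|S| \geq (1 - \varepsilon) n\).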

Resource augmentation allows a more **fine-grained analysis** when the problem is parameterized by some kind of **resource**

This technique yields claims that are hard to interpret directly, but it was very useful to derive good results (**loosely competitive**)

This is still a **worst-case analysis**: it ignores the structure of the input.

LRU works well in practice **because of** the structure of real-world data.