Beyond Worst-case Analysis Reading Group Lecture 3
Victor Sanches Portella
Computer Science @ UBC
May, 2020
Main/Slow Memory with \(N\) pages
Fast Memory (Cache) with \(k\) pages
Page requests arrive online
0 cost if in cache
1 cost if not in cache
Move to cache, may evict another page
First In, First Out (FIFO)
Least Recently Used (LRU)
Furthest In the Future (FIF)
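The two online policies above are easy to simulate. A minimal Python sketch (the function names `misses_fifo` and `misses_lru` are mine, not from the lecture):

```python
from collections import OrderedDict, deque

def misses_fifo(requests, k):
    """FIFO eviction: evict the page that entered the cache first."""
    cache, order, misses = set(), deque(), 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                cache.remove(order.popleft())  # evict oldest arrival
            cache.add(p)
            order.append(p)
    return misses

def misses_lru(requests, k):
    """LRU eviction: evict the least recently used page."""
    cache, misses = OrderedDict(), 0  # keys ordered least -> most recent
    for p in requests:
        if p in cache:
            cache.move_to_end(p)           # hit: refresh recency
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[p] = None
    return misses
```

Note that the two policies already differ on small inputs: on \(1, 2, 1, 3, 1, 2\) with \(k = 2\), LRU keeps the hot page \(1\) while FIFO evicts it.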
FIFO and LRU perform equally badly in worst-case analysis
Idea: Compare with algorithms that know the entire sequence beforehand
Optimal Offline Algorithm (OPT)
Competitive Ratio of \(A\): the smallest \(\alpha\) such that, for every sequence of requests \(\sigma\), the # of cache misses of \(A\) on \(\sigma\) is at most \(\alpha\) times the # of cache misses of OPT on \(\sigma\)
If, for any sequence \(\sigma\) and any algorithm \(B\), the # of cache misses of \(A\) on \(\sigma\) is at most \(\alpha\) times that of \(B\), then \(A\) is \(\alpha\)-instance optimal
... this is really strong
We want to compare online algorithms with an optimal offline algorithm. Do we know any optimal offline algorithm?
Theorem 2.1 (Belady's Theorem) The Furthest In the Future (FIF) algorithm is offline optimal
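Belady's theorem can be sanity-checked on small instances by comparing FIF against a brute-force search over all possible eviction choices. A sketch, assuming the standard cost model above (the names `fif_misses` and `opt_misses` are mine):

```python
import random
from functools import lru_cache

def fif_misses(requests, k):
    """Furthest-In-the-Future: evict the cached page requested furthest ahead."""
    cache, misses = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        misses += 1
        if len(cache) == k:
            rest = requests[i + 1:]
            # pages never requested again count as "infinitely far"
            far = max(cache, key=lambda q: rest.index(q) if q in rest else len(rest))
            cache.remove(far)
        cache.add(p)
    return misses

def opt_misses(requests, k):
    """Brute-force optimal offline cost: try every possible eviction."""
    reqs = tuple(requests)

    @lru_cache(maxsize=None)
    def go(i, cache):
        if i == len(reqs):
            return 0
        p = reqs[i]
        if p in cache:
            return go(i + 1, cache)
        if len(cache) < k:
            return 1 + go(i + 1, tuple(sorted(cache + (p,))))
        return 1 + min(go(i + 1, tuple(sorted((set(cache) - {q}) | {p})))
                       for q in cache)
    return go(0, ())

# Belady's theorem, empirically: FIF matches the brute-force optimum
random.seed(1)
for _ in range(20):
    seq = [random.randint(1, 5) for _ in range(12)]
    assert fif_misses(seq, 3) == opt_misses(seq, 3)
```

The brute force is exponential in the number of misses, so it only works on tiny instances, but it is enough to catch an incorrect eviction rule.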
[Example: request sequence 1, 5, 3, 2, 4, 5, 3, 5, 1, 2, 4 over the pages \(1, 2, \dotsc, k, k+1\), illustrating FIF's evictions]
Proof (sketch): an exchange argument — any optimal schedule can be modified, one request at a time, to agree with FIF's evictions without increasing the number of cache misses.
Theorem 4.1 Every deterministic algorithm has competitive ratio at least the cache size \(k\).
Cache size \(k\). Main memory size \(N = k+1\).
Online algorithm \(A\).
Idea: Construct (inductively) a seq. where \(A\) misses all the time
\(A\) evicts some page. Since \(N = k+1\), exactly one page is outside \(A\)'s cache at any moment; the adversary always requests that page next, so \(A\) misses on every request.
Cost of FIF?
Warm-up: first request \(k\) distinct pages to fill the cache.
When FIF faults and evicts a page \(q\) (post warm-up), \(q\) is the only one of the \(N = k+1\) pages out of the cache of size \(k\): the \(N - 1 = k\) pages distinct from \(q\) all fit. Since \(q\)'s next request is furthest in the future, the next fault (necessarily on \(q\)) comes only after the other cached pages are requested, so each stretch between faults has length \(\geq k\). Hence FIF faults at most once every \(k\) requests, while \(A\) faults on every request.
Every deterministic algorithm performs at least \(k\) times worse than OPT: horrible performance in theory.
In practice, algorithms such as LRU do not suffer from this problem.
This lower bound holds even for algorithms that can look a finite amount into the future.
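The adversary from the proof can be generated explicitly: simulate LRU's cache over \(N = k+1\) pages and always request the unique page it does not hold. A sketch under that construction (the function name `adversarial_sequence` is mine):

```python
from collections import OrderedDict

def adversarial_sequence(k, T):
    """With N = k + 1 pages, always request the one page that LRU
    currently does not hold, so LRU misses on every request."""
    pages = set(range(1, k + 2))        # main memory: pages 1, ..., k+1
    cache = OrderedDict()                # simulate LRU's cache
    seq = []
    for p in range(1, k + 1):            # warm-up: fill the cache with 1..k
        cache[p] = None
        seq.append(p)
    for _ in range(T):
        p = (pages - set(cache)).pop()   # the unique page outside the cache
        seq.append(p)
        cache.popitem(last=False)        # LRU evicts its oldest page ...
        cache[p] = None                  # ... and loads p
    return seq
```

Every request in the output misses for LRU (no request appears among the \(k\) requests before it), while FIF on the same sequence would miss only about once every \(k\) requests.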
Maybe competitive ratio doesn't reflect real-world performance
OPT is incredibly powerful
Is competitive ratio useful to compare algorithms?
Theorem 4.3 LRU has competitive ratio \(k\)
Proof:
Let \(\sigma\) be a request sequence.
Divide \(\sigma\) into blocks \(\sigma_1, \dotsc, \sigma_b\), where each \(\sigma_i\) is a maximal interval with at most \(k\) distinct pages (\(\sigma_1\) is the maximal prefix with \(k\) distinct pages, and so on)
Claim 1:
LRU suffers at most \(k\) cache misses in each block.
Claim 2:
FIF suffers at least \(b\) faults.
Proof of Claim 1:
After a page \(p\) has been placed in (or hit in) the cache, requests to \(k\) distinct other pages must occur before LRU evicts \(p\).
Each block has \(\leq k\) distinct pages
\(\implies\) a page placed in cache during a block is not evicted within the same block, so LRU misses at most once per distinct page: at most \(k\) misses per block.
Proof of Claim 2:
By maximality of \(\sigma_i\), the first page of \(\sigma_{i+1}\) is distinct from the \(k\) pages of \(\sigma_i\). So from the second request of \(\sigma_i\) up to the first request of \(\sigma_{i+1}\), \(k\) distinct pages other than the first page of \(\sigma_i\) appear, and FIF can hold at most \(k - 1\) of them alongside that first page: at least one fault in each of these \(b - 1\) disjoint intervals. Adding the fault on the very first request
\(\implies\) FIF faults at least \(b\) times
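The block decomposition used in the proof is easy to compute. A minimal sketch (the helper name `blocks` is mine):

```python
def blocks(requests, k):
    """Split a request sequence into maximal blocks, each containing
    at most k distinct pages (the decomposition from the proof)."""
    out, cur, distinct = [], [], set()
    for p in requests:
        if p not in distinct and len(distinct) == k:
            out.append(cur)              # p would be the (k+1)-st distinct page
            cur, distinct = [], set()
        cur.append(p)
        distinct.add(p)
    if cur:
        out.append(cur)
    return out
```

By construction, each block has at most \(k\) distinct pages and is maximal: the first request after a block is always a new distinct page, which is exactly what Claim 2 exploits.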
Competitive analysis shows that the famous LRU has the optimal competitive ratio (among deterministic algorithms).
So the ratio may be good for comparing algorithms?
Not quite. Another "optimal" algorithm: Flush-When-Full (FWF)
On a page fault with a full cache, FWF evicts every page.
By the block argument in the previous proof, FWF still suffers at most \(k\) misses per block
Comp. ratio \(k\), but this algorithm is HORRIBLE
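The gap between "same competitive ratio" and "same real performance" shows up immediately in simulation. A sketch, assuming a workload with a small working set plus a rare cold page (the sequence and function names are mine):

```python
from collections import OrderedDict

def misses_fwf(requests, k):
    """Flush-When-Full: on a fault with a full cache, evict everything."""
    cache, misses = set(), 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                cache.clear()            # flush the entire cache
            cache.add(p)
    return misses

def misses_lru(requests, k):
    """Standard LRU simulator, for comparison."""
    cache, misses = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)
            cache[p] = None
    return misses

# Ten phases: a working set {1,2,3,4} that fits in cache, plus a rare page 5
seq = ([1, 2, 3, 4] * 25 + [5]) * 10
```

On this sequence with \(k = 4\), both are \(k\)-competitive, yet FWF throws away the whole working set every time page \(5\) appears and pays noticeably more misses than LRU.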
Explains performance of algorithms in practice? No. Being a factor of \(k\) away from OPT is not what happens in practice.
Useful to compare algorithms? Kind of. It shows that LRU is optimal, but stupid algorithms such as FWF are also "optimal".
Useful to design new algorithms? Not really. FWF shows that competitive ratio ignores some things that are really important.
Competitive analysis is an incredibly useful technique, but in this case it does not work well out-of-the-box.
OPT is too powerful; it is unfair to compare it with online algorithms.
Idea: Compare an algorithm with a handicapped OPT
# Faults of \(A\) with cache \(k\) on the seq. \(\sigma\)
Compare \(A\) with cache \(k\) against OPT with cache \(h \leq k\)
This is still a worst-case analysis
Theorem 4.3 LRU has competitive ratio \(k\)
Claim 1:
LRU suffers at most \(k\) cache misses in each block.
Claim 2:
FIF suffers at least \(b\) faults.
What if FIF has a cache of only \(h \leq k\) pages?
Let \(p\) be the first page of block \(\sigma_i\). From just after \(p\) up to (and including) the first request of \(\sigma_{i+1}\), \(k\) distinct pages \(\neq p\) are requested. Right after serving \(p\), FIF holds \(p\) plus only \(h - 1\) of those pages, so it faults at least \(k - (h - 1) = k - h + 1\) times per block:
\((k - h + 1)b\) faults in total
Combining the claims: LRU with cache \(k\) suffers at most \(kb\) misses, while FIF with cache \(h\) suffers at least \((k - h + 1)b\) faults, so LRU is \(\frac{k}{k - h + 1}\)-competitive against OPT with cache \(h\).
For \(h = k/2\), this ratio is \(\frac{k}{k/2 + 1} < 2\).
How to use this "in practice"?
Estimate how much cache you need based on the performance of OPT.
Double the amount of cache, and LRU performs almost as well as OPT.
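The resource-augmentation bound can be checked numerically: with a doubled cache \(k = 2h\), the block argument gives \(\mathrm{cost}_{\mathrm{LRU}} \leq kb\) and \(\mathrm{cost}_{\mathrm{FIF}} \geq (h+1)(b-1)\), hence LRU's misses are at most \(2 \cdot \mathrm{cost}_{\mathrm{FIF}} + 2h\). A sketch with minimal reimplementations of the two simulators (not from the lecture):

```python
import random
from collections import OrderedDict

def misses_lru(requests, k):
    """LRU with cache size k."""
    cache, misses = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)
            cache[p] = None
    return misses

def misses_fif(requests, h):
    """Furthest-In-the-Future with cache size h (offline optimal)."""
    cache, misses = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        misses += 1
        if len(cache) == h:
            rest = requests[i + 1:]
            far = max(cache, key=lambda q: rest.index(q) if q in rest else len(rest))
            cache.remove(far)
        cache.add(p)
    return misses

random.seed(0)
h = 4
seq = [random.randint(1, 10) for _ in range(500)]
# LRU with twice the cache vs. OPT, with the additive slack from the last block
assert misses_lru(seq, 2 * h) <= 2 * misses_fif(seq, h) + 2 * h
```

The assertion is guaranteed by the theorem, so it holds for any request sequence, not just this random one.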
Any theoretical use for resource augmentation?
Theorem 6.1 For all \(\varepsilon, \delta > 0\), all \(n\), and all request sequences \(\sigma\), there is \(S \subseteq \{1, \dotsc, n\}\) with \(|S| \geq (1 - \varepsilon)n\) such that, for all \(k \in S\), either
LRU with cache size \(k\) is within a factor (depending only on \(\varepsilon\) and \(\delta\)) of OPT on \(\sigma\) — good relative performance —
or
LRU with cache size \(k\) faults on at most a \(\delta\) fraction of the requests of \(\sigma\) — good absolute performance.
Caveat: Order of quantifiers
Fix \(\varepsilon, \delta, n, \sigma\). Let \(k, g \in \mathbb{N}\).
By the resource augmentation theorem (with \(h = k - g\)), LRU with cache \(k\) is \(\frac{k}{g+1}\)-competitive against OPT with cache \(k - g\).
When removing \(g\) pages from LRU's cache, one of two things happens for each cache size: \(k\) is good or \(k\) is bad.
If # bad sizes \(\leq \varepsilon n\), we are done.
Suppose \(> \varepsilon n\) bad k's. Idea:
[Diagram over the cache sizes \(1, \dotsc, n\): \(\sim \varepsilon n\) bad \(k\)'s; a region where all sizes are absolutely good; \(g\) bad sizes; \(\leq \varepsilon n\)]
I will use without proof:
Pick \(t \in \{1, \dotsc, n\}\) such that we have \(\sim \varepsilon n\) bad k's between \(1\) and \(t - g\)
This gives at most \(2 \varepsilon n\) bad \(k\)'s in total; run the argument with \(\varepsilon/2\) instead.
Resource augmentation is a way to make a more fine-grained analysis when the problem is parameterized by some kind of resource.
The technique can yield hard-to-interpret claims, but it was very useful for deriving good results (loose competitiveness).
This is still a worst-case analysis: we ignore the form of the input.
LRU works well in practice because of the structure (locality) of real-world data.