Processes And Scheduling
Threads + Synchronization
Memory
Examples: your computer needs to talk to keyboards, displays, disks, and networks.
1. How do we deal with external devices?
- How do we get data to/from the device?
- How do we deal with the fact that the device might not be ready yet?
2. A special external device: the disk
- Disk speed (or lack thereof)
- I/O Characteristics of disks
3. How to use disk head scheduling to speed up disk access
4. How to dodge the problem by using SSDs instead
Prior to now in this class, we've looked only at what happens on the CPU. Things like the TLB/MMU, scheduling, etc. all take place on the CPU.
This image is from an earlier lecture in the class. Note the line labeled "system bus" near the bottom. This is the piece of hardware that the CPU will use to talk to the other components of the system.
Note: The word "bus" simply means "a single wire that all entities connect to and talk/listen on". In a bus-like architecture, everyone can hear you when you talk.
Here, everything from the last slide except the system bus has been condensed into the black box on the left. The system bus, which was at the bottom of that square, now links the CPU to other parts of the system, including some of the external devices we talked about earlier.
Let's zoom in on one of these external devices.
There are four pieces of hardware associated with an I/O device: a status register, a control register, a data-in register, and a data-out register.
In order to access the external device, the system must take the following actions:
External devices may take a while to respond. For instance, access to a tape drive may take several seconds. Access to asynchronous devices (e.g. keyboards) might take a very long time. Think about what happens if you're waiting for a keyboard input, but your user is on vacation in Hawaii. You might be stuck waiting for weeks! So we need some way to handle this waiting without degrading performance.
Side Note: Polling is not necessarily busy waiting!
int read(){
    Device* device = determine_device();
    device->issue_read();
    while(!device->status_is_ready()){
        // busy wait: burn CPU cycles until the device is ready
    }
    return <result of device read>;
}
int read(){
    Device* device = determine_device();
    device->issue_read();
    device->waiting_thread = thread_current();
    busy_device_list.add(device);  // let the timer interrupt find us
    if(!device->status_is_ready()){
        thread_current()->read_sema.down();  // sleep until the handler ups us
    }
    return <result of device read>;
}
void timer_interrupt(){
    < Check sleep timers >
    // Check each device once! If not ready, wait until the next interrupt to check again
    for(Device* device : busy_device_list){
        if(device->status_is_ready()){
            busy_device_list.remove(device);        // done polling this device
            device->waiting_thread->read_sema.up(); // wake the sleeping reader
        }
    }
}
Example of a busy wait
Example of polling without busy waiting
Note: do not try to use this code for Pintos. It doesn't actually work, and Pintos handles this level of complexity for you.
Don't knock simplicity!
It is much better to write a system that you can successfully implement rather than waste $50 million on a system that you discover is too complex to use.
Simplicity is fast development. Simplicity is fewer bugs. Simplicity is elegance.
But simplicity can also mean a lack of performance or features. What can we do if we don't want to pay the cost of repeatedly checking the status register?
Instead of having the CPU check, the device notifies the CPU (by sending a message on the bus) when the I/O operation is complete.
This is slightly more complicated: the external device and system need to both have support for interrupts. When the device is finished, it triggers an interrupt on the CPU just like a timer interrupt. The CPU will respond using the standard mechanisms (push registers, jump to interrupt vector, index by interrupt number, etc.)
Seems suboptimal: lots and lots of interrupts. Each read requires a control transfer to the interrupt handler, saving and restoring state on the CPU, etc.
Can we do better?
Direct memory access is a hardware-mediated protocol (i.e. the hardware needs to support it) which allows the device to write directly into main memory.
Instead of data-in/data-out registers, we have an address register and a DMA controller. The CPU tells the device the destination of the transfer.
The DMA controller reads/writes main memory directly over the system bus, and notifies the CPU when the entire transfer is complete, instead of after every single word.
Can lead to bus or memory contention when heavily used, but still performs better than interrupting the CPU for every word.
Note: there are extra registers and memory on the device, and we need some control mechanisms to allow the device to write directly into main memory now.
This seems easier. What really happened was that a lot of the complexity has been moved into hardware. This is both faster and easier, for us as programmers. For the hardware designers, this is more complex.
DMA, if you can pay for the hardware.
DMA, if you can pay for the hardware.
Polling or interrupts. We're getting data regularly anyway, so polling might legitimately be best here.
There is no such thing as "best in general." Every time you say something is better than something else, you are (whether you mean to or not) attaching a goal to this statement.
Image Credits: Danlash @ en.wikipedia.org
Faster, Smaller, More Expensive
Light can travel around the Earth's equator 7.5 times in a single second.
A nanosecond is so short that light travels almost exactly one foot per nanosecond.
For a relatively fast hard drive, the average data access has a latency (request made to data available) of 12.0 ms.
Action | Latency | 1 ns = 1 s | Conversation Action |
---|---|---|---|
L1 Cache Hit | 1 ns | 1 s | Talking Normally |
L2 Cache Hit | 4 ns | 4 s | Long Pause |
Cache Miss* | 100 ns | 1 m 40 s | Go upstairs to ask Dr. Norman |
Disk Seek | 12 ms | 138 d 21 hr+ | Walk to Washington D.C. and back |

* a.k.a. Main Memory Reference

Values taken from "Numbers Every Programmer Should Know", 2020 edition
Short answer: instead of being controlled by electricity traveling down a copper or silicon trace, data retrieval from a hard drive is controlled by chunks of metal flying around at incredible speeds. It turns out it's very hard to accurately fling metal around at anything near the speed of light.
Platter: a thin metal disk that is coated in magnetic material. Magnetic material on the surface stores the data.
Each platter has two surfaces (top and bottom)
Drive Head: Reads by sensing magnetic field, writes by creating magnetic field.
Floats on an "air" cushion created by the spinning disk.
Sector: the smallest unit that the drive can read or write, usually 512 bytes.
Drives are mounted on a spindle and spin around it (between 5,000 and 15,000 rpm)
Sectors are organized in circular tracks. Data on the same track can be read without moving the drive head.
Cylinder: a set of different surfaces with the same track index.
Comb: a set of drive heads.
(Even though we don't usually exploit this in the operating system.)
Why? Because more disk travels past the read head per unit time closer to the edge of the platter!
Look at the diagram: even if the rings were the same thickness (which they aren't because I screwed up), the orange arc would clearly be larger than the blue one.
In fact, if you do some calculus, you find that 2x the distance from the spindle = 2x the area per second
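The claim above can be made precise without much calculus. For a head positioned at radius \(r\) on a platter spinning at angular velocity \(\omega\), with track width \(w\):

\[
\text{linear speed under the head} = \omega r,
\qquad
\frac{dA}{dt} = \omega r w .
\]

The area (and, at constant bit density, the number of bits) sweeping past the head per second grows linearly in \(r\): doubling the distance from the spindle doubles the area per second.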
The CPU presents the disk with the address of a sector:
Disk controller moves heads to the appropriate track.
Wait for sector to appear under the drive head.
Read or write the sector as it spins by.
Disk I/O time =
seek time + rotation time + transfer time
Remember that these times may change depending on where on the disk the data is stored and what the disk is currently doing!
Spec | Value |
---|---|
Platters/Heads | 6/12 |
Capacity | 8 TB |
Rotation Speed | 7200 RPM |
Average Seek Time | 8.5 ms |
Track-to-track seek | 1.0 ms |
Surface Transfer BW (avg) | 180 MB/s |
Surface Transfer BW (max) | 220 MB/s |
Host transfer rate | 600 MB/s |
Cache size | 256 MB |
Power-at-idle | 7.2 W |
Power-in-use | 8.8 W |
Disk I/O time =
seek time + rotation time + transfer time
For each read, need to seek, rotate, and transfer
Ouch.
Note: reviews claim that this disk tends to become nonfunctional very often. Do not consider this an endorsement.
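Plugging the table's numbers into the formula above for one random read (the 4 KB request size and the average-case rotation are my assumptions):

```python
# Time for a single random 4 KB read on the drive specced above.
# Seek and bandwidth figures come from the table; 4 KB is an assumed size.
seek_ms = 8.5                            # average seek time
rotation_ms = (60_000 / 7200) / 2        # half a rotation on average, ~4.17 ms
transfer_ms = (4 / 1024) / 180 * 1000    # 4 KB at 180 MB/s, ~0.02 ms

total_ms = seek_ms + rotation_ms + transfer_ms
print(round(total_ms, 2))  # ~12.69 ms: almost all seek + rotation, barely any transfer
```

At roughly 12.7 ms per 4 KB read, the effective bandwidth is about 0.3 MB/s, a tiny fraction of the 180 MB/s surface rate.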
Disk I/O time =
seek time + rotation time + transfer time
We need to seek/settle/rotate once for all the reads, then transfer data from the disk.
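The same formula for one long sequential read (the 1 GB transfer size is an assumed example):

```python
# Time for a 1 GB sequential read: one seek, one rotational delay, one long transfer.
seek_ms = 8.5                        # average seek time, from the table
rotation_ms = (60_000 / 7200) / 2    # ~4.17 ms average rotational delay
transfer_ms = 1024 / 180 * 1000      # 1024 MB at 180 MB/s, ~5689 ms

total_ms = seek_ms + rotation_ms + transfer_ms
effective_bw = 1024 / (total_ms / 1000)  # MB/s actually delivered
print(round(effective_bw, 1))            # ~179.6 MB/s: the one-time seek barely matters
```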
Outstanding Requests: 150 | 16 | 147 | 14 | 72 | 83
Access Order: 150 | 16 | 147 | 14 | 72 | 83
Running these requests in FIFO order has resulted in the disk head moving 550 tracks. Can we do better than this?
Our first two moves are clearly dumb.
Rearrange our access order to greedily minimize the distance traveled between tracks.
Outstanding Requests: 150 | 16 | 147 | 14 | 72 | 83
Access Order: 72 | 83 | 147 | 150 | 16 | 14
Running these requests in SSTF order has resulted in the disk head moving 221 tracks. Can we do better?
Notice that the jump in red is a particularly bad one. It occurs because SSTF walks us towards the high end of the disk, then "traps" us so that we have no choice but to jump all the way across the disk.
Move the head in one direction until we hit the edge of the disk, then reverse.
Simple optimization: reset when no more requests exist between current request and edge of disk (called LOOK).
Outstanding Requests: 150 | 16 | 147 | 14 | 72 | 83
Access Order: 16 | 14 | 72 | 83 | 147 | 150
Running these requests in LOOK order has resulted in the disk head moving 187 tracks. Can we do better?
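The SSTF and LOOK bookkeeping can be sketched in a few lines (the starting head position of 65 is an assumption I inferred from the totals above):

```python
# Disk-head scheduling sketch. Track numbers come from the slides; the
# starting head position (65) is an assumption that reproduces the totals.
requests = [150, 16, 147, 14, 72, 83]
start = 65

def sstf(reqs, head):
    """Greedily service whichever outstanding request is closest."""
    pending, total = list(reqs), 0
    while pending:
        nxt = min(pending, key=lambda r: abs(r - head))
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

def look(reqs, head):
    """Sweep toward track 0, then reverse at the last request (LOOK)."""
    down = sorted((r for r in reqs if r <= head), reverse=True)
    up = sorted(r for r in reqs if r > head)
    total, pos = 0, head
    for r in down + up:
        total += abs(r - pos)
        pos = r
    return total

print(sstf(requests, start), look(requests, start))  # 221 187
```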
Outstanding Requests: 150 | 16 | 147 | 14 | 72 | 83
Access Order: 16 | 14 | R | 150 | 147 | 83 | 72
Takes advantage of a hardware feature that allows the disk head to reset to the outer edge almost-instantly.
Move the head towards inner edge until head reaches spindle, then reset to outer edge.
Reset immediately once no more requests towards inner edge: C-LOOK.
This requires us to move 118 tracks, plus whatever time it takes to execute the disk head reset (varies by drive model).
Disks can be partitioned into logically separate disks. Partitions can be used to pretend that there are more disks in a system than there actually are, but are also useful for performance, since access within a partition has shorter seeks.
In cases where extreme performance is needed (usually only for database servers), we can short-stroke the disk by restricting the partitions to the outer 1/3rd of the disk.
Solid state drives have no moving parts. This results in much faster random access, lower power usage, a smaller physical size, and better resistance to physical damage.
They are typically implemented using a technology called NAND flash.
set(4) is allowed, but unset(1)? BANNED. In NAND flash, individual bits can be set, but never unset; the only way to clear bits is to erase an entire block at once.
This makes it nearly impossible to overwrite data in-place!
Like an Etch-A-Sketch!
You can set whatever pixels you want by turning the knobs, but the only way to erase them is to reset everything at once (by shaking the unit).
Pages function similarly to sectors on a hard drive: reads and writes occur in units of minimum one page.
See primary slide decks for examples of how SSD remapping works.
Generally, use the following ideas: never overwrite a page in place; instead, write the new data to a free page, update the logical-to-physical page map, and mark the old page as stale.
We fix this the same way we fixed disk operations: try to avoid having to perform the slow operations by optimizing data layouts!
Write 1 page, erase 1 block, etc. 128 times
\( 128 \times \left( 50 \times 10^{-3}\ \text{ms} + 3\ \text{ms} \right) \approx 390\ \text{ms} \)
or 3.05 ms per page.
Write 1 page, write a second page, etc. 128 times,
then erase
\( 128 \times \left( 50 \times 10^{-3}\ \text{ms} \right) + 3\ \text{ms} \approx 9.4\ \text{ms} \)
or about 0.073 ms per page.
By amortizing erase costs, we get a roughly 41x speedup!
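The amortization arithmetic as a quick sketch (page-write and block-erase times per the numbers in the slides, 128 pages per block):

```python
# Amortizing block erases: 50 us page write, 3 ms block erase, 128 pages/block.
pages, write_ms, erase_ms = 128, 0.05, 3.0

naive_ms = pages * (write_ms + erase_ms)    # erase after every page write
amortized_ms = pages * write_ms + erase_ms  # one erase for the whole block

print(round(naive_ms, 1), round(amortized_ms, 1))  # 390.4 9.4
print(round(naive_ms / amortized_ms, 1))           # ~41.5x speedup
```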
When a user "erases" a page, lie and say the page is erased, without actually erasing, but mark the page as unusable. Then, at some later time, erase a bunch of unusable pages together.
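A toy sketch of that bookkeeping (all names here are mine; a real flash translation layer tracks blocks, wear, and mappings in the drive's controller):

```python
# Toy garbage-collection bookkeeping for flash pages. Purely illustrative.
class ToyFlash:
    def __init__(self, num_pages):
        self.free = set(range(num_pages))  # erased, writable pages
        self.stale = set()                 # "erased" by the user, not yet erased

    def erase_page(self, page):
        # Lie to the user: just mark the page unusable for now.
        self.stale.add(page)

    def garbage_collect(self):
        # Later, erase a batch of stale pages together (one real block erase).
        self.free |= self.stale
        self.stale.clear()

flash = ToyFlash(8)
flash.free -= {0, 1, 2}        # pretend pages 0-2 hold data
flash.erase_page(1)            # instant from the user's point of view
print(len(flash.free))         # 5 -- page 1 is not actually reusable yet
flash.garbage_collect()
print(len(flash.free))         # 6 -- now it is
```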
Flash memory can stop reliably storing data: each block survives only a limited number of program/erase cycles before it wears out.
To improve durability, the SSD's controller will use several tricks: wear leveling (spreading writes evenly across blocks), error-correcting codes, and keeping spare blocks in reserve to replace worn-out ones.
Metric | Spinning Disk | NAND Flash |
---|---|---|
Capacity/Cost | Excellent | Good |
Sequential IO/Cost | Good | Good |
Random IO/Cost | Poor | Good |
Power Usage | Fair | Good |
Physical Size | Fair | Excellent |
Resistance to Physical Damage | Poor | Good |
As we can see, SSDs provide solid all-around performance in all areas while being small--this is why they have essentially taken over the personal electronics arena (most phones, tablets, and laptops come with SSDs these days).
The capacity/cost ratio of hard drives means that they are a mainstay in servers and anything else that needs to store large amounts of data.
It may be a very powerful potato, but it's still a potato. The ability to communicate with external devices is required if a computer is to be useful.
These devices all have different properties and access patterns.
To do this, it uses one of three mechanisms to wait for I/O and do the actual data transfer: polling, interrupts, or DMA.
Disks function as stable storage for the OS, as well as providing a large, cheap store of space.
A single disk read might take tens of millions of CPU cycles. This is because disks are complex mechanical devices that can only move so fast.
There are several major scheduling algorithms, including: FIFO, SSTF, SCAN/LOOK, and C-SCAN/C-LOOK.
We can also partition a disk to reduce seek time for requests in a partition.
SSDs are much faster than disk drives (on average), but don't have as much storage capacity, and the OS needs extra code to support certain operations.
The questions in the following slides are intended to provoke some thought, to tie what you've learned in this section back to things you've studied previously in this course, and to really test your knowledge of this unit.