The right tool for the job: positioned IO for Zip archives

pread is a little-known POSIX API that allows one to issue file reads at arbitrary offsets in a thread-safe fashion, as it won’t mutate the file’s underlying cursor.

The Linux man-pages give us insight into pread’s usefulness along with its sibling, pwrite.

The pread() and pwrite() system calls are especially useful in multithreaded applications. They allow multiple threads to perform I/O on the same file descriptor without being affected by changes to the file offset by other threads.

Source: pread(2) – Linux manual page
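
In Rust, this call surfaces on Unix as read_at from the std::os::unix::fs::FileExt trait, which delegates to pread. A minimal sketch, with a hypothetical archive path:

use std::os::unix::fs::FileExt;

fn main() -> std::io::Result<()> {
    let file = std::fs::File::open("archive.zip")?; // hypothetical path
    let mut buf = [0u8; 4];
    // Read 4 bytes at offset 0 without touching the file cursor; a ZIP
    // local file header begins with the magic bytes "PK\x03\x04".
    let read = file.read_at(&mut buf, 0)?;
    println!("read {read} bytes: {:x?}", &buf[..read]);
    Ok(())
}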

I like to imagine a ZIP archive containing two compressed, but still large, files, on which we want to perform some independent operation, like counting the number of words in each. A single-threaded streaming implementation would leave the disk idle too much of the time: IO throughput very likely exceeds decompression throughput, especially with the slower but time-honored DEFLATE compression and today’s fast disks.

One alternative would be to task an IO thread with slurping up the compressed bytes and whisking them away to worker threads for decompression and word counting. But our space complexity becomes more difficult to reason about, as the in-flight byte buffers need to hold the files in memory. Worst case, we find ourselves with nearly the entire archive in memory.
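
As a sketch of that architecture (the channel plumbing here is hypothetical), an unbounded channel is exactly where the compressed bytes pile up when the reader outpaces the decompressor:

use std::sync::mpsc;

fn main() {
    // An unbounded channel: if IO outruns decompression, every queued
    // chunk sits in memory, in the worst case approaching the size of
    // the archive itself.
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    let io_thread = std::thread::spawn(move || {
        for _ in 0..100 {
            let compressed_chunk = vec![0u8; 1 << 16]; // stand-in for file bytes
            tx.send(compressed_chunk).expect("worker alive");
        }
    });

    let worker = std::thread::spawn(move || {
        // recv() returns Err once the IO thread drops its sender
        while let Ok(chunk) = rx.recv() {
            // ...decompress and count words here...
            let _ = chunk;
        }
    });

    io_thread.join().unwrap();
    worker.join().unwrap();
}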

This is where pread comes into play. Each worker thread is free to issue preads on the underlying ZIP archive file without affecting the others, allowing concurrent processing.
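
A sketch of the idea below; the entry offsets and lengths are made up, as a real implementation would parse them out of the ZIP central directory:

use std::os::unix::fs::FileExt;

fn main() -> std::io::Result<()> {
    let file = std::fs::File::open("archive.zip")?; // hypothetical archive
    // (offset, length) of each compressed entry, normally parsed from
    // the archive's central directory
    let entries = [(100u64, 4096usize), (8192, 4096)];
    std::thread::scope(|s| {
        for (offset, len) in entries {
            let file = &file;
            s.spawn(move || {
                let mut buf = vec![0u8; len];
                // Each worker reads its own region; pread leaves the shared
                // cursor untouched, so no coordination is needed.
                file.read_exact_at(&mut buf, offset).expect("to read entry");
                // ...decompress and count words here...
            });
        }
    });
    Ok(())
}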

Traditional vs pread

The first question I had was: is there any performance impact if we use pread to read sequentially through a file? So I set up a benchmark to read a 7.5GB file.

use std::{io::Read, os::unix::fs::FileExt};

fn main() {
    let args = std::env::args().collect::<Vec<_>>();
    let path = args.get(2).expect("Usage: <method> <file>");
    let mut file = std::fs::File::open(path).expect("to open file");
    let mut buf = vec![0; 1 << 16];

    match args.get(1).expect("Usage: <method> <file>").as_str() {
        // plain read() advances the file cursor on every call
        "traditional" => loop {
            let read = file.read(&mut buf).expect("to read file");
            if read == 0 {
                break;
            }
        },
        "pread" => {
            let mut offset = 0;
            loop {
                // read_at delegates to pread underneath the covers
                let read = file.read_at(&mut buf, offset).expect("to read_at file");
                if read == 0 {
                    break;
                }
                offset += read as u64;
            }
        }
        _ => panic!("Invalid method"),
    }
}

The result: a tie. There is no performance penalty for using pread sequentially.

Windows

On Windows, pread is not available. This can be worked around in a few ways:

  • The positioned-io Rust crate creates a memory-mapped buffer with Win32 APIs like MapViewOfFile
  • The .NET team documented that they were able to reconcile pread and Windows’ ReadFile by ignoring offset reads when the file isn’t seekable.
  • Another solution is to emulate pread by using a mutex to obtain exclusive access, so that one can seek to and read from the desired position before restoring the original position upon completion. This is what Go does. In fact, Go has an interface that provides this functionality: ReaderAt.

The solution the .NET team came up with seems optimal, but I’m interested in testing the performance trade-off of Go’s mutex implementation (ported to Rust).

#[cfg(not(unix))]
use std::io::{Read, Seek, SeekFrom};
#[cfg(unix)]
use std::os::unix::fs::FileExt;

// The same interface Go exposes for positioned reads
trait ReaderAt {
    fn read_at(&self, buf: &mut [u8], offset: u64) -> std::io::Result<usize>;
}

#[cfg(not(unix))]
struct MyFile(std::sync::Mutex<std::fs::File>);

#[cfg(unix)]
struct MyFile(std::fs::File);

impl ReaderAt for MyFile {
    #[cfg(not(unix))]
    fn read_at(&self, buf: &mut [u8], offset: u64) -> std::io::Result<usize> {
        // hold the lock for the entire seek, read, and restore sequence
        let mut lock = self.0.lock().unwrap();
        let original_position = lock.stream_position()?;
        lock.seek(SeekFrom::Start(offset))?;
        let result = lock.read(buf);
        lock.seek(SeekFrom::Start(original_position))?;
        result
    }

    #[cfg(unix)]
    fn read_at(&self, buf: &mut [u8], offset: u64) -> std::io::Result<usize> {
        self.0.read_at(buf, offset)
    }
}

impl From<std::fs::File> for MyFile {
    #[cfg(not(unix))]
    fn from(file: std::fs::File) -> Self {
        Self(std::sync::Mutex::new(file))
    }

    #[cfg(unix)]
    fn from(file: std::fs::File) -> Self {
        Self(file)
    }
}

In a single-threaded benchmark, the emulated pread implementation performed no worse than any other method, even on memory-constrained spinning disks, where I imagined each seek would move the disk actuator into place, making all the seeking back and forth troublesome. Based on benchmarks, this mental model is incorrect; seeking forward must just update an offset within the file handle at the OS level.

This observation lends credibility to the .NET team’s decision to implement traditional sequential reads as positioned reads, as there is no performance loss when controlling the file position yourself. As a result, .NET tracks the file offset only in memory and does not synchronize it with the OS, netting a reported 25x improvement in seek benchmarks.
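
To make that concrete, here’s a sketch of the same idea in Rust (the type and its fields are my own invention): keep the cursor in user space, issue every read as a positioned read, and turn Seek into pure arithmetic:

use std::io::{Read, Seek, SeekFrom};
use std::os::unix::fs::FileExt;

// Tracks the cursor in memory; every read is a positioned read.
struct OffsetFile {
    file: std::fs::File,
    pos: u64,
    len: u64,
}

impl OffsetFile {
    fn new(file: std::fs::File) -> std::io::Result<Self> {
        let len = file.metadata()?.len();
        Ok(Self { file, pos: 0, len })
    }
}

impl Read for OffsetFile {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let read = self.file.read_at(buf, self.pos)?;
        self.pos += read as u64;
        Ok(read)
    }
}

impl Seek for OffsetFile {
    fn seek(&mut self, pos: SeekFrom) -> std::io::Result<u64> {
        // No syscall here: just arithmetic on the in-memory cursor
        self.pos = match pos {
            SeekFrom::Start(n) => n,
            SeekFrom::End(n) => self.len.checked_add_signed(n).expect("seek in range"),
            SeekFrom::Current(n) => self.pos.checked_add_signed(n).expect("seek in range"),
        };
        Ok(self.pos)
    }
}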

Multithreading

Let’s add contention to the mix: divide the file into 4 chunks and allocate a thread to read each chunk.

let file = std::fs::File::open(path).expect("to open file");
let len = file.metadata().expect("to get metadata").len();
let file = MyFile::from(file);
std::thread::scope(|s| {
    let chunk = len / 4;
    for i in 0..4 {
        let f = &file;
        s.spawn(move || {
            let mut buf = vec![0; 1 << 16];
            let mut offset = i * chunk;
            // the last chunk absorbs any remainder of the division
            let end = if i == 3 { len } else { (i + 1) * chunk };
            while offset < end {
                // clamp the read so each thread stays within its chunk
                let wanted = buf.len().min((end - offset) as usize);
                let read = f.read_at(&mut buf[..wanted], offset).expect("to read_at file");
                if read == 0 {
                    break;
                }
                offset += read as u64;
            }
        });
    }
});

In benchmarks, I forced the emulated version to run and noticed that it experienced slightly more than a 2x slowdown compared to the single-threaded version, whereas pread had no such degradation.

A 2x slowdown sounds bad, but this is a worst-case scenario: multiple threads performing only IO. If the performance gap is still irritating, do what the .NET team did and always invoke the ReadFile Win32 API with an offset, which a Rust File on Windows offers via seek_read. It’s not quite an exact match, as seek_read delegates to NtReadFile due to soundness concerns and will mutate the underlying file cursor – but one can paper over these issues as needed.
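
For reference, here’s a sketch of a positioned read built on seek_read (the wrapper function is mine; seek_read itself is Rust’s standard Windows API):

#[cfg(windows)]
fn read_at_windows(
    file: &std::fs::File,
    buf: &mut [u8],
    offset: u64,
) -> std::io::Result<usize> {
    use std::os::windows::fs::FileExt;
    // Reads at the given offset, but unlike pread it also moves the
    // underlying file cursor as a side effect.
    file.seek_read(buf, offset)
}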

Conclusion

Both Go and .NET use positioned IO (real or emulated) to great effect to ease file IO under concurrent scenarios. Go exposes a ReaderAt interface, while all file IO in .NET is positioned.

As benchmarked, the overhead of pread is minimal:

  • No overhead for sequential preads from a single thread
  • No overhead for concurrent preads either
  • No overhead to emulate pread for single-threaded reads
  • Only a 2x overhead to emulate concurrent preads under a worst-case scenario

Positioned IO makes it easier to fan out CPU bound processing code across multiple streams within the same file – exactly what a Zip implementation needs.
