There and back again with zstd zips

In pdx.tools, users have the option to upload their EU4 save files. Behind their .eu4 extension, these files are typically zips; zips that aren’t compressed very well. Extracting and recompressing the files nets around a 20% reduction in size.

As the person who pays for file storage, I don’t find storing poorly compressed zips appetizing. I can understand why a game may prefer a lower compression level, favoring throughput over compression ratio so that background snapshots consume fewer resources.

A couple years ago, I implemented a solution for better long term storage:

Optimizations found in how saves are uploaded, downloaded, and stored at rest resulted in a 2-3x reduction in bandwidth and up to a 2x reduction in time to parse shared saves

[…]

The solution is to use a modern compression algorithm called brotli. I sampled a few saves and found it compressed 2x better than gzip and 3x better than [the poorly compressed] zips. Since brotli only performs compression, we needed an archival format where one can store files. That’s where tar comes into view. In short, when a user uploads a save, we extract the files from the zip and place them into a tar file before the entire payload is brotli compressed

So files went from Deflated zips to brotli’d tarballs.

But then cracks started to show. The most visible problem with the tar format was that it required a lot of internal plumbing to efficiently accommodate decompressed, disparate files. It was also a source of bugs, as the tarball branch is the path less taken, exercised only when accessing an uploaded save.

Since brotli lacks a magic number, it is difficult to identify brotli-compressed data. And even to access a tiny file within the tar, one pays the brotli decompression price for the entire archive.
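To make the identification problem concrete, here’s a minimal sniffing sketch (my own illustration, not code from the app): zips and gzip streams announce themselves with well-known signatures, while a raw brotli stream offers nothing comparable to check.

function sniffFormat(bytes) {
  // "PK\x03\x04" is the start of a zip local file header
  if (bytes[0] === 0x50 && bytes[1] === 0x4b) return "zip";
  // 1F 8B is the gzip magic number
  if (bytes[0] === 0x1f && bytes[1] === 0x8b) return "gzip";
  // A raw brotli stream has no magic number, so it can't be ruled in or out
  return "unknown";
}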

I flirted with translating all tarballs to asar, a format where the data for files is laid out contiguously, which allows it to mirror the uncompressed EU4 save file format. The asar format is not ubiquitous, so I had to write my own encoder and decoder. In the end, while asar would have fixed the internal plumbing issues, I wasn’t taken with the idea of maintaining a file format or with decompressing the entire archive for one file, so I shelved it.

I started to have second thoughts about brotli too, and with the advent of the DecompressionStream API, I gave in to these thoughts and wrote a compression benchmark. I had expected the new API, which only speaks the deflate family, to trounce the competition, but the Wasm implementations won.
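For reference, the new API is pleasantly small. A sketch of decompressing a gzip response (the URL is made up):

// DecompressionStream only speaks the deflate family: "gzip",
// "deflate", and (in newer engines) "deflate-raw".
const resp = await fetch("/saves/example.gz"); // hypothetical URL
const stream = resp.body.pipeThrough(new DecompressionStream("gzip"));
const bytes = new Uint8Array(await new Response(stream).arrayBuffer());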

Further investigation was needed. Earlier, I quoted myself saying that switching to brotli reduced the time to parse files. The reasoning was that browsers can transparently decompress brotli-encoded content, unlocking the efficiency gains that come from avoiding a user-space decompression implementation.

At least that was the thought.

A few weeks ago, I decided to reexamine zips. I had known that individual files in a zip could sport different compression algorithms, but I had never tried anything other than Deflate. Looking over section 4.4.3 of the zip file format specification, I noticed that Zstd is supported, but not brotli.
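Each zip entry records its compression method in a two-byte field: Deflate is method 8 and Zstandard is method 93. A small sketch of my own (assuming the archive starts with an ordinary local file header) to peek at the first entry’s method:

function firstEntryCompressionMethod(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Local file header signature: "PK\x03\x04"
  if (view.getUint32(0, true) !== 0x04034b50) throw new Error("not a zip");
  // Compression method is at offset 8: 0 = stored, 8 = Deflate, 93 = Zstandard
  return view.getUint16(8, true);
}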

This caught my interest, as Zstd was a top tier performer in the compression benchmarks. Now it was time to test it within the app on a typical save file of around 7.7 MB, as benchmark results should only be extrapolated so far.

Brotli vs Zstd

First, I measured the reduction in file size and how long it took for the input zip to be transcoded into the new format.

Name         Reduction    Elapsed (ms)
Rezipping    17%
zstd (3)     40%          463
zstd (5)     45%          755
zstd (7)     50%          1256
brotli (4)   32%          1481
brotli (9)   54%          4210

Table parentheses represent compression level. Overall, Zstd is faster and achieves a higher compression ratio.

Next up is the Wasm payload size for transcoding; reported sizes are compressed.

At first glance, both appear heavyweight, and they are; there’s no doubt about that. But for pure zstd encoding, the payload size drops to 136 kB. Brotli doesn’t see any additional benefit.

The good news is that no payload size increase was measured when adding a pure zstd decoder to the app, provided fat-lto is enabled in zstd-rs.

For the final test, parsing performance, the time to fetch the file from Chrome’s disk cache must be included, as the transparent brotli decoding triggered by the Content-Encoding header isn’t free. Once this is accounted for, brotli and zstd were neck and neck: brotli had a faster parse time since user space didn’t need to decompress data, while zstd had faster disk cache fetches since Chrome didn’t need to decompress anything.
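A sketch of how such a measurement might look, where the endpoint and parseSave are hypothetical stand-ins for the app’s actual route and Wasm parser:

const start = performance.now();
const resp = await fetch("/api/saves/example"); // hypothetical endpoint
const bytes = new Uint8Array(await resp.arrayBuffer());
const fetched = performance.now();
const save = parseSave(bytes); // placeholder for the Wasm parser
const parsed = performance.now();
console.log(`fetch: ${Math.round(fetched - start)} ms, parse: ${Math.round(parsed - fetched)} ms`);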

I’m smarting a bit from too eagerly adopting the brotli tarball, as this represents a clear win for zstd zips.

This is quite the testament to Wasm engines that native implementations can be replaced without loss. In fact, this is a reason why browsers have been hesitant to advocate for JS compression stream APIs (source).

Zips are dead, long live zips!

There are a few areas for improvement. Creating and extracting zip archives with zstd-compressed files isn’t widely supported: the built-in Windows Explorer doesn’t allow it, and 7zip is only slated to receive support soon. For now, one needs to rely on a fork of 7zip (or, for Linux users, a fork of p7zip) to operate on zstd zips.

While I don’t think there’ll ever be a strong uptake in asar adoption, I can see widespread support for zstd zips within a few years.

The hardest question might be what the best zstd compression level is. If we assume that users have a 30 Mb/s uplink, then any compression level that shaves off more than 30 Mb for each additional second spent compressing is a win. Running the numbers, we arrive at the following results, which include time to transfer + transcode.
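A back-of-the-envelope version of that calculation, using the figures from the table above and assuming the reductions are relative to the original ~7.7 MB zip:

const uplinkMBps = 30 / 8; // a 30 Mb/s uplink moves 3.75 MB/s
const levels = [
  { name: "zstd (3)", reduction: 0.40, transcodeMs: 463 },
  { name: "zstd (5)", reduction: 0.45, transcodeMs: 755 },
  { name: "zstd (7)", reduction: 0.50, transcodeMs: 1256 },
];
for (const { name, reduction, transcodeMs } of levels) {
  const sizeMB = 7.7 * (1 - reduction);
  const totalMs = (sizeMB / uplinkMBps) * 1000 + transcodeMs;
  console.log(`${name}: ${Math.round(totalMs)} ms`); // ≈ 1695, 1884, and 2283 ms
}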

Zstd level 3 looks enticing, but it would require 12.5% and 25% more capacity than levels 5 and 7, respectively. And when one has to pay storage costs, pushing more compute to the client can be a worthwhile tradeoff. Interestingly, AWS Athena documents that levels 6-9 are preferred when not using the default level of 3. Level 7 falls nicely within that range.

Zstd has a lot of other parameters that I’ve decided to ignore, like long distance matching, which would be beneficial but has the significant downside of requiring all clients decompressing the data to have the same amount of memory as the client that compressed it (source).

I’ve also ignored other Deflate implementations when comparing zip creation. Implementations like libdeflate set a new standard when it comes to performance, and as I outlined in DEFLATE yourself for faster Rust Zips, I observed libdeflate improve throughput by 80%. Unfortunately, the zip-rs crate doesn’t quite have the extension points where alternate implementations can be substituted in during zip creation. This might be speculating too far, but I reason that a libdeflate alternative wouldn’t be a contender as its compression ratio would suffer.

Content-Encoding: a double-edged sword?

I used to view content encoding as a clear win compared to decompressing in user space. But I think middlemen are an issue, and cutting them out can reduce the headaches.

I recently came across a bug with next.js where the content encoding of a request changes depending on whether the server was launched in debug mode or production. This bug is preventing me from upgrading to newer versions of next.js.

I’ve had issues where requests and responses that are brotli encoded take an inordinate amount of time when testing behind miniflare, which makes me think that miniflare is decompressing the content and re-serving it.

Serving precompressed assets is a bit of a pain. Cloudflare Pages doesn’t support it, and Cloudflare Workers need coaxing:

// getAssetFromKV and mapRequestToAsset come from @cloudflare/kv-asset-handler;
// cacheControl is defined elsewhere in the worker.
let resp = await getAssetFromKV(event, { mapRequestToAsset, cacheControl });

// Make a new response with the same body but using manual encoding.
if (event.request.url.endsWith(".bin")) {
  resp = new Response(resp.body, {
    status: resp.status,
    headers: resp.headers,
    encodeBody: "manual"
  });
  resp.headers.set("Content-Encoding", "br");
  return resp;
}

For Next.js, the configuration looks different:

async headers() {
  return [
    {
      source: "/:path*.bin",
      headers: [
        {
          key: "Content-Encoding",
          value: "br",
        },
      ],
    },
  ];
}

Writing code specific to a vendor’s environment is the definition of vendor lock-in.

It may seem odd to want to serve precompressed assets, but sometimes cranking the brotli quality to 11 can have a significant impact that I want users to benefit from. Or more importantly, it makes it easier to host large, trivially compressible files that would otherwise run afoul of size limits.

My testing here makes me think that when I need precompressed assets, I should consider using zstd.

Comments

If you'd like to leave a comment, please email [email protected]