The WebAssembly value proposition is write once, not performance

In his 2024 programming predictions video, Theo downplayed WebAssembly (Wasm):

I don’t think [Wasm is] going anywhere next year. […]

When I see all of these new ways of building Rust based web applications, I cringe a tiny bit […], the size of the binaries are still pretty bad, the performance wins aren’t there, and the flexibility of a language like Rust doesn’t really mesh with the magic of composability on the frontend.

Theo’s viewpoint isn’t wrong. Web apps where I heavily employ Wasm don’t use Rust UI frameworks like leptos, and I agree with Theo that this isn’t changing in 2024, for the frustratingly simple reason that Wasm-driven UIs don’t unlock new potential or make development easier.

But there’s so much more to Wasm than Rust UI frameworks.

Let’s look at the specific points brought up:

Performance can be a wash

I agree with the video’s sentiment that speedups may not materialize simply because Wasm is included in the equation, but the video leads viewers astray by associating Wasm only with performance.

I think a lot of the wars around web performance are not the most well thought out, and even React’s performance isn’t that bad. […]. Wasm frameworks have a ton of potential for performance, but not in how quickly they can update the DOM because that binding is done through JavaScript most of the time.

The magic of Wasm is the ability to do heavy computations that you don’t normally have the ability to do in JavaScript like really complex math for rasterizing an image or applying a filter or a mask.

I can’t help but feel like there is widespread tunnel vision on the performance aspect of Wasm, which leads to misunderstanding. Not that Wasm isn’t or can’t be fast, but one should have realistic expectations: even when focusing on pure computation (i.e., excluding any web APIs), Wasm may not be the fastest.

In an application benchmark, I noted that a “native Node.js module is approximately 1.75x-2.5x faster than [Wasm]”.

In a hashing microbenchmark using Wasm SIMD, I reported that Wasm was 4x slower than native at large payloads, in part because Wasm SIMD is limited to 128-bit registers while the native implementation uses 256-bit AVX2. At smaller payloads, though, Wasm came out ahead, as its call overhead is less than that of N-API.

Even in comparisons with JS implementations, there may not be a clear winner. I wrote a compression shootout site where one can pit Wasm against JS implementations, and one of the takeaways was:

Comparing Wasm Deflate vs JS Deflate, in decompression, miniz is 2x faster than fflate on Firefox, while fflate is 2x faster on Chrome in compression

This was surprising as I imagined that compression is a “heavy computation” and Wasm would have handily beaten JS.

Don’t gravitate towards Wasm under the guise of performance. Always profile first. Make sure the juice is worth the squeeze.

I’m not a fan when I hear others insinuate Wasm is a performance silver bullet, but I can’t fault anyone for this viewpoint, as WebAssembly.org perpetuates it by ordering “Efficient and fast” first among its features. And it certainly seems intuitive that compiled C code should be faster than weakly typed JS, but don’t forget that a decade and a hundred million dollars had probably been invested in optimizing browser engines before Wasm even existed.

I figured I’d write this to elaborate on the non-performance features of Wasm, which is rich coming from me, as I believe performance is a feature and I keep a running log of how performance has influenced the development of an application. You know I’m not one to discount the importance of performance. It’s just not how I see Wasm.

Write once

While there are ways to embed Wasm runtimes in other languages, so that one can distribute a Wasm binary and achieve write once, run anywhere, I see the magic of Wasm as write once, compile anywhere: it lets code originating from C, C++, and, most importantly for this article, Rust run on web platforms.

To summarize the current state of an article I wrote over three years ago about how my bet on Rust has been vindicated: the same Rust code base is deployed to a myriad of environments, including servers, CLIs, shared libraries, and, via Wasm, client browsers.

All of these environments see the same behavior (ignoring deployment synchronization issues). Without Wasm, how would I ensure client browsers interpret input the same way as everything else? I most likely would have dropped that use case, as it would have been too difficult to reimplement everything in JS.
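To make that concrete, the layering looks something like this (a simplified sketch with invented names, not my actual code): one core function of plain Rust that servers, CLIs, and shared libraries call directly, with a thin wasm-bindgen shim for the browser.

```rust
use serde::Serialize;
use wasm_bindgen::prelude::*;

#[derive(Serialize)]
pub struct Summary {
    pub records: u32,
}

/// Core logic: plain Rust, compiled natively for servers, CLIs, and
/// shared libraries.
pub fn summarize(data: &[u8]) -> Result<Summary, String> {
    if data.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(Summary {
        records: data.iter().filter(|&&b| b == b'\n').count() as u32,
    })
}

/// The browser target adds only this shim, so the behavior is whatever
/// the core function says it is.
#[wasm_bindgen]
pub fn summarize_wasm(data: &[u8]) -> Result<JsValue, JsValue> {
    let summary = summarize(data).map_err(|e| JsValue::from_str(&e))?;
    serde_wasm_bindgen::to_value(&summary).map_err(Into::into)
}
```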

You better believe new browser APIs unlock new possibilities, and Wasm is no exception.

Consistency is king

Keeping these environments consistent is far more important than performance. Here are a couple of examples from my own projects, off the top of my head.

The file input uses an imaginary calendar, which is essentially the Gregorian calendar without leap years, with negative years representing BCE. Adding a given duration to a date in this calendar is not computationally intensive, but it is error prone. For example, if we add a day to -3.12.31, is it -4.01.01, -2.01.01, or -3.12.30?
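Something like the following captures the shape of the problem (a minimal sketch, not the actual implementation; in particular, how years roll over below zero is an assumption here):

```rust
// Gregorian month lengths, but no leap years: February is always 28 days.
const DAYS_IN_MONTH: [u8; 12] = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Date {
    year: i32, // negative years represent BCE
    month: u8, // 1-12
    day: u8,
}

fn add_day(d: Date) -> Date {
    if d.day < DAYS_IN_MONTH[usize::from(d.month - 1)] {
        Date { day: d.day + 1, ..d }
    } else if d.month < 12 {
        Date { month: d.month + 1, day: 1, ..d }
    } else {
        // Assumption for illustration: the year simply increments, so the
        // day after -3.12.31 is -2.01.01. Whether that's right is exactly
        // the kind of decision I only want to encode once.
        Date { year: d.year + 1, month: 1, day: 1 }
    }
}
```

None of this is hard, but every reimplementation is another chance to answer the rollover question differently.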

The input parsing logic that ships in servers, CLIs, shared libraries, and Wasm is for an undocumented, proprietary file format. I need to be able to push out new discoveries about this file format to all environments as conveniently as possible.

Others recognize the benefit of reusing proven production code too. It’s why work was done to compile the image processing library Sharp to Wasm: no need to reinvent libvips and all of its dependencies. The Sharp Wasm benchmarks are saved for the end, as they show the same performance story we’ve seen before: sometimes Wasm can be very competitive with native implementations, and sometimes it is not.

I, and others, don’t need Wasm to be the fastest for it to be the right choice. Developer productivity is more important, and rewriting the same logic in JS (and other languages) is not productive.

Admittedly, the value of “write once” is watered down if the codebase is in Python or another language, but I have to imagine that if one chooses Python then interoperability isn’t very high on the list.

Composability

Setting aside the Wasm Component Model, which, once fleshed out and standardized, will allow composition of Wasm modules, the composability story of Wasm inside a UI driven by React is already good enough.

The mental model that has worked best for me is to think of the functionality exposed in Wasm as a remote server call with negligible latency.
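Concretely, an exported Wasm function has the same shape as a request handler: input in, result or error out. A made-up example:

```rust
use wasm_bindgen::prelude::*;

/// Hypothetical export. Hosted in a web worker, calling this from the UI
/// is indistinguishable from awaiting a fetch: bytes go in, a result or
/// an error comes back asynchronously.
#[wasm_bindgen]
pub fn analyze(data: &[u8]) -> Result<String, JsValue> {
    if data.is_empty() {
        return Err(JsValue::from_str("no data provided"));
    }
    Ok(format!("analyzed {} bytes", data.len()))
}
```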

Many web developers will already be familiar with TanStack Query for managing requests. The good news is that TanStack Query is a generic state management library over any asynchronous task, so we can leverage it to communicate with Wasm housed in a web worker, keeping operations off the UI thread. I showcased this a bit in a previous article, Rethinking web workers and edge compute with Wasm.

I don’t think anyone would argue that TanStack Query prohibits composability, so use it or create your own hooks with a similar API.

The biggest hurdles are probably web worker ergonomics and bundling Wasm, but these are solvable with comlink and any bundler that can digest assets (or any of the other ways I outlined for publishing and consuming Wasm). And others are hard at work making this ceremony even easier in the future with JS import attributes.

For those looking for type-safe Wasm communication between Rust and TypeScript, I can recommend tsify: in spite of its quirks, it found a couple dozen errors amongst a thousand lines of hand-coded types.
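For the unfamiliar, tsify derives the TypeScript definitions from the Rust types themselves, so the two sides can’t silently drift. A rough sketch (the struct is invented):

```rust
use serde::{Deserialize, Serialize};
use tsify::Tsify;

/// The derive emits a matching TypeScript interface for this type, and
/// the wasm_abi attributes let it cross the Wasm boundary directly in
/// #[wasm_bindgen] function signatures.
#[derive(Serialize, Deserialize, Tsify)]
#[tsify(into_wasm_abi, from_wasm_abi)]
pub struct SaveMeta {
    pub version: u16,
    pub player: String,
}
```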

Wasm size

Another complaint against Wasm is the size of the payload.

Using Go? You’re not going to get Wasm bundles smaller than 2MB. That’s horrendous, but blame the source language, not Wasm, which can be as small as a couple hundred bytes.

If one isn’t using a toolset geared towards Wasm, then bloat should be expected. Rust is able to produce very small and efficient Wasm:

The Wasm size for computing the HighwayHash is 2kB (compressed). There’s a whole story behind avoiding memory allocations to achieve such a minimal payload, and I’d bet $5 that an equivalent JS implementation couldn’t fit within 2kB (and you already know I won’t be making any bets on performance differences).
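The gist of that story: if every export operates only on caller-provided buffers, no allocator ever gets compiled into the module. A sketch of the shape (with FNV-1a standing in for HighwayHash):

```rust
/// No allocation, no std: the export reads a caller-provided buffer and
/// returns a value, which keeps the compiled module tiny.
#[no_mangle]
pub extern "C" fn hash64(ptr: *const u8, len: usize) -> u64 {
    let data = unsafe { core::slice::from_raw_parts(ptr, len) };
    // FNV-1a, a stand-in for the real HighwayHash rounds.
    data.iter().fold(0xcbf29ce484222325u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x100000001b3)
    })
}
```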

Even when not optimizing for size, Wasm can still be competitive. For instance, the JS deflate decompressors pako and fflate are 15 and 12kB respectively, while miniz compiled to Wasm is 25kB. Is the Wasm bigger? Yes, but a 10kB difference is not a dealbreaker for me, and that size includes the foundations of memory allocation and standard library data structures, which will be amortized across additional uses in an application’s Wasm bundle. If minimizing size were the top priority, I’d be using the builtin Compression Streams API.

I’d go as far as to say that the fact that a zstd decoder can fit in 48kB is a game changer. No need to wait for browser or proxy support to try out new compression codecs.

No language makes it easier than Rust to dial in the desired compromises for the sake of binary size. In the dark side of inlining and monomorphization, changing a couple lines of code to use dynamic dispatch for serde deserialization reduced the Wasm size by half, at the cost of 35% increased latency when deserializing.
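The kind of change in question (a sketch, not the actual diff): erase a generic parameter so the deserializer is compiled once instead of once per concrete type.

```rust
use std::io::Read;

/// Monomorphized: serde_json instantiates a whole new copy of the
/// deserializer for every concrete reader type.
fn parse_generic<R: Read>(reader: R) -> serde_json::Result<serde_json::Value> {
    serde_json::from_reader(reader)
}

/// Dynamic dispatch: one instantiation total, at the cost of a vtable
/// call per read.
fn parse_dyn(reader: &mut dyn Read) -> serde_json::Result<serde_json::Value> {
    serde_json::from_reader(reader)
}
```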

I love when tradeoffs like this exist! Sometimes binary size is priority number one, and the performance sacrifice is worth it. We, as programmers, have the power to choose what is most appropriate for our situation.

Closing thoughts

Some may say that Wasm isn’t so different from Java and the JVM, so I’ll point them to more informed discussions, but there’s a reason that Wasm “won”.

Wasm is the key to unlocking the full potential of the web by practically allowing the same codebase to also target the web, and Wasm is fast enough, slim enough, ergonomic enough, and safe enough to not hinder this mission.

Comments

If you'd like to leave a comment, please email [email protected]