The subjective nature of JS libraries exposing an off the main thread API

For the uninitiated, JavaScript environments like node.js and the browser have a main thread that runs basically everything. Exhausting the main thread has consequences; as MDN puts it: “long-running JavaScript functions can block the thread, leading to an unresponsive page and a bad user experience”. In the context of this article, responsiveness describes the UI’s ability to respond to input and keep animations fluid, not its ability to adapt to different screen sizes.

If long-running or computationally intense functions on the main thread are bad, whose responsibility is it to offload them: the application developer or the library author?

We need to approach this question from several angles, so let’s start by pretending we are publishing a library:

export function compute() {
  let sum = 0;
  for (let i = 0; i < 200000000; i++) {
    sum += i;
  }
  return sum;
}

The example has nothing to hide. The computation is synchronous. But let’s muddy the waters by adding a keyword:

-export function compute() {
+export async function compute() {

Now, library users calling the function will write await compute() and may think that the API offloaded the workload, but they would be sorely mistaken.
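To see the deception in action, here’s a hypothetical harness (not part of the library) timing the call:

import { compute } from "./compute";

const start = performance.now();
// The async keyword moved nothing: the loop runs to completion on the
// main thread before the returned promise ever resolves.
const result = await compute();
console.log(`main thread blocked for ${performance.now() - start}ms`, result);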

The example, while contrived, is not far from real-life libraries. Not to pick on hash-wasm, as it is far from being alone (and the example is cherry-picked), but one can write the following code to hash a payload:

import { sha1 } from 'hash-wasm';
const result = await sha1('demo');

One could be forgiven for thinking the hashing is offloaded, but the actual computation of the SHA-1 is performed on the main thread and is blocking. The hash function returns a promise only because the Wasm initialization step is asynchronous. There is nothing in the function signature to communicate that, for large payloads, the synchronous code can far outweigh the cost of initialization.
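Setting hash-wasm’s internals aside, the general shape of such an API looks something like the sketch below, where initWasm and hashSync are hypothetical stand-ins:

declare function initWasm(): Promise<void>;
declare function hashSync(data: string): string;

export async function sha1(data: string): Promise<string> {
  // The only asynchronous step: a one-time Wasm compilation/instantiation.
  await initWasm();
  // The actual hashing is synchronous, runs on the caller's thread, and
  // scales with the size of the payload.
  return hashSync(data);
}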

It’s all about the pit of success. If I’m working with a function whose runtime scales with the given parameters, and I see the function returns a promise, I shouldn’t fault my intuition for thinking that the computation is off the main thread. After all, this intuition is born from exposure to promise-based platform functions like SubtleCrypto.digest(), and to older APIs that use callbacks to offload tasks.

My intuition isn’t always right for platform APIs. The poster child for promises, fetch, has Response.json(), which returns a promise but parses the JSON on the main thread. I had to do way more investigation than anticipated to determine this behavior. In fact, the documentation for the APIs I mentioned earlier omits this crucial detail too. Now, I’m second-guessing myself. Do some of those examples block the main thread too? I desperately wish for MDN to include this information in the docs. I can understand if the various API specs leave the behavior as a browser implementation detail, but it would be invaluable to see if browser implementations agree that certain APIs do not run on the main thread.
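For anyone wanting to reproduce the investigation, the Long Tasks API (Chromium-based browsers only at the time of writing) is one way to catch a function red-handed:

// Log any main thread task that runs longer than 50ms.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.warn(`long task: ${Math.round(entry.duration)}ms`);
  }
}).observe({ entryTypes: ["longtask"] });

const resp = await fetch("http://example.com/movies.json");
await resp.json(); // a large enough payload surfaces as a long task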

Back to JSON parsing. Parsing data is unnatural for a UI thread, but there’s not really a better option when the primitive for achieving concurrency on the web, web workers, is based around message passing, where values are cloned at the same speed as JSON parsing and exceptionally few types can be transferred cheaply. So nothing is gained by parsing the JSON in a web worker and passing the object back to the UI thread, as the object is cloned. To see any benefit, one would need to whittle the parsed JSON down in a web worker before returning it, which seems like something that would greatly complicate interfacing with Response.json().
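As a sketch of that whittling, here’s a raw worker over a hypothetical movie payload that parses the full response and posts back only a small slice:

/// <reference lib="webworker" />
// Parse the large payload off the main thread and post back only the
// slice the UI needs, so the structured clone covers a small object
// instead of the whole document.
self.onmessage = async (e: MessageEvent<{ url: string; count: number }>) => {
  const resp = await fetch(e.data.url);
  const movies: { title: string; rating: number }[] = await resp.json();
  movies.sort((a, b) => b.rating - a.rating);
  self.postMessage(movies.slice(0, e.data.count));
};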

There are essentially two problems being touched upon. One is that, even in platform APIs, synchronous code with a runtime correlated with the input is smuggled inside functions declared as asynchronous. And two, there are many constraints and limitations when working with concurrency on the web.

CPU bound code inside an async method

I’ve previously recommended library authors avoid writing async methods that contain a significant amount of blocking code as it can be deceptive. Instead, make it the caller’s responsibility to thread the individual sync and async calls together, so it is more apparent what the cost to the main thread will be. In a twisted way, I’m advocating that instead of writing:

const resp = await fetch('http://example.com/movies.json');
const data = await resp.json();

to write:

const resp = await fetch('http://example.com/movies.json');
const rawBody = await resp.arrayBuffer();

const decoder = new TextDecoder();
const data = JSON.parse(decoder.decode(rawBody));

Great (said sarcastically), make others write longer, more error-prone code for the sake of “purity”. In this alternate reality, there’d be a fetchJson npm package within a week, millions of downloads, thousands of dependents, and a supply chain attack once it reached ubiquity.

In this aspect, I’m a realist. Most developers can and should stick to the Response.json() that I so rashly called deceptive. I know I will. Most of my apps don’t show text decoding and JSON parsing as hot spots in profiles. But when they do, I move the parsing to a web worker and communicate back only the subset of the data that is required.

The advice is targeted more at library developers: focus on exposing an intuitive API over a convenient one. This is why, in my own libraries, and in libraries like tree-sitter wasm, we force the asynchronous initialization first, before the synchronous parsing.
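Approximately, web-tree-sitter’s flow looks like the following (API recalled from memory, so double-check against its docs):

import Parser from "web-tree-sitter";

await Parser.init(); // asynchronous, once: fetch and compile the Wasm payload
const JavaScript = await Parser.Language.load("tree-sitter-javascript.wasm");
const parser = new Parser();
parser.setLanguage(JavaScript);
const tree = parser.parse("let x = 1;"); // synchronous: no promise to mislead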

JavaScript isn’t unique in allowing CPU-bound code inside an asynchronous function. This footgun exists in most languages (if not every language with a similar model of async/await). The difference is that C# programmers can easily call Task.Run(), and Rust programmers have spawn_blocking, but JavaScript programmers have no analogous tool that is as ergonomic at their disposal.

Amusingly, calling Task.Run() must be so convenient in C# that there’s etiquette surrounding its use, and in the “extremely rare edge case” where CPU-bound code is embedded inside an async function, the parent caller should wrap it inside a Task.Run() after learning from the function’s documentation that it is CPU bound.

This supposed rarity of code that combines asynchronous and CPU-bound routines is a daily occurrence for me: instantiating Wasm to run a CPU-bound function over asynchronously read file data. I tend to solve this problem by deliberately stuffing everything inside a web worker. It works, but web workers aren’t doing themselves any favors by being unergonomic.

Web worker ergonomics

Web workers are not easy to use, and people complain about the difficulty. One could write a raw worker that imports functionality by invoking the clunky importScripts(), and pass messages back and forth with postMessage. To preserve sanity, one needs to add comlink and a bundler into the tech stack. Comlink hides the message passing behind an async/await interface, and a bundler allows one to author code in a manner that is nearly seamless with other code in the project.
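To illustrate the difference with a hypothetical square function, first raw message passing, then the comlink equivalent:

// Raw worker: stringly-typed messages in both directions.
// square_worker.js
self.onmessage = (e) => self.postMessage(e.data * e.data);

// main.js
const rawWorker = new Worker("square_worker.js");
rawWorker.onmessage = (e) => console.log(e.data); // 16
rawWorker.postMessage(4);

// With comlink, the worker reads like an ordinary module...
// square_worker.ts
import { expose } from "comlink";
expose({ square: (n: number) => n * n });

// ...and the caller gets an async/await interface.
// main.ts
import { wrap } from "comlink";
const api = wrap<{ square(n: number): number }>(new Worker("square_worker.js"));
console.log(await api.square(4)); // 16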

Successful bundling doesn’t exempt one from falling victim to cross-origin issues if the worker code is hosted on a CDN, or CSP issues when the worker is inlined.

As someone who has quite a bit of experience with bundlers, including writing a couple of plugins, I tend to advocate for minimal use and configuration of bundlers, as they can be inscrutable. With the rise of meta frameworks like next.js that abstract away the bundler, many developers may not even interface with the bundler. This is a good thing. However, I’ve lost countless hours devising workarounds to bugs in how next.js configures the bundler to interface with web workers. At one point, importing shared code in both the main thread and a web worker required a build step to literally copy and paste functionality into two files to be imported separately. To be fair, web worker support has gotten much better in next.js in the last 6 or so months, but there’s a host of open issues surrounding web workers, and when things go wrong, there’s a lot of frustration and a lot of vitriol directed at maintainers.

This frustration is compounded when dealing with Wasm inside a web worker, as that typically means more bundler configuration, trawling through documentation, and unhelpful internet articles. Once one lands on a minimal amount of bundler configuration that gets things working, how long will that last before either the bundler or the meta framework inexplicably breaks? Wasm and web workers shouldn’t be afterthoughts if we want a better web.

I don’t blame bundlers or meta frameworks – at least not entirely. Too often I feel as if browsers and standards bodies blame application developers for poor responsiveness on the web without giving them tools that are easy enough to use to fix the problem.

Again, I’m writing this coming from a decent amount of experience. There are junior devs, tinkerers, or people just trying to get the job done. Why has the web done such a horrible job at making these things easy?

And things aren’t going to get any easier any time soon. As mentioned by Surma in The State Of Web Workers In 2021, there are several parties interested in improving the status quo. The proposal furthest along appears to be JS Module Blocks, but it is advancing slowly and isn’t implemented anywhere.

Libraries

With the advent of APIs such as the File System Access API, more developers will think of awesome ideas that involve a CPU-bound task over files – files that can be quite large.

Hashing is a particularly good example of such an idea: if the official SubtleCrypto.digest() isn’t blocking, then it should follow that other hashing libraries shouldn’t block either, right?
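For reference, the platform call in question is promise-based, leaving the implementation free to schedule the digest off the main thread:

const data = new TextEncoder().encode("demo");
const digest = await crypto.subtle.digest("SHA-1", data); // ArrayBuffer
const hex = [...new Uint8Array(digest)]
  .map((b) => b.toString(16).padStart(2, "0"))
  .join("");
console.log(hex);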

Given the difficulties of writing and bundling web worker code, can we as library developers make it easier for other developers to build responsive applications by including a builtin mechanism for offloading compute?

The good news is that it is possible to expose the desired API, as demonstrated by PapaParse, which has configuration available to run the CSV parser on a web worker. But that’s where the good news ends.

  • PapaParse inlines the worker as a blob, which is normally a good thing, as a self-contained file is much easier to integrate, but Blob URLs have no concept of origin or path (source) and may not be a good fit.
  • Another library I know that uses web workers is ffmpeg.wasm, but it imports more code by doing a string replace over “ffmpeg-core.js” for hard-coded paths, which makes integration difficult if the application developer has some sort of caching scheme.
  • PapaParse does not work with node.js worker_threads, so it can only offload work in a browser environment.
  • ffmpeg.wasm does not work in an environment where multi-threading is unavailable, like Cloudflare Workers.
  • ffmpeg.wasm can’t be called from inside a web worker.
  • Neither library will defer handling and bundling of web workers to bundlers, so their usage is inflexible.

This is not me disparaging PapaParse and ffmpeg.wasm, it’s just that this is a hard problem. A library would need to cater to the following use cases:

  • Async in the browser: both in and outside web workers
  • Async in node via worker_threads
  • Sync for environments without multi-threading
  • Sync for developers who know how to integrate the library better (maybe it comes with a reduced file size)
  • Easily integrated with and without a bundler

PapaParse gives us inspiration for how we could communicate the use case via a runtime configuration option:

Papa.parse(bigFile, {
	worker: true,
	step: function(row) {
		console.log("Row:", row.data);
	},
});

The worker property instructs the library to create a web worker, and it seems possible that this option can be combined with package entry points, allowing us to produce library distributions that target node.js separately from the browser. We’ll see this shortly.

We could complicate this further by making worker numeric, representing the number of workers to instantiate. This would mean our library would now need to manage its own thread pool. The thought of every library coding a thread pool is revolting. threads.js claims to solve the thread pool issue as well as abstracting away worker_threads and web workers, but its lack of webpack 5 support is a blocker, as a generic library should never prescribe a list of approved bundlers. In fact, we should strive to produce a bundler-less distribution so that our library is accessible to all – the same accessibility that underpins why the Wasm libraries I author distribute a version where the Wasm payload is base64 inlined.

Example code and configuration

Enough talk, more code. If we wanted to wrap our compute example from earlier in an asynchronous interface, this is how we could do it. There’ll be some notes afterwards, so use the code as more of a reference.

// worker.ts
export { compute } from "./compute";
// worker_browser.ts
import { expose } from "comlink";
import * as mod from "./worker";

expose(mod);
// worker_node.ts
import { expose } from "comlink";
// @ts-ignore
import nodeEndpoint from "comlink/dist/esm/node-adapter.mjs";
import { parentPort } from "worker_threads";
import * as mod from "./worker";

expose(mod, nodeEndpoint(parentPort));
// types.ts
export type MyComputer = typeof import("./worker");

export interface ComputeOptions {
  worker: boolean;
}
// index_browser.ts
import { releaseProxy, wrap } from "comlink";
import { compute as computeSync } from "./compute";
import { ComputeOptions, MyComputer } from "./types";
export * from "./types";

export function compute(
  options?: Partial<ComputeOptions> & { worker?: false | undefined }
): number;

export function compute(
  options?: Partial<ComputeOptions> & { worker: true }
): Promise<number>;

export function compute(
  options?: Partial<ComputeOptions>
): number | Promise<number> {
  if (options?.worker) {
    const worker = new Worker(new URL("worker_browser.ts", import.meta.url), {
      type: "module",
    });
    const computer = wrap<MyComputer>(worker);
    return computer.compute().finally(() => {
      computer[releaseProxy]();
      worker.terminate();
    });
  } else {
    return computeSync();
  }
}
// index_node.ts
import { releaseProxy, wrap } from "comlink";
// @ts-ignore
import nodeEndpoint from "comlink/dist/esm/node-adapter.mjs";
import { resolve } from "path";
import { Worker } from "worker_threads";
import { compute as computeSync } from "./compute";
import { ComputeOptions, MyComputer } from "./types";
export * from "./types";

export function compute(
  options?: Partial<ComputeOptions> & { worker?: false | undefined }
): number;

export function compute(
  options?: Partial<ComputeOptions> & { worker: true }
): Promise<number>;

export function compute(
  options?: Partial<ComputeOptions>
): number | Promise<number> {
  if (options?.worker) {
    const worker = new Worker(resolve(__dirname, "./worker_node.cjs"));
    const computer = wrap<MyComputer>(nodeEndpoint(worker));
    return computer.compute().finally(() => {
      computer[releaseProxy]();
      worker.terminate();
    });
  } else {
    return computeSync();
  }
}
// rollup.config.js
import typescript from "@rollup/plugin-typescript";
import OMT from "@surma/rollup-plugin-off-main-thread";

const outdir = (fmt) => `./dist/${fmt}`;
const fmt = (input) => (isNode(input) ? "node" : "browser");
const isNode = (input) => input.includes("node");

const rolls = (input) => ({
  input,
  output: {
    dir: outdir(fmt(input)),
    format: isNode(input) ? "cjs" : "esm",
    entryFileNames: `[name].${isNode(input) ? "cjs" : "js"}`,
  },
  external: ["comlink", "worker_threads", "path"],
  plugins: [
    typescript({ outDir: outdir(fmt(input)), rootDir: "src" }),
    ...(!isNode(input) ? [OMT()] : []),
  ],
});

export default [
  rolls("./src/index_node.ts"),
  rolls("./src/worker_node.ts"),
  rolls("./src/index_browser.ts"),
];

After seeing all that glue code, it’s no wonder this approach to authoring libraries hasn’t caught on. A lot of the bloat comes from creating an entry point for browsers (index_browser.ts) and an entry point for node (index_node.ts), as these reference different imports and URLs. The worker code is also duplicated for the same reason: we need to import the comlink node adapter for the node worker but not in the browser. These allow us to reference the outputs that rollup produces in our package.json as entry points:

{
  "exports": {
    ".": {
      "node": "./dist/node/index_node.cjs",
      "default": "./dist/browser/index_browser.js"
    }
  }
}

I’m normally dependency-averse, but I’m a big proponent of comlink, as it makes worker communication (both in the browser and node) much simpler (and there’s already enough glue code as there is). We list comlink as external so that rollup doesn’t bundle it into the distribution. The added benefit is that if two libraries use the above approach to offload compute, the application only needs to pay for comlink once (mileage may vary by bundler and configuration). The good news is that comlink is small enough that size isn’t often an issue if it is duplicated. Another caveat, albeit much less significant, is that we need @ts-ignore to import the node worker adapter, as discussed in the comlink repo.

The rollup off-main-thread plugin (OMT) only works with web workers (and not worker_threads), which requires us to write rollup configuration to process the node entry point and node worker separately. We then have to reference the distributable version of the worker (worker_node.cjs) instead of the TypeScript source (worker_node.ts) like we could do for web workers.

We write TypeScript function overload signatures for compute to communicate how the return type changes based on arguments. The signatures are duplicated between the node and browser index files, as otherwise one will receive an error about “Function implementation is […] not immediately following the declaration”. Hopefully it is only a mild inconvenience to keep them in sync.

More concerning is that web workers and threads are ephemerally allocated for each function invocation and then terminated afterwards. This may be fine if the functions are known to always be long running (and always is a strong word). Some users may hash only short strings and a worker allocation could dominate in profiling. A reasonable solution would be to introduce a thread pool, but as mentioned earlier the thought of each library implementing a thread pool makes me shudder. Multiply this problem by the number of libraries employing a thread pool and it would seem like every mundane app could spawn an unmanageable number of web workers.
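One middle ground, sketched below, is to lazily spawn a single reusable worker on first use. It dodges the per-call allocation, though it raises its own questions about idle workers and teardown:

// index_browser.ts variant: reuse one worker across calls.
import { Remote, wrap } from "comlink";
import { MyComputer } from "./types";

let computer: Remote<MyComputer> | undefined;

export function compute(): Promise<number> {
  // The first call pays the worker startup cost; subsequent calls reuse
  // the same worker, so short workloads no longer drown in allocation.
  computer ??= wrap<MyComputer>(
    new Worker(new URL("worker_browser.ts", import.meta.url), { type: "module" })
  );
  return computer.compute();
}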

Our browser-destined build contains an ESM web worker, which has poor support: 84% at the time of writing, with no Firefox implementation. Compare that to 97% for web workers and 93% for Wasm. In reality, what we’ve created is a build for bundlers (just not webpack 4, as support for import.meta.url was introduced in webpack 5). We can take this one step further by creating yet another entry point for browsers or older clients that inlines the worker inside the bundle, via the same flow as shown in the article Building module web workers for cross browser compatibility with rollup. In short:

  1. Have rollup process worker_browser.ts, inline comlink, compress the code, and output into an intermediate directory with an exported string (relevant snippet of the rollup config copied below):
     plugins: [
       // ...
       // isInline/isInlineWorker: helpers analogous to isNode, matching
       // the new inline entry points (elided from this snippet).
       ...(isInline(input) ? [resolve(), terser()] : []),
       ...(isInlineWorker(input)
         ? [
             {
               name: "stringify",
               renderChunk(code) {
                 return `export default ${JSON.stringify(code)};`;
               },
             },
           ]
         : []),
     ]
    
     The above departs from the linked article by outputting the code as a JSON string, as JSON.stringify will escape any quotes contained within the code.
  2. Write an index_inline.ts that references the output from step 1:
    import workerString from '../dist/worker/worker_browser.js';
    // ...
    export function compute(
      options?: Partial<ComputeOptions>
    ): number | Promise<number> {
      if (options?.worker) {
        const workerBlob = new Blob([workerString]);
        const workerUrl = URL.createObjectURL(workerBlob);
        const worker = new Worker(workerUrl);
        const computer = wrap<MyComputer>(worker);
        return computer.compute().finally(() => {
          computer[releaseProxy]();
          URL.revokeObjectURL(workerUrl);
          worker.terminate();
        });
      } else {
        return computeSync();
      }
    }
    
  3. Update the rollup config to add our new index_inline.ts and output UMD (instead of the cjs and esm that we previously restricted ourselves to).
  4. Add our UMD output to package.json:
    { 
      "browser": "./dist/inline/index_inline.js",
      "exports": {
        // ...
      },
    }
    
  5. Optionally, we could provide an ./inline entry point, but entry points are still new, so expect some teething issues in tooling. For instance, TypeScript will be gaining support for entry points in 4.7, which will be released in a couple of days.
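     A hypothetical exports map with the extra entry might look like:
    {
      "exports": {
        ".": {
          "node": "./dist/node/index_node.cjs",
          "default": "./dist/browser/index_browser.js"
        },
        "./inline": "./dist/inline/index_inline.js"
      }
    }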

If we were only targeting browser environments, we could ditch the rollup configuration for meta-bundlers like microbundle, but microbundle does not support node.js, an inlined worker build, or the extensions we’d need to include Wasm.

Speaking of Wasm, if you are inlining Wasm as base64 and are worried about paying for the bandwidth of including the Wasm twice – once in the entry point and once in the worker – did you know that you can postMessage a WebAssembly.Module, as seen in the WebAssembly.compile example? Modules are not transferable and will be cloned, which could cause a recompilation according to the spec, but the spec recommends that “engines should attempt to share/reuse internal compiled code when performing a structured serialization”, so in effect the clone may be seen as a transfer.
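A sketch of that handoff (hypothetical file and worker names; the API calls themselves are standard):

// main thread: compile the Wasm once, then hand the module over.
const module = await WebAssembly.compileStreaming(fetch("compute.wasm"));
worker.postMessage(module); // structured clone; engines may share compiled code

// worker thread: instantiate without fetching or (ideally) recompiling.
self.onmessage = async (e: MessageEvent<WebAssembly.Module>) => {
  const instance = await WebAssembly.instantiate(e.data);
  // ...call into instance.exports as usual
};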

Instead of exposing a single function that could be either sync or async, it may be prudent to expose a sibling that is purely synchronous, so that bundlers can remove any reference to comlink or web workers when the client only uses the synchronous function. I’ve not tested this, so take this thought with a grain of salt.
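In module form, the split might look like the following (equally untested):

// sync.ts: no reference to comlink or Worker, so trivially tree-shakeable.
export { compute as computeSync } from "./compute";

// async.ts: pulls in comlink and the worker machinery.
export { compute } from "./index_browser";

Applications that only import computeSync give the bundler a fighting chance to drop the worker code entirely.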

Despite multiple shortcomings, I’m rather pleased at our library’s outcome:

  • It offers an API that is synchronous and asynchronous
  • The asynchronous API will offload compute to the appropriate worker
  • It is bundler friendly
  • It is “let me just copy and paste the js code from the CDN into my own project” friendly

Is it worth it? Seems highly subjective.

Conclusion

If we were talking about another language like Rust or C#, the question of “should I offer an off main thread API for my CPU-bound function in my library” would be met with a resounding and categorical no. It is just too easy for application developers to send something to another thread for it to be worthwhile. The worst case scenario for library developers is a function that combines asynchronous IO work and CPU-bound work, but solutions exist: either the API can be broken up into different functions (async IO and sync CPU-bound), or the function can be documented as having significant CPU-bound work. Even a smuggled Task.Run() wouldn’t be seen as egregious, depending on the context.

For JavaScript, the answer is less clear. The easy answer is no, it’s not worth it. There are too many environments and too many bundlers for one to easily satisfy them all. A library author can’t predict all the use cases users will have, and excluding them is probably worse than if we only offered a synchronous API. Not to mention the extra work required to write code and configuration to manage off the main thread work would be a hassle for every library developer.

But on the flip side, who better to deliver a responsive and ergonomic API than library developers? Why perpetuate an unresponsive web when we can do something about it? Web workers aren’t nice to use, and we might be able to shield newcomers from this pain.

With these thoughts, I’ve settled on a rough guideline: introduce an API for off the main thread computation if the library in question is large enough that it could be the center of an app (preferably many apps, so that the cost of creation is offset by the ease of use). ffmpeg.wasm is a good example of such a library. Utility libraries will keep their synchronous nature, though I will chafe under this recommendation at the thought of hashing libraries being synchronous while the SubtleCrypto digest API is able to offload work.

This conclusion leaves a bit of a sour taste, but it does seem that the responsibility of a responsive web falls to the application developer and their skills to splice workers into their code and bundler configuration. Good luck!
