Rethinking web workers and edge compute with Wasm

Web workers are crucial for responsive client-side compute, but they carry a notorious ergonomic hurdle, one large enough that I’ve previously debated whose responsibility it is to offload compute on the web: the library developer’s or the application developer’s. Not to spoil that article, but the decision is nuanced, essentially boiling down to: it is the library developer’s responsibility if the library is the centerpiece of applications.

Here we’ll focus on the application side.

What’s with the “Rethinking” in the title then? How should web workers be rethought?

It dawned on me that I could treat web workers almost as if I were communicating with a server, and that the same tools that manage application logic for remote communication can be used for local message passing with web workers.

The demo: Rocket League parser

I put a demo together, the Rocket League Replay Parser (rl-web), where one can select a local replay file (or use the provided sample) and it is parsed in the browser inside a web worker.

In a sense, the user “uploads” the file to the web worker and then the UI makes requests to the web worker for information about the parsed file.

Sounds an awful lot like HTTP requests against a traditional server, right? What if we could use the same libraries that make management of HTTP requests a breeze, and apply them to web workers?

The first step of this journey is to use comlink, which sets up an async facade to web workers. I consider comlink a must-have for any site seriously using web workers. We can now use the async interface as a building block.
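As a minimal sketch of what comlink buys us (the names here are illustrative, not rl-web’s actual worker code):

// worker.ts — expose an object; comlink handles the postMessage plumbing
import * as Comlink from "comlink";

const api = {
  parse(data: Uint8Array) {
    // ... run the Wasm parser over the bytes
    return { bytes: data.length };
  },
};
export type WorkerApi = typeof api;
Comlink.expose(api);

// main thread — every exposed method now returns a promise
import * as Comlink from "comlink";
import type { WorkerApi } from "./worker";

const worker = new Worker(new URL("./worker.ts", import.meta.url));
const remote = Comlink.wrap<WorkerApi>(worker);
const result = await remote.parse(new Uint8Array());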

TanStack Query for web workers

TanStack Query (née React Query, before it was recently made framework agnostic), a library for “powerful asynchronous state management”, is typically paired with remote fetches, as seen in the classic React example below.

import { useQuery } from '@tanstack/react-query'

function Example() {
  const { isLoading, error, data } = useQuery({
    queryKey: ['repoData'],
    queryFn: () =>
      fetch('https://api.github.com/repos/tannerlinsley/react-query').then(res =>
        res.json()
      )
  })

  if (isLoading) return 'Loading...'

  if (error) return 'An error has occurred: ' + error.message

  return null; // ...
}

But as evidenced by the generic wording of TanStack Query’s tagline, and the fact that queryFn is directly invoking fetch, we can adapt it for web worker usage.

function Example() {
  // get reference to our web worker:
  const parser = useReplayParser();

  const { isLoading, error, data } = useQuery({
    queryKey: ['replayData'],
    queryFn: () => parser().replayInfo()
  })

  // ...
}

The biggest gain is that interacting with a server and web worker now looks and feels the same.

Using TanStack/React Query for web workers isn’t novel. In his YouTube video, Jack Herrington shows off this functionality too, among other use cases.

Web worker and server similarities

Should web workers and servers actually be treated the same, though? Or is TanStack Query a hammer that makes everything look like a nail? We can go over the motivations TanStack Query’s docs posit for how server state differs from client state, and compare each one to web workers.

[server state] is persisted remotely in a location you do not control or own

⚖️ Debatable for web workers. A web worker is entirely within the client’s control, and no other user can directly manipulate its state. Still, an argument can be made that data residing within the web worker is at the same level of accessibility as server state: if the web worker does not expose a mechanism for retrieving or updating the data, the client has no access to it.

Requires asynchronous APIs for fetching and updating

✔️ Web workers require asynchronous communication by nature.

Implies shared ownership and can be changed by other people without your knowledge

❌ Unlike server state, others can’t change our web worker state without our knowledge.

Can potentially become “out of date” in your applications if you’re not careful

✔️ TanStack Query with web workers also benefits from recalculating data via dependent queries and query invalidation, so that UI data is never out of date (see the sketch below).

A decent fit; not too far off the mark.
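To make that last point concrete, here’s a hedged sketch of invalidation applied to worker queries; it assumes the parse mutation and the ["worker"] key hierarchy used later in this article:

import { useMutation, useQueryClient } from "@tanstack/react-query";

function useLoadReplay() {
  const queryClient = useQueryClient();
  const parser = useReplayParser();
  return useMutation({
    mutationFn: (input: ParseInput) => parser().parse(input),
    // a new replay makes every worker-derived query stale, so recompute
    onSuccess: () => queryClient.invalidateQueries(["worker"]),
  });
}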

From Kent C. Dodds’ point of view, there are two types of state: server cache and UI state. This muddies the water a bit. If I were forced to choose between the two, I’d say web workers hold UI state behind an asynchronous interface: on page refresh, all state within a web worker is lost if not persisted locally, and the data returned from a web worker is fresh until the user invalidates it.

As unsatisfying as this might sound, whether TanStack Query is right for you may depend on whether you are already leveraging it as a server cache, or find yourself writing a poor man’s version of it for web workers anyway.

Web worker configuration and hooks

In an application that has web workers and makes network requests, we’ll want to supply different configurations depending on the action, as several important defaults are irrelevant when interfacing with web workers, where state can’t change without our knowledge.

const workerQueryOptions = Object.freeze({
  networkMode: "always",
  staleTime: Infinity,
  refetchOnMount: false,
  refetchOnWindowFocus: false,
  refetchOnReconnect: false,
} as const);

useQuery({
  ...workerQueryOptions,
  ...options,
});

const workerMutationOptions = Object.freeze({
  networkMode: "always",
} as const);

useMutation({
  ...workerMutationOptions,
  ...options,
});

It is possible to instantiate a completely separate QueryClient for network and web worker requests, but I decided against it, leaving room for as much interplay between the two query mechanisms as possible.

I also decided against introducing custom hooks, like useWorkerQuery and useWorkerMutation, where the default options shown above are set and the caller receives a primed web worker in the queryFn, simply due to the overwhelming type signature required. Feel free to let your eyes glaze over:

function useWorkerQuery<
  TQueryFnData = unknown,
  TError = unknown,
  TData = TQueryFnData,
  TQueryKey extends QueryKey = QueryKey
>({
  queryFn,
  ...options
}: Omit<UseQueryOptions<TQueryFnData, TError, TData, TQueryKey>, "queryFn"> & {
  queryFn: (
    parser: ReplayParserClient,
    context: QueryFunctionContext<TQueryKey>
  ) => TQueryFnData;
}) {
  const parser = useReplayParser();
  return useQuery({
    ...workerQueryOptions,
    ...options,
    queryFn: (ctx) => queryFn(parser(), ctx),
  });
}

And this doesn’t work with all overloads of useQuery and requires refinement. Better to jettison it. A TanStack Query maintainer agrees: abstractions over it should be as narrow and specific as possible so that one doesn’t trip over the types. Otherwise, don’t abstract it.

A soft rule of thumb for application developers: if an abstraction requires more lines of type signatures than the lines of implementation it saves, it is perhaps best to avoid that abstraction, especially since it increases coupling with the library’s types. I would not look forward to updating the signature with each major release.
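For contrast, a narrow abstraction costs almost nothing in types. A minimal sketch, reusing the options object and hooks from earlier (the query key is my assumption):

// one hook, one query, no generics to maintain
function useReplayInfo() {
  const parser = useReplayParser();
  return useQuery({
    ...workerQueryOptions,
    queryKey: ["worker", "replayInfo"],
    queryFn: () => parser().replayInfo(),
  });
}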

State, state, and more state

A controversial statement, but I find that a state manager, like Context, Zustand, or Redux Toolkit, is still required when working with TanStack Query. It’s controversial because it goes against the takeaway from the article React Query as a State Manager:

React Query is great at managing async state globally in your app, if you let it. […] resist the urge to sync server data to a different state manager

I believe another state manager is required because, unlike query functions, the status of useMutation is not shared across usages. In a traditional CRUD app, this limitation isn’t too restrictive, as useMutation may be a rarity, and it may have a corresponding useQuery that can be updated from the mutation response. To work around the lack of a useQuery, one could concoct a synthetic useQuery for mutation data by disabling it, giving it a type-cast no-op query function, and forgoing nearly everything else (errors, loading, and type-safe updates). But that doesn’t sound too appetizing, does it?
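For the curious, that workaround might look something like this sketch (ParsedReplay is a hypothetical type standing in for the mutation’s result):

// a disabled query that never fetches; the mutation's onSuccess
// writes into its cache entry instead
const { data } = useQuery({
  queryKey: ["worker", "parse"],
  queryFn: () =>
    Promise.reject(new Error("unreachable")) as Promise<ParsedReplay>,
  enabled: false,
});

// elsewhere, inside the mutation:
// onSuccess: (data) => queryClient.setQueryData(["worker", "parse"], data)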

To ground the discussion, imagine we want the same behavior regardless of whether a user parses a replay by selecting a file with an input element or by dragging and dropping a file onto the page.

The useMutation instance that loads the file into the web worker can be hoisted up to a common ancestor of the input and the drop zone. This takes care of actually firing the mutation, but we may also want to show a loading spinner or render the output of the file load in a far-off location. Hoisting the useMutation even farther up hurts ergonomics and performance, as it may require excessive prop drilling through unrelated components or trigger unnecessary re-renders.

Hence, an additional state manager is needed to allow fine-grained updates to the UI. I’ve found a small amount of success by hooking mutation callbacks into state updates for data, and using query filters to know whether a worker action is in flight and a loading spinner may be necessary.

import {
  useIsFetching,
  useIsMutating,
  useMutation,
} from "@tanstack/react-query";

export function useFilePublisher() {
  const parser = useReplayParser();

  // zustand actions to update the store depending on whether
  // the file was successfully parsed.
  const { parseError, parsed } = useReplayActions();

  const { mutate } = useMutation({
    mutationKey: ["worker", "parse"],
    mutationFn: (input: ParseInput) => parser().parse(input),
    onSuccess(data, variables, _context) {
      parsed(data, { input: variables });
    },
    onError(error, variables, _context) {
      parseError(error, { input: variables });
    },
  });

  return { mutate };
}

// assuming all queries / mutations with the worker are done
// under the "worker" umbrella we can use that to know
// if we should show some sort of loading spinner.
export const useIsActionInFlight = () =>
  useIsFetching(["worker"]) + useIsMutating(["worker"]) !== 0;

The result isn’t too clunky, as long as this doesn’t need to be repeated frequently.
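For instance, a component far from the mutation can gate a spinner on it (Spinner here is a placeholder component):

function WorkerActivityIndicator() {
  const busy = useIsActionInFlight();
  return busy ? <Spinner /> : null;
}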

Obstruction

I have a use case where TanStack Query was a bit of an obstruction: when a user clicks a button, the Wasm generates a large payload that is immediately downloaded. The code without TanStack Query is straightforward:

function downloadData(data: BlobPart, fileName: string) {
  // ...
}

function DownloadJson({ replay }) {
  const parser = useReplayParser();
  return (
    <button
      onClick={async () => {
        const { data } = await parser().replayJson();
        downloadData(data, replay.input.jsonName());
      }}
    >
      Download
    </button>
  );
}

When we add TanStack Query, things get a bit more complex, as we need to query on demand and be doubly sure that we don’t cache the response; otherwise we’d potentially subject the user to prolonged high memory usage.

const workerQuery = useQuery({
  queryKey: ["worker", "json", replay.input.path()],
  ...workerQueryOptions,
  queryFn: async () => {
    const { data } = await parser().replayJson();
    downloadData(data, replay.input.jsonName());

    // I don't want this large data cached
    return null;
  },
  cacheTime: 0,
  enabled: false,
});
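Since the query is disabled, nothing runs until the button asks for it; calling refetch() on a disabled query still executes the queryFn:

<button onClick={() => workerQuery.refetch()}>Download</button>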

It may be more code, but at least we get errors, loading states, and consistency with how queries are used in other parts of our code.

The Edge

One thing I love about the flexibility of Wasm is that it can be deployed at the edge. I know it’s a bit of a buzzword, but I am excited about edge computing. My applications are worldwide, and it would be nice for as much as possible, if not everything, to be deployed at the edge so that all users see low latency.

We first need to add edge ready endpoints in a Next.js application.

The most annoying thing with Next.js and Wasm is the requirement to include a ?module query param for the Wasm import.

import wasmModule from "../../crate/pkg/rl_wasm_bg.wasm?module";
import { ReplayParser } from "@/features/worker/ReplayParser";
import init, * as RlMod from "../../crate/pkg/rl_wasm";

export const config = {
  runtime: "experimental-edge",
};

async function instantiateParser() {
  await init(wasmModule);
  return new ReplayParser(RlMod);
}

export default async function handler(req: Request) {
  // ...
}

For those curious why the query param is necessary: Cloudflare Workers disallow runtime Wasm compilation, so the Wasm must be compiled externally for us. My guess is that the param tells the Next.js webpack internals to split out the Wasm so it is eligible for Cloudflare Workers.
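As a sketch, the elided handler might accept a replay in the request body and respond with parsed info (how the parser consumes raw bytes is my assumption; the real rl-web API may differ):

export default async function handler(req: Request) {
  const parser = await instantiateParser();
  const bytes = new Uint8Array(await req.arrayBuffer());
  const info = parser.replayInfo(bytes);
  return new Response(JSON.stringify(info), {
    headers: { "content-type": "application/json" },
  });
}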

The next step is to exclude our ?module Wasm imports from being processed as an asset, since asset processing is how the browser-side Wasm is served.

// next.config.js: inside the webpack(config) hook
config.module.rules.push({
  test: /\.(wasm|replay)$/,
  resourceQuery: {
    not: /module/,
  },
  type: "asset/resource",
});

And for those who love type safety (I know I do), you’ll need to declare a custom module so TypeScript knows what a ?module file returns.

declare module "*?module" {
  const content: WebAssembly.Module;
  export default content;
}

With the setup complete, and keeping with the theme of the article: what is being rethought here?

Well, I think it’s important to rethink how an application is architected in an edge-first environment.

If we have Wasm on the client side for analyzing files, we can theoretically deploy the same Wasm to the edge if we want a file upload endpoint that validates or extracts data for a database. That means I can rethink how the underlying Rust code is imported in the Node.js server, moving it from an N-API compiled module to Wasm.
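A hypothetical sketch of that swap in Node.js, loading the same wasm-bindgen output from disk instead of an N-API module (paths and the init signature are assumed from the earlier snippet):

import fs from "node:fs/promises";
import { ReplayParser } from "@/features/worker/ReplayParser";
import init, * as RlMod from "../../crate/pkg/rl_wasm";

// compile the same artifact the browser and edge builds use
const bytes = await fs.readFile(
  new URL("../../crate/pkg/rl_wasm_bg.wasm", import.meta.url)
);
await init(await WebAssembly.compile(bytes));
export const parser = new ReplayParser(RlMod);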

Aside: Presigned file uploads

A small rant, if you will. I greatly enjoy the videos and interviews that Theo (of T3 fame) produces: always engaging and insightful, and a great resource for senior devs. I don’t always agree with Theo’s opinions, which in and of itself is fine, but sometimes they are too dismissive of solutions that I believe are a good fit. For instance, here is a tweet about file uploads:

Every time someone complains that serverless/tRPC/NextJS/Vercel are bad “because multi-part file upload is hard” I lose a bit of sanity

You’re using S3. Please use presigned POST URLs. aaa[…]

@t3dotgg | Theo - ping.gg | Aug 28, 2022

I’m guilty of this. In other projects, I have users upload to the server, which then uploads to S3, and I find myself chronically second-guessing this solution. Then I stop and walk through why sending a file directly to the server is a good fit:

  • The server needs to parse the file to validate and extract info for the database, why preemptively upload to S3?
  • Presigned URLs would require additional communications with the server to alert when a file has been uploaded. There’s vendor specific webhooks, but in this age of so many compatible S3 servers (B2, R2, Wasabi, etc), I want to avoid vendor lock-in if possible.
  • Sending to the server first allows one to preprocess the upload (eg: improve data compression).
  • A simpler mental model.

The takeaway is that even when you see the zeitgeist shifting away from your current solution, check your understanding of the problem space.

The good news is that parsing multi-part file uploads is simple on the edge, requiring only a few lines of code and no 3rd-party dependencies:

async function getFileData(req: Request): Promise<ArrayBuffer> {
  const form = await req.formData();
  const file = form.get("file");
  if (file === null || !(file instanceof File)) {
    throw new Error("file incorrectly provided");
  }

  return file.arrayBuffer();
}

For the T3/tRPC fans, it will probably require hosting that endpoint outside of tRPC until tRPC properly supports file uploads through additional content types and has improved edge support.

Aside: Prove ideas on smaller projects

The Rocket League Replay Parser (rl-web) is a tech demo, but it is conceptually similar to another project of mine, PDX Tools, which parses EU4 save files and is built on a similar tech stack. PDX Tools is much bigger and more complex, so I’m loath to experiment on it, knowing that even a successful experiment could take me a week or two to fully implement.

On the other hand, I can completely rewrite rl-web in a few days. And though rl-web is a much smaller project, since the two projects are similar, I can easily extrapolate learnings from one to the other, which makes me more confident in refactors.

So if I could impart advice, whenever you’re stuck, burned out, or too scared to try out new tech or a library on your magnum opus, create a smaller project. Remove constraints and guardrails and experience pure unadulterated exploration and discovery.

To give additional examples of how I’ve learned with a smaller repo:

  • Do I like Tailwind? (hint: the answer is yes)
  • How to style and structure a drag and drop file load
  • How newer React state managers (Jotai, Zustand, etc) work in practice
  • The best way to tie web workers into React’s lifecycle
  • Can I take advantage of any of Next.js 13 features?
  • What does Vercel hosting look like, compared to serving the static site output of next export from Cloudflare Pages?

When working on a large project is overwhelming, try introducing smaller changes on a less critical project. It may help solve decision fatigue.

Conclusion

Apologies. This turned out to be more of a ramble than anticipated, but here’s what I want to convey:

Web workers are an asynchronous form of state and compute that is not ergonomic out of the box. Tools like comlink are must-haves, and depending on the use case, TanStack Query may be a good fit too. Just remember that it is geared towards network requests for shared, mutable data, and it doesn’t completely obviate the need for a UI state manager.

The edge is an exciting place to deploy backend code, as we can reuse existing client-side Wasm to validate data server side, and we can accomplish it with less code and fewer dependencies than it would take on Node.js.
