Too edgy, a serverless search

There’s one endpoint in PDX Tools that has been a bit of a thorn in my side. It’s a critical, resource-intensive endpoint that accesses proprietary embedded assets, but it’s executed very infrequently. For a site that recently broke a million monthly requests, only about 100 of those requests hit this endpoint.

The endpoint in question digests an uploaded compressed file and returns a payload that is used for validation and persisted into a database. In the age-old space-time tradeoff, the file parser trades memory for speed: it uses tape-based parsing, as employed in simdjson, on this proprietary format, with the uncompressed file held entirely in memory. I don’t regret making this tradeoff when writing the parser, as multi-gigabyte parsing throughput is a powerful feature for many users.
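To make the tape idea concrete, here is a minimal sketch for a toy, whitespace-separated format with `{` and `}` containers. The toy format, the `Token` enum, and `parse_tape` are illustrative assumptions, not the real parser or the proprietary format. What it does share with a simdjson-style tape is the core trick: every container start records where its matching end sits on the tape.

```rust
use std::ops::Range;

/// One entry on the tape (hypothetical layout, for illustration only).
#[derive(Debug, PartialEq)]
enum Token {
    /// Start of a container; holds the tape index of the matching `End`.
    Open(usize),
    /// End of a container; holds the tape index of the matching `Open`.
    End(usize),
    /// A scalar, stored as a byte range into the input buffer.
    Scalar(Range<usize>),
}

/// Bytes that terminate a scalar in the toy format.
fn is_boundary(b: u8) -> bool {
    matches!(b, b' ' | b'\n' | b'\t' | b'\r' | b'{' | b'}')
}

/// Parses the whole input into a flat tape in a single pass.
/// Returns `None` when the braces are unbalanced.
fn parse_tape(input: &[u8]) -> Option<Vec<Token>> {
    let mut tape = Vec::new();
    let mut open_stack: Vec<usize> = Vec::new();
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b' ' | b'\n' | b'\t' | b'\r' => i += 1,
            b'{' => {
                // Remember where this container starts; the end index
                // is patched in once the closing brace is seen.
                open_stack.push(tape.len());
                tape.push(Token::Open(0));
                i += 1;
            }
            b'}' => {
                let start = open_stack.pop()?;
                let end = tape.len();
                tape[start] = Token::Open(end);
                tape.push(Token::End(start));
                i += 1;
            }
            _ => {
                let begin = i;
                while i < input.len() && !is_boundary(input[i]) {
                    i += 1;
                }
                tape.push(Token::Scalar(begin..i));
            }
        }
    }
    if open_stack.is_empty() {
        Some(tape)
    } else {
        None
    }
}
```

For the input `a { b c } d`, the tape holds six tokens and the `Open` at index 1 points at its `End` at index 4, so a reader can skip an entire container with one jump. The price is that the tape sits in memory next to the full uncompressed input.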

The downside is that current memory usage is a bit excessive. Parsing a 50MB file requires a 200MB tape for a total of 250MB, and a 100MB file would require a total of 500MB. I could learn a thing or two from simdjson, as it is better optimized for memory, fitting its tokens in 64 to 128 bits depending on type, while I need 192 bits for every single token. If I adopt a tape encoding like simdjson’s, the best case scenario would see a total memory reduction of about 50%. I blame Rust enums for the bloated 192 bits. They are so ergonomic and easy to reach for, but sometimes they aren’t space efficient. I’ll explore this alternative tape encoding in the future.
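The numbers above can be checked with a little arithmetic. The sketch below uses a hypothetical `FatToken` enum (not the parser’s real type) to show how a 16-byte payload plus a discriminant rounds up to 24 bytes, then reruns the 50MB example under the assumption that every token shrinks to 64 bits. The function names and the decimal-megabyte figures are assumptions for illustration.

```rust
use std::mem::size_of;

/// Hypothetical token: the largest variant carries 16 bytes, and the
/// discriminant pushes the enum past 16, so alignment rounds it to 24.
#[allow(dead_code)]
enum FatToken {
    Open(u64),
    End(u64),
    Scalar { offset: u64, len: u64 },
}

/// 192 bits per token, as stated in the post.
const CURRENT_TOKEN_BYTES: u64 = 24;
/// 64 bits per token: the best case for a simdjson-style encoding.
const COMPACT_TOKEN_BYTES: u64 = 8;

/// Size in bytes of the illustrative enum on the current target.
fn fat_token_bytes() -> usize {
    size_of::<FatToken>()
}

/// Total memory (input + tape) if each token took `token_bytes`
/// instead of the current 24 bytes.
fn total_with_token_size(input_bytes: u64, tape_bytes: u64, token_bytes: u64) -> u64 {
    input_bytes + tape_bytes * token_bytes / CURRENT_TOKEN_BYTES
}

/// Fraction of memory saved going from `before` to `after`.
fn reduction(before: u64, after: u64) -> f64 {
    1.0 - after as f64 / before as f64
}

/// Best-case saving for the 50MB file with its 200MB tape.
fn best_case_reduction() -> f64 {
    let input = 50_000_000;
    let tape = 200_000_000;
    let before = total_with_token_size(input, tape, CURRENT_TOKEN_BYTES);
    let after = total_with_token_size(input, tape, COMPACT_TOKEN_BYTES);
    reduction(before, after)
}
```

With these assumptions the total drops from 250MB to roughly 117MB, a bit over half, which lines up with the “about 50%” best case. Any token that needs the full 128 bits eats into that saving.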

The reason I harp on memory usage is that a function potentially consuming over 512MB limits hosting options, as I found out this weekend.

It would be awesome to host the entire app on the edge, but the 128MB memory limit imposed by both Cloudflare Workers and Fastly’s Compute@Edge would require splitting the function into a standalone endpoint hosted elsewhere. I’m witnessing the birth of microservices.

Until these limits are raised, let’s consider more traditional serverless providers for the function once it is spun off into its own service. I’m not interested in a VPS, which is where the app is hosted at the time of writing, as I’m paying a flat rate for a box that is 99.99% idle. I want something where I pay only for what I use, and I have no interest in needing special libraries like those for AWS Lambda; I want a smooth local DX that is the same as production.

Here are some services that I looked at but didn’t go with:

Shuttle.rs must build from source and doesn’t allow direct uploads, which is important when the source doesn’t have access to the proprietary assets.
Railway, despite advertising support for Dockerfiles, seems to only support them when building from source (which isn’t possible here).
Render has flat and usage pricing that is too expensive.
fly.io has a generous free tier of three 256MB instances.

At first glance, fly.io’s free tier may not seem workable given the memory requirements, but I learned that one can allocate swap prior to starting the application.

FROM alpine:3.18
RUN echo -e > /start.sh "#!/bin/sh -e\n\
  fallocate -l 1024MB _swapfile\n\
  mkswap _swapfile\n\
  echo 10 > /proc/sys/vm/swappiness\n\
  swapon _swapfile\n\
  echo 1 > /proc/sys/vm/overcommit_memory\n\
  /app" && chmod +x /start.sh
COPY /pdx-tools-api /app
CMD ["/start.sh"]

Two amusing tidbits came out of testing this:

The Dockerfile is invalid under plain Docker. If you try to run it, Alpine will fail with “can’t create /proc/sys/vm/swappiness: Read-only file system”. It’s only due to fly.io “transmogrifying container images into Firecracker micro-VMs” behind the scenes that the swap instructions succeed.
I was expecting horrid performance on the 100MB file, as most of the allocated memory would be swap. However, I was pleasantly surprised with tolerable performance. This must be a testament to the mostly sequential memory access and the fast disks that underpin fly.io.

While fly.io is a solid contender, the ultimate winner is GCP’s Cloud Run:

Usage-based pricing that lands me well within the free tier
Container-based, so no code changes to accommodate the production environment
No need for Cloud Run to access the source project

I’m sure there are equivalent offerings from AWS and Azure, but I don’t know them. Maybe it’s AWS ECS. Whatever it is, I’m all set as long as there’s a service that can just run arbitrary containers.

Cold Starts

With a service that can scale to zero, I’ve been wary of cold starts. I know Theo has claimed 10-second cold start times in his videos. Thankfully, in testing, I’ve seen nowhere near this level. The median container startup latency hovers around 100ms and the worst 99th percentile has been 250ms, which is totally acceptable given that the function runs for 500ms.

My assumption is that the purported 10-second cold start time comes from 500MB images bundling an entire Next.js and Prisma application, which are 100x bigger than the Rust image I’m talking about and won’t start up nearly as fast. If I wanted to subject myself to vendor lock-in, lambda-perf records Rust on AWS Lambda with an average cold start of ~20ms.

But I’m not worried. If I wanted to optimize the cold start away, I’d have clients prime the pump with a wakeup call prior to compressing the payload for transit.
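The ordering is the whole trick: fire the wake-up, compress while the container boots, then upload. The real client is most likely browser code, so treat this Rust sketch as an illustration of the sequencing only. `upload_with_warmup` and its three callbacks are hypothetical names, and the callbacks stand in for whatever HTTP and compression calls the client actually makes.

```rust
use std::thread;

/// Runs `wake` on a background thread while `compress` runs on the
/// current one, then hands the compressed payload to `upload`.
///
/// - `wake`: a cheap request that makes the service start a container.
/// - `compress`: the slow local work that hides the cold start.
/// - `upload`: the real request, ideally hitting a warm container.
fn upload_with_warmup<W, C, U, T>(wake: W, compress: C, upload: U) -> T
where
    W: FnOnce() + Send + 'static,
    C: FnOnce() -> Vec<u8>,
    U: FnOnce(Vec<u8>) -> T,
{
    // Kick off the wake-up call first so it overlaps with compression.
    let warmup = thread::spawn(wake);

    let payload = compress();

    // A failed wake-up is not fatal: the upload just pays the cold start.
    let _ = warmup.join();

    upload(payload)
}
```

If compression takes longer than the cold start, the startup latency disappears from the user’s point of view. If it doesn’t, the upload waits only for the remainder.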

Besides, the stage after we call the function, where the file is uploaded to an S3 provider, can be much worse. I’ve witnessed it taking over 30 seconds.

These “cold starts” are an easy price to pay.

What now?

Why did I take a monolith running on a VPS and split out a function onto a serverless provider? Why am I complicating my life?

I mentioned earlier that I’d like to host the application on edge servers. This way I could ditch the VPS and save some money.

Is this the meme of being penny wise but pound foolish, a developer spending hours of their time to save $24 a month? Or of how frontend developers must always chase the next shiny thing? Yes, but to that I say: this is a side project, and it’s my prerogative if I want to spend my time learning about new technologies and paradigms. And if I am able to successfully run within free tiers, well, now I can afford YouTube Premium.

To me, it’s more of an opportunity to straighten out the Frankenstein build system, where the Next.js backend is built into a Docker image while the frontend is statically exported into the soft-deprecated Cloudflare Workers Sites. Whenever I build the site, there are build errors complaining about the mishmash, which makes me nervous that the build system is brittle and I’m living on borrowed time.

It’d be nice to be on the happy path, simplify the build, and rest easy.

Am I ironically making my life more complicated by spinning out microservices in pursuit of simplifying the build? Yeah, but the hope is that the limits on edge compute can be raised, and once they are, I can fold the microservices back in.
