Next.js on Cloudflare: a gem with rough edges

I came across some high usage under my Vercel account.

Vercel usage donut charts. Cropped for brevity

What is “Fast Origin Transfer”? The docs are vague, and whatever it is, it seems I’m near my limit.

Vercel used to have straightforward pricing, but they recently updated their model.

Instead of two large, combined metrics (bandwidth and functions), you will now have granular pricing which allows you to optimize each metric individually.

For bandwidth and functions, Vercel’s admin dashboard exposes the top contributing paths for each usage, and it’s easy enough to triage issues and optimize accordingly.

This is not the case for the new fine-grained usage metrics, like Fast Origin Transfer. The docs for investigating Fast Origin Transfer usage leave one wanting more:

Limited docs on investigating Fast Origin Transfer

The docs assume that Fast Origin Transfer is correlated with invocations. What about lopsided requests, where infrequent but large responses dominate usage yet barely register in invocation counts?

When pricing calculations are updated, it is essential to maintain transparency.

If I had to guess, my culprit is an endpoint that acts as a reverse proxy to an S3 bucket. The endpoint runs on Next.js’s edge runtime, and Vercel’s edge runtime runs on Cloudflare. Since Backblaze is the S3 provider, I don’t think I’m dinged for bandwidth between two Bandwidth Alliance partners; there’s a whole article on how this setup results in free data transfer. Thus, my operating assumption is that fetching the data is free but sending it is not.

Experiments could determine the logic of this black box and confirm whether I’ve identified the culprit. But even if I have, I’m not prepared to remove the S3 proxy. Time to look elsewhere.

Migrating to Cloudflare Pages

Vercel is not the only Next.js host in town. In addition to a standalone Next.js output that allows Next.js to be deployed on any VPS, two full service hosts are Netlify and Cloudflare Pages.

For the purposes of this post, I want to focus on what was required to migrate from Vercel to Cloudflare.

I already subscribe to Cloudflare’s Workers Paid plan. For $5/mo, the limits are:

  • 10M requests/month
  • 30k seconds of CPU time/month

These limits do not apply to static assets! And I can tell you right now that the site I’m migrating doesn’t make 10 million API requests per month. Most importantly, there is no limit on bandwidth.

Cloudflare also takes the cake when it comes to performance:

  • Requests are served with HTTP/3 and I consistently observe lower latency than with Vercel.
  • I never recorded an API request to Vercel’s edge runtime with a latency of less than 150ms, yet I routinely observed Cloudflare serve the same request in 15ms. A 10x improvement.

With the benefits out of the way, Cloudflare Pages doesn’t always come recommended for Next.js.

Edge Runtime

The largest caveat is the lack of a nodejs runtime. Only routes declared with the edge runtime will run on Cloudflare’s Workers platform. This will be a deal breaker for most, but not for me.

You see, I have API endpoints that run on nodejs as they talk to a database, but I can’t deploy them on Vercel due to Vercel’s 4.5MB body size limit on serverless functions. So Vercel was already restricted to running edge functions while nodejs functions are split off to run on a VPS. If this technique interests you, read how to Split Next.js across hosting providers.

To be fair, Cloudflare does have body size limits too, but at 100MB they are 20x more permissive. I did run into this limit when running a self hosted docker registry behind Cloudflare and documented what happened when a new ISP routed traffic through a CGNAT. For the app in question, 100MB will allow plenty of headroom.

When it comes to split deployments, unfortunately Cloudflare’s packaging tool, next-on-pages, errors on the presence of nodejs routes even if those routes are dormant. Thankfully, patch-package can be leveraged to remove the error. Patching transpiled node dependencies isn’t fun (critical mass of TypeScript in nodejs when?), but it beats forking the project.

Once node-postgres supports Next.js edge runtime, there’s a world where all routes are edge compatible. Even if running everything on edge has dubious performance improvements due to potential multiple round trips needed with a database (something Cloudflare is hoping can be mitigated through their Smart Placement feature), removing the need for a split deployment would be welcomed.
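For reference, opting a route into the edge runtime is a one-line declaration. A minimal sketch assuming the App Router (the route path is invented for illustration):

```javascript
// app/api/hello/route.js (hypothetical route)
// The `runtime` export tells Next.js (and next-on-pages) to build this
// route for the edge runtime instead of nodejs.
export const runtime = "edge";

export async function GET() {
  // The web-standard Response API is available in both runtimes.
  return Response.json({ ok: true });
}
```

Routes without this declaration fall back to nodejs, which is exactly what trips up next-on-pages in a split deployment.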

fetch Compatibility

Switching over to Cloudflare Pages had a few compatibility speedbumps.

Next.js caps entries in its fetch cache at 2MB, much to the chagrin of a few. Since I didn’t need the cache, I disabled it with the following so the console wouldn’t be spammed with warnings.

// Use "no-store" to fix the following error:
// > Failed to set fetch cache
// > [Error: fetch for over 2MB of data can not be cached]
fetch("/mydata", { cache: "no-store" });

On Cloudflare Workers, however, “no-store” causes a “The cache field is not implemented” exception to be thrown. It’s a bit of a contentious topic too. I wonder what Vercel does behind the scenes, as their edge runtime runs on Cloudflare Workers, so you’d imagine the same limitations exist.

So I removed the “no-store” and decided that being spammed is better than a failing function. If it becomes too much of a headache, I can conditionally set fetch parameters based on the environment.
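If it comes to that, the conditional could be a small helper that only adds the cache field outside of Cloudflare. A sketch, assuming an environment variable (the ON_CLOUDFLARE name here is made up) distinguishes the two deployments:

```javascript
// Hypothetical helper: request "no-store" only where the runtime
// accepts the cache field. Cloudflare Workers throw "The cache field
// is not implemented" when it is set, so omit it there and live with
// the Next.js fetch cache warnings instead.
function fetchInit(onCloudflare) {
  return onCloudflare ? {} : { cache: "no-store" };
}

// Usage: fetch("/mydata", fetchInit(process.env.ON_CLOUDFLARE === "1"));
```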

On the topic of caching, Cloudflare gives a lot more transparency and levers, and I was curious if I could have Cloudflare cache the contents fetched from the S3 bucket, which takes around 1 second to start responding with data.

I assumed that responses with a Cache-Control header would automatically be cached. This is not the case, as evidenced by the cf-cache-status: DYNAMIC header in the response, which means Cloudflare will always go to the origin for this request.

I initially tried to see if we could coerce the version of fetch that Cloudflare instruments to cache everything, per what the docs led me to believe:

// Did not work :(
fetch("/mydata", {
  cf: { cacheEverything: true }
})

No dice. I needed to set up a Cache Rule through the UI. I’m still trying to figure out what on earth cacheEverything means.

Still, the results were fantastic. Instead of 1 second to first byte, Cloudflare’s cache starts responding in 20ms. That’s a 50x decrease in latency.

Amusingly, as part of this Cache Rule, Cloudflare overwrites the Cache-Control header to be:

public, max-age=14400

This wasn’t what I configured the response with. Digging through Cloudflare’s cache configuration settings, I saw that the default Browser Cache TTL is 4 hours. I changed this to “Respect Existing Headers”. Not sure why that’s the default, but changing it fixed the issue!

Perhaps having this much freedom will turn out to be a double-edged sword.

Routing and Headers Compatibility

In Next.js one can define headers for all requests with the following config:

  headers: () => [
    {
      source: "/:path*",
      headers: [
        {
          key: "Content-Security-Policy",
          value: "default-src 'self';", /* ... */
        },
      ],
    },
  ]

With Next on Pages, by default only non-static files will have the headers, as the autogenerated _routes.json signals to Cloudflare that these are static files and to not invoke our Next.js code for them. This is a cost saving mechanism, as static file requests are free but function executions above a certain amount are not.

This posed a problem for me, as I had JavaScript files that needed to be annotated with Cross-Origin-Opener-Policy to run ffmpeg.wasm in the browser; without it, they would fail to load due to a policy violation.

To influence headers for static content on Cloudflare Pages, one needs a public/_headers file:

/*
  Cross-Origin-Embedder-Policy: require-corp

This is when I learned that Vercel hard links the public directory into the output, while Next on Pages appends to the output, causing the “original” file to be updated as well. I’m happy to have contributed the fix, but I’m a little shaken that I’m seemingly the first person to care about headers in a project coming up on 2 years old.

To make it easier to keep headers in sync between next.config.js and public/_headers, they both pull from the same source. Nothing fancy: a nodejs module that next.config.js imports and that the build system serializes and appends to _headers. It’s still annoying to maintain, but better than having the two environments drift out of sync.
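A sketch of what such a shared module might look like; the file name, header values, and function names are illustrative rather than the actual setup:

```javascript
// shared-headers.js (hypothetical): the single source of truth,
// consumed by next.config.js and by the build step that appends
// to public/_headers.
const sharedHeaders = [
  { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
  { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
];

// Render Cloudflare Pages' _headers format:
//   /*
//     Header-Name: value
function renderHeadersFile(defs, source = "/*") {
  return [source, ...defs.map(({ key, value }) => `  ${key}: ${value}`)].join("\n");
}
```

next.config.js can map `sharedHeaders` into its `headers()` config, while the build script appends the output of `renderHeadersFile(sharedHeaders)` to public/_headers.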

Another thing to note is how Vercel and Cloudflare Pages treat trailing slashes. My Next.js site includes a docusaurus build, which outputs files like /docs/index.html to be accessed at /docs. Cloudflare Pages does not work this way, and in fact, there is a whole guide covering how hosts differ in slash and index resolution behavior. The fix was to configure docusaurus with trailingSlash: false so both Vercel and Cloudflare Pages treat the URLs the same.
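The relevant docusaurus setting, as a config excerpt:

```javascript
// docusaurus.config.js (excerpt)
module.exports = {
  // Emit and link docs URLs without trailing slashes so Vercel and
  // Cloudflare Pages resolve the embedded docs pages identically.
  trailingSlash: false,
  // ...rest of the config
};
```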

Conclusion

Next.js on Cloudflare Pages is a bit of a second class citizen. You are almost certain to meet a migration speed bump in the edge environment, routing configuration, or build tooling, and this assumes your application can already run purely off the edge!

Despite these rough edges, Cloudflare undoubtedly has the edge (puns intended) when it comes to value and performance that users are sure to notice:

  • Edge functions responding 10x faster
  • A more powerful cache delivering speedups of up to 50x

In a parallel universe, Vercel’s pricing would be more transparent, and I would never have checked out Cloudflare. But it’s not, and I did. So my recommendation: if hosting Next.js on Vercel is causing you pain, consider Cloudflare (or Netlify, or something else), if only to gauge your application’s vendor lock-in.

Comments

If you'd like to leave a comment, please email [email protected]