The hidden nuance of the JavaScript File API

Did you know that files originating from a file input don’t have all their data buffered into memory? It seems intuitive that this is how JS would work; otherwise, web sites operating over files would be terribly memory inefficient.

Don’t let this efficient File deceive you. If you create one yourself, you’ll soon find that you need to buffer all data into memory.

To illustrate, below is an example that takes a file input and slices it into 4MB chunks.

<input type="file" onchange="processFile(...arguments)" />

<script>
  const CHUNK_SIZE = 4 * 1024 * 1024;
  function processFile(e) {
    const file = e.currentTarget.files[0];
    const start = performance.now();
    const fileSize = file.size;
    for (let i = 0; i < fileSize; i += CHUNK_SIZE) {
      const chunkEnd = Math.min(i + CHUNK_SIZE, fileSize);
      // slice() is lazy: it returns a Blob view without reading any bytes
      const chunk = file.slice(i, chunkEnd);
      // do something with chunk
    }
    const end = performance.now();
    console.log(`${(end - start).toFixed(2)}ms`);
  }
</script>

Benchmarking this on a file of any size is effectively instantaneous, so we can infer that browsers delay as much work as possible until the file contents are actually required:

const buffer = await chunk.arrayBuffer();

The contents can be streamed as well:

const stream = chunk.stream();
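
For instance, here is a minimal sketch of consuming that stream incrementally (assuming an async context; the logging is only for illustration):

const reader = chunk.stream().getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // value is a Uint8Array holding only this piece of the chunk
  console.log(`read ${value.byteLength} bytes`);
}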

File seems like a great API that facilitates memory-efficient code, so it’s natural that library authors write functions that accept File objects. It makes for a great developer and user experience.

But what if you have a file that does not originate from a control or gesture (e.g. a file input or drag and drop)? You can create a File yourself, but the only constructor requires all of the data, up front, in memory!
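
To make that concrete, here is a minimal sketch (the 100MB payload and the "upload.bin" name are made up for illustration):

// Every byte must already exist in memory (as an ArrayBuffer, TypedArray,
// Blob, or string) before the File can be constructed.
const bytes = new Uint8Array(100 * 1024 * 1024);
const file = new File([bytes], "upload.bin", { type: "application/octet-stream" });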

Thus, it is impossible for us to match the memory efficiency of browsers’ disk-backed files. No fair!

It was a rude awakening to work with a production system oriented around File objects when my data was in a stream. Buffering the stream was a short-term solution, but large files would fail due to size limits on a Node.js Buffer (limits that have been removed in Node v22).
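
The workaround looked roughly like this (a sketch, not the actual production code; stream stands in for wherever the data came from, and "data.bin" is a placeholder name):

// Drain the stream into memory, then wrap the result in a File.
const chunks = [];
for await (const piece of stream) {
  chunks.push(piece);
}
// Concatenating is where very large inputs run into Node's Buffer size limit.
const file = new File([Buffer.concat(chunks)], "data.bin");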

Speaking of Node.js, they ran into this problem too. There’s a Node.js thread covering this topic and diving into the File API spec. The spec could use some elaboration, as its ambiguity around memory- vs disk-backed files confused me and others. The thread concludes with the introduction of a new experimental Node API: fs.openAsBlob (though not without some potential size issues).
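
For completeness, a sketch of what using it looks like (the path is a placeholder, and the API is marked experimental):

import { openAsBlob } from "node:fs";

// The returned Blob stays backed by the file on disk rather than by memory.
const blob = await openAsBlob("large-file.bin");
const stream = blob.stream();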

Not to add to the confusion, but Chrome implements a form of Blob storage that can transparently store blobs on disk:

If the in-memory space for blobs is getting full, or a new blob is too large to be in-memory, then the blob system uses the disk. This can either be paging old blobs to disk, or saving the new too-large blob straight to disk.

Browser ingenuity never ceases to amaze me. It’s almost like we don’t need to worry about in-memory efficiency, a philosophy that tracks, given it’s taken until 2024 for browsers to start implementing iterator helpers.

Perhaps, if JS is learning from Rust, it can also learn from Rust’s Seek trait and allow us to create a File from a seekable stream.

Even if Chrome could move blobs to disk without caveats, I would be uncomfortable relying on this implementation detail, as other browsers and Node.js may behave differently. In the end, the production system could be updated to work over forward-iterating file streams.

I may be in the minority, but a more explicit and flexible File and Blob API would have saved me from this rabbit hole. Still, I understand why we ended up here: the File API is as old as a teenager, and Blob is even older, so any improvements are likely to be transparent ones that don’t require spec changes to such long-standing APIs. These subtle changes make the API come off as privileged and make web development more difficult than it needs to be.

Time will tell how many of these APIs I encounter.

Comments

If you'd like to leave a comment, please email [email protected]