SQL reduction: ask for forgiveness
In an app that stores files in S3 and their metadata in a Postgres database, we might have a DELETE
route that looks something like:
- Select the file from the db, and if it does not exist throw a 404.
- If the file’s owner isn’t our session’s user, check the db to see if the session’s user is an admin.
- Delete file from the db
- Delete file from S3
The above steps compose nicely: imagine middleware where we can annotate routes and receive a realized FileDb
object, one we know we’re authorized over, as a parameter to our route handler:
async function myHandler(file: FileDb, s3: S3) {
  // Middleware has already loaded the row and verified we're authorized.
  await file.delete();
  await s3.delete(file.id);
}
// Implicitly or explicitly behind the scenes:
export const DELETE = withDb(withAuthorizedFile(withS3(myHandler)));
How many bugs and optimization opportunities can you spot?
Logic Bugs
Let’s start with the logical bugs. If the deletion from S3 fails, we risk DB drift, where our DB is no longer an accurate representation of what is stored in S3.
Transactions allow us to roll back, so we should only commit the db deletion once we have confirmation of the S3 deletion (i.e., no exceptions thrown). So we add a couple more steps:
- Begin transaction
- Delete file from the db
- Delete file from S3
- Commit transaction
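As a minimal sketch of those steps with the node-postgres (pg) driver and AWS SDK v3 (the bucket name and table shape are stand-ins, not from the original):

import { Pool } from "pg";
import { S3Client, DeleteObjectCommand } from "@aws-sdk/client-s3";

const pool = new Pool();
const s3 = new S3Client({});

async function deleteFile(id: string) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("DELETE FROM files WHERE id = $1", [id]);
    // Delete from S3 before committing: if this throws, the catch
    // block rolls the db deletion back and there is no drift.
    await s3.send(new DeleteObjectCommand({ Bucket: "my-bucket", Key: id }));
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}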
Another bug exists: what if the owner of the file changes between steps? We’d risk an unauthorized deletion!
It might be tempting to begin the transaction at the first step and call it a day, but the race condition would still be present, as illustrated below:
QUERY 1                                  QUERY 2
BEGIN;
SELECT id, owner FROM files
WHERE id = '1';
                                         BEGIN;
                                         UPDATE files SET owner = 'a'
                                         WHERE id = '1';
                                         END;
CHECK owner; -- passes, using the now-stale value
DELETE FROM files WHERE id = '1';
END;
Our file would be deleted, despite now being owned by an unintended user! This is known as the read-modify-write anti-pattern.
Adding a row-level lock with SELECT ... FOR UPDATE solves the issue (see the sketch after the list below), but now as application developers we face a tough choice and have to decide how our abstraction changes to avoid this logic bug. Do we:
- Update the middleware to create a transaction and a lock on the row for the duration of the route?
- Split the middleware in two, and push the decision on whether to lock to the route? For example, if a route declares a FileDbLocked, the route is wrapped in a transaction.
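For reference, here is the locked variant of QUERY 1 from the illustration above; with the row lock held, QUERY 2's UPDATE blocks until our transaction ends:

BEGIN;
SELECT id, owner FROM files
WHERE id = '1'
FOR UPDATE; -- row lock: concurrent writers block until COMMIT/ROLLBACK
-- application checks owner here, then:
DELETE FROM files WHERE id = '1';
END;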
Neither solution is appetizing. Both come with performance degradations in the form of taking a needless lock or holding the lock for longer than necessary.
In times like these, we need to ask ourselves if the abstraction is really pulling its weight and provides a solid scaling foundation. Sometimes it’s only after we unwind database abstractions that we are able to truly leverage the database to its full potential.
Optimizations
For our DELETE endpoint, we issue up to 3 SQL queries. Doesn’t sound bad if you’re using SQLite, but for clients that may be far from the database (like edge compute), network latency will quickly add up. Three round trips isn’t something to lose sleep over, but bear with the contrived example.
One technique is to move the admin check away from the DB and into the session: we store the admin privilege in a JWT cookie. No need to feel uncomfortable accepting this data from clients, as JWTs are tamperproof. There are some caveats with this approach:
- Tamperproof but not private (if not encrypted)
- Server lookup still required to know if JWT has been invalidated by a logout (session data store can be something faster than postgres)
- Potential for session drift if not refreshed when DB is updated (eg: user is downgraded from admin to regular user, but their session still states they are an admin)
That’s a lot of caveats to go from 3 queries to 2, but none may be a deal breaker, as one has to manage session data anyway.
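As a sketch of the session side using the jose library (the admin claim and secret handling are assumptions about how the token was issued at login):

import { jwtVerify } from "jose";

const secret = new TextEncoder().encode(process.env.JWT_SECRET);

// Verifies the cookie's signature (tamperproof) and reads our claim.
// Assumes an `admin: boolean` claim was embedded when the session was issued.
async function isAdmin(sessionCookie: string): Promise<boolean> {
  const { payload } = await jwtVerify(sessionCookie, secret);
  return payload.admin === true;
}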
Ask for forgiveness
What if we flip the script? Instead of the DELETE route handler asking if the user has permission to delete the row, what if we assume the user can delete, and ask for forgiveness via a rollback if we err?
This is accomplished via RETURNING:
DELETE FROM files WHERE id = 'abc'
RETURNING owner;
We check if the returned owner is authorized and, if not, roll back.
What’s nice is that our DELETE statement can return exactly what we’re interested in, whereas our aforementioned middleware may realize the entire row into memory, which can be expensive for wide rows.
Here’s what our logic looks like now with only 1 SQL statement (sketched in code after the list):
- Begin transaction
- Delete file from the db returning owner id
- If no file was deleted, throw 404
- If session isn’t admin and isn’t the owner: throw 403
- Delete file from S3
- Commit transaction
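Here is a minimal sketch of that flow, again with node-postgres and AWS SDK v3; HttpError and the session shape are hypothetical stand-ins:

import { Pool } from "pg";
import { S3Client, DeleteObjectCommand } from "@aws-sdk/client-s3";

const pool = new Pool();
const s3 = new S3Client({});

class HttpError extends Error {
  constructor(public status: number) { super(`HTTP ${status}`); }
}

async function deleteFile(id: string, session: { userId: string; admin: boolean }) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // Delete first, returning only the column we need.
    const res = await client.query(
      "DELETE FROM files WHERE id = $1 RETURNING owner",
      [id]
    );
    // Nothing deleted means nothing existed.
    if (res.rowCount === 0) throw new HttpError(404);
    // Ask for forgiveness: the throw triggers the ROLLBACK below.
    if (!session.admin && res.rows[0].owner !== session.userId) {
      throw new HttpError(403);
    }
    // Delete from S3 before committing.
    await s3.send(new DeleteObjectCommand({ Bucket: "my-bucket", Key: id }));
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}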
Pretty nice, right? We get performance and mitigate a class of DB inconsistencies. But what can’t be overstated is the mental model shift. We moved from writing route handlers that could assume resource access to making each handler perform these checks. It would be all too easy to omit these checks or write code against an incorrect mental model.
Know your database
Get comfortable with how your database driver exposes errors. If we disallow uploading files that already exist (based on the file’s hash), don’t write application code to detect duplicates via a SELECT (remembering to include the query in a transaction and use FOR UPDATE). Instead, use a unique index, introspect the error on insert failure, and if you see code 23505, you know the insert failed due to a unique constraint violation.
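Reusing the pool and HttpError from the earlier sketches, and assuming a unique index on files(hash), the insert-and-introspect version might look like:

try {
  await pool.query("INSERT INTO files (id, hash) VALUES ($1, $2)", [id, hash]);
} catch (err) {
  // Postgres signals unique_violation with SQLSTATE 23505; node-postgres
  // exposes it as the `code` property on the thrown error.
  if ((err as { code?: string }).code === "23505") {
    throw new HttpError(409); // a file with this hash already exists
  }
  throw err;
}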
In a similar vein, don’t be afraid of conflicts. If a user is logging in but it’s unclear whether they are new, don’t query the database first. Act like they are new: allocate a surrogate key and create a record for them, and use this record for the next steps in the login flow. This way, a single SQL statement gets us the user’s data whether they are new or existing. This is known as an upsert operation, and if it’s unclear what field to update on conflict, create a last_logged_in column so you guarantee the returned result set has a row.
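A sketch, assuming a users table with a surrogate id and a unique index on email:

-- gen_random_uuid() is built in as of Postgres 13
INSERT INTO users (id, email, last_logged_in)
VALUES (gen_random_uuid(), 'alice@example.com', now())
ON CONFLICT (email) DO UPDATE
SET last_logged_in = now()
RETURNING id, email, last_logged_in;
-- Unlike DO NOTHING, DO UPDATE guarantees RETURNING yields a row
-- for both brand new and existing users.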
Prefer sub-queries over multiple queries. In cursor-based pagination, I was able to avoid materializing an intermediate query result to determine the cursor’s offset by inlining that query into the pagination step as a sub-query condition. Simple, effective, and definitely a test of the application’s database abstraction.
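The shape of the idea, as a hypothetical sketch (a real cursor would also need to break score ties, elided here for brevity):

-- $1 is the id of the last row the client saw; instead of fetching its
-- score in a separate round trip, inline the lookup as a sub-query
SELECT id, score
FROM files
WHERE score < (SELECT score FROM files WHERE id = $1)
ORDER BY score DESC
LIMIT 10;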
Last but not least, move application code to the database. For instance, in a world where files have an associated score and upload session, we may want to return the top 10 files by score without duplicate upload sessions being represented. In code, we could accomplish this with two SQL queries and an intermediate step of sorting and grouping, but we can use window functions and let the database do everything in a single query:
SELECT id, score, upload_session FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY upload_session ORDER BY score DESC) AS rn
  FROM files
) ranked
WHERE rn = 1 -- keep only the best-scoring file per upload session
ORDER BY score DESC
LIMIT 10;
To wrap the post up, be on the lookout for opportunities to consolidate and reduce the number of SQL statements you issue. Having as few SQL statements as possible definitely shouldn’t be the end goal, as there may be tradeoffs (like the JWT caveats above) and costs to leaving behind ergonomic abstractions where developers don’t need to think about race conditions and what type of row locking is appropriate.
At some point, developers need to be aware of the semantics of the data they’re querying and mutating. And I want to lean into this. When situations arise where it’s easier, more efficient, or less bug-prone to ask the database for forgiveness than permission, developers shouldn’t be hamstrung by premature abstractions.
Comments
If you'd like to leave a comment, please email [email protected]