Accessing public and private B2 S3 buckets in Rust

The AWS S3 storage API is ubiquitous and has been adopted by third-party storage vendors like Backblaze B2, MinIO, Wasabi, Storj, and IDrive. This is excellent for developers and sysadmins, as it facilitates integration testing and experimentation with cloud storage providers. There is an AWS SDK available for 10 languages, so chances are you can use the official SDK and connect it to a non-AWS endpoint.

Rust is the 10th language with an AWS SDK and it’s unlisted at the previous link, as it was released for developer preview a month and a half ago. Examples are sparse, and examples of connecting to third-party providers are even scarcer. So let’s do our part and fill the void a bit.

Fetch public files with HTTP client

Before getting to the main content, I want to share a little trick for public S3-compatible buckets.

Did you know that you can ergonomically access public files with just an HTTP client?

As we’ll see, the AWS SDKs are large and can significantly increase compile time and code size. Adding the SDK as a dependency can be like killing a fly with a sledgehammer, especially given the ceremony needed to instantiate the AWS service and client you need. It is often preferable to use your language’s built-in HTTP client or the lightest-weight one available.

For me that means the attohttpc crate, which is a minimal HTTP client library. I’ve used it to great effect in projects where I migrated test fixtures from Git LFS to Backblaze B2 to avoid an expensive bill. Read more about using Backblaze B2 as a cheaper alternative to GitHub’s Git LFS.

The code resembles the following:

let filename = "my-data.zip";
let bucket = "MY_BUCKET";
let url = format!(
    "https://{}.s3.us-west-002.backblazeb2.com/{}",
    bucket,
    filename
);
let resp = attohttpc::get(&url).send()?;

if !resp.is_success() {
    bail!("expected a 200 code from s3");
} else {
    let data = resp.bytes()?;
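    // `parent()` returns an Option, so turn `None` into an error before `?`
    // (`context` comes from the `anyhow::Context` trait)
    let cache_dir = cache.parent().context("cache path has no parent")?;
    std::fs::create_dir_all(cache_dir)?;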
    std::fs::write(&cache, &data)?;
    data
}

Very simple. Much simpler than any alternative. The one caveat demonstrated above is that remote files are stored in a cache directory so that future test invocations don’t require internet access.
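The caching idea can be pulled into a small helper. The following is only a sketch using the standard library: `public_object_url` and `cached_or_fetch` are hypothetical names, the URL pattern is the one from the snippet above, and the key is assumed to need no percent-encoding. The download itself is passed in as a closure, so any HTTP client (attohttpc included) can be plugged in.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Builds the public URL for an object, following the
/// `https://{bucket}.s3.{region}.backblazeb2.com/{key}` pattern used above.
/// Assumption: `key` contains no characters that need percent-encoding.
fn public_object_url(bucket: &str, region: &str, key: &str) -> String {
    format!("https://{}.s3.{}.backblazeb2.com/{}", bucket, region, key)
}

/// Returns the cached bytes if `cache` already exists. Otherwise calls
/// `fetch`, stores the result at `cache`, and returns it.
fn cached_or_fetch<F>(cache: &Path, fetch: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    if cache.exists() {
        // Cache hit: no network access needed.
        return fs::read(cache);
    }

    let data = fetch()?;

    // Create the cache directory on first use.
    if let Some(parent) = cache.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(cache, &data)?;
    Ok(data)
}
```

The closure would wrap the HTTP request from the earlier snippet. Once the first call has populated the cache, later calls never invoke the closure, so tests keep working offline.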

Private bucket

Our HTTP client trick won’t get us too far when we need ergonomic access to the other APIs that S3 offers, so it’s time to dip our toes into Rust crates that will allow us to flex what S3 is capable of. Our goal will be to stream an object from a private S3 bucket to a local file.

AWS Rust SDK

In order to use the AWS Rust SDK for S3, we need to first add 4 dependencies:

 [dependencies]
+aws-config = "0.4.1"
+aws-sdk-s3 = "0.4.1"
+tokio = { version = "1", features = ["full"] }
+tokio-stream = "0.1.8"

Be warned, these four crates bring over 100 dependencies with them. It can be hard to stomach, but if one is leveraging the tokio runtime for other async tasks or will be making extensive use of the SDK, then the cost will be amortized over time.

Anyways, the code to connect the Rust AWS SDK to a third party S3 provider is below.

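use std::fs::File;
use std::io::Write;
use std::path::Path;

// anyhow is assumed to already be a dependency of the project
use anyhow::Context;
use aws_sdk_s3::{Client, Config, Credentials, Endpoint, Region};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {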
    let access_key = "MY_ACCESS_KEY";
    let secret_key = "MY_SECRET_KEY";

    // One has to define something to be the credential provider name,
    // but it doesn't seem like the value matters
    let provider_name = "my-creds";
    let creds = Credentials::new(access_key, secret_key, None, None, provider_name);

    let b2_s3 = "https://s3.us-west-002.backblazeb2.com";
    let b2_endpoint = Endpoint::immutable(b2_s3.parse().unwrap());

    let config = Config::builder()
        .region(Region::new("us-west-002"))
        .endpoint_resolver(b2_endpoint)
        .credentials_provider(creds)
        .build();

    let client = Client::from_conf(config);
    download_object(&client, "my-obj-key").await?;
    Ok(())
}

async fn download_object(client: &Client, key: &str) -> anyhow::Result<()> {
    let mut obj = client
        .get_object()
        .bucket("MY_BUCKET")
        .key(key)
        .send()
        .await
        .with_context(|| format!("unable to retrieve: {}", key))?;

    let out_path = Path::new("assets").join(key);
    std::fs::create_dir_all(out_path.parent().unwrap()).context("cannot create directories")?;

    let mut file = File::create(&out_path)
        .with_context(|| format!("unable to create {}", out_path.display()))?;

    while let Some(bytes) = obj.body.next().await {
        let data = bytes.context("download interrupted")?;
        // write_all keeps writing until the whole chunk is on disk
        file.write_all(&data).context("unable to write to file")?;
    }

    Ok(())
}

The code is nothing special, but since it took me some digging to uncover how to set everything up, I figured others would appreciate it. The AWS docs always seem to assume instantiation from the environment, which doesn’t make sense for 3rd party providers.
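One detail worth care in a loop like the one in `download_object`: `Write::write` is allowed to write only part of the buffer it is given, while `write_all` keeps going until every byte is written. The sketch below illustrates the difference with the standard library only. `ShortWriter` and `write_chunks` are hypothetical names, and `ShortWriter` merely stands in for a file or socket that performs short writes.

```rust
use std::io::{self, Write};

/// A writer that accepts at most `limit` bytes per `write` call, standing in
/// for a destination that performs short writes.
struct ShortWriter {
    limit: usize,
    written: Vec<u8>,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Accept only a prefix of the buffer and report how much was taken.
        let n = buf.len().min(self.limit);
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes every chunk in full and returns the total number of bytes written.
fn write_chunks<W: Write>(out: &mut W, chunks: &[&[u8]]) -> io::Result<usize> {
    let mut total = 0;
    for chunk in chunks {
        // write_all retries until the whole chunk has been accepted.
        out.write_all(chunk)?;
        total += chunk.len();
    }
    Ok(total)
}
```

With a plain `write` call, a writer like this would silently drop the tail of each chunk unless the returned byte count were checked, which is why `write_all` is the safer choice when streaming an object body to disk.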

Alternatives

The official AWS SDK isn’t the only option for communicating via the S3 API. The two other crates I’m familiar with are Rusoto and rust-s3. Before the AWS SDK, Rusoto was the preeminent way to communicate with AWS-compatible services. Rusoto also came with the compile time and code size costs that the async ecosystem brings, but it worked well. However, now that the official AWS SDK has been released (albeit in developer preview), the Rusoto crates seem doomed to fall unmaintained. This puts greenfield projects in a bit of a predicament: take the bleeding edge or allocate time for a migration to the official SDK once production readiness is declared.

Rust-s3 appears nicer, as one can opt out of the async runtimes and just use a synchronous API (backed by none other than attohttpc, mentioned earlier). However, before I could do extensive testing, I ran into incompatibilities between the crate and Backblaze B2. I’m not sure who is to blame, but Backblaze appears to return list bucket results without a name field and rust-s3 doesn’t like that. Since the AWS SDK lists name as an optional field, it would seem that rust-s3 is in error for requiring it, but my unfamiliarity with the S3 API leaves me unsure. And when in doubt, it seems best to err on the side of the official SDK.
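To make the optional-field point concrete, here is a rough sketch of a parser that tolerates a missing name. It is not how rust-s3 or the AWS SDK parse responses: `ListBucketSummary`, `tag_values`, and `summarize` are hypothetical names, the parsing is deliberately naive string matching rather than a real XML parser, and the sample documents in the usage note are made up for illustration rather than captured from Backblaze.

```rust
/// A small subset of a list-bucket result. `name` is optional so that a
/// response without a `<Name>` element still parses.
#[derive(Debug, PartialEq)]
struct ListBucketSummary {
    name: Option<String>,
    keys: Vec<String>,
}

/// Returns the text of every `<tag>...</tag>` element, in document order.
/// Naive on purpose: no attributes, nesting of the same tag, or escaping.
fn tag_values<'a>(xml: &'a str, tag: &str) -> Vec<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let mut values = Vec::new();
    let mut rest = xml;

    while let Some(start) = rest.find(&open) {
        let after_open = &rest[start + open.len()..];
        match after_open.find(&close) {
            Some(end) => {
                values.push(&after_open[..end]);
                rest = &after_open[end + close.len()..];
            }
            // Unterminated element: stop rather than guess.
            None => break,
        }
    }
    values
}

/// Builds a summary, treating a missing `<Name>` as `None` instead of an error.
fn summarize(xml: &str) -> ListBucketSummary {
    ListBucketSummary {
        name: tag_values(xml, "Name").first().map(|s| s.to_string()),
        keys: tag_values(xml, "Key").into_iter().map(String::from).collect(),
    }
}
```

Given a document such as `<ListBucketResult><Contents><Key>a.zip</Key></Contents></ListBucketResult>`, `summarize` yields a `None` name and still returns the key, whereas a parser that requires the name would reject the whole response.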

Anyways, hopefully this quick tour of the official Rust AWS SDK communicating with 3rd party storage providers has been helpful.

Comments

If you'd like to leave a comment, please email hi@nickb.dev

2022-04-18 - Benjamin Stammen (https://benjaminstammen.com)

I wanted to spend the time to thank you for the post. It saved me a lot of time, being as new to Rust as I am. I was using b2-client, and was considering using rust-s3 before seeing your post. Had uploads working in pretty short order after reading through it!

2022-04-18 - nick

My pleasure Benjamin! I know when I stumble through a problem, I make sure to write the solution down for others.