A workaround for Rust's lack of structural subtyping

This example from a Rust issue does not compile:

struct X { a: u8, b: u16, c: u32, d: i8 } 
struct Y { a: u8, b: u16, c: u32, d: i8 } 
let x = X { a: 1, b: 2, c: 3, d: 4 }; 
let y = Y { a: 5, ..x };

The code does not compile because struct update syntax (the ..x part above) requires the base expression to have the same type as the struct being built; the only leeway is in lifetimes. Although X and Y are structurally identical, since they contain the same fields with the same types, Rust types structs nominally and treats the two as completely separate. This contrasts with TypeScript, where the translated code would be legal because TypeScript is based on structural subtyping.
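To make the rule concrete, here is a minimal sketch of what does compile. The From<X> for Y impl and the two helper functions are illustrative assumptions, not part of the original issue: the update syntax is accepted once the base expression already has the target type, so going from X to Y needs an explicit, field-by-field conversion first.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct X { a: u8, b: u16, c: u32, d: i8 }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Y { a: u8, b: u16, c: u32, d: i8 }

// Explicit conversion: the compiler will not infer this from the
// matching field names, so every field is spelled out by hand.
impl From<X> for Y {
    fn from(x: X) -> Self {
        Y { a: x.a, b: x.b, c: x.c, d: x.d }
    }
}

// Allowed: the base expression `x` has the same type as the struct being built.
fn same_type_update(x: X) -> X {
    X { a: 5, ..x }
}

// Also allowed: convert first, so that the base expression is already a `Y`.
fn cross_type_update(x: X) -> Y {
    Y { a: 5, ..Y::from(x) }
}
```

With four fields the hand-written conversion is tolerable; with a hundred it is the kind of boilerplate the rest of this post tries to avoid.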

There is an RFC in progress to alleviate this issue somewhat by allowing the update syntax on structs with varying generic parameters, but it doesn’t help our example. Since there will be no language support for what we want in the foreseeable future, workarounds are required.

Is This a Problem?

The fact that Rust does not allow structural subtyping is not in and of itself a flaw. In fact, many aspects of Rust cajole the programmer toward good designs. Traits and ownership are good examples, as both reward designing upfront. But when switching between languages, it’s not always clear when one’s design vision has become mired in the idioms of another language.

I have a concrete problem in mind, which should make what I’m getting at clearer.

Imagine a dump of raw data that is extracted into a plain old struct with many fields:

struct RawData {
    pub data1: String,
    pub data2: i32,
    // ...
    pub datan: String,
}

There could be a hundred of these fields (yes, a struct with a hundred fields is realistic when ingesting simulation data that has hundreds of variables). The problem occurs when we want to enrich the raw data by appending fields. For instance, maybe our application can join the raw data with another source to give it a better name. To add a name field in TypeScript we’d write:

type EnrichedData = RawData & {
    name: string;
}

But that syntax is not at our disposal in Rust. It’s understandable if one tries to replicate that structure in Rust anyway. After all, with Rust becoming increasingly popular on the server and in the browser (i.e., Wasm), it is desirable to create idiomatic and easily mappable structures across language boundaries, such as reusing type names in Rust and TypeScript to facilitate understanding.

Personally, answering this question is important: when I’m designing these inter-language structures and feel constrained by the type system (but actually am not), I’d like to have this post to refer to for a solution.

Non-Solution: Copy, Paste, Traits

Do not create a new struct with the raw data fields copied and pasted:

struct RawData {
    pub data1: String,
    pub data2: i32,
    // ...
    pub datan: String,
}

struct EnrichedData {
    pub data1: String,
    pub data2: i32,
    // ...
    pub datan: String,
    pub name: String,
}

The first issue is that there is now more code to update whenever RawData gains another field (and in this scenario that is a certainty). Hopefully it is evident how error prone this is, or maybe I’m scarred by all the times I needed to double check that I propagated a new field correctly.

What happens if we need to pass a reference to RawData to a downstream function? One may initially reach for an AsRef implementation, as conceptually we have cheap references to the fields necessary to construct RawData.

impl AsRef<RawData> for EnrichedData {
    fn as_ref(&self) -> &RawData {
        &RawData { /* */ }
    }
}

Unfortunately, this is a non-starter: we’d be returning a reference to a temporary RawData that is dropped when the function returns, and building that temporary would require moving field data out of a borrowed self.
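For contrast, a version that does compile has to hand back an owned value. The sketch below uses a hypothetical to_raw helper (not something from a library) and two stand-in fields. It works, but every call clones the String fields, which is exactly the cost that a cheap reference was supposed to avoid.

```rust
#[derive(Debug, Clone, PartialEq)]
struct RawData {
    pub data1: String,
    pub data2: i32,
}

struct EnrichedData {
    pub data1: String,
    pub data2: i32,
    pub name: String,
}

impl EnrichedData {
    // Hypothetical helper: builds a fresh, owned RawData.
    // Each String field is cloned because `self` is only borrowed.
    fn to_raw(&self) -> RawData {
        RawData {
            data1: self.data1.clone(),
            data2: self.data2,
        }
    }
}
```

Besides the allocations, this helper is one more place that must be touched whenever RawData gains a field.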

Typically an idiomatic solution to this would be to introduce a trait, where the downstream function expects the trait and the raw and enriched data implement said trait. I’ve included such a trait for a single field:

trait DataTrait {
    fn data1(&self) -> &str;
}

impl DataTrait for RawData {
    fn data1(&self) -> &str {
        self.data1.as_str()
    }
}

impl DataTrait for EnrichedData {
    fn data1(&self) -> &str {
        self.data1.as_str()
    }
}

For those keeping track at home, we’ve repeated the data1 field now 5 times (or 7 depending on how you count). We’d then have to fill out the rest of the fields and hope we don’t make a mistake. Even if perfectly implemented, all the code duplication would quickly become fatiguing.

Solution: Composition and Serde

Instead of framing enriched data as raw data with additional fields, it should be framed as containing raw data with additional fields. Confused? A code example should clear the air:

struct EnrichedData {
    pub raw: RawData,
    pub name: String,
}

This is composition. No need to duplicate fields. No needless traits. We can pass the raw data to those that expect it, thus more closely adhering to the Rust API guideline of exposing intermediate results to avoid duplicate work. The solution is obvious if one steps back, but when one is a polyglot, the context switch between languages can cloud one’s judgement about what is idiomatic.
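As a rough sketch of how this plays out, assume a hypothetical downstream function summarize that only knows about RawData, and a hypothetical enrich step. Both names are made up for illustration, and RawData is trimmed to two fields.

```rust
struct RawData {
    pub data1: String,
    pub data2: i32,
}

struct EnrichedData {
    pub raw: RawData,
    pub name: String,
}

// Hypothetical downstream function that predates enrichment
// and only understands RawData.
fn summarize(raw: &RawData) -> String {
    format!("{}:{}", raw.data1, raw.data2)
}

// Hypothetical enrichment step: the raw data is moved into the new
// struct, so no field is copied or listed individually.
fn enrich(raw: RawData, name: &str) -> EnrichedData {
    EnrichedData {
        raw,
        name: name.to_string(),
    }
}
```

A caller holding an EnrichedData can pass &enriched.raw to summarize, and adding a field to RawData requires no change to either function.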

But what if we didn’t want to expose this composed structure across a boundary like HTTP or Wasm, where JSON is often the data interchange format? What if we wanted to flatten all the fields so that they appear at the same level as they would if the data weren’t enriched?

That’s where serde comes into play.

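use serde::{Deserialize, Serialize};

// RawData must derive the same traits for the flattening to work.
#[derive(Serialize, Deserialize)]
struct EnrichedData {
    // Inline the fields of `raw` at the top level of the serialized output.
    #[serde(flatten)]
    pub raw: RawData,
    pub name: String,
}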

Which will give us the desired format:

{
  "data1": "a",
  "data2": 2,
  "datan": "b",
  "name": "c"
}

Deserialization also works out of the box (once Deserialize is derived as well), so TypeScript can transmit the data back into our composed struct.

Traits still have a place in this solution. If there are several layers of enrichment it can be more ergonomic to introduce an AsRef:

fn myfn<T: AsRef<RawData>>(t: T) {
    // ...
}

impl AsRef<RawData> for RawData {
    fn as_ref(&self) -> &RawData {
        self
    }
}

impl AsRef<RawData> for EnrichedData {
    fn as_ref(&self) -> &RawData {
        &self.raw
    }
}
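A small usage sketch, assuming a trimmed-down RawData and a hypothetical downstream function first_field_len standing in for myfn. Because the standard library implements AsRef<U> for &T whenever T: AsRef<U>, callers can pass either a reference or an owned value.

```rust
struct RawData {
    pub data1: String,
}

struct EnrichedData {
    pub raw: RawData,
    pub name: String,
}

impl AsRef<RawData> for RawData {
    fn as_ref(&self) -> &RawData {
        self
    }
}

impl AsRef<RawData> for EnrichedData {
    fn as_ref(&self) -> &RawData {
        &self.raw
    }
}

// Hypothetical downstream function: accepts anything that can lend a RawData.
fn first_field_len<T: AsRef<RawData>>(t: T) -> usize {
    let raw: &RawData = t.as_ref();
    raw.data1.len()
}
```

Calling first_field_len(&raw), first_field_len(&enriched), or first_field_len(enriched) all type-check. Taking T by value mirrors the signature above; taking &T would work equally well if the function never needs ownership.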

What if we wanted to enrich data that may or may not have had a sensitive raw data field removed? We’ll assume that the data2 field from our examples is sensitive and make our enriched data struct generic.

#[derive(Serialize, Deserialize)]
struct NonSensitiveData {
    pub data1: String,
    // ...
    pub datan: String,
}

#[derive(Serialize, Deserialize)]
struct RawData {
    #[serde(flatten)]
    pub core: NonSensitiveData,
    pub data2: i32,
}

#[derive(Serialize, Deserialize)]
struct EnrichedData<T> {
    #[serde(flatten)]
    pub raw: T,
    pub name: String,
}

Aside: the above example assumes that the raw data also derives the serde traits, so that the non-sensitive data can be split out into its own struct without changing how the raw data is constructed.

The trait implementations now come in handy, as it’s easy to declare that the raw data can be borrowed from enriched data whenever the inner type provides it.

impl<T> AsRef<RawData> for EnrichedData<T>
where
    T: AsRef<RawData>,
{
    fn as_ref(&self) -> &RawData {
        self.raw.as_ref()
    }
}
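The following sketch shows one way the layers might fit together, with the serde derives left out so that it depends only on the standard library. The AsRef<NonSensitiveData> impls and the two helper functions are assumptions added for illustration. The idea is that a function asking for RawData cannot be handed redacted data, while a function asking for NonSensitiveData accepts both.

```rust
struct NonSensitiveData {
    pub data1: String,
}

struct RawData {
    pub core: NonSensitiveData,
    pub data2: i32,
}

struct EnrichedData<T> {
    pub raw: T,
    pub name: String,
}

impl AsRef<RawData> for RawData {
    fn as_ref(&self) -> &RawData { self }
}

impl AsRef<NonSensitiveData> for NonSensitiveData {
    fn as_ref(&self) -> &NonSensitiveData { self }
}

// Assumption: raw data can always lend out its non-sensitive core.
impl AsRef<NonSensitiveData> for RawData {
    fn as_ref(&self) -> &NonSensitiveData { &self.core }
}

impl<T: AsRef<RawData>> AsRef<RawData> for EnrichedData<T> {
    fn as_ref(&self) -> &RawData { self.raw.as_ref() }
}

impl<T: AsRef<NonSensitiveData>> AsRef<NonSensitiveData> for EnrichedData<T> {
    fn as_ref(&self) -> &NonSensitiveData { self.raw.as_ref() }
}

// Hypothetical helper: only compiles for types that still carry data2.
fn sensitive_value<T: AsRef<RawData>>(t: &T) -> i32 {
    let raw: &RawData = t.as_ref();
    raw.data2
}

// Hypothetical helper: works for full and redacted data alike.
fn first_field<T: AsRef<NonSensitiveData>>(t: &T) -> &str {
    let core: &NonSensitiveData = t.as_ref();
    &core.data1
}
```

Passing an EnrichedData<NonSensitiveData> to sensitive_value is rejected at compile time, because NonSensitiveData has no AsRef<RawData> impl.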

And we can sprinkle these trait usages throughout the code and have more granular control over what is serialized to external clients.

Conclusion

In this post I showed how one can take a plain old struct in Rust with many fields, further enrich it with new fields, and expose the data in a format that conceals the internal composition. This results in succinct and idiomatic implementations in Rust and other languages (we used TypeScript in this post).

This pattern is immensely useful when working with wide data types. If one is enriching on top of only a few fields, the pattern may be less beneficial: a deeply composed hierarchy can be unergonomic, and repeatedly writing code like x.raw.core.data1 may cause Law of Demeter advocates some heartache. And if at some point narrow types are enriched to the extent that they really are a new type, there is no need to shackle them to their former selves.

In any case, one can add this method to their programming tool belt and strike at the ready.

Comments

2021-12-17 - kekronbekron

Although not directly a solution, check these out:

I’m also looking to work with structs that hold structs within it (hundreds of fields etc.), so whatever you find in this area will surely be useful to others too :)

2021-12-18 - nick

@kekronbekron, interesting I hadn’t considered that use case. Luckily, composition + serde has been sufficient for my use cases as I’m exposing the data over a language boundary so some sort of serialization is needed anyways. But I can see how the crates you linked to could be useful for a Rust library to allow a client to simplify the resulting structure after a series of composition steps. While I assume that a From impl would be more efficient, having a serialization step could result in fewer lines of code to maintain.