Optimizing Serde data model

If you know your data, you can exploit your data.

Here are some tips that have worked out in the past.

If your payloads contain a list with a known number of elements, consider inlining it as a fixed-size array.

For instance, a year’s worth of monthly income should have 12 entries.
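
As a rough illustration of what inlining changes, the sketch below compares the in-struct footprint of a `Vec<f32>` field with a `[f32; 12]` field. The types `HeapLedger` and `InlineLedger` and the two helper functions are made up for this example and assume a list of exactly 12 values.

```rust
use std::mem::size_of;

/// Hypothetical ledger that keeps its 12 monthly values on the heap.
pub struct HeapLedger {
    pub income: Vec<f32>,
}

/// Hypothetical ledger that stores the same 12 values inline.
pub struct InlineLedger {
    pub income: [f32; 12],
}

/// Bytes occupied by the struct itself. The Vec's elements live in a
/// separate heap allocation and are not included here.
pub fn heap_ledger_size() -> usize {
    size_of::<HeapLedger>()
}

/// Bytes occupied by the struct, which already contains all 12 values.
pub fn inline_ledger_size() -> usize {
    size_of::<InlineLedger>()
}
```

The `Vec` variant is smaller as a struct (pointer, capacity, length) but needs a heap allocation per list to hold the values, while the array variant carries its 48 bytes of `f32` data directly and needs none.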

Let’s take an example:

#[derive(Debug, Clone, Deserialize, Default)]
pub struct CountryLedger {
    #[serde(default)]
    pub income: Vec<f32>,
    #[serde(default)]
    pub expense: Vec<f32>,

    // ... additional numerical list fields
}

Inlining the list as a fixed-size array gives:

#[derive(Debug, Clone, Deserialize, Default)]
pub struct CountryLedger {
    #[serde(default)]
    pub income: [f32; 19],
}

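// Imports used by the deserialization helpers.
use serde::de::{self, SeqAccess};
use serde::{Deserialize, Deserializer};
use std::fmt;

/// Deserializes a sequence of elements into a fixed size array. If the input is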
/// not long enough, the default value is used. Extraneous elements are ignored.
/// This is useful for deserializing sequences that are expected to be of a
/// fixed size, but being tolerant is more important than meeting expectations.
fn collect_into_default<'de, A, T, const N: usize>(
    mut seq: A,
) -> Result<[T; N], <A as SeqAccess<'de>>::Error>
where
    A: SeqAccess<'de>,
    T: Default + Copy + Deserialize<'de>,
{
    let mut result = [T::default(); N];
    for i in 0..N {
        let Some(x) = seq.next_element::<T>()? else {
            return Ok(result);
        };
        result[i] = x;
    }

    // If the sequence is not finished, we need to consume the rest of the elements
    // so that we drive a potential parser that underlies the deserializer
    while let Some(_x) = seq.next_element::<de::IgnoredAny>()? {}

    Ok(result)
}
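
To make the padding and truncation behavior concrete without pulling in serde, here is a minimal analogue that works on a plain iterator. The name `fill_or_default` is hypothetical, and this is only a sketch of the same fill strategy, not the serde code path itself.

```rust
/// Fills a fixed-size array from any iterator: missing items keep the
/// default value, surplus items are drained and dropped.
pub fn fill_or_default<T, I, const N: usize>(items: I) -> [T; N]
where
    T: Default + Copy,
    I: IntoIterator<Item = T>,
{
    // `fuse` makes it safe to keep polling after the first `None`.
    let mut iter = items.into_iter().fuse();
    let mut result = [T::default(); N];
    for slot in result.iter_mut() {
        match iter.next() {
            Some(x) => *slot = x,
            None => break,
        }
    }
    // Drain whatever is left, mirroring how the serde version keeps
    // driving the underlying parser past the elements it ignores.
    for _ in iter {}
    result
}
```

Called with two items and `N = 4`, the last two slots stay at `T::default()`; called with four items and `N = 2`, the extra two are consumed and discarded.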

For reference, writing the deserialization function is easy enough:

fn deserialize_list_overflow_byte<'de, D, const N: usize>(
    deserializer: D,
) -> Result<[u8; N], D::Error>
where
    D: Deserializer<'de>,
{
    struct ListVisitor<const N: usize>;

    impl<'de, const N: usize> de::Visitor<'de> for ListVisitor<N> {
        type Value = [u8; N];

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            formatter.write_str("a seq of bytes allowed to overflow")
        }

        fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
        where
            A: SeqAccess<'de>,
        {
            collect_into_default(seq)
        }
    }

    deserializer.deserialize_seq(ListVisitor)
}

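Allocations can be tracked with a global allocator that wraps the system allocator and counts calls and bytes:

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug)]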
struct CountingAllocator {
    total_allocations: AtomicUsize,
    total_bytes_allocated: AtomicUsize,
    bytes_currently_allocated: AtomicUsize,
}

impl CountingAllocator {
    const fn new() -> Self {
        Self {
            total_allocations: AtomicUsize::new(0),
            total_bytes_allocated: AtomicUsize::new(0),
            bytes_currently_allocated: AtomicUsize::new(0),
        }
    }

    fn snapshot(&self) -> AllocationSnapshot {
        AllocationSnapshot {
            total_allocations: self.total_allocations.load(Ordering::Relaxed),
            total_bytes_allocated: self.total_bytes_allocated.load(Ordering::Relaxed),
            bytes_currently_allocated: self.bytes_currently_allocated.load(Ordering::Relaxed),
        }
    }
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.total_allocations.fetch_add(1, Ordering::Relaxed);
        self.total_bytes_allocated.fetch_add(layout.size(), Ordering::Relaxed);
        self.bytes_currently_allocated.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.bytes_currently_allocated.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[derive(Debug)]
struct AllocationSnapshot {
    total_allocations: usize,
    total_bytes_allocated: usize,
    bytes_currently_allocated: usize,
}

#[global_allocator]
static ALLOC: CountingAllocator = CountingAllocator::new();
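
One way such a counter might be used is to read it before and after a piece of work and take the difference. The standalone sketch below uses a pared-down counter so that it compiles on its own; `AllocCounter`, `COUNTER`, `allocations_during`, `vec_allocations` and `array_allocations` are hypothetical names, not part of the code above.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, pared-down counter: tracks only the number of allocations.
pub struct AllocCounter {
    allocations: AtomicUsize,
}

unsafe impl GlobalAlloc for AllocCounter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocations.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static COUNTER: AllocCounter = AllocCounter {
    allocations: AtomicUsize::new(0),
};

/// Runs `work` and returns how many allocations happened meanwhile.
/// Allocations made by other threads are counted too, so the figure is
/// only exact in an otherwise idle process.
pub fn allocations_during<R>(work: impl FnOnce() -> R) -> usize {
    let before = COUNTER.allocations.load(Ordering::Relaxed);
    // `black_box` keeps the optimizer from discarding the work.
    black_box(work());
    let after = COUNTER.allocations.load(Ordering::Relaxed);
    after - before
}

/// Allocations observed while building a 12-element Vec.
pub fn vec_allocations() -> usize {
    allocations_during(|| vec![1.0f32; 12])
}

/// Allocations observed while building a 12-element array.
pub fn array_allocations() -> usize {
    allocations_during(|| [1.0f32; 12])
}
```

Building the `Vec` should register at least one allocation, while the array lives on the stack and should register none on a quiet, single-threaded run.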

Using vec:

deserialize_ledger/ledger
                        time:   [10.192 µs 10.245 µs 10.301 µs]
                        thrpt:  [162.85 MiB/s 163.75 MiB/s 164.58 MiB/s]

deserialize/parser      time:   [1.2279 µs 1.2349 µs 1.2422 µs]
                        thrpt:  [1.3188 GiB/s 1.3266 GiB/s 1.3341 GiB/s]

Using byte array:

deserialize_ledger/ledger
                        time:   [9.1679 µs 9.2171 µs 9.2744 µs]
                        thrpt:  [180.88 MiB/s 182.00 MiB/s 182.98 MiB/s]

Using small vec:

deserialize_ledger/ledger
                        time:   [8.9395 µs 8.9745 µs 9.0123 µs]
                        thrpt:  [186.14 MiB/s 186.92 MiB/s 187.65 MiB/s]

Going by the median estimates, the byte array cuts deserialization time by about 10% compared with the vec baseline (10.245 µs to 9.217 µs), and the small vec by about 12% (to 8.975 µs).
