Chaining Fallible Operations with Combinators

2018/06/26

Categories: Programming Rust

Rust’s Iterator trait is one of its most useful features. It allows lazy processing of item-by-item streams of anything from the bytes of a file to threads to complex and exotic data structures.

Most of the useful functionality, though, is provided by combinators, functions that allow us to combine iterators and process them in useful ways. These include map, fold, filter, and many other useful functions (including those from the excellent itertools crate).

Concision and Clarity

Consider that we have the following functions which we’d like to apply to some list of numbers:


fn operation_one(a: i32) -> i32 { a + 1 }
fn operation_two(a: i32, b: i32) -> i32 { a + b }
fn check(a: i32) -> bool { a > 3 }

That is, we want to apply operation_one to every item, apply operation_two to every pair of those results, and finally return only the results which satisfy the condition in check.

Procedurally, that looks a bit like this:


fn procedural(values: &[i32]) -> Vec {
    let mut temp = Vec::with_capacity(values.len());
    for value in values {
        temp.push(operation_one(*value));
    }
    
    let mut temp2 = Vec::with_capacity(values.len());
    for index in 1..temp.len() {
        temp2.push(operation_two(temp[index - 1], temp[index]));
    }
    
    let mut temp3 = Vec::with_capacity(values.len());
    for value in temp2 {
        if check(value) {
            temp3.push(value)
        }
    }
    
    return temp3;
}

A function with that signature must allocate at least once (to have a Vec to return), but this function has three allocations, three mutable bindings, and is 19 lines long! Surely we can do better with functional programming:


extern crate itertools; // 0.7.8
use itertools::Itertools;
fn functional(values: &[i32]) -> Vec {
    values
        .iter()
        .map(|x| operation_one(*x))
        .tuple_windows()
        .map(|values: (i32, i32)| operation_two(values.0, values.1))
        .filter(|x| check(*x))
        .collect()
}

Even including the imports and putting every chained call on a new line, this is 8 lines shorter and, in my opinion, far more readable. It also has only a single allocation (in the collect() call) and no mutable bindings.

So, all is rosy and fun with functional programming, right? Not quite. Enter… fallibility.

Fallible Maps

I work with microservices a great deal, and one of the big problems with microservices is the introduction of a great many system boundaries where all type information is lost.

We pass around huge amounts of JSON, and therefore every ingestion routine has to consider the possibility that it was called with utterly nonsensical input.

Often, the desired behavior is to simply ignore the problematic input. This is easy with flat_map:


// Given that parse() return a Result<Item, whatever>
let parsed: Vec<Item> = inputs.iter()
    .flat_map(|input| parse(input).into_iter())
    .collect();

Admittedly, this does result in throwing away the errors, which may not be what we want. We could filter with a match expression, or do other iterator magic, if we simply wanted to log the errors, but what if we want to abort on error?

Fortunately, Result<_, _> implements a trait called FromIterator, which allows the collect() iterator method to collect into a result, inverting the containment relationship:


let items_or_errors: Vec<Result<Item, Error>>;
let single_error_or_items: Result<Vec<Item>, Error> =
    items_or_errors.into_iter().collect();

This is really cool! In cooperation with the ? operator, this can be used to easily, efficiently and concisely abort a function as soon as an error value is encountered - as long as you only have to do it once.

Chaining Fallible Operations

In a real-world scenario, there are often a number of fallible operations that have to be performed. For example, I had a problem in which I had to:

  1. Ingest untrusted input into a list of unnormalized input values
  2. Normalize the input values, possibly rejecting them
  3. Perform a computation on the values, possibly rejecting them
  4. Aggregate the results into a single resulting value

struct DataUnnormalized { data: i32 }
struct Data { data: i32 }

fn ingest(input: String) -> Result<Vec<DataUnnormalized>, String> {
    // In reality, this can also fail
    Ok( vec![DataUnnormalized { data: 1 }] )
}

impl DataUnnormalized {
    fn normalize(self) -> Result<Data, String> {
        // In reality, this can fail
        Ok(Data {data: self.data})
    }
}

impl Data {
    fn perform_operation(&self) -> Result<i32, String> {
        // In reality, this can fail too
        Ok(self.data)
    }
}

fn aggregate(agg: i32, curr: i32) -> i32 {
    agg + curr
}

Unfortunately, composing these functions wasn’t as simple as I expected. I eventually came up with the following solution:


fn compute(input: String) -> Result<i32, String> {
    Ok(ingest(input)? // Ingest and abort if error. Simple enough
        .into_iter()
        .map(|i| i.normalize()) // Maps into Vec<Result<Data, String>>
        .collect::<Result<Vec<Data>, String>>()? // Flatten and abort if error
        .into_iter()
        .map(|i| i.perform_operation()) // Maps into Vec<Result<i32, String>>
        .collect::<Result<Vec<i32>, String>>()? // Flatten and abort if error
        .into_iter()
        .fold(0, |agg, curr| aggregate(agg, curr?))) // Aggregate
} 

This is… not good. It has two allocations which are effectively unnecessary, and are only there to convert between types. It has lots of syntactic noise, and is difficult to read.

After asking around, I found out that I was thinking about this wrong. Rather than mapping from success to success/failure, I should have started out by embracing the fallibility.

Making each map operation take a Result type makes this much prettier, and with the newly stablized try_fold, it can be done in only a few lines. This is because the ? operator is allowed to operate correctly in these contexts.


fn compute(input: String) -> Result {
    ingest(input)? // Ingest and abort if error. Simple enough
        .into_iter()
        .map(|i| i.normalize()) // Maps into Result
        .map(|i| i?.perform_operation()) // Maps into Result
        .try_fold(0, |agg, curr| Ok(aggregate(agg, curr?))) // Maximize
}

Lessons

Thanks for reading, and a huge thanks to CUViper for helping me come up with this solution to my problem. The takeaways from this process for me have been threefold:

  1. Functional combinators in Rust are very powerful. They fit well in any problem that involves processing many homogeneous data points.
  2. Design around fallibility. Pretending that fallible operations can’t fail is a recipe for convoluted, type-weird code.

If you enjoyed this tutorial, you might like to read more about Rust types, or perhaps learn how session types can make your code better. Also, please consider supporting me on Patreon.