Samuel Tardieu @ rfc1149.net

Rust: pattern matching, references, lifetime extensions, and captures

,

The problem for day 12 of the Advent Of Code 2022 uses a map such as this one:

Sabqponm
abcryxxl
accszExk
acctuvwj
abdefghi

In the first part, you must go from S to E with the shortest number of steps. You can only go from a letter (e.g., t) to the next in the alphabet (u), or to a letter that comes sooner in the alphabet (e.g., a, d or m). The starting point S counts as a, and the ending point E counts as z. The second part is identical, except that you may start either from S or from any a place, whichever gives the shortest path to E.

Of course this was an ideal problem for my pathfinding library, as it features both a convenient Matrix type to hold the map, and pathfinding algorithms such as breadth-first search (BFS). I ended up writing the following code:

use pathfinding::prelude::{bfs, Matrix};

fn parse(input: &str) -> (Matrix<u8>, (usize, usize), (usize, usize)) {
    let mut m = Matrix::from_rows(input.lines().map(str::bytes)).unwrap();
    let s = m.indices().find(|p| m[*p] == b'S').unwrap();
    let e = m.indices().find(|p| m[*p] == b'E').unwrap();
    m[s] = b'a';
    m[e] = b'z';
    (m, s, e)
}

fn part1(input: &str) -> usize {
    let (m, s, e) = &parse(input);
    bfs(
        s,
        |&p| m.neighbours(p, false).filter(move |&q| m[q] <= m[p] + 1),
        |&p| p == *e,
    )
    .unwrap()
    .len()
        - 1
}

fn part2(input: &str) -> usize {
    let (m, _, e) = &parse(input);
    bfs(
        e,
        |&p| m.neighbours(p, false).filter(move |&q| m[p] <= m[q] + 1),
        |&p| m[p] == b'a',
    )
    .unwrap()
    .len()
        - 1
}

The interesting line here is

    let (m, s, e) = &parse(input);

which can trigger the following questions to the less-experienced Rust user:

  • How can you assign a reference &parse(input) to a tuple (m, s, e)?
  • Why is the temporary variable returned by parse(input) not destroyed at the end of the statement? How come the reference stays alive?
  • Why did you do this, and was there a better solution?

Non-reference pattern matching

In Rust, assignment with let uses irrefutable pattern-matching. As long as the types match, Rust will accept the assignment to the free variables present in the pattern.

For example, the following code will match, as a couple always matches a couple, and it will assign 10 to a and 20 to b:

    let (a, b) = (10, 20);

This following code does not compile though, because Some(x) doesn’t cover all cases of the type Option<i32> of the expression (even though the compiler could know that only the Some variant is used):

   let Some(x) = Some(42);
|     let Some(x) = Some(42);
|         ^^^^^^^ pattern `None` not covered

When the expression being matched against is a reference (such as in &parse(input)), Rust separates the patterns in two categories:

  • Those that can directly match a reference: a free variable, a pattern starting with &, a constant, a wildcard _.
  • Those that can never directly match a reference: a struct name, an enum or union variant, a tuple, an array. Those are called non-reference patterns.

Our pattern (m, s, e) belongs to the non-reference pattern category. The Rust compiler will attempt to modify the pattern so that it can match by prepending & in front of it, and adding a ref before all free variables which are not preceded by ref already (for mutable references, ref mut is added instead). In our case, it means that our assignment

    let (m, s, e) = &parse(input);

will be rewritten as:

    let &(ref m, ref s, ref e) = &parse(input);

Recall what using ref in front of a variable does in a pattern: the pattern will match as usual, but the variable will be bound to a reference on the part it matches instead of to the part itself. Since parse(input) returns a (Matrix<u8>, (usize, usize), (usize, usize)), the variables will be of the following types:

  • m will have type &Matrix<u8>
  • s and e will have type &(usize, usize)

Temporary lifetime extension

The rules for dropping objects indicate that a temporary value created in a statement will be destroyed at the end of the statement. For example, the following code will not compile:

use std::ops::Deref;

fn main() {
    let s: &str = String::from("foobar").deref();
    println!("s = {s}");
}
|     let s: &str = String::from("foobar").deref();
|                   ^^^^^^^^^^^^^^^^^^^^^^ temporary value is freed at the end of this statement
|                   |
|                   creates a temporary value which is freed while still in use
|     println!("s = {s}");
|                    - borrow later used here

because the temporary value returned by String::from("foobar") is destroyed at the end of the statement. .deref() returns a &str reference whose lifetime cannot be larger than the temporary string itself and cannot be stored in s to be used later.

However, the following code will compile and run just fine:

fn main() {
    let s: &str = &String::from("foobar");
    println!("s = {s}");
}

although to convert from String to &str the same function .deref() as above is called implicitly by the compiler. Why is that?

The temporary object created by String::from("foobar") is subject to a temporary lifetime extension. In a let statement, the lifetime of a temporary object whose reference is taken directly (with & or through a ref binding) is extended to match the lifetime of the variable in which the reference is stored. Tuples components (as well as struct, …) may benefit from an extended lifetime as well. This is not the case if the reference is created only to call a method: in this case, the temporary object and the reference lifetimes will not be extended.

In our case,

    let (m, s, e) = &parse(input);

the lifetime of the references created to store in m, s, and e and of the references temporary values are extended to be respectively the same as those of the m, s, and e variables.

Why did you write the code this way?

Originally, I intended on writing this:

    let (m, s, e) = parse(input);

which uses no references at all. However, the move in the following expression is necessary in order to capture the p local variable in the filter() anonymous function:

    bfs(
        s,
        |&p| m.neighbours(p, false).filter(move |&q| m[q] <= m[p] + 1),
        |&p| p == *e,
    )

Rust does not offer a way of selectively capturing variables. By writing move in front of the anonymous function, it forces the compiler to capture (or copy if the variable implements the Copy trait) all variables appearing in the function. As Matrix<_> does not implement Copy (copying a large matrix can be expensive), the whole matrix m would be captured in addition to p. This is not good as we need to keep the matrix around, and we do not want to clone it at every step of the algorithm.

In order for the anonymous function not to capture the matrix itself, I wanted m to be a reference to the matrix: an immutable reference implements Copy, which means that the anonymous function will receive a copy of the reference to the matrix and will not capture the matrix itself, which is why without further thinking I added a & to the initialization expression, knowing what would happen.

Is there a clearer way to do this?

Could I have done differently? Of course, instead of using a non-reference pattern and a temporary lifetime extension I could have used the simpler statement (but this is not the first idea that popped in my head):

   let (ref m, s, e) = parse(input);

In this case, m would be having type &Matrix<u8> as expected, and s and e would be having type (usize, usize) (without a reference).

But if I had done this, I could not have written this blog post (the first real one in more than 5 years) then, which touches on topics I teach to my graduate students.

To learn more about those topics

blog comments powered by Disqus