Item 9: Consider using iterator transforms instead of explicit loops
The humble loop has had a long journey of increasing convenience and increasing abstraction. The
B language (the
precursor to C) had only while (condition) { ... }
, but with the arrival of C, the common scenario of iterating
through indexes of an array became more convenient with the addition of the for
loop:
// C code
int i;
for (i = 0; i < len; i++) {
Item item = collection[i];
// body
}
The early versions of C++ further improved convenience and scoping by allowing the loop variable declaration to be
embedded in the for
statement (this was also adopted by C in C99):
// C++98 code
for (int i = 0; i < len; i++) {
Item item = collection[i];
// ...
}
Most modern languages abstract the idea of the loop further: the core function of a loop is often to move to the next
item of some container. Tracking the logistics that are required to reach that item (index++
or ++it
) is mostly an
irrelevant detail. This realization produced two core concepts:
- Iterators: A type whose purpose is to repeatedly emit the next item of a container, until exhausted1
- For-each loops: A compact loop expression for iterating over all of the items in a container, binding a loop variable to the item rather than to the details of reaching that item
These concepts allow for loop code that's shorter and (more importantly) clearer about what's intended:
// C++11 code
for (Item& item : collection) {
// ...
}
Once these concepts were available, they were so obviously powerful that they were quickly retrofitted to those languages that didn't already have them (e.g., for-each loops were added to Java 1.5 and C++11).
Rust includes iterators and for-each–style loops, but it also includes the next step in abstraction: allowing the whole
loop to be expressed as an iterator transform (sometimes also referred to as an iterator adaptor). As
with Item 3's discussion of Option
and Result
, this Item will attempt to show how these iterator transforms can be
used instead of explicit loops, and will give guidance as to when it's a good idea. In particular, iterator transforms can
be more efficient than an explicit loop, because the compiler can skip the bounds checks it might otherwise need to
perform.
By the end of this Item, a C-like explicit loop to sum the squares of the first five even items of a vector:
#![allow(unused)] fn main() { let values: Vec<u64> = vec![1, 1, 2, 3, 5 /* ... */]; let mut even_sum_squares = 0; let mut even_count = 0; for i in 0..values.len() { if values[i] % 2 != 0 { continue; } even_sum_squares += values[i] * values[i]; even_count += 1; if even_count == 5 { break; } } }
should start to feel more natural expressed as a functional-style expression:
let even_sum_squares: u64 = values
.iter()
.filter(|x| *x % 2 == 0)
.take(5)
.map(|x| x * x)
.sum();
Iterator transformation expressions like this can roughly be broken down into three parts:
- An initial source iterator, from an instance of a type that implements one of Rust's iterator traits
- A sequence of iterator transforms
- A final consumer method to combine the results of the iteration into a final value
The first two of these parts effectively move functionality out of the loop body and into the for
expression; the last
removes the need for the for
statement altogether.
Iterator Traits
The core Iterator
trait has a very simple interface:
a single method next
that yields Some
items until it doesn't (None
). The type of the emitted items is given by the trait's associated Item
type.
Collections that allow iteration over their contents—what would be called iterables in other
languages—implement the IntoIterator
trait;
the into_iter
method of this trait
consumes Self
and emits an Iterator
in its stead. The compiler will automatically use this trait for
expressions of the form:
for item in collection {
// body
}
effectively converting them to code roughly like:
let mut iter = collection.into_iter();
loop {
let item: Thing = match iter.next() {
Some(item) => item,
None => break,
};
// body
}
or more succinctly and more idiomatically:
let mut iter = collection.into_iter();
while let Some(item) = iter.next() {
// body
}
To keep things running smoothly, there's also an implementation of IntoIterator
for any Iterator
, which just
returns self
; after all, it's easy to convert an Iterator
into an Iterator
!
This initial form is a consuming iterator, using up the collection as it's created:
let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
for item in collection {
println!("Consumed item {item:?}");
}
Any attempt to use the collection after it's been iterated over fails:
println!("Collection = {collection:?}");
error[E0382]: borrow of moved value: `collection`
--> src/main.rs:171:28
|
163 | let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)];
| ---------- move occurs because `collection` has type `Vec<Thing>`,
| which does not implement the `Copy` trait
164 | for item in collection {
| ---------- `collection` moved due to this implicit call to
| `.into_iter()`
...
171 | println!("Collection = {collection:?}");
| ^^^^^^^^^^^^^^ value borrowed here after move
|
note: `into_iter` takes ownership of the receiver `self`, which moves
`collection`
While simple to understand, this all-consuming behavior is often undesired; some kind of borrow of the iterated items is needed.
To ensure that behavior is clear, the examples here use a Thing
type that does not implement Copy
(Item 10), as
that would hide questions of ownership (Item 15)—the compiler would silently make copies everywhere:
#![allow(unused)] fn main() { // Deliberately not `Copy` #[derive(Clone, Debug, Eq, PartialEq)] struct Thing(u64); let collection = vec![Thing(0), Thing(1), Thing(2), Thing(3)]; }
If the collection being iterated over is prefixed with &
:
for item in &collection {
println!("{}", item.0);
}
println!("collection still around {collection:?}");
then the Rust compiler will look for an implementation of
IntoIterator
for the type &Collection
. Properly
designed collection types will provide such an implementation; this implementation will still consume Self
, but now
Self
is &Collection
rather than Collection
, and the associated Item
type will be a reference &Thing
.
This leaves the collection intact after iteration, and the equivalent expanded code is as follows:
let mut iter = (&collection).into_iter();
while let Some(item) = iter.next() {
println!("{}", item.0);
}
If it makes sense to provide iteration over mutable references,2 then a similar pattern applies for for item in &mut collection
: the compiler
looks for and uses an implementation of IntoIterator
for &mut Collection
, with each Item
being of type &mut Thing
.
By convention, standard containers also provide an iter()
method that returns an iterator over references to the
underlying item, and an equivalent iter_mut()
method, if appropriate, with the same behavior as just described. These
methods can be used in for
loops but have a more obvious benefit when used as the start of an iterator
transformation:
let result: u64 = (&collection).into_iter().map(|thing| thing.0).sum();
becomes:
let result: u64 = collection.iter().map(|thing| thing.0).sum();
Iterator Transforms
The Iterator
trait has a single required method
(next
) but also provides default
implementations (Item 13) of a large number of other methods that perform transformations on an iterator.
Some of these transformations affect the overall iteration process:
take(n)
: Restricts an iterator to emitting at mostn
items.skip(n)
: Skips over the firstn
elements of the iterator.step_by(n)
: Converts an iterator so it emits only every nth item.chain(other)
: Glues together two iterators, to build a combined iterator that moves through one then the other.cycle()
: Converts an iterator that terminates into one that repeats forever, starting at the beginning again whenever it reaches the end. (The iterator must supportClone
to allow this.)rev()
: Reverses the direction of an iterator. (The iterator must implement theDoubleEndedIterator
trait, which has an additionalnext_back
required method.)
Other transformations affect the nature of the Item
that's the subject of the Iterator
:
map(|item| {...})
: Repeatedly applies a closure to transform each item in turn. This is the most general version; several of the following entries in this list are convenience variants that could be equivalently implemented as amap
.cloned()
: Produces a clone of all of the items in the original iterator; this is particularly useful with iterators over&Item
references. (This obviously requires the underlyingItem
type to implementClone
.)copied()
: Produces a copy of all of the items in the original iterator; this is particularly useful with iterators over&Item
references. (This obviously requires the underlyingItem
type to implementCopy
, but it is likely to be faster thancloned()
, if that's the case.)enumerate()
: Converts an iterator over items to be an iterator over(usize, Item)
pairs, providing an index to the items in the iterator.zip(it)
: Joins an iterator with a second iterator, to produce a combined iterator that emits pairs of items, one from each of the original iterators, until the shorter of the two iterators is finished.
Yet other transformations perform filtering on the Item
s being emitted by the Iterator
:
filter(|item| {...})
: Applies abool
-returning closure to each item reference to determine whether it should be passed through.take_while()
: Emits an initial subrange of the iterator, based on a predicate. Mirror image ofskip_while
.skip_while()
: Emits a final subrange of the iterator, based on a predicate. Mirror image oftake_while
.
The flatten()
method deals
with an iterator whose items are themselves iterators, flattening the result. On its own, this doesn't seem that
helpful, but it becomes much more useful when combined with the observation that both
Option
and
Result
act as iterators: they produce
either zero (for None
, Err(e)
) or one (for Some(v)
, Ok(v)
) items. This means that flatten
ing a stream of
Option
/Result
values is a simple way to extract just the valid values, ignoring the rest.
Taken as a whole, these methods allow iterators to be transformed so that they produce exactly the sequence of elements that are needed for most situations.
Iterator Consumers
The previous two sections described how to obtain an iterator and how to transform it into exactly the right shape for precise iteration. This precisely targeted iteration could happen as an explicit for-each loop:
let mut even_sum_squares = 0;
for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
even_sum_squares += value * value;
}
However, the large collection of Iterator
methods includes
many that allow an iteration to be consumed in a single method call, removing the need for an explicit for
loop.
The most general of these methods is for_each(|item| {...})
, which runs a closure for each item
produced by the Iterator
. This can do most of the things that an explicit for
loop can do (the exceptions are
described in a later section), but its generality also makes it a little awkward to use—the closure needs to use mutable
references to external state in order to emit anything:
let mut even_sum_squares = 0;
values
.iter()
.filter(|x| *x % 2 == 0)
.take(5)
.for_each(|value| {
// closure needs a mutable reference to state elsewhere
even_sum_squares += value * value;
});
However, if the body of the for
loop matches one of a number of common patterns, there are more specific
iterator-consuming methods that are clearer, shorter, and more idiomatic.
These patterns include shortcuts for building a single value out of the collection:
sum()
: Sums a collection of numeric values (integers or floats).product()
: Multiplies a collection of numeric values.min()
: Finds the minimum value of a collection, relative to theItem
'sOrd
implementation (see Item 10).max()
: Finds the maximum value of a collection, relative to theItem
'sOrd
implementation (see Item 10).min_by(f)
: Finds the minimum value of a collection, relative to a user-specified comparison functionf
.max_by(f)
: Finds the maximum value of a collection, relative to a user-specified comparison functionf
.reduce(f)
: Builds an accumulated value of theItem
type by running a closure at each step that takes the value accumulated so far and the current item. This is a more general operation that encompasses the previous methods.fold(f)
: Builds an accumulated value of an arbitrary type (not just theIterator::Item
type) by running a closure at each step that takes the value accumulated so far and the current item. This is a generalization ofreduce
.scan(init, f)
: Builds an accumulated value of an arbitrary type by running a closure at each step that takes a mutable reference to some internal state and the current item. This is a slightly different generalization ofreduce
.
There are also methods for selecting a single value out of the collection:
find(p)
: Finds the first item that satisfies a predicate.position(p)
: Also finds the first item satisfying a predicate, but this time it returns the index of the item.nth(n)
: Returns then
th element of the iterator, if available.
There are methods for testing against every item in the collection:
any(p)
: Indicates whether a predicate istrue
for any item in the collection.all(p)
: Indicates whether a predicate istrue
for all items in the collection.
In either case, iteration will terminate early if the relevant counterexample is found.
There are methods that allow for the possibility of failure in the closures used with each item. In each case, if a closure returns a failure for an item, the iteration is terminated and the operation as a whole returns the first failure:
try_for_each(f)
: Behaves likefor_each
, but the closure can failtry_fold(f)
: Behaves likefold
, but the closure can failtry_find(f)
: Behaves likefind
, but the closure can fail
Finally, there are methods that accumulate all of the iterated items into a new collection. The most important of these
is collect()
, which can be
used to build a new instance of any collection type that implements the
FromIterator
trait.
The FromIterator
trait is implemented for all of the standard library collection types
(Vec
,
HashMap
,
BTreeSet
, etc.), but
this ubiquity also means that you often have to use explicit types, because otherwise the
compiler can't figure out whether you're trying to assemble (say) a Vec<i32>
or HashSet<i32>
:
#![allow(unused)] fn main() { use std::collections::HashSet; // Build collections of even numbers. Type must be specified, because // the expression is the same for either type. let myvec: Vec<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect(); let h: HashSet<i32> = (0..10).into_iter().filter(|x| x % 2 == 0).collect(); }
This example also illustrates the use of range expressions to generate the initial data to be iterated over.
Other (more obscure) collection-producing methods include the following:
unzip()
: Divides an iterator of pairs into two collectionspartition(p)
: Splits an iterator into two collections based on a predicate that is applied to each item
This Item has touched on a wide selection of Iterator
methods, but this is only a subset of the methods available;
for more information, consult the iterator documentation or
read Chapter 15 of Programming Rust, 2nd edition (O'Reilly), which has extensive coverage of the possibilities.
This rich collection of iterator transformations is there to be used. It produces code that is more idiomatic, more compact, and has clearer intent.
Expressing loops as iterator transformations can also produce code that is more efficient. In the interests of safety,
Rust performs bounds checking on access to contiguous containers such as vectors and slices; an attempt to
access a value beyond the bounds of the collection triggers a panic rather than an access to invalid data. An old-style
loop that accesses container values (e.g., values[i]
) might be subject to these runtime checks, whereas an iterator
that produces one value after another is already known to be within range.
However, it's also the case that an old-style loop might not be subject to additional bounds checks compared to the equivalent iterator transformation. The Rust compiler and optimizer is very good at analyzing the code surrounding a slice access to determine whether it's safe to skip the bounds checks; Sergey "Shnatsel" Davidoff's 2023 article explores the subtleties involved.
Building Collections from Result
Values
The previous section described the use of collect()
to build collections from iterators, but collect()
also has a
particularly helpful feature when dealing with Result
values.
Consider an attempt to convert a vector of i64
values into bytes (u8
), with the optimistic expectation that they
will all fit:
#![allow(unused)] fn main() { // In the 2021 edition of Rust, `TryFrom` is in the prelude, so this // `use` statement is no longer needed. use std::convert::TryFrom; let inputs: Vec<i64> = vec![0, 1, 2, 3, 4]; let result: Vec<u8> = inputs .into_iter() .map(|v| <u8>::try_from(v).unwrap()) .collect(); }
This works until some unexpected input comes along:
#![allow(unused)] fn main() { let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512]; }
and causes a runtime failure:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
TryFromIntError(())', iterators/src/main.rs:266:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Following the advice given in Item 3, we want to keep the Result
type in play and use the ?
operator to make any
failure the problem of the calling code. The obvious modification to emit the Result
doesn't really help:
let result: Vec<Result<u8, _>> =
inputs.into_iter().map(|v| <u8>::try_from(v)).collect();
// Now what? Still need to iterate to extract results and detect errors.
However, there's an alternative version of collect()
, which can assemble a Result
holding a Vec
, instead of a
Vec
holding Result
s.
Forcing use of this version requires the turbofish (::<Result<Vec<_>, _>>
):
let result: Vec<u8> = inputs
.into_iter()
.map(|v| <u8>::try_from(v))
.collect::<Result<Vec<_>, _>>()?;
Combining this with the question mark operator gives useful behavior:
- If the iteration encounters an error value, that error value is emitted to the caller and iteration stops.
- If no errors are encountered, the remainder of the code can deal with a sensible collection of values of the right type.
Loop Transformation
The aim of this Item is to convince you that many explicit loops can be regarded as something to be converted to iterator transformations. This can feel somewhat unnatural for programmers who aren't used to it, so let's walk through a transformation step by step.
Starting with a very C-like explicit loop to sum the squares of the first five even items of a vector:
let mut even_sum_squares = 0;
let mut even_count = 0;
for i in 0..values.len() {
if values[i] % 2 != 0 {
continue;
}
even_sum_squares += values[i] * values[i];
even_count += 1;
if even_count == 5 {
break;
}
}
The first step is to replace vector indexing with direct use of an iterator in a for-each loop:
let mut even_sum_squares = 0;
let mut even_count = 0;
for value in values.iter() {
if value % 2 != 0 {
continue;
}
even_sum_squares += value * value;
even_count += 1;
if even_count == 5 {
break;
}
}
An initial arm of the loop that uses continue
to skip over some items is naturally expressed as a filter()
:
let mut even_sum_squares = 0;
let mut even_count = 0;
for value in values.iter().filter(|x| *x % 2 == 0) {
even_sum_squares += value * value;
even_count += 1;
if even_count == 5 {
break;
}
}
Next, the early exit from the loop once five even items have been spotted maps to a take(5)
:
let mut even_sum_squares = 0;
for value in values.iter().filter(|x| *x % 2 == 0).take(5) {
even_sum_squares += value * value;
}
Every iteration of the loop uses only the item squared, in the value * value
combination, which makes it an ideal target
for a map()
:
let mut even_sum_squares = 0;
for val_sqr in values.iter().filter(|x| *x % 2 == 0).take(5).map(|x| x * x)
{
even_sum_squares += val_sqr;
}
These refactorings of the original loop result in a loop body that's the perfect nail to fit under the hammer of
the sum()
method:
let even_sum_squares: u64 = values
.iter()
.filter(|x| *x % 2 == 0)
.take(5)
.map(|x| x * x)
.sum();
When Explicit Is Better
This Item has highlighted the advantages of iterator transformations, particularly with respect to concision and clarity. So when are iterator transformations not appropriate or idiomatic?
- If the loop body is large and/or multifunctional, it makes sense to keep it as an explicit body rather than squeezing it into a closure.
- If the loop body involves error conditions that result in early termination of the surrounding function, these are
often best kept explicit—the
try_..()
methods help only a little. However,collect()
's ability to convert a collection ofResult
values into aResult
holding a collection of values often allows error conditions to still be handled with the?
operator. - If performance is vital, an iterator transform that involves a closure should get optimized so that it is just
as fast as the equivalent explicit code. But if performance
of a core loop is that important, measure different variants and tune appropriately:
- Be careful to ensure that your measurements reflect real-world performance—the compiler's optimizer can give overoptimistic results on test data (as described in Item 30).
- The Godbolt compiler explorer is an amazing tool for exploring what the compiler spits out.
Most importantly, don't convert a loop into an iteration transformation if the conversion is forced or awkward. This is a matter of taste to be sure—but be aware that your taste is likely to change as you become more familiar with the functional style.
In fact, the iterator can be more general—the idea of emitting next items until completion need not be associated with a container.
This method can't be provided if a mutation to
the item might invalidate the container's internal guarantees. For example, changing the item's contents in a way that
alters its Hash
value would invalidate the internal
data structures of a HashMap
.