1 Introducing Rust
- Introducing Rust’s features and goals
- Exposing Rust’s syntax
- Discussing where to use Rust and when to avoid it
- Building your first Rust program
- Explaining how Rust compares to object-oriented and wider languages
Welcome to Rust—the empowering programming language. Once you scratch its surface, you will not only find a programming language with unparalleled speed and safety, but one that is enjoyable enough to use every day.
When you begin to program in Rust, it’s likely that you will want to continue to do so. And this book, Rust in Action, will build your confidence as a Rust programmer. But it will not teach you how to program from the beginning. This book is intended to be read by people who are considering Rust as their next language and for those who enjoy implementing practical working examples. Here is a list of some of the larger examples this book includes:
- Mandelbrot set renderer
- A grep clone
- CPU emulator
- Generative art
- A database
- HTTP, NTP, and hexdump clients
- LOGO language interpreter
- Operating system kernel
As you may gather from scanning through that list, reading this book will teach you more than just Rust. It also introduces you to systems programming and low-level programming. As you work through Rust in Action, you’ll learn about the role of an operating system (OS), how a CPU works, how computers keep time, what pointers are, and what a data type is. You will gain an understanding of how the computer’s internal systems interoperate. Learning more than syntax, you will also see why Rust was created and the challenges that it addresses.
1.1 Where is Rust used?
Rust has won the “most loved programming language” award in Stack Overflow’s annual developer survey every year from 2016 to 2020. Perhaps that’s why large technology leaders such as the following have adopted Rust:
- Amazon Web Services (AWS) has used Rust since 2017 for its serverless computing offerings, AWS Lambda and AWS Fargate. With that, Rust has gained further inroads. The company has written the Bottlerocket OS and the AWS Nitro System to deliver its Elastic Compute Cloud (EC2) service.1
- Cloudflare develops many of its services, including its public DNS, serverless computing, and packet inspection offerings with Rust.2
- Dropbox rebuilt its backend warehouse, which manages exabytes of storage, with Rust.3
- Google develops parts of Android, such as its Bluetooth module, with Rust. Rust is also used for the crosvm component of Chrome OS and plays an important role in Google’s new operating system, Fuchsia.4
- Facebook uses Rust to power Facebook’s web, mobile, and API services, as well as parts of HHVM, the HipHop virtual machine used by the Hack programming language.5
- Microsoft writes components of its Azure platform including a security daemon for its Internet of Things (IoT) service in Rust.6
- Mozilla uses Rust to enhance the Firefox web browser, which contains 15 million lines of code. Mozilla’s first two Rust-in-Firefox projects, its MP4 metadata parser and text encoder/decoder, led to overall performance and stability improvements.
- GitHub’s npm, Inc., uses Rust to deliver “upwards of 1.3 billion package downloads per day.”7
- Oracle developed a container runtime with Rust to overcome problems with the Go reference implementation.8
- Samsung, via its subsidiary SmartThings, uses Rust in its Hub, which is the firmware backend for its Internet of Things (IoT) service.
Rust is also productive enough for fast-moving startups to deploy it. Here are a few examples:
- Sourcegraph uses Rust to serve syntax highlighting across all of its languages.9
- Figma employs Rust in the performance-critical components of its multi-player server.10
- Parity develops its client to the Ethereum blockchain with Rust.11
1.2 Advocating for Rust at work
What is it like to advocate for Rust at work? After overcoming the initial hurdle, it tends to go well. A 2017 discussion, reprinted below, provides a nice anecdote. One member of Google’s Chrome OS team discusses what it was like to introduce the language to the project:12
indy on Sept 27, 2017
Is Rust an officially sanctioned language at Google?

zaxcellent on Sept 27, 2017
Author here: Rust is not officially sanctioned at Google, but there are pockets of folks using it here. The trick with using Rust in this component was convincing my coworkers that no other language was right for job, which I believe to be the case in this instance.
That being said, there was a ton of work getting Rust to play nice within the Chrome OS build environment. The Rust folks have been super helpful in answering my questions though.

ekidd on Sept 27, 2017
> The trick with using Rust in this component was convincing my
> coworkers that no other language was right for job, which I believe
> to be the case in this instance.
I ran into a similar use case in one of my own projects—a vobsub subtitle decoder, which parses complicated binary data, and which I someday want to run as web service. So obviously, I want to ensure that there are no vulnerabilities in my code.
I wrote the code in Rust, and then I used 'cargo fuzz' to try and find vulnerabilities. After running a billion(!) fuzz iterations, I found 5 bugs (see the 'vobsub' section of the trophy case for a list https://github.com/rust-fuzz/trophy-case).
Happily, not _one_ of those bugs could actually be escalated into an actual exploit. In each case, Rust's various runtime checks successfully caught the problem and turned it into a controlled panic. (In practice, this would restart the web server cleanly.)
So my takeaway from this was that whenever I want a language (1) with no GC, but (2) which I can trust in a security-critical context, Rust is an excellent choice. The fact that I can statically link Linux binaries (like with Go) is a nice plus.

Manishearth on Sept 27, 2017
> Happily, not one of those bugs could actually be escalated into
> an actual exploit. In each case, Rust's various runtime checks
> successfully caught the problem and turned it into a controlled
> panic.
This has been more or less our experience with fuzzing rust code in firefox too, fwiw. Fuzzing found a lot of panics (and debug assertions / "safe" overflow assertions). In one case it actually found a bug that had been under the radar in the analogous Gecko code for around a decade.
From this excerpt, we can see that language adoption has been “bottom up” by engineers looking to overcome technical challenges in relatively small projects. Experience gained from these successes is then used as evidence to justify undertaking more ambitious work.
In the time since late 2017, Rust has continued to mature and strengthen. It has become an accepted part of Google’s technology landscape, and is now an officially sanctioned language within the Android and Fuchsia operating systems.
1.3 A taste of the language
This section gives you a chance to experience Rust firsthand. It demonstrates how to use the compiler and then moves on to writing a quick program. We tackle full projects in later chapters.
NOTE To install Rust, use the official installers provided at https://rustup.rs/.
1.3.1 Cheating your way to “Hello, world!”
The first thing that most programmers do when they reach for a new programming language is to learn how to print “Hello, world!” to the console. You’ll do that too, but with flair. You’ll verify that everything is in working order before you encounter annoying syntax errors.
If you use Windows, open the Rust command prompt that is available in the Start menu after installing Rust. Then execute this command:
C:\> cd %TMP%
If you are running Linux or macOS, open a Terminal window. Once open, enter the following:
$ cd $TMP
From this point forward, the commands for all operating systems should be the same. If you installed Rust correctly, the following three commands will display “Hello, world!” on the screen (as well as a bunch of other output):
$ cargo new hello
$ cd hello
$ cargo run
Here is an example of what the entire session looks like when running cmd.exe on MS Windows:
C:\> cd %TMP%
C:\Users\Tim\AppData\Local\Temp\> cargo new hello
     Created binary (application) `hello` project
C:\Users\Tim\AppData\Local\Temp\> cd hello
C:\Users\Tim\AppData\Local\Temp\hello\> cargo run
   Compiling hello v0.1.0 (file:///C:/Users/Tim/AppData/Local/Temp/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.32s
     Running `target\debug\hello.exe`
Hello, world!
And on Linux or macOS, your console would look like this:
$ cd $TMP
$ cargo new hello
     Created binary (application) `hello` package
$ cd hello
$ cargo run
   Compiling hello v0.1.0 (/tmp/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.26s
     Running `target/debug/hello`
Hello, world!
If you have made it this far, fantastic! You have run your first Rust code without needing to write any Rust. Let’s take a look at what just happened.
Rust’s cargo tool provides both a build system and a package manager. That means cargo knows how to convert your Rust code into executable binaries and also can manage the process of downloading and compiling the project’s dependencies.
cargo new creates a project for you that follows a standard template. The tree command can reveal the default project structure and the files that are created after issuing cargo new:
$ tree hello
hello
├── Cargo.toml
└── src
    └── main.rs

1 directory, 2 files
All Rust projects created with cargo have the same structure. In the base directory, a file called Cargo.toml describes the project’s metadata, such as the project’s name, its version, and its dependencies. Source code appears in the src directory. Rust source code files use the .rs filename extension. To view the files that cargo new creates, use the tree command.
The next command that you executed was cargo run. This line is much simpler to grasp, but cargo actually did much more work than you realized. You asked cargo to run the project. As there was nothing to actually run when you invoked the command, it decided to compile the code in debug mode on your behalf to provide maximal error information. As it happens, the src/main.rs file that cargo new generates always starts out as a “Hello, world!” stub. The result of that compilation was a file called hello (or hello.exe). The hello file was executed, and the result printed to your screen.
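For reference, the stub in question is tiny. At the time of writing, the src/main.rs that cargo new generates contains the following program (a future version of cargo could change the template):

```rust
// The default src/main.rs created by `cargo new`: an entry point
// that prints one line to standard output.
fn main() {
    println!("Hello, world!");
}
```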
Executing cargo run has also added new files to the project. We now have a Cargo.lock file in the base of our project and a target/ directory. Both that file and the directory are managed by cargo. Because these are artifacts of the compilation process, we won’t need to touch them. Cargo.lock is a file that specifies the exact version numbers of all the dependencies so that future builds are reliably built the same way until Cargo.toml is modified.
Running tree again reveals the new structure created by invoking cargo run to compile the hello project:
$ tree --dirsfirst hello
hello
├── src
│   └── main.rs
├── target
│   └── debug
│       ├── build
│       ├── deps
│       ├── examples
│       ├── native
│       └── hello
├── Cargo.lock
└── Cargo.toml
Well done on getting things up and running! Now that we’ve cheated our way to “Hello, world!”, let’s get there the long way.
1.3.2 Your first Rust program
For our first program, we want to write something that outputs the following text in multiple languages:
Hello, world!
Grüß Gott!
ハロー・ワールド
You have probably seen the first line in your travels. The other two are there to highlight a few of Rust’s features: easy iteration and built-in support for Unicode. For this program, we’ll use cargo to create it as before. Here are the steps to follow:
- Open a console prompt.
- Run cd %TMP% on MS Windows; otherwise, run cd $TMP.
- Run cargo new hello2 to create a new project.
- Run cd hello2 to move into the project’s root directory.
- Open the file src/main.rs in a text editor.
- Replace the text in that file with the text in listing 1.1.
The code for the following listing is in the source code repository. Open ch1/ch1-hello2/src/hello2.rs.
Listing 1.1 “Hello World!” in three languages
fn greet_world() {
    println!("Hello, world!");                 // ①
    let southern_germany = "Grüß Gott!";       // ②
    let japan = "ハロー・ワールド";              // ③
    let regions = [southern_germany, japan];   // ④
    for region in regions.iter() {             // ⑤
        println!("{}", &region);               // ⑥
    }
}

fn main() {
    greet_world();                             // ⑦
}
① The exclamation mark indicates the use of a macro, which we’ll discuss shortly.
② Assignment in Rust, more properly called variable binding, uses the let keyword.
③ Unicode support is provided out of the box.
④ Array literals use square brackets.
⑤ Many types can have an iter() method to return an iterator.
⑥ The ampersand “borrows” region for read-only access.
⑦ Calls a function. Note that parentheses follow the function name.
Now that src/main.rs is updated, execute cargo run from the hello2/ directory. You should see three greetings appear after some output generated from cargo itself:
$ cargo run
   Compiling hello2 v0.1.0 (/path/to/ch1/ch1-hello2)
    Finished dev [unoptimized + debuginfo] target(s) in 0.95s
     Running `target/debug/hello2`
Hello, world!
Grüß Gott!
ハロー・ワールド
Let’s take a few moments to touch on some of the interesting elements of Rust from listing 1.1.
One of the first things that you are likely to notice is that strings in Rust are able to include a wide range of characters. Strings are guaranteed to be encoded as UTF-8. This means that you can use non-English languages with relative ease.
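One quick way to see the UTF-8 guarantee at work is to compare a string’s length in bytes with its number of characters, as in the sketch below. len() counts bytes, whereas chars().count() counts Unicode scalar values, so the two numbers diverge as soon as non-ASCII text appears. The helper byte_and_char_counts is a hypothetical name chosen for this illustration.

```rust
/// Hypothetical helper: returns (bytes, chars) for a string slice.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    // len() reports the number of UTF-8 bytes; chars() iterates over
    // Unicode scalar values, which is closer to what people call characters.
    (text.len(), text.chars().count())
}

fn main() {
    for greeting in ["Hello, world!", "Grüß Gott!", "ハロー・ワールド"].iter() {
        let (bytes, chars) = byte_and_char_counts(greeting);
        println!("{}: {} bytes, {} chars", greeting, bytes, chars);
    }
}
```

ASCII text such as "Hello, world!" has one byte per character, while “ü” and “ß” take two bytes each and the katakana characters take three.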
The one character that might look out of place is the exclamation mark after println. If you have programmed in Ruby, you may be used to thinking that it is used to signal a destructive operation. In Rust, it signals the use of a macro. For now, macros can be thought of as fancy functions that help you avoid boilerplate code. In the case of println!, there is a lot of type detection going on under the hood so that arbitrary data types can be printed to the screen.
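As a rough illustration of that type detection, the sketch below feeds an integer, a floating-point number, a character, and a string slice into a single format string. It uses format!, a sibling of println! that accepts the same formatting mini-language but returns a String instead of printing it. The helper name mixed_line is hypothetical.

```rust
/// Hypothetical helper: builds one line from values of four different types.
fn mixed_line() -> String {
    // Each {} placeholder is filled in with the next argument,
    // whatever its type (integer, float, char, or &str here).
    format!("{} {} {} {}", 42, 2.5, 'ü', "Grüß Gott!")
}

fn main() {
    println!("{}", mixed_line());
}
```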
1.4 Downloading the book’s source code
In order to follow along with the examples in this book, you might want to access the source code for the listings. For your convenience, source code for every example is available from two sources:
1.5 What does Rust look and feel like?
Rust is the programming language that allows Haskell and Java programmers to get along. Rust comes close to the high-level, expressive feel of languages like Haskell and Java while achieving low-level, bare-metal performance.
We looked at a few “Hello, world!” examples in section 1.3, so let’s try something slightly more complex to get a better feel for Rust’s features. Listing 1.2 provides a quick look at what Rust can do for basic text processing. The source code for this listing is in the ch1/ch1-penguins/src/main.rs file. Some features to notice include the following:
- Common control flow mechanisms—This includes for loops and the continue keyword.
- Method syntax—Although Rust is not object-oriented as it does not support inheritance, it carries over this feature of object-oriented languages.
- Higher-order programming—Functions can both accept and return functions. For example, line 19 (.map(|field| field.trim())) includes a closure, also known as an anonymous function or lambda function.
- Type annotations—Although relatively rare, these are occasionally required as a hint to the compiler (for example, see line 27 beginning with if let Ok(length)).
- Conditional compilation—In the listing, lines 21–24 (if cfg!(...) { ... }) are not included in release builds of the program.
- Implicit return—Rust provides a return keyword, but it’s usually omitted. Rust is an expression-based language.
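Two of those features, higher-order programming and implicit return, can be previewed in isolation. The sketch below is illustrative only: square, size_label, apply_twice, and make_adder are hypothetical helpers (the 50 cm threshold is arbitrary), chosen to show values returned without the return keyword, an if used as an expression, and functions that accept and return other functions.

```rust
/// Implicit return: the last expression, written without a trailing
/// semicolon, becomes the function's return value.
fn square(x: i32) -> i32 {
    x * x
}

/// `if` is an expression too, so each branch produces the value.
/// (The 50 cm threshold is an arbitrary example.)
fn size_label(length_cm: f32) -> &'static str {
    if length_cm >= 50.0 {
        "large"
    } else {
        "small"
    }
}

/// Higher-order programming: this function accepts another function...
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

/// ...and this one returns a function (a closure that captures `n`).
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn main() {
    let add_five = make_adder(5);
    println!("{}", apply_twice(&add_five, 1)); // 1 + 5 + 5
    println!("{}", apply_twice(square, 3));    // (3 * 3) squared
    println!("{}", size_label(33.0));
}
```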
Listing 1.2 Example of Rust code showing some basic processing of CSV data
 1 fn main() {                                       ①
 2   let penguin_data = "\                           ②
 3   common name,length (cm)
 4   Little penguin,33
 5   Yellow-eyed penguin,65
 6   Fiordland penguin,60
 7   Invalid,data
 8   ";
 9
10   let records = penguin_data.lines();
11
12   for (i, record) in records.enumerate() {
13     if i == 0 || record.trim().len() == 0 {       ③
14       continue;
15     }
16
17     let fields: Vec<_> = record                   ④
18       .split(',')                                 ⑤
19       .map(|field| field.trim())                  ⑥
20       .collect();                                 ⑦
21     if cfg!(debug_assertions) {                   ⑧
22       eprintln!("dbg: {:?} -> {:?}",
23              record, fields);                     ⑨
24     }
25
26     let name = fields[0];
27     if let Ok(length) = fields[1].parse::<f32>() {  ⑩
28       println!("{}, {}cm", name, length);         ⑪
29     }
30   }
31 }
① Executable projects require a main() function.
② Escapes the trailing newline character
③ Skips header row and lines with only whitespace
④ Starts with a line of text
⑤ Splits record into fields
⑥ Trims whitespace of each field
⑦ Builds a collection of fields
⑧ cfg! checks configuration at compile time.
⑨ eprintln! prints to standard error (stderr).
⑩ Attempts to parse field as a floating-point number
⑪ println! prints to standard out (stdout).
Listing 1.2 might be confusing to some readers, especially those who have never seen Rust before. Here are some brief notes before moving on:
- On line 17, the fields variable is annotated with the type Vec<_>. Vec is shorthand for vector, a collection type that can expand dynamically. The underscore (_) instructs Rust to infer the type of the elements.
- On lines 22 and 28, we instruct Rust to print information to the console. The println! macro prints its arguments to standard out (stdout), whereas eprintln! prints to standard error (stderr). Macros are similar to functions except that instead of returning data, they expand into code. Macros are often used to simplify common patterns. eprintln! and println! both use a string literal with an embedded mini-language in their first argument to control their output. The {} placeholder tells Rust to use a programmer-defined method to represent the value as a string rather than the debugging representation available with {:?}.
- Line 27 contains some novel features. if let Ok(length) = fields[1].parse::<f32>() reads as “attempt to parse fields[1] as a 32-bit floating-point number and, if that is successful, then assign the number to the length variable.” The if let construct is a concise method of conditionally processing data that also provides a local variable assigned to that data. The parse() method returns Ok(T) (where T stands for any type) when it can successfully parse the string; otherwise, it returns Err(E) (where E stands for an error type). The effect of if let Ok(T) is to skip any error cases like the one that’s encountered while processing the line Invalid,data. When Rust is unable to infer the types from the surrounding context, it asks you to specify them. The call to parse() includes an inline type annotation as parse::<f32>().
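To try the if let pattern on its own, the following sketch isolates the parsing step from line 27. parse_length is a hypothetical helper: it turns the Result returned by parse() into an Option, yielding None for input such as data.

```rust
/// Hypothetical helper mirroring line 27 of listing 1.2.
fn parse_length(field: &str) -> Option<f32> {
    // parse::<f32>() returns Ok(f32) on success and Err(ParseFloatError)
    // otherwise; `if let` only binds `length` in the Ok case.
    if let Ok(length) = field.parse::<f32>() {
        Some(length)
    } else {
        None
    }
}

fn main() {
    // {:?} shows the raw Result values, including the error case.
    println!("{:?}", "33".parse::<f32>());
    println!("{:?}", "data".parse::<f32>());
    println!("{:?} {:?}", parse_length("65"), parse_length("data"));
}
```

The if let/else pair is spelled out for clarity; field.parse::<f32>().ok() would be a more compact way to obtain the same Option.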
Converting source code into an executable file is called compiling. To compile Rust code, we need to install the Rust compiler and run it against the source code. To compile listing 1.2, follow these steps:
- Open a console prompt (such as cmd.exe, PowerShell, Terminal, or Alacritty).
- Move to the ch1/ch1-penguins directory (not ch1/ch1-penguins/src) of the source code you downloaded in section 1.4.
- Execute cargo run. Its output is shown in the following code snippet:
$ cargo run
   Compiling ch1-penguins v0.1.0 (../code/ch1/ch1-penguins)
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/ch1-penguins`
dbg: "  Little penguin,33" -> ["Little penguin", "33"]
Little penguin, 33cm
dbg: "  Yellow-eyed penguin,65" -> ["Yellow-eyed penguin", "65"]
Yellow-eyed penguin, 65cm
dbg: "  Fiordland penguin,60" -> ["Fiordland penguin", "60"]
Fiordland penguin, 60cm
dbg: "  Invalid,data" -> ["Invalid", "data"]
You probably noticed the distracting lines starting with dbg:. We can eliminate these by compiling a release build using cargo’s --release flag. This conditional compilation functionality is provided by the if cfg!(debug_assertions) { ... } block within lines 21–24 of listing 1.2. Release builds are much faster at runtime, but incur longer compilation times:
$ cargo run --release
   Compiling ch1-penguins v0.1.0 (.../code/ch1/ch1-penguins)
    Finished release [optimized] target(s) in 0.34s
     Running `target/release/ch1-penguins`
Little penguin, 33cm
Yellow-eyed penguin, 65cm
Fiordland penguin, 60cm
It’s possible to further reduce the output by adding the -q flag to cargo commands. -q is shorthand for quiet. The following snippet shows what that looks like:
$ cargo run -q --release
Little penguin, 33cm
Yellow-eyed penguin, 65cm
Fiordland penguin, 60cm
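The same mechanism can be reused in your own programs. In the sketch below, build_mode is a hypothetical helper; cfg!(debug_assertions) expands to the literal true or false at compile time, so a plain cargo run usually reports a debug build and cargo run --release a release build. Unlike the #[cfg] attribute, the code in both branches is still type-checked; the unused branch is simply dead code that the optimizer can remove. Because the diagnostic line goes to stderr, redirecting stdout to a file would leave it out of that file.

```rust
/// Hypothetical helper: reports which kind of build is running.
fn build_mode() -> &'static str {
    // cfg!(...) becomes `true` or `false` when the program is compiled.
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if cfg!(debug_assertions) {
        // Diagnostics go to standard error (stderr)...
        eprintln!("dbg: running a {} build", build_mode());
    }
    // ...while regular program output goes to standard out (stdout).
    println!("done ({} build)", build_mode());
}
```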
Listing 1.1 and listing 1.2 were chosen to pack as many representative features of Rust as possible into examples that are easy to understand. Hopefully these demonstrated that Rust programs have a high-level feel, paired with low-level performance. Let’s take a step back from specific language features now and consider some of the thinking behind the language and where it fits within the programming language ecosystem.
1.6 What is Rust?
Rust’s distinguishing feature as a programming language is its ability to prevent invalid data access at compile time. Research projects by Microsoft’s Security Response Center and the Chromium browser project both suggest that issues relating to invalid data access account for approximately 70% of serious security bugs.13 Rust eliminates that class of bugs. It guarantees that your program is memory-safe without imposing any runtime costs.
Other languages can provide this level of safety, but these require adding checks that execute while your program is running, thus slowing it down. Rust manages to break out of this continuum, creating its own space as illustrated by figure 1.1.
Figure 1.1 Rust provides both safety and control. Other languages have tended to trade one against the other.
Rust’s distinguishing feature as a professional community is its willingness to explicitly include values into its decision-making process. This ethos of inclusion is pervasive. Public messaging is welcoming. All interactions within the Rust community are governed by its code of conduct. Even the Rust compiler’s error messages are ridiculously helpful.
Until late 2018, visitors to the Rust home page were greeted with the (technically heavy) message, “Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.” At that point, the community implemented a change to its wording to put its users (and its potential users) at the center (table 1.1).
Table 1.1 Rust slogans over time. As Rust has developed its confidence, it has increasingly embraced the idea of acting as a facilitator and supporter of everyone wanting to achieve their programming aspirations.
Rust is labelled as a systems programming language, which tends to be seen as quite a specialized, almost esoteric branch of programming. However, many Rust programmers have discovered that the language is applicable to many other domains. Safety, productivity, and control are useful in all software engineering projects. Moreover, the Rust community’s inclusiveness means that the language benefits from a steady stream of new voices with diverse interests.
Let’s flesh out those three goals: safety, productivity, and control. What are these and why do these matter?
1.6.1 Goal of Rust: Safety
Rust is designed to keep programs free from the following classes of bugs:
- Dangling pointers—Live references to data that has become invalid over the course of the program (see listing 1.3)
- Data races—The inability to determine how a program will behave from run to run because external factors change (see listing 1.4)
- Buffer overflow—An attempt to access the 12th element of an array with only 6 elements (see listing 1.5)
- Iterator invalidation—An issue caused by something that is iterated over after being altered midway through (see listing 1.6)
When programs are compiled in debug mode, Rust also protects against integer overflow. What is integer overflow? Integers can only represent a finite set of numbers because they have a fixed width in memory. Integer overflow is what happens when an integer hits its limit and flows over to the beginning again.
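To make integer overflow concrete, the sketch below uses u8, which can hold only the values 0 to 255. The standard library’s checked_add, wrapping_add, overflowing_add, and saturating_add methods let a program state explicitly what should happen at the limit. A plain + would instead panic in a debug build and, with cargo’s default release settings, wrap silently in a release build. add_u8 is a hypothetical helper.

```rust
/// Hypothetical helper: adds two u8 values, reporting overflow as None.
fn add_u8(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

fn main() {
    let max = u8::MAX; // 255, the largest value a u8 can hold

    println!("{:?}", add_u8(250, 5));         // fits: Some(255)
    println!("{:?}", add_u8(max, 1));         // overflows: None
    println!("{}", max.wrapping_add(1));      // flows over to the beginning: 0
    println!("{:?}", max.overflowing_add(1)); // wrapped value plus a flag: (0, true)
    println!("{}", max.saturating_add(1));    // clamps at the limit: 255

    // Writing `max + 1` with values only known at runtime would panic
    // with "attempt to add with overflow" in a debug build.
}
```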
The following listing shows a dangling pointer. Note that you’ll find this source code in the ch1/ch1-cereals/src/main.rs file.
Listing 1.3 Attempting to create a dangling pointer
 1 #[derive(Debug)]                           ①
 2 enum Cereal {                              ②
 3   Barley, Millet, Rice,
 4   Rye, Spelt, Wheat,
 5 }
 6
 7 fn main() {
 8   let mut grains: Vec<Cereal> = vec![];    ③
 9   grains.push(Cereal::Rye);                ④
10   drop(grains);                            ⑤
11   println!("{:?}", grains);                ⑥
12 }
① Allows the println! macro to print the Cereal enum
② An enum (enumeration) is a type with a fixed number of legal variants.
③ Initializes an empty vector of Cereal
④ Adds one item to the grains vector
⑤ Deletes grains and its contents
⑥ Attempts to access the deleted value
Listing 1.3 contains a pointer within grains, which is created on line 8. Vec<Cereal> is implemented with an internal pointer to an underlying array. But the listing does not compile. An attempt to do so triggers an error message that complains about attempting to “borrow” a “moved” value. Learning how to interpret that error message and how to fix the underlying error are topics for the pages to come. Here’s the output from attempting to compile the code for listing 1.3:
$ cargo run
   Compiling ch1-cereals v0.1.0 (/rust-in-action/code/ch1/ch1-cereals)
error[E0382]: borrow of moved value: `grains`
  --> src/main.rs:12:22
   |
8  |     let mut grains: Vec<Cereal> = vec![];
   |         ------- move occurs because `grains` has type `std::vec::Vec<Cereal>`, which does not implement the `Copy` trait
9  |     grains.push(Cereal::Rye);
10 |     drop(grains);
   |          ------ value moved here
11 |
12 |     println!("{:?}", grains);
   |                      ^^^^^^ value borrowed here after move

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.
error: could not compile `ch1-cereals`.
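One way to satisfy the compiler, sketched below, is simply to finish using grains before giving it up. The describe helper is hypothetical; it borrows the vector as a read-only slice (&[Cereal]) instead of taking ownership, so grains remains valid until drop() is called.

```rust
#[derive(Debug)]
#[allow(dead_code)] // only Rye is used in this sketch
enum Cereal {
    Barley, Millet, Rice,
    Rye, Spelt, Wheat,
}

/// Hypothetical helper: borrows the grains (read-only) and formats them.
fn describe(grains: &[Cereal]) -> String {
    format!("{:?}", grains)
}

fn main() {
    let mut grains: Vec<Cereal> = vec![];
    grains.push(Cereal::Rye);
    println!("{}", describe(&grains)); // borrow while grains is still alive
    drop(grains);                      // only now is the vector released
    // Any use of `grains` after this point would be rejected again.
}
```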
Listing 1.4 shows an example of a data race condition. If you remember, this condition results from the inability to determine how a program behaves from run to run due to changing external factors. You’ll find this code in the ch1/ch1-race/src/main.rs file.
Listing 1.4 Example of Rust preventing a race condition
1 use std::thread;                          ①
2 fn main() {
3   let mut data = 100;
4
5   thread::spawn(|| { data = 500; });      ②
6   thread::spawn(|| { data = 1000; });     ②
7   println!("{}", data);
8 }
① Brings multi-threading into local scope
② thread::spawn() takes a closure as an argument.
If you are unfamiliar with the term thread, the upshot is that this code is not deterministic. It’s impossible to know what value data will hold when main() exits. On lines 5 and 6 of the listing, two threads are created by calls to thread::spawn(). Each call takes a closure as an argument, denoted by vertical bars and curly braces (e.g., || {...}). The thread spawned on line 5 is attempting to set the data variable to 500, whereas the thread spawned on line 6 is attempting to set it to 1,000. Because the scheduling of threads is determined by the OS rather than the program, it’s impossible to know if the thread defined first will be the one that runs first.
Attempting to compile listing 1.4 results in a stampede of error messages. Rust does not allow multiple places in an application to have write access to data. The code attempts to allow this in three places: once within the main thread running main() and once in each child thread created by thread::spawn(). Here’s the compiler message:
$ cargo run
   Compiling ch1-race v0.1.0 (rust-in-action/code/ch1/ch1-race)
error[E0373]: closure may outlive the current function, but it borrows `data`, which is owned by the current function
 --> src/main.rs:6:19
  |
6 |     thread::spawn(|| { data = 500; });
  |                   ^^   ---- `data` is borrowed here
  |                   |
  |                   may outlive borrowed value `data`
  |
note: function requires argument type to outlive `'static`
 --> src/main.rs:6:5
  |
6 |     thread::spawn(|| { data = 500; });
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `data` (and any other referenced variables), use the `move` keyword
  |
6 |     thread::spawn(move || { data = 500; });
  |                   ^^^^^^^
...
error: aborting due to 4 previous errors

Some errors have detailed explanations: E0373, E0499, E0502.
For more information about an error, try `rustc --explain E0373`.
error: could not compile `ch1-race`.
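For comparison, here is one way (among several) to express the same idea so that it compiles. The value is wrapped in Arc<Mutex<i32>>, each thread receives its own handle through a move closure, and main waits for both threads with join(). The final value is still either 500 or 1,000 depending on which thread writes last, but the sharing is now explicit and synchronized, so there is no data race. run_race is a hypothetical name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: two threads each overwrite a shared value.
fn run_race() -> i32 {
    // Arc gives shared ownership across threads;
    // Mutex ensures only one thread writes at a time.
    let data = Arc::new(Mutex::new(100));
    let mut handles = Vec::new();

    for &value in &[500, 1000] {
        let data = Arc::clone(&data); // each thread gets its own handle
        handles.push(thread::spawn(move || {
            *data.lock().unwrap() = value;
        }));
    }

    // Wait for both threads to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    // Bind to a local so the lock guard is released before `data` goes away.
    let result = *data.lock().unwrap();
    result
}

fn main() {
    // Prints 500 or 1000, depending on which thread ran last.
    println!("{}", run_race());
}
```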
Listing 1.5 provides an example of a buffer overflow. A buffer overflow describes situations where an attempt is made to access items in memory that do not exist or that are illegal. In our case, an attempt to access fruit[4] results in the program crashing, as the fruit variable only contains three fruit. The source code for this listing is in the file ch1/ch1-fruit/src/main.rs.
Listing 1.5 Example of invoking a panic via a buffer overflow
fn main() {
    let fruit = vec!['🥝', '🍌', '🍇'];

    let buffer_overflow = fruit[4];        // ①
    assert_eq!(buffer_overflow, '🍉')      // ②
}
① Rust will cause a crash rather than assign an invalid memory location to a variable.
② assert_eq!() tests that arguments are equal.
When listing 1.5 is compiled and executed, you’ll encounter this error message:
$ cargo run
   Compiling ch1-fruit v0.1.0 (/rust-in-action/code/ch1/ch1-fruit)
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/ch1-fruit`
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 4', src/main.rs:3:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
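When an index might be out of range, slices and vectors also offer get(), which returns an Option instead of panicking. The sketch below wraps it in a hypothetical fruit_at helper so the out-of-bounds case becomes an ordinary value that the program can handle.

```rust
/// Hypothetical helper: looks up a fruit without risking a panic.
fn fruit_at(fruit: &[char], index: usize) -> Option<char> {
    // get() returns Some(&item) for a valid index and None otherwise;
    // copied() turns the Option<&char> into an Option<char>.
    fruit.get(index).copied()
}

fn main() {
    let fruit = vec!['🥝', '🍌', '🍇'];

    match fruit_at(&fruit, 4) {
        Some(item) => println!("found {}", item),
        None => println!("index 4 is out of bounds for {} items", fruit.len()),
    }
}
```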
The next listing shows an example of iterator invalidation, where an issue is caused by something that’s iterated over after being altered midway through. The source code for this listing is in ch1/ch1-letters/src/main.rs.
Listing 1.6 Attempting to modify an iterator while iterating over it
fn main() {
    let mut letters = vec![               // ①
        "a", "b", "c"
    ];

    for letter in letters {
        println!("{}", letter);
        letters.push(letter.clone());     // ②
    }
}
① Creates a mutable vector letters
② Copies each letter and appends it to the end of letters
Listing 1.6 fails to compile because Rust does not allow the letters variable to be modified within the iteration block. Here’s the error message:
$ cargo run
   Compiling ch1-letters v0.1.0 (/rust-in-action/code/ch1/ch1-letters)
error[E0382]: borrow of moved value: `letters`
 --> src/main.rs:8:7
  |
2 |   let mut letters = vec![
  |       ----------- move occurs because `letters` has type
  |                   `std::vec::Vec<&str>`, which does not
  |                   implement the `Copy` trait
...
6 |   for letter in letters {
  |                 -------
  |                 |
  |                 `letters` moved due to this implicit call
  |                 to `.into_iter()`
  |                 help: consider borrowing to avoid moving
  |                 into the for loop: `&letters`
7 |     println!("{}", letter);
8 |     letters.push(letter.clone());
  |     ^^^^^^^ value borrowed here after move

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.
error: could not compile `ch1-letters`.

To learn more, run the command again with --verbose.
While the language of the error message is filled with jargon (borrow, move, trait, and so on), Rust has protected the programmer from stepping into a trap that many others fall into. And fear not—that jargon will become easier to understand as you work through the first few chapters of this book.
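If the goal really is to append a copy of every letter, one approach is to iterate over a separate snapshot of the vector, so the loop never reads from the collection it is modifying. The sketch below uses a hypothetical helper, append_copies, to show the idea.

```rust
/// Hypothetical helper: appends a copy of every existing letter.
fn append_copies<'a>(letters: &mut Vec<&'a str>) {
    // Iterate over a snapshot rather than over `letters` itself, so the
    // vector being modified is never the one being iterated over.
    let snapshot = letters.clone();
    for letter in snapshot {
        println!("{}", letter);
        letters.push(letter);
    }
}

fn main() {
    let mut letters = vec!["a", "b", "c"];
    append_copies(&mut letters);
    println!("{:?}", letters);
}
```

For this particular task, Vec::extend_from_within (stable since Rust 1.53) can append the existing elements without an explicit loop or snapshot.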
Knowing that a language is safe provides programmers with a degree of liberty. Because they know their program won’t implode, they become much more willing to experiment. Within the Rust community, this liberty has spawned the expression fearless concurrency.
1.6.2 Goal of Rust: Productivity
When given a choice, Rust prefers the option that is easiest for the developer. Many of its more subtle features are productivity boosts. But programmer productivity is a difficult concept to demonstrate through an example in a book. Let’s start with something that can snag beginners—using assignment (=) within an expression that should use an equality (==) test:
1 fn main() {
2   let a = 10;
3
4   if a = 10 {
5     println!("a equals ten");
6   }
7 }
In Rust, the preceding code fails to compile. The Rust compiler generates the following message:
error[E0308]: mismatched types
 --> src/main.rs:4:8
  |
4 |   if a = 10 {
  |      ^^^^^^
  |      |
  |      expected `bool`, found `()`
  |      help: try comparing for equality: `a == 10`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.
At first, “mismatched types” might feel like a strange error message to encounter. Surely we can test variables for equality against integers.
After some thought, it becomes apparent why the if test receives the wrong type. The if expects a bool, but it is not receiving one. It’s receiving the result of an assignment. In Rust, an assignment evaluates to (), a type that carries no meaningful value. () is pronounced unit.14
When there is no other meaningful return value, expressions return (). As the following shows, adding a second equals sign on line 4 results in a working program that prints a equals ten:
1 fn main() {
2     let a = 10;
3
4     if a == 10 {                  ①
5         println!("a equals ten");
6     }
7 }
① Using the equality operator (==) allows the program to compile.
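To make the types involved visible, the short program below stores the value of an assignment expression in a variable annotated with (). It is a sketch for illustration only; binding a unit value like this is not something you would normally write, and the compiler may warn about it.

```rust
fn main() {
    let mut a = 0;
    println!("before: a = {}", a);

    // An assignment is an expression, but its value is the unit type, (),
    // not the number that was stored. That is why `if a = 10 { ... }` is
    // rejected: `if` requires a `bool`.
    let assignment_value: () = (a = 10);
    assert_eq!(assignment_value, ());

    // A comparison evaluates to a `bool`, which is what `if` expects.
    let is_ten: bool = a == 10;
    assert!(is_ten);
    if is_ten {
        println!("a equals ten");
    }
}
```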
Rust has many ergonomic features. It offers generics, sophisticated data types, pattern matching, and closures.15 Those who have worked with other ahead-of-time compilation languages are likely to appreciate Rust’s build system and its comprehensive package manager: cargo.
At first glance, cargo looks like a front end for rustc, the Rust compiler, but cargo provides several additional utilities, including the following:
- cargo new creates a skeleton Rust project in a new directory (cargo init uses the current directory).
- cargo build downloads dependencies and compiles the code.
- cargo run executes cargo build and then also runs the resulting executable file.
- cargo doc builds HTML documentation for the current project and every one of its dependencies.
1.6.3 Goal of Rust: Control
Rust offers programmers fine-grained control over how data structures are laid out in memory and their access patterns. While Rust uses sensible defaults that align with its “zero cost abstractions” philosophy, those defaults do not suit all situations.
At times, it is imperative to manage your application’s performance. It might matter to you that data is stored on the stack rather than on the heap. Perhaps it makes sense to add reference counting to create a shared reference to a value. Occasionally, it might be useful to create your own type of pointer for a particular access pattern. The design space is large, and Rust provides the tools to allow you to implement your preferred solution.
NOTE If terms such as stack, heap, and reference counting are new, don’t put the book down! We’ll spend lots of time explaining these and how they work together throughout the rest of the book.
Listing 1.7 prints the line a: 10, b: 20, c: 30, d: Mutex { data: 40 } (newer Rust releases also print additional fields for the Mutex, such as poisoned: false). Each representation is another way to store an integer. As we progress through the next few chapters, the trade-offs related to each level become apparent. For the moment, the important thing to remember is that the menu of types is comprehensive. You are welcome to choose exactly what’s right for your specific use case.
Listing 1.7 also demonstrates multiple ways to create integers. Each form provides differing semantics and runtime characteristics. But programmers retain full control of the trade-offs that they want to make.
Listing 1.7 Multiple ways to create integer values
use std::rc::Rc;
use std::sync::{Arc, Mutex};

fn main() {
    let a = 10;                          ①
    let b = Box::new(20);                ②
    let c = Rc::new(Box::new(30));       ③
    let d = Arc::new(Mutex::new(40));    ④
    println!("a: {:?}, b: {:?}, c: {:?}, d: {:?}", a, b, c, d);
}
① Integer on the stack
② Integer on the heap, also known as a boxed integer
③ Boxed integer wrapped within a reference counter
④ Integer wrapped in an atomic reference counter and protected by a mutual exclusion lock
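Listing 1.7 only prints its values. The sketch below hints at what the wrappers are for; the function names rc_owner_count and arc_mutex_total are illustrative, not from the book. Rc lets several owners share one value within a single thread, while Arc<Mutex<T>> lets several threads share, and safely modify, one value.

```rust
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Clones an `Rc` and reports how many owners share the boxed integer.
fn rc_owner_count() -> usize {
    let c = Rc::new(Box::new(30));
    let c2 = Rc::clone(&c); // cheap: increments a counter, does not copy the data
    assert_eq!(**c2, 30);
    Rc::strong_count(&c) // both `c` and `c2` are still alive here
}

/// Adds 1 to a shared integer from `n_threads` threads and returns the result.
fn arc_mutex_total(n_threads: usize) -> i32 {
    let d = Arc::new(Mutex::new(40));
    let mut handles = Vec::new();
    for _ in 0..n_threads {
        let d = Arc::clone(&d); // each thread gets its own handle to the same value
        handles.push(thread::spawn(move || {
            *d.lock().unwrap() += 1; // the lock allows one writer at a time
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Binding first lets the temporary lock guard drop before `d` does.
    let total = *d.lock().unwrap();
    total
}

fn main() {
    println!("Rc owners: {}", rc_owner_count());
    println!("Arc<Mutex<_>> total: {}", arc_mutex_total(4));
}
```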
To understand why Rust is doing something the way it is, it can be helpful to refer back to these three principles:
- The language’s first priority is safety.
- Data within Rust is immutable by default.
- Compile-time checks are strongly preferred. Safety should be a “zero-cost abstraction.”
1.7 Rust’s big features
Our tools shape what we believe we can create. Rust enables you to build the software that you have wanted to make but were too scared to try. What kind of tool is Rust? Flowing from the three principles discussed in the last section are three overarching features of the language: performance, concurrency, and memory efficiency.
1.7.1 Performance
Rust offers all of your computer’s available performance. Famously, Rust does not rely on a garbage collector to provide its memory safety.
There is, unfortunately, a problem with promising you faster programs: the speed of your CPU is fixed. Thus, for software to run faster, it needs to do less. Yet, the language is large. To resolve this conflict, Rust pushes the burden onto the compiler.
The Rust community prefers a bigger language with a compiler that does more, rather than a simpler language where the compiler does less. The Rust compiler aggressively optimizes both the size and speed of your program. Rust also has some less obvious tricks:
- Cache-friendly data structures are provided by default. Arrays usually hold data within Rust programs rather than deeply nested tree structures that are created by pointers. This is referred to as data-oriented programming.
- The availability of a modern package manager (cargo) makes it trivial to benefit from tens of thousands of open source packages. C and C++ have much less consistency here, and building large projects with many dependencies is typically difficult.
- Methods are always dispatched statically unless you explicitly request dynamic dispatch. This enables the compiler to heavily optimize code, sometimes to the point of eliminating the cost of a function call entirely.
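The difference between the two kinds of dispatch can be seen in a small sketch; the Shape trait and its types are hypothetical. A generic function such as area_static is compiled separately for each concrete type it is used with (static dispatch), whereas taking &dyn Shape explicitly opts in to dynamic dispatch through a table of function pointers at runtime:

```rust
/// A hypothetical trait for this sketch.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

/// Static dispatch: a separate copy of this function is generated for each
/// concrete `T`, so the call to `area()` is resolved at compile time.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

/// Dynamic dispatch, requested explicitly with `dyn`: the method is looked
/// up through a vtable at runtime.
fn area_dynamic(shape: &dyn Shape) -> f64 {
    shape.area()
}

/// A mixed collection needs dynamic dispatch because the element types differ.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Circle(1.0))];
    println!("{}", area_static(&Square(3.0)));
    println!("{}", area_dynamic(&Circle(1.0)));
    println!("{}", total_area(&shapes));
}
```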
1.7.2 Concurrency
Asking a computer to do more than one thing at the same time has proven difficult for software engineers. As far as an OS is concerned, two independent threads of execution are at liberty to destroy each other if a programmer makes a serious mistake. Yet Rust has spawned the expression fearless concurrency. Its emphasis on safety crosses the bounds of independent threads. There is no global interpreter lock (GIL) to constrain a thread’s speed. We explore some of the implications of this in part 2.
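As a sketch of what that safety looks like in practice, the function below (parallel_sum is a hypothetical name) splits a slice into chunks and sums each chunk on its own thread. It uses std::thread::scope, available since Rust 1.63, which guarantees that every spawned thread finishes before the scope returns. That guarantee is what lets the threads borrow data without copying it; code that allowed a thread to outlive the data it borrows would be rejected at compile time.

```rust
use std::thread;

/// Sums `data` using up to `n_threads` threads.
fn parallel_sum(data: &[u64], n_threads: usize) -> u64 {
    assert!(n_threads > 0, "need at least one thread");
    // Round up so that every element lands in some chunk; never use size 0.
    let chunk_size = ((data.len() + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        // Each thread borrows its own chunk of `data`; nothing is copied.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // All threads are joined before `scope` returns.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```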
1.7.3 Memory efficiency
Rust enables you to create programs that require minimal memory. When needed, you can use fixed-size structures and know exactly how every byte is managed. High-level constructs, such as iteration and generic types, incur minimal runtime overhead.
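A sketch of what knowing how every byte is managed can look like; Pixel and even_squares_sum are illustrative names. std::mem::size_of reports how many bytes a type occupies, #[repr(C)] pins down the field layout, and the iterator chain is evaluated lazily without allocating an intermediate collection.

```rust
use std::mem::size_of;

/// A hypothetical RGBA pixel. `#[repr(C)]` fixes the field order, and with
/// four `u8` fields no padding is needed, so it occupies exactly 4 bytes.
#[repr(C)]
#[allow(dead_code)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

/// Sums the squares of the even numbers in 1..=limit. The iterator adapters
/// are lazy: no intermediate collection is allocated.
fn even_squares_sum(limit: u32) -> u32 {
    (1..=limit).filter(|n| n % 2 == 0).map(|n| n * n).sum()
}

fn main() {
    println!("Pixel: {} bytes", size_of::<Pixel>());
    println!("[Pixel; 256]: {} bytes", size_of::<[Pixel; 256]>());
    // Option<Box<T>> uses the null pointer to represent None,
    // so wrapping a Box in an Option costs no extra space.
    println!(
        "Box<u64>: {} bytes, Option<Box<u64>>: {} bytes",
        size_of::<Box<u64>>(),
        size_of::<Option<Box<u64>>>()
    );
    println!("sum of even squares up to 10: {}", even_squares_sum(10));
}
```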
1.8 Downsides of Rust
It’s easy to talk about this language as if it were a panacea for all of software engineering. For example:
- “A high-level syntax with low-level performance!”
- “Concurrency without crashes!”
- “C with perfect safety!”
These slogans (sometimes overstated) are great. But for all of its merits, Rust does have some disadvantages.
1.8.1 Cyclic data structures
In Rust, it is difficult to model cyclic data like an arbitrary graph structure. Implementing a doubly-linked list is an undergraduate-level computer science problem. Yet Rust’s safety checks do hamper progress here. If you’re new to the language, avoid implementing these sorts of data structures until you’re more familiar with Rust.
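One common workaround, sketched below with hypothetical names, is to avoid references between nodes altogether: the nodes live in a Vec and refer to each other by index. A cycle is then just a set of numbers, which the borrow checker has no objection to. Other approaches, such as combining Rc with Weak pointers, also exist.

```rust
/// Nodes are identified by their index into `edges`, so no node owns or
/// borrows another, and cycles are easy to represent.
struct Graph {
    edges: Vec<Vec<usize>>,
}

impl Graph {
    fn with_nodes(n: usize) -> Self {
        Graph { edges: vec![Vec::new(); n] }
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Counts the nodes reachable from `start` (including `start`) with an
    /// iterative depth-first search. `visited` stops it looping on cycles.
    fn reachable_from(&self, start: usize) -> usize {
        let mut visited = vec![false; self.edges.len()];
        let mut stack = vec![start];
        let mut count = 0;
        while let Some(node) = stack.pop() {
            if visited[node] {
                continue;
            }
            visited[node] = true;
            count += 1;
            stack.extend(self.edges[node].iter().copied());
        }
        count
    }
}

fn main() {
    // 0 -> 1 -> 2 -> 0 forms a cycle; node 3 is isolated.
    let mut graph = Graph::with_nodes(4);
    graph.add_edge(0, 1);
    graph.add_edge(1, 2);
    graph.add_edge(2, 0);
    println!("reachable from 0: {}", graph.reachable_from(0));
}
```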
1.8.2 Compile times
Rust is slower at compiling code than its peer languages. It has a complex compiler toolchain that lowers code through multiple intermediate representations and sends lots of code to the LLVM compiler. The unit of compilation for a Rust program is not an individual file but a whole package (known affectionately as a crate). As crates can include multiple modules, these can be exceedingly large units to compile. Although this enables whole-of-crate optimization, it requires whole-of-crate compilation as well.
1.8.3 Strictness
It’s impossible—well, difficult—to be lazy when programming with Rust. Programs won’t compile until everything is just right. The compiler is strict, but helpful.
Over time, it’s likely that you’ll come to appreciate this feature. If you’ve ever programmed in a dynamic language, then you may have encountered the frustration of your program crashing because of a misnamed variable. Rust brings that frustration forward to compile time so that your users don’t experience the crash.
1.8.4 Size of the language
Rust is large! It has a rich type system, several dozen keywords, and includes some features that are unavailable in other languages. These factors all combine to create a steep learning curve. To make this manageable, I encourage learning Rust gradually. Start with a minimal subset of the language and give yourself time to learn the details when you need them. That is the approach taken in this book. Advanced concepts are deferred until much later.
1.8.5 Hype
The Rust community is wary of growing too quickly and being consumed by hype. Yet a number of software projects have encountered this question in their inbox: “Have you considered rewriting this in Rust?” Unfortunately, software written in Rust is still software. It is not immune to security problems and does not offer a panacea to all of software engineering’s ills.
1.9 TLS security case studies
To demonstrate that Rust will not alleviate all errors, let’s examine two serious exploits that threatened almost all internet-facing devices and consider whether Rust would have prevented those.
By 2015, as Rust gained prominence, implementations of SSL/TLS (namely, OpenSSL and Apple’s own Secure Transport library) had been found to have serious security holes. Known informally as Heartbleed and goto fail;, both exploits provide opportunities to test Rust’s claims of memory safety. Rust is likely to have helped in both cases, but it is still possible to write Rust code that suffers from similar issues.
1.9.1 Heartbleed
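Heartbleed, officially designated as CVE-2014-0160,16 was caused by a missing bounds check combined with buffer reuse: OpenSSL trusted a length supplied by the other party and copied that many bytes into its reply, reading past the end of the data it had actually received. Because memory is reused, those extra bytes could contain secrets left behind by earlier operations. A buffer is a space set aside in memory for receiving input. Data can leak from one read to the next if the buffer’s contents are not cleared between writes.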
Why does this situation occur? Programmers hunt for performance. Buffers are reused to minimize how often applications ask the OS for memory.
Imagine that we want to process some secret information from multiple users. We decide, for whatever reason, to reuse a single buffer through the course of the program. If we don’t reset this buffer once we use it, information from earlier calls will leak to later ones. Here is a précis of a program that would encounter this error:
let buffer = &mut [0u8; 1024];   ①
read_secrets(&user1, buffer);    ②
store_secrets(buffer);
read_secrets(&user2, buffer);    ③
store_secrets(buffer);
① Binds a reference (&) to a mutable (mut) array ([…]) that contains 1,024 unsigned 8-bit integers (u8) initialized to 0 to the variable buffer
② Fills buffer with bytes from the data from user1
③ The buffer still contains data from user1 that may or may not be overwritten by user2.
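The précis calls functions that aren’t defined. The runnable sketch below fills them in with hypothetical versions: read_secrets() copies a secret into the front of the buffer and leaves the remaining bytes alone, and reuse_buffer() is an illustrative driver. The program compiles without complaint and never touches memory outside the buffer, yet the second read still exposes part of the first secret unless the buffer is cleared in between.

```rust
const BUF_LEN: usize = 16;

/// Hypothetical stand-in for `read_secrets()`: copies `secret` into the front
/// of `buffer`, leaves the remaining bytes untouched, and returns the count.
fn read_secrets(secret: &[u8], buffer: &mut [u8]) -> usize {
    let n = secret.len().min(buffer.len());
    buffer[..n].copy_from_slice(&secret[..n]);
    n
}

/// Reads two secrets through one reused buffer and returns its final contents.
/// When `clear_between` is true, the buffer is zeroed before the second read.
fn reuse_buffer(first: &[u8], second: &[u8], clear_between: bool) -> [u8; BUF_LEN] {
    let mut buffer = [0u8; BUF_LEN];
    read_secrets(first, &mut buffer);
    if clear_between {
        buffer.fill(0); // the step that prevents the leak
    }
    read_secrets(second, &mut buffer);
    buffer
}

fn main() {
    // Memory safe in both cases; only the logic differs.
    let leaky = reuse_buffer(b"alice:hunter22", b"bob:pw", false);
    let safe = reuse_buffer(b"alice:hunter22", b"bob:pw", true);
    println!("without clearing: {:?}", String::from_utf8_lossy(&leaky));
    println!("with clearing:    {:?}", String::from_utf8_lossy(&safe));
}
```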
Rust does not protect you from logical errors. It ensures that your data is never able to be written in two places at the same time. It does not ensure that your program is free from all security issues.
1.9.2 Goto fail;
The goto fail; bug, officially designated as CVE-2014-1266,17 was caused by programmer error coupled with C design issues (and potentially by its compiler not pointing out the flaw). A function that was designed to verify the server’s signature during a TLS key exchange ended up skipping its most important checks. Here is a selected extract from the original SSLVerifySignedServerKeyExchange function with a fair amount of obfuscatory syntax retained:18
1  static OSStatus
2  SSLVerifySignedServerKeyExchange(SSLContext *ctx,
3                                   bool isRsa,
4                                   SSLBuffer signedParams,
5                                   uint8_t *signature,
6                                   UInt16 signatureLen)
7  {
8      OSStatus err;                                           ①
9      ...
10
11     if ((err = SSLHashSHA1.update(
12         &hashCtx, &serverRandom)) != 0)                     ②
13         goto fail;
14
15     if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
16         goto fail;
17         goto fail;                                          ③
18     if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
19         goto fail;
20
21     err = sslRawVerify(ctx,
22                        ctx->peerPubKey,
23                        dataToSign,          /* plaintext */
24                        dataToSignLen,       /* plaintext length */
25                        signature,
26                        signatureLen);
27     if(err) {
28         sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
29                     "returned %d\n", (int)err);
30         goto fail;
31     }
32
33 fail:
34     SSLFreeBuffer(&signedHashes);
35     SSLFreeBuffer(&hashCtx);
36     return err;                                             ④
37 }
① Declares err; each successful check below sets it to the pass value (0)
② A series of defensive programming checks
③ Unconditional goto skips SSLHashSHA1.final() and the (significant) call to sslRawVerify().
④ Returns the pass value of 0, even for inputs that should have failed the verification test
In the example code, the issue lies between lines 15 and 17. In C, logical tests do not require curly braces. C compilers interpret those three lines like this:
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
    goto fail;
}
goto fail;
Would Rust have helped? Probably. In this specific case, Rust’s grammar would have caught the bug. It does not allow logical tests without curly braces. Rust also issues a warning when code is unreachable. But that doesn’t mean the error is made impossible in Rust. Stressed programmers under tight deadlines make mistakes. In general, similar code would compile and run.
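For comparison, here is a sketch of similar control flow written in Rust. The error type and helper functions are hypothetical stand-ins for the C calls, not a port of Apple’s code. Braces around an if body are mandatory, and the ? operator returns early only when a check fails, so there is no shared err variable that can be left holding a pass value when a check is skipped:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum VerifyError {
    HashFailed,
    BadSignature,
}

/// Hypothetical stand-in for SSLHashSHA1.update(); fails when `ok` is false.
fn hash_update(ok: bool) -> Result<(), VerifyError> {
    if ok {
        Ok(())
    } else {
        Err(VerifyError::HashFailed)
    }
}

/// Hypothetical stand-in for sslRawVerify().
fn raw_verify(signature_valid: bool) -> Result<(), VerifyError> {
    if signature_valid {
        Ok(())
    } else {
        Err(VerifyError::BadSignature)
    }
}

fn verify_key_exchange(hashes_ok: bool, signature_valid: bool) -> Result<(), VerifyError> {
    // A hand-written check mirroring the C code. The braces are required:
    // an `if` without a block is a syntax error in Rust.
    if let Err(e) = hash_update(hashes_ok) {
        return Err(e);
    }
    // The `?` operator expresses the same early return more compactly.
    hash_update(hashes_ok)?;
    raw_verify(signature_valid)?;
    // Success must be stated explicitly at the end.
    Ok(())
}

fn main() {
    println!("{:?}", verify_key_exchange(true, true));
    println!("{:?}", verify_key_exchange(true, false));
}
```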
1.10 Where does Rust fit best?
Although it was designed as a systems programming language, Rust is a general-purpose language. It has been successfully deployed in many areas, which we discuss next.
1.10.1 Command-line utilities
Rust offers three main advantages for programmers creating command-line utilities: minimal startup time, low memory use, and easy deployment. Programs start their work quickly because Rust does not need to initialize an interpreter (Python, Ruby, etc.) or virtual machine (Java, C#, etc.).
As a bare metal language, Rust produces memory-efficient programs.19 As you’ll see throughout the book, many types are zero-sized. That is, these only exist as hints to the compiler and take up no memory at all in the running program.
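A few zero-sized types can be inspected with std::mem::size_of. The names Marker and Id below are hypothetical. A unit struct has no fields to store, and PhantomData<T> records a type for the compiler’s benefit without occupying any memory:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// A unit struct: it has no fields, so it needs no storage at all.
struct Marker;

/// A hypothetical typed ID. `PhantomData<T>` tells the compiler which type
/// the ID belongs to, but takes up zero bytes at runtime.
#[allow(dead_code)]
struct Id<T> {
    raw: u32,
    _kind: PhantomData<T>,
}

fn main() {
    let _m = Marker;
    println!("(): {} bytes", size_of::<()>());
    println!("Marker: {} bytes", size_of::<Marker>());
    println!("PhantomData<String>: {} bytes", size_of::<PhantomData<String>>());
    // Only the u32 field takes up space.
    println!("Id<String>: {} bytes", size_of::<Id<String>>());
}
```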
By default, utilities written in Rust are statically linked against their Rust dependencies; on most platforms, only system libraries such as the C standard library are linked dynamically. This compilation method avoids depending on shared libraries that you must install before the program can run. Creating programs that can run without installation steps makes these easy to distribute.
1.10.2 Data processing
Rust excels at text processing and other forms of data wrangling. Programmers benefit from control over memory use and fast startup times. As of mid-2017, Rust could claim the world’s fastest regular expression engine. In 2019, the Apache Arrow data-processing project, foundational to the Python and R data science ecosystems, accepted the Rust-based DataFusion project.
Rust also underlies the implementation of multiple search engines, data-processing engines, and log-parsing systems. Its type system and memory control provide you with the ability to create high throughput data pipelines with a low and stable memory footprint. Small filter programs can be easily embedded into the larger framework via Apache Storm, Apache Kafka, or Apache Hadoop streaming.
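A streaming filter of the kind mentioned above can be sketched in a few lines. The function filter_lines and the pattern "ERROR" are illustrative. It reads lines from any BufRead source and writes matching lines to any Write sink, so the same code works on stdin and stdout (which is how Hadoop streaming talks to external programs) and on in-memory buffers. Because it processes one line at a time, memory use depends on the longest line rather than on the size of the whole input.

```rust
use std::io::{self, BufRead, Write};

/// Copies only the lines containing `needle` from `input` to `output`,
/// returning how many lines matched.
fn filter_lines<R: BufRead, W: Write>(input: R, mut output: W, needle: &str) -> io::Result<usize> {
    let mut matched = 0;
    for line in input.lines() {
        let line = line?;
        if line.contains(needle) {
            writeln!(output, "{}", line)?;
            matched += 1;
        }
    }
    Ok(matched)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // "ERROR" is an arbitrary example pattern.
    filter_lines(stdin.lock(), stdout.lock(), "ERROR")?;
    Ok(())
}
```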
1.10.3 Extending applications
Rust is well suited for extending programs written in a dynamic language. This enables JNI (Java Native Interface) extensions, C extensions, or Erlang/Elixir NIFs (native implemented functions) in Rust. C extensions are typically a scary proposition. These tend to be quite tightly integrated with the runtime. Make a mistake and you could be looking at runaway memory consumption due to a memory leak or a complete crash. Rust takes away a lot of this anxiety.
- Sentry, a company that processes application errors, finds that Rust is an excellent candidate for rewriting CPU-intensive components of its Python system.20
- Dropbox used Rust to rewrite the file synchronization engine of its client-side application: “More than performance, [Rust’s] ergonomics and focus on correctness have helped us tame sync’s complexity.”21
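A minimal sketch of the Rust side of such an extension: a function with the C calling convention that a C program, or another language’s C FFI, could call. The function name sum_f64 is hypothetical. A real extension would also be built as a shared library (crate-type = "cdylib") and exported under an unmangled symbol name with #[no_mangle] (written #[unsafe(no_mangle)] in the 2024 edition); those details are omitted so that the sketch runs as an ordinary program. The unsafe code is confined to one line with its requirements spelled out.

```rust
/// Sums `len` f64 values starting at `ptr`, using the C calling convention.
pub extern "C" fn sum_f64(ptr: *const f64, len: usize) -> f64 {
    // Reject a null pointer instead of crashing the host process.
    if ptr.is_null() {
        return 0.0;
    }
    // SAFETY: the caller must guarantee that `ptr` points to `len`
    // initialized, properly aligned f64 values that stay valid for the
    // duration of this call.
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}

fn main() {
    let readings = [1.5, 2.0, 3.5];
    println!("{}", sum_f64(readings.as_ptr(), readings.len()));
}
```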
1.10.4 Resource-constrained environments
C has occupied the domain of microcontrollers for decades. Yet, the Internet of Things (IoT) is coming. That could mean many billions of insecure devices exposed to the network. Any input parsing code will be routinely probed for weaknesses. Given how infrequently firmware updates for these devices occur, it’s critical that these are as secure as possible from the outset. Rust can play an important role here by adding a layer of safety without imposing runtime costs.
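As a small illustration of defensive input parsing, the sketch below handles a hypothetical packet format: one length byte followed by that many payload bytes. The function parse_packet and the format are assumptions for illustration. It uses only slice methods that are also available without the standard library (in core), which matters on microcontrollers. Instead of trusting the length byte, it asks for the payload with get(), which returns None when the input is too short.

```rust
/// Parses a hypothetical packet: one length byte followed by that many
/// payload bytes. Returns None for malformed input instead of reading past
/// the end of the buffer.
fn parse_packet(input: &[u8]) -> Option<&[u8]> {
    // split_first() returns None for an empty input.
    let (&len, rest) = input.split_first()?;
    // get() returns None when the claimed length exceeds the bytes available.
    rest.get(..len as usize)
}

fn main() {
    println!("{:?}", parse_packet(&[3, b'a', b'b', b'c']));
    // Claims 200 bytes but supplies only 2.
    println!("{:?}", parse_packet(&[200, 1, 2]));
}
```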
1.10.5 Server-side applications
Most applications written in Rust live on the server. These could be serving web traffic or supporting businesses running their operations. There is also a tier of services that sit between the OS and your application. Rust is used to write databases, monitoring systems, search appliances, and messaging systems. For example:
- The npm package registry for the JavaScript and Node.js communities uses Rust for its CPU-bound bottlenecks.22
- sled (https://github.com/spacejam/sled), an embedded database, can process a workload of 1 billion operations that includes 5% writes in less than a minute on a 16-core machine.
- Tantivy, a full text search engine, can index 8 GB of English Wikipedia in approximately 100 s on a 4-core desktop machine.23
1.10.6 Desktop applications
There is nothing inherent in Rust’s design that prevents it from being deployed to develop user-facing software. Servo, the web browser engine that acted as an incubator for Rust’s early development, is a user-facing application. Naturally, so are games.
1.10.7 Desktop
There is still a significant need to write applications that live on people’s computers. Desktop applications are often complex, difficult to engineer, and hard to support. With Rust’s ergonomic approach to deployment and its rigor, it is likely to become the secret sauce for many applications. To start, these will be built by small, independent developers. As Rust matures, so will the ecosystem.
1.10.8 Mobile
Android, iOS, and other smartphone operating systems generally provide a blessed path for developers. In the case of Android, that path is Java. In the case of iOS, developers generally program in Swift. There is, however, another way.
Both platforms provide the ability for native applications to run on them. This is generally intended to allow applications written in C++, such as games, to be deployed to people’s phones. Rust is able to talk to the phone via the same interface with no additional runtime cost.
1.10.9 Web
As you are probably aware, JavaScript is the language of the web. Over time though, this will change. Browser vendors are developing a standard called WebAssembly (Wasm) that promises to be a compiler target for many languages. Rust is one of the first. Porting a Rust project to the browser requires only two additional command-line commands. Several companies are exploring the use of Rust in the browser via Wasm, notably Cloudflare and Fastly.
1.10.10 Systems programming
In some sense, systems programming is Rust’s raison d’être. Many large programs have been implemented in Rust, including compilers (Rust itself), video game engines, and operating systems. The Rust community includes writers of parser generators, databases, and file formats.
Rust has proven to be a productive environment for programmers who share Rust’s goals. Three standout projects in this area include the following:
- Google is sponsoring the development of Fuchsia OS, an operating system for devices.24
- Microsoft is actively exploring writing low-level components in Rust for Windows.25
- Amazon Web Services (AWS) is building Bottlerocket, a bespoke OS for hosting containers in the cloud.26
It takes more than software to grow a programming language. One of the things that the Rust team has done extraordinarily well is to foster a positive and welcoming community around the language. Everywhere you go within the Rust world, you’ll find that you’ll be treated with courtesy and respect.
1.12 Rust phrase book
When you interact with members of the Rust community, you’ll soon encounter a few terms that have special meaning. Understanding the following terms makes it easier to understand why Rust has evolved the way that it has and the problems that it attempts to solve:
- Empowering everyone—All programmers regardless of ability or background are welcome to participate. Programming, and particularly systems programming, should not be restricted to a blessed few.
- Blazingly fast—Rust is a fast programming language. You’ll be able to write programs that match or exceed the performance of its peer languages, but you will have more safety guarantees.
- Fearless concurrency—Concurrent and parallel programming have always been seen as difficult. Rust frees you from whole classes of errors that have plagued its peer languages.
- No Rust 2.0—Rust code written today will always compile with a future Rust compiler. Rust is intended to be a reliable programming language that can be depended upon for decades to come. Under semantic versioning, a new major version signals backward-incompatible changes; because Rust commits to remaining backward-compatible, it will never release a new major version.
- Zero-cost abstractions—The features you gain from Rust impose no runtime cost. When you program in Rust, safety does not sacrifice speed.
Summary
- Many companies have successfully built large software projects in Rust.
- Software written in Rust can be compiled for the PC, the browser, and the server, as well as mobile and IoT devices.
- The Rust language is well loved by software developers. It has repeatedly won Stack Overflow’s “most loved programming language” title.
- Rust allows you to experiment without fear. It provides correctness guarantees that other tools are unable to provide without imposing runtime costs.
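- With Rust, there are three main command-line tools to learn: cargo, which manages a whole crate; rustup, which manages Rust installations; and rustc, which compiles Rust source code.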
- Rust projects are not immune from all bugs.
- Rust code is stable, fast, and light on resources.
1.See “How our AWS Rust team will contribute to Rust’s future successes,” http://mng.bz/BR4J.
2.See “Rust at Cloudflare,” https://news.ycombinator.com/item?id=17077358.
3.See “The Epic Story of Dropbox’s Exodus From the Amazon Cloud Empire,” http://mng.bz/d45Q.
4.See “Google joins the Rust Foundation,” http://mng.bz/ryOX.
5.See “HHVM 4.20.0 and 4.20.1,” https://hhvm.com/blog/2019/08/27/hhvm-4.20.0.html.
6.See https://github.com/Azure/iotedge/tree/master/edgelet.
7.See “Rust Case Study: Community makes Rust an easy choice for npm,” http://mng.bz/xm9B.
8.See “Building a Container Runtime in Rust,” http://mng.bz/d40Q.
9.See “HTTP code syntax highlighting server written in Rust,” https://github.com/sourcegraph/syntect_server.
10.See “Rust in Production at Figma,” https://www.figma.com/blog/rust-in-production-at-figma/.
11.See “The fast, light, and robust EVM and WASM client,” https://github.com/paritytech/parity-ethereum.
12.See “Chrome OS KVM—A component written in Rust,” https://news.ycombinator.com/item?id=15346557.
13.See the articles “We need a safer systems programming language,” http://mng.bz/VdN5 and “Memory safety,” http://mng.bz/xm7B for more information.
14.The name unit reveals some of Rust’s heritage as a descendant of the ML family of programming languages that includes OCaml and F#. The term stems from mathematics. Theoretically, a unit type only has a single value. Compare this with Boolean types that have two values, true or false, or strings that have an infinite number of valid values.
15.If these terms are unfamiliar, do keep reading. These are explained throughout the book. They are language features that you will miss in other languages.
16.See “CVE-2014-0160 Detail,” https://nvd.nist.gov/vuln/detail/CVE-2014-0160.
17.See “CVE-2014-1266 Detail,” https://nvd.nist.gov/vuln/detail/CVE-2014-1266.
18.Original available at http://mng.bz/RKGj.
19.The joke goes that Rust is as close to bare metal as possible.
20.See “Fixing Python Performance with Rust,” http://mng.bz/ryxX.
21.See “Rewriting the heart of our sync engine,” http://mng.bz/Vdv5.
22.See “Community makes Rust an easy choice for npm: The npm Registry uses Rust for its CPU-bound bottlenecks,” http://mng.bz/xm9B.
23.See “Of tantivy’s indexing,” https://fulmicoton.com/posts/behold-tantivy-part2/.
24.See “Welcome to Fuchsia!,” https://fuchsia.dev/.
25.See “Using Rust in Windows,” http://mng.bz/A0vW.
26.See “Bottlerocket: Linux-based operating system purpose-built to run containers,” https://aws.amazon.com/bottlerocket/.
Part 1 Rust language distinctives
Part 1 of the book is a quick-fire introduction to the Rust programming language. By the end of the chapters in this part, you will have a good understanding of Rust syntax and know what motivates people to choose Rust. You will also understand some fundamental differences between Rust and its peer languages.
2 Language foundations
- Coming to grips with the Rust syntax
- Learning fundamental types and data structures
- Building command-line utilities
- Compiling programs
This chapter introduces you to the fundamentals of Rust programming. By the end of the chapter, you will be able to create command-line utilities and should be able to get the gist of most Rust programs. We’ll work through most of the language’s syntax, but defer much of the detail about why things are how they are for later in the book.
NOTE Programmers who have experience with another programming language will benefit the most from this chapter. If you are an experienced Rust programmer, feel free to skim through it.
Beginners are welcome. Rust’s community strives to be responsive to newcomers. At times, you may strike a mental pothole when you encounter terms such as lifetime elision, hygienic macros, move semantics, and algebraic data types without context. Don’t be afraid to ask for help. The community is much more welcoming than these helpful, yet opaque, terms might suggest.
In this chapter, we will build grep-lite, a greatly stripped-down version of the ubiquitous grep utility. Our grep-lite program looks for patterns within text and prints lines that match. This simple program allows us to focus on the unique features of Rust.
The chapter takes a spiral approach to learning. A few concepts will be discussed multiple times. With each iteration, you will find yourself learning more. Figure 2.1 shows a completely unscientific map of the chapter.
Figure 2.1 Chapter topic outline. Starting with primitive types, the chapter progresses through several concepts with increasing levels of depth.
It’s highly recommended that you follow along with the examples in this book. As a reminder, to access or download the source code for the listings, use either of these two sources:
2.1 Creating a running program
Every plain text file has a hidden superpower: when it includes the right symbols, it can be converted into something that can be interpreted by a CPU. That is the magic of a programming language. This chapter’s aim is to allow you to become familiar with the process of converting Rust source code into a running program.
Understanding this process is more fun than it sounds! And it sets you up for an exciting learning journey. By the end of chapter 4, you will have implemented a virtual CPU that can also interpret programs that you create.
2.1.1 Compiling single files with rustc
Listing 2.1 is a short, yet complete Rust program. To translate it into a working program, we use software called a compiler. The compiler’s role is to translate the source code into machine code, as well as take care of lots of bookkeeping to satisfy the operating system (OS) and CPU that it is a runnable program. The Rust compiler is called rustc. You’ll find the source code for listing 2.1 in the file ch2/ok.rs.
Listing 2.1 Almost the shortest valid Rust program
fn main() {
    println!("OK")
}
To compile a single file written in Rust into a working program:
- Save your source code to a file. In our case, we’ll use the filename ok.rs.
- Make sure that the source code includes a main() function.
- Open a shell window such as Terminal, cmd.exe, PowerShell, bash, zsh, or any other.
- Execute the command rustc <file>, where <file> is the file you want to compile.
When compilation succeeds, rustc sends no output to the console. Behind the scenes, rustc has dutifully created an executable, using the input filename to choose the output filename.
Assuming that you’ve saved listing 2.1 to a file called ok.rs, let’s see what that looks like. The following snippet provides a short demonstration of the process:
$ rustc ok.rs
$ ./ok          ①
OK
① For Windows, include the .exe filename extension (for example, ok.exe).
2.1.2 Compiling Rust projects with cargo
Most Rust projects are larger than a single file. These typically include dependencies. To prepare ourselves for that, we’ll use a higher-level tool than rustc, called cargo. cargo understands how to drive rustc (and much more).
Migrating from a single file workflow managed by rustc to one managed by cargo is a two-stage process. The first is to move that original file into an empty directory. Then execute the cargo init command.
Here is a detailed overview of that process, assuming that you are starting from a file called ok.rs generated by following the steps in the previous section:
- Run mkdir <project> to create an empty directory (e.g., mkdir ok).
- Move your source code into the <project> directory (e.g., mv ok.rs ok).
- Change to the <project> directory (e.g., cd ok).
- Run cargo init.
From this point on, you can issue cargo run to execute your project’s source code. One difference from rustc is that compiled executables are found in a <project>/target subdirectory. Another is that cargo provides much more output by default:
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/ok`
OK
If you’re ever curious about what cargo is doing under the hood to drive rustc, add the verbose flag (-v) to your command:
$ rm -rf target/                                   ①
$ cargo run -v
   Compiling ok v0.1.0 (/tmp/ok)
     Running `rustc
        --crate-name ok
        --edition=2018
        ok.rs
        --error-format=json
        --json=diagnostic-rendered-ansi
        --crate-type bin
        --emit=dep-info,link
        -C embed-bitcode=no
        -C debuginfo=2
        -C metadata=55485250d3e77978
        -C extra-filename=-55485250d3e77978
        --out-dir /tmp/ok/target/debug/deps
        -C incremental=/tmp/target/debug/incremental
        -L dependency=/tmp/ok/target/debug/deps
        -C link-arg=-fuse-ld=lld`
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/ok`
OK
① Added here to provoke cargo into compiling the project from scratch
2.2 A glance at Rust’s syntax
Rust is boring and predictable where possible. It has variables, numbers, functions, and other familiar things that you have seen in other languages. For example, it delimits blocks with curly brackets ({ and }), it uses a single equals sign as its assignment operator (=), and it is whitespace-agnostic.
2.2.1 Defining variables and calling functions
Let’s look at another short listing to introduce some fundamentals: defining variables with type annotations and calling functions. Listing 2.2 prints ( a + b ) + ( c + d ) = 90 to the console. As you can see from lines 2–5 in the listing, there are multiple syntactic choices for annotating integers with data types. Use whichever feels most natural for the situation at hand. The source code for this listing is in ch2/ch2-first-steps.rs.
Listing 2.2 Adding integers using variables and declaring types
1  fn main() {                                    ①
2      let a = 10;                                ②
3      let b: i32 = 20;                           ③
4      let c = 30i32;                             ④
5      let d = 30_i32;                            ⑤
6      let e = add(add(a, b), add(c, d));
7
8      println!("( a + b ) + ( c + d ) = {}", e);
9  }
10
11 fn add(i: i32, j: i32) -> i32 {                ⑥
12     i + j                                      ⑦
13 }
① Rust is flexible with the location of the main() function.
② Types can be inferred by the compiler…
③ …or declared by the programmer when creating variables.
④ Numeric types can include a type annotation in their literal form.
⑤ Numbers can include underscores, which are intended to increase readability and have no functional impact.
⑥ Type declarations are required when defining functions.
⑦ Functions return the last expression’s result so that return is not required.
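NOTE In the listing, be careful not to add a semicolon after `i + j` in the body of `add()`. A trailing semicolon turns the expression into a statement, so the body evaluates to `()` (unit) rather than `i32`, and the compiler rejects the mismatch with the declared return type.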
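To see the difference the note describes, here is a minimal sketch with two equivalent versions of `add()`: one relies on the tail expression, the other spells out `return`. The name `add_explicit` is illustrative only.

```rust
// The tail expression `i + j` (no semicolon) becomes the function's return value.
fn add(i: i32, j: i32) -> i32 {
    i + j
}

// Equivalent version with an explicit `return` statement.
// Writing `i + j;` without `return` would make the body evaluate to (),
// and the compiler would reject it because the declared return type is i32.
fn add_explicit(i: i32, j: i32) -> i32 {
    return i + j;
}

fn main() {
    println!("{} {}", add(40, 50), add_explicit(40, 50));
}
```

Idiomatic Rust prefers the first form; `return` is usually reserved for leaving a function early.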
Although there are only 13 lines of code, there is quite a lot packed into listing 2.2. Here’s a brief description that should provide the gist of what’s going on. We will cover the details in the rest of the chapter.
In line 1 (`fn main() {`), the `fn` keyword begins a function definition. The entry point to all Rust programs is `main()`. It takes no arguments and returns no value.1 Code blocks, also known as lexical scopes, are defined with curly braces: `{` and `}`.
In line 2 (`let a = 10;`), we use `let` to declare variable bindings. Variables are immutable by default, meaning that they are read-only rather than read-write. And finally, statements are delimited with semicolons (`;`).
In line 3 (`let b: i32 = 20;`), you can designate a specific data type for the compiler. At times, this will be required, as the compiler will be unable to deduce a unique type on your behalf.
In line 4 (`let c = 30i32;`), you’ll note that Rust’s numeric literals can include type annotations. This can be helpful when navigating complex numerical expressions. And in line 5 (`let d = 30_i32;`), you’ll see that Rust permits the use of underscores within numeric literals. These increase readability but are insignificant to the compiler. In line 6 (`let e = add(add(a, b), add(c, d));`), it should be easy to see that calling functions looks like what you’ve experienced in most other programming languages.
In line 8 (`println!("( a + b ) + ( c + d ) = {}", e);`), `println!()` is a macro, which is function-like but returns code rather than values. When printing to the console, every input data type has its own way of being represented as a text string. `println!()` takes care of figuring out the exact methods to call on its arguments.
Strings use double quotes (`"`) rather than single quotes (`'`). Rust uses single quotes for single characters, which are a distinct type, `char`. And with Rust, string formatting uses `{}` as a placeholder, rather than the C-like `printf` style of `%s` or other variants.
Finally, in line 11 (`fn add(...) -> i32 {`), you can see that Rust’s syntax for defining functions is similar to that of programming languages that use explicit type declarations. Commas delimit parameters, and type declarations follow variable names. The thin arrow (`->`) indicates the return type.
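The quoting and placeholder rules can be seen together in a short sketch. It uses `format!()`, a standard library macro that works like `println!()` but returns the formatted `String` instead of printing it; the function name `greeting` is hypothetical.

```rust
// Hypothetical helper: fills each {} placeholder with the next argument, in order.
fn greeting(name: &str, initial: char) -> String {
    format!("Hello, {}! Your initial is {}.", name, initial)
}

fn main() {
    let name: &str = "Ferris"; // double quotes: a string
    let initial: char = 'F';   // single quotes: a single char
    println!("{}", greeting(name, initial));
}
```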
2.3 Numbers
Computers have been associated with numbers for longer than you have been able to say “formula translator.” This section discusses how to create numeric types in Rust and how to perform operations on these.
2.3.1 Integers and decimal (floating-point) numbers
Rust uses a relatively conventional syntax for creating integers (`1`, `2`, …) and floating-point numbers (`1.0`, `1.1`, …). Operations on numbers use infix notation, meaning that numeric expressions look like those that you’re used to seeing in most programming languages. Rust also allows the same token (`+`) to perform addition on multiple types. This is called operator overloading. Some notable differences from other languages follow:
- Rust includes a large number of numeric types. Each type’s name states whether it is signed and how many bits it occupies (for example, `i32` or `u8`), which determines how many numbers the type can represent and whether it can represent negative numbers.
- Conversions between types are always explicit. Rust does not automatically convert your 16-bit integer into a 32-bit integer.
- Rust’s numbers can have methods. For example, to round 24.5 to the nearest integer, Rust programmers use `24.5_f32.round()` rather than `round(24.5_f32)`. Here, the type suffix is required because a concrete type is necessary.
To start, let’s consider a small example. You’ll find the code in ch2/ch2-intro-to-numbers.rs in the examples for this book. Listing 2.3 prints these few lines to the console:

```
20 + 21 + 22 = 63
1000000000000
42
```
Listing 2.3 Numeric literals and basic operations on numbers in Rust
```rust
fn main() {
    let twenty = 20;                    // ①
    let twenty_one: i32 = 21;           // ②
    let twenty_two = 22i32;             // ③

    let addition = twenty + twenty_one + twenty_two;
    println!("{} + {} + {} = {}", twenty, twenty_one, twenty_two, addition);

    let one_million: i64 = 1_000_000;   // ④
    println!("{}", one_million.pow(2)); // ⑤

    let forty_twos = [                  // ⑥
        42.0,                           // ⑦
        42f32,                          // ⑧
        42.0_f32,                       // ⑨
    ];

    println!("{:02}", forty_twos[0]);   // ⑩
}
```
① Rust infers a type on your behalf if you don’t supply one…
② …which is done by adding type annotations (i32)…
④ Underscores increase readability and are ignored by the compiler.
⑥ Creates an array of numbers, which must all be the same type, by surrounding those with square brackets
⑦ Floating-point literals without an explicit type annotation become 32-bit or 64-bit, depending on context.
⑧ Floating-point literals can also have type suffixes…
⑩ Elements within arrays can be indexed numerically, starting at 0.
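Listing 2.3 does not show two of the differences listed earlier: explicit conversions and methods on numbers. A minimal sketch of both follows; `i32::from()` is one standard way to widen an integer without loss, and the helper name `widen` is illustrative.

```rust
// Widening an i16 to an i32 never loses information, so `From` is implemented for it.
fn widen(x: i16) -> i32 {
    // let y: i32 = x; // would not compile: Rust never converts implicitly
    i32::from(x)
}

fn main() {
    let small: i16 = 300;
    let widened = widen(small);
    // Methods are called directly on numeric values.
    let rounded = 24.5_f32.round(); // halfway cases round away from zero
    println!("{} {}", widened, rounded);
}
```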
2.3.2 Integers with base 2, base 8, and base 16 notation
Rust also has built-in support for numeric literals that allow you to define integers in base 2 (binary), base 8 (octal), and base 16 (hexadecimal). This notation is also available within formatting macros like `println!`. Listing 2.4 demonstrates the three styles. You can find the source code for this listing in ch2/ch2-non-base2.rs. It produces the following output:

```
base 10: 3 30 300
base 2: 11 11110 100101100
base 8: 3 36 454
base 16: 3 1e 12c
```
Listing 2.4 Using base 2, base 8, and base 16 numeric literals
```rust
fn main() {
    let three = 0b11;           // ①
    let thirty = 0o36;          // ②
    let three_hundred = 0x12C;  // ③

    println!("base 10: {} {} {}", three, thirty, three_hundred);
    println!("base 2: {:b} {:b} {:b}", three, thirty, three_hundred);
    println!("base 8: {:o} {:o} {:o}", three, thirty, three_hundred);
    println!("base 16: {:x} {:x} {:x}", three, thirty, three_hundred);
}
```
① The 0b prefix indicates binary (base 2) numerals.
② The 0o prefix indicates octal (base 8) numerals.
③ The 0x prefix indicates hexadecimal (base 16) numerals.
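The prefixes and format specifiers also work in the other direction. A minimal sketch, using the alternate flag (`#`) to print the prefix and `i32::from_str_radix()` to parse text written in a given base; the helper name `with_prefixes` is illustrative.

```rust
// Formats a value in all three bases, including the 0b/0o/0x prefixes.
fn with_prefixes(value: i32) -> String {
    format!("{:#b} {:#o} {:#x}", value, value, value)
}

fn main() {
    // The literals are simply different spellings of the same i32 values.
    assert_eq!(0b11, 3);
    assert_eq!(0o36, 30);
    assert_eq!(0x12C, 300);

    println!("{}", with_prefixes(300));

    // Parsing text in base 16 (letter digits are case-insensitive).
    let parsed = i32::from_str_radix("12C", 16);
    println!("{:?}", parsed);
}
```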
In binary (base 2) numerals, `0b11` equals 3 because 3 = 2 × 1 + 1 × 1. With octal (base 8) numerals, `0o36` equals 30 because 30 = 8 × 3 + 1 × 6. And with hexadecimal (base 16) numerals, `0x12C` equals 300 because 300 = 256 × 1 + 16 × 2 + 1 × 12. Table 2.1 shows the types that represent scalar numbers.
Table 2.1 Rust types for representing scalar (single) numbers
Rust contains a full complement of numeric types. The types are grouped into a few families:
- Signed integers (`i`) represent negative as well as positive integers.
- Unsigned integers (`u`) only represent zero and positive integers, but can go roughly twice as high as their signed counterparts.
- Floating-point types (`f`) represent real numbers, with special bit patterns to represent infinity, negative infinity, and “not a number” values.
Integer width is the number of bits that the type uses in RAM and in the CPU. Types that take up more space, such as `u32` vs. `i8`, can represent a wider range of numbers. But this incurs the expense of needing to store extra zeros for smaller numbers, as table 2.2 shows.
Table 2.2 Multiple bit patterns can represent the same number.
Number | Type | Bit pattern in memory |
---|---|---|
20 | u32 | 00000000000000000000000000010100 |
20 | i8 | 00010100 |
20 | f32 | 01000001101000000000000000000000 |
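The ranges and bit patterns behind tables 2.1 and 2.2 can be inspected directly. This sketch uses the `MIN`/`MAX` associated constants, zero-padded binary formatting (`{:032b}`, `{:08b}`), and `f32::to_bits()`, which returns the raw IEEE 754 bits of a float as a `u32`. The helper name `bit_patterns` is illustrative.

```rust
// Returns the three bit patterns from table 2.2 as strings.
fn bit_patterns() -> [String; 3] {
    [
        format!("{:032b}", 20_u32),             // 32 bits, mostly leading zeros
        format!("{:08b}", 20_i8),               // the same value in 8 bits
        format!("{:032b}", 20.0_f32.to_bits()), // sign, exponent, and fraction bits
    ]
}

fn main() {
    // Wider types cover wider ranges; signed types spend half their range on negatives.
    println!("i8:  {} to {}", i8::MIN, i8::MAX);
    println!("u8:  {} to {}", u8::MIN, u8::MAX);
    println!("u32: {} to {}", u32::MIN, u32::MAX);

    for pattern in bit_patterns().iter() {
        println!("{}", pattern);
    }
}
```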
Although we’ve only touched on numbers, we nearly have enough exposure to Rust to create a prototype of our pattern-matching program. But let’s look at comparing numbers before we create our program.
2.3.3 Comparing numbers
Rust’s numeric types support a large suite of comparisons that you’re probably familiar with. Support for these comparisons is provided by a feature that you have not encountered yet, called traits.2 Table 2.3 summarizes the comparison operators available to you.
Table 2.3 Mathematical operators supported by Rust’s numeric types
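A minimal sketch that exercises the six comparison operators on two values of the same type; the helper name `compare_all` is illustrative.

```rust
// Evaluates each comparison operator on the same pair of values.
// Both operands must have the same type (here, i32).
fn compare_all(a: i32, b: i32) -> [bool; 6] {
    [
        a < b,  // less than
        a > b,  // greater than
        a <= b, // less than or equal to
        a >= b, // greater than or equal to
        a == b, // equal to
        a != b, // not equal to
    ]
}

fn main() {
    println!("{:?}", compare_all(10, 20));
}
```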
That support does include a few caveats. We’ll look at these conditions in the rest of this section.
IMPOSSIBLE TO COMPARE DIFFERENT TYPES
Rust’s type safety requirements prevent comparisons between different types. For example, this code does not compile:

```rust
fn main() {
    let a: i32 = 10;
    let b: u16 = 100;

    if a < b {
        println!("Ten is less than one hundred.");
    }
}
```
To appease the compiler, we need to use the `as` operator to cast one of the operands to the other’s type. The following code shows this type cast, `b as i32`:

```rust
fn main() {
    let a: i32 = 10;
    let b: u16 = 100;

    if a < (b as i32) {
        println!("Ten is less than one hundred.");
    }
}
```
It is safest to cast the smaller type to a larger one (for example, a 16-bit type to a 32-bit type). This is sometimes referred to as promotion. In this case, we could have demoted `a` down to a `u16`, but such a move is generally more risky.
WARNING Using type casts carelessly will cause your program to behave unexpectedly. For example, the expression `300_i32 as i8` returns `44`.
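The result follows from how `as` narrows integers: it keeps only the lowest bits that fit in the target type. A minimal sketch that makes this visible:

```rust
fn main() {
    let wide: i32 = 300; // 300 is 0b1_0010_1100 in binary
    // `as i8` keeps only the lowest 8 bits, 0b0010_1100, which is 44.
    let narrowed = wide as i8;
    // Masking off everything above the lowest 8 bits gives the same number.
    let low_byte = wide & 0xFF;
    println!("{} {}", narrowed, low_byte);

    // If the kept bits have their top bit set, the signed result is negative:
    // 200 is 0b1100_1000, which an i8 reads as -56.
    println!("{}", 200_i32 as i8);
}
```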
In some cases, the `as` keyword is too blunt an instrument: it always produces a value, even when the conversion has lost information. It’s possible to regain fuller control over the type conversion process at the cost of introducing some bureaucracy. The following listing shows a Rust method to use instead of the `as` keyword when the conversion might fail.
Listing 2.5 The `try_into()` method converts between types
```rust
 1 use std::convert::TryInto;   // ①
 2
 3 fn main() {
 4     let a: i32 = 10;
 5     let b: u16 = 100;
 6
 7     let b_ = b.try_into()
 8         .unwrap();           // ②
 9
10     if a < b_ {
11         println!("Ten is less than one hundred.");
12     }
13 }
```
① Enables try_into() to be called on those types that have implemented it (such as u16)
② try_into() returns a Result type that provides access to the conversion attempt.
Listing 2.5 introduces two new Rust concepts: traits and error handling. On line 1, the `use` keyword brings the `std::convert::TryInto` trait into local scope. This unlocks the `try_into()` method of the `b` variable. We’ll bypass a full explanation of why this occurs for now. In the meantime, consider a trait as a collection of methods. If you are from an object-oriented background, traits can be thought of as abstract classes or interfaces. If your programming experience is in functional languages, you can think of traits as type classes.
Line 7 provides a glimpse of error handling in Rust. `b.try_into()` returns an `i32` value wrapped within a `Result`. `Result` is introduced properly in chapter 3. It can contain either a success value or an error value. The `unwrap()` method can handle the success value and returns the value of `b` as an `i32` here. If the conversion between `u16` and `i32` were to fail, then calling `unwrap()` would crash the program. As the book progresses, you will learn safer ways of dealing with `Result` rather than risking the program’s stability!
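As a preview of those safer ways, here is a minimal sketch that inspects the `Result` with `match` instead of calling `unwrap()`. Unlike `u16` to `i32`, this conversion can genuinely fail, because an `i32` may hold values outside the range of `u16`. The helper name `to_u16` is hypothetical.

```rust
use std::convert::TryInto;
use std::num::TryFromIntError;

// Hypothetical helper: narrows an i32 to a u16, reporting failure instead of panicking.
fn to_u16(value: i32) -> Result<u16, TryFromIntError> {
    value.try_into()
}

fn main() {
    for candidate in [100, 70_000, -1] {
        match to_u16(candidate) {
            Ok(v) => println!("{} fits in a u16: {}", candidate, v),
            Err(e) => println!("{} does not fit: {}", candidate, e),
        }
    }
}
```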
A distinguishing characteristic of Rust is that a trait’s methods can only be called when that trait is within local scope. An implicit prelude enables common operations such as addition and assignment to be used without explicit imports.
TIP To understand what is included in local scope by default, you should investigate the `std::prelude` module. Its documentation is available online at https://doc.rust-lang.org/std/prelude/index.html.
Floating-point types (`f32` and `f64`, for example) can cause serious errors for the unwary. There are (at least) two reasons for this:
- These often approximate the numbers that they’re representing. Floating-point types are implemented in base 2, but we often want to calculate numbers in base 10. This mismatch creates ambiguity. Moreover, although often described as representing the real numbers, floating-point values have a limited precision. Representing all of the reals would require infinite precision.
- These can represent values that have unintuitive semantics. Unlike integers, floating-point types have some values that do not play well together (by design). Formally, these only have a partial equivalence relation. This is encoded in Rust’s type system. `f32` and `f64` types only implement the `std::cmp::PartialEq` trait, whereas other numeric types also implement `std::cmp::Eq`.
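The practical effect of the `PartialEq`/`Eq` split shows up in generic code. In this minimal sketch, the hypothetical `all_equal()` accepts only types that implement `Eq`, so it works with integers but rejects floats at compile time.

```rust
// Hypothetical helper restricted to types with a total equivalence relation (Eq).
fn all_equal<T: Eq>(items: &[T]) -> bool {
    // windows(2) yields every pair of neighboring elements.
    items.windows(2).all(|pair| pair[0] == pair[1])
}

fn main() {
    println!("{}", all_equal(&[3, 3, 3])); // i32 implements Eq
    // all_equal(&[1.0_f32, 1.0]);         // does not compile: f32 is only PartialEq

    let nan = f32::NAN;
    println!("{}", nan == nan); // false: NaN is not equal even to itself
}
```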
To prevent these hazards, here are two guidelines to follow:
- Avoid testing floating-point numbers for equality.
- Be wary when results may be mathematically undefined.
Using equality to compare floating-point numbers can be highly problematic. Floating-point numbers are implemented by computing systems that use binary (base 2) mathematics, but are often asked to perform operations on decimal (base 10) numbers. This poses a problem because many values we care about, such as 0.1, have no exact representation in binary.a
To illustrate the problem, consider the following snippet. Should it run successfully, or should it crash? Although the expression that is being evaluated (0.1 + 0.2 = 0.3) is a mathematical tautology, it crashes on most systems that run it:

```rust
fn main() {
    assert!(0.1 + 0.2 == 0.3); // ①
}
```
① assert! crashes the program unless its argument evaluates to true.
But not all. It turns out that the data type can affect whether the program succeeds or fails. The following code, available at ch2/ch2-add-floats.rs, interrogates the internal bit patterns of each value to find where the differences lie. It then performs the test in the previous example against both `f32` and `f64` types. Only one test passes:
```rust
fn main() {
    let abc: (f32, f32, f32) = (0.1, 0.2, 0.3);
    let xyz: (f64, f64, f64) = (0.1, 0.2, 0.3);

    println!("abc (f32)");
    println!(" 0.1 + 0.2: {:x}", (abc.0 + abc.1).to_bits());
    println!(" 0.3: {:x}", (abc.2).to_bits());
    println!();

    println!("xyz (f64)");
    println!(" 0.1 + 0.2: {:x}", (xyz.0 + xyz.1).to_bits());
    println!(" 0.3: {:x}", (xyz.2).to_bits());
    println!();

    assert!(abc.0 + abc.1 == abc.2); // ① passes: the f32 bit patterns match
    assert!(xyz.0 + xyz.1 == xyz.2); // ② panics: the f64 bit patterns differ
}
```
When executed, the program successfully generates the short report that follows, which reveals the error. After that, it crashes. Significantly, it crashes on the final assertion, when it compares the result of the `f64` values:

```
abc (f32)
 0.1 + 0.2: 3e99999a
 0.3: 3e99999a

xyz (f64)
 0.1 + 0.2: 3fd3333333333334
 0.3: 3fd3333333333333

thread 'main' panicked at 'assertion failed: xyz.0 + xyz.1 == xyz.2', ch2-add-floats.rs.rs:14:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Generally speaking, it is safer to test whether mathematical operations fall within an acceptable margin of their true mathematical result. This margin is often referred to as the epsilon.
Rust includes some tolerances to allow comparisons between floating-point values. These tolerances are defined as `f32::EPSILON` and `f64::EPSILON`. Rather than testing two values for exact equality, you can test whether they differ by no more than that tolerance, as the following small example shows:

```rust
fn main() {
    let result: f32 = 0.1 + 0.1;
    let desired: f32 = 0.2;
    let absolute_difference = (desired - result).abs();
    assert!(absolute_difference <= f32::EPSILON);
}
```
What happens under the hood is interesting, but mostly irrelevant here: the Rust compiler delegates the comparison to the CPU. Floating-point operations are implemented using bespoke hardware within the chip.b
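`f32::EPSILON` and `f64::EPSILON` are the gap between 1.0 and the next representable value, so as fixed tolerances they suit numbers near 1.0. For values of other magnitudes, one common approach is a relative tolerance that scales with the inputs. In this sketch, `approx_eq` is a hypothetical name and the tolerance of four times `f64::EPSILON` is an arbitrary assumption you would tune for your own data. Note that a purely relative test treats values very close to zero strictly, so some code combines it with an absolute floor.

```rust
// Sketch: two values are "equal" if they differ by at most rel_tol times the larger magnitude.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    let diff = (a - b).abs();
    let largest = a.abs().max(b.abs());
    diff <= largest * rel_tol
}

fn main() {
    let sum = 0.1 + 0.2;
    let tolerance = 4.0 * f64::EPSILON; // assumed tolerance, not a universal constant
    println!("{}", sum == 0.3);                      // exact comparison fails
    println!("{}", approx_eq(sum, 0.3, tolerance)); // tolerant comparison succeeds
}
```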
Operations that produce mathematically undefined results, such as taking the square root of a negative number (`(-42.0_f32).sqrt()`; the parentheses matter, because `-42.0_f32.sqrt()` would negate the square root of 42), present particular problems. Floating-point types include “not a number” values (represented in Rust syntax as `NAN` values) to handle these cases.
`NAN` values poison other numbers. Almost all operations interacting with `NAN` return `NAN`. Another thing to be mindful of is that, by definition, `NAN` values are never equal to anything, not even to themselves. This small program will always crash:
```rust
fn main() {
    let x = (-42.0_f32).sqrt();
    assert_eq!(x, x);
}
```
To program defensively, make use of the `is_nan()` and `is_finite()` methods. Inducing a crash, rather than silently proceeding with a mathematical error, allows you to debug close to what has caused the problem. The following illustrates using the `is_finite()` method to bring about this condition:

```rust
fn main() {
    let x: f32 = 1.0 / 0.0;
    assert!(x.is_finite());
}
```
a If this is confusing to think about, consider that many values, such as 1/3 (one third), have no exact representation within the decimal number system.
b Illegal or undefined operations trigger a CPU exception. You will read about those in chapter 12.
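Building on `is_nan()`, a minimal sketch that turns a NaN into an explicit `None` at the point where it appears, instead of letting it propagate. `checked_sqrt` is a hypothetical helper, and `Option` is covered properly in the next chapter.

```rust
// Hypothetical helper: returns None instead of a NaN.
fn checked_sqrt(x: f32) -> Option<f32> {
    let root = x.sqrt();
    if root.is_nan() {
        None
    } else {
        Some(root)
    }
}

fn main() {
    let poisoned = f32::NAN + 1.0;
    println!("{}", poisoned.is_nan()); // NaN propagates through arithmetic

    println!("{:?}", checked_sqrt(16.0));
    println!("{:?}", checked_sqrt(-42.0));

    let infinite: f32 = 1.0 / 0.0;
    println!("{}", infinite.is_finite()); // infinity is not finite
}
```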
2.3.4 Rational, complex numbers, and other numeric types
Rust’s standard library is comparatively slim. It excludes numeric types that are often available within other languages. These include
- Many mathematical objects for working with rational numbers and complex numbers
- Arbitrary size integers and arbitrary precision floating-point numbers for working with very large or very small numbers
- Fixed-point decimal numbers for working with currencies
To access these specialized numeric types, you can use the num crate. Crates are Rust’s name for packages. Open source crates are shared at the https://crates.io repository, which is where cargo downloads num from.
Listing 2.6 demonstrates adding two complex numbers together. If you’re unfamiliar with the term complex numbers, these are two-dimensional, whereas numbers that you deal with day to day are one-dimensional. Complex numbers have “real” and “imaginary” parts and are denoted as `<real> + <imaginary>i`.3 For example, 2.1 + -1.2i is a single complex number. That’s enough mathematics. Let’s look at the code.
Here is the recommended workflow to compile and run listing 2.6:
- Execute the following commands in a terminal:

  ```
  git clone --depth=1 https://github.com/rust-in-action/code rust-in-action
  cd rust-in-action/ch2/ch2-complex
  cargo run
  ```

- For those readers who prefer to learn by doing everything by hand, the following instructions will achieve the same end result:
  - Execute the following commands in a terminal:

    ```
    cargo new ch2-complex
    cd ch2-complex
    ```

  - Add version 0.4 of the num crate into the `[dependencies]` section of Cargo.toml. That section will look like this:

    ```toml
    [dependencies]
    num = "0.4"
    ```

  - Replace src/main.rs with the source code from listing 2.6 (available at ch2/ch2-complex/src/main.rs).
  - Execute `cargo run`.
After several lines of intermediate output, `cargo run` should produce the following output (the imaginary part is -1.2 + 22.2 = 21, which Rust prints without a decimal point):

```
13.2 + 21i
```
Listing 2.6 Calculating values with complex numbers
```rust
1 use num::complex::Complex;                   // ①
2
3 fn main() {
4     let a = Complex { re: 2.1, im: -1.2 };   // ②
5     let b = Complex::new(11.1, 22.2);        // ③
6     let result = a + b;
7
8     println!("{} + {}i", result.re, result.im) // ④
9 }
```
① The use keyword brings the Complex type into local scope.
② Every Rust type has a literal syntax.
③ Most types implement a new() static method.
④ Accesses fields with the dot operator
Some points from the listing are worth pausing to consider:
- The `use` keyword pulls items from crates into local scope, and the namespace operator (`::`) restricts what’s imported. In our case, only a single type is required: `Complex`.
- Rust does not have constructors; instead, every type has a literal form. You can initialize types by using the type name (`Complex`) and assigning their fields (`re`, `im`) values (such as `2.1` or `-1.2`) within curly braces (`{ }`).
- Many types implement a `new()` method for simplicity. This convention, however, is not part of the Rust language.
- To access fields, Rust programmers use the dot operator (`.`). For example, the `num::complex::Complex` type has two fields: `re` represents the real part, and `im` represents the imaginary part. Both are accessible with the dot operator.
Listing 2.6 also introduces some new concepts. It demonstrates two forms of initializing non-primitive data types.

One is a literal syntax available as part of the Rust language (line 4). The other is the `new()` static method, which is implemented by convention only and isn’t defined as part of the language (line 5). A static method is a function that is associated with a type but is not called on an instance of that type.4

The second form is often preferred in real-world code because library authors use a type’s `new()` method to set defaults. It also involves less clutter.
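The two initialization styles can be reproduced with a type of your own. In this minimal sketch, `Point` and its `origin()` function are hypothetical; `new()` is written by hand, which shows that it is an ordinary associated function rather than a language feature.

```rust
// Hypothetical type used only for illustration.
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // A conventional constructor: just an associated function that returns Self.
    fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }

    // Another associated ("static") function, here supplying default values.
    fn origin() -> Point {
        Point { x: 0.0, y: 0.0 }
    }
}

fn main() {
    let a = Point { x: 2.1, y: -1.2 }; // literal syntax
    let b = Point::new(2.1, -1.2);     // new() by convention
    println!("{:?} {:?} {}", a, Point::origin(), a == b);
}
```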
Shortcut for adding a third-party dependency to a project
I recommend that you install the cargo-edit crate to enable the `cargo add` subcommand (since Rust 1.62, `cargo add` also ships with cargo itself). You can install cargo-edit with the following command:
```
$ cargo install cargo-edit
Updating crates.io index
Installing cargo-edit v0.6.0
...
Installed package `cargo-edit v0.6.0` (executables `cargo-add`, `cargo-rm`, `cargo-upgrade`)
```
Up to this point, we have manually added dependencies to Cargo.toml. The `cargo add` command simplifies this process by editing the file correctly on your behalf:
```
$ cargo add num
Updating 'https://github.com/rust-lang/crates.io-index' index
Adding num v0.4.0 to dependencies
```
We’ve now addressed how to access built-in numeric types and types available from third-party libraries. We’ll move on to discussing some more of Rust’s features.
2.4 Flow control
Programs execute from top to bottom, except when you don’t want that. Rust has a useful set of flow control mechanisms to facilitate this. This section provides a brief tour of the fundamentals.
2.4.1 For: The central pillar of iteration
The `for` loop is the workhorse of iteration in Rust. Iterating through collections of things, including iterating over collections that may have infinitely many values, is easy. The basic form is

```rust
for item in container {
    // ...
}
```
This basic form makes each successive element in `container` available as `item`. In this way, Rust emulates many dynamic languages with an easy-to-use, high-level syntax. However, it does have some pitfalls.
Counterintuitively, once the block ends, accessing the container another time becomes invalid. Even though the `container` variable remains within local scope, its lifetime has ended. For reasons that are explained in chapter 4, Rust assumes that `container` is no longer needed once the block finishes.
When you want to reuse `container` later in your program, use a reference. Again, for reasons that are explained in chapter 4, when a reference is omitted, Rust assumes that `container` is no longer needed. To add a reference to the container, prefix it with an ampersand (`&`) as this example shows:

```rust
for item in &container {
    // ...
}
```
If you need to modify each `item` during the loop, you can use a mutable reference by including the `mut` keyword:

```rust
for item in &mut container {
    // ...
}
```
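As an implementation detail, Rust’s `for` loop construct is expanded to method calls by the compiler, and these three forms of `for` each map to a different method: `for item in container` hands ownership of the container to `IntoIterator::into_iter()`, `for item in &container` behaves like `container.iter()` (read-only access), and `for item in &mut container` behaves like `container.iter_mut()` (read-write access).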
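The three forms can also be written with explicit method calls. A minimal sketch (the function name `iterate_three_ways` is illustrative); strictly, the compiler calls `IntoIterator::into_iter()` on the expression after `in`, which for `&Vec` and `&mut Vec` yields the same iterators as `iter()` and `iter_mut()`.

```rust
// Returns the sum from a read-only pass and the vector after an in-place update.
fn iterate_three_ways() -> (i32, Vec<i32>) {
    let mut container = vec![1, 2, 3];

    // Like `for item in &container`: borrows each element read-only.
    let mut total = 0;
    for item in container.iter() {
        total += *item;
    }

    // Like `for item in &mut container`: borrows each element mutably.
    for item in container.iter_mut() {
        *item *= 10;
    }

    // Like `for item in container`: takes ownership, so `container` is moved.
    let mut moved = Vec::new();
    for item in container.into_iter() {
        moved.push(item);
    }
    // Using `container` after this point would be a compile-time error.

    (total, moved)
}

fn main() {
    let (total, moved) = iterate_three_ways();
    println!("{} {:?}", total, moved);
}
```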
When a local variable is not used within a block, by convention, you’ll use an underscore (`_`). Using this pattern in conjunction with the exclusive range syntax (`n..m`) and the inclusive range syntax (`n..=m`) makes it clear that the intent is to perform a loop a fixed number of times. Here’s an example:

```rust
for _ in 0..10 {
    // ...
}
```
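The difference between the two range forms is a single iteration. A minimal sketch with hypothetical helper names that count how often each loop body runs:

```rust
// Counts the iterations of an exclusive range: 0, 1, ..., n - 1.
fn times_exclusive(n: u32) -> u32 {
    let mut count = 0;
    for _ in 0..n {
        count += 1;
    }
    count
}

// Counts the iterations of an inclusive range: 0, 1, ..., n.
fn times_inclusive(n: u32) -> u32 {
    let mut count = 0;
    for _ in 0..=n {
        count += 1;
    }
    count
}

fn main() {
    println!("{} {}", times_exclusive(10), times_inclusive(10));
}
```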
AVOID MANAGING AN INDEX VARIABLE
In many programming languages, it’s common to loop through things by using a temporary variable that’s incremented at the end of each iteration. Conventionally, this variable is named `i` (for index). A Rust version of that pattern is

```rust
let collection = [1, 2, 3, 4, 5];
for i in 0..collection.len() {
    let item = collection[i];
    // ...
}
```
This is legal Rust. It’s also essential in cases when iterating directly over `collection` via `for item in collection` is impossible. However, it is generally discouraged, because the manual approach introduces two problems:
- Performance: Indexing values with the `collection[index]` syntax incurs run-time costs for bounds checking. That is, Rust checks that `index` currently exists within `collection` as valid data. Those checks are not necessary when iterating directly over `collection`. The compiler can use compile-time analysis to prove that illegal access is impossible.
- Safety: Periodically accessing `collection` over time introduces the possibility that it has changed. Using a `for` loop over `collection` directly allows Rust to guarantee that `collection` remains untouched by other parts of the program.
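When you need the position as well as the item, one idiomatic option is the iterator method `enumerate()`, which yields `(index, item)` pairs without manual indexing. A minimal sketch; `first_even_position` is a hypothetical helper.

```rust
// Hypothetical helper: finds where the first even number sits, without a manual index.
fn first_even_position(items: &[i32]) -> Option<usize> {
    for (index, item) in items.iter().enumerate() {
        if item % 2 == 0 {
            return Some(index);
        }
    }
    None
}

fn main() {
    let collection = [1, 3, 4, 5];
    println!("{:?}", first_even_position(&collection));
}
```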
2.4.2 Continue: Skipping the rest of the current iteration
The `continue` keyword operates as you would expect. Here’s an example:

```rust
for n in 0..10 {
    if n % 2 == 0 {
        continue;
    }
    // ...
}
```
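A runnable version of that snippet, in which the loop body does some work for the odd values only. `sum_of_odds` is a hypothetical name:

```rust
// Hypothetical helper: adds up the odd numbers below `limit`, skipping even ones.
fn sum_of_odds(limit: u32) -> u32 {
    let mut total = 0;
    for n in 0..limit {
        if n % 2 == 0 {
            continue; // jump straight to the next value of n
        }
        total += n;
    }
    total
}

fn main() {
    println!("{}", sum_of_odds(10));
}
```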
2.4.3 While: Looping until a condition changes its state
The `while` loop proceeds as long as a condition holds. The condition, formally known as a predicate, can be any expression that evaluates to `true` or `false`. This (non-functioning) snippet takes air quality samples, checking to avoid anomalies:

```rust
let mut samples = vec![];

while samples.len() < 10 {
    let sample = take_sample();
    if is_outlier(sample) {
        continue;
    }

    samples.push(sample);
}
```
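To make that snippet runnable, `take_sample()` can be replaced by a fixed list of readings. In this sketch, the readings, the outlier thresholds in `is_outlier()`, and the helper name `collect_samples` are all assumptions made for illustration. The extra `break` stops the loop if the readings run out before enough samples are collected.

```rust
// Hypothetical outlier test: rejects readings outside an assumed 0..=500 range.
fn is_outlier(sample: f64) -> bool {
    sample < 0.0 || sample > 500.0
}

// Collects up to `wanted` non-outlier readings, in order.
fn collect_samples(readings: &[f64], wanted: usize) -> Vec<f64> {
    let mut source = readings.iter(); // stands in for take_sample()
    let mut samples: Vec<f64> = vec![];

    while samples.len() < wanted {
        let sample = match source.next() {
            Some(&value) => value,
            None => break, // no readings left
        };
        if is_outlier(sample) {
            continue;
        }
        samples.push(sample);
    }
    samples
}

fn main() {
    let readings = [12.0, -3.0, 40.0, 900.0, 7.0, 8.0];
    println!("{:?}", collect_samples(&readings, 3));
}
```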
USING WHILE TO STOP ITERATING ONCE A DURATION IS REACHED
Listing 2.7 (source code available at ch2/ch2-while-true-incr-count.rs) provides a working example of `while`. It isn’t an ideal method for implementing benchmarks, but can be a useful tool to have in your toolbox. In the listing, `while` continues to execute a block until a time limit is reached.
Listing 2.7 Testing how fast your computer can increment a counter
```rust
use std::time::{Duration, Instant};            // ①

fn main() {
    let mut count = 0;
    let time_limit = Duration::new(1, 0);      // ②
    let start = Instant::now();                // ③

    while (Instant::now() - start) < time_limit { // ④
        count += 1;
    }
    println!("{}", count);
}
```
① This form of an import hasn’t been seen before. It brings the Duration and Instant types from std::time into local scope.
② Creates a Duration that represents 1 second
③ Accesses time from the system’s clock
④ An Instant minus an Instant returns a Duration.
AVOID WHILE WHEN ENDLESSLY LOOPING
Most Rust programmers avoid the following idiom to express looping forever; the compiler even warns about it by default (the `while_true` lint). The preferred alternative is to use the `loop` keyword, explained in the next section.

```rust
while true {
    println!("Are we there yet?");
}
```
2.4.4 Loop: The basis for Rust’s looping constructs
Rust contains a `loop` keyword for providing more control than `for` and `while`. `loop` executes a code block again and again, never stopping for a tea (or coffee) break. `loop` continues to execute until a `break` keyword is encountered or the program is terminated from the outside. Here’s an example showing the `loop` syntax:

```rust
loop {
    // ...
}
```
`loop` is often seen when implementing long-running servers, as the following example shows:

```rust
loop {
    let (requester, request) = accept_request();
    let result = process_request(request);
    send_response(requester, result);
}
```
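A runnable sketch of the same shape, with the network replaced by a queue of request names. Everything here (`serve`, the queue, the response text) is hypothetical, and unlike a real server the loop ends with `break` once the queue is empty.

```rust
// Hypothetical stand-in for a server: requests come from a pre-filled queue.
fn serve(mut queue: Vec<&str>) -> Vec<String> {
    let mut responses = Vec::new();
    loop {
        // pop() takes the last request, or returns None when the queue is empty.
        let request = match queue.pop() {
            Some(r) => r,
            None => break,
        };
        responses.push(format!("handled {}", request));
    }
    responses
}

fn main() {
    for response in serve(vec!["a", "b", "c"]) {
        println!("{}", response);
    }
}
```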
2.4.5 Break: Aborting a loop
The `break` keyword breaks out of a loop. In this regard, Rust generally operates as you are used to:

```rust
for (x, y) in (0..).zip(0..) {
    if x + y > 100 {
        break;
    }
    // ...
}
```
You can break out of a nested loop with loop labels.5 A loop label is an identifier prefixed with an apostrophe (`'`), as this example shows:

```rust
'outer: for x in 0.. {
    for y in 0.. {
        for z in 0.. {
            if x + y + z > 1000 {
                break 'outer;
            }
            // ...
        }
    }
}
```
Rust does not include the `goto` keyword, which provides the ability to jump to other parts of the program. The `goto` keyword can make control flow confusing, and its use is generally discouraged. One place where it is still commonly used in languages such as C, though, is to jump to a cleanup section of a function when an error condition is detected. Use loop labels to enable that pattern.
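A runnable sketch of a labeled `break`: it searches for two factors of a number and leaves both loops as soon as it finds them. `find_factor_pair` is a hypothetical helper, and the `2..target` search bounds are a simplification.

```rust
// Finds the first pair (x, y) with x * y == target, leaving both loops at once.
fn find_factor_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for x in 2..target {
        for y in 2..target {
            if x * y == target {
                found = Some((x, y));
                break 'outer; // exits the outer loop, not just the inner one
            }
        }
    }
    found
}

fn main() {
    println!("{:?} {:?}", find_factor_pair(15), find_factor_pair(13));
}
```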
2.4.6 If, if else, and else: Conditional branching
So far, we’ve indulged in the exciting pursuit of looking for numbers within lists of numbers. Our tests have involved utilizing the `if` keyword. Here’s an example:

```rust
if item == 42 {
    // ...
}
```

`if` accepts any expression that evaluates to a Boolean value (`true` or `false`). When you want to test multiple expressions, it’s possible to add a chain of `else if` blocks. The `else` block matches anything that has not already been matched. For example:

```rust
if item == 42 {
    // ...
} else if item == 132 {
    // ...
} else {
    // ...
}
```
Rust has no concept of “truthy” or “falsey” types. Other languages allow special values such as `0` or an empty string to stand in for `false` and for other values to represent `true`, but Rust doesn’t allow this. The only value that can be used for `true` is `true`, and for `false`, use `false`.
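In practice, this means writing the comparison out. A minimal sketch; `describe` and its messages are hypothetical:

```rust
// Hypothetical helper: explicit Boolean tests in place of "truthiness".
fn describe(count: i32, name: &str) -> &'static str {
    // `if count { ... }` would not compile, because an i32 is not a bool.
    if count != 0 && !name.is_empty() {
        "has data"
    } else {
        "missing data"
    }
}

fn main() {
    println!("{}", describe(3, "sensor"));
}
```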
Rust is an expression-based language
In programming languages from this heritage, all expressions return values and almost everything is an expression. This heritage reveals itself through some constructs that are not legal in other languages. In idiomatic Rust, the `return` keyword is omitted from functions as shown in the following snippet:

```rust
fn is_even(n: i32) -> bool {
    n % 2 == 0
}
```
For example, Rust programmers assign variables from conditional expressions:

```rust
fn main() {
    let n = 123456;
    let description = if is_even(n) {
        "even"
    } else {
        "odd"
    };
    println!("{} is {}", n, description); // prints "123456 is even"
}
```
This can be extended to other blocks, including `match`, like this:

```rust
fn main() {
    let n = 654321;
    let description = match is_even(n) {
        true => "even",
        false => "odd",
    };
    println!("{} is {}", n, description); // prints "654321 is odd"
}
```
Perhaps most surprisingly, the `break` keyword also returns a value. This can be used to allow “infinite” loops to return values:

```rust
fn main() {
    let n = loop {
        break 123;
    };

    println!("{}", n); // prints "123"
}
```
You may wonder what parts of Rust are not expressions and, thus, do not return values. Statements are not expressions. These appear in Rust in three places:
- Expressions delimited by the semicolon (`;`)
- Binding a name to a value with the assignment operator (`=`)
- Type declarations, which include functions (`fn`) and data types created with the `struct` and `enum` keywords

Formally, the first form is referred to as an expression statement. The last two are both called declaration statements. In Rust, no value is represented as `()` (the “unit” type).
2.4.7 Match: Type-aware pattern matching
While it’s possible to use `if`/`else` blocks in Rust, `match` provides a safer alternative. The compiler tells you if a `match` hasn’t considered a relevant alternative. It is also elegant and concise:

```rust
match item {
    0 => {},           // ①
    10 ..= 20 => {},   // ②
    40 | 80 => {},     // ③
    _ => {},           // ④
}
```
① To match a single value, provide the value. No operator is required.
② The ..= syntax matches an inclusive range.
③ The vertical bar (|) matches values on either side of it.
④ The underscore (_) matches every value.
`match` offers a sophisticated and concise syntax for testing multiple possible values. Some examples are
- Inclusive ranges (`10 ..= 20`) to match any value within the range.
- A Boolean OR (`|`) will match when either side matches.
- The underscore (`_`) to match everything.
`match` is analogous to the `switch` keyword in other languages. Unlike C’s `switch`, however, `match` guarantees that all possible options for a type are explicitly handled. Failing to provide a branch for every possible value triggers a compiler error. Additionally, a match does not “fall through” to the next option by default. Instead, `match` returns immediately when a match is found.
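A runnable version of the earlier `match` sketch, wrapped in a function so that each arm produces a value. Removing the final `_` arm would cause a compile error, because the remaining arms do not cover every `i32`. `classify` and its labels are illustrative.

```rust
// Each arm evaluates to a &str; the first matching arm wins and nothing falls through.
fn classify(item: i32) -> &'static str {
    match item {
        0 => "zero",
        10..=20 => "between ten and twenty",
        40 | 80 => "forty or eighty",
        _ => "something else", // required: covers every remaining i32
    }
}

fn main() {
    for item in [0, 15, 80, 7] {
        println!("{}: {}", item, classify(item));
    }
}
```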
Listing 2.8 demonstrates a larger example of `match`. The source code for this listing is in ch2/ch2-match-needles.rs. The code prints these two lines to the screen:

```
42: hit!
132: hit!
```
Listing 2.8 Using `match` to match multiple values
```rust
fn main() {
    let needle = 42;                     // ①
    let haystack = [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862];

    for item in &haystack {
        let result = match item {        // ②
            42 | 132 => "hit!",          // ③
            _ => "miss",                 // ④
        };

        if result == "hit!" {
            println!("{}: {}", item, result);
        }
    }
}
```
① The variable needle is now redundant.
② This match expression returns a value that can be bound to a variable.
③ Success! 42 | 132 matches both 42 and 132.
④ A wildcard pattern that matches everything
The `match` keyword plays an important role within the Rust language. Many control structures (like looping) are defined in terms of `match` under the hood. `match` expressions really shine when combined with the `Option` type that’s discussed in depth in the next chapter.
Now that we have taken a good look at defining numbers and working with some of Rust’s flow control mechanisms, let’s move on to adding structure to programs with functions.
2.5 Defining functions
Looking back to where the chapter begins, the snippet in listing 2.2 contained a small function, `add()`. `add` takes two `i32` values and returns their sum. The following listing repeats the function.
Listing 2.9 Defining a function (extract of listing 2.2)
10 fn add(i: i32, j: i32) -> i32 {    ①
11     i + j
12 }
① add() takes two integer parameters and returns an integer. The two arguments are bound to the local variables i and j.
For the moment, let’s concentrate on the syntax of each of the elements in listing 2.9. Figure 2.2 provides a visual picture of each of the pieces. Anyone who has programmed in a strongly-typed programming language should be able to squint their way through the diagram.
Figure 2.2 Rust’s function definition syntax
Rust’s functions require that you specify your parameters’ types and the function’s return type. This is the foundational knowledge that we’ll need for the majority of our work with Rust. Let’s put it to use with our first non-trivial program.
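Here is a small sketch of the two ways a function body can produce its return value, plus a function with no return type at all. Apart from add(), the function names are hypothetical. Leaving the semicolon off the last line makes i + j the value of the function body. A function declared without -> returns the empty unit type, ().

```rust
// The final expression (no trailing semicolon) is the return value.
fn add(i: i32, j: i32) -> i32 {
    i + j
}

// An explicit `return` does the same job; it is more common for early exits.
fn add_with_return(i: i32, j: i32) -> i32 {
    return i + j;
}

// No `->` in the signature: this function returns the unit type `()`.
fn print_sum(i: i32, j: i32) {
    println!("{} + {} = {}", i, j, add(i, j));
}

fn main() {
    print_sum(2, 3);
    println!("{}", add_with_return(2, 3));
}
```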
2.6 Using references
If you have only used a dynamic programming language so far in your career, the syntax and semantics of references can be frustrating. It can be difficult to form a mental picture of what is happening. That makes it difficult to understand which symbols to put where. Thankfully, the Rust compiler is a good coach.
A reference is a value that stands in place for another value. For example, imagine that variable a is a large array that is costly to duplicate. In some sense, a reference r is a cheap copy of a. But instead of creating a duplicate, the program stores a’s address in memory. When the data from a is required, r can be dereferenced to make a available. The following listing shows the code for this.
Listing 2.10 Creating a reference to a large array
fn main() {
    let a = 42;
    let r = &a;                      ①
    let b = a + *r;                  ②

    println!("a + a = {}", b);       ③
}
① r is a reference to a.
② Adds a to a (via dereferencing r) and assigns it to b
③ Prints a + a = 84
References are created with the reference operator (&) and dereferencing occurs with the dereference operator (*). These operators act as unary operators, meaning that they take only one operand. One of the limitations of source code written in ASCII text is that multiplication and dereferencing use the same symbol. Let’s see these in use as part of a larger example.
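Before that larger example, here is a compact sketch that puts both meanings of * into a single expression. The function name square_via_ref() is hypothetical. The compiler tells the two apart by position: * between two operands multiplies, while * directly in front of a reference dereferences it.

```rust
// Hypothetical helper: squares `a` by multiplying it with its own referent.
fn square_via_ref(a: i32) -> i32 {
    let r = &a; // unary `&`: r holds a reference to a
    a * *r      // binary `*` multiplies; unary `*` dereferences r
}

fn main() {
    println!("{}", square_via_ref(6));
}
```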
Listing 2.11 searches for a number (the needle defined on line 2) within an array of numbers (the haystack defined on line 3). The needle is written as an octal literal: 0o313 is 203 in decimal, so the code prints 203 to the console when compiled and run. The code for this listing is in ch2/ch2-needle-in-haystack.rs.
Listing 2.11 Searching for an integer in an array of integers
 1 fn main() {
 2     let needle = 0o313;
 3     let haystack = [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147];
 4
 5     for item in &haystack {          ①
 6         if *item == needle {         ②
 7             println!("{}", item);
 8         }
 9     }
10 }
① Iterates over references to elements within haystack
② The syntax *item returns the item’s referent.
Each iteration changes the value of item to refer to the next item within haystack. On the first iteration, *item returns 1, and on the last, it returns 21147.
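Because iterating over &haystack yields references (&i32 values), the comparison in listing 2.11 dereferences item first. A pattern can do that work instead. The sketch below shows both styles side by side. The function names are hypothetical, and each returns whether the needle is present. Writing for &item in haystack destructures each reference and copies the integer out, which works for simple numeric types like i32.

```rust
// Version 1: item is a &i32, so it is dereferenced before comparing.
fn contains_deref(needle: i32, haystack: &[i32]) -> bool {
    for item in haystack {
        if *item == needle {
            return true;
        }
    }
    false
}

// Version 2: the &item pattern strips the reference, so item is an i32.
fn contains_pattern(needle: i32, haystack: &[i32]) -> bool {
    for &item in haystack {
        if item == needle {
            return true;
        }
    }
    false
}

fn main() {
    let haystack = [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147];
    println!("{}", contains_deref(203, &haystack));
    println!("{}", contains_pattern(132, &haystack));
}
```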
2.7 Project: Rendering the Mandelbrot set
So far, we haven’t learned much Rust, but we already have the tools to create some interesting pictures of fractals. So let’s do that now with listing 2.12. To begin:
- In a terminal window, execute the following commands to create a project that can render the Mandelbrot set:
  - cd $TMP (or cd %TMP% on MS Windows) to move to a directory that’s not critical.
  - cargo new mandelbrot --vcs none creates a new blank project.
  - cd mandelbrot moves into the new project root.
  - cargo add num to edit Cargo.toml, adding the num crate as a dependency (see the sidebar entitled “2.2” in section 2.3.4 for instructions to enable this cargo feature).
- Replace src/main.rs with the code in listing 2.12, which you’ll also find in ch2/ch2-mandelbrot/src/main.rs.
- Execute cargo run. You should see the Mandelbrot set rendered in the terminal:
Listing 2.12 Rendering the Mandelbrot set
use num::complex::Complex;                                    ①

fn calculate_mandelbrot(                                      ②
    max_iters: usize,                                         ③
    x_min: f64,                                               ④
    x_max: f64,                                               ④
    y_min: f64,                                               ④
    y_max: f64,                                               ④
    width: usize,                                             ⑤
    height: usize,                                            ⑤
) -> Vec<Vec<usize>> {
    let mut rows: Vec<_> = Vec::with_capacity(height);        ⑥
    for img_y in 0..height {                                  ⑦
        let mut row: Vec<usize> = Vec::with_capacity(width);
        for img_x in 0..width {
            let x_percent = img_x as f64 / width as f64;
            let y_percent = img_y as f64 / height as f64;
            let cx = x_min + (x_max - x_min) * x_percent;     ⑧
            let cy = y_min + (y_max - y_min) * y_percent;     ⑧
            let escaped_at = mandelbrot_at_point(cx, cy, max_iters);
            row.push(escaped_at);
        }

        rows.push(row);
    }
    rows
}

fn mandelbrot_at_point(                                       ⑨
    cx: f64,
    cy: f64,
    max_iters: usize,
) -> usize {
    let mut z = Complex { re: 0.0, im: 0.0 };                 ⑩
    let c = Complex::new(cx, cy);                             ⑪

    for i in 0..=max_iters {
        if z.norm() > 2.0 {                                   ⑫
            return i;
        }
        z = z * z + c;                                        ⑬
    }
    max_iters                                                 ⑭
}

fn render_mandelbrot(escape_vals: Vec<Vec<usize>>) {
    for row in escape_vals {
        let mut line = String::with_capacity(row.len());
        for column in row {
            let val = match column {
                0..=2 => ' ',
                3..=5 => '.',
                6..=10 => '•',
                11..=30 => '*',
                31..=100 => '+',
                101..=200 => 'x',
                201..=400 => '$',
                401..=700 => '#',
                _ => '%',
            };

            line.push(val);
        }
        println!("{}", line);
    }
}

fn main() {
    let mandelbrot = calculate_mandelbrot(1000, -2.0, 1.0, -1.0,
                                          1.0, 100, 24);

    render_mandelbrot(mandelbrot);
}
① Imports the Complex number type from num crate and its complex submodule
② Converts between the output space (a grid of rows and columns) and a range that surrounds the Mandelbrot set (a continuous region near (0,0))
③ If a value has not escaped before reaching the maximum number of iterations, it’s considered to be within the Mandelbrot set.
④ Parameters that specify the region we search for members of the set
⑤ Parameters that represent the size of the output in pixels
⑥ Creates a container to house the data from each row
⑦ Iterates row by row, allowing us to print the output line by line
⑧ Calculates the proportion of the space covered in our output and converts that to points within the search space
⑨ Called at every pixel (i.e., every row and column that’s printed to stdout)
⑩ Initializes a complex number at the origin with real (re) and imaginary (im) parts at 0.0
⑪ Initializes a complex number from the coordinates provided as function arguments
⑫ Checks the escape condition and calculates the distance from the origin (0, 0), an absolute value of a complex number
⑬ Repeatedly mutates z to check whether c lies within the Mandelbrot set
⑭ As i is no longer in scope, we fall back to max_iters.
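Listing 2.12 relies on the num crate for its Complex type. To experiment with the escape test on its own, here is a standard-library-only sketch that stores z and c as pairs of f64 values and expands z * z + c by hand. The function name escape_time() is hypothetical. Its loop mirrors mandelbrot_at_point().

```rust
// Std-only sketch of the escape-time test in mandelbrot_at_point().
// z and c are stored as (real, imaginary) pairs of f64 values.
fn escape_time(cx: f64, cy: f64, max_iters: usize) -> usize {
    let (mut zr, mut zi) = (0.0_f64, 0.0_f64); // z starts at the origin

    for i in 0..=max_iters {
        // |z| > 2.0 is the same check as z.norm() > 2.0 in listing 2.12.
        if (zr * zr + zi * zi).sqrt() > 2.0 {
            return i;
        }
        // z = z * z + c, using (a + bi)^2 = (a^2 - b^2) + 2abi
        let next_zr = zr * zr - zi * zi + cx;
        let next_zi = 2.0 * zr * zi + cy;
        zr = next_zr;
        zi = next_zi;
    }
    max_iters // never escaped: treated as a member of the set
}

fn main() {
    println!("{}", escape_time(0.0, 0.0, 1000));  // the origin never escapes
    println!("{}", escape_time(1.0, 0.0, 1000));  // escapes after a few steps
    println!("{}", escape_time(-1.0, 0.0, 1000)); // cycles between 0 and -1
}
```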
So far in this section, we’ve put the basics of Rust into practice. Let’s continue our exploration by learning how to define functions and types.
2.8 Advanced function definitions
Rust’s functions can get somewhat scarier than the add(i: i32, j: i32) -> i32 from listing 2.2. To assist those who are reading more Rust source code than writing it, the following sections provide some extra content.
2.8.1 Explicit lifetime annotations
As a bit of forewarning, allow me to introduce some more complicated notation. As you read through Rust code, you might encounter definitions that are hard to decipher because they look like hieroglyphs from an ancient civilization. Listing 2.13 provides an extract from listing 2.14 that shows one such example.
Listing 2.13 A function signature with explicit lifetime annotations
1 fn add_with_lifetimes<'a, 'b>(i: &'a i32, j: &'b i32) -> i32 {
2     *i + *j
3 }
Like all unfamiliar syntax, it can be difficult to know what’s happening at first. This improves with time. Let’s start by explaining what is happening, and then go on to discuss why it is happening. The following bullet points break line 1 of the previous snippet into its parts:
- fn add_with_lifetimes(...) -> i32 should be familiar to you already. From this we can infer that add_with_lifetimes() is a function that returns an i32 value.
- <'a, 'b> declares two lifetime variables, 'a and 'b, within the scope of add_with_lifetimes(). These are normally spoken as lifetime a and lifetime b.
- i: &'a i32 binds lifetime variable 'a to the lifetime of i. The syntax reads as “parameter i is a reference to an i32 with lifetime a.”
- j: &'b i32 binds the lifetime variable 'b to the lifetime of j. The syntax reads as “parameter j is a reference to an i32 with lifetime b.”
The significance of binding a lifetime variable to a value probably isn’t obvious. Underpinning Rust’s safety checks is a lifetime system that verifies that all attempts to access data are valid. Lifetime annotations allow programmers to declare their intent. All values bound to a given lifetime must live as long as the last access to any value bound to that lifetime.
The lifetime system usually works unaided. Although every parameter has a lifetime, these checks are typically invisible as the compiler can infer most lifetimes by itself.6 But the compiler needs assistance in difficult cases. The most common one is a function that returns a reference while accepting more than one reference as an argument. That is when the compiler will typically ask for assistance via an error message.
No lifetime annotations are required when calling a function. When used in a complete example as in the next listing, you can see lifetime annotations at the function definition (line 1), but not when it’s used (line 8). The source code for the listing is in ch2-add-with-lifetimes.rs.
Listing 2.14 Type signature of a function with explicit lifetime annotations
 1 fn add_with_lifetimes<'a, 'b>(i: &'a i32, j: &'b i32) -> i32 {
 2     *i + *j                                  ①
 3 }
 4
 5 fn main() {
 6     let a = 10;
 7     let b = 20;
 8     let res = add_with_lifetimes(&a, &b);    ②
 9
10     println!("{}", res);
11 }
① Adds the values referred to by i and j rather than adding the references directly
② &a and &b create references to a and b (which hold 10 and 20), respectively. No lifetime notation is required when calling a function.
On line 2, *i + *j adds together the referent values held by the i and j variables. It’s common to see lifetime parameters when using references. Rust can infer lifetimes in most cases, but where the compiler cannot, the programmer must state their intent explicitly. Using two lifetime parameters (a and b) indicates that the lifetimes of i and j are decoupled.
NOTE Lifetime parameters are a way of providing control to the programmer while maintaining high-level code.
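Annotations become mandatory when a function returns a reference and the compiler can’t tell which parameter that reference borrows from. In the sketch below (both function names are hypothetical), removing <'a> from longest() makes compilation fail with error E0106 (missing lifetime specifier). first_word() needs no annotation: it has only one reference parameter, so the compiler’s lifetime elision rules link the output to it.

```rust
// Both inputs and the output share lifetime 'a: the returned slice is
// valid for as long as both arguments are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// One reference in, one reference out: elision applies, no annotation needed.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let title = String::from("Rust in Action");
    println!("{}", longest(&title, "grep-lite"));
    println!("{}", first_word(&title));
}
```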
2.8.2 Generic functions
Another special case of function syntax appears when programmers write Rust functions to handle many possible input types. So far, we have seen functions that accept 32-bit integers (i32). The following listing shows a function signature that can be called with many input types, as long as both arguments are of the same type.
Listing 2.15 Type signature of a generic function
fn add<T>(i: T, j: T) -> T {    ①
    i + j
}
① The type variable T is introduced with angle brackets (<T>). This function takes two arguments of the same type and returns a value of that type.
Capital letters in place of a type indicate a generic type. Conventionally, the variables T, U, and V are used as placeholder names, but this is arbitrary. E is often used to denote an error type. We’ll look at error handling in detail in chapter 3.
Generics enable significant code reuse and can greatly increase the usability of a strongly-typed language. Unfortunately, listing 2.15 doesn’t compile as is. The Rust compiler complains that it cannot add two values of any type T together. The following shows the output produced when attempting to compile listing 2.15:
error[E0369]: cannot add `T` to `T`
 --> add.rs:2:5
  |
2 |     i + j
  |     - ^ - T
  |     |
  |     T
  |
help: consider restricting type parameter `T`
  |
1 | fn add<T: std::ops::Add<Output = T>>(i: T, j: T) -> T {
  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0369`.
This issue arises because T really means any type at all, even types where addition is not supported. Figure 2.3 provides a visual representation of the problem. Listing 2.15 attempts to refer to the outer ring, whereas addition is only supported by types within the inner ring.
Figure 2.3 Only a subset of types implement operators. When creating generic functions that include such an operator, that operation’s trait must be included as a trait bound.
How do we specify that type T must implement addition? Answering this requires introducing some new terminology.
All of Rust’s operators, including addition, are defined within traits. To require that type T must support addition, we include a trait bound alongside the type variable in the function’s definition. The following listing gives an example of this syntax.
Listing 2.16 Type signature of a generic function with trait bounds
fn add<T: std::ops::Add<Output = T>>(i: T, j: T) -> T {
    i + j
}
The fragment <T: std::ops::Add<Output = T>> says that T must implement std::ops::Add. Using a single type variable T with the trait bound ensures that arguments i and j, as well as the result type, are the same type and that their type supports addition.
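Trait bounds can also be written in a where clause placed after the signature. The two hypothetical functions below accept exactly the same types. The where form tends to be easier to read once a function has several type parameters or long bounds.

```rust
use std::ops::Add;

// The bound written inline, as in listing 2.16.
fn add_inline<T: Add<Output = T>>(i: T, j: T) -> T {
    i + j
}

// The same bound moved into a `where` clause.
fn add_where<T>(i: T, j: T) -> T
where
    T: Add<Output = T>,
{
    i + j
}

fn main() {
    println!("{} {}", add_inline(1, 2), add_where(1.5, 2.0));
}
```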
What is a trait? A trait is a language feature that is analogous to an interface, protocol, or contract. If you have a background in object-oriented programming, consider a trait to be an abstract base class. If you have a background in functional programming, Rust’s traits are close to Haskell’s type classes. For now, it’s enough to say that traits enable types to advertise that they are using common behavior.
All of Rust’s operators are defined with traits. For example, the addition operator (+) is defined as the std::ops::Add trait. Traits are properly introduced in chapter 3 and are progressively explained in depth during the course of the book.
To reiterate: all of Rust’s operators are syntactic sugar for a trait’s methods. Rust supports operator overloading this way. During the compilation process, a + b is converted to a.add(b).
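Because + is sugar for std::ops::Add, implementing that trait for your own type makes the operator available to it. The Point type below is hypothetical, and the struct and impl syntax is only previewed here (traits are covered in chapter 3). Calling a.add(b) directly produces the same result as a + b.

```rust
use std::ops::Add;

// A hypothetical 2D point type, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing Add is what makes `+` work for Point.
impl Add for Point {
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 10, y: 20 };
    let sum = a + b;     // operator syntax
    let same = a.add(b); // method syntax: the call that `+` stands for
    println!("{:?} {:?}", sum, same);
}
```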
Listing 2.17 is a full example that demonstrates that generic functions can be called by multiple types. The listing prints these three lines to the console:
4.6
30
15s
Listing 2.17 A generic function with a type variable and trait bounds
use std::ops::Add;                                ①
use std::time::Duration;                          ②

fn add<T: Add<Output = T>>(i: T, j: T) -> T {     ③
    i + j
}

fn main() {
    let floats = add(1.2, 3.4);                   ④
    let ints = add(10, 20);                       ⑤
    let durations = add(                          ⑥
        Duration::new(5, 0),                      ⑥
        Duration::new(10, 0),                     ⑥
    );

    println!("{}", floats);
    println!("{}", ints);
    println!("{:?}", durations);                  ⑦
}
① Brings the Add trait from std::ops into local scope
② Brings the Duration type from std::time into local scope
③ The arguments to add() can accept any type that implements std::ops::Add.
④ Calls add() with floating-point values
⑤ Calls add() with integer values
⑥ Calls add() with Duration values, representing a duration between two points in time
⑦ Because std::time::Duration does not implement the std::fmt::Display trait, we can fall back to requesting std::fmt::Debug.
As you can see, function signatures can become somewhat convoluted. Interpreting these can take some patience. Hopefully, you now have the tools to break the pieces apart in case you get stuck down the track. Here are a few principles that should assist you when reading Rust code:
- Terms in lowercase (i, j) denote variables.
- Single uppercase letters (T) denote generic type variables.
- Terms beginning with uppercase (Add) are either traits or concrete types, such as String or Duration.
- Labels ('a) denote lifetime parameters.
2.9 Creating grep-lite
We’ve spent most of the chapter discussing numbers. It’s time for another practical example. We’ll use it to learn a little bit about how Rust handles text.
Listing 2.18 is our first iteration of grep-lite. The code for this program is in the ch2-str-simple-pattern.rs file. Its hard-coded parameters restrict flexibility somewhat, but these are useful illustrations of string literals. The code prints a line to the console:
dark square is a picture feverishly turned--in search of what?
Listing 2.18 Searching for a simple pattern within lines of a string
 1 fn main() {
 2     let search_term = "picture";
 3     let quote = "\
 4 Every face, every shop, bedroom window, public-house, and
 5 dark square is a picture feverishly turned--in search of what?
 6 It is the same with books.
 7 What do we seek through millions of pages?";    ①
 8
 9     for line in quote.lines() {                    ②
10         if line.contains(search_term) {
11             println!("{}", line);
12         }
13     }
14 }
① Multilined strings do not require special syntax. The \ character on line 3 escapes the new line.
② lines() returns an iterator over quote where each iteration is a line of text. It accepts both \n and \r\n as line endings, so the same code handles Unix-style and Windows-style text.
As you can see, Rust’s strings can do quite a lot by themselves. Some features of listing 2.18 that are worth highlighting include the following:
- Line 9 (quote.lines()) demonstrates iterating line-by-line in a platform-independent manner.
- Line 10 (line.contains()) demonstrates searching for text using the method syntax.
From here, we’ll expand the functionality of our proto-application.
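A quick way to check the platform-independence claim: lines() treats both "\n" and "\r\n" as line endings, so text written with Unix or Windows conventions splits into the same lines. The helper name split_lines() is hypothetical. collect() gathers the iterator’s output into a Vec.

```rust
// Hypothetical helper: collects the lines of `text` into a vector.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

fn main() {
    let unix = "Every face, every shop,\ndark square is a picture";
    let windows = "Every face, every shop,\r\ndark square is a picture\r\n";

    // Both inputs produce the same two lines, with no trailing '\r'.
    println!("{:?}", split_lines(unix));
    println!("{}", split_lines(unix) == split_lines(windows));

    for line in split_lines(windows) {
        if line.contains("picture") {
            println!("{}", line);
        }
    }
}
```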
Navigating Rust’s rich collection of string types
Strings are complicated for newcomers to Rust. Implementation details tend to bubble up from below and make comprehension difficult. How computers represent text is complicated, and Rust chooses to expose some of that complexity. This enables programmers to have full control but does place a burden on those learning the language.
String and &str both represent text, yet are distinct types. Interacting with values from both types can be an annoying exercise at first as different methods are required to perform similar actions. Prepare yourself for irritating type errors as your intuition develops. Until that intuition develops, however, you will usually have fewer issues if you convert your data to the String type.
A String is (probably) closest to what you know as a string type from other languages. It supports familiar operations such as concatenation (joining two strings together), appending new text onto an existing string, and trimming whitespace.
str is a high-performance, relatively feature-poor type. Once created, str values cannot expand or shrink. In this sense, these are similar to interacting with a raw memory array. Unlike a raw memory array, though, str values are guaranteed to be valid UTF-8.
str is usually seen in this form: &str. A &str (pronounced string slice) is a small type that contains a reference to str data and a length. Attempting to assign a variable to type str will fail. The Rust compiler wants to create fixed-sized variables within a function’s stack frame. As str values can be of arbitrary length, these can only be stored as local variables by reference.
For those readers who have prior experience with systems programming, String uses dynamic memory allocation to store the text that it represents. Creating &str values avoids a memory allocation.
String is an owned type. Ownership has a particular meaning within Rust. An owner is able to make any changes to the data and is responsible for deleting values that it owns when it leaves scope (this is fully explained in chapter 3). A &str is a borrowed type. In practical terms, this means that &str can be thought of as read-only data, whereas String is read-write.
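The sketch below shows the conversions you’ll reach for most often between the two types. The helper names are hypothetical. String::from() copies text into a new heap allocation that the String owns. &title and &title[0..4] borrow from it without allocating. Note that the range in &title[0..4] counts bytes, not characters.

```rust
// Hypothetical helper: turns a borrowed &str into an owned, growable String.
fn extend_title(base: &str) -> String {
    let mut owned = String::from(base); // allocates and copies the text
    owned.push_str(" in Action");       // only String can grow in place
    owned
}

// Taking &str lets callers pass a literal or a borrowed String.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let title: String = extend_title("Rust");
    let view: &str = &title;        // &String coerces to &str
    let first: &str = &title[0..4]; // byte range 0..4 borrowed from title
    println!("{} ({} chars)", view, char_count(view));
    println!("{}", first);
    println!("{}", char_count("literal")); // a &'static str works too
}
```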
String literals (e.g., "Rust in Action") have the type &str. The full type signature including the lifetime parameter is &'static str. The 'static lifetime is somewhat special. It too owes its name to implementation details. Executable programs can contain a section of memory that is hard-coded with values. That section is known as static memory because it is read-only during execution and remains available for as long as the program runs.
Some other types may be encountered in your travels. Here’s a short list:a
- char: A single character encoded as 4 bytes. The internal representation of char is equivalent to UCS-4/UTF-32. This differs from &str and String, which encode single characters as UTF-8. Conversion does impose a penalty, but it means that char values are of fixed width and are, therefore, easier for the compiler to reason about. Characters encoded as UTF-8 can span 1 to 4 bytes.
- [u8]: A slice of raw bytes, usually found when dealing with streams of binary data.
- Vec<u8>: A vector of raw bytes, usually created when consuming [u8] data. String is to Vec<u8> as str is to [u8].
- std::ffi::OsString: A platform-native string. Its behavior is close to String but without a guarantee that it’s encoded as UTF-8.
- std::path::Path: A string-like type that is dedicated to handling filesystem paths.
Fully understanding the distinction between String and &str requires knowledge of arrays and vectors. Textual data is similar to these two types with added convenience methods applied over the top.
a Unfortunately, this is not an exhaustive list. Specific use cases sometimes require special handling.
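To make the sizes in the list above concrete, this sketch prints the fixed size of char alongside the variable UTF-8 length of a few characters. The characters are written as Unicode escapes (\u{...}) to keep the source ASCII. The helper name utf8_len() is hypothetical and simply wraps char::len_utf8().

```rust
use std::mem::size_of;

// Hypothetical helper: bytes needed to encode `c` as UTF-8 (1 to 4).
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    // A char always occupies 4 bytes, whatever character it holds.
    println!("size_of::<char>() = {}", size_of::<char>());

    // 'a', 'é', '€', and a crab emoji need 1, 2, 3, and 4 bytes in UTF-8.
    for c in ['a', '\u{e9}', '\u{20ac}', '\u{1f980}'] {
        println!("{} takes {} byte(s) as UTF-8", c, utf8_len(c));
    }

    // String is to Vec<u8> as str is to [u8]: as_bytes() exposes the raw bytes.
    let word = String::from("h\u{e9}llo");
    let bytes: &[u8] = word.as_bytes();
    println!("{} chars, {} bytes", word.chars().count(), bytes.len());
}
```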
Let’s start adding functionality to grep-lite by printing the line number along with the match. This is equivalent to the -n option within the POSIX.1-2008 standard for the grep utility (http://mng.bz/ZPdZ).
After we add a few lines to our previous example, the following line is printed to the screen. Listing 2.19 shows the code that adds this functionality. You’ll find it in ch2/ch2-simple-with-linenums.rs:
2: dark square is a picture feverishly turned--in search of what?
Listing 2.19 Manually incrementing an index variable
fn main() {
    let search_term = "picture";
    let quote = "\                                                             ①
Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";
    let mut line_num: usize = 1;                                            ②

    for line in quote.lines() {
        if line.contains(search_term) {
            println!("{}: {}", line_num, line);                             ③
        }
        line_num += 1;                                                      ④
    }
}
① A backslash escapes the newline character in the string literal.
② Declares line_num as mutable via let mut and initializes it with 1
③ Updates the println! macro to allow for both values to be printed
④ Increments line_num in place
Listing 2.20 shows a more ergonomic approach to incrementing the line number. The output is the same, but here the code makes use of the enumerate() method and method chaining. enumerate() takes an iterator I and returns another iterator that yields (N, item) pairs, where item is each value produced by I and N is a counter that starts at 0 and increments by 1 each iteration. The source code for this listing can be found in ch2/ch2-simple-with-enumerate.rs.
Listing 2.20 Automatically incrementing an index variable
fn main() {
    let search_term = "picture";
    let quote = "\
Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for (i, line) in quote.lines().enumerate() {    ①
        if line.contains(search_term) {
            let line_num = i + 1;                    ②
            println!("{}: {}", line_num, line);
        }
    }
}
① Because lines() returns an iterator, it can be chained with enumerate().
② Performs addition to calculate the line number, avoiding calculations at every step
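enumerate() counts from 0, which is why listing 2.20 adds 1 before printing. The sketch below (the function name numbered_matches() is hypothetical) returns the (line number, line) pairs instead of printing them, which makes the off-by-one conversion easy to check. The lifetime 'a ties the returned slices to text, the argument they borrow from.

```rust
// Hypothetical helper: returns (line number, line) for every matching line.
fn numbered_matches<'a>(search_term: &str, text: &'a str) -> Vec<(usize, &'a str)> {
    let mut found = Vec::new();
    for (i, line) in text.lines().enumerate() { // i counts from 0
        if line.contains(search_term) {
            found.push((i + 1, line)); // shift to 1-based line numbers
        }
    }
    found
}

fn main() {
    let quote = "\
Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for (line_num, line) in numbered_matches("picture", quote) {
        println!("{}: {}", line_num, line);
    }
}
```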
Another extremely useful feature of grep is printing some context before and after the line that matches. In the GNU grep implementation, this is the -C NUM switch. To add support for that feature in grep-lite, we need to be able to create lists.
2.10 Making lists of things with arrays, slices, and vectors
Lists of things are incredibly common. The two types that you will work with most often are arrays and vectors. Arrays are fixed-width and extremely lightweight. Vectors are growable but incur a small runtime penalty because of the extra bookkeeping that these do. To understand the underlying mechanisms with text data in Rust, it helps to have a cursory understanding of what is happening.
The goal of this section is to support printing out n lines of context that surround a match. To get there, we need to take a short detour and explain arrays, slices, and vectors more fully. The most useful type for this exercise is the vector. To learn about vectors, though, we need to start by learning about its two simpler cousins: arrays and slices.
2.10.1 Arrays
An array, at least as far as Rust is concerned, is a tightly packed collection of the same thing. It’s possible to replace items within an array, but its size cannot change. Because variable-length types like String add a degree of complication, we’ll return to discussing numbers for a little while.
Creating arrays takes two forms. We can provide a comma-delimited list within square brackets (for example, [1, 2, 3]) or a repeat expression, where you furnish two values delimited by a semicolon (for example, [0; 100]). The value on the left (0) is repeated by the number of times on the right (100). Listing 2.21 shows each variation on lines 2–5. The source code for this listing is in the ch2-3arrays.rs file. It prints these four lines to the console:
[1, 2, 3]:     1 + 10 = 11     2 + 10 = 12     3 + 10 = 13     (Σ[1, 2, 3] = 6)
[1, 2, 3]:     1 + 10 = 11     2 + 10 = 12     3 + 10 = 13     (Σ[1, 2, 3] = 6)
[0, 0, 0]:     0 + 10 = 10     0 + 10 = 10     0 + 10 = 10     (Σ[0, 0, 0] = 0)
[0, 0, 0]:     0 + 10 = 10     0 + 10 = 10     0 + 10 = 10     (Σ[0, 0, 0] = 0)
Listing 2.21 Defining arrays and iterating over their elements
fn main() {
    let one             = [1, 2, 3];
    let two: [u8; 3]    = [1, 2, 3];
    let blank1          = [0; 3];
    let blank2: [u8; 3] = [0; 3];

    let arrays = [one, two, blank1, blank2];

    for a in &arrays {
        print!("{:?}: ", a);
        for n in a.iter() {
            print!("\t{} + 10 = {}", n, n+10);
        }

        let mut sum = 0;
        for i in 0..a.len() {
            sum += a[i];
        }
        println!("\t(Σ{:?} = {})", a, sum);
    }
}
Arrays are a simple data structure from the machine’s point of view. These are a contiguous block of memory with elements of a uniform type. The simplicity is still somewhat deceptive. Arrays can cause a few learning difficulties for newcomers:
- The notation can be confusing. [T; n] describes an array’s type, where T is the elements’ type and n is a non-negative integer. [f32; 12] denotes an array of 12 32-bit floating-point numbers. It’s easy to get confused with slices [T], which do not have a compile-time length.
- [u8; 3] is a different type than [u8; 4]. The size of the array matters to the type system.
- In practice, most interaction with arrays occurs via another type called a slice ([T]). The slice is itself interacted with by reference (&[T]). And to add some linguistic confusion into the mix, both slices and references to slices are called slices.
Rust maintains its focus on safety. Array indexing is bounds checked. Requesting an item that’s out of bounds crashes (panics in Rust terminology) the program rather than returning erroneous data.
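Bounds checking is easy to observe. In this sketch (both helper names are hypothetical), indexing with [] panics when the index is too large, while the get() method available on arrays and slices returns an Option, which is None for an out-of-range index. Option is discussed in the next chapter.

```rust
// Indexing with [] is bounds checked: an index of 3 or more panics here.
fn lookup_or_panic(values: &[i32; 3], index: usize) -> i32 {
    values[index]
}

// get() performs the same check but reports failure as None.
fn safe_lookup(values: &[i32; 3], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let values = [10, 20, 30];
    println!("{}", lookup_or_panic(&values, 2));
    println!("{:?}", safe_lookup(&values, 1));
    println!("{:?}", safe_lookup(&values, 5));
    // lookup_or_panic(&values, 5) would stop the program with an
    // "index out of bounds" panic instead of returning erroneous data.
}
```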
2.10.2 Slices
Slices are dynamically sized array-like objects. The term dynamically sized means that their size is not known at compile time. Yet, like arrays, these don’t expand or contract. The use of the word dynamic in dynamically sized is closer in meaning to dynamic typing rather than movement. The lack of compile-time knowledge explains the distinction in the type signature between an array ([T; n]) and a slice ([T]).
Slices are important because it’s easier to implement traits for slices than arrays. Traits are how Rust programmers add methods to objects. As [T; 1], [T; 2], …, [T; n] are all different types, implementing traits for arrays can become unwieldy. Creating a slice from an array is easy and cheap because it doesn’t need to be tied to any specific size.
Another important use for slices is their ability to act as a view on arrays (and other slices). The term view here is taken from database technology and means that slices can gain fast, read-only access to data without needing to copy anything around.
The problem with slices is that Rust wants to know the size of every object in your program, and slices are defined as not having a compile-time size. References to the rescue. As mentioned in the discussion about the use of the term dynamically sized, a slice’s size doesn’t change once it’s created, and a reference to a slice always has the same size in memory: it is made up of two usize components (a pointer and a length). That’s why you typically see slices referred to in their referenced form, &[T] (like string slices that take the notation &str).
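The sketch below (the total() helper is hypothetical) shows one function accepting arrays of different lengths through &[u32], a slice acting as a view on part of an array, and a check that a slice reference is two usize values wide.

```rust
use std::mem::size_of;

// Accepting a slice lets one function work with arrays of any length.
fn total(values: &[u32]) -> u32 {
    let mut sum = 0;
    for v in values {
        sum += v;
    }
    sum
}

fn main() {
    let three = [1, 2, 3];
    let five = [1, 2, 3, 4, 5];

    // &[u32; 3] and &[u32; 5] both coerce to &[u32]; nothing is copied.
    println!("{} {}", total(&three), total(&five));

    // A view on part of an array: the elements at indices 1, 2, and 3.
    println!("{}", total(&five[1..4]));

    // A slice reference is a pointer plus a length.
    println!("{} == {}", size_of::<&[u32]>(), 2 * size_of::<usize>());
}
```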
NOTE Don’t worry too much about the distinctions between arrays and slices yet. In practice, it’s not material. Each term is an artifact of implementation details. Those implementation details are important when dealing with performance-critical code but not when learning the basics of the language.
2.10.3 Vectors
Vectors (Vec<T>) are growable lists of T. Using vectors is extremely common in Rust code. These incur a small runtime penalty compared to arrays because of the extra bookkeeping that must be done to enable their size to change over time. But vectors almost always make up for this with their added flexibility.
The task at hand is to expand the feature set of the grep-lite utility. Specifically, we want the ability to store n lines of context around a match. Naturally, there are many ways to implement such a feature.
To minimize code complexity, we’ll use a two-pass strategy. In the first pass, we’ll tag lines that match. During the second pass, we’ll collect lines that are within n lines of each of the tags.
The code in listing 2.22 (available at ch2/ch2-introducing-vec.rs) is the longest you’ve seen so far. Take your time to digest it.
The most confusing syntax in the listing is probably Vec<Vec<(usize, String)>>, which appears on line 15. Vec<Vec<(usize, String)>> is a vector of vectors (i.e., Vec<Vec<T>>), where T is a pair of values of type (usize, String). (usize, String) is a tuple that we’ll use to store line numbers along with the text that’s a near match. When the needle variable on line 3 is set to "oo", the following text is printed to the console:
1: Every face, every shop,
2: bedroom window, public-house, and
3: dark square is a picture
4: feverishly turned--in search of what?
3: dark square is a picture
4: feverishly turned--in search of what?
5: It is the same with books.
6: What do we seek
7: through millions of pages?
Listing 2.22 Enabling context lines to be printed out with a Vec<Vec<T>>
 1 fn main() {
 2     let ctx_lines = 2;
 3     let needle = "oo";
 4     let haystack = "\
 5 Every face, every shop,
 6 bedroom window, public-house, and
 7 dark square is a picture
 8 feverishly turned--in search of what?
 9 It is the same with books.
10 What do we seek
11 through millions of pages?";
12
13     let mut tags: Vec<usize> = vec![];                     ①
14     let mut ctx: Vec<Vec<(
15                  usize, String)>> = vec![];                 ②
16
17     for (i, line) in haystack.lines().enumerate() {        ③
18         if line.contains(needle) {
19             tags.push(i);
20
21             let v = Vec::with_capacity(2*ctx_lines + 1);   ④
22             ctx.push(v);
23         }
24     }
25
26     if tags.is_empty() {                                   ⑤
27         return;
28     }
29
30     for (i, line) in haystack.lines().enumerate() {        ⑥
31         for (j, tag) in tags.iter().enumerate() {
32             let lower_bound =
33                 tag.saturating_sub(ctx_lines);             ⑦
34             let upper_bound =
35                 tag + ctx_lines;
36
37             if (i >= lower_bound) && (i <= upper_bound) {
38                 let line_as_string = String::from(line);   ⑧
39                 let local_ctx = (i, line_as_string);
40                 ctx[j].push(local_ctx);
41             }
42         }
43     }
44
45     for local_ctx in ctx.iter() {
46         for &(i, ref line) in local_ctx.iter() {           ⑨
47             let line_num = i + 1;
48             println!("{}: {}", line_num, line);
49         }
50     }
51 }
① tags holds line numbers where matches occur.
② ctx contains a vector per match to hold the context lines.
③ Iterates through the lines, recording line numbers where matches are encountered
④ Vec::with_capacity(n) reserves space for n items. No explicit type signature is required as it can be inferred via the definition of ctx on line 15.
⑤ When there are no matches, exits early
⑥ For each tag, at every line, checks to see if we are near a match. When we are, adds that line to the relevant Vec<T> within ctx.
⑦ saturating_sub() is subtraction that returns 0 on integer underflow. A usize cannot represent values below zero, so plain subtraction would panic in a debug build.
⑧ Copies line into a new String and stores that locally for each match
⑨ ref line informs the compiler that we want to borrow this value rather than move it. These two terms are explained fully in later chapters.
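Annotation ⑦ can be made concrete. The sketch below is illustrative. lower_bound() is a hypothetical helper, and the comparison with checked_sub() and wrapping_sub() goes slightly beyond listing 2.22. It shows what each standard-library method returns when a usize subtraction would go below zero.

```rust
/// Hypothetical helper mirroring line 33 of listing 2.22:
/// the first line of a match's context window, clamped at 0.
fn lower_bound(tag: usize, ctx_lines: usize) -> usize {
    tag.saturating_sub(ctx_lines)
}

fn main() {
    let tag: usize = 1;
    let ctx_lines: usize = 2;

    // saturating_sub clamps at the type's minimum, which is 0 for usize
    println!("saturating: {}", tag.saturating_sub(ctx_lines));

    // checked_sub reports the underflow as None instead of producing a number
    println!("checked: {:?}", tag.checked_sub(ctx_lines));

    // wrapping_sub wraps around to the top of usize's range
    println!("wrapping: {}", tag.wrapping_sub(ctx_lines));

    // A plain `tag - ctx_lines` would panic in a debug build
    // ("attempt to subtract with overflow") and wrap in a release build.
    println!("lower bound: {}", lower_bound(tag, ctx_lines));
}
```

saturating_sub() fits the context-line calculation well, because a match near the top of the text should simply start its window at line 0.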
Vec<T> performs best when you can provide it with a size hint via Vec::with_capacity(). Providing an estimate minimizes the number of times memory will need to be allocated from the OS.
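Watching capacity() while a vector fills up shows the size hint at work. This is a minimal sketch: build_context() is a hypothetical helper that mirrors the per-match vectors in listing 2.22. Vec guarantees that push() does not reallocate as long as the length stays within the reported capacity. The capacity observed right after with_capacity() should therefore be unchanged once all n items have been pushed.

```rust
/// Hypothetical helper: reserves room for n (line number, text) pairs,
/// fills the vector, and returns it with the capacity seen right after
/// the reservation.
fn build_context(n: usize) -> (Vec<(usize, String)>, usize) {
    let mut v: Vec<(usize, String)> = Vec::with_capacity(n);
    let reserved = v.capacity(); // at least n

    for i in 0..n {
        // No reallocation happens while len() <= reserved
        v.push((i, format!("line {}", i)));
    }
    (v, reserved)
}

fn main() {
    let ctx_lines = 2;
    let (v, reserved) = build_context(2 * ctx_lines + 1);
    println!("reserved {}, now len {} / capacity {}", reserved, v.len(), v.capacity());
}
```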
NOTE When applying this approach to real text files, encodings can cause issues. String is guaranteed to be UTF-8. Naively reading in a text file to a String causes errors if invalid bytes are detected. A more robust approach is to read in data as [u8] (a slice of u8 values), then decode those bytes with help from your domain knowledge.
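The decoding step might look something like the following sketch. An in-memory byte vector stands in for a file's contents. For illustration only, it assumes the relevant domain knowledge is that the input is Latin-1 (ISO 8859-1). The three decoders behave differently: str::from_utf8() rejects invalid input, and String::from_utf8_lossy() substitutes U+FFFD for it. decode_latin1() is a hypothetical helper that relies on every Latin-1 byte mapping to the Unicode code point with the same value.

```rust
use std::str;

/// Hypothetical decoder for input known to be Latin-1:
/// each byte becomes the char with the same numeric value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // Stand-in for std::fs::read("some-file"): "caf" plus a lone 0xE9,
    // which is é in Latin-1 but is not valid UTF-8 on its own.
    let bytes: Vec<u8> = vec![b'c', b'a', b'f', 0xE9];

    // Strict decoding reports where the valid UTF-8 prefix ends
    match str::from_utf8(&bytes) {
        Ok(text) => println!("valid UTF-8: {}", text),
        Err(e) => println!("invalid UTF-8 after {} bytes", e.valid_up_to()),
    }

    // Lossy decoding replaces the invalid sequence with U+FFFD
    println!("lossy: {}", String::from_utf8_lossy(&bytes));

    // Decoding with the (assumed) knowledge that the file is Latin-1
    println!("latin-1: {}", decode_latin1(&bytes));
}
```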
2.11 Including third-party code
Incorporating third-party code is essential to productive Rust programming. Rust’s standard library tends to lack many things that other languages provide, like random number generators and regular expression support. That means it’s common to incorporate third-party crates into your project. To get your feet wet, let’s start with the regex crate.
Crate is the term the Rust community uses where others use terms such as package, distribution, or library. The regex crate provides the ability to match regular expressions rather than simply looking for exact matches.
To use third-party code, we’ll rely on the cargo command-line tool. Follow these instructions:
- Open a command prompt.
- Move to a scratch directory with cd /tmp (cd %TMP% on MS Windows).
- Run cargo new grep-lite --vcs none. It produces a short confirmation message:
  Created binary (application) `grep-lite` package
- Run cd grep-lite to move into the project directory.
- Execute cargo add regex@1 to add version 1 of the regex crate as a dependency. This alters the file /tmp/grep-lite/Cargo.toml. If cargo add is unavailable for you, see the sidebar, “2.2,” in section 2.3.4.
- Run cargo build. You should see output fairly similar to the following begin to appear:
  Updating crates.io index
  Downloaded regex v1.3.6
  Compiling lazy_static v1.4.0
  Compiling regex-syntax v0.6.17
  Compiling thread_local v1.0.1
  Compiling aho-corasick v0.7.10
  Compiling regex v1.3.6
  Compiling grep-lite v0.1.0 (/tmp/grep-lite)
  Finished dev [unoptimized + debuginfo] target(s) in 4.47s
Now that you have the crate installed and compiled, let’s put it into action. First, we’ll support searching for exact matches in listing 2.23. Then, in listing 2.24, the project grows to support regular expressions.
2.11.1 Adding support for regular expressions
Regular expressions add great flexibility to the patterns that we are able to search for. The following listing is a copy of an early example that we’ll modify.
Listing 2.23 Matching on exact strings with the contains() method
fn main() {
    let search_term = "picture";
    let quote = "Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for line in quote.lines() {
        if line.contains(search_term) { ①
            println!("{}", line);
        }
    }
}
① Uses the contains() method of &str, which searches for a substring
Make sure that you have updated grep-lite/Cargo.toml to include regex as a dependency as described in the previous section. Now, open grep-lite/src/main.rs in a text editor and fill it in with the code in the following listing. The source code for this listing is available in ch2/ch2-with-regex.rs.
Listing 2.24 Searching for patterns with regular expressions
use regex::Regex; ①

fn main() {
    let re = Regex::new("picture").unwrap(); ②
    let quote = "Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for line in quote.lines() {
        let contains_substring = re.find(line);
        match contains_substring { ③
            Some(_) => println!("{}", line), ④
            None => (), ⑤
        }
    }
}
① Brings the Regex type from the regex crate into local scope
② unwrap() unwraps a Result, crashing if an error occurs. Handling errors more robustly is discussed in depth later in the book.
③ Replaces the contains() method from listing 2.23 with a match block that requires that we handle all possible cases
④ Some(T) is the positive case of an Option, meaning that re.find() was successful. The _ wildcard matches whatever value the Some contains.
⑤ None is the negative case of an Option; () can be thought of as a null placeholder value here.
Open a command prompt and move to the root directory of your grep-lite project. Executing cargo run should produce output similar to the following text:
$ cargo run
   Compiling grep-lite v0.1.0 (file:///tmp/grep-lite)
    Finished dev [unoptimized + debuginfo] target(s) in 0.48s
     Running `target/debug/grep-lite`
dark square is a picture feverishly turned--in search of what?
Admittedly, the code within listing 2.24 hasn’t taken significant advantage of its newfound regular expression capabilities. Hopefully, you’ll have the confidence to be able to slot those into some of the more complex examples.
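The match block in listing 2.24 works with any Option, not only with the result of Regex::find(). The example below is a sketch that runs without the regex crate. It swaps in str::find(), which also returns an Option (the byte offset of the first match). matching_lines() is a hypothetical helper. The if let and is_some() forms at the end are common shorthands for when only one case matters.

```rust
/// Hypothetical helper: collects the lines of `text` that contain `needle`,
/// using the same match-on-Option shape as listing 2.24.
fn matching_lines<'a>(text: &'a str, needle: &str) -> Vec<&'a str> {
    let mut hits = Vec::new();
    for line in text.lines() {
        // str::find returns Some(byte_offset) on a hit, None otherwise
        match line.find(needle) {
            Some(_) => hits.push(line),
            None => (),
        }
    }
    hits
}

fn main() {
    let quote = "Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for line in matching_lines(quote, "picture") {
        println!("{}", line);
    }

    // `if let` handles only the Some case and ignores None
    if let Some(offset) = quote.find("picture") {
        println!("first match at byte {}", offset);
    }

    // is_some() when the position itself is not needed
    println!("mentions books: {}", quote.find("books").is_some());
}
```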
2.11.2 Generating the third-party crate documentation locally
Documentation for third-party crates is typically available online. Still, it can be useful to know how to generate a local copy in case the internet fails you:
- Move to the root of the project directory in a terminal: /tmp/grep-lite or %TMP%\grep-lite
- Execute cargo doc. It will inform you of its progress in the console:
  $ cargo doc
  Checking lazy_static v1.4.0
  Documenting lazy_static v1.4.0
  Checking regex-syntax v0.6.17
  Documenting regex-syntax v0.6.17
  Checking memchr v2.3.3
  Documenting memchr v2.3.3
  Checking thread_local v1.0.1
  Checking aho-corasick v0.7.10
  Documenting thread_local v1.0.1
  Documenting aho-corasick v0.7.10
  Checking regex v1.3.6
  Documenting regex v1.3.6
  Documenting grep-lite v0.1.0 (file:///tmp/grep-lite)
  Finished dev [unoptimized + debuginfo] target(s) in 3.43s
Congratulations. You have now created HTML documentation. By opening /tmp/grep-lite/target/doc/grep_lite/index.html in a web browser (also try cargo doc --open from the command line), you’ll be able to view the documentation for all the crates that your project depends on. It’s also possible to inspect the output directory to take a look at what is available to you:
$ tree -d -L 1 target/doc/
target/doc/
├── aho_corasick
├── grep_lite
├── implementors
├── memchr
├── regex
├── regex_syntax
├── src
└── thread_local
2.11.3 Managing Rust toolchains with rustup
rustup is another handy command-line tool, along with cargo. Where cargo manages projects, rustup manages your Rust installation(s). rustup cares about Rust toolchains and enables you to move between versions of the compiler. This means it’s possible to compile your projects for multiple platforms and experiment with nightly features of the compiler while keeping the stable version nearby.
rustup also simplifies accessing Rust’s documentation. Typing rustup doc opens your web browser to a local copy of Rust’s standard library.
2.12 Supporting command-line arguments
Our program is rapidly increasing its feature count. Yet, there is no way for any options to be specified. To become an actual utility, grep-lite needs to be able to interact with the world.
Sadly, though, Rust has a fairly tight standard library. As with regular expressions, another area with relatively minimalist support is handling command-line arguments. A nicer API is available through a third-party crate called clap (among others).
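For comparison, the standard library’s own facility is std::env::args(). It is an iterator over the program’s arguments as Strings, with the program name first. The sketch below reads the pattern that way through a hypothetical pattern_from_args() helper. It has no --help text, no generated usage message, and no validation, which is the gap clap fills.

```rust
use std::env;

/// Hypothetical helper: returns the first argument after the program name.
/// It accepts any iterator of Strings so it can be tested without a real
/// command line.
fn pattern_from_args<I: Iterator<Item = String>>(mut args: I) -> Option<String> {
    // Element 0 is the program name, so the pattern is element 1
    args.nth(1)
}

fn main() {
    match pattern_from_args(env::args()) {
        Some(pattern) => println!("searching for {}", pattern),
        None => eprintln!("usage: grep-lite <pattern>"),
    }
}
```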
Now that we’ve seen how to bring in third-party code, let’s take advantage of that to enable users of grep-lite to choose their own pattern. (We’ll get to choosing their own input source in the next section.) First, add clap as a dependency in your Cargo.toml:
$ cargo add clap@2
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding clap v2 to dependencies
You can confirm that the crate has been added to your project by inspecting its Cargo.toml file.
Listing 2.25 Adding a dependency to grep-lite/Cargo.toml
[package]
name = "grep-lite"
version = "0.1.0"
authors = ["Tim McNamara <author@rustinaction.com>"]
edition = "2018"

[dependencies]
regex = "1"
clap = "2"
Listing 2.26 Editing grep-lite/src/main.rs
use regex::Regex;
use clap::{App,Arg}; ①

fn main() {
    let args = App::new("grep-lite") ②
        .version("0.1")
        .about("searches for patterns")
        .arg(Arg::with_name("pattern")
            .help("The pattern to search for")
            .takes_value(true)
            .required(true))
        .get_matches();

    let pattern = args.value_of("pattern").unwrap(); ③
    let re = Regex::new(pattern).unwrap();

    let quote = "Every face, every shop, bedroom window, public-house, and
dark square is a picture feverishly turned--in search of what?
It is the same with books. What do we seek through millions of pages?";

    for line in quote.lines() {
        match re.find(line) {
            Some(_) => println!("{}", line),
            None => (),
        }
    }
}
① Brings clap::App and clap::Arg objects into local scope
② Incrementally builds a command argument parser, where each argument takes an Arg. In our case, we only need one.
③ Extracts the pattern argument
With your project updated, executing cargo run should set off a few lines in your console:
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 2.21 secs
     Running `target/debug/grep-lite`
error: The following required arguments were not provided:
    <pattern>

USAGE:
    grep-lite <pattern>

For more information try --help
The error occurs because we haven’t passed sufficient arguments through to our resulting executable. To pass arguments through, cargo supports some special syntax: any arguments appearing after -- are sent through to the resulting executable binary:
$ cargo run -- picture
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/grep-lite picture`
dark square is a picture feverishly turned--in search of what?
But clap does more than provide parsing. It also generates usage documentation on your behalf. Running grep-lite --help provides an expanded view:
$ ./target/debug/grep-lite --help
grep-lite 0.1
searches for patterns

USAGE:
    grep-lite <pattern>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <pattern>    The pattern to search for
2.13 Reading from files
Searching for text wouldn’t be complete without being able to search within files. File I/O can be surprisingly finicky and so has been left until last.
Before adding this functionality to grep-lite, let’s take a look at a standalone example in listing 2.27. The code for this listing is in the ch2-read-file.rs file. The general pattern is to open a File object, then wrap that in a BufReader. BufReader takes care of providing buffered I/O. It reads from the file in large chunks and serves later reads from memory, which reduces the number of system calls made to the OS.
Listing 2.27 Reading a file manually line by line
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;

fn main() {
    let f = File::open("readme.md").unwrap(); ①
    let mut reader = BufReader::new(f);

    let mut line = String::new(); ②

    loop {
        let len = reader.read_line(&mut line)
            .unwrap(); ③
        if len == 0 {
            break
        }

        println!("{} ({} bytes long)", line, len);

        line.truncate(0); ④
    }
}
① Creates a File object that requires a path argument and error handling if the file does not exist. This program crashes if a readme.md is not present.
② Reuses a single String object over the lifetime of the program
③ Because reading from disk can fail, we need to explicitly handle this. In our case, errors crash the program.
④ Shrinks the String back to length 0, preventing lines from persisting into the following ones
Manually looping through a file can be cumbersome, despite its usefulness in some cases. For the common case of iterating through lines, Rust provides a helper iterator as the following listing shows. The source code for this listing is in the file ch2/ch2-bufreader-lines.rs.
Listing 2.28 Reading a file line by line via BufReader::lines()
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;

fn main() {
    let f = File::open("readme.md").unwrap();
    let reader = BufReader::new(f);

    for line_ in reader.lines() { ①
        let line = line_.unwrap(); ②
        println!("{} ({} bytes long)", line, line.len());
    }
}
① A subtle behavior change occurs here. BufReader::lines() removes the trailing newline character from each line.
② Unwraps the Result, but at the risk of crashing the program if an error occurs
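The newline difference noted in annotation ① can be checked without a readme.md on disk. In this sketch, an in-memory Cursor stands in for the File. with_read_line() and with_lines() are hypothetical helpers that follow the patterns of listings 2.27 and 2.28. read_line() keeps each line’s trailing '\n', so its lengths are one byte longer than those reported after lines().

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Hypothetical helper following listing 2.27: read_line() keeps the '\n'.
fn with_read_line(data: &str) -> Vec<String> {
    let mut reader = BufReader::new(Cursor::new(data));
    let mut out = Vec::new();
    let mut line = String::new();
    loop {
        let len = reader.read_line(&mut line).unwrap();
        if len == 0 {
            break; // 0 bytes read signals the end of the input
        }
        out.push(line.clone());
        line.truncate(0); // reuse the buffer for the next line
    }
    out
}

/// Hypothetical helper following listing 2.28: lines() strips the newline.
fn with_lines(data: &str) -> Vec<String> {
    let reader = BufReader::new(Cursor::new(data));
    reader.lines().map(|l| l.unwrap()).collect()
}

fn main() {
    let data = "first\nsecond\n";
    for line in with_read_line(data) {
        // {:?} makes the trailing '\n' visible
        println!("{:?} ({} bytes long)", line, line.len());
    }
    for line in with_lines(data) {
        println!("{:?} ({} bytes long)", line, line.len());
    }
}
```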
We’re now in a position to add reading from a file into grep-lite’s feature list. The following listing creates a complete program that takes a regular expression pattern and an input file as arguments.
Listing 2.29 Reading lines from a file
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;
use regex::Regex;
use clap::{App,Arg};

fn main() {
    let args = App::new("grep-lite")
        .version("0.1")
        .about("searches for patterns")
        .arg(Arg::with_name("pattern")
            .help("The pattern to search for")
            .takes_value(true)
            .required(true))
        .arg(Arg::with_name("input")
            .help("File to search")
            .takes_value(true)
            .required(true))
        .get_matches();

    let pattern = args.value_of("pattern").unwrap();
    let re = Regex::new(pattern).unwrap();

    let input = args.value_of("input").unwrap();
    let f = File::open(input).unwrap();
    let reader = BufReader::new(f);

    for line_ in reader.lines() {
        let line = line_.unwrap();
        match re.find(&line) { ①
            Some(_) => println!("{}", line),
            None => (),
        }
    }
}
① line is a String, but re.find() takes an &str as an argument.
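Passing &line works because of deref coercion: a &String is automatically converted to &str when a function expects one. The sketch below uses a hypothetical is_match() function built on str::contains() rather than the regex crate. It shows the implicit conversion next to two explicit equivalents.

```rust
/// Hypothetical function that, like Regex::find(), takes a &str.
fn is_match(line: &str, needle: &str) -> bool {
    line.contains(needle)
}

fn main() {
    let line: String = String::from("dark square is a picture");

    // &line is a &String; deref coercion turns it into &str
    println!("{}", is_match(&line, "picture"));

    // Explicit equivalents
    println!("{}", is_match(line.as_str(), "picture"));
    println!("{}", is_match(&line[..], "books"));
}
```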
2.14 Reading from stdin
A command-line utility wouldn’t be complete if it wasn’t able to read from stdin. Unfortunately for those readers who skimmed over earlier parts of this chapter, some of the syntax on line 8 might look quite unfamiliar. In short, rather than duplicate code within main(), we’ll use a generic function to abstract away the details of whether we are dealing with files or stdin:
Listing 2.30 Searching through a file or stdin
1 use std::fs::File;
2 use std::io;
3 use std::io::BufReader;
4 use std::io::prelude::*;
5 use regex::Regex;
6 use clap::{App,Arg};
7
8 fn process_lines<T: BufRead + Sized>(reader: T, re: Regex) {
9     for line_ in reader.lines() {
10         let line = line_.unwrap();
11         match re.find(&line) { ①
12             Some(_) => println!("{}", line),
13             None => (),
14         }
15     }
16 }
17
18 fn main() {
19     let args = App::new("grep-lite")
20         .version("0.1")
21         .about("searches for patterns")
22         .arg(Arg::with_name("pattern")
23             .help("The pattern to search for")
24             .takes_value(true)
25             .required(true))
26         .arg(Arg::with_name("input")
27             .help("File to search")
28             .takes_value(true)
29             .required(false))
30         .get_matches();
31
32     let pattern = args.value_of("pattern").unwrap();
33     let re = Regex::new(pattern).unwrap();
34
35     let input = args.value_of("input").unwrap_or("-");
36
37     if input == "-" {
38         let stdin = io::stdin();
39         let reader = stdin.lock();
40         process_lines(reader, re);
41     } else {
42         let f = File::open(input).unwrap();
43         let reader = BufReader::new(f);
44         process_lines(reader, re);
45     }
46 }
① line is a String, but re.find() takes an &str as an argument.
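Because process_lines() accepts any BufRead, it can be exercised with input that is neither a file nor stdin. This sketch makes two assumptions so that it runs with the standard library alone. First, its variant of process_lines() uses str::contains() in place of a Regex. Second, it returns the matching lines instead of printing them. The Sized bound from listing 2.30 is omitted because generic type parameters are Sized by default.

```rust
use std::io::{BufRead, Cursor};

/// Variant of process_lines(): returns every line containing `needle`.
/// Works for any BufRead source: BufReader<File>, a locked stdin,
/// or an in-memory Cursor.
fn process_lines<T: BufRead>(reader: T, needle: &str) -> Vec<String> {
    reader
        .lines()
        .map(|line| line.unwrap()) // crashes on I/O errors, like the listing
        .filter(|line| line.contains(needle))
        .collect()
}

fn main() {
    // Cursor wraps an in-memory buffer and implements BufRead
    let input = Cursor::new("dark square is a picture\nIt is the same with books.\n");
    for line in process_lines(input, "picture") {
        println!("{}", line);
    }
}
```

Because the input type is a generic parameter, the same function body serves files, stdin, and test data. The compiler generates a separate copy of process_lines() for each concrete reader type it is called with.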
Summary
- Rust has full support for primitive types, such as integers and floating-point numbers.
- Functions are strongly typed and require types to be specified for their parameters and return values.
- Rust features, such as iteration and mathematical operations, rely on traits. The for loop, for example, is built on the std::iter::IntoIterator trait.
- List-like types are tailored to specific use cases. You will typically reach for Vec<T> first.
- All Rust programs have a single entry function: main().
- Every crate has a Cargo.toml file that specifies its metadata.
- The cargo tool is able to compile your code and fetch its dependencies.
- The rustup tool provides access to multiple compiler toolchains and to the language’s documentation.
1. This isn’t technically correct, but is accurate enough for now. If you’re an experienced Rust programmer skimming through this chapter, you’ll know that main() returns () (unit) by default and can also return a Result.
2. For the curious and eager, the traits involved here are std::cmp::PartialOrd and std::cmp::PartialEq.
3. Electrical engineers use j rather than i.
4. Although Rust is not object-oriented (it’s impossible to create a subclass, for example), Rust makes use of some terminology from that domain. It’s common to hear of Rust programmers discussing instances, methods, and objects.
5. This functionality is also available with continue, but it’s less common.
6. Omitting lifetime annotations is formally referred to as lifetime elision.
4 Lifetimes, ownership, and borrowing
- Discovering what the term lifetime means in Rust programming
- Working with the borrow checker rather than against it
- Multiple tactics for dealing with issues when these crop up
- Understanding the responsibilities of an owner
- Learning how to borrow values that are owned elsewhere
This chapter explains one of the concepts that trip up most newcomers to Rust—its borrow checker. The borrow checker checks that all access to data is legal, which allows Rust to prevent safety issues. Learning how this works will, at the very least, speed up your development time by helping you avoid run-ins with the compiler. More significantly though, learning to work with the borrow checker allows you to build larger software systems with confidence. It underpins the term fearless concurrency.
This chapter will explain how this system operates and help you learn how to comply with it when an error is discovered. It uses the somewhat lofty example of simulating a satellite constellation to explain the trade-offs relating to different ways to provide shared access to data. The details of borrow checking are thoroughly explored within the chapter. However, a few points might be useful for readers wanting to quickly get the gist. Borrow checking relies on three interrelated concepts—lifetimes, ownership, and borrowing:
- Ownership is a stretched metaphor. There is no relationship to property rights. Within Rust, ownership relates to cleaning up values when they are no longer needed. For example, when a function returns, the memory holding its local variables needs to be freed. Owners cannot prevent other parts of the program from accessing their values or report data theft to some overarching Rust authority.
- A value’s lifetime is the period when accessing that value is valid behavior. A function’s local variables live until the function returns, while global variables might live for the life of the program.
- To borrow a value means to access it. This terminology is somewhat confusing, as there is no obligation to return the value to its owner. The term is used to emphasize that while each value has a single owner, many parts of the program can share access to it.
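The three ideas can be seen together in a few lines. This is a minimal sketch, and total_len() is a hypothetical function. In it, words owns a vector, two shared borrows coexist without transferring anything, and the vector is freed when its owner’s lifetime ends at the close of main().

```rust
/// Hypothetical function that borrows a slice of Strings; the caller keeps ownership.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

fn main() {
    // `words` owns the Vec and both Strings inside it
    let words = vec![String::from("lifetimes"), String::from("ownership")];

    // Two shared borrows at once; nothing has to be handed back
    let first = &words[0];
    let count = total_len(&words);
    println!("{} {}", first, count);

    // Still usable: `words` was only borrowed, never moved
    println!("{:?}", words);
} // lifetime of `words` ends here; the Vec and its Strings are freed
```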
4.1 Implementing a mock CubeSat ground station
Our strategy for this chapter is to use an example that compiles. Then we’ll make a minor change that triggers an error that appears to emerge without any adjustment to the program’s flow. Working through the fixes to those issues should make the concepts more complete.
The learning example for the chapter is a CubeSat constellation. If you’ve never encountered that phrase before, here are some definitions:
- CubeSat—A miniature artificial satellite, as compared to a conventional satellite, that has increasingly expanded the accessibility of space research.
- Ground station—An intermediary between the operators and the satellites themselves. It listens on a radio, checking the status of every satellite in the constellation and transmitting messages to and fro. When introduced in our code, it acts as the gateway between the user and the satellites.
- Constellation—The collective noun for satellites in orbit.
Figure 4.1 shows several CubeSats orbiting our ground station.
In figure 4.1, we have three CubeSats. To model this, we’ll create a variable for each. For the moment, these variables can happily be integers. We don’t need to model the ground station explicitly because we’re not yet sending messages around the constellation, so we’ll omit it for now. These are the variables:
let sat_a = 0;
let sat_b = 1;
let sat_c = 2;
To check on the status of each of our satellites, we’ll use a stub function and an enum to represent potential status messages:
#[derive(Debug)]
enum StatusMessage {
    Ok, ①
}

fn check_status(sat_id: u64) -> StatusMessage {
    StatusMessage::Ok ①
}
① For now, all of our CubeSats function perfectly all the time
The check_status() function would be extremely complicated in a production system. For our purposes, though, returning the same value every time is perfectly sufficient. Pulling these two snippets into a whole program that “checks” our satellites twice, we end up with something like the following listing. You’ll find this code in the file ch4/ch4-check-sats-1.rs.
Listing 4.1 Checking the status of our integer-based CubeSats
#![allow(unused_variables)]

#[derive(Debug)]
enum StatusMessage {
    Ok,
}

fn check_status(sat_id: u64) -> StatusMessage {
    StatusMessage::Ok
}

fn main() {
    let sat_a = 0; ①
    let sat_b = 1; ①
    let sat_c = 2; ①

    let a_status = check_status(sat_a);
    let b_status = check_status(sat_b);
    let c_status = check_status(sat_c);
    println!("a: {:?}, b: {:?}, c: {:?}", a_status, b_status, c_status);

    // "waiting" ...
    let a_status = check_status(sat_a);
    let b_status = check_status(sat_b);
    let c_status = check_status(sat_c);
    println!("a: {:?}, b: {:?}, c: {:?}", a_status, b_status, c_status);
}
① Each satellite variable is represented by an integer.
Running the code in listing 4.1 should be fairly uneventful. The code compiles, albeit begrudgingly. We encounter the following output from our program:
a: Ok, b: Ok, c: Ok
a: Ok, b: Ok, c: Ok
4.1.1 Encountering our first lifetime issue
Let’s move closer to idiomatic Rust by introducing type safety. Instead of integers, let’s create a type to model our satellites. A real implementation of a CubeSat type would probably include lots of information about its position, its RF frequency band, and more. In the following listing, we stick with only recording an identifier.
Listing 4.2 Modeling a CubeSat as its own type
#[derive(Debug)]
struct CubeSat {
    id: u64,
}
Now that we have a struct definition, let’s inject it into our code. The next listing will not compile (yet). Understanding the details of why it won’t is the goal of much of this chapter. The source for this listing is in ch4/ch4-check-sats-2.rs.
Listing 4.3 Checking the status of our struct-based CubeSats
1 #[derive(Debug)] ①
2 struct CubeSat {
3     id: u64,
4 }
5
6 #[derive(Debug)]
7 enum StatusMessage {
8     Ok,
9 }
10
11 fn check_status(
12     sat_id: CubeSat
13 ) -> StatusMessage { ②
14     StatusMessage::Ok
15 }
16
17 fn main() {
18     let sat_a = CubeSat { id: 0 }; ③
19     let sat_b = CubeSat { id: 1 }; ③
20     let sat_c = CubeSat { id: 2 }; ③
21
22     let a_status = check_status(sat_a);
23     let b_status = check_status(sat_b);
24     let c_status = check_status(sat_c);
25     println!("a: {:?}, b: {:?}, c: {:?}", a_status, b_status, c_status);
26
27     // "waiting" ...
28     let a_status = check_status(sat_a);
29     let b_status = check_status(sat_b);
30     let c_status = check_status(sat_c);
31     println!("a: {:?}, b: {:?}, c: {:?}", a_status, b_status, c_status);
32 }
① Modification 1 adds the definition.
② Modification 2 uses the new type within check_status().
③ Modification 3 creates three new instances.
When you attempt to compile the code for listing 4.3, you will receive a message similar to the following (which has been edited for brevity):
error[E0382]: use of moved value: `sat_a`
  --> code/ch4-check-sats-2.rs:26:31
   |
20 |     let a_status = check_status(sat_a);
   |                                 ----- value moved here
...
26 |     let a_status = check_status(sat_a);
   |                                 ^^^^^ value used here after move
   |
   = note: move occurs because `sat_a` has type `CubeSat`,
   = which does not implement the `Copy` trait

...

error: aborting due to 3 previous errors
To trained eyes, the compiler’s message is helpful. It tells us exactly where the problem is and provides us with a recommendation on how to fix it. To less experienced eyes, it’s significantly less useful. We are using a “moved” value and are advised to implement the Copy trait on CubeSat. Huh? It turns out that although it is written in English, the term move means something very specific within Rust. Nothing physically moves.
Movement within Rust code refers to movement of ownership, rather than the movement of data. Ownership is a term used within the Rust community to refer to the compile-time process that checks that every use of a value is valid and that every value is destroyed cleanly.
Every value in Rust is owned. In both listings 4.1 and 4.3, sat_a, sat_b, and sat_c own the data that they refer to. When calls to check_status() are made, ownership of the data moves from the variables in the scope of main() to the variable sat_id within the check_status() function. The significant difference is that listing 4.3 places that integer within a CubeSat struct.1 This type change alters the semantics of how the program behaves.
The next listing provides a stripped-down version of the main() function from listing 4.3. It is centered on sat_a and attempts to show how ownership moves from main() into check_status().
Listing 4.4 Extract of listing 4.3, focusing on main()
fn main() {
    let sat_a = CubeSat { id: 0 }; ①
    // ... ②
    let a_status = check_status(sat_a); ③
    // ... ②

    // "waiting" ...
    let a_status = check_status(sat_a); ④
    // ... ②
}
① Ownership originates here at the creation of the CubeSat object.
③ Ownership of the object moves to check_status() but is not returned to main().
④ At the second call to check_status(), sat_a is no longer the owner of the object, making access invalid.
Rebinding is legal when values are not borrowed
If you have experience with programming languages such as JavaScript (from 2015 onward), you may have been surprised to see that the status variables for each of the CubeSats were redefined in listing 4.3. In that listing, on line 22, a_status is assigned to the result of the first call to check_status(sat_a). On line 28, it is rebound to the result of the second call. The second let binding shadows the first, so the original value is no longer reachable through that name.
This is legal Rust code, but one must be aware of ownership issues and lifetime here too. It’s possible in this context because there are no live borrows to contend with. Attempting to overwrite a value that’s still available from elsewhere in the program causes the compiler to refuse to compile your program.
Figure 4.2 provides a visual walk-through of the interrelated processes of control flow, ownership, and lifetimes. During the call to check_status(sat_a), ownership moves to the check_status() function. When check_status() returns a StatusMessage, it drops the sat_a value. The lifetime of sat_a ends here. Yet, sat_a remains in the local scope of main() after the first call to check_status(). Attempting to access that variable will incur the wrath of the borrow checker.
Figure 4.2 Visual explanation of Rust’s ownership movement
The distinction between a value’s lifetime and its scope—which is what many programmers are trained to rely on—can make things difficult to disentangle. Avoiding and overcoming this type of issue makes up the bulk of this chapter. Figure 4.2 helps to shed some light on this.
4.1.2 Special behavior of primitive types
Before carrying on, it might be wise to explain why listing 4.1 compiled at all. Indeed, the only change that we made in listing 4.3 was to wrap our satellite variables in a custom type. As it happens, primitive types in Rust have special behavior. These implement the Copy trait.
Types implementing Copy are duplicated at times that would otherwise be illegal. This provides some day-to-day convenience at the expense of adding a trap for newcomers. As you grow out from toy programs using integers, your code suddenly breaks.
Formally, primitive types are said to possess copy semantics, whereas all other types have move semantics. Unfortunately for learners of Rust, that special case looks like the default case because beginners typically encounter primitive types first. Listings 4.5 and 4.6 illustrate the difference between these two concepts. The first compiles and runs; the other does not. The only difference is that these listings use different types. The following listing uses i32, a primitive type that implements Copy.
Listing 4.5 The copy semantics of Rust’s primitive types
fn use_value(_val: i32) { ①
}

fn main() {
    let a = 123;
    use_value(a);

    println!("{}", a); ②
}
① use_value() takes ownership of the _val argument. A function with the same name appears in the next example, where it accepts a different type.
② It’s perfectly legal to access a after use_value() has returned.
The following listing focuses on types that do not implement the Copy trait. When used as an argument to a function that takes ownership, such values cannot be accessed again from the outer scope.
Listing 4.6 The move semantics of types not implementing Copy
fn use_value(_val: Demo) { ①
}

struct Demo {
    a: i32,
}

fn main() {
    let demo = Demo { a: 123 };
    use_value(demo);

    println!("{}", demo.a); ②
}
① use_value() takes ownership of _val.
② It’s illegal to access demo.a, even after use_value() has returned.
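The compiler’s earlier suggestion to implement Copy can be tried on listing 4.6. In the sketch below, Demo derives Clone and Copy (Copy requires Clone). Passing it to use_value() now duplicates the value instead of moving it, so reading demo.a afterward compiles. copy_then_read() is a hypothetical helper. This only works because every field, here a single i32, is itself Copy. A struct holding a String, for example, could not derive Copy.

```rust
// Copy needs Clone; both can be derived because the only field is an i32
#[derive(Debug, Clone, Copy)]
struct Demo {
    a: i32,
}

fn use_value(_val: Demo) {}

/// Hypothetical helper: passes a copy away, then reads the original
fn copy_then_read(demo: Demo) -> i32 {
    use_value(demo); // a bitwise copy is passed; `demo` stays valid
    demo.a
}

fn main() {
    let demo = Demo { a: 123 };
    use_value(demo);
    println!("{}", demo.a); // legal now that Demo implements Copy
}
```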
4.2 Guide to the figures in this chapter
The figures used in this chapter use a bespoke notation to illustrate the three interrelated concepts of scope, lifetimes, and ownership. Figure 4.3 illustrates this notation.
Figure 4.3 How to interpret the figures in this chapter
4.3 What is an owner? Does it have any responsibilities?
In the world of Rust, the notion of ownership is rather limited. An owner cleans up when its values’ lifetimes end.
When values go out of scope or their lifetimes end for some other reason, their destructors are called. A destructor is a function that removes traces of the value from the program by deleting references and freeing memory. You won’t find a call to any destructors in most Rust code. The compiler injects that code itself as part of the process of tracking every value’s lifetime.
To provide a custom destructor for a type, we implement Drop. This typically is needed in cases where we have used unsafe blocks to allocate memory. Drop has one method, drop(&mut self), that you can use to conduct any necessary wind-up activities.
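A custom Drop implementation makes the owner’s cleanup visible. The sketch below reuses the CubeSat name and a simplified check_status() from listing 4.3. The print statement and the DROPS counter are hypothetical additions for illustration. Because check_status() takes ownership of its argument, the CubeSat is dropped when that function returns, not at the end of main().

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter so that drops can be observed from outside
static DROPS: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug)]
struct CubeSat {
    id: u64,
}

impl Drop for CubeSat {
    // Called automatically when the owner's lifetime ends
    fn drop(&mut self) {
        println!("dropping CubeSat {}", self.id);
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn check_status(sat_id: CubeSat) {
    println!("{:?}: Ok", sat_id);
} // sat_id owns the CubeSat, so it is dropped here

fn main() {
    let sat_a = CubeSat { id: 0 };
    check_status(sat_a); // ownership moves into check_status()
    println!("drops so far: {}", DROPS.load(Ordering::SeqCst));

    let _sat_b = CubeSat { id: 1 };
} // _sat_b is dropped here, at the end of main()
```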
An implication of this system is that values cannot outlive their owner. This kind of situation can make data structures built with references, such as trees and graphs, feel slightly bureaucratic. If the root node of a tree is the owner of the whole tree, it can’t be removed without taking ownership into account.
Finally, unlike the Lockean notion of personal property, ownership does not imply control or sovereignty. In fact, the “owners” of values do not have special access to their owned data. Nor do these have the ability to restrict others from trespassing. Owners don’t get a say on other sections of code borrowing their values.
4.4 How ownership moves
There are two ways to shift ownership from one variable to another within a Rust program. The first is by assignment.2 The second is by passing data through a function barrier, either as an argument or a return value. Revisiting our original code from listing 4.3, we can see that sat_a starts its life with ownership over a CubeSat object:
fn main() {
    let sat_a = CubeSat { id: 0 };
    // ...
The CubeSat object is then passed into check_status() as an argument. This moves ownership to the local variable sat_id:
```rust
fn main() {
    let sat_a = CubeSat { id: 0 };
    // ...
    let a_status = check_status(sat_a);
    // ...
```
Another possibility is that sat_a relinquishes its ownership to another variable within main(). That would look something like this:
```rust
fn main() {
    let sat_a = CubeSat { id: 0 };
    // ...
    let new_sat_a = sat_a;
    // ...
```
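The following small sketch puts both kinds of move side by side, using a stripped-down CubeSat. The consume() helper is hypothetical. The commented-out lines mark the places where the compiler would reject any further use of a value whose ownership has already moved:

```rust
// Stripped-down stand-in for the chapter's CubeSat.
#[derive(Debug)]
struct CubeSat {
    id: u64,
}

// Hypothetical helper: takes ownership of its argument and reports its id.
fn consume(sat: CubeSat) -> u64 {
    sat.id
} // `sat` is dropped here, at the end of the function's scope

fn main() {
    let sat_a = CubeSat { id: 0 };
    let new_sat_a = sat_a; // ownership moves by assignment
    // println!("{:?}", sat_a); // would not compile: sat_a has been moved (E0382)

    let id = consume(new_sat_a); // ownership moves through the function barrier
    // println!("{:?}", new_sat_a); // would not compile for the same reason
    println!("consumed satellite {}", id);
}
```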
Lastly, if the check_status() function signature changes, it too could pass ownership of the CubeSat to a variable within the calling scope. Here is our original function:
```rust
fn check_status(sat_id: CubeSat) -> StatusMessage {
    StatusMessage::Ok
}
```
And here is an adjusted function that achieves its message notification through a side effect:
```rust
fn check_status(sat_id: CubeSat) -> CubeSat {
    println!("{:?}: {:?}", sat_id, StatusMessage::Ok);   ①
    sat_id   ②
}
```
① Uses the Debug formatting syntax as our types have been annotated with #[derive(Debug)]
② Returns a value by omitting the semicolon at the end of the last line
With the adjusted check_status() function used in conjunction with a new main(), it's possible to send ownership of the CubeSat objects back to their original variables. The following listing shows the code. Its source is found in ch4/ch4-check-sats-3.rs.
Listing 4.7 Returning ownership back to the original scope
```rust
#![allow(unused_variables)]

#[derive(Debug)]
struct CubeSat {
    id: u64,
}

#[derive(Debug)]
enum StatusMessage {
    Ok,
}

fn check_status(sat_id: CubeSat) -> CubeSat {
    println!("{:?}: {:?}", sat_id, StatusMessage::Ok);
    sat_id
}

fn main() {
    let sat_a = CubeSat { id: 0 };
    let sat_b = CubeSat { id: 1 };
    let sat_c = CubeSat { id: 2 };

    let sat_a = check_status(sat_a);   ①
    let sat_b = check_status(sat_b);
    let sat_c = check_status(sat_c);

    // "waiting" ...

    let sat_a = check_status(sat_a);
    let sat_b = check_status(sat_b);
    let sat_c = check_status(sat_c);
}
```
① check_status() now returns the original CubeSat. This new let binding shadows the moved-from sat_a and gives ownership back to a variable with the same name.
The output from the new main() function in listing 4.7 now looks like this:
```text
CubeSat { id: 0 }: Ok
CubeSat { id: 1 }: Ok
CubeSat { id: 2 }: Ok
CubeSat { id: 0 }: Ok
CubeSat { id: 1 }: Ok
CubeSat { id: 2 }: Ok
```
Figure 4.4 shows a visual overview of the ownership movements within listing 4.7.
Figure 4.4 The ownership changes within listing 4.7
4.5 Resolving ownership issues
Rust’s ownership system is excellent. It provides a route to memory safety without needing a garbage collector. There is a “but,” however.
The ownership system can trip you up if you don’t understand what’s happening. This is particularly the case when you bring the programming style from your past experience to a new paradigm. Four general strategies can help with ownership issues:
- Use references where full ownership is not required.
- Duplicate the value.
- Refactor code to reduce the number of long-lived objects.
- Wrap your data in a type designed to assist with movement issues.
To examine each of these strategies, let’s extend the capabilities of our satellite network. Let’s give the ground station and our satellites the ability to send and receive messages. Figure 4.5 shows what we want to achieve: create a message at Step 1, then transfer it at Step 2. After Step 2, no ownership issues should arise.
Figure 4.5 Goal: Enable messages to be sent while avoiding ownership issues
Ignoring the details of implementing the methods, we want to avoid code that looks like the following. Moving ownership of sat_a to a local variable in base.send() ends up hurting us, because that value will no longer be accessible for the rest of main():
```rust
base.send(sat_a, "hello!");   ①
sat_a.recv();
```
① Moves ownership of sat_a to a local variable in base.send()
To get to a "toy" implementation, we need a few more types. In listing 4.8, we add a new field, mailbox, to CubeSat. CubeSat.mailbox is a Mailbox struct that holds a vector of Message values in its messages field. We alias String to Message, which gives us the functionality of the String type without needing to implement it ourselves.
Listing 4.8 Adding a Mailbox type to our system
```rust
#[derive(Debug)]
struct CubeSat {
    id: u64,
    mailbox: Mailbox,
}

#[derive(Debug)]
enum StatusMessage {
    Ok,
}

#[derive(Debug)]
struct Mailbox {
    messages: Vec<Message>,
}

type Message = String;
```
Creating a CubeSat instance has become slightly more complicated. To create one now, we also need to create its associated Mailbox and the mailbox's associated Vec<Message>. The following listing shows this addition.
Listing 4.9 Creating a new CubeSat with Mailbox
```rust
CubeSat {
    id: 100,
    mailbox: Mailbox {
        messages: vec![],
    },
}
```
Another type to add is one that represents the ground station itself. We will use a bare struct for the moment, as shown in the following listing. That allows us to add methods to it and gives us the option of adding a mailbox as a field later on as well.
Listing 4.10 Defining a struct to represent our ground station
```rust
struct GroundStation;
```
Creating an instance of GroundStation should be trivial for you now. The following listing shows this implementation.
Listing 4.11 Creating a new ground station
```rust
GroundStation {};
```
Now that we have our new types in place, let’s put these to work. You’ll see how in the next section.
4.5.1 Use references where full ownership is not required
The most common change you will make to your code is to reduce the level of access you require. Instead of requesting ownership, you can use a "borrow" in your function definitions. For read-only access, use &T. For read-write access, use &mut T.
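As a rough illustration of the three levels of access (owned T, &T, and &mut T), here is a sketch with a simplified CubeSat whose mailbox is a plain Vec<String>. The functions describe(), push_message(), and decommission() are hypothetical and are not part of the chapter's program:

```rust
// Simplified stand-in for the chapter's CubeSat; `mailbox` is a plain Vec here.
struct CubeSat {
    id: u64,
    mailbox: Vec<String>,
}

// &T: read-only access; the caller keeps ownership.
fn describe(sat: &CubeSat) -> String {
    format!("CubeSat {} has {} message(s)", sat.id, sat.mailbox.len())
}

// &mut T: read-write access; the caller still keeps ownership.
fn push_message(sat: &mut CubeSat, msg: String) {
    sat.mailbox.push(msg);
}

// T: full ownership; the value is dropped when this function returns.
fn decommission(sat: CubeSat) -> u64 {
    sat.id
}

fn main() {
    let mut sat = CubeSat { id: 1, mailbox: vec![] };
    push_message(&mut sat, String::from("hello"));
    println!("{}", describe(&sat)); // prints "CubeSat 1 has 1 message(s)"
    let id = decommission(sat); // `sat` is moved and cannot be used after this line
    println!("decommissioned {}", id);
}
```

Both borrows end as soon as their calls return, so sat stays usable until it is handed over to decommission().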
Ownership might be needed in advanced cases, such as when functions want to adjust the lifetime of their arguments. Table 4.1 compares the two different approaches.
Table 4.1 Comparing ownership and mutable references
Sending messages will eventually be wrapped up in a method, but in essence, sending a message means modifying the internal mailbox of the CubeSat. For simplicity's sake, we'll return () and hope for the best in case of transmission difficulties caused by solar winds.
The following snippet shows the flow that we want to end up with. The ground station can send a message to sat_a with its send() method, and sat_a then receives the message with its recv() method:
```rust
base.send(&mut sat_a, "hello!".to_string());

let msg = sat_a.recv();
println!("sat_a received: {:?}", msg); // -> Some("hello!")
```
The next listing shows the implementations of these methods. To achieve that flow, add them to the GroundStation and CubeSat types.
Listing 4.12 Adding the GroundStation.send() and CubeSat.recv() methods
```rust
impl GroundStation {
    fn send(
        &self,   ①
        to: &mut CubeSat,   ①
        msg: Message,   ①
    ) {
        to.mailbox.messages.push(msg);   ②
    }
}

impl CubeSat {
    fn recv(&mut self) -> Option<Message> {
        self.mailbox.messages.pop()
    }
}
```
① &self indicates that GroundStation.send() only requires a read-only reference to self. The recipient takes a mutable borrow (&mut) of the CubeSat instance, and msg takes full ownership of its Message instance.
② Ownership of the Message instance transfers from msg to messages.push() as a local variable.
Notice that both GroundStation.send() and CubeSat.recv() require mutable access to a CubeSat instance because both methods modify the underlying CubeSat.mailbox.messages vector. We move ownership of the message that we're sending into messages.push(). This gives us some quality assurance later: the compiler will tell us if we try to access a message after it has been sent. Figure 4.6 illustrates how we can avoid ownership issues.
Figure 4.6 Game plan: Use references to avoid ownership issues.
Listing 4.13 (ch4/ch4-sat-mailbox.rs) brings together all of the code snippets in this section thus far and prints the following output. The messages starting with t0 through t2 are added to help you follow how data flows through the program:
```text
t0: CubeSat { id: 0, mailbox: Mailbox { messages: [] } }
t1: CubeSat { id: 0, mailbox: Mailbox { messages: ["hello there!"] } }
t2: CubeSat { id: 0, mailbox: Mailbox { messages: [] } }
msg: Some("hello there!")
```
Listing 4.13 Avoiding ownership issues with references
```rust
#[derive(Debug)]
struct CubeSat {
    id: u64,
    mailbox: Mailbox,
}

#[derive(Debug)]
struct Mailbox {
    messages: Vec<Message>,
}

type Message = String;

struct GroundStation;

impl GroundStation {
    fn send(&self, to: &mut CubeSat, msg: Message) {
        to.mailbox.messages.push(msg);
    }
}

impl CubeSat {
    fn recv(&mut self) -> Option<Message> {
        self.mailbox.messages.pop()
    }
}

fn main() {
    let base = GroundStation {};
    let mut sat_a = CubeSat {
        id: 0,
        mailbox: Mailbox {
            messages: vec![],
        },
    };

    println!("t0: {:?}", sat_a);

    base.send(&mut sat_a,
              Message::from("hello there!"));   ①

    println!("t1: {:?}", sat_a);

    let msg = sat_a.recv();
    println!("t2: {:?}", sat_a);

    println!("msg: {:?}", msg);
}
```
① We don't have a completely ergonomic way to create Message instances yet. Instead, we'll take advantage of the String::from() method, which converts a &str to a String (aka Message).
4.5.2 Use fewer long-lived values
If we have a large, long-standing object such as a global variable, it can be somewhat unwieldy to keep this around for every component of your program that needs it. Rather than using an approach involving long-standing objects, consider making objects that are more discrete and ephemeral. Ownership issues can sometimes be resolved by considering the design of the overall program.
In our CubeSat case, we don't need to handle much complexity at all. Each of our four variables (base, sat_a, sat_b, and sat_c) lives for the duration of main(). In a production system, there can be hundreds of different components and many thousands of interactions to manage. To make this kind of scenario more manageable, let's break things apart. Figure 4.7 presents the game plan for this section.
Figure 4.7 Game plan: Short-lived variables to avoid ownership issues
To implement this kind of strategy, we will create a function that returns CubeSat identifiers. That function is assumed to be a black box that's responsible for communicating with some store of identifiers, such as a database. When we need to communicate with a satellite, we'll create a new object, as the following code snippet shows. In this way, there is no requirement for us to maintain live objects for the whole of the program's duration. A second benefit is that we can afford to transfer ownership of our short-lived variables to other functions:
```rust
fn fetch_sat_ids() -> Vec<u64> {   ①
    vec![1,2,3]
}
```
① Returns a vector of CubeSat IDs
We'll also create a method for GroundStation. This method allows us to create a CubeSat instance on demand:
```rust
impl GroundStation {
    fn connect(&self, sat_id: u64) -> CubeSat {
        CubeSat { id: sat_id, mailbox: Mailbox { messages: vec![] } }
    }
}
```
Now we are a bit closer to our intended outcome. Our main function looks like the following code snippet. In effect, we’ve implemented the first half of figure 4.7.
```rust
fn main() {
    let base = GroundStation {};
    let sat_ids = fetch_sat_ids();

    for sat_id in sat_ids {
        let mut sat = base.connect(sat_id);

        base.send(&mut sat, Message::from("hello"));
    }
}
```
But there's a problem. Our CubeSat instances die at the end of the for loop's scope, along with any messages that base sends to them. To carry on with our design decision of short-lived variables, the messages need to live somewhere outside of the CubeSat instances. In a real system, they would live in the RAM of a device in zero gravity. In our not-really-a-simulator, let's put them in a buffer object that lives for the duration of our program.
Our message store will be a Vec<Message> wrapped in our Mailbox type, which was defined in one of the first code examples of this chapter. We'll turn Message from a type alias into a struct that records its recipient (to) alongside its content, as the following code shows. That way, our now-proxy CubeSat instances can match their IDs against incoming messages:
```rust
#[derive(Debug)]
struct Mailbox {
    messages: Vec<Message>,
}

#[derive(Debug)]
struct Message {
    to: u64,
    content: String,
}
```
We also need to reimplement sending and receiving messages. Up until now, CubeSat objects have had access to their own mailbox object. The central GroundStation also had the ability to sneak into those mailboxes to send messages. That needs to change now because only one mutable borrow can exist per object.
In the modifications in listing 4.14, the Mailbox instance is given the ability to modify its own message vector. When the ground station sends a message or a satellite receives one, each takes a mutable borrow of the mailbox and defers the actual work to the Mailbox object. Under this API, although our satellites are able to call Mailbox methods, they are not allowed to touch any internal Mailbox data themselves.
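The rule about mutable borrows is easier to see in isolation. In this sketch, the free functions post() and take_latest() are hypothetical stand-ins for the methods in listing 4.14. Mutable borrows that follow one another are fine, but two that overlap are not:

```rust
// Simplified stand-in for the chapter's Mailbox.
struct Mailbox {
    messages: Vec<String>,
}

// Hypothetical helpers: each one borrows the mailbox mutably only for the call.
fn post(mailbox: &mut Mailbox, msg: &str) {
    mailbox.messages.push(msg.to_string());
}

fn take_latest(mailbox: &mut Mailbox) -> Option<String> {
    mailbox.messages.pop()
}

fn main() {
    let mut mail = Mailbox { messages: vec![] };

    post(&mut mail, "hello"); // the first mutable borrow begins and ends here
    let msg = take_latest(&mut mail); // fine: the earlier borrow is already over
    println!("{:?}", msg); // prints Some("hello")

    // Overlapping mutable borrows are rejected at compile time (E0499):
    // let a = &mut mail;
    // let b = &mut mail;
    // a.messages.push(String::from("clash"));
}
```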
Listing 4.14 Modifications to Mailbox
```rust
impl GroundStation {
    fn send(
        &self,
        mailbox: &mut Mailbox,
        msg: Message,
    ) {   ①
        mailbox.post(msg);
    }
}

impl CubeSat {
    fn recv(
        &self,
        mailbox: &mut Mailbox
    ) -> Option<Message> {   ②
        mailbox.deliver(self)
    }
}

impl Mailbox {
    fn post(&mut self, msg: Message) {   ③
        self.messages.push(msg);
    }

    fn deliver(
        &mut self,
        recipient: &CubeSat
    ) -> Option<Message> {   ④
        for i in 0..self.messages.len() {
            if self.messages[i].to == recipient.id {
                let msg = self.messages.remove(i);
                return Some(msg);   ⑤
            }
        }

        None   ⑥
    }
}
```
① Calls Mailbox.post() to send messages, yielding ownership of a Message
② Calls Mailbox.deliver() to receive messages, gaining ownership of a Message
③ Mailbox.post() requires mutable access to itself and ownership over a Message.
④ Mailbox.deliver() requires a shared reference to a CubeSat to pull out its id field.
⑤ When we find a message, returns early with the Message wrapped in Some per the Option type
⑥ When no messages are found, returns None
NOTE Astute readers of listing 4.14 will notice a strong anti-pattern: inside deliver(), self.messages is modified with remove() while the loop over it is still running. This compiles because the loop walks a range of indices, 0..self.messages.len(). That range is computed once and does not borrow self.messages. The return on the following line is what keeps the code correct. Without it, remove() would shift the remaining elements down by one, so the loop could skip a message or index past the end of the now-shorter vector and panic.
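If you would rather not mutate a collection inside a loop over it, one option is to search first and remove afterward. This is a sketch of an alternative Mailbox.deliver(), not the chapter's implementation. It relies on the standard Iterator::position() and Vec::remove() methods:

```rust
#[derive(Debug)]
struct CubeSat {
    id: u64,
}

#[derive(Debug, PartialEq)]
struct Message {
    to: u64,
    content: String,
}

struct Mailbox {
    messages: Vec<Message>,
}

impl Mailbox {
    // Find the index of the first message for this recipient, then remove it.
    // The `?` returns None early when no message is addressed to the recipient.
    fn deliver(&mut self, recipient: &CubeSat) -> Option<Message> {
        let i = self.messages.iter().position(|m| m.to == recipient.id)?;
        Some(self.messages.remove(i))
    }
}

fn main() {
    let mut mail = Mailbox {
        messages: vec![
            Message { to: 2, content: String::from("for two") },
            Message { to: 1, content: String::from("for one") },
        ],
    };
    let sat = CubeSat { id: 1 };
    println!("{:?}", mail.deliver(&sat)); // the message addressed to 1
    println!("{:?}", mail.deliver(&sat)); // None: nothing left for 1
}
```

Like the original, this version removes only the first matching message. Vec::remove() still shifts the later elements down, but here the iterator's borrow has already ended before any mutation happens.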
With that groundwork in place, we’re now able to fully implement the strategy laid out in figure 4.7. Listing 4.15 (ch4/ch4-short-lived-strategy.rs) is the full implementation of the short-lived variables game plan. The output from a compiled version of that listing follows:
```text
CubeSat { id: 1 }: Some(Message { to: 1, content: "hello" })
CubeSat { id: 2 }: Some(Message { to: 2, content: "hello" })
CubeSat { id: 3 }: Some(Message { to: 3, content: "hello" })
```
Listing 4.15 Implementing the short-lived variables strategy
```rust
#![allow(unused_variables)]

#[derive(Debug)]
struct CubeSat {
    id: u64,
}

#[derive(Debug)]
struct Mailbox {
    messages: Vec<Message>,
}

#[derive(Debug)]
struct Message {
    to: u64,
    content: String,
}

struct GroundStation {}

impl Mailbox {
    fn post(&mut self, msg: Message) {
        self.messages.push(msg);
    }

    fn deliver(&mut self, recipient: &CubeSat) -> Option<Message> {
        for i in 0..self.messages.len() {
            if self.messages[i].to == recipient.id {
                let msg = self.messages.remove(i);
                return Some(msg);
            }
        }

        None
    }
}

impl GroundStation {
    fn connect(&self, sat_id: u64) -> CubeSat {
        CubeSat {
            id: sat_id,
        }
    }

    fn send(&self, mailbox: &mut Mailbox, msg: Message) {
        mailbox.post(msg);
    }
}

impl CubeSat {
    fn recv(&self, mailbox: &mut Mailbox) -> Option<Message> {
        mailbox.deliver(self)
    }
}

fn fetch_sat_ids() -> Vec<u64> {
    vec![1,2,3]
}

fn main() {
    let mut mail = Mailbox { messages: vec![] };

    let base = GroundStation {};

    let sat_ids = fetch_sat_ids();

    for sat_id in sat_ids {
        let sat = base.connect(sat_id);
        let msg = Message { to: sat_id, content: String::from("hello") };
        base.send(&mut mail, msg);
    }

    let sat_ids = fetch_sat_ids();

    for sat_id in sat_ids {
        let sat = base.connect(sat_id);

        let msg = sat.recv(&mut mail);
        println!("{:?}: {:?}", sat, msg);
    }
}
```
4.5.3 Duplicate the value
Having a single owner for every object can mean significant up-front planning and/or refactoring of your software. As we saw in the previous section, it can be quite a lot of work to wriggle out of an early design decision.
One alternative to refactoring is to simply copy values. Doing this often is typically frowned upon, but it can be useful in a pinch. Primitive types like integers are a good example. They are so cheap for a CPU to duplicate that Rust always copies them in situations where ownership would otherwise be moved.
Types can opt into two modes of duplication: cloning and copying. Each mode is provided by a trait. Cloning is defined by std::clone::Clone, and the copying mode is defined by std::marker::Copy. Copy acts implicitly. Whenever ownership would otherwise be moved to an inner scope, the value is duplicated instead. (The bits of object a are replicated to create object b.) Clone acts explicitly. Types that implement Clone have a .clone() method that is permitted to do whatever it needs to do to create a new value. Table 4.2 outlines the major differences between the two modes.
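The distinction can be sketched in a few lines. SatId and Callsign are hypothetical types. The first holds only a u64 and can therefore derive Copy. The second holds a String, so it can only derive Clone:

```rust
// Hypothetical type: a u64 field is Copy, so the whole struct can derive Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SatId(u64);

// Hypothetical type: a String field is not Copy, so this struct can only derive Clone.
#[derive(Debug, Clone, PartialEq)]
struct Callsign(String);

fn main() {
    let a = SatId(7);
    let b = a; // implicit bitwise copy: `a` is still usable
    println!("{:?} {:?}", a, b);

    let c = Callsign(String::from("SAT-7"));
    let d = c.clone(); // explicit: allocates a second String on the heap
    println!("{:?} {:?}", c, d);

    let e = c; // no Copy, so this is a move; using `c` after this line would not compile
    println!("{:?}", e);
}
```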
Table 4.2 Distinguishing cloning from copying
So why do Rust programmers not always use Copy? There are three main reasons:
- The Copy trait implies that duplication has only a negligible performance impact. This is true for numbers but not for types that are arbitrarily large, such as String.
- Because Copy creates exact bitwise copies, it cannot correctly handle types that own data through a pointer. Naïvely copying a pointer that owns T, such as the one inside a String or Box<T>, would (attempt to) create a second owner of T. That would cause problems later on because there would be multiple attempts to free T as each copy is dropped.
- Some types overload the Clone trait. This is done to provide something similar to, yet different from, creating duplicates. For example, std::rc::Rc<T> uses Clone to create additional references when .clone() is called.
NOTE Throughout your time with Rust, you will normally see the std::clone::Clone and std::marker::Copy traits referred to simply as Clone and Copy. These are included in every crate's scope via the standard prelude.
Let's go back to our original example (listing 4.3), which caused the original movement issue. Here it is replicated for convenience, with sat_b and sat_c removed for brevity:
```rust
#[derive(Debug)]
struct CubeSat {
    id: u64,
}

#[derive(Debug)]
enum StatusMessage {
    Ok,
}

fn check_status(sat_id: CubeSat) -> StatusMessage {
    StatusMessage::Ok
}

fn main() {
    let sat_a = CubeSat { id: 0 };

    let a_status = check_status(sat_a);
    println!("a: {:?}", a_status);

    let a_status = check_status(sat_a);   ①
    println!("a: {:?}", a_status);
}
```
① The second call to check_status(sat_a) is the location of error.
At this early stage, our program consists of types whose fields themselves implement Copy. That's good because it means implementing Copy ourselves is fairly straightforward, as the following listing shows.
Listing 4.16 Deriving Copy for types made up of types that implement Copy
```rust
#[derive(Copy,Clone,Debug)]   ①
struct CubeSat {
    id: u64,
}

#[derive(Copy,Clone,Debug)]   ①
enum StatusMessage {
    Ok,
}
```
① #[derive(Copy,Clone,Debug)] tells the compiler to add an implementation of each of the traits.
The following listing shows how it's possible to implement Copy manually. The impl blocks are impressively terse.
Listing 4.17 Implementing the Copy trait manually
```rust
impl Copy for CubeSat { }

impl Copy for StatusMessage { }

impl Clone for CubeSat {   ①
    fn clone(&self) -> Self {
        CubeSat { id: self.id }   ②
    }
}

impl Clone for StatusMessage {
    fn clone(&self) -> Self {
        *self   ③
    }
}
```
① Implementing Copy requires an implementation of Clone.
② If desired, we can write out the creation of a new object ourselves…
③ …but often we can simply dereference self.
Now that we know how to implement them, let's put Clone and Copy to work. We've discussed that Copy is implicit. When ownership would otherwise move, such as during assignment and passing through function barriers, data is copied instead.
Clone requires an explicit call to .clone(). That's a useful marker in non-trivial cases, such as in listing 4.18, because it warns the programmer that the process may be expensive. You'll find the source for this listing in ch4/ch4-check-sats-clone-and-copy-traits.rs.
Listing 4.18 Using Clone and Copy
```rust
#[derive(Debug,Clone,Copy)]   ①
struct CubeSat {
    id: u64,
}

#[derive(Debug,Clone,Copy)]   ①
enum StatusMessage {
    Ok,
}

fn check_status(sat_id: CubeSat) -> StatusMessage {
    StatusMessage::Ok
}

fn main() {
    let sat_a = CubeSat { id: 0 };

    let a_status = check_status(sat_a.clone());   ②
    println!("a: {:?}", a_status.clone());   ②

    let a_status = check_status(sat_a);   ③
    println!("a: {:?}", a_status);   ③
}
```
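① Copy implies Clone, so we can use either trait later.
② Cloning each object is as easy as calling .clone().
③ Copy works implicitly: sat_a is duplicated rather than moved, so it can be passed by value again.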
4.5.4 Wrap data within specialty types
So far in this chapter, we have discussed Rust’s ownership system and ways to navigate the constraints it imposes. A final strategy that is quite common is to use wrapper types, which allow more flexibility than what is available by default. These, however, incur costs at runtime to ensure that Rust’s safety guarantees are maintained. Another way to phrase this is that Rust allows programmers to opt in to garbage collection.3
To explain the wrapper type strategy, let's introduce a wrapper type: std::rc::Rc. std::rc::Rc takes a type parameter T and is typically referred to as Rc<T>. Rc<T> reads as "R. C. of T" and stands for "a reference-counted value of type T." Rc<T> provides shared ownership of T. Shared ownership prevents T from being removed from memory until every owner is removed.
As indicated by the name, reference counting is used to track valid references. As each reference is created, an internal counter increases by one. When a reference is dropped, the count decreases by one. When the count hits zero, T is also dropped.
Wrapping T involves calling Rc::new(). The following listing, at ch4/ch4-rc-groundstation.rs, shows this approach.
Listing 4.19 Wrapping a user-defined type in Rc
```rust
use std::rc::Rc;   ①

#[derive(Debug)]
struct GroundStation {}

fn main() {
    let base = Rc::new(GroundStation {});   ②

    println!("{:?}", base);   ③
}
```
① The use keyword brings modules from the standard library into local scope.
② Wrapping involves enclosing the GroundStation instance in a call to Rc::new().
③ Rc<T> forwards Debug formatting to T, so this prints GroundStation.
Rc<T> implements Clone. Every call to base.clone() increments an internal counter. Every Drop decrements that counter. When the internal counter reaches zero, the original instance is freed.
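The counter can be observed with the standard Rc::strong_count() function. Here is a minimal sketch that reuses the empty GroundStation struct from listing 4.19. The count_history() function is a hypothetical helper that records the count after each step:

```rust
use std::rc::Rc;

struct GroundStation {}

// Hypothetical helper: records the strong count after each step.
fn count_history() -> Vec<usize> {
    let mut counts = Vec::new();

    let base = Rc::new(GroundStation {});
    counts.push(Rc::strong_count(&base)); // one owner

    let base_2 = base.clone(); // copies the pointer and bumps the count; GroundStation is not duplicated
    counts.push(Rc::strong_count(&base)); // two owners
    assert!(Rc::ptr_eq(&base, &base_2)); // both handles point at the same allocation

    drop(base_2); // dropping a handle decrements the count
    counts.push(Rc::strong_count(&base)); // back to one owner

    counts
} // `base` is dropped here; the count reaches zero and the GroundStation is freed

fn main() {
    println!("{:?}", count_history()); // prints [1, 2, 1]
}
```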
Rc<T> does not allow mutation. To permit that, we need to wrap our wrapper. Rc<RefCell<T>> is a type that can be used to perform interior mutability, which was first introduced at the end of chapter 3 in section 3.4.1. An object that has interior mutability presents an immutable façade while its internal values are being modified.
In the following example, we can modify the variable base even though it is marked as an immutable variable. You can visualize this by looking at the changes to the internal base.radio_freq:
```text
base: RefCell { value: GroundStation { radio_freq: 87.65 } }
base_2: GroundStation { radio_freq: 75.31 }
base: RefCell { value: GroundStation { radio_freq: 75.31 } }
base: RefCell { value: "<borrowed>" }   ①
base_3: GroundStation { radio_freq: 118.52000000000001 }
```
① value: “<borrowed>” indicates that base is mutably borrowed somewhere else and is no longer generally accessible.
The following listing, found at ch4/ch4-rc-refcell-groundstation.rs, uses Rc<RefCell<T>> to permit mutation within an object marked as immutable. Rc<RefCell<T>> incurs some additional runtime cost over Rc<T> while allowing shared read/write access to T.
Listing 4.20 Using Rc<RefCell<T>> to mutate an immutable object
```rust
use std::rc::Rc;
use std::cell::RefCell;

#[derive(Debug)]
struct GroundStation {
    radio_freq: f64 // MHz
}

fn main() {
    let base: Rc<RefCell<GroundStation>> = Rc::new(RefCell::new(
        GroundStation {
            radio_freq: 87.65
        }
    ));

    println!("base: {:?}", base);

    {   ①
        let mut base_2 = base.borrow_mut();
        base_2.radio_freq -= 12.34;
        println!("base_2: {:?}", base_2);
    }

    println!("base: {:?}", base);

    let mut base_3 = base.borrow_mut();
    base_3.radio_freq += 43.21;

    println!("base: {:?}", base);
    println!("base_3: {:?}", base_3);
}
```
① Introduces a new scope where base can be mutably borrowed
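RefCell<T> still enforces the borrowing rules. It checks them at run time rather than at compile time. The following sketch assumes the GroundStation type from listing 4.20. The borrow_states() function is a hypothetical helper that uses the standard try_borrow_mut() method to test whether a second mutable borrow would be allowed:

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct GroundStation {
    radio_freq: f64, // MHz
}

// Hypothetical helper: reports whether a second mutable borrow succeeds
// (a) while a first one is still alive and (b) after the first one is dropped.
fn borrow_states(cell: &RefCell<GroundStation>) -> (bool, bool) {
    let first = cell.borrow_mut();
    let while_active = cell.try_borrow_mut().is_ok();
    drop(first); // ends the first mutable borrow
    let after_release = cell.try_borrow_mut().is_ok();
    (while_active, after_release)
}

fn main() {
    let base = Rc::new(RefCell::new(GroundStation { radio_freq: 87.65 }));
    println!("{:?}", borrow_states(&base)); // prints (false, true)

    base.borrow_mut().radio_freq = 101.1; // this temporary borrow ends at the semicolon
    println!("{:?}", base.borrow());
}
```

Calling borrow_mut() during the same conflict would panic instead of returning an error. The same rule explains the "<borrowed>" line in the output of listing 4.20: while base_3 is alive, RefCell's Debug output reports the value as borrowed rather than panicking.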
There are two things to note from this example:
- Adding more functionality (e.g., reference-counting semantics rather than move semantics) to types by wrapping them in other types typically reduces their run-time performance.
- If implementing Clone would be prohibitively expensive, Rc<T> can be a handy alternative. This allows two places to "share" ownership.
NOTE Rc<T> is not thread-safe, and the compiler will refuse to send one to another thread. In multithreaded code, replace Rc<T> with Arc<T> and Rc<RefCell<T>> with Arc<Mutex<T>>. Arc stands for atomic reference counter.
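Here is a minimal multithreaded counterpart to the Rc<RefCell<T>> example. It assumes a hypothetical GroundStation with a messages_sent counter in place of a radio frequency, and send_from_threads() is likewise hypothetical. Each thread gets its own Arc handle and must lock the Mutex before changing the shared value:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical variant of GroundStation with an integer counter.
struct GroundStation {
    messages_sent: u32,
}

// Hypothetical helper: spawns `n` threads that each record one sent message.
fn send_from_threads(n: u32) -> u32 {
    let base = Arc::new(Mutex::new(GroundStation { messages_sent: 0 }));
    let mut handles = Vec::new();

    for _ in 0..n {
        let base = Arc::clone(&base); // atomically increments the reference count
        handles.push(thread::spawn(move || {
            let mut station = base.lock().unwrap(); // waits until no other thread holds the lock
            station.messages_sent += 1;
            // `station` (the lock guard) is dropped here, releasing the lock
        }));
    }

    for handle in handles {
        handle.join().unwrap(); // wait for every thread to finish
    }

    // Bind the result first so the lock guard is dropped before `base`.
    let total = base.lock().unwrap().messages_sent;
    total
}

fn main() {
    println!("messages sent: {}", send_from_threads(4)); // prints "messages sent: 4"
}
```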
Summary
- A value’s owner is responsible for cleaning up after that value when its lifetime ends.
- A value’s lifetime is the period when accessing that value is valid behavior. Attempting to access a value after its lifetime has expired leads to code that won’t compile.
- To borrow a value means to access that value.
- If you find that the borrow checker won’t allow your program to compile, several tactics are available to you. This often means that you will need to rethink the design of your program.
- Use shorter-lived values rather than values that stick around for a long time.
- Borrows can be read-only or read-write. Only one read-write borrow can exist at any one time.
- Duplicating a value can be a pragmatic way to break an impasse with the borrow checker. To duplicate a value, implement Clone or Copy.
- It's possible to opt in to reference counting semantics through Rc<T>.
Rc<T>
. - Rust supports a feature known as interior mutability, which enables types to present themselves as immutable even though their values can change over time.
1. Remember the phrase zero-cost abstractions? One of the ways this manifests is by not adding extra data around values within structs.
2. Within the Rust community, the term variable binding is preferred because it is more technically correct.
3. Garbage collection (often abbreviated as GC) is a strategy for memory management used by many programming languages, including Python and JavaScript, and all languages built on the JVM (Java, Scala, Kotlin) or the CLR (C#, F#).