Learning Rust

A Quick Lesson in Ownership

I'm learning rust

Ferris the Rustacean crab mascot of Rust

A note on format

This is a little stream of consciousness post about my experience working through the official intro to rust book. There's not much point to it other than to help me evaluate how I feel working with the language.

Getting going

I started out by installing rustup, but immediately hit a problem. I already had a version of rust installed on my system. It was an old one installed through homebrew. Nevermind that, I just needed to uninstall the brew version and install a newer one via the recommended approach.

Next, I installed rust-analyzer making sure to grab the VS Code extension which (at the time of writing this) is the recommended language server even though the RLS extension has many more downloads. There's been some sort of maintenance issue or something. I didn't look into it too much, but this is the community direction right now.

I found that rust-analyzer only works when you have a whole cargo project as the root of your workspace in VS Code though. It's a bummer that we can't have non-root rust projects and still get the benefit of the language server and all it's hover/intellisense/type goodies. I think there's a way to make this work with a rust-project.json file, but I didn't want to get sidetracked before even getting started.

Alright, with my toolchain all set up I was able to run cargo new hello-cargo and get a whole rust project bootstrapped, built, and run. On to the fun bits 🦀.

Feeling familiar

Everything feels familiar — generally old-hat for someone who has even a tiny amount of experience with C-like languages. There was noticeably less fiddling with types and memory right out of the gate, but I don't know at this point if I just haven't gotten into it enough yet.

fn main() {
  println!("Hello, world!")
}

Notice the ! signifying a Rust macro. I've been promised this will be explained in due time, but it's the only new-ish thing I've seen.

Ownership

Before we get started, a refresher on stack and heap from the book:

Both the stack and the heap are parts of memory available to your code to use at runtime, but they are structured in different ways. The stack stores values in the order it gets them and removes the values in the opposite order. This is referred to as last in, first out. Think of a stack of plates: when you add more plates, you put them on top of the pile, and when you need a plate, you take one off the top. Adding or removing plates from the middle or bottom wouldn't work as well! Adding data is called pushing onto the stack, and removing data is called popping off the stack. All data stored on the stack must have a known, fixed size. Data with an unknown size at compile time or a size that might change must be stored on the heap instead.

The heap is less organized: when you put data on the heap, you request a certain amount of space. The memory allocator finds an empty spot in the heap that is big enough, marks it as being in use, and returns a pointer, which is the address of that location. This process is called allocating on the heap and is sometimes abbreviated as just allocating. Pushing values onto the stack is not considered allocating. Because the pointer to the heap is a known, fixed size, you can store the pointer on the stack, but when you want the actual data, you must follow the pointer. Think of being seated at a restaurant. When you enter, you state the number of people in your group, and the staff finds an empty table that fits everyone and leads you there. If someone in your group comes late, they can ask where you've been seated to find you.

Furthermore, pushing to the stack is faster than allocating on the heap and popping from the stack is faster than accessing data from the heap.

The rationale behind of a large part of Rust's language design — and specifically its concept of memory ownership — is based on how the stack and heap work.

So how does ownership work exactly though?

Every value has a variable that's called its owner
There can only be one owner at a time
When the owner goes out of scope, the value will be dropped.

These simple rules create a rich, compile-time memory management strategy.

This ownership pattern is sometimes used in C++ under the name Resource Acquisition Is Initialization (RAII), which helps us answer the questions "Who is responsible for a given resource?" and "Who can access a resource and how?"

Simple scalars and pushing to the stack behave how you'd expect. Allocating a single variable pointing to the heap is also fairly trivial to reason about. The interesting pieces come when we start re-assigning variables that point to the heap and so on.

Consider the following:

let s1 = String::from("hello");
let s2 = s1;

println!("{}, world!", s1);

Typically, there would be two variables consisting of pointers that refer to the same memory on the heap. In Rust, this re-assignment invalidates s1 and it can no longer be used. The above code will cause an error. s2 becomes the sole owner of this memory allocation and our use-after-free class of problems are solved.

Moreover, Rust has only a few types of variables T, &mut T, and &T.

T is owned
&mut T is an exclusive borrowed, mutable reference that allows writes.
&T is a shared, borrowed, immutable reference that only allows reads.

Rust will check that you never have both writers and readers (&mut T and &T) or multiple writers (&mut T) at the same time. You can have multiple readers (&T) though, which makes sense. The compiler is requiring us to prove that you don't have data-races by requiring references not to have lifetimes that outlive the owner T. This also means that owners will always only free memory once. If we do need to have multiple writers, Rust is giving us strong guarantees of correctness by requiring us to use well-formed synchronization via a Mutex, Read/Write Lock, or similar.

Some deep, powerful things we get from this model:

Lots of concurrency bugs are impossible
No more use-after-free or double-free errors
Awesome enums (Algebraic Data Types!)
Explicit error-handling (the Result type)
No more null pointers (the Option type)
Pattern matching!

Rust has listened to the advances we've made in CS over the past 30 years and put developers first in its design.

Interesting tidbit: Rust uses non-lexical lifetimes meaning code like this

let mut s = String::from("hello");

let r1 = &s; // no problem
let r2 = &s; // no problem
println!("{} and {}", r1, r2);
// variables r1 and r2 will not be used after this point

let r3 = &mut s; // no problem
println!("{}", r3);

is valid and the fact that r1 and r2 won't be used and thus r3 is ok, can be statically verified by the compiler.

At this point I went down a little bit of a rabbit hole watching a great video by Jon Gjengset about mutexes and reader-writer locks

impl Mutex<T> {
  fn lock(&'mtx self) -> MutexGuard<'mtx, T> { ... }
}

impl MutexGuard<'mtx, T> {
  fn get(&'a mut self) -> &'a mut T { ... }
}

This implementation in rust says "as long as the Mutex is still alive and you have a lock, you can get a mutable reference to the thing inside the lock" It's mutable access to a T through shared reference to the Mutex

Could also use a Reader/Writer lock. This allows multiple readers at once XOR a single writer.

struct RWLock {
  r: Mutex<usize>,
  g: Mutex<()>,
}

impl RWLock {
  fn lock(&self) {
    // take a single exclusive lock first to check validity of getting read/write lock
    let mut b = self.r.lock();
    *b += 1;
    if *b == 1 { self.g.lock() }
    b.unlock(); // release the single exclusive lock
  }

  fn unlock(&self) {
    let mut b = self.r.lock();
    *b -= 1;
    if *b == 0 { self.g.unlock() }
    b.unlock();
  }
}

For long critical sections, that lock time becomes irrelevant, but short critical sections, it can be prohibitively expensive. So for DBs this doesn't work, but some fewer thread longer running things, we'd be good.

Enter unsafe Rust

Corro the unsafe Rusturchin

At its core, all unsafe Rust does is allow you to dereference a raw pointer.

Raw pointer to T is *mut T which is a pointer with no lifetime that can be turned into &mut T, but doing so is unsafe. You are now responsible for making sure there are no data-races. Now you know exactly which parts of your code to audit when you crash with a data-race.

Have two HashMaps, all the readers point to one, all the writers point to the other. Once the writer map decides to reveal the writes (after every write or some time frame or w/e), the pointers swap and the readers now look at the written map while the writers start writing to the old reader map. The writer then re-applies the writes to the old map to be consistent. There could be a reader who is still looking through the old map though. How to solve that?

Readers keep a counter of reads they have done, then the writer observes the counts of all readers and wait until they all tick up by one.

What if there's an inactive reader though?

The reader increments twice, once before the read and once after. Now we have another constraint. If the value is even, we know it's idle, therefore safe to ignore because it'd get a new pointer before the next read.

This scales linearly!!! over read threads.

This is the Noria DB

Go's concurrency is very easy but not safe and very easy to shoot yourself in the foot with.

Considering Rust

Another video about why you might want to consider Rust.

Best features of Rust (that some others don't have)

Modern language

Nice and efficient generics
Algebraic data types and pattern matching
Modern tooling

Safety by design

Pointers checked at compile-time
Thread-safety from types
No hidden states

Low-level control that gets out of your way

No GC or runtime
Control allocation and dispatch
Can write + wrap low-level code

Tooling

Dependency management
Standard tools included (formatters, tests, etc.)
Excellent support for macros

Asynchronous Code

Language support for writing asynchronous code
Choose your own runtime

Generics

struct MyVec<T> {
  // ...
}

impl<T> MyVec<T> {
  pub fn find<P>(&self, predicate: P) -> Option<&T>
  where
    P: Fn(&T) -> bool
  {
    for v in self {
      if predicate(v) {
        return Some(v)
      }
    }
    None
  }
}

This is compiled for all specific types in the actual code (zero-overhead abstraction). Also note that you cannot get back a null pointer because of the Option type.

ADTs and matching

Compiler will include exhaustiveness checking This is just like all the goodness of many functional languages.

// Option<MyType> is an enum that is either Some(MyType) or None
if let Some(f) = my_vec.find(|t| t >= 42) {
  /* found */
}

enum DecompressionResult {
  Finished { size: u32 }
  InputError(std::io::Error)
  OutputError(std::io::Error)
}

// this will not compile because it's not exhaustive
match decompress() {
  Finished { size } => { /* parsed successfully */ }
  InputError(e) if e.is_eof() => { /* got EOF */ }
  OutputError(e) => { /* output failed with error e */}
}

Modern Tooling

Compiler knows about tests and docs. Builtin testing, benchmarking, friendly errors, deep integration with package management and builds.

After about two weeks of solving puzzles and hacking with it, I can definitively say that I really like Rust. It's everything I've ever wanted out of a systems language. It's modern, fast, comfortable to use, easy to learn, and easy to use well. There are things about it that even make me think that it could be a better all-purpose language than something like Haskell or OCaml. I'm definitely going to continue working in it.

This post has been all over the place and certainly not a "blog post" in the traditional sense. Regardless, I had a good time writing and taking some public notes about my initial experiences with Rust.