Rust's Ownership Model

10 Nov 2016 - London, England

Preamble

Rust is a really interesting language playing in the same space as C/C++. Initially a pet project started by former Mozilla employee Graydon Hoare, it garnered internal support from the organisation and from Firefox developers frustrated by C++’s complexity and the burden of manual memory management. It’s fair to say that the core driver of Rust is memory safety paired with best-in-class performance.

I’ve been tracking Rust for a while and the language and ecosystem have begun to mature, so I wanted to spend some time investigating how it achieves this memory safety through its ownership model. What follows is a summary of the Rust Book section on ownership; hopefully others will find these notes useful.

Ownership

In Rust, variable bindings have ownership over the resource they are bound to:

fn aFunction() {
  let x = 10;
}

x is bound to the resource 10.
Primitives such as x are stored on the stack: the memory allocated for x holds the binary representation of 10.
If no type is declared, integers default to type i32, so 4 bytes of memory are allocated for the data.
When a binding goes out of scope, the bound resource(s) will be freed.
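To make the moment of clean-up visible, here is a minimal sketch that records when a value is dropped. The Noisy type, its log field and scope_demo are hypothetical names invented for this illustration; the only real machinery is the standard Drop trait, which Rust calls when a binding goes out of scope.

```rust
use std::cell::RefCell;

// Hypothetical type, used only to make the moment of clean-up observable.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Noisy<'a> {
    // Called automatically when a Noisy value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Returns the recorded events in the order they happened.
fn scope_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _x = Noisy { name: "x", log: &log };
        log.borrow_mut().push("end of inner scope".to_string());
    } // `_x` goes out of scope here, so its resource is freed
    log.borrow_mut().push("after inner scope".to_string());
    log.into_inner()
}

fn main() {
    for event in scope_demo() {
        println!("{}", event);
    }
}
```

The "dropped x" event lands between the other two, which is the point: the resource is released exactly where the scope closes, with no explicit free call.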

fn aFunction() {  
  let n = vec![1,2,3];
}  

For a non-primitive type such as the vector n, a vector object is allocated on the stack.
The elements of n are stored on the heap. The address of that heap data is kept in a ‘data pointer’ inside the vector object (n) on the stack, so n always has a reference to its data.
The vector object on the stack must always stay in sync with the data stored on the heap with regard to length, capacity etc.
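The split between the stack object and the heap data can be poked at with a few standard library calls. This is only a sketch: handle_words and push_and_inspect are names I made up, and the three-word size of the stack object (pointer, capacity, length) is how current compilers lay out Vec rather than something the language promises.

```rust
use std::mem::size_of;

// Size of the stack-side vector object, measured in machine words.
fn handle_words() -> usize {
    size_of::<Vec<i32>>() / size_of::<usize>()
}

// Pushes one element and reports (length, whether capacity still covers it).
fn push_and_inspect() -> (usize, bool) {
    let mut n = vec![1, 2, 3];
    n.push(4); // may reallocate on the heap; the stack object is updated to match
    (n.len(), n.capacity() >= n.len())
}

fn main() {
    let n = vec![1, 2, 3];
    println!("stack object: {} words", handle_words());
    println!(
        "len = {}, capacity = {}, heap data at {:p}",
        n.len(),
        n.capacity(),
        n.as_ptr()
    );
    println!("after push: {:?}", push_and_inspect());
}
```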

Moving values and taking ownership

fn aFunction() {
  let foo = vec![10];
  let bar = foo;
  // cannot use foo from this point!
}

If we initially bind foo to the vector [10], when we bind bar to foo, bar takes ownership of the underlying resource. The value has moved to bar.
There can only ever be one binding that owns a given resource, so foo can no longer be used after bar takes ownership, and Rust will generate a compiler error if we try to use foo.
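A small sketch of the same move, with the offending line left in as a comment so the snippet still compiles. The function name move_then_clone is mine; clone() is the standard way to ask for a second, independent copy when you really do want two owners of equal data.

```rust
// Returns the moved vector and an independent clone of the original.
fn move_then_clone() -> (Vec<i32>, Vec<i32>) {
    let foo = vec![10];
    let copy = foo.clone(); // explicit deep copy with its own heap allocation
    let bar = foo;          // move: ownership passes from foo to bar
    // println!("{:?}", foo); // uncommenting this fails with error E0382 (value moved)
    (bar, copy)
}

fn main() {
    let (bar, copy) = move_then_clone();
    println!("bar = {:?}, copy = {:?}", bar, copy);
}
```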

fn randomFunction(agesParam: Vec<i32>) {
  // does stuff
}

let ages = vec![10];
randomFunction(ages); // returns nothing, so there is no result to bind
// cannot use ages from this point!

If you pass a binding to a function, the function takes ownership of the resource, and by default we cannot use ages again outside of the function because it has moved to agesParam, which is declared by randomFunction.
On the stack, Rust makes a bitwise copy of ages into the new binding agesParam. Now ages and agesParam both have a data pointer to the same heap data.
If we then updated agesParam, the heap data would change but ages wouldn’t know about it. The stack would be out of sync, which could cause segmentation faults and attempts to access memory it shouldn’t.
This is why Rust forbids the use of a binding once its resource has been moved.
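One way to convince yourself that a move copies only the stack object, not the heap data, is to compare the data pointer before and after the call. A rough sketch follows; take_ownership and same_heap_data are hypothetical helpers, and the raw pointer is only compared, never dereferenced.

```rust
// Hypothetical helper: takes ownership and reports where the heap data lives.
fn take_ownership(ages_param: Vec<i32>) -> *const i32 {
    ages_param.as_ptr()
} // ages_param goes out of scope here and the heap data is freed

// True if the moved-in vector pointed at the same heap data as the original.
fn same_heap_data() -> bool {
    let ages = vec![10];
    let before = ages.as_ptr();
    let inside = take_ownership(ages); // ages is moved and unusable from here on
    before == inside
}

fn main() {
    println!("same heap data after move: {}", same_heap_data());
}
```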

Copy types

This behaviour is different for types that implement the Copy trait. The vector type used above does not (and, because it owns heap data, cannot) implement Copy, but primitives such as i32 do. For a Copy type, a full copy of the value is made when it is bound to a new binding, so we can continue to use foo in the example below:

fn aFunction() {
  let foo = 10;
  let bar = foo;
  // Can still use foo here..
}

If desired, it’s possible to make your own types implement the Copy trait, provided every field of the type is itself Copy.
Alternatively, for types that cannot be Copy, we can use Rust’s borrowing mechanism.
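As a sketch of opting in to Copy for your own type, the struct below derives it. Point and copy_point are names made up for the example; the derive only compiles because both fields are i32, which is itself Copy.

```rust
// Hypothetical type: every field is Copy, so the struct may derive Copy too.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Returns both bindings to show that the original survives the assignment.
fn copy_point() -> (Point, Point) {
    let foo = Point { x: 1, y: 2 };
    let bar = foo; // bitwise copy rather than a move
    (foo, bar)     // foo is still usable here
}

fn main() {
    let (foo, bar) = copy_point();
    println!("foo = {:?}, bar = {:?}, y = {}", foo, bar, foo.y);
}
```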

Borrowing

For types that do not implement the Copy trait, we’d be forced to hand ownership back by returning multiple values any time we moved a resource and wanted to keep using the original binding, e.g. let (v1, v2) = aFunction(v1);. This would quickly become impractical, so Rust solves it with the concept of Borrowing.
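To see why the hand-it-back approach gets tedious, here is a sketch of it with two vectors. The function sum_and_return and its behaviour (adding up the elements) are my own invention for illustration.

```rust
// Takes ownership of both vectors and hands them back alongside the result.
fn sum_and_return(v1: Vec<i32>, v2: Vec<i32>) -> (Vec<i32>, Vec<i32>, i32) {
    let total = v1.iter().sum::<i32>() + v2.iter().sum::<i32>();
    (v1, v2, total)
}

fn main() {
    let v1 = vec![1, 2, 3];
    let v2 = vec![4, 5, 6];
    // The caller has to rebind v1 and v2 just to keep using them.
    let (v1, v2, total) = sum_and_return(v1, v2);
    println!("{:?} {:?} {}", v1, v2, total);
}
```

Borrowing removes that boilerplate: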

fn aFunction(v1: &Vec<i32>, v2: &Vec<i32>) -> i32 {
  // do stuff with v1 and v2
  42 // return i32
}

let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3];

let answer = aFunction(&v1, &v2);
// we can use v1 and v2 here!

In the above example, aFunction receives references to two vectors. To write this we prefix the parameter types with an ampersand: &Vec<i32>. When calling the function we pass references to the bindings by prefixing each binding with an ampersand: v1 -> &v1.
This means that rather than taking ownership of those resources, aFunction borrows them.
Like bindings, references are immutable by default, meaning that we cannot change the data behind a borrowed value.
To mutate v1 within aFunction, the function must declare that it takes a mutable reference: fn aFunction(v1: &mut Vec<i32>) {}. In our main scope we would then declare v1 as mutable with let mut v1 = vec![1, 2, 3] and, below it, define a second scope where we call the function with a mutable reference: { aFunction(&mut v1) }.
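Putting the mutable-reference pieces together, a minimal sketch might look like this. push_answer and mutate_through_reference are hypothetical names; the syntax to note is &mut in both the parameter type and the call.

```rust
// Borrows the vector mutably and appends to it; ownership stays with the caller.
fn push_answer(v1: &mut Vec<i32>) {
    v1.push(42);
}

fn mutate_through_reference() -> Vec<i32> {
    let mut v1 = vec![1, 2, 3]; // the binding itself must be declared mutable
    {
        push_answer(&mut v1); // mutable borrow lasts only for this call
    }
    v1 // still owned here, now with the extra element
}

fn main() {
    println!("{:?}", mutate_through_reference());
}
```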

Borrowing Rules

A borrow cannot last longer than the scope of its owner.
You can only have one type of borrow at a given time for a resource. Either:
- Multiple Reader Borrows - n number of references (&T) to a resource.
- Single Write Borrow - exactly one mutable reference (&mut T) to a resource.
This prevents data races because only one binding can be used to write data to a resource at any one time.
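The two kinds of borrow can be sketched side by side. The function readers_then_writer is a made-up name, and the commented-out line marks where the compiler would object if a reader were added while the writer is still in use.

```rust
// Returns (sum seen by two readers, final value after one writer).
fn readers_then_writer() -> (i32, i32) {
    let mut x = 5;

    let sum = {
        let r1 = &x; // any number of shared (&T) borrows may coexist
        let r2 = &x;
        *r1 + *r2
    }; // both shared borrows end here

    {
        let w = &mut x; // exactly one mutable (&mut T) borrow
        // let r3 = &x; // error E0502: x is already mutably borrowed and w is used below
        *w += 1;
    }

    (sum, x)
}

fn main() {
    println!("{:?}", readers_then_writer());
}
```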
The key to seeing how long the borrow lasts for is understanding scope.

let mut x = 5;      // 1st scope
{           
    let y = &mut x; // -+ 2nd scope and &mut borrow start
    *y += 1;        //  |
}                   // -+ 2nd scope and &mut borrow end
println!("{}", x);  // OK to use x because y's borrow has ended

y takes a mutable borrow of the resource and goes out of scope when the curly braces end, allowing us to print x immediately afterwards.
If we removed the curly braces above, the compiler at the time of writing would report an error because y would still hold the mutable borrow when x is printed (there would be only one scope, so y would not yet have gone out of scope). Newer compilers, which implement non-lexical lifetimes, end the borrow at the last use of y and accept the same code.

Lifetimes

Rust’s ownership system and concept of lifetimes prevent dangling pointers - where a reference points to a resource that has been deallocated and freed from memory.
Every reference has a lifetime associated with it which can be implicit (called Lifetime Elision) but there are certain instances where an explicit declaration is required within the generic parameters (<>) of functions:

// implicit lifetime
fn foo(x: &i32) { ... }

// explicit lifetime - a reference to an i32 with a lifetime of a
fn bar<'a>(x: &'a i32) { ... }

You also need to declare lifetimes explicitly on structs that contain references, and on their impl blocks. We want to ensure that a Foo cannot outlive the i32 that its reference points to:

struct Foo<'a> {
    x: &'a i32,
}

impl<'a> Foo<'a> {
    fn x(&self) -> &'a i32 { self.x }
}

fn main() {
    let y = &5; // this is the same as `let _y = 5; let y = &_y;`
    let f = Foo { x: y };

    println!("x is: {}", f.x());
}
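One consequence of the x method returning &'a i32 rather than a reference tied to &self is that the returned reference may outlive the Foo it came from, as long as the underlying i32 is still alive. A minimal sketch, where value_outlives_wrapper is a hypothetical name:

```rust
struct Foo<'a> {
    x: &'a i32,
}

impl<'a> Foo<'a> {
    // The result carries the lifetime 'a of the i32, not the lifetime of `self`.
    fn x(&self) -> &'a i32 {
        self.x
    }
}

fn value_outlives_wrapper() -> i32 {
    let n = 5;
    let r;
    {
        let f = Foo { x: &n };
        r = f.x();
    } // f goes out of scope here, but n is still alive
    *r // so r is still valid
}

fn main() {
    println!("{}", value_outlives_wrapper());
}
```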

You can reuse a single lifetime for multiple references, meaning they must all be valid for that same scope. In the example below, x and y have different valid scopes, and the return value has the same lifetime as x.

fn x_or_y<'a, 'b>(x: &'a str, y: &'b str) -> &'a str { x }

We could think about lifetimes purely in terms of scopes but named lifetimes aid discussion as we can refer to a variable’s scope directly by name.
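To see the two named lifetimes doing some work, the sketch below calls an x_or_y with the same signature, where the second argument lives in a shorter scope than the first. The body (simply returning x) and the pick function are my assumptions for illustration.

```rust
// The result is tied to 'a only, so it may not borrow from the second argument.
fn x_or_y<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

fn pick() -> String {
    let x = String::from("long-lived");
    let result;
    {
        let y = String::from("short-lived");
        result = x_or_y(&x, &y);
    } // y is dropped here; result borrows only from x, so it stays valid
    result.to_string()
}

fn main() {
    println!("{}", pick());
}
```

Had both parameters shared one lifetime 'a, the compiler would have rejected pick, because result would then have been limited to the shorter scope of y.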
'static is a special lifetime which means that the reference will be valid for the entire duration of the program. String literals, for example, are baked into the data section of the binary.

let x: &'static str = "Hello, world.";

static FOO: i32 = 5;
let x: &'static i32 = &FOO;
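Because 'static references never expire, they can be returned from a function without borrowing from any argument. A small sketch, with greeting and foo_ref as made-up function names:

```rust
static FOO: i32 = 5;

// A string literal lives in the binary, so it can be returned as &'static str.
fn greeting() -> &'static str {
    "Hello, world."
}

// A reference to a static item is valid for the whole program as well.
fn foo_ref() -> &'static i32 {
    &FOO
}

fn main() {
    println!("{} {}", greeting(), foo_ref());
}
```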