2.16 Strings

Strings are an important part of any programming language, for without them there would be no way of storing text in a single variable. Although strings in Rust resemble those of other programming languages, they work slightly differently, because Rust has two main string types: String and the ‘string slice’, &str.

String slices

What is usually referred to in Rust as a ‘string slice’ has the type &str: a borrowed view into UTF-8 text that is stored somewhere else. The string slices you meet most often are string literals, which are statically allocated and live for the duration of the entire program. Their full type is &'static str, but it is more commonly written as just &str. A string literal always has the static lifetime, so there is no need to spell it out:

let a: &str = "Hello, world!"; // Automatically given type ‘&'static str’

println!("{}", a);

String slices cannot be edited once they have been set. A string literal is stored inside our compiled program, and a &str is only an immutable reference to that data. Whatever value the literal has, it exists unchanged for the duration of the entire program. It is, on the other hand, possible to reuse the same binding name and bind it to a new string slice.

Our binding a in the above example serves as a reference to the location of our string slice in memory. This is the reason string slices are written with a leading &: they are references (a pointer together with a length) to the text, rather than the text itself.
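Rebinding a name to a different string slice, as mentioned above, might look like the following sketch. The function name rebind_demo is made up for illustration only:

```rust
// `rebind_demo` is an illustrative name, not part of any library.
fn rebind_demo() -> (&'static str, &'static str) {
    let a: &str = "Hello, world!";
    let first = a; // copies the reference, not the text

    // Shadow `a` with a new binding that points at a different literal.
    // The first literal still exists, unchanged, inside the program.
    let a: &str = "Goodbye, world!";

    (first, a)
}

fn main() {
    let (first, second) = rebind_demo();
    println!("{}", first);
    println!("{}", second);
}
```

Neither literal was modified here: only the name a was pointed somewhere else.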

Composition

Any string slice is a sequence of Unicode scalar values, which are in turn encoded as UTF-8 bytes. This means that the length in bytes does not tell you the number of characters in a string slice, as one character may be made up of more than one byte.
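To make the difference between bytes and characters concrete, here is a small sketch comparing the two counts. The helper byte_and_char_len is a hypothetical name; len and chars are standard methods on string slices:

```rust
// Hypothetical helper: returns (length in bytes, number of characters).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len` counts UTF-8 bytes; `chars().count()` walks the string
    // and counts Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["Hello", "♐♍"].iter() {
        let (bytes, chars) = byte_and_char_len(s);
        println!("{}: {} bytes, {} characters", s, bytes, chars);
    }
}
```

For plain ASCII text the two numbers agree; for the two zodiac symbols they are 6 and 2, since each symbol takes three bytes.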

Strings

Compared to string slices, Rust’s String owns its data and allocates it on the heap, which means a String can grow and shrink in length. Like string slices, Strings are always encoded as UTF-8, which means we can store any Unicode character in them.

Creation

Creating a new string is rather simple:

let a = String::new();

This will create a new, empty string. It is also possible to create a string with an initial value, using another function located on String:

let a = String::from("Hello, world!");
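Because a String owns heap memory, it can grow after it has been created. A minimal sketch of that, using the standard push_str and push methods (the function name build_greeting is only illustrative):

```rust
// Illustrative helper: starts from an empty String and grows it.
fn build_greeting() -> String {
    let mut a = String::new(); // must be `mut` to be changed
    a.push_str("Hello"); // append a string slice
    a.push(','); // append a single character
    a.push_str(" world!");
    a
}

fn main() {
    let a = build_greeting();
    println!("{} ({} bytes)", a, a.len());
}
```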

Both of these ways of creating a String work well, but there is also a slightly different approach: converting a string slice to a String, rather than using the constructor functions located on String.

Conversion

Initializing a new String with a set value can also be done by calling a conversion method on a string slice. This way of creating strings is popular among Rust programmers:

let a = "Hello, world!".to_string();

println!("{}", a);

The to_string method is called on a string slice to convert it into a String.
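There are several standard ways to perform this conversion, and they all produce the same String. The sketch below puts them side by side; all_conversions is a hypothetical helper name:

```rust
// Hypothetical helper: converts one slice in four equivalent ways.
fn all_conversions(slice: &str) -> [String; 4] {
    [
        slice.to_string(),   // via the ToString trait
        String::from(slice), // via the From trait
        slice.to_owned(),    // via the ToOwned trait
        slice.into(),        // via Into, target type taken from the context
    ]
}

fn main() {
    for s in all_conversions("Hello, world!").iter() {
        println!("{}", s);
    }
}
```

Which one to use is mostly a matter of taste.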

Perhaps you would at some point wish to do the opposite and use a String in the form of its more primitive relative, the string slice. You can borrow a String as a string slice using &, provided the binding is annotated as &str. Without the annotation, the inferred type would be &String:

let a = String::from("Hello, world!");
let b: &str = &a; // Coerced from ‘&String’ to ‘&str’, with the same lifetime as a

println!("{}", b);

As this string slice b borrows from a, it can live no longer than a does. For that reason, the type of b is not &'static str.

This conversion relies on the compiler coercing a &String to a &str wherever a &str is expected, which includes arguments to functions that take a &str. The conversion can also be written out explicitly, by dereferencing the String and borrowing the result:

let a = String::from("Hello, world!");
let b = &*a;

Here b is a &str even without a type annotation, and can be passed to any function that takes a &str.
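As a sketch of how this plays out with functions, consider a function taking a &str. The function shout is a hypothetical example; as_str, the full-range slice and to_uppercase are standard:

```rust
// Hypothetical function that accepts any string slice.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let a = String::from("Hello, world!");

    // Three explicit ways of getting a &str out of a String.
    let b: &str = &*a;
    let c: &str = a.as_str();
    let d: &str = &a[..];
    println!("{} {} {}", shout(b), shout(c), shout(d));

    // Passing `&a` directly also works: the compiler coerces
    // the &String into a &str at the call site.
    println!("{}", shout(&a));
}
```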

Null-termination?

You may be used to other systems languages using a ‘null byte’ to terminate a string; this is not the case with Rust. A string in Rust keeps track of its own length, so null bytes are perfectly valid as part of a string. This holds for both of the string types Rust has to offer.
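A small sketch showing a null byte sitting in the middle of a string without cutting it short (with_nul is an illustrative name):

```rust
// Illustrative helper: a String with a null byte in the middle.
fn with_nul() -> String {
    String::from("ab\0cd")
}

fn main() {
    let s = with_nul();
    // The length covers all five bytes, including the null byte.
    println!("{} bytes", s.len());
    // Debug formatting escapes characters that are not printable.
    println!("{:?}", s);
}
```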

Indexing?

As mentioned earlier, since strings are valid UTF-8, one character in a string may be made up of multiple bytes. This means that we cannot use some traditional ways of indexing strings, for instance using [] with a single position. Such indexing is usually very fast, but since we cannot know in advance how many bytes each character in the string occupies, finding the character at a certain position requires walking over every character, starting from the beginning:

let a = "♐♍";

println!("{:?}", a.as_bytes());
println!("{:?}", a.chars());

Letting Rust do the job for us, a recent compiler will print the following (older versions show the raw bytes held by the Chars iterator on the second line instead):

[226, 153, 144, 226, 153, 141]
Chars(['♐', '♍'])

Both may be used in loops: the former will loop over each byte individually, whilst the latter will loop over each character. The latter, placed inside a loop, will thus be able to print out the two characters from inside the string.
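The two kinds of loop described above could be sketched like this. The helper chars_of is a made-up name that simply collects what the character loop visits:

```rust
// Made-up helper: gathers every character the loop visits.
fn chars_of(s: &str) -> Vec<char> {
    let mut out = Vec::new();
    for character in s.chars() {
        out.push(character);
    }
    out
}

fn main() {
    let a = "♐♍";

    // Six iterations: one per UTF-8 byte.
    for byte in a.bytes() {
        println!("{}", byte);
    }

    // Two iterations: one per character.
    for character in a.chars() {
        println!("{}", character);
    }

    println!("{:?}", chars_of(a));
}
```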

If we wanted the program to do the iteration for us and pick out the character at a certain position, we may use the nth method located on Chars:

let s = a.chars().nth(0);

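The s binding will now be bound to Some('♐'). The nth method returns an Option, since the string may hold fewer characters than the position asked for.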
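Since nth hands back an Option, the result has to be unpacked before the character can be used. A minimal sketch, with nth_char as a hypothetical wrapper:

```rust
// Hypothetical wrapper around chars().nth().
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let a = "♐♍";

    for position in 0..3 {
        match nth_char(a, position) {
            Some(c) => println!("position {}: {}", position, c),
            None => println!("position {}: nothing there", position),
        }
    }
}
```

Keep in mind that nth walks the string from the start every time, so it is not a cheap replacement for indexing.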

Slices

Remember from the section on primitive types how we created a sort of viewport into collections? You can do the same on a string slice:

let a = "Hello, world!";
let b = &a[1..2];

Whilst this works brilliantly for retrieving a viewport into the string slice (b is now "e"), there will be problems if the range cuts through a character that takes up more than one byte:

let a = "♐♍";
let b = &a[1..2];

This compiles, but panics at run time with a rather nasty error message (the exact wording differs between compiler versions):

thread 'main' panicked at 'index 1 and/or 2 in `♐♍` do not lie on character boundary'

The error occurs because the offsets we specified fall in the middle of a character. They are byte offsets rather than character offsets, and each of these characters takes up three bytes: the range 1..2 would cut a piece out of ♐, and half a character Rust truly does not like.
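One way of avoiding the panic, sketched below under the assumption that you would rather handle a bad range than crash, is the standard get method, which returns None for a range that does not fall on character boundaries. The helper first_chars is a hypothetical name; it uses char_indices to translate a character count into a byte offset:

```rust
// Hypothetical helper: the first `n` characters of `s` as a slice.
fn first_chars(s: &str, n: usize) -> &str {
    // char_indices yields (byte offset, character) pairs, so the
    // offset of the n-th character is a safe place to cut.
    match s.char_indices().nth(n) {
        Some((offset, _)) => &s[..offset],
        None => s, // fewer than n characters: return everything
    }
}

fn main() {
    let a = "♐♍";

    // `get` reports a bad range instead of panicking.
    println!("{:?}", a.get(1..2));
    println!("{:?}", a.get(0..3));

    println!("{}", first_chars(a, 1));
}
```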

Concatenation

In many languages, concatenating strings is as simple as adding a + in between the strings, and they turn out as a beautiful version of the two strings combined. Rust has a + for strings too, but with a restriction: the left-hand side must be a String and the right-hand side must be a string slice. The String on the left is moved into the operation, and the string slice is appended to the end of it:

let a = String::from("I am ");
let b = "broke!";
let c = a + b;

The above only works with a String on the left and a string slice on the right. Should you at some point want to add two Strings, you need to borrow the second one with &. The resulting &String is then turned into a string slice by what is known in the language as a ‘deref coercion’:

let a = String::from("I am ");
let b = String::from("broke!");
let c = a + &b;
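Something the examples above do not show is what happens to the bindings afterwards: the String on the left of + is moved and can no longer be used, while the borrowed one on the right stays valid. A short sketch of this, where join is an illustrative function name:

```rust
// Illustrative function: takes ownership of `left`, borrows `right`.
fn join(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let a = String::from("I am ");
    let b = String::from("broke!");

    // Cloning `a` first keeps the original usable afterwards.
    let c = a.clone() + &b;
    println!("{}", c);

    // Here `a` is moved into the call and may not be used again.
    // `b` was only borrowed, so it is still valid.
    let d = join(a, &b);
    println!("{}", d);
    println!("{}", b);

    // Appending in place with push_str avoids the move altogether.
    let mut e = String::from("I am ");
    e.push_str(&b);
    println!("{}", e);
}
```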

format!

The + operator works well when quickly wanting to concatenate two strings, but what if you had a whole ton of strings that need concatenation, perhaps even with numbers added into them? You can then make use of the format! macro:

let a = String::from("I am ");
let b = "almost ";
let c = "broke! I have $";
let d = 1;
let e = String::from(".");
let f = format!("{}{}{}{}{}", a, b, c, d, e);

println!("{}", f);

This will print:

I am almost broke! I have $1.

format! accepts the same formatting syntax as some other macros (print! for instance), but returns the result as a new String instead of printing it. This is discussed in depth in a couple of sections.
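As a small taste of that syntax, placeholders may also be given names, which helps when the same value appears more than once. This is only a sketch; the function balance_message and the name passed to it are made up for illustration:

```rust
// Made-up function: builds a message from a name and a number.
fn balance_message(name: &str, dollars: u32) -> String {
    // `who` is used twice but only passed once.
    format!(
        "{who} is almost broke! {who} has ${amount}.",
        who = name,
        amount = dollars
    )
}

fn main() {
    let message = balance_message("Alice", 1);
    println!("{}", message);
}
```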

Exercises

  • Make a function that converts a String into a string slice.
  • Create a function that concatenates three strings and a number into an utterly random message.

Conclusion

Rust strings are wonderful pieces of bytes: who would not want a string made up of UTF-8 encoded bytes? This is one of those features of Rust showing how modern Rust is as a language. That of course may very well not be the case a few thousand years from now, but better focus on this moment than what will happen in some hundred lifetimes.

Enums