10 ways to build applications faster with Amazon CodeWhisperer

Amazon CodeWhisperer is a powerful generative AI tool that gives me coding superpowers. Ever since I incorporated CodeWhisperer into my workflow, I have become faster, smarter, and even more delighted when building applications. However, learning to use any generative AI tool effectively requires a beginner’s mindset and a willingness to embrace new ways of working.

Best practices for tapping into CodeWhisperer’s power are still emerging. But, as an early explorer, I’ve discovered several techniques that have allowed me to get the most out of this amazing tool. In this article, I’m excited to share these techniques with you, using practical examples to illustrate just how CodeWhisperer can enhance your programming workflow. I’ll explore:

Typing less
Generating functions using code
Generating functions using comments
Generating classes
Implementing algorithms
Writing unit tests
Creating sample data
Simplifying regular expressions
Learning third-party code libraries faster
Documenting code

Before we begin

If you would like to try these techniques for yourself, you will need to use a code editor with the AWS Toolkit extension installed. VS Code, AWS Cloud9, and most editors from JetBrains will work. Refer to the CodeWhisperer “Getting Started” resources for setup instructions.

CodeWhisperer will present suggestions automatically as you type. If you aren’t presented with a suggestion, you can always manually trigger a suggestion using the Option + C (Mac) or Alt + C (Windows) shortcut. CodeWhisperer will also sometimes present you with multiple suggestions to choose from. You can press the → and ← keys to cycle through all available suggestions.

The suggestions CodeWhisperer offers are non-deterministic, which means you may receive slightly different suggestions than the ones shown in this article. If you receive a suggestion that doesn’t suit your needs, try cycling through all available suggestions using the arrow keys or try rephrasing any code comment prompts slightly. Even punctuation changes can produce fresh suggestions.

The examples in this article are written in JavaScript and TypeScript. However, the techniques I’ll demonstrate are equally suited to many of the programming languages supported by CodeWhisperer. Now, onto the examples!

Typing less

At the most basic level, CodeWhisperer can be used as a supercharged auto-complete, significantly reducing your typing when writing individual lines or blocks of code. To use CodeWhisperer in this way, just write code as you normally would. CodeWhisperer will make automatic suggestions as you type. Accept suggestions when they’re helpful, and ignore suggestions when they’re not. In my experience, using CodeWhisperer in this way alone reduces my typing by 50% or more.

Generating functions

Automatically generating function bodies from signatures

Skilled programmers use clear, descriptive names for their functions and parameters. If you follow this practice in your own programming, CodeWhisperer will have all the context it needs to implement many functions for you.

After I enter the function signature below, CodeWhisperer generates the function body for me.

User input (JavaScript):

function standardDeviation(numbers)

Result:

function standardDeviation(numbers) {
  let mean = numbers.reduce((a, b) => a + b) / numbers.length;
  let variance =
    numbers.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / numbers.length;
  return Math.sqrt(variance);
}

Generating functions from comments

Sometimes a good function signature alone isn’t enough to convey the desired behavior of a function. In those cases, you can use a detailed code comment as a prompt to CodeWhisperer. I recommend using your language’s doc comment delimiter when using this technique.

User Input (JavaScript):

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */

Result:

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */
function formatList(list) {
  return list
    .map((item) => {
      if (typeof item === "number") {
        return `#${item}`;
      } else {
        return `"${item}"`;
      }
    })
    .join(", ");
}

Generating classes

You can generate full or partial class implementations from a single code comment. The comment should describe the key properties and behaviors of the class. Simple descriptions are often all that is required.

User input (TypeScript):

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */

Result:

After I write the above block comment, CodeWhisperer provides multiple suggestions, including the following implementation, complete with documentation comments for each method!

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */
export class Folder {
  name: string;
  children: Folder[];
  parent?: Folder;

  constructor(name: string, parent?: Folder) {
    this.name = name;
    this.children = [];
    this.parent = parent;
  }

  /**
   * Add a child to the folder.
   * @param child The child to add.
   * @returns True if the child was added, false if it was already present.
   */
  addChild(child: Folder): boolean {
    if (this.children.includes(child)) {
      return false;
    }
    this.children.push(child);
    child.parent = this;
    return true;
  }

  /**
   * Remove a child from the folder.
   * @param child The child to remove.
   * @returns True if the child was removed, false if it wasn't present.
   */
  removeChild(child: Folder): boolean {
    const index = this.children.indexOf(child);
    if (index === -1) {
      return false;
    }
    this.children.splice(index, 1);
    child.parent = undefined;
    return true;
  }
}

Implementing algorithms

CodeWhisperer enables you to implement well-known algorithms with ease. Simply tell CodeWhisperer which algorithm you want to apply to a particular coding problem, and it will oblige.

User input (JavaScript):

const actors = [ … ];

// Sort the actors by age using the merge sort algorithm.

Result:

const actors = [ … ];

// Sort the actors by age using the merge sort algorithm.
function mergeSort(arr) {
  if (arr.length < 2) {
    return arr;
  }

  const middle = Math.floor(arr.length / 2);
  const left = arr.slice(0, middle);
  const right = arr.slice(middle);

  return merge(mergeSort(left), mergeSort(right));
}

function merge(left, right) {
  const result = [];

  while (left.length && right.length) {
    if (left[0].age < right[0].age) {
      result.push(left.shift());
    } else {
      result.push(right.shift());
    }
  }

  while (left.length) {
    result.push(left.shift());
  }

  while (right.length) {
    result.push(right.shift());
  }

  return result;
}

console.log(mergeSort(actors));

Writing unit tests

CodeWhisperer accelerates the implementation of unit tests. Not only is it able to implement the logic for most of the explicit unit tests I ask for, but it often suggests unit tests for edge cases that hadn’t even occurred to me.

When I want CodeWhisperer to implement a unit test, I start with a declaration of the condition I’m testing and the expected result as in the following example.

User input (JavaScript):

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate"

Result:

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate", () => {
    const folder = new Folder("root");
    const child = new Folder("child");
    folder.addChild(child);
    folder.addChild(child);
    expect(folder.children.length).toBe(1);
  });
});

After you’ve written a test or two, CodeWhisperer is likely to suggest more tests, even without explicit prompting.

Tip: You’ll notice that my test description includes 1) the function I’m testing, 2) the condition I’m testing, and 3) what I expect the result to be. Following this three-part pattern is a great way to ensure CodeWhisperer has the context it needs to implement high quality test logic.

Creating sample data

One of the delightful discoveries I’ve made is that CodeWhisperer can generate dummy data—or really any code that has a repeating pattern. As an illustration, for the following list I entered only the first “user” object as an example, and CodeWhisperer helpfully continued the pattern.

User input (JavaScript):

const fakeUsers = [
  { name: "User 1", id: "user1" },

Result:

const fakeUsers = [
  { name: "User 1", id: "user1" },
  { name: "User 2", id: "user2" },
  { name: "User 3", id: "user3" },
  { name: "User 4", id: "user4" },
  { name: "User 5", id: "user5" },
  { name: "User 6", id: "user6" }
]

It even works for generating random but plausible values like the names seen below.

User input (JavaScript):

const fakeUsers = [
  { name: "Sally", id: "user1" },

Result:

const fakeUsers = [
  { name: "Sally", id: "user1" },
  { name: "Bob", id: "user2" },
  { name: "Joe", id: "user3" },
  { name: "Jane", id: "user4" },
  { name: "Jill", id: "user5" },
  { name: "Jack", id: "user6" }
]

Simplifying regular expressions

While CodeWhisperer can’t translate natural language into custom regular expressions (yet), it does know a lot of commonly used regular expressions. This saves you time by putting these regular expressions at your fingertips whenever you need them.

Let’s look at one notoriously complex but common regular expression.

User input (JavaScript):

/**
 * Function that validates whether a string is a valid email address.
 */

Result:

/**
 * Function that validates whether a string is a valid email address.
 */
function validateEmail(email: string): boolean {
  const re =
    /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(String(email).toLowerCase());
}

Learning third-party code libraries faster

CodeWhisperer helps accelerate my learning when I’m using an unfamiliar library, module, or framework. In the example below, I needed to write a utility function that would take a gzip compressed value coming from an API and translate it into a value that the rest of my code could understand. I knew the “pako” library would help me, but I had no idea how to use it. I just prompted CodeWhisperer with an import statement and a comment describing my goal.

User input (JavaScript):

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */

Result:

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */
export function decodeGzipJson(base64String) {
  const compressed = atob(base64String);
  const uncompressed = pako.inflate(compressed, { to: "string" });
  return JSON.parse(uncompressed);
}

Documenting code

CodeWhisperer is capable of generating docstrings and comments for the code it generates, as well as for your existing code. For example, let’s say I want CodeWhisperer to document the matches() method of this FavoritesFilter TypeScript class I’ve implemented (I’ve omitted some implementation details for brevity).

class FavoritesFilter implements IAssetFilter {
  matches(asset: Asset): boolean {
    // ...
  }
}

I can just type a doc comment delimiter (/** */) immediately above the method name and CodeWhisperer will generate the body of the doc comment for me.

Note: When using CodeWhisperer in this way you may have to manually trigger a suggestion using Option + C (Mac) or Alt + C (Windows).

class FavoritesFilter implements IAssetFilter {
  /**
   * Determines whether the asset matches the filter.
   */
  matches(asset: Asset): boolean {
    // ...
  }
}

Conclusion

I hope the techniques above inspire ideas for how CodeWhisperer can make you a more productive coder. Install CodeWhisperer today to start using these time-saving techniques in your own projects. These examples only scratch the surface. As additional creative minds start applying CodeWhisperer to their daily workflows, I’m sure new techniques and best practices will continue to emerge. If you discover a novel approach that you find useful, post a comment to share what you’ve discovered. Perhaps your technique will make it into a future article and help others in the CodeWhisperer community enhance their superpowers.

Kris Schultz (he/him)

Kris Schultz has spent over 25 years bringing engaging user experiences to life by combining emerging technologies with world class design. In his role as 3D Specialist Solutions Architect, Kris helps customers leverage AWS services to power 3D applications of all sorts.

Building binary trees from inorder-depth lists

I ran into an interesting algorithm while hacking on Advent of Code a while ago. This post is a summary of what I’ve
learned.

Consider a binary tree that represents nested pairs, where each pair consists
of two kinds of elements: a number, or another pair. For example, the nested
pair ((6 9) ((3 4) 2)) is represented with this tree [1]:

(root)                      0
├── (pair)                  1
│   ├── 6                   2
│   └── 9                   2
└── (pair)                  1
    ├── (pair)              2
    │   ├── 3               3
    │   └── 4               3
    └── 2                   2

(ignore the numbers on the right for now, we’ll get to them shortly)

Trees representing such pairs have the following characteristics:

Leaves hold numbers, while internal nodes don’t hold numbers, but only
pointers to child nodes.
Each node in the tree has either 0 or 2 children.
A non-empty tree has at least one internal node.

While looking for alternative solutions to a
problem, I ran across Tim Visée's Rust
solution which uses an interesting representation of this tree. It’s represented
by an in-order traversal of the tree, with a list of (value depth) pairs
where value is a leaf value and depth is its depth in the tree. The
depth starts from 0 at the root – this is what the numbered lines in the diagram
above represent.

For our sample tree, the inorder-depth representation is as follows:

(6 2) (9 2) (3 3) (4 3) (2 2)

The surprising realization (at least for me) is that the original tree can
be reconstructed from this representation! Note that it’s just a list of leaf
values – the internal nodes are not specified. It’s well known that we can’t
reconstruct a tree just from its in-order traversal, but a combination of the
added depth markers and the restrictions on the tree make it possible.

I’ll present a recursive algorithm to reconstruct the tree (based on Tim Visée’s
code, which does not explicitly rebuild the tree but computes something on it);
this algorithm is very clever and isn’t easy to grok. Then, I’ll present an
iterative algorithm which IMHO is easier to understand and explain.

But first, let’s start with the data structures. The full (Go) code is available
on GitHub.

type DItem struct {
    Value int
    Depth int
}

type DList []DItem

This is our representation of the inorder-depth list – a slice of DItem
values, each of which has a numeric value and depth.

The tree itself is just what you’d expect in Go:

type Tree struct {
    Value       int
    Left, Right *Tree
}

Recursive algorithm

Here is the recursive version of the tree reconstruction algorithm:

func (dl DList) BuildTreeRec() *Tree {
    cursor := 0

    var builder func(depth int) *Tree
    builder = func(depth int) *Tree {
        if cursor >= len(dl) {
            return nil
        }

        var left *Tree
        if dl[cursor].Depth == depth {
            left = &Tree{Value: dl[cursor].Value}
            cursor++
        } else {
            left = builder(depth + 1)
        }

        var right *Tree
        if dl[cursor].Depth == depth {
            right = &Tree{Value: dl[cursor].Value}
            cursor++
        } else {
            right = builder(depth + 1)
        }
        return &Tree{Left: left, Right: right}
    }

    return builder(1)
}

I find this algorithm fairly tricky to understand; the combination of double
recursion with mutable state is powerful. Some tips:

cursor represents the next item in the inorder-depth list; it may help to
think of it as a queue: taking dl[cursor] and advancing cursor is akin to
popping from the head of the queue.
The depth parameter represents the depth in the tree the builder is
currently on. If the next item in the queue has a matching depth, we construct
a leaf from it. Otherwise, we recurse with higher depth to construct an
internal node starting from it.
The basic recursive invariant for builder is: the remaining items in
dl represent a pair: build its left side, then build its right side.

If it’s still not 100% clear, that’s OK. In what follows, I’ll describe an
alternative formulation of this algorithm – without recursion. IMHO this version
is easier to follow, and once one gets it – it’s also easier to understand the
recursive approach.
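To convince myself the recursion does what I just described, here is the same idea re-sketched in a compact, self-contained form. This is a Rust port for illustration only (the post's reference implementation is the Go code above), and it assumes a well-formed input list:

```rust
enum Tree {
    Leaf(i32),
    Pair(Box<Tree>, Box<Tree>),
}

// Build a tree from (value, depth) items, consuming them left to right.
// Mirrors the Go builder: if the next item's depth matches the current
// depth it becomes a leaf; otherwise we recurse one level deeper to
// build an internal node.
fn build(items: &[(i32, u32)], cursor: &mut usize, depth: u32) -> Tree {
    let left = if items[*cursor].1 == depth {
        let v = items[*cursor].0;
        *cursor += 1;
        Tree::Leaf(v)
    } else {
        build(items, cursor, depth + 1)
    };
    let right = if items[*cursor].1 == depth {
        let v = items[*cursor].0;
        *cursor += 1;
        Tree::Leaf(v)
    } else {
        build(items, cursor, depth + 1)
    };
    Tree::Pair(Box::new(left), Box::new(right))
}

// Collect leaf values in order, to check the round trip.
fn leaves(t: &Tree, out: &mut Vec<i32>) {
    match t {
        Tree::Leaf(v) => out.push(*v),
        Tree::Pair(l, r) => {
            leaves(l, out);
            leaves(r, out);
        }
    }
}

fn main() {
    // ((6 9) ((3 4) 2)) in inorder-depth form:
    let items = [(6, 2), (9, 2), (3, 3), (4, 3), (2, 2)];
    let mut cursor = 0;
    let t = build(&items, &mut cursor, 1);
    let mut out = Vec::new();
    leaves(&t, &mut out);
    assert_eq!(out, vec![6, 9, 3, 4, 2]); // all leaves, in the original order
    assert_eq!(cursor, items.len()); // every item was consumed
}
```

Reconstructing the leaf sequence from the built tree confirms that the depth markers alone are enough to disambiguate the structure.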

Iterative algorithm

To get some intuition for how the algorithm works, let's first work through the
example we've been using throughout this post. We'll take the inorder-depth
representation:

(6 2) (9 2) (3 3) (4 3) (2 2)

And will see how to construct a tree from it, step by step. In what follows, the
numbered list walks through inserting the first 6 child nodes into the tree,
and the steps correspond one-to-one to the diagrams below the list. Each step
of the algorithm inserts one node into the tree (either an internal node or
a leaf node with the value). The red “pointer” in the diagrams corresponds to
the node inserted by each step.

Let’s assume we begin with the root node already created.

1. To insert (6 2) we have to get to depth 2. The children of the root node
   would be at depth 1, so we have to create a new internal node first. Since
   the list is in-order, we create the left child first and move our pointer to
   it.
2. Now our current node's children are at depth 2, so we can insert (6 2).
   Since the current node has no left child, we insert 6 as its left child.
3. The next node to insert is (9 2). The node we've just inserted is a leaf,
   so we backtrack to its parent. Its children are at depth 2, and it has no
   right child, so we insert 9 as its right child.
4. The next node to insert is (3 3). The current node is a leaf so it can't
   have children; we climb up to the parent, which already has both its children
   links created. So we climb up again, to the root. The root has a left child,
   but no right child. We create the right child.
5. Since the current node's children are at depth 2, we can't insert (3 3) yet.
   The current node has no left child, so we create it and move into it.
6. The current node's children are at depth 3, so we can insert 3 as its left
   child.

And so on, until all the values are inserted.

The main thing to notice here is that the insertion follows a strict in-order.
We go left as far as possible, then backtrack through the parent and turn right.
How much is “possible” is determined by the depth markers in the representation,
so there’s actually no ambiguity [2].

Before we move on to the code, one important point about reaching a parent
from a given node. There are at least two common ways to do this: one is keeping
parent links in the nodes, and another is using a stack of parents while
constructing the tree. In the code shown below, I opt for the second option –
an explicit stack of parent nodes. This code can be easily rewritten with parent
links instead (try it as an exercise!)

With all that in place, the code shouldn’t be hard to understand; here it is,
with copious comments:

// BuildTree builds a Tree from a DList using an iterative algorithm.
func (dl DList) BuildTree() *Tree {
    if len(dl) == 0 {
        return nil
    }

    // result is the tree this function is building. The result pointer always
    // points at the root, so we can return it to the caller. t points to the
    // current node being constructed throughout the algorithm.
    result := &Tree{}
    t := result

    // depth is the current depth of t's children.
    depth := 1

    // stack of parent nodes to implement backtracking up the tree once we're
    // done with a subtree.
    var stack []*Tree

    // The outer loop iterates over all the items in a DList, inserting each one
    // into the tree. Loop invariant: all items preceding this item in dl have
    // already been inserted into the tree, and t points to the node where the
    // last insertion was made.
nextItem:
    for _, item := range dl {
        // The inner loop finds the right place for item in the tree and
        // performs the insertion.
        // Loop invariant: t points at the node where we're trying to insert,
        // depth is the depth of its children and stack holds a stack of t's
        // parents.
        for {
            // Check if item can be inserted as a child of t; this can be done
            // only if our depth matches the item's and t doesn't have both its
            // children yet. Otherwise, t is not the right place and we have to
            // keep looking.
            if item.Depth == depth && t.Left == nil {
                t.Left = &Tree{Value: item.Value}
                continue nextItem
            } else if item.Depth == depth && t.Right == nil {
                t.Right = &Tree{Value: item.Value}
                continue nextItem
            }

            // We can't insert at t.
            //  * If t does not have a left child yet, create it and repeat the
            //    loop with this left child as t.
            //  * If t does not have a right child yet, create it and repeat the
            //    loop with this right child as t.
            //  * If t has both children, we have to backtrack up the tree to
            //    t's parent.
            if t.Left == nil {
                stack = append(stack, t)
                t.Left = &Tree{}
                t = t.Left
                depth++
            } else if t.Right == nil {
                stack = append(stack, t)
                t.Right = &Tree{}
                t = t.Right
                depth++
            } else {
                // Pop from the stack to make t point to its parent
                t, stack = stack[len(stack)-1], stack[:len(stack)-1]
                depth--
            }
        }
    }

    return result
}

Final words

If you take some time to convince yourself that the iterative algorithm works,
it becomes easier to understand the recursive one… because it’s doing the
exact same thing! The loops are replaced by recursion; the explicit parent stack
is replaced by an implicit call stack of the recursive function, but otherwise –
it’s the same algorithm [3].

Finally, some credits are due. Thanks to my wife for helping me come up with the
iterative formulation of the algorithm. Thanks to Tim Visée for the inspiration
for this post.

[1]
Note that this is not a binary search tree; the order of values in the
leaves is entirely arbitrary.

[2]
One way the algorithm avoids ambiguity is by requiring that nodes in the
tree have either no children or two children. Nodes with one child would
confuse the algorithm; can you see why?

[3]
Here is an exercise: the code of the iterative algorithm is currently
structured to ease understanding, but what happens if we merge the two checks
of t.Left == nil, testing it in just one place and then either inserting (if
the depth matches) or continuing to look, and the same for t.Right? If you
make these changes the algorithm will still work (feel free to use the tests
in the accompanying code), and it starts resembling the recursive version
even more.


Rust data structures with circular references

To implement its safety guarantees, the Rust compiler keeps careful track
of ownership and references throughout a program. This makes writing certain
kinds of data structures challenging; in particular, data structures that have
circular references.

Let’s start with a simple binary tree:

struct Tree {
    root: Option<Node>,
}

struct Node {
    data: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

Since the Rust compiler must be able to calculate the size of a struct at
compile-time, left and right typically use a heap-allocated Box.
These boxes are wrapped in an Option because a node's left or right child
might be empty.

Now suppose we want to add a parent link to every node. This is useful for
certain tree structures; for example, in a binary search tree (BST), a parent link can
be used to efficiently find the successor of a node. How can we do that?

The “obvious” approach fails

We can’t just add a parent: Option<Box<Node>> field to Node, because
that would imply that a node owns its parent; this is clearly wrong. In fact,
our original Node definition already makes it clear that a parent owns its
children, not the other way around.

So we probably want to add a reference instead; a parent owns its children,
but a child may refer to a parent. That sounds right; let's try it:

struct Node {
    data: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
    parent: Option<&Node>,
}

Rust will refuse to compile this, asking for an explicit lifetime parameter.
When we store a reference in a struct field, Rust wants to know how the lifetime
of this reference relates to the lifetime of the struct itself. Fair enough, we
can do that:

struct Tree<'a> {
    root: Option<Node<'a>>,
}

struct Node<'a> {
    data: i32,
    left: Option<Box<Node<'a>>>,
    right: Option<Box<Node<'a>>>,
    parent: Option<&'a Node<'a>>,
}

Now the lifetime is explicit: we tell the compiler that the lifetime of the
parent reference is the same as the Node itself. Does it make sense?
Well, not really. Can’t a parent outlive its children? Based on our previous
discussions of ownership, it sounds like it can. So how can a reference in
a child assume the same lifetime? It can’t. And indeed, while Rust will compile
the above struct definitions, writing actual code using them will very quickly
run into an altercation with the borrow checker.

To be clear, Rust has no idea what “parent” or “child” means here – these are
just semantics we imbue into our code. But Rust sees a Node that has a
reference to another Node as its field, having the same lifetime as the
owning node. So it will not allow any code that may violate this lifetime
restriction, even if we know that the invariants are properly maintained at
run-time.

Now what?

Clearly, the “obvious” way doesn’t work. And indeed, thinking about it from
first principles, it shouldn’t. In this post I’m using BSTs with parent links as
a simple case study, but there are even more obvious (and perhaps more useful)
examples. Consider a graph data structure; a node has edges pointing to other
nodes, and two nodes can easily point to each other. Who owns whom? Since this
question cannot be answered at compile-time in the general case, it means we
can’t just use plain Rust references for these “points to” relations. We need to
be somewhat more clever.

Having tackled this issue over and over again, Rust programmers
have settled on three potential solutions:

Defer borrow checking to run-time, by using a
reference-counted pointer (std::rc::Rc) to a std::cell::RefCell.
Centralize the ownership (e.g. all nodes are owned by a vector of nodes
in the Tree), and then references become handles (indices into
the abovementioned vector).
Use raw pointers and unsafe blocks.

In this post, I’ll present each of these approaches, applied to implementing
a reasonably feature-complete BST with insertion, deletion, a “get successor”
method and inorder traversal. The full code for this post is available in
this repository; the post
only presents a few snippets of interest for each approach. Refer to the code
repository for the complete implementations with comments and extensive tests.

Run-time borrow checking with Rc and RefCell

This approach uses a combination of two data structures from the Rust standard
library:

std::rc::Rc is a reference-counted pointer, providing shared ownership of
heap-allocated data. Multiple instances of Rc can refer to the same
data; when all references are gone, the heap allocation is dropped [1]. This
is quite similar to a shared_ptr in C++.

Rc has a weak dual, std::rc::Weak; this represents a weak pointer
to data owned by some other Rc. While we can access the data through a
Weak, if only weak pointers remain the allocation will be dropped. In
C++ this would be a weak_ptr.

std::cell::RefCell is a mutable memory location with dynamically
checked borrow rules. RefCell allows us to take and pass around
references to heap data without the scrutiny of the borrow checker. However,
it’s still safe; all the same borrow rules are enforced by RefCell at
run-time.
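Before applying these types to the tree, it may help to see them interact in isolation. The following is a minimal, self-contained sketch (illustrative only, not part of the tree code):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

fn main() {
    // Two Rc handles share ownership of the same RefCell-wrapped value.
    let a: Rc<RefCell<i32>> = Rc::new(RefCell::new(1));
    let b = Rc::clone(&a);
    *b.borrow_mut() += 10; // mutate through one handle...
    assert_eq!(*a.borrow(), 11); // ...and observe through the other

    // A Weak pointer does not keep the allocation alive.
    let w: Weak<RefCell<i32>> = Rc::downgrade(&a);
    assert!(w.upgrade().is_some()); // still alive: a and b are strong owners
    drop(a);
    drop(b);
    assert!(w.upgrade().is_none()); // all strong owners are gone

    // Borrow rules are enforced at run-time: a second mutable borrow
    // while the first is still in scope is reported as an error.
    let c = Rc::new(RefCell::new(0));
    let guard = c.borrow_mut();
    assert!(c.try_borrow_mut().is_err());
    drop(guard);
}
```

The Weak behavior is exactly why parent links use it: the parent holds strong Rc ownership of its children, while each child holds only a Weak back-pointer, so no reference cycle keeps nodes alive forever.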

You can see the full code implementing this approach here.
In what follows, I’ll highlight some interesting parts.

This is how we can define our BST data structure [2]:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub struct Tree {
    count: usize,
    root: Option<NodeLink>,
}

type NodeLink = Rc<RefCell<Node>>;

#[derive(Debug)]
struct Node {
    data: i32,
    left: Option<NodeLink>,
    right: Option<NodeLink>,
    parent: Option<Weak<RefCell<Node>>>,
}

Owning "links" are represented by an Option<Rc<RefCell<Node>>>; non-owning
links are represented by an Option<Weak<RefCell<Node>>>. Let's look at some
representative code samples:

/// Insert a new item into the tree; returns `true` if the insertion
/// happened, and `false` if the given data was already present in the
/// tree.
pub fn insert(&mut self, data: i32) -> bool {
    if let Some(root) = &self.root {
        if !self.insert_at(root, data) {
            return false;
        }
    } else {
        self.root = Some(Node::new(data));
    }
    self.count += 1;
    true
}

// Insert a new item into the subtree rooted at `atnode`.
fn insert_at(&self, atnode: &NodeLink, data: i32) -> bool {
    let mut node = atnode.borrow_mut();
    if data == node.data {
        false
    } else if data < node.data {
        match &node.left {
            None => {
                let new_node = Node::new_with_parent(data, atnode);
                node.left = Some(new_node);
                true
            }
            Some(lnode) => self.insert_at(lnode, data),
        }
    } else {
        match &node.right {
            None => {
                let new_node = Node::new_with_parent(data, atnode);
                node.right = Some(new_node);
                true
            }
            Some(rnode) => self.insert_at(rnode, data),
        }
    }
}

For simplicity, the code samples in this post separate the operation on the
root into a top level function/method, which then calls a recursive method
that operates at the node level. In this case, insert_at takes a link
and inserts the new data as the child of that node. It preserves the BST
invariant (smaller implies left child, larger implies right child). The
interesting thing to note here is the borrow_mut() call at the very
beginning. It obtains a mutable reference from the RefCell pointed to by
atnode. But this isn’t just a regular Rust mutable reference, as in
&mut; instead, it’s a special type called std::cell::RefMut. This is
where the mutability magic happens – there is no &mut in sight, yet the
code can actually mutate the underlying data [3].

To reiterate, this code remains safe; if you attempt to do another
borrow_mut() on the RefCell while the previous RefMut is in scope,
you’ll get a run-time panic. The safety is guaranteed at run-time.

Another interesting example is the private find_node method, which finds
and returns a node with the given data, starting from some node:

/// Find the item in the tree; returns `true` iff the item is found.
pub fn find(&self, data: i32) -> bool {
    self.root
        .as_ref()
        .map_or(false, |root| self.find_node(root, data).is_some())
}

fn find_node(&self, fromnode: &NodeLink, data: i32) -> Option<NodeLink> {
    let node = fromnode.borrow();
    if node.data == data {
        Some(fromnode.clone())
    } else if data < node.data {
        node.left
            .as_ref()
            .and_then(|lnode| self.find_node(lnode, data))
    } else {
        node.right
            .as_ref()
            .and_then(|rnode| self.find_node(rnode, data))
    }
}

The .borrow() call at the beginning is how we ask a RefCell to provide
an immutable reference to the internal data (naturally, this cannot coexist at
run-time with any mutable references). When we return a node that was found, we
clone the Rc, because we need a separate shared owner for the node. This
lets Rust guarantee that the node cannot just be dropped while the returned
Rc is still alive.

As the full code sample demonstrates, this approach is workable. It takes a lot
of practice and patience to get right though, at least for inexperienced Rust
programmers. Since each node is wrapped in three levels of indirection
(Option, Rc and RefCell), writing the code can be somewhat tricky: at any
point you have to remember which level of indirection you're "currently in".

Another downside of this approach is that getting a plain reference to data
stored in the tree isn’t easy. As you can see in the sample above, the top-level
find method doesn’t return the node or its contents, but simply a boolean.
This isn’t great; for example, it makes the successor method sub-optimal.
The problem here is that with RefCell we can't just return a regular &
reference to the data, since the RefCell must keep track of all the borrows
at run-time. We can only return a std::cell::Ref, but this leaks
implementation details. This isn’t a fatal flaw, but just something to keep in
mind while writing code using these types.

Using handles into a vector as Node references

The second approach we’re going to discuss has the Tree owning all the nodes
created in it, using a simple Vec. Then, all node references become
“handles” – indices into this vector. Here are the data structures:

pub struct Tree {
    // All the nodes are owned by the `nodes` vector. Throughout the code, a
    // NodeHandle value of 0 means “none”.
    root: NodeHandle,
    nodes: Vec<Node>,
    count: usize,
}

type NodeHandle = usize;

#[derive(Debug)]
struct Node {
    data: i32,
    left: NodeHandle,
    right: NodeHandle,
    parent: NodeHandle,
}

Once again, the full code is available on GitHub;
here I’ll show some salient parts. Here’s insertion:

/// Insert a new item into the tree; returns `true` if the insertion
/// happened, and `false` if the given data was already present in the
/// tree.
pub fn insert(&mut self, data: i32) -> bool {
    if self.root == 0 {
        self.root = self.alloc_node(data, 0);
    } else if !self.insert_at(self.root, data) {
        return false;
    }
    self.count += 1;
    true
}

// Insert a new item into the subtree rooted at `atnode`.
fn insert_at(&mut self, atnode: NodeHandle, data: i32) -> bool {
    if data == self.nodes[atnode].data {
        false
    } else if data < self.nodes[atnode].data {
        if self.nodes[atnode].left == 0 {
            self.nodes[atnode].left = self.alloc_node(data, atnode);
            true
        } else {
            self.insert_at(self.nodes[atnode].left, data)
        }
    } else {
        if self.nodes[atnode].right == 0 {
            self.nodes[atnode].right = self.alloc_node(data, atnode);
            true
        } else {
            self.insert_at(self.nodes[atnode].right, data)
        }
    }
}

Where alloc_node is:

// Allocates a new node in the tree and returns its handle.
fn alloc_node(&mut self, data: i32, parent: NodeHandle) -> NodeHandle {
    self.nodes.push(Node::new_with_parent(data, parent));
    self.nodes.len() - 1
}

After writing code with Option<Rc<RefCell<…>>>, this handle approach feels
very simple. There are no layers of indirection; a handle is an index; a
reference to a node is a handle; handle 0 means “none”, that’s it [4].

This version is also likely to be much more efficient than the linked version,
because it has far fewer heap allocations and the single vector is very cache
friendly.

That said, there are some issues here as well.

First, we take some of the safety into our own hands. While this approach won’t
result in corrupted memory, double frees or accessing freed pointers, it
can lead to run-time panics and other problems because we deal in “raw” indices
to a vector. Due to bugs, these indices may point past the vector’s bounds,
or point at the wrong slot, etc. For example, nothing prevents us from modifying
a slot while there are “live handles” to it.

The other issue is removing tree nodes. Right now, the code “removes” a node
simply by not pointing to it with any live handle. This makes the node
unreachable through the tree’s methods, but it does not deallocate memory. In
fact, this BST implementation never deallocates anything:

// Replaces `node` with `r` in the tree, by replacing `node`’s parent’s
// left/right link to `node` with a link to `r`, and setting `r`’s parent
// link to `node`’s parent.
// Note that this code doesn’t actually deallocate anything. It just
// makes self.nodes[node] unused (in the sense that nothing points to
// it).
fn replace_node(&mut self, node: NodeHandle, r: NodeHandle) {
    let parent = self.nodes[node].parent;
    // Set the parent’s appropriate link to `r` instead of `node`.
    if parent != 0 {
        if self.nodes[parent].left == node {
            self.nodes[parent].left = r;
        } else if self.nodes[parent].right == node {
            self.nodes[parent].right = r;
        }
    } else {
        self.root = r;
    }
    // r’s parent is now node’s parent.
    if r != 0 {
        self.nodes[r].parent = parent;
    }
}

This is obviously wrong for real-world applications. At the very least, this
implementation can be improved by creating a “free list” of unused indices which
can be reused when nodes are added. A more ambitious approach could be to
implement a full-fledged garbage collector for the nodes. If you’re up for a
challenge, give it a try 😉

Using raw pointers and unsafe blocks

The third and final approach to discuss is using raw pointers and unsafe
blocks to implement our BST. This approach feels very familiar if you’re coming
from a C / C++ background. The full code is here.

pub struct Tree {
    count: usize,
    root: *mut Node,
}

#[derive(Debug)]
struct Node {
    data: i32,

    // Null pointer means “None” here; right.is_null() ==> no right child, etc.
    left: *mut Node,
    right: *mut Node,
    parent: *mut Node,
}

Node links become *mut Node, which is a raw pointer to a mutable Node.
Working with raw pointers is quite similar to writing C code, with a few
idiosyncrasies around allocating, deallocating and accessing data through these
pointers. Let’s start with allocation; here are the Node constructors:

impl Node {
    fn new(data: i32) -> *mut Self {
        Box::into_raw(Box::new(Self {
            data,
            left: std::ptr::null_mut(),
            right: std::ptr::null_mut(),
            parent: std::ptr::null_mut(),
        }))
    }

    fn new_with_parent(data: i32, parent: *mut Node) -> *mut Self {
        Box::into_raw(Box::new(Self {
            data,
            left: std::ptr::null_mut(),
            right: std::ptr::null_mut(),
            parent,
        }))
    }
}

The simplest way I found to allocate new memory for a raw pointer is using
Box::into_raw, and it works well as long as we remember that deallocating
this memory is on us from that point on (more on this later).

Let’s see how insertion works:

/// Insert a new item into the tree; returns `true` if the insertion
/// happened, and `false` if the given data was already present in the
/// tree.
pub fn insert(&mut self, data: i32) -> bool {
    if self.root.is_null() {
        self.root = Node::new(data);
    } else {
        if !insert_node(self.root, data) {
            return false;
        }
    }

    self.count += 1;
    true
}

// Inserts `data` into a new node at the `node` subtree.
fn insert_node(node: *mut Node, data: i32) -> bool {
    unsafe {
        if (*node).data == data {
            false
        } else if data < (*node).data {
            if (*node).left.is_null() {
                (*node).left = Node::new_with_parent(data, node);
                true
            } else {
                insert_node((*node).left, data)
            }
        } else {
            if (*node).right.is_null() {
                (*node).right = Node::new_with_parent(data, node);
                true
            } else {
                insert_node((*node).right, data)
            }
        }
    }
}

The interesting point to notice here is the unsafe block that the body of
insert_node is wrapped in. This is required because this code dereferences
raw pointers. In Rust it’s OK to assign pointers and pass them around
without special provisions, but dereferencing requires unsafe.

Let’s see how removing nodes works; here’s replace_node, which performs
the same task as the similarly named method we’ve seen in the node handle
approach:

// Replaces `node` with `r` in the tree, by replacing `node`’s parent’s
// left/right link to `node` with a link to `r`, and setting `r`’s parent
// link to `node`’s parent. `node` cannot be null.
fn replace_node(&mut self, node: *mut Node, r: *mut Node) {
    unsafe {
        let parent = (*node).parent;
        if parent.is_null() {
            // Removing the root node.
            self.root = r;
            if !r.is_null() {
                (*r).parent = std::ptr::null_mut();
            }
        } else {
            if !r.is_null() {
                (*r).parent = parent;
            }
            if (*parent).left == node {
                (*parent).left = r;
            } else if (*parent).right == node {
                (*parent).right = r;
            }
        }
        // node is unused now, so we can deallocate it by assigning it to
        // an owning Box that will be automatically dropped.
        drop(Box::from_raw(node));
    }
}

This demonstrates how we deallocate heap data through raw pointers: by using
Box::from_raw. This makes a Box that takes ownership of the data; a
Box has a destructor, so it will actually deallocate it when it goes out
of scope.

This brings us to an important point: we have to take care of releasing the
memory of the Tree now. Unlike in the previous approaches, the default
Drop implementation won’t do here, since the only thing contained in our
Tree is root: *mut Node and Rust has no idea how to “drop” that. If
we run our tests without implementing the Drop trait explicitly, there will
be memory leaks. Here’s a simple implementation of Drop to fix that:

impl Drop for Tree {
    fn drop(&mut self) {
        // Probably not the most efficient way to destroy the whole tree, but
        // it’s simple and it works 🙂
        while !self.root.is_null() {
            self.remove_node(self.root);
        }
    }
}

As an exercise, try to implement a more efficient Drop!

Writing the code with raw pointers felt fairly natural; while the final LOC
count is similar, the raw pointers required significantly less mental burden
than using Option<Rc<RefCell<Node>>> [5]. Though I didn’t benchmark it,
my hunch would be that the pointer version is more efficient as well; at the
very least, it eschews the dynamic borrow checks that RefCell does. The
flip side is, of course, the loss of safety. With the unsafe version we
can run into all the good-old C memory bugs.

Conclusion

The goal of this post was to review the different approaches one could take to
implement non-trivial linked data structures in Rust. It covers approaches with
three levels of safety:

Fully safe with Rc and RefCell.
Memory safe but otherwise more prone to bugs (such as aliasing borrows)
with integer handles into a Vec.
Fully unsafe with raw pointers.

All three approaches have their merits and are useful to know, IMHO. Anecdotal
evidence from Rust’s own standard library and some popular crates suggests that
the third approach – using raw pointers and unsafe – is quite popular.

Thanks for reading! Since this is my first non-trivial Rust post, I’m
particularly interested in any comments, feedback or suggestions. Feel free to
send me an email or leave a comment on whatever
aggregation site this gets reposted to – I keep an eye on these from time to
time.

[1]

Rc also has a thread-safe variant
called Arc, which uses atomics and is therefore slightly slower for
single-threaded applications. In this post I’ll just use Rc, but
switching to Arc would be easy.

[2]
For simplicity, the data type throughout this post is going to be just
an i32. In a realistic program all Node types would be generic.

[3]
If you’re wondering how RefMut achieves this feat – it’s by using
unsafe, of course.

[4]
To be fair, Rust’s support for Option is really good and I missed
it when coding the handle-based approach. Some code could certainly have
been a bit shorter if I had access to the equivalent of
let Some(xx) = ….

[5]
Though this can be attributed to my relative inexperience with
Rust and a long history of writing C and C++.

Learn to build applications with F#

You might be completely new to .NET, or a seasoned C# / VB.NET developer who wants to expand their horizons. Either way, F# is a great language to learn.

F# makes it easy to write succinct, robust, and performant code. It has a lightweight syntax that requires very little code to build software. It’s backed by a powerful type system, convenient standard library, and .NET runtime that you can trust to build mission-critical software that is correct, fast, and reliable.

If you’re interested in getting started, this is the perfect time – we’ve got a lot of fresh videos, courses, and more to help you get started today.

Getting started

If you’re just getting started with F#, we’ve got some great content to get you started!

Let’s Learn .NET: F#

Let’s Learn .NET is a monthly series of beginner friendly courses to teach you the basics. In Let’s Learn .NET: F#, Luis Quintanilla and Jayme Singleton walk through the basics of F# in two hours, with simple explanations and Q&A from live viewers.

Get a quick start with Microsoft Learn

You can get a quick start on F# fundamentals using the self-guided, interactive learning modules at Microsoft Learn.

The team has just launched a new F# Learning Path that makes it easy to learn the fundamentals.

Write your first F# code
Store and retrieve data by using F#
Manage execution flow in F#
Create and architect with functions in F#
Store and apply operations on list data in F#

Check out our updated Learn F# content list

We’ve updated the .NET Learning Center with lots of great learning resources for F#. Don’t miss the new Beginner’s Series to F#!

Explore the world of F# with .NET Conf: Focus on F#

.NET Conf: Focus on F# was broadcast on July 29, 2021, and included an amazing lineup of F# experts and content. 9 hours of presentations covering a broad range of F# content from industry leaders? It’s all here!

Go deeper with .NET Live TV interviews

.NET Live TV broadcasts five days a week, covering the latest .NET technologies, frameworks, and libraries. During July, a number of the shows covered F# topics, leading up to our .NET Conf: Focus on F# and Let’s Learn .NET: F# events. These shows include a lot of interesting deep dives into F# topics, with interviews and live Q&A with industry leaders.

The .NET Docs show – F#: Ask Me Anything
The .NET Docs show – F# – The Pit of Success
On .NET Live – Fun Functions for F# folks
On .NET Live – Exploring Spark and ML .NET with F#
On .NET Live – Let’s Talk Functional-First Programming!
ASP.NET Community Standup – Building ASP.NET Core apps in F#

Summary

It’s always great to expand your horizons as a programmer by learning a new language, and F# is a great choice for both practical application as well as just expanding the way you approach programming. I hope you’ll take advantage of some of this fresh content to learn F# and get started today!

The post Learn to build applications with F# appeared first on .NET Blog.