Basic Memory Management in Adamant

Dec 17, 2018 • Jeff Walker

The Adamant programming language uses compile-time memory management (CTMM). That is, it doesn’t use a garbage collector, nor does it require the programmer to do manual memory management. Instead, references are checked at compile time for safety, and the compiler determines when to free memory. There has been research on ideas like this, but there wasn’t a widely used, practical programming language with compile-time memory management until Rust. Its single owner model with borrowing has demonstrated the feasibility of CTMM. Adamant is improving on the ergonomics of the single owner model and hoping to provide additional options. Rust is a systems programming language where the focus is on relatively low-level details of the hardware. Adamant is a high-level language in the vein of C♯, Java, Scala, Kotlin, and Swift. Whenever possible, it abstracts away low-level details.

Benefits

Compile-time memory management has many advantages over manual memory management. These advantages are part of what drove the adoption of the Rust programming language. Hopefully, future systems programming languages will see this as an essential feature to include. For Adamant, however, the crucial question is: what are the benefits of its approach over a garbage collected language? As with anything, there will be tradeoffs, but for many situations, the benefits outweigh the costs. Let’s look at some of those benefits.

Unified Resource Management

CTMM uses the resource acquisition is initialization (RAII) idiom to unify the management of other computer resources with memory management. Resources such as file handles, network sockets, locks, and anything else with limited availability must be released when no longer needed. Garbage collected languages leave the management of these to the programmer. Object finalizers can be a backstop, but without guarantees about when or if they are run, the developer must be sure to close files and release resources as soon as they are no longer used. Several languages advocate the disposable pattern for this, but it is notoriously tricky and verbose, so much so that some languages (C♯ with using blocks, Java with try-with-resources) provide statements specifically to aid in resource cleanup. In Adamant, the same mechanisms that ensure safe, timely release of memory also handle these other resources. No additional patterns or constructs are needed. This eliminates bugs caused by failure to release resources.
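Rust, the one widely used CTMM language mentioned above, makes this concrete. Below is a minimal sketch of RAII using Rust's Drop trait, with a hypothetical FileHandle type standing in for a real resource; the cleanup runs at a statically known point, with no finalizer uncertainty:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource for illustration: its cleanup runs
// deterministically when the handle goes out of scope, via RAII
// (Rust's `Drop` trait plays the role Adamant unifies with CTMM).
struct FileHandle {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for FileHandle {
    fn drop(&mut self) {
        // Runs at a statically known point; nothing for the
        // programmer to remember to call.
        self.log.borrow_mut().push(format!("closed {}", self.name));
    }
}

fn run_session() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _file = FileHandle { name: "data.txt", log: Rc::clone(&log) };
        log.borrow_mut().push("using data.txt".to_string());
    } // `_file` goes out of scope: the compiler inserts the cleanup here
    log.borrow_mut().push("after close".to_string());
    Rc::try_unwrap(log).unwrap().into_inner()
}

fn main() {
    assert_eq!(
        run_session(),
        ["using data.txt", "closed data.txt", "after close"]
    );
    println!("ok");
}
```

Because release happens automatically when the handle leaves scope, there is no disposable pattern or cleanup statement to forget.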

Safety

The great promise of Rust is safety. In systems programming safety is a big deal. A lot of that safety in Rust comes from the single owner memory model. Garbage collectors provide memory safety, but that isn’t the only kind of safety that matters. Software bugs cost the world millions of dollars every year. We need new languages that help us write code with fewer bugs.

Safe Concurrency

The single owner memory management model isn’t just about memory safety. It also enables safe concurrency. Concurrent use of shared mutable state is the source of most concurrency issues. Mutable state that isn’t shared is fine. Immutable data that is shared is fine. When concurrent processes try to read and write shared mutable state, that is when bad things can happen. Many languages provide locking mechanisms developers can use, but locks are easy to misuse, and it is easy to create deadlock bugs accidentally. This is such a problem that languages built for concurrency, such as Erlang, often disallow shared mutable state entirely by mandating one particular concurrency strategy. Avoiding shared mutable state also provides some of the impetus for the growing popularity of functional languages. Adamant uses CTMM and mutability permissions to enable safe shared mutable state without locks. How is that possible? Well, it controls the sharing of mutable state between threads so that no state is ever mutable by one thread while other threads have access to it. So technically, it prevents the sharing. However, it is so easy to safely pass around ownership, borrow mutable access, and share immutable access to state that it enables fearless use of concurrency.
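Rust's existing single owner model shows how this works in practice. In the sketch below, ownership of a vector moves into a worker thread and comes back through the join handle, so no locks are needed and no two threads ever share mutable access:

```rust
use std::thread;

// Ownership of the vector moves into the spawned thread, so the
// compiler guarantees no other thread can touch it while it is being
// mutated; ownership then returns through the join handle. No locks.
fn process_on_worker(data: Vec<i32>) -> Vec<i32> {
    let handle = thread::spawn(move || {
        let mut data = data; // the worker now exclusively owns `data`
        data.push(4);
        data
    });
    handle.join().unwrap() // ownership comes back with the result
}

fn main() {
    let numbers = vec![1, 2, 3];
    // After the move below, using the old `numbers` binding would be
    // a compile-time error, not a data race at runtime.
    let numbers = process_on_worker(numbers);
    assert_eq!(numbers, vec![1, 2, 3, 4]);
    println!("ok");
}
```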

Safe Asynchronous Programming

The rise of web programming and changes in the performance characteristics of computers have led to more asynchronous programming. The async/await features pioneered by C♯, and since added to JavaScript, Scala, and Python, help support this. While asynchronous programming can be done in a single-threaded manner, the real potential lies in allowing asynchronous code to run on multiple threads. That means fine-grained concurrency across every asynchronous method call boundary. Managing that concurrency by hand is a recipe for bugs. This is a problem many asynchronous C♯ programs already have, even though it isn’t widely acknowledged. Safe, fearless concurrency also means fully unleashing safe asynchronous programming.

Safe Iteration

While less critical, there are other safety issues addressed by CTMM. As one example, consider the problem of iterator invalidation. While iterating over a collection, it is unsafe to modify that collection by adding or removing items. Both C♯ and Java check for this condition at runtime and throw an exception. Adamant detects it at compile time and reports an error. That means less time spent debugging, fewer runtime checks, and fewer failure modes.
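The Rust analogue shows the shape of the compile-time check. In the sketch below, the commented-out loop that mutates the collection mid-iteration would be rejected by the compiler; the accepted version computes the changes first and applies them afterward:

```rust
// Mutating a collection while iterating over it is rejected at compile
// time in this model; the safe pattern is to compute changes first,
// then apply them.
fn double_and_append(mut numbers: Vec<i32>) -> Vec<i32> {
    // This would NOT compile:
    // for n in &numbers {
    //     numbers.push(n * 2); // ERROR: cannot mutate `numbers`
    // }                        // while it is borrowed for iteration

    // Accepted version: the iteration borrow ends before the mutation.
    let doubled: Vec<i32> = numbers.iter().map(|n| n * 2).collect();
    numbers.extend(doubled);
    numbers
}

fn main() {
    assert_eq!(double_and_append(vec![1, 2, 3]), vec![1, 2, 3, 2, 4, 6]);
    println!("ok");
}
```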

Simplicity

Simplicity has long been a virtue among computer programmers. They’ve enshrined it in principles like “Keep it simple, stupid” (KISS), “You aren’t gonna need it” (YAGNI), and “Don’t repeat yourself” (DRY). Many wise programmers try to minimize the code running in their app. Doing so leads to improved maintainability, performance, and predictability, as well as fewer bugs. Yet every application relying on a garbage collector has a large amount of code running in the background to manage memory. It isn’t uncommon for there to be more lines of code in the garbage collector than in the program itself. Sure, that code has been carefully written, tuned, and debugged by teams of developers. Still, complaints of slow and unpredictable performance and memory usage are all too common. However well written the garbage collector is, the only bug-free code is the code that doesn’t exist. CTMM eliminates the garbage collector, thereby removing one more complex piece from the runtime system. It replaces it with a straightforward strategy for safe memory management.

Optimizable

Many developers don’t realize the performance impact of allocating memory. A heap allocation often has to acquire locks on shared heap management data structures. Then it must find an appropriate free space in which to allocate the memory. Compacting garbage collectors can improve this some. However, garbage collected languages encourage a programming style where most objects are allocated on the heap one by one, thereby maximizing the use of heap allocation. Sophisticated compilers perform escape analysis to determine which allocations never leave the scope of a function so they can be allocated on the stack. By contrast, the single owner memory model contains within it the information about which objects escape. Indeed, it may provide a fuller picture because the compiler’s escape analysis may not span enough functions to make a correct determination. More importantly, CTMM encourages a programming style where fewer objects escape the function in which they are allocated. This means more allocations can be optimized onto the stack for maximum performance.
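As a rough sketch of what “escaping” means here (in Rust, since this post doesn't specify Adamant's optimizer details): a value used only inside a function can live on the stack, while a value returned to the caller must be heap-allocated:

```rust
// `total` never leaves `sum_of_squares`, so it can live on the stack;
// the buffer returned by `make_buffer` outlives the call, so it must
// be heap-allocated (here, explicitly via `Box`).
fn sum_of_squares(n: u64) -> u64 {
    let mut total = 0; // stack-allocated; never escapes this function
    for i in 1..=n {
        total += i * i;
    }
    total
}

fn make_buffer() -> Box<[u8; 64]> {
    Box::new([0u8; 64]) // escapes to the caller, so heap-allocated
}

fn main() {
    assert_eq!(sum_of_squares(3), 14); // 1 + 4 + 9
    assert_eq!(make_buffer().len(), 64);
    println!("ok");
}
```

In a garbage collected language, both values would typically start on the heap, and it would be up to the compiler's escape analysis to prove the first one can be moved to the stack; with single ownership, that information is already in the types.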

Transparency

Closely related to the runtime simplicity of CTMM is its transparency. That is, it is easy for developers to see what memory management operations will be performed and where. Developers can easily predict the behavior of their code and modify that behavior when necessary.

Enjoyable

All of these benefits aren’t worth much if it is too much work and pain to write code in a language with CTMM. For that, let’s look at Rust, the one language with CTMM that has seen widespread use. Every year, Stack Overflow surveys its users on a variety of programming-related topics. In the 2018 Developer Survey, Rust was the most loved programming language, for the third year in a row. This is determined by the percentage of programmers working in a language who “expressed interest in continuing to develop with it.” There are many reasons why developers so appreciate Rust, but its memory management is part of it. As the Rust book puts it, “ownership is Rust’s most unique feature.” Whatever challenges the single owner model brings to programming in Rust, these developers still enjoy working in it, and Adamant improves on the ergonomics of CTMM in Rust by abstracting away more of the low-level details.

An Introduction

With those benefits, CTMM is worth considering. Let’s take a look at how it works in Adamant. This is an introduction to memory management in Adamant intended for programmers familiar with strongly typed object-oriented languages like those listed before. Knowledge of memory management in Rust might be helpful, but shouldn’t be necessary. This introduction only covers the parts of Adamant that have a parallel to Rust. They are an adaptation of the ideas in Rust to a high-level object-oriented language with reference types. The most radical departure is in how lifetime relationships are handled (though the Dyon scripting language independently invented something similar). Despite that, readers familiar with Rust might be best served by taking Adamant on its own terms at first and only later comparing it to Rust. Thinking about Adamant code in terms of the equivalent Rust code can be confusing and lead to incorrect intuitions.

Please keep in mind that:

  • Adamant is still under development so the syntax, especially around lifetimes, may change in the future.
  • This is an introduction; it does not cover all the features of CTMM in Adamant.
  • CTMM can be confusing to those unfamiliar with it but is relatively simple once understood. This post is by necessity brief and may not explain it in a way accessible to all readers. If something doesn’t make sense, please check back when there is more documentation available.
  • The features presented here are only a starting point for what might be possible.

Value Types and Reference Types

Before we can talk about compile-time memory management, we need to understand the distinction between value types and reference types. A value type is allocated directly in its variable. That is, it is allocated on the stack or in the declaring object. For value types, assignment typically means the value is copied.[1] A reference type is allocated separately from the variable. The variable is just a reference to the object, which is allocated on the heap. Reference types allow multiple variables to reference the same object. That way, an update to the object through one reference is visible from all other references to the object. For reference types, assignment copies the reference, not the object. After an assignment, there is one more variable that references the same object.
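The distinction can be sketched in Rust, where Rc<RefCell<...>> approximates the aliasing behavior of a garbage collected language's reference types:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Value semantics: assignment copies the value, so the two variables
// are independent afterward.
fn copy_semantics() -> (i32, i32) {
    let a = 5;
    let mut b = a; // copies the value
    b += 1;
    (a, b) // `a` is unchanged; `b` was an independent copy
}

// Reference semantics (approximated with Rc<RefCell<...>>):
// assignment copies the reference, so an update through one variable
// is visible through the other.
fn reference_semantics() -> Vec<i32> {
    let first = Rc::new(RefCell::new(vec![1, 2]));
    let second = Rc::clone(&first); // copies the reference, not the list
    second.borrow_mut().push(3);
    let result = first.borrow().clone(); // the update is visible via `first`
    result
}

fn main() {
    assert_eq!(copy_semantics(), (5, 6));
    assert_eq!(reference_semantics(), vec![1, 2, 3]);
    println!("ok");
}
```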

As with many languages, Adamant’s primitive number types are value types. Thus simple math operations work the same as they do in other languages. There is nothing new to learn and nothing extra to deal with.

public fn square(x: int) -> int
{
    return x * x;
}

In Adamant developers can declare new value types using structs. This is similar to C♯ structs, and Java is adding support for user-defined value types soon. Structs are useful for specific things, but most of the time, classes are used instead of structs.

public copy struct complex_number
{
    public let real: float;
    public let imaginary: float;

    // Constructor shorthand that initializes the fields
    public new(.real, .imaginary) { }
}

Although the primitive types are value types, most types in Adamant are reference types. Just like in other object-oriented languages all classes and traits are reference types (traits are akin to interfaces or protocols). Reference types are where CTMM differs from garbage collected memory management. They are the focus from here on.

Mutability

One more thing before we deal with memory management directly: in Adamant, both objects and variables are immutable by default. For an object to be mutable, its type must be declared with the mut keyword. To make a variable binding mutable, declare it with var instead of let. Note that object mutability and variable-binding mutability are independent.

let numbers1: List[int] = new List(); // a list of ints
numbers1.add(236); // ERROR: Can't mutate non-mutable list
numbers1 = new List(); // ERROR: Can't modify let bindings

let numbers2: mut List[int] = new List();
numbers2.add(89); // adds 89 to the list
numbers2 = new List(); // ERROR: Can't modify let bindings

var numbers3: List[int] = new List();
numbers3.add(89); // ERROR: Can't mutate non-mutable list
numbers3 = #[1, 2, 3]; // numbers3 refers to a list of 1, 2, 3

Ownership

With that background, we can look at memory management. In Adamant, every object has a single reference that owns it. The owning reference determines when it will be deleted. As soon as that reference goes out of scope, the object will be deleted. Ownership is part of the type of a variable and is declared with the special lifetime owned. We’ll see what lifetimes are as we go along. For now, think of it as an extra annotation on the type of the variable. Lifetimes are distinguished by the lifetime operator $.

public fn main(console: mut Console)
{
    let greeting: mut String_Builder$owned = new String_Builder();
    greeting.append("Hello");
    greeting.append(", World!");
    console.write_line(greeting.to_string());
}

In the example above, a new String_Builder is made and assigned to the variable greeting. Since greeting is the only variable referencing the string builder, it has to be the owner. The variable’s type reflects this. Declaring ownership all the time would be very tedious, but in Adamant, the type of local variables can be omitted, and the compiler will figure out the correct type. Even in cases where the type must be specified because the compiler can’t figure it out, the lifetime can often still be omitted. For example, an equivalent declaration is let greeting = mut new String_Builder();.

Transferring Ownership

Ownership of an object can be passed to and returned from functions and passed between variables. Passing ownership is done using the move keyword. Using a variable after a reference has been moved out of it is an error.

public fn append_testing(builder: mut String_Builder$owned)
    -> String_Builder$owned
{
    builder.append("testing...");
    return builder;
}

public fn main(console: mut Console)
{
    let x = mut new String_Builder();
    let y = mut append_testing(move x);
    x.append("not allowed"); // ERROR: reference moved out of x
    let z = move y; // Move ownership from y to z
    console.write_line(z.to_string());
}

In this example, the append_testing function receives ownership of the string builder, appends a string to it and then returns ownership of that string builder.

Borrowing

If passing an object to a function always required moving it and thereby losing ownership, that would be very annoying. Notice the append_testing function had to return the string builder so it could be used afterward. If it hadn’t, nothing further could have been done with the object. The solution to this is borrowing. Borrowing lets an object be used temporarily without changing which reference owns it. Borrowing is the default when passing and assigning references. With borrowing, the append_testing function is easier to write and use.

public fn append_testing(builder: mut String_Builder)
{
    builder.append("testing...");
}

public fn main(console: mut Console)
{
    let x = mut new String_Builder();
    append_testing(x);
    console.write_line(x.to_string());
}

Borrowing an object beyond when it is deleted isn’t allowed.

public fn borrow_too_long()
{
    let x: Book;
    {
        // y owns the book
        let y = new Book("The Fable of the Dragon-Tyrant");
        x = y; // x borrows the book
        // the book is deleted when y goes out of scope at `}`
    } // ERROR: object referenced by x borrowed too long
    x.read();
}

In the function above, the reference y owns the Book object. At the end of the block y is declared in, it goes out of scope and automatically deletes the Book object. However, x has borrowed that object and still holds a reference to it that is used later in x.read(). That call could crash when it tried to use the deleted Book object. Instead, the compiler tells us it is invalid. Ownership and borrowing thereby provide memory safety. In low-level languages, there are lots of ways to cause errors by misusing memory. In memory safe languages like C♯, Java, Rust, and Adamant, that isn’t possible.

Mutability when Borrowing

The borrowing rules provide not only memory safety, but also concurrency safety. They prevent data races. A data race occurs when multiple threads try to read and write the same place in memory without correct locking. Data races can result in nondeterministic behavior and reading invalid data. To prevent that, only one mutable borrow of an object can be active at a time. Alternatively, multiple immutable borrows are allowed if there is no mutable reference to the object. When an object is borrowed, the owning reference is temporarily restricted as well.

public fn add_pairs(x: List[int], y: List[int])
    -> List[int]$owned
{
    return x.zip(y, fn(a, b) => a + b).to_list();
}

// Takes an element from x and adds it to y
public fn move_element(x: mut List[int], y: mut List[int])
{
    let n = x.remove_at(0);
    y.add(n);
}

public fn main()
{
    let numbers: mut List[int] = #[1, 2, 3]; // initialize list

    // immutably borrow numbers twice. sum == #[2, 4, 6]
    let sum = mut add_pairs(numbers, numbers);

    // mutably borrow numbers and sum once each
    move_element(mut numbers, mut sum);

    // can't mutably borrow numbers twice
    move_element(mut numbers, mut numbers); // ERROR
}

The Forever Lifetime

Ownership and borrowing are sufficient for most functions. However, some values exist for the lifetime of the program and will never be deleted. String literals are the most common example. For values like that, there is the special lifetime forever. Borrowing from an object with the lifetime forever is the same as borrowing from an owned object.

public fn main(console: mut Console)
{
    let greeting: String$forever = "Hello, World!";
    console.write_line(greeting); // borrows the string
}

Fields

So far, we’ve seen how Adamant manages memory in functions. Now let’s look at how to manage memory in classes. In object hierarchies, the most common relationship between objects is composition (as opposed to aggregation). In composition, the child object can’t exist independent of the parent object, for example, a house (parent) is composed of its rooms (children). For this reason, the default lifetime for class fields is owned.

public class House
{
    public let driveway: Driveway; // implicit $owned
    public let rooms: List[Room]; // implicit List[Room]$owned
}

Lifetime Constraints

A lifetime is an abstract idea of how long an object or reference will exist while the program is running. The borrowing rules require that the lifetime of every borrowed reference be less than that of the object being borrowed. Sometimes objects are borrowed in more sophisticated ways, and it is necessary to spell out the relationship between the lifetimes of a function’s parameters and return value. We do this with lifetime relationship operators.

public class Car
{
    public let model_year: Year;
    public let tires: List[Tire]; // implicit List[Tire]$owned

    // the special parameter "self" is like "this"
    public fn oldest_tire(self) -> Tire$< self
    {
        return tires.order_by(fn(t) => t.replaced_on).first();
    }
}

The oldest_tire function returns the tire that was replaced longest ago. The car object still owns the tire; the oldest_tire function is returning a borrowed reference to that tire. The type Tire$< self is read “Tire with a lifetime less than self” and indicates the relationship between the lifetimes of the tire object and the car object. The compiler uses that to ensure memory and concurrency safety by preventing modification or deletion of the car object as long as the tire is borrowed. In this case, it isn’t necessary to spell out the lifetime relationships as we did. In Adamant, there are a few simple rules for lifetime elision that allow the omission of lifetimes in common cases. One of those cases is methods, for which the assumption is that the return value is borrowed from inside the object. So in this case, the function could be declared public fn oldest_tire(self) -> Tire. There are cases where that isn’t possible though.

public fn second_car(first: Car, second: Car) -> Car$< second
{
    first.drive();
    return second;
}

In the second_car function, it isn’t possible to leave out the lifetime relationship. Based on the parameters, it’s possible the returned car is related to either the first or second car. Rather than guessing which, the compiler requires that it be specified.

Lifetime Parameters

Occasionally, lifetime relationships exist that can’t be specified using just simple relationships. When that happens, we use lifetime parameters. A lifetime parameter provides an abstract lifetime that may not be the lifetime of any actual object but explains the relationship between the lifetimes of the various objects and references. Lifetime parameters are generic parameters prefixed with the lifetime operator. Like other generic parameters, their value can often be inferred when calling a function.

public fn newer_car[$a](c1: Car$> a, c2: Car$> a) -> Car$< a
{
    return if c1.model_year >= c2.model_year => c1
           else => c2;
}

public fn main() -> int
{
    let c17 = new Car(2017);
    let c18 = new Car(2018);
    let newer = newer_car(c17, c18); // lifetime $a is inferred
    return newer.model_year;
}

As you might guess from its name, the newer_car function returns the car with the newer model year. The borrowed reference must not outlive either car, since either could be returned. To express this, we introduce a lifetime parameter $a that represents a lifetime in between the lifetimes of the objects passed in and the borrowed reference returned. Note that this lifetime may, or may not, actually be equal to the lifetime of one of the cars or the returned reference.

Lifetime parameters can also be useful for putting borrowed references into the fields of objects.

public class Employee[$manager]
{
    public let name: String$forever;
    public let boss: Employee?$manager;

    public new(.name, .boss = none) { }
}

public fn main(console: mut Console)
{
    let boss = new Employee("Alice");
    let employee = new Employee("Bob", boss);
    console.write_line(
        "\(boss.name) is \(employee.name)'s boss.");
    // prints "Alice is Bob's boss."
}

Here we use a lifetime parameter to borrow a reference to an employee’s boss. Of course, some employees don’t have a boss (they are top dog) so the reference to the boss is optional as indicated by the ? after the type. When creating employee objects, their lifetimes will have to be less than the lifetime of the boss object (unless we were to assign them a new boss).

Remaining Cases

Experience with Rust shows that a significant amount of code can be written easily with nothing more than single ownership and borrowing. Explicit lifetime parameters and constraints are needed only occasionally. Adamant enables this approach in a way that is simpler and more familiar to many programmers. In Adamant, the majority of code requires only minimal annotations beyond what the same program would need in a garbage collected language. For example, in MVC-architecture web applications, most memory allocations are already scoped to the lifetime of the web request. Developers should quickly become proficient enough in Adamant that compile-time memory management is an insubstantial burden most of the time.

However, single ownership with borrowing does not support all use cases. It doesn’t support object graphs that can’t be represented as a tree or that include references to parent nodes. We need other approaches for these. Assuming something like 80% of cases are handled, we need a solution for the remaining 20%. It’s important to remember that a perfect compile-time solution isn’t required. If 90% to 95% of cases are handled at compile time, we can deal with the remainder manually or with reference counting. The Swift language has demonstrated the viability of an automatic reference counting approach. In future posts, I’ll discuss the problematic cases in more detail and look at some ideas for how to address them.

  1. Adamant also supports value types with move semantics. This is the default for structs in Rust. However, in Adamant, value types with move semantics are rare. 

Garbage Collection is a Hack

Nov 28, 2018 • Jeff Walker

At the dawn of the heap, there was manual memory management, and it was bad. Programmers were forced to keep track of when to free memory, and whether it had already been freed. That distracted from the task of writing domain logic. It made some software architectures nigh impossible, because tracking whether memory had been freed is infeasible in them. Predictably, it led to mistakes—the source of many memory leaks, bugs, and crashes. Worst of all were the numerous security vulnerabilities created.

In time, programmers developed regimented practices for managing memory. As a general rule, the allocator of a block of memory was responsible for freeing it. They carefully documented when functions returned memory the caller now owned. In some languages, they codified these rules into data types like “auto_ptr” in C++. These helped, but were insufficient.

It didn’t take long for programmers to think about automating the task of managing memory. Thus was born garbage collection (GC). There were early languages that used garbage collection, but for many years they were regarded as too slow and inefficient. These languages weren’t mainstream or widely used. A few languages offered GC as an option that could be enabled or used for a subset of allocations. One approach to GC, reference counting, often fit better into an existing manual management approach. Some languages provided it in the standard library. Yet, lack of language support limited safety and convenience. Often, programmers had to explicitly increase and decrease the reference count, leading to similar opportunities for mistakes.

Garbage collection was used for academic languages, or when trading performance for ease of programming. The latter situations included languages for scripting, beginners and rapid development. This remained the status quo for many years. As computers got faster, and GC techniques improved and matured, attitudes shifted. The Java programming language seems to have marked a turning point. Garbage collection was an essential part of Java’s virtual machine approach to safe “write once, run anywhere” software. As Java’s popularity grew, it demonstrated to a generation of programmers that computers were ready for the age of garbage collection. Performance was still sometimes an issue, but for the most part, it was good enough. Most languages designed since Java have included garbage collection. Attitudes have shifted so far in favor of GC, that now even “systems” programming languages like D and Go offer or require garbage collection.

The rapid, widespread adoption of garbage collection had several causes. Multiple orders of magnitude improvements in computer performance and memory capacity mitigated the performance issues. Improvements also came from better algorithms and tens of thousands of hours spent by excellent programmers tuning the implementations of garbage collectors. More importantly, the problem GC fixed was so painful that it was worth paying a high price to “fix” it. Remember, garbage collection has the following benefits:

  • Prevents Use After Free by making it impossible to access memory after freeing it.
  • Prevents Double Free since it’s not possible to free a block of memory repeatedly.
  • Significantly Reduces Memory Leaks by eliminating forgetting to free memory when it is no longer used.
  • Improves Programmer Productivity by allowing them to not worry about memory management and by increasing writability, thereby decreasing development time and costs.
  • Increases Flexibility in Software Architecture by promoting modular design. Explicit deallocation causes modules to be responsible for knowing whether other modules are interested in a particular object.
  • Simplifies Efficient Persistent Data Structures by removing the complexity of tracking which parts of the structure are still in use.
  • Simplifies Lock-free Data Structures because the algorithms for implementing them without GC are more complex.

With that list of benefits, it isn’t hard to see why garbage collection became the de facto strategy for memory management as soon as its performance was acceptable. Given that, how can I call garbage collection a hack?

A hack or kludge is a solution that works but is inefficient, inelegant, and hard to maintain. I’ve heard the claim that garbage collection is elegant because it works off the insight that the memory to be freed is exactly the memory that is unreachable. I disagree. Rather than finding a way to manage memory correctly without errors, we just pawned the problem off on the computer. We wrote complex pieces of software that use sophisticated algorithms to manage the fact that we have abandoned any attempt to manage memory. Millions of programs essentially have a second program, at least as complex, running in the background to clean up their memory. GC uses up computing time and requires more memory. Many teams have put countless hours into writing performant garbage collectors. Even after all that, how well they perform depends on the particular workload of the application. What could those teams of good programmers have accomplished for their languages and tools if they didn’t have to spend that time just trying to get a garbage collector that was acceptable?

Some claim that garbage collection is better than manual memory management because of other advantages. By relocating memory, it can avoid heap fragmentation. Under some conditions, GC outperforms naive manual memory management because it doesn’t have to free each object individually but only consider the live objects. Those are rationalizations though. If one had problems with heap fragmentation, would they create a garbage collector to solve it? No, instead they would try other heap management strategies or allow for object relocation. If one had problems with the performance of freeing objects, would they create a garbage collector to solve it? No, they would batch or delay deallocation. Indeed, region and pool memory allocators handle both of these problems.
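A region (or bump) allocator is easy to sketch. This toy Rust version is illustrative only: each allocation is a constant-time pointer bump, and the entire region is released in one step when it is dropped, which is exactly the batching described above:

```rust
// Toy bump ("region") allocator sketch. Real region allocators manage
// raw memory; this one hands out slices of an owned buffer to keep the
// example safe and self-contained.
struct Region {
    buf: Vec<u8>,
    used: usize,
}

impl Region {
    fn new(capacity: usize) -> Self {
        Region { buf: vec![0; capacity], used: 0 }
    }

    // Hand out `n` bytes by bumping an offset, or None if exhausted.
    fn alloc(&mut self, n: usize) -> Option<&mut [u8]> {
        if self.used + n > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used += n;
        Some(&mut self.buf[start..start + n])
    }
}

fn main() {
    let mut region = Region::new(16);
    assert!(region.alloc(8).is_some()); // O(1): just a pointer bump
    assert!(region.alloc(8).is_some());
    assert!(region.alloc(1).is_none()); // region exhausted
    println!("ok");
} // the whole region is freed here in a single deallocation
```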

If garbage collection is a hack, then what should we use instead? A real solution would be an automatic, correct way of managing memory that introduced little or no complexity at runtime. We haven’t found that solution yet. However, we have seen the first sign that it might exist. The Rust programming language uses restrictions on memory use and a compile-time analysis to ensure memory safety and automatically insert deallocations at the correct places in the code. However, there is room for improvement. In Rust, the restrictions on memory use can be too restrictive. Also, as a systems language, it isn’t appropriate for many programs. Finally, its strategy doesn’t work for all situations, and so it’s common for Rust programs to use reference counting for some of their memory.

We don’t have a solution today, but my work on a solution leads me to believe one is probably within reach. However, there are very few people working on this. It’s not one of the trendy problems that are attracting lots of work right now. We need more people focused on the problem and exploring many possible solutions. If you want to see an alternative to garbage collection, I encourage you to lend a hand in creating it. Get involved by sharing your thoughts and ideas about it. The only project I’m aware of that is trying to tackle this is my own, the Adamant programming language. Please reach out to me if you’d like to help out.

August 2018 Progress Report

Aug 31, 2018 • Jeff Walker

The Adamant language has been under development in one form or another since at least August 2016. However, it wasn’t until May 2018 that development picked up. In June 2018, additional hours began to be committed to development each week. In all that time, the language, compiler, and website progressed considerably. However, that progress was not documented or shared beyond the commits on the compiler and docs repositories. This is the first of what will hopefully be more frequent updates on the Adamant language’s progress. As such, it summarizes all the work to date.

Current Status

The Adamant language currently exists mostly as a set of design ideas. Many aspects of the design are documented, but other portions remain mostly undocumented. Generally, the portions that more closely follow the precedent set by other languages are the ones that are not well documented; someone familiar with C# should be able to infer how many of them work. For Adamant to be successful, it needs design, tools, and documentation. The current state of each of these is:

  • Language Design:

    The language design has been relatively stable. Semantically, the most significant change has been the introduction of more compile-time code execution. Over time, there has been some drift in the syntax from Adamant’s C family heritage. These changes have been made to improve readability and consistency and make good use of the easily typed characters. The rules for borrow checking variables and references were clarified, but there is still uncertainty about how object fields should be handled. Reference counting will be supported and should be sufficient if no further borrow checking rules are added. The areas of the language that still need better design and more thought are mainly asynchronous support, effect typing, error handling, compile-time code execution and advanced generic types.

  • Compiler:

    The current compiler is implemented in C# with .NET core. This is for rapid development and because C# still allows execution on multiple platforms. The back end emits C code that can be compiled natively using clang. The compiler supports only a minimum subset of the language. The focus has been on establishing the framework to support basic functionality like type checking and borrow checking.

  • Documentation:
    • Language Reference:

      The language reference replaces a previous introductory book on the language. The more succinct reference is easier to write, maintain, and update. Some portions of the language are nearly completely documented; others have only the most cursory coverage. There is a section of ideas that are being considered. In some cases, an idea has been accepted, but the reference has yet to be updated to reflect it.

    • Design Notes:

      The language reference, by necessity, leaves out information about why certain design decisions were made. The design notes describe some of these decisions and lay out design principles to be followed in the design of the rest of the language. Sections are added to this document as the issues are encountered during language design.

  • Website: adamant-lang.org

    The website has been up for some time now. Content is minimal, but it does serve as an entry point to learn about the language. The most useful section is probably the links to the docs. The landing page and tagline need to be improved to focus the marketing message.

  • Blog: blog.adamant-lang.org

    This post is the first on the blog. With the publishing of this post, the blog has been added as an item in the menu of the main page. Further design work is needed, but the design is sufficient for now. Planned topics include progress updates, language design theory, language critique and evaluation, and discussion of Adamant.

Deprecated Compiler Attempts

The Adamant language has gone through a number of attempts at writing a compiler before the current one was started. The goal was to find an approach that would be easy and not waste too much effort writing a bootstrap compiler that would be discarded after the language switched to a self-hosted compiler. These early compilers can be found on GitHub under the adamant-deprecated organization. The earlier attempts were:

  1. The Adamant Bootstrap Compiler was an attempt to create an Adamant-to-C# translator that did not perform type checking, borrow checking, or significant code transformations. This turned out not to be possible due to the significant semantic differences between the two languages.
  2. The Adamant Temporary Compiler was an attempt to create a full ecosystem of compiler generation tools in C# that could be easily translated to Adamant later. This was because Adamant does not have any of those tools, and they will be needed eventually. This approach was too much work and prevented progress on the language itself. It did not enable rapid prototyping and iteration.
  3. The Adamant Exploratory Compiler was an attempt to write a compiler for the full language in C# using the ANTLR 4 compiler tools. This approach was still found to be very heavyweight, as each feature required changes throughout many layers, and the tree produced directly by ANTLR was not ideal.
  4. The Pre-Adamant Translator was a brief attempt to create a Nanopass compiler framework for C# and use it to write a compiler for a subset of Adamant.
  5. The Pre-Adamant Compiler was to be a compiler in C# for a tiny subset of the Adamant language meant to minimize the time until a working compiler was complete.
  6. The adamant.tools.compiler was an attempt to skip directly to a self-hosting compiler through a minimal C++ bootstrap and a scheme of simple translation to C/C++.

After the failure of the first five attempts, it was decided that creating a self-hosting compiler directly would be more beneficial. This sixth attempt would let the developer gain experience working in a subset of Adamant from very early on, which would help inform the design of the language. It would also prevent wasted effort on a bootstrap compiler written in another language. This compiler started as some C++ performing very basic lexing and recursive descent parsing to translate Adamant to C++. It quickly reached the point where it could translate a subset of Adamant equivalent to the subset of C++ it was written in. The compiler code was rewritten in Adamant, and the compiler was “self-hosting.” However, it didn’t do most of what a compiler should do; rather, it was a simple translation to a subset of C++. This version of the compiler progressed rapidly at first, and eventually a transition was made from translating to C++ to translating to C. However, it reached a catch-22: functionality like type checking, borrow checking, classes, enums, and other abstraction facilities needed to be implemented, but the lack of those very features was making development incredibly difficult. It was eventually decided that there didn’t seem to be a way forward. That is when the current compiler was started.

Current Compiler

Taking the lessons from the previous attempts, the adamant.tools.compiler.bootstrap project was started. This compiler is written in C# but focuses on a small subset of Adamant using a minimalist, simple approach similar to the “self-hosting” compiler. It does not yet support most of the language features the last compiler did. Instead, the focus has been on laying the groundwork for important features like type checking and borrow checking rather than on filling out the range of core features like control flow, expressions, and primitive types.

August 2018

Progress on Adamant in August of 2018 consisted mostly of:

  • Refactoring the compiler to use a more attribute grammar based approach.
  • Realizing that some ideas for the borrow checker around object fields weren’t going to work.
  • Reading academic literature in search of a solution. A discussion of “ownership types” was found, which gets part of the way. However, the academic literature isn’t trying to solve the same problem as Adamant, and it wasn’t much help. This problem was set aside to make progress on other things. Even if this issue isn’t resolved, Adamant can still be a great language.
  • Learning the details of how the Rust borrow checker works.
  • Deciding details of how borrow checking would work in Adamant.
  • Researching the implementation of the Rust borrow checker.
  • Adding an intermediate representation (IR) layer to the compiler to aid in implementing borrow checking. This is similar to the MIR step in the Rust compiler.
  • Starting to implement borrow checking.

Next Steps

The next focus will be implementing basic borrow checking for variables, parameters, and functions.

EDIT 2018-11-15: Updated URL to Language Reference.