The Language Design Meta-Problem

Feb 7, 2019 • Jeff Walker

I don’t like the majority of programming languages available today. Computer programming is a relatively young field, and we still have a long way to go. We haven’t dealt with simple problems like null safety. There are lots of other challenging issues with programming languages. Even if we solved those problems, I think we would have much farther to go. Yet, we programmers spend our time on holy wars over matters such as dynamic versus static typing.

Even the few languages I think are pretty good have serious issues. Designing a programming language requires making thousands of design choices big and small. It is tough for even the best designer to avoid making a mistake somewhere along the way. Avoiding mistakes isn’t enough though. Languages are highly interconnected. There are a limited number of symbols available for syntax and using one for something means it isn’t available for other things. The semantics of language features can interact in unintuitive ways. Even when you understand the syntax and semantics, there are issues of praxis. How will the language influence how people write code?

Despite lots of work and plenty of advances, programming language designs fall short. We need to take language design to the next level somehow. I’ve thought about language design a lot as I’ve worked on the design and implementation of Adamant. This has led me to ask why we aren’t designing better languages already. The answer I’ve come up with is that there is another, deeper problem beyond the question of what would make an excellent programming language. There is a meta-problem behind programming language design that is holding us back.

Consider that Java and C# were forced to add generics to their languages later. Doing so severely constrained what they could support in both syntax and semantics. In Java’s case, it led to a style of generics that is far from ideal. C# was able to make more radical changes and support generics better. However, it still has messy libraries and language aspects that can’t be fixed or cleaned up. Now, Go has fallen into the same trap. It is trying to add generics in Go 2, and the existing language and codebase overly constrain the design.

As another example, in C# 8 Microsoft is planning to add “nullable reference types” to address what Tony Hoare called his “billion-dollar mistake.” Even though this change is one of the more radical breaks with the previous syntax ever made to the C# language, they still aren’t able to fully correct the problem. They can only offer compiler warnings, and there will be ongoing interop issues with all the existing libraries that don’t use the new language feature.
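For contrast, a language designed with null safety from the start encodes the possibility of absence in the type itself rather than in warnings. Here is a minimal sketch in Rust using its standard Option type, where the compiler refuses to let the empty case go unhandled:

// `name` may be absent, and that fact is part of its type. The compiler
// rejects any attempt to use the value without first handling the `None` case.
fn greet(name: Option<&str>) -> String {
    match name {
        Some(n) => format!("Hello, {}!", n),
        None => String::from("Hello, stranger!"),
    }
}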

The problem is that it is too hard for language designers to fix their mistakes. Once a language is in general use, it is nearly impossible to make significant changes to it. Often breaking changes are entirely off the table. Even if they aren’t, they are so costly that they must be rare and minimal. Additions are ok, but these must be carefully fit in between the existing syntax and semantics. The limitations on changes mean that designers don’t even consider vast sections of the language design space.

If language changes and fixes are so challenging, why don’t we language designers spend more time making sure the design is right in the first place? Why did version one of the Go language ship without generics? Well, it “did not seem essential to the language’s goals at the time, and so was left out for simplicity.” There are many other reasons features are left out or messed up in the first version of languages. Often there is pressure to get the language into the hands of users. Designers make tradeoffs. The final result is lots of bad languages with poor features piled on top of weak foundations.

I call this the language design meta-problem: language designers don’t have the time to create better designs before releasing the first version, yet they don’t have the flexibility to improve them later. This is the problem behind the problem. I know some people interested in designing better languages. I know of almost no one interested in addressing this meta-problem.

Having thought about this problem, I don’t have any solutions. However, I see three areas where an answer or at least improvements might lie. If we could enable radical changes to languages after they are in general use, that would almost entirely alleviate the problem. Barring that, we will have to attempt to improve the designs in their initial release. I see two approaches to that. If we could easily and rapidly prototype languages, designers could iterate on their language designs many more times before the initial release. Additionally, they could learn from the multitude of little languages that would flourish because of easy language creation. Alternatively, if we could change the circumstances under which languages are designed, we could give language designers much more time to develop better designs.

It isn’t clear how to enable radical changes to be made to programming languages after they are in general use. Developers still struggle to handle changes to package APIs even with approaches like semantic versioning. Language changes can have a significant impact on how code must be written. Incorporating a language change to a codebase may require restructuring the logic of the application. Changes like that won’t be dealt with by code migration tools. I do think that standard libraries should be broken up and treated as packages deployed via a package manager and managed with semantic versioning. That at least will help, but it isn’t nearly enough. We do at least have many good test cases for what kinds of language changes might be needed. Here are a few to get started:

  1. Removing unsafe null handling to provide null safety (C#, Swift, Kotlin)
  2. Adding generics (Java, C#, Go)
  3. Changing error handling strategies among checked exceptions, unchecked exceptions, error codes, error codes through multiple returns, and error values through data structures. Additionally, there is a recent trend to move from throwing references to exception objects to throwing value types. (C++ Zero-overhead deterministic exceptions, Project Midori)
  4. Adding or removing keywords (C#, Rust)
  5. Adding async/await (C#, Javascript)
  6. Changing concurrency strategies among threads, actors, and channels
  7. Adding reference permissions (like Pony or Project Midori)
  8. Fixing array subtyping (broken in C# and Java)
  9. Introducing function reference/pointer types
  10. Fixing the meaning of equality (issues in C# and Java)
  11. Limiting mutability
  12. Reforming strings (remove C style null-terminated, support Unicode)
  13. Adding or removing operators (++ and -- are considered by some to be mistakes in C#)
  14. Converting to partial-order operator precedence

Enabling easy and rapid prototyping of languages might be easier to tackle. Currently, creating a language means not just designing the language, but writing a compiler or interpreter and any other supporting tooling. That may include editors, IDEs, test runners, package managers, build systems and many other tools. The bare minimum needed to experiment with a language design is an interpreter or compiler. There are plenty of guides and tools for this. However, most of them help with lexing and parsing, acting as if the challenge ends there. Yet, the majority of the work is in implementing semantic analysis, including type checking, definite assignment, and other validations. There are also challenges with the backend. If the language doesn’t fit neatly into what is provided by LLVM or one of the language VMs like the JVM or CLR, there can be an enormous amount of work to implement code generation or garbage collection. Even relatively simple changes to the design of a language can cause cascading changes through the layers of a compiler or interpreter, meaning that any change involves significant work and takes a long time to complete. That severely limits how many designs can be explored beyond the thought experiment stage and holds back designers from even considering more radical changes.

An additional challenge is that many of the tools and VMs for language design subtly or overtly constrain the design of the language. The JVM and CLR impose their type systems and limitations on languages implemented on them. To some extent, it is possible to circumvent this by mapping language features to more complex structures in the intermediate language. However, in many respects, if the VM doesn’t support something, it just can’t be included in the language. Additionally, the desire for interop with the other languages on the platform often leads designers to further constrain their languages, resulting in somewhat cookie-cutter type systems.

We need a new set of tools for rapidly creating compilers and interpreters. Performance need not be a priority for these, but ideally, there would be a smooth path toward greater performance as the language matures and moves toward general release. Most importantly, we need tools that do not restrict the kinds of languages designers can make. Some time ago there was a move toward the creation of language workbenches. These promised to ease the creation of domain-specific languages (DSLs). Language workbenches haven’t lived up to their hype, but we have seen the development of a few intriguing tools. Unfortunately, many of these are too focused on DSLs with insufficient features for creating full languages. Some also suffer from a focus on lexing, parsing, and editing but do not assist in the creation of semantic analysis. However, there are signs of promising possibilities. The Spoofax Language Workbench is still under development and lacking in documentation, but it promises to provide end-to-end support for concrete syntax, abstract syntax, tree rewriting, semantic analysis, dataflow analysis, and dynamic semantics. Another project is the Racket language, which supports “language-oriented programming.” It includes support for lexing, parsing, and translating for execution. Unfortunately, it defaults to embedding those languages in Racket files with a #lang attribute in their header. It also encourages languages that can be translated to Racket, which is a Scheme-like language. Creating languages without the attribute that translate to machine code or other VMs may be possible, but it isn’t clear.

If it isn’t possible to create tools for rapid language prototyping, perhaps we can change the circumstances under which languages are designed to provide more design time. Even if we have language prototyping tools, it may be beneficial to provide more design time. Today most languages seem to be developed in one of three ways. They may be developed in academia, where most languages are as minimal as possible to demonstrate some publishable idea. There are no resources to create a large, fully featured language and the surrounding ecosystem of tools. Outside academia, languages are either created in the designer’s spare time as a personal project or sponsored by a large company to support its platform or projects. Creating a language in one’s free time may mean that the design process can continue for a long time. However, comparatively little design can be achieved over that time. The enormous workload the designer faces for making any change may discourage exploration of design alternatives. If the designer wants to see the language gain some measure of popularity in their lifetime, they must prioritize completing a version that can attract users. Languages sponsored by companies have vastly more resources behind them. However, those resources come with strings attached. The company’s incentive is usually to release a language as soon as possible. The goal is to create a language that is sufficient for the business purpose, not the language that will generate the most value for the programming profession and the world.

Providing other circumstances for language design will be challenging. We must somehow create an economic situation that provides longer-term funding for language design. If tools are available for language prototyping, this need not be much more than the money to pay the language designer for an extended period. If such tools are not available, then large development teams will need to be supported. Unfortunately, there is no way to profit from programming languages directly. Almost all are open source, and the few that aren’t have seen only very modest success. Selling tools connected to languages, like editors, IDEs, test runners, etc., provides low to zero profit. Furthermore, that revenue isn’t available until after the language has become successful. If languages had the potential for a significant payoff, we could explore something like a venture capital model. As is, we somehow need to create incentives for companies to incubate languages for longer. Additionally, it isn’t enough to create a language; it must be brought to market. Microsoft Research has produced some exciting research languages including Project Midori, Spec#, and most recently F*. However, none of these have been commercialized. Instead, they have had a modest impact on the future design of C#, which fails to deliver the better languages we need.

I worry that if we don’t solve the language design meta-problem, we will be stuck with lousy programming languages for the foreseeable future. Developers using the first version of Java released in 1996 felt the pain of the lack of null safety and complained about null reference exceptions. The C# language introduced nullable value types in Nov 2005. That showed a clear, workable design for what null safety could be in an OO language. However, it wasn’t until Sept 2014, 18 years after Java was first released, that Swift 1.0 shipped with proper null safety. There was in principle no reason that Java 1.0 couldn’t have had null safety. Indeed, Rust shows that a low-level C style language could have had null safety long before Java was released. If it took that long to solve our billion dollar mistake, how long will it take us to fix all the other problems?

A New Approach to Lifetimes in Adamant

Dec 26, 2018 • Jeff Walker

Adamant’s compile-time memory management allows the compiler to enforce memory safety. However, that requires the declaration of the sometimes complex relationships between references and values. Initially, this was done in a way similar to Rust’s lifetime parameters and annotations. However, lifetimes are one of the most confusing parts of Rust. In an attempt to make lifetimes in Adamant clearer, a syntax based around expressing the relationships between lifetimes was developed. That syntax was described in the “Basic Memory Management in Adamant” post. However, Jonathan Goodwin pointed out that the return types in the examples weren’t consistent with how lifetimes were defined and used throughout. We talked through what lifetime parameters mean in Rust and came to understand them better. By the end of that, I came up with a new approach to lifetimes in Adamant. Let’s look at that approach by working through examples drawn from the Rust Book.1

Lifetimes for Return Values

Inside function bodies, the compiler can automatically deduce and check the relationships between references and values. In function signatures, one must declare the relationships. For simple relationships between parameters and return values, lifetime elision rules provide defaults, so the programmer doesn’t need to do this. When the defaults aren’t enough, the connections must be explicitly declared. An example of such a function is one that takes two strings and returns the longer of the two.2 In Rust, this requires a lifetime parameter and lifetime annotations on all the references involved. However, that leads to confusion and questions about what the lifetime parameter means and how it relates to the references.

// A correctly annotated longest function in Rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

The constraints on the references and values in the longest function are quite complicated. However, they are the result of avoiding dangling references and of how the parameters are used to create the return value. So instead of directly trying to express those constraints, we could state how the parameters are related to the return value. In all but the most complicated situations that require lifetime parameters, the only thing that needs to be stated is which parameters are related to the return value. The longest function requires annotations because both of its parameters are related to the return value. Why is that the case? In this function, either parameter could be returned. Either parameter could go into creating the return value. That is what we need to express.

In the mathematical sense, functions take their parameters and transform them into a return value. Sometimes though, a parameter may not be used, may be used only in side-effecting code or may only indirectly be used to create the return value. For example, a parameter might be used only as part of a condition, after which it is no longer needed. Thus, we can’t just assume every parameter constrains the lifetime of the return value. We need to declare not only the parameters but which lifetimes go into producing the return value. That is precisely what the new approach does.
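As a rough Rust sketch of that situation (the function and names here are hypothetical, not drawn from the Rust Book), the second parameter is only inspected in a condition, so its lifetime places no constraint on the returned reference:

// Only `x` can flow into the return value; `threshold` is consumed by the
// length comparison, so the returned reference borrows from `x` alone.
fn pick_if_longer<'a>(x: &'a str, threshold: &str) -> &'a str {
    if x.len() > threshold.len() { x } else { "" }
}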

public fn longest(x: String, y: String) $x & $y -> String
{
    return if x.grapheme_count() > y.grapheme_count()
                => x
           else
                => y;
}

The $x & $y clause expresses the relationship of the parameters’ lifetimes to the return value. When applied to a variable, the $ should be read as “the lifetime of”. This refers to the lifetime of the referent (i.e., the object), not of the references themselves. The & is the operator for constructing intersection types and is read “and”. That is, T1 & T2 is the type of all values that implement both T1 and T2. Thus the code means “the lifetime of x and the lifetime of y”. The placement of these to the left of the arrow is meant to indicate that, just like the parameters to the function, these two lifetimes go into creating the return value. Indeed, they are like parameters in that the lifetimes of x and y will be different at each call site. This syntax captures the idea that either parameter could go into the return value. It does not make a direct statement about the various constraints placed on the lifetimes of the references involved. Consequently, it is much easier to reason about and validate as correct.

This syntax requires the repetition of the parameter names for the lifetime declaration. Some may find this too verbose. So, further shorthand syntax could be added. For example, a dollar sign after the parameter name might indicate that both the parameter’s value and its lifetime are used. The example above would then be declared fn longest(x$: String, y$: String) -> String. To keep early versions of the language simple, no shorthand is currently supported.

Of course, more complex relationships between the parameter lifetimes and return value are possible. For those situations, the arrow can be used to indicate which lifetimes go into which part of the return type. For the first example, note that Tuple is a value type and so does not require a lifetime declaration.

public fn make_tuple(x: String, y: String)
    -> Tuple[$x -> String, $y -> String]
{
    return #(x, y);
}

public fn make_list(x: String, y: String)
    -> List[$x & $y -> String]$owned
{
    return #[x, y];
}

It may be possible to simplify the declaration of make_list to fn make_list(x: String, y: String) $x & $y -> List[String]$owned since there is no other place the lifetimes could go into the return value except for the strings in the list.

Parameter Lifetime Constraints

Sometimes, the issue is not how the lifetimes of the parameters relate to the return values, but rather how they relate to each other. In those situations, we need to express constraints between lifetimes. Rust uses lifetime subtyping for this. In Rust, 'a: 'b can be read “the lifetime a outlives the lifetime b”. This relationship can be confusing and hard to remember. To see why the subtype relationship implies a outlives b, consider that a subtype must be substitutable for its supertype; a longer-lived reference can always be used where a shorter-lived one is expected, so the longer lifetime is the subtype. The new approach allows the expression of the same relationships without introducing lifetime parameters.
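For reference, here is a sketch of that kind of constraint written with Rust’s lifetime subtyping; the names mirror the Adamant example that follows, and the signature is illustrative rather than taken from the original post:

// `'long: 'short` declares that 'long outlives 'short, so a `&'long str` is
// substitutable wherever a `&'short str` is expected and the assignment is safe.
fn assign_into<'short, 'long: 'short>(value: &'long str, variable: &mut &'short str) {
    *variable = value;
}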

public fn assign_into(value: String, variable: ref var String)
    where $value > $^variable
{
    ^variable = value;
}

The assign_into function takes a string and a reference to a variable of type String and assigns the string into the variable using the dereference operator ^. For this to be safe, we must declare that the lifetime of the value we are assigning is greater than the lifetime of the space we are assigning it into. This is done using a generic constraints clause introduced by where. The expression directly states the relationship between the variable we are assigning into and the lifetime of the value. Notice that rather than using a subtype relationship, we can use comparison operators on lifetimes. It may also be possible to allow the use of the arrow in where clauses, in which case the constraint would become $value -> ^variable.

Lifetimes of Borrowed References in Classes

Of course, functions aren’t the only place where the relationship between lifetimes must be declared. Rust uses lifetime parameters and annotations in struct declarations too. An example from the Rust Book that illustrates this is based on a parser returning an error.3 Below is the correct Rust code. The Rust 2018 edition simplifies this slightly. Additional inference rules were added, and it now infers the relationship between the lifetimes in the Parser struct. It can now be declared struct Parser<'c, 's>.

struct Context<'s>(&'s str);

struct Parser<'c, 's: 'c> {
    context: &'c Context<'s>,
}

impl<'c, 's> Parser<'c, 's> {
    fn parse(&self) -> Result<(), &'s str> {
        Err(&self.context.0[1..])
    }
}

fn parse_context(context: Context) -> Result<(), &str> {
    Parser { context: &context }.parse()
}

Here, multiple lifetime parameters are necessary. If Parser were declared with only a single lifetime parameter then the parse_context function would not compile. It takes ownership of the context, so the reference to the context has a much shorter lifetime than the string it contains. With a single lifetime parameter, these two lifetimes get collapsed into one, and the compiler is no longer able to tell that the string returned in the error result lives long enough to be safely returned from the parse_context function.

In Adamant, error handling like this would probably be done using exceptions. The lifetime issues would be similar, but to keep the examples parallel, I’ll assume a result type is used. Unfortunately, it isn’t possible to use the same trick we did with functions to entirely remove the lifetime parameters. In a class, multiple methods may need to use the same lifetime, and expressing the relationships between all the lifetimes in the various methods would quickly get out of hand. However, there is a way to simplify them and make them more intuitive. The critical insight is that lifetimes can be treated more like associated types than like generic parameters.

public class Context
{
    public lifetimes $text;

    public let text: String$text;

    public new(.text) {}
}

public class Parser
{
    public lifetimes $context: Context;

    public let context: Context$context;

    public new(.context) { }

    public fn parse(self)
        -> Result[never, $context.text -> String]
    {
        return .Error(self.context.text.slice(1..));
    }
}

public fn parse_context(context: Context)
    -> Result[never, $context.text -> String]
{
    return new Parser(context).parse();
}

The lifetimes keyword introduces a comma-separated list of named lifetimes associated with a class. In this example, the lifetime names match the names of the properties. This is possible because lifetimes have a distinct namespace. A class may have private fields with associated lifetimes where the field cannot be accessed, but the lifetime can. Indeed associated lifetimes must be declared public. Rarely, a class may even have lifetimes that don’t correspond to any fields in the class. This might be done for classes that use unsafe pointers or for base classes with abstract methods using the lifetimes.

The associated lifetimes are essentially lifetime properties of the object, accessible from variables of that type using the lifetime operator. These properties chain so that a variable p: Parser would have a lifetime property of $p.context.text available. Thus associated lifetimes are themselves “typed”. This is what the declaration $context: Context in the parser class is indicating. This typing and nesting replaces the need to declare multiple lifetimes on the parser class as is done in the Rust code. Nested lifetimes are automatically treated as independent. As you can see, these named lifetimes are bound to fields of the class and are available for use in methods and functions.

The typing of lifetimes may have an additional benefit. It may allow the lifetime elision rules to handle more sophisticated cases. For example, a make_tuple function like the example above except with two different parameter types might require no annotations because the types associated with the lifetimes of the parameters can only match up to the types in the return type in a single way, i.e. fn make_tuple(x: String, y: List[int]) -> Tuple[String, List[int]].

I think the use of named lifetimes in classes is far less confusing than the use of lifetime parameters in functions. In a function, a single lifetime parameter is tied to multiple references. However, that lifetime often doesn’t correspond to the lifetime of any of the objects or references involved. In a class, each named lifetime will typically be used with a single field. It can thus be thought of as more directly corresponding with the actual lifetime of the value in that field.

The syntax used above is deliberately verbose. Users often prefer unfamiliar features to have clearer, more explicit syntax. This syntax makes it completely clear that the lifetimes are separate from the fields, but used by them and that they are public. However, it does repeat the field name and type. A shorthand syntax allowing the lifetime to be declared as part of the field could make this much less burdensome. However, it isn’t clear how to convey that these lifetimes match their field names and are public even when the field isn’t. Something like private let parser: Parser$borrowed; is less than ideal. The explicit syntax is being used for now until a better shorthand syntax can be found.

What Now?

There will probably be issues with this approach. Corner cases will have to be addressed. However, this seems like a much better starting point from which to evolve Adamant’s lifetime handling. I feel this could represent a real step forward in the usability of lifetimes in a programming language.

EDIT 2020-01-27: Changed dereference operator from * to ^.

Basic Memory Management in Adamant

Dec 17, 2018 • Jeff Walker

The Adamant programming language uses compile-time memory management (CTMM). That is, it doesn’t use a garbage collector, nor does it require the programmer to do manual memory management. Instead, references are checked at compile time for safety, and the compiler determines when to free memory. There has been research on ideas like this, but there wasn’t a widely used, practical programming language with compile-time memory management until Rust. Its single owner model with borrowing has demonstrated the feasibility of CTMM. Adamant is improving on the ergonomics of the single owner model and hoping to provide additional options. Rust is a systems programming language where the focus is on relatively low-level details of the hardware. Adamant is a high-level language in the vein of C♯, Java, Scala, Kotlin, and Swift. Whenever possible, it abstracts away low-level details.

Benefits

Compile-time memory management has many advantages over manual memory management. These advantages are part of what drove the adoption of the Rust programming language. Hopefully, future systems programming languages will see this as an essential feature to include. For Adamant, however, the crucial question is: what are the benefits of its approach over a garbage collected language? As with anything, there will be tradeoffs, but for many situations, the benefits outweigh the costs. Let’s look at some of those benefits.

Unified Resource Management

CTMM uses the resource acquisition is initialization (RAII) idiom to unite the management of other computer resources with memory management: resources such as file handles, network sockets, locks, and anything else with limited availability that must be released. Garbage collected languages leave the management of these to the programmer. Object finalizers can be a backstop, but without guarantees about when or if they are run, the developer must be sure to close files and release resources as soon as they are no longer used. Several languages advocate the disposable pattern for this, but it is notoriously tricky and verbose, enough so that some languages provide statements specifically to aid in resource cleanup. In Adamant, the same mechanisms that ensure safe, timely release of memory also handle these other resources. No additional patterns or constructs are needed. This eliminates bugs from failure to release resources.
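Rust, the one widely used CTMM language, shows how RAII folds resource cleanup into ownership. A minimal sketch (the file name is just for illustration):

use std::fs::File;
use std::io::Write;

// When `file` goes out of scope at the end of the function, its Drop
// implementation runs and the operating system handle is closed. No
// using block, try/finally, or dispose call is needed.
fn write_log(message: &str) -> std::io::Result<()> {
    let mut file = File::create("app.log")?;
    file.write_all(message.as_bytes())?;
    Ok(())
} // `file` is dropped, and the handle closed, here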

Safety

The great promise of Rust is safety. In systems programming safety is a big deal. A lot of that safety in Rust comes from the single owner memory model. Garbage collectors provide memory safety, but that isn’t the only kind of safety that matters. Software bugs cost the world millions of dollars every year. We need new languages that help us write code with fewer bugs.

Safe Concurrency

The single owner memory management model isn’t just about memory safety. It also enables safe concurrency. Concurrent use of shared mutable state is the source of most concurrency issues. Mutable state that isn’t shared is fine. Immutable data that is shared is fine. When concurrent processes try to read and write to shared mutable state, that is when bad things can happen. Many languages provide locking mechanisms developers can use. But locks are easy to misuse, and it is easy to create deadlock bugs accidentally. This is such a problem that languages built for concurrency, such as Erlang, often disallow shared mutable state entirely by mandating one particular concurrency strategy. Avoiding shared mutable state is also providing some of the impetus for the growing popularity of functional languages. Adamant uses CTMM and mutability permissions to enable safe shared mutable state without locks. How is that possible? Well, it controls the sharing of mutable state between threads so that no state is ever mutable by one thread at the same time other threads have access to it. So technically, it prevents the sharing. However, it is so easy to safely pass around ownership, borrow mutability and share immutable access to state that it enables fearless use of concurrency.
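Rust demonstrates the same principle with ownership transfer. In this minimal sketch, moving the vector into the spawned thread means only one thread can ever mutate it, and the compiler rejects any later use from the main thread.

use std::thread;

fn main() {
    let mut data = vec![1, 2, 3];
    // Ownership of `data` moves into the spawned thread, so the main
    // thread can no longer touch it; shared mutable access is impossible.
    let handle = thread::spawn(move || {
        data.push(4);
        data
    });
    // data.push(5); // ERROR: use of moved value `data`
    let data = handle.join().unwrap();
    println!("{:?}", data);
}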

Safe Asynchronous Programming

The rise of web programming and changes in the performance characteristics of computers have led to more asynchronous programming. New async/await features, pioneered by C♯ and now being added to JavaScript, Scala, and Python, help support this. While asynchronous programming can be done in a single-threaded manner, the real potential lies in allowing asynchronous code to run in multiple threads. That would mean fine-grained concurrency across every asynchronous method call boundary. Managing that concurrency is a recipe for bugs. This is a problem many asynchronous C♯ programs already have, even though it isn’t widely acknowledged. Safe, fearless concurrency also means fully unleashing safe asynchronous programming.

Safe Iteration

While less of an issue, there are other safety issues addressed by CTMM. As one example, consider the problem of iterator invalidation. While iterating over a collection, it is unsafe to modify that collection by adding or removing items from it. Both C♯ and Java check for this condition at runtime and throw an exception. Adamant detects this at compile time and reports an error. This means less time spent debugging and fewer runtime checks and failure modes.
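Rust already works this way, and the sketch below intentionally fails to compile for the same reason Adamant would reject it: the collection is mutated while it is borrowed for iteration.

fn main() {
    let mut items = vec![1, 2, 3];
    for x in &items {      // `items` is immutably borrowed for the loop
        if *x == 2 {
            items.push(4); // ERROR: cannot borrow `items` as mutable
        }                  // while the iterator still borrows it
    }
}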

Simplicity

Simplicity has long been a virtue among computer programmers. They’ve enshrined it in principles like “Keep it simple, stupid” (KISS), “You aren’t gonna need it” (YAGNI), and “Don’t repeat yourself” (DRY). Many wise programmers try to minimize the code running in their app. Doing so leads to improved maintainability, performance, predictability and fewer bugs. Yet every application relying on a garbage collector has a large amount of code running in the background to manage memory. It isn’t uncommon that there are more lines of code in the garbage collector than in the program itself. Sure that code has been carefully written, tuned and debugged by teams of developers. Still, complaints of slow and unpredictable performance and memory usage are too common. However well written the garbage collector is, the only bug-free code is the code that doesn’t exist. CTMM eliminates the garbage collector, thereby removing one more complex piece from the runtime system. It replaces it with a straightforward strategy for safe memory management.

Optimizable

Many developers don’t realize the performance impact of allocating memory. A heap allocation often has to acquire locks on shared heap management data structures. Then it must find an appropriate free space in which to allocate the memory. Compacting garbage collectors can improve this some. However, garbage collected languages encourage a programming style where most objects are allocated on the heap one by one, thereby maximizing the use of heap allocation. Sophisticated compilers perform escape analysis to determine which allocations never leave the scope of a function so they can be allocated on the stack. By contrast, the single owner memory model contains within it the information about which objects escape. Indeed, it may provide a fuller picture because the compiler’s escape analysis may not span enough functions to make a correct determination. More importantly, CTMM encourages a programming style where fewer objects escape the function in which they are allocated. This means more allocations can be optimized onto the stack for maximum performance.
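A rough Rust sketch of the distinction: a value that never escapes the function can live on the stack, while a value whose ownership is returned must be placed on the heap.

// `local` never escapes, so it sits in the stack frame and costs nothing
// to free. `Box::new` performs a heap allocation because the array's
// ownership outlives the call that created it.
fn no_escape() -> u64 {
    let local = [1u64, 2, 3, 4];
    local.iter().sum()
}

fn escapes() -> Box<[u64; 4]> {
    Box::new([1, 2, 3, 4])
}

fn main() {
    println!("{} {}", no_escape(), escapes().iter().sum::<u64>());
}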

Transparency

Closely related to the runtime simplicity of CTMM is its transparency. That is, it is easy for developers to see what memory management operations will be performed and where. Developers can easily predict the behavior of their code and modify that behavior when necessary.

Enjoyable

All of these benefits aren’t worth much if it is too much work and pain to write code in a language with CTMM. For that, let’s look at Rust, the one language with CTMM that has seen widespread use. Every year Stack Overflow surveys their users on a variety of programming related topics. In the 2018 Developer Survey, Rust was the most loved programming language. This is the third year in a row it has held that spot. This is determined by the percentage of programmers working in a language who “expressed interest in continuing to develop with it.” There are many reasons why developers so appreciate Rust, but its memory management is a part of it. As the Rust book writes, “ownership is Rust’s most unique feature.” Whatever challenges the single owner model brings to programming in Rust, these developers still enjoy working in it, and Adamant improves on the ergonomics of CTMM in Rust by abstracting away more of the low-level details.

An Introduction

With those benefits, CTMM is worth considering. Let’s take a look at how it works in Adamant. This is an introduction to memory management in Adamant intended for programmers familiar with strongly typed object-oriented languages like those listed before. Knowledge of memory management in Rust might be helpful, but shouldn’t be necessary. This introduction only covers the parts of Adamant that have a parallel to Rust. They are an adaptation of the ideas in Rust to a high-level object-oriented language with reference types. The most radical departure is in how lifetime relationships are handled (though the Dyon scripting language independently invented something similar). Despite that, readers familiar with Rust might be best served by taking Adamant on its own terms at first and only later comparing it to Rust. Thinking about Adamant code in terms of the equivalent Rust code can be confusing and lead to incorrect intuitions.

Please keep in mind that:

  • Adamant is still under development so the syntax, especially around lifetimes, may change in the future.
  • This is an introduction; it does not cover all the features of CTMM in Adamant.
  • CTMM can be confusing to those unfamiliar with it but is relatively simple once understood. This post is by necessity brief and may not explain it in a way accessible to all readers. If something doesn’t make sense, please check back when there is more documentation available.
  • The features presented here are only a starting point for what might be possible.

Value Types and Reference Types

Before we can talk about compile-time memory management, we need to understand the distinction between value types and reference types. A value type is allocated directly in its variable. That is, it is allocated on the stack or in the declaring object. For value types, assignment typically means the value is copied.1 A reference type is allocated separately from the variable. The variable is just a reference to the object that is allocated on the heap. Reference types allow for multiple variables to reference the same object. That way, an update to the object through one reference is visible from all other references to the object. For reference types, assignment copies the reference, not the object. After an assignment, there is one more variable that references the same object.

As with many languages, Adamant’s primitive number types are value types. Thus simple math operations work the same as they do in other languages. There is nothing new to learn and nothing extra to deal with.

public fn square(x: int) -> int
{
    return x * x;
}

In Adamant developers can declare new value types using structs. This is similar to C♯ structs, and Java is adding support for user-defined value types soon. Structs are useful for specific things, but most of the time, classes are used instead of structs.

public copy struct complex_number
{
    public let real: float;
    public let imaginary: float;

    // Constructor shorthand that initializes the fields
    public new(.real, .imaginary) { }
}

Although the primitive types are value types, most types in Adamant are reference types. Just like in other object-oriented languages all classes and traits are reference types (traits are akin to interfaces or protocols). Reference types are where CTMM differs from garbage collected memory management. They are the focus from here on.

Mutability

One more thing before we deal with memory management directly: in Adamant, both objects and variables are immutable by default. For an object to be mutable, its type must be declared with the mut keyword. To make a variable binding mutable, declare it with var instead of let. Note that the mutability of objects and the mutability of variable bindings are independent.

let numbers1: List[int] = new List(); // a list of ints
numbers1.add(236); // ERROR: Can't mutate non-mutable list
numbers1 = new List(); // ERROR: Can't modify let bindings

let numbers2: mut List[int] = new List();
numbers2.add(89); // adds 89 to the list
numbers2 = new List(); // ERROR: Can't modify let bindings

var numbers3: List[int] = new List();
numbers3.add(89); // ERROR: Can't mutate non-mutable list
numbers3 = #[1, 2, 3]; // numbers refers to a list of 1, 2, 3

Ownership

With that background, we can look at memory management. In Adamant, every object has a single reference that owns it. The owning reference determines when it will be deleted. As soon as that reference goes out of scope, the object will be deleted. Ownership is part of the type of a variable and is declared with the special lifetime owned. We’ll see what lifetimes are as we go along. For now, think of it as an extra annotation on the type of the variable. Lifetimes are distinguished by the lifetime operator $.

public fn main(console: mut Console)
{
    let greeting: mut String_Builder$owned = new String_Builder();
    greeting.append("Hello");
    greeting.append(", World!");
    console.write_line(greeting.to_string());
}

In the example above, a new String_Builder is made and assigned to the variable greeting. Since greeting is the only variable referencing the string builder, it has to be the owner. The variable’s type reflects this. Declaring ownership all the time would be very tedious, but in Adamant, the type of local variables can be omitted, and the compiler will figure out the correct type. Even in cases where the type must be specified because the compiler can’t figure it out, the lifetime can often still be omitted. For example, an equivalent declaration is let greeting = mut new String_Builder();.

Transferring Ownership

Ownership of an object can be passed to and returned from functions and passed between variables. Passing ownership is done using the move keyword. Using a variable after a reference has been moved out of it is an error.

public fn append_testing(builder: mut String_Builder$owned)
    -> String_Builder$owned
{
    builder.append("testing...");
    return builder;
}

public fn main(console: mut Console)
{
    let x = mut new String_Builder();
    let y = mut append_testing(move x);
    x.append("not allowed"); // ERROR: reference moved out of x
    let z = move y; // Move ownership from y to z
    console.write_line(z.to_string());
}

In this example, the append_testing function receives ownership of the string builder, appends a string to it and then returns ownership of that string builder.

Borrowing

If passing an object to a function always required moving it and thereby losing ownership, that would be very annoying. Notice the append_testing function had to return the string builder so it could be used afterward. If it hadn’t, nothing further could have been done with the object. The solution to this is borrowing. Borrowing lets an object be used temporarily without changing which reference owns it. Borrowing is the default when passing and assigning references. With borrowing, the append_testing function is easier to write and use.

public fn append_testing(builder: mut String_Builder)
{
    builder.append("testing...");
}

public fn main(console: mut Console)
{
    let x = mut new String_Builder();
    append_testing(mut x);
    console.write_line(x.to_string());
}

Borrowing an object beyond when it is deleted isn’t allowed.

public fn borrow_too_long()
{
    let x: Book;
    {
        // y owns the book
        let y = new Book("The Fable of the Dragon-Tyrant");
        x = y; // x borrows the book
        // the book is deleted when y goes out of scope at `}`
    } // ERROR: object referenced by x borrowed too long
    x.read();
}

In the function above, the reference y owns the Book object. At the end of the block y is declared in, it goes out of scope and automatically deletes the Book object. However, x has borrowed that object and still holds a reference to it that is used later in x.read(). That call could crash when it tried to use the deleted Book object. Instead, the compiler tells us it is invalid. Ownership and borrowing thereby provide memory safety. In low-level languages, there are lots of ways to cause errors by misusing memory. In memory safe languages like C♯, Java, Rust and Adamant, that isn’t possible.

Mutability when Borrowing

The borrowing rules provide not only memory safety, but also concurrency safety. They prevent data races. A data race occurs when multiple threads try to read and write the same place in memory without correct locking. Data races can result in nondeterministic behavior and reading invalid data. To prevent that, only one mutable borrow of an object can be active at a time. Alternatively, multiple immutable borrows are allowed if there is no mutable reference to the object. When an object is borrowed, the owning reference is temporarily restricted as well.

public fn add_pairs(x: List[int], y: List[int])
    -> List[int]$owned
{
    return x.zip(y, fn(a, b) => a + b).to_list();
}

// Takes an element from x and adds it to y
public fn move_element(x: mut List[int], y: mut List[int])
{
    let n = x.remove_at(0);
    y.add(n);
}

public fn main()
{
    let numbers: mut List[int] = #[1, 2, 3]; // initialize list

    // immutably borrow numbers twice. sum == #[2, 4, 6]
    let sum = mut add_pairs(numbers, numbers);

    // mutably borrow numbers and sum once each
    move_element(mut numbers, mut sum);

    // can't mutably borrow numbers twice
    move_element(mut numbers, mut numbers); // ERROR
}

The Forever Lifetime

Ownership and borrowing are sufficient for most functions. However, some values exist for the lifetime of the program and will never be deleted. String literals are the most common example. For values like that, there is the special lifetime forever. Borrowing from an object with the lifetime forever is the same as borrowing from an owned object.

public fn main(console: mut Console)
{
    let greeting: String$forever = "Hello, World!";
    console.write_line(greeting); // borrows the string
}

Fields

So far, we’ve seen how Adamant manages memory in functions. Now let’s look at how to manage memory in classes. In object hierarchies, the most common relationship between objects is composition (as opposed to aggregation). In composition, the child object can’t exist independent of the parent object, for example, a house (parent) is composed of its rooms (children). For this reason, the default lifetime for class fields is owned.

public class House
{
    public let driveway: Driveway; // implicit $owned
    public let rooms: List[Room]; // implicit List[Room]$owned
}

Lifetime Constraints

A lifetime is an abstract idea of how long an object or reference will exist when the program is running. The borrowing rules require that the lifetime of all borrowed references is less than that of the object being borrowed. Sometimes objects are borrowed in more sophisticated ways, and it is necessary to spell out the relationship between the lifetimes of a function’s parameters and return value. We do this with lifetime relationship operators.

public class Car
{
    public let model_year: Year;
    public let tires: List[Tire]; // implicit List[Tire]$owned

    // the special parameter "self" is like "this"
    public fn oldest_tire(self) -> Tire$< self
    {
        return tires.order_by(fn(t) => t.replaced_on).first();
    }
}

The oldest_tire function returns the tire that was replaced longest ago. The car object still owns the tire; the oldest_tire function is returning a borrowed reference to that tire. The type Tire$< self is read “Tire with a lifetime less than self” and indicates the relationship between the lifetimes of the tire object and the car object. The compiler uses that to ensure memory and concurrency safety by preventing modification or deletion of the car object as long as the tire is borrowed. In this case, it isn’t necessary to spell out the lifetime relationships as we did. In Adamant, there are a few simple rules for lifetime elision that allow the omission of lifetimes in common cases. One of those cases is methods, for which the assumption is that the return value is borrowed from inside the object. So in this case, the function could be declared public fn oldest_tire(self) -> Tire. There are cases where that isn’t possible though.

public fn second_car(first: Car, second: Car) -> Car$< second
{
    first.drive();
    return second;
}

In the second_car function, it isn’t possible to leave out the lifetime relationship. Based on the parameters, it’s possible the returned car is related to either the first or second car. Rather than guessing which, the compiler requires that it be specified.

Lifetime Parameters

Occasionally, lifetime relationships exist that can’t be specified using just simple relationships. When that happens, we use lifetime parameters. A lifetime parameter provides an abstract lifetime that may not be the lifetime of any actual object but explains the relationship between the lifetimes of the various objects and references. Lifetime parameters are generic parameters prefixed with the lifetime operator. Like other generic parameters, their value can often be inferred when calling a function.

public fn newer_car[$a](c1: Car$> a, c2: Car$> a) -> Car$< a
{
    return if c1.model_year >= c2.model_year => c1
           else => c2;
}

public fn main() -> int
{
    let c17 = new Car(2017);
    let c18 = new Car(2018);
    let newer = newer_car(c17, c18); // lifetime $a is inferred
    return newer.model_year;
}

As you might guess from its name, the newer_car function returns the car with the newer model year. The borrowed reference must not outlive either car, since either could be returned. To express this, we introduce a lifetime parameter $a that represents a lifetime in between the lifetimes of the objects passed in and the borrowed reference returned. Note that this lifetime may, or may not, actually be equal to the lifetime of one of the cars or the returned reference.

Lifetime parameters can also be useful for putting borrowed references into the fields of objects.

public class Employee[$manager]
{
    public let name: String$forever;
    public let boss: Employee?$manager;

    public new(.name, .boss = none) { }
}

public fn main(console: mut Console)
{
    let boss = new Employee("Alice");
    let employee = new Employee("Bob", boss);
    console.write_line(
        "\(boss.name) is \(employee.name)'s boss.");
    // prints "Alice is Bob's boss."
}

Here we use a lifetime parameter to borrow a reference to an employee’s boss. Of course, some employees don’t have a boss (they are top dog) so the reference to the boss is optional as indicated by the ? after the type. When creating employee objects, their lifetimes will have to be less than the lifetime of the boss object (unless we were to assign them a new boss).

Remaining Cases

Experience with Rust shows that a significant amount of code can be written easily with nothing more than single ownership and borrowing. Explicit lifetime parameters and constraints are needed only occasionally. Adamant enables this approach in a way that is simpler and more familiar to many programmers. In Adamant, the majority of code requires only minimal annotations beyond what the same program would need in a garbage collected language. For example, in MVC architecture web applications, most memory allocations are already scoped to the lifetime of the web request. Developers should quickly become proficient enough in Adamant that compile-time memory management is an insubstantial burden most of the time.

However, single ownership with borrowing does not support all use cases. It doesn’t support object graphs that can’t be represented as a tree or that include references to parent nodes. We need other approaches for these. Assuming something like 80% of cases are handled, we need a solution for the remaining 20%. It’s important to remember that a perfect compile-time solution isn’t required. If 90% to 95% of cases are handled at compile time, we can deal with the remainder manually or with reference counting. The Swift language has demonstrated the viability of an automatic reference counting approach. In future posts, I’ll discuss the problematic cases in more detail and look at some ideas for how to address them.
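As a taste of the problematic cases, here is a rough Rust sketch of a parent/child graph that single ownership alone cannot express: reference counting (Rc) shares the children, and weak references back to the parent avoid a cycle.

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Each node is reachable from both its parent and other code, so no single
// reference can own it; `Rc` counts references at runtime and `Weak` breaks
// the parent/child cycle so the nodes can still be freed.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(child);
}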

  1. Adamant also supports value types with move semantics. This is the default for structs in Rust. However, in Adamant, value types with move semantics are rare. 

Garbage Collection is a Hack

Nov 28, 2018 • Jeff Walker

At the dawn of the heap, there was manual memory management, and it was bad. Programmers were forced to keep track of when to free memory, and whether it had been freed. That distracted from the task of writing domain logic. It made some software architectures nigh impossible because tracking whether memory had been freed is infeasible in them. Predictably, it led to mistakes—the source of many memory leaks, bugs, and crashes. Worst of all were the numerous security vulnerabilities created.

In time, programmers developed regimented practices for managing memory. As a general rule, the allocator of a block of memory was responsible for freeing it. They carefully documented when functions returned memory the caller now owned. In some languages, they codified these rules into data types like “auto_ptr” in C++. These helped, but were insufficient.

It didn’t take long for programmers to think about automating the task of managing memory. Thus was born garbage collection (GC). There were early languages that used garbage collection, but for many years they were regarded as too slow and inefficient. These languages weren’t mainstream or widely used. A few languages offered GC as an option that could be enabled or used for a subset of allocations. One approach to GC, reference counting, often fit better into an existing manual management approach. Some languages provided it in the standard library. Yet, lack of language support limited safety and convenience. Often, programmers had to explicitly increase and decrease the reference count, leading to similar opportunities for mistakes.

Garbage collection was used for academic languages, or when trading performance for ease of programming. The latter situations included languages for scripting, beginners and rapid development. This remained the status quo for many years. As computers got faster, and GC techniques improved and matured, attitudes shifted. The Java programming language seems to have marked a turning point. Garbage collection was an essential part of Java’s virtual machine approach to safe “write once, run anywhere” software. As Java’s popularity grew, it demonstrated to a generation of programmers that computers were ready for the age of garbage collection. Performance was still sometimes an issue, but for the most part, it was good enough. Most languages designed since Java have included garbage collection. Attitudes have shifted so far in favor of GC, that now even “systems” programming languages like D and Go offer or require garbage collection.

The rapid, widespread adoption of garbage collection had several causes. Multiple orders of magnitude improvements in computer performance and memory capacity mitigated the performance issues. Improvements also came from better algorithms and tens of thousands of hours spent by excellent programmers tuning the implementations of garbage collectors. More importantly, the problem GC fixed was so painful that it was worth paying a high price to “fix” it. Remember, garbage collection has the following benefits:

  • Prevents Use After Free by making it impossible to access memory after freeing it.
  • Prevents Double Free since it’s not possible to free a block of memory repeatedly.
  • Significantly Reduces Memory Leaks by eliminating forgetting to free memory when it is no longer used.
  • Improves Programmer Productivity by allowing them to not worry about memory management and by increasing writability, thereby decreasing development time and costs.
  • Increases Flexibility in Software Architecture by promoting modular design. Explicit deallocation causes modules to be responsible for knowing whether other modules are interested in a particular object.
  • Simplifies Efficient Persistent Data Structures by removing the complexity of tracking which parts of the structure are still in use.
  • Simplifies Lock-free Data Structures because the algorithms for implementing them without GC are more complex.

With that list of benefits, it isn’t hard to see why garbage collection became the de facto strategy for memory management as soon as its performance was acceptable. Given that, how can I call garbage collection a hack?

A hack or kludge is a solution that works but is inefficient, inelegant, and hard to maintain. I’ve heard the claim that garbage collection is elegant because it works off the insight that the memory to be freed is exactly the memory that is unreachable. I disagree. Rather than finding a way to manage memory correctly, without errors, we just pawned the problem off on the computer. We wrote complex pieces of software that use sophisticated algorithms to manage the fact that we have abandoned any attempt to manage memory. Millions of programs essentially have a second program, at least as complex, running in the background to clean up their memory. GC uses up computing time and requires more memory. Many teams have put countless hours into writing performant garbage collectors. Even after all that, how well they perform depends on the particular workload of the application. What could those teams of good programmers have accomplished for their languages and tools if they didn’t have to spend that time just trying to get a garbage collector that was acceptable?

Some claim that garbage collection is better than manual memory management because of other advantages. By relocating memory, it can avoid heap fragmentation. Under some conditions, GC outperforms naive manual memory management because it doesn’t have to free each object individually and needs only to consider the live objects. Those are rationalizations though. If one had problems with heap fragmentation, would they create a garbage collector to solve it? No, instead they would try other heap management strategies or allow for object relocation. If one had problems with the performance of freeing objects, would they create a garbage collector to solve it? No, they would batch or delay deallocation. Indeed, region and pool memory allocators handle both of these problems.
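
To make that last point concrete, here is a toy region (arena) allocator sketched in C++. It is illustrative only, ignoring alignment and oversized requests: objects are bump-allocated out of large blocks, so there is no per-object free, and the whole region is released in one step when it is destroyed.

    #include <cstddef>
    #include <memory>
    #include <vector>

    // A toy region/arena allocator: objects are bump-allocated out of large
    // blocks, and the whole region is released at once when it is destroyed.
    class Region {
    public:
        explicit Region(std::size_t block_size = 64 * 1024)
            : block_size_(block_size) {}

        // Bump-allocate from the current block, starting a new block when
        // the current one is full.
        void* allocate(std::size_t size) {
            if (blocks_.empty() || used_ + size > block_size_) {
                blocks_.emplace_back(new char[block_size_]);
                used_ = 0;
            }
            void* p = blocks_.back().get() + used_;
            used_ += size;
            return p;
        }

        // No per-object deallocation: all blocks are freed together here.
        ~Region() = default;

    private:
        std::size_t block_size_;
        std::size_t used_ = 0;
        std::vector<std::unique_ptr<char[]>> blocks_;
    };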

If garbage collection is a hack, then what should we use instead? A real solution would be an automatic, correct way of managing memory that introduced little or no complexity at runtime. We haven’t found that solution yet. However, we have seen the first sign that it might exist. The Rust programming language uses restrictions on memory use and a compile-time analysis to ensure memory safety and automatically insert deallocations at the correct places in the code. Still, there is room for improvement. In Rust, the rules on memory use can be overly restrictive. Also, as a systems language, it isn’t appropriate for many programs. Finally, its strategy doesn’t work in all situations, so it’s common for Rust programs to use reference counting for some of their memory.

We don’t have a solution today, but my work on a solution leads me to believe one is probably within reach. However, there are very few people working on this. It’s not one of the trendy problems that are attracting lots of work right now. We need more people focused on the problem and exploring many possible solutions. If you want to see an alternative to garbage collection, I encourage you to lend a hand in creating it. Get involved by sharing your thoughts and ideas about it. The only project I’m aware of that is trying to tackle this is my own, the Adamant programming language. Please reach out to me if you’d like to help out.

August 2018 Progress Report

Aug 31, 2018 • Jeff Walker

The Adamant language has been under development in one form or another since at least August 2016. However, it wasn’t until May 2018 that development picked up. In June 2018, additional hours began to be committed to development each week. In all that time, the language, compiler, and website progressed considerably. However, that progress was not documented and shared outside of the commits on the compiler and docs. This is the first of what will hopefully be more frequent updates on progress on the Adamant language. As such, it summarizes all the work to date.

Current Status

The Adamant language currently exists mostly as a set of design ideas. Many aspects of the design are documented, but other portions remain mostly undocumented. Generally, the portions that more closely follow the precedent set by other languages are the ones that are not well documented; someone familiar with C# should be able to infer the workings of many of them. For Adamant to be successful, it needs design, tools, and documentation. The current state of each of these is:

  • Language Design:

    The language design has been relatively stable. Semantically, the most significant change has been the introduction of more compile-time code execution. Over time, there has been some drift in the syntax away from Adamant’s C-family heritage. These changes have been made to improve readability and consistency and to make good use of easily typed characters. The rules for borrow checking variables and references were clarified, but there is still uncertainty about how object fields should be handled. Reference counting will be supported and should be sufficient if no further borrow checking rules are added. The areas of the language that still need more thought and better design are mainly asynchronous support, effect typing, error handling, compile-time code execution, and advanced generic types.

  • Compiler:

    The current compiler is implemented in C# on .NET Core. This choice allows rapid development while still permitting execution on multiple platforms. The back end emits C code that can be compiled natively using clang. The compiler supports only a minimal subset of the language. The focus has been on establishing the framework to support basic functionality like type checking and borrow checking.

  • Documentation:
    • Language Reference:

      The language reference replaces a previous introductory book on the language. The more succinct reference is easier to write, maintain, and update. Some portions of the language are nearly completely documented; others have only the most cursory coverage. There is also a section for ideas that are under consideration. In some cases, an idea has been accepted, but the reference has yet to be updated to reflect it.

    • Design Notes:

      The language reference, by necessity, leaves out information about why certain design decisions were made. The design notes describe some of these decisions and lay out design principles to follow in the rest of the language’s design. Sections are added to this document as issues are encountered during language design.

  • Website: adamant-lang.org

    The website has been up for some time now. Content is minimal, but it does serve as an entry point to learn about the language. The most useful section is probably the links to the docs. The landing page and tagline need to be improved to focus the marketing message.

  • Blog: blog.adamant-lang.org

    This post is the first on the blog. With the publishing of this post, the blog has been added as an item in the menu of the main page. Further design work is needed, but the design is sufficient for now. Planned topics include progress updates, language design theory, language critique and evaluation, and discussion of Adamant.

Deprecated Compiler Attempts

The Adamant language went through a number of attempts at writing a compiler before the current one was started. The goal was to find an approach that would be easy and wouldn’t waste too much effort on a bootstrap compiler destined to be discarded once the language switched to a self-hosted compiler. These early compilers can be found on GitHub under the adamant-deprecated organization. The earlier attempts were:

  1. The Adamant Bootstrap Compiler was an attempt to create an Adamant-to-C# translator that did not perform type checking, borrow checking, or significant code transformations. This was not possible due to the significant semantic differences between the languages.
  2. The Adamant Temporary Compiler was an attempt to create a full ecosystem of compiler generation tools in C# that could be easily translated to Adamant later. This was because Adamant does not have any of those tools and they will be needed eventually. This approach was too much work and prevented progress on the language itself. It did not enable rapid prototyping and iteration.
  3. The Adamant Exploratory Compiler was an attempt to write a compiler for the full language in C# using the ANTLR 4 compiler tools. This approach was still found to be very heavyweight, as each feature required changes throughout many layers, and the tree produced directly by ANTLR was not ideal.
  4. The Pre-Adamant Translator was a brief attempt to create a Nanopass compiler framework for C# and use it to write a compiler for a subset of Adamant.
  5. The Pre-Adamant Compiler was to be a compiler in C# for a tiny subset of the Adamant language meant to minimize the time until a working compiler was complete.
  6. The adamant.tools.compiler was an attempt to skip directly to a self-hosting compiler through a minimal C++ bootstrap and a scheme of simple translation to C/C++.

After the failure of the first five attempts, it was decided that creating a self-hosting compiler directly would be more beneficial. This sixth attempt would let the developer gain experience working in a subset of Adamant very early on, which would help inform the design of the language. It would also prevent wasted effort on a bootstrap compiler written in another language. This compiler started as a small amount of C++ performing very basic lexing and recursive descent parsing to translate Adamant to C++. It quickly reached the point where it could translate a subset of Adamant equivalent to the subset of C++ it was written in. The compiler code was then rewritten in Adamant, and the compiler was “self-hosting.” However, it didn’t do most of what a compiler should do; rather, it was a simple translation to a subset of C++. This version of the compiler progressed rapidly at first, and a transition was later made from translating to C++ to translating to C. Eventually, though, it reached a catch-22: functionality like type checking, borrow checking, classes, enums, and other abstraction facilities needed to be implemented, but the lack of those same features was making development incredibly difficult. When it became clear there was no reasonable way forward, the current compiler was started.

Current Compiler

Taking the lessons from the previous attempts, the adamant.tools.compiler.bootstrap project was started. This compiler is written in C# but focuses on a small subset of Adamant using a minimalist, simple approach similar to the “self-hosting” compiler. It does not yet support most of the language features the last compiler did. Instead, the focus has been on laying the groundwork for important features like type checking and borrow checking rather than filling out the range of core features like control flow, expressions, and primitive types.

August 2018

Progress on Adamant in August of 2018 consisted mostly of:

  • Refactoring the compiler to use a more attribute-grammar-based approach.
  • Realizing that some ideas for the borrow checker around object fields weren’t going to work.
  • Reading academic literature in search of a solution. Discussion of “ownership types” was found, which gets part of the way there. However, the academic literature isn’t trying to solve the same problem as Adamant, so it wasn’t much help. This problem was set aside to make progress on other things. Even if this issue isn’t resolved, Adamant can still be a great language.
  • Learning the details of how the Rust borrow checker works.
  • Deciding details of how borrow checking would work in Adamant.
  • Researching the implementation of the Rust borrow checker.
  • Adding an intermediate representation (IR) layer to the compiler to aid in implementing borrow checking. This is similar to the MIR step in the Rust compiler.
  • Starting to implement borrow checking.

Next Steps

The next focus will be implementing basic borrow checking for variables, parameters, and functions.

EDIT 2018-11-15: Updated URL to Language Reference.