The Language Design Meta-Problem

Feb 7, 2019 • Jeff Walker

I don’t like the majority of programming languages available today. Computer programming is a relatively young field, and we still have a long way to go. We haven’t dealt with simple problems like null safety. There are lots of other challenging issues with programming languages. Even if we solved those problems, I think we would have much farther to go. Yet, we programmers spend our time on holy wars over matters such as dynamic versus static typing.

Even the few languages I think are pretty good have serious issues. Designing a programming language requires making thousands of design choices big and small. It is tough for even the best designer to avoid making a mistake somewhere along the way. Avoiding mistakes isn’t enough though. Languages are highly interconnected. There are a limited number of symbols available for syntax and using one for something means it isn’t available for other things. The semantics of language features can interact in unintuitive ways. Even when you understand the syntax and semantics, there are issues of praxis. How will the language influence how people write code?

Despite lots of work and plenty of advances, programming language designs fall short. We need to take language design to the next level somehow. I’ve thought about language design a lot as I’ve worked on the design and implementation of Adamant. This has led me to ask why we aren’t designing better languages already. The answer I’ve come up with is that there is another, deeper, problem besides just what would make an excellent programming language. There is a meta-problem behind programming language design that is holding us back.

Consider that Java and C# were forced to add generics to the language later. Doing so severely constrained what they could support in both syntax and semantics. In Java’s case, it led to a style of generics that is far from ideal. C# was able to make more radical changes and support generics better. However, it still has messy libraries and language aspects that can’t be fixed or cleaned up. Now, Go has fallen into the same trap. It is trying to add generics in Go 2, and the existing language and codebase overly constrain the design.

As another example, in C# 8 Microsoft is planning to add “nullable reference types” to address what Tony Hoare called his “billion-dollar mistake.” Even though this change is one of the more radical breaks with the previous syntax ever made to the C# language, they still aren’t able to fully correct the problem. They can only offer compiler warnings, and there will be ongoing interop issues with all the existing libraries that don’t use the new language feature.

The problem is that it is too hard for language designers to fix their mistakes. Once a language is in general use, it is nearly impossible to make significant changes to it. Often breaking changes are entirely off the table. Even if they aren’t, they are so costly that they must be rare and minimal. Additions are ok, but these must be carefully fit in between the existing syntax and semantics. The limitations on changes mean that designers don’t even consider vast sections of the language design space.

If language changes and fixes are so challenging, why don’t we language designers spend more time making sure the design is right in the first place? Why did version one of the Go language ship without generics? Well, it “did not seem essential to the language’s goals at the time, and so was left out for simplicity.” There are many other reasons features are left out or messed up in the first version of languages. Often there is pressure to get the language into the hands of users. Designers make tradeoffs. The final result is lots of bad languages with poor features piled on top of weak foundations.

I call this the language design meta-problem. That language designers don’t have the time to create better designs before releasing the first version but don’t have the flexibility to improve them later. This is the problem behind the problem. I know some people interested in designing better languages. I know of almost no one interested in addressing this meta-problem.

Having thought about this problem, I don’t have any solutions. However, I see three areas where an answer or at least improvements might lie. If we could enable radical changes to languages after they are in general use, that would almost entirely alleviate the problem. Barring that, we will have to attempt to improve the designs in their initial release. I see two approaches to that. If we could easily and rapidly prototype languages, designers could iterate on their language designs many more times before the initial release. Additionally, they could learn from the multitude of little languages that would flourish because of easy language creation. Alternatively, if we could change the circumstances under which languages are designed, we could give language designers much more time to develop better designs.

It isn’t clear how to enable radical changes to be made to programming languages after they are in general use. Developers still struggle to handle changes to package APIs even with approaches like semantic versioning. Language changes can have a significant impact on how code must be written. Incorporating a language change to a codebase may require restructuring the logic of the application. Changes like that won’t be dealt with by code migration tools. I do think that standard libraries should be broken up and treated as packages deployed via a package manager and managed with semantic versioning. That at least will help, but it isn’t nearly enough. We do at least have many good test cases for what kinds of language changes might be needed. Here are a few to get started:

  1. Removing unsafe null handling to provide null safety (C#, Swift, Kotlin)
  2. Adding generics (Java, C#, Go)
  3. Changing error handling strategies between checked exceptions, unchecked exceptions, error codes, error codes through multiple returns or error values through data structures. Additionally, there is a recent trend to move from throwing references to exception objects to throwing value types. (C++ Zero-overhead deterministic exceptions, Project Midori)
  4. Adding or removing keywords (C#, Rust)
  5. Adding async/await (C#, Javascript)
  6. Changing concurrency strategies between one of threads, actors, channels.
  7. Adding reference permissions (like Pony or Project Midori)
  8. Fixing array subtyping (broken in C# and Java)
  9. Introducing function reference/pointer types
  10. Fixing the meaning of equality (issues C# and Java)
  11. Limiting mutability
  12. Reforming strings (remove C style null-terminated, support Unicode)
  13. Adding or removing operators (++ and -- are considered by some to be mistakes in C#)
  14. Convert to partial order operator precedence

Improving the circumstances under which languages are designed might be easier to tackle. Currently, creating a language means not just designing the language, but writing a compiler or interpreter and any other supporting tooling. That may include editors, IDEs, test runners, package managers, build systems and many other tools. The bare minimum needed to experiment with a language design is an interpreter or compiler. There are plenty of guides and tools for this. However, most of them help with lexing and parsing, acting as if then the challenge is solved. Yet, the majority of the work is in implementing semantic analysis including type checking, definite assignment, and other validations. There are also challenges with the backend. If the language doesn’t fit neatly into what is provided by LLVM or one of the language VMs like the JVM or CLR, there can be an enormous amount of work to implement code generation or garbage collection. Even relatively simple changes to the design of a language can cause cascading changes through the layers of a compiler or interpreter meaning that any change involves significant work and takes a long time to complete. That severely limits how many designs can be explored beyond the thought experiment stage and holds back designers from even considering more radical changes.

An additional challenge is that many of the tools and VMs for language design subtly or overtly constraint the design of the language. The JVM and CLR impose their type systems and limitations on languages implemented on them. To some extent, it is possible to circumvent this by mapping language features to more complex structures in the intermediate language. However, in many respects, if the VM doesn’t support something, it just can’t be included in the language. Additionally, the desire for interop with the other languages on the platform often leads designers to further constrain their languages leading to somewhat cookie cutter type systems.

We need a new set of tools for rapidly creating compilers and interpreters. Performance need not be a priority for these, but ideally, there would be a smooth path toward greater performance as the language matures and moves toward general release. Most importantly, we need tools that do not restrict the kinds of languages designers can make. Some time ago there was a move toward the creation of language workbenches (link). These promised to ease the creation of domain-specific languages (DSLs). Language workbenches haven’t lived up to their hype, but we have seen the development of a few intriguing tools. Unfortunately, many of these are too focused on DSLs with insufficient features for creating full languages. Some also suffer from a focus on lexing, parsing and editing but do not assist in the creation of the semantic analysis. However, there are signs of promising possibilities. The Spoofax Language Workbench is still under development and lacking in documentation, but it promises to provide end-to-end support for concrete syntax, abstract syntax, tree rewriting, semantic analysis, dataflow analysis, and dynamic semantics. Another project is the Racket language which supports “language-oriented programming.” It includes support for lexing, parsing, and translating for execution. Unfortunately, it defaults to embedding those languages in Racket files with a #lang attribute at their header. It also encourages languages that can be translated to Racket which is a Scheme-like language. Creating languages without the attribute that translate to machine code or other VMs may be possible, but it isn’t clear.

If it isn’t possible to create tools for rapid language prototyping, perhaps we can change the circumstances under which languages are designed to provide more design time. Even if we have language prototyping tools, it may be beneficial to provide more design time. Today most languages seem to be developed in one of three ways. They may be developed by academia where most languages are as minimal as possible to demonstrate some publishable idea. There are no resources to create a large fully featured language and the surrounding ecosystem of tools. On the other hand, commercial languages are created in the designer’s spare time as a personal project, or they are sponsored by a large company to support their platform or projects. Creating a language in one’s free time may mean that the design process can continue for a long time. However, comparatively little design can be achieved over that time. The enormous workload to the designer for making any change may discourage exploration of design alternatives. If the designer wants to see the language possibly gain some measure of popularity in their lifetime, they must prioritize completing a version they can attract users too. Languages sponsored by companies have vastly more resources behind them. However, those resources come with strings attached. The company’s incentive is usually to release a language as soon as possible. The goal is to create a language that is sufficient to the business purpose, not the language that will generate the most value for the programming profession and the world.

Providing other circumstances for language design will be challenging. We must somehow create an economic situation to provide longer-term funding for language design. If tools are available for language prototyping, this need not be much more than the money to pay the language designer for an extended period. If such tools are not available, then large development teams will need to be supported. Unfortunately, there is no way to profit from programming languages directly. Almost all are open source and the few that aren’t have seen only very modest success. Selling tools connected to languages like editors, IDEs, test runners, etc. provides low to zero profit. Furthermore, that isn’t available until after it has become successful. If languages had the potential for a significant payoff, we could explore something like a venture capital model. As is, we somehow need to create incentives for companies to incubate languages for longer. Additionally, it isn’t enough to create one, it must be brought to market. Microsoft research has produced some exciting research languages including Project Midori, Spec# and most recently F*. However, none of these have been commercialized. Instead, they have a modest impact on the future design of C# that fails to deliver the better languages we need.

I worry that if we don’t solve the language design meta-problem, we will be stuck with lousy programming languages for the foreseeable future. Developers using the first version of Java released in 1996 felt the pain of the lack of null safety and complained about null reference exceptions. The C# language introduced nullable value types in Nov 2005. That showed a clear, workable design for what null safety could be in an OO language. However, it wasn’t until Sept 2014, 18 years after Java was first released, that Swift 1.0 shipped with proper null safety. There was in principle no reason that Java 1.0 couldn’t have had null safety. Indeed, Rust shows that a low-level C style language could have had null safety long before Java was released. If it took that long to solve our billion dollar mistake, how long will it take us to fix all the other problems?

Published: February 7, 2019