Garbage Collection is a Hack

Nov 28, 2018 • Jeff Walker

At the dawn of the heap, there was manual memory management, and it was bad. Programmers were forced to keep track of when to free memory, and whether it had been freed. That distracted from the task of writing domain logic. It made some software architectures neigh impossible because tracking whether memory had been freed is infeasible in them. Predictably, it led to mistakes—the source of many memory leaks, bugs, and crashes. Worst of all were the numerous security vulnerabilities created.

In time, programmers developed regimented practices for managing memory. As a general rule, the allocator of a block of memory was responsible for freeing it. They carefully documented when functions returned memory the caller now owned. In some languages, they codified these rules into data types like “auto_ptr” in C++. These helped, but were insufficient.

It didn’t take long for programmers to think about automating the task of managing memory. Thus was born garbage collection (GC). There were early languages that used garbage collection, but for many years they were regarded as too slow and inefficient. These languages weren’t mainstream or widely used. A few languages offered GC as an option that could be enabled or used for a subset of allocations. One approach to GC, reference counting, often fit better into an existing manual management approach. Some languages provided it in the standard library. Yet, lack of language support limited safety and convenience. Often, programmers had to explicitly increase and decrease the reference count, leading to similar opportunities for mistakes.

Garbage collection was used for academic languages, or when trading performance for ease of programming. The latter situations included languages for scripting, beginners and rapid development. This remained the status quo for many years. As computers got faster, and GC techniques improved and matured, attitudes shifted. The Java programming language seems to have marked a turning point. Garbage collection was an essential part of Java’s virtual machine approach to safe “write once, run anywhere” software. As Java’s popularity grew, it demonstrated to a generation of programmers that computers were ready for the age of garbage collection. Performance was still sometimes an issue, but for the most part, it was good enough. Most languages designed since Java have included garbage collection. Attitudes have shifted so far in favor of GC, that now even “systems” programming languages like D and Go offer or require garbage collection.

The rapid, widespread adoption of garbage collection had several causes. Multiple orders of magnitude improvements in computer performance and memory capacity mitigated the performance issues. Improvements also came from better algorithms and tens of thousands of hours spent by excellent programmers tuning the implementations of garbage collectors. More importantly, the problem GC fixed was so painful that it was worth paying a high price to “fix” it. Remember, garbage collection has the following benefits:

  • Prevents Use After Free by making it impossible to access memory after freeing it.
  • Prevents Double Free since it’s not possible to free a block of memory repeatedly.
  • Significantly Reduces Memory Leaks by eliminating forgetting to free memory when it is no longer used.
  • Improves Programmer Productivity by allowing them to not worry about memory management and by increasing writability, thereby decreasing development time and costs.
  • Increases Flexibility in Software Architecture by promoting modular design. Explicit deallocation causes modules to be responsible for knowing whether other modules are interested in a particular object.
  • Simplifies Efficient Persistent Data Structures by removing the complexity of tracking which parts of the structure are still in use.
  • Simplifies Lock-free Data Structures because the alogrithms for implementing them without GC are more complex.

With that list of benefits, it isn’t hard to see why garbage collection became the de facto strategy for memory management as soon as its performance was acceptable. Given that, how can I call garbage collection a hack?

A hack or kludge is a solution that works but is inefficient, inelegant and hard to maintain. I’ve heard the claim that garbage collection is elegant because it works off the insight that the memory to be freed is exactly the memory that is unreachable. I disagree. Rather than finding a way to manage memory without errors correctly, we just pawned the problem off on the computer. We wrote complex pieces of software that use sophisticated algorithms to manage the fact that we have abandoned any attempt to manage memory. Millions of programs essentially have a second program that is at least as complex running in the background to clean up their memory. GC uses up computing time and requires more memory. Many teams have put countless hours into writing performant garbage collectors. Even after all that, how well they perform is dependent on the particular workload of the application. What could those teams of good programmers have accomplished for their languages and tools if they didn’t have to spend that time just trying to get a garbage collector that was acceptable?

Some claim that garbage collection is better than manual memory management because of other advantages. By relocating memory, it can avoid heap fragmentation. Under some conditions, GC outperforms naive manual memory management because it doesn’t have to free each object individually but only consider the live objects. Those are rationalizations though. If one had problems with heap fragmentation, would they create a garbage collector to solve it? No, instead they would try other heap management strategies or allow for object relocation. If one had problems with the performance of freeing objects, would they create a garbage collector to solve it? No, they would batch or delay deallocation. Indeed, region and pool memory allocators handle both of these problems.

If garbage collection is a hack, then what should we use instead? A real solution would be an automatic, correct way of managing memory that introduced little or no complexity at runtime. We haven’t found that solution yet. However, we have seen the first sign that it might exist. The Rust programming language uses restrictions on memory use and a compile-time analysis to ensure memory safety and automatically insert deallocations at the correct places in the code. However, there is room for improvement. In Rust, the restrictions on memory use can be too restrictive. Also, as a systems language, it isn’t appropriate for many programs. Finally, its strategy doesn’t work for all situations, and so it’s common for Rust programs to use reference counting for some of their memory.

We don’t have a solution today, but my work on a solution leads me to believe one is probably within reach. However, there are very few people working on this. It’s not one of the trendy problems that are attracting lots of work right now. We need more people focused on the problem and exploring many possible solutions. If you want to see an alternative to garbage collection, I encourage you to lend a hand in creating it. Get involved by sharing your thoughts and ideas about it. The only project I’m aware of that is trying to tackle this is my own, the Adamant programming language. Please reach out to me if you’d like to help out.

Published: November 28, 2018