The limits of abstraction

07 Jun, 2022

From Wikipedia:¹

the abstraction principle ... is a basic dictum that aims to reduce duplication of information in a program (usually with emphasis on code duplication) whenever practical

Without abstraction we wouldn't have programming languages. You would be reading this as binary, and I would be typing it with a keyboard that consists of two buttons: one and zero. So abstractions are good: they make great things possible. However, when I talk about abstractions in this article I mean the ones that we write and work with in our own code, where their utility is more questionable.

Abstraction is a big word. It does a lot of heavy lifting. When I first arrived on the shores of the programming world, I was eager to grapple with abstraction, tame it, and make it do my bidding. The problem is that the more you fight it, the tougher it gets. It's a shape-shifter. You think you've neatly compartmentalised all the gnarly bits of an implementation to save Future You from the hassle of needing to understand them. Alas, poor old Future You will need to understand them. When you hide something it does not go away. If anything it becomes more of a burden than if it lived in plain sight.

When I am writing code and I notice that I have written something twice in the same program, component, library, or whatever, I just notice it. I don't jump to conclusions. I used to look at two pieces of similar code and see it as an opportunity to abstract them. Now I see the beauty of repetition in code. Like poetry, it has a rhythm that ebbs and flows.

A few years ago I went on a poetry course run by Faber & Faber. Towards the end I submitted some of my own work for the tutor to read and give feedback on. His feedback was that my poetry was often too abstract. He didn't understand what I was saying. The same problem occurs in code. It's hard to know intuitively what code does when it is steeped in abstractions.

In poetry, repetition is allowed. "Anything is allowed in poetry," you might say. That's true. Anything is allowed in code too. So why at the outset do we often set our sights on the false ideal of abstraction? There is a lot to be said for being specific, obvious, obtuse even, in code.

What do we gain from abstraction? Time saved understanding implementation details... Reduced cognitive load when programs become large... Fewer lines of code... Faster onboarding of newcomers...

And what does abstraction cost us? Increased complexity... Increased cognitive load unpicking generic implementations (which often use generics²) that handle many cases... Refactoring that becomes non-trivial...
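To make that trade-off concrete, here is a minimal sketch in Python. Every name in it (`render_list`, `render_tags`, `render_authors`, and all of their parameters) is hypothetical and invented for illustration. The first function is the sort of generic helper that tends to grow when two similar pieces of code are merged; the two below it are the repetitive versions it might have replaced.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


# The "abstracted" version: one generic helper that tries to serve every caller.
# Each new caller tends to add another optional parameter.
def render_list(
    items: list[T],
    to_text: Callable[[T], str] = str,
    separator: str = ", ",
    prefix: str = "",
    limit: Optional[int] = None,
    empty_text: str = "",
) -> str:
    if not items:
        return empty_text
    shown = items if limit is None else items[:limit]
    body = separator.join(to_text(item) for item in shown)
    if limit is not None and len(items) > limit:
        body += f"{separator}and {len(items) - limit} more"
    return prefix + body


# The "repeated" version: two small functions that each say what they do.
def render_tags(tags: list[str]) -> str:
    return ", ".join(f"#{tag}" for tag in tags)


def render_authors(authors: list[str]) -> str:
    if not authors:
        return "Anonymous"
    return "By " + " and ".join(authors)
```

Both approaches can produce the same strings. The difference shows at the call site: `render_authors(names)` reads on its own, whereas `render_list(names, prefix="By ", separator=" and ", empty_text="Anonymous")` asks the reader to hold the whole parameter list in their head.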

I wonder if we could chart abstraction vs gains vs complexity. I think we'd see a bell curve for abstraction vs gains, and an S-curve for abstraction vs complexity, like this:

[Figure: Abstraction gains]

There's a sweet spot where the gains and complexity from abstraction intersect, before the gains drop off and the complexity blows up.

We never max out our gains from abstractions. To get our gains line up to the top we need to look at other means of organising and managing code. Documentation is an oft-overlooked way of achieving gains that doesn't even involve writing code.

To keep my code in that sweet spot, I try to follow a few rules:

Don't plan abstractions upfront. The first iteration of an implementation is rarely the best. Let opportunities for abstraction occur naturally.
When the opportunity for abstraction does arise, be wary. Don't jump at the opportunity. Don't try to be too clever; Future You will rarely be grateful. Be as specific in code as you can be. This will make it harder to be tempted to reach for abstractions.
Keep the call stack small. Large call stacks mean lots of functions. Try to write code so that it's "flat". Usually, the deeper the call stack is, the larger the number of abstractions that have been made. No one likes to spend their days wading through the stack frames of an error, trying to figure out where it all began.
Weigh abstraction vs pattern. Abstractions are a generalisation of a pattern. A pattern is an approach to a problem that is repeated. When you take a piece of code and generalise it, you lose important context specific to the original problem. If the problem changes, your generalisation may no longer be appropriate. When you want to abstract, you might be better off leaving things as they are and introducing a pattern. Document the pattern so that it is followed.
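As a rough illustration of the last two rules, here is a sketch in Python of a documented pattern followed by two flat functions. The function names, field names, and the three-step pattern itself are all assumptions made up for this example, not a prescription.

```python
# Pattern (documented, not enforced by any shared helper):
# every import function does these steps inline, in this order:
#   1. validate the row
#   2. normalise the fields
#   3. return the record
# All names below are hypothetical.


def import_customer(row: dict[str, str]) -> dict[str, str]:
    # 1. validate
    if "@" not in row.get("email", ""):
        raise ValueError(f"customer row has no valid email: {row!r}")
    # 2. normalise
    email = row["email"].strip().lower()
    name = row.get("name", "").strip()
    # 3. return the record
    return {"email": email, "name": name}


def import_product(row: dict[str, str]) -> dict[str, object]:
    # 1. validate
    if not row.get("sku"):
        raise ValueError(f"product row has no sku: {row!r}")
    # 2. normalise
    sku = row["sku"].strip().upper()
    price_pence = round(float(row.get("price", "0")) * 100)
    # 3. return the record
    return {"sku": sku, "price_pence": price_pence}
```

The two functions repeat a shape rather than share an implementation. If an error is raised, the stack trace points straight at the function that owns the problem, and if product imports later need a fourth step, customers are unaffected.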

Stay classy, abstraction! ✌️

References
