TechWorkRamblings

by Mike Kalvas

202204262054 Prefer duplication over the wrong abstraction

Notes from Dan Abramov's talk “The Wet Codebase”.[1]

Start with two modules, a and b:

  flowchart BT
    a((a))
    b((b))

Abstract their shared logic into a new module, c, that both depend on:

  flowchart BT
    a((a))
    b((b))
    c((c))
    a-->c
    b-->c
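To make the diagrams concrete, here is a minimal hypothetical sketch in Python. The label-formatting logic and the names `format_label`, `label_for_a`, and `label_for_b` are invented for illustration and do not come from the talk. c is a small shared helper, and a and b both call it.

```python
# Hypothetical module c: a small helper that a and b both reuse.
def format_label(name: str, email: str) -> str:
    return f"{name} <{email}>"


# Hypothetical module a: calls the shared helper.
def label_for_a(name: str, email: str) -> str:
    return format_label(name, email)


# Hypothetical module b: calls the same helper.
def label_for_b(name: str, email: str) -> str:
    return format_label(name, email)
```

At this stage the abstraction is cheap and obviously correct: both callers want exactly the same behavior.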

Awesome, we’re reusing it and everything. But then a new module, d, comes along that needs something slightly different, so c grows into c+.

  flowchart BT
    a((a))
    b((b))
    c((c+))
    d((d))
    a-->c
    b-->c
    d-->c

But then there’s a bug in c+ that requires a special case for b’s use case (c+*).

  flowchart BT
    a((a))
    b((b))
    c((c+*))
    d((d))
    a-->c
    b-->c
    d-->c

And then another slightly different special case, this time for a (c+**).

  flowchart BT
    a((a))
    b((b))
    c((c+**))
    d((d))
    a-->c
    b-->c
    d-->c

So we pull those cases out and parameterize the call sites in a, b, and d.

  flowchart BT
    a_((a_))
    a*((a*))
    b_((b_))
    b*((b*))
    c((c+__))
    d_((d_))
    d*((d*))
    a_-->a*
    a*-->c
    b_-->b*
    b*-->c
    d_-->d*
    d*-->c
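A minimal hypothetical sketch of this stage, using an invented label-formatting helper as c (none of these names come from the talk). Every slight difference has become another keyword argument on the shared helper, and each call site (a*, b*, d*) has to pick the right combination.

```python
from typing import Optional


# Hypothetical module c+**: each "slight difference" became another parameter.
def format_label(
    name: str,
    email: str,
    *,
    show_email: bool = True,            # added when d arrived (c+)
    uppercase_name: bool = False,       # special case for b (c+*)
    truncate_at: Optional[int] = None,  # special case for a (c+**)
) -> str:
    if uppercase_name:
        name = name.upper()
    label = f"{name} <{email}>" if show_email else name
    if truncate_at is not None and len(label) > truncate_at:
        label = label[:truncate_at] + "..."
    return label


# Parameterized call sites (a*, b*, d*): each must know which flags it needs.
def label_for_a(name: str, email: str) -> str:
    return format_label(name, email, truncate_at=12)


def label_for_b(name: str, email: str) -> str:
    return format_label(name, email, uppercase_name=True)


def label_for_d(name: str, email: str) -> str:
    return format_label(name, email, show_email=False)
```

No single flag is unreasonable on its own, but each one adds combinations the shared helper has to keep working, and a change made for one caller can quietly affect the others.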

Later, after small fixes, added logging, and other minor changes accumulate over time, you end up somewhere like this:

  flowchart BT
    a_((a_))
    a*((a*))
    b_((b_))
    b*((b*))
    c((c+__))
    d_((d_))
    d*((d*))
    x#((x#))
    x_((x_))
    y#((y#))
    a_-->c
    b_-->b*
    b*-->c
    d_-->d*
    d*-->c
    y#-->c
    x#-->y#
    x#-->x_
    x_-->d_
    c-->b_
    

The key point is that each of these individual steps made sense at the time. Looking at the end result, though, you can’t see the whole picture or tell what the original intention was.

Let’s go back in time to the point where we added d and changed c into c+. What we should have done is duplicate the code instead: re-inline c into a and b, and write just what d needs directly in d. Something like:

  flowchart BT
    a((ac))
    b((bc))
    d((d+))

Then the later fixes might evolve like this, each one landing only in the module that needs it:

  flowchart BT
    a((a%))
    b((bc*))
    d((d*))
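In code, a hypothetical sketch of this duplicated version might look like the following. The label-formatting logic and function names are invented for illustration. Each module keeps its own copy of what used to be the shared helper, so each change stays local.

```python
# Hypothetical module a (a%): its own copy, changed for a's needs only.
def label_for_a(name: str, email: str) -> str:
    label = f"{name} <{email}>"
    if len(label) > 12:
        label = label[:12] + "..."  # truncation only a cares about
    return label


# Hypothetical module b (bc*): its inlined copy, with b's fix applied here.
def label_for_b(name: str, email: str) -> str:
    return f"{name.upper()} <{email}>"


# Hypothetical module d (d*): d never needed the email at all.
def label_for_d(name: str, email: str) -> str:
    return name.strip()
```

There is some repetition between a and b, but fixing b cannot break a or d, and none of them carries flags that exist only for someone else.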

We don’t spend time evolving these modules in lock-step. And once they stabilize, you may end up extracting something different from what you originally thought:

  flowchart BT
    a((a%))
    b((bc))
    d((d))
    *((*))
    b-->*
    d-->*
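A hypothetical sketch of that later extraction, with invented names: once b and d have stabilized, suppose they turn out to share one small, well-defined step (normalizing whitespace in a name). Only that step is pulled out as the shared piece (*), and a keeps its own code untouched.

```python
# Hypothetical shared piece (*): trim and collapse internal whitespace.
def normalize_name(name: str) -> str:
    return " ".join(name.split())


# Hypothetical module b: uses the shared step, keeps its own formatting.
def label_for_b(name: str, email: str) -> str:
    return f"{normalize_name(name).upper()} <{email}>"


# Hypothetical module d: uses the same shared step and nothing else.
def label_for_d(name: str, email: str) -> str:
    return normalize_name(name)
```

The extracted function has no flags: it does one thing that both callers genuinely need in the same way.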

So when we educate people, we should talk about both the benefits and the costs of things like abstractions. Don’t teach the next generation “don’t repeat yourself”; teach them “it depends”.

“Easy to replace systems tend to get replaced with hard to replace systems. The second law of thermodynamics in action in our system designs.” (Malte Ubl on Twitter)

Avoiding the mess


Duplication is far cheaper than the wrong abstraction.[2]

Prefer duplication over the wrong abstraction.[2]

Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary.[3]

You usually arrive at a codebase after it has been running, working, and changing for a while. The sheer existence of that code subconsciously pushes you to work within a messy abstraction and add to the mess, instead of breaking the abstraction apart and changing things.


  1. Abramov, D. (2019, July 12). The Wet Codebase. https://www.deconstructconf.com/2019/dan-abramov-the-wet-codebase

  2. Confreaks. (2014, May 21). RailsConf 2014 - All the Little Things by Sandi Metz. https://www.youtube.com/watch?v=8bZh5LMaSmE

  3. Metz, S. (2016, January 20). The Wrong Abstraction. Sandi Metz. https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction