202107272307 Data-first pipe design
Contrasts with 202107272306 Data-last pipe design.
Background
With data-last being the standard convention in most functional languages, why would we want to adopt a data-first approach? The motivation comes from how type inference works and how the compiler reports errors.
In OCaml, type inference works from left to right, top to bottom [1]. If we start a list with an element of one type, then adding an element of another type after it gives us a type error.
let ints = [1; "a"]
               ^^^
Error: This expression has type string
       but an expression was expected of type int
This comes into play with a data-last approach. As programmers, the type that matters to us is the type of the data we are manipulating; functions are written to accept a known data type and work on it. With data-last, the function is checked before the data, so the error message is reversed: it blames the data for not matching the function [1].
let words = ["hello"; "world"]
let res = map (fun n -> n + 1) words
                               ^^^^^
Error: This expression has type string list
       but an expression was expected of type int list
       Type string is not compatible with type int
This error tells us that the authority on the types is the function rather than the list [1]. Besides being conceptually backwards, which is a subjective point, there is an objective issue here as well: it feels as if type inference should be able to "override" this error, because we have already pinned down the type of words in the line above the map call.
Without getting into the details of how it works under the hood, we can see a counterpoint if we use a data-first convention.
let words = ["hello"; "world"]
let res = map words (fun n -> n + 1)
                              ^
Error: This expression has type string
       but an expression was expected of type int
Notice that the error is about the actual types of the data involved and not the lifted types of the functions and wrappers of the data. This can become increasingly important as the complexity of our types and wrappers increases.
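The two error messages can be compared side by side in a small compilable sketch. The helpers map_last and map_first are hypothetical names introduced here for illustration (thin wrappers around List.map, not something from the source); the failing calls are kept in comments so the file still compiles.

```ocaml
(* Hypothetical helpers: the same map with the list in last or first position. *)
let map_last (f : 'a -> 'b) (xs : 'a list) : 'b list = List.map f xs
let map_first (xs : 'a list) (f : 'a -> 'b) : 'b list = List.map f xs

let words = [ "hello"; "world" ]

(* Data-last: the callback is checked first and fixes the element type to int,
   so the compiler then complains about [words] as a whole list:
     let res = map_last (fun n -> n + 1) words *)

(* Data-first: [words] is checked first and fixes the element type to string,
   so the complaint lands on [n] inside the callback:
     let res = map_first words (fun n -> n + 1) *)

(* With a callback that matches the data, both conventions agree. *)
let lengths_last = map_last String.length words
let lengths_first = map_first words String.length

let () =
  assert (lengths_last = [ 5; 5 ]);
  assert (lengths_first = lengths_last)
```

Uncommenting either failing line should reproduce the corresponding message, which makes it easy to check where the compiler places the blame in each convention.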
The Pipe (first) Operator ->
As the name implies, it pipes the data object in as the first parameter. It also differs from the |> operator in that it is syntactic sugar, not an actual infix operator. This means the compiler interprets it exactly as if the call were written without the pipe, instead of applying a function (which would add an intermediate step).
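The -> operator is ReScript/Reason syntax and has no direct OCaml equivalent, so the following OCaml sketch only models the distinction. It assumes a hypothetical data-first helper add_first; pipe is a hand-written stand-in for a function-based pipe, and via_rewrite shows the direct call that a syntactic rewrite of a pipe-first expression would leave for the compiler.

```ocaml
(* A function-based pipe takes the value and the function as arguments and
   applies one to the other; the standard |> behaves like this. *)
let pipe x f = f x

(* Hypothetical data-first helper: the data is the first parameter. *)
let add_first (n : int) (delta : int) : int = n + delta

(* With a function-based pipe, a data-first helper needs a wrapper so the
   piped value lands in first position. *)
let via_function = 41 |> fun n -> add_first n 1
let via_pipe = pipe 41 (fun n -> add_first n 1)

(* A syntactic pipe-first is rewritten by the parser, so what gets type
   checked and compiled is just the direct call. *)
let via_rewrite = add_first 41 1

let () = assert (via_function = 42 && via_pipe = 42 && via_rewrite = 42)
```

All three forms compute the same value; the difference is only in whether a pipe function and a wrapper closure appear in the program at all.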
Advantages
- Helps the compiler infer types in functions that take callbacks as parameters, without having to add type annotations manually.
- Makes error messages simpler.
- Better IDE integration.
- (Subjectively) more intuitive argument order for functions with more than one parameter of the same type.
  (* What is the expected output here? [|1; 2|] or [|2; 1|] *)
  concat [|1|] [|2|]
  (* Data last gives [|2; 1|] *)
- Performance is better because it is syntactic sugar and not an extra function call.
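Two of the advantages listed here, callback inference and argument order, can be illustrated with a minimal OCaml sketch. Everything named here is hypothetical and chosen for illustration: the user and pet records, the map_first/map_last wrappers, and the concat_first/concat_last helpers. The sketch assumes the left-to-right checking of arguments described in the Background section.

```ocaml
(* Two record types that share a field name, so [u.name] is ambiguous
   unless the compiler already knows the type of [u]. *)
type user = { name : string; age : int }
type pet = { name : string; owner : string }

(* Hypothetical helpers wrapping List.map in both argument orders. *)
let map_last f xs = List.map f xs
let map_first xs f = List.map f xs

let users : user list =
  [ { name = "Ada"; age = 36 }; { name = "Grace"; age = 45 } ]

(* Data-first: [users] is seen first, so [u] is known to be a user and the
   callback needs no annotation. *)
let names_first = map_first users (fun u -> u.name)

(* Data-last: the callback is seen first, so it needs an annotation here.
   Without (u : user) the compiler picks the latest record with a name
   field, pet, and then rejects [users]. *)
let names_last = map_last (fun (u : user) -> u.name) users

(* Argument order for two parameters of the same type. *)
let concat_first data other = Array.append data other
let concat_last other data = Array.append data other

let () =
  assert (names_first = [ "Ada"; "Grace" ]);
  assert (names_last = names_first);
  assert (concat_first [| 1 |] [| 2 |] = [| 1; 2 |]);
  assert (concat_last [| 1 |] [| 2 |] = [| 2; 1 |])
```

The data-first concat reads in the same order as its result, which is the subjective point the list makes; the data-last version appends its first argument to its second.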
Disadvantages
- Worse integration with optional params. We need an ending unit param in order to tell the compiler when the function has been fully applied.
  let update = => (* data last *)
  let update = => (* data first *)
- Why does unit need to be at the end to stop application? It is because of the syntax and the optional params. We need to show explicitly whether we are partially applying the function or evaluating it without supplying the optional parameters.
  let point ~x ?(y = 0) () = (x, y)
  let x1y1 = point ~x:1 ~y:1 ()  (* standard *)
  let x1y0 = point ~x:1 ()       (* fully applied, get default *)
  let x1 = point ~x:1            (* partially applied, don't eval *)
- Less straightforward composition. Partial application is not as easy, so we need pipe placeholders or other strategies instead.
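Both disadvantages can be seen in a minimal OCaml sketch. The counts type and the bump_last/bump_first functions are hypothetical, introduced only for illustration; they are not the update function from the source. The sketch relies on OCaml's rule that an optional argument counts as omitted once a positional argument that follows it has been supplied.

```ocaml
(* Hypothetical counter store: an association list from key to count. *)
type counts = (string * int) list

(* Data-last: positional parameters follow ?default, so supplying them
   tells the compiler that ?default was omitted. No trailing unit needed. *)
let bump_last ?(default = 0) (key : string) (data : counts) : counts =
  let current = Option.value (List.assoc_opt key data) ~default in
  (key, current + 1) :: List.remove_assoc key data

(* Data-first: nothing positional follows ?default, so a trailing unit is
   needed to mark the call as complete. *)
let bump_first (data : counts) (key : string) ?(default = 0) () : counts =
  let current = Option.value (List.assoc_opt key data) ~default in
  (key, current + 1) :: List.remove_assoc key data

(* Composition: data-last yields a reusable step by partial application. *)
let bump_a = bump_last "a"

(* Data-first needs an explicit wrapper to get the same reusable step. *)
let bump_a_first = fun data -> bump_first data "a" ()

let () =
  let start = [ ("a", 1) ] in
  assert (bump_a start = [ ("a", 2) ]);
  assert (bump_a_first start = [ ("a", 2) ]);
  assert (bump_last ~default:10 "b" [] = [ ("b", 11) ]);
  assert (bump_first [] "b" ~default:10 () = [ ("b", 11) ])
```

The wrapper in bump_a_first plays the role that pipe placeholders play in languages with a pipe-first operator.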
[1] Chávarri, J. (2019, May 10). Data-first and data-last: A comparison. Javier Chávarri. https://www.javierchavarri.com/data-first-and-data-last-a-comparison/