I’ve thought about this a bunch, but don’t have an answer.
I’ve come up with a few heuristics that seem nice from previous experience, but I wouldn’t say they’re an answer or rules or anything like that.
- A non-extensible record is “better” than an extensible record as a “domain model.”
At first, it seems like you gain re-use by making a type synonym that is extensible:
type Person r =
  { name :: String
  , age :: Int
  | r
  }
You can throw more rows in there and build up something like an Employee and a Manager and all this other stuff.
type Employee r =
  Person
    ( position :: String
    | r
    )
type Manager r =
  Employee
    ( division :: String
    | r
    )
If you are only defining types (as in, nobody is planning to use those types ever), then that’s probably fine. You do reduce the syntactic duplication of defining repeated fields. If you also want to use those types, the benefits of extensibility do not outweigh the other concerns.
The Fairbairn threshold (the point where a concept is small enough that remembering its name is more cognitive work than remembering the concept itself) is real. Anyone else who comes across these types has to unravel the layers of extensibility. Once they do, they find that all their work went toward saving the writer of these types one or two lines. Unless someone works closely with this domain, they have to do this unraveling every time they come across these types and need to understand them.
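To make the unraveling concrete, here is a rough sketch of what a reader has to do in their head to find out what a Manager actually contains. The type synonyms are the ones from above; the module name and the manager value are made up for illustration.

```purescript
module ExpandSketch where

-- The extensible synonyms from above, repeated so the block stands alone.
type Person r =
  { name :: String
  , age :: Int
  | r
  }

type Employee r =
  Person
    ( position :: String
    | r
    )

type Manager r =
  Employee
    ( division :: String
    | r
    )

-- Expanding `Manager ()` by hand, one layer at a time:
--   Manager ()
--   = Employee ( division :: String )
--   = Person ( position :: String, division :: String )
--   = { name :: String, age :: Int, position :: String, division :: String }

-- Hypothetical value: you only know which fields to write after doing the expansion.
manager :: Manager ()
manager =
  { name: "Ada"
  , age: 36
  , position: "Director"
  , division: "Research"
  }
```

That is three hops to learn about four fields, and the hops get repeated by everyone who reads the type.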
What’s “better” in this case is to skip extensibility and define each record completely:
type Person =
  { name :: String
  , age :: Int
  }
type Employee =
  { name :: String
  , age :: Int
  , position :: String
  }
type Manager =
  { name :: String
  , age :: Int
  , position :: String
  , division :: String
  }
You end up writing rows multiple times, but that sort of duplication isn’t “bad.” It’s syntactic duplication more than anything else. Some might question the DRYness of this approach. On its face, it seems very WET. But think about what would happen if you wanted to change the name row to have a different type: every part of the domain that deals with names would have to change, and the parts that don’t deal with names wouldn’t. You wouldn’t run into weird bugs because one place changed name but other things kept the old name. Seems pretty DRY to me.
- An extensible record is “better” than a non-extensible record as a function argument.
If we’re not going to have extensible “domain models,” the only other place for extensibility is in functions. When someone gives a record to a function, it almost never matters if said record has more fields than are used in the function. In the cases where it does matter, it’s usually some invariant that can be better expressed with different types.
Continuing with the example from above, if you wanted to do something with a “person” and still wanted to allow “employees” and “managers”, you could say something like:
foo :: forall r. { name :: String, age :: Int | r } -> Foo
You can pass it a Person, Employee, Manager, or any ad-hoc record you want to create on the fly. The extensibility allows foo to be reused in many situations.
You can also communicate what is essential to the problem. If you only care about the name and age, you make that explicit by requiring only name and age. If it turns out you only really care about age, you can remove the name and be more explicit.
Consider the alternative, a non-extensible record:
bar :: { name :: String, age :: Int } -> Foo
or
baz :: Person -> Foo
You can pass it a Person and that’s all. If you wanted to use the logic of bar or baz with an Employee, Manager, or any other record that had a name and age, you’d have to project out the name and age before anything would type check. That’s noise to the problem.
In the case of baz, you’re also not able to communicate whether you genuinely need everything that a Person has or if you really only need some things a Person has. If all you really care about is the age of a Person, you’re not making it clear to anybody that uses this function later.
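Here is a small sketch of the difference, using the non-extensible Person and Employee records from earlier. The function names (describe, describeClosed), the values, and the String result standing in for Foo are all made up for illustration.

```purescript
module RecordArgSketch where

import Prelude

type Person = { name :: String, age :: Int }

type Employee = { name :: String, age :: Int, position :: String }

-- Extensible argument: asks only for what it uses and accepts anything with more.
describe :: forall r. { name :: String, age :: Int | r } -> String
describe p = p.name <> " is " <> show p.age

-- Non-extensible argument: accepts exactly a Person and nothing else.
describeClosed :: Person -> String
describeClosed p = p.name <> " is " <> show p.age

person :: Person
person = { name: "Ada", age: 36 }

employee :: Employee
employee = { name: "Grace", age: 45, position: "Engineer" }

-- The open version takes all three records as they are.
open :: Array String
open =
  [ describe person
  , describe employee
  , describe { name: "Ad hoc", age: 1, extra: true }
  ]

-- The closed version needs the fields projected out by hand first.
closed :: String
closed = describeClosed { name: employee.name, age: employee.age }
```

The projection in closed is the noise: it says nothing about the problem, it only exists to satisfy the type.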
- An extensible variant is “better” than a non-extensible variant as a function result.
Consider some function that returned a non-extensible variant similar to Maybe a:
foo :: Foo -> Variant (just :: Int, nothing :: Unit)
This is nice because you know all the cases it will be. What’s not so nice is that the return type forces you to deal with each case up front somewhere. It also doesn’t allow you to speak more clearly in some situations.
An extensible variant addresses these two concerns:
bar :: forall r. Foo -> Variant (just :: Int, nothing :: Unit | r)
bar allows you to take the returned variant and embed it in a “larger” variant without any special dance. If you have another function with a different extensible variant:
baz :: forall r. Foo -> Variant (great :: String | r)
bar and baz will unify without you doing anything. If you want to embed either of these into an even “larger” variant, that works as well:
qux :: Foo -> Variant (bad :: Bool, great :: String, just :: Int, nothing :: Unit)
qux x = bar x
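For something closer to compilable, here is a sketch of bar and baz with bodies filled in. It assumes a recent purescript-variant where inj takes a Proxy from Type.Proxy (older releases used SProxy). The Foo synonym, the function bodies, and the both helper are invented for illustration.

```purescript
module EmbedSketch where

import Prelude

import Data.Variant (Variant, inj)
import Type.Proxy (Proxy(..))

-- Stand-in for the unspecified Foo type (an assumption).
type Foo = Int

-- Returns `just` for positive input and `nothing` otherwise; the row stays open.
bar :: forall r. Foo -> Variant (just :: Int, nothing :: Unit | r)
bar n
  | n > 0 = inj (Proxy :: Proxy "just") n
  | otherwise = inj (Proxy :: Proxy "nothing") unit

-- Only ever returns `great`; the row stays open.
baz :: forall r. Foo -> Variant (great :: String | r)
baz n = inj (Proxy :: Proxy "great") (show n)

-- Both results sit in one array: each open row is instantiated to fill in
-- whatever cases the other function contributes.
both :: Foo -> Array (Variant (great :: String, just :: Int, nothing :: Unit))
both x = [ bar x, baz x ]
```

Nothing in both converts or expands anything. The type annotation picks the closed row and the two open rows line up with it.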
The other thing extensible variants help with is communicating more clearly about what a function returns. Think about some validation code; you might use V a b for that. A common non-variant approach is to define some data type with all of the cases:
data Error
= TooOld Int
| TooYoung Int
| TooLong String
Then, write some functions that do validation:
age :: Foo -> V (NonEmptyList Error) Bar
There’s a straightforward conversion to a non-extensible variant version:
ageV :: Foo -> V (NonEmptyList (Variant (tooOld :: Int, tooYoung :: Int, tooLong :: String))) Bar
age and ageV are effectively the same, and both have the same problem: they both imply that they can return any of the possible errors. Most likely, they only deal with the age-related errors (maybe only one error and the function could be better named). If an extensible variant is used, that can be communicated more clearly:
ageEV :: forall r. Foo -> V (NonEmptyList (Variant (tooYoung :: Int | r))) Bar
ageEV would still unify with ageV, but it also communicates that the only possible “error” it can make is tooYoung. You might also see the parallel to that Maybe a variant we defined above. Sometimes you know you’re only returning the “just” case. With Maybe a or Variant (just :: a, nothing :: Unit) you can’t always communicate that properly. If you can say Variant (just :: a | r) instead, it communicates that the “nothing” case cannot exist.
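A rough sketch of how that plays out with validation, assuming purescript-validation’s Data.Validation.Semigroup and a recent purescript-variant that uses Proxy. The function names (checkAge, checkName, checkPerson) and the limits of 18 and 20 are made up; only the tooYoung and tooLong labels come from the example above.

```purescript
module ValidationSketch where

import Prelude

import Data.List.NonEmpty (NonEmptyList, singleton)
import Data.String (length)
import Data.Validation.Semigroup (V, invalid)
import Data.Variant (Variant, inj)
import Type.Proxy (Proxy(..))

-- The type says it plainly: the only error this check can produce is `tooYoung`.
checkAge :: forall r. Int -> V (NonEmptyList (Variant (tooYoung :: Int | r))) Int
checkAge n
  | n < 18 = invalid (singleton (inj (Proxy :: Proxy "tooYoung") n))
  | otherwise = pure n

-- Likewise, the only error here is `tooLong`.
checkName :: forall r. String -> V (NonEmptyList (Variant (tooLong :: String | r))) String
checkName s
  | length s > 20 = invalid (singleton (inj (Proxy :: Proxy "tooLong") s))
  | otherwise = pure s

-- Combining the two: the open rows unify into one closed row of errors,
-- and V accumulates every failure in the NonEmptyList.
checkPerson
  :: Int
  -> String
  -> V (NonEmptyList (Variant (tooYoung :: Int, tooLong :: String))) { age :: Int, name :: String }
checkPerson a n = { age: _, name: _ } <$> checkAge a <*> checkName n
```

Each small check stays honest about what it can fail with, and the combined function is the only place that has to list every error.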
This sort of thing happens quite frequently in real code. You have a large sum type that you know can only return a handful of the cases.
The practice for nominal sums is to split the sum up and rejoin it somewhere later. That can get hard to work with, and generally lacks the granularity you might want in certain situations. For structural sums (variants), you can be as granular as you need/want when you need/want so long as extensibility is used appropriately. You don’t have to do weird dances to get the types lined up; they just figure themselves out.
It might seem that duality would imply an extensible variant is “better” than a non-extensible variant as a “domain model.” I can’t say. It could be the same as records. I don’t have enough experience with extensible variants as “domain models” to say one way or the other.