When to use extensible types when modeling a domain?

chexxor · June 9, 2018, 10:35pm

I’ve only seen trivial/contrived examples of writing functions which have an extensible record as a parameter. I haven’t seen any talk about when it’s good or bad to use a structurally typed parameter. Has anyone here thought deeply about it?

I saw someone compare structural types to the ad-hoc polymorphism of type classes, which is reasonable. In pure FP generally, I sense that ad-hoc polymorphism is something to handle with care rather than apply liberally. If so, should the extensibility of PS records likewise be handled with care?

The situation where I’ve considered using it is records that come from a database. Such records represent domain objects, to which I usually give nominal types and whose life-cycle I handle specially. I’ve been tempted to write some functions which bypass the nominal type to take advantage of knowing its implementation is a specifically structured record. Doing this hasn’t been easy and I kinda regret it, wishing instead I’d just written separate functions, one for each nominal type, such as NewCartItem -> Price and DeletedCartItem -> Price. It seems better to deal with these monomorphic type signatures, as their implementations can be managed separately, rather than with a function which operates on an extensible record or an extensible record inside some newtype.

joneshf · June 15, 2018, 2:34pm

I’ve thought about this a bunch, but don’t have an answer.

I’ve come up with a few heuristics that seem nice from previous experience, but I wouldn’t say they’re an answer or rules or anything like that.

tl;dr:

Non-extensible records as “domain models.”
Extensible records as function arguments.
Extensible variants as function results.

A non-extensible record is “better” than an extensible record as a “domain model.”

At first, it seems like you gain re-use by making a type synonym that is extensible:
```
  type Person r =
    { name :: String
    , age :: Int
    | r
    }
```
You can throw more rows in there and build up something like an Employee and a Manager and all this other stuff.
```
  type Employee r =
    Person
      ( position :: String
      | r
      )

  type Manager r =
    Employee
      ( division :: String
      | r
      )
```
If you are only defining types (as in, nobody is ever planning to use them), then that’s probably fine. You do reduce the syntactic duplication of defining repeated fields. If you also want to use those types, the benefits of extensibility do not outweigh the cost to whoever has to read them.

The Fairbairn threshold (the point below which a definition is so small that remembering its name is more cognitive work than remembering the definition itself) is real. Anyone else who comes across these types has to unravel the layers of extensibility. Once they do, they will find out that all their work was to save the writer of these types one or two lines. Unless someone works closely with this domain, they have to do this unraveling every time they come across these types and need to understand them.
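To make the unraveling concrete, here is a minimal sketch that reuses the synonyms above. The module name and the values `boss` and `flat` are made up for illustration.

```purescript
module Unravel where

type Person r = { name :: String, age :: Int | r }

type Employee r = Person (position :: String | r)

type Manager r = Employee (division :: String | r)

-- Closing the row with () yields a concrete record, but a reader has to
-- expand three layers of synonyms to learn which fields it has.
boss :: Manager ()
boss = { name: "Ada", age: 36, position: "Director", division: "Research" }

-- After expansion, `Manager ()` is exactly this flat record type, so the
-- assignment below type checks without any conversion.
flat :: { name :: String, age :: Int, position :: String, division :: String }
flat = boss
```

The flat annotation on `flat` is what every reader of `Manager ()` has to reconstruct in their head.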

What’s “better” in this case, is to not use extensibility, but define each record completely:
```
  type Person =
    { name :: String
    , age :: Int
    }

  type Employee =
    { name :: String
    , age :: Int
    , position :: String
    }

  type Manager =
    { name :: String
    , age :: Int
    , position :: String
    , division :: String
    }
```
You end up writing rows multiple times, but that sort of duplication isn’t “bad.” It’s syntactic duplication more than anything else. Some might question the DRYness of this approach. On its face, it seems very WET. But think about what would happen if you wanted to change the name field to have a different type: every part of the domain that deals with names would have to change; the parts that don’t deal with names wouldn’t. You wouldn’t run into weird bugs because one place changed name but other things kept the old name. Seems pretty DRY to me.

An extensible record is “better” than a non-extensible record as a function argument.

If we’re not going to have extensible “domain models,” the only other place for extensibility is in functions. When someone gives a record to a function, it almost never matters if said record has more fields than are used in the function. In the cases where it does matter, it’s usually some invariant that can be better expressed with different types.

Continuing with the example from above, if you wanted to do something with a “person” and still wanted to allow “employees” and “managers”, you could say something like:
```
  foo :: forall r. { name :: String, age :: Int | r } -> Foo
```
You can pass it a Person, Employee, Manager, or any ad-hoc record you want to create on the fly. The extensibility allows foo to be reused in many situations.

You also can communicate what is essential to the problem. If you only care about the name and age, you make that explicit by requiring only name and age. If it turns out you only really care about age, you can remove the name and be more explicit.
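A small sketch of that reuse, assuming the flat `Person` and `Employee` records from earlier. The function `describe`, the module name, and the sample values are hypothetical.

```purescript
module Greeting where

import Prelude

type Person = { name :: String, age :: Int }

type Employee = { name :: String, age :: Int, position :: String }

-- Only `name` and `age` are required; the row variable `r` stands for
-- whatever other fields the caller's record happens to carry.
describe :: forall r. { name :: String, age :: Int | r } -> String
describe p = p.name <> " (" <> show p.age <> ")"

person :: Person
person = { name: "Ada", age: 36 }

employee :: Employee
employee = { name: "Grace", age: 45, position: "Engineer" }

-- The same function accepts all three records without any projection.
descriptions :: Array String
descriptions =
  [ describe person
  , describe employee
  , describe { name: "Alan", age: 41, nickname: "Prof" }
  ]
```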

Consider the alternative: a non-extensible record:
```
  bar :: { name :: String, age :: Int } -> Foo
```
or
```
  baz :: Person -> Foo
```
You can pass it a Person and that’s all. If you wanted to use the logic of bar or baz with an Employee, Manager or any other record that had a name and age, you’d have to project out the name and age before anything would type check. That’s noise to the problem.
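For comparison, this is roughly what the projection looks like. `isAdult` and `employeeIsAdult` are hypothetical names, and the age limit of 18 is an arbitrary choice for the sketch.

```purescript
module Projection where

import Prelude

type Employee = { name :: String, age :: Int, position :: String }

-- A closed record argument: exactly these two fields, nothing more.
isAdult :: { name :: String, age :: Int } -> Boolean
isAdult p = p.age >= 18

-- `isAdult e` would be rejected because of the extra `position` field,
-- so the caller has to rebuild a record of precisely the expected shape.
employeeIsAdult :: Employee -> Boolean
employeeIsAdult e = isAdult { name: e.name, age: e.age }
```

Note that `name` gets copied across even though `isAdult` never reads it.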

In the case of baz, you’re also not able to communicate whether you genuinely need everything that a Person has or if you really only need some things a Person has. If all you really care about is the age of a Person, you’re not making it clear to anybody that uses this function later.

An extensible variant is “better” than a non-extensible variant as a function result.

Consider some function that returned a non-extensible variant similar to Maybe a:
```
  foo :: Foo -> Variant (just :: Int, nothing :: Unit)
```
This is nice because you know every case it can be. What’s not so nice is that the return type forces you to deal with each case up front somewhere. It also doesn’t let you say precisely which cases a function can actually produce.

An extensible variant addresses these two concerns:
```
  bar :: forall r. Foo -> Variant (just :: Int, nothing :: Unit | r)
```
bar allows you to take the returned variant and embed it in a “larger” variant without any special dance. If you have another function with a different extensible variant:
```
  baz :: forall r. Foo -> Variant (great :: String | r)
```
bar and baz will unify without you doing anything. If you want to embed either of these into an even “larger” variant, that works as well:
```
  qux :: Foo -> Variant (bad :: Bool, great :: String, just :: Int, nothing :: Unit)
  qux x = bar x
```
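Here is a fuller sketch of that embedding with actual implementations. It assumes a recent purescript-variant, where `inj` takes a `Proxy` (releases from around the time of this thread used `SProxy` from `Data.Symbol` instead). The names `half`, `praise`, `both`, and `render` are made up for the example.

```purescript
module Embed where

import Prelude

import Data.Variant (Variant, inj, match)
import Type.Proxy (Proxy(..))

-- Produces only `just` or `nothing`, but leaves the row open.
half :: forall r. Int -> Variant (just :: Int, nothing :: Unit | r)
half n
  | n `mod` 2 == 0 = inj (Proxy :: Proxy "just") (n / 2)
  | otherwise = inj (Proxy :: Proxy "nothing") unit

-- Produces only `great`, again with an open row.
praise :: forall r. Int -> Variant (great :: String | r)
praise n = inj (Proxy :: Proxy "great") (show n)

-- Both results land in one array: the open rows unify to a common row
-- that contains all three labels.
both :: forall r. Int -> Array (Variant (great :: String, just :: Int, nothing :: Unit | r))
both n = [ half n, praise n ]

-- The consumer closes the row and handles every case in one place.
render :: Variant (great :: String, just :: Int, nothing :: Unit) -> String
render = match
  { great: \s -> "great: " <> s
  , just: \n -> "just: " <> show n
  , nothing: \_ -> "nothing"
  }

rendered :: Array String
rendered = map render (both 4)
```

Neither `half` nor `praise` needed to know about the other's labels; the row is only closed at `render`.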
The other thing extensible variants help with is communicating more clearly about what a function returns. Think about some validation code; you might use V a b for that. A common non-variant approach is to define some data type with all of the cases:
```
  data Error
    = TooOld Int
    | TooYoung Int
    | TooLong String
```
Then, write some functions that do validation:
```
  age :: Foo -> V (NonEmptyList Error) Bar
```
There’s a straightforward conversion to a non-extensible variant version:
```
  ageV :: Foo -> V (NonEmptyList (Variant (tooOld :: Int, tooYoung :: Int, tooLong :: String))) Bar
```
age and ageV are effectively the same, and both have the same problem: they both imply that they can return any of the possible errors. Most likely, they only deal with the age-related errors (maybe only one error, in which case the function could be better named). If an extensible variant is used, that can be communicated more clearly:
```
  ageEV :: forall r. Foo -> V (NonEmptyList (Variant (tooYoung :: Int | r))) Bar
```
ageEV would still unify with ageV, but it also communicates that the only possible “error” it can make is tooYoung. You might also see the parallel to that Maybe a variant we defined above. Sometimes you know you’re only returning the “just” case. With Maybe a or Variant (just :: a, nothing :: Unit) you can’t always communicate that properly. If you can say Variant (just :: a | r) instead, it communicates that the “nothing” case cannot exist.
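A sketch of how validators like this might compose, assuming purescript-validation's `Data.Validation.Semigroup` and a recent purescript-variant where `inj` takes a `Proxy`. The validators `oldEnough` and `shortEnough`, the `Errors` synonym, and the limits of 18 and 20 are all invented for illustration.

```purescript
module ValidateAge where

import Prelude

import Data.List.NonEmpty (singleton)
import Data.List.Types (NonEmptyList)
import Data.String (length)
import Data.Validation.Semigroup (V, invalid)
import Data.Variant (Variant, inj)
import Type.Proxy (Proxy(..))

type Errors r = NonEmptyList (Variant r)

type Person = { name :: String, age :: Int }

-- The signature says the only error this validator produces is `tooYoung`.
oldEnough :: forall r. Int -> V (Errors (tooYoung :: Int | r)) Int
oldEnough n
  | n < 18 = invalid (singleton (inj (Proxy :: Proxy "tooYoung") n))
  | otherwise = pure n

-- Likewise, this one can only produce `tooLong`.
shortEnough :: forall r. String -> V (Errors (tooLong :: String | r)) String
shortEnough s
  | length s > 20 = invalid (singleton (inj (Proxy :: Proxy "tooLong") s))
  | otherwise = pure s

-- Combining them needs no conversion: the open rows unify, and the
-- resulting type lists exactly the errors that can occur.
person :: forall r. String -> Int -> V (Errors (tooLong :: String, tooYoung :: Int | r)) Person
person name age = { name: _, age: _ } <$> shortEnough name <*> oldEnough age
```

With a nominal `Error` type, both validators would have had to return the full sum, and the signature of `person` would say nothing about which constructors are reachable.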

This sort of thing happens quite frequently in real code. You have a large sum type that you know can only return a handful of the cases.

The practice for nominal sums is to split the sum up and rejoin it somewhere later. That can get hard to work with, and generally lacks the granularity you might want in certain situations. For structural sums (variants), you can be as granular as you need/want when you need/want, so long as extensibility is used appropriately. You don’t have to do weird dances to get the types lined up; they just figure themselves out.

It might seem that duality would imply an extensible variant is “better” than a non-extensible variant as a “domain model.” I can’t say. It could be the same as records. I don’t have enough experience with extensible variants as “domain models” to say one way or the other.