Language-level support for extensible variants?

mg50 · August 8, 2021, 3:17am

Hello all, I’ve been thinking about extensible variants recently (having recently dived into OCaml), and I wondered if there’d be any interest in adding language-level support for them. Currently Nate Faubion’s excellent purescript-variant library is available, but there are a few shortcomings that could only be solved in the compiler:

1. No normal pattern-matching

Optimally, one would like to be able to match on variants using the normal case statement. This would allow us to do complicated pattern matching that combines extensible variants with normal ADT’s, such as

type Foo = Maybe (Variant (Foo :: Int, Bar :: Int))

processFoo = \case
  Just (`Foo x) -> blah
  Just (`Bar x) -> blah
  Nothing -> blah

where `Foo and `Bar are variant type constructors (see below). And it would be nice to be able to simultaneously match on multiple values (i.e., case x, y, z of ...) which are variants. Currently, I don’t think there’s a good way to do these things using purescript-variant.

2. Better type inference

Consider this example from the purescript-variant documentation:

allToString :: Variant (foo :: Int, bar :: Boolean, baz :: String) -> String
allToString =
  case_
    # on _foo show
    # on _bar (if _ then "true" else "false")
    # on _baz (\str -> str)

If you comment out the type signature, the same type is inferred. However, this is not quite optimal from a design perspective, since it wouldn’t allow us to plug in a value of type Variant (foo :: Int, bar :: Boolean); it would need to be coerced to a Variant (foo :: Int, bar :: Boolean, baz :: String) value first. I’d argue that the optimal inferred type signature of this function should be:

allToString :: forall u u'. Union u u' (foo :: Int, bar :: Boolean, baz :: String)
                         => Variant u
                         -> String

which, again, I’m not sure is possible in a library.

3. More potential opportunities for optimization

If you know statically that `A Int, `B Int and `C Int are the only variant types that occur in your program, you can secretly compile them under the hood to an ADT:

data SecretCompilerSumType = A Int | B Int | C Int

and then perform all the same optimizations you could for an ADT. This could speed up other libraries like purescript-run, which IMO is the most important (but not only) use-case for extensible variants, and which uses purescript-variant under the hood. But purescript-variant's mock-pattern matches use function composition rather than real pattern matching, and I don’t think optimizations like this would be available.

(I do realize that a lot of PureScript is under-optimized currently anyway, so maybe this perk is so far-off anyway that it’s not a big consideration.)

4. Syntactic convenience

Currently, to create a variant in purescript-variants, you have to write something like someFoo = inj (Proxy :: Proxy "foo") 42. This is mildly annoying and discourages their use. It would be nice to be able to just write someFoo = `foo 42 or something like that, which could then be mirrored in first-class pattern matches as above.

There are other pieces of syntactic sugar you could theoretically add on top of this – for example, just as record types have their own special curly-brace syntax, we could give variants (the dual of records) their own special square-brace syntax like OCaml has. But this would be beyond the scope of my informal proposal here.

I realize that variants probably aren’t the biggest priority, and that the capabilities of purescript-variant might suffice for most people’s use-cases, especially given limited compiler maintainer manpower. I’ve just been really impressed by their applicability in OCaml and thought it might be worth asking for you guys’ thoughts!

hdgarrood · August 8, 2021, 9:59am

My main concern with doing this would be that adding language support for variants would make it more difficult for people to decide whether to use a variant or an ADT, and would lead to two separate schools of thought in that some people would start using variants exclusively, whereas others would continue to use ADTs in most situations, which I could imagine becoming quite awkward, especially when you want to combine libraries which use different styles. My personal opinion - and I apologize for how disappointing this will be to read - is that the time to make this change would have been 5 years ago.

natefaubion · August 8, 2021, 7:24pm

I tend to agree with @hdgarrood.

Regarding pattern matching, you can use match and onMatch, which I will grant you is not much better for nested pattern matching, but is much nicer than on.

Regarding inference and structural subtyping, I’m not sure that would even be the desirable behavior by default. I understand that it’s technically “more powerful”, but I personally don’t think I would want to raise constraints like that. You can probably say the same thing for records, if you flip it around to the return type (ie, why can’t we produce a record larger than demanded). For example, we don’t use overloaded literals even though it’s more powerful, which I’d consider to be in the same vein.

Regarding optimizations, I don’t see why inlining couldn’t remove the function composition overhead and turn it into a linear pattern match, or even a log-n pattern match if all branches are comparisons against inlined string constants.

Regarding syntactic convenience, I think that type applications would help immensely (inj @"foo" 42).

You should also consider what you lose by baking it into the language:

Easy composition of branches and groups of branches (it’s just function composition now, or record merging).
What do you want to do when users inevitably want to write recursive variants?

The latter point is what frustrates me the most. I would really want to use this to define recursive ASTs, but I would have to newtype them (losing extensibility). We have the same restriction for records, but recursive products are much less common than recursive sums. The solution here is usually to implement equi-recursive types (and even that only gets you so far), but that brings a whole host of significant complexity that I’m not sure even translates well to PureScript’s type system. OCaml has a lot of additional research and sets of features built in to it aside from standard HM type inference.

mg50 · August 10, 2021, 3:39am

Hi, thanks both of you for your thoughts! I realized coming in that this is a very unlikely feature addition, so I don’t mean to push back too hard, just wanted to add a couple points.

I think OCaml has done a pretty good job of saying that ADT’s should be used 95% of the time and polymorphic variants are a very advanced feature that should only be used sparingly, in the few remaining cases. I don’t think it’s resulted in a big split between people programming in the two different “styles” – but then again, I myself am pretty new to the OCaml ecosystem, so maybe my impressions are mistaken. But if they aren’t, I suppose I was imagining that the PureScript messaging around some hypothetical variants feature would be similar.

Re: recursive variants, I’m actually not sure if OCaml can do this in the way you want, either! Like, you can do

type tree = [ `Leaf | `Node of tree * tree ]

which is recursive, but it’s closed (not extensible), so you might as well just use an ADT there. I tried to figure out a way to make it open; the closest I could come up with was:

type 'a tree = 'a constraint 'a = [> `Leaf | `Node of 'a tree * 'a tree ]

and I don’t think this type can actually be specialized, at least without some bizarre fixed-point type theory stuff that’s beyond my ken. Again, could just be my OCaml novice-ness! But if I happen to be right, my point is just that I’m not sure any system that has extensible variants supports the use-case you had in mind, and people in practice don’t seem to use it for this purpose. (I looked at a bunch of language implementations like Hack, Rescript and OCaml itself which were written in OCaml in order to see if they represented their AST using an ADT’s or polymorphic variants, and they seemed to be going the former route: for example.)

I think your point about optimization might be right. I do think that you could compose and group together extensible variants just as easily if they had first-class language support, though! The functions would look pretty much the same, they’d just use the actual case keyword. But maybe I’m not understanding.

Again, thanks to you guys for considering this either way! I continue to enjoy your work.

natefaubion · August 10, 2021, 2:01pm

Even that simple recursive closed tree requires subtle handling though (which currently beyond PS’s capabilities). That’s sort of what I mean There’s just a lot about it that’s more complicated than it seems at first blush. Maybe I’m wrong though about their non-recursive power-to-weight ratio.

Trequetrum · August 10, 2021, 2:47pm

I’m pretty new to FP in general and type-level stuff is often lost on me.

Extensible variants feel a bit like what I imagine allowing sub-typing on ADTs would be like.

timjs · August 10, 2021, 6:38pm

I’m really sorry to hear that lifting variant types to the compiler is not something on the agenda. I really think it is part of the overall polish of the language before a 1.0 release.

@hdgarrood: I think there is a clear distinction between using regular variants and polymorphic variants: regular variants are nominally typed and can be recursive, polymorphic variants are structurally typed and can’t be recursive. Use regular variants everywhere, only use poly variants when extendability is required or for interop with foreign functions. This is the way ReScript promotes poly variants.

Developers ask for poly variants sometimes, and having it baked into the language is way more discoverable and allows us to better explain how it should be used. Doesn’t matter if they are not as powerful as poly variants on OCaml/Reason/ReScript, they already nicely fit in because of PureScript’s row types. I think the symmetry with records, which are built in, would be beautiful, everything is already there!

Extending the argumentation of @natefaubion, we could remove record syntax from the language and write prj @"fieldname" record instead of record.fieldname, but it would harm readability a lot. I think something like

allToString :: [.Foo Int, .Bar Boolean, .Baz String] -> String
allToString = case _ of
  .Foo x -> show 
  .Bar y -> if y then "true" else "false"
  .Baz z -> z

reads way better than above example. Nothing has to be done to type inferencing, because the way it needs to work is the way its working now, but maybe error messages could be improved in the future.

But hey, maybe I’m just to much striving to symmetry with the usage of structural records in PureScript