Untagged union types

Moving the discussion about anonymous unions here from the “What Purescript needs” thread.

Which cases could be better expressed with untagged unions than with sum types? What would this novelty really bring to the table (other than complications for the compiler)?

Also, if a separate thread is created from another topic, it would be nice to have the full issue context here without needing to follow links.

I think the biggest advantage of untagged unions is easing interop with native JS code (or other languages that use a lot of dynamic typing tricks). For example, working with charting libraries can be particularly cumbersome because the API usually involves sending the charting library a single JS object that includes every possible customization, where every field is optional and can often have different runtime representations depending on how much control you want over that setting. Sure, you could define a PureScript-friendly version of that JS object where every union is tagged, and then a conversion function between the PureScript-friendly version and the JS version (involving a whole lot of unsafeCoerce calls), but that could take hours or days even if you limited it to only the settings you ever use, and the sheer number of unsafeCoerce calls necessary means you’re likely to mess it up somewhere. Untagged unions make interoperating with the JS API a breeze.

I doubt untagged unions add very much to a project that isn’t doing its own FFI.

@ntwilson is it really so difficult to create a compatibility layer between an idiomatic PS API and the JS API? That can take some time, but it would actually emphasize that you are doing PS stuff, not just wrapping a JS lib. I mean, if there are really hard cases to implement (which I don’t see for now), it would be good to bring them to the table.

I agree with @ntwilson, FFI is the biggest motivator for me.

However, there are other things that you can get with untagged unions that are hard with the existing system. For example, it is possible to match on multiple variants at once:

type Bird = {
  fly :: Effect Unit
}

type Fish = {
  swim :: Effect Unit
}

type Frog = {
  swim :: Effect Unit,
  croak :: Effect Unit
}

type SmallPet = Fish | Bird | Frog

-- Fantasy syntax for discriminating untagged union types
move :: SmallPet -> Effect Unit
move p = matchField p
  -- This branch applies for both fish and frogs
  _swim -> p.swim
  -- This only gets called for birds
  _fly -> p.fly
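For comparison, here is a minimal sketch of the same idea in TypeScript, which already has untagged unions and narrows them by field presence with the `in` operator. This is only an analogue of the fantasy `matchField` syntax above, not a proposal for how PureScript would implement it. As a simplification, the actions return a string instead of `Effect Unit`.

```typescript
// TypeScript analogue of the Bird / Fish / Frog records above.
// Simplification (assumption): actions return a string rather than Effect Unit.
type Bird = { fly: () => string };
type Fish = { swim: () => string };
type Frog = { swim: () => string; croak: () => string };

// An untagged union: no constructor wraps the value at runtime.
type SmallPet = Fish | Bird | Frog;

function move(p: SmallPet): string {
  if ("swim" in p) {
    // Here p is narrowed to Fish | Frog, so one branch covers both.
    return p.swim();
  }
  // Only Bird is left once the swimmers are excluded.
  return p.fly();
}
```

The check is on the shape of the value, so any variant with a `swim` field takes the first branch without each variant being listed.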

Hard is subjective. It’s tedious, and causes performance issues, and if we can successfully remove that barrier we should.

Here’s a data point - https://twitter.com/RobAshton/status/1354051420618715136
It doesn’t directly refer to untagged unions, but I think some hairy code with variants can be avoided with untagged unions.

I would like to hear about a case where it caused, or may cause, a performance issue.

There is a cost associated with boxing/unboxing types. Untagged unions don’t need to perform any conversions for marshalling values to and from JS (just like arrays, strings etc. in PureScript currently).

Yes, but to get performance issues you need a case where you perform this unboxing multiple times per second. It is quite hard to imagine that this is a common real use case. It would be nice to hear about a real practical case, not just a hypothetical one.

It’s really quite common to call FFI functions multiple times per second. Almost any binding to an existing JS library with some sort of a UI would do that.

Almost any binding to an existing JS library with some sort of a UI would do that.

I don’t believe this is the case. :upside_down_face: I would say this is just kind of bad interop design. For example, if I insert a foreign element into my main view tree (whatever React/web component), I would do it on a certain lifecycle event of the parent element, usually when it is created. I tend not to recreate the parent element multiple times per second, especially where its inner content is controlled by a foreign context.

Is this similar to OCaml’s polymorphic variants?

See the highcharts API where each individual data point in a chart can be

Number 
| Array Number 
| { x :: Number, y :: Number, name :: String | Undefined, color :: String | Undefined }

All you need is a chart with a bunch of data points in it. Now try to refresh that chart with live data, and you’ve got potentially hundreds of unboxing jobs per chart refresh. And that’s just for the data. Most of the customization settings in the API also allow for different unions that would require unboxing.
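To make the “unboxing job” concrete, here is a rough TypeScript sketch of the per-point conversion a tagged wrapper needs before the data can be handed to the chart. The `TaggedPoint` type, its tag names and `toRaw` are hypothetical names used for illustration only. Reading the array form as an `[x, y]` pair is also an assumption. Only the `RawPoint` shape comes from the union above.

```typescript
// The untagged shape described above: what the charting library receives.
type RawPoint =
  | number
  | number[]
  | { x: number; y: number; name: string | undefined; color: string | undefined };

// Hypothetical tagged version, as an idiomatic wrapper might define it.
type TaggedPoint =
  | { tag: "Y"; y: number }
  | { tag: "XY"; x: number; y: number }
  | { tag: "Full"; x: number; y: number; name: string | undefined; color: string | undefined };

// One conversion per data point: this is the work repeated on every refresh.
function toRaw(p: TaggedPoint): RawPoint {
  if (p.tag === "Y") return p.y;
  if (p.tag === "XY") return [p.x, p.y]; // assumption: the array form is [x, y]
  return { x: p.x, y: p.y, name: p.name, color: p.color };
}

// Refreshing a series converts every point again.
function toRawSeries(points: TaggedPoint[]): RawPoint[] {
  return points.map(toRaw);
}
```

With an untagged union, the values would already be in the `RawPoint` form and `toRawSeries` would not be needed at all.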

I believe you would have the same issue with a “native” PS solution (say, a chart library made with Halogen): you still have to unbox (normalize) values to pass them to the view engine. So this is a potential perf issue not with JS API interop, but with passing big values between contexts, and it should be considered and solved in those terms. The solution, in this case, would be to keep the primary underlying data as plain and uniform as possible to minimize normalization effort. This is how I see the issue.

You might be right about that. Honestly, the performance concerns haven’t been noticeable to me. I doubt the unboxing PureScript has to do - even in an example like that - is going to add any noticeable overhead.

My main reason to use untagged unions is the ergonomics of it. To echo @ajnsit’s sentiment, FFI can be “tedious” if you’re working with a library with a lot of untagged unions, and can make it significantly more difficult to just try out some npm library that doesn’t have PureScript bindings than it is in a language like TypeScript or even BuckleScript. You could spin up a new JS project to try out some library, but sometimes you want to try it out in your own project without spending a whole lot of time writing a bunch of type coercions first.

I have used TypeScript/JS a lot, and I can say that such flexible and “unobtrusive” (or obtrusive?) JS APIs bring more evil than good. It all stems from the dynamic nature of JavaScript. I like that Haskell/PureScript actually forces you to think and make the API static and more straightforward.

As for the wrapping issue, it should be considered a necessary evil. An API should be converted to idiomatic PS style (imagine how this API would look in a PS-based lib) rather than reflecting JS style in PS. Even if it takes more time, I’m sure this would be beneficial in the long run.

Untagged unions probably (often) mean different ways/forms of representing the same value; just use the fullest and most uniform one that you need.

I think you are conflating flexibility with not being able to reason about code. The former is desirable, the latter is not.

Why? FFI interop is a real world feature that is important to real world projects. I haven’t seen any reason for us to compromise on this yet.

you are conflating flexibility with not being able to reason about code

Not only I, but library authors too often do this when creating “flexible APIs”: in their desire to simplify things for the end user, they make their own API/library harder to reason about and more error-prone.

Why? FFI interop is a real world feature that is important to real world projects.

I think it is obvious: any FFI interop brings unsafety and related issues. It is necessary, but one should not just try to transparently translate a JS API to PureScript; such APIs are mostly designed without FP style in mind.

I propose an in-between solution:

  • Easy to implement (just syntactic sugar)
  • Should ease writing FFI code
  • Would not weaken the type system

How would it look?

exports.foo = (input) => {
  if (Array.isArray(input))
    return input.length

  return Math.sqrt(input)
}

-- Note: each branch of the union can have one and only one argument
union SingleOrArray a
  = Single a
  | Multi (Array a)

foreign import foo :: SingleOrArray Int -> Int

translates to:

foreign import data SingleOrArray :: Type -> Type

Single :: forall a. a -> SingleOrArray a
Single = unsafeCoerce

Multi :: forall a. (Array a) -> SingleOrArray a
Multi = unsafeCoerce

foreign import foo :: SingleOrArray Int -> Int

This would not implement pattern matching on unions, but should still greatly improve the FFI experience.
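As a rough illustration of what the desugared form means at runtime, here is a TypeScript sketch of the same pattern: an opaque type whose “constructors” are identity casts, so nothing is boxed and the foreign function still sees the raw value. The `__phantom` field and the `unknown` casts are stand-ins for `foreign import data` and `unsafeCoerce`. They are assumptions of this sketch, not part of the proposal.

```typescript
// Opaque type: callers cannot inspect it, only build it via Single / Multi.
// The __phantom field never exists at runtime; it only carries the type parameter.
type SingleOrArray<A> = { readonly __phantom: A };

// "Constructors" that do no wrapping, mirroring `Single = unsafeCoerce`.
const Single = <A>(a: A): SingleOrArray<A> => a as unknown as SingleOrArray<A>;
const Multi = <A>(xs: A[]): SingleOrArray<A> => xs as unknown as SingleOrArray<A>;

// The foreign function from the post, discriminating on the raw runtime value.
function foo(input: SingleOrArray<number>): number {
  const raw: unknown = input;
  if (Array.isArray(raw)) {
    return raw.length;
  }
  return Math.sqrt(raw as number);
}
```

Calling `foo(Single(16))` passes the plain number `16` through, and `foo(Multi([1, 2]))` passes the plain array, which is the point of the sugar: type safety at the call site with no conversion cost.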

What do y’all think about this?