What PureScript needs

ntwilson · February 4, 2021, 10:26pm

I’ve played around with the untagged-union package a bit, and it seems to me quite fantastic! Are there features missing from that package that you’re hoping to get compiler support for? Or does that package just need to be more “canonical”?

Another concern I have about adding language-level support for untagged unions is that I’m not sure all backends could support them. If it’s a library, you could build support for it on whichever backends have runtime type checks, and still use PureScript from other backends that don’t. A language-level change would require excluding any backends that couldn’t support runtime type checks. (I definitely could be proven wrong on this point).

ajnsit · February 5, 2021, 6:23am

I have not explored it in detail myself, but it does look pretty good!

Agreed, I prefer libraries myself as long as they cover all features.

timjs · February 6, 2021, 9:44pm

What you’re describing here is a typical use case for record types.

And this is a typical use case for variant types.

If I understand you correctly, your main use case for untagged unions would be to reuse TypeScript types or be compatible with their type system in some way, isn’t it? It seems to me that TypeScript is mixing product and sum types, trying to be compatible with JavaScript. I don’t think PureScript should do the same. In my opinion, it would be a mistake.

I totally agree with this point. So @ajnsit, do you think the untagged-union package is enough for your use cases?

ajnsit · February 7, 2021, 5:27am

No I’m still talking about type sums here, not products.

I wouldn’t say that. This is a typical use case for sum types, yes, of which variants are one form. However, the point is to be able to do this without having to box the entire datatype.

Writing this with typical ADTs would look as below. However here the records have been boxed into the constructors (which at the JS level are separate class objects).

data State = LoadingState LoadingState
           | FailState FailState

With purescript-variant I would still have to box the variants into a tag type. The advantage over ADTs is that this is an open union instead of a closed one.

type State = Variant
  ( failState :: FailState
  , loadingState :: LoadingState
  )

However we don’t really need the boxing here to distinguish between the variants. The error field, which is present on both FailState and LoadingState, can be used to distinguish between the two.

Union types will allow the compiler to access this information so effectively you could write -

type State = {... fail state fields} | {... loading state fields}
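For comparison, here is a minimal sketch of how TypeScript handles the same idea today. The field names and the literal values "1" and "0" are assumptions chosen for illustration; the point is only that a shared field with distinct literal types lets the compiler narrow an unboxed union.

```typescript
// Hypothetical record shapes standing in for FailState and LoadingState.
type FailState = { error: "1"; msg: string };
type LoadingState = { error: "0" };

// An untagged union: no wrapper constructor, just one record shape or the other.
type State = FailState | LoadingState;

// Checking the shared `error` field narrows `s` to a single member of the union.
function describeState(s: State): string {
  if (s.error === "1") {
    // Here `s` is known to be FailState, so `msg` is accessible.
    return "Failure: " + s.msg;
  }
  // Here `s` is known to be LoadingState.
  return "Loading";
}
```

Calling `describeState({ error: "0" })` takes the second branch without any wrapper object being allocated around the record.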

TypeScript is just an example of union types. I don’t have any interop with TypeScript in mind.

I don’t see how product and sum types are being mixed. Can you give an example?

It looks pretty good to me! I’m not yet sure whether it covers all the use cases or, if some features are missing, whether they can be added without compiler support. But it does seem a solid step forward.

natefaubion · February 7, 2021, 4:19pm

Genuine question: are you aware of any languages or papers that have anonymous union types without subtyping? My impression is that anonymous union types invariably involve fairly complex notions of subtyping and variance, but I could be wrong as I have almost no experience with languages that support this feature.

gdennie · February 7, 2021, 9:39pm

As someone coming from F# and starting to look seriously at PureScript, I am finding the language unnecessarily verbose. A whitespace-sensitive language simply does not need so many commas. A newline with the same indent should in itself mean a new thing in the sequence (as determined by the bracketing context of [..], {..}, (..), =.., etc.) without a need for a comma. Likewise, a more indented following line would mean the continuation of the preceding expression. So perhaps a pragma could enable a PureScript “light syntax” pre-compilation; modern F# has spoiled me.

mhmdanas · February 8, 2021, 5:40am

@gdennie welcome to PureScript! To be honest, I don’t think your request is going to happen. A member of the team (was it Harry?) has stated before that they’re not going to add language extensions or anything of the sort, since it would hugely increase the maintenance burden. Changing the syntax like that now would break every single library out there, so yeah, it’s probably not happening.

Also, I’m not sure of this, but I think this is off-topic to this thread, so I think you want to create a separate thread for that if you want.

Adrielus · February 8, 2021, 6:46am

I don’t think the syntax changes @gdennie suggested would break existing code. And even if some syntax change does, can’t a simple script be written to convert between the two (making for painless updates)?

ajnsit · February 8, 2021, 7:04am

Not commenting on the merit of the idea, but the language largely follows Haskell’s syntax apart from minor differences. It’s not really desirable to change that.

Perhaps a new thread could be made for this suggestion?

mhmdanas · February 8, 2021, 7:13am

Created “Suggestion: can we use PureScript's whitespace sensitivity to reduce verbosity?”. We can continue the discussion there.

ajnsit · February 8, 2021, 12:35pm

To be honest I haven’t used TypeScript beyond toy examples either, and I haven’t read any papers on this topic. It was just my impression that it won’t be too complicated to implement this, but I could be wrong.

The untagged-union package gets most of the way there without compiler support, so it does look like there shouldn’t be many complicated corner cases to consider.

timjs · February 8, 2021, 5:54pm

This is exactly why I think anonymous union types in PureScript won’t work with all backends. You can distinguish a float from an int or a pointer at runtime only if it is tagged. (As stated before by me and @ntwilson.)
Subtyping is not necessarily needed I think, but an object hierarchy already has such runtime information, so you can easily reuse it for this feature. As such, untagged unions are a neat way to add some kind of sum types to object oriented languages.

The untagged-union package works for JavaScript, because browsers carry runtime information of ints/numbers/bools around (either in their prototype or by truncating the integer representation range). It won’t work for Go or C++ or WebAssembly without using a tag. As an example, the F# solution also needs to box primitive types and struct types for untagged unions (see Drawbacks point 4 in the RFC).
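To make the runtime-information point concrete, here is a small TypeScript sketch. The names `IntOrText` and `render` are made up for illustration; the only thing it relies on is JavaScript's `typeof` operator.

```typescript
// A union of primitives with no wrapper object around either member.
type IntOrText = number | string;

// JavaScript values carry enough runtime information for `typeof` to tell
// the members apart, and TypeScript narrows the type in each branch.
function render(value: IntOrText): string {
  if (typeof value === "number") {
    // Narrowed to number.
    return "number:" + value.toFixed(0);
  }
  // Narrowed to string.
  return "text:" + value.toUpperCase();
}
```

A backend that represents numbers as raw machine words has no counterpart to the `typeof` check here, so it would have to add a tag of its own to make the same branch possible.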

Yes, I agree. That’s where row types come in to express that there should be an error field, and then dispatch on the value of that field. That’s why I’m saying sums and products are mixed up.

From the TypeScript handbook:

interface Bird {
  fly(): void;
  layEggs(): void;
}

interface Fish {
  swim(): void;
  layEggs(): void;
}

declare function getSmallPet(): Fish | Bird;

let pet = getSmallPet();
pet.layEggs();

This is programmed as if it is a sum type: Fish or Bird. But actually both interfaces fit a common product type: type FishOrBird r = {layEggs :: Effect Unit | r} (which is actually the intersection of two rows). The discriminated union example on the same page is a sum as we know it (could be open or closed); the intersection type shown there is a row merge (which is actually the union of two rows, to make things more complicated). So I think it’s all already there in PureScript’s type system!

What I’m trying to say: having untagged union and intersection types makes a lot of sense when your language actually only has products (i.e. objects). PureScript already has both sums and products, and on top of that row types, to make both more flexible.

But maybe we should discuss this also in another topic (if needed). Because half the What PureScript needs discussion is now about union and intersection types.
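The row-style reading can be sketched in TypeScript itself. This is only an illustration: `layTwice` is a hypothetical helper, and the bounded type parameter `T` is used as a rough analogue of the open row tail `r`, not an exact equivalent.

```typescript
// The two interfaces from the handbook example.
interface Bird {
  fly(): void;
  layEggs(): void;
}

interface Fish {
  swim(): void;
  layEggs(): void;
}

// Accept anything that has at least `layEggs`; the rest of the fields are
// left open, much like `r` in `{ layEggs :: Effect Unit | r }`.
function layTwice<T extends { layEggs(): void }>(pet: T): T {
  pet.layEggs();
  pet.layEggs();
  return pet;
}
```

Both a `Bird` and a `Fish` can be passed to `layTwice` without any union type being mentioned, because only the shared field is required.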

natefaubion · February 8, 2021, 6:00pm

I guess I’m imagining something like

type Foo = (A -> String) | (B -> String)

Which would need to canonicalize to

type Foo = (A | B) -> String

And then you would need subtyping and variance to determine that you can still pass in an A -> String or a B -> String.

natefaubion · February 8, 2021, 6:14pm

I’ll note that I think it is extremely unlikely that PureScript would ever get anything like anonymous union types. I can guarantee you that it would be quite difficult to retrofit onto the existing type system. I think it’s much more likely that it would get something like polymorphic variants, as that works nicely within the type system as is. Rows are a slightly less ergonomic alternative to subtyping (since you must use explicit abstraction in some way to thread around the tail), but it fits very nicely and easily into a unification/equality based type system. I think you’d really want something like bi-unification (https://lptk.github.io/programming/2020/03/26/demystifying-mlsub.html).

In general though, I think you can see variant “unboxing” as a backend specific optimization. If you are dealing with a concrete sum of types, then you can make assumptions and optimizations about representations, but as soon as you move into a polymorphic setting, it would need to be boxed into a uniform representation. Rust will do things like this (for example, to make Option<A> be zero overhead when its A is non-null).

ajnsit · February 8, 2021, 6:45pm

I don’t see how that follows. Foo is either A -> String or B -> String. There is no way to pass it an A if it’s B->String and vice versa.

In general I didn’t have any expectation for the compiler to canonicalize types. Without canonicalization there’s little complexity. A | B remains A | B no matter what A and B are. To use a value of type A | B you necessarily have to determine whether it’s A or B after which time the compiler treats the value as of type A (or B).

natefaubion · February 8, 2021, 6:48pm

data A = A
data B = B

type Foo = (A -> String) | (B -> String)

useFoo :: Foo -> String
useFoo k = k A <> k B

I would expect something like that to work.

Edit: Or maybe it’s the other way, maybe that should be intersection? I never remember which is co and contra variant, but that’s kind of my point.
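For reference, this is how TypeScript treats a union of function types, as a sketch. `A` and `B` are hypothetical record types here (the fields only exist to make them structurally distinct), and `both` is an illustrative parameter name.

```typescript
// Hypothetical stand-ins for A and B; the fields make them structurally distinct.
type A = { a: string };
type B = { b: string };

// A union of two function types, as in the Foo example.
type Foo = ((x: A) => string) | ((x: B) => string);

// TypeScript only allows calling `k` with an argument that is valid for
// every member of the union, that is, an argument of type `A & B`.
function useFoo(k: Foo, both: A & B): string {
  return k(both);
}
```

So in TypeScript the union of functions behaves like a function from the intersection of the argument types, and a call such as `k({ a: "x" })` on its own is rejected by the type checker.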

ajnsit · February 8, 2021, 6:51pm

What does the type of k A look like? Does it return an empty string when k contains a handler for B? I don’t quite follow this.

I guess in the terms of sum types the conversion would look like -

type Foo = Either (A -> String) (B -> String)

type Foo = Either A B -> String

Which I don’t think is true.
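A small TypeScript sketch of why the two are different, using a hand-rolled tagged `Either`. All names here (`Either`, `HandlesEither`, `split`, and the record types `A` and `B`) are assumptions for illustration.

```typescript
// A tagged sum, standing in for PureScript's Either.
type Either<L, R> = { tag: "left"; value: L } | { tag: "right"; value: R };

// Hypothetical stand-ins for A and B.
type A = { a: string };
type B = { b: string };

// A single handler for a tagged argument: it must cope with both A and B.
type HandlesEither = (x: Either<A, B>) => string;

// From such a handler we can always recover BOTH an A-handler and a B-handler.
function split(h: HandlesEither): [(x: A) => string, (x: B) => string] {
  return [
    (x: A) => h({ tag: "left", value: x }),
    (x: B) => h({ tag: "right", value: x }),
  ];
}
```

A function out of `Either A B` therefore corresponds to a pair of handlers, whereas `Either (A -> String) (B -> String)` holds only one of them, so there is no general conversion from the latter to the former.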

natefaubion · February 8, 2021, 6:58pm

What I’m thinking is, if there’s no additional run-time tag for this, the only way to use this type is to distribute the union/intersection in some way, or to just forbid these sorts of types altogether. With a sum type, you have an explicit injection into a branch with a tag, but you don’t have that with anonymous union types. IMO the compromises you have to make fall apart pretty quick.

ajnsit · February 8, 2021, 7:04pm

Heh, discourse is giving me warnings about replying too much. I suppose we should move the untagged union discussion onto a separate thread.

EDIT: Started a new thread here - Untagged union types

I wrote a bit earlier about disambiguating between the various parts of an anonymous union type. The last method (using type witnesses) is general purpose, and the other two are more specific.

ajnsit:

Primitive types like Int, Boolean etc. Those should be automatically discriminated by the compiler using some mechanism (such as typeof).

User defined structural types (records) which are discriminated using specific fields with literal string values. The compiler should automatically derive appropriate typesafe ways to discriminate those, similar to how it works with purescript-variant but more generic, allowing for customising the field names as well as allowing multiple fields to be used. Here’s an example syntax -

type Maybe a = {tag :: "nothing"} | {tag :: "just", val :: a}
maybe :: forall a b. b -> (a -> b) -> Maybe a -> b
maybe b f m = case m of
    ON _tag "nothing" -> b
    ON _tag "just" -> f m.val

This is more flexible than purescript-variant since you can do this -

type FailState =
  { error :: "1"
  , msg :: String
  }

type LoadingState =
  { error :: "0"
  }

type SuccessState =
  { error :: "0"
  , value :: Number
  }

type State = FailState | LoadingState | SuccessState

processState :: State -> Effect Unit
processState s = case s of
  -- This branch will match both LoadingState and SuccessState
  ON _error "0" -> log "No failure"
  ON _error "1" -> log $ "Failure: " <> s.msg

User defined ADTs or other opaque types. For these the compiler should allow defining an unsafe function which returns a witness for the type. Once that witness is received, the compiler can automatically infer the more precise type. This mechanism would be similar to the Typeable mechanism, but perhaps with compiler support we will not have to define a new variable every time we use the witness.
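TypeScript's user-defined type guards are one existing take on the witness idea, sketched below. `Celsius`, `Reading`, `isCelsius` and `show` are hypothetical names, and this is only an analogy to the proposal, not the proposed PureScript mechanism.

```typescript
// An opaque, class-based type, standing in for a user-defined ADT.
class Celsius {
  constructor(readonly degrees: number) {}
}

type Reading = Celsius | string;

// A hand-written witness: returning `true` promises the compiler that
// `x` is a Celsius. The compiler trusts the check rather than verifying it.
function isCelsius(x: Reading): x is Celsius {
  return x instanceof Celsius;
}

function show(x: Reading): string {
  if (isCelsius(x)) {
    // Narrowed to Celsius, with no new variable introduced.
    return x.degrees + "C";
  }
  // Narrowed to string.
  return x;
}
```

The guard is "unsafe" in the same sense as above: if `isCelsius` returned `true` for the wrong values, the compiler would still accept the narrowed branch.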

mohaalak · February 26, 2021, 7:17am

I think better documentation and learning resources are key here, along with better tooling.

Let me start with tooling and IDE support: purs ide has some quirks. For example, you cannot get types inside a function, which was a bummer for me.

As for documentation, we need tutorials that show some of the good libraries for frontend and backend, with examples of each one in production code, not just toy examples.