What Purescript needs

Genuine question: are you aware of any languages or papers that have anonymous union types without subtyping? My impression is that anonymous union types invariably involve fairly complex notions of subtyping and variance, but I could be wrong as I have almost no experience with languages that support this feature.

5 Likes

As someone coming from F# and starting to look seriously at Purescript, I am finding the language unnecessarily verbose. A white-space sensitive language simply does not need so much commas. A newline with the same indent should in itself mean a new thing in the sequence (as determined by the bracketing context of [..], {..}, (..), =.., etc. ) without a need for a comma. As such, a more indented following line would mean the continuation of the preceding expression. Consequently, perhaps a pragma to enable a Purescript “light syntax” pre-compilation; modern F# has spoiled me. :slight_smile:

4 Likes

@gdennie welcome to PureScript! To be honest, I don’t think your request is going to happen. A member of the team (was it Harry?) has stated before that they’re not going to add language extensions or anything of the sort, since it would hugely increase the maintenance burden. Changing the syntax like that now would break every single library out there, so yeah, it’s probably not happening.

Also, I’m not sure of this, but I think this is off-topic to this thread, so I think you want to create a separate thread for that if you want.

I dont think the syntax changes @gdennie suggested would break existing code. And even if some syntax change does, can’t a simple script be written to convert between the two (making for painless updates)?

1 Like

Not commenting on the merit of the idea, but the language largely follows Haskell’s syntax apart from minor differences. It’s not really desirable to change that.

Perhaps a new thread could be made for this suggestion?

2 Likes

Created Suggestion: can we use PureScript's whitespace sensitivity to reduce verbosity?. We can continue the discussion there.

2 Likes

To be honest I haven’t used typescript beyond toy examples either, and I haven’t read any papers on this topic. It was just my impression that it won’t be too complicated to implement this, but I could be wrong.

The untagged-unions package gets most of the way there without compiler support, so it does look like there shouldn’t be many complicated corner cases to consider.

1 Like

This is exactly the point why I think anonymous union types in PureScript won’t work with all backends. You can only distinguish a float from an int or a pointer at runtime if and only if it is tagged. (As stated before by me and @ntwilson.)
Subtyping is not necessarily needed I think, but an object hierarchy already has such runtime information, so you can easily reuse it for this feature. As such, untagged unions are a neat way to add some kind of sum types to object oriented languages.

The untagged-unions packages works for JavaScript, because browsers carry runtime information of ints/numbers/bools around (either in their prototype or by truncating the integer representation range). It won’t work for Go or C++ or WebAssembly without using a tag. As en example, the F# solution also needs to box primitive types and struct types for untagged unions (see Drawbacks point 4 in the RFC).

Yes, I agree. That’s where row types come in to express there should be an error field, and then dispatch on the value of that field :wink: That’s why I’m saying sums and products are mixed up.

From the TypeScript handbook:

interface Bird {
  fly(): void;
  layEggs(): void;
}

interface Fish {
  swim(): void;
  layEggs(): void;
}

declare function getSmallPet(): Fish | Bird;

let pet = getSmallPet();
pet.layEggs();

This is programmed as if it is a sum type: Fish or Bird. But actually both interfaces fit a common product type: type FishOrBird r = {layEggs :: Effect Unit | r} (which is actually the intersection of two rows). The discriminated union example on the same page is a sum as we know it (could open or closed), the shown intersection type is a row merge (which is actually the union of two rows, to make things more complicated :stuck_out_tongue:). So I think it’s all already there in PureScripts type system!

What I’m trying to say: having untagged union and intersection types make a lot of sense when your language actually only has products (i.e. objects). PureScript already has both sums and products, and on top of that row types, to make both more flexible.

But maybe we should discuss this also in another topic (if needed). Because half the What PureScript needs discussion is now about union and intersection types :laughing:.

I guess I’m imagining something like

type Foo = (A -> String) | (B -> String)

Which would need to canonicalize to

type Foo = (A | B) -> String

And then you would need subtyping and variance to determine that you can still pass in an A -> String or a B -> String.

2 Likes

I’ll note that I think it is extremely unlikely that PureScript would ever get anything like anonymous union types. I can guarantee you that it would be quite difficult to retrofit onto the existing type system. I think it’s much more likely that it would get something like polymorphic variants, as that works nicely within the type system as is. Rows are a slightly less ergonomic alternative to subtyping (since you must use explicit abstraction in some way to thread around the tail), but it fits very nicely and easily into a unification/equality based type system. I think you’d really want something like bi-unification (https://lptk.github.io/programming/2020/03/26/demystifying-mlsub.html).

In general though, I think you can see variant “unboxing” as a backend specific optimization. If you are dealing with a concrete sum of types, then you can make assumptions and optimizations about representations, but as soon as you move into a polymorphic setting, it would need to be boxed into a uniform representation. Rust will do things like this (for example, to make Option<A> be zero overhead when it’s A is non-null).

3 Likes

I don’t see how that follows. Foo is either A -> String or B -> String. There is no way to pass it an A if it’s B->String and vice versa.

In general I didn’t have any expectation for the compiler to canonicalize types. Without canonicalization there’s little complexity. A | B remains A | B no matter what A and B are. To use a value of type A | B you necessarily have to determine whether it’s A or B after which time the compiler treats the value as of type A (or B).

2 Likes
data A = A
data B = B

type Foo = (A -> String) | (B -> String)

useFoo :: Foo -> String
useFoo k = k A <> k B

I would expect something like that to work.

Edit: Or maybe it’s the other way, maybe that should be intersection? I never remember which is co and contra variant, but that’s kind of my point.

1 Like

What does the type of k A look like? Does it return an empty string when k contains a handler for B? I don’t quite follow this.

I guess in the terms of sum types the conversion would look like -

data Foo = Either (A->String) (B->String)

data Foo = (Either A B) -> String

Which I don’t think is true.

1 Like

What I’m thinking is, if there’s no additional run-time tag for this, the only way to use this type is to distribute the union/intersection in someway, or to just forbid these sorts of types altogether. With a sum type, you have an explicit injection into a branch with a tag, but you don’t have that with anonymous union types. IMO the compromises you have to make fall apart pretty quick.

1 Like

Heh, discourse is giving me warnings about replying too much :slight_smile:. I suppose we should move the untagged union discussion onto a separate thread.

EDIT: Started a new thread here - Untagged union types

I wrote a bit earlier about disambiguating between the various parts of an anonymous union type. The last method (using typewitnesses) is general purpose, and the other two are more specific.

1 Like

I think better documentation and learning resourcrs is key here and better tooling.

Let me start with tooling and IDE support purs ide have some quirks for example you can not get types inside a function, that was a bummer for me

And about documentation we need some tutorial , show some of the good libraries for frontend and backend, some examples of each one in production code. Not just toy examples

3 Likes

OCaml may be? There’s Polymorphic variants and Extensible variants.

The manual is still talking to “subtyping”, but it feels more like constraints. For example a function can specify that it accepts any types as long as their constructor is a subset of what the function support.

1 Like

I’m familiar with OCaml variants (I maintain purescript-variant), but row-based tagged unions are specifically not the kind of subtyping that one generally sees with the anonymous unions being discussed.

1 Like

The Purescript book is great. However, I am finding there isn’t a comprehensive guide to the syntax of the entire language. Something like that in the book would be useful. Even an ABNF description of the syntax would serve as a guide to new users once there appetite was wetted.

Additionally, I recently skipped ahead to, “Monodic Adventures: The State Monad”, which introduced a new state monad, State s, in a way oblivious of the previously introduced state monad, ST. As I write this I am suspecting I may have missed some sentence explaining that, type ST = State s for some constrained s. Actually, this terseness in the book is also an issue…it should be more of a conversation. I’ll ignore the fact that the State should have been named, StateConstructor per its function.

One of the things I like about F# is that the core syntax is modeled as function application. However, F# does not have type classes and higher kinds which perhaps permits such syntactic simplicity. However, I wonder whether much of the syntax for defining structures in Purescript can ultimately be of a more consistent simplified form. For example:

thing IDENTIFIER PARM_1 ... PARM_N of CONSTRAINT_1, ..., CONSTRAINT_M where DEFINITION

Here, thing is one of keywords type, newtype, data, class, instance, let. Of course, this may drift Purescript too substantially away from the older Haskell syntax (or that may already be the case but I don’t yet know it).

Finally, forall seems like a thing that should be implicit. However, there does seem to be a need to distinquish these free type variable names from other identifiers that may become bound within the module. Unfortunately, the only thing I can think of would be to augment such names, such as by suffixing them with $ for instance, but this also seems like an inferior solution to forall. :slight_smile:

My $0.02

2 Likes

Have you looked through my repo’s syntax folder?

There’s also Parser.y if that’s more of what you were thinking.

2 Likes