What PureScript needs

Would be enough indeed, but then we’d need to implement our own memory management in linear memory. Also, interop with the DOM would be harder, I think.

I’d love to engage in some kind of generally applicable mid-level IR for functional languages which is suitable for optimisations and an easy target for modern functional languages to compile to! Say Haskell Core but for strict FP. Idris and Elm would benefit as well.

I think the biggest challenge with your proposals is making them backend-agnostic. Points 3, 4, and 5 are all strongly tied to JavaScript. JavaScript is currently not the only backend for PureScript, and people have expressed that they greatly value PureScript’s support for multiple backends. Targeting good interop with other Wasm languages would gain us more, I think.

The problem with compiling Nothing to null and Just a to a is that you can’t distinguish Just Nothing and Nothing at runtime, because both would compile down to null. I don’t know if this is a problem right now, maybe with generic (de)serialisation (?), but it is something to be aware of.
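A minimal TypeScript sketch of the collision, assuming the naive encoding where Nothing is null and Just a is a itself. The names Maybe, just, nothing and maybe are illustrative and are not the actual PureScript codegen:

```typescript
// Naive encoding under discussion: Nothing is null, Just a is just a.
type Maybe<A> = A | null;

const nothing = null;

function just<A>(a: A): Maybe<A> {
  return a; // no wrapper at all
}

// Eliminator in the style of PureScript's maybe.
function maybe<A, B>(b: B, f: (a: A) => B, m: Maybe<A>): B {
  return m === null ? b : f(m);
}

const justNothing: Maybe<Maybe<number>> = just<Maybe<number>>(nothing);
const plainNothing: Maybe<Maybe<number>> = nothing;

// Both values are the same null at runtime...
console.log(justNothing === plainNothing); // true
// ...so the outer maybe takes the Nothing branch for Just Nothing as well.
console.log(maybe("outer Nothing", (_inner: Maybe<number>) => "outer Just", justNothing)); // outer Nothing
```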

About untagged unions: there is an RFC to add them to F#. The discussions are an interesting read. My take is that modelling data with variants (and sometimes poly variants) results in more maintainable and more predictable code. I understand that untagged unions would be nice for talking to TypeScript APIs, but then we should either not support pattern matching on them in PureScript (so unions can only be used at a foreign boundary), or support a newtype-like unwrapping of types of the form data Union = String String | Number Number | Int Int, and only if the backend allows for some kind of runtime type matching. This is the case with JavaScript (with typeof or instanceof) or F# (where everything can be upcast to Object), but it can’t be done in traditional type-erased functional languages or on backends where Int and Number are unboxed (C++, Golang).
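As a rough illustration of the JavaScript case, here is a TypeScript sketch of matching an untagged union of primitives with typeof. The Union type and the describe function are hypothetical; this is not how any PureScript backend currently compiles data declarations:

```typescript
// An untagged union of primitives: values carry no constructor tag,
// so a match has to rely on the runtime type JavaScript keeps anyway.
type Union = string | number | boolean;

function describe(u: Union): string {
  if (typeof u === "string") return `String ${u}`;
  if (typeof u === "number") return `Number ${u}`;
  return `Boolean ${u}`; // only boolean is left at this point
}

console.log(describe("hi")); // String hi
console.log(describe(3));    // Number 3
console.log(describe(true)); // Boolean true
```

Note that number covers both of PureScript’s Int and Number here, since the JavaScript backend represents both as JS numbers. Telling them apart would need an extra check such as Number.isInteger, which cannot distinguish an integral Number like 1.0 from an Int, so even on JavaScript the Int and Number cases of the example union need care.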

True. However, I think being able to interop with JS is a strength of Purescript and pushing a dichotomy of either doing everything in Purescript or not doing it at all (like the Elm community tries to do) would not be good.

Which is why I went with annotations rather than language features. Annotations are extendable. Also annotations have no semantic impact on the rest of the Purescript code beyond FFI. So non-js backends can just ignore unrecognised annotations.

Targeting wasm would be very nice I agree!

This breaks referential transparency, so it’s definitely a problem. However, we can work around that as I outlined in my PR. One of the arguments for closing this PR was that this new implementation could easily replace Maybe, and that wouldn’t be desirable. Well, I think with some special cases in the codegen it would work out very nicely, and we should do it!
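The PR’s exact workaround isn’t reproduced here. As one hedged sketch of the general idea, similar in spirit to how ReScript encodes nested options, Just can stay unwrapped except when its payload is itself Nothing (or an already-escaped Nothing), in which case it is wrapped in a private marker. The names NestedNothing, just, isNothing and fromJust are hypothetical, and types are erased to unknown to mirror the runtime:

```typescript
// Marker for nested Just ... Nothing values that would otherwise collapse to null.
// depth 0 encodes Just Nothing, depth 1 encodes Just (Just Nothing), and so on.
class NestedNothing {
  depth: number;
  constructor(depth: number) {
    this.depth = depth;
  }
}

const nothing: unknown = null;

function just(x: unknown): unknown {
  if (x === null) return new NestedNothing(0); // Just Nothing
  if (x instanceof NestedNothing) return new NestedNothing(x.depth + 1);
  return x; // common case: no allocation, Just a is just a
}

function isNothing(m: unknown): boolean {
  return m === null;
}

// Undo one level of Just; only meaningful when isNothing(m) is false.
function fromJust(m: unknown): unknown {
  if (m instanceof NestedNothing) {
    return m.depth === 0 ? null : new NestedNothing(m.depth - 1);
  }
  return m;
}

console.log(just(42));                 // 42
console.log(isNothing(just(nothing))); // false
console.log(fromJust(just(nothing)));  // null
```

The trade-off is that every just call pays for two checks, in exchange for keeping the common case allocation-free.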

I haven’t thought about this in detail, but consider three kinds of types which can be part of untagged unions -

  1. Primitive types like Int, Boolean etc. These should be automatically discriminated by the compiler using some mechanism (such as typeof).

  2. User-defined structural types (records) which are discriminated using specific fields with literal string values. The compiler should automatically derive appropriate type-safe ways to discriminate between them, similar to how it works with purescript-variant but more generic, allowing for customising the field names as well as allowing multiple fields to be used. Here’s an example syntax -

    type Maybe a = {tag :: "nothing"} | {tag :: "just", val :: a}
    maybe :: forall a b. b -> (a -> b) -> Maybe a -> b
    maybe b f m = case m of
        ON _tag "nothing" -> b
        ON _tag "just" -> f m.val
    

    This is more flexible than purescript-variant since you can do this -

    type FailState =
      { error :: "1"
      , msg :: String
      }
    
    type LoadingState =
      { error :: "0"
      }
    
    type SuccessState =
      { error :: "0"
      , value :: Number
      }
    
    type State = FailState | LoadingState | SuccessState
    
    processState :: State -> Effect Unit
    processState s = case s of
      -- This branch will match both LoadingState and SuccessState
      ON _error "0" -> log "No failure"
      ON _error "1" -> log $ "Failure: " <> s.msg
    
  3. User-defined ADTs or other opaque types. For these the compiler should allow defining an unsafe function which returns a witness for the type. Once that witness is received, the compiler can automatically infer the more precise type. This mechanism would be similar to the Typeable mechanism, but perhaps with compiler support we would not have to define a new variable every time we use the witness.
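For comparison, TypeScript’s discriminated unions already behave much like point 2: the records stay unboxed, a literal-typed field does the discriminating, and one branch can cover several record types. A small sketch of the State example, returning strings instead of logging to keep it self-contained:

```typescript
type FailState = { error: "1"; msg: string };
type LoadingState = { error: "0" };
type SuccessState = { error: "0"; value: number };

// An untagged union: no wrapper object, just the records themselves.
type State = FailState | LoadingState | SuccessState;

function processState(s: State): string {
  if (s.error === "1") {
    // Narrowed to FailState, so msg is accessible here.
    return `Failure: ${s.msg}`;
  }
  // This branch covers both LoadingState and SuccessState.
  return "No failure";
}

console.log(processState({ error: "0" }));                 // No failure
console.log(processState({ error: "0", value: 3.5 }));     // No failure
console.log(processState({ error: "1", msg: "timeout" })); // Failure: timeout
```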

Did you know that you can already use any string as a record property if you quote it?

x = { "foo-bar": 0 }

Oh that’s good to know!

How do you refer to it though?

The same way, by quoting: rec."foo-bar"

True, way to go I think!

Could this be an annotation on Maybe a, to pick a certain representation or layout for a specific backend?

I’d rather lift support for variant types into the compiler, like records already are, to allow backends to choose an appropriate output representation for poly variants. For example:

type Maybe a = [.Nothing, .Just a]
-- which would be sugar for
type Maybe a = Variant (nothing :: Unit, just :: a)

maybe b f m = case m of
  .Nothing -> b
  .Just a  -> f a

Now make sure a poly variant like .Just 42 would compile to JavaScript as the object {tag: "Just", value: 42}, similar to how ReScript compiles it. No need for untagged unions.
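A minimal TypeScript sketch of the runtime shape suggested here, assuming .Just 42 becomes { tag: "Just", value: 42 } and .Nothing becomes { tag: "Nothing" } (field names follow the post). Because every value keeps its tag, Just Nothing and Nothing stay distinct:

```typescript
type Maybe<A> = { tag: "Nothing" } | { tag: "Just"; value: A };

function nothing<A>(): Maybe<A> {
  return { tag: "Nothing" };
}

function just<A>(value: A): Maybe<A> {
  return { tag: "Just", value };
}

// The maybe eliminator from the post, dispatching on the tag.
function maybe<A, B>(b: B, f: (a: A) => B, m: Maybe<A>): B {
  return m.tag === "Nothing" ? b : f(m.value);
}

const justNothing = just(nothing<number>());
console.log(maybe("outer Nothing", () => "outer Just", justNothing)); // outer Just
console.log(maybe(0, (n: number) => n + 1, just(41)));               // 42
```

The cost compared with the null encoding is one small object per Just, which is what buys the unambiguous nesting.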

Why not type this as follows

type State r = {error :: String | r}
type FailState = State (msg :: String)
type LoadingState = State ()
type SuccessState = State (value :: Number)

Isn’t that enough for your example? You’d lose exhaustivity, but hey, that’s a small price to pay for TypeScript interop.

TypeScript needs to lift strings and integers to the type system, because it needs to wrap around the way people wrote code in a dynamic language. I’ve always found this a stopgap solution, not a particularly nice type system addition. Having two ways to declare variants, algebraic and polymorphic, is already enough to choose from I think.

This syntax looks pretty good, but loses flexibility. You can’t choose the tag field name, you can’t have multiple fields to discriminate on, and you can’t use the tag to distinguish between more than one type (as in the state example I gave).

I still think untagged unions are nicer and more flexible, but just this change would be very good to have too!

No, I don’t see how. How would you type a function that can take either a LoadingState or a FailState, and then check at runtime which type was actually passed?

I’ve played around with the untagged-union package a bit, and it seems to me quite fantastic! Are there features missing from that package that you’re hoping to get compiler support for? Or does that package just need to be more “canonical”?

Another concern I have about adding language-level support for untagged unions is that I’m not sure all backends could support them. At least if it’s a library, you could build the support for that library for whichever backends support runtime type checks, but still be able to use PureScript from other backends that wouldn’t support it. A language-level change would require excluding any backends that couldn’t support runtime type checks. (I definitely could be proven wrong on this point.)

I have not explored it in detail myself, but it does look pretty good!

Agreed, I prefer libraries myself as long as they cover all features.

What you’re describing here is a typical use case for record types.

And this is a typical use case for variant types.

If I understand you correctly, your main use case for untagged unions would be to reuse TypeScript types or be compatible with their type system in some way, isn’t it? It seems to me that TypeScript is mixing product and sum types, trying to be compatible with JavaScript. I don’t think PureScript should do the same. In my opinion, it would be a mistake.

I totally agree with this point. So @ajnsit, do you think the untagged-union package is enough for your use cases?

No, I’m still talking about sum types here, not products.

I wouldn’t say that. This is a typical use case for sum types, yes, of which variants are one form. However, the point is to be able to do this without having to box the entire datatype.

Writing this with typical ADTs would look as below. However here the records have been boxed into the constructors (which at the JS level are separate class objects).

data State = LoadingState LoadingState
           | FailState FailState

With purescript-variant I would still have to box the variants into a tag type. The advantage over ADTs is that this is an open union instead of a closed one.

type State = Variant
  ( failState :: FailState
  , loadingState :: LoadingState
  )

However we don’t really need the boxing here to distinguish between the variants. The error field, which is present on both FailState and LoadingState, can be used to distinguish between the two.

Union types will allow the compiler to access this information so effectively you could write -

type State = {... fail state fields} | {... loading state fields}
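To make the boxing difference concrete, here is a TypeScript sketch contrasting the two runtime shapes. The boxed form approximates the { type, value } wrapper that purescript-variant uses internally (an approximation, not its exact generated code); the unboxed form is what the proposed union type would allow. Both type definitions are illustrative:

```typescript
type FailState = { error: "1"; msg: string };
type LoadingState = { error: "0" };

// Boxed: every value sits inside a wrapper carrying an extra label.
type BoxedState =
  | { type: "failState"; value: FailState }
  | { type: "loadingState"; value: LoadingState };

// Unboxed: the records themselves; their shared error field is the discriminant.
type UnboxedState = FailState | LoadingState;

function boxedMessage(s: BoxedState): string {
  return s.type === "failState" ? `Failure: ${s.value.msg}` : "Loading";
}

function unboxedMessage(s: UnboxedState): string {
  return s.error === "1" ? `Failure: ${s.msg}` : "Loading";
}

const fail: FailState = { error: "1", msg: "timeout" };
console.log(boxedMessage({ type: "failState", value: fail })); // Failure: timeout
console.log(unboxedMessage(fail));                             // Failure: timeout
```

Both functions make the same decision; the unboxed one simply skips the extra wrapper allocation and field access.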

TypeScript is just an example of union types. I don’t have any interop with TypeScript in mind.

I don’t see how product and sum types are being mixed. Can you give an example?

It looks pretty good to me! I’m not yet sure whether it covers all the use cases, or, if it is missing some features, whether they can be added without compiler support. But it does seem a solid step forward.

Genuine question: are you aware of any languages or papers that have anonymous union types without subtyping? My impression is that anonymous union types invariably involve fairly complex notions of subtyping and variance, but I could be wrong as I have almost no experience with languages that support this feature.

As someone coming from F# and starting to look seriously at PureScript, I am finding the language unnecessarily verbose. A whitespace-sensitive language simply does not need so many commas. A newline with the same indent should by itself mean a new item in the sequence (as determined by the bracketing context of [..], {..}, (..), =.., etc.), without the need for a comma. A more indented following line would then mean the continuation of the preceding expression. So perhaps a pragma could enable a PureScript “light syntax” pre-compilation; modern F# has spoiled me.

@gdennie welcome to PureScript! To be honest, I don’t think your request is going to happen. A member of the team (was it Harry?) has stated before that they’re not going to add language extensions or anything of the sort, since it would hugely increase the maintenance burden. Changing the syntax like that now would break every single library out there, so yeah, it’s probably not happening.

Also, I’m not sure of this, but I think this is off-topic for this thread, so you may want to create a separate thread for it.

I don’t think the syntax changes @gdennie suggested would break existing code. And even if some syntax change did, couldn’t a simple script be written to convert between the two (making for painless updates)?

Not commenting on the merit of the idea, but the language largely follows Haskell’s syntax apart from minor differences. It’s not really desirable to change that.

Perhaps a new thread could be made for this suggestion?

Created Suggestion: can we use PureScript's whitespace sensitivity to reduce verbosity?. We can continue the discussion there.

To be honest, I haven’t used TypeScript beyond toy examples either, and I haven’t read any papers on this topic. It was just my impression that this wouldn’t be too complicated to implement, but I could be wrong.

The untagged-union package gets most of the way there without compiler support, so it does look like there shouldn’t be many complicated corner cases to consider.

This is exactly why I think anonymous union types in PureScript won’t work with all backends. You can only distinguish a float from an int or a pointer at runtime if it is tagged. (As stated before by me and @ntwilson.)
Subtyping is not necessarily needed I think, but an object hierarchy already has such runtime information, so you can easily reuse it for this feature. As such, untagged unions are a neat way to add some kind of sum types to object oriented languages.

The untagged-union package works for JavaScript because JavaScript runtimes carry type information about ints/numbers/bools around (either in their prototype or by truncating the integer representation range). It won’t work for Go or C++ or WebAssembly without using a tag. As an example, the F# solution also needs to box primitive types and struct types for untagged unions (see Drawbacks point 4 in the RFC).
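A rough TypeScript illustration of the point about tags: in an untagged 8-byte slot, as an unboxed backend would use, nothing records whether the bits are a float or an integer, so the reader has to be told. The writeTagged and readTagged helpers are hypothetical and only show where a tag byte would go:

```typescript
// An untagged slot: the double 1.0 written as raw IEEE 754 bytes.
const raw = new DataView(new ArrayBuffer(8));
raw.setFloat64(0, 1.0); // big-endian by default
// The same bytes read back as integers are just as "valid" to the machine.
console.log(raw.getUint32(0).toString(16)); // 3ff00000
console.log(raw.getUint32(4));              // 0

// A tagged slot: one extra byte says which interpretation is the right one.
const TAG_INT = 0;
const TAG_FLOAT = 1;

function writeTagged(isFloat: boolean, n: number): DataView {
  const v = new DataView(new ArrayBuffer(9));
  v.setUint8(0, isFloat ? TAG_FLOAT : TAG_INT);
  if (isFloat) v.setFloat64(1, n);
  else v.setInt32(1, n);
  return v;
}

function readTagged(v: DataView): string {
  return v.getUint8(0) === TAG_FLOAT ? `Float ${v.getFloat64(1)}` : `Int ${v.getInt32(1)}`;
}

console.log(readTagged(writeTagged(false, 1)));  // Int 1
console.log(readTagged(writeTagged(true, 1.5))); // Float 1.5
```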

Yes, I agree. That’s where row types come in: to express that there should be an error field, and then dispatch on the value of that field. That’s why I’m saying sums and products are mixed up.

From the TypeScript handbook:

interface Bird {
  fly(): void;
  layEggs(): void;
}

interface Fish {
  swim(): void;
  layEggs(): void;
}

declare function getSmallPet(): Fish | Bird;

let pet = getSmallPet();
pet.layEggs();

This is programmed as if it were a sum type: Fish or Bird. But actually both interfaces fit a common product type: type FishOrBird r = {layEggs :: Effect Unit | r} (which is actually the intersection of two rows). The discriminated union example on the same page is a sum as we know it (it could be open or closed), and the intersection type shown there is a row merge (which is actually the union of two rows, to make things more complicated). So I think it’s all already there in PureScript’s type system!
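A small TypeScript sketch of that reading: a function that only needs the shared layEggs member accepts a Bird, a Fish, the union Fish | Bird, or their intersection, much as a PureScript function over the open record {layEggs :: Effect Unit | r} would. The layTwice function and the egg counter are made up for illustration:

```typescript
interface Bird {
  fly(): void;
  layEggs(): void;
}

interface Fish {
  swim(): void;
  layEggs(): void;
}

// Only the shared member is required; the type parameter keeps the rest of
// the object's members, loosely like the row variable r in PureScript.
function layTwice<T extends { layEggs(): void }>(pet: T): T {
  pet.layEggs();
  pet.layEggs();
  return pet;
}

let eggs = 0;
const bird: Bird = { fly() {}, layEggs() { eggs += 1; } };
const fish: Fish = { swim() {}, layEggs() { eggs += 1; } };

// An intersection merges both sets of members, like merging two rows.
const flyingFish: Fish & Bird = { fly() {}, swim() {}, layEggs() { eggs += 1; } };

function getSmallPet(): Fish | Bird {
  return fish;
}

layTwice(bird);
layTwice(fish);
layTwice(flyingFish);
layTwice(getSmallPet());
console.log(eggs); // 8
```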

What I’m trying to say: having untagged union and intersection types makes a lot of sense when your language actually only has products (i.e. objects). PureScript already has both sums and products, and on top of that row types, to make both more flexible.

But maybe we should also discuss this in another topic (if needed), because half of the “What PureScript needs” discussion is now about union and intersection types.

I guess I’m imagining something like

type Foo = (A -> String) | (B -> String)

Which would need to canonicalize to

type Foo = (A | B) -> String

And then you would need subtyping and variance to determine that you can still pass in an A -> String or a B -> String.
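For comparison, TypeScript, which does have subtyping, keeps such a union as a union rather than rewriting it. Calling it requires an argument that fits both parameter types (A & B), while a function over A | B can be stored in it thanks to parameter contravariance. A, B and the helper names are illustrative:

```typescript
type A = { a: number };
type B = { b: string };

type Foo = ((x: A) => string) | ((x: B) => string);

// TypeScript doesn't know which member f is, so the argument must be both an A and a B.
function callFoo(f: Foo, x: A & B): string {
  return f(x);
}

// Contravariance: a function accepting A | B can stand in for either member.
const handlesBoth = (x: A | B): string => ("a" in x ? `A ${x.a}` : `B ${x.b}`);
const foo: Foo = handlesBoth;

// Each member on its own is also a valid Foo.
const onlyB: Foo = (x: B) => `B ${x.b}`;

console.log(callFoo(foo, { a: 1, b: "one" }));   // A 1
console.log(callFoo(onlyB, { a: 1, b: "one" })); // B one
```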
