Allow defining named records

pkapustin · November 24, 2019, 1:07pm

@natefaubion @jy14898
I think I understand your alternative suggestion with overloaded syntax, and I also agree with your arguments regarding the fact that while it simplifies field access, it may complicate other things.
However, this is not what I am proposing.
You are saying “For nominal typing you always have to define a wrapper”. Why is this so?
What is preventing us from defining types like PersonRecord and CompanyRecord, that are parameterized by a row, just like Record?
If we can define such types, then we should get nominal typing in the sense that PersonRecord would not be the same as CompanyRecord, even if the fields are the same.
Let me use an example from Haskell to explain what I mean.
Consider module Data.Row.Records from Haskell’s row-types. If we generalized the code to use a type variable instead of the concrete Rec type, we could define types like PersonRecord and CompanyRecord, that are parameterized by a row.
http://hackage.haskell.org/package/row-types-0.3.0.0/docs/src/Data.Row.Records.html#Rec

jy14898 · November 24, 2019, 2:38pm

You could add a way to create record-like types in the language without newtyping records, but all the same problems exist. Sure, now we no longer talk about wrapping/unwrapping, but you still need to use type classes to allow the record operators to work on multiple types. That’s just how it works, there isn’t another way to overload a function or operator in purescript.

Now, you could hack in something different that doesn’t use a type class, but now you have very different, potentially complicated code that is only used once in the compiler. I think what you want is something that turns all .field uses into specialized getFieldPerson/getFieldRecord ... functions at compile time, depending on how it’s used, without using class constraints? From my knowledge, this is no easier than adding OverloadedRecords, and is less powerful, with more issues

eg:

module Person
  ( Person(..)
  , getName
  ) where

-- Some sort of syntax that tells the compiler this is a new instance of a
-- datatype that has the same semantics as a Record
newrecord Person r = Person ( name :: String | r )

getName r = r.name

-- Which one produces an error? Do we default to Records without type hints? 
-- That would be very weird behavior, once again purescript doesn't do that for
-- anything else that I know of
-- If we didn't have one of the two below, and didn't have defaults, then who
-- decides its type? The outside world of the module? Once again very weird
-- However if we used constraints, getName would be a valid polymorphic
-- function that would work on both
a = getName ({ name: "Hello world" } :: Record _)
b = getName ({ name: "John" } :: Person _)

You could try and require type hints for code like that, but you either directly have to hint all uses of _.field, or mess with the inference code such that at some point this ‘unknown type’ expression must be used in a place where it is valid. I don’t like ‘handwaving’ that there’s an unknown type in intermediate expressions, you should be able to give a type to all intermediate expressions of a tree, even if it’s never used

It’s just much simpler to newtype records and wrap/unwrap, there are helpers such as https://pursuit.purescript.org/packages/purescript-newtype/3.0.0/docs/Data.Newtype that let you write code on records and then lift them to your newtype
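
As a minimal sketch of the wrap/unwrap approach described above (the Person type and the function names are made up for illustration; only Newtype, unwrap and over come from Data.Newtype):

```purescript
module NewtypeHelpers where

import Prelude
import Data.Newtype (class Newtype, over, unwrap)

-- Hypothetical domain type: a nominal wrapper around a plain record.
newtype Person = Person { name :: String, age :: Int }

-- The compiler can derive the Newtype instance; `_` is filled in with
-- the wrapped record type.
derive instance newtypePerson :: Newtype Person _

-- Read a field by unwrapping first, then using ordinary dot access.
personName :: Person -> String
personName = unwrap >>> _.name

-- Write the update against the plain record and let `over` lift it:
-- it unwraps, applies the function, and wraps the result again.
birthday :: Person -> Person
birthday = over Person (\r -> r { age = r.age + 1 })
```

The record update inside birthday is plain record code; the constructor passed to over only tells the compiler which newtype is involved.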

To be honest, in code where I needed performance (or at least simple js output) and not too much noise from wrapping/unwrapping (which is optimised away for newtypes), I just use single letter newtype constructors

pkapustin · November 24, 2019, 5:52pm

@jy14898
I agree with your arguments.
Maybe we could use an approach similar to the latest proposal for records in Haskell, as @natefaubion mentioned? So that syntax for accessing and updating fields desugars to type classes with instances that are solved automatically?
Then all the expressions will have known types, and we will get all the benefits from using named records rather than wrappers (as only one type is involved).
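
A rough, untested sketch of what such a class-based desugaring could look like if written by hand today. The class name HasField, the method getField and the Person type are all assumptions made for illustration, not an existing PureScript feature, and in the proposal the newtype instance would be solved by the compiler rather than written out:

```purescript
module OverloadedFieldSketch where

import Data.Symbol (class IsSymbol)
import Prim.Row as Row
import Record as Record
import Type.Proxy (Proxy(..))

-- Hypothetical class: "the type r has a field l of type a".
class HasField (l :: Symbol) r a | l r -> a where
  getField :: Proxy l -> r -> a

-- Plain records get the instance structurally, via the row.
instance hasFieldRecord ::
  ( IsSymbol l
  , Row.Cons l a tail row
  ) =>
  HasField l (Record row) a where
  getField = Record.get

-- Hypothetical nominal type.
newtype Person = Person { name :: String, age :: Int }

-- Hand-written stand-in for the instance the compiler would solve.
instance hasFieldPerson ::
  ( IsSymbol l
  , Row.Cons l a tail (name :: String, age :: Int)
  ) =>
  HasField l Person a where
  getField label (Person r) = Record.get label r

-- What `p.name` might desugar to for a Person.
personName :: Person -> String
personName = getField (Proxy :: Proxy "name")
```

With this shape, an accessor like _.name would have the constrained type forall r a. HasField "name" r a => r -> a, which is where the inference and error-message concerns raised in this thread come from.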

jy14898 · November 24, 2019, 6:56pm

@pkapustin Yeah, I think compiler solved instances for any of the type class approaches are perfectly fine, the previous arguments people discussed for and against (including more ambiguous errors) still apply

I’d probably vote for it to go in, as long as we optimise the uses of the methods for the Record instance (like how we optimise Semigroup Int/Monad Effect/Monad ST etc)

Just need to find someone motivated enough to implement it

pkapustin · November 24, 2019, 8:40pm

@jy14898
Nice, I think that this could be conceptually separated in two parts:

Type class based overloaded syntax for accessing / updating records.
Named records (named semantic equivalents of the Record type) with compiler solved instances for the needed type classes.

So, while the first part simplifies working with newtype wrappers and may be preferred for existing code, the second part provides improved type safety and allows defining instances for records without the need for newtype wrappers.

natefaubion · November 25, 2019, 1:35am

My two cents as an aggressive user of newtyped records:

As background, at work we code-generate our API bindings and newtype all the record types. We’ve got hundreds of these and a lot of code that deals with APIs, so I’m dealing with newtype ergonomics issues all the time. I really like that we newtype this stuff overall, since it makes maintenance (and general browsing around with IDE features) a lot easier. We started out just using records (codecs are all generated), and I will gladly put up with newtype ergonomics rather than go back. In non-codegened parts of our codebase, we use a mix.

I don’t really understand this special feature for “named records”. Newtypes are more general, and you don’t lose anything with them. You have to have some sort of declaration for a nominal type regardless, so I don’t see why a newtype declaration is a burden at all. That is, I don’t see what you are gaining with this particular proposition such that it warrants a completely new language feature on top of what we have.

newtype Foo = Foo { bar :: Baz }

vs

newrecord Foo (bar :: Baz)

Is not a burden that I feel in my usage of newtypes. Note that in PureScript, all data types have constructors. It’s not clear to me why one would want a data type that has no way to construct it except implicitly. You might say, “Just add a type signature to direct it”, but that’s exactly what the constructor does, and why it exists.

I also don’t agree that there’s anything type-safe about this, at least in a way that’s any different from newtypes. Newtypes don’t inherently give you type safety. If you expose the guts (which is what your suggestion would do), then you are only gaining explicitness and documentation. If I have a String newtype, and expose the constructor for you to use, there’s nothing inherently safe about it. You just know what the expectation is (eg. Data.String.replace). But if it’s constructed completely implicitly, then you don’t even have that!

It also isn’t clear to me what alternative instances you would be writing for a structural type like in your examples that would be different from the records instances.

newrecord Foo r

or

newrecord Foo (bar :: Baz | r)

Essentially require you to treat this in a generic structural way because there’s a completely unknown, polymorphic component to it. How do you constrain the tail in your instances that’s any different from records as-is? I don’t know of a lot of instances out there for record newtypes that are also structurally polymorphic. All the instances I’ve wanted to write for newtype records have been things like Monoid or Ord instances and require a closed row, and newtypes have never been an issue for me in that regard. Otherwise, the use case is 100% pertaining to codecs and codec-like things. But again, these are all closed entities. Overall, I would really like concrete examples of the kinds of things you would be writing and why this would be better, noting how it’s more type safe and how the instances are novel.

As far as overloaded syntax, we have never accepted a proposal for overloaded literals for a simple reason: it breaks the repl. If a beginner inputs a literal in the repl just to kick the tires, they are immediately hit with a confusing error pertaining to instances. “Wow! I can’t even type in simple literals in the repl without something breaking. This language is way too complicated for me if I can’t even run simple calculator expressions,” they might say (they would). Ideas to get around this usually involve some sort of defaulting. Defaulting is problematic because it’s only useful in the repl experience, and so you have to essentially split typechecker behavior between module and repl. It also means things behave one way in the repl and a completely different way in modules, which is yet another way to confuse newcomers. Consequently, we make sure all literals have a straightforward concrete type that you can completely infer from the syntax. I do not think you will be able to make a case to convince all the maintainers to change this for all users, and essentially break records in the repl.

I’ll admit I’m partial to the idea of having dot-syntax for access though (and update to some lesser degree). I definitely feel this pain (unwrap, unwrap everywhere…). We often use unwrap, but also pattern matching. It’s hard to come up with a consistent way to deal with this that everyone likes. We could code-generate lenses for all of the fields and chuck them in a module, I guess, but I personally think it is just silly to go to this extent. I would really, really like some sort of easy dot syntax. The repl argument is hard to argue for this case since it’s already type-directed, and you would rarely get into ambiguity errors. Maybe with things like _.foo or \a -> a.foo, but these are already functions and don’t print in the repl. Dot-syntax for newtypes is one of our most requested features, and I think it’s a shame we don’t have a good solution to this. But I also think it is hard to argue that having a straightforward, unambiguous type for dot-syntax is a bad thing. Maybe something like Adding syntax for annotations on declarations would allow users to opt in to this at a module level.
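
For reference, the lens-per-field approach mentioned above might look roughly like this with purescript-profunctor-lenses. The Person type and the names _name, _age, greeting and birthday are assumptions for illustration, and this is a sketch rather than anything taken from the codebase being described:

```purescript
module PersonLenses where

import Prelude
import Data.Lens (Lens', over, view)
import Data.Lens.Iso.Newtype (_Newtype)
import Data.Lens.Record (prop)
import Data.Newtype (class Newtype)
import Type.Proxy (Proxy(..))

-- Hypothetical newtyped record.
newtype Person = Person { name :: String, age :: Int }

derive instance newtypePerson :: Newtype Person _

-- One lens per field: step through the newtype, then into the record.
_name :: Lens' Person String
_name = _Newtype <<< prop (Proxy :: Proxy "name")

_age :: Lens' Person Int
_age = _Newtype <<< prop (Proxy :: Proxy "age")

-- Field access without an explicit unwrap at the use site.
greeting :: Person -> String
greeting p = "Hello, " <> view _name p

-- Field update without an explicit unwrap/wrap at the use site.
birthday :: Person -> Person
birthday = over _age (_ + 1)
```

Every field needs its own definition, which is the boilerplate that code generation would have to produce.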

jy14898 · November 25, 2019, 3:05am

I still haven’t formalized my idea for an alternative, but I might as well put it out there if there’s a possibility of consideration:

At the moment, all symbols (and fields) are publicly accessible as anyone can construct them, so you can never really have a record whose fields are locked away. Obviously we have discussed newtyping a record as one solution, but imagine a different approach: being able to construct new nominal fields

eg:


-- Not sure on the exact syntax, but doesn't really matter
newfield Person :: ( Name :: String, Age :: Int )

-- The types Name and Age are now introduced, with kind Symbol
-- I guess Person has kind Field?
-- A record must have both 'subfields' inside for it to have the field Person
-- Only one will result in a type error

-- ERROR: Inferred field person, but missing subfield Age
rec = { Name: "Joe" }

-- Works
-- We can mix normal fields
rec :: Record ( Person, other :: Unit )
rec = { Name: "John", Age: 20, other: unit }

rec.Name :: String

-- Unambiguous type inference, dotsyntax is still only defined on Records
_.Name :: forall r. Record ( Person | r ) -> String

-- Can define parameterized fields
newfield Tuple a b :: ( Fst :: a, Snd :: b )

-- Should be able to infer this
_ { Fst = 10 } :: forall r a b. Record ( Tuple a b | r ) -> Record (Tuple Int b | r )

-- Not sure how this would work in terms of introduced names
-- Technically introduces Person.Name, Person.Age etc?
-- I'd prefer these to be 'flat', in that a field is just a collection of other fields
newfield Combined :: ( Person, Tuple Int Int )

comb.Person.Age :: Int

The idea is that if this module doesn’t export these symbols, then people can’t poke around inside. I guess we’d implement it with Symbol() from JavaScript to create unique symbols and avoid conflicts with normal symbols

I don’t think it solves the problem of creating class instances on Records (at least not without breaking existing record instances)

Of course if you are newtyping records because you want to represent some foreign object, then this is no use as the symbols are not normal string symbols

I kinda see this as a way of combining Haskell’s records with PureScript’s (in that a module can own a symbol/field)

EDIT: I have no idea how multiple fields with different arguments works (Like multiple tuples). In my mind I could create a unique key per name like Tuple, but now I’d need a unique key for every possible tuple configuration

hdgarrood · November 25, 2019, 5:07am

“As far as overloaded syntax, we have never accepted a proposal for overloaded literals for a simple reason: it breaks the repl”

This isn’t the only reason - for me, the more compelling reason not to accept an overloaded literals proposal is that it significantly hampers type inference. In fact, adding overloaded syntax for records like this would be a breaking change. Consider

example =
  [... a big expression involving references to fields of `opts` ...]
  where
  opts = { foo: 1, bar: true, baz: "baz" }

which works currently because a concrete type can be inferred for opts. With overloaded records this would no longer be the case; you’d get a NoInstanceFound error because the compiler won’t know what kind of record to use.

“I’ll admit I’m partial to the idea of having dot-syntax for access though […] Maybe something like Adding syntax for annotations on declarations would allow users to opt in to this at a module level.”

I’m not keen on this, for the same reason I’m not keen on language pragmas (I think this basically is a language pragma), which is that with n pragmas you have 2^n versions of the language, and as n increases you very quickly find that tooling (eg IDE plugins, formatters) just can’t keep up, and breaks with configurations other than the author’s personal preferred one. Of course this burden will be felt by people, too; it would be a pain to have to go back and check at the top of each module you might be working on to find out what record syntax means in that module.

pkapustin · November 25, 2019, 11:48am

@natefaubion @hdgarrood

I don’t mean that named records should be implicit in any way. Also, we don’t necessarily need overloaded literals. For example, in Haskell or Frege we can write Person {name = "John"} to create a record. As long as this means creating both a record and a newtype wrapper in PureScript, we could use a different syntax to explicitly specify the record type for the literal, for example, Person : {name = "John"} or something else. Personally, I still think overloaded literals would be better, but that’s just a matter of syntactic preference.

Regarding the gains, I agree with you that newtypes are more general. However, while Person (Record (name :: String)) is two things (a wrapper around an anonymous record), Person (name :: String) is one thing (named record) that supports both dot syntax and defining type class instances without the need for wrapping / unwrapping (you mention yourself that there is some inconvenience with wrapping / unwrapping). @garyb explains that we cannot define arbitrary instances for records, as they would overlap. But this is not the case with named records, as instances for Person (name :: String) and Company (name :: String) are not going to overlap. So the idea is to get something a bit like what @ssadler is asking for here, one type that supports everything.

Regarding type safety, what I mean is that if one has a Person (Record (name :: String)), one can, for example, unwrap it and mistakenly re-wrap it as a Company (Record (name :: String)). If one has a Person (name :: String), the nominal Person part cannot be separated from it.
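
A small sketch of the re-wrapping mistake described above, using today’s newtypes (Person, Company and personToCompany are hypothetical names chosen for illustration):

```purescript
module RewrapExample where

import Prelude
import Data.Newtype (class Newtype, unwrap, wrap)

-- Two nominal types that happen to wrap the same record type.
newtype Person = Person { name :: String }
newtype Company = Company { name :: String }

derive instance newtypePerson :: Newtype Person _
derive instance newtypeCompany :: Newtype Company _

-- This type checks, because after unwrapping only the anonymous record
-- is left and nothing ties it back to Person.
personToCompany :: Person -> Company
personToCompany = unwrap >>> wrap
```

Whether this counts as a safety problem or as the intended behaviour of exposed constructors is exactly what is being debated in this thread.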

Regarding instances, I am not thinking about any novel instances that one could not define before using a newtype wrapper. I haven’t had the chance to think about the details, but conceptually one should be able to define any instances for named records that one can now define for newtype wrappers, and also derive generic instances that are now available for Record.

pkapustin · November 28, 2019, 10:52am

@jy14898
Regarding your idea, it looks interesting, but it solves a different problem, right?

As far as I understand, you are looking at ways to prevent fields from being publicly accessible, and an additional way to compose / extend records?
If this is correct, it would be interesting to see more in terms of how this relates to rows and how it compares to the existing ways of composing / extending records using rows.

jy14898 · November 28, 2019, 12:08pm

@pkapustin It solves some of the same problems, e.g. when you say you gain safety by having your custom type Person not match with Record, this gives you a way of having your custom Field not match with normal Record fields. The way I think about it, is if we implemented rows using PureScript (if we had the right additions like polykinds etc, and syntax wasn’t an issue), then it’s just adding a new row constructor:

foreign import data RowField :: Symbol -> Type -> # Type -> # Type
foreign import data RowNil :: # Type

-- custom field constructor
foreign import data RowPerson :: # Type -> # Type

Indeed you could add this constructor today, but without support for the associated Symbols like Name etc, you wouldn’t be able to use normal Record syntax. Symbols would work a little differently as now they have a Row constructor associated with them (normal string symbols go with RowField, custom ones go with their custom constructor). I guess that still doesn’t encode that a certain constructor requires all their symbols (2 way relation? Row -> Symbols, Symbol -> Row ?)

(this example also doesn’t cover how we deal with the equality of rows, where the order of the constructors doesn’t matter… possibly solvable with a class? I’m not saying we should implement it in purescript anyway, just as an example)

pkapustin · November 30, 2019, 8:48am

@jy14898
These complex fields from your examples, for example, Person, Tuple, Combined, they are essentially rows, right? Like, wouldn’t it be natural to say that we are composing our record by combining these rows? I am trying to understand why you would like to have a notion of a complex field, rather than just a row.

pkapustin · November 30, 2019, 9:02am

@natefaubion @hdgarrood
I would like to consider one more reason for why I think named records is a good idea.
In my project, I have a complex domain where it is relatively difficult to come up with names for things to begin with. With newtype wrappers, there will be pattern matching somewhere in the code. This means that I need to give names to both wrapped things and unwrapped things. They may also have different implementations for Eq, Ord and Show. This means that if I want to make it clear which one is wrapped and which one is unwrapped, I would have to choose two different names for each.
This is why I think named records can be a big improvement for the language, because then you only have one thing.

pkapustin · November 30, 2019, 9:29am

@natefaubion @hdgarrood
Also, today record semantics in PureScript and Haskell are different. You can do things in PureScript that you cannot do in Haskell, and vice versa. This is essentially what is discussed here.
With named records, Purescript records would be able to do everything Haskell records can, and more (because of extensibility).

jy14898 · December 1, 2019, 2:45pm

I may be using terminology weirdly, specifically using the word row to mean something that isn’t of the form (symbol, type) etc. The goal is like you say, composable datatypes that are represented by records underneath. The reason to not use normal string symbols is so that a datatype can own a Symbol (in ES6 implementation), so that when we merge datatypes they’re guaranteed to not conflict (and also to allow hiding the type’s implementation etc).

Probably very easy to implement all of this without directly using records, and then use the overloaded record syntax to allow access to the datatype’s values. The point was that if we stay inside records, then we don’t have to overload record syntax (although in reality it isn’t that straightforward, and tbh there are issues with my original example like multiple Fst and Snds, which ones are paired together?)

I don’t use all the lens stuff but I assume they solve the same problems, composing types and allowing easy getters/setters?

natefaubion · December 2, 2019, 5:15pm

I mostly agree with this. My consolation is that this doesn’t change syntax or semantics of the language, so it doesn’t affect tooling (at least any that exist). It’s just inserting code that you would otherwise write yourself, so I think it’s more along the lines of instance deriving.

You only need to wrap/unwrap because you can’t use record syntax with them. If you had overloaded record syntax you wouldn’t need to manually wrap or unwrap at all.

pkapustin · December 5, 2019, 10:25pm

@natefaubion
Suppose we have the following:

newtype Person = Person { name :: String, address :: String }

instance eqPerson :: Eq Person where
  eq (Person person1) (Person person2) = 
    person1.name == person2.name

comparePersons :: Person -> Person -> Boolean
comparePersons person1 person2 = person1 == person2

comparePersons' :: Person -> Person -> Boolean
comparePersons' (Person person1) (Person person2) = person1 == person2

While comparePersons uses Person's Eq instance, comparePersons' uses Record's Eq instance, which has a different behaviour.
If I found function comparePersons' in a code base, I would be wondering whether the use of Record's Eq instance was intentional or an error.
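
To make the difference concrete, here is a self-contained sketch with two sample values (alice1, alice2, viaNewtype and viaRecord are hypothetical names added for illustration; the type and instance repeat the definitions above):

```purescript
module EqExample where

import Prelude

newtype Person = Person { name :: String, address :: String }

-- Custom instance: two persons are equal when their names match.
instance eqPerson :: Eq Person where
  eq (Person a) (Person b) = a.name == b.name

alice1 :: Person
alice1 = Person { name: "Alice", address: "1 Main St" }

alice2 :: Person
alice2 = Person { name: "Alice", address: "2 Side St" }

-- Uses Person's Eq instance: only the names are compared, so this is true.
viaNewtype :: Boolean
viaNewtype = alice1 == alice2

-- Uses the Record Eq instance on the unwrapped values: every field is
-- compared and the addresses differ, so this is false.
viaRecord :: Boolean
viaRecord = case alice1, alice2 of
  Person a, Person b -> a == b
```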
I think that this increase in accidental complexity is related to the fact that two types are used instead of one: the record itself: { name :: String, address :: String } and the wrapper: Person.
With named records, we would only have one type (similar to Haskell), and this would not be a problem. That’s why I think that named records, in addition to overloaded syntax for records, would be ideal.
What do you think?

garyb · December 6, 2019, 2:58am

That’s a feature as far as I’m concerned - one of the motivations for newtype is using them to direct a different instance choice than the default. Types often have multiple sensible instances, depending on use case.

pkapustin · December 6, 2019, 11:51am

@garyb
I agree that it’s normal to use newtypes to define different instances. However, I think that in some situations it may be a bit problematic to have Eq, Ord and Show instances derived automatically for records, as such instances will not always be sensible. This also means that in many cases we will have two instances for Eq, Ord and Show, when only one is really needed.
I think that writing or deriving instances should be up to the programmer.
With named records, we can have only one type, and instances for Eq, Ord, and Show may be “opt-in”: one may choose to derive them and get a default record instance, write a custom instance or do neither of those, similar to Haskell.

garyb · December 7, 2019, 1:07pm

I think the reason this discussion isn’t really going anywhere, is that, for me at least, I don’t really know what is trying to be solved here - as far as I’m aware newtype already handles all of the problems listed aside from syntax ergonomics.

The most commonly raised issue that people have with the syntax aspect is the inability to access fields without unwrapping the newtype. I certainly understand that, as writing getters or having to unwrap can be quite tedious.

I think the case for needing overloaded record literals in general is much less common, but there are times where it would be nice.

So hypothetically, if those things were possible (dot access, literals), what other problems would this proposal be solving? (Aside from the comparePersons' one listed above. Sorry - I’m going to reject that one - the arguments you most recently gave against it apply equally to all newtypes, and they aren’t going away any time soon! The inner record instances are not automatically derived, they’re just the instances for Record row, in the same way that for a newtype over some Maybe it’d be the instances for Maybe a).