Allow defining named records

Disclaimer

My knowledge of the topic is rather limited. I don’t know whether this suggestion makes sense from a type theory perspective, or whether it is easy or even possible to implement. There may also be a better alternative. However, I would still like to share the idea. We discussed this briefly with @natefaubion and @jy14898 on Slack.

Background

PureScript records have a lot of benefits.
However, one question that comes up repeatedly is whether one should use records with or without newtype wrappers. A typical answer is that one should normally use records without wrappers and create wrappers only when necessary (for example, when one needs to define a type class instance).

I would like to propose an idea that could potentially let us have one type that supports dot syntax, type class instances, and also improved type safety without the need for wrappers.

Named records

The idea is to allow defining named records, for example, PersonRecord or CompanyRecord, that behave just like Record.
This gives us a combination of nominal and structural typing: PersonRecord (name :: String | others) unifies with PersonRecord (name :: String), but not with CompanyRecord (name :: String).

Let’s consider an example.

Currently we can write:

person :: Record (name :: String)
person = { name : "John" }

I would like to be able to write:

person :: PersonRecord (name :: String)
person = { name : "John" }

The code above assumes that there is a way to define named records like PersonRecord, and also, as @natefaubion suggested, that we have something like OverloadedRecords, so that we can use normal record construction syntax to construct a named record.

Benefits:

  1. We can allow defining instances for records, as PersonRecord (name :: String) and CompanyRecord (name :: String) are different types.
  2. There is no need for newtypes and wrapping / unwrapping.
  3. Normal dot notation can be used for accessing nested records.
  4. Improved type safety when necessary (for example, we cannot use CompanyRecord (name :: String) where PersonRecord (name :: String) is expected).
  5. Named records can be extended in the same way as anonymous records, so, if a function expects PersonRecord (name :: String | others), it can take PersonRecord (name :: String).

The important thing here is that we get one type that supports all of the functionality, without the need for wrappers. And of course, we could still use anonymous records if we prefer that.

Comparison to other languages

I don’t know of any languages that support this kind of record.

Haskell

Normal records in Haskell are nominally typed:

data PersonRecord = PersonRecord
  {
    name :: String,
    age :: Int
  }

There are several libraries that support structurally typed records, but I couldn’t find any libraries that support named structurally typed records.
Here is an example with row-types:

class Greet a where
  greet :: a -> String

type Person = Rec ("name" .== String)

instance (person ~ Person) => Greet person where
  greet person = "Hi, " ++ person .! #name

person :: Person
person = #name .== "John"

This is very similar to PureScript. In addition, we can make Person an instance of the Greet type class, but I don’t think that such usage is ideal, because:

  1. If we define type Company = Rec ("name" .== String), it is now automatically an instance of the Greet type class.
  2. We can quickly run into the problem with overlapping instances.

Idris

Normal records in Idris are also nominally typed:

record Person where
  constructor MkPerson
  name : String
  age : Int

Elm

Elm records seem to be very similar to PureScript records: anonymous and structurally typed. But since Elm does not have type classes, there is also less need for newtype wrappers.

type alias Person = 
  {
    name : String,
    age : Int  
  }

I think just to manage expectations I should start off by saying this is unlikely to happen - adding features to the language is risky in that it’s often difficult to foresee bad interactions with other language features, it complicates the compiler meaning that future work on, say, performance or improving type errors becomes harder, and it means that more effort is required to learn the language, so at first glance my recommendation would be to not add this.

Regarding this proposal specifically, I think some detail is missing here:

The code above assumes that there is a way to define named records like PersonRecord, and also, as @natefaubion suggested, that we have something like OverloadedRecords, so that we can use normal record construction syntax to construct a named record.

To be able to evaluate this proposal, more detail is needed on how these named records would be defined, and also on what this OverloadedRecords-like feature would be.

It’s also worth noting that at the moment you can do

newtype PersonRecord rec = PersonRecord (Record rec)

which, as far as I can tell, offers all of the benefits you are proposing, except that the syntax is a little more noisy.
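As a rough sketch of how that newtype covers the listed benefits today: the CompanyRecord type, the Show instance, and the greet function below are illustrative assumptions, not something taken from the proposal.

```purescript
module NamedRecordSketch where

import Prelude

-- Two nominally distinct wrappers over the same structural record type.
newtype PersonRecord r = PersonRecord (Record r)
newtype CompanyRecord r = CompanyRecord (Record r)

-- An instance attached to the nominal type, delegating to the record's Show.
instance showPersonRecord :: Show (Record r) => Show (PersonRecord r) where
  show (PersonRecord r) = "(PersonRecord " <> show r <> ")"

-- Row-polymorphic: accepts any PersonRecord with at least a name field.
greet :: forall r. PersonRecord (name :: String | r) -> String
greet (PersonRecord p) = "Hi, " <> p.name

john :: PersonRecord (name :: String, age :: Int)
john = PersonRecord { name: "John", age: 30 }

acme :: CompanyRecord (name :: String)
acme = CompanyRecord { name: "Acme" }

greeting :: String
greeting = greet john

-- greet acme is rejected: CompanyRecord does not unify with PersonRecord.
```

The noise is limited to the PersonRecord constructor at construction sites and the pattern match inside greet.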


@hdgarrood, I agree with many of your comments.

More work needs to be done before the proposal can be fully evaluated, but I think that at this point we could discuss the idea in general, before we spend the time on the details.
Also, I understand that adding features to the language is not an easy thing, but I think that this could be a large improvement for PureScript.
While, as you say, we can solve a lot by defining a newtype wrapper, I think that having one type that can do everything (named record) instead of two types (record and wrapper) will make a big difference:

  1. We can use normal dot syntax for accessing nested records, as intended, without having to use lenses or other complex ways of accessing the fields when the wrappers are used.
  2. We can define type class instances for our types, as intended, without having to define wrappers.
  3. We get a much simpler structure in case of nested records, without the need for “inner” and “outer” types, where “inner” and “outer” types may have different instances for classes like Ord, Eq and Show.
  4. Having a simpler structure in case of nested records means simpler code.
  5. With named records, one cannot mix different named record types by mistake, even if they have the same fields, so one can decide on the desired level of type safety when one is defining a record, and it does not depend on whether a wrapper is used or not.

Here are some discussions that I find relevant:


Note that for nominal typing you always have to define a wrapper. It’s just how it works. To make something overloaded, you have to

  • Define a nominal type to direct instance search
  • Define a class with methods on which to dispatch
  • Define an instance pointing your type to some implementation
  • To make something completely implicit, you have to have the compiler insert method calls on behalf of the programmer.

This is why I brought up that this would have to be an overloaded syntax extension. The compiler can derive boilerplate for you, but these are the primitives in which you have to frame the discussion. The type system has everything you need to talk about nominal and structural typing, as @hdgarrood points out with the newtype example. This does all that you want from a type system perspective, it’s just that the syntax for using it isn’t to your liking (and there are good reasons for it).

Given that we already have anonymous records, and you want to use the same syntax for both, an overloaded syntax would probably involve operations to, from, and with anonymous records. Kind of like how OverloadedStrings or OverloadedLists just inserts calls to and from the primitive type. You would need a way to talk about all syntactic forms we have.

  • Constructing a record: fromRecord { ... }
  • Property access: (toRecord a).foo
  • Pattern-matching on a record: case toRecord a of ...
  • Updating a record: fromRecord ((toRecord a) { foo = ... })

A class like this would have identity as its implementation for stock Record, and newtype instances could be trivially derived as coercions. With a decent optimizer, all these cases would be inlined into their constructors for more complex examples.
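A minimal sketch of such a class, assuming a two-parameter formulation with a functional dependency. The class name RecordLike, the Person newtype, and the getName and setName helpers are hypothetical; the compiler would insert the toRecord and fromRecord calls rather than the programmer.

```purescript
module RecordLikeSketch where

import Prelude

-- Hypothetical class relating a record-like type t to its underlying row r.
class RecordLike t r | t -> r where
  toRecord :: t -> Record r
  fromRecord :: Record r -> t

-- For the stock Record type both methods are identity.
instance recordLikeRecord :: RecordLike (Record r) r where
  toRecord = identity
  fromRecord = identity

newtype Person r = Person (Record r)

-- For a newtype the methods only wrap and unwrap.
instance recordLikePerson :: RecordLike (Person r) r where
  toRecord (Person r) = r
  fromRecord = Person

-- What `a.name` would desugar to.
getName :: forall t r. RecordLike t (name :: String | r) => t -> String
getName a = (toRecord a).name

-- What `a { name = n }` would desugar to.
setName :: forall t r. RecordLike t (name :: String | r) => String -> t -> t
setName n a = fromRecord ((toRecord a) { name = n })

renamed :: Person (name :: String)
renamed = setName "Jane" (Person { name: "John" })

plainName :: String
plainName = getName { name: "John" }
```

Both getName and setName work on a plain record and on Person, which is the point of the overloading, but every use now goes through a class constraint.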

The question with any overloaded syntax proposal is what do the errors look like? Overloading anything brings ambiguity, and potentially confusing errors. The only way to really know what it’s like in practice is to implement it and see. One could potentially do an experiment using the CST parser to rewrite modules into this desugared form and see what happens.

I’d just like to add that after thinking about this since we last talked, I realised I would probably never use it. Of course this is just my opinion, which relates to how I use PureScript.

I think full rebindable record syntax would be too powerful, I like that I can assume accessing a record field will be quick and will terminate. So if we just implement the class such that it has methods toRecord and fromRecord, and all instances are defined such that they all wrap/unwrap in O(1) time (perhaps by only allowing the compiler to implement them), then we achieve part of what I like about records.

Records are also nice because, like other product types such as tuples in Haskell, they are convenient: easy to create and easy to poke around inside. If you create a new type which acts exactly like a Record without needing to unwrap at the use site (such as the class I described previously), then it is semantically equivalent to a Record: anyone can poke around inside it.

This goes against what I’d say is best practice: to not export your constructors, especially if you have invariants that must hold for your values that may span multiple fields of your type. An example might be a type User that has fields age and paid. A user can only pay if they have an age of 18 or older, as there may be explicit content. Perhaps they can still access other safe for work content without paying, but can then be any age. (what exactly this service does doesn’t really matter…)

You can imagine it’s much simpler to access the age and paid fields via the user.age syntax, and modify them via the update syntax, but by doing so you are allowing anyone to break your assumptions about the type. Imagine you come back to your code a month later and accidentally write user { paid = true } for a young user: nothing would flag that mistake at compile time.
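For comparison, here is a small sketch of the hidden-constructor approach applied to this User example. The module layout and the names mkUser and markPaid are assumptions made for illustration.

```purescript
module User
  ( User -- the data constructor is deliberately not exported
  , mkUser
  , age
  , paid
  , markPaid
  ) where

import Prelude
import Data.Maybe (Maybe(..))

newtype User = User { age :: Int, paid :: Boolean }

-- The only way to build a User from outside this module.
-- A paying user must be 18 or older.
mkUser :: Int -> Boolean -> Maybe User
mkUser a p
  | p && a < 18 = Nothing
  | otherwise = Just (User { age: a, paid: p })

age :: User -> Int
age (User u) = u.age

paid :: User -> Boolean
paid (User u) = u.paid

-- Updates go back through mkUser, so the invariant is re-checked.
markPaid :: User -> Maybe User
markPaid (User u) = mkUser u.age true
```

Outside the module, user { paid = true } no longer type checks because User is not a record, which is exactly the protection that a transparent record-like type would give up. The check itself still happens at runtime.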

If we allow for the full rebindable syntax, then maybe you can just check which values are being assigned to which fields (update syntax would have to be its own class too, as multiple field assignments might only be valid together). But as I said previously, I’m against the full syntax, and indeed it complicates things greatly.

Indeed maybe you could implement the checks in fromRecord, but then we need to have our types allow for failure, which previously wasn’t required. Plus, failure is still at runtime.

As I said, this is just my opinion; maybe other people have enough use cases to motivate it more…

I have actually been thinking of a different approach. I haven’t formalized it yet, but it could potentially give us the convenience of field access without sacrificing assumptions, and also be simpler to add to existing PureScript. Not sure if I’ll propose it just yet.

@natefaubion @jy14898
I think I understand your alternative suggestion with overloaded syntax, and I also agree with your arguments regarding the fact that while it simplifies field access, it may complicate other things.
However, this is not what I am proposing.
You are saying “For nominal typing you always have to define a wrapper”. Why is this so?
What is preventing us from defining types like PersonRecord and CompanyRecord, that are parameterized by a row, just like Record?
If we can define such types, then we should get nominal typing in the sense that PersonRecord would not be the same as CompanyRecord, even if the fields are the same.
Let me use an example from Haskell to explain what I mean.
Consider module Data.Row.Records from Haskell’s row-types. If we generalized the code to use a type variable instead of the concrete Rec type, we could define types like PersonRecord and CompanyRecord, that are parameterized by a row.
http://hackage.haskell.org/package/row-types-0.3.0.0/docs/src/Data.Row.Records.html#Rec

You could add a way to create record-like types in the language without newtyping records, but all the same problems exist. Sure, now we no longer talk about wrapping/unwrapping, but you still need to use type classes to allow the record operators to work on multiple types. That’s just how it works; there isn’t another way to overload a function or operator in PureScript.

Now, you could hack in something different that doesn’t use a type class, but then you have very different, potentially complicated code that is only used once in the compiler. I think what you want is something that turns all .field uses into specialized getFieldPerson/getFieldRecord ... functions at compile time, depending on how it’s used, without using class constraints? To my knowledge, this is no easier than adding OverloadedRecords, and is less powerful, with more issues.

eg:

module Person
  ( Person(..)
  , getName
  ) where

-- Some sort of syntax that tells the compiler this is a new instance of a
-- datatype that has the same semantics as a Record
newrecord Person r = Person ( name :: String | r )

getName r = r.name

-- Which one produces an error? Do we default to Records without type hints?
-- That would be very weird behavior, once again PureScript doesn't do that for
-- anything else that I know of
-- If we didn't have one of the two below, and didn't have defaults, then who
-- decides its type? The outside world of the module? Once again very weird
-- However if we used constraints, getName would be a valid polymorphic
-- function that would work on both
a = getName ({ name: "Hello world" } :: Record _)
b = getName ({ name: "John" } :: Person _)

You could try and require type hints for code like that, but you either directly have to hint all uses of _.field, or mess with the inference code such that at some point this ‘unknown type’ expression must be used in a place where it is valid. I don’t like ‘handwaving’ that there’s an unknown type in intermediate expressions, you should be able to give a type to all intermediate expressions of a tree, even if it’s never used

It’s just much simpler to newtype records and wrap/unwrap, there are helpers such as https://pursuit.purescript.org/packages/purescript-newtype/3.0.0/docs/Data.Newtype that let you write code on records and then lift them to your newtype
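A short sketch of what those helpers look like in use, assuming the Newtype class from purescript-newtype with a derived instance. The Person type and the function names are made up for the example.

```purescript
module NewtypeHelpers where

import Prelude
import Data.Newtype (class Newtype, over, unwrap)

newtype Person = Person { name :: String, age :: Int }

derive instance newtypePerson :: Newtype Person _

-- Write the logic against the plain record, then lift it with `over`.
birthday :: Person -> Person
birthday = over Person (\r -> r { age = r.age + 1 })

-- Field access is an unwrap followed by the usual accessor.
personName :: Person -> String
personName = unwrap >>> _.name
```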

To be honest, in code where I needed performance (or at least simple js output) and not too much noise from wrapping/unwrapping (which is optimised away for newtypes), I just use single letter newtype constructors


@jy14898
I agree with your arguments.
Maybe we could use an approach similar to the latest proposal for records in Haskell, as @natefaubion mentioned? So that syntax for accessing and updating fields desugars to type classes with instances that are solved automatically?
Then all the expressions will have known types, and we will get all the benefits from using named records rather than wrappers (as only one type is involved).

@pkapustin Yeah, I think compiler-solved instances for any of the type class approaches are perfectly fine, but the arguments people previously discussed for and against (including more ambiguous errors) still apply.

I’d probably vote for it to go in, as long as we optimise the uses of the methods for the Record instance (like how we optimise Semigroup Int/Monad Effect/Monad ST etc)

Just need to find someone motivated enough to implement it :wink:

@jy14898
Nice, I think that this could be conceptually separated in two parts:

  1. Type class based overloaded syntax for accessing / updating records.
  2. Named records (named semantic equivalents of the Record type) with compiler solved instances for the needed type classes.

So, while the first part simplifies working with newtype wrappers and may be preferred for existing code, the second part provides improved type safety and allows defining instances for records without the need for newtype wrappers.

My two cents as an aggressive user of newtyped records:

As background, at work we code-generate our API bindings and newtype all the record types. We’ve got hundreds of these and a lot of code that deals with APIs, so I’m dealing with newtype ergonomics issues all the time. I really like that we newtype this stuff overall, since it makes maintenance (and general browsing around with IDE features) a lot easier. We started out just using records (codecs are all generated), and I will gladly put up with newtype ergonomics rather than go back. In non-codegened parts of our codebase, we use a mix.

I don’t really understand this special feature for “named records”. Newtypes are more general, and you don’t lose anything with them. You have to have some sort of declaration for a nominal type regardless, so I don’t see why a newtype declaration is a burden at all. That is, I don’t see what you are gaining with this particular proposition such that it warrants a completely new language feature on top of what we have.

newtype Foo = Foo { bar :: Baz }

vs

newrecord Foo (bar :: Baz)

is not a burden that I feel in my usage of newtypes. Note that in PureScript, all data types have constructors. It’s not clear to me why one would want a data type that has no way to construct it except implicitly. You might say, “Just add a type signature to direct it”, but that’s exactly what the constructor does, and why it exists.

I also don’t agree that there’s anything type-safe about this, at least in a way that’s any different from newtypes. Newtypes don’t inherently give you type safety. If you expose the guts (which is what your suggestion would do), then you are only gaining explicitness and documentation. If I have a String newtype, and expose the constructor for you to use, there’s nothing inherently safe about it. You just know what the expectation is (eg. Data.String.replace). But if it’s constructed completely implicitly, then you don’t even have that!

It also isn’t clear to me what alternative instances you would be writing for a structural type like in your examples that would be different from the records instances.

newrecord Foo r

or

newrecord Foo (bar :: Baz | r)

essentially require you to treat this in a generic structural way because there’s a completely unknown, polymorphic component to it. How do you constrain the tail in your instances in a way that’s any different from records as-is? I don’t know of a lot of instances out there for record newtypes that are also structurally polymorphic. All the instances I’ve wanted to write for newtype records have been things like Monoid or Ord instances and require a closed row, and newtypes have never been an issue for me in that regard. Otherwise, the use case is 100% pertaining to codecs and codec-like things. But again, these are all closed entities. Overall, I would really like concrete examples of the kinds of things you would be writing and why this would be better, noting how it’s more type safe and how the instances are novel.

As far as overloaded syntax goes, we have never accepted a proposal for overloaded literals for a simple reason: it breaks the repl. If a beginner inputs a literal in the repl just to kick the tires, they are immediately hit with a confusing error pertaining to instances. “Wow! I can’t even type in simple literals in the repl without something breaking. This language is way too complicated for me if I can’t even run simple calculator expressions,” they might say (they would). Ideas to get around this usually involve some sort of defaulting. Defaulting is problematic because it’s only useful in the repl experience, and so you have to essentially split typechecker behavior between module and repl. Having things behave one way in the repl and a completely different way in modules is yet another way to confuse newcomers. Consequently, we make sure all literals have a straightforward concrete type that you can completely infer from the syntax. I do not think you will be able to make a case to convince all the maintainers to change this for all users, and essentially break records in the repl.

I’ll admit I’m partial to the idea of having dot-syntax for access though (and update to some lesser degree). I definitely feel this pain (unwrap, unwrap everywhere…). We often use unwrap, but also pattern matching. It’s hard to come up with a consistent way to deal with this that everyone likes. We could code-generate lenses for all of the fields and chuck them in a module, I guess, but I personally think it is just silly to go to this extent. I would really, really like some sort of easy dot syntax. The repl argument is hard to argue for this case since it’s already type-directed, and you would rarely get into ambiguity errors. Maybe with things like _.foo or \a -> a.foo, but these are already functions and don’t print in the repl. Dot-syntax for newtypes is one of our most requested features, and I think it’s a shame we don’t have a good solution to this. But I also think it is hard to argue that having a straightforward, unambiguous type for dot-syntax is a bad thing. Maybe something like Adding syntax for annotations on declarations would allow users to opt in to this at a module level.


I still haven’t formalized my idea for an alternative, but I might as well put it out there if there’s a possibility of consideration:

At the moment, all symbols (and fields) are publicly accessible as anyone can construct them, so you can never really have a record whose fields are locked away. Obviously we have discussed newtyping a record as one solution, but imagine a different approach: being able to construct new nominal fields

eg:


-- Not sure on the exact syntax, but doesn't really matter
newfield Person :: ( Name :: String, Age :: Int )

-- The types Name and Age are now introduced, with kind Symbol
-- I guess Person has kind Field?
-- A record must have both 'subfields' inside for it to have the field Person
-- Only one will result in a type error

-- ERROR: Inferred field Person, but missing subfield Age
rec = { Name: "Joe" }

-- Works
-- We can mix normal fields
rec :: Record ( Person, other :: Unit )
rec = { Name: "John", Age: 20, other: unit }

rec.Name :: String

-- Unambiguous type inference, dotsyntax is still only defined on Records
_.Name :: forall r. Record ( Person | r ) -> String

-- Can define parameterized fields
newfield Tuple a b :: ( Fst :: a, Snd :: b )

-- Should be able to infer this
_ { Fst = 10 } :: forall r a b. Record ( Tuple a b | r ) -> Record (Tuple Int b | r )

-- Not sure how this would work in terms of introduced names
-- Technically introduces Person.Name, Person.Age etc?
-- I'd prefer these to be 'flat', in that a field is just a collection of other fields
newfield Combined :: ( Person, Tuple Int Int )

comb.Person.Age :: Int

The idea is that if this module doesn’t export these symbols, then people can’t poke around inside. I guess we’d implement it with Symbol() from JavaScript to create unique symbols and avoid conflicts with normal symbols.

I don’t think it solves the problem of creating class instances on Records (at least not without breaking existing record instances)

Of course if you are newtyping records because you want to represent some foreign object, then this is no use as the symbols are not normal string symbols

I kinda see this as a way of combining Haskell’s records with PureScript’s (in that a module can own a symbol/field).

EDIT: I have no idea how multiple fields with different arguments works (Like multiple tuples). In my mind I could create a unique key per name like Tuple, but now I’d need a unique key for every possible tuple configuration

As far as overloaded syntax, we have never accepted a proposal for overloaded literals for a simple reason: it breaks the repl

This isn’t the only reason - for me, the more compelling reason not to accept an overloaded literals proposal is that it significantly hampers type inference. In fact, adding overloaded syntax for records like this would be a breaking change. Consider

example =
  [... a big expression involving references to fields of `opts` ...]
  where
  opts = { foo: 1, bar: true, baz: "baz" }

which works currently because a concrete type can be inferred for opts. With overloaded records this would no longer be the case; you’d get a NoInstanceFound error because the compiler won’t know what kind of record to use.

I’ll admit I’m partial to the idea of having dot-syntax for access though […] Maybe something like Adding syntax for annotations on declarations would allow users to opt in to this at a module level.

I’m not keen on this, for the same reason I’m not keen on language pragmas (I think this basically is a language pragma), which is that with n pragmas you have 2^n versions of the language, and as n increases you very quickly find that tooling (eg IDE plugins, formatters) just can’t keep up, and breaks with configurations other than the author’s personal preferred one. Of course this burden will be felt by people, too; it would be a pain to have to go back and check at the top of each module you might be working on to find out what record syntax means in that module.


@natefaubion @hdgarrood

I don’t mean that named records should be implicit in any way. Also, we don’t necessarily need overloaded literals. For example, in Haskell or Frege we can write Person {name = "John"} to create a record. Since this would mean creating both a record and a newtype wrapper in PureScript, we could use a different syntax to explicitly specify the record type for the literal, for example, Person : {name = "John"} or something else. Personally, I still think overloaded literals would be better, but that’s just a matter of syntactic preference.

Regarding the gains, I agree with you that newtypes are more general. However, while Person (Record (name :: String)) is two things (a wrapper around an anonymous record), Person (name :: String) is one thing (named record) that supports both dot syntax and defining type class instances without the need for wrapping / unwrapping (you mention yourself that there is some inconvenience with wrapping / unwrapping). @garyb explains that we cannot define arbitrary instances for records, as they would overlap. But this is not the case with named records, as instances for Person (name :: String) and Company (name :: String) are not going to overlap. So the idea is to get something a bit like what @ssadler is asking for here, one type that supports everything.

Regarding type safety, what I mean is that if one has a Person (Record (name :: String)), one can, for example, unwrap it and mistakenly re-wrap it as a Company (Record (name :: String)). If one has a Person (name :: String), the nominal Person part cannot be separated from it.
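The re-wrapping mistake can be sketched in a few lines, assuming row-parameterized newtypes named Person and Company. The asCompany function is hypothetical and exists only to show that the compiler accepts it.

```purescript
module RewrapSketch where

newtype Person r = Person (Record r)
newtype Company r = Company (Record r)

-- Compiles without complaint: pattern matching discards the Person tag,
-- and the bare record can be re-wrapped as anything with the same row.
asCompany :: forall r. Person r -> Company r
asCompany (Person r) = Company r
```

This only type checks where the constructors are in scope, so a module that keeps its constructors private already rules it out.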

Regarding instances, I am not thinking about any novel instances that one could not define before using a newtype wrapper. I haven’t had the chance to think about the details, but conceptually one should be able to define any instances for named records that one can now define for newtype wrappers, and also derive generic instances that are now available for Record.

@jy14898
Regarding your idea, it looks interesting, but it solves a different problem, right?

As far as I understand, you are looking at ways to prevent fields from being publicly accessible, and an additional way to compose / extend records?
If this is correct, it would be interesting to see more in terms of how this relates to rows and how it compares to the existing ways of composing / extending records using rows.

@pkapustin It solves some of the same problems. For example, when you say you gain safety by having your custom type Person not match with Record, this gives you a way of having your custom Field not match with normal Record fields. The way I think about it is that if we implemented rows using PureScript (if we had the right additions like polykinds etc, and syntax wasn’t an issue), then it’s just adding a new row constructor:

foreign import data RowField :: Symbol -> Type -> # Type -> # Type
foreign import data RowNil :: # Type

-- custom field constructor
foreign import data RowPerson :: # Type -> # Type

Indeed you could add this constructor today, but without support for the associated Symbols like Name etc, you wouldn’t be able to use normal Record syntax. Symbols would work a little differently as now they have a Row constructor associated with them (normal string symbols go with RowField, custom ones go with their custom constructor). I guess that still doesn’t encode that a certain constructor requires all their symbols (2 way relation? Row -> Symbols, Symbol -> Row ?)

(this example also doesn’t cover how we deal with the equality of rows, where the order of the constructors doesn’t matter… possibly solvable with a class? I’m not saying we should implement it in PureScript anyway, just as an example)

@jy14898
These complex fields from your examples, for example, Person, Tuple, Combined, they are essentially rows, right? Like, wouldn’t it be natural to say that we are composing our record by combining these rows? I am trying to understand why you would like to have a notion of a complex field, rather than just a row.

@natefaubion @hdgarrood
I would like to consider one more reason for why I think named records is a good idea.
In my project, I have a complex domain where it is relatively difficult to come up with names for things to begin with. With newtype wrappers, there will be pattern matching somewhere in the code. This means that I need to give names to both wrapped things and unwrapped things. They may also have different implementations for Eq, Ord and Show. This means that if I want to make it clear which one is wrapped and which one is unwrapped, I would have to choose two different names for each.
This is why I think named records can be a big improvement for the language, because then you only have one thing.

@natefaubion @hdgarrood
Also, record semantics in PureScript and Haskell are different today. You can do things in PureScript that you cannot do in Haskell, and vice versa. This is essentially what is discussed here.
With named records, PureScript records would be able to do everything Haskell records can, and more (because of extensibility).

I may be using terminology weirdly, specifically using the word row to mean something that isn’t of the form (symbol, type) etc. The goal is like you say, composable datatypes that are represented by records underneath. The reason to not use normal string symbols is so that a datatype can own a Symbol (in ES6 implementation), so that when we merge datatypes they’re guaranteed to not conflict (and also to allow hiding the types implementation etc).

It would probably be very easy to implement all of this without directly using records, and then use the overloaded record syntax to allow access to the datatype’s values. The point was that if we stay inside records, then we don’t have to overload record syntax (although in reality it isn’t that straightforward, and tbh there are issues with my original example, like multiple Fst and Snds: which ones are paired together?)

I don’t use all the lens stuff but I assume they solve the same problems, composing types and allowing easy getters/setters?