Proposal: Guidelines for VTAs

JordanMartinez · January 2, 2024, 5:15pm

When we do the 0.16.0 ecosystem update, it’s my understanding that we’ll update libraries from a Proxy-based syntax to a Visible Type Application-based one. Below, I’m proposing a few guidelines to follow so as to minimize future breaking changes and keep the resulting APIs consistent across the libraries.

The outline for this post:

What counts as a breaking change for VTAs?
What order should type variables be in?
Which type variables get the @ treatment?
Naming conventions

First, it’s helpful to clarify what would be a breaking change when it comes to VTAs. Suppose we have the following example:

foo :: forall @a b @f g r. Class f => (a -> g r)  -> f a  -> g b  -> g r

The following changes to the above type signature would be a breaking change:

Dropping the @ in front of the a type var because foo @SomeType would now unify with f, not with a. Moreover, foo @_ @myF would no longer compile because there’s now only 1 VTA.
Adding an @ in front of the b type var because foo @aType @fType and foo @_ @fType would cause fType to unify with b, not f.
Swapping the position of @a with @f because foo @aType @fType would cause aType to unify with f and fType to unify with a.

The following changes would not be breaking changes:

Swapping the position of @a with b or @f with b. Because the first VTA would still apply to a and the second still applies to f, either of these changes has no noticeable effect.
Adding an @ in front of g because, while this increases the number of VTAs, it does not cause foo @aType, foo @_ @fType or foo @aType @fType to unify with the wrong variables. However, this would requires a minor version change because one would not be able to downgrade without code breaking.

In short, a breaking change happens when:

the number of VTAs decreases
the order of VTAs changes
an insertion between two existing VTAs occurs

A minor change occurs when:

a new VTA is added

Thus, there are two questions to ask:

Which type variables should get the @ treatment?
What should the order of the type variables be?"

Second, in answering question 2 above, I’d propose the following general guidelines for the order of type variables within a forall, whether they get the @ treatment or not:

type variables for higher-order types (e.g. f, g, h, etc.) should appear before first-order types (e.g. a, b, etc.)
type variables with type class constraints should appear earlier than those without
if two type variables are mentioned in type class constraints, the type variable that appears in an earlier type class constraints should appear earlier in the forall
prioritize what is likely the most common VTA usage of the function/value (i.e. shortest number of VTAs used with the most important ones appearing first).

As an example, we’ll use foo from above

-- Original definition (excluding `@` vars)
foo :: forall a b f g r. Class f => (a -> g r)  -> f a  -> g b  -> g r

-- Put higher order first
foo :: forall g f a b r. Class f => (a -> g r)  -> f a  -> g b  -> g r

-- put those with type classes first
foo :: forall f g a b r. Class f => (a -> g r)  -> f a  -> g b  -> g r

-- while `g` appears within the first argument, we don't put it first
-- because of the type class constraint on `f` which `g` doesn't have
-- so `g` takes precedence here. If `g` had a type class,
-- I would propose putting that type class before `f`'s type class
-- and putting `g` before `f` in the `forall`.
foo :: forall f g a b r. Class f => (a -> g r)  -> f a  -> g b  -> g r

Third, in answering Question 1 above, type variables get the @ treatment if

they have a type class constraint
one will often want/need to clarify which type with which a type variable unifies

Using foo as an example:

f gets the @ treatment because of the type class constraint
g, a, and r are judgment calls.
- a likely does not need the @ treatment because of its container f already getting it
- g likely does not because it appears twice, once in a function’s returned value and another as an argument
- r might be useful to add @ because it can serve as both a clarification and an assertion when row type errors aren’t always helpful

Fourth, how should type variables be named? At the present, most libraries use a single letter to name them: a-e for first-order types and f-h for higher-order types, m-n for monadic higher-order types, p for arrow-like types, and r for record types or row kinds.

Arguably, names in the present implementation do not matter because everything is positional. However, if we did support non-positional VTA syntax via @{ f: Array, r: { aRecord :: Int } }, then names would come into play here. So, unless such names are important for reading and understanding the code (e.g. type-level programming comes to mind), when doing this VTA update, I think any type variable names not using single-letter names should be renamed to single-letter names when possible. And I think this should apply only to type variables in functions and values, not type class heads.

Thoughts?

rhendric · January 3, 2024, 9:53am

Full for all of this.

Hmm. This doesn’t seem right to me. I think there’s too much emphasis here on constraints and I suspect that’s standing in for something that should actually be described as ambiguity.

One distinction I think would be useful is whether a variable occurs in any negative positions—variables that only occur in positive positions are much stronger candidates for VTAing than variables that occur in negative or mixed positions. I don’t know if that fully covers all the cases you have in mind that motivate the constraints-based guidelines; maybe we should hash some out.

Kinda think that this and the second section are the same thing—whatever principles we use to order VTAs are the same principles we should use to determine if something falls below some VTA-usefulness threshold.

on this; I think non-positional VTAs are too speculative at this point and don’t justify having a naming convention debate on top of things.

JordanMartinez · January 3, 2024, 5:03pm

Just curious, but do you think it’s useful to add such syntax? Or do you think it’s unnecessary?

Fair point.

Yeah, that’s definitely true. I approached my proposal from a type class constraint perspective.

Ah, that is a useful distinction.

If I were to think of examples of where we’d use VTAs, the following come to mind:

using type class members (e.g. map @Array ...), Since the class heads’ type vars already get the @ treatment, there’s not much to say here.
derived functions (i.e. sum from Data.Foldable)
Record’s get, set, modify, and other label-like things.
Halogen’s slots? I’m not sure whether that needs to stay as Proxy args or not
implementing a type class instance (e.g. Monoid for a Record) where one would currently use Proxy args to propagate type information

There’s also another way to think about this. In some cases of ambiguity, it may be better to define a new function whose implementation is the original function but with the @ treatment on specific type vars. Case in point: coerce. I may want to write coerceFrom @from and coerceTo @to rather than coerce @from and coerce @_ @to. The downside of this idea is library bloat from the duplicate functions, but it is one way to address such ambiguity.

VTAs could also be used to assert that things have specific types. I’m not sure how useful that is in practice, but I can imagine someone writing foo @{ a :: Int } when tracking down record-related errors.

rhendric · January 4, 2024, 1:09am

Potentially very useful! But also quite novel (for PureScript) to turn a name that acts like a local identifier into something that is part of a public interface. We don’t allow name-based currying with function parameters (though that could also be useful), nor do we do this with type parameters on type constructors (ditto). Some sort of integration with row labels could cut down on the novelty, but I expect that would be designed to be opt-in at the definition site, which would make any renaming of variables at this time irrelevant.

Well, consider Traversable:

traverse :: forall a b m. Applicative m => (a -> m b) -> t a -> m (t b)

Should this be forall @m @a @b? My gut says yes, all of these are candidates for VTAing and the order t m a b—i.e., type of the traversable, type of the traversal effect, input element type, output element type—feels most natural. In a different context, I might argue for a and b to remain invisible, out of concern that the order chosen might not be the best one. But this order feels clearly correct to me.

sum :: forall a f. Foldable f => Semiring a => f a -> a

I think if one can make the case that a function somehow ‘belongs to’ a single type class (despite not being a proper member), that the type arguments given to that type class should be visible and appear first, to be consistent with actual class members. So I would argue for forall @f @a for sum. (I include @a because it’s probably silly to have only one invisible type parameter on an interface that is unlikely to sprout more.)

Another example: MonadState’s get:

get :: forall m s. MonadState s m => m s

I want this to be forall @s @m, by the same principle. (Note that this conflicts with the proposed principle that higher-order variables come before Type-kinded variables.)

The good stuff!

get :: forall r r' l a. IsSymbol l => Cons l a r' r => Proxy l -> Record r -> a

So, if we don’t have at least forall @l, what are we even doing. After that, it gets murkier. We also need to start considering covering sets and type parameters that only appear in constraints (r', here). I’m going to start talking about the covering sets of a function as an extension of the concept of the covering sets of a class—a function’s covering sets are subsets of the function’s type parameters, computed using the usual covering set logic on the union of the dependencies induced by all of the function’s constraints. get’s covering sets come straight from Cons: {l a r'}, {l r}. We should, I think, prioritize specifying some complete covering set using an initial prefix of visible type parameters, but we don’t need to support specifying all of them—one is enough to disambiguate.

So I think I would argue for forall @l @r a r', as follows:

Anything that is currently a Proxy gets priority.
r' doesn’t appear outside of constraints and so I think we should avoid talking about it in VTAs as well; that suggest avoiding the covering set {l a r'} as our preferred set.
The remaining set is {l r}, so r gets the next visible slot after l.
Ordering a and r' is tricky (particularly if we want to be consistent in later examples), and I don’t think there’s a compelling reason to make them visible given that we’ve achieved a covering set, so leave them invisible.

set :: forall r1 r2 r l a b. IsSymbol l => Cons l a r r1 => Cons l b r r2 => Proxy l -> b -> Record r1 -> Record r2

Now the covering set story gets interesting. I believe the covering sets of set are {l a b r}, {l r1 r2}, {l b r1}, {l a r2} (one from each of the four ways to choose one fundep from each Cons). Again I think we should take as a given that l will be visible, and a seems like an awkward parameter to suggest providing to set (here’s the type of the thing you’re replacing!). The remaining covering sets to target are {l r1 r2} and {l b r1}, and of those, I think {l r1 r2} is slightly more compelling, as r2 occurs only in a positive position. So here I propose forall @l @r1 @r2 r a b (for r1 and r2, I think the principle of input-to-output ordering overrides the principle for positive-only parameters to come first).

modify :: forall r1 r2 r l a b. IsSymbol l => Cons l a r r1 => Cons l b r r2 => Proxy l -> (a -> b) -> Record r1 -> Record r2

This is almost the same as set, but now a appears in a positive position and is very much not an awkward thing to suggest! With the same covering sets, I would now prioritize a set including a and r2: forall @l @a @r2 r r1 b. Is it weird that this is so different from the visible parameters in set? I don’t know.

I’ve written a lot just now so I think I’ll come back to these later.

Oh, and to this:

I think we should consider this sparingly but not encourage it widely. Pairs like traverse/for come to mind—good for heavily used cases, bad to give every function a flipped version that could have one.

JordanMartinez · January 5, 2024, 9:30pm

As I was reading this, I also realized one advantage of positional arguments versus named ones: shadowed type variables:

class Foo a where             -- a1
  bar :: forall @a. a -> Int  -- a2

Ignoring the potential uselessness of the above function, bar @Int @String makes it obvious that the Foo Int instance is selected and the resulting bar function has type String -> Int. But how would the shadowed type variable name, a, be resolved if we were using named arguments (e.g. does the a in bar @{ a = Int } (or whatever syntax is used for named VTAs) type-apply Int to the classhead’s a or the class member’s a? One solution would be to promote a shadowed type variable warning to an error when it occurs within a named-VTAs context because the shadowing here is already somewhat confusing. However, I’m not sure what the logic behind that would be.

On a different note, if named VTAs syntax was supported, then wouldn’t both of the issues of 1) which type variables have support for VTA and 2) in what order do those type variables appear–wouldn’t the first be “solved” by making all type variables support VTAs (so long as the shadowing issue was resolved) and the second issue of “which order do the type variables appear” instead becomes an issue of “what is the name of the type variable”?

I guess another downside of named VTAs is that @Int is shorter than @{ typeVariable: Int} (or whatever that syntax would be, such as = instead of :).

And if named VTAs were supported, would one still want to support the current positional VTAs?

rhendric:

JordanMartinez:

derived functions (i.e. sum from Data.Foldable)
sum :: forall a f. Foldable f => Semiring a => f a -> a
I think if one can make the case that a function somehow ‘belongs to’ a single type class (despite not being a proper member), that the type arguments given to that type class should be visible and appear first, to be consistent with actual class members. So I would argue for forall @f @a for sum. (I include @a because it’s probably silly to have only one invisible type parameter on an interface that is unlikely to sprout more.)

I think that works for sum because there’s only one type class. But for things like oneOf, which uses two type classes, which type class takes precedence?

Given your above examples… it feels like adding VTA support needs to be done on a case-by-case basis as the things to consider have stronger or lesser weights based on how and how often the resulting entity is used in code.

Of the things to consider, I think we have the following. + indicates an argument for adding support for VTA; - indicates not supporting; c indicates it depends on a case-by-case basis:

+ is the type variable constrained by some type class?
c is the type variable higher-order or not?
+ is the type variable currently supported by a Proxy arg?
c does the type variable appear in a positive, negative, or mixed position?
+ is the type variable included in a ‘covering set’?
- Is a type variable’s type likely inferrable by one of the function’s arguments? (e.g. Record.set @"label" value; value’s type will likely be inferrable most of the time).

As for order, we have these considerations

What is the current order of the arguments?
Is the value/function a “derived member” of one type class? If so, the type class’ class head type variables should probably appear first.
Which type variable will more often resolve type inference problems? (I’m thinking of the MonadState’s get example above where the m is often known but the s is not.)

ptrfrncsmrph · January 6, 2024, 4:17am

I was recently trying out converting purescript-record to VTA and a thing that might throw a wrench in single-letter naming convention here is label-like types (which are ripe for VTA treatment), which when singular would be l, but more than 1 (as in Record.rename) would follow l1, l2, … (in practice, seems like it might be specific to the rename function that uniquely has two symbol type variables).

edwardw · January 6, 2024, 6:37am

This might be specific to the project, but I found myself using even longer and more descriptive VTA names. E.g., I agree r is often used to mean rest of the record, but here tag is meant to refer to any single tag name of a predefined record.

An even more extreme example is here and an interesting use case of typelevel-regex. I believe a long name in this case is pretty legit?

JordanMartinez · January 6, 2024, 3:14pm

I think that’s a scenario where using single-letter names doesn’t make sense. However, I don’t think that’s the common scenario. But even so, that only matters if we use named VTAs (not supported at the moment) rather than the currently-supported positional VTAs.

JordanMartinez · January 6, 2024, 3:15pm

The focus of these guidelines are primarily for core, contrib, node, and web libraries with more of the focus on core and contrib libraries. Since that’s a 3rd-party library, you can do whatever you want.

garyb · January 8, 2024, 12:51pm

I’d agree with this, and I think the things you list after this are all sensible. I think trying to codify much more than that is going to be difficult because it really is going to be driven by how a particular function is most often used and the context it’s used within.

One thing I would suggest is we add them fairly conservatively at first. If we initially insert them in the places we know for sure will benefit (proxies, variables that only appear in positive positions that aren’t determined by fundeps, etc) that will at least give an uncontroversial initial ordering, and we can consider adding more later when other semi-common occurrences of needing explicit typing arise.

JordanMartinez · January 8, 2024, 3:05pm

Should we start accepting PRs for adding VTAs to the ecosystem then, starting with prelude and going up from there? And by accepting, I mean, wanting people to open such PRs primarily to start discussions for each repo, but not for actually merging them. The reason for not merging them is because releasing all core/contrib libraries to include such a feature is a lot of work I’d rather leave off until the 0.16.0 ecosystem update. However, those writing applications (i.e. things not being published to the registry) could update their package set to reference these not-yet-published libraries and we could get early feedback on which type variables may need to be added or reordered if our original attempt didn’t fully capture all it needed to.