Adding syntax for annotations on declarations

I’ve increasingly found myself wanting a compiler supported annotations on declarations. There are a number of ad-hoc mechanism that have specialized syntax which I think could be subsumed by annotations.

Newtype
Let’s face it, newtype is a hideous keyword, and it only exists to enable a (guaranteed) compiler optimization. PureScript’s strictness means there’s no runtime distinction between a newtype and data aside from boxing (unlike Haskell, which has subtleties wrt bottom). I would like to see this as an annotation over data.

Deriving
We have deriving and deriving newtype. If we add more strategies in the future we need to extend the syntax of the language for each one. I would like deriving strategies to be annotations on instance declarations.

Deprecations
We have the Deprecated typeclass, but this only works on value declarations. What if we wanted to deprecate data types (perhaps SProxy in favor of a polykinded Proxy) or type synonyms (which may no longer be relevant)? I would like to see warnings and deprecations as annotations.

Roles
There’s a PR for safe coercions, which requires adding a new declaration type for roles, and their various keywords. I would like to see roles as annotations on data declarations.

Inlining Directives
We often need to help the compiler along with optimizations if we want them to always fire. Inlining directives can be annotations on declarations.

And I’m sure many more (backend specific configurations?). I think it would be nice to be able to support these in an extensible way that doesn’t require us to extend the syntax of the language with new keywords.

One idea is reusing the type language. We have compiler solved “magic” typeclasses as a way to extend solving, and we could do something similar with annotations. Annotations could be a specific kind with types in Prim.Annotation. We could support array literals in types:

import Prim.Annotation (Deriving, Newtype, Roles, Nominal, Deprecated)

[Newtype, Roles [Nominal], Deprecated]
data Foo a = Foo a

[Deriving Newtype]
instance eqFoo :: Eq (Foo a)

Or perhaps via rows

import Prim.Annotation (Newtype, Nominal, Deprecated)

(rep :: Newtype, roles :: [Nominal], warn :: Deprecated)
data Foo a = Foo a

(deriving :: Newtype)
instance eqFoo :: Eq (Foo a)

I’m sure there are other alternatives to this. What do y’all think?

20 Likes

By being types, they can be added to a Generic spine as metadata so typeclasses can inspect them.

I agree that we should do something like this, as you say it’s less than ideal that these kinds of additions force us to extend the language syntax.

Can you give an example of how it might look for these annotations to be consumed via type classes? I am not sure if Generic is the correct place to put these, since presumably we want to allow annotation on most declarations, whereas lots of declarations either shouldn’t have Generic instances (such as Map or NonEmptyArray), or can’t have them at all (e.g. declarations at the term level).

Is it necessary or indeed desirable to make this information available for consumption within the language?

No, it’s not necessary. Primarily I want to reduce the syntax for all these knobs. It just might be interesting to think about what opportunities you have when using types, since we can dispatch and stamp out code based on types. Certainly all the things in the post affect internal compiler behavior, and I’m not sure how useful they would be to introspect in user code.

I was just thinking that since they are types, they can just be inserted in the Generic spine, like how GHC fills in metadata for data types and fields. If you could write stuff that dispatched based on annotations, it’s not clear to me what you’d get with only the annotations and no structural casing like Generic.

1 Like

I like this idea and I love the idea of reducing syntax - or in this case i’d say what you’re doing is abstracting a whole category of syntactic elements into a wholly new syntax (ie there’s nothing like this in Haskell for example, right?)

Apropos of this, i think that an annotation array of Effects on functions, usable by tooling but not enforced by compiler would have been a better choice than the old Row Effects, but that ship has sailed and i’m quite happy with the new undifferentiated Effects.

I agree that adding more syntax for each change is not ideal, but it’s not clear to me that adding a language of annotations is a better solution, because it relies on the compiler understanding every possible annotation, so it is the opposite of extensible. Having a unified syntax suggests that these things are somehow the same as each other, but they are not. You could build it in such a way that the language of annotations is open, with a core set of magic annotations, but that actually seems worse to me than what we have now, since at least something which invokes magic in the compiler is distinguished by the “derive” keyword (I don’t count auto-solved instances as the same sort of magic as e.g. derived generic or newtype instances).

As for surfacing more information in Generic instances, I think that would possibly be a good addition, but it doesn’t require annotations. For example, we could start by providing the type constructor name via the Generic instance.

For me, while this is a small problem, it is only small, and not worth such large breaking syntax changes.

Edit: what’s the fundamental problem we’re trying to solve here? Maybe there are other solutions, or multiple solutions for smaller individual problems. For example, a more concise syntax for multiple deriving clauses, which has been suggested before.

Edit: I’d also like to say that while roles, in particular, need new syntax, I’m not even sure they are a good addition to the language if we’re trying to keep the language small. Having run into problems with them in production code lately, I’m less convinced that they are a good solution for the problem they solve. I know roles are just one feature, but maybe what I’m saying is - how many more features do we want/need? This is of course a hard question to answer, because there is no roadmap, but if we don’t know, then we also don’t know that annotations are a good solution for all new problems we’ll encounter which require new syntax. At some point, I think it’s okay to say “these are all the features, and new things can be solved with a combination of separate languages/code generation/other external solution instead”.

1 Like

eqFoo from:

instance eqFoo :: Eq (Foo a)

might be considered as annotation too.


Edit: Also dependencies can be considered as annotations too:

module Foo (foo, bar) where
type Deps = (deps :: NodeJS ("underscore.js" :: "1.x -3.x"))

(Deps)
foreing import data Foo

foreign import foo :: Effect Foo

(Deps)
foreign import foo :: Effect Unit

compiler or some tool can then be generating package.json for you based on used modules,functions,values.


Edit: using annotations like this means you can’t add annotation to a module (deprecation of whole module for example).

I’d argue that it makes the language smaller because it separates compiler features from language features.

  • Deriving is a compiler features to stamp out code for you. It doesn’t affect language semantics since you could just write all those instances by hand.
  • Newtype is a compiler feature that affects code generation. It doesn’t affect language semantics since the same code will continue to terminate or not terminate.
  • Roles are a compiler feature that affect resolution of a compiler solved typeclass. It doesn’t affect language semantics because zero-cost coercions are not necessary for program soundness.

I think it’s desirable to keep language semantics small (including syntax), but I think there will always be a desire to introduce compiler features that make development faster and easier. Syntactic changes incur a heavy burden because they necessarily break any syntactic tooling, even for small contextual keywords. I think it’s desirable, especially if users want a 1.0 specification at some point, to have some way to introduce and iterate compiler features while keeping the language stable. AFAIK no one on the core team wants to introduce pragmas for any features. I’m not saying annotations are necessarily the solution, but I think they have the potential to fill that need.

I agree that adding more syntax for each change is not ideal, but it’s not clear to me that adding a language of annotations is a better solution, because it relies on the compiler understanding every possible annotation, so it is the opposite of extensible. Having a unified syntax suggests that these things are somehow the same as each other, but they are not. You could build it in such a way that the language of annotations is open, with a core set of magic annotations, but that actually seems worse to me than what we have now, since at least something which invokes magic in the compiler is distinguished by the “derive” keyword (I don’t count auto-solved instances as the same sort of magic as e.g. derived generic or newtype instances).

I think a lot of this is up to taste.

because it relies on the compiler understanding every possible annotation, so it is the opposite of extensible

I don’t see how it follows that this is the “opposite” of anything. Nothing has to be anything. It doesn’t have to be extensible, but we do try to encourage users to try implementing things as external tooling, and I can see how extensible annotations might help with that.

Having a unified syntax suggests that these things are somehow the same as each other, but they are not.

They are related by being compiler directives. What do you see as it suggesting? The nice thing about using the type language is that they have lexical rules (you need to import them from somewhere) and need to kind check. Maybe different declarations types require different kinds (DeclarationAnnotation vs InstanceAnnotation). If they are extensible, then you can trace it to a support library for some tool.

but that actually seems worse to me than what we have now, since at least something which invokes magic in the compiler is distinguished by the “derive” keyword

I think compiler magic is sufficiently made clear by virtue of being in Prim.

1 Like

I actually think that by disallowing overlaps altogether, it’s possible to always generate a unique, deterministic, and user friendly name. I would like to remove instance names altogether, and a compiler directive would be a decent alternative for someone wanting a specific name.

4 Likes

I’m not sure about the argument that this would not be a considered language feature, but rather a compiler feature. If, in practice, in order to work properly, all real-world code has to be compiled by the purescript/purescript compiler, or else implement all the same extensions it does in the same way, what is a language specification giving us?

… or in this case i’d say what you’re doing is abstracting a whole category of syntactic elements into a wholly new syntax (ie there’s nothing like this in Haskell for example, right?)

I would argue that Haskell does have an equivalent: it is a block-comment with # symbols at each end, like {-# SOMETHING IN HERE #-}. These are used for language pragmas, marking instances as overlappable or incoherent, inlining, rewrite rules, build tool flags, and so on.

what’s the fundamental problem we’re trying to solve here?

In a sentence, as I see it: to allow us to add features in a non-breaking way, and (in the longer term) to reduce the size of the language syntax.

For example, a more concise syntax for multiple deriving clauses, which has been suggested before.

Unless I’m misreading, providing a more concise syntax for multiple deriving clauses in some way other than what is proposed here essentially means adding new syntax for a very specific single case, which is worse, surely? To me, the issue of wanting to write deriving clauses with less noise is another good argument for this proposal. I would quite like to be able to put deriving clauses on the data type and not needing to write a separate deriving declaration at all:

[ Derive [ Eq, Ord, Enum, ToJSON `Via` SomeOtherDataType ] ]
data Foo a = [...]

(Of course this would also need a bit of work inside the compiler to deduce what the relevant instances should actually look like once written out in full, eg what their contexts should be, but GHC shows that this is doable.)

but maybe what I’m saying is - how many more features do we want/need? This is of course a hard question to answer, because there is no roadmap, …

I don’t think it’s hard because there is no roadmap, it’s a hard question to answer because the question itself is the hard part of coming up with a roadmap. Of course we don’t know that annotations are a good solution for all new problems we’ll encounter which require new syntax, but that’s an impossibly high bar. The answer to adding a feature should surely be “sure, but only if there’s a strong case for it, and the problem can’t be solved in the language as things stand now; use your judgement”, not “no more ever”. I think we do have a good case right now for annotations in that there are a number of existing features and one planned feature (i.e. roles) which would benefit from it, as well as inlining (which I don’t think can be described as “planned”, but would probably be good to do eventually) and maybe other performance-related annotations too, like rewrite rules.

I think it’s also worth noting that this feature (or something very similar) already exists in a number of other languages. Off the top of my head: Haskell, Rust, Java, and pretty much all of the .NET languages (C#, F#, etc). We should definitely look closely at how these languages have each implemented annotations and what pitfalls they have (if any) before deciding that we are going to add this, but the fact that this already is a fairly well-trodden path means that I’m less averse to this feature addition than I would be otherwise.

5 Likes

Annotations are a language feature. I mean to say that the things enumerated in the post are not language features, because they do not affect the semantics of the language (syntax, type system, evaluation). They are compiler features which let you tell the compiler how it should code generate things or tune magic classes. This is distinct from something like polykinds, which requires the syntax and type system to be extended. Other language features would be constraint kinds and existentials for the same reasons. I think stability should mean language stability (we should not change the language), not that we should freeze compiler features. In order to do that, I think it’s useful to clearly separate them. There’s only one PureScript compiler, sure.

3 Likes

I’m confident we can come up with a way of adding module level annotations. You can do this in C# with attribute targets, and in Rust with inner attributes.

2 Likes

I just have to say that the longer I think about this, the more I like the idea. I think it is a really neat solution to simplify the core language and separate all the additional help from the compiler (generating classes, doing automatic coercions, inlining functions, optimising data type representations, …) in two different “regions” of the language.

The core PureScript language would just consist of (higher order) functions, algebraic data types, pattern matching, and records. These features are really easy to explain to beginners. They can skip all the annotation stuff because it is additional!

Later on, when becoming a pro: Want to optimise a data type with one constructor and one argument? Write a representation annotation! Fed up with writing your own instances? Write a derive annotation! Etc.

Rust’s attributes should be a good starting point. They already do things like deriving, data representation, and conditional compilation.

As for syntax, I think attributes should be recognisable as such by preceding a list of them by a symbol. Rust uses #[...], Swift and Java use @ in front of a single attribute. As @ is already a reserved symbol in PureScript, why not:

@[ layout Unpack, roles [Nominal], derive [Eq, Ord], deriveVia Bar [Show] ]
data Foo a = ...

@[ deprecated "Use `bar` instead" ]
foo :: a -> Foo a
foo = ...

Also, because I think that attributes are more like functions, I’d write them with a lower case letter. Less noisy I think.

1 Like

If annotations use the type language, then uppercase is a necessity. I would prefer to not introduce a declaration form for this.

I understand, but is it a language on the type level? Or is it a new language floating in between the type and function level? I tend to think the later, but maybe you can persuade me with some examples. Are annotations just types with an Annotation kind? Will users be able to define their own annotations in such a way? That would be nice.

Yes, I am proposing using the type-level language and operating on kind Annotation. Prerequisites would probably be PolyKinds, DataKinds, TypeInType, and ideally type-level Arrays. There would be a Prim.Annotation module with the Annotation type, and built-in annotations. PureScript already supports open kinds via foreign imports, so one would be able to declare a new annotation like so:

data InlineDirective = Always | Advantageous

foreign import data Inline :: InlineDirective -> Annotation

[Inline Always]
foo :: Int -> String
foo = ...

Sounds like a solid plan. Does your work on poly kinds include data kinds?

About basic types at the type level: do you think it is feasible to add arrays to type level computations? I only know of languages which use lists at the type level, they are theoretically sound, don’t know that about arrays. And I’m wondering if we’ll need type level natural numbers to complete the collection of basic types at the type level. I can imagine there would be some annotation needing numbers as arguments and it would make packages simulating Peano numbers obsolete.

Next, I really thing we should merge the syntaxes of foreign and normal data decelerations:

-- currently, you can only write
foreign import data A :: Kind1 -> Kind2
-- or
data B (a :: Kind1) = ...
-- but not
data C (a :: Kind1) :: Kind2
-- which I think is way clearer

You can already write type level lists. With data type promotion, it seems odd to me to support [] syntax at the type level and it be something different, but that’s just me. I would probably treat it opaquely like row types, with a conversion to and from lists for induction. Type level arrays aren’t a requirement in any case. It’s possible to support a [] syntax only for top-level annotations, but I do think there are cases where it would be desirable to have [] syntax within the parameters to an annotation, which would require some sort of type-level collection.

Yes, you’re right, we’d need some kind of type level collection and as we use [] to denote arrays on the term level, we should do the same on the type level. I really like this consistency of primitives usable on both term and type level.

But then, what to do with type level numbers? I think naturals would be the most convenient to have on the type level, as they have an induction principle. Will probably speed up compilation when they are backed in as well. But their denotation will clash with term level integers. Besides that, we currently do not have naturals on the term level.

I’d be in favour adding naturals to the language. I currently miss a way to denote positive integers at the type level which have a convenient term level denotation (using a newtype and writing toNat 3 all the time is annoying). We could use the same trick as Dhall: 1 :: Nat, +1 :: Int.