Adding syntax for annotations on declarations

I’d argue that it makes the language smaller because it separates compiler features from language features.

  • Deriving is a compiler features to stamp out code for you. It doesn’t affect language semantics since you could just write all those instances by hand.
  • Newtype is a compiler feature that affects code generation. It doesn’t affect language semantics since the same code will continue to terminate or not terminate.
  • Roles are a compiler feature that affect resolution of a compiler solved typeclass. It doesn’t affect language semantics because zero-cost coercions are not necessary for program soundness.

I think it’s desirable to keep language semantics small (including syntax), but I think there will always be a desire to introduce compiler features that make development faster and easier. Syntactic changes incur a heavy burden because they necessarily break any syntactic tooling, even for small contextual keywords. I think it’s desirable, especially if users want a 1.0 specification at some point, to have some way to introduce and iterate compiler features while keeping the language stable. AFAIK no one on the core team wants to introduce pragmas for any features. I’m not saying annotations are necessarily the solution, but I think they have the potential to fill that need.

I agree that adding more syntax for each change is not ideal, but it’s not clear to me that adding a language of annotations is a better solution, because it relies on the compiler understanding every possible annotation, so it is the opposite of extensible. Having a unified syntax suggests that these things are somehow the same as each other, but they are not. You could build it in such a way that the language of annotations is open, with a core set of magic annotations, but that actually seems worse to me than what we have now, since at least something which invokes magic in the compiler is distinguished by the “derive” keyword (I don’t count auto-solved instances as the same sort of magic as e.g. derived generic or newtype instances).

I think a lot of this is up to taste.

because it relies on the compiler understanding every possible annotation, so it is the opposite of extensible

I don’t see how it follows that this is the “opposite” of anything. Nothing has to be anything. It doesn’t have to be extensible, but we do try to encourage users to try implementing things as external tooling, and I can see how extensible annotations might help with that.

Having a unified syntax suggests that these things are somehow the same as each other, but they are not.

They are related by being compiler directives. What do you see as it suggesting? The nice thing about using the type language is that they have lexical rules (you need to import them from somewhere) and need to kind check. Maybe different declarations types require different kinds (DeclarationAnnotation vs InstanceAnnotation). If they are extensible, then you can trace it to a support library for some tool.

but that actually seems worse to me than what we have now, since at least something which invokes magic in the compiler is distinguished by the “derive” keyword

I think compiler magic is sufficiently made clear by virtue of being in Prim.

1 Like

I actually think that by disallowing overlaps altogether, it’s possible to always generate a unique, deterministic, and user friendly name. I would like to remove instance names altogether, and a compiler directive would be a decent alternative for someone wanting a specific name.

4 Likes

I’m not sure about the argument that this would not be a considered language feature, but rather a compiler feature. If, in practice, in order to work properly, all real-world code has to be compiled by the purescript/purescript compiler, or else implement all the same extensions it does in the same way, what is a language specification giving us?

… or in this case i’d say what you’re doing is abstracting a whole category of syntactic elements into a wholly new syntax (ie there’s nothing like this in Haskell for example, right?)

I would argue that Haskell does have an equivalent: it is a block-comment with # symbols at each end, like {-# SOMETHING IN HERE #-}. These are used for language pragmas, marking instances as overlappable or incoherent, inlining, rewrite rules, build tool flags, and so on.

what’s the fundamental problem we’re trying to solve here?

In a sentence, as I see it: to allow us to add features in a non-breaking way, and (in the longer term) to reduce the size of the language syntax.

For example, a more concise syntax for multiple deriving clauses, which has been suggested before.

Unless I’m misreading, providing a more concise syntax for multiple deriving clauses in some way other than what is proposed here essentially means adding new syntax for a very specific single case, which is worse, surely? To me, the issue of wanting to write deriving clauses with less noise is another good argument for this proposal. I would quite like to be able to put deriving clauses on the data type and not needing to write a separate deriving declaration at all:

[ Derive [ Eq, Ord, Enum, ToJSON `Via` SomeOtherDataType ] ]
data Foo a = [...]

(Of course this would also need a bit of work inside the compiler to deduce what the relevant instances should actually look like once written out in full, eg what their contexts should be, but GHC shows that this is doable.)

but maybe what I’m saying is - how many more features do we want/need? This is of course a hard question to answer, because there is no roadmap, …

I don’t think it’s hard because there is no roadmap, it’s a hard question to answer because the question itself is the hard part of coming up with a roadmap. Of course we don’t know that annotations are a good solution for all new problems we’ll encounter which require new syntax, but that’s an impossibly high bar. The answer to adding a feature should surely be “sure, but only if there’s a strong case for it, and the problem can’t be solved in the language as things stand now; use your judgement”, not “no more ever”. I think we do have a good case right now for annotations in that there are a number of existing features and one planned feature (i.e. roles) which would benefit from it, as well as inlining (which I don’t think can be described as “planned”, but would probably be good to do eventually) and maybe other performance-related annotations too, like rewrite rules.

I think it’s also worth noting that this feature (or something very similar) already exists in a number of other languages. Off the top of my head: Haskell, Rust, Java, and pretty much all of the .NET languages (C#, F#, etc). We should definitely look closely at how these languages have each implemented annotations and what pitfalls they have (if any) before deciding that we are going to add this, but the fact that this already is a fairly well-trodden path means that I’m less averse to this feature addition than I would be otherwise.

5 Likes

Annotations are a language feature. I mean to say that the things enumerated in the post are not language features, because they do not affect the semantics of the language (syntax, type system, evaluation). They are compiler features which let you tell the compiler how it should code generate things or tune magic classes. This is distinct from something like polykinds, which requires the syntax and type system to be extended. Other language features would be constraint kinds and existentials for the same reasons. I think stability should mean language stability (we should not change the language), not that we should freeze compiler features. In order to do that, I think it’s useful to clearly separate them. There’s only one PureScript compiler, sure.

3 Likes

I’m confident we can come up with a way of adding module level annotations. You can do this in C# with attribute targets, and in Rust with inner attributes.

2 Likes

I just have to say that the longer I think about this, the more I like the idea. I think it is a really neat solution to simplify the core language and separate all the additional help from the compiler (generating classes, doing automatic coercions, inlining functions, optimising data type representations, …) in two different “regions” of the language.

The core PureScript language would just consist of (higher order) functions, algebraic data types, pattern matching, and records. These features are really easy to explain to beginners. They can skip all the annotation stuff because it is additional!

Later on, when becoming a pro: Want to optimise a data type with one constructor and one argument? Write a representation annotation! Fed up with writing your own instances? Write a derive annotation! Etc.

Rust’s attributes should be a good starting point. They already do things like deriving, data representation, and conditional compilation.

As for syntax, I think attributes should be recognisable as such by preceding a list of them by a symbol. Rust uses #[...], Swift and Java use @ in front of a single attribute. As @ is already a reserved symbol in PureScript, why not:

@[ layout Unpack, roles [Nominal], derive [Eq, Ord], deriveVia Bar [Show] ]
data Foo a = ...

@[ deprecated "Use `bar` instead" ]
foo :: a -> Foo a
foo = ...

Also, because I think that attributes are more like functions, I’d write them with a lower case letter. Less noisy I think.

1 Like

If annotations use the type language, then uppercase is a necessity. I would prefer to not introduce a declaration form for this.

I understand, but is it a language on the type level? Or is it a new language floating in between the type and function level? I tend to think the later, but maybe you can persuade me with some examples. Are annotations just types with an Annotation kind? Will users be able to define their own annotations in such a way? That would be nice.

Yes, I am proposing using the type-level language and operating on kind Annotation. Prerequisites would probably be PolyKinds, DataKinds, TypeInType, and ideally type-level Arrays. There would be a Prim.Annotation module with the Annotation type, and built-in annotations. PureScript already supports open kinds via foreign imports, so one would be able to declare a new annotation like so:

data InlineDirective = Always | Advantageous

foreign import data Inline :: InlineDirective -> Annotation

[Inline Always]
foo :: Int -> String
foo = ...

Sounds like a solid plan. Does your work on poly kinds include data kinds?

About basic types at the type level: do you think it is feasible to add arrays to type level computations? I only know of languages which use lists at the type level, they are theoretically sound, don’t know that about arrays. And I’m wondering if we’ll need type level natural numbers to complete the collection of basic types at the type level. I can imagine there would be some annotation needing numbers as arguments and it would make packages simulating Peano numbers obsolete.

Next, I really thing we should merge the syntaxes of foreign and normal data decelerations:

-- currently, you can only write
foreign import data A :: Kind1 -> Kind2
-- or
data B (a :: Kind1) = ...
-- but not
data C (a :: Kind1) :: Kind2
-- which I think is way clearer

You can already write type level lists. With data type promotion, it seems odd to me to support [] syntax at the type level and it be something different, but that’s just me. I would probably treat it opaquely like row types, with a conversion to and from lists for induction. Type level arrays aren’t a requirement in any case. It’s possible to support a [] syntax only for top-level annotations, but I do think there are cases where it would be desirable to have [] syntax within the parameters to an annotation, which would require some sort of type-level collection.

Yes, you’re right, we’d need some kind of type level collection and as we use [] to denote arrays on the term level, we should do the same on the type level. I really like this consistency of primitives usable on both term and type level.

But then, what to do with type level numbers? I think naturals would be the most convenient to have on the type level, as they have an induction principle. Will probably speed up compilation when they are backed in as well. But their denotation will clash with term level integers. Besides that, we currently do not have naturals on the term level.

I’d be in favour adding naturals to the language. I currently miss a way to denote positive integers at the type level which have a convenient term level denotation (using a newtype and writing toNat 3 all the time is annoying). We could use the same trick as Dhall: 1 :: Nat, +1 :: Int.

My personal desire, and this is just pie-in-the-sky-but-might-be-nice-extremely-long-term, would be to support lifting all core terms to the type level as is, so that they could be reflected/reified, with goal of supporting compile-time evaluation of some kind. I doubt other maintainers agree :laughing:. But I think if we want different numeric types, we should keep the possibility of supporting them at both the type and term level, and they should have consistent syntax. I don’t really want Int literals to mean something else at the type level.

2 Likes

I would see the benefit of having an extensible annotation syntax. I’d argue for a row type where the labels denote the annotations the compiler currently understands. The ones the compiler doesn’t know about would be dropped for forward compatibility. With an optional warning flag.

@(layout :: Unpack, roles :: [Nominal], derive :: [Eq, Ord], deriveVia :: ( via :: Bar, targets :: [Show] ) ]
data Foo a = ...

@( deprecated :: "Use `bar` instead" )
foo :: a -> Foo a
foo = ...

I’m wondering if it’d be possible to remove the need for typelevel lists, so it’d be possible to implement faster. roles could probably be changed to a record via

@(roles :: (nominal :: True) )

But I don’t see such an option for derive, which would imo be the main driver here. We could hack something akin to

@( derive :: (eq: Eq, show: Show) )

And ignore the labels. Ugly, but should work. Upgrading once the “real” [] syntax shows up should be possible with some compiler magic.

4 Likes

I don’t think adding syntax and kind-checker support for arrays, booleans, etc would be particularly difficult. The only thing that’s difficult is if we expect solver support for operations on type-level arrays, which is not necessary for annotations.

2 Likes

Really like your thinking! Rows for the win!

1 Like

GHC has accepted a very similar proposal for “modifiers”, with similar motivating examples.

6 Likes

If I understand correctly, they are ment to modify the next token in a type signature or type declaration. This is totally useful for the multiplicity and matchability of arrows, or for roles of type variables. But how does it fit, for example, a deprecation annotation? Or a derive directive? Both are annotations on declarations not on a particular part of a type…

1 Like

I don’t think it’s explicitly only in a type declaration.

Modifiers somehow change the meaning of the next token or of the construct they appear in. The blah is parsed and renamed as a type.

My interpretation of that is that it can be used anywhere, while using the type language as the “syntax”.