This post aggregates papers which are useful for understanding the PureScript type system.
Note: while these papers inspired or relate to PureScript type system features, they do not necessarily correspond directly to its implementation.
This is a wiki post: if you notice features or papers missing from this list, feel free to add them!
I was also interested in this, so we can link to them in the PureScript documentation. I asked a similar question in the purescript/documentation project: https://github.com/purescript/documentation/issues/180
Excellent, thank you for the link @chexxor – I was thinking of going the other way, gathering and then adding these to PureScript documentation, but I should have expected you were already on this
I think the way to go about answering this question is to start from the existing things in PureScript and then ask which paper(s) it is based on or referenced. Without this, the question is too ambiguous to answer. The easiest way to organize this would be a table with rows being PS features and columns being “Feature Name” and “Papers Referenced”. If like this, it’s easy to see the missing paper citations.
Organization for this could occur on this thread or in the GitHub project issue I referenced - it likely won’t matter much.
@chexxor I’ve added a few links I was able to discover to the table. I’ll keep an eye out for more.
I haven’t taken the time to write down all the paper references in this meetup talk about the PureScript compiler, but I’ll link to it from here for now in case someone else gets to it before I do.
update: I added the papers referenced by Phil’s comment on that video. If someone watches the video and sees a paper referenced that isn’t in the comment, please add it to this list in Thomas’ wiki post.
I’ve updated the post with Let Should Not Be Generalised, which is the inspiration for the PureScript compiler intentionally not inferring the most general possible type signature within
I just want to point out that many of the “Direct sources” here were never actually used in the implementation directly. The main example is “Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism”, where many of the ideas in the paper (using a separate instantiation judgment, for example) inspired the implementation, but the key idea (the idea of using an ordered context) wasn’t used at all. The same goes for “Extensible records with scoped labels”, where the key idea was not used, but many of the ideas improved my understanding of the problem domain at the time.
I would say that all of these are great resources for someone trying to understand the type checker code, but it’s wrong to say that the type checker is based on any one in particular.
Hey @paf31, thanks for weighing in! I’m happy to adjust this post to remove any suggestion of “direct” vs “indirect” sources and instead focus on “these are all papers which will help you understand how and why the type checker works as it does” without any one being a literal, direct mapping to the implementation.
It’s a wiki post, by the way, so if you have any other papers you would suggest adding I’d love to hear about them (or you can add them yourself as you see fit, and make any adjustments you think would improve accuracy).
I’m curious why. Was it just that you had already written a bunch of code using skolemization? I’ve been neck-deep in all of this working on polykinds, and I’ve had to deal with the discrepancy here and there. The style I’ve noticed is that you’ve done more in terms of rewriting the AST rather than putting it into the context. The advantage being that it’s less stateful, but consequently one does a crap-load of rewriting all over the place. I’ve found that you have to keep a fair bit of contextual/scope state anyway though, so I’m not convinced that it’s advantageous in practice.
I just felt like I understood skolemization better as a technique, having read things like the HMF paper already, and I could more easily see how it would fit into the general approach with the other type system features. There’s definitely a lot of rewriting going on, but maybe that could be avoided in other ways, by keeping an substitution separate from the types. That said, I wouldn’t be opposed to changing the approach to something context-based, assuming it was appropriately tested of course.
I think my primary motivation for it would just be because all the recent papers I’ve read have converged on the ordered context approach for their algorithmic specifications, as it’s easily extended with other forms of constraints besides unknowns and their substitutions, and also extends to dependent types. It seems to be “the way forward”.
As I say, I’m not fundamentally opposed, but here’s a few thoughts:
- The difficulty is not in implementing rank N, but implementing the combination of rank N + type classes + row polymorphism. It took several years to iron out all of the corner cases in the current implementation, so when I say “appropriately tested”, I think it’s perhaps easy to underestimate what that might involve.
- We can also extend the constraint language using something like HM(X) which might fit in alongside the current code more naturally.
- How do the error messages compare?
- Does PureScript really need dependent types?
- The current code is the product of a few years of experimentation, and it shows. If I had the time to go back and reimplement it, there’s probably a few things I would do differently, and the code structure could certainly be improved a lot. But I wonder whether the readability of things could be improved enough via refactoring.
- Will this delay a 1.0 release?
I should be clear, I don’t mean this as a direct criticism of the implementation. Sorry if it sounded like that. I only mean that emerging literature has started to converge on a particular specification style due to it’s flexibility. It might be advantageous to adopt that as it’s a lot easier to both verify (if implementing things based on “newer” literature, eg PolyKinds) and specify, if one were to write a specification of the typechecker.
Regarding PolyKinds, by “discrepancies”, I’m referring to a paper specification that relies on this style for verifying correctness. It doesn’t exist in the compiler as is, so I have to adapt it or introduce it in some form. Dependent types aren’t directly relevant to PS runtime code, however, PolyKinds as implemented by GHC is a dependently-typed (kinded?) system.
I didn’t take it as a criticism Also, I think the current implementation deserves at least some criticism
I trust you all will make the right decision, but I just wanted to point out my concerns. Verifying the implementation certainly does sound nice.
I’m curious - how does one annotate the syntax tree using the new ordered contexts?
I’m willing to believe it eliminates some edge cases, but how does one back-substitute once variables are solved later?
(I haven’t been able to figure an efficient way to do so!)
I think we should add the following papers:
It’s a wiki post — feel free to add these and any others you’d like!