Automated upgrades for breaking changes

milesfrain · December 17, 2020, 7:42pm

A tool to automatically update your code for breaking changes would be really convenient.

A hand-wavy proposal is outlined in this registry issue. This same idea was also suggested recently in another Discourse post. I figured it would be good to set up a dedicated thread for more brainstorming.

I’m thinking a good next step is to investigate the auto-upgrade tools in other languages. Here’s what I’m aware of so far:

natefaubion · December 17, 2020, 7:52pm

FWIW, I’ve been working (currently stalled) on a CST parser for PureScript written in PureScript. I think that a Haskell target is a fairly large barrier to entry for many people (much more difficult to maintain and deploy binaries). My hope is that being able to write such tooling in PureScript itself would inspire more people to take this on. If there is interest in unstalling this, I can open up a WIP repo for it.

thomashoneyman · December 17, 2020, 11:36pm

@natefaubion Just based on discussions I’ve had with folks in the community in the past I think there is a lot of desire for this.

hdgarrood · December 18, 2020, 2:56am

While I think many (perhaps most?) of the fixes we might want to make are automatable with a tool that works with the CST, I think it’s probably worth pointing out that working on the CST probably isn’t ideal in some of the cases we might come across. For example, locating use sites of the Semigroup Map instance would be quite difficult to do with the CST; you’d at least want access to the AST after type class desugaring, or perhaps corefn (especially since PureScript libraries already exist to parse corefn). In other cases, such as if there’s a change in the syntax where previously compiling code becomes a syntax error, the CST representation might be too late in the pipeline to be useful!

natefaubion · December 18, 2020, 3:23am

Any sort of refactoring is going to be on the CST though, otherwise you won’t be able to preserve anyone’s source code in a meaningful way since both the AST and CoreFn discard too much information. In any case, you’d have to figure out how to do analysis on what you can get from something like CoreFn and translate that into syntactic updates on the CST, which is a non-trivial problem itself. The existing compiler tooling is frankly not suited for this kind of stuff anyway.

Ideally, I would like to make an error tolerant parser and CST that would make transformations on incomplete code feasible.

natefaubion · December 18, 2020, 3:27am

I would like to maybe see a compiler extension that can compile a project and produce a full database/graph of types and their usages (mapping to spans), which would help with writing tooling vs trying to use the AST, which is not consumable in any realistic way by external tooling. You could use this to do type-based refactorings.
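To make the "usages mapped to spans" idea concrete, here is a minimal Python sketch of how a refactoring could consume such a database: each edit names a character span and a replacement, and everything outside the spans (including unusual whitespace) is left exactly as written. The `Edit` type, the offsets, and the sample source are all hypothetical; nothing here reflects an existing compiler API, and the sketch assumes the edits do not overlap.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    start: int  # offset of the first character to replace
    end: int    # offset one past the last character to replace
    text: str   # replacement text


def apply_edits(source: str, edits: list[Edit]) -> str:
    """Apply non-overlapping edits, leaving every other character untouched."""
    result = source
    # Work from the end of the source so earlier offsets stay valid.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        result = result[:edit.start] + edit.text + result[edit.end:]
    return result


# Hypothetical usage-database entry: `log` occupies offsets 10..13.
source = "main  =   log   ( show   x )"
print(apply_edits(source, [Edit(10, 13, "info")]))
```

Applying edits back to front is what keeps the remaining offsets valid without any re-parsing between edits.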

ajnsit · December 18, 2020, 6:46am

I’d like to add that the tool doesn’t need to be perfect and catch everything. A tool that helps with the tedious 80% of the migration is also very helpful, as long as there are going to be compile errors for the remaining 20%.

afc · December 18, 2020, 8:34am

There’s so much leverage to be had from this - in code discovery and more general re-factoring and impact analysis of refactorings that you’d like to do, and in enabling drag and drop refactoring of code from graphic representations of the call-graph.

jvliwanag · December 18, 2020, 8:46am

I’d definitely be interested in trying out your project @natefaubion. @srghma has done great work on https://github.com/purescript-codegen/purescript-ps-cst on which I’m helping out as well. I’ve been mainly using it for codegen of react bindings and aws libraries.

Indeed, a number of tools I wish to use are on node - so having the cst in purescript definitely helps.

On the side, I’ve wanted to work on creating a more customizable pretty printer for PureScript, though I’m still a beginner at this. I’m getting the hang of it by writing some printers (using dodo-printer) on the ps-cst project. But I’m not sure how to bridge it to anything that parses PureScript code into a CST in the first place.

natefaubion · December 18, 2020, 4:26pm

Yes, y’all are doing good work over there! It does a great job of being a convenient tool for code generation. However, I’m not sure I can use it as a parser target since I don’t think it can represent arbitrary user sources with spans, which would be necessary for a refactoring tool (since we are potentially dealing with the product of whitespace artisans). Additionally, I don’t know if you want error tolerance in a representation that should ideally be correct by construction. I think it would be straightforward, if a little tedious, to convert parser output into the codegen input.

milesfrain · December 31, 2020, 9:40pm

An API diffing tool (similar to what Elm and Haskell have [1]) would be a great addition to our ecosystem. I also see it playing a key role in making upgrades easier.

Here’s a proposed sequence of developer conveniences we can enable once we have the API diffing tool.

Stage 1 - Verbose upgrade report

spago upgrade-set (or a wrapper around it) runs an API diff for all packages in the package set (between the versions you’re upgrading from and to) and logs all changes to a single text file. This file will contain the before-and-after type signatures. There will be a lot of irrelevant changes listed in this file, such as the functions that you’re not even using, but at least this will be a reference to search through as you’re fixing compilation issues.
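As a rough illustration of what the Stage 1 report might contain, the Python sketch below diffs two snapshots of a package's exports. It assumes (hypothetically) that each snapshot is already available as a mapping from fully qualified name to type signature; how those snapshots would be extracted is left open, and the report format is invented for this example.

```python
def diff_api(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return report lines for removed, added, and changed signatures."""
    lines = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue  # unchanged export, nothing to report
        if after is None:
            lines.append(f"- {name} :: {before}")
        elif before is None:
            lines.append(f"+ {name} :: {after}")
        else:
            lines.append(f"~ {name}")
            lines.append(f"    before: {before}")
            lines.append(f"    after:  {after}")
    return lines


# Hypothetical exports of one package at two versions.
old_exports = {"Example.f": "Int -> Int", "Example.g": "String"}
new_exports = {"Example.f": "Int -> String", "Example.h": "Boolean"}
print("\n".join(diff_api(old_exports, new_exports)))
```

Running this over every package in the set and concatenating the output would give the single verbose text file described above.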

Stage 2 - Concise upgrade report

This is the same as the Stage 1 report, but only includes relevant differences. This likely requires comparing the report to the CST/AST/corefn representation of your project.
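A minimal Python sketch of the Stage 2 filtering step, under the assumption that some earlier analysis (for example over corefn) has already produced a hypothetical `usages` mapping from fully qualified name to the project files that reference it. Both input shapes are invented for illustration.

```python
def concise_report(
    changed: dict[str, tuple[str, str]],
    usages: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each changed name that the project uses to the files using it."""
    return {
        name: sorted(usages[name])
        for name in sorted(changed)
        if name in usages
    }


# Hypothetical inputs: changed signatures (before, after) and project usages.
changed = {
    "Example.f": ("Int -> Int", "Int -> String"),
    "Example.g": ("String", "Boolean"),
}
usages = {"Example.f": ["src/Main.purs", "src/App.purs"]}
print(concise_report(changed, usages))
```

Changes to names the project never references, such as `Example.g` here, drop out of the report.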

Stage 3 - Local upgrade annotations

Insert fix-me comments into .purs files wherever a modified API is called. This might be more convenient than having to constantly jump between your code and the upgrade report. These instructions could alternatively be summarized at the top of each .purs file.
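One way Stage 3 could work, sketched in Python: given the 1-based line numbers where a modified API is called, insert a `--` comment above each line at the same indentation. The `FIXME(upgrade)` marker and the `notes` mapping are hypothetical, and the sketch assumes the line numbers come from some earlier analysis step.

```python
def annotate(source: str, notes: dict[int, str]) -> str:
    """Insert a `-- FIXME(upgrade)` comment above each 1-based line in notes."""
    out = []
    for number, line in enumerate(source.splitlines(), start=1):
        if number in notes:
            # Reuse the annotated line's own leading whitespace.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}-- FIXME(upgrade): {notes[number]}")
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the original had one.
    return result + "\n" if source.endswith("\n") else result


purs = 'main = do\n  log "hi"\n'
print(annotate(purs, {2: "signature of log changed, see upgrade report"}))
```

Because only whole lines are inserted, the original code is left untouched and the annotations are easy to strip again once the migration is finished.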

Stage 4 - Automated renames

Automatically apply simple upgrades directly to .purs files when possible. I think the only realistic opportunities for this are renames and moves. Everything else likely requires some manual intervention to resolve.
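For renames specifically, a deliberately crude Python sketch is shown below. It matches whole unqualified identifiers with a regular expression rather than a CST, so it would also rewrite matches inside comments and string literals, and it conservatively skips anything preceded by a dot (qualified names, record access). A real tool built on a CST with spans would not have those limitations; the function name and sample input are hypothetical.

```python
import re


def rename_identifiers(source: str, renames: dict[str, str]) -> str:
    """Rename whole, unqualified identifiers in a single pass."""
    if not renames:
        return source
    # Longest names first so a shorter name never shadows a longer one.
    alternatives = "|".join(
        re.escape(old) for old in sorted(renames, key=len, reverse=True)
    )
    # Reject matches that are part of a longer identifier (word characters
    # or a prime on either side) or that follow a dot.
    pattern = re.compile(rf"(?<![\w'.])(?:{alternatives})(?![\w'])")
    return pattern.sub(lambda m: renames[m.group(0)], source)


line = "x = foo (foo' 1) <> Mod.foo <> foobar"
print(rename_identifiers(line, {"foo": "bar"}))
```

Doing all substitutions in one pass means that swapping two names (`a` to `b` and `b` to `a`) does not undo itself.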

Stage B - Changelog integration

We could leverage API diffs to enforce SemVer and good changelogs, or even help write the changelogs. This work doesn’t depend on any of the numbered stages, but will improve those stages. For example, the reports or fix-me comments could include the relevant changelog entries. We could also allow special changelog syntax to help identify renames and moves that are otherwise not detected by the diffing tool, which would make Stage 4 more effective. Many projects already follow the keep a changelog guidelines. Another tool (based on the API diff tool) could identify changelog gaps and offer to write some entries for you.
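To illustrate the "special changelog syntax" idea, the Python sketch below extracts renames from changelog bullet lines. The syntax it recognizes (a bullet reading "Renamed `old` to `new`") is purely an assumption made up for this example; no such convention exists in keep a changelog or in any PureScript tooling.

```python
import re

# Hypothetical convention: a bullet of the form "- Renamed `old` to `new`".
RENAME_LINE = re.compile(r"^\s*[-*]\s*Renamed\s+`([^`]+)`\s+to\s+`([^`]+)`\s*$")


def renames_from_changelog(changelog: str) -> dict[str, str]:
    """Collect old-name to new-name pairs from matching changelog lines."""
    renames = {}
    for line in changelog.splitlines():
        match = RENAME_LINE.match(line)
        if match:
            renames[match.group(1)] = match.group(2)
    return renames


changelog = """\
## [2.0.0]
### Changed
- Renamed `oldName` to `newName`
- Improved error messages
"""
print(renames_from_changelog(changelog))
```

The resulting mapping is the kind of input an automated rename step could consume for renames that a signature-based diff cannot distinguish from a removal plus an addition.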

[1] https://old.reddit.com/r/haskell/comments/6vq3pd/tool_to_detect_breaking_changes/

Is there a tool that will allow me to discover if I have made breaking changes to a module?

The tools section of the PVP spec lists a few options:

ghc-pkg-apidump: A tool for comparing the exported API of different version of the same GHC library package.

precis: Summarizes API differences between revisions of Cabal packages.

There’s also hackage-diff, which compares the public API of different versions of a Hackage library.

And for what it’s worth, Elm’s package manager (elm-package) has a diff subcommand that does exactly this: https://github.com/elm-lang/elm-package/blob/1a364bc/README.md#publishing-updates