RFC: a new Registry for PureScript packages

Prelude

Many of you will be aware that the Bower registry is no longer accepting package submissions.

If you followed that thread, you’ll know I had some ideas about making a new package registry for PureScript packages only.

Things have moved along nicely on that front, and today I’d like to announce that we have a candidate design and a proof of concept for it.

What’s happening now

I’d now like to open a review phase, in which the whole community is invited to give feedback, suggest improvements, and share their thoughts on this.

More specifically, I’d like people to:

  • read the README, which walks through the design and shows how all the different parts fit together, and then take a look at the type definitions and the package manifests
  • …and engage in the issue tracker, which already contains discussions on various aspects and open questions of the design. Feel free to join those discussions and/or open new threads there if something is unclear or otherwise unsatisfactory.

Generally I expect that package authors/maintainers will be most interested in the various aspects of this, but everyone is very welcome to join and every input is valuable :slightly_smiling_face:

The goals for this phase

  • find any flaws/oversights in the design as a whole, and address them
  • stabilize the design by addressing the already known issues

What will happen next

I would like this “general review” phase to last at least two weeks from now, so that we can gather enough input. If that turns out not to be enough time, we can of course extend the period.

Once we are happy with this we can move to the next phase, which will be about incorporating feedback and coding a first version of the Curator so that we can see this in action.

Once we’re happy with that and we feel like we want to commit to it, then we can start adapting our package managers and workflows to work with this :slight_smile:


The last “public” update about this was months ago, so after answering @paf311’s questions in Slack the other day, I figured it would be a good idea to gather all of the latest updates in a post here.

Here are the questions:

I’m curious about the state of the new registry, but when I was looking, I also realized I was not clear on why the registry was necessary. Can someone explain why it’s preferable to maintain the registry vs. just using URLs and putting the package metadata with the package?

The only reason I can infer from the readme is that we don’t want to lose data, but there are other solutions to that (mirroring), and presumably the registry is not going to keep copies of the code anyway.

Why do we need a registry

I summarized a list of reasons here, but to directly address the concern of “why is a mapping between names and sources not enough?”: “just keeping a mapping” is what Bower does, and it’s what we’re doing right now with just a JSON file.
The main reason for not settling for that is immutability: packages can be removed from GitHub, and people can edit tags; if we just point names at repos, your build might stop working when the content is moved or removed.

So the current idea of how to go about it is to actually store the code for every package on the registry repo, uploading the tarballs as “releases”: every package would be a “GitHub release”, and the tarball for a package version would be an artifact of that release. This is explained in more detail in the current draft here.

Note: yes, this slightly twists GitHub’s concept of “releases”, but it allows us not to worry about storage in the beginning. I’d like to mirror things on S3 as well (or some other storage), so there’s more than one copy around.
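To make the download side of this concrete, here’s a sketch of how a package manager could construct a tarball URL under this scheme. The registry repo path and the name-version artifact naming are my assumptions for illustration, not a settled spec:

```shell
#!/bin/sh
# Hypothetical download step: fetch a package tarball from a GitHub release
# of the registry repo. The "purescript/registry" path and the
# "name-version.tar.gz" artifact naming are assumptions, not the spec.
REGISTRY_REPO="purescript/registry"
PACKAGE="prelude"
VERSION="v4.1.1"

TARBALL_URL="https://github.com/${REGISTRY_REPO}/releases/download/${PACKAGE}-${VERSION}/${PACKAGE}-${VERSION}.tar.gz"
echo "$TARBALL_URL"

# A package manager would then fetch and unpack it, e.g.:
#   curl -L "$TARBALL_URL" | tar -xz
```

Whatever the final naming ends up being, the point is that a release per package version gives every tarball a stable, predictable URL.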

Followup: why does the registry need to keep copies of the code? Can we not just use whatever mirror?

The reason for doing this is to provide a “trusted” source for packages and consistency in the package ecosystem (as in, mostly everyone uses this mirror) - packages are consistently named, we can ensure a decent level of security since there are many eyes on this, etc.
However, if people want to use a different mirror/registry, that’s fine - this is what happens today with package-sets, where one can freely compose their own set and point the package manager to that.

So this thing is about:

  • writing down a spec for the interface for managing package publishing - so that package managers can implement it
  • building the first instance of storage implementing this interface.

Current status of the Registry

We’re still in “spec draft” phase, right now mostly figuring out the schema for package manifests.

By now I think “manifests” will most likely be JSON files, typechecked with a Dhall schema - this keeps the format accessible and flexible while staying reasonably typesafe and explicitly typed (i.e. there’s a schema you can just look at).
The latest work-in-progress schema is in this branch.
I feel like this might be good enough to move forward - over the weekend I’ll update the sample manifests (which still use the old draft versions of the schema) and open a PR.
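To give a concrete flavor of what such a manifest might look like, here’s a hypothetical example in Dhall syntax - the field names are illustrative guesses on my part, not the actual schema (the real work in progress is in the branch linked above):

```dhall
-- A hypothetical package manifest. All field names here are guesses for
-- illustration; the actual schema is still being worked out.
{ name = "my-lib"
, version = "v4.1.1"
, license = "BSD-3-Clause"
, repository = "https://github.com/someone/purescript-my-lib"
, dependencies = [ "prelude", "effect" ]
}
```

Since Dhall can be serialized to JSON (e.g. with dhall-to-json), the same document can be typechecked against a schema and then shipped as a plain JSON manifest.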

The next step after we settle on a v1 of the schema is to mass-import packages from Bower into the new manifests (the code for that already exists in Haskell, but I’m porting it to PureScript for CI reasons).
Only after that will we need to patch package managers to follow the new publish pipeline.

If you feel that things are unclear/confusing/unnecessary/wrong/don’t make sense, don’t hesitate to open an issue in the repo or comment here :slightly_smiling_face:


Have you considered an approach which basically hosts and/or makes releases of entire package sets?

It seems like that would address the concerns about packages being mutated or removed while keeping all the existing benefits of package sets and with potentially less work.


Great question!

Yes, I did investigate going down the route of just making immutable package-sets instead of encompassing all packages.
And it’s a really tempting idea, because as you noted there are several advantages:

  • less work - just download all packages in the set, zip them and we’re done!
  • stemming from the above: shorter “time to market”, and simpler design

Though after looking at it for a while I got stuck on a design that felt inconsistent. A short sample of the open questions I had about it:

  • most projects don’t use the vanilla set - they have overrides/forks/etc. If these packages are not included in the set, do we forgo immutability for them?
    If we don’t, then we end up potentially caching all packages in the registry…
    If we do, then what implications does this have for users? I.e. this might incentivize people to publish weird things in the set just because they get immutability with that.
  • what happens when we cut a new set? If the whole registry is immutable then we just update the pointers and we’re fine (see the end of the post). But if it’s not, do we fetch packages again from the source? What happens if tags have changed in the meantime, i.e. my-lib@v4.1.1 points to a different commit than in the previous set? Or do we try to keep a cache of all packages to avoid surprises? But in that case we’re basically implementing the immutable registry…

I hope the above rambling makes sense :slightly_smiling_face:
Let me know if you think I missed something fundamental and there is actually a nice way to make this work - I hope there is! This reminds me of the ~6 months I spent hesitating to go down the Spago route, hoping there would be a simpler way than “write another package manager” to get things to play nice.

Note: with an immutable registry, a package set is just a simple record:

{ prelude = "v4.1.1"
, effect = "v2.0.0"
, `simple-json` = "v7.0.0"
..
}

Another couple of problems with a “host immutable package sets” approach, assuming I’ve understood you correctly at least:

  • It basically precludes any way of working with packages other than package sets, i.e. bounds and solving. I would really like to (eventually) provide a tool which does something similar to what stack solver used to do.
  • It means that we lose the centralized set of package names, which is important for Pursuit as well as for general comprehension. For example, in the absence of a centralized set of package names, there are at least three separate repos which the package name halogen-svg could refer to right now.