So the Bower registry is no longer accepting new package registrations:
? Registering a package will make it installable via the registry (https://registry.bower.io), continue? Yes
bower register git://github.com/jordanmartinez/purescript-interpolate.git
bower EUNKNOWN Unknown error: 500 - Registering bower package names is not supported anymore. You can install any bower package on github with command like "bower install jquery/jquery-dist --save"
* ERROR: Subcommand terminated with exit code 1
The impression I have is that most PureScript users aren’t heavily depending on Bower these days anyway, but using Spago instead. Also, this isn’t an immediate issue for people consuming packages via Bower, as installation still seems to be working fine.
However, this change does have implications for Pursuit and package-sets. The Pursuit package publishing guide, and the pulp publish flow, both assume that you are publishing your package to Bower before publishing it to Pursuit. (The main reason we have asked package authors to do this is that it ensures that package names are unique, i.e. it should prevent two different authors both thinking they own a certain package name.) The purescript/package-sets repo has a similar policy:
All packages that are included here must first be published via bower with no exceptions. Since there are two distribution methods for packages (the Bower registry and the package sets), we rely on the Bower registry to act as a “central registry of package names” for both methods. This prevents divergence in the ecosystem - e.g. having two different codebases for a package called “prelude”.
We’ll need to come up with a new policy for both of these, and more generally work out what the implications are for packaging in PureScript going forward, so I’m designating this thread for this purpose. There’s some related discussion in the thread Blogged: thoughts on PureScript package management too which might be useful as background.
@f-f looking back at your proposal for a package registry in that thread I think I am more open to that now, especially since we are now accepting funding for infrastructure costs via opencollective.
Also, for anyone who can’t move off of bower for some reason, you can still install new libraries and use the bower solver. You just need to use the repo url with a version suffix (#^1.0.0). I believe it also accepts shortcut urls for GitHub (
@hdgarrood happy to hear!
Since that comment you linked we further refined that idea with @justinw, to remove hosting costs entirely (at the price of leaning even more on GitHub)
So the idea would now be something like:
- there’s a GitHub repo that is “the registry”
- this repo contains a file for every version of every package that is published
- each one of these files contains all the info about a package at a specific version, like the package name, address, dependencies, bounds, etc
"adding a package to registry" is done by opening a pull request to the repo adding such a file.
Then CI automation kicks in and a bot:
- fetches the package from its source
- packages it in a tarball together with its package file from the repo
- hashes the content
- uploads the tarball and the hash to a github release for hosting. Note: every package has a dedicated github release that holds all the versions of that package
- generates docs and uploads them to Pursuit
upgrading a package is the same process as above, except one only has to copy an existing file over and change the version
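The packaging and hashing steps of the bot could look roughly like the sketch below. It is only an illustration under assumptions: the name `registry-package.json` for the package file inside the tarball, the choice of SHA-256, and the function names are all hypothetical, since the proposal does not fix any of them. Fetching the source and uploading to a GitHub release are left out.

```python
import hashlib
import io
import json
import tarfile


def build_tarball(source_files: dict[str, bytes], package_file: dict) -> bytes:
    """Pack the fetched sources together with the registry's package file."""
    entries = dict(source_files)
    # Hypothetical file name: the proposal does not say what the package
    # file is called once it is placed inside the tarball.
    entries["registry-package.json"] = json.dumps(package_file, sort_keys=True).encode()

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in sorted(entries):
            info = tarfile.TarInfo(name=name)
            info.size = len(entries[name])
            archive.addfile(info, io.BytesIO(entries[name]))
    return buffer.getvalue()


def content_hash(tarball: bytes) -> str:
    """Hash the exact bytes that would be uploaded (SHA-256 is an assumption)."""
    return hashlib.sha256(tarball).hexdigest()
```

The hash is taken over the finished tarball bytes, so the bot would need to upload exactly those bytes next to the hash rather than rebuilding the archive later.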
downloading a package for a package manager consists of:
- getting the tarball and the hash with the right version from the package’s github release in the registry repo
- hashing the tarball and checking that the hash matches
- unpack, proceed as usual, etc
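On the package manager side, the "hash, check, unpack" steps above might be sketched as follows. This assumes the hash is a hex-encoded SHA-256 digest (the proposal does not name an algorithm), and `verify_tarball`, `unpack_verified` and `HashMismatch` are hypothetical names. Downloading the tarball and hash from the release is left out.

```python
import hashlib
import hmac
import io
import tarfile


class HashMismatch(Exception):
    """Raised when a downloaded tarball does not match the published hash."""


def verify_tarball(tarball: bytes, expected_sha256: str) -> None:
    actual = hashlib.sha256(tarball).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise HashMismatch(f"expected {expected_sha256}, got {actual}")


def unpack_verified(tarball: bytes, expected_sha256: str) -> dict[str, bytes]:
    """Check the hash first, then read the regular files out of the archive."""
    verify_tarball(tarball, expected_sha256)
    files = {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile():
                files[member.name] = archive.extractfile(member).read()
    return files
```

Reading the members into memory rather than extracting them straight to disk keeps the sketch short and avoids dealing with unexpected paths inside the archive; a real package manager would still need to decide where and how to write the files.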
Notes and remarks:
- the fact that bounds are contained in a file for a version means that “registry editors” can change version bounds without republishing the package. This is a feature, and it’s analogous to “revisions” on Hackage. This of course opens up security/reproducibility concerns, so we’d need a careful consideration of all the attack vectors when implementing
- no server code needs to be implemented: authentication is handled by GitHub, hosting of both the metadata and the packages is handled by GitHub.
The only code that needs to be written is for the bot that watches pull requests on the repo, packages stuff, and uploads tarballs to releases.
- we can actually import all the Bower packages to pre-populate the registry, by crawling for all packages prefixed with purescript-, and converting the bowerfiles to the registry format (this can actually be run periodically so that if people publish new versions on Bower we can keep this registry synced). However, this is more code to be written.
- there are no costs except for the infrastructure running the registry bot (read: a VM, or a Lambda, etc), which doesn’t need to be secured as it doesn’t expose any ports to the outside (this is very similar to how Spago’s Curator runs today: it’s a cronjob that does things)
- mirroring the registry is just a matter of:
- mirroring the git repo
- swapping the “tarball upload destination” backend in the bot
Things still to define:
- the format for the package info file (both the file format and the data structure)
- how to make the “publishing flow” as smooth as possible. Last time I checked it was not possible to link users to “open this PR”, so I think we can either:
- ask people to add a GitHub token (akin to pulp login today)
- or link people to open an issue on the registry repo, where we then have the bot pick them up, and open PRs automatically
- a migration policy/path. It probably makes sense to migrate all the Bower packages to the new registry and then tell people to switch package manager, but we should explicitly think about the migration path to ensure there are no rough edges and paper cuts
- a solver for Spago/other-package-managers-supporting-this. Some months ago I looked at how reusable Cabal’s solver is, and it looks pretty good, but the implementation still needs to be tried out. The bottom line of course is that I don’t want to implement/maintain a solver
How to proceed from here:
- I’m working on a proof of concept of this, but before getting any code out I’d like to hear some feedback on how viable the idea is when deployed at community scale. Thoughts on all of this?
- If the above makes sense: before writing more code I’d like to iterate/expand on this design and get more details down. I’m thinking of an actual “registry design document” in RFC-style, so that we can address issues and questions before getting tangled in implementation details
- If the above makes sense: if anyone is available to help out with design/implementation/ops for this just comment below or ping me somehow and we can coordinate
I want to be quite careful about this, because whatever we end up doing now will very probably be what we keep using for quite a long time. So I don’t want to prioritise getting something set up as quickly or easily as possible.
I think it would be good to be able to perform common queries without downloading or maintaining a copy of the entire registry. For example: “What versions of this package exist? What packages does this package at this version depend on?” I think getting info about a single package at a particular version should be okay with this design but I’m not sure about asking what versions exist. Perhaps we could achieve this by denormalising, by adding a file per package which lists all the available versions it has (and checking that this file is accurate in CI).
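The CI check for such a denormalised file could be as small as the sketch below. The layout is an assumption made only for illustration: one `<version>.json` file per version, plus an index listing the version strings; `index_problems` is a hypothetical name.

```python
def index_problems(listed_versions: list[str], version_files: list[str]) -> list[str]:
    """Compare a package's version index against its per-version files.

    `listed_versions` is the content of the (hypothetical) per-package index,
    `version_files` the names of the per-version files, e.g. "1.0.0.json".
    Returns a list of problems; an empty list means the index is accurate.
    """
    listed = set(listed_versions)
    present = {name.removesuffix(".json") for name in version_files}

    problems = []
    if len(listed) != len(listed_versions):
        problems.append("index contains duplicate versions")
    for version in sorted(present - listed):
        problems.append(f"{version}: version file exists but is missing from the index")
    for version in sorted(listed - present):
        problems.append(f"{version}: listed in the index but has no version file")
    return problems
```

CI would fail the pull request whenever the returned list is non-empty, so a client could trust the index alone when asking which versions exist.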
It’s probably worth looking into how existing package registries provide APIs for solvers to use too, just for the ‘which versions of which packages have which bounds’ question, if we do intend to support solvers (which I would be very happy about). I’m not sure how they usually work - having to make one API call per package per version to get bounds data seems like a non-starter.
Elm’s solver might also be of interest.
Just a thought, but would using GitHub’s package registry be worth considering?
Yes, I think that’s worth consideration too. The main problem is that it appears to be intended to be used only by a select few package managers, all of which are tied to specific languages (which are not PureScript). Perhaps that’s not a dealbreaker though - I’m not sure.
I think it may be worth considering Entropic: https://github.com/entropic-dev/entropic
I first heard about it here: https://youtu.be/MO8hZlgK5zc
Crates (Rust’s package archive) stores its index in a “file-trie” here: https://github.com/rust-lang/crates.io-index as the source of truth.
They then built a web-app on top that feeds that information into a Postgres (I think?) and adds documentation etc.
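For reference, the “file-trie” layout works roughly like this, as far as I understand the documented index format: one- and two-character names go under `1/` and `2/`, three-character names under `3/<first character>/`, and everything else under `<first two>/<next two>/`, with names lowercased. The function name `index_path` is made up for this sketch.

```python
def index_path(name: str) -> str:
    """Path of a package's index file in a crates.io-style index repo."""
    lowered = name.lower()
    if not lowered:
        raise ValueError("package name must not be empty")
    if len(lowered) == 1:
        return f"1/{lowered}"
    if len(lowered) == 2:
        return f"2/{lowered}"
    if len(lowered) == 3:
        return f"3/{lowered[0]}/{lowered}"
    # Four or more characters: shard by the first two and the next two.
    return f"{lowered[:2]}/{lowered[2:4]}/{lowered}"
```

Since a client can compute the path from the name alone, it can fetch a single package’s metadata without cloning or listing the whole repo.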
Entropic looks like it has stalled - the readme still says it ‘will probably fall over in a stiff breeze’ but that that’s ‘exceeding expectations for a project just over a month old,’ and there have been barely any commits for months. For a project in its infancy, that’s concerning. I don’t think it can be a candidate in its current state.
Yeah that’s exactly what the design I posted above is about: decoupling “package registry” duties from “package registry metadata and public relations” duty. We could have a model similar to
- we store the data in a repo and handle the package-registry process through that (as I detailed above)
- generate a trie-repo like
- …and then build a small server on top to add functionality
(thanks for the link btw, TIL)
It’s definitely good to know that this design has been done already and can work, especially since I’d guess the Rust registry is probably bigger now than ours will be for a while.
I think we probably need a stopgap solution between now and when we have set this new registry up. Since the bower registry is basically just a mapping from package names to repository urls, how about we create a new repository under the purescript org containing a json file which is just that, based on scraping the current contents of the bower registry? We would then update the package publishing instructions in the package sets repo and in Pursuit to say that you should send a PR to this repo instead of registering on bower, and update pulp and/or spago to allow publishing with the new flow. I think we should probably continue to suggest that packages are at least installable by bower for now, in the sense that if you add the package to the dependencies in a bower.json file in a downstream project with the full repo url (so as not to require hitting the bower registry), it should install together with its needed dependencies.
@hdgarrood I think this is a good idea.
I created the purescript/registry repo, which contains:
- an RFC for the registry design that I described above, to be discussed/commented/improved
- a bower-packages.json file, which I put together using the libraries.io API (which was very flaky, so this was not entirely straightforward…), and which should contain all the PureScript packages ever published on the Bower registry.
It’s a mapping from package names to their GitHub address, which should be enough for the stopgap solution you propose?
Please check this thread for a link to more detailed publishing instructions by @ursi: Up-to-date instructions for publishing new packages
Could you please tell me how you publish new libraries these days? Should I take the usual road and run pulp publish, together with some package name prefixing like paluh/may-incredible-lib ?
I want to ask before I make a mess on Pursuit by publishing some garbage
pulp publish will work as expected, but first you should add the library address to this file: https://github.com/purescript/registry/blob/master/new-packages.json (which is the “temporary registry”). That’s all there is to it.
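Because the temporary registry exists to keep package names unique, a check along these lines could run before an entry is added. This is only a sketch: it assumes both bower-packages.json and new-packages.json are flat mappings from package name to repository address, and `add_package`, the example names and the example urls are purely illustrative.

```python
def add_package(
    bower_packages: dict[str, str],
    new_packages: dict[str, str],
    name: str,
    repo_url: str,
) -> dict[str, str]:
    """Return an updated copy of new-packages.json, refusing taken names."""
    for source, packages in (
        ("bower-packages.json", bower_packages),
        ("new-packages.json", new_packages),
    ):
        if name in packages:
            raise ValueError(f"{name} is already registered in {source}: {packages[name]}")
    updated = dict(new_packages)
    updated[name] = repo_url
    return updated
```

For example, with made-up entries:

```python
bower = {"purescript-prelude": "https://github.com/purescript/purescript-prelude.git"}
new = {}
new = add_package(bower, new, "purescript-example", "https://github.com/someone/purescript-example.git")
```

A second attempt to add `purescript-prelude` or `purescript-example` would then raise instead of silently overwriting the existing owner.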
There are docs on pursuit.purescript.org for package authors which need updating.