Registry vs package sets

Hi all - first, sorry for spaces after https below, since I’m not allowed to post more than 2 links.

I see that last year the registry was announced https ://discourse.purescript.org/t/registry-alpha-launched/3146. My understanding is that import jobs such as this one https ://github.com/purescript/registry/actions/runs/5368511003/jobs/9739242114 are, among other things, transferring things to the package sets repository (purescript/package-sets), which is still used by the current spago.

I see some packages e.g. https ://pursuit.purescript.org/packages/purescript-sequences/3.0.2 on pursuit and in the above import job’s logs, but I don’t see it in the latest package sets repo. I see the import stats say there are a bunch of non-fully-successful imports:

----------
IMPORT STATS
----------

1727 packages processed:
  1182 fully successful
  354 partially successful
  127 reserved (no usable versions)
  62 fully failed
  ---
  38 Cannot Access Repo
  17 Package URL Redirects
  5 Disabled Package
  1 Invalid Package Name
  1 Invalid Package URL

Is that the reason they are not included?

If so:

  1. Is there a list of packages that failed somewhere?
  2. What’s the procedure to figure out why a particular package failed?
  3. Assuming it’s something easily correctable, is republishing the package per the instructions in the purescript/registry repository the only thing needed to get it into the package sets?

Hi there @icyrockcom, welcome to the Discourse! I’ve updated your trust level so that you can post links without them being stripped. I work on the registry, so I have some answers for your questions.

I see some packages e.g. purescript-sequences on Pursuit and in the above import job’s logs, but I don’t see it in the latest package sets repo. I see the import stats say there are a bunch of non-fully-successful imports. Is that the reason they are not included?

Failure to be imported is indeed one reason a package might not be in the package sets, but it isn’t the only one. For example, the sequences library you listed has been imported successfully. You can tell because it has an entry in the registry’s metadata directory (metadata/sequences.json), and version 3.0.2 is listed there.

There are many packages that are published in the registry but which aren’t in the package sets. A package is omitted from the package sets any time that including it causes a compilation failure with the existing set. This can happen if the package relies on an older compiler than the package set does (for example, the package sets are on 0.15.9 but the package only works with 0.14.x), or if the package has a dependency that’s at an incompatible version in the set (it depends on prelude@1.0.0 but the package sets have prelude@2.0.0), or if the package has a duplicate module name with something else already in the set, and that sort of thing.
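To make the dependency case concrete, here is a minimal Python sketch. It is only an approximation: the registry decides by actually compiling the set, not by a check like this. The function names and the sample data are made up for illustration; only the ">=A <B" range format is taken from the registry manifests.

```python
# Hypothetical helper, not part of the registry tooling: report dependencies
# whose version in a package set falls outside the range a package asks for.

def parse_version(text):
    """Turn '5.0.1' into (5, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def parse_range(text):
    """Split '>=5.0.0 <6.0.0' into an inclusive lower and exclusive upper bound."""
    lower, upper = text.split()
    if not lower.startswith(">=") or not upper.startswith("<"):
        raise ValueError(f"unsupported range: {text!r}")
    return parse_version(lower[2:]), parse_version(upper[1:])

def conflicts(dependencies, package_set):
    """Return the dependencies that the package set cannot satisfy."""
    problems = {}
    for name, range_text in dependencies.items():
        pinned = package_set.get(name)
        if pinned is None:
            problems[name] = "not in the package set"
            continue
        lower, upper = parse_range(range_text)
        if not (lower <= parse_version(pinned) < upper):
            problems[name] = f"set has {pinned}, package needs {range_text}"
    return problems

# Illustrative data only; these are not the real package set contents.
deps = {"prelude": ">=1.0.0 <2.0.0", "maybe": ">=5.0.0 <6.0.0"}
package_set = {"prelude": "2.0.0", "maybe": "5.0.0"}
print(conflicts(deps, package_set))
```

A clean result here does not guarantee inclusion, since compiler version mismatches and duplicate module names would still only show up at compile time.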

Is there a list of packages that failed somewhere?

We haven’t compiled an official list, though it’s tracked in an issue:
https://github.com/purescript/registry-dev/issues/300

That said, if you run the registry importer locally with nix develop --command registry-importer dry-run then you will see a scratch directory get created with two files: version-failures.json and package-failures.json. The first lists the specific reason why a version of a package failed to be imported. The second lists the reason why an entire package could not be imported (and therefore no versions were attempted). When the registry leaves alpha we will commit these files so folks have a list of the packages / package versions that aren’t in the registry and why.

What’s the procedure to figure out why a particular package failed?

If you are publishing a package via the registry’s “publish” workflow, then you will get logs as comments on the GitHub issue you open:
https://github.com/purescript/registry#publish-a-package

For auto-imported packages, you can either a) run the registry-importer locally to get the package-failures.json and version-failures.json files and inspect their contents, or b) look at the GitHub workflow logs for the day the package was imported, if those logs are still preserved.

Assuming it’s something easily correctable, is republishing the package [via the registry] the only thing needed to get it into the package sets?

If there is a correctable issue with the package, then you should fix it, push a new tag to GitHub, and then publish the package via the registry’s “publish a package” workflow. The registry will automatically attempt to add the package to the next package sets release, if possible.

However, many issues are not correctable. The “import” process attempts to take a package that was originally published to the Bower registry (or “published” via a tag pushed to GitHub) and package it in the registry according to the registry’s validation rules. But this can fail; for example, some packages are no longer on GitHub (“Cannot Access Repo”), so we can’t package them. Others might fail because their dependencies are not solvable, because they don’t have an open-source license, or for another reason.

If you want to see the exact code run to publish a package, see this function:
https://github.com/purescript/registry-dev/blob/30e00032cd98b3cdc9d6632f8608f9cb63fc583a/app/src/App/API.purs#L336-L337


Hope this helps clarify a few things — please feel free to ask followup questions!


I should have linked to the spec, now that I think of it, which also outlines the publishing process:

https://github.com/purescript/registry-dev/blob/master/SPEC.md#51-publish-a-package

Thanks @thomashoneyman for the write-up!

Great, appreciated!

I assume this should be run in a checkout of purescript/registry-dev? If so, that doesn’t seem to be working on my end for some reason:

$ git branch -v
* master 466146b Add Nix builds for the full registry (#619)
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
$ nix develop --command registry-importer dry-run
/tmp/nix-shell.yvutDM: line 1852: exec: registry-importer: not found
$ ll -d /nix/store/*purescript-registry*/
dr-xr-xr-x 3 root root 4096 Dec 31  1969 /nix/store/pxqrq8q5j3ksca2an2dkzy9y9ilmv5gj-purescript-registry/
$ ll /nix/store/pxqrq8q5j3ksca2an2dkzy9y9ilmv5gj-purescript-registry/
total 8
lrwxrwxrwx  1 root root   84 Dec 31  1969 bin -> /nix/store/pxqrq8q5j3ksca2an2dkzy9y9ilmv5gj-purescript-registry/node_modules/js/.bin
dr-xr-xr-x 10 root root 4096 Dec 31  1969 js
$ ll /nix/store/pxqrq8q5j3ksca2an2dkzy9y9ilmv5gj-purescript-registry/node_modules/js/.bin
ls: cannot access '/nix/store/pxqrq8q5j3ksca2an2dkzy9y9ilmv5gj-purescript-registry/node_modules/js/.bin': No such file or directory
$ 

I don’t know if that’s expected or not (recent changes in the command line params?). I was able to run it with:

$ nix develop
$ GITHUB_TOKEN=xxx nix run -- .#legacy-importer dry-run

which seemed to work.

So looking at purescript-sequences again, here’s what I see in the logs:

$ rg sequences legacy-importer-2023-07-14T06\:18\:09.log|head -n50
[2023-07-14T06:21:19.278Z INFO] Processing sequences
[2023-07-14T06:21:19.278Z DEBUG] Listing tags for hdgarrood/purescript-sequences
[2023-07-14T06:21:19.278Z DEBUG] No cache entry found for GET /repos/hdgarrood/purescript-sequences/tags in memory.
[2023-07-14T06:21:19.279Z DEBUG] Fell back to on-disk entry for GET /repos/hdgarrood/purescript-sequences/tags
[2023-07-14T06:21:19.280Z DEBUG] Wrote cache entry for GET /repos/hdgarrood/purescript-sequences/tags in memory.
[2023-07-14T06:21:19.281Z DEBUG] Found cache entry but it was modified more than 4 hours ago, refetching GET /repos/hdgarrood/purescript-sequences/tags
[2023-07-14T06:21:19.281Z DEBUG] Making request to GET /repos/hdgarrood/purescript-sequences/tags
[2023-07-14T06:21:19.436Z DEBUG] Wrote cache entry for GET /repos/hdgarrood/purescript-sequences/tags in memory.
[2023-07-14T06:21:19.437Z DEBUG] Wrote cache entry for Request__GET /repos/hdgarrood/purescript-sequences/tags at path scratch/.cache/Request__GET___repos_hdgarrood_purescript-sequences_tags as JSON.
[2023-07-14T06:22:32.605Z DEBUG] No cache entry found for PublishFailureCache__sequences__3.0.2 in memory.
[2023-07-14T06:22:32.605Z DEBUG] No cache file found for PublishFailureCache__sequences__3.0.2 at path scratch/.cache/PublishFailureCache__sequences__3.0.2
... <other versions>

I see the relevant files there:

$ fd sequences scratch
scratch/registry/metadata/sequences.json
scratch/registry-index/se/qu/sequences

and the latest version 3.0.2 listed in all:

$ tail -n+1 $(fd sequences scratch scratch/.cache)|rg '^==|3.0.2'
==> scratch/.cache/Request__GET___repos_hdgarrood_purescript-sequences_tags <==
        "name": "v3.0.2"
          "sha": "6a68dda0c9dc77d2cdb9caf890a68983343d0e26",
          "url": "https://api.github.com/repos/hdgarrood/purescript-sequences/commits/6a68dda0c9dc77d2cdb9caf890a68983343d0e26"
==> scratch/registry/metadata/sequences.json <==
    "3.0.2": {
      "ref": "v3.0.2"
==> scratch/registry-index/se/qu/sequences <==
{"name":"sequences","version":"3.0.2","license":"MIT","location":{"githubOwner":"hdgarrood","githubRepo":"purescript-sequences"},"description":"An efficient, general purpose sequence type.","dependencies":{"arrays":">=6.0.0 <7.0.0","assert":">=5.0.0 <6.0.0","console":">=5.0.0 <6.0.0","effect":">=3.0.0 <4.0.0","lazy":">=5.0.0 <6.0.0","maybe":">=5.0.0 <6.0.0","newtype":">=4.0.0 <5.0.0","nonempty":">=6.0.0 <7.0.0","partial":">=3.0.0 <4.0.0","prelude":">=5.0.0 <6.0.0","profunctor":">=5.0.0 <6.0.0","psci-support":">=5.0.0 <6.0.0","tuples":">=6.0.0 <7.0.0","unfoldable":">=5.0.0 <6.0.0","unsafe-coerce":">=5.0.0 <6.0.0"}}

At this point, I’m not sure what to look for to see why it’s failing from the logs themselves.

Now purescript-sequences might have an issue with 0.15.x: its latest GitHub Actions run seems to have failed for the 0.15.4 compiler version (the logs have been purged, so unfortunately I can’t confirm more). So let’s assume the reason is that purescript-sequences is not buildable on 0.15.x.

However, let’s take purescript-rationals as another example. The latest version is 6.0.0 and based on the logs it builds fine with 0.15.9:

2023-05-12T11:01:51.7507030Z ##[group]Run purescript-contrib/setup-purescript@main
2023-05-12T11:01:51.7507316Z with:
2023-05-12T11:01:51.7507500Z   purescript: latest
2023-05-12T11:01:51.7507703Z   spago: latest
2023-05-12T11:01:51.7507894Z   psa: latest
2023-05-12T11:01:51.7508073Z ##[endgroup]
2023-05-12T11:01:51.9730257Z Fetching latest stable tag for purs
2023-05-12T11:01:51.9835490Z Fetching latest stable tag for spago
2023-05-12T11:01:51.9889518Z Fetching latest stable tag for psa
2023-05-12T11:01:51.9931740Z Constructed build plan.
2023-05-12T11:01:51.9962861Z Fetching purs
2023-05-12T11:01:52.2791238Z [command]/usr/bin/tar xz --warning=no-unknown-keyword -C /home/runner/work/_temp/152d3067-bbd0-411e-a149-6b1949f8a37a -f /home/runner/work/_temp/8443089a-6c6b-4671-ae9f-55aa9faea50d
2023-05-12T11:01:52.7016848Z Cached path /opt/hostedtoolcache/purs/0.15.9/x64, adding to PATH
2023-05-12T11:01:52.7022145Z Fetching spago
2023-05-12T11:01:52.9679856Z [command]/usr/bin/tar xz --warning=no-unknown-keyword -C /home/runner/work/_temp/a59f8c63-4bc8-4fd3-9329-4c72b6e0e1de -f /home/runner/work/_temp/8b8987bc-73a6-4b5e-bd2a-11781e1781fa
2023-05-12T11:01:53.3166745Z Cached path /opt/hostedtoolcache/spago/0.21.0/x64, adding to PATH
2023-05-12T11:01:53.3167598Z Fetching psa
2023-05-12T11:01:53.3200795Z [command]/usr/bin/sudo npm install -g purescript-psa@0.8.2
2023-05-12T11:01:54.4210227Z 
2023-05-12T11:01:54.4210725Z added 1 package in 448ms
2023-05-12T11:01:54.4279873Z Fetched tools.

so that should not be the issue. It seems to have been imported fine as well (“Update rationals”, issue #269 in purescript/registry), and the GitHub Actions build for that update (purescript/registry@b71089a) succeeded.

One thing I noticed is this:

$ rg 'rationals.*(0.1.0|5.0.1|6.0.0)' legacy-importer-2023-07-14T06\:18\:09.log 
17028:[2023-07-14T06:20:58.155Z DEBUG] Did not find manifest for rationals@0.1.0 in memory cache or local registry repo checkout.
17029:[2023-07-14T06:20:58.155Z DEBUG] No cache entry found for ImportManifest__rationals__v0.1.0 in memory.
17030:[2023-07-14T06:20:58.156Z DEBUG] Fell back to on-disk entry for ImportManifest__rationals__v0.1.0
17031:[2023-07-14T06:20:58.156Z DEBUG] Wrote cache entry for ImportManifest__rationals__v0.1.0 in memory.
26815:[2023-07-14T06:22:32.576Z DEBUG] No cache entry found for PublishFailureCache__rationals__6.0.0 in memory.
26816:[2023-07-14T06:22:32.577Z DEBUG] No cache file found for PublishFailureCache__rationals__6.0.0 at path scratch/.cache/PublishFailureCache__rationals__6.0.0
26817:[2023-07-14T06:22:32.577Z DEBUG] No cache entry found for PublishFailureCache__rationals__5.0.1 in memory.
26818:[2023-07-14T06:22:32.577Z DEBUG] No cache file found for PublishFailureCache__rationals__5.0.1 at path scratch/.cache/PublishFailureCache__rationals__5.0.1
31229:[2023-07-14T06:22:33.607Z DEBUG] No cache entry found for PublishFailureCache__base-rationals__0.1.0 in memory.
31230:[2023-07-14T06:22:33.607Z DEBUG] No cache file found for PublishFailureCache__base-rationals__0.1.0 at path scratch/.cache/PublishFailureCache__base-rationals__0.1.0

I.e. it imported 0.1.0, but not 5.0.1 (which is in the latest package set) or 6.0.0. I’ve tried removing the *rationals* files from .cache and rerunning (is there a better way to repeat only a single package from scratch?), and now it’s a bit different:

$ rg 'Import.*_rationals.*(0.1.0|5.0.1|6.0.0)' legacy-importer-2023-07-14T06\:51\:02.log 
14435:[2023-07-14T06:51:38.606Z DEBUG] No cache entry found for ImportManifest__rationals__v5.0.1 in memory.
14436:[2023-07-14T06:51:38.607Z DEBUG] No cache file found for ImportManifest__rationals__v5.0.1 at path scratch/.cache/ImportManifest__rationals__v5.0.1
14472:[2023-07-14T06:51:40.905Z DEBUG] Wrote cache entry for ImportManifest__rationals__v5.0.1 in memory.
14473:[2023-07-14T06:51:40.908Z DEBUG] Wrote cache entry for ImportManifest__rationals__v5.0.1 at path scratch/.cache/ImportManifest__rationals__v5.0.1 as JSON.
15036:[2023-07-14T06:51:57.942Z DEBUG] No cache entry found for ImportManifest__rationals__v0.1.0 in memory.
15037:[2023-07-14T06:51:57.943Z DEBUG] No cache file found for ImportManifest__rationals__v0.1.0 at path scratch/.cache/ImportManifest__rationals__v0.1.0
15066:[2023-07-14T06:51:58.916Z DEBUG] Wrote cache entry for ImportManifest__rationals__v0.1.0 in memory.
15067:[2023-07-14T06:51:58.917Z DEBUG] Wrote cache entry for ImportManifest__rationals__v0.1.0 at path scratch/.cache/ImportManifest__rationals__v0.1.0 as JSON.

but still no 6.0.0 there.
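As an aside, the first rg pattern above also picked up base-rationals. A small Python sketch that matches on the exact package name inside the cache keys avoids that. The key shape (Kind__package__version, with an optional leading v) is inferred from the log lines in this thread, and the function name is made up:

```python
import re

# Cache keys in the importer log look like Kind__package__version, e.g.
# PublishFailureCache__rationals__6.0.0 or ImportManifest__rationals__v0.1.0.
# This shape is inferred from the log excerpts, not from the importer source.
CACHE_KEY = re.compile(r"\b([A-Za-z]+)__([a-z0-9-]+)__v?(\d+\.\d+\.\d+)\b")

def cache_keys_for(package, log_lines):
    """Collect (kind, version) pairs for exactly this package name."""
    found = set()
    for line in log_lines:
        for kind, name, version in CACHE_KEY.findall(line):
            if name == package:
                found.add((kind, version))
    return sorted(found)

# Three lines copied from the log output above.
log_lines = [
    "[2023-07-14T06:20:58.155Z DEBUG] No cache entry found for ImportManifest__rationals__v0.1.0 in memory.",
    "[2023-07-14T06:22:32.576Z DEBUG] No cache entry found for PublishFailureCache__rationals__6.0.0 in memory.",
    "[2023-07-14T06:22:33.607Z DEBUG] No cache entry found for PublishFailureCache__base-rationals__0.1.0 in memory.",
]

print(cache_keys_for("rationals", log_lines))
```

On a real run the lines would come from the log file instead of the inline list.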

I don’t see 6.0.0 in version-failures.json either - the latest is 0.3.1:

$ <version-failures.json jq .rationals
{
  "v0.1.0": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  },
  "v0.1.1": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  },
  "v0.1.2": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  },
  "v0.2.0": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  },
  "v0.3.0": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  },
  "v0.3.1": {
    "reason": "Legacy manifest could not be parsed.",
    "tag": "InvalidManifest",
    "value": "MissingLicense"
  }
}
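For a quicker overview than reading the jq output, a short Python sketch can count failures per package by tag and value. It assumes the package -> version -> {reason, tag, value} shape shown above; the function name is made up, and the inline sample is a trimmed copy of the output rather than the full file.

```python
import json
from collections import Counter

def summarize_failures(failures):
    """Count failures per package, keyed by (tag, value)."""
    summary = {}
    for package, versions in failures.items():
        counts = Counter()
        for entry in versions.values():
            value = entry.get("value")
            # Non-string values are serialized so they can be used as dict keys.
            if not isinstance(value, str):
                value = json.dumps(value, sort_keys=True)
            counts[(entry.get("tag"), value)] += 1
        summary[package] = counts
    return summary

# Trimmed sample shaped like the jq output above. A real run would load the
# file written by the importer instead (its location is an assumption here).
sample = json.loads("""
{
  "rationals": {
    "v0.1.0": {"reason": "Legacy manifest could not be parsed.",
               "tag": "InvalidManifest", "value": "MissingLicense"},
    "v0.3.1": {"reason": "Legacy manifest could not be parsed.",
               "tag": "InvalidManifest", "value": "MissingLicense"}
  }
}
""")

for package, counts in summarize_failures(sample).items():
    for (tag, value), count in counts.items():
        print(f"{package}: {count} x {tag} ({value})")
```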

Does it have anything to do with bower.json being removed from purescript-rationals in “Use BigInt instead of Int (#42)” (purescript-contrib/purescript-rationals@d85ddb5)?

What would be the next step in troubleshooting?

I’ve got answers for your questions, but to fast-forward to the conclusion: if you want to know why a package did not get added to the package sets, then you can either 1) locate the day that the package should have been added to the package sets (package sets update every day) or 2) submit a package set update to the registry including the package version.

The registry will try to apply your update, and if it fails then you can see errors in the job logs. I went ahead and submitted an issue so that we can see the results:
https://github.com/purescript/registry/issues/309

On to your questions!

That doesn’t seem to be working on my end for some reason…
$ nix develop --command registry-importer dry-run
I don’t know if that’s expected or not (recent changes in the command line params?). I was able to run it with:
$ GITHUB_TOKEN=xxx nix run -- .#legacy-importer dry-run

Sorry about that — indeed, we’ve just changed this in the past week now that all the builds are in Nix. The command you used (which worked) is the correct one.

For the rest of your troubleshooting here, you’re doing the right thing as far as checking why a package version fails to be imported to the registry. But I may have led you astray: a package can be published to the registry but fail to be added to the package sets. Those are entirely separate processes, and running the importer won’t tell you anything about why a package version is not in the package sets.

Really quickly, I’ll explain more about the importer, but you can skip to the next section to debug the package sets — again, a separate thing altogether from the legacy importer.

Back to the importer. You can run the importer and inspect the logs and the two failures.json files to see problems with packages being imported to the registry. Results are cached, so to get rid of the cache for a package you can do what you already did — delete the relevant entries from .cache.

However, the package versions you’re inspecting have already successfully been imported into the registry. The legacy importer does not reimport anything that’s already in the registry. If you want to force a reimport locally, then you can delete the relevant entries from the local checkout of the registry and registry-index you see in the scratch directory as a sibling to .cache. For example, you’d go to:

  • scratch/registry/metadata/rationals.json and delete the entry "6.0.0" from the "published" section.
  • scratch/registry-index/ra/ti/rationals and delete the line that includes "version": "6.0.0"

Then you can import again and see how these are processed. Of course, since these versions are successful, you won’t see anything in the package-failures or version-failures. But you’ll see them show up in your checkout of the registry and registry-index.
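If you would rather script those two deletions than do them by hand, something along these lines should work. It is a rough Python sketch, not registry tooling: the function names are made up, the index path layout is inferred from the two examples in this thread (se/qu/sequences, ra/ti/rationals), and the demo runs against a throwaway directory with trimmed stand-in files.

```python
import json
import tempfile
from pathlib import Path

def index_path_for(scratch, package):
    """Path of a package's registry-index file; layout inferred from the thread."""
    if len(package) < 4:
        raise ValueError("this sketch only handles names with at least 4 characters")
    return Path(scratch) / "registry-index" / package[:2] / package[2:4] / package

def forget_version(scratch, package, version):
    """Drop one version from the local metadata file and registry-index file."""
    metadata_path = Path(scratch) / "registry" / "metadata" / f"{package}.json"
    metadata = json.loads(metadata_path.read_text())
    metadata.get("published", {}).pop(version, None)
    metadata_path.write_text(json.dumps(metadata, indent=2) + "\n")

    # The index file holds one JSON manifest per line; keep all other versions.
    index_path = index_path_for(scratch, package)
    kept = [
        line
        for line in index_path.read_text().splitlines()
        if line.strip() and json.loads(line).get("version") != version
    ]
    index_path.write_text("".join(line + "\n" for line in kept))

# Demo on a temporary directory with made-up, trimmed file contents.
with tempfile.TemporaryDirectory() as tmp:
    metadata_file = Path(tmp) / "registry" / "metadata" / "rationals.json"
    metadata_file.parent.mkdir(parents=True)
    metadata_file.write_text(json.dumps(
        {"published": {"5.0.1": {"ref": "v5.0.1"}, "6.0.0": {"ref": "v6.0.0"}}}
    ))
    index_file = index_path_for(tmp, "rationals")
    index_file.parent.mkdir(parents=True)
    index_file.write_text(
        '{"name":"rationals","version":"5.0.1"}\n'
        '{"name":"rationals","version":"6.0.0"}\n'
    )

    forget_version(tmp, "rationals", "6.0.0")

    remaining_published = sorted(json.loads(metadata_file.read_text())["published"])
    remaining_index = [
        json.loads(line)["version"] for line in index_file.read_text().splitlines()
    ]

print(remaining_published, remaining_index)
```

Against a real checkout you would call forget_version("scratch", "rationals", "6.0.0") from the repository root. Note that it rewrites the metadata file with its own indentation, which is harmless for a local scratch copy.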


OK, back to the package sets. You’re trying to figure out why a package version isn’t in the package sets. The way package set publishing works is that at midnight every night we collect up all package versions published in the last 24 hours and attempt to add or upgrade them in the package sets. Any package version that produces a compiler failure is rejected from the package sets, and the rest are included in the upgrade.
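In rough terms the nightly process behaves like the Python sketch below. This is a simplified model for illustration, not the registry’s actual code: the function name, the compiles callback (a stand-in for really running the compiler on the candidate set), the one-at-a-time ordering, and the package names are all assumptions.

```python
from datetime import datetime, timedelta, timezone

def nightly_update(package_set, published, now, compiles):
    """Try each version published in the last 24 hours; keep those that compile."""
    cutoff = now - timedelta(hours=24)
    accepted, rejected = dict(package_set), {}
    for name, version, published_at in published:
        if published_at < cutoff:
            continue  # older than 24 hours, not part of tonight's batch
        candidate = {**accepted, name: version}
        if compiles(candidate):
            accepted = candidate
        else:
            rejected[name] = version
    return accepted, rejected

# Made-up data: one good upgrade, one failing addition, one stale publish.
now = datetime(2023, 7, 15, 0, 0, tzinfo=timezone.utc)
published = [
    ("good-example", "1.1.0", now - timedelta(hours=3)),
    ("broken-example", "2.0.0", now - timedelta(hours=5)),
    ("old-example", "1.0.0", now - timedelta(hours=30)),
]

def fake_compiles(candidate):
    """Pretend checker: any set containing broken-example fails to build."""
    return "broken-example" not in candidate

accepted, rejected = nightly_update({"good-example": "1.0.0"}, published, now, fake_compiles)
print(accepted, rejected)
```

The point of the model is that a rejected version simply stays out of the set while everything else proceeds, which is why a package can be in the registry and still be missing from the package sets.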

There is a local script that will let you do this, callable via nix run .#package-set-updater, but it doesn’t let you try to add a specific package. You could tweak the script to manually insert the package version you want updated into it, which is a bit hacky but will let you run the package set update and see why that package is being rejected.

But the easiest thing of all is to submit a package set update to the registry. The registry will try to apply your update, and if it fails then it will tell you why.

I’ve commented in the registry issue about the specific failures that keep sequences and rationals out of the package sets, and we can discuss it there if you’d like to help get at least rationals into the package sets.