- Feature Name: Hashes for packages in PureScript package set
- Start Date: 202…1-11-25
- RFC PR: #1043
## Summary
It would be useful to have a hash for every package in the package set.
This repo distributes package sets in which packages are defined as follows:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
This RFC proposes changing them to also include hashes. Packages with hashes would look something like this:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
Note the new `hash` field.
## Motivation
Distributing the package set with hashes for each package would have two nice effects:
- Some build systems have special support for downloading sources that are hashed. Nix and Bazel both fall into this camp. The PureScript package set could be directly used by these build systems if each package had a hash. Currently, an impure step that adds hashes needs to run before the PureScript package set can be used by these systems.
- Support could be added to Spago to check the hash of the packages it is downloading. Spago could report an error if it downloads a package that doesn't match the specified hash.
## Detailed design
The main change to this `purescript/package-sets` repo would be generating a hash for each package when generating a new package set. This hash generation code would run right after this line in CI:
https://github.com/purescript/package-sets/blob/c8efa6bd161ba92a4184b58487b628984d8068c9/.github/workflows/release.yml#L18
Here's a PR that implements this proposal: #1043.
Here's an example package set produced with the above PR. Note that each package contains an additional `hash` field: https://github.com/cdepillabout/package-sets/blob/25b062fa7a5b875aab56c6c45198f2b51ee078fe/packages-with-hashes.dhall
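The hash-generation step could look roughly like the following Python sketch. It assumes the package set has already been turned into JSON-like data (for example with `dhall-to-json`) and is converted back to Dhall afterwards, and that every `repo` is a GitHub URL. `github_tarball_url`, `add_hashes`, and the injected `hash_url` function are hypothetical names, not taken from PR #1043.

```python
from typing import Callable, Dict


def github_tarball_url(repo: str, version: str) -> str:
    """Derive GitHub's archive URL from a package's `repo` and `version`."""
    base = repo[: -len(".git")] if repo.endswith(".git") else repo
    return f"{base}/archive/{version}.tar.gz"


def add_hashes(
    packages: Dict[str, dict], hash_url: Callable[[str], str]
) -> Dict[str, dict]:
    """Return a copy of `packages` in which every entry has a `hash` field.

    `hash_url` downloads a URL and returns its SRI hash; it is passed in
    so this step can be tested without network access.
    """
    return {
        name: {**pkg, "hash": hash_url(github_tarball_url(pkg["repo"], pkg["version"]))}
        for name, pkg in packages.items()
    }
```

Keeping the download-and-hash function as a parameter separates the network-dependent part from the record-update logic, which stays pure.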
There are a few design questions in this RFC. I discuss each of them in the following sections, including my suggestions.
### What exactly should be hashed?
When hashing git repos, you need to decide exactly what to hash. Unlike a single file, a directory of files can be hashed in several different ways. This RFC suggests side-stepping this issue and instead hashing the archive automatically produced by GitHub. For instance, take the `zipperarray` package:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
This RFC suggests hashing the following tarball that is automatically produced by GitHub:
```
https://github.com/jamieyung/purescript-zipperarray/archive/v1.1.0.tar.gz
```
Trying this on the command line looks like:
```console
$ curl -L https://github.com/jamieyung/purescript-zipperarray/archive/v1.1.0.tar.gz | openssl dgst -sha256 -binary | openssl base64
fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY=
```
The big advantage of relying on these tarballs produced by GitHub is that the implementation on our side is extremely simple. Hashing a tarball at a known location is almost as easy as it gets.
There are a couple of disadvantages to this:
- We rely on GitHub to provide these download URLs, and to keep the tarballs it produces byte-for-byte identical in the future (if they changed, so would their hashes). My guess is that there is only a very small chance that we would ever be negatively affected by this.
- We don't support alternative hosting sites (like GitLab). See the **Alternatives** section below for some ways to work around this.
### What format should the hashes be in?
This RFC suggests using [SRI hashes](https://www.w3.org/TR/SRI/).
Hashes look like `sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY=`: the name of the hash function (`sha256`), a literal `-`, and then the base64-encoded digest.
The benefit of SRI hashes is that there is a specification we can point people to if there are ever any questions about the format. (SRI hashes are also natively supported by Nix, which is convenient for me.)
I don't know of any real drawbacks to using SRI hashes. Some people might prefer base16-encoded (hex) hashes to base64-encoded ones.
## Alternatives
There are a few alternatives to some of the design points above; this section lists them.
### Hash something other than the tarball produced by GitHub?
The above section suggests that we hash the archive tarballs produced by GitHub.
For example, take the `zipperarray` package:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
This RFC suggests we hash the following tarball: `https://github.com/jamieyung/purescript-zipperarray/archive/v1.1.0.tar.gz`
We could instead download the Git repo at `git@github.com:jamieyung/purescript-zipperarray.git` and hash its contents.
The problem with this is that there isn't a single method for hashing a directory of files. There was a [PR](https://github.com/purescript/package-sets/pull/426) that tried to implement this using Nix NAR hashes, but it didn't end up going anywhere. There is also an existing issue about adding hashes to the package set: https://github.com/purescript/package-sets/issues/32.
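To illustrate why there is no single answer, here are two reasonable but incompatible ways to hash the same directory tree, sketched in Python over an in-memory mapping of paths to file contents. Both schemes are hypothetical and simplified; neither is Nix's NAR format, which additionally records details such as executable bits and symlinks.

```python
import hashlib
from typing import Dict


def hash_contents_only(files: Dict[str, bytes]) -> str:
    """Scheme A: hash file contents concatenated in sorted-path order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(files[path])
    return h.hexdigest()


def hash_paths_and_contents(files: Dict[str, bytes]) -> str:
    """Scheme B: also hash each path, plus a length prefix per file."""
    h = hashlib.sha256()
    for path in sorted(files):
        data = files[path]
        h.update(path.encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Both schemes are deterministic, yet they give different hashes for the same tree, and scheme A does not even notice a renamed file. A package set would have to pick one scheme, and every consumer would have to reimplement it exactly.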
### Future-proof hash types
One of the big disadvantages of this RFC is that it relies on GitHub-produced tarballs. One alternative could be to make the hash style more flexible. For instance, instead of:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
The hash could be specified like:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hashes =
[ { hash-type = "github-tarball"
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
}
]
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
This could allow someone to come around in the future and add support for more hash-types. For instance, if someone wanted to revive [PR 426 for adding NAR hashes](https://github.com/purescript/package-sets/pull/426), all they would need to do is modify the CI script to produce an additional hash. Then the package-set might have packages that look like this:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hashes =
[ { hash-type = "github-tarball"
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
}
, { hash-type = "nar-hashes"
, hash = "sha256-ezF7yq/V/CkzbRVhdM45MV4Ig2aZTCj7wYwgz9cYrqw="
}
]
, repo = "https://github.com/jamieyung/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
There are two different hash types here. Consumers of the package set could pick which hash they want to use.
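For example, a consumer could walk the `hashes` list in its own order of preference. A minimal Python sketch, assuming the list has been decoded into dictionaries with `hash-type` and `hash` keys as in the example above; `pick_hash` is a hypothetical name:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pick_hash(
    hashes: List[Dict[str, str]], preferred: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Return (hash-type, hash) for the first type in `preferred` that the
    package provides, or None if the consumer supports none of them."""
    by_type = {entry["hash-type"]: entry["hash"] for entry in hashes}
    for hash_type in preferred:
        if hash_type in by_type:
            return hash_type, by_type[hash_type]
    return None


# A Nix-based consumer might prefer NAR hashes and fall back to the tarball hash.
zipperarray_hashes = [
    {"hash-type": "github-tarball", "hash": "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="},
    {"hash-type": "nar-hashes", "hash": "sha256-ezF7yq/V/CkzbRVhdM45MV4Ig2aZTCj7wYwgz9cYrqw="},
]
print(pick_hash(zipperarray_hashes, ["nar-hashes", "github-tarball"]))
```

Consumers that understand none of the listed types get `None` and can fall back to an unverified download or fail, whichever suits them.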
This could also be extended to hosting platforms other than GitHub. For instance, if the package set ever gets packages from GitLab, we could have GitLab-specific hashes:
```dhall
{
zipperarray =
{ dependencies =
[ "arrays", "maybe", "prelude", "naturals", "strictlypositiveint" ]
, hashes =
[ { hash-type = "gitlab-tarball"
, hash = "sha256-fP5dDqsc3/rjZ+WTKFrfIGzuwFnmSQLQg3o+NC17jpY="
}
]
, repo = "https://gitlab.com/someone/purescript-zipperarray.git"
, version = "v1.1.0"
}
}
```
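Under this scheme, each `hash-type` implies which archive URL was hashed. The following Python sketch maps the two tarball types above to URLs. The GitLab pattern (`/-/archive/<ref>/<project>-<ref>.tar.gz`) is my assumption about GitLab's archive endpoint and should be checked before relying on it; `archive_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def archive_url(hash_type: str, repo: str, version: str) -> str:
    """Return the URL of the archive that a tarball `hash-type` refers to."""
    parsed = urlparse(repo)
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    project = path.rsplit("/", 1)[-1]  # e.g. "purescript-zipperarray"
    if hash_type == "github-tarball":
        return f"https://{parsed.netloc}/{path}/archive/{version}.tar.gz"
    if hash_type == "gitlab-tarball":
        # Assumed GitLab layout: /<namespace>/<project>/-/archive/<ref>/<project>-<ref>.tar.gz
        return (
            f"https://{parsed.netloc}/{path}/-/archive/"
            f"{version}/{project}-{version}.tar.gz"
        )
    raise ValueError(f"unknown hash-type: {hash_type!r}")
```

Taking the host from the `repo` field, rather than hard-coding it, would also let self-hosted GitLab instances reuse the `gitlab-tarball` type.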
### Separate package sets
If there is a lot of opposition to this proposal, it would still be possible for someone to create a repo that provides package sets identical to the ones provided by this repo, but with hashes added.
For instance, I have a proof-of-concept package set with hashes here: https://github.com/cdepillabout/package-sets/blob/0f23b4cfc2e9a0af5b3f5f78fcc10b3ba57a14eb/packages-with-hashes.dhall
This package set was created semi-automatically, but it wouldn't be difficult to create a small git repo with CI that does the following:
- Watches the releases on `purescript/package-sets`.
- Whenever there is a new release, takes the newly uploaded package set and generates a hash for each package.
- Uploads this package set, with hashes, to the alternative repo.
End users would then just change their `packages.dhall` file to point to the package sets on the alternative repo.
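The first two steps of such a mirror could be sketched in Python as follows. It assumes the list of upstream release tags is obtained elsewhere (for example from GitHub's releases API) and that each upstream release ships a `packages.dhall` asset at `releases/download/<tag>/packages.dhall`; the function names are hypothetical.

```python
from typing import Iterable, List, Set

UPSTREAM = "https://github.com/purescript/package-sets"


def releases_to_mirror(upstream_tags: Iterable[str], mirrored: Set[str]) -> List[str]:
    """Return upstream release tags that have not been mirrored yet,
    keeping the upstream order."""
    return [tag for tag in upstream_tags if tag not in mirrored]


def packages_dhall_url(tag: str) -> str:
    """URL of the `packages.dhall` asset attached to an upstream release."""
    return f"{UPSTREAM}/releases/download/{tag}/packages.dhall"


# For each tag returned by releases_to_mirror: download packages_dhall_url(tag),
# add a hash to every package, and publish the result under the same tag.
```

Publishing under the same tag names would let users switch between the upstream and hashed package sets by changing only the URL prefix in `packages.dhall`.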