Hey, PS community! Recently I’ve been pondering the issues related to PureScript package management in general and working on a proposal that may help to resolve them. It is now ready to be presented. It is a bit wordy, but I tried to give the rationale behind the suggested ideas in detail. I want to thank some members of the core team who had conversations with me on the topic and helped me to come to certain conclusions. Have a nice and thoughtful read!
The goal of this proposal is to put forward a solution to problems that relate primarily to the package management part of the PureScript ecosystem. Those problems are mostly raised in the issues of the main GitHub repo and concern the compiler’s package awareness and problems with module namespaces in consumed packages (e.g., duplication of module names).
This proposal involves rethinking and re-evaluating the current understanding of a packaged entity, its relationship with the modules it includes, and the publishing infrastructure. These changes, if accepted and implemented, should lead to a clearer and more concise design of packages and to simpler consumption and management in general.
The proposal does not contain any particular low-level implementation details, but rather a higher-level overview of the proposed model and the justifications for it. It is absolutely open to changes and corrections.
Packages and module namespaces
A package should be an encapsulated module namespace like Data.String or Some.Platform. A single package includes and exposes only one namespace. This namespace should be the name of the package and should serve as its primary identity.
A package may include and expose nested namespaces. Say, the package Data.String may include the modules Data.String.RegEx and Data.String.NonEmpty. In this case even having a Data.String module file itself may not be required, because nested namespaces should be available when just importing the parent namespace: for example, if we import Data.String as S, then we get access to the nested S.RegEx.test for free. I believe this should be enabled by convention. It makes it possible to implement what has been discussed without adding extra re-export syntax, and it exposes only what should be exposed according to the structure of the package.
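The namespace rule described above can be illustrated with a minimal sketch. Python is used here only as a neutral modeling language; the function names exposes and resolve_qualified are hypothetical and are not part of the PureScript compiler or any existing tool. The sketch assumes that a package exposes exactly the modules whose dot-separated name starts with the package namespace.

```python
def exposes(package_namespace: str, module_name: str) -> bool:
    """Return True if module_name is the package namespace itself
    or is nested under it (compared segment by segment)."""
    pkg = package_namespace.split(".")
    mod = module_name.split(".")
    return mod[:len(pkg)] == pkg


def resolve_qualified(alias: str, imported_namespace: str, reference: str) -> str:
    """Expand a qualified reference such as 'S.RegEx.test',
    given an import like 'import Data.String as S'."""
    prefix = alias + "."
    if not reference.startswith(prefix):
        raise ValueError(f"{reference!r} does not use alias {alias!r}")
    return imported_namespace + "." + reference[len(prefix):]


# Data.String.RegEx belongs to the Data.String package,
# while Data.StringMap does not (segments are compared, not raw prefixes).
print(exposes("Data.String", "Data.String.RegEx"))  # True
print(exposes("Data.String", "Data.StringMap"))     # False
print(resolve_qualified("S", "Data.String", "S.RegEx.test"))
# Data.String.RegEx.test
```

Comparing whole segments rather than raw string prefixes is a deliberate choice in this sketch, so that a sibling namespace that merely shares leading characters is not treated as nested.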
A package has a version. Packages may depend on other packages with particular versions or version ranges (nothing special about versioning here). A package is fully opaque to a consumer, who has no access to its dependencies or internal implementation but only to the exposed namespace.
Putting the module namespace up front forces authors to think about module names and the namespace hierarchy in the first place. It gives a much clearer notion of what the package’s purpose and the author’s intentions are. This should definitely lead to better design decisions in general and will help to establish more order in the ecosystem through better standardization and categorization that will be partly embedded in package names.
The current approach, in which the actual relation between package naming and inner module namespaces is absolutely unregulated and hidden, only leads to additional cognitive load, weird and poor design decisions, and issues like duplicate module names, unrelated namespaces and modules in the same package, and bloated interfaces.
In this changed paradigm, package authors need to start thinking about module names and take them seriously. Modules should not be “just a bunch of files”, and this attitude should be conveyed throughout the ecosystem. “How do I better name my package (what namespace should I use for it)?” should become a common question for package authors in the community.
In more abstract terms, what we get with the proposed change to packages and modules is:
- Removing what is not needed without losing anything (made-up package names).
- Hiding what should be hidden (inner implementation details).
- Exposing what was concealed but should be explicit and visible (module namespaces).
- Decomposing what was compound and complected (bloated packages).
It all may seem to be a big or significant change, but actually it is not: it is just a restructuring of packages and the removal of unnecessary labels by replacing them with a single exposed module namespace. But the impact of it may be huge and positive.
In the next paragraphs, it will be shown how to solve the issues that arise (or become more explicit) with the new approach.
The registry should be divided into user namespaces (or domains, or scopes). This would give users a place for the packages they own, where they don’t interfere with others. A user may publish their own packages without any particular limitations.
Such namespaces can be used not only as personal dev accounts but for teams and groups as well. This username-based organization can also help to better communicate a package’s origin, level of trust, etc.
Important user namespaces like core, contrib, etc. should be defined and reserved. Namespaces should be immutable and treated as a constant secondary part of package identity in the ecosystem; they cannot be discontinued or changed at a user’s whim.
Why is it important to have user namespaces? In the proposed design this is actually necessary. Currently the developer’s freedom is achieved by allowing an arbitrary name for a new package, and inside this package an author may place whatever they wish. For example, a user decides to publish the package “purescript-task”, then another user with a similar or related idea would like to do the same, but “task” is taken, so the user has to come up with some other name: “task2”, “better-task”? Maybe one could come up with and publish “purescript-work”? That looks really weird and clumsy, definitely a source of ambiguity and miscommunication, and probably even a point of frustration for people.
I wonder why anyone would want to have such an explicit point of conflict and to be engaged in additional problems and resolution processes like package name squatting, name reassignment, disputes, reservations, etc. I see a registry of that type, without user namespaces, as an app with mutable shared state at scale: the more actors are involved, the more obvious and painful it becomes, and this feels wrong.
Another type of communication between package authors should be encouraged: when it is necessary to resolve module namespace conflicts in codebases (see below) or to collaborate, it will be a whole different level of discussion, not just “he took my package name”.
User namespaces solve problems and create future design opportunities. I’ve been advocating for them in general, but, as has been said, they are a necessary part of the proposed approach, and this makes it even more valuable because there is no need to make a separate decision on them.
From the implementation standpoint, introducing user namespaces should not require much work or any significant effort. We should just add the username to the package name and treat the result as the package identifier when discovering a package. It definitely should not be seen as a complication, because it is not one (we already have a username that is coupled with a package anyway), but as a simple solution to the identity and uniqueness problem that does not shift the burden onto users.
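As a minimal sketch of such an identifier (in Python, used only as a modeling language): the "user/Namespace" text format, the PackageId type and the parse_package_id function are all assumptions made for illustration. The proposal does not fix a concrete syntax, and the user names alice and bob are made up.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    user: str       # registry user namespace, e.g. "core"
    namespace: str  # module namespace that names the package, e.g. "Data.String"


def parse_package_id(text: str) -> PackageId:
    """Parse a hypothetical 'user/Namespace' identifier."""
    user, sep, namespace = text.partition("/")
    if not sep or not user or not namespace:
        raise ValueError(f"expected 'user/Namespace', got {text!r}")
    return PackageId(user, namespace)


# Two users can publish a package with the same module namespace
# without competing for a single global name.
a = parse_package_id("alice/Data.Task")
b = parse_package_id("bob/Data.Task")
print(a != b)                      # True
print(a.namespace == b.namespace)  # True
```

The identity of a package is the pair, so uniqueness is guaranteed by the registry account rather than by inventing a new package name.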
Continuity, unity, and stability of the package ecosystem are achieved at another level and by other means, like package-sets and relying on packages from trusted users/teams, rather than by just squashing everything into one big pile of names. Not everyone can be headed in the same right/better direction, but people should be given freedom and space to experiment with their own ideas and approaches without interfering with others.
With package-sets introduced and relied on as a standard, it is actually possible to get the best of both worlds: one part of the system where we have and need namespaces, and another part where we have a flat, conflict-free set of packages, without compromises and trade-offs.
There should be bundles of packages. A package bundle is an entity that just holds a list of packages (potentially with version ranges) that are included in this bundle. Different bundles may include the same packages so packages are fully decoupled from bundles.
Bundles have simple names (lower case, like package names currently) and are published inside user namespaces, but can include packages from any user namespace, as long as there are no module namespace conflicts.
For example, a prelude bundle could include the packages that are needed to get started: Prelude, Effect, Effect.Console. And a node-http bundle may include the Node.HTTP package, as well as Node.Process, Node.Path, Node.Url, etc., to help with common tasks related to using HTTP in Node.js. There may even be a node-all bundle or something similar. Or a user may always choose a granular approach if needed and install individual packages instead of using bundles.
Though a bundle is an immutable, versioned registry entity (as it contains a list of packages with version ranges), a consumer should not depend on it. It should be just a helper that simplifies the installation of a bunch of modules that are supposed to be useful in achieving the consumer’s goals. It may also be involved in uninstalling/managing tasks, but the codebase should have no direct, tight dependency on it.
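To make the bundle constraint concrete, here is a minimal sketch (Python as a modeling language) of checking that a bundle mixes packages from several user namespaces without two of them claiming the same module namespace. The "user/Namespace" identifier format, the user names and the function namespace_conflicts are hypothetical and are not taken from any existing registry.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def namespace_conflicts(package_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map every module namespace claimed by more than one user
    to the sorted list of those users."""
    owners = defaultdict(set)
    for package_id in package_ids:
        user, _, namespace = package_id.partition("/")
        owners[namespace].add(user)
    return {ns: sorted(users) for ns, users in owners.items() if len(users) > 1}


# A bundle is just a named list of packages.
prelude_bundle = ["core/Prelude", "core/Effect", "core/Effect.Console"]
print(namespace_conflicts(prelude_bundle))  # {}

# Adding a second package that exposes Effect makes the bundle invalid.
print(namespace_conflicts(prelude_bundle + ["alice/Effect"]))
# {'Effect': ['alice', 'core']}
```

Note that Effect and Effect.Console are treated as distinct namespaces here, following the example bundle in the text, so only identical namespaces count as a conflict in this sketch.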
A package is bad if not in the package set. Maybe just not good enough yet.
Package sets are great. This approach obviously should be a foundation and de-facto standard of package and dependency management in the PureScript ecosystem. It ensures stability and consistency, ease of dependency management across the ecosystem and in user codebases.
Generally, package authors should be encouraged to add their packages to the set, but there should also be an emphasis on considering when users should strive to get on board and when it is worth holding off for a moment.
The package set should be a place for useful and commonly/widely used packages with well-thought-out design, structure, interface, and documentation, and a clearly communicated purpose and intention. It should not be just a pile of whatever exists (and accidentally compiles). The registry with user namespaces makes it easy and unobtrusive (to other users in the ecosystem) to present one’s creative efforts to the world. Anyone can consume any package of any user. And if a package is really in demand and other packages and users want to depend on it, then it is worth adding it to the set.
Management of the official and standard package sets can be largely automated, but it would still require manual effort, which should be spent reasonably.
Adding a package to the package-set should be a point where users are helped with the analysis and review of the design of their packages. It should be aligned with the core community’s vision and understanding of what should go where. For example, if a user wants to create a module for working with a certain database and share it, he/she should be guided on what package namespace is better to use, etc. Developers should feel responsible for what they are doing and especially for being accepted into the package-set.
Dependency management and conflicts
In a codebase (compilation target) there should be no conflicting module namespaces (packages that expose modules with the same name). This is known as the flat dependencies model and this is the currently accepted (and only possible) way of doing things, and I’m quite sure this should not be changed.
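A minimal sketch of what the flat model requires from tooling (Python as a modeling language): given the modules each installed package exposes, report every module name provided by more than one package. The function duplicate_modules and the "user/Namespace" package identifiers are illustrative assumptions, not an existing compiler or package manager API.

```python
from typing import Dict, List


def duplicate_modules(packages: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return module names that are exposed by more than one package,
    mapped to the sorted list of packages that expose them."""
    providers: Dict[str, List[str]] = {}
    for package, modules in packages.items():
        for module in modules:
            providers.setdefault(module, []).append(package)
    return {m: sorted(p) for m, p in providers.items() if len(p) > 1}


installed = {
    "core/Data.String": ["Data.String", "Data.String.RegEx"],
    "alice/Data.String": ["Data.String", "Data.String.Extra"],
    "core/Effect": ["Effect"],
}
print(duplicate_modules(installed))
# {'Data.String': ['alice/Data.String', 'core/Data.String']}
```

An empty result means the compilation target is conflict-free in the sense described above.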
With the current proposal implemented, name collisions and duplications should be less likely to happen, because the potential surface for a conflict (per package) becomes smaller. Another, and more important, reason is that a package name based on a module namespace conveys the intention more clearly, and conflict with the existing ecosystem should not be an author’s intention.
But a situation where a user needs a conflicting package may still arise: one may need to use a package with a conflicting namespace, or another version of the same package. To deal with this we should either invent some sophisticated way to compile code that refers to the same module name from different packages, or propose a workflow that users can follow to achieve their goal without introducing any complications into the compilation process. The latter option is definitely simpler and more appropriate.
A concrete viable option in this case seems to be the following: if a user really needs a conflicting package, they should fork it, change the package namespace, update the deps if needed, publish it in their own user namespace, and then add it to the project as a new non-conflicting package. The workflow may vary and for basic cases could even be fully automated, but it should not take the responsibility away from the user.
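The "change the package namespace" step of this workflow could be partly automated. Below is a minimal sketch (Python as a modeling language) that rewrites module headers and import lines of a forked source file from one namespace to another. The function rename_namespace and the example namespaces are hypothetical. It only handles module and import lines that start at the beginning of a line, so fully qualified references inside expressions would still need manual attention.

```python
import re


def rename_namespace(source: str, old: str, new: str) -> str:
    """Rewrite 'module Old...' and 'import Old...' lines to use the new namespace.
    Only whole namespace segments are matched, so Old.Task does not touch Old.TaskList."""
    pattern = re.compile(
        r"^(module|import)(\s+)" + re.escape(old) + r"(?=[.\s(]|$)",
        flags=re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + m.group(2) + new, source)


forked = (
    "module Data.Task.Queue where\n"
    "\n"
    "import Data.Task (run)\n"
    "import Data.TaskList as TL\n"
)
print(rename_namespace(forked, "Data.Task", "My.Task"))
# module My.Task.Queue where
#
# import My.Task (run)
# import Data.TaskList as TL
```

The lookahead after the old namespace is what keeps the unrelated Data.TaskList import untouched.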
This option seems to be absolutely viable because a conflicting situation should not be seen and treated as normal. If you need an older version, you should strive to eventually update to the newer version (supported by the package-set). If you need a newer version that is not currently in your package-set, you will eventually update your whole project to the newer version. If you need a conflicting package from another user, you should try to resolve the issue with the package author, who should understand that his or her package conflicts with the stable/another part of the ecosystem. Demanding “out of the box” flexibility in this area from the compiler will potentially lead to much worse problems for a user in the long run, making one’s codebase more fragile.
The compiler package awareness
With everything proposed above, the compiler doesn’t seem to require many or really complicated changes. Currently the input is a list of globs of *.purs files; in addition to that, the compiler would need to:
- know the list of direct dependencies available to a compilation target, in order to expose only the allowed modules and hide transitive deps from it.
- understand where a package is (by locating a package manifest) and be able to expose a single namespace from the package.
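The first requirement above can be sketched as a simple visibility filter (Python as a modeling language). The data shapes and the function visible_modules are assumptions made for illustration and say nothing about how the actual compiler would represent packages; the Data.Char package is a made-up transitive dependency.

```python
from typing import Dict, List, Set


def visible_modules(
    target: str,
    direct_deps: Dict[str, List[str]],
    modules_by_package: Dict[str, List[str]],
) -> Set[str]:
    """Modules the target may import: its own modules plus the modules of its
    direct dependencies. Transitive dependencies are intentionally left out."""
    allowed = [target] + list(direct_deps.get(target, []))
    visible: Set[str] = set()
    for package in allowed:
        visible.update(modules_by_package.get(package, []))
    return visible


direct_deps = {"App": ["Data.String"], "Data.String": ["Data.Char"]}
modules_by_package = {
    "App": ["App.Main"],
    "Data.String": ["Data.String", "Data.String.RegEx"],
    "Data.Char": ["Data.Char"],
}
print(sorted(visible_modules("App", direct_deps, modules_by_package)))
# ['App.Main', 'Data.String', 'Data.String.RegEx']
```

Data.Char is needed to build Data.String, but it stays hidden from App unless App declares it as a direct dependency.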
A further complication would arise if a single package were treated as a compilation unit. But all of this seems to be quite straightforward and potentially brings no breaking changes to the consumer side, given properly updated metadata.
As for building against pre-compiled dependencies, I don’t see a reason why such packages could not be precompiled by an appropriate version of the compiler and used in end-user projects.
Here is a high-level outline of what should be accomplished to bring the proposed changes to life. The details should be worked out and corrected if the proposal is accepted in general.
The registry. This is probably the biggest part. The registry is currently still in the planning and intermediate implementation stage, so in addition to the planned features it would need to support user namespaces, package bundles, and automatic versioning via analysis of package API changes (which I think should be a strong point, because it is very important for ensuring the consistency of the ecosystem).
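The proposal does not specify how automatic versioning would work, so the following is only a sketch under stated assumptions (Python as a modeling language): a package API is modeled as a map from exported names to type signatures, and the bump follows the usual semantic versioning reading (removed or changed export is major, new export is minor, otherwise patch). The functions suggest_bump and bump, and the signatures in the example, are hypothetical.

```python
from typing import Dict


def suggest_bump(old_api: Dict[str, str], new_api: Dict[str, str]) -> str:
    """Compare two API snapshots (export name -> type signature)."""
    removed_or_changed = any(
        name not in new_api or new_api[name] != signature
        for name, signature in old_api.items()
    )
    if removed_or_changed:
        return "major"
    if any(name not in old_api for name in new_api):
        return "minor"
    return "patch"


def bump(version: str, kind: str) -> str:
    """Apply a major/minor/patch bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old = {"test": "Regex -> String -> Boolean"}
new = {"test": "Regex -> String -> Boolean", "search": "Regex -> String -> Maybe Int"}
print(bump("1.4.2", suggest_bump(old, new)))  # 1.5.0
```

A real implementation would need to compare types structurally rather than as strings. The point here is only that the version can be derived from the API diff instead of being chosen by hand.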
Package management tools. This is tightly coupled with the registry implementation and requirements.
The compiler. As discussed above, the compiler should be updated to deal with and expose packages. This doesn’t seem to be a huge change.
Redesign of core/contrib packages. This too is quite a big task that would require significant effort from the core team. It will probably require organizing new repositories (monorepos), transferring the code, etc. This should be planned beforehand and started after the registry implementation reaches a stable and working stage. But it is important to understand that this transition can actually be planned and made smooth and gradual, because there is no reason why the compiler and tools could not support both kinds of packages, the old and the new, at the same time.
New version of Pursuit. This change will require creating a new version of the Pursuit site designed around the new package vision. It can be made more ergonomic and API-oriented; this has already been discussed.
Ecosystem and users transfer. After the registry is ready, the compiler and tools are updated, and the core/basic packages are published, users may start to transfer their packages and projects to the new version of the compiler and package infrastructure. From the users’ perspective there are two kinds of updates:
- Upgrade the metadata for consuming the new package infrastructure. This can be mostly automated, as old packages (which are bunches of modules) could be translated to new packages in a quite straightforward way.
- Update the PureScript source code, which is affected only by the redesign of namespace/module names, something that already happens with each new compiler version.
As for package authors, I don’t think there can be any way to make the transition “automatic”: most of the existing packages have to be restructured into the new form and published in the registry.
This also implies that the compiler and tools should facilitate the transition by supporting both the old and the new ways of installing and compiling dependencies. I believe this is totally possible, as under the hood we are still dealing with separate module files. The point of all this is to reconstruct the notion of the package entity, which is still a pack of related modules but more granular and concrete.
In the paragraphs above we discussed the benefits of the proposed model and some drawbacks of the current approach. But to conclude we need to answer two more questions:
- What would be the cons and trade-offs of the proposed approach if accepted?
- What would be the benefits of staying more with the current approach?
As for the first question, among the points of concern I would note the following:
- It requires a more granular approach to development. There will be more independent packages, and it will not be viable or efficient to use a separate git repository for each. This will require embracing monorepo dev practices for closely related groups of packages. In my experience, monorepos are great for reducing complexity and eventually making the management of related releases easier.
- It will also require a more granular approach to dependency and version management (for some packages). For example, we now have a package prelude which includes multiple modules that are developed, versioned, and published as a single unit, and that may feel easier to manage. With the proposed approach there will be multiple packages Prelude, Control, Data.Eq, etc., and the Prelude package will depend on them, so if one of those packages is updated, this may require updating the Prelude package as well. But there wouldn’t be many packages like this, with a large number of dependencies that previously were in a single package. I don’t see this as a complication, but as a sophistication that can expose hidden coupling between packages. In terms of dev experience, the monorepo approach discussed in the previous point will definitely help to deal with this.
- It requires more breaking changes to come, maybe. Though I don’t see how those changes would be intimidating for the language and tools implementers or disruptive for the end users; at least, there are all the means to prevent that. And I doubt that it can even be called a significant breaking change, because even if the language and core libraries may be considered a mature part, the package management part is far from a satisfactory state, and any enhancement to it should be considered and perceived as beneficial and not breaking at the current stage.
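The update ripple mentioned in the second point (an update to one of the smaller packages possibly requiring an update to Prelude) can be sketched as a walk over reverse dependencies. This is Python used as a modeling language; the function affected_by is hypothetical, and the Data.Ord package and the dependency edges are made up for the example.

```python
from typing import Dict, List, Set


def affected_by(updated: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Packages that directly or transitively depend on the updated package."""
    affected: Set[str] = set()
    frontier = [updated]
    while frontier:
        current = frontier.pop()
        for package, deps in depends_on.items():
            if current in deps and package not in affected:
                affected.add(package)
                frontier.append(package)
    return affected


depends_on = {
    "Prelude": ["Control", "Data.Eq", "Data.Ord"],
    "Data.Ord": ["Data.Eq"],
    "Control": [],
    "Data.Eq": [],
}
print(sorted(affected_by("Data.Eq", depends_on)))  # ['Data.Ord', 'Prelude']
print(sorted(affected_by("Control", depends_on)))  # ['Prelude']
```

In a monorepo this set is what a release script would need to re-check and possibly re-publish after a change to a single package.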
As for the second question:
I don’t really see any strong motivation and reason to stick with and promote the model based on the current approach with contrived package names and arbitrary modules included, and a flat repository of those package names because it brings with it so many problems and complexities already exposed and discussed.
In my view, the main reason why this problem still hasn’t made actual progress is that people intuitively feel that this primordial status quo, with a flat ungrouped list of package names and compound package entities along with the simple and lean Haskell-style module system, just doesn’t have an elegant resolution if the goal is to have a robust, flexible, and stable package ecosystem.
I’ve seen people discussing much more radical and breaking ideas for the module system, which still revolve around packages in the current understanding. So I believe it may be hard for people to imagine that the so “familiar packages” with fancy composed names, where one can put anything “that is needed”, are not here anymore. People will need time to consider and ponder it. But keeping up with the current approach to packages will only bring more problems in the long run. The Haskell ecosystem is fighting with the mess and trying to invent workaround solutions, but it cannot afford to introduce more or less radical changes and diverge from bad design decisions made in the past, or it is much harder for it to think about that. I believe PureScript should strive to make more correct and eventually right decisions in all areas and try to avoid mimicking the bad parts of Haskell’s traditional approaches.
Another reason may be voiced in comments like: “This will require a lot of work and resources we don’t have.” That is, we are trying to save time and energy and keep ourselves on the track already taken. This could be a valid and fair argument, but only in one case: if the path is really chosen, everyone involved in making the decision has agreed, and there is a perception that it is really right and correct, then we do not need to change anything.
But if we have some hesitations and unresolved questions, we should probably consider the alternatives deeply. In this case, we should answer the question “we don’t have resources for what?” We need to answer it with specifics: we should estimate whether the alternative path of change really requires more resources and work and gives less or nothing (maybe it just takes and harms), or whether it just requires some amount of mental and emotional energy to make a switch and get on the right track eventually.
In any case, I believe people, esp. those who are responsible, should think in the first place about the needs of the community and the future of the language and the ecosystem, and only in the second place about their personal preferences and self-regarding aspects.
Nevertheless, I also assume that I may be failing to observe some significant and important drawbacks of the proposed approach compared to what is happening right now and where things are going. So if one generally disagrees with the proposal, I think it is reasonable to answer those two questions when formulating and presenting the judgment. It would be really nice to see and hear more concrete technical and design considerations about possible invalidity, irrelevance, or difficulties with the implementation of the proposed changes, and not just general words.
Related links to consider
Here are some links to posts I found resonant and inspiring. I actually found many more resources useful in composing this proposal, but these are just a few:
- This is an insightful and beautiful comment by joneshf about a fundamental issue of the relationship between packages and modules that pushed me to think further on the problem.
- There is a comment by paf31, which proposes that each package like “foldable-traversable” could have its own conventional namespace like “FoldableTraversable”. So why not just get rid of the former name if possible? It also talks about restrictive conventions for module namespaces, which are not discussed in this proposal because it is impossible to implement such an idea with the current design of namespaces.
- hdgarrood’s Discourse comment on user namespaces/scopes, and also hdgarrood’s blog posts on package management in general.
- The post of a haskeller that talks about the flaws of having “Internal” modules conventions and that each module that presents its own functionality should be shared as a separate package.