Proposal on PureScript package management

wclr · April 5, 2021, 6:31am

Right, Chart is just a typo (too much work with different charts takes its toll). @hdgarrood proposed that Data.String could become just strings (a package name) when importing (import strings as String), and that the module that is currently Data.Char would then be imported as strings.Char. I said that breaking import semantics like this is barely acceptable and should definitely be avoided.

If we follow the proposal I presented here, no changes to the module system or module names are assumed (at least none follow from the idea of the proposal itself). But if we want the module Data.Char in the same package as Data.String, it should live as Data.String.Char inside the Data.String package. Otherwise, we may have two independent packages, Data.String and Data.Char, which present different module namespaces (functional domains).

The whole idea is about lifting the module namespace (functional domain) to the level of the package.

wclr · April 5, 2021, 10:32am

Thanks for joining! At least for appearing in the discussion -)

Fortitude appears when you struggle to make the right things happen.

Of course, when you are dealing with a mess, such things become indispensable. But that is because Haskell in many ways failed to create a stable and ordered ecosystem of packages (and most languages fail at this). One of the root causes, this is my claim, is that a package itself is a potential mess because it is allowed to contain anything (any module namespaces). And the value of module names, which should represent a strict functional domain, is not conveyed throughout the ecosystem to the users, but is hidden from view. I even believe that the importance of module namespaces as functional domains was not so clear to the authors of the language and its module system: they did the right thing but didn’t understand what they did (which is normal). Otherwise, they would not have allowed such a complected and messy package entity to exist, one that conceals functional domains from the user under some contrived label, which is deceptively perceived as something that makes the system simpler and helps it communicate better.

Another reason, of course, is the lack of consciously directed effort to convey the right attitude to this issue (there are some guidelines about naming, but not enough effort to promote them, and no way to enforce the right things). So what I propose is to try to fix this mistake, first of all technically: once those names are exposed and visible, there will be much less motivation to ignore the naming conventions and introduce conflicts in the ecosystem. This is one of the main reasons for, and purposes of, the proposal.

garyb · April 5, 2021, 11:00am

Okay, I understand what you’re getting at with that now. I think it’s definitely a reasonable suggestion, but it’s not quite as easy as it sounds.

For instance, I’m not sure the Char example is necessarily the best case for this, because there’s a philosophical argument that could be made about it being perfectly valid for Char to be a sub-module of String.

Perhaps a better example would be Data.Map and Data.Set residing in the same package currently. But then there’s an argument that they belong together there too, since they’re both “ordered collections” - but is that a way people actually think about them? I doubt it, it’s basically an implementation detail.

Saying packages need to correspond to a single “functional domain” doesn’t really make it any clearer than the way things are now, since that terminology probably won’t mean anything to some people, and it likely is going to be interpreted differently by others too (like I hinted at with whether Char is part of the domain of Strings).

The Map and Set example also raises the question of how to deal with things when there are circular relationships between modules that currently necessitate a Types / Internal module, but that should be in separate packages. I very much dislike it any time we have to introduce one of those modules, but it’s the least-bad solution to the problem when it arises. Do you have any suggestions for how that kind of thing should be resolved? Duplicating implementations might be the most correct solution from an aesthetic/principle point of view, but I think it’s unacceptable for other reasons.

wclr · April 5, 2021, 12:32pm

A package represents a functional domain, or part of what we can refer to as a functional domain (for example, Data itself is a functional domain of a higher level). If it is decided that it is better for users to have Char attributed to the Data.Char namespace rather than Data.String.Char, then we just release a separate Data.Char package (which Data.String may depend on). As I said, this proposal is not about restructuring the functional domains or renaming and changing module namespaces, but about re-evaluating what a package is and what it should include, based on another, cleaner and more concise model (which may be referred to as functional domains). Things will become cleaner because we are starting to talk about them more explicitly.

I actually see no problem with this. One of the main simplifications that should follow from the proposed changes is a significant decomposition of existing bloated packages. Once we decompose them in a stricter, more mathematical manner (based on functional domains), we can compose them with more ease. And this decomposition should definitely expose hidden dependencies, etc. It may seem at first that this “splitting apart” will introduce new difficulties as more, smaller packages appear, but those packages are simpler pieces that can be combined more easily as needed.

As for the case with Data.Map and Data.Set there would be two options:

The first: if it is decided that those two functional domains are actually subdomains of some more general domain, say Data.Ordered, and should come together, then we should make a new package Data.Ordered, include both in it, and access them as Data.Ordered.Map and Data.Ordered.Set. I assume, though, that this is not a valid option for this case.
The second option is to have two separate packages, Data.Map and Data.Set. They have common code, and repeating oneself is definitely not an option, so we should introduce a new package that exposes the commonly needed functionality, which those two will depend on. This one could be named Data.Ordered, or Data.Collection as it seems more correct, and it could be a basis for some other packages as well. There is a good article on this matter that I found useful when preparing the proposal.
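
To make the second option concrete, here is a rough sketch in Python (an illustration only, not part of the proposal). It assumes the shared package is called Data.Collection, one of the names floated above, and that every package is named after the namespace it exposes; the install_order helper is made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical package graph for the second option: each key is a package
# (named after the module namespace it exposes), each value is the set of
# packages it depends on. "Data.Collection" is an assumed name.
DEPENDENCIES = {
    "Data.Collection": set(),
    "Data.Map": {"Data.Collection"},
    "Data.Set": {"Data.Collection"},
}


def install_order(dependencies):
    """Return packages in an order where dependencies come first.

    Raises graphlib.CycleError if packages depend on each other
    circularly, which is the situation the shared package is meant
    to avoid.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = install_order(DEPENDENCIES)
```

With the shared code pulled out, Data.Map and Data.Set no longer need to know about each other, so the graph has no cycle and the shared package always comes first in the order.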

As for referring to packages in combination, I proposed the idea of a package bundle. This is a very simple entity that is just a list of the packages that come with it. Packages are fully decoupled from it, and user codebases should not depend on it. It is just a means to distribute packages in a chunk. For example, we could have a bundle named ordered or collections (the name ordered-collections seems just a little bit complicated to me) and include in it the needed packages Data.Map, Data.Set, and maybe also something like Data.List if appropriate. Bundles are here to replace the bundling functionality that is baked into packages as currently understood. There is a section in the proposal on package bundles.
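
A bundle in this sense could be as small as the following Python sketch. The bundle name, its contents, and the expand helper are assumptions for illustration; the proposal does not prescribe any particular format.

```python
# A bundle, as described above, is only a named list of packages.
# The bundle name and its contents here are illustrative assumptions.
BUNDLES = {
    "collections": ["Data.Map", "Data.Set", "Data.List"],
}


def expand(requested, bundles):
    """Replace bundle names with the packages they list, keeping order
    and dropping duplicates. Anything that is not a bundle name is
    treated as a package name and passed through unchanged."""
    resolved = []
    for name in requested:
        for package in bundles.get(name, [name]):
            if package not in resolved:
                resolved.append(package)
    return resolved


# What a project ends up depending on: packages only, never the bundle.
project_packages = expand(["collections", "Data.String", "Data.Map"], BUNDLES)
```

After expansion the bundle name is gone from the result, which is the point: the codebase records a dependency on Data.Map, Data.Set and so on, and the bundle was only a convenient way to ask for them together.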

hdgarrood · April 5, 2021, 3:19pm

Can you please define “functional domain”? It’s still very unclear to me what you mean by that, and how the packages that currently exist in the ecosystem fail to describe “functional domains.” Please be specific and give examples.

wclr · April 5, 2021, 6:00pm

Functional domain is another name for module namespace, but one that carries a deeper notion and communicates the application boundaries of the functions it contains. This is just a term I came up with during this discussion; I think it better describes the meaning and importance of bringing module namespaces to the top.

So I would define it as a specially designed module namespace that helps achieve two tasks:

communicating application boundaries, i.e., giving a notion of the source and purpose of the namespace and its usage.
avoiding potential conflicts by introducing a hierarchical structure, which forces authors to think about why the package with its modules should be placed at a certain point in the hierarchy.

With current package names, to communicate a package’s intention you end up doing one of these things:

repeat the contained (main) module name; in this case we often get a kind of hyphenated lowercase slug of the module namespace.
summarize and categorize in some other way not directly related to the contained module namespaces. For example, “ordered-collections”, which contains the modules Data.Map and Data.Set. It is not a functional domain (as described above); it is just a description of some common category that the things we decided to place in the package belong to. And in the general case, we cannot practically fit all the noteworthy categories in one name. In this sense, the name ordered-collections is also an example of a somewhat complicated label.
communicate poorly or miscommunicate, so that the name partly repeats the module namespace and partly does something else, for various reasons, including that the name the author wanted to use was already taken.

So I believe much has already been said about what is wrong with package names in their current form, and with packages as compound bundles that obscure the modules they contain and make module name conflicts much more likely to emerge.

And this is a good moment to emphasize another point: users use modules by their names; they get to know module namespaces eventually and remember them well because they constantly see them in their code. They don’t use package names as often, they don’t see or type them often, and they forget them because they are not needed during everyday work, so it only brings additional headache and pain when they have to return to those names.

toastal · April 7, 2021, 1:54pm

In general I do like the discussion here. One big question with namespaces is where does one register or how are the namespaces assigned? Maybe this is something WebFinger can solve?

The last thing I would like to see is a situation like Elm’s where you must have a GitHub ID and a public repo on that platform to publish packages … or should we even be required to use git, really? When GitHub has gone down – and it does – the whole Elm ecosystem falls over. But also, if a user has a preference or objection to the chosen service or source management tool, they’re forced into this lock-in in order to participate in the community, and I’m not a fan of this.

f-f · April 7, 2021, 2:09pm

I feel that introducing namespaces is one of these things that sound awesome at first, but when looking closely one finds out that it might create more issues than it’s meant to solve.
We had to think about this as part of the new Registry design and I summarized here why I don’t think we should have them, touching also on your question of “how do we assign namespaces” (TLDR: I think it’s not possible to do it in a fair way)

Adrielus · April 7, 2021, 2:36pm

Imo the lack of orphan instances makes forking packages a lot more appealing. From my experience, “I’d rather fork purescript-debugged than go through my entire project and add newtype wrapping/unwrapping”

toastal · April 7, 2021, 2:43pm

Yeah, I think it’s a hard problem to do identities. The best I could come up with is either having an independent, governed service specific to PureScript, or something completely decentralized, which may accidentally raise the barrier to entry unnecessarily high. I still think there’s a lot of merit in what people mention though, but a hasty solution could be bad. If I wanted to publish a GPL PureScript project on GNU Savannah with Mercurial, that should be a valid option.

hdgarrood · April 8, 2021, 8:08pm

Ok thanks, that makes a bit more sense.

I think we ought to consider the practicalities here too. As @garyb said (heavily paraphrasing), the reason ordered-collections is like that is because the alternative is publishing 3 separate packages (maps, sets, ordered-collections-internals), and this is more awkward than publishing 1 package. It requires more work on the maintainer’s part (more dependencies and bounds to specify, more changelogs to maintain, more releases to cut), on the package consumer’s part (it’s harder to audit the dependency tree), and on the package set curator’s part as well (more packages to curate). If a consumer tries to install the package with a solver, the solver has more work to do and more opportunities to get it wrong, leading to frustrating dependency hell problems. Small packages are nice in theory but in practice they can be a pain. It’s largely because of these issues that we are starting to move in the opposite direction in core and put more things into packages together. For example, a couple of releases ago we merged monoid into prelude, and in 0.14.0 we merged generics-rep into prelude as well. I fully believe this was the correct decision and I don’t see us going back on it any time soon.

I also disagree that package names should be hidden away. I think they are very important, even if it’s relatively infrequent that users will type them out. I think it should be immediately obvious which package a module belongs to in source code so that it’s easy for users to eg find the package on Pursuit, or find other resources for the package elsewhere on the internet.

mhmdanas · April 9, 2021, 4:17pm

I’m just throwing out ideas here, but maybe we could make VSCode’s PS plugin show the package a module comes from before it somehow? Something like this:

import {- "prelude" -} Prelude

The comment would not be actually present in the file, but shown visually by the editor (what would this feature be called?).

I have no idea how feasible this is, and I realize that this has the potential con of causing more dependence on the IDE/editor, but I thought I’d at least say it.
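
For what it’s worth, the lookup itself would be small. Below is a Python sketch of computing the text to display for an import line. The module-to-package table is a hypothetical stand-in for whatever the build tool knows, the package_hint name is made up, and none of this is an existing plugin API.

```python
import re

# Hypothetical lookup from module name to the package that provides it.
# A real plugin would get this from the build tool; these entries are
# examples only.
MODULE_TO_PACKAGE = {
    "Prelude": "prelude",
    "Data.Map": "ordered-collections",
    "Data.Set": "ordered-collections",
}

# Matches the module name at the start of an import line.
IMPORT_LINE = re.compile(r"^import\s+([A-Z][\w.]*)")


def package_hint(source_line, module_to_package):
    """Return the text an editor could display next to an import line,
    or None when the line is not an import or the module is unknown."""
    match = IMPORT_LINE.match(source_line)
    if match is None:
        return None
    package = module_to_package.get(match.group(1))
    if package is None:
        return None
    return '{- "' + package + '" -}'
```

The source file is never modified here; the function only produces the string an editor could render beside the import.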

wclr · April 9, 2021, 4:38pm

I’m talking about what is in many ways a whole different approach to packages. There would be no packages with strange names. There would be Data.Set, Data.Map, Data.Collections (for example), and none of them is “internal” or anything like that. Packages are decomposed in quite a strict manner to make things simpler and less complected.

Smaller packages have fewer dependencies and need to be updated less frequently. And we are talking about a minimal, sufficient and necessary package composition. There wouldn’t even be a question of whether we should merge those packages or not. If they belong to the same functional domain namespace, then yes, they may be (or should be) merged depending on simple composition considerations; if the namespaces are different, we have different packages. Many such subjective questions are simply removed.

Simplicity is hard, it really is, and this discussion is another good illustration of that statement. Why not merge the argonaut packages into a single one? You merged generics-rep because it didn’t make sense (from the perspective of the end user) to have it as a separate bundled thing marked with just another made-up label. You turned prelude into a kind of big bundle. I’ll repeat what I said above: there would be no need to make such decisions with the proposed approach. If modules have their own namespace, they are another package; it is that simple. But the problems with namespaces become more explicit, and there is no way to sweep them under the carpet anymore.

You are trying to solve different package composition issues within the existing paradigm, and the solutions are obviously neither refined nor universal. Maybe there is room for a more radical approach that solves the problems on a different level?

Under the current proposal, package names as currently understood are not just to be hidden; they are to be removed completely as an unnecessary label, which may communicate poorly and make little sense, as I explained above. And packages in their current form are dismissed as an excessive level of bundling.

It is immediately obvious that the module Data.String belongs to the Data.String package, as does the module Data.String.RegExp. There may be variants, because there may be a separate package Data.String.RegExp that exposes this namespace, but it’s still very communicative. That is what I’m talking about. To achieve the same with the current approach, you need to apply brute force, break the module system, and make it clumsy, as I also explained above. With the current approach to packages and modules there is no other way, though I would prefer to leave things as they are rather than break the module system.
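
As a rough sketch of that lookup (in Python, only to illustrate the idea): if package names are module namespaces, the owning package could be found by taking the longest installed package name that is a dot-separated prefix of the module name. The function name, the longest-prefix rule, and the sets of installed packages are all assumptions for the example.

```python
def owning_package(module_name, packages):
    """Find the package that exposes a module when package names are
    module namespaces: the longest package name that equals the module
    name or is a dot-separated prefix of it. Returns None if no package
    matches."""
    parts = module_name.split(".")
    # Try the most specific namespace first, e.g. Data.String.RegExp
    # before Data.String.
    for length in range(len(parts), 0, -1):
        candidate = ".".join(parts[:length])
        if candidate in packages:
            return candidate
    return None


# Hypothetical sets of installed packages, named by namespace.
without_regexp_package = {"Data.String", "Data.Map"}
with_regexp_package = {"Data.String", "Data.String.RegExp", "Data.Map"}
```

With the first set, Data.String.RegExp resolves to the Data.String package; with the second, it resolves to the separate Data.String.RegExp package. These are the two variants described above, and in both the answer can be read off the module name.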

But I propose an alternative here that is different, yet more elegant and simple in the end. It is unsurprising that these ideas may seem unclear and suspicious without deep consideration. But your questions and doubts really do help to refine and clear things up for me as well.

wclr · April 9, 2021, 6:46pm

This could definitely be accomplished. And not even with comments, but in a more sophisticated form. VSCode has the capability to display additional info above a given line of code. It could even contain a link to the documentation or something like that. Other editors probably have the means for such extensions. Anyway, simpler forms of interaction (tooltips, context action menus) for working with module imports could be made accessible by the tooling.

Though what is important to understand here is that the existing module system is abstracted from the packaging system (as well as from the file system), and this is a good thing and should stay the case. Why? Because abstraction makes changes underneath possible. But if a package name is just a module namespace (as proposed), we already communicate the package source quite clearly in the code, while still staying abstracted and decoupled from the way packaging works.

wclr · April 11, 2021, 2:53pm

Posted a comment on GitHub related to this discussion (on user namespaces), just to have a crosslink.