Unbiasing the Semigroup instance for Map

The upcoming compiler release is the occasion to batch breaking changes to the core libraries, and as part of it we’d like to unbias the Semigroup instance for Map k a.

This is a breaking change, though: singleton k [a] <> singleton k [b] currently evaluates to singleton k [a] (the instance is left-biased; it prefers values from the left when a key appears in both maps), but it would evaluate to singleton k [a, b] if we unbiased the instance (and required a Semigroup instance for the values).
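The difference is easiest to demonstrate in Haskell, whose Data.Map has the same left-biased `<>` that PureScript's current instance mirrors; `unionWith (<>)` plays the role of the proposed unbiased instance here (a sketch, not the actual PureScript code):

```haskell
import qualified Data.Map as Map

-- Current behaviour: <> on Map is the left-biased union,
-- so the right-hand value for a duplicate key is dropped.
biased :: Map.Map String [String]
biased = Map.singleton "k" ["a"] <> Map.singleton "k" ["b"]

-- Proposed behaviour: duplicate keys combine their values
-- with the values' own Semigroup.
unbiased :: Map.Map String [String]
unbiased = Map.unionWith (<>) (Map.singleton "k" ["a"]) (Map.singleton "k" ["b"])

main :: IO ()
main = do
  print biased    -- fromList [("k",["a"])]
  print unbiased  -- fromList [("k",["a","b"])]
```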

There are three reasons motivating this change:

  • The Semigroup instance for Semigroup a => Object a is already unbiased. We could bias that instance instead, but I believe an unbiased instance to be more useful, especially with the appropriate combinators (see the next points). As @garyb noted in https://github.com/purescript/purescript-ordered-collections/issues/36#issuecomment-733753621, an unbiased instance is also consistent with the instance for Semigroup a => Maybe a.

  • We can recover the behaviour of a left-biased instance with First or even a right-biased instance with Last. The unbiased instance is more general and we can get wildly different behaviours by varying the underlying semigroup.

  • Converting some Map k a to Map k (First a) by mapping over all the values would be inefficient, but we can safely coerce between those types now that we have Coercible. We can also greatly improve the ergonomics with a newtype and the combinators from Data.Newtype (see the issue).
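The last two points can be sketched together in Haskell terms (the names `merge'`, `leftBiased`, and `rightBiased` are mine): `unionWith (<>)` stands in for the proposed unbiased `<>`, and because Map's value parameter is representational, wrapping values in First or Last via coerce recovers either bias without traversing the map.

```haskell
import qualified Data.Map as Map
import Data.Coerce (coerce)
import Data.Semigroup (First (..), Last (..))

-- Stand-in for the proposed unbiased <> on Map.
merge' :: (Ord k, Semigroup a) => Map.Map k a -> Map.Map k a -> Map.Map k a
merge' = Map.unionWith (<>)

l, r :: Map.Map Int String
l = Map.singleton 1 "a"
r = Map.singleton 1 "b"

-- First keeps its left argument, recovering today's left bias...
leftBiased :: Map.Map Int String
leftBiased = coerce (merge' (coerce l :: Map.Map Int (First String)) (coerce r))

-- ...and Last keeps its right argument, giving a right-biased union.
rightBiased :: Map.Map Int String
rightBiased = coerce (merge' (coerce l :: Map.Map Int (Last String)) (coerce r))

main :: IO ()
main = do
  print leftBiased   -- fromList [(1,"a")]
  print rightBiased  -- fromList [(1,"b")]
```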

There are two ways to proceed if we decide to do this:

  • We can update the instance with the next compiler release. This means that some programs will silently change meaning, but at least this will be over in a single major release.

  • We can embark on a multi-year journey: deprecate the existing instance in 0.14, remove it in 0.15 and then add an unbiased instance in 0.16. This is safer but will take much longer.

That’s why we’re asking the community! Please vote to let us know what you would prefer.

  • In favor of updating the instance in 0.14
  • In favor of deprecating the instance first
  • Neutral
  • Against


1 Like

I voted in favour of deprecating the instance first, but I’d actually be fine with just dropping it and then reintroducing it in 0.15. That’s no more breaking than updating the instance in 0.14, but at least in that case it won’t be invisibly breaking at compile time.

4 Likes

I don’t see why adding the unbiased instance shouldn’t be in 0.15. Is there any reason for that?

I voted for deprecating, but I’m in a similar boat as @garyb. We use the existing instance pretty extensively at work, so I’m hesitant to silently change it, but I don’t think it’s necessary to stretch a process out over so many compiler releases.

One option, absent tooling, would be to have an ad-hoc, opt-in “upgrade” release for the library which includes a warning that notifies people at each usage. For spago users, this could be added as an override over the 0.14 release, and for bower users it can be added as a resolution. Once the upgrade is complete, the override can be reverted. I mainly just care that there’s a straightforward option that helps people out rather than staying silent.

1 Like

This risks breaking existing code for little reason, in my opinion (and splitting it over two releases doesn’t help with the case where someone jumps two major versions at once and is then surprised by the new instance).

Each of the reasons given has an equally valid counterpoint. Object’s instance being unbiased has no impact on the validity of bias in this instance (consistency between instances is less important than surprise changes in behavior, IMO). We can use First to recover the current behavior, but we can also use a newtype to get the desired behavior; this is easy, and people are already doing it. Finally, the Coercible trick applies just as well to this newtype, so it is also potentially zero cost.

I’m also uneasy about the breakage, but I’m not completely convinced by your first counterpoint. I agree that consistency between instances is less important than surprise changes in behaviour, but for me, the most compelling reason to do this is not consistency with Foreign.Object, it’s just that it’s a better/more useful instance.

1 Like

I also want to add a disclaimer: PureScript is not a democracy, so I don’t think we should commit to doing whatever gets the most votes. The results of a poll are a useful signal, but I think it’s also important to note that the people who are answering this poll are probably not a very representative sample of the PureScript community as a whole. For example, I think it’s possible (perhaps even likely) that there are a number of people who would be strongly against this change were they aware of it, but who are not aware of it because they do not follow this forum.

5 Likes

Some prior discussion on this subject from Haskell Cafe which might be worth revisiting.

I’d also urge you to consider the impact that a package-sets-based workflow has on people upgrading across compiler versions. It’s not realistic to expect people to upgrade and then check the release notes of every package that they use which was upgraded in the set. It’s also not realistic to put together a single set of release notes for every package in the set.

So for something as core as Map, where the expectation is that these things should break very rarely if ever, I hope you’d make the warning very very loud in the compiler and package set release notes, if you decide to break this. For me, it’s just not worth it at this stage.

I have always found that line of argument (that Alternative and Monoid should agree) very unconvincing. In the case of Maybe specifically, the argument that Maybe represents failure and prioritized choice and therefore the Monoid and Alternative instances should agree feels circular to me. Why do people think Maybe represents failure and prioritized choice in particular, as opposed to some other kind of failure? Because of the existing Alternative instance? In my mind it’s equally reasonable/common to express “try all these things and combine the results of the successes” with Maybe as it is to express “try all these things and take the first success.” Especially since the second is a special case of the first if you understand “combine” as a semigroup operation! It’s actually quite frustrating to me that one of the most crucial parts of that argument - the part that Maybe represents failure and prioritized choice specifically - is just asserted without explanation.

I 100% agree on the point that we can’t expect people to go out of their way to read all of the release notes for all of the libraries they’re upgrading, and that therefore this would have to be warned about very loudly.

2 Likes

From the perspective of composing monoids/semigroups, I really believe the (Ord k, Semigroup a) => Monoid (Map k a) instance is preferable. Admittedly I probably care about composing monoids more than most PS users, but I think it’s really nice to have the right monoid instance available by default, and almost always I find that I need the one that makes use of the inner semigroup instance too.

I actually agree with this post the most from the linked Haskell Cafe discussion:

No, my argument is that Monoid and Alternative ought to have nonsame semantics, since one is a class of types of kind (*), and the other, (* -> *). Thus, Monoid operations ought to mean the whole type, and Alternative operations, just the outer type.

It shouldn’t be a surprise – it’s impossible to put a constraint on the inner type for an Alternative instance, since there is none (^_~)

from [Haskell-cafe] Alternative versus Monoid (emphasis added)

I just realized that we could, in fact, add an instance Ord k => Alternative (Map k) which has the biased behavior. Right? That fits in with the above philosophy I quoted, and it also fits in with the pervasive convention that Alternative is left-biased. And if you want it as a monoid instance to fit in with other APIs, use First or Last – and then the convention of which way it is biased is encoded in the types, so it’s harder to forget.


I actually think the default combination operation should be unionWithThese :: (These a b -> c) -> Map k a -> Map k b -> Map k c, along with append, since they are both unbiased and the most useful semantically. We already have union for biased union (and could add <|> too).
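For the curious, the suggested combinator is a few lines on top of containers’ merge API; here is a Haskell sketch with a local stand-in for the These type (the example map contents are mine):

```haskell
import qualified Data.Map as Map
import Data.Map.Merge.Lazy (mapMissing, merge, zipWithMatched)

-- Local stand-in for the These type from the `these` package.
data These a b = This a | That b | These a b

-- Sketch of the suggested combinator on top of containers' merge API:
-- each key is classified by which side(s) it appears on.
unionWithThese :: Ord k => (These a b -> c) -> Map.Map k a -> Map.Map k b -> Map.Map k c
unionWithThese f =
  merge
    (mapMissing (\_ a -> f (This a)))          -- key only in the left map
    (mapMissing (\_ b -> f (That b)))          -- key only in the right map
    (zipWithMatched (\_ a b -> f (These a b))) -- key in both maps

-- Example: add counts where both sides have the key, keep the rest.
example :: Map.Map String Int
example =
  unionWithThese f (Map.fromList [("a", 1), ("b", 2)])
                   (Map.fromList [("b", 30), ("c", 40)])
  where
    f (This a)    = a
    f (That b)    = b
    f (These a b) = a + b

main :: IO ()
main = print example  -- fromList [("a",1),("b",32),("c",40)]
```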

This is actually the thing that other languages struggle with so much: when you want to combine things stored in dictionaries, which are ostensibly monoidal operations, it’s always so gosh darn difficult, because there’s no predefined operation for it.

If you want to keep track of a bunch of counters in a dictionary, you can’t just increment them with append (singleton myKey (Additive 1)), you have to first check if the key exists, initialize it to 1 if it doesn’t, or retrieve the current value, increment, and replace it if it already exists. And all of this logic just obscures how simple the operation is conceptually.

function incrementCounter(counters, name) {
  if (name in counters) {
    counters[name] = counters[name] + 1;
  } else {
    counters[name] = 1; // Why 1?
      // oh because it should really start at 0 but we have to immediately add 1
  }
}
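For contrast, with an unbiased, value-combining merge the whole function collapses into one line; here is a Haskell sketch using `unionWith (<>)` in place of the proposed `<>` (the `incrementCounter` name just mirrors the JS above):

```haskell
import qualified Data.Map as Map
import Data.Monoid (Sum (..))

-- The whole if/else above collapses into one merge: a missing key
-- behaves like the Sum monoid's identity (0), so no special case.
incrementCounter :: Ord k => Map.Map k (Sum Int) -> k -> Map.Map k (Sum Int)
incrementCounter counters name =
  Map.unionWith (<>) counters (Map.singleton name (Sum 1))

main :: IO ()
main = print (incrementCounter (incrementCounter Map.empty "hits") "hits")
-- fromList [("hits",Sum {getSum = 2})]
```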

These languages all make it easy to perform biased unions, because those are the easiest to implement, but a biased union is rarely the behavior I want. (I always have to double-check whether it’s left- or right-biased.)

The other advantage of unionWithThese, although it might be obvious already, is that it enforces at the type level that you are combining at most one item from each side, whereas unionWith :: (v -> v -> v) -> Map k v -> Map k v -> Map k v leaves open the possibility of multiple combinations or something weird like that. It generally doesn’t matter, but sometimes the difference is important.


Sorry if this is a little ranty, I just wanted to get my ideas out there!

3 Likes

I propose another solution to this problem:

v0.14.0:

  • We reimplement ordered-collections’ Map as a completely new module Data.Map.Unbiased that has the updated instances. Those who want to opt in to the new instances can use the new module.
  • The original Data.Map module’s instances will have Warn constraints put on them that complain loudly about their upcoming change in v0.15.0.
  • We define a function that is effectively coerce between the two data types (i.e. Data.Map k v -> Data.Map.Unbiased k v and vice versa), so that the ecosystem doesn’t have to create two implementations of everything.

v0.15.0:

  • The original Map instances are updated to use the “opt-in” version’s implementation. However, their Warn constraints are not yet removed; they now say, “These instances are now unbiased and are the same as the instances in Data.Map.Unbiased.”

v0.16.0:

  • The Warn constraints on the original Map are removed. This Map is now effectively the same as the Data.Map.Unbiased Map from v0.14.0.
  • The Data.Map.Unbiased module is deprecated; users who opted into it in v0.14.0 are told to migrate back to the original module.

v0.17.0:

  • The Data.Map.Unbiased module is removed.

This approach has the following benefits at the cost of how long this migration approach takes:

  • v0.14.0 users can immediately use the unbiased versions; those who currently depend on the biased versions are loudly notified of the issue.
  • Those who “jump” two versions are still notified of the issue.

I don’t think there is any way of preventing users from running into this if they skip over major versions. @JordanMartinez with your proposal, people jumping from 0.13.x (or earlier) to 0.16.x (or later) run into the same issue; any code using Map’s Semigroup instance where the values have a Semigroup instance will silently change behaviour. It’s true that skipping over two major version series in a row will be rarer, though.

Rereading through this issue, we seem to have the following parties represented:

  1. Those who have 10,000s LOC backing real production-level apps that depend on the biased instance. Upgrading a codebase to v0.14.0 is already going to be somewhat tedious and this change is going to add to their workload and steal developer time away from other things that add value to the business.
  2. Those who do not have production-level apps and who pay no cost if the change is made
  3. Those who fall somewhere in-between 1 and 2
  4. The unknown group of people who will upgrade to v0.14.0 or some future breaking PS release sometime after this point

Reading through this again, it seems like we have a few options:

  1. do not change the instance and provide a newtype to get the unbiased version
  2. provide some multi-major-release solution that migrates from the current instance to the changed one
  3. change the instance to an unbiased one now

#1 seems to satisfy most people’s concerns.

  • Those who want the unbiased version can coerce the Map to a newtype that specifically provides it. Assuming they do not have a lot of code, that’s not hard to do.
  • Those who want to gain the benefits of other v0.14.0 changes do not have to be concerned about this otherwise “silent” change. Thus, they are not penalized by having a real-world app.
  • Anyone who upgrades to v0.14.0, which has a LOT of other breaking changes, won’t need to remember to read through the release notes of Map. They won’t experience a “silent” change, nor will they need to fix tooling via psa to remove UserDefinedWarnings if we did some sort of Warn-based constraint to notify people of it.

Normally, our preference is to push breaking changes in core libraries when we make breaking PS releases. However, due to the “silent” change issue, perhaps this is a situation where we should just make a new library release that’s a few months after the v0.14.0. Then, the change would be less “silent” because there is less going on and people are more likely to read the release notes. We could also address the package set issue that Phil raised above by creating tooling that shows all the breaking changes across packages in a set from set A to set B.

Lastly, deciding not to make this change now would also give the community (or at least those who have the most to lose by this change occurring) time to develop a tool that could immediately highlight all usages of the original biased instance, notifying them that a decision needs to be made (e.g. coerce the value to Map k (First a)).

Put differently, perhaps this change should not be done until we have better tooling that prevents “silent” changes from occurring.

1 Like

I suppose, now that we have coerce, we can make an unbiased Map module with the exact same API, and each implementation just coerces the corresponding one from the other module. That sounds ideal, to reduce code duplication. (We just have to remember to keep them in sync later.)
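Something like this Haskell sketch of the idea (the `Unbiased` name and helpers are hypothetical): the newtype shares Map’s runtime representation, carries the other Semigroup, and the conversions are free via coerce.

```haskell
import qualified Data.Map as Map
import Data.Coerce (coerce)

-- Hypothetical opt-in wrapper; it shares Map's runtime representation.
newtype Unbiased k v = Unbiased (Map.Map k v)
  deriving (Eq, Show)

-- Only the instances differ: here <> combines duplicate keys' values.
instance (Ord k, Semigroup v) => Semigroup (Unbiased k v) where
  Unbiased a <> Unbiased b = Unbiased (Map.unionWith (<>) a b)

-- Zero-cost conversions in both directions: no traversal, no copying.
toUnbiased :: Map.Map k v -> Unbiased k v
toUnbiased = coerce

fromUnbiased :: Unbiased k v -> Map.Map k v
fromUnbiased = coerce

main :: IO ()
main =
  print (fromUnbiased (toUnbiased (Map.singleton 'k' "a")
                    <> toUnbiased (Map.singleton 'k' "b")))
-- fromList [('k',"ab")]
```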

Just thought to clarify why I voted against.

I really appreciate how the PureScript community usually deals with mistakes and new insights: breaking code changes are not avoided when they are necessary. Some people don’t like this, but to me it is a necessity; without it, the debt would keep increasing and it just wouldn’t be possible for both the PureScript language and the library ecosystem to be in such an excellent state.

Also in this case, it feels to me that just changing the code (in any of the proposed ways) would leave the PS standard libs in a better end state, i.e. the unbiased version should be the one to take the Data.Map namespace.

However, this case is interesting in that the cost is so big compared to the benefit that I think it would be better to leave the existing Map unchanged for now, introduce an unbiased Map next to it, and then explore how these kinds of changes can be made with less cost in the future.

2 Likes

Whenever the issue of backward compatibility comes up, I always feel the need for a tool to perform mechanical source code updates automatically. In this case, automatically moving all existing code to use a First wrapper would provide the best of both worlds.

Some prior art - Go provides a go fix tool for this.

Is there interest in building something like this? We at Juspay would be happy to sponsor such an effort.

7 Likes

I would really love syntactic refactoring tools regardless, so I think this is a good direction, but I think this sort of migration is more complicated since it involves working with instantiations of Monoid (Map k a) which would require elaboration.

1 Like

Yes! This has been on my wishlist for a while. Created another topic for this here: Automated upgrades for breaking changes

2 Likes

I’m also on board for making an unbiased Map module with the exact same API. It seems ideal to have this module live alongside Data.Map in the same library, since it’s clearly something that plenty of people would find useful, and it would probably be easier to keep the interfaces in sync this way. That also allows us to make some progress on this without needing to break anything.

The more I think about it the more I like @JordanMartinez’s idea of having one release series where Map does have the unbiased instance, but it also still has a warning saying “this instance has changed”. I think skipping over a major version is probably already rare; skipping over two at a time is probably so rare that I think it becomes difficult to argue that people in this category should hold up the task of being able to provide a better instance.

Automatically rewriting code Map k a to Map k (First a) would be nice, but I think that’s likely to be extremely difficult to do well, to the extent that I don’t see it happening. I’d love to be proved wrong, of course.

perhaps this is a situation where we should just make a new library release that’s a few months after the v0.14.0. Then, the change would be less “silent” because there is less going on and people are more likely to read the release notes.

I’m not sure this solves the issue either, really, because we can’t make assumptions about when people will upgrade. I suspect it’s not all that uncommon for people to only upgrade their package set when updating to a new compiler version. If I update to 0.14.0 and then leave my package set alone until 0.15.0, this change could just as easily get lost among all of the other changes I make during the upgrade to 0.15.0.

We could also address the package set issue that Phil raised above by creating tooling that shows all the breaking changes across packages in a set from set A to set B.

Yes! This would be really good to have too.

1 Like