Some thoughts on type class instances in libraries

maxdeviant · November 7, 2021, 4:44pm

I’ve been doing some thinking lately about type class instances, specifically about how libraries should be providing type class instances for the type classes and types they define.

Currently there are three approaches (that I am aware of) for providing type class instances:

1. The library providing the type class provides the instances

In this approach, the library that defines the type class provides instances for it. For example, the EncodeJson and DecodeJson type classes come with a variety of instances for types that will come up frequently.

This approach works well for ubiquitous types, like primitive values (Int, Boolean, String, etc.), common collection types (Array, Map), and other common types like Maybe and Either.

However, it does not scale well to the total set of possible types that may exist. The library that provides the type class would have to be aware of each type that someone may want to have a type class instance for in order to provide it.

2. The library providing the type provides the instances

In this approach, the library that provides a type also provides the instances on that type. For example, the CaseInsensitiveString could provide an EncodeJson instance to make it usable with Argonaut out of the box.

While this scales better than the first approach in terms of spreading out the responsibility for providing instances across the ecosystem, it comes with the trade-off of each library needing to pull in extra dependencies to provide the type classes.

In the case of CaseInsensitiveString, it probably doesn’t make sense for the strings package to take a dependency on argonaut-codecs to provide an EncodeJson instance, as not everyone using strings is going to need it.

3. Library consumers use `newtype` wrapping to provide their own instances

In this approach, the library consumer is responsible for providing their own type class instances for the types that require them. Since orphan instances are not permitted, this requires wrapping the type in some way (e.g., with a newtype) in order to provide instances for the type.

For example, say I want to deserialize a CaseInsensitiveString from JSON. Since strings does not provide a DecodeJson instance, I now have to wrap the type so I can provide my own instance of DecodeJson.

This is a worse user experience than if CaseInsensitiveString just provided the instance out of the box.

---

What we’re left with is a matrix that looks something like this:

Approach	Scalable?	Efficient?	UX
1	No	Yes	Good
2	Yes	No	Good
3	Yes	Yes	Poor

Legend

Approach - The number of the approach (from the previous section)
Scalable - Whether the approach scales well to large cross-products of type class-providing libraries and type-providing libraries
Efficient - Whether the approach is efficient in the sense that it avoids libraries having to take dependencies that not all consumers might need
UX - The user experience the consumer has when using the library

What we can see is that approaches 1 and 2 provide the best user experience, as the following two statements hold true (at least in the ideal case):

When using a library that provides a type class, an instance is provided for the types I want to use
When using a library that provides a type, an instance is provided for the type classes I want to use

However, this improved user experience comes at the expense of being unable to scale the approach, or in adding potentially unneeded dependencies.

Of these three approaches, the second one, where the library providing the type also provides the instances, shows the most promise:

It is scalable, as the libraries providing the types can bring their own instances without upstream changes in the library providing the type class
It provides the great user experience of things working out of the box

Is there a way we can make approach 2 efficient (by avoiding library consumers having to take on additional dependencies that they don’t need)?

---

There is some prior art in this space from the Rust ecosystem. Rust’s trait implementations are analogous to type class instances, and Rust also prohibits orphaned instances.
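To make the analogy concrete, here is a minimal sketch of the wrapping workaround from approach 3 as it looks in Rust. The stand-ins are my own choice and not from any library discussed here: `Duration` plays the role of the foreign type (like CaseInsensitiveString), `FromStr` plays the role of the foreign class (like DecodeJson), and `Seconds` and `parse_timeout` are hypothetical names.

```rust
use std::str::FromStr;
use std::time::Duration;

// Both `FromStr` and `Duration` live in the standard library, so this crate
// owns neither. Writing `impl FromStr for Duration { ... }` here is rejected
// by the orphan rule (error E0117).

/// Local wrapper: because `Seconds` is defined in this crate, implementing a
/// foreign trait for it is allowed.
#[derive(Debug, PartialEq)]
pub struct Seconds(pub Duration);

impl FromStr for Seconds {
    type Err = std::num::ParseIntError;

    // Parse a whole number of seconds, e.g. "90".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let secs: u64 = s.trim().parse()?;
        Ok(Seconds(Duration::from_secs(secs)))
    }
}

/// The cost of the workaround: callers have to unwrap to get the original
/// type back.
pub fn parse_timeout(input: &str) -> Option<Duration> {
    input.parse::<Seconds>().ok().map(|Seconds(d)| d)
}
```

Every consumer who needs the instance ends up writing (and unwrapping) a wrapper like this, which is the UX cost described for approach 3.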

Rust addresses the unneeded-dependency problem with Cargo features, which conditionally compile parts of the code depending on which features are enabled.

It is relatively common for crates to have a serde feature that provides trait implementations for serde for consumers who want to serialize/deserialize the types from the crate.

The uuid crate, for example, can be consumed like so:

[dependencies]
uuid = { version = "0.8", features = ["serde"] }

Internally, the crate uses the cfg attribute to put the serde_support module that provides the serde trait implementations behind this feature:

#[cfg(feature = "serde")]
mod serde_support;

Likewise, serde and serde_json are marked as optional dependencies in Cargo.toml, meaning that they are only pulled in when using the serde feature.
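As a rough sketch of what sits behind such a feature gate, the instance itself can be compiled conditionally. Everything named here is hypothetical: `Id` stands in for a library-defined type, `ToJson` stands in for a trait that would really come from an optional dependency, and `json` is a made-up feature name. This is not how the uuid crate is actually laid out.

```rust
/// A type this library defines (hypothetical stand-in for something like a UUID).
#[derive(Debug, Clone, PartialEq)]
pub struct Id(pub u32);

/// Stand-in for an external serialization trait. In a real crate this would
/// be imported from an optional dependency rather than defined locally.
pub trait ToJson {
    fn to_json(&self) -> String;
}

// Compiled only when the consumer enables the hypothetical `json` feature.
// Without the feature, this impl does not exist in the build at all.
#[cfg(feature = "json")]
impl ToJson for Id {
    fn to_json(&self) -> String {
        format!("{{\"id\":{}}}", self.0)
    }
}

/// Reports whether the optional instance was compiled in.
pub fn json_support_enabled() -> bool {
    cfg!(feature = "json")
}
```

Consumers who never enable the feature pay nothing for the instance, which is the "efficient" property approach 2 is missing today.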

---

To wrap things up, I’m curious to hear what others have to say about the current state of affairs for providing type class instances for libraries.

Does having something like Cargo features in Spago sound desirable? Feasible?

Are there other solutions or ideas in this space that you’ve thought of?

garyb · November 7, 2021, 11:39pm

This may be an unpopular opinion, but I think this is a problem that only exists for “bad” typeclasses. The main ones I can think of that suffer from this are:

EncodeJson
DecodeJson
Arbitrary
Coarbitrary

However, I’ll accept that perhaps this is somewhat of a circular argument, because part of the reason I consider them to be “bad” is exactly to do with instance-related difficulties.

One school of thought is that classes should have laws. I don’t think it’s possible to write laws for any of the aforementioned classes (aside from perhaps something along the lines of decodeJson ∘ encodeJson = identity for the JSON pair, but I don’t feel like that counts, as it’s a law for the interaction of two classes that aren’t required to be implemented together, and it says nothing about either in isolation).

Another criterion would be that the class operations are “universally applicable”. By this I mean classes that are purpose agnostic - they’re tools that help you manipulate values and in theory could be used in any program. The above examples fail this because they’re “dead ends” in a way. This is a big part of what I feel leads to instance problems: if the class was generally useful enough then adding a dependency on it for downstream libraries would be a no-brainer.

The last thing that springs to mind is that classes should generally result in instances that have one sensible/uncontroversial implementation for most types. This is probably a consequence of both the previous things I mentioned though - laws cut down the possible implementations drastically, and so do operations that are generic in nature.

There are counter-examples to all the above in classes that are common and widely used - Foldable has no laws, there are two sensible Monoid implementations for Number, MTL style classes are a thing, etc. - but I think these principles are generally what make a class “good”. Are there examples of classes out there that don’t fit in with what I’ve said here and that suffer from instance problems?

Because I mentioned them, my solutions for the aforementioned classes:

JSON codecs should be implemented explicitly as values/functions. It can be tedious and annoying but unless you truly don’t care about the serialization format or compatibility over time then it ends up worth it in the long run.
Arbitrary instances should instead be written as values with a MonadGen constraint. I admit this is also a slightly questionable class, since it doesn’t really fit any of the criteria above, aside from it not being a “dead end” since it works for generating random values outside of quickcheck too. But I do think abstracting over the generator is much closer to the spirit of “goodness” than providing a single instance for each type, and also means it’s easy to add MonadGen values for types that are missing instances or are generating values in a different range than is desirable for a particular scenario.
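To illustrate the "codecs as values" idea, here is a minimal sketch in Rust (to match the code earlier in the thread). All names are hypothetical: `Codec`, `Celsius`, and the two codec constants are made up for the example and are not from any JSON library.

```rust
/// A codec is an ordinary value: a pair of functions, not a trait impl.
pub struct Codec<T> {
    pub encode: fn(&T) -> String,
    pub decode: fn(&str) -> Option<T>,
}

#[derive(Debug, PartialEq)]
pub struct Celsius(pub i32);

fn encode_plain(c: &Celsius) -> String {
    c.0.to_string()
}

fn decode_plain(s: &str) -> Option<Celsius> {
    s.parse::<i32>().ok().map(Celsius)
}

fn encode_suffixed(c: &Celsius) -> String {
    format!("{}C", c.0)
}

fn decode_suffixed(s: &str) -> Option<Celsius> {
    s.strip_suffix('C')?.parse::<i32>().ok().map(Celsius)
}

/// One explicit wire format for `Celsius`: a bare integer such as "21".
pub const CELSIUS_PLAIN: Codec<Celsius> = Codec {
    encode: encode_plain,
    decode: decode_plain,
};

/// A second format for the same type, e.g. "21C". With a single class
/// instance per type, only one of these could exist.
pub const CELSIUS_SUFFIXED: Codec<Celsius> = Codec {
    encode: encode_suffixed,
    decode: decode_suffixed,
};
```

Because the codecs are plain values, nobody has to own an instance: any package can define one for any type, and the call site picks the format explicitly, e.g. `(CELSIUS_SUFFIXED.decode)("21C")`.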

Typeclasses are an extremely powerful and useful tool, but I think that leads to people wanting to use them a bit too much. Sometimes all you need is values and functions. @joneshf may have some more thoughts on this point as he’s expressed similar things in the past that I found convincing.

monoidmusician · November 8, 2021, 2:02am

I wouldn’t frame approach 1 and 2 as competing or coherent approaches. The choice between 1 and 2 is dictated by where in the import/popularity hierarchy the types vs typeclasses appear. For builtin types, of course the instance has to be provided with the typeclass: their type is not defined in any module. Similarly, within the prelude, modules have to be carefully arranged to avoid circular imports, so the choice of type vs typeclass is dictated by which one has access to the other. For core libraries the situation is similar: mostly the typeclasses are defined in prelude, so instances for the types defined in other core libraries live with the type.

So if you define a type outside of core, you’re sort of obligated to provide instances for the appropriate core-defined typeclasses. And if you define a typeclass outside of core, you should provide instances for many types.

The real problem is when it’s hard to determine which module/library is more popular, and sometimes the answer is that both are relatively unused, so it’s hard to justify a dependency on a scarcely-used package just to provide an instance. This is where it gets awkward with the orphan instance restriction.

So yeah, I agree there could be some kind of glue provided to fill in the gaps, but whether the instance lives with the type or the typeclass is basically dictated by the relative popularity of the two.

I would think if we were to provide something it would have to live in the compiler so it could work with multiple package managers and not just spago.

One problem is that it can’t really be separate modules that are conditionally compiled, right?
Maybe some kind of conditional import inline in the module, “if this module exists, import it and these other modules, and provide these instances”. (I’m not sure if that works well with incremental compilation, which is one of the main driving factors for the orphan instance restriction, I believe.)

But my idea still means you need to know whose responsibility it is to provide the instance. It just eases concerns with hard dependencies, basically. And I’m not sure if any solution will really avoid that …

klarkc · November 8, 2021, 12:43pm

I agree with that. I believe that enabling such a feature (something like Rust’s feature-gated trait implementations) would actually bring a worse UX, because it incentivizes lib authors to overuse typeclasses where they should not be used.

Adrielus · November 8, 2021, 4:08pm

Let’s take a practical example I always run into: At and HashMap. Since this is a simple lens, copy-pasting the implementation as a value into each project of mine is pretty easy. But that’s obviously not the best UX. What solution do you think would fit best here?

jy14898 · November 8, 2021, 4:36pm

As it currently stands, there’s a clear way to place instances based on not having cyclic dependencies:

moduleA: defines type T
moduleB: defines class C

If moduleA is a dependency of moduleB, moduleB defines the instance
If moduleB is a dependency of moduleA, moduleA defines the instance

The issue arises when neither is a dependency of the other: there’s no clear module to hold the instance. I think the solution is that both modules state an ordered list of delegate modules (for the type or class) where instances could be defined, and the first point where they come to a consensus is where the instance may be defined. That way, no two packages may define the same instance, as no two packages can define the same module.

One benefit is that the compiler can then suggest the name of the module where an instance might be when it fails to find one among the currently imported modules.
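A rough sketch of how that consensus lookup could work, written in Rust to match the code earlier in the thread. The tie-breaking rule is my own assumption (the proposal doesn't specify one): the type's list is walked in order and the first entry that the class also lists wins. The function name and the module names in the usage note are hypothetical.

```rust
/// Returns the module where the instance may be defined, if the two lists
/// agree on one. Assumption: the type's preference order breaks ties.
pub fn consensus_module<'a>(
    type_delegates: &[&'a str],
    class_delegates: &[&'a str],
) -> Option<&'a str> {
    type_delegates
        .iter()
        .copied()
        // Keep the first delegate that the class-defining module also names.
        .find(|module| class_delegates.contains(module))
}
```

For example, if the type lists `["Data.HashMap.Lens", "Data.Lens.At.HashMap"]` and the class lists the same two modules in the opposite order, this picks `Data.HashMap.Lens`. If the lists share no module, the result is `None` and no instance may be defined anywhere.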

maxdeviant · November 9, 2021, 1:36am

I think this case would fall under the second approach described above where unordered-collections would provide an At instance for HashMap.

If we imagine a world where the equivalent of Cargo features exists in PureScript, you would add the unordered-collections dependency to your project and enable the profunctor-lenses feature, which would then provide instances of At (and potentially other classes provided by profunctor-lenses) for each of the relevant types in unordered-collections.
