When should you use primitive types instead of custom types?

thomashoneyman · October 31, 2018, 3:41am

I have developed some opinions about cases in which type safety can be overkill, and I wanted to share with the community and hear what other people think! I’m specifically talking about the use of custom types to add meaning to values (like using CustomerId and OrderId to distinguish one int from another) or to enforce validity (like using a smart constructor to restrict anything with an Email type to be a valid email), and not the absence of types altogether.

I would love to hear dissenting opinions or more cases in which you might just reach for a primitive value rather than try to give everything a meaningful type in your application.

Types bring safety and soundness to applications. They are powerful tools to manage complexity and enforce data integrity. But they do carry a cost: it takes careful thought and non-trivial code to create custom types, and the benefits are not always worth the price. There are some cases in which custom types may be the correct solution, but for various reasons, it is acceptable to reach for primitive types instead.

You know a value is only valid within some constraints, but the type is not the right place to enforce those constraints.

This situation occurs when you know the constraints that a value is supposed to satisfy, but if it doesn’t satisfy them you aren’t allowed to reject or alter it. It’s not your call to make.

For example, you might know that a bio is only valid if it is less than or equal to 300 characters. But you only receive bios via API calls and the backend promises that all bios have been validated. If somehow an invalid one slipped through, then you are supposed to just try and use it anyway. What should you do?

In this case, forcing any Bio type to be a string that satisfies the 300 character limit is unacceptable. You have two choices: trust your backend at the risk of accepting strings that are not actually valid, or adjust your type to account for both valid and invalid bios. Which you should choose will often depend on whether…

Enforcing constraints is not worth the effort

Some guarantees are more important than others. Guaranteeing that a payment includes a valid dollar amount is more likely to be critical than guaranteeing that all bios in the system are within 300 characters.

If you need to secure and validate all data in the system and enforce constraints with the compiler, then you should make everything type-safe. But quite often these requirements are fuzzy, and you’re required to handle inputs even when they don’t fit the spec, and so on. In these cases the imprecision of a simple primitive type may be fine.

Choosing this path has the opposite effect of restricting the domain, however. Instead, you’re expanding the domain of functions over this value to include anything that the primitive you choose could be! For a String, that’s essentially infinite. Functions operating on these values are now going to have to handle the potential of bad input in order to maintain some semblance of integrity in your system. That means a lot more functions returning Maybe and Either values and pushing the responsibility to handle failure deeper into your system.

The far extreme of laziness is to not even try to handle the failure cases at all. For example, you might just represent a bio as a String and attempt to render it to the page no matter how long it is, whether it’s empty, or whether it’s full of strange Unicode symbols, and just accept that it will look absolutely terrible.

If you’re confident that the data is valid coming in and you can’t change it even if it’s invalid, then this may simply be a trade-off your company is willing to make.
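As a rough illustration of the second choice mentioned earlier (adjusting the type to account for both valid and invalid bios), here is a minimal sketch. The names Bio, ValidBio, OversizedBio, fromString and toString are hypothetical and only meant to show the shape of the idea; the 300-character limit comes from the example above.

```purescript
module Bio where

import Prelude

import Data.String (length)

-- Hypothetical type: a bio that remembers whether it met the length limit.
data Bio
  = ValidBio String
  | OversizedBio String

-- The 300-character limit from the example in the text.
maxLength :: Int
maxLength = 300

-- Classify a string from the backend without rejecting or altering it.
fromString :: String -> Bio
fromString str
  | length str <= maxLength = ValidBio str
  | otherwise = OversizedBio str

-- Every consumer can still recover the raw text, whichever case it is.
toString :: Bio -> String
toString (ValidBio str) = str
toString (OversizedBio str) = str
```

With something like this, a rendering function can pattern match and decide what to do with an oversized bio (truncate it for display, say) while the original value is kept untouched.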

The type isn’t informative on its own, but it’s part of a larger type that is informative.

CustomerId and OrderId newtypes can be used to distinguish two Int values from one another. This is a fantastic way to give primitive types a little more meaning. But it isn’t always necessary. Consider a record type that contains a user’s public profile information:

type UserProfile =
  { ...
  , following :: Boolean
  }

This type records some data about a user, including whether you follow them. Should we create a custom type so that we don’t mix up this boolean with some other boolean?

I wouldn’t. It’s a field in a record, so you already know quite a lot about it: it has a name, following, and belongs to a larger UserProfile type. Most likely it will be accessed with dot syntax, like user.following, which provides more context vs. being an isolated boolean value. I am also assuming the value is used infrequently and primarily to control the display of a button on a user profile.

We need this value to be correct, but we have plenty of information beyond the type with which to identify it and distinguish it from other boolean values.

felixschl · October 31, 2018, 4:41am

I personally do not use newtypes for enforcing constraints but I will reach for them to disambiguate multiple values of the same type. Primitive types make sense in records where the label effectively does the disambiguation, but I wouldn’t wrap every function argument in a row just to get a label, and generally tend to avoid records as arguments. I tend to export the newtype’s constructor and don’t mess with smart constructors at all, hence I don’t pretend to have actually narrowed down the domain of the type.

I’d argue there are at least two more cases for custom types, however.

- Custom type class implementations
- Phantoms (I don’t use them often, but they can be useful)
- Avoiding Right (Left (Right ...))

natefaubion · October 31, 2018, 4:03pm

Record labels only work to disambiguate if you never hold a reference to the field outside of the record as a whole. As with anything, you can go overboard, but I think it’s just good hygiene to disambiguate between arbitrary String blobs and things like UUIDs at the type-level.

thomashoneyman · October 31, 2018, 6:03pm

Some replies from /r/haskell:

@BartAdv

You wrap an integer identifier in a newtype not only so that you don’t accidentally write some other integer there. You do this because an identifier doesn’t have the semantics of an integer - you don’t want to perform arithmetic on it; it’s just a unique identification. And because it’s unique, it also makes sense to have such a newtype for other records, because they’re just not the same (even if the underlying representation is).

When it comes to your boolean example, this field is just a boolean, and you might, in fact, want its value to come from other booleans - say, from a checkbox in the UI. So there’s little value in wrapping it in a newtype - you want its boolean semantics.
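To make the point about identifier semantics concrete, here is a small sketch. It assumes the CustomerId and OrderId names from the original post; the module name and the isSameCustomer helper are made up for illustration.

```purescript
module Ids where

import Prelude

-- Both identifiers are plain Ints at runtime, but distinct types at compile time.
newtype CustomerId = CustomerId Int
newtype OrderId = OrderId Int

-- Equality and ordering are all an identifier needs.
derive newtype instance eqCustomerId :: Eq CustomerId
derive newtype instance ordCustomerId :: Ord CustomerId
derive newtype instance eqOrderId :: Eq OrderId
derive newtype instance ordOrderId :: Ord OrderId

-- No Semiring or Ring instance is derived, so `customerId + one` is a type
-- error, as is passing an OrderId where a CustomerId is expected.
isSameCustomer :: CustomerId -> CustomerId -> Boolean
isSameCustomer a b = a == b
```

The choice of which instances to derive is where the "different semantics from the base type" shows up: the wrapper keeps comparison and drops arithmetic.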

@Tarmen

I newtype everything that has different semantics from the base type. That’s admittedly somewhat subjective but when in doubt I usually just add the wrapper. Worst case it adds some coerce statements and usually it lets me do more type tetris which means less thinking which means fewer bugs.

For Bio I would use a newtype wrapper around Text, possibly with a smart constructor that logs a warning but continues anyway for invalid inputs.

bklaric · October 31, 2018, 10:19pm

I generally use newtypes whenever I need to validate a piece of data, or in other words, whenever I need to constrict the possible values of the base type. I also use them only where I’m actually doing the validation.

For a simple client-server app this might mean using newtypes and algebraic types extensively on the server for domain modeling and validation purposes, while leaving the client to deal with primitives.

paluh · November 1, 2018, 10:01am

I like smart constructors, but in my libs I also export the actual newtype constructors that they guard. Users can access them freely when they need to, for example to define statically known valid values. Of course this is a bit dangerous, as constraints may change over time, but for me it is a good balance between convenience and safety.
I usually expose these constructors through an Internal namespace…

thomashoneyman · November 1, 2018, 4:53pm

~~They could always reach for a nice unsafeCoerce, as well, to make it even more obvious they’re saying “I know what I’m doing!”~~ We use this for lots of datetime helpers where we know for sure the date is valid. Things like unsafeMkDate, unsafeMkTime, etc.

Edit: I was actually thinking of unsafePartial <<< fromJust, not unsafeCoerce. We use this with unsafeMkDate. Don’t write posts too early in the morning!

paluh · November 1, 2018, 5:21pm

@thomashoneyman Cool!
If these unsafe* helpers are provided by the library itself and are built from plain constructor usage, I think it is OK to use them…
But leaving unsafeCoerce as the only option for a library user, without providing such helpers or the constructors that create the newtype directly, can be a little more dangerous. If the library’s internal data representation changes, she/he can end up with a runtime exception. Probably a rare case, but still…

thomashoneyman · November 2, 2018, 3:42am

You’re absolutely right. The unsafe* functions are the right move, not unsafeCoerce, which should be avoided.

bklaric · November 2, 2018, 8:38am

Would a slightly better option be to use something like unsafePartial fromJust? I’m assuming here the newtype constructor isn’t exported and there’s a smart constructor available, like create :: String -> Maybe MyStringNewtype.

If I understand it right, if you need a known value at compile time, you can write it as a package value. This way during runtime when the package gets loaded and the value evaluated, the unsafePartial fromJust will throw if the smart constructor precondition isn’t met. This provides an effective smoke test while also keeping things type safe.
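A minimal sketch of that idea, for reference. The create signature and the MyStringNewtype name come from the post above; the non-empty precondition, toString, and the greeting value are assumptions added purely for illustration. In practice the known value would more likely live in a module that consumes the library.

```purescript
module MyStringNewtype
  ( MyStringNewtype -- type only; the constructor is not exported
  , create
  , toString
  , greeting
  ) where

import Prelude

import Data.Maybe (Maybe(..), fromJust)
import Data.String (null)
import Partial.Unsafe (unsafePartial)

newtype MyStringNewtype = MyStringNewtype String

-- Smart constructor. The precondition here (non-empty) is hypothetical.
create :: String -> Maybe MyStringNewtype
create str
  | null str = Nothing
  | otherwise = Just (MyStringNewtype str)

toString :: MyStringNewtype -> String
toString (MyStringNewtype str) = str

-- A value known when the code is written. If "hello" ever stops satisfying
-- the precondition, evaluating this value fails instead of producing a
-- broken MyStringNewtype.
greeting :: MyStringNewtype
greeting = unsafePartial (fromJust (create "hello"))
```

The value still goes through create, so the smart constructor stays the single place where the rule is enforced.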

thomashoneyman · November 2, 2018, 4:05pm

That’s what I was (not very clearly) referring to with our unsafeMkDate, unsafe* functions. These used unsafePartial (fromJust (toDate ...)) under the hood and allowed us to recover convenience when we didn’t feel like handling a Maybe case all over the place that didn’t truly represent the possibilities for the value.

@paluh that might let you be a little more sure that the type is being used correctly – ~~for example, you can’t inadvertently use wrap or unwrap if the constructor is hidden, but you might make this mistake if it isn’t.~~ Edit: This is false. You still can.

Whether you export those functions directly or rely on the user combining your smart constructor with unsafePartial $ fromJust, the result is still the protection you wanted plus an escape hatch for times the protection isn’t needed.

paluh · November 6, 2018, 2:42pm

I think that wrap and unwrap are related to the existence of a Newtype instance, rather than to the constructor being exported.

Regarding fromJust - it is for sure better than plain unsafeCoerce, but we are still at risk of a runtime error. I’m not saying that it is really worse than having a broken value created directly with an exposed constructor.
I’m just not sure why we should prevent a library user from deciding how she/he wants to create “unsafe” values and what kind of risk and failure she/he likes better.

I think that the approach to hiding constructor is a matter of personal preference too so you know… it is possible that there is no final conclusion

thomashoneyman · November 8, 2018, 4:15am

How embarrassing! I made two inaccurate comments in a row.

As you noted, wrap and unwrap relate to the Newtype instance. If you have the instance then whether the constructor is exported or not is irrelevant. You can freely use these two functions regardless.
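A small sketch of why that is, assuming a hypothetical Email module with a deliberately naive validity check. Only the Newtype class and derive syntax are real library features here; the rest is made up for illustration.

```purescript
module Email
  ( Email -- constructor not exported
  , mkEmail
  ) where

import Prelude

import Data.Maybe (Maybe(..))
import Data.Newtype (class Newtype)
import Data.String (Pattern(..), contains)

newtype Email = Email String

-- Instances are visible wherever the type is, so this makes `wrap` and
-- `unwrap` from Data.Newtype usable on Email in every module, even though
-- the constructor itself is hidden.
derive instance newtypeEmail :: Newtype Email _

-- Naive check, for illustration only.
mkEmail :: String -> Maybe Email
mkEmail str
  | contains (Pattern "@") str = Just (Email str)
  | otherwise = Nothing
```

In a different module, `wrap "not an email" :: Email` would type check and bypass mkEmail entirely, so hiding the constructor only protects the invariant if the Newtype instance is left out as well.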

> I’m just not sure why we should prevent a library user from deciding how she/he wants to create “unsafe” values and what kind of risk and failure she/he likes better…I think that the approach to hiding constructor is a matter of personal preference too

I agree with you, and I think I’ve made a mistake by not differentiating between use in an application vs. use in a library.

I much prefer this used in an application where you have domain-specific rules about what is valid and not valid and you want to enforce it throughout the application. As a matter of personal preference, in these cases I’d rather use a smart constructor and rely on unsafePartial $ fromJust in the rare cases I want to construct it unsafely and am 100% sure I’ve got a valid value.

I don’t particularly like it in a library, because the library will need to handle all kinds of general use cases and domains and I’ve found this pattern too restrictive before. For example, I’m not such a fan of the Validation library using a smart constructor and not being able to pattern match on valid / invalid ever. unV is…OK, at best.

I also think that personal preferences have much greater weight as an argument when you talk about an application vs. a library meant for general use. There are exceptions (highly opinionated libraries), but in general libraries ought to be flexible in a way that’s not usually required of apps.

safareli · January 7, 2019, 11:32am

Instead of using unsafePartial $ fromJust ... I tend to use pattern matching and unsafeCrashWith, so that if my assumption gets invalidated at some point, I’ll get a proper error message:

foo = case mkFoo "baz" of
  Nothing -> unsafeCrashWith "baz should be valid Foo"
  Just v -> v

thomashoneyman · January 7, 2019, 8:09pm

That’s a smart idea. I haven’t made enough use of that pattern.