Can We Cross-Pollinate with Unison?

I’ve been building an app with a PureScript Deku/Hyrule front end and a Haskell back end for a while. Obviously, even with the best codegen, keeping my types aligned between the back end and front end, and making sure there are no corner cases in the API I’m constructing, is an ever-present problem.

Anyway, I’ve always had my eye on Unison for the back end…and as it hit 1.0 recently, I looked into the project. Picture that meme of the man walking with his girlfriend while looking back at another woman. Unison is perhaps that other woman. 🙂

Anyway, I started to look into porting my back end into Unison, and it doesn’t actually seem that bad. However, they (unlike us) don’t seem to have much in the way of strong front-end libraries available. Is that an opportunity for cross-pollination? Or are we too different for that to be possible? I’m pretty sure that’s the case, but both of our syntaxes are ML-esque…

What do the PureScript community and devs think of this project? And what do you think are the ramifications of it? Do you think there’s any opportunity for wiring our front-end tools to the Unison ecosystem?

I’ll keep building with the tools I have (and I sincerely love PureScript), but I’ll always have an eye toward new ideas…and Unison is surely one. Or perhaps an old idea that we haven’t seen done quite like this…

Edit: I stumbled across a fairly pertinent video today. It seems that I wasn’t completely out of left field with this idea, since the video mentions that one would still need JavaScript to use Unison for browser UIs.

I think Unison is a very cool project. It’s been quite some time since I looked at it, though!

As with many cross-language integrations, I think the main problem will be adapting the type systems to each other, depending on what you have in mind.


I had the entire gamut of cross-pollination in mind, all the way from subtle integrations to complete PureScript codegen from Unison.

Mainly, I wanted to see how (more knowledgeable and experienced) people would go about it, since there are so many ways it could happen.

Honestly, the only reason I’m not going full-throttle toward learning Unison (or trying to use it the way I’m implying) is their Discord-centered community, where questions get buried, never to be seen again after about an hour.

I do think it is fertile ground, though. Our Chez Scheme back end is another example of great cross-pollination. There are so many elegant ideas in Unison that I’d like to start using either it or Chez Scheme for back ends.

Unison’s key insight is alpha-equivalence-aware, name-agnostic structural hashing.

With that in mind, I had an interesting idea today.

A module akin to tidy-codegen that could be run over one’s codebase as a post-compilation step to obtain Unison-like features in PureScript at no additional cost.

Namely, hashes derived from the abstract syntax tree in PureScript. I looked further into the idea, and it’s not impossible; with my recent Chez Scheme work on the sha3 module, I’ve already trodden some of the path toward it.


Research:

PureScript’s compiler already produces CoreFn, a serialized JSON intermediate representation (the same IR that purescm consumes). CoreFn is a desugared, simplified AST: syntactic sugar and do-notation are stripped away, type class dictionaries are made explicit, and so on. It’s already quite close to the “structural essence” of each function.
Each module’s CoreFn output lives in output/<ModuleName>/corefn.json and contains every top-level binding with its full expression tree. This is my ideal input!
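
Getting at that input is just a file read plus a JSON parse. A minimal sketch using Node.FS and Argonaut (the `loadCoreFn` helper and its error handling are my own, and it assumes the build was run with `purs compile --codegen corefn` so the file actually exists):

```purescript
module LoadCoreFn where

import Prelude

import Data.Argonaut.Core (Json)
import Data.Argonaut.Parser (jsonParser)
import Data.Either (Either)
import Effect (Effect)
import Node.Encoding (Encoding(..))
import Node.FS.Sync (readTextFile)

-- Read one module's CoreFn output and parse it into Json,
-- ready for the normalization pass described below.
loadCoreFn :: String -> Effect (Either String Json)
loadCoreFn moduleName =
  jsonParser <$> readTextFile UTF8 ("output/" <> moduleName <> "/corefn.json")
```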

Steps to bring Unison-style functionality into PureScript:

  1. Normalize the CoreFn AST before hashing (sketched after this list):
    ∙ Replace all local variable names with de Bruijn indices (or positional identifiers). This makes \x → x + 1 and \y → y + 1 hash identically.
    ∙ For references to other top-level definitions, there is a choice: use their hash (fully content-addressed, Unison-style) or their qualified name (simpler, but renames break hashes). The content-addressed approach is more powerful but requires topologically sorting the dependency graph.
    ∙ Strip source spans, comments, and any annotations that don’t affect semantics.
  2. Serialize the normalized AST deterministically: JSON or a custom binary encoding in which key order, whitespace, etc. are all fixed.
  3. Hash the result with SHA-256 (or SHA-3, given my recent work) to get one hash per binding.
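
To make steps 1–3 concrete, here’s a minimal, self-contained sketch over a toy four-constructor expression type. The real thing would walk CoreFn’s full AST, and the actual SHA step is elided since it depends on which hashing library you pick:

```purescript
module StructuralHash where

import Prelude

import Data.List (List(..), elemIndex, (:))
import Data.Maybe (maybe)

-- A deliberately tiny stand-in for CoreFn expressions; the real AST
-- has many more constructors (Case, Let, record literals, ...).
data Expr
  = Var String        -- a local variable or top-level reference
  | Lam String Expr   -- \name -> body
  | App Expr Expr
  | IntLit Int

-- Step 1: bound-variable names are erased in favour of de Bruijn
-- indices (0 = innermost binder), so \x -> x and \y -> y agree.
data NExpr
  = NBound Int        -- de Bruijn index into the binder stack
  | NFree String      -- a top-level name (or, in the
                      -- content-addressed variant, its hash)
  | NLam NExpr
  | NApp NExpr NExpr
  | NIntLit Int

normalize :: List String -> Expr -> NExpr
normalize env = case _ of
  Var name -> maybe (NFree name) NBound (elemIndex name env)
  Lam name body -> NLam (normalize (name : env) body)
  App f x -> NApp (normalize env f) (normalize env x)
  IntLit n -> NIntLit n

-- Step 2: a canonical, whitespace-stable encoding (an S-expression
-- here; any deterministic format works).
serialize :: NExpr -> String
serialize = case _ of
  NBound i -> "(b " <> show i <> ")"
  NFree n -> "(f " <> n <> ")"
  NLam b -> "(l " <> serialize b <> ")"
  NApp f x -> "(a " <> serialize f <> " " <> serialize x <> ")"
  NIntLit n -> "(i " <> show n <> ")"

-- Step 3 would feed the canonical string to SHA-256/SHA-3.

incX :: Expr
incX = Lam "x" (App (App (Var "+") (Var "x")) (IntLit 1))

incY :: Expr
incY = Lam "y" (App (App (Var "+") (Var "y")) (IntLit 1))
```

Both `serialize (normalize Nil incX)` and `serialize (normalize Nil incY)` produce the identical string `(l (a (a (f +) (b 0)) (i 1)))`, so any hash taken over it agrees for the two: exactly the rename-invariance step 1 is after.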

Easy wins:

Structural identity: renaming a function or its local variables doesn’t change the hash. Only actual logic changes do.
Dependency-aware hashing: if a helper function changes, all callers’ hashes change too, because their normalized ASTs embed the helper’s hash rather than its name (sketched below).
Incremental builds: we could skip recompilation/analysis of anything whose hash hasn’t changed, similar to how Unison’s scratch files work.
AI token compression: share the code once, then reference it by hash. One could maintain a local hash→code database and even adopt a convention of sending an LLM a manifest at the start of a conversation.
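
For the dependency-aware part, a small continuation of the toy module above: assuming bindings are visited in topological order, so every dependency’s hash is already sitting in a map, free references are rewritten to carry hashes instead of names before serialization (the `resolve` name is mine):

```purescript
-- Needs, in the module above:
--   import Data.Map (Map)
--   import Data.Map (lookup) as Map
--   import Data.Maybe (fromMaybe)

-- Rewrite every free (top-level) reference to its content hash.
-- Anything absent from the map (e.g. a foreign import) keeps its name.
resolve :: Map String String -> NExpr -> NExpr
resolve hashes = go
  where
  go = case _ of
    NFree name -> NFree (fromMaybe name (Map.lookup name hashes))
    NLam body -> NLam (go body)
    NApp f x -> NApp (go f) (go x)
    other -> other
```

Because callers now serialize their dependencies’ hashes instead of their names, any change to a helper ripples through every transitive caller’s hash, which is what makes the incremental-build win fall out for free.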

MUCH harder to achieve:

Unison’s type-aware hashing treats types as part of a definition’s identity. CoreFn has type annotations, but they’re not always fully elaborated. We could include or exclude them depending on whether we want Int → Int and Number → Number functions with identical bodies to hash differently.
Unison also handles structural type equivalence: record field reordering, etc. CoreFn is more rigid here, so we’d need to sort record fields by name during normalization if we want that (a one-liner, sketched below).
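
The record-field part, at least, looks cheap. Assuming the normalized AST grew a hypothetical `NRecord (Array (Tuple String NExpr))` constructor, normalization would just sort the fields by label:

```purescript
import Data.Array (sortWith)
import Data.Tuple (Tuple, fst)

-- Sorting fields by label makes { a: 1, b: 2 } and { b: 2, a: 1 }
-- normalize (and therefore hash) identically.
normalizeFields :: forall e. Array (Tuple String e) -> Array (Tuple String e)
normalizeFields = sortWith fst
```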