Functional Dissections

A while back, I implemented dissect, a library for functional dissections in PureScript, which then enabled me to write ssrs, a suite of stack-safe recursion schemes that operate on types that can be dissected.

While the stack safety that these two libraries bring about is awesome, it’s also an ergonomic nightmare. For one, Dissect instances for containers have to be derived by hand, and they’re quite difficult to write, even impossible (at the moment) once mutual recursion is taken into account. A solution proposed in the original paper is polynomial functors: generic building blocks that, when put together, define a data type with a free Dissect instance. The caveats are that you have to write “smart” constructors, pattern matching becomes awkward, and performance is really bad.
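
To give an idea of the shape of that encoding, here’s a sketch of a list base functor assembled from such building blocks (the names below are made up for illustration rather than taken from the actual generics API, and a Prelude import is assumed for Unit and unit):

data Sum f g a = Inl (f a) | Inr (g a)
data Product f g a = Product (f a) (g a)
newtype Const k a = Const k
newtype Id a = Id a

-- ListP a n has the same shape as "data ListF a n = Nil | Cons a n", but
-- because it is assembled from the generic pieces, Dissect instances written
-- once for Sum, Product, Const, and Id would cover it for free.
type ListP a = Sum (Const Unit) (Product (Const a) Id)

-- "Smart" constructors hiding the nested injections:
nil :: forall a n. ListP a n
nil = Inl (Const unit)

cons :: forall a n. a -> n -> ListP a n
cons x n = Inr (Product (Const x) (Id n))

Consumers only ever see the nested Inl/Inr/Product wrappers, which is exactly why pattern matching becomes painful and why all the extra wrapping hurts performance.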

That said, I’ve invested some time implementing Dissect for Variant, which takes away the complexity of nested Sum types. Likewise, I’ve also managed to approximate mutual recursion using Variant; here’s a general example with an evaluation catamorphism.

That aside, I’ve been thinking about encoding functional dissections a bit differently for quite a while now. I’ve noticed a pattern in hand-written Dissect instances: the number of recursion points a constructor of a recursive data type has directly corresponds to the number of “states” it goes through when being dissected. Take for example the following type:

data TreeF n
  = Leaf
  | Fork n n n

The corresponding structure that I’d have to build for this in order to get a dissection would be:

data TreeF_2 n m
  = ForkRR m m
  | ForkLR n m
  | ForkLL n n

This gives the following Dissect instance:

instance Dissect TreeF TreeF_2 where
  right = case _ of
    Left Leaf -> Right Leaf
    Left (Fork m n o) -> Left (Tuple m (ForkRR n o))
    Right (Tuple w c) -> case w of
      ForkRR m n -> Left (Tuple m (ForkLR c n))
      ForkLR n m -> Left (Tuple m (ForkLL n c))
      ForkLL n o -> Right (Fork n o c)

When a Fork is initially dissected, it yields its first m together with ForkRR, which means that there are two items left in the structure. If we follow this chain, ForkRR yields another m and gets transformed into ForkLR, with a replacement item c injected in place of the one just yielded. ForkLR then goes into ForkLL, at which point there’s nothing left to yield, and it instead reconstructs a Fork with all the replacement items carried over from the previous states.
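
To make the chain concrete, here’s right stepped by hand over Fork 1 2 3 (a sketch assuming the instance above with Data.Either and Data.Tuple in scope, using Int at both the element and hole positions; 10, 20, and 30 stand in for the replacement items being plugged back in):

steps :: Array (Either (Tuple Int (TreeF_2 Int Int)) (TreeF Int))
steps =
  [ right (Left (Fork 1 2 3))               -- Left (Tuple 1 (ForkRR 2 3))
  , right (Right (Tuple (ForkRR 2 3) 10))   -- Left (Tuple 2 (ForkLR 10 3))
  , right (Right (Tuple (ForkLR 10 3) 20))  -- Left (Tuple 3 (ForkLL 10 20))
  , right (Right (Tuple (ForkLL 10 20) 30)) -- Right (Fork 10 20 30)
  ]

The first three steps each yield the next item on the Left, and the last step rebuilds the Fork from the plugged-in replacements on the Right.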

This actually reminds me of Python’s generators, essentially a generalization of iterators that allow for duplex communication. I can send some data to a generator, and get some data back.

In [33]: def example(xs):
    ...:   ys = []
    ...:   for x in xs:
    ...:     y = (yield x)
    ...:     ys.append(y)
    ...:   return ys
    ...: 

In [34]: i = example([1, 2, 3])

In [35]: a, b, c, d = i.send(None), i.send(1), i.send(2), i.send(3)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-35-9b84b70d7082> in <module>
----> 1 a, b, c, d = i.send(None), i.send(1), i.send(2), i.send(3)

StopIteration: [1, 2, 3]

In this case, the first three sends yield 1, 2, and 3 respectively, while the final send raises an exception carrying the list of all the items that I’ve sent over. Had I structured this a bit more, I’d actually have a way to express a mapping operation.

Translating this to PureScript is something I’ve had at the back of my mind for quite a while now, though I’ve been reluctant to explore possible solutions or abstractions that could bridge it over to dissections. A monadic interface for this kind of duplex communication likely won’t be too complicated, though implementing it through either codegen or type classes is a different story.
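
As a rough idea of what the duplex part could look like on its own (purely a sketch with made-up names, not tied to any existing library), a computation that yields an o, waits to be sent an i, and eventually returns an a mirrors the protocol of a Python generator:

data Gen i o a
  = Done a
  | Yield o (i -> Gen i o a)

instance Functor (Gen i o) where
  map f (Done a) = Done (f a)
  map f (Yield o k) = Yield o (map f <<< k)

instance Apply (Gen i o) where
  apply (Done f) g = map f g
  apply (Yield o k) g = Yield o (\i -> apply (k i) g)

instance Applicative (Gen i o) where
  pure = Done

instance Bind (Gen i o) where
  bind (Done a) f = f a
  bind (Yield o k) f = Yield o (\i -> bind (k i) f)

instance Monad (Gen i o)

-- Yield an o and suspend until an i is sent back.
yield :: forall i o. o -> Gen i o i
yield o = Yield o Done

The hard part is producing something like this (or its defunctionalized equivalent) from a plain data declaration without writing it by hand, which is where the codegen or type class story comes in.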

On that note, it’d be pretty interesting to see a solution that generates an abstract state machine for each recursive type, a bit like parser generators, though frankly my knowledge in this area is rather limited.

Related Stuff:

Clowns to the Left of me, Jokers to the Right (Pearl)

The original paper that dissect is derived from.

Nate’s post about mutually recursive variants.

Generic programming with fixed points for mutually recursive datatypes.


It’s been a while since I worked on this, but when I saw your ssrs I was inspired to try something similar to what I think you’re hinting at with generators.

The basic idea was that the (pseudo-)dissection happens in-place in the algebra:

sumAlg :: Algebra (ListF Int) Int
sumAlg = case _ of
  Nil -> 0
  Cons a b -> force a + force b

When force a is evaluated, an exception is thrown that jumps back to a ‘machine’, which then evaluates the algebra for that a; using a mutable reference, the a is replaced with the actual result instead of an exception-throwing thunk.

Because sumAlg itself is a pure function, we just evaluate sumAlg again on the original input; this time it doesn’t throw an exception on force a but instead throws one on force b. This continues until nothing throws an exception. Using a generator/coroutine/CPS would save us from having to catch up to the original point by going straight to the next piece of work. The challenge is having it look like the standard algebra fed into cata.

In my actual code I used a constraint to implicitly call force, e.g. something similar to a :: Magic => Int, so Cons a b -> a + b. This is similar to how natefaubion/purescript-call-by-name works.

In the end I think it was slower than ssrs, most likely because of repeated work and exception jumping, and it also required a change to the compiler (purescript/purescript#4196, “Can’t solve constraints in values which came from constructor binders where the constructor was polymorphic”).

I hope this makes sense; I completely forgot I even worked on this until seeing this post.


I like the dissection approach a lot because, when done well, in my experience it’s nearly an order of magnitude faster than trampolined traversals. I’m glad you’ve been working on it :smile:. The generic approach is a total non-starter IMO though, and unfortunately hand-written dissection is excruciatingly tedious to write (especially since it’s totally first order). However, it’s really modest in allocations and seems to JIT really well. My suspicion about the generator approach is that you will essentially end up with a Free Monad construction and hit the same performance profile as a trampolined traversal.


How about Mendler-style recursion schemes? We can control every branch of an ADT because we can apply the recursor manually.
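
For concreteness, this is the shape I mean (a small sketch, not taken from any of the libraries above): the algebra receives the recursor as an ordinary argument, so each branch decides for itself when, and on what, to recurse.

newtype Mu f = In (f (Mu f))

-- The algebra is handed the recursor (r -> a) explicitly instead of
-- receiving already-folded children.
type MendlerAlgebra f a = forall r. (r -> a) -> f r -> a

mcata :: forall f a. MendlerAlgebra f a -> Mu f -> a
mcata alg (In t) = alg (mcata alg) t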

What about going through something like ST? I’ve made a pretty naive approximation here (Try PureScript!). When generalized, the consumer would have to provide destructuring and restructuring for the data types being dissected. Some safety would have to be sacrificed, as the burden of correctness is now on the consumer, unless something like a sized vector is used.

I can’t say I’m familiar with them. How would they relate here?


What about compiling to native ES6 generators?

I cleaned up the ST interface and it produced a lot more overhead compared to the dissections interface, so I ended up rewriting the internals as well as the catamorphism in raw JS to even come close to the performance of dissect + ssrs in some very arbitrary benchmarks.

OTOH I’m currently exploring how I could write instances for records, considering that Variant is a viable alternative to nested Sum types and also makes mutual recursion possible, though I have to make sure that the type-level operations are as optimized as possible.

I ought to try that.

I realised last weekend that you can combine the destructure and restructure (from makina) into one method by not defunctionalizing the tag/fields.

foreign import data Info :: Type -> Type -> Type -> Type

-- Info is equivalent to Info'
-- data Info' a b c
--   = forall n. Info'
--     { points :: Vec n a
--     , restructure :: Vec n b -> c
--     }

-- Sketch of how the applicative instances are implemented
-- pure v = Info' { points: [], restructure: \_ -> v }
--
-- point :: forall a b. a -> Info' a b b
-- point v = Info' { points: [v], restructure: \[w] -> w }
--
-- map f (Info' info) = Info' (info { restructure = f <<< info.restructure })
--
-- apply (Info' info_f) (Info' info)
--   = Info' { points: concat info_f.points info.points
--           , restructure: \w -> info_f.restructure (slice ... w) (info.restructure (slice ... w))
--           }

class IsoStruct f where
  isostruct :: forall a b. f a -> Info a b (f b)

instance IsoStruct (TreeF a) where
  isostruct = case _ of
    Leaf a -> pure $ Leaf a
    Fork a b -> Fork <$> point a <*> point b

I like the succinctness of isostruct and how it both unpacks and creates the packer in one. Once I find more time I’ll upload my version.
