Understanding the tradeoff of abstracting using the State Monad

Trequetrum · April 15, 2021, 6:56pm

My understanding of the State Monad is that these two functions are very similar:

addIncState :: Number -> State Int Number
addIncState n = do
  state <- get
  modify_ (_ + 1)
  pure $ toNumber state + n

addIncTuple :: Tuple Int Number -> Tuple Int Number
addIncTuple (Tuple state val) = Tuple (state + 1) (toNumber state + val)

The key difference is that the type signature is much more descriptive when State is used.

I recently re-wrote an algorithm that was searching a pretty expansive state-space. I switched from manually threading state to using the State monad. It worked, but the final result was an algorithm that was considerably slower.

Is that a surprising result or is it an expected result?

Does this added level of abstraction come with a performance trade-off? I have read that Monad Transformers and Type Classes combine to create broadly applicable/well architected code at the cost of performance. I gather the problem there, however, isn’t applicable to Monads as a whole.

A quick look at the compiled code:

var addIncTuple = function (v) {
    return new Data_Tuple.Tuple(v.value0 + 1 | 0, Data_Int.toNumber(v.value0) + v.value1);
};

var addIncState = function (n) {
    return Control_Bind.bind(Control_Monad_State_Trans.bindStateT(Data_Identity.monadIdentity))(Control_Monad_State_Class.get(Control_Monad_State_Trans.monadStateStateT(Data_Identity.monadIdentity)))(function (state) {
        return Control_Bind.discard(Control_Bind.discardUnit)(Control_Monad_State_Trans.bindStateT(Data_Identity.monadIdentity))(Control_Monad_State_Class.modify_(Control_Monad_State_Trans.monadStateStateT(Data_Identity.monadIdentity))(function (v) {
            return v + 1 | 0;
        }))(function () {
            return Control_Applicative.pure(Control_Monad_State_Trans.applicativeStateT(Data_Identity.monadIdentity))(Data_Int.toNumber(state) + n);
        });
    });
};

While addIncTuple is much more straight forward. I don’t really spot the gotcha with State, is it death by a thousand cuts or has it given me just enough rope to shoot myself in the foot?

natefaubion · April 15, 2021, 7:19pm

Type class heavy code has suboptimal code generation, so it will allocate far more than necessary. There’s a PR to help with this, but it needs careful review.

Trequetrum · April 15, 2021, 8:25pm

Does addIncState use type classes under the hood? I get that State (or StateT a Identity) has an instance of the MonadState typeclass, but I assumed code using State directly wouldn’t be bothered by that.

Ah! I see that in this mini-example, the get and modify_ functions both have type class constraints. I’m not planning on doing this, but if I forked Control.Monad.State, and exposed the implementations of its MonadState instance as separate functions.

For example, maybe like this:

modify_ :: forall s m. MonadState s m => (s -> s) -> m Unit
modify_ f = state \s -> Tuple unit (f s)

modifyFaster ::  forall s m.  (s -> s) -> m Unit
modifyFaster = stateFaster \s -> Tuple unit (f s)

Should I expect the performance of this to roughly rival the explicit state weaving I was doing before?

Adrielus · April 15, 2021, 10:09pm

I think the actual equivalent type definition is

addInc :: Number -> Int -> Tuple Int Number

, so I guess there’s an extra function application there, but idk if that matters in terms of performance.

ntwilson · April 18, 2021, 9:45pm

This doesn’t directly answer your question, but if you like the simplicity of your addIncTuple function but like the more descriptive type signature of the State monad, you can take advantage of the fact that State is just an alias for StateT s Identity and that StateT and Identity are both just newtypes. So you could write it as

addIncState :: Number -> State Int Number
addIncState val = StateT $ \state -> Identity $ Tuple (toNumber state + val) (state + 1)

I have a feeling that would also improve performance by avoiding the get and modify_ calls, but I’m not positive.

natefaubion · April 20, 2021, 3:46pm

Without an inliner, binds are not likely to be equivalent from a performance standpoint to hand-inlined code (pattern matching on tuples). They are semantically equivalent, and it’s possible that a JIT gets you close enough that it doesn’t matter, but in general I always assume some performance hit for abstraction since PureScript does not have a sophisticated optimizer.

Trequetrum · April 20, 2021, 4:07pm

You’re right on both counts. I gave it a try and the type class constraints seemed to be the bulk of the slowdown. Good to know

Adrielus · April 20, 2021, 10:39pm

I wonder if QualifiedDo + hardcoding the bind/discard for StateT in a separate module can remove the overhead