Experiment: Representing data constructors as plain arrays to reduce code size (and possibly increase performance)

utkarshkukreti · January 16, 2019, 6:13pm

I just uploaded a patch to the PureScript compiler here which changes the way constructor types are representated in the generated JS code.

Right now, the following code:

data Tuple a b = Tuple a b
data Either a b = Left a | Right b

compiles to:

var Tuple = (function () {
    function Tuple(value0, value1) {
        this.value0 = value0;
        this.value1 = value1;
    };
    Tuple.create = function (value0) {
        return function (value1) {
            return new Tuple(value0, value1);
        };
    };
    return Tuple;
})();
var Left = (function () {
    function Left(value0) {
        this.value0 = value0;
    };
    Left.create = function (value0) {
        return new Left(value0);
    };
    return Left;
})();
var Right = (function () {
    function Right(value0) {
        this.value0 = value0;
    };
    Right.create = function (value0) {
        return new Right(value0);
    };
    return Right;
})();

With my patch, it compiles to:

var Tuple = function (value0, value1) {
    return [ "Tuple", value0, value1 ];
};
var Left = function (value0) {
    return [ "Left", value0 ];
};
var Right = function (value0) {
    return [ "Right", value0 ];
};

(Q: What happened to the .create version? A: I currently generate an anonymous function every time a constructor is partially applied. This is not very common in the code I’ve analyzed.)

(Q: Why is the first element a string and not an integer or unique object? A: I’m working on it. It should reduce the code size mentioned below even more!)

Benchmark 1

I ran the patched compiler on purescript-halogen-realworld and the minified size of the bundle went from 652.59KB to 609.06KB.

Benchmark 2

I picked a package which uses sum types extensively: purescript-unordered-collections and wrote a simple program to construct a huge map 20 times:

module Main where

import Prelude

import Data.Array ((..))
import Data.Foldable (foldl)
import Data.Map as Map
import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = go 20
  where
    array = 1 .. 100000
    go 0 = pure unit
    go n = do
      let map = foldl (\acc x -> Map.insert x x acc) Map.empty array
      log $ show $ Map.size map
      go (n - 1)

The runtime of this function went from ~4.1 seconds to ~3.2 seconds with Node v11.6.0.

Trying it out

Check out my fork of the compiler: https://github.com/utkarshkukreti/purescript/tree/compact-constructor-repr-at-runtime and build it with make build.
Run stack exec bash. This will drop you into a bash instance where purs will point to the fork. Verify this with which purs. (Thanks to @LiamGoodacre on Slack for this tip!)
cd into your project and compile it normally (e.g. with pulp build).

All the compiler tests pass with my patch. I compiled purescript-halogen-realworld and tried it out and it seems to be working fine. There might still be bugs in either my patch or some FFI code that relies on the current object based representation of data constructors.

Please let me know what you all think! Is a change like this desirable as long as it causes no regressions?

JordanMartinez · January 17, 2019, 3:36am

Faster code and a smaller file size sound like good wins to me.

What regressions do you think might occur by changing to use this by default?

utkarshkukreti · January 17, 2019, 10:02am

The only one I can think of right now is if some FFI code relies on the representation of constructor type values e.g. accessing the contents of a Tuple using .value0 and value1. I believe this is an implementation detail and the FFI code is incorrect but it’s still something that will break.

This has been implemented! purescript-halogen-realworld now compiles to 599.72KB!

utkarshkukreti · January 17, 2019, 11:03am

I made some more changes:

Product Types do not store the constructor name in the returned array.

data Tuple a b = Tuple a b

Before:

var Tuple = function Tuple(value0, value1) {
  return [Tuple, value0, value1];
};

After:

var Tuple = function Tuple(value0, value1) {
  return [value0, value1];
};

Product Types with only one field are not boxed. They behave the same as newtype values.

data Length = Length Int

Before:

var Length = function Length(value0) {
  return [value0];
}

After:

var Length = function Length(value0) {
  return value0;
};

garyb · January 17, 2019, 2:46pm

We could really use a decently complex program to benchmark this kind of thing on, I don’t really have any faith that micro benchmarks like the above tell us anything useful for what effect it would have in real world situations - I’m pretty sure last time we concocted a benchmark comparing arrays with classes that classes were still quite significantly faster, it just depends what exactly the benchmark is doing.

Originally we used purescript-in-purescript to try out options, and that’s what led us to the hidden-class based data constructor representation - it was by far one of the most effective improvements we made at the time (originally it used anonymous objects).

if some FFI code relies on the representation of constructor type values

We’ve always encouraged people to avoid this and instead use FFI strategies where constructors are passed in to the FFI code, so if people are depending on a particular representation and it breaks, that’s their fault anyway.

utkarshkukreti · January 17, 2019, 2:54pm

I agree. I don’t think there are any open source projects which we can use?

My primary motivation for this change though was to reduce the compiled output size. The array representation is inspired by BuckleScript and it minifies really well compared to the hidden-class implementation.

This is probably why purescript-halogen-realworld didn’t break with this change.

garyb · January 17, 2019, 3:08pm

Yeah, and I’m a bit more concerned about performance since PS is not exactly breaking any speed records as it is.

The hidden class optimisation sure doesn’t minify well, but it does gzip very well at least!

iclanzan · January 18, 2019, 5:52pm

I really appreciate the work being done here!

Have you considered inlining the values when possible instead of calling the constructor functions?

utkarshkukreti · January 19, 2019, 8:17am

I will try it out. That’s what BuckleScript does. I don’t think there will be any measurable improvement in either the minified size or performance but I’ll try!