What is the state of the art in minification/compression?

What are you all using to compress the generated JavaScript?

I am currently generating my final code using this combination:

psc-package build
purs bundle 'output/**/*.js' -m Main --main Main -o app.js

The output size is very large: 471K.

Gzip can bring this down for bandwidth purposes, but the uncompressed size still places a burden on the client (especially low-end mobile phones), so the non-gzipped size of the JS is also a concern.

In the past, on Fay and GHCJS, I would throw JS output like this at the Closure Compiler or UglifyJS, but you have to prepare the JS output to cater well to Closure Compiler; UglifyJS brings it down to about 202K.

So, what are y’all doing to bring down those numbers? (Besides gzip and pray.)

I’m looking at zephyr, webpack and rollup. What are your experiences with these? webpack and rollup are widely used, so perhaps they can be considered the modern counterparts of Closure Compiler and UglifyJS. But I’m concerned about the reliability of something like zephyr, which has a small community of users; how do I know it won’t break my code in subtle ways? Experiences?

Coming from a Haskell background, I prefer Haskell binaries that I can pin down, and I shy away from installing Node; if a Node app is the only option, I’d definitely use Docker as a hazmat suit to protect against the moving parts of the Node.js ecosystem.

Interested to hear any opinions and experiences.

2 Likes

I can’t give you much personal experience but I can point you to the Spago readme (https://github.com/purescript/spago#make-a-project-with-purescript--javascript). It covers packaging with both Parcel and Webpack (and less relevantly, Nodemon) so maybe check those out.

Great question. I’ve been wondering what we can do to get closer to Elm’s bundle sizes.

Edit: Looks pretty good now. Down to only a 2x gap with Halogen, parcel2, and zephyr.

| Language / Framework | Optimized (KB) | Gzipped (KB) |
| --- | --- | --- |
| elm | 28 | 10 |
| purescript react-basic-hooks | 134 | 44 |
| purescript halogen v5 | 113 | 24 |

repo

Edit2: Sizes with just spago bundle-app and parcel2 are not much different: +1 KB for react, +2 KB for halogen.

| Framework | spago bundle-app (KB) | parcel(2) build (KB) | gzip (KB) |
| --- | --- | --- | --- |
| purescript react-basic-hooks | 26 (PS only) | 139 | 45 |
| purescript halogen v5 | 316 | 124 | 26 |

Interestingly, compressed output with Parcel 1 is slightly better for react and slightly worse for halogen, but the difference does not seem significant enough to justify swapping over to Parcel 2:

| Framework | spago bundle-app (KB) | parcel(1) build (KB) | gzip (KB) |
| --- | --- | --- | --- |
| purescript react-basic-hooks | 26 (PS only) | 140 | 44 |
| purescript halogen v5 | 316 | 126 | 27 |

Original post:


Here’s what I’ve observed for equivalent apps:

| Language / Framework | Optimized (KB) | Gzipped (KB) |
| --- | --- | --- |
| elm | 28 | 10 |
| purescript react-basic-hooks | 348 | 81 |
| purescript halogen v5 | 680 | 127 |

Here’s the repo with the three apps.

@kevinbarabash and @chrisdone proposed some further minimization techniques in this issue.

Trying those out is on my todo list, but anyone else should feel free to give it a shot on the above example and see if we can get some better numbers.

4 Likes

I think the real optimizations will occur when type class constraints are optimized within the compiler. There’s an issue on the compiler repo that shows the before and after we would want to achieve, but I can’t recall what it is off the top of my head.

5 Likes

@kl0tl was able to achieve small bundles with rollup and ES modules.
It looks like this feature might make it into 0.15.

Is this the size of the purs bundle output? If so, you should minify the bundle with UglifyJS, Terser or the Closure Compiler. You should get the best results by bundling with rollup and its PureScript and Terser plugins, though (after optimizing the compiler output with zephyr).
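For reference, here's a minimal sketch of that minification step using Terser's Node API (assuming Terser is installed from npm; recent Terser versions expose an async minify, and the CLI equivalent is roughly terser app.js -c -m -o app.min.js):

// minify.js: a sketch only. Feed the purs bundle through Terser's
// compress and mangle passes, then write the result back out.
const { minify } = require("terser");
const fs = require("fs");

minify(fs.readFileSync("app.js", "utf8"), { compress: true, mangle: true })
  .then((result) => fs.writeFileSync("app.min.js", result.code));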

ES modules are nice for shaving off even more bytes, but zephyr really does most of the work; see https://github.com/purescript/purescript/pull/3791#issuecomment-615883448.
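For what it's worth, the rollup setup I have in mind looks roughly like this (a sketch assuming Pauan's rollup-plugin-purs, which comes up later in this thread, plus rollup-plugin-terser; plugin options may differ between versions, and zephyr would be run over the compiler output beforehand):

// rollup.config.js (sketch only; plugin APIs may have changed)
import purs from "rollup-plugin-purs";
import { terser } from "rollup-plugin-terser";

export default {
  input: "src/Main.purs",
  output: { file: "dist/bundle.js", format: "iife", sourcemap: true },
  plugins: [
    purs(),  // resolves PureScript modules from the (zephyr-trimmed) compiler output
    terser() // minifies the final bundle
  ]
};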

I recently tried bundling a project with zephyr and parcel (it had been using spago and parcel only), and the bundle size went from 600 KB to 650 KB, so I don’t know what to make of that.

I’d be curious to see an example of that.

I was able to achieve smaller bundles with parcel2 and zephyr and updated my original post with the latest results.

I haven’t tested this bundling strategy with the ES modules change, but I’m mostly looking for workflows that we can recommend to users who aren’t up for rebuilding purs from source. I’ll revisit this at 0.15 to see if the numbers improve.

1 Like

I’ve noticed that the code generated for showing records inlines a big chunk of JavaScript for each call to show, e.g.

log $ show results

is turned into this:

return Control_Bind.discard(Control_Bind.discardUnit)(Effect_Aff.bindAff)(Effect_Class_Console.log(Effect_Aff.monadEffectAff)(Data_Show.show(Data_Show.showArray(Data_Show.showArray(Data_Show.showRecord()(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "functionName";
}))(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "isBlockCoverage";
}))(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "ranges";
}))(Data_Show.showRecordFieldsNil)(Data_Show.showArray(Data_Show.showRecord()(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "count";
}))(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "endOffset";
}))(Data_Show.showRecordFieldsCons(new Data_Symbol.IsSymbol(function () {
    return "startOffset";
}))(Data_Show.showRecordFieldsNil)(Data_Show.showInt))(Data_Show.showInt))(Data_Show.showInt)))))(Data_Show.showBoolean))(Data_Show.showString)))))(results)))

and is then repeated for each occurrence of log $ show results in the source file.

Are any of the minification methods mentioned in this thread able to extract code that’s duplicated like this into a helper function? If not, is this something that would make sense to add to the compiler’s code generation?
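To make the question concrete, here's a toy sketch in plain JS (hand-written, not compiler or tool output) of the kind of extraction I mean:

// Toy "dictionaries" standing in for Data_Show.showInt, showArray, etc.
var showInt = { show: function (n) { return String(n); } };
var showArray = function (d) {
  return { show: function (xs) { return "[" + xs.map(d.show).join(",") + "]"; } };
};

// Before: the dictionary expression is rebuilt at every call site.
console.log(showArray(showArray(showInt)).show([[1, 2], [3]]));
console.log(showArray(showArray(showInt)).show([[4]]));

// After: build it once and reuse it at each call site.
var showNested = showArray(showArray(showInt));
console.log(showNested.show([[1, 2], [3]]));
console.log(showNested.show([[4]]));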

The compiler could definitely be smarter about specialisation and type class code generation in general, yes. That’s something we’ve been wanting to do (or at least vaguely thinking about doing) for quite a while.

1 Like

Stefan Fehrenbach wrote a detailed summary of an attempt to use Google Closure Compiler on PureScript’s output:

His conclusion is that PS would need to be tweaked to annotate constructors as such, and that inconsistent use of record fields as r.f versus r["f"] confuses Closure and would need to be ironed out.
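Roughly, the record-field problem looks like this (an illustrative sketch of Closure's ADVANCED-mode behaviour, not an excerpt from the article):

var r = { field: 1 };
console.log(r.field);    // dotted access: ADVANCED mode may rename "field"
console.log(r["field"]); // quoted access: string keys are never renamed, so once the
                         // dotted form has been renamed the two no longer line up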

I agree with his conclusion:

I don’t think it would be too hard to either change PureScript itself, or write an alternative backend, to produce Closure-compatible code. I still think the easy way to better performance is through using Closure, rather than reimplementing optimizations in the PureScript compiler.

And it’s likely I will lean towards that.

Everything else I’ve tried is woefully insufficient. Zephyr did nothing for my code; it only made it smaller by about 1 KB. Closure and UglifyJS can only do some name mangling and whitespace removal at this point, which a gzip compressor is better at anyway. The key optimizations are removing dead code (which purs bundle is fine at) and removing curried functions. Phil is not interested in adding uncurrying to the compiler, so I’ll have to add it to a fork. This is a step that Closure will not do (because currying is not normal in JS), but it’s a huge saving in size and performance, and removing the currying permits Closure to do better work.

I’ve done this before on Fay, which was pretty much the same as PS: curried, plus a thunk forcer for laziness. Give Closure the right hints and it can inline things away.

Watch this:

(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d,v){
    console.log(d.show(v));
  }

  function greet(d,v){
    return print(d,v + "!");
  }

  return greet(Show_string, "Mary");
})();

Here’s what Google Closure outputs:

console.log("Mary!");

Type class instance dictionaries removed!

However, stick currying in,

(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d) {
    return function(v){
      console.log(d.show(v));
    }
  }

  function greet(d){
    return function(v){
      return print(d)(v + "!");
    }
  }

  return greet(Show_string)("Mary");
})();

and here’s what you get:

(function() {
  function c(a) {
    return function(b) {
      console.log(a.show(b));
    };
  }
  return function(a) {
    return function(b) {
      return c(a)(b + "!");
    };
  }({show:function(a) {
    return a;
  }})("Mary");
})();

In other words, currying blocks Closure from doing its job. This is the one thing Closure is bad at. However, it is excellent and stable at everything else.

All we have to do is walk the tree and find:

  1. Definitions of functions with chains of function(x){ return function(y) { ... and call that “arity_2”.
  2. Check whether this function is saturated (called) with 2 arguments anywhere. If so, spit out the arity_2 version of it alongside the original. Applied to the example above, that gives:
(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d) {
    return function(v){
      console.log(d.show(v));
    }
  }

  function greet(d){
    return function(v){
      return print(d)(v + "!");
    }
  }

  function print_2(d,v){
    console.log(d.show(v));
  }

  function greet_2(d,v){
    return print_2(d,v + "!");
  }

  return greet_2(Show_string, "Mary");
})();

Then Closure produces the console.log("Mary!"); we desire and drops the unused, unsaturated curried versions.

Writing this up because I don’t have time to do this now, but I may have time to do it later. I’ll need this comment to warm up my cache next time.
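To make step 1 a little more concrete for next time, here's a rough sketch of detecting curried arity over an ESTree AST (assuming acorn for parsing; illustrative only, not a finished tool):

// detect-arity.js (sketch): count how deep a chain of
// function (x) { return function (y) { ... } } goes.
const acorn = require("acorn");

function curriedArity(fn) {
  let arity = 0;
  let node = fn;
  while (node &&
         (node.type === "FunctionExpression" || node.type === "FunctionDeclaration")) {
    arity += 1;
    const body = node.body.body;
    // Only follow a body that is exactly `return function (...) { ... }`.
    node = (body.length === 1 &&
            body[0].type === "ReturnStatement" &&
            body[0].argument &&
            body[0].argument.type === "FunctionExpression")
      ? body[0].argument
      : null;
  }
  return arity;
}

const ast = acorn.parse(
  "function greet(d){ return function(v){ return print(d)(v + '!'); }; }",
  { ecmaVersion: 2020 }
);
console.log(curriedArity(ast.body[0])); // 2, so emit greet_2(d, v) wherever calls are saturated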

9 Likes

I thought I’d query the Closure Compiler GitHub repo about this, in case this is something that can be pushed down the pipeline instead of touching PureScript. I’m doubtful that a maintainer would add this just to satisfy our little community, but it’s worth asking.

2 Likes

I think that exact strategy is already implemented in https://github.com/Pauan/rollup-plugin-purs. My main worry with an optimization like this, though, is that I think it could mess with evaluation order in a way that could make some code slower, or even blow the stack, where it previously wouldn’t have done so. I’m not certain that this would be a problem, but I think we do need to be relatively certain that it won’t be a problem before we can endorse that approach in any way (eg. by adjusting the compiler output to help Closure do better optimizations).

3 Likes

I think that exact strategy is already implemented in https://github.com/Pauan/rollup-plugin-purs.

Not to be judgy about anyone’s work, but I had looked at that one and deemed it an experiment only: the source is full of TODOs and questions, dates from 2017, and is written in JS. Not to say that JS code is all bad, but I usually trust JS when, say, 1k users have been using it for a few years. I’m not motivated to maintain that myself.

My main worry with an optimization like this, though, is that I think it could mess with evaluation order in a way that could make some code slower, or even blow the stack, where it previously wouldn’t have done so. I’m not certain that this would be a problem, but I think we do need to be relatively certain that it won’t be a problem before we can endorse that approach in any way (eg. by adjusting the compiler output to help Closure do better optimizations).

I agree with the caution; that’s worth being careful about.

I think the obvious example is use of return function(){ .. } to delay evaluation for functions that run in Effect. On the other hand, most of these are defined in foreign modules, and as such could be annotated (/* don’t unwrap this function(){} */) or ignored wholesale.

My approach, if I had the time, would be to keep it additive: don’t change definitions; add new definitions specialized to different arities, and update call sites where they’re saturated with N args.

Other limits, like counting consecutive lambdas rather than going by the type, can avoid pitfalls: so \x y z -> if x then (\a -> a) else (\a -> a) has an arity of 3, not 4.
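For illustration, the JS shape of that example (hand-written, not actual compiler output); counting only the consecutive outer lambdas gives 3, because the inner lambdas sit in result position rather than continuing the chain:

var f = function (x) {
  return function (y) {
    return function (z) {
      return x ? function (a) { return a; } : function (a) { return a; };
    };
  };
};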

I would note, on the topic of blowing the stack, that the current way PS outputs code gives me no reason to think I’m invulnerable to hitting stack limits with just normally written code. This worries me because I don’t have any “metrics” at the moment regarding this.

EDIT: Worth adding: if anyone knows of subtle gotchas, please point them out. I think this optimization is worth trudging through the caveats for.

3 Likes

Just as a reference, there’s a recent paper on optimising curried functions into uncurried: Kinds are calling conventions.

I’ve only skimmed over the paper, and I don’t expect we’d take the path they did to implement it, but I think it may have some useful insights into past solutions and issues with things like evaluation order (only a hunch that it’s useful to us).

6 Likes

the current way PS outputs code gives me no reason to think I’m invulnerable to hitting stack limits with just normally written code

Yes, this is definitely true - you do have to worry about your stack usage when writing PS code every now and again. To the extent that it’s reasonably possible, we try to handle that concern in libraries so that library consumers don’t need to worry about it, but this isn’t always possible. To clarify though, my concern is specifically to do with code which is currently stack-safe and could end up being transformed under this proposed approach so that it would no longer be stack-safe. https://github.com/purescript/purescript/pull/3439 is one example of how this sort of thing can happen.

1 Like

A Closure Compiler maintainer said they’re going to discuss it, which is cool. I’m not hopeful, as it’s a niche optimization. But if a dev gets inspired, it might be fruitful.

11 Likes

The Google Closure team have responded that this will not be implemented as a transformation. Their reasoning, that this doesn’t affect most of their user base and therefore isn’t worth the cost of implementing, is reasonable.

So it looks like this returns to the PureScript community. I’m imagining a thin rewrite pass that runs before Closure, but your ideas may differ. I don’t have time to work on such an idea at present.

3 Likes