Remove unicode from parsing because of bundle size?

Continuing the discussion from Dear purescript-parsing users,:

On the subject of purescript-parsing:


Now that y’all mentioned unicode, I wonder: is the dependency on the unicode package really necessary? I’m pretty sure I had to drop this lib from a previous project because of the insane bundle sizes (ended up FFI-ing to nearley).


I agree that purescript-unicode leads to excessive bundle sizes. I don’t think there’s a reason that it couldn’t be trimmed significantly, but it would require someone making a dedicated effort to help with that.

I have questions about this. Doesn’t everyone, in practice, run some kind of “dead-code elimination,” “tree-shaking,” or “minification” when they bundle their code? And doesn’t that remove all of the parts of the unicode package that are not used?

I could remove the unicode dependency from Parsing.String.Basic pretty easily and just explain how to combine, e.g., isLetter with satisfyCodePoint. But is it worth it? Would that actually, in practice, improve bundle sizes?
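For reference, here is a minimal sketch of what that combination could look like on the user's side. It assumes a recent purescript-parsing (where satisfyCodePoint lives in Parsing.String) and purescript-unicode (where isLetter lives in Data.CodePoint.Unicode). The module name and the name letterCodePoint are made up for illustration, and this is not necessarily how Parsing.String.Basic defines its own parsers.

```purescript
module Example.Letter where

import Data.CodePoint.Unicode (isLetter)
import Data.String.CodePoints (CodePoint)
import Parsing (Parser)
import Parsing.String (satisfyCodePoint)

-- Hypothetical user-defined parser: match one code point that
-- purescript-unicode classifies as a letter.
-- The unicode import now lives in user code, not in the parsing library.
letterCodePoint :: Parser String CodePoint
letterCodePoint = satisfyCodePoint isLetter
```

With this approach, only projects that write a parser like this would import Data.CodePoint.Unicode themselves. Whether that makes a measurable difference to the bundle still depends on how well the bundler eliminates unused code.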

The main issue with Unicode is the lookup tables. Realistically, calling anything in purescript-unicode is going to incur a dependency on these lookup tables. One could certainly avoid these particular parsing functions, and tree shaking should then keep unicode out of the bundle. It would be worth testing how well that works in practice.
