CoreFn + purescript-backend-optimizer as JS backend for other languages

rolyp · September 27, 2022, 4:16pm

Hi,

[Mainly intended for Nate Faubion, but I’d be interested in others’ thoughts too :]

First off, congratulations on a great piece of work. I got ~25% performance improvement out of the box with purescript-backend-optimizer, without directives.

I’m super-excited about the possibility of using your backend as a JS compiler for my own research language (currently implemented in PureScript), and I was interested in your thoughts. My language is a plain untyped functional language, with the interesting bit (dependency-tracking) happening in the interpreter. (GitHub here, some background here.) My plan is to do the dependency-tracking via a source-to-source translation rather than a specialised interpreter, which I can then compose with a translation to JavaScript for performance. My hope is that I can just generate CoreFn and have backend-optimizer do the rest!

Does this sound plausible, and a valid/interesting application of your backend? Any gotchas I might need to be aware of?

Couple of minor questions about purescript-backend-optimizer and the JS encoding:

I had some JavaScript FFI code that made some assumptions about the structure of PureScript objects – in particular that fields of data values were called value0, value1, etc. Fixing it was an easy search-and-replace to change these to _1, _2, but I was wondering if there was a more portable way of writing such FFI code in the first place? (Given that these new names are not necessarily going to be stable over time either.)
It looks like you generate fields called _2, _3, etc. if any constructor of the datatype has that many arguments. Is that necessary, or could you just generate the fields appropriate to the tag? (Sorry if I’m misunderstanding. I realise it’s harmless as is.)

many thanks,
Roly

natefaubion · September 27, 2022, 9:18pm

I think that CoreFn makes sense if your language follows the semantics listed in the README:

If your untyped functional language is not pure, then you should not use backend-optimizer as it will probably trash your code. If you can encode effectfulness through foreign code, like we do in CoreFn, then you can provide semantics to lift it into the backend-optimizer machinery.

https://github.com/aristanetworks/purescript-backend-optimizer/blob/main/src/PureScript/Backend/Optimizer/Semantics/Foreign.purs#L72

          , data_ord_ordString
          , data_ring_intSub
          , data_ring_numSub
          , data_semigroup_concatArray
          , data_semigroup_concatString
          , data_semiring_intAdd
          , data_semiring_intMul
          , data_semiring_numAdd
          , data_semiring_numMul
          , data_string_codePoints_toCodePointArray
          , effect_bindE
          , effect_pureE
          , effect_ref_modify
          , effect_ref_new
          , effect_ref_read
          , effect_ref_write
          , partial_unsafe_unsafePartial
          , record_builder_copyRecord
          , record_builder_unsafeDelete
          , record_builder_unsafeInsert
          , record_builder_unsafeModify
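
To make "encoding effectfulness through foreign code" concrete, here is a minimal sketch of the thunk encoding that PureScript's Effect uses in JavaScript. The names pureE and bindE mirror entries in the excerpt above; log, output and program are hypothetical names used only for illustration, and this is not the optimizer's own code.

```javascript
// An effect is a zero-argument function (a thunk); nothing happens until it is called.
const pureE = (a) => () => a;
const bindE = (eff) => (f) => () => f(eff())();

// Hypothetical effectful primitive: records a message when run.
const output = [];
const log = (msg) => () => { output.push(msg); };

// Building the program performs no effects, so pure code can be
// rearranged around it without changing what is observed...
const program = bindE(log("first"))(() =>
  bindE(log("second"))(() => pureE(42)));
const effectsBeforeRun = output.length;

// ...the effects only happen, in order, when the thunk is finally run.
const result = program();
```

Because sequencing lives entirely inside bindE, a backend only needs to know the semantics of these foreign functions to keep effects in order.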

I had some JavaScript FFI code that made some assumptions about the structure of PureScript objects – in particular that fields of data values were called value0, value1, etc. Fixing it was an easy search-and-replace to change these to _1, _2, but I was wondering if there was a more portable way of writing such FFI code in the first place? (Given that these new names are not necessarily going to be stable over time either.)

I would recommend never relying on data constructor or type class representations in FFI, as the compiler should be free to optimize the representation. Instead, I would pass eliminators/accessors into the FFI as needed to extract fields or convert the value to a foreign representation, so that you are the one providing the stability guarantee.
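
A rough sketch of what that could look like on the JavaScript side. All names here (sumPairFragile, sumPairImpl, legacyPair, optimizedPair) are hypothetical, and the object layouts are only stand-ins for "some backend's representation"; in real code the accessors would be ordinary PureScript functions passed in by the caller.

```javascript
// Fragile: hard-codes one backend's field names.
const sumPairFragile = (pair) => pair.value0 + pair.value1;

// Portable: the PureScript caller supplies the accessors, so the FFI
// code never looks inside the constructor itself.
const sumPairImpl = (fst) => (snd) => (pair) => fst(pair) + snd(pair);

// Two assumed representations of the same value `Pair 1 2`.
const legacyPair = { value0: 1, value1: 2 };
const optimizedPair = { tag: "Pair", _1: 1, _2: 2 };

// Stand-ins for accessors compiled by each backend.
const viaLegacy = sumPairImpl((p) => p.value0)((p) => p.value1)(legacyPair);
const viaOptimized = sumPairImpl((p) => p._1)((p) => p._2)(optimizedPair);

// The fragile version silently breaks when the representation changes.
const fragileOnOptimized = sumPairFragile(optimizedPair);
```

The accessor-passing version gives the same answer under both layouts, while the hard-coded version reads two missing fields and yields NaN.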

It looks like you generate fields called _2, _3, etc. if any constructor of the datatype has that many arguments. Is that necessary, or could you just generate the fields appropriate to the tag? (Sorry if I’m misunderstanding. I realise it’s harmless as is.)

Data constructor padding ensures monomorphic dispatch in JIT inline caches. The data constructor representation is a large part of the runtime improvement. Without the padding, you either need extra boxing and indirection (memory bloat), or you risk degrading property lookup performance if you have a variety of constructor sizes in a tagged union, which directly affects tag dispatch. JIT engines will keep a polymorphic inline cache of around 4 “shapes”, and if your tagged union has more shapes than fit in the polymorphic inline cache, it will degrade to megamorphic dispatch (which is essentially hashtable lookups based on the property name). Any case on these data constructors will then be quite slow. Therefore, padding constructors so they always have the same fields in the same order ensures the JIT always sees a single shape, and thus can use optimal monomorphic dispatch for data constructor fields.

If you were to remove the padded arguments, it would continue to work, but you would see variability in performance.
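
As an illustration only (not the optimizer's actual output), here is a sketch for a hypothetical type `data T = A | B x | C x y`. The field names _1 and _2 come from this thread; the tag field name, the use of undefined as the padding value, and the shapeOf helper are assumptions. The ordered key list is used as a rough proxy for the engine's hidden shape, which is internal and cannot be observed directly.

```javascript
// Padded: every constructor carries the same fields in the same order.
const paddedA = () => ({ tag: "A", _1: undefined, _2: undefined });
const paddedB = (x) => ({ tag: "B", _1: x, _2: undefined });
const paddedC = (x, y) => ({ tag: "C", _1: x, _2: y });

// Unpadded: each constructor only carries the fields it needs.
const bareA = () => ({ tag: "A" });
const bareB = (x) => ({ tag: "B", _1: x });
const bareC = (x, y) => ({ tag: "C", _1: x, _2: y });

// Proxy for an object's shape: its property names, in insertion order.
const shapeOf = (obj) => Object.keys(obj).join(",");

const paddedShapes = new Set([paddedA(), paddedB(1), paddedC(1, 2)].map(shapeOf));
const bareShapes = new Set([bareA(), bareB(1), bareC(1, 2)].map(shapeOf));

// A `case` compiles to a read of `tag`; with padding, that read site only
// ever sees one layout, whatever constructor it is given.
const tagOf = (v) => v.tag;
```

Here paddedShapes has a single entry while bareShapes has three, one per arity. A union with more distinct arities than the inline cache holds is where the unpadded form would be expected to go megamorphic.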

rolyp · September 29, 2022, 7:36am

Great. My language is pure so far, so when I come to think about effects, I’ll follow the Foreign pattern. And thanks for the suggestion re. passing in accessors/eliminators to FFI code; that sounds sensible.

I’m not sure I understand what the treatment of non-totality means in practice. I have a convention where I bomb out with unsafePerformEffect (throw msg) in “absurd” situations, i.e. points in the execution that, if reached, mean there is a bug in my application. So perhaps the upshot is that programs that exhibit such behaviours may behave differently (terminate earlier or later) with purescript-backend-optimizer. (If so, that sounds fine, as these are all programs that already have undefined behaviour as far as my application is concerned.)

thanks again
Roly

artemisSystem · October 4, 2022, 8:46am

Your convention of crashing on unreachable code is not uncommon; in fact, there is a function specifically for that in core: Partial.Unsafe - purescript-partial - Pursuit