Boosting performance of Map - HashMap, ST, or FFI


gregberns [10:39 PM]
hey I’m doing a PS POC to prove out performance compared to C#/raw JS, and having some pretty significant performance issues with Map, specifically when trying to insert 1 million times into a Map. I’m assuming the creation of new arrays each time is creating the issue. Are there any optimized libs that are faster? ST performance is OK, and I might be able to do some FFI work but not excited about it.
Any thoughts on other structures…. really anything to get performance times up??

John Mendonça [10:42 PM]
I know there is but i dont know about its performance, etc

gregberns [10:43 PM]
I was looking at that and I’ll go that direction if I need to, but it seems like some of my type support disappears.

monoidmusician [10:46 PM]
it has the same amount of type safety, it’s just a different implementation of Map String, essentially
it has a nice mutable API through ST, and obviously quick lookups through the native JS operation, but it scales worse at other operations (immutable insertion is O(n) I believe, and there’s no sharing done – but I suspect you don’t care about that)

natefaubion [10:50 PM]
Are you just doing a benchmark, or do you really need a million items in a Map?

gregberns [10:51 PM]
hmmm… I can transform my record to Array (Tuple String Record) which gives me my keys and records. Then convert that to the Object. I’m kinda hesitant (probably cause I havent worked with it) just because it might look challenging to others coming in
@natefaubion right now its a POC doing benchmarking. but the organization is totally new to PS and so were a little hesitant to need to do anything too drastic or difficult to understand to get performance gains. After 2 days of perf work, were at 2 minutes compared to 7-10 seconds with C# and it looks like its all about insertion into Map

natefaubion [10:54 PM]
I don’t think you can compare JS objects with C# let alone PS
They are in totally different leagues
If your usage is predicated on insertion of millions of items into a Map, you are using the wrong platform :stuck_out_tongue:

gregberns [10:55 PM]
haha well… yea I get that, but… with raw JS 1M insertions into an object vs a C# dict is about the same

natefaubion [10:56 PM]
I would use ST then
which is going to be roughly the same as the JS version that mutates an object
But make sure you use unsafeFreeze

gregberns [10:56 PM]
We used ST and got some gains, but its definitely still not there

natefaubion [10:56 PM]
Then use JS. That’s why the FFI exists :slightly_smiling_face:
You should not expect PS to be as fast as raw JS
Map is an immutable tree data structure, so GC churn for a million items is going to be insane

gregberns [10:57 PM]
please dont get me wrong, I do get that, just struggling to figure out a better solution.
haha yea!

natefaubion [10:58 PM]
How are you using ST?
vs the JS solution?
I think if you are ever in the situation where you need to insert a million items as fast as possible, you should use JS. I would be very surprised if that is the case, and in that case I would question why you want to do it in PS with a JS backend :smile:

gregberns [11:00 PM]
we’ve been looking at using the FFI and I think I have something decent for array, but once the insertions are complete I’d really like to return a Map instead of our own object

natefaubion [11:00 PM]
There is no way you can build a Map with a million items in reasonable time

gregberns [11:01 PM]
We are totally open to Haskell, but… record syntax.

natefaubion [11:01 PM]
For a million items there is probably a better data structure than PS Map
You could try the HashMap implementation

gregberns [11:02 PM]
For the POC there are two parts: first build an initial Map, which is just insertions based on Key. The second part is to do a lookup, modify the object, and insert it back in.

natefaubion [11:02 PM]
But anything immutable is going to be rough
for initial construction
If you want fast initial construction you absolutely need a data structure that supports an ST API

gregberns [11:04 PM]
I didnt know about HashMap. Any idea if its lookup perf will be slower?

natefaubion [11:04 PM]
Map does not support that. Building a Map from an Array of Tuples is no different than just calling a bunch of inserts

gregberns [11:04 PM]
(I will definitely try it)

natefaubion [11:05 PM]

gregberns [11:05 PM]

Building a Map from an Array of Tuples is no different than just calling a bunch of inserts

Yea that’s what I realized after looking at the source code :slightly_smiling_face:

natefaubion [11:06 PM]
However if they are already ordered, I would just use binary search on the array
I think someone wrote an API for that

gregberns [11:07 PM]
Thats an interesting idea as well
Here’s the ST test we wrote (note: I knew nothing about ST 2 days ago)

test :: Effect Unit
test = do
    inner :: forall r. Array Int -> ST r (Object Int)
    inner arr = do
      map' <- new :: ST r (STObject r Int)
      _ <- foreach arr (\i -> poke (show i) (i * 10) map' *> pure unit)

      freezeST map'

  let arr = range 1 1000000
  log "start"
  t0 <- now
  let a = run (inner arr)
  t1 <- now
  log $ "End " <> show (t1 - t0)
  log ""

natefaubion [11:10 PM]
You should use unsafeFreeze
freezeST will copy the object
which will just build it all over again

gregberns [11:10 PM]
I was wondering about that…

natefaubion [11:11 PM]
you should also use do syntax instead of *> if you are really serious about eking out performance

natefaubion [11:11 PM]
since in 0.12.1 it’s optimized to statements

gregberns [11:11 PM]
… what do I need to change to do that??

natefaubion [11:11 PM]

foreach arr \i -> do
  _ <- poke ...
  pure unit

magic-do does not optimize applicative operators
it probably should

gregberns [11:13 PM]
ah alright I’ll give that a try too

natefaubion [11:14 PM]
I would expect that to still be a good bit slower though, because it will still allocate thunks for the effects
Unfortunately there’s no EffectFn analog for ST

gregberns [11:16 PM]
I think that test took something like 7 seconds vs another similar example that was at 14 sec (w/o ST), so we did cut it in half. I’ll re-run the test with those two changes.

Unfortunately there’s no EffectFn analog for ST

yea that’s what we were wondering, if there could be a way to do all the mutations in an Effect.
Oh here was another question I had. If we do just use ST, is there any way to get away from using STObject/ get away from returning a generic Object and having the rest of the system work on that?

natefaubion [11:22 PM]
ST is just an FFI type, you can do anything in ST if you write FFI to do it

gregberns [11:22 PM]
ah, ok that makes sense

natefaubion [11:23 PM]
That HashMap implementation supposedly has an efficient implementation of fromArray
that uses a mutable API
I can’t speak for the library though
its backed by a lot of JS so hard to verify :smile:

gregberns [11:29 PM]
gottcha, alright I’ll take a look at that instead. I cant think of any reason to dislike HashMap at this point.
Thanks for all your help. We’ve been working the last 2 weeks in PS and its the deepest dive into a Haskell-like lang and I am loving it and am about willing to do anything to stick with it. Love the “it compiles, it works”… coming from JS and C#, my life is just better :slightly_smiling_face:

natefaubion [11:31 PM]
I would encourage you to stick with it, but be aware that is a relatively naive language, runtime-wise (at least the primary JS backend)
It gives you a few tools, but it is not really optimized at all

gregberns [11:36 PM]
Were doing all we can to prove it out. The CIO is concerned with the performance of Node vs C#, and I think we can decently prove the Node/C# is generally neglagible. We just need to mitigate any serious concerns about PS’s performance. I dont think we need to be equal, but I think were 20 times slower now. We’re just trying to get that down as much as possible… and not use too many ugly tricks (ST).
(ugly tricks: hard for noobs to understand)

gregberns [12:12 AM]
@natefaubion the ST changes you suggested halved the time.
– 727.0 w/ unsafeFreeze and do
– 984.0 w/ freeze and do
– 1574.0 w/ freeze and NOT do
– 1298.0 w/ unsafeFreeze and NOT do
The HashMap impl is worse, but the implementation ‘looks’ very clean and readable
– 2936.0 w/ HashMap

gregberns [12:26 AM]
HashMap just 3x it with our process (1m entries)
Insertion with Map: 10138.0
Insertions with HashMap: 2857.0