Feature request: String interpolation

natefaubion · December 20, 2019, 4:02am

I’m not opposed to string interpolation syntax, but it’s a surprisingly complicated feature:

We have almost no new syntax to give it aside from an escape code in “normal” string literals. Backticks are not an option since we use that for infix expressions. I’m going to assume \{ ... } escape syntax (or something equivalent like \${ ... } or whatever sigil you choose).
The issue always comes down to how you lex and parse strings then.
String interpolation must emit a series of delimiter tokens. It’s no longer a single token, but you must have things like TokStringStart, TokStringMid, TokStringEnd to represent the different boundaries because they can contain arbitrary expressions.
We already use }, ], and ) for other delimiters, so this will require a stateful, context-sensitive lexer. It must know that it has emitted a string “start” or “mid” token in order to decide how it should lex the delimiter.
Parsing literals is no longer a matter of casing on a single token, but now must consume an unbounded number of tokens.

All of these are surmountable of course, but are also very non-trivial.
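To make the token-level point concrete, here is a rough sketch of what the token stream for an interpolated literal might look like. Only the TokStringStart, TokStringMid, and TokStringEnd names come from the post above; every other constructor, the LexMode type, and the module name are hypothetical and are not the actual PureScript lexer.

```purescript
module InterpTokens where

-- Hypothetical token type. Only the three TokString{Start,Mid,End} names
-- appear in the discussion; the rest exist to make the example complete.
data Token
  = TokString String       -- ordinary literal with no interpolation
  | TokStringStart String  -- text from the opening quote up to the first \{
  | TokStringMid String    -- text between a closing } and the next \{
  | TokStringEnd String    -- text from the last closing } to the closing quote
  | TokIdent String
  | TokOperator String
  | TokLeftBrace
  | TokRightBrace

-- Hypothetical lexer state. When a } is seen, the lexer needs this to decide
-- whether it closes a record/block brace or resumes the string literal.
data LexMode
  = InCode
  | InInterpolation Int -- number of ordinary braces currently open in the hole

-- Assumed token stream for the literal "a\{x}b\{y <> z}c"
example :: Array Token
example =
  [ TokStringStart "a"
  , TokIdent "x"
  , TokStringMid "b"
  , TokIdent "y"
  , TokOperator "<>"
  , TokIdent "z"
  , TokStringEnd "c"
  ]
```

Nested interpolation (a string inside a hole inside a string) would presumably need a stack of such modes rather than a single value.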

I’m going to throw out an alternative using typeclasses (and instance chains!):

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp a = a
else instance interpFunction :: Interp a => Interp (String -> a) where
  interp a b = interp (a <> b)
else instance interpShow :: (Show b, Interp a) => Interp (b -> a) where
  interp a b = interp (a <> show b)

i = interp

test = i "foo" 42 "bar" true "baz"

There are a few things to note about this:

It requires no language changes.
You can write any interpolation function you want this way (for example, trimming and separating by spaces).
You can add directives in the middle of interpolation (it can be an extensible DSL).
It’s the same number of characters typed as “normal” interpolation.
With a very simple inliner it will compile to the code you’d write by hand.
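As an illustration of the "any interpolation function you want" point, here is a minimal sketch of a variant that trims string pieces and separates all pieces with single spaces. The names Spaced, spacedFrom, joinWords, and spaced are made up for this example and do not come from any library; trim is from Data.String in purescript-strings.

```purescript
module SpacedInterp where

import Prelude
import Data.String (trim)

-- Hypothetical class, mirroring Interp but joining pieces with a space.
class Spaced a where
  spacedFrom :: String -> a

instance spacedString :: Spaced String where
  spacedFrom acc = acc
else instance spacedStringArg :: Spaced a => Spaced (String -> a) where
  spacedFrom acc s = spacedFrom (joinWords acc (trim s))
else instance spacedShowArg :: (Show b, Spaced a) => Spaced (b -> a) where
  spacedFrom acc b = spacedFrom (joinWords acc (show b))

-- Join two pieces with one space, skipping empty pieces.
joinWords :: String -> String -> String
joinWords "" b = b
joinWords a "" = a
joinWords a b = a <> " " <> b

spaced :: forall a. Spaced a => a
spaced = spacedFrom ""

example :: String
example = spaced "  total:" 42 "items  "
-- "total: 42 items"
```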

i-am-the-slime · December 20, 2019, 10:04am

I love it. I am convinced.
I’m not sure about using Show for anything but basic types but that is now up to the implementor of the function you showed above.

paluh · December 22, 2019, 11:06am

@natefaubion Amazing!

Maybe a custom ~~Format~~ Display class could be packaged up and used instead of Show, with default instances for basic types and some more configurable newtypes too?
@i-am-the-slime, @natefaubion what do you think? Do you have any plans related to packaging this solution?
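For what it's worth, a Display-based variant could look roughly like the sketch below. The Display class, its instances, and the module name are hypothetical and are not taken from any published package. Because String gets its own Display instance, the separate String -> a case in the instance chain is no longer needed.

```purescript
module DisplayInterp where

import Prelude

-- Hypothetical class used in place of Show for user-facing output.
class Display a where
  display :: a -> String

instance displayString :: Display String where
  display s = s -- no quotes or escaping, unlike show

instance displayInt :: Display Int where
  display = show

instance displayBoolean :: Display Boolean where
  display = show

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp acc = acc
else instance interpDisplay :: (Display b, Interp a) => Interp (b -> a) where
  interp acc b = interp (acc <> display b)

i :: forall a. Interp a => a
i = interp ""

example :: String
example = i 42 " apples, ripe: " true
-- "42 apples, ripe: true"
```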

natefaubion · December 22, 2019, 10:45pm

I don’t have any plans to package this.

danielo515 · January 7, 2020, 3:12pm

Would it be possible to use the proposed implementation as a library?

JordanMartinez · January 11, 2020, 2:16am

The only issue with this implementation (which is pretty minor considering the brilliance of the solution and the minor inconvenience) is that the first argument must always be a String value.

For example, this code fails to compile:

interp 42 " apples and " 52 " oranges."

  Could not match type
       
    Int
       
  with type
          
    String
          

while checking that type Int
  is at least as general as type String
while checking that expression 42
  has type String
in value declaration main

I’m not yet sure whether it’s possible to get around that, for the reasons below.

If I define a local binding that applies an empty String argument to interp, I can get around this:

main = do
 let interp' = interp ""
 log $ interp' 42 "bar" "baz" true

However, using the same binding in two different ways will produce problems:

main = do
 let interp' = interp ""
 log $ interp' 42 "baz" true
 log $ interp' 42 true "baz" true
 -- `Boolean` (from true) does not unify with String (from "baz")

natefaubion · January 11, 2020, 8:29pm

This is because of interp :: String -> a, so the initial application is fixed to String. You could probably reformulate this with just interp :: a, or make interp :: a -> b with a multi-parameter typeclass.

JordanMartinez · January 12, 2020, 12:02am

I wasn’t sure how to encode the type class using a multi-parameter typeclass. Everything I’ve tried runs into a problem sooner or later. Your solution seems to work only because the first argument is hard-coded to String.

i-am-the-slime · January 12, 2020, 10:34am

I think if you just change i to be

i = interp ""

you’re good.

natefaubion · January 12, 2020, 3:24pm

Right, this is due to (lack of) let generalization. You need a type signature if it’s in a let.

i :: forall a. Interp a => a
i = interp ""

This is generalized in a top-level declaration, but not in a let binding.
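A minimal sketch of the let-bound version with an explicit signature, reusing the Interp class from earlier in the thread (the module name is arbitrary). With the signature, the same binding can be applied to differently typed argument lists:

```purescript
module LetGeneralization where

import Prelude
import Effect (Effect)
import Effect.Console (log)

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp a = a
else instance interpFunction :: Interp a => Interp (String -> a) where
  interp a b = interp (a <> b)
else instance interpShow :: (Show b, Interp a) => Interp (b -> a) where
  interp a b = interp (a <> show b)

main :: Effect Unit
main = do
  let
    -- Without this signature the binding is monomorphic and the second
    -- use below fails to typecheck.
    interp' :: forall a. Interp a => a
    interp' = interp ""
  log $ interp' 42 "baz" true      -- logs 42baztrue
  log $ interp' 42 true "baz" true -- logs 42truebaztrue
```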

JordanMartinez · January 12, 2020, 3:34pm

Ah… that’s why my original interp' = interp "" didn’t work.

Otherwise, yeah, this works:

i :: forall a. Interp a => a
i = interp ""

JordanMartinez · January 12, 2020, 8:25pm

So, I’ve created purescript-interpolate to do this, but I’m getting stuck when publishing this library.

How do I fix the issue?

hdgarrood · January 13, 2020, 4:01pm

Ah, that’s a problem. We will have to amend the instructions for publishing to Pursuit. Thanks for bringing this to my attention!

JordanMartinez · January 13, 2020, 8:19pm

I’ve opened an issue for it here: https://github.com/purescript/pursuit/issues/402

JordanMartinez · January 15, 2020, 2:48am

The library has been published. See its installation instructions. The PR to the official package set builds and is awaiting review. If you have an idea for building off of this, see how to refer to it in your bower.json file.

Also, don’t use this library when doing a fold (e.g. foldl i "" arrayOfInts) until the inliner optimization is done. See this benchmark, which I hope isn’t naive.

Edit: the above benchmark was implemented incorrectly. See this comment for an accurate one.

Kamirus · January 15, 2020, 11:56am

I looked at the benchmark: the blue line is the standard append and it blows up, whereas I would have expected the red one to, right?

I re-ran this benchmark with modified functions at a single size of 10000.
Raw data results (because making a graph is beyond my skills):

(1) foldl (\a b -> show a <> show b) "" array: causes JavaScript heap out of memory
(2) 0.001428 mean of foldl (\a b -> a <> show b) "" array
(3) 0.001227 mean of foldl i "" array
(4) 0.001106 mean of foldl interp "" array

In the original benchmark (1) is blue (the blow-up one) and (3) is red.

Remarks:

Running show on an accumulated string exhausts my memory.
The difference between (2), (3), and (4) is negligible, so using fold with interp is OK here.
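For reference, the two well-behaved folds can be sketched as below, again using the Interp class from earlier in the thread (the module and function names are my own). The blow-up in the show-on-accumulator version is expected: show on a String adds surrounding quotes and escapes every existing quote and backslash, so the accumulator grows roughly exponentially with the number of elements.

```purescript
module FoldExample where

import Prelude
import Data.Foldable (foldl)

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp a = a
else instance interpFunction :: Interp a => Interp (String -> a) where
  interp a b = interp (a <> b)
else instance interpShow :: (Show b, Interp a) => Interp (b -> a) where
  interp a b = interp (a <> show b)

-- Plain append: only the new element is shown, never the accumulator.
viaAppend :: Array Int -> String
viaAppend = foldl (\acc n -> acc <> show n) ""

-- Here interp is used at type String -> Int -> String, so each step
-- resolves to the same append as above.
viaInterp :: Array Int -> String
viaInterp = foldl interp ""

-- viaAppend [1, 2, 3] and viaInterp [1, 2, 3] both evaluate to "123".
```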

I think the performance problem is not here, but in long expressions written with interp (or generated by other typeclass or generic magic).

Example

Simple usage of interp vs. append

fooI = i "a" 1 "b" 2 "c" 3
fooA = "a" <> show 1 <> "b" <> show 2 <> "c" <> show 3

And the compiled JS (without inlining)

var fooI = function (dictInterp) {
    return Data_Interpolate.i(
      Data_Interpolate.interpStringFunction(
        Data_Interpolate.interpIntFunction(
          Data_Interpolate.interpStringFunction(
            Data_Interpolate.interpIntFunction(
              Data_Interpolate.interpStringFunction(
                Data_Interpolate.interpIntFunction(
                  dictInterp)))))))("a")(1)("b")(2)("c")(3);
};

var fooA = "a" + (Data_Show.show(Data_Show.showInt)(1) + ("b" + (Data_Show.show(Data_Show.showInt)(2) + ("c" + Data_Show.show(Data_Show.showInt)(3)))));

JordanMartinez · January 15, 2020, 3:57pm

You can convert the outputted json file into a graph by uploading it here and then exporting the result as an SVG or PNG file: http://harry.garrood.me/purescript-benchotron-svg-renderer/

JordanMartinez · January 15, 2020, 3:59pm

Looking at my original benchmark again, I realized I misread it. I thought the blue line was the i a b one, not the a <> b one…

JordanMartinez · January 16, 2020, 8:27pm

So, the original benchmark I created used \acc next -> show acc <> show next when it should have been \acc next -> acc <> show next. @paluh pointed this out.

Here’s the correct benchmark, which better reflects my expectation:

FranklinYu · June 25, 2020, 9:58pm

Did anyone check out purescript-template-strings or purescript-formatting?