Feature request: String interpolation

i-am-the-slime · November 29, 2019, 11:42am

Although I know Strings are evil etc., it’s often necessary to construct some. The problem with:

  "a really really" <>
  "long string " <>
  age

is that it’s easy to forget spaces. Would there be interest in string interpolation syntax?

hdgarrood · November 29, 2019, 12:10pm

Two thoughts: firstly, you can use the same “whitespace gap” syntax as Haskell has. If you have a backslash followed by any whitespace within a string, then all whitespace is ignored up until the next backslash, so you can write e.g.

x = "a really really \
  \long string \
  \" <> age

although this doesn’t really address the issue that it’s still easy to forget spaces around interpolated values from variables like age.

Secondly, I have to give the usual spiel about how syntax additions are, in general, pretty unlikely to be accepted, as they are kind of antithetical to language stability, which I think is one of the most highly desired things across the community. In particular, syntax additions are a problem for tooling such as syntax highlighters or autoformatters. Then again I want to add \u character escape sequences like JavaScript has (https://github.com/purescript/purescript/issues/3750) so maybe this is a little hypocritical of me.

michaelficarra on GitHub asks: could this be added in a backwards-compatible way? I think it could, if we used a backslash escape sequence for it. Perhaps, for instance:

greet name = "hello, \#{name}"

which could desugar to

greet name = "hello, " <> name

One problem, though, is what types are allowed inside an interpolation block. In dynamic languages it makes sense to insert an implicit toString call inside any interpolation block, so that you can put anything in and hopefully it will usually do what you want. In the context of dynamic languages, that’s acceptable, because there is already an expectation that the language can’t help you if you have null when you expected a string or whatever.

The PureScript analog of this approach would probably be adding a type class Display, with a member display :: forall a. Display a => a -> String, and inserting an implicit display in interpolation blocks (note: we can’t use Show, because a number of instances won’t do what we want - consider e.g. Show String and Show (Maybe a)).

However, I think there’s also a strong case to be made for requiring interpolation blocks to be strings, so that the conversion to string (if any) is explicit.
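
To make the two options concrete, here is a minimal sketch. The `Display` class and its instances are hypothetical (nothing below is an existing library API), and since the `\#{...}` syntax is only a proposal, the functions are written in the form it might desugar to.

```purescript
module DisplaySketch where

import Prelude

-- Hypothetical class for user-facing text, separate from Show.
class Display a where
  display :: a -> String

instance displayString :: Display String where
  display s = s -- no quotes or escaping, unlike Show String

instance displayInt :: Display Int where
  display n = show n

-- Option 1: "hello, \#{name}" would insert an implicit `display`.
greetImplicit :: forall a. Display a => a -> String
greetImplicit name = "hello, " <> display name

-- Option 2: interpolated values must already be Strings, so any
-- conversion is written out by the caller.
ageLine :: Int -> String
ageLine age = "age: " <> show age
```

Under option 1 the caller never sees the conversion; under option 2 a non-String value in an interpolation block would be a type error until the caller converts it.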

Benjmhart · November 29, 2019, 2:39pm

if we’re talking about a world where PS syntax gets extended in a backwards compatible way:

I’d really love to see this as a feature that corresponds to JS template string literals. The idea of a Display typeclass is also pretty great, and use of the anonymous argument would be great too. Backticks probably wouldn’t work since they’re already used for infix function application (I think other languages use triple quotes), but for these examples, let’s assume backticks for templates just to demonstrate some ideas:


hellofunc :: Display x => x -> String
hellofunc = `Hello, ${_}!` 

hellofunc2 :: Display x => x -> String
hellofunc2 x = "Hello, " <> display x <> "!"

With the anonymous argument, these two functions would be equivalent.

I for one would really value how terse this could make string construction while maintaining type safety.

I’m not too concerned with explicit conversions for something that is a language feature. However, this would add to the typeclass ‘load’ that a beginner, even one coming from Haskell, would need to grok in order to be proficient with the language features. Perhaps we default to Show instead of Display, and if you want a custom conversion you just supply the string yourself; that would probably be the best compromise between intuitive usage and explicit conversion.

blankhart · November 29, 2019, 2:42pm

Scala also added string interpolation late. As a point of comparison on syntax and decoding, see https://docs.scala-lang.org/overviews/core/string-interpolation.html

hdgarrood · November 29, 2019, 3:05pm

I share the desire to introduce as little machinery as possible, but I don’t think Show is an option, because it will just do the wrong thing far too often. You will easily end up with things like, say, “Hello, (Just "Jos\x0000e9")” or “Your balance is (fromNumber 14.34)”. Show instances are almost always intended for use in the REPL, whereas I suspect this feature would mostly be used in very different settings, e.g. for generating strings which are to be used in user interfaces. In fact I am struggling to think of any Show instances which will do what you want here other than Int and Number.
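
A small sketch of the problem, using only `show` from Prelude and `Maybe` from Data.Maybe. The module and binding names are made up for illustration.

```purescript
module ShowSketch where

import Prelude
import Data.Maybe (Maybe(..))

-- Show renders a value as source-like text, so the constructor and
-- the string quotes end up inside the message.
greeting :: String
greeting = "Hello, " <> show (Just "Ada")
-- "Hello, (Just \"Ada\")"

-- Int is one of the few cases where Show gives what a user expects.
count :: String
count = "Items: " <> show 3
-- "Items: 3"
```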

blankhart · November 29, 2019, 3:55pm

IIRC Rust has separate Debug and Display traits for that reason

jy14898 · November 29, 2019, 4:12pm

Probably not worth going down this route, but if we were also trying to replicate JavaScript and its syntax (so that we can just compile to the same syntax, after transforming our variable names inside the literals), tagged template literals don’t necessarily need their expressions and result to be Strings, e.g.:

"use strict";

function expression(strings, a, b) {
  switch(strings[1]) {
  case "+":
    return a + b;
  // ... (other operators elided)
  };
};
var a = 10;
var b = 20;
var c = expression`${a}+${b}` 
// c === 30

exports.expression = expression;
// String.raw concatenates like an untagged template literal, but leaves escape sequences unprocessed
exports.raw = String.raw;

exports.mkTag = function (tag) {
  return function (strings, ...args) {
    // PureScript functions are curried, so apply one argument at a time
    return tag(strings)(args);
  };
};

As tags would now be first class, the user can just pick whether they want the Show/Display etc. abstraction or not. The biggest limitation is that all expressions now have to be the same type, since below I’m using Array a rather than some other ordered product representation. It might be possible to overcome that with a type class, however.

-- Can't use normal functions to type this, as it has var args
-- JS template implementation guarantees that num strings = num expr + 1
foreign import data Tag :: Type -> Type -> Type
foreign import expression :: Tag Int Int
foreign import raw :: Tag String String
foreign import mkTag :: forall a b
    . (Array String -> Array a -> b)
  -> Tag a b

class Display a where
  display :: a -> String

displayTag :: forall a. Display a => Tag a String
displayTag = 
  mkTag \strings args -> someIntercalater (map display args) strings

sum = expression`${a}+${b'}`
  where
    a = 10
    b' = 20 -- note this will require us parsing the template and swapping
            -- b' for b$prime (or whatever we change it to)

helloWorld = raw`${hello}, ${world}!` -- displayTag`...` would also work if we had a String instance
  where
    hello = "Hello"
    world = "world"

The point of this approach would be to do as little as possible in the compiler: only rename variables inside the template, check that they exist, and mark them as used (or whatever, just so we don’t optimise them away), and then emit almost the same code as the source purs. Actually I suppose there’s more work than that, like all transformations such as 123 -> 123|0, but I imagine it’s not too difficult to do?
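
The `someIntercalater` helper above is left undefined. One possible sketch, under the assumption stated in the comments (one more literal chunk than there are arguments), is below; `interleave` is a hypothetical name, and only `zipWith` from Data.Array and `fold` from Data.Foldable are used.

```purescript
module InterleaveSketch where

import Prelude
import Data.Array (zipWith)
import Data.Foldable (fold)

-- Hypothetical stand-in for `someIntercalater`: weave the rendered
-- arguments between the literal chunks of the template.
-- Assumes length strings == length args + 1, as JS guarantees for
-- tagged templates; extra arguments would be silently dropped.
interleave :: Array String -> Array String -> String
interleave args strings =
  -- Pad args with "" so the final chunk has a partner to zip with.
  fold (zipWith (<>) strings (args <> [ "" ]))
```

For example, `interleave [ "Hello", "world" ] [ "", ", ", "!" ]` pairs each chunk with the argument that follows it and concatenates the results.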

Benjmhart · December 13, 2019, 4:44pm

as long as Display is a derivable typeclass I think it’s fine.

Tagged literals would also be cool; they would make it easier to port something like polymer/lit-html to PS.

natefaubion · December 20, 2019, 4:02am

I’m not opposed to string interpolation syntax, but it’s a surprisingly complicated feature:

We have almost no new syntax to give it aside from an escape code in “normal” string literals. Backticks are not an option since we use that for infix expressions. I’m going to assume \{ ... } escape syntax (or something equivalent like \${ ... } or whatever sigil you choose).
The issue always comes down to how you lex and parse strings then.
String interpolation must emit a series of delimiter tokens. It’s no longer a single token, but you must have things like TokStringStart, TokStringMid, TokStringEnd to represent the different boundaries because they can contain arbitrary expressions.
We already use }, ], and ) for other delimiters so this will require a stateful, context-sensitive lexer. It must know that it has emitted a string “start” or “mid” token in order to decide how it should lex the delimiter.
Parsing literals is no longer a matter of casing on a single token, but now must consume an unbounded number of tokens.

All of these are surmountable of course, but are also very non-trivial.
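
As a rough illustration of the points above, here is a sketch of what the token type and the context tracking might look like. Everything in it is hypothetical: the constructor payloads, the `Context` stack, and `classifyCloseBrace` are assumptions made for the example, not the compiler's actual lexer.

```purescript
module LexerSketch where

import Prelude
import Data.Array (last)
import Data.Maybe (Maybe(..))

-- Hypothetical tokens for "hello, \{name}!":
--   TokStringStart "hello, ", TokIdent "name", TokStringEnd "!"
data Token
  = TokStringStart String
  | TokStringMid String
  | TokStringEnd String
  | TokIdent String
  | TokLeftBrace
  | TokRightBrace

-- What the lexer has opened and not yet closed, innermost last.
data Context = InBraces | InInterpolation

data CloseBrace = ClosesBraces | ResumesString | Unbalanced

derive instance eqCloseBrace :: Eq CloseBrace

-- A `}` resumes the string only when the innermost open delimiter
-- was opened by a string "start" or "mid" token.
classifyCloseBrace :: Array Context -> CloseBrace
classifyCloseBrace stack = case last stack of
  Just InInterpolation -> ResumesString
  Just InBraces -> ClosesBraces
  Nothing -> Unbalanced
```

The same character is lexed differently depending on the stack, which is the state the lexer would have to carry.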

I’m going to throw out an alternative using type classes (and instance chains!):

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp a = a
else instance interpFunction :: Interp a => Interp (String -> a) where
  interp a b = interp (a <> b)
else instance interpShow :: (Show b, Interp a) => Interp (b -> a) where
  interp a b = interp (a <> show b)

i = interp

test = i "foo" 42 "bar" true "baz"

There are a few things to note about this:

It requires no language changes.
You can write any interpolation function you want this way (for example, trimming and separating by spaces).
You can add directives in the middle of interpolation (it can be an extensible DSL).
It’s the same number of characters typed as “normal” interpolation.
With a very simple inliner it will compile to the code you’d write by hand.
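
As a sketch of the second point, the same pattern can be used to separate the pieces with spaces. The `Spaced` class and its instance names are hypothetical and simply mirror the `Interp` code above.

```purescript
module SpacedSketch where

import Prelude

-- Hypothetical variant of Interp that inserts a space before every
-- argument after the first.
class Spaced a where
  spaced :: String -> a

instance spacedString :: Spaced String where
  spaced acc = acc
else instance spacedFunction :: Spaced a => Spaced (String -> a) where
  spaced acc s = spaced (acc <> " " <> s)
else instance spacedShow :: (Show b, Spaced a) => Spaced (b -> a) where
  spaced acc b = spaced (acc <> " " <> show b)

-- The type signature picks the String instance as the final result.
sentence :: String
sentence = spaced "I have" 42 "apples and" 52 "oranges."
-- "I have 42 apples and 52 oranges."
```

This addresses the forgotten-space problem from the start of the thread, at the cost of always inserting a space, even where none is wanted.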

i-am-the-slime · December 20, 2019, 10:04am

I love it. I am convinced.
I’m not sure about using Show for anything but basic types, but that is now up to the implementor of the function you showed above.

paluh · December 22, 2019, 11:06am

@natefaubion Amazing!

Maybe a custom Display class (instead of Show), with default instances for basic types and some more configurable newtypes, could be packaged up?
@i-am-the-slime, @natefaubion what do you think? Do you have any plans related to packaging this solution?

natefaubion · December 22, 2019, 10:45pm

I don’t have any plans to package this.

danielo515 · January 7, 2020, 3:12pm

Would it be possible to use the proposed implementation as a library?

JordanMartinez · January 11, 2020, 2:16am

The only issue with this implementation (a minor inconvenience, considering the brilliance of the solution) is that the first argument must always be a String value.

For example, this code fails to compile:

interp 42 " apples and " 52 " oranges."

  Could not match type
       
    Int
       
  with type
          
    String
          

while checking that type Int
  is at least as general as type String
while checking that expression 42
  has type String
in value declaration main

I’m not yet sure whether it’s possible to get around that for the below reasons.

If I define a local binding that applies an empty String argument to interp, I can get around this:

main = do
 let interp' = interp ""
 log $ interp' 42 "bar" "baz" true

However, using the same binding in two different ways will produce problems:

main = do
 let interp' = interp ""
 log $ interp' 42 "baz" true
 log $ interp' 42 true "baz" true
 -- `Boolean` (from true) does not unify with String (from "baz")

natefaubion · January 11, 2020, 8:29pm

This is because of interp :: String -> a, so the initial application is fixed to String. You could probably reformulate this with just interp :: a, or make interp :: a -> b with a multi-parameter typeclass.

JordanMartinez · January 12, 2020, 12:02am

I wasn’t sure how to encode the type class using a multi-parameter typeclass. Everything I’ve tried runs into a problem sooner or later. Your solution seems to work only because the first argument is hard-coded to String.

i-am-the-slime · January 12, 2020, 10:34am

I think if you just change i to be

i = interp ""

you’re good.

natefaubion · January 12, 2020, 3:24pm

Right, this is due to (lack of) let generalization. You need a type signature if it’s in a let.

i :: forall a. Interp a => a
i = interp ""

This is generalized in a top-level declaration, but not in a let binding.
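
A minimal sketch of the let-bound version with an explicit signature, repeating the `Interp` class from earlier in the thread so the module is self-contained. The name `i'` is only illustrative.

```purescript
module Main where

import Prelude
import Effect (Effect)
import Effect.Console (log)

class Interp a where
  interp :: String -> a

instance interpString :: Interp String where
  interp a = a
else instance interpFunction :: Interp a => Interp (String -> a) where
  interp a b = interp (a <> b)
else instance interpShow :: (Show b, Interp a) => Interp (b -> a) where
  interp a b = interp (a <> show b)

main :: Effect Unit
main = do
  let
    -- The signature keeps i' polymorphic, so each use below can
    -- pick its own argument list.
    i' :: forall a. Interp a => a
    i' = interp ""
  log $ i' 42 "baz" true
  log $ i' 42 true "baz" true
```

Without the signature, `i'` is fixed to whatever type its first use requires, which is what produces the unification error on the second use.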

JordanMartinez · January 12, 2020, 3:34pm

Ah… that’s why my original interp' = interp "" didn’t work.

Otherwise, yeah, this works:

i :: forall a. Interp a => a
i = interp ""

JordanMartinez · January 12, 2020, 8:25pm

So, I’ve created purescript-interpolate to do this, but I’m getting stuck when publishing this library.

How do I fix the issue?