Parsing Mustache templates - choice combinator confusion

YanikCeulemans · May 1, 2021, 2:15pm

Hi everybody,

I’m trying to write a parser for the Mustache template language for learning purposes, and I’m getting stuck on parsing the plain-text portions. For example:

render "\n  {{^inverted}}\n  test\n  {{/inverted}}\n" $ unsafePartial $ fromRight $ parseJson "{}"

The result here should be: \n test\n .
I’m using this plainTextParser to parse the plain-text parts of the template:

plainTextParser :: Parser Doc
plainTextParser = fromFoldableChars <$> many1Till anyChar endings
  where
  fromFoldableChars = Array.fromFoldable >>> fromCharArray >>> PlainText
  endings = choice $ lookAhead <$> endSymbols

  endSymbols =
    [ eof
    , symbol "{{^"
    , symbol "{{#"
    , symbol "{{/"
    , symbol "{{"
    , many1 space *> symbol "{{#" -- <-- This parser matches and causes the error
    , many1 space *> symbol "{{^"
    , many1 space *> symbol "{{/"
    ]
  symbol = string >>> void

However, using this parser results in an error: it expects the spaces to be followed by “{{#”, whereas in my example the spaces are followed by {{^. The result is (Left "{ error: Expected '{{#'., pos: 3 }"). I’m pretty sure the cause is that the many1 space parser succeeds and the following attempt to match “{{#” then fails. However, I thought the choice combinator would try each parser and move on to the next one if one fails?

Link to the github repo: https://github.com/YanikCeulemans/pure-mustache
Link to the code sample above: https://github.com/YanikCeulemans/pure-mustache/blob/07c1f868228aecaec00386521b8cc586d0d2652b/src/Mustache.purs#L208

Any help is greatly appreciated!

Kind regards

natefaubion · May 1, 2021, 3:33pm

The parsing and string-parsers libraries don’t backtrack once a parser has consumed input. You need to opt in to this by using try over the minimal parser prefix needed to commit to a branch. In your example, many1 space consumes input, so it commits to the {{# branch. You can wrap these in try, but be careful: too much backtracking leads to terrible error messages with these simple parsing libraries.
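To make the commit behaviour concrete, here is a minimal sketch that contrasts the two cases. It assumes the 2021-era string-parsers module names (Text.Parsing.StringParser and friends); spaces1 is a hypothetical stand-in for the many1 space parser in the question, and committed and backtracking are illustrative names only.

```purescript
module TrySketch where

import Prelude

import Text.Parsing.StringParser (Parser, try)
import Text.Parsing.StringParser.CodePoints (char, string)
import Text.Parsing.StringParser.Combinators (choice, many1)

symbol :: String -> Parser Unit
symbol = string >>> void

-- Hypothetical stand-in for `many1 space`: one or more literal spaces.
spaces1 :: Parser Unit
spaces1 = void (many1 (char ' '))

-- Without try: on an input such as "  {{^", the first alternative
-- consumes the spaces and then fails on "{{#". Because input was
-- consumed, choice does not move on to the second alternative.
committed :: Parser Unit
committed = choice
  [ spaces1 *> symbol "{{#"
  , spaces1 *> symbol "{{^"
  ]

-- With try: a failing alternative is rewound to where it started,
-- so choice is free to attempt the next one.
backtracking :: Parser Unit
backtracking = choice
  [ try (spaces1 *> symbol "{{#")
  , try (spaces1 *> symbol "{{^")
  ]
```

Here try has to cover the whole alternative, because the spaces alone do not decide which branch applies. The backtracking version also re-reads the spaces for each alternative.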

Usually, when writing parsers like this, it’s common to define a combinator that eats leading whitespace/comments, so that you don’t need to handle the symbols both with and without whitespace.

-- Using many rather than many1 makes the leading whitespace optional
token p = many space *> p

parseEndSymbols =
  token $ choice
    [ eof
    , symbol "{{^"
    , ...
    ]

It’s also more efficient in this case since it’s eating the whitespace once, rather than having to backtrack and do it all over again for the next production.

YanikCeulemans · May 2, 2021, 8:39am

Thank you for your response. This clears up much of my confusion.

However, won’t the use of many space always cause it to commit to that branch considering that it always succeeds? 0 spaces is a perfectly valid input after all?

If I change my example to use the token combinator like so:

plainTextParser :: Parser Doc
plainTextParser = fromFoldableChars <$> many1Till anyChar endings
  where
  endings = token $ choice endSymbols

  endSymbols =
    [ eof
    , symbol "{{^"
    , symbol "{{#"
    , symbol "{{/"
    , symbol "{{"
    ]

and then try it out in the REPL, I see the following:

> render "\n  {{^json}}\n  test\n  {{/json}}\n  " $ unsafePartial $ fromRight $ parseJson "{}"
(Left "{ error: Expected '{{'., pos: 15 }")

This does indeed seem to get stuck on the spaces before the “test” text inside the inverted section:

\n  {{^json}}\n  test\n  {{/json}}\n
                 ^

So what I think is happening is that the parser sees spaces, commits to the endings branch, and then expects to see one of the endSymbols.

The other tricky problem I seem to be having is that my plainTextParser will accept literally any character until one of the end symbols, and that of course includes spaces and newlines.

Does this mean that my only option is to use the try combinator?
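For reference, a minimal sketch of what the try-based variant might look like. It is untested and assumes the 2021-era string-parsers module names; char ' ' is a hypothetical stand-in for the space parser used above, and plainTextEnd and plainTextChars are illustrative names, not code from the repo.

```purescript
module EndingsSketch where

import Prelude

import Text.Parsing.StringParser (Parser, try)
import Text.Parsing.StringParser.CodePoints (anyChar, char, eof, string)
import Text.Parsing.StringParser.Combinators (choice, many, many1Till)

symbol :: String -> Parser Unit
symbol = string >>> void

-- Optional leading spaces, then p (char ' ' stands in for `space`).
token :: forall a. Parser a -> Parser a
token p = many (char ' ') *> p

endSymbols :: Array (Parser Unit)
endSymbols =
  [ eof
  , symbol "{{^"
  , symbol "{{#"
  , symbol "{{/"
  , symbol "{{"
  ]

-- try rewinds the spaces eaten by token when no end symbol follows,
-- so many1Till can fall back to anyChar instead of failing outright.
plainTextEnd :: Parser Unit
plainTextEnd = try (token (choice endSymbols))

-- Consumes plain text up to the next end symbol; the result is
-- discarded here to keep the sketch independent of Doc.
plainTextChars :: Parser Unit
plainTextChars = void (many1Till anyChar plainTextEnd)
```

Note that without lookAhead, a successful plainTextEnd also consumes the end symbol and any spaces in front of it, so those spaces would not end up in the plain text.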