Hi everybody,
I’m trying to write a parser for the Mustache template language for learning purposes, and I’m getting stuck on parsing the plain-text portions. For example:
render "\n {{^inverted}}\n test\n {{/inverted}}\n" $ unsafePartial $ fromRight $ parseJson "{}"
The result here should be "\n test\n".
I’m using this plainTextParser to parse the plain-text parts of the template:
plainTextParser :: Parser Doc
plainTextParser = fromFoldableChars <$> many1Till anyChar endings
  where
  fromFoldableChars = Array.fromFoldable >>> fromCharArray >>> PlainText
  endings = choice $ lookAhead <$> endSymbols
  endSymbols =
    [ eof
    , symbol "{{^"
    , symbol "{{#"
    , symbol "{{/"
    , symbol "{{"
    , many1 space *> symbol "{{#" -- <-- This parser matches and causes the error
    , many1 space *> symbol "{{^"
    , many1 space *> symbol "{{/"
    ]
  symbol = string >>> void
However, using this parser results in an error where it expects spaces to be followed by “{{#” (in my example, the spaces were followed by {{^), i.e. (Left "{ error: Expected '{{#'., pos: 3 }"). I’m pretty sure the cause is the many1 space parser succeeding and then trying to match the “{{#”, which causes it to fail. However, I thought the choice combinator would try each parser and move on to the next one if one fails?
Any help is greatly appreciated!
Kind regards
parsing and string-parsers don’t backtrack if a parser has consumed input. You need to opt in to backtracking by using try over the minimal parser prefix needed to commit to a branch. In your example, many1 space consumes input, so it commits to the {{# branch. You can wrap these in try, but you want to be careful, since too much backtracking leads to terrible errors with these simple parsing libraries.
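To make the commit rule concrete, here is a toy model written in Python rather than PureScript. It is a sketch, not the parsing or string-parsers API: ParseError, then, try_ and outcome are hypothetical names, and the rule "a failure reported past the start position means the branch was committed" is an assumption chosen to match the behaviour described above.

```python
class ParseError(Exception):
    """Failure carrying the input position at which the parser gave up."""
    def __init__(self, msg, pos):
        super().__init__(msg)
        self.msg, self.pos = msg, pos

def string(lit):
    """Match a literal at position i; fail at i without consuming otherwise."""
    def run(s, i):
        if s.startswith(lit, i):
            return lit, i + len(lit)
        raise ParseError(f"Expected '{lit}'.", i)
    return run

def many1_space(s, i):
    """One or more spaces; fails without consuming if there are none."""
    j = i
    while j < len(s) and s[j] == " ":
        j += 1
    if j == i:
        raise ParseError("Expected space.", i)
    return s[i:j], j

def then(first, second):
    """Model of `first *> second`: run both, keep the second result."""
    def run(s, i):
        _, j = first(s, i)
        return second(s, j)
    return run

def choice(parsers):
    """Try alternatives in order, but only while failures sit at the start position."""
    def run(s, i):
        last = ParseError("No alternatives.", i)
        for p in parsers:
            try:
                return p(s, i)
            except ParseError as e:
                if e.pos != i:  # input was consumed: committed to this branch
                    raise
                last = e
        raise last
    return run

def try_(p):
    """Model of `try`: report any failure at the start position so choice moves on."""
    def run(s, i):
        try:
            return p(s, i)
        except ParseError as e:
            raise ParseError(e.msg, i) from None
    return run

def outcome(parser, text):
    """Run a parser from position 0 and return a plain tuple describing the result."""
    try:
        return ("ok",) + parser(text, 0)
    except ParseError as e:
        return ("error", e.msg, e.pos)

branches = [then(many1_space, string("{{#")), then(many1_space, string("{{^"))]
committed = choice(branches)
backtracking = choice([try_(b) for b in branches])

print(outcome(committed, "  {{^x}}"))     # ('error', "Expected '{{#'.", 2)
print(outcome(backtracking, "  {{^x}}"))  # ('ok', '{{^', 5)
```

In the model, the only difference between the two runs is where the failure is reported: without try_ the first branch fails at position 2, after the spaces, so choice never looks at the {{^ branch.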
When writing parsers like this, it’s common to define a combinator that eats leading whitespace/comments, so that you don’t need to handle the symbols both with and without whitespace.
-- Using many rather than many1 means the whitespace is optional
token p = many space *> p

parseEndSymbols =
  token $ choice
    [ eof
    , symbol "{{^"
    , ...
    ]
It’s also more efficient in this case since it’s eating the whitespace once, rather than having to backtrack and do it all over again for the next production.
Thank you for your response. This clears up much of my confusion.
However, won’t the use of many space always cause it to commit to that branch, considering that it always succeeds? Zero spaces is a perfectly valid input, after all.
If I change my example to use the token combinator like so:
plainTextParser :: Parser Doc
plainTextParser = fromFoldableChars <$> many1Till anyChar endings
  where
  endings = token $ choice endSymbols
  endSymbols =
    [ eof
    , symbol "{{^"
    , symbol "{{#"
    , symbol "{{/"
    , symbol "{{"
    ]
and then try it out in the REPL, I see the following:
> render "\n {{^json}}\n test\n {{/json}}\n " $ unsafePartial $ fromRight $ parseJson "{}"
(Left "{ error: Expected '{{'., pos: 15 }")
Which seems to indeed get stuck on the spaces before the “test” text inside the inverted section.
\n {{^json}}\n test\n {{/json}}\n
              ^
So what I think is happening is that the parser sees spaces, commits to the endings branch, and then expects to see one of the endSymbols.
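That reading can be checked against a toy model, written in Python rather than PureScript. It is a sketch under assumptions: the template is taken to be indented with two spaces (which would put the 't' of "test" at index 15), parsing of the body is assumed to resume at index 12, just after the opening tag, and ending, many1_till_any_char and the backtrack flag are hypothetical stand-ins for token $ choice endSymbols, many1Till anyChar, and wrapping the ending in try.

```python
class ParseError(Exception):
    """Failure carrying the input position at which the parser gave up."""
    def __init__(self, msg, pos):
        super().__init__(msg)
        self.msg, self.pos = msg, pos

END_SYMBOLS = ["{{^", "{{#", "{{/", "{{"]

def ending(s, i):
    """Stand-in for `token $ choice endSymbols`; returns the position after the match."""
    j = i
    while j < len(s) and s[j] == " ":  # `many space` always succeeds but may consume
        j += 1
    if j == len(s):                    # eof
        return j
    for sym in END_SYMBOLS:
        if s.startswith(sym, j):
            return j + len(sym)
    # Every alternative failed at j; report the last one, past the skipped spaces.
    raise ParseError(f"Expected '{END_SYMBOLS[-1]}'.", j)

def many1_till_any_char(s, i, backtrack=False):
    """Stand-in for `many1Till anyChar ending`; returns (text, next position)."""
    if i >= len(s):
        raise ParseError("Unexpected end of input.", i)
    chars, i = [s[i]], i + 1           # at least one character is required
    while True:
        try:
            return "".join(chars), ending(s, i)
        except ParseError as e:
            if e.pos != i and not backtrack:
                raise                  # ending consumed spaces, then failed: committed
        chars.append(s[i])             # ending did not match here: take one character
        i += 1

def outcome(s, i, backtrack=False):
    """Run the model from position i and return a plain tuple describing the result."""
    try:
        return ("ok",) + many1_till_any_char(s, i, backtrack)
    except ParseError as e:
        return ("error", e.msg, e.pos)

template = "\n  {{^json}}\n  test\n  {{/json}}\n"
body_start = 12  # assumed: index just after the opening "{{^json}}" tag

print(outcome(template, body_start))                  # ('error', "Expected '{{'.", 15)
print(outcome(template, body_start, backtrack=True))  # ('ok', '\n  test\n', 25)
```

Under those assumptions the committed run reproduces the error at position 15, and the backtracking run collects the body text up to the closing tag. That suggests wrapping the ending in try is one way out, though the model says nothing about whether it is the only one.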
The other tricky problem I seem to be having is that my plainTextParser will accept literally any character until it reaches one of the endings, and that includes spaces and newlines, of course.
Does this mean that my only option is to use the try combinator?