[SOLVED] Parsing: how to get all available matches?

I’m trying to find all matches in a text, which seems like it should be basic stuff, but I can’t get it to work. Or rather, the code I expect to do the job keeps failing with an “Unexpected EOF” parse error.

Given this:

module Main where

import Prelude

import Effect (Effect)
import Effect.Console (logShow)
import Parsing (runParser)
import Parsing.Combinators (many, manyTill)
import Parsing.String (anyChar, string)


main :: Effect Unit
main = do
  let matchFoo = manyTill anyChar (string "foo")
  logShow $ runParser "foo foo bar" (many matchFoo)

I keep getting

(ParseError "Unexpected EOF" (Position { column: 12, index: 11, line: 1 }))

I tried inserting various combinations of try and optional, to no avail. This is a minimal test case from a somewhat larger parser I’ve built that kept failing seemingly at random.

Honestly, it feels like a bug, because the documentation for many says, quoting:

Match the phrase p as many times as possible.

So is it not correct for me to expect it to match matchFoo 2 times here? What am I missing?

Have you looked at manyTill_ and try?

Could you please elaborate? I’m not quite getting how these help. I could perhaps replace the manyTill I am using with manyTill_, but that doesn’t help with the error. And try “[on fail] backtracks the input stream to the unconsumed state”, which I’d expect to result in an infinite loop: at some point there are no more matches, so the stream stops advancing and the parser never exits either.

Hm, I didn’t expect Discourse to show removed posts. I removed it because I apparently pressed the wrong “reply” button, so the other user wasn’t tagged. I do that sometimes on StackOverflow: if you posted a comment 5 seconds ago, nobody has read it anyway, so you remove it and repost it. Well, apparently that doesn’t work so well here…

The purescript-parsing library includes the design decision that when faced with a choice between two different parses, the second option is only tried if the first fails having consumed no input, and otherwise the failure (or success) of the first option is raised up the parser stack. many inherits this decision, as it is effectively a choice between running one more copy of its argument and stopping. So if the argument to many fails after consuming some tokens, the entire many will fail instead of backtracking to the last completed inner parse.
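As a rough illustration of that choice rule, here is a minimal sketch. The parsers xy, ab and ac are made-up names for this example, and it assumes (<|>) is imported from Control.Alt (the control package) alongside purescript-parsing.

```purescript
module Main where

import Prelude

import Control.Alt ((<|>))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, runParser)
import Parsing.String (char)

-- Hypothetical parsers, named only for this example.
xy :: Parser String Char
xy = char 'x' *> char 'y'

ab :: Parser String Char
ab = char 'a' *> char 'b'

ac :: Parser String Char
ac = char 'a' *> char 'c'

main :: Effect Unit
main = do
  -- xy fails on the very first character, having consumed nothing,
  -- so the second alternative is tried and the parse succeeds with 'c'.
  logShow $ runParser "ac" (xy <|> ac)
  -- ab consumes 'a' before failing on 'c', so ac is never tried
  -- and the whole choice fails with a ParseError.
  logShow $ runParser "ac" (ab <|> ac)
```

The second case is the same situation many ends up in when its argument fails partway through the input.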

matchFoo will eat any character that anyChar accepts, so when many runs matchFoo for the third time, after the second "foo", it consumes " bar" and then fails because it runs out of input without seeing a "foo". Because that failed attempt consumed a non-zero number of tokens, many propagates the failure up instead of stopping with the two matches it already has.

try is what you’re looking for because try matchFoo is a version of matchFoo that, when it fails, will always act as if it never consumed any tokens, because it backtracks the stream to where it was before matchFoo runs. However, it still propagates the failure up to many so many knows not to run its argument any more.
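Applied to the test case from the question, that would look roughly like the sketch below. The only changes are the extra try import and wrapping the argument to many; the comments describe what I would expect, not output copied from a run.

```purescript
module Main where

import Prelude

import Effect (Effect)
import Effect.Console (logShow)
import Parsing (runParser)
import Parsing.Combinators (many, manyTill, try)
import Parsing.String (anyChar, string)

main :: Effect Unit
main = do
  let matchFoo = manyTill anyChar (string "foo")
  -- The third attempt still runs through " bar" and hits EOF, but try
  -- rewinds the input, so many sees a failure that consumed nothing and
  -- stops with the matches collected so far.
  -- Expected: a Right holding two matches, an empty list (nothing before
  -- the first "foo") and a list with a single space (before the second).
  logShow $ runParser "foo foo bar" (many (try matchFoo))
```

Note that the trailing " bar" is left unconsumed; if the rest of the input matters, it has to be handled by whatever parser follows.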

Oh, I see, thank you for the elaboration! Okay, so many $ try matchFoo it is. Indeed, this is not obvious: if I were looking at this line without knowing about this library nuance, I’d be wondering why someone inserted try here. Will try (no pun intended) to apply it to my other code, thanks!