Parsing: how to correctly advance on fail?

Related to “how to get all available matches”. I want the usual behavior you’d expect from regexp engines: for every character, try to match, on fail advance by one. E.g. if you do echo foo ffoo foo | grep foo you’ll get all 3 matches, because when grep reaches ffoo and fails to match ff with fo, it simply advances by a single character and tries again.

Now, I know I could do something like myParser <|> anyChar and I was doing that. However, for a 7kb file with just 2 matches this was returning a List with thousands of Nils, which is both slow and then have to be filtered out.

So my question is: how can I achieve the behavior of simply advancing the parser internal index by one on fail without returning anything related to the failure?

Code for the grep example:

module Main where

import Prelude

import Effect (Effect)
import Effect.Console (logShow)
import Parsing (runParser, Parser)
import Parsing.Combinators (many, try)
import Parsing.String (string)

myParser :: Parser String String
myParser = string "foo"

main :: Effect Unit
main = do
  logShow $ runParser "foo ffoo foo" (many $ try $ myParser)

I want this to return 3 matches and perhaps a single Nil for the end, but definitely not a dozen.

2 Likes

I think that’s provided in the Parser.String.Replace module:
https://pursuit.purescript.org/packages/purescript-parsing/10.2.0/docs/Parsing.String.Replace#v:splitCap

1 Like

Thanks, that might work, though neither performant due to the Left return values, nor convenient for the same reason.

Additionally, today I stumbled upon orthogonal challenge but related solution: I have a parser that returns HashTable with a single match, whereas my ultimate goal is having a single HashTable with all matches. So, inconveniently, I had to concatenate the tables. If a solution exists, it would be similar to the discussed problem, because I’d have to modify results of many after every match, whether it be dropping a result entirely (e.g. if I matched anyChar branch in a myParser <|> anyChar-like expression), or inserting to the table.

After sitting for a while I came up with a solution:

module Main where

import Prelude

import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, runParser)
import Parsing.Combinators (optionMaybe)
import Parsing.String (anyCodePoint, string)

-- Search for all matches in the text
manyViaRetries :: ∀ r. Parser String r -> Parser String (Array r)
manyViaRetries p = manyViaRetries' p []
  where
    manyViaRetries' _ acc = optionMaybe p >>= case _ of
      Just v  -> manyViaRetries' p $ acc <> [v]
      Nothing -> optionMaybe anyCodePoint >>= case _ of
        Just _  -> manyViaRetries' p acc -- try again at next position
        Nothing -> pure acc              -- end of string

myParser :: Parser String String
myParser = string "foo"

main :: Effect Unit
main = logShow $ runParser "foo ffoo foo" $ manyViaRetries myParser
1 Like

Well, after applying this code to the real parser it turned out it fails randomly, and upon reducing to the minimal testcase it turns out this is due to Parsing module making distinction between “failing without consuming” and “failing with consuming”. Yeah, different kinds of “fails”.

“What that means?” you’re asking? That means the above works for non-matching string "foobar" but will fail for non-matching string "foo" *> string "bar". Here, optionMaybe is documented as:

[returns] Nothing in the case where a parser fails without consuming input.

So a simple string "foobar" fails without consuming, but anything more complicated will consume input before failing, so parser failure will propagate past optionMaybe via the monad.


I’ve no idea what to do. I’m so tired of this Parsing module, you can’t even imagine… I’m removing the Resolved mark till anybody comes up with an idea, but for now for my MVP I guess I will have a half-backed parser with carefully crafted inputs, and if I’d go past MVP and there’s still no solution, I’d probably have to look at either writing an ad-hoc parser or removing PureScript from backend completely, whichever is faster…

Yeah that’s exactly how it works.

It is confusing. This is how most traditional monadic parser combinator libraries in Haskell work, including Parsec, Megaparsec, Attoparsec. Other parser combinator libraries have made different choices about this.

There is reasoning behind it though. There is more discussion in the Alt instance section of the docs for the type ParserT.

Have you tried anyTill?

myParser :: Parser String String
myParser = map snd <<< anyTill $ string "foo"
2 Likes

Thank you, an interesting solution, similar to splitCap