Related to “how to get all available matches”. I want the usual behavior you’d expect from regexp engines: for every character, try to match, on fail advance by one. E.g. if you do echo foo ffoo foo | grep foo
you’ll get all 3 matches, because when grep reaches ffoo
and fails to match ff
with fo
, it simply advances by a single character and tries again.
Now, I know I could do something like myParser <|> anyChar
and I was doing that. However, for a 7kb file with just 2 matches this was returning a List with thousands of Nil
s, which is both slow and then have to be filtered out.
So my question is: how can I achieve the behavior of simply advancing the parser internal index by one on fail without returning anything related to the failure?
Code for the grep example:
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (runParser, Parser)
import Parsing.Combinators (many, try)
import Parsing.String (string)
myParser :: Parser String String
myParser = string "foo"
main :: Effect Unit
main = do
logShow $ runParser "foo ffoo foo" (many $ try $ myParser)
I want this to return 3 matches and perhaps a single Nil
for the end, but definitely not a dozen.
2 Likes
Thanks, that might work, though neither performant due to the Left
return values, nor convenient for the same reason.
Additionally, today I stumbled upon orthogonal challenge but related solution: I have a parser that returns HashTable
with a single match, whereas my ultimate goal is having a single HashTable
with all matches. So, inconveniently, I had to concatenate the tables. If a solution exists, it would be similar to the discussed problem, because I’d have to modify results of many
after every match, whether it be dropping a result entirely (e.g. if I matched anyChar
branch in a myParser <|> anyChar
-like expression), or inserting to the table.
After sitting for a while I came up with a solution:
module Main where
import Prelude
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, runParser)
import Parsing.Combinators (optionMaybe)
import Parsing.String (anyCodePoint, string)
-- Search for all matches in the text
manyViaRetries :: ∀ r. Parser String r -> Parser String (Array r)
manyViaRetries p = manyViaRetries' p []
where
manyViaRetries' _ acc = optionMaybe p >>= case _ of
Just v -> manyViaRetries' p $ acc <> [v]
Nothing -> optionMaybe anyCodePoint >>= case _ of
Just _ -> manyViaRetries' p acc -- try again at next position
Nothing -> pure acc -- end of string
myParser :: Parser String String
myParser = string "foo"
main :: Effect Unit
main = logShow $ runParser "foo ffoo foo" $ manyViaRetries myParser
1 Like
Well, after applying this code to the real parser it turned out it fails randomly, and upon reducing to the minimal testcase it turns out this is due to Parsing
module making distinction between “failing without consuming” and “failing with consuming”. Yeah, different kinds of “fails”.
“What that means?” you’re asking? That means the above works for non-matching string "foobar"
but will fail for non-matching string "foo" *> string "bar"
. Here, optionMaybe
is documented as:
[returns] Nothing
in the case where a parser fails without consuming input.
So a simple string "foobar"
fails without consuming, but anything more complicated will consume input before failing, so parser failure will propagate past optionMaybe
via the monad.
I’ve no idea what to do. I’m so tired of this Parsing
module, you can’t even imagine… I’m removing the Resolved
mark till anybody comes up with an idea, but for now for my MVP I guess I will have a half-backed parser with carefully crafted inputs, and if I’d go past MVP and there’s still no solution, I’d probably have to look at either writing an ad-hoc parser or removing PureScript from backend completely, whichever is faster…
Yeah that’s exactly how it works.
It is confusing. This is how most traditional monadic parser combinator libraries in Haskell work, including Parsec, Megaparsec, Attoparsec. Other parser combinator libraries have made different choices about this.
There is reasoning behind it though. There is more discussion in the Alt
instance section of the docs for the type ParserT
.
Have you tried anyTill
?
myParser :: Parser String String
myParser = map snd <<< anyTill $ string "foo"
2 Likes
Thank you, an interesting solution, similar to splitCap