Help with Purescript parser Alternative

I’m having an issue where my parser is failing and not trying the next alternative branch.

module TestParser where

import Prelude
import Control.Plus ((<|>))
import Data.Array (many)
import Data.Either (Either)
import Data.Int (fromString)
import Data.Maybe (Maybe(..))
import Data.String.CodeUnits (fromCharArray)
import Text.Parsing.Parser (ParseError, Parser, fail, runParser)
import Text.Parsing.Parser.String (string, whiteSpace)
import Text.Parsing.Parser.Token (digit)

parseCommand :: String -> Either ParseError String
parseCommand strCommand = runParser strCommand commandParser

commandParser :: Parser String String
commandParser = twoArgCommandParser <|> oneArgCommandParser

twoArgCommandParser :: Parser String String
twoArgCommandParser = do
  parsedRef1 <- refParser
  parsedRef2 <- refParser
  command <- twoArgCommandNameParser
  pure (command <> parsedRef1 <> parsedRef2)

oneArgCommandParser :: Parser String String
oneArgCommandParser = do
  parsedRef <- refParser
  command <- oneArgCommandNameParser
  pure (command <> parsedRef)

oneArgCommandNameParser :: Parser String String
oneArgCommandNameParser = whiteSpace *> string "dd" *> pure "DirectDerivation"

twoArgCommandNameParser :: Parser String String
twoArgCommandNameParser = whiteSpace *> string "mp" *> pure "ModusPonens"

refParser :: Parser String String
refParser = whiteSpace *> (lineRef <|> premiseRef)

lineRef :: Parser String String
lineRef = decimalParser >>= \num -> pure (show num)

premiseRef :: Parser String String
premiseRef = string "pr" *> decimalParser >>= \num -> pure ("premise" <> show num)

decimalParser :: Parser String Int
decimalParser =
  many digit
    >>= \digits -> case fromString (fromCharArray digits) of
        Just number -> pure number
        Nothing -> fail "Failed to parse number"

If you do parseCommand “2 dd” it should work and return 2DirectDerivation but instead it fails while running twoArgCommandParser and never tries the second branch. What am I doing wrong here?

purescript-parsing does not automatically backtrack if a parser has consumed any input. In order to backtrack you need to wrap your prefix in try. I say “prefix”, because while you can just wrap everything in try, you will get very poor errors. My recommendation is that for any given branch, you should wrap the minimal amount in try in order to commit to that branch.

Oh sweet, that worked thanks! Can you please explain what you mean when you say “in order to commit to that branch”?

For example, say you have two alternative branches that share a common prefix:

  1. One parses just an identifier
  2. One parses an identifier followed by an equals-sign and an expression.

In parser pseudo-code, that might look like:

assignment =
  Assignment <$> (identifier <* char '=') <*> expr

variable =
  Variable <$> identifier

parser = assignment <|> variable

Both assignment and variable start with parsing an identifier. Since assignment is longer, we want to try it first. One possibility is wrapping assignment completely:

parser = try assignment <|> variables

However, this is unnecessary, and will lead to a poor error. For example, if there is an error in the expr parser, this will still backtrack to try and parse a variable. The lookahead that matters is the =. If we’ve parsed an identifier, and we see an =, then we know that there is no reason to backtrack to variable. Instead we can move the try inside of assignment:

assignment =
  Assignment <$> try (identifier <* char '=') <*> expr

That is, I’ve only wrapped the minimum prefix of assignment necessary in order to commit to parsing that branch.

This article goes into more detail: http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/

1 Like

thank you, that’s very helpful :slight_smile: