Peculiar indentation rules for let in do block?

michelrandahl · October 16, 2022, 5:43pm

Yesterday I was struggling with some code in a do-block where the type checker kept complaining about Unexpected or mismatched indentation. I was staring myself blind at the problem and eventually just pulled the code out into its own function instead, and things worked fine…

Today I ran into a similar problem again, and randomly I decided to give the code a little more indentation, and then the type checker magically stopped complaining.

A simplified silly example to show the problem:

fourspaces :: Effect Unit
fourspaces = do
  let stuff =
      [40,41,42,43,44] -- this line is indented four spaces relative to `let`
        # Array.filter (\x -> x >= 42)
        <#>  (\x -> x + 1)
  log $ show stuff

The type checker reports

  Unable to parse module:
  Unexpected or mismatched indentation

Same example, but with 6 spaces:

sixspaces :: Effect Unit
sixspaces = do
  let stuff =
        [40,41,42,43,44] -- this line is indented six spaces relative to `let`
          # Array.filter (\x -> x >= 42)
          <#>  (\x -> x + 1)
  log $ show stuff

The type checker has no problems with the version with 6 spaces…

Is this a bug? Or is there some logic to why only the six-space indentation works?
It's certainly a bit confusing to a beginner.

JordanMartinez · October 16, 2022, 9:38pm

The entire expression bound to bindingName must be indented to the right of the column in which bindingName first appears. In the diagram below, that column is marked with the | symbol, so everything must be to the right of it. x represents an invalid indentation, whereas ✓ represents a valid position:

foo = do
  let bindingName =
x     |
 x    |
  x   |  
   x  |  
    x |  
     x|  
      |
      |✓
      | ✓

People usually newline-then-indent the reference name for expressions that span multiple lines because it reduces the amount of indentation needed for the expression (5-6 spaces below vs 7-8 spaces above):

foo = do
  let
    refName =
x   |
 x  |
  x |
   x|
    |
    |✓
    | ✓
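
Applied to the example from the first post, that style might look like the sketch below. The module header and imports are assumptions added to make it self-contained (the original snippet did not show them); the point is only that the expression sits to the right of the column where stuff starts.

```purescript
module Main where

import Prelude

import Data.Array as Array
import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = do
  let
    -- `stuff` starts after four spaces, so anything indented
    -- five or more spaces still belongs to this binding.
    stuff =
      [ 40, 41, 42, 43, 44 ]
        # Array.filter (\x -> x >= 42)
        <#> (\x -> x + 1)
  log $ show stuff
```

The multi-line expression here needs only six spaces, compared with the eight used in the sixspaces version.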

natefaubion · October 17, 2022, 1:36am

To elaborate a bit on why that’s the case: it’s because PureScript has a concept of binding groups. Multiple bindings under a single let form a single binding group, where each binding in the group is in scope within the entire group.

example = do
  let
    f = ...
    g = ...
  ...

So f and g are in scope within the definitions of f and g. However,

example = do
  let f = ...
  let g = ...
  ...

These are separate binding groups, where f is in scope for g, but g is not in scope for f. This is in contrast to something like JavaScript, which treats an entire block as a single binding group (for let and const), or a function body as a binding group (for var).

function example () {
  let f = ...;
  let g = ...;
}

Here f and g are similar to the first example, as they are both in scope for both bindings, though they may still be undefined depending on when they are accessed.
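
As a small PureScript illustration of the single-group case, here is a sketch in which two bindings under one let refer to each other. The names isEven and isOdd are hypothetical and not from this thread, and the module header and imports are assumptions added to make it self-contained.

```purescript
module Main where

import Prelude

import Effect (Effect)
import Effect.Console (log)

main :: Effect Unit
main = do
  let
    -- One `let`, one binding group: each function can call the other.
    -- Only intended for non-negative arguments.
    isEven :: Int -> Boolean
    isEven n = if n == 0 then true else isOdd (n - 1)

    isOdd :: Int -> Boolean
    isOdd n = if n == 0 then false else isEven (n - 1)
  log $ show (isEven 10)
```

If isEven and isOdd were instead introduced by two separate let lines, isOdd would not be in scope inside isEven, so the compiler should reject the first definition.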

michelrandahl · October 18, 2022, 7:58pm

Thank you, that makes sense!
I hope authors of educational material remember to mention it explicitly. I did see code with lots of indentation after let bindings in the book I’m reading, but I never really thought about it and just assumed any indentation would do.

natefaubion · October 18, 2022, 8:47pm

I believe this is mentioned in the documentation repo

purescript/documentation/blob/master/language/Syntax.md#indentation-in-binding-blocks

# Syntax

## Whitespace Rules

Syntax is whitespace sensitive. The general rule of thumb is that declarations which span multiple lines should be indented past the column on which they were first defined on their subsequent lines.

That is, the following is valid:

``` purescript
foo = bar +
  baz
```

But this is not:

``` purescript
foo = bar +
baz
```

Quelklef · October 21, 2022, 2:26am

Am I correct in saying that this rule is necessary to disambiguate the grammar? Seems like otherwise the case

let
  x = e
    A y = ee

is ambiguous because it could either be x = e A; y = ee or x = e; A y = ee.

Are there other ambiguities avoided by this parser rule?

(Just curious, and it looks like the linked documentation doesn’t already mention it.)

natefaubion · October 21, 2022, 4:54pm

Yes, it avoids ambiguity in the grammar. All whitespace-sensitive languages have the same basic rule (which is generally referred to as the offside rule). At an implementation level, the lexer scans the input token stream, inserting implicit delimiters for whitespace (we call this layout). You can see this in some of the layout golden tests, which make these implicit delimiters visible by printing them as braces and semicolons.

purescript/purescript/blob/9166355079a65288855b623f010c0a0b7a479b2e/tests/purs/layout/DoLet.out

module Test where{

test = do{
  let {foo = bar};
  foo};

test = do{
  let {foo = bar};
  in baz;
  foo};

test = do{
  let {foo = bar}
    in baz;
  foo}}
<eof>
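
Working the same idea through by hand for the fourspaces example from the first post gives roughly the following. This is a hand-derived sketch in the style of the golden test above, not actual compiler output, so the exact placement of the delimiters is an assumption. The trailing comment from the original line is omitted.

```text
fourspaces = do{
  let {stuff =;
      [40,41,42,43,44]
        # Array.filter (\x -> x >= 42)
        <#>  (\x -> x + 1)};
  log $ show stuff}
```

Because [40,41,42,43,44] starts in the same column as stuff, layout treats it as the start of a new binding and inserts a separator right after the =, which the parser cannot accept. With the extra indentation of the sixspaces version, no separator is inserted there and the whole expression stays part of the stuff binding.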