Help with decoding tagged JSON needed

yuki · October 17, 2020, 9:02pm

Please ignore the first post and jump straight to my update Help with decoding tagged JSON needed

I’m trying to reduce boilerplate code without compromising on type safety. The data I’m dealing with consists of lots of records which all have at least one field in common, type, such as

[
  { "type": "foo" },
  { "type": "bar" }
]

I would like to create a sum type, where the constructors represent each possible type of node. My first version consisted of a record where the “type” could be specified with a type argument.

type Node nodeType dataValuesType r
  = { nodeType :: nodeType
    | r
    }

This let me create data types with only a single constructor and no arguments…

data FooType = FooType

…and use those to create variations of the JSON records, like this:

data MyNode = Node FooType ()

What’s left is to implement readImpl from Simple JSON…

expectString ::
  forall a.
  String -> -- | nodeType value we're looking for
  a -> -- | Constructor function
  Foreign -> -- | nodeType value we're decoding
  F a
expectString query consFn f = do
  s <- readString f
  if s == query then
    pure consFn
  else
    fail <<< ForeignError $ "expected hyperlink but got: " <> s


instance readForeignFooType :: JSON.ReadForeign FooType where
  readImpl = expectString "foo" FooType

and things will just work (I’m using generics to generate the decoding code for the actual sum type as outlined in the Simple JSON docs)`. I get nodes where I can easily pattern match on constructors and so on. But the whole process involves a lot of typing and I’m sure I’m doing this wrong.

I have this vague idea that I should be able to use something like a Symbol and move all of the decoding into my generic Node type rather than repeat this “check if string is this or that” logic everywhere where I’m using Node.
I’ve tried various things below but they all lead nowhere in the end.

-- can't access the symbol in `readImpl`
type Node (sym :: Symbol) r
  = { nodeType :: String
    | r
    }

TL;DR: The goal is to specify the value of the type key when I create the specific Node type and implement the decoding logic that matches the given value against the { "type": "SOME_VALUE" } JSON object on Node rather than on all the users of Node to create things like MyNode

sharkbrainguy · October 18, 2020, 6:55am

I think the existing behaviour for variants might solve your problem.

yuki · October 19, 2020, 1:56pm

I had a small epiphany yesterday when I finally realized that I can get the type level string out of the type and into the value level with

nameP = SProxy :: SProxy name
name  = reflectSymbol nameP

and that tagged JSON decoding is exactly what I’m trying to do here. Now I just need to figure out how I can make my Node function polymorphic over all the fields of the JSON object that are not the nodeType field. I know that I can of course use row types for that but I need to figure out the specifics.

yuki · October 20, 2020, 10:19am

I’ll rephrase my question because I’m stuck again.

I am trying to decode JSON objects which all have one key in common, which functions as tag and from which I can determine which variant I’m dealing with.
For example, both { nodeType: "foo", bar: 1 } and { nodeType: "foo", bax: "boo" } are valid JSON objects. I want to represent all variants with a sum type, like so:

data Thing = Foo { bar :: Int } | Bax { bax :: Strong }

I looked at the implementation of tagged JSON decoding for Simple.JSON and the most important part is this:


instance taggedSumRepConstructor ::
  ( GenericTaggedSumRep a
  , IsSymbol name
  ) => GenericTaggedSumRep (GR.Constructor name a) where
  genericTaggedSumRep f = do
    r :: { "type" :: String, value :: Foreign } <- JSON.read' f
    if r."type" == name
      then withExcept (map $ ErrorAtProperty name) $ GR.Constructor <$> genericTaggedSumRep r.value
      else
        fail $ ForeignError $ "Wrong type tag " <> r."type" <> " where " <> name <> " was expected."
    where
      nameP = SProxy :: SProxy name
      name = reflectSymbol nameP

Now my case is a bit different. The way I understand the linked code is that it expects the name of the constructor to match the values of the tag field. So name in the snippet above is both a JSON value and a constructor name. For me that’s not the case and so I need a mapping from JSON field value to constructor. And that’s where I’m stuck.

My idea was to pass this mapping around via symbols, but how would I do that? Every one of my constructors (Foo and Bax) would need a symbol. But in the code snippet the instance for GR.Constructor only gets the constructor name and then a, which is the argument. In my case the type level symbol (e.g., data Thing = Foo "jsonFieldValue" ...) is probably not even valid syntax and even if it was how would I extract these type level symbols from the constructor to use them inside the genericTaggedSumRep function above?

yuki · October 20, 2020, 5:32pm

Okay I think I finally figured it out. I haven’t tested in “at scale” by which I mean I don’t know if this breaks down in certain cases I currently can’t think of but here we go.

The Glorious Gist In All Its Glorious Glory

I might turn this into a blog post with proper explanations or at least annotate the gist but here’s the idea behind it.

I still don’t know if it’s easily possible to extract symbols from constructors in instances for the sum type to which these constructors belong. Maybe some fancy reflection can do that? But we don’t even need to do this.

I want to represent variants of JSON objects as a sum type

data Foo
  = FooA (Node "foo" ( some :: String ))
  | FooB (Node "bar" ( other :: String ))

Notice how each constructor has a Node type. Thanks to this “trick” I can implement the decoding logic on Node, where it’s now pretty straight forward to extract the symbol (~ string).

The readImpl implementation of Node first looks at the nodeType value, and only this. If the value of the nodeType key matches what we expect (for example to decode a FooA we need nodeType: "foo") then we go ahead and do this:

        if peek.nodeType == jsonTagValueS then do
          case ( JSON.read f ::
                JSON.ReadForeignFields rowList () row =>
                JSON.E { | row }
            ) of
            Left e -> throwError e
            Right full -> pure $ Node $ Record.union peek full

I’m still surprised that I actually managed to write some type level code that actually compiles We now try to decode the JSON object based on the row of types for that constructor. Again using FooA as an example, we’re looking for an object that has a key called some with a String as its value. The type signature “simply” states that the compiler needs to understand how to read these fields, meaning there needs to be a ReadForeignFields instance.

What we’ve achieved is that Node is essentially a tagged record where you can specify the expected value of the tag field. Furthermore Node then takes the row of types for all the other fields you expect in that record.

Everything else in the gist is the usual boilerplate for untagged sum type decoding which I copied without modification from the Simple.JSON docs

Now I’d like to write some tests and I struggle with

instance nodeEq ::
  ( Eq (Record row)
  , RowToList row list
  , IsSymbol s
  , EqRecord list row
  ) =>
  Eq (Node s row) where
  eq (Node n1) (Node n2) = eqRecord (RLProxy :: RLProxy list) n1 n2

[1/1 NoInstanceFound] /data/private/lions-purescript/src/Data/Contentful/RichText/Node.purs:30:28

  30    eq (Node n1) (Node n2) = eqRecord (RLProxy :: RLProxy list) n1 n2
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  No type class instance was found for

    Data.Eq.EqRecord list2
                     ( nodeType :: String
                     | row3
                     )

  while applying a function eqRecord
    of type EqRecord t0 t1 => RLProxy t0 -> Record t1 -> Record t1 -> Boolean
    to argument RLProxy
  while inferring the type of eqRecord RLProxy
  in value declaration nodeEq

  where row3 is a rigid type variable
          bound at (line 0, column 0 - line 0, column 0)
        list2 is a rigid type variable
          bound at (line 0, column 0 - line 0, column 0)
        t0 is an unknown type
        t1 is an unknown type

… of course

Also why doesn’t

instance nodeEq ::
  Eq (Node s r) where
  eq (Node n1) (Node n2) = n1 == n2

work. Adding constraints that r must have an EqRecord instance also doesn’t help

JordanMartinez · October 20, 2020, 7:21pm

It was referenced previously, but would purescript-variant help you do what you want?

yuki · October 20, 2020, 9:16pm

I’ll look into that tomorrow!

yuki · October 22, 2020, 8:45pm

I looked into variants but the default decoding for variants in Simple.JSON is strangely specific as to the JSON structure that’s expected. I’d have to create my own instance in which case I’d be exactly where I am now with my readForeignNode instance expect I’d be decoding into Variant instead of a record type so I don’t think it would help.

Also I really have zero ideas whatsoever why

newtype Node (jsonTagValue :: Symbol) r
  = Node { nodeType :: String | r }

instance eqNode ::
  ( RowToList r list
  , EqRecord list r
  , Record.EqualFields list r
  ) =>
  Eq (Node s r) where
  {-- eq (Node r1) (Node r2) = Record.equal r1 r2 --}
  {-- eq (Node r1) (Node r2) = r1.nodeType == r2.nodeType --}
  eq (Node r1) (Node r2) = r1 == r2

gives me

  32    eq (Node r1) (Node r2) = r1 == r2
                                 ^^^^^^^^

  No type class instance was found for

    Prim.RowList.RowToList ( nodeType :: String
                           | r1
                           )
                           t2

If I do visual pattern matching I arrive at the following

Prim.RowList.RowToList ( nodeType :: String | r1 ) t2   <- from error
RowToList                r                         list <- from instance

Now I’ll draw some really wild conclusions. It doesn’t understand the combination of nodeType :: String plus the row. And because it derives the list from r according to my RowToList it doesn’t get that either.

But I also don’t know what I can do here.

natefaubion · October 22, 2020, 9:32pm

You are only asking for RowToList on r, but the Eq instance for record requires RowToList for the entire Record, which is r with the additional nodeType :: String label. In your instance constraints, wherever there is r, it should be (nodeType :: String | r).

yuki · October 22, 2020, 9:35pm

Oh my god it is so obvious now. I was trying wacky stuff with Union because I kind of suspected that this was the issue but it never occurred to me to just replace r with ( nodeType :: String | r )…

I shall build you a monument, thank you so much

natefaubion · October 22, 2020, 9:39pm

The reason for decoding into Variant would be to lose the additional data type you are defining to wrap everything into a unified type.

data Thing = Foo { bar :: Int } | Bax { bax :: Strong }

This is exactly the kind of thing that Variant represents. For example:

type Thing = Variant (foo :: { bar :: Int }, bax :: { bax :: String })

example = match
  { foo: \{ bar } -> ...
  , bax: \{ bax } -> ...
  }