Parsing Bible References with Elm

04 Jan 2019

This post gives a brief introduction to writing a simple parser in Elm. What we will be parsing are Bible references: a Bible reference is a shorthand that lets one quickly look up a specific verse or range of verses in the text of the Bible.

Sounds simple…

What does a reference look like?

A Bible reference can be broken down into a start location and an end location. Each location consists of a book, a chapter and a verse.

So a reference might look like Genesis 1:1 - Exodus 2:1, which tells us to start at the first verse of the first chapter of Genesis and end at the first verse of the second chapter of Exodus.

But the reference might also look like any of these:

  • Genesis 1 - A whole chapter
  • Genesis 1:1 - A single verse
  • Genesis 1:1-20
  • Genesis 1:20-2:24
  • Genesis 1-5 - Multiple whole chapters
  • Genesis 1 - Exodus 5
  • Genesis 1:1 - Exodus 5:20
  • Genesis 1:1 - Exodus 5
  • Genesis 1 - Exodus 5:20

Additionally, some books of the Bible only have a single chapter (e.g. Jude) and, by convention, the chapter number is dropped from the reference. So Jude 2 is the second verse of the first (and only) chapter of Jude, not all of Jude chapter 2.

We’ll aim to handle all of these cases when we write our parser.

How do we do parsing in Elm?

elm/parser is a super nice parsing library written by the creator and maintainer of Elm. I won’t go into details on it here - there is a nice tutorial and conference talk if you want to dig deeper.

We will parse the Bible reference in two steps:

  1. We parse the string into a list of statements
  2. Then we validate the list of statements to check that it forms a valid reference
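Because each step can fail, the two steps chain naturally with `Result.andThen`. Here is a minimal, generic sketch. `twoStep` is a hypothetical helper, not part of elm/parser or of the post's code, and it assumes both steps report their errors as a `String`.

```elm
module TwoStep exposing (twoStep)


{-| Hypothetical helper: run a tokenising step, then a validating step.
`Result.andThen` only calls `validate` when `tokenise` succeeded, so the
first error encountered is the one returned.
-}
twoStep :
    (String -> Result String tokens)
    -> (tokens -> Result String reference)
    -> String
    -> Result String reference
twoStep tokenise validate input =
    tokenise input
        |> Result.andThen validate
```

For example, `twoStep (String.toInt >> Result.fromMaybe "not a number") (\n -> if n > 0 then Ok n else Err "not positive") "42"` returns `Ok 42`. In this post the first step is `parse` and the second is the statement processing and `validateRef` described below.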

Getting a list of statements

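A Bible reference can contain book names, numbers, colons, hyphens and spaces. Spaces only separate the other parts, so the parser skips them and we define a statement type for everything else: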

type Statement
    = BookName Book
    | Num Int
    | Dash
    | Colon

and a parser to turn a string into a list of statements:

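import Parser as P exposing ((|.), (|=))


{-| A `List Statement` parser. We use `P.loop` to keep parsing statements
until none are left, then `P.end` to make sure the whole string was consumed
-}
parser : P.Parser (List Statement)
parser =
    P.loop [] statementsHelp
        |. P.end


statementsHelp : List Statement -> P.Parser (P.Step (List Statement) (List Statement))
statementsHelp revStmts =
    P.oneOf
        [ -- parse one statement (skipping surrounding spaces) and keep looping
          P.succeed (\stmt -> P.Loop (stmt :: revStmts))
            |. P.spaces
            |= statement
            |. P.spaces
        , -- no statement left: stop and return them in their original order
          P.succeed (P.Done (List.reverse revStmts))
        ]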

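{-| A `Statement` parser. `bookTokensList` (defined in the full source) is a
`List (P.Parser Book)` with one parser per book name. Because `P.oneOf` tries
its alternatives in order, book names are tried first, so a name that starts
with a digit (e.g. "1 John") is matched as a book rather than as `Num 1`.
-}
statement : P.Parser Statement
statement =
    P.oneOf
        [ P.map BookName (P.oneOf bookTokensList)
        , P.map (\_ -> Dash) (P.symbol "-")
        , P.map (\_ -> Colon) (P.symbol ":")
        , P.map Num P.int
        ]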

With this parser we can now turn a string into a List Statement:

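{-| `P.run` reports failures as a `List P.DeadEnd`, so we map them to a
simple error message to get a `Result String`
-}
parse : String -> Result String (List Statement)
parse str =
    P.run parser str
        |> Result.mapError (\_ -> "Could not parse reference: " ++ str)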
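To try the tokenising step end to end, here is a self-contained sketch of a module assembling the pieces above. The `Book` type and `bookTokensList` are hypothetical stand-ins covering only four books (the real package defines its own); `FirstJohn` is included to show why book names are tried before numbers.

```elm
module BibleTokens exposing (Book(..), Statement(..), parse)

import Parser as P exposing ((|.), (|=))


{-| Hypothetical, cut-down Book type covering only a few books. -}
type Book
    = Genesis
    | Exodus
    | Jude
    | FirstJohn


type Statement
    = BookName Book
    | Num Int
    | Dash
    | Colon


{-| Hypothetical stand-in for `bookTokensList`: one parser per book name. -}
bookTokensList : List (P.Parser Book)
bookTokensList =
    [ P.succeed Genesis |. P.token "Genesis"
    , P.succeed Exodus |. P.token "Exodus"
    , P.succeed Jude |. P.token "Jude"
    , P.succeed FirstJohn |. P.token "1 John"
    ]


{-| Book names are tried before numbers, so "1 John" is read as a book. -}
statement : P.Parser Statement
statement =
    P.oneOf
        [ P.map BookName (P.oneOf bookTokensList)
        , P.map (\_ -> Dash) (P.symbol "-")
        , P.map (\_ -> Colon) (P.symbol ":")
        , P.map Num P.int
        ]


{-| Loop over statements, then require that the whole string was consumed. -}
parser : P.Parser (List Statement)
parser =
    P.loop [] statementsHelp
        |. P.end


statementsHelp : List Statement -> P.Parser (P.Step (List Statement) (List Statement))
statementsHelp revStmts =
    P.oneOf
        [ P.succeed (\stmt -> P.Loop (stmt :: revStmts))
            |. P.spaces
            |= statement
            |. P.spaces
        , P.succeed (P.Done (List.reverse revStmts))
        ]


{-| Collapse elm/parser's dead ends into a single error message. -}
parse : String -> Result String (List Statement)
parse str =
    P.run parser str
        |> Result.mapError (\_ -> "Could not parse reference: " ++ str)
```

With this module, `parse "Genesis 1:1-20"` gives `Ok [ BookName Genesis, Num 1, Colon, Num 1, Dash, Num 20 ]` and `parse "Jude 2"` gives `Ok [ BookName Jude, Num 2 ]`, while trailing junk such as `"Genesis 1 x"` is rejected by `P.end`.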

Validating the list of statements

At this point we have a list of statements, like [BookName Genesis, Num 1, Colon, Num 1] or [BookName John, Num 3, Colon, Num 16], but nothing guarantees that the statements form a valid reference. For example, we could have [Colon, Colon, Colon], which is obviously not valid, or [BookName Genesis, Num 52], which looks well formed but is invalid because Genesis only has 50 chapters.
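Catching the Genesis 52 case needs per-book data. Below is a minimal sketch of a chapter bounds check, assuming a hypothetical cut-down `Book` type and a `numChapters` table for just three books (Genesis has 50 chapters, Exodus 40 and Jude 1).

```elm
module ChapterBounds exposing (Book(..), checkChapter)


{-| Hypothetical, cut-down Book type for this example. -}
type Book
    = Genesis
    | Exodus
    | Jude


numChapters : Book -> Int
numChapters bk =
    case bk of
        Genesis ->
            50

        Exodus ->
            40

        Jude ->
            1


{-| Accept a chapter only if it exists in the given book. -}
checkChapter : Book -> Int -> Result String Int
checkChapter bk chapter =
    if chapter >= 1 && chapter <= numChapters bk then
        Ok chapter

    else
        Err
            ("Chapter "
                ++ String.fromInt chapter
                ++ " is out of range: this book has "
                ++ String.fromInt (numChapters bk)
                ++ " chapter(s)"
            )
```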

First we will define a Reference type:

type alias Reference =
    { startBook : Book
    , startChapter : Int
    , startVerse : Int
    , endBook : Book
    , endChapter : Int
    , endVerse : Int
    }

Next we need a function, processStatementsHelp : List Statement -> Result String Reference, that validates our list of statements. It is rather large, because it has to account for every supported format and handle single-chapter books, but it is essentially one big case expression:

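{-| `reference`, `numChapters` and `numVerses` are helpers defined in the full
source: `reference` builds a `Result String Reference` from a start and end
book, chapter and verse, and `numVerses bk ch` is the number of verses in
chapter `ch` of book `bk`
-}
processStatementsHelp : List Statement -> Result String Reference
processStatementsHelp stmts =
    case stmts of
        -- Gen: the whole book
        [ BookName bk ] ->
            reference
                bk
                1
                1
                bk
                (numChapters bk)
                (numVerses bk (numChapters bk))

        -- Gen 1: a whole chapter (or a single verse for single-chapter books)
        [ BookName bk, Num ch ] ->
            if numChapters bk == 1 then
                -- Jude 2: verse 2 of chapter 1
                reference
                    bk
                    1
                    ch
                    bk
                    1
                    ch

            else
                -- end at the last verse of chapter `ch`, not of chapter 1
                reference
                    bk
                    ch
                    1
                    bk
                    ch
                    (numVerses bk ch)

        -- ... other formats truncated for brevity (full function:
        -- https://github.com/monty5811/elm-bible/blob/2.0.0/src/Internal/Parser.elm#L38-L243)

        -- Genesis - Revelation
        [ BookName startBk, Dash, BookName endBk ] ->
            reference
                startBk
                1
                1
                endBk
                (numChapters endBk)
                (numVerses endBk (numChapters endBk))

        [] ->
            Err "No reference found"

        _ ->
            Err "No valid reference found"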
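A compact, runnable version of two of the branches above might look like the sketch below. It is not the package's code: the `Book` type is cut down to two books, the `reference` helper is replaced by the record constructor wrapped in `Ok`, and `numVerses` is passed in as an argument (a hypothetical design choice) so the sketch does not need a full verse-count table.

```elm
module ProcessSketch exposing (Book(..), Reference, Statement(..), processStatements)


{-| Hypothetical, cut-down Book type: one multi-chapter and one single-chapter book. -}
type Book
    = Genesis
    | Jude


type Statement
    = BookName Book
    | Num Int
    | Dash
    | Colon


type alias Reference =
    { startBook : Book
    , startChapter : Int
    , startVerse : Int
    , endBook : Book
    , endChapter : Int
    , endVerse : Int
    }


numChapters : Book -> Int
numChapters bk =
    case bk of
        Genesis ->
            50

        Jude ->
            1


{-| `numVerses bk ch` is passed in rather than looked up from a table. -}
processStatements : (Book -> Int -> Int) -> List Statement -> Result String Reference
processStatements numVerses stmts =
    case stmts of
        -- "Genesis 1" is a whole chapter; "Jude 2" is a single verse
        [ BookName bk, Num n ] ->
            if numChapters bk == 1 then
                Ok (Reference bk 1 n bk 1 n)

            else
                Ok (Reference bk n 1 bk n (numVerses bk n))

        -- "Genesis 1:3" is a single verse
        [ BookName bk, Num ch, Colon, Num v ] ->
            Ok (Reference bk ch v bk ch v)

        [] ->
            Err "No reference found"

        _ ->
            Err "No valid reference found"
```

For example, `processStatements (\_ _ -> 31) [ BookName Genesis, Num 1 ]` returns a reference from Genesis 1:1 to Genesis 1:31 (Genesis 1 has 31 verses), while `[ BookName Jude, Num 2 ]` yields Jude 1:2 to Jude 1:2.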

Now we have a Reference that contains a start book, chapter and verse and an end book, chapter and verse. However, we haven’t yet checked that these are in order (the reference cannot end before it starts) or within bounds (the chapters and verses must actually exist), so we use one last function to validate the reference:

validateRef : Reference -> Result String Reference
validateRef ref =
    validateBookOrder ref
        |> Result.andThen validateChapterOrder
        |> Result.andThen validateVerseOrder
        |> Result.andThen validateChapterBounds
        |> Result.andThen validateVerseBounds

-- see each validate function here: https://github.com/monty5811/elm-bible/blob/2.0.0/src/Internal/Parser.elm#L363
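As an alternative to separate book, chapter and verse order checks (this is not how the package does it), Elm's comparable tuples let a single comparison cover all three. A sketch, assuming a hypothetical cut-down `Book` type and a hypothetical `bookIndex` helper giving each book's position in canonical order:

```elm
module OrderCheck exposing (Book(..), Reference, validateOrder)


{-| Hypothetical, cut-down Book type. -}
type Book
    = Genesis
    | Exodus
    | Leviticus


{-| Hypothetical helper: each book's position in canonical order. -}
bookIndex : Book -> Int
bookIndex bk =
    case bk of
        Genesis ->
            1

        Exodus ->
            2

        Leviticus ->
            3


type alias Reference =
    { startBook : Book
    , startChapter : Int
    , startVerse : Int
    , endBook : Book
    , endChapter : Int
    , endVerse : Int
    }


{-| Tuples of comparable values are compared element by element, so one
comparison checks book order, then chapter order, then verse order.
-}
validateOrder : Reference -> Result String Reference
validateOrder ref =
    let
        startPos =
            ( bookIndex ref.startBook, ref.startChapter, ref.startVerse )

        endPos =
            ( bookIndex ref.endBook, ref.endChapter, ref.endVerse )
    in
    if startPos > endPos then
        Err "Reference ends before it starts"

    else
        Ok ref
```

For example, Genesis 50:1 - Exodus 1:1 passes because the book index settles the comparison before chapters are compared, which is exactly the case a chapter-only check would wrongly reject.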

Finally! We have a validated Bible reference!

Note: it should be possible to move all of this validation inside the parser and do everything in one step, but I think the two-step approach is cleaner.
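For the one-step alternative, elm/parser's `P.andThen` and `P.problem` can reject an out-of-range chapter while parsing. A minimal sketch for a book followed by a chapter, assuming a hypothetical two-book `Book` type and a hypothetical `numChapters` table:

```elm
module OneStep exposing (Book(..), bookAndChapter)

import Parser as P exposing ((|.), (|=))


{-| Hypothetical, cut-down Book type. -}
type Book
    = Genesis
    | Exodus


numChapters : Book -> Int
numChapters bk =
    case bk of
        Genesis ->
            50

        Exodus ->
            40


bookParser : P.Parser Book
bookParser =
    P.oneOf
        [ P.succeed Genesis |. P.token "Genesis"
        , P.succeed Exodus |. P.token "Exodus"
        ]


{-| Parse a chapter number and fail straight away if `bk` has no such chapter. -}
chapterIn : Book -> P.Parser Int
chapterIn bk =
    P.int
        |> P.andThen
            (\ch ->
                if ch >= 1 && ch <= numChapters bk then
                    P.succeed ch

                else
                    P.problem ("No chapter " ++ String.fromInt ch ++ " in this book")
            )


chapterAfter : Book -> P.Parser ( Book, Int )
chapterAfter bk =
    P.succeed (Tuple.pair bk)
        |. P.spaces
        |= chapterIn bk


{-| A book followed by a chapter, validated while parsing. -}
bookAndChapter : P.Parser ( Book, Int )
bookAndChapter =
    P.andThen chapterAfter bookParser
        |. P.end
```

Here `P.run bookAndChapter "Genesis 50"` succeeds with `( Genesis, 50 )`, while `"Genesis 52"` fails with the "No chapter 52 in this book" problem. The trade-off is that the validation logic is now spread through the grammar instead of living in one place.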

Conclusion

This post has shown how to write a parser in Elm that turns a string into a validated Bible reference. Hopefully this will help you get started building your own parsers.

If you don’t care about building a parser and just want an Elm package to do this for you, check out monty5811/elm-bible, which provides a parser, nice formatting and a compact encoder/decoder.
