Parsing Cause


« More on Assign To

Updating A Swift Kickstart »

Although the programming methodology for which I am primarily known is HDD (Hate Driven Development), I am also a champion of the "Work hard on the thing that isn't due that you aren't getting paid for and don't plan to ship" methodology.

That's the one where I have a ton of things I should do and suddenly find myself really busy and engaged on one or more things that isn't close to being on that list.

And that's how I came to play with parsers.

Just because.

I'd read about parsers in Programming in Scala. I really like a lot of things in that book. The Objc.io guys also covered the topic several times in different ways in Swift Talk videos. This method is essentially the one they cover in the first few episodes and also is the same as the one described in the Scala book.

So this is the parser combinator idea. You build a parser that can handle a single character off the front of the, say, String it is parsing. If it can handle the character, it does and returns the type you expected the character to be and the remainder of the String.

For performance reasons we use a Substring.

Here's what our Parser might look like.

struct Parser<A> { let parse: (Substring) -> (A, Substring)? }

I return an optional from parse rather than deal with errors.

Also, I know I'm seeing this pattern everywhere these days, but this looks a lot like the State monad to me.

struct State<S, A> { let run: (S) -> (A, S) }

Parser is essentially State where S is Substring.

At first we're going to use our Parser to examine the first character and see whether it satisfies some condition. Maybe we're looking for a digit, or a letter, or an opening angle brace - whatever. Each of these can be expressed as a Parser generic in Character.

So we start with a function that accepts a condition and creates a Parser that satisfies that condition.

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> { // not implemented yet }

character() takes a condition in the form of a function on Character and returns a Parser<Character>.

The body must create an instance of Parser so we have to specify the parse function which we do as a trailing closure.

parse accepts a Substring we call stream. If the stream is non-empty we examine its first character to see if it satisfies the condition. If the stream is empty or the condition is not satisfied, parse returns nil. Otherwise we return the character and the rest of the stream.

func character(condition: @escaping (Character) -> Bool) -> Parser<Character> { Parser{stream in guard let char = stream.first, condition(char) else {return nil} return (char, stream.dropFirst()) } }

Notice that character returns a Parser it doesn't run it.

For example, here's how we might write a Parser that parses digits.

let digit = character(condition: \.isNumber)

Isn't that some really satisfying code? digit is a type of Character parser where the condition is, "the character is a number".

Let's run our parser. This should fail as the first character of abc is not a number.

digit.parse("abc")

Run this in a playground and you'll get nil.

While we're at it, we should check the empty string.

digit.parse("")

This is nil as well.

Finally, let's parse the substring "1bc".

digit.parse("1bc")

We get the tuple (1, "bc").

This is the moment in parsing where the teacher's and audience's mood are as far apart as they could possibly be.

The presenter is so pleased that this simple example works and we can now parse a digit from the front of a string. The learner has a giant "I just wasted time and effort for this" look on their face which they don't bother to disguise.

What we've built is the first building block.

One of the keys to functional programming is we build these atoms and then we build functions that allow us to combine and modify them. These are the combinators referred to in a parser combinator library.

For example, we might want a combinator that allows us to pull digits off the front of a string until we hit our first non-digit. We might want to convert this number to an Int instead of a sequence of characters. We might want to look for a digit or something else. We might want to look for a digit and then something else.

Once we have those pieces we can assemble complicated parsers that can perform sophisticated tasks.

This is as far as we're going to go for now - but if people want, I can return to this topic some other time.