Thoughts Heap

A Blog by Roman Gonzalez.-

Thu, Sep 29th

Testing URI canonicalization using QuickCheck

Today I was working on a custom web crawler for the company I work at. One of the problems with my first implementations was that I was not canonicalizing links, so the crawler visited the same link more than once during a crawl.

Canonicalization of URIs basically consists of removing any query string and fragment from the given URI (at least, that is my understanding of canonicalization).
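Under that definition, canonicalization can be sketched as a plain string operation. This is only a minimal sketch and not necessarily the function used in the crawler: the name `canonicalize` and the String-based approach are assumptions. It relies on the fact that in a URI the query starts at the first `?` and the fragment at the first `#`, so everything from the first of those two characters onward can be dropped.

```haskell
-- | Strip the query string and fragment from a URI given as a String.
-- Hypothetical helper: the name and the String-based approach are
-- assumptions made for illustration.
canonicalize :: String -> String
canonicalize = takeWhile (`notElem` "?#")
```

For example, `canonicalize "http://example.com/a/b?page=2#top"` evaluates to `"http://example.com/a/b"`, and a URI without a query string or fragment is returned unchanged.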

I needed to test the function that canonicalizes links, and decided to use a QuickCheck property for that. I implemented an Arbitrary instance for a pair of URI strings: the canonicalized version, and the same URI with a query string and/or fragment appended. The property runs the canonicalization algorithm I developed on the second string and compares the result with the first.

The following code shows the Arbitrary instance of a newtype called URIPair:
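The original gist is not reproduced here, so what follows is a hypothetical sketch of what such an instance could look like, not the original code. Only the name `URIPair` comes from the post; the helper generators (`genToken`, `genCanonical`, `genQuery`, `genFragment`, `genSuffix`) and the shape of the generated URIs are assumptions. The first component of the pair is the canonical URI, the second is the same URI with a query string, a fragment, or both appended.

```haskell
import Data.List (intercalate)
import Test.QuickCheck

-- | (canonical URI, same URI plus a query string and/or fragment).
newtype URIPair = URIPair (String, String)
  deriving (Show)

-- | A short lowercase alphanumeric token, used for hosts, path
-- segments, query keys and values, and fragments.
genToken :: Gen String
genToken = do
  n <- choose (1, 8)
  vectorOf n (elements (['a' .. 'z'] ++ ['0' .. '9']))

-- | A URI with a scheme, a host, and zero to three path segments,
-- but no query string or fragment.
genCanonical :: Gen String
genCanonical = do
  scheme   <- elements ["http", "https"]
  host     <- genToken
  tld      <- elements ["com", "org", "net"]
  n        <- choose (0, 3)
  segments <- vectorOf n genToken
  return (scheme ++ "://" ++ host ++ "." ++ tld ++ concatMap ('/' :) segments)

-- | A query string such as "?a=1&b=2", with one to three pairs.
genQuery :: Gen String
genQuery = do
  n     <- choose (1, 3)
  pairs <- vectorOf n ((\k v -> k ++ "=" ++ v) <$> genToken <*> genToken)
  return ('?' : intercalate "&" pairs)

-- | A fragment such as "#section".
genFragment :: Gen String
genFragment = ('#' :) <$> genToken

-- | A query string, a fragment, or a query string followed by a fragment.
genSuffix :: Gen String
genSuffix = oneof [genQuery, genFragment, (++) <$> genQuery <*> genFragment]

instance Arbitrary URIPair where
  arbitrary = do
    canonical <- genCanonical
    suffix    <- genSuffix
    return (URIPair (canonical, canonical ++ suffix))
```

Building the second component from the first guarantees that both always refer to the same canonical URI, which is what lets a property compare them directly.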



The next one shows how it is used in a QuickCheck property test:
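As the property gist is also missing here, this is a hedged sketch of the comparison described above rather than the original test. The names `prop_canonicalize` and `canonicalize` are assumptions, and `canonicalize` is a one-line stand-in for the real algorithm. A function returning `Bool` is already a valid QuickCheck property, so this fragment needs no imports.

```haskell
-- | (canonical URI, same URI plus a query string and/or fragment).
newtype URIPair = URIPair (String, String)
  deriving (Show)

-- | Stand-in for the function under test (an assumption, not the
-- crawler's real implementation): drop everything from the first
-- '?' or '#' onward.
canonicalize :: String -> String
canonicalize = takeWhile (`notElem` "?#")

-- | Canonicalizing the decorated URI must give back the canonical one.
prop_canonicalize :: URIPair -> Bool
prop_canonicalize (URIPair (canonical, decorated)) =
  canonicalize decorated == canonical
```

The property never looks at how the pair was built. It only states the expected relationship between the two strings, so a bug in the canonicalization function shows up as a counterexample pair printed by QuickCheck.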



Finally, the Main action:
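A main action for this kind of test can be as small as a single call to `quickCheck`. The sketch below is self-contained so that it runs on its own, which means it carries a deliberately tiny stand-in generator and canonicalization function. Both are assumptions for illustration and are much simpler than the roughly 50-line generator described in this post.

```haskell
import Test.QuickCheck

-- | (canonical URI, same URI plus a query string and/or fragment).
newtype URIPair = URIPair (String, String)
  deriving (Show)

-- | Tiny stand-in generator: a fixed host, a random path, and one of
-- three fixed suffixes. Only meant to make this example runnable.
instance Arbitrary URIPair where
  arbitrary = do
    path   <- listOf (elements "abc/")
    suffix <- elements ["?q=1", "#top", "?q=1#top"]
    let canonical = "http://example.com/" ++ path
    return (URIPair (canonical, canonical ++ suffix))

-- | Stand-in for the function under test.
canonicalize :: String -> String
canonicalize = takeWhile (`notElem` "?#")

prop_canonicalize :: URIPair -> Bool
prop_canonicalize (URIPair (canonical, decorated)) =
  canonicalize decorated == canonical

-- | Run the property against randomly generated pairs.
main :: IO ()
main = quickCheck prop_canonicalize
```

By default QuickCheck tries 100 random cases. `quickCheckWith stdArgs { maxSuccess = 1000 } prop_canonicalize` is one way to raise that number.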



Using the Gen monad combinators and a newtype for URIs, I was able to implement a generator for both canonicalized and normal links in around 50 LOC.

I still believe there must be a more elegant and shorter way to do this, or that I could reuse existing Arbitrary instances for URIs, but sadly I have had no luck so far.

If you are a Haskell developer yourself and want to offer some insight, please fork the gist and show us how this can be done better.

Cheers.
