29th
Testing URI canonicalization using QuickCheck
Today I was working on a custom web crawler for the company I work for. One of the problems in my first implementations was that I was not canonicalizing links, so the crawler visited the same link more than once.
Canonicalizing a URI basically consists of removing any query string and fragment from it (at least, that is my understanding of canonicalization).
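To make that definition concrete, here is a minimal sketch that works on plain Strings. The name `canonicalize` is hypothetical and this is not the function from my crawler; it simply assumes that everything from the first `?` or `#` onward can be dropped.

```haskell
-- Hypothetical helper, for illustration only: canonicalize a link by
-- dropping everything from the first '?' or '#' onward.
canonicalize :: String -> String
canonicalize = takeWhile (`notElem` "?#")
```

With this definition, `canonicalize "http://example.com/a/b?x=1#top"` evaluates to `"http://example.com/a/b"`, and a link that has no query string or fragment comes back unchanged.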
I needed to test the function that does the canonicalization of links, and decided to use a QuickCheck property to check just that. I implemented an Arbitrary instance for a pair of URI Strings, so that I could get both a canonicalized link and a version of it with a query string and/or fragment, then run my canonicalization algorithm on the second one and compare the two.
The gist has three parts: the Arbitrary instance for a newtype called URIPair, a QuickCheck property that uses it, and finally the Main action that runs the property.
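As a rough illustration of how those three pieces can fit together, here is a minimal sketch. It is not the code from the gist: `canonicalize`, `genToken`, `genCanonical`, `genQuery`, `genFragment` and `prop_canonicalize` are hypothetical names, the generated links are deliberately simple (lowercase letters only), and the function under test is a stand-in that assumes canonicalization means cutting at the first `?` or `#`.

```haskell
import Data.List (intercalate)
import Test.QuickCheck

-- Hypothetical stand-in for the function under test: drop everything from
-- the first '?' or '#' onward.
canonicalize :: String -> String
canonicalize = takeWhile (`notElem` "?#")

-- A canonical link paired with a variant that may carry a query string
-- and/or a fragment.
newtype URIPair = URIPair (String, String)
  deriving Show

-- Non-empty lowercase token, used for hosts, path segments, keys and values.
genToken :: Gen String
genToken = listOf1 (elements ['a' .. 'z'])

-- A link with no query string and no fragment, e.g. "http://abc.com/x/y".
genCanonical :: Gen String
genCanonical = do
  scheme   <- elements ["http", "https"]
  host     <- genToken
  tld      <- elements ["com", "org", "net"]
  segments <- listOf genToken
  pure (scheme ++ "://" ++ host ++ "." ++ tld ++ concatMap ('/' :) segments)

-- A query string such as "?a=b&c=d", or the empty string.
genQuery :: Gen String
genQuery = do
  params <- listOf ((\k v -> k ++ "=" ++ v) <$> genToken <*> genToken)
  pure (if null params then "" else '?' : intercalate "&" params)

-- A fragment such as "#top", or the empty string.
genFragment :: Gen String
genFragment = oneof [pure "", ('#' :) <$> genToken]

instance Arbitrary URIPair where
  arbitrary = do
    canonical <- genCanonical
    query     <- genQuery
    fragment  <- genFragment
    pure (URIPair (canonical, canonical ++ query ++ fragment))

-- Canonicalizing the decorated link must give back the canonical one.
prop_canonicalize :: URIPair -> Bool
prop_canonicalize (URIPair (canonical, decorated)) =
  canonicalize decorated == canonical

-- Run the property against randomly generated pairs.
main :: IO ()
main = quickCheck prop_canonicalize
```

Building the decorated link from the canonical one keeps the two halves of the pair related by construction, so the property only has to compare strings. Note that this generator can also produce a pair where both links are identical, which checks that an already canonical link is left untouched.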
By using the Gen monad combinators and a newtype for URIs, I was able to implement a generator for both canonicalized and normal links in around 50 LOC.
I still believe there must be a shorter and more elegant way to do this, or that I could reuse some existing Arbitrary instances for URIs, but sadly no luck so far.
If you are a Haskell developer yourself and want to provide insights, please fork the gist and show us how this can be done in a better way.
Cheers.
