A few days ago, I set a quick challenge to derive an efficient version of nub over Int from a textbook definition of nub. So, starting from nub :: [Int] -> [Int] nub [] = [] nub (x:xs) = x : nub (filter (\y -> not (x == y)) xs) derive an O(n log n) version of nub. Specifically, I called for the derivation of a new function nub', which has the type nub' :: [Int] -> Set Int -> [Int] and a wrapper function that calls this new function, defined nub :: [Int] -> [Int] nub = nub' xs empty There are many solutions, and many, many possible implementations of nub'. Thanks to everyone that submitted an answer. The neatest implementation of nub I saw was nub = map head . group . sort Using the worker/wrapper transformation Here is a possible solution, and a systematic way of solving the problem of finding an efficient nub for Int's, using recent work on the worker/wrapper transformation. I argue that this problem can be solved by thinking only about about changing the data-structures use by nub, and letting the function definition follows. Discovery A quick look at our original code shows that nub reconstitutes the argument list many different times, using filter. If we had a way of allowing filter to be cheaper, rather than costing O(n) in the length of the list, then nub would be cheaper. We need a new type of list which has different properties that traditional lists! So, what would these properties be? The first thing we can do is enumerate what happens to the the consumed list. There are only two things going on If we think of this in abstract terms, we have two combinators over a datatype R type R = [Int] caseR :: R -> Maybe (Int,R) filterR :: Int -> R -> R caseR is a combinator for taking R apart, returning the head of the list, and the tail of the list, or Nothing (to signify an empty list). filterR is a combinator for building a new list with a specific element, well, filtered. We could implement these functions as follows type R = [Int] caseR :: R -> Maybe (Int,R) caseR [] = Nothing caseR (x:xs) = Just (x,xs) filterR :: Int -> R -> R filterR x xs = filter (\ y -> not (x == y)) xs So far we have simply All check-list like things. Invention Now we have some fun. Can we invent a better version of R? How about data R = R [Int] (Set Int) nextR :: R -> Maybe (Int,R) nextR (R xs except) = case dropWhile (\ x -> x `member` except) xs of [] -> Nothing (x:xs) -> Just (x,R xs except) filterR :: Int -> R -> R filterR v (R xs except) = R xs (insert v except) This data-structure is optimized for the API of R. Yes, a bit of invention, but nothing deep, and nothing recursive. We need to maintain the ordering of the original list, so we use a list. We need to sometimes exclude specific elements, so we have a set of things to exclude. I argue that inventing a better R in the only imagination needed to solve the problem. Changing of the type So, to solve our puzzle, we need to convert a function that uses [Int] into a function that uses our optimized R. So we need coercion functions. By convention, these are called c2a (concrete to alternative) and a2c (alternative to concrete). c2a :: [Int] -> R c2a xs = R xs empty a2c :: R -> [Int] a2c (R xs except) = Prelude.filter (\ v -> not (member v except)) xs These obey a simple rule: a2c . c2a = id Proof: a2c . c2a :: [Int] -> [Int] { apply . } ==> \ xs -> a2c (c2a xs) { inline a2c, c2a } ==> \ xs -> case (R xs empty) of (R xs except) -> Prelude.filter (\ v -> not (member v except)) xs { case-of-known-constructor } ==> \ xs -> Prelude.filter (\ v -> not (member v empty)) xs { member v empty = False } ==> \ xs -> Prelude.filter (\ v -> not False) xs { not False = True } ==> \ xs -> Prelude.filter (\ v -> True) xs { property of filter } ==> \ xs -> xs { QED } But we need to convert the nub function! So we have (again by naming convention) the wrap and unwrap function. wrap :: (R -> [Int]) -> ([Int] -> [Int]) wrap f xs = f (c2a xs) unwrap :: ([Int] -> [Int]) -> (R -> [Int]) unwrap f r = f (a2c r) unwrap takes a function of the type of the original nub, and generates the new version of nub. wrap does the converse, and converts the function back. Now we have all the pieces, and can use the worker/wrapper theorem! The worker/wrapper theorem states:
If comp :: A is defined by comp = fix body for some body :: A -> A,
and wrap :: B -> A and unwrap :: A -> B satisfy any of the worker/wrapper assumptions,
then comp = wrap work where work :: B is defined by work = fix (unwrap . body . wrap)
In other words, if we have a way of converting to and from a new type, we can use the worker/wrapper theorem to mechanically achieve recursion over the new type (in general, B in our case, R -> [Int]) There are three possible worker/wrapper assumptions; the simplest of which is wrap . unwrap = id. Do our wrap and unwrap satisfy the worker/wrapper assumption? wrap . unwrap :: ([Int] -> [Int]) -> ([Int] -> [Int]) { apply . } ==> \ x -> wrap (unwrap x) { apply wrap, unwrap } ==> \ x -> (\ f xs -> f (c2a xs)) ((\ f r -> f (a2c r)) x) { beta reduce } ==> \ x -> (\ xs -> ((\ f r -> f (a2c r)) x) (c2a xs)) { beta reduce } ==> \ x -> (\ xs -> (\ r -> x (a2c r)) (c2a xs)) { beta reduce } ==> \ x -> (\ xs -> x (a2c (c2a xs))) { unapply . } ==> \ x -> (\ xs -> x ((a2c . c2a) xs)) { law: a2c . c2a == id } ==> \ x -> (\ xs -> x xs) { QED } Yes. So we can use the following. nub :: [Int] -> [Int] nub = wrap nub' nub' :: R -> [Int] nub' = fix ( unwrap . (\ nub w -> case w of [] -> [] (x:xs) -> nub (filter (\y -> not (x == y)) xs) ) . wrap ) Note that the nub inside the body of fix is a locally bound function. Now we simplify, simplify, simplify. Unfolding wrap and unwrap, and applying straightforward transformations gives: nub' = fix (\ nub' w -> case a2c w of [] -> [] (x:xs) -> x : nub' (c2a (Prelude.filter (\ y -> not (x == y)) xs)) ) Now we start the magic; we have two laws: case a2c w of case caseR w of [] -> ... ==> Nothing -> ... (x:xs) -> ... Just (x,r) -> let xs = a2c r in ... Prelude.filter (\ y -> not (x == y)) (a2c xs) ==> a2c (Prelude.filterR x xs) We leave proving these rewrites as optional exercises, but both are the way we introduce caseR and filterR. Using the first law gives: nub' = fix (\ nub' w -> case nextR w of Nothing -> [] Just (x,xs) -> x : nub' (c2a (Prelude.filter (\ y -> not (x == y)) (a2c xs))) ) Then using the second law gives: nub' = fix (\ nub' w -> case nextR w of Nothing -> [] Just (x,xs) -> x : nub' (c2a (a2c (filterR x xs))) ) Now we do something strange, unapply wrap and unwrap. nub' = fix (\ nub' w -> case nextR w of Nothing -> [] Just (x,xs) -> x : (unwrap (wrap nub')) (filterR x xs) ) This was so we can use the worker/wrapper fusion property, which is (wrap . unwrap) work = work This simply means that wrap . unwrap is the identity over any value produce by the recursive call (even though wrap . unwrap might not be the identity in general). Taking advantage of this gives: nub' = fix (\ nub' w -> case nextR w of Nothing -> [] Just (x,xs) -> x : nub' (filterR x xs) ) Now we inline nextR, filterR, and fix, giving our answer: nub :: [Int] -> [Int] nub xs = nub' (R xs empty) nub' :: R -> [Int] nub' (R xs except) = case dropWhile (\ x -> x `member` except) xs of [] -> [] (x:xs) -> x : nub' (R xs (insert x except)) With a small piece of invention (the efficient representation of R), we have systematically derived an efficient nub. But I asked for nub' of type [Int] -> Set Int -> [Int]! Sigh. Ho Hum. Here we go again. (Re)using the worker/wrapper transformation So we want to translate something of type R -> [Int] into something of type [Int] -> Set Int -> [Int]. Lets reuse the worker/wrapper transformation. wrap :: ([Int] -> Set Int -> [Int]) -> (R -> [Int]) wrap f (R xs except) = f xs except unwrap :: (R -> [Int]) -> ([Int] -> Set Int -> [Int]) unwrap f xs except = f (R xs except) Of course, this is a currying operation, and a well known transformation. But we can use the worker/wrapper to handle this type rewrite. (The worker/wrapper precondition used above does not hold in this case, but another one of the worker/wrapper preconditions does hold, capturing the strictness of nub' in the R argument.) After cranking the handle, again, we get our answer: nub :: [Int] -> [Int] nub xs = nub' xs empty nub' :: [Int] -> Set Int -> [Int] nub' xs except = case dropWhile (\ x -> x `member` except) xs of [] -> [] (x:xs) -> x : nub' xs (insert x except) We have an efficient solution, based on choosing a good data-structure representation. We used the original function's consumption of a simple list as a specification of what we wanted our eventual implementation to do. The worker/wrapper was uses twice, once to incorporate our efficient data-structure, and once to uncurry its use. Paper about the worker/wrapper transformation generalization The formalization and exploration of the worker/wrapper transformation is joint work with Graham Hutton, started during his sabbatical at Galois. A draft of the worker/wrapper paper is available on my publications and papers web page. In the paper, Graham and I introduce and prove the general worker/wrapper theorem then use it on four examples: Comments and feedback welcome. Let the transformations begin! AndyG