Quick and dirty equivalence classes in Haskell

Recently at $WORK I had a problem and the solution is a known trick in the Haskell world that you occasionally see, but I figure I would write it up.

I had a graph that I needed to ‘flatten’ to a list of vertices, and “deduplicate” the vertices such that duplicate nodes in the graph are thrown away, and edges containing duplicates are then adjusted to point to a single “representative” node (the one we kept). You can use algorithms like union find for this, but there’s an easy way to do this when you want to keep things simple and pure.

This problem can be stated more formally: given a graph G with a vertices V, simply calculate the set of equivalence classes for V based on some equivalence relation R, i.e. for every v \in V, form a set of values E(v) = \{\; e \in V\; | \; R(v,e) \;\}. This set of all equivalence classes on V is denoted V/R, and is called the quotient space of V.

The Haskell way of thinking about it is a bit easier: if you have a list of verticies [a] in some Graph with a-nodes, and an equality function eq :: a -> a -> Bool, then for every unique a in the list, find a list of values [a] that contains a, and every value equal to a. The result is a list [[a]], i.e. a list of equivalence classes, where each class is itself a list, containing all the elements considered “equivalent” to each other, according to the eq function.

To do this naively, you might do something like this to start with, assuming your eq function is just (==):

-- | equivalence class of @a@ with respect to some set
classes :: a -> [a] -> [a]
classes v xs = filter (== v) xs

-- | calculate every equivalence class for an input list
quotient :: [a] -> [[a]]
quotient xs = ... map (\v -> classes v xs) xs ...

So if you have a set of vertices vs :: [a], simply performing classes x vs for some vertex x will calculate its equivalence class among the vertices.

Of course the problem becomes quickly apparent: classes is O(n), and so iterating the entire vertex set vs and calculating the classes of each vertex results in O(n^{2}) complexity – every vertex requires you to scan every other vertex.

This is the basic problem with nub in Haskell, too – the Eq constraint is not powerful enough to allow anything better than O(n^{2}). Strengthening this constraint to Ord allows us to perform sorting due to the fact there’s a concrete ordering. And sorting allows us to bundle equivalent nodes together with good time complexity using a variety of standard algorithms. And there are a variety of proposals to add Ord-powered nub to base and other libraries, though you can find various implementations (using Data.Set internally, normally) hanging around.

So then this gives a hint to solving our problem without horrific complexity: simply require an ordering on the graph nodes, and then sort the nodes. Because sorting will place equivalent nodes next to each other, we can then just group them together. This can be done using sortBy and groupBy in Haskell.

The ordering for nodes is all we need, actually: from an ordering function a -> a -> Ordering, we can derive an equality function a -> a -> Bool – simply check if the result of the ordering gave back EQ, and if it did, they’re equal. Otherwise, they aren’t.

Putting all that together, we can have an easy combinator to calculate the equivalence classes/quotient set of a given input list, provided you have an ordering:

-- | Construct the set of equivalence classes, AKA the /quotient set/, for a
-- given input list, given an ordering relation between values.
  :: (a -> a -> Ordering)
  -- ^ The equivalence relation \(R\).
  -> [a]
  -- ^ The input set \(X\), represented as a simple list.
  -> [[a]]
  -- ^ The resulting quotient set \(X/R\), defined as a list of lists, where
  -- every item in the resulting list is another set of items grouped by \(R\).
quotientBy k
  = groupBy (fmap (== EQ) . k)
  . sortBy k

-- | Construct the set of equivalence classes for a given list of items, using
-- @'(==)'@ as the equivalence/ordering relation.
quotient :: Ord a => [a] -> [[a]]
quotient = quotientBy compare

And with that, we can simply flatten the original graph G to a set of nodes, and then use quotientBy in order to partition them into the corresponding quotient set. Picking a single entry out of each equivalence class gives us a “canonical” node to use throughout the graph, and we can throw the other nodes away.

Of course, I don’t think this combinator really passes the Fairbairn threshold which would justify giving it a name/provenance in a library somewhere, but it’s a useful trick to keep in mind. (I would, however, love a better nub!)

Return to the homepage. π
more contrast