## Monday, March 18, 2013

### Disjoint-set Data Structure in Python

I needed this code to write Kruskal's minimum spanning tree algorithm.

`class DisjointDataStruct(object): `
`  """Disjoint-set data structure. `

`  For details see:`
`  http://en.wikipedia.org/wiki/Disjoint-set_data_structure`

`  Public functions:`
`  union(x, y)`
`  find(x)`
`  """`

`  def __init__(self, vertices):`
`    """My personal representation of the spaghetti stack.`

`    vertices - a list of vertex names from a graph`
`    """`
`    self.struct = {v: [v, 0] for v in vertices}`

`  def union(self, x, y):`
`    """Joins subsets x and y into a single subset.`

`    Uses union by rank optimization.`

`    x, y - vertex names corresponding to keys in self.struct`
`    """`
`    xRoot = self.find(x)`
`    yRoot = self.find(y)`
`    if xRoot == yRoot:`
`      return`

`    xV = self.struct[xRoot]`
`    yV = self.struct[yRoot]`
`    if xV < yV:`
`      xV = yRoot`
`    elif xV > yV:`
`      yV = xRoot`
`    else:`
`      yV = xRoot`
`      xV += 1`

`  def find(self, x):`
`    """Determines which subset a particular element x is in.`

`    Uses path compression optimization.`

`    x - vertex name corresponding to key in self.struct`
`    """`
`    if self.struct[x] != x:`
`      self.struct[x] = self.find(self.struct[x])`
`    return self.struct[x]`

The timings are as follows:
__init__ is 17.5ms per loop in 100 loops with 100000 vertices (using timeit)
union has a mean of 1.0ms, standard deviation of 1.10e-7, min of ~0.9ms, and max of 1.0ms.
The union data was computed over the 224 union time values that did not equal 0 over a total of 100000 random unions of 100000 vertices (using time.time end - start).

On a single run of cProfile with a vertex set of 100000, and 100000 random unions where the two elements are not equal (I increment one of them if they are), I get the following values:
0.042s for __init__
0.197s for union in total (100000 calls) so per call is ~1.97e-06s
0.130s for find in total (299990 total / 200000 primitive) so per call is ~4.33e-07s

All of this was clocked running an i5-2500K @ 3.3GHz

#### Post a Comment

A place in which to share your thoughts...