`class DisjointDataStruct(object): `

` """Disjoint-set data structure. `

` For details see:`

` http://en.wikipedia.org/wiki/Disjoint-set_data_structure`

` Public functions:`

` union(x, y)`

` find(x)`

` """`

` def __init__(self, vertices):`

` """My personal representation of the spaghetti stack.`

` vertices - a list of vertex names from a graph`

` """`

` self.struct = {v: [v, 0] for v in vertices}`

` def union(self, x, y):`

` """Joins subsets x and y into a single subset.`

` Uses union by rank optimization.`

` x, y - vertex names corresponding to keys in self.struct`

` """`

` xRoot = self.find(x)`

` yRoot = self.find(y)`

` if xRoot == yRoot:`

` return`

` xV = self.struct[xRoot]`

` yV = self.struct[yRoot]`

` if xV[1] < yV[1]:`

` xV[0] = yRoot`

` elif xV[1] > yV[1]:`

` yV[0] = xRoot`

` else:`

` yV[0] = xRoot`

` xV[1] += 1`

` def find(self, x):`

` """Determines which subset a particular element x is in.`

` Uses path compression optimization.`

` x - vertex name corresponding to key in self.struct`

` """`

` if self.struct[x][0] != x:`

` self.struct[x][0] = self.find(self.struct[x][0])`

` return self.struct[x][0]`

The timings are as follows:__init__ is 17.5ms per loop in 100 loops with 100000 vertices (using timeit)

union has a mean of 1.0ms, standard deviation of 1.10e-7, min of ~0.9ms, and max of 1.0ms.

The union data was computed over the 224 union time values that did not equal 0 over a total of 100000 random unions of 100000 vertices (using time.time end - start).

On a single run of cProfile with a vertex set of 100000, and 100000 random unions where the two elements are not equal (I increment one of them if they are), I get the following values:

0.042s for __init__

0.197s for union in total (100000 calls) so per call is ~1.97e-06s

0.130s for find in total (299990 total / 200000 primitive) so per call is ~4.33e-07s

All of this was clocked running an i5-2500K @ 3.3GHz

## No comments:

## Post a Comment

A place in which to share your thoughts...