class DisjointDataStruct(object): """Disjoint-set data structure.
For details see: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Public functions: union(x, y) find(x) """
def __init__(self, vertices): """My personal representation of the spaghetti stack.
vertices - a list of vertex names from a graph """ self.struct = {v: [v, 0] for v in vertices}
def union(self, x, y): """Joins subsets x and y into a single subset.
Uses union by rank optimization.
x, y - vertex names corresponding to keys in self.struct """ xRoot = self.find(x) yRoot = self.find(y) if xRoot == yRoot: return
xV = self.struct[xRoot] yV = self.struct[yRoot] if xV[1] < yV[1]: xV[0] = yRoot elif xV[1] > yV[1]: yV[0] = xRoot else: yV[0] = xRoot xV[1] += 1
def find(self, x): """Determines which subset a particular element x is in.
Uses path compression optimization.
x - vertex name corresponding to key in self.struct """ if self.struct[x][0] != x: self.struct[x][0] = self.find(self.struct[x][0]) return self.struct[x][0]
The timings are as follows:__init__ is 17.5ms per loop in 100 loops with 100000 vertices (using timeit)
union has a mean of 1.0ms, standard deviation of 1.10e-7, min of ~0.9ms, and max of 1.0ms.
The union data was computed over the 224 union time values that did not equal 0 over a total of 100000 random unions of 100000 vertices (using time.time end - start).
On a single run of cProfile with a vertex set of 100000, and 100000 random unions where the two elements are not equal (I increment one of them if they are), I get the following values:
0.042s for __init__
0.197s for union in total (100000 calls) so per call is ~1.97e-06s
0.130s for find in total (299990 total / 200000 primitive) so per call is ~4.33e-07s
All of this was clocked running an i5-2500K @ 3.3GHz
No comments:
Post a Comment
A place in which to share your thoughts...