class DisjointDataStruct(object):
"""Disjoint-set data structure.
For details see:
http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Public functions:
union(x, y)
find(x)
"""
def __init__(self, vertices):
"""My personal representation of the spaghetti stack.
vertices - a list of vertex names from a graph
"""
self.struct = {v: [v, 0] for v in vertices}
def union(self, x, y):
"""Joins subsets x and y into a single subset.
Uses union by rank optimization.
x, y - vertex names corresponding to keys in self.struct
"""
xRoot = self.find(x)
yRoot = self.find(y)
if xRoot == yRoot:
return
xV = self.struct[xRoot]
yV = self.struct[yRoot]
if xV[1] < yV[1]:
xV[0] = yRoot
elif xV[1] > yV[1]:
yV[0] = xRoot
else:
yV[0] = xRoot
xV[1] += 1
def find(self, x):
"""Determines which subset a particular element x is in.
Uses path compression optimization.
x - vertex name corresponding to key in self.struct
"""
if self.struct[x][0] != x:
self.struct[x][0] = self.find(self.struct[x][0])
return self.struct[x][0]
The timings are as follows:__init__ is 17.5ms per loop in 100 loops with 100000 vertices (using timeit)
union has a mean of 1.0ms, standard deviation of 1.10e-7, min of ~0.9ms, and max of 1.0ms.
The union data was computed over the 224 union time values that did not equal 0 over a total of 100000 random unions of 100000 vertices (using time.time end - start).
On a single run of cProfile with a vertex set of 100000, and 100000 random unions where the two elements are not equal (I increment one of them if they are), I get the following values:
0.042s for __init__
0.197s for union in total (100000 calls) so per call is ~1.97e-06s
0.130s for find in total (299990 total / 200000 primitive) so per call is ~4.33e-07s
All of this was clocked running an i5-2500K @ 3.3GHz
No comments:
Post a Comment
A place in which to share your thoughts...