moin
1.9.0~rc2

Public Member Functions  
def  __init__ 
def  set_seqs 
def  set_seq1 
def  set_seq2 
def  find_longest_match 
def  get_matching_blocks 
def  get_opcodes 
def  get_grouped_opcodes 
def  ratio 
def  quick_ratio 
def  real_quick_ratio 
Public Attributes  
isjunk  
a  
b  
matching_blocks  
opcodes  
fullbcount  
b2j  
isbjunk  
isbpopular  
Private Member Functions  
def  __chain_b 
SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.

SequenceMatcher tries to compute a "human-friendly diff" between two sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the longest *contiguous* & junk-free matching subsequence. That's what catches peoples' eyes. The Windows(tm) windiff has another interesting notion, pairing up elements that appear uniquely in each sequence. That, and the method here, appear to yield more intuitive difference reports than does diff. This method appears to be the least vulnerable to synching up on blocks of "junk lines", though (like blank lines in ordinary text files, or maybe "<P>" lines in HTML files). That may be because this is the only method of the 3 that has a *concept* of "junk" <wink>.

Example, comparing two strings, and considering blanks to be "junk":

>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")

.ratio() returns a float in [0, 1], measuring the "similarity" of the sequences. As a rule of thumb, a .ratio() value over 0.6 means the sequences are close matches:

>>> print round(s.ratio(), 3)
0.866

If you're only interested in where the sequences match, .get_matching_blocks() is handy:

>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements

Note that the last tuple returned by .get_matching_blocks() is always a dummy, (len(a), len(b), 0), and this is the only case in which the last tuple element (number of elements matched) is 0.

If you want to know how to change the first sequence into the second, use .get_opcodes():

>>> for opcode in s.get_opcodes():
...     print "%6s a[%d:%d] b[%d:%d]" % opcode
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:29] b[17:38]

See the Differ class for a fancy human-friendly file differencer, which uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

See also function get_close_matches() in this module, which shows how simple code building on SequenceMatcher can be used to do useful work.

Timing: Basic R-O is cubic time worst case and quadratic time expected case. SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.

Methods:

__init__(isjunk=None, a='', b='')
    Construct a SequenceMatcher.
set_seqs(a, b)
    Set the two sequences to be compared.
set_seq1(a)
    Set the first sequence to be compared.
set_seq2(b)
    Set the second sequence to be compared.
find_longest_match(alo, ahi, blo, bhi)
    Find longest matching block in a[alo:ahi] and b[blo:bhi].
get_matching_blocks()
    Return list of triples describing matching subsequences.
get_opcodes()
    Return list of 5-tuples describing how to turn a into b.
ratio()
    Return a measure of the sequences' similarity (float in [0,1]).
quick_ratio()
    Return an upper bound on .ratio() relatively quickly.
real_quick_ratio()
    Return an upper bound on ratio() very quickly.
Definition at line 45 of file difflib.py.
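As a quick sanity check, the docstring example above can be run against the modern standard-library difflib (of which this module is a bundled copy); the 0.866 value is the one quoted in the class docstring:

```python
from difflib import SequenceMatcher

# Blanks are "junk": they may not anchor a match on their own.
s = SequenceMatcher(lambda x: x == " ",
                    "private Thread currentThread;",
                    "private volatile Thread currentThread;")

# Above the 0.6 rule-of-thumb threshold, so these are close matches.
print(round(s.ratio(), 3))  # 0.866
```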
def MoinMoin.support.difflib.SequenceMatcher.__init__(self, isjunk=None, a='', b='')
Construct a SequenceMatcher.

Optional arg isjunk is None (the default), or a one-argument function that takes a sequence element and returns true iff the element is junk. None is equivalent to passing "lambda x: 0", i.e. no elements are considered to be junk. For example, pass lambda x: x in " \\t" if you're comparing lines as sequences of characters, and don't want to synch up on blanks or hard tabs.

Optional arg a is the first of two sequences to be compared. By default, an empty string. The elements of a must be hashable. See also .set_seqs() and .set_seq1().

Optional arg b is the second of two sequences to be compared. By default, an empty string. The elements of b must be hashable. See also .set_seqs() and .set_seq2().
Definition at line 153 of file difflib.py.
    def __init__(self, isjunk=None, a='', b=''):
        """Construct a SequenceMatcher.

        Optional arg isjunk is None (the default), or a one-argument
        function that takes a sequence element and returns true iff the
        element is junk.  None is equivalent to passing "lambda x: 0", i.e.
        no elements are considered to be junk.  For example, pass
            lambda x: x in " \\t"
        if you're comparing lines as sequences of characters, and don't
        want to synch up on blanks or hard tabs.

        Optional arg a is the first of two sequences to be compared.  By
        default, an empty string.  The elements of a must be hashable.  See
        also .set_seqs() and .set_seq1().

        Optional arg b is the second of two sequences to be compared.  By
        default, an empty string.  The elements of b must be hashable.  See
        also .set_seqs() and .set_seq2().
        """

        # Members:
        # a
        #      first sequence
        # b
        #      second sequence; differences are computed as "what do
        #      we need to do to 'a' to change it into 'b'?"
        # b2j
        #      for x in b, b2j[x] is a list of the indices (into b)
        #      at which x appears; junk elements do not appear
        # fullbcount
        #      for x in b, fullbcount[x] == the number of times x
        #      appears in b; only materialized if really needed (used
        #      only for computing quick_ratio())
        # matching_blocks
        #      a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
        #      ascending & non-overlapping in i and in j; terminated by
        #      a dummy (len(a), len(b), 0) sentinel
        # opcodes
        #      a list of (tag, i1, i2, j1, j2) tuples, where tag is
        #      one of
        #          'replace'   a[i1:i2] should be replaced by b[j1:j2]
        #          'delete'    a[i1:i2] should be deleted
        #          'insert'    b[j1:j2] should be inserted
        #          'equal'     a[i1:i2] == b[j1:j2]
        # isjunk
        #      a user-supplied function taking a sequence element and
        #      returning true iff the element is "junk" -- this has
        #      subtle but helpful effects on the algorithm, which I'll
        #      get around to writing up someday <0.9 wink>.
        #      DON'T USE!  Only __chain_b uses this.  Use isbjunk.
        # isbjunk
        #      for x in b, isbjunk(x) == isjunk(x) but much faster;
        #      it's really the has_key method of a hidden dict.
        #      DOES NOT WORK for x in a!
        # isbpopular
        #      for x in b, isbpopular(x) is true iff b is reasonably long
        #      (at least 200 elements) and x accounts for more than 1% of
        #      its elements.  DOES NOT WORK for x in a!

        self.isjunk = isjunk
        self.a = self.b = None
        self.set_seqs(a, b)
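A small check of the "None is equivalent to lambda x: 0" claim above, using the standard-library difflib:

```python
from difflib import SequenceMatcher

# isjunk=None (the default) treats no element as junk, which the
# docstring says is equivalent to passing lambda x: 0.
s_none = SequenceMatcher(None, "abcd", "bcde")
s_zero = SequenceMatcher(lambda x: 0, "abcd", "bcde")

print(s_none.ratio(), s_zero.ratio())  # 0.75 0.75
```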
def MoinMoin.support.difflib.SequenceMatcher.__chain_b(self) [private]
Definition at line 299 of file difflib.py.
    def __chain_b(self):
        # Because isjunk is a user-defined (not C) function, and we test
        # for junk a LOT, it's important to minimize the number of calls.
        # Before the tricks described here, __chain_b was by far the most
        # time-consuming routine in the whole module!  If anyone sees
        # Jim Roskind, thank him again for profile.py -- I never would
        # have guessed that.
        # The first trick is to build b2j ignoring the possibility
        # of junk.  I.e., we don't call isjunk at all yet.  Throwing
        # out the junk later is much cheaper than building b2j "right"
        # from the start.
        b = self.b
        n = len(b)
        self.b2j = b2j = {}
        populardict = {}
        for i, elt in enumerate(b):
            if elt in b2j:
                indices = b2j[elt]
                if n >= 200 and len(indices) * 100 > n:
                    populardict[elt] = 1
                    del indices[:]
                else:
                    indices.append(i)
            else:
                b2j[elt] = [i]

        # Purge leftover indices for popular elements.
        for elt in populardict:
            del b2j[elt]

        # Now b2j.keys() contains elements uniquely, and especially when
        # the sequence is a string, that's usually a good deal smaller
        # than len(string).  The difference is the number of isjunk calls
        # saved.
        isjunk = self.isjunk
        junkdict = {}
        if isjunk:
            for d in populardict, b2j:
                for elt in d.keys():
                    if isjunk(elt):
                        junkdict[elt] = 1
                        del d[elt]

        # Now for x in b, isjunk(x) == x in junkdict, but the
        # latter is much faster.  Note too that while there may be a
        # lot of junk in the sequence, the number of *unique* junk
        # elements is probably small.  So the memory burden of keeping
        # this dict alive is likely trivial compared to the size of b2j.
        self.isbjunk = junkdict.has_key
        self.isbpopular = populardict.has_key
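The two-pass trick above can be sketched in modern Python 3. This is a hypothetical standalone helper (chain_b_sketch is not part of the module), with sets standing in for the has_key dicts; the point is that isjunk runs once per *unique* element rather than once per element:

```python
def chain_b_sketch(b, isjunk=None):
    """Sketch of __chain_b's two-pass trick: build the element->indices
    map first, ignoring junk entirely, then purge popular and junk keys
    afterwards -- far fewer isjunk calls than testing every element."""
    n = len(b)
    b2j = {}
    popular = set()
    for i, elt in enumerate(b):
        indices = b2j.setdefault(elt, [])
        # "Popular" pruning only kicks in for reasonably long sequences
        # (>= 200 elements) where elt exceeds 1% of the total.
        if n >= 200 and len(indices) * 100 > n:
            popular.add(elt)
            del indices[:]
        else:
            indices.append(i)
    for elt in popular:          # purge leftover indices
        del b2j[elt]
    junk = set()
    if isjunk:
        for elt in list(b2j):    # one isjunk call per unique element
            if isjunk(elt):
                junk.add(elt)
                del b2j[elt]
    return b2j, junk, popular

b2j, junk, popular = chain_b_sketch("abcd abcd", lambda x: x == " ")
print(b2j)  # blanks (junk) carry no indices
```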
def MoinMoin.support.difflib.SequenceMatcher.find_longest_match(self, alo, ahi, blo, bhi)
Find longest matching block in a[alo:ahi] and b[blo:bhi].

If isjunk is not defined:

Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
    alo <= i <= i+k <= ahi
    blo <= j <= j+k <= bhi
and for all (i',j',k') meeting those conditions,
    k >= k'
    i <= i'
    and if i == i', j <= j'

In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b.

>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
(0, 4, 5)

If isjunk is defined, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an "interesting" match.

Here's the same example as before, but considering blanks to be junk. That prevents " abcd" from matching the " abcd" at the tail end of the second sequence directly. Instead only the "abcd" can match, and matches the leftmost "abcd" in the second sequence:

>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
(1, 0, 4)

If no blocks match, return (alo, blo, 0).

>>> s = SequenceMatcher(None, "ab", "c")
>>> s.find_longest_match(0, 2, 0, 1)
(0, 0, 0)
Definition at line 350 of file difflib.py.
    def find_longest_match(self, alo, ahi, blo, bhi):
        """Find longest matching block in a[alo:ahi] and b[blo:bhi].

        If isjunk is not defined:

        Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
            alo <= i <= i+k <= ahi
            blo <= j <= j+k <= bhi
        and for all (i',j',k') meeting those conditions,
            k >= k'
            i <= i'
            and if i == i', j <= j'

        In other words, of all maximal matching blocks, return one that
        starts earliest in a, and of all those maximal matching blocks that
        start earliest in a, return the one that starts earliest in b.

        >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
        >>> s.find_longest_match(0, 5, 0, 9)
        (0, 4, 5)

        If isjunk is defined, first the longest matching block is
        determined as above, but with the additional restriction that no
        junk element appears in the block.  Then that block is extended as
        far as possible by matching (only) junk elements on both sides.  So
        the resulting block never matches on junk except as identical junk
        happens to be adjacent to an "interesting" match.

        Here's the same example as before, but considering blanks to be
        junk.  That prevents " abcd" from matching the " abcd" at the tail
        end of the second sequence directly.  Instead only the "abcd" can
        match, and matches the leftmost "abcd" in the second sequence:

        >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
        >>> s.find_longest_match(0, 5, 0, 9)
        (1, 0, 4)

        If no blocks match, return (alo, blo, 0).

        >>> s = SequenceMatcher(None, "ab", "c")
        >>> s.find_longest_match(0, 2, 0, 1)
        (0, 0, 0)
        """

        # CAUTION:  stripping common prefix or suffix would be incorrect.
        # E.g.,
        #    ab
        #    acab
        # Longest matching block is "ab", but if common prefix is
        # stripped, it's "a" (tied with "b").  UNIX(tm) diff does so
        # strip, so ends up claiming that ab is changed to acab by
        # inserting "ca" in the middle.  That's minimal but unintuitive:
        # "it's obvious" that someone inserted "ac" at the front.
        # Windiff ends up at the same place as diff, but by pairing up
        # the unique 'b's and then matching the first two 'a's.

        a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
        besti, bestj, bestsize = alo, blo, 0
        # find longest junk-free match
        # during an iteration of the loop, j2len[j] = length of longest
        # junk-free match ending with a[i-1] and b[j]
        j2len = {}
        nothing = []
        for i in xrange(alo, ahi):
            # look at all instances of a[i] in b; note that because
            # b2j has no junk keys, the loop is skipped if a[i] is junk
            j2lenget = j2len.get
            newj2len = {}
            for j in b2j.get(a[i], nothing):
                # a[i] matches b[j]
                if j < blo:
                    continue
                if j >= bhi:
                    break
                k = newj2len[j] = j2lenget(j-1, 0) + 1
                if k > bestsize:
                    besti, bestj, bestsize = i-k+1, j-k+1, k
            j2len = newj2len

        # Extend the best by non-junk elements on each end.  In particular,
        # "popular" non-junk elements aren't in b2j, which greatly speeds
        # the inner loop above, but also means "the best" match so far
        # doesn't contain any junk *or* popular non-junk elements.
        while besti > alo and bestj > blo and \
              not isbjunk(b[bestj-1]) and \
              a[besti-1] == b[bestj-1]:
            besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
        while besti+bestsize < ahi and bestj+bestsize < bhi and \
              not isbjunk(b[bestj+bestsize]) and \
              a[besti+bestsize] == b[bestj+bestsize]:
            bestsize += 1

        # Now that we have a wholly interesting match (albeit possibly
        # empty!), we may as well suck up the matching junk on each
        # side of it too.  Can't think of a good reason not to, and it
        # saves post-processing the (possibly considerable) expense of
        # figuring out what to do with it.  In the case of an empty
        # interesting match, this is clearly the right thing to do,
        # because no other kind of match is possible in the regions.
        while besti > alo and bestj > blo and \
              isbjunk(b[bestj-1]) and \
              a[besti-1] == b[bestj-1]:
            besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
        while besti+bestsize < ahi and bestj+bestsize < bhi and \
              isbjunk(b[bestj+bestsize]) and \
              a[besti+bestsize] == b[bestj+bestsize]:
            bestsize = bestsize + 1

        return besti, bestj, bestsize
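Both docstring examples can be reproduced with the standard-library difflib (in Python 3 the result is a Match named tuple, so it is converted to a plain tuple here):

```python
from difflib import SequenceMatcher

# Without junk, the tail " abcd" (leading blank included) wins:
m1 = SequenceMatcher(None, " abcd", "abcd abcd").find_longest_match(0, 5, 0, 9)

# With blanks as junk, only "abcd" can anchor the match, and the
# leftmost occurrence in the second sequence is preferred:
m2 = SequenceMatcher(lambda x: x == " ",
                     " abcd", "abcd abcd").find_longest_match(0, 5, 0, 9)

print(tuple(m1), tuple(m2))  # (0, 4, 5) (1, 0, 4)
```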
def MoinMoin.support.difflib.SequenceMatcher.get_grouped_opcodes(self, n=3)
Isolate change clusters by eliminating ranges with no changes.

Return a generator of groups with up to n lines of context. Each group is in the same format as returned by get_opcodes().

>>> from pprint import pprint
>>> a = map(str, range(1,40))
>>> b = a[:]
>>> b[8:8] = ['i']     # Make an insertion
>>> b[20] += 'x'       # Make a replacement
>>> b[23:28] = []      # Make a deletion
>>> b[30] += 'y'       # Make another replacement
>>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes()))
[[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)],
 [('equal', 16, 19, 17, 20),
  ('replace', 19, 20, 20, 21),
  ('equal', 20, 22, 21, 23),
  ('delete', 22, 27, 23, 23),
  ('equal', 27, 30, 23, 26)],
 [('equal', 31, 34, 27, 30),
  ('replace', 34, 35, 30, 31),
  ('equal', 35, 38, 31, 34)]]
Definition at line 586 of file difflib.py.
    def get_grouped_opcodes(self, n=3):
        """ Isolate change clusters by eliminating ranges with no changes.

        Return a generator of groups with up to n lines of context.
        Each group is in the same format as returned by get_opcodes().

        >>> from pprint import pprint
        >>> a = map(str, range(1,40))
        >>> b = a[:]
        >>> b[8:8] = ['i']     # Make an insertion
        >>> b[20] += 'x'       # Make a replacement
        >>> b[23:28] = []      # Make a deletion
        >>> b[30] += 'y'       # Make another replacement
        >>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes()))
        [[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)],
         [('equal', 16, 19, 17, 20),
          ('replace', 19, 20, 20, 21),
          ('equal', 20, 22, 21, 23),
          ('delete', 22, 27, 23, 23),
          ('equal', 27, 30, 23, 26)],
         [('equal', 31, 34, 27, 30),
          ('replace', 34, 35, 30, 31),
          ('equal', 35, 38, 31, 34)]]
        """

        codes = self.get_opcodes()
        if not codes:
            codes = [("equal", 0, 1, 0, 1)]
        # Fixup leading and trailing groups if they show no changes.
        if codes[0][0] == 'equal':
            tag, i1, i2, j1, j2 = codes[0]
            codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
        if codes[-1][0] == 'equal':
            tag, i1, i2, j1, j2 = codes[-1]
            codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)

        nn = n + n
        group = []
        for tag, i1, i2, j1, j2 in codes:
            # End the current group and start a new one whenever
            # there is a large range with no changes.
            if tag == 'equal' and i2-i1 > nn:
                group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n)))
                yield group
                group = []
                i1, j1 = max(i1, i2-n), max(j1, j2-n)
            group.append((tag, i1, i2, j1, j2))
        if group and not (len(group)==1 and group[0][0] == 'equal'):
            yield group
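The docstring example translates directly to Python 3 (list instead of map); the three edits far enough apart produce three separate context groups:

```python
from difflib import SequenceMatcher

a = [str(i) for i in range(1, 40)]
b = a[:]
b[8:8] = ['i']     # insertion
b[20] += 'x'       # replacement
b[23:28] = []      # deletion
b[30] += 'y'       # another replacement

# Three clusters of changes, each padded with n=3 lines of context.
groups = list(SequenceMatcher(None, a, b).get_grouped_opcodes())
print(len(groups))  # 3
print(groups[0])    # [('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)]
```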
def MoinMoin.support.difflib.SequenceMatcher.get_matching_blocks(self)

Return list of triples describing matching subsequences.

Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and in j. New in Python 2.5, it's also guaranteed that if (i, j, n) and (i', j', n') are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'. IOW, adjacent triples never describe adjacent equal blocks.

The last triple is a dummy, (len(a), len(b), 0), and is the only triple with n==0.

>>> s = SequenceMatcher(None, "abxcd", "abcd")
>>> s.get_matching_blocks()
[(0, 0, 2), (3, 2, 2), (5, 4, 0)]
Definition at line 460 of file difflib.py.
    def get_matching_blocks(self):
        """Return list of triples describing matching subsequences.

        Each triple is of the form (i, j, n), and means that
        a[i:i+n] == b[j:j+n].  The triples are monotonically increasing in
        i and in j.  New in Python 2.5, it's also guaranteed that if
        (i, j, n) and (i', j', n') are adjacent triples in the list, and
        the second is not the last triple in the list, then i+n != i' or
        j+n != j'.  IOW, adjacent triples never describe adjacent equal
        blocks.

        The last triple is a dummy, (len(a), len(b), 0), and is the only
        triple with n==0.

        >>> s = SequenceMatcher(None, "abxcd", "abcd")
        >>> s.get_matching_blocks()
        [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
        """

        if self.matching_blocks is not None:
            return self.matching_blocks
        la, lb = len(self.a), len(self.b)

        # This is most naturally expressed as a recursive algorithm, but
        # at least one user bumped into extreme use cases that exceeded
        # the recursion limit on their box.  So, now we maintain a list
        # (`queue`) of blocks we still need to look at, and append partial
        # results to `matching_blocks` in a loop; the matches are sorted
        # at the end.
        queue = [(0, la, 0, lb)]
        matching_blocks = []
        while queue:
            alo, ahi, blo, bhi = queue.pop()
            i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
            # a[alo:i] vs b[blo:j] unknown
            # a[i:i+k] same as b[j:j+k]
            # a[i+k:ahi] vs b[j+k:bhi] unknown
            if k:   # if k is 0, there was no matching block
                matching_blocks.append(x)
                if alo < i and blo < j:
                    queue.append((alo, i, blo, j))
                if i+k < ahi and j+k < bhi:
                    queue.append((i+k, ahi, j+k, bhi))
        matching_blocks.sort()

        # It's possible that we have adjacent equal blocks in the
        # matching_blocks list now.  Starting with 2.5, this code was added
        # to collapse them.
        i1 = j1 = k1 = 0
        non_adjacent = []
        for i2, j2, k2 in matching_blocks:
            # Is this block adjacent to i1, j1, k1?
            if i1 + k1 == i2 and j1 + k1 == j2:
                # Yes, so collapse them -- this just increases the length of
                # the first block by the length of the second, and the first
                # block so lengthened remains the block to compare against.
                k1 += k2
            else:
                # Not adjacent.  Remember the first block (k1==0 means it's
                # the dummy we started with), and make the second block the
                # new block to compare against.
                if k1:
                    non_adjacent.append((i1, j1, k1))
                i1, j1, k1 = i2, j2, k2
        if k1:
            non_adjacent.append((i1, j1, k1))

        non_adjacent.append( (la, lb, 0) )
        self.matching_blocks = non_adjacent
        return self.matching_blocks
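The doctest above can be checked against the standard-library difflib; converting the Python 3 Match named tuples to plain tuples recovers the documented triples, including the trailing (len(a), len(b), 0) dummy:

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abxcd", "abcd")
blocks = [tuple(m) for m in s.get_matching_blocks()]

# The final (5, 4, 0) triple is the dummy sentinel -- the only n==0 entry.
print(blocks)  # [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
```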
def MoinMoin.support.difflib.SequenceMatcher.get_opcodes(self)

Return list of 5-tuples describing how to turn a into b.

Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the tuple preceding it, and likewise for j1 == the previous j2.

The tags are strings, with these meanings:

'replace':  a[i1:i2] should be replaced by b[j1:j2]
'delete':   a[i1:i2] should be deleted. Note that j1==j2 in this case.
'insert':   b[j1:j2] should be inserted at a[i1:i1]. Note that i1==i2 in this case.
'equal':    a[i1:i2] == b[j1:j2]

>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
 delete a[0:1] (q) b[0:0] ()
  equal a[1:3] (ab) b[0:2] (ab)
replace a[3:4] (x) b[2:3] (y)
  equal a[4:6] (cd) b[3:5] (cd)
 insert a[6:6] () b[5:6] (f)
Definition at line 531 of file difflib.py.
    def get_opcodes(self):
        """Return list of 5-tuples describing how to turn a into b.

        Each tuple is of the form (tag, i1, i2, j1, j2).  The first tuple
        has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
        tuple preceding it, and likewise for j1 == the previous j2.

        The tags are strings, with these meanings:

        'replace':  a[i1:i2] should be replaced by b[j1:j2]
        'delete':   a[i1:i2] should be deleted.
                    Note that j1==j2 in this case.
        'insert':   b[j1:j2] should be inserted at a[i1:i1].
                    Note that i1==i2 in this case.
        'equal':    a[i1:i2] == b[j1:j2]

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
        ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
         delete a[0:1] (q) b[0:0] ()
          equal a[1:3] (ab) b[0:2] (ab)
        replace a[3:4] (x) b[2:3] (y)
          equal a[4:6] (cd) b[3:5] (cd)
         insert a[6:6] () b[5:6] (f)
        """

        if self.opcodes is not None:
            return self.opcodes
        i = j = 0
        self.opcodes = answer = []
        for ai, bj, size in self.get_matching_blocks():
            # invariant:  we've pumped out correct diffs to change
            # a[:i] into b[:j], and the next matching block is
            # a[ai:ai+size] == b[bj:bj+size].  So we need to pump
            # out a diff to change a[i:ai] into b[j:bj], pump out
            # the matching block, and move (i,j) beyond the match
            tag = ''
            if i < ai and j < bj:
                tag = 'replace'
            elif i < ai:
                tag = 'delete'
            elif j < bj:
                tag = 'insert'
            if tag:
                answer.append( (tag, i, ai, j, bj) )
            i, j = ai+size, bj+size
            # the list of matching blocks is terminated by a
            # sentinel with size 0
            if size:
                answer.append( ('equal', ai, i, bj, j) )
        return answer
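Running the docstring example through the standard-library difflib shows all four tags in action; each tuple's i1/j1 picks up exactly where the previous tuple's i2/j2 left off:

```python
from difflib import SequenceMatcher

a, b = "qabxcd", "abycdf"
ops = SequenceMatcher(None, a, b).get_opcodes()
for tag, i1, i2, j1, j2 in ops:
    print("%7s a[%d:%d] (%s) b[%d:%d] (%s)"
          % (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
```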
def MoinMoin.support.difflib.SequenceMatcher.quick_ratio(self)

Return an upper bound on ratio() relatively quickly.

This isn't defined beyond that it is an upper bound on .ratio(), and is faster to compute.
Definition at line 662 of file difflib.py.
    def quick_ratio(self):
        """Return an upper bound on ratio() relatively quickly.

        This isn't defined beyond that it is an upper bound on .ratio(), and
        is faster to compute.
        """

        # viewing a and b as multisets, set matches to the cardinality
        # of their intersection; this counts the number of matches
        # without regard to order, so is clearly an upper bound
        if self.fullbcount is None:
            self.fullbcount = fullbcount = {}
            for elt in self.b:
                fullbcount[elt] = fullbcount.get(elt, 0) + 1
        fullbcount = self.fullbcount
        # avail[x] is the number of times x appears in 'b' less the
        # number of times we've seen it in 'a' so far ... kinda
        avail = {}
        availhas, matches = avail.has_key, 0
        for elt in self.a:
            if availhas(elt):
                numb = avail[elt]
            else:
                numb = fullbcount.get(elt, 0)
            avail[elt] = numb - 1
            if numb > 0:
                matches = matches + 1
        return _calculate_ratio(matches, len(self.a) + len(self.b))
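The three ratio methods form a chain of increasingly cheap, increasingly loose bounds, which the standard-library difflib demonstrates directly (the 0.75/0.75/1.0 values are the ones from the ratio() docstring):

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abcd", "bcde")
r, q, rq = s.ratio(), s.quick_ratio(), s.real_quick_ratio()

# Each successive method is faster but only promises an upper bound.
print(r, q, rq)  # 0.75 0.75 1.0
```

A common pattern is to test `real_quick_ratio()` and `quick_ratio()` against a cutoff first and only compute the expensive `ratio()` when both bounds pass.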
def MoinMoin.support.difflib.SequenceMatcher.ratio(self)
Return a measure of the sequences' similarity (float in [0,1]).

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1 if the sequences are identical, and 0 if they have nothing in common.

.ratio() is expensive to compute if you haven't already computed .get_matching_blocks() or .get_opcodes(), in which case you may want to try .quick_ratio() or .real_quick_ratio() first to get an upper bound.

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0
Definition at line 636 of file difflib.py.
    def ratio(self):
        """Return a measure of the sequences' similarity (float in [0,1]).

        Where T is the total number of elements in both sequences, and
        M is the number of matches, this is 2.0*M / T.
        Note that this is 1 if the sequences are identical, and 0 if
        they have nothing in common.

        .ratio() is expensive to compute if you haven't already computed
        .get_matching_blocks() or .get_opcodes(), in which case you may
        want to try .quick_ratio() or .real_quick_ratio() first to get an
        upper bound.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.quick_ratio()
        0.75
        >>> s.real_quick_ratio()
        1.0
        """

        matches = reduce(lambda sum, triple: sum + triple[-1],
                         self.get_matching_blocks(), 0)
        return _calculate_ratio(matches, len(self.a) + len(self.b))
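The 2.0*M / T formula can be verified by summing the block sizes from get_matching_blocks() yourself (the modern equivalent of the reduce over triple sizes above):

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M = total matched elements; the dummy block contributes size 0.
matches = sum(size for _, _, size in s.get_matching_blocks())

# T = combined length of both sequences; ratio() is 2.0 * M / T.
print(2.0 * matches / (len(a) + len(b)))  # 0.75, same as s.ratio()
```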
def MoinMoin.support.difflib.SequenceMatcher.real_quick_ratio(self)

Return an upper bound on ratio() very quickly.

This isn't defined beyond that it is an upper bound on .ratio(), and is faster to compute than either .ratio() or .quick_ratio().
Definition at line 691 of file difflib.py.
    def real_quick_ratio(self):
        """Return an upper bound on ratio() very quickly.

        This isn't defined beyond that it is an upper bound on .ratio(), and
        is faster to compute than either .ratio() or .quick_ratio().
        """

        la, lb = len(self.a), len(self.b)
        # can't have more matches than the number of elements in the
        # shorter sequence
        return _calculate_ratio(min(la, lb), la + lb)
def MoinMoin.support.difflib.SequenceMatcher.set_seq1(self, a)
Set the first sequence to be compared.

The second sequence to be compared is not changed.

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq1("bcde")
>>> s.ratio()
1.0

SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence S against many sequences, use .set_seq2(S) once and call .set_seq1(x) repeatedly for each of the other sequences.

See also set_seqs() and set_seq2().
Definition at line 228 of file difflib.py.
    def set_seq1(self, a):
        """Set the first sequence to be compared.

        The second sequence to be compared is not changed.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.set_seq1("bcde")
        >>> s.ratio()
        1.0
        >>>

        SequenceMatcher computes and caches detailed information about the
        second sequence, so if you want to compare one sequence S against
        many sequences, use .set_seq2(S) once and call .set_seq1(x)
        repeatedly for each of the other sequences.

        See also set_seqs() and set_seq2().
        """

        if a is self.a:
            return
        self.a = a
        self.matching_blocks = self.opcodes = None
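The one-against-many pattern the docstring recommends looks like this with the standard-library difflib: fix the second sequence once (its index is what gets cached) and vary only the first:

```python
from difflib import SequenceMatcher

s = SequenceMatcher()
s.set_seq2("bcde")          # expensive per-b bookkeeping happens once
scores = {}
for candidate in ("abcd", "bcde", "xyz"):
    s.set_seq1(candidate)   # cheap: only invalidates cached blocks/opcodes
    scores[candidate] = s.ratio()

print(scores)  # {'abcd': 0.75, 'bcde': 1.0, 'xyz': 0.0}
```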
def MoinMoin.support.difflib.SequenceMatcher.set_seq2(self, b)
Set the second sequence to be compared.

The first sequence to be compared is not changed.

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq2("abcd")
>>> s.ratio()
1.0

SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence S against many sequences, use .set_seq2(S) once and call .set_seq1(x) repeatedly for each of the other sequences.

See also set_seqs() and set_seq1().
Definition at line 254 of file difflib.py.
    def set_seq2(self, b):
        """Set the second sequence to be compared.

        The first sequence to be compared is not changed.

        >>> s = SequenceMatcher(None, "abcd", "bcde")
        >>> s.ratio()
        0.75
        >>> s.set_seq2("abcd")
        >>> s.ratio()
        1.0
        >>>

        SequenceMatcher computes and caches detailed information about the
        second sequence, so if you want to compare one sequence S against
        many sequences, use .set_seq2(S) once and call .set_seq1(x)
        repeatedly for each of the other sequences.

        See also set_seqs() and set_seq1().
        """

        if b is self.b:
            return
        self.b = b
        self.matching_blocks = self.opcodes = None
        self.fullbcount = None
        self.__chain_b()
def MoinMoin.support.difflib.SequenceMatcher.set_seqs(self, a, b)
Set the two sequences to be compared.

>>> s = SequenceMatcher()
>>> s.set_seqs("abcd", "bcde")
>>> s.ratio()
0.75
Definition at line 216 of file difflib.py.
    def set_seqs(self, a, b):
        """Set the two sequences to be compared.

        >>> s = SequenceMatcher()
        >>> s.set_seqs("abcd", "bcde")
        >>> s.ratio()
        0.75
        """

        self.set_seq1(a)
        self.set_seq2(b)