python3.2
3.2.2

Public Member Functions  
def  __init__ 
def  set_seqs 
def  set_seq1 
def  set_seq2 
def  isbjunk 
def  isbpopular 
def  find_longest_match 
def  get_matching_blocks 
def  get_opcodes 
def  get_grouped_opcodes 
def  ratio 
def  quick_ratio 
def  real_quick_ratio 
Public Attributes  
isjunk  
a  
b  
autojunk  
matching_blocks  
opcodes  
fullbcount  
b2j  
bjunk  
bpopular  
Private Member Functions  
def  __chain_b 
SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (RO doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people. SequenceMatcher tries to compute a "humanfriendly diff" between two sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the longest *contiguous* & junkfree matching subsequence. That's what catches peoples' eyes. The Windows(tm) windiff has another interesting notion, pairing up elements that appear uniquely in each sequence. That, and the method here, appear to yield more intuitive difference reports than does diff. This method appears to be the least vulnerable to synching up on blocks of "junk lines", though (like blank lines in ordinary text files, or maybe "<P>" lines in HTML files). That may be because this is the only method of the 3 that has a *concept* of "junk" <wink>. Example, comparing two strings, and considering blanks to be "junk": >>> s = SequenceMatcher(lambda x: x == " ", ... "private Thread currentThread;", ... "private volatile Thread currentThread;") >>> .ratio() returns a float in [0, 1], measuring the "similarity" of the sequences. As a rule of thumb, a .ratio() value over 0.6 means the sequences are close matches: >>> print(round(s.ratio(), 3)) 0.866 >>> If you're only interested in where the sequences match, .get_matching_blocks() is handy: >>> for block in s.get_matching_blocks(): ... print("a[%d] and b[%d] match for %d elements" % block) a[0] and b[0] match for 8 elements a[8] and b[17] match for 21 elements a[29] and b[38] match for 0 elements Note that the last tuple returned by .get_matching_blocks() is always a dummy, (len(a), len(b), 0), and this is the only case in which the last tuple element (number of elements matched) is 0. If you want to know how to change the first sequence into the second, use .get_opcodes(): >>> for opcode in s.get_opcodes(): ... print("%6s a[%d:%d] b[%d:%d]" % opcode) equal a[0:8] b[0:8] insert a[8:8] b[8:17] equal a[8:29] b[17:38] See the Differ class for a fancy humanfriendly file differencer, which uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (nearmatching) lines. See also function get_close_matches() in this module, which shows how simple code building on SequenceMatcher can be used to do useful work. Timing: Basic RO is cubic time worst case and quadratic time expected case. SequenceMatcher is quadratic time for the worst case and has expectedcase behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear. Methods: __init__(isjunk=None, a='', b='') Construct a SequenceMatcher. set_seqs(a, b) Set the two sequences to be compared. set_seq1(a) Set the first sequence to be compared. set_seq2(b) Set the second sequence to be compared. find_longest_match(alo, ahi, blo, bhi) Find longest matching block in a[alo:ahi] and b[blo:bhi]. get_matching_blocks() Return list of triples describing matching subsequences. get_opcodes() Return list of 5tuples describing how to turn a into b. ratio() Return a measure of the sequences' similarity (float in [0,1]). quick_ratio() Return an upper bound on .ratio() relatively quickly. real_quick_ratio() Return an upper bound on ratio() very quickly.
Definition at line 46 of file difflib.py.
def difflib.SequenceMatcher.__init__  (  self,  
isjunk = None , 

a = '' , 

b = '' , 

autojunk = True 

) 
Construct a SequenceMatcher. Optional arg isjunk is None (the default), or a oneargument function that takes a sequence element and returns true iff the element is junk. None is equivalent to passing "lambda x: 0", i.e. no elements are considered to be junk. For example, pass lambda x: x in " \\t" if you're comparing lines as sequences of characters, and don't want to synch up on blanks or hard tabs. Optional arg a is the first of two sequences to be compared. By default, an empty string. The elements of a must be hashable. See also .set_seqs() and .set_seq1(). Optional arg b is the second of two sequences to be compared. By default, an empty string. The elements of b must be hashable. See also .set_seqs() and .set_seq2(). Optional arg autojunk should be set to False to disable the "automatic junk heuristic" that treats popular elements as junk (see module documentation for more information).
Definition at line 154 of file difflib.py.
00154 00155 def __init__(self, isjunk=None, a='', b='', autojunk=True): 00156 """Construct a SequenceMatcher. 00157 00158 Optional arg isjunk is None (the default), or a oneargument 00159 function that takes a sequence element and returns true iff the 00160 element is junk. None is equivalent to passing "lambda x: 0", i.e. 00161 no elements are considered to be junk. For example, pass 00162 lambda x: x in " \\t" 00163 if you're comparing lines as sequences of characters, and don't 00164 want to synch up on blanks or hard tabs. 00165 00166 Optional arg a is the first of two sequences to be compared. By 00167 default, an empty string. The elements of a must be hashable. See 00168 also .set_seqs() and .set_seq1(). 00169 00170 Optional arg b is the second of two sequences to be compared. By 00171 default, an empty string. The elements of b must be hashable. See 00172 also .set_seqs() and .set_seq2(). 00173 00174 Optional arg autojunk should be set to False to disable the 00175 "automatic junk heuristic" that treats popular elements as junk 00176 (see module documentation for more information). 00177 """ 00178 00179 # Members: 00180 # a 00181 # first sequence 00182 # b 00183 # second sequence; differences are computed as "what do 00184 # we need to do to 'a' to change it into 'b'?" 00185 # b2j 00186 # for x in b, b2j[x] is a list of the indices (into b) 00187 # at which x appears; junk and popular elements do not appear 00188 # fullbcount 00189 # for x in b, fullbcount[x] == the number of times x 00190 # appears in b; only materialized if really needed (used 00191 # only for computing quick_ratio()) 00192 # matching_blocks 00193 # a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k]; 00194 # ascending & nonoverlapping in i and in j; terminated by 00195 # a dummy (len(a), len(b), 0) sentinel 00196 # opcodes 00197 # a list of (tag, i1, i2, j1, j2) tuples, where tag is 00198 # one of 00199 # 'replace' a[i1:i2] should be replaced by b[j1:j2] 00200 # 'delete' a[i1:i2] should be deleted 00201 # 'insert' b[j1:j2] should be inserted 00202 # 'equal' a[i1:i2] == b[j1:j2] 00203 # isjunk 00204 # a usersupplied function taking a sequence element and 00205 # returning true iff the element is "junk"  this has 00206 # subtle but helpful effects on the algorithm, which I'll 00207 # get around to writing up someday <0.9 wink>. 00208 # DON'T USE! Only __chain_b uses this. Use isbjunk. 00209 # bjunk 00210 # the items in b for which isjunk is True. 00211 # bpopular 00212 # nonjunk items in b treated as junk by the heuristic (if used). 00213 00214 self.isjunk = isjunk 00215 self.a = self.b = None 00216 self.autojunk = autojunk 00217 self.set_seqs(a, b)
def difflib.SequenceMatcher.__chain_b  (  self  )  [private] 
Definition at line 301 of file difflib.py.
00301 00302 def __chain_b(self): 00303 # Because isjunk is a userdefined (not C) function, and we test 00304 # for junk a LOT, it's important to minimize the number of calls. 00305 # Before the tricks described here, __chain_b was by far the most 00306 # timeconsuming routine in the whole module! If anyone sees 00307 # Jim Roskind, thank him again for profile.py  I never would 00308 # have guessed that. 00309 # The first trick is to build b2j ignoring the possibility 00310 # of junk. I.e., we don't call isjunk at all yet. Throwing 00311 # out the junk later is much cheaper than building b2j "right" 00312 # from the start. 00313 b = self.b 00314 self.b2j = b2j = {} 00315 00316 for i, elt in enumerate(b): 00317 indices = b2j.setdefault(elt, []) 00318 indices.append(i) 00319 00320 # Purge junk elements 00321 self.bjunk = junk = set() 00322 isjunk = self.isjunk 00323 if isjunk: 00324 for elt in b2j.keys(): 00325 if isjunk(elt): 00326 junk.add(elt) 00327 for elt in junk: # separate loop avoids separate list of keys 00328 del b2j[elt] 00329 00330 # Purge popular elements that are not junk 00331 self.bpopular = popular = set() 00332 n = len(b) 00333 if self.autojunk and n >= 200: 00334 ntest = n // 100 + 1 00335 for elt, idxs in b2j.items(): 00336 if len(idxs) > ntest: 00337 popular.add(elt) 00338 for elt in popular: # ditto; as fast for 1% deletion 00339 del b2j[elt]
def difflib.SequenceMatcher.find_longest_match  (  self,  
alo,  
ahi,  
blo,  
bhi  
) 
Find longest matching block in a[alo:ahi] and b[blo:bhi]. If isjunk is not defined: Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where alo <= i <= i+k <= ahi blo <= j <= j+k <= bhi and for all (i',j',k') meeting those conditions, k >= k' i <= i' and if i == i', j <= j' In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b. >>> s = SequenceMatcher(None, " abcd", "abcd abcd") >>> s.find_longest_match(0, 5, 0, 9) Match(a=0, b=4, size=5) If isjunk is defined, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an "interesting" match. Here's the same example as before, but considering blanks to be junk. That prevents " abcd" from matching the " abcd" at the tail end of the second sequence directly. Instead only the "abcd" can match, and matches the leftmost "abcd" in the second sequence: >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") >>> s.find_longest_match(0, 5, 0, 9) Match(a=1, b=0, size=4) If no blocks match, return (alo, blo, 0). >>> s = SequenceMatcher(None, "ab", "c") >>> s.find_longest_match(0, 2, 0, 1) Match(a=0, b=0, size=0)
Definition at line 354 of file difflib.py.
00354 00355 def find_longest_match(self, alo, ahi, blo, bhi): 00356 """Find longest matching block in a[alo:ahi] and b[blo:bhi]. 00357 00358 If isjunk is not defined: 00359 00360 Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where 00361 alo <= i <= i+k <= ahi 00362 blo <= j <= j+k <= bhi 00363 and for all (i',j',k') meeting those conditions, 00364 k >= k' 00365 i <= i' 00366 and if i == i', j <= j' 00367 00368 In other words, of all maximal matching blocks, return one that 00369 starts earliest in a, and of all those maximal matching blocks that 00370 start earliest in a, return the one that starts earliest in b. 00371 00372 >>> s = SequenceMatcher(None, " abcd", "abcd abcd") 00373 >>> s.find_longest_match(0, 5, 0, 9) 00374 Match(a=0, b=4, size=5) 00375 00376 If isjunk is defined, first the longest matching block is 00377 determined as above, but with the additional restriction that no 00378 junk element appears in the block. Then that block is extended as 00379 far as possible by matching (only) junk elements on both sides. So 00380 the resulting block never matches on junk except as identical junk 00381 happens to be adjacent to an "interesting" match. 00382 00383 Here's the same example as before, but considering blanks to be 00384 junk. That prevents " abcd" from matching the " abcd" at the tail 00385 end of the second sequence directly. Instead only the "abcd" can 00386 match, and matches the leftmost "abcd" in the second sequence: 00387 00388 >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") 00389 >>> s.find_longest_match(0, 5, 0, 9) 00390 Match(a=1, b=0, size=4) 00391 00392 If no blocks match, return (alo, blo, 0). 00393 00394 >>> s = SequenceMatcher(None, "ab", "c") 00395 >>> s.find_longest_match(0, 2, 0, 1) 00396 Match(a=0, b=0, size=0) 00397 """ 00398 00399 # CAUTION: stripping common prefix or suffix would be incorrect. 00400 # E.g., 00401 # ab 00402 # acab 00403 # Longest matching block is "ab", but if common prefix is 00404 # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so 00405 # strip, so ends up claiming that ab is changed to acab by 00406 # inserting "ca" in the middle. That's minimal but unintuitive: 00407 # "it's obvious" that someone inserted "ac" at the front. 00408 # Windiff ends up at the same place as diff, but by pairing up 00409 # the unique 'b's and then matching the first two 'a's. 00410 00411 a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.bjunk.__contains__ 00412 besti, bestj, bestsize = alo, blo, 0 00413 # find longest junkfree match 00414 # during an iteration of the loop, j2len[j] = length of longest 00415 # junkfree match ending with a[i1] and b[j] 00416 j2len = {} 00417 nothing = [] 00418 for i in range(alo, ahi): 00419 # look at all instances of a[i] in b; note that because 00420 # b2j has no junk keys, the loop is skipped if a[i] is junk 00421 j2lenget = j2len.get 00422 newj2len = {} 00423 for j in b2j.get(a[i], nothing): 00424 # a[i] matches b[j] 00425 if j < blo: 00426 continue 00427 if j >= bhi: 00428 break 00429 k = newj2len[j] = j2lenget(j1, 0) + 1 00430 if k > bestsize: 00431 besti, bestj, bestsize = ik+1, jk+1, k 00432 j2len = newj2len 00433 00434 # Extend the best by nonjunk elements on each end. In particular, 00435 # "popular" nonjunk elements aren't in b2j, which greatly speeds 00436 # the inner loop above, but also means "the best" match so far 00437 # doesn't contain any junk *or* popular nonjunk elements. 00438 while besti > alo and bestj > blo and \ 00439 not isbjunk(b[bestj1]) and \ 00440 a[besti1] == b[bestj1]: 00441 besti, bestj, bestsize = besti1, bestj1, bestsize+1 00442 while besti+bestsize < ahi and bestj+bestsize < bhi and \ 00443 not isbjunk(b[bestj+bestsize]) and \ 00444 a[besti+bestsize] == b[bestj+bestsize]: 00445 bestsize += 1 00446 00447 # Now that we have a wholly interesting match (albeit possibly 00448 # empty!), we may as well suck up the matching junk on each 00449 # side of it too. Can't think of a good reason not to, and it 00450 # saves postprocessing the (possibly considerable) expense of 00451 # figuring out what to do with it. In the case of an empty 00452 # interesting match, this is clearly the right thing to do, 00453 # because no other kind of match is possible in the regions. 00454 while besti > alo and bestj > blo and \ 00455 isbjunk(b[bestj1]) and \ 00456 a[besti1] == b[bestj1]: 00457 besti, bestj, bestsize = besti1, bestj1, bestsize+1 00458 while besti+bestsize < ahi and bestj+bestsize < bhi and \ 00459 isbjunk(b[bestj+bestsize]) and \ 00460 a[besti+bestsize] == b[bestj+bestsize]: 00461 bestsize = bestsize + 1 00462 00463 return Match(besti, bestj, bestsize)
def difflib.SequenceMatcher.get_grouped_opcodes  (  self,  
n = 3 

) 
Isolate change clusters by eliminating ranges with no changes. Return a generator of groups with upto n lines of context. Each group is in the same format as returned by get_opcodes(). >>> from pprint import pprint >>> a = list(map(str, range(1,40))) >>> b = a[:] >>> b[8:8] = ['i'] # Make an insertion >>> b[20] += 'x' # Make a replacement >>> b[23:28] = [] # Make a deletion >>> b[30] += 'y' # Make another replacement >>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes())) [[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)], [('equal', 16, 19, 17, 20), ('replace', 19, 20, 20, 21), ('equal', 20, 22, 21, 23), ('delete', 22, 27, 23, 23), ('equal', 27, 30, 23, 26)], [('equal', 31, 34, 27, 30), ('replace', 34, 35, 30, 31), ('equal', 35, 38, 31, 34)]]
Definition at line 590 of file difflib.py.
00590 00591 def get_grouped_opcodes(self, n=3): 00592 """ Isolate change clusters by eliminating ranges with no changes. 00593 00594 Return a generator of groups with upto n lines of context. 00595 Each group is in the same format as returned by get_opcodes(). 00596 00597 >>> from pprint import pprint 00598 >>> a = list(map(str, range(1,40))) 00599 >>> b = a[:] 00600 >>> b[8:8] = ['i'] # Make an insertion 00601 >>> b[20] += 'x' # Make a replacement 00602 >>> b[23:28] = [] # Make a deletion 00603 >>> b[30] += 'y' # Make another replacement 00604 >>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes())) 00605 [[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)], 00606 [('equal', 16, 19, 17, 20), 00607 ('replace', 19, 20, 20, 21), 00608 ('equal', 20, 22, 21, 23), 00609 ('delete', 22, 27, 23, 23), 00610 ('equal', 27, 30, 23, 26)], 00611 [('equal', 31, 34, 27, 30), 00612 ('replace', 34, 35, 30, 31), 00613 ('equal', 35, 38, 31, 34)]] 00614 """ 00615 00616 codes = self.get_opcodes() 00617 if not codes: 00618 codes = [("equal", 0, 1, 0, 1)] 00619 # Fixup leading and trailing groups if they show no changes. 00620 if codes[0][0] == 'equal': 00621 tag, i1, i2, j1, j2 = codes[0] 00622 codes[0] = tag, max(i1, i2n), i2, max(j1, j2n), j2 00623 if codes[1][0] == 'equal': 00624 tag, i1, i2, j1, j2 = codes[1] 00625 codes[1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n) 00626 00627 nn = n + n 00628 group = [] 00629 for tag, i1, i2, j1, j2 in codes: 00630 # End the current group and start a new one whenever 00631 # there is a large range with no changes. 00632 if tag == 'equal' and i2i1 > nn: 00633 group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n))) 00634 yield group 00635 group = [] 00636 i1, j1 = max(i1, i2n), max(j1, j2n) 00637 group.append((tag, i1, i2, j1 ,j2)) 00638 if group and not (len(group)==1 and group[0][0] == 'equal'): 00639 yield group
def difflib.SequenceMatcher.get_matching_blocks  (  self  ) 
Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and in j. New in Python 2.5, it's also guaranteed that if (i, j, n) and (i', j', n') are adjacent triples in the list, and the second is not the last triple in the list, then i+n != i' or j+n != j'. IOW, adjacent triples never describe adjacent equal blocks. The last triple is a dummy, (len(a), len(b), 0), and is the only triple with n==0. >>> s = SequenceMatcher(None, "abxcd", "abcd") >>> list(s.get_matching_blocks()) [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
Definition at line 464 of file difflib.py.
00464 00465 def get_matching_blocks(self): 00466 """Return list of triples describing matching subsequences. 00467 00468 Each triple is of the form (i, j, n), and means that 00469 a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in 00470 i and in j. New in Python 2.5, it's also guaranteed that if 00471 (i, j, n) and (i', j', n') are adjacent triples in the list, and 00472 the second is not the last triple in the list, then i+n != i' or 00473 j+n != j'. IOW, adjacent triples never describe adjacent equal 00474 blocks. 00475 00476 The last triple is a dummy, (len(a), len(b), 0), and is the only 00477 triple with n==0. 00478 00479 >>> s = SequenceMatcher(None, "abxcd", "abcd") 00480 >>> list(s.get_matching_blocks()) 00481 [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)] 00482 """ 00483 00484 if self.matching_blocks is not None: 00485 return self.matching_blocks 00486 la, lb = len(self.a), len(self.b) 00487 00488 # This is most naturally expressed as a recursive algorithm, but 00489 # at least one user bumped into extreme use cases that exceeded 00490 # the recursion limit on their box. So, now we maintain a list 00491 # ('queue`) of blocks we still need to look at, and append partial 00492 # results to `matching_blocks` in a loop; the matches are sorted 00493 # at the end. 00494 queue = [(0, la, 0, lb)] 00495 matching_blocks = [] 00496 while queue: 00497 alo, ahi, blo, bhi = queue.pop() 00498 i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi) 00499 # a[alo:i] vs b[blo:j] unknown 00500 # a[i:i+k] same as b[j:j+k] 00501 # a[i+k:ahi] vs b[j+k:bhi] unknown 00502 if k: # if k is 0, there was no matching block 00503 matching_blocks.append(x) 00504 if alo < i and blo < j: 00505 queue.append((alo, i, blo, j)) 00506 if i+k < ahi and j+k < bhi: 00507 queue.append((i+k, ahi, j+k, bhi)) 00508 matching_blocks.sort() 00509 00510 # It's possible that we have adjacent equal blocks in the 00511 # matching_blocks list now. Starting with 2.5, this code was added 00512 # to collapse them. 00513 i1 = j1 = k1 = 0 00514 non_adjacent = [] 00515 for i2, j2, k2 in matching_blocks: 00516 # Is this block adjacent to i1, j1, k1? 00517 if i1 + k1 == i2 and j1 + k1 == j2: 00518 # Yes, so collapse them  this just increases the length of 00519 # the first block by the length of the second, and the first 00520 # block so lengthened remains the block to compare against. 00521 k1 += k2 00522 else: 00523 # Not adjacent. Remember the first block (k1==0 means it's 00524 # the dummy we started with), and make the second block the 00525 # new block to compare against. 00526 if k1: 00527 non_adjacent.append((i1, j1, k1)) 00528 i1, j1, k1 = i2, j2, k2 00529 if k1: 00530 non_adjacent.append((i1, j1, k1)) 00531 00532 non_adjacent.append( (la, lb, 0) ) 00533 self.matching_blocks = non_adjacent 00534 return map(Match._make, self.matching_blocks)
def difflib.SequenceMatcher.get_opcodes  (  self  ) 
Return list of 5tuples describing how to turn a into b. Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the tuple preceding it, and likewise for j1 == the previous j2. The tags are strings, with these meanings: 'replace': a[i1:i2] should be replaced by b[j1:j2] 'delete': a[i1:i2] should be deleted. Note that j1==j2 in this case. 'insert': b[j1:j2] should be inserted at a[i1:i1]. Note that i1==i2 in this case. 'equal': a[i1:i2] == b[j1:j2] >>> a = "qabxcd" >>> b = "abycdf" >>> s = SequenceMatcher(None, a, b) >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): ... print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % ... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) delete a[0:1] (q) b[0:0] () equal a[1:3] (ab) b[0:2] (ab) replace a[3:4] (x) b[2:3] (y) equal a[4:6] (cd) b[3:5] (cd) insert a[6:6] () b[5:6] (f)
Definition at line 535 of file difflib.py.
00535 00536 def get_opcodes(self): 00537 """Return list of 5tuples describing how to turn a into b. 00538 00539 Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple 00540 has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the 00541 tuple preceding it, and likewise for j1 == the previous j2. 00542 00543 The tags are strings, with these meanings: 00544 00545 'replace': a[i1:i2] should be replaced by b[j1:j2] 00546 'delete': a[i1:i2] should be deleted. 00547 Note that j1==j2 in this case. 00548 'insert': b[j1:j2] should be inserted at a[i1:i1]. 00549 Note that i1==i2 in this case. 00550 'equal': a[i1:i2] == b[j1:j2] 00551 00552 >>> a = "qabxcd" 00553 >>> b = "abycdf" 00554 >>> s = SequenceMatcher(None, a, b) 00555 >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): 00556 ... print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % 00557 ... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) 00558 delete a[0:1] (q) b[0:0] () 00559 equal a[1:3] (ab) b[0:2] (ab) 00560 replace a[3:4] (x) b[2:3] (y) 00561 equal a[4:6] (cd) b[3:5] (cd) 00562 insert a[6:6] () b[5:6] (f) 00563 """ 00564 00565 if self.opcodes is not None: 00566 return self.opcodes 00567 i = j = 0 00568 self.opcodes = answer = [] 00569 for ai, bj, size in self.get_matching_blocks(): 00570 # invariant: we've pumped out correct diffs to change 00571 # a[:i] into b[:j], and the next matching block is 00572 # a[ai:ai+size] == b[bj:bj+size]. So we need to pump 00573 # out a diff to change a[i:ai] into b[j:bj], pump out 00574 # the matching block, and move (i,j) beyond the match 00575 tag = '' 00576 if i < ai and j < bj: 00577 tag = 'replace' 00578 elif i < ai: 00579 tag = 'delete' 00580 elif j < bj: 00581 tag = 'insert' 00582 if tag: 00583 answer.append( (tag, i, ai, j, bj) ) 00584 i, j = ai+size, bj+size 00585 # the list of matching blocks is terminated by a 00586 # sentinel with size 0 00587 if size: 00588 answer.append( ('equal', ai, i, bj, j) ) 00589 return answer
def difflib.SequenceMatcher.isbjunk  (  self,  
item  
) 
Definition at line 340 of file difflib.py.
00340 00341 def isbjunk(self, item): 00342 "Deprecated; use 'item in SequenceMatcher().bjunk'." 00343 warnings.warn("'SequenceMatcher().isbjunk(item)' is deprecated;\n" 00344 "use 'item in SMinstance.bjunk' instead.", 00345 DeprecationWarning, 2) 00346 return item in self.bjunk
def difflib.SequenceMatcher.isbpopular  (  self,  
item  
) 
Definition at line 347 of file difflib.py.
00347 00348 def isbpopular(self, item): 00349 "Deprecated; use 'item in SequenceMatcher().bpopular'." 00350 warnings.warn("'SequenceMatcher().isbpopular(item)' is deprecated;\n" 00351 "use 'item in SMinstance.bpopular' instead.", 00352 DeprecationWarning, 2) 00353 return item in self.bpopular
def difflib.SequenceMatcher.quick_ratio  (  self  ) 
Return an upper bound on ratio() relatively quickly. This isn't defined beyond that it is an upper bound on .ratio(), and is faster to compute.
Definition at line 665 of file difflib.py.
00665 00666 def quick_ratio(self): 00667 """Return an upper bound on ratio() relatively quickly. 00668 00669 This isn't defined beyond that it is an upper bound on .ratio(), and 00670 is faster to compute. 00671 """ 00672 00673 # viewing a and b as multisets, set matches to the cardinality 00674 # of their intersection; this counts the number of matches 00675 # without regard to order, so is clearly an upper bound 00676 if self.fullbcount is None: 00677 self.fullbcount = fullbcount = {} 00678 for elt in self.b: 00679 fullbcount[elt] = fullbcount.get(elt, 0) + 1 00680 fullbcount = self.fullbcount 00681 # avail[x] is the number of times x appears in 'b' less the 00682 # number of times we've seen it in 'a' so far ... kinda 00683 avail = {} 00684 availhas, matches = avail.__contains__, 0 00685 for elt in self.a: 00686 if availhas(elt): 00687 numb = avail[elt] 00688 else: 00689 numb = fullbcount.get(elt, 0) 00690 avail[elt] = numb  1 00691 if numb > 0: 00692 matches = matches + 1 00693 return _calculate_ratio(matches, len(self.a) + len(self.b))
def difflib.SequenceMatcher.ratio  (  self  ) 
Return a measure of the sequences' similarity (float in [0,1]). Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1 if the sequences are identical, and 0 if they have nothing in common. .ratio() is expensive to compute if you haven't already computed .get_matching_blocks() or .get_opcodes(), in which case you may want to try .quick_ratio() or .real_quick_ratio() first to get an upper bound. >>> s = SequenceMatcher(None, "abcd", "bcde") >>> s.ratio() 0.75 >>> s.quick_ratio() 0.75 >>> s.real_quick_ratio() 1.0
Definition at line 640 of file difflib.py.
00640 00641 def ratio(self): 00642 """Return a measure of the sequences' similarity (float in [0,1]). 00643 00644 Where T is the total number of elements in both sequences, and 00645 M is the number of matches, this is 2.0*M / T. 00646 Note that this is 1 if the sequences are identical, and 0 if 00647 they have nothing in common. 00648 00649 .ratio() is expensive to compute if you haven't already computed 00650 .get_matching_blocks() or .get_opcodes(), in which case you may 00651 want to try .quick_ratio() or .real_quick_ratio() first to get an 00652 upper bound. 00653 00654 >>> s = SequenceMatcher(None, "abcd", "bcde") 00655 >>> s.ratio() 00656 0.75 00657 >>> s.quick_ratio() 00658 0.75 00659 >>> s.real_quick_ratio() 00660 1.0 00661 """ 00662 00663 matches = sum(triple[1] for triple in self.get_matching_blocks()) 00664 return _calculate_ratio(matches, len(self.a) + len(self.b))
def difflib.SequenceMatcher.real_quick_ratio  (  self  ) 
Return an upper bound on ratio() very quickly. This isn't defined beyond that it is an upper bound on .ratio(), and is faster to compute than either .ratio() or .quick_ratio().
Definition at line 694 of file difflib.py.
00694 00695 def real_quick_ratio(self): 00696 """Return an upper bound on ratio() very quickly. 00697 00698 This isn't defined beyond that it is an upper bound on .ratio(), and 00699 is faster to compute than either .ratio() or .quick_ratio(). 00700 """ 00701 00702 la, lb = len(self.a), len(self.b) 00703 # can't have more matches than the number of elements in the 00704 # shorter sequence 00705 return _calculate_ratio(min(la, lb), la + lb)
def difflib.SequenceMatcher.set_seq1  (  self,  
a  
) 
Set the first sequence to be compared. The second sequence to be compared is not changed. >>> s = SequenceMatcher(None, "abcd", "bcde") >>> s.ratio() 0.75 >>> s.set_seq1("bcde") >>> s.ratio() 1.0 >>> SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence S against many sequences, use .set_seq2(S) once and call .set_seq1(x) repeatedly for each of the other sequences. See also set_seqs() and set_seq2().
Definition at line 230 of file difflib.py.
00230 00231 def set_seq1(self, a): 00232 """Set the first sequence to be compared. 00233 00234 The second sequence to be compared is not changed. 00235 00236 >>> s = SequenceMatcher(None, "abcd", "bcde") 00237 >>> s.ratio() 00238 0.75 00239 >>> s.set_seq1("bcde") 00240 >>> s.ratio() 00241 1.0 00242 >>> 00243 00244 SequenceMatcher computes and caches detailed information about the 00245 second sequence, so if you want to compare one sequence S against 00246 many sequences, use .set_seq2(S) once and call .set_seq1(x) 00247 repeatedly for each of the other sequences. 00248 00249 See also set_seqs() and set_seq2(). 00250 """ 00251 00252 if a is self.a: 00253 return 00254 self.a = a 00255 self.matching_blocks = self.opcodes = None
def difflib.SequenceMatcher.set_seq2  (  self,  
b  
) 
Set the second sequence to be compared. The first sequence to be compared is not changed. >>> s = SequenceMatcher(None, "abcd", "bcde") >>> s.ratio() 0.75 >>> s.set_seq2("abcd") >>> s.ratio() 1.0 >>> SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence S against many sequences, use .set_seq2(S) once and call .set_seq1(x) repeatedly for each of the other sequences. See also set_seqs() and set_seq1().
Definition at line 256 of file difflib.py.
00256 00257 def set_seq2(self, b): 00258 """Set the second sequence to be compared. 00259 00260 The first sequence to be compared is not changed. 00261 00262 >>> s = SequenceMatcher(None, "abcd", "bcde") 00263 >>> s.ratio() 00264 0.75 00265 >>> s.set_seq2("abcd") 00266 >>> s.ratio() 00267 1.0 00268 >>> 00269 00270 SequenceMatcher computes and caches detailed information about the 00271 second sequence, so if you want to compare one sequence S against 00272 many sequences, use .set_seq2(S) once and call .set_seq1(x) 00273 repeatedly for each of the other sequences. 00274 00275 See also set_seqs() and set_seq1(). 00276 """ 00277 00278 if b is self.b: 00279 return 00280 self.b = b 00281 self.matching_blocks = self.opcodes = None 00282 self.fullbcount = None 00283 self.__chain_b()
def difflib.SequenceMatcher.set_seqs  (  self,  
a,  
b  
) 
Set the two sequences to be compared. >>> s = SequenceMatcher() >>> s.set_seqs("abcd", "bcde") >>> s.ratio() 0.75
Definition at line 218 of file difflib.py.
00218 00219 def set_seqs(self, a, b): 00220 """Set the two sequences to be compared. 00221 00222 >>> s = SequenceMatcher() 00223 >>> s.set_seqs("abcd", "bcde") 00224 >>> s.ratio() 00225 0.75 00226 """ 00227 00228 self.set_seq1(a) 00229 self.set_seq2(b)
Definition at line 214 of file difflib.py.
Definition at line 215 of file difflib.py.
Definition at line 214 of file difflib.py.
Definition at line 313 of file difflib.py.
Definition at line 320 of file difflib.py.
Definition at line 330 of file difflib.py.
Definition at line 281 of file difflib.py.
Definition at line 213 of file difflib.py.
Definition at line 254 of file difflib.py.
Definition at line 254 of file difflib.py.