python3.2  3.2.2
difflib.Differ Class Reference

Public Member Functions

def __init__
def compare

Public Attributes

 linejunk
 charjunk

Private Member Functions

def _dump
def _plain_replace
def _fancy_replace
def _fancy_helper
def _qformat

Detailed Description

Definition at line 769 of file difflib.py.


Constructor & Destructor Documentation

def difflib.Differ.__init__ (   self,
  linejunk = None,
  charjunk = None 
)
Construct a text differencer, with optional filters.

The two optional keyword parameters are for filter functions:

- `linejunk`: A function that should accept a single string argument,
  and return true iff the string is junk. The module-level function
  `IS_LINE_JUNK` may be used to filter out lines without visible
  characters, except for at most one splat ('#').  It is recommended
  to leave linejunk None; as of Python 2.3, the underlying
  SequenceMatcher class has grown an adaptive notion of "noise" lines
  that's better than any static definition the author has ever been
  able to craft.

- `charjunk`: A function that should accept a string of length 1. The
  module-level function `IS_CHARACTER_JUNK` may be used to filter out
  whitespace characters (a blank or tab; **note**: bad idea to include
  newline in this!).  Use of IS_CHARACTER_JUNK is recommended.

Definition at line 863 of file difflib.py.

00863 
00864     def __init__(self, linejunk=None, charjunk=None):
00865         """
00866         Construct a text differencer, with optional filters.
00867 
00868         The two optional keyword parameters are for filter functions:
00869 
00870         - `linejunk`: A function that should accept a single string argument,
00871           and return true iff the string is junk. The module-level function
00872           `IS_LINE_JUNK` may be used to filter out lines without visible
00873           characters, except for at most one splat ('#').  It is recommended
00874           to leave linejunk None; as of Python 2.3, the underlying
00875           SequenceMatcher class has grown an adaptive notion of "noise" lines
00876           that's better than any static definition the author has ever been
00877           able to craft.
00878 
00879         - `charjunk`: A function that should accept a string of length 1. The
00880           module-level function `IS_CHARACTER_JUNK` may be used to filter out
00881           whitespace characters (a blank or tab; **note**: bad idea to include
00882           newline in this!).  Use of IS_CHARACTER_JUNK is recommended.
00883         """
00884 
00885         self.linejunk = linejunk
00886         self.charjunk = charjunk
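
A minimal sketch of the two filter parameters in use, assuming the recommended settings described above (linejunk left as None, charjunk set to IS_CHARACTER_JUNK). The variable names and the sample strings are illustrative only and do not come from difflib.

```python
import difflib

# Leave linejunk as None and ignore blanks and tabs when matching
# characters within a line.
differ = difflib.Differ(linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK)

# The constructor only stores the two filters on the instance.
print(differ.linejunk)
print(differ.charjunk is difflib.IS_CHARACTER_JUNK)

# Probe what the module-level filters treat as junk.
junk_chars = [c for c in ' \t\nx' if difflib.IS_CHARACTER_JUNK(c)]
junk_lines = [s for s in ['\n', '  #  \n', 'hello\n'] if difflib.IS_LINE_JUNK(s)]
print(junk_chars)
print(junk_lines)
```

Note that the newline is not reported as a junk character, which matches the warning in the `charjunk` description.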


Member Function Documentation

def difflib.Differ._dump (   self,
  tag,
  x,
  lo,
  hi 
) [private]
Generate comparison results for a same-tagged range.

Definition at line 929 of file difflib.py.

00929 
00930     def _dump(self, tag, x, lo, hi):
00931         """Generate comparison results for a same-tagged range."""
00932         for i in range(lo, hi):
00933             yield '%s %s' % (tag, x[i])
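
The effect of `_dump` can be sketched outside the class. The function name `dump_range` and the sample lines are hypothetical; the body restates the listing above as a standalone generator.

```python
def dump_range(tag, lines, lo, hi):
    """Yield lines[lo:hi], each prefixed with the tag and one space."""
    for i in range(lo, hi):
        yield '%s %s' % (tag, lines[i])


sample = ['one\n', 'two\n', 'three\n']

# Mark the first two lines as deleted and the last one as unchanged.
deleted = list(dump_range('-', sample, 0, 2))
unchanged = list(dump_range(' ', sample, 2, 3))
print(deleted)
print(unchanged)
```

Every delta line therefore starts with a two-character code: the tag followed by a space.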

def difflib.Differ._fancy_helper (   self,
  a,
  alo,
  ahi,
  b,
  blo,
  bhi 
) [private]

Definition at line 1047 of file difflib.py.

01047 
01048     def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
01049         g = []
01050         if alo < ahi:
01051             if blo < bhi:
01052                 g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
01053             else:
01054                 g = self._dump('-', a, alo, ahi)
01055         elif blo < bhi:
01056             g = self._dump('+', b, blo, bhi)
01057 
01058         for line in g:
01059             yield line

def difflib.Differ._fancy_replace (   self,
  a,
  alo,
  ahi,
  b,
  blo,
  bhi 
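) [private]
When replacing one block of lines with another, search the blocks for *similar* lines; the best-matching pair (if any) is used as a synch point, and intraline difference marking is done on the similar pair. Lots of work, but often worth it.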

Definition at line 949 of file difflib.py.

00949 
00950     def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
00951         r"""
00952         When replacing one block of lines with another, search the blocks
00953         for *similar* lines; the best-matching pair (if any) is used as a
00954         synch point, and intraline difference marking is done on the
00955         similar pair. Lots of work, but often worth it.
00956 
00957         Example:
00958 
00959         >>> d = Differ()
00960         >>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
00961         ...                            ['abcdefGhijkl\n'], 0, 1)
00962         >>> print(''.join(results), end="")
00963         - abcDefghiJkl
00964         ?    ^  ^  ^
00965         + abcdefGhijkl
00966         ?    ^  ^  ^
00967         """
00968 
00969         # don't synch up unless the lines have a similarity score of at
00970         # least cutoff; best_ratio tracks the best score seen so far
00971         best_ratio, cutoff = 0.74, 0.75
00972         cruncher = SequenceMatcher(self.charjunk)
00973         eqi, eqj = None, None   # 1st indices of equal lines (if any)
00974 
00975         # search for the pair that matches best without being identical
00976         # (identical lines must be junk lines, & we don't want to synch up
00977         # on junk -- unless we have to)
00978         for j in range(blo, bhi):
00979             bj = b[j]
00980             cruncher.set_seq2(bj)
00981             for i in range(alo, ahi):
00982                 ai = a[i]
00983                 if ai == bj:
00984                     if eqi is None:
00985                         eqi, eqj = i, j
00986                     continue
00987                 cruncher.set_seq1(ai)
00988                 # computing similarity is expensive, so use the quick
00989                 # upper bounds first -- have seen this speed up messy
00990                 # compares by a factor of 3.
00991                 # note that ratio() is only expensive to compute the first
00992                 # time it's called on a sequence pair; the expensive part
00993                 # of the computation is cached by cruncher
00994                 if cruncher.real_quick_ratio() > best_ratio and \
00995                       cruncher.quick_ratio() > best_ratio and \
00996                       cruncher.ratio() > best_ratio:
00997                     best_ratio, best_i, best_j = cruncher.ratio(), i, j
00998         if best_ratio < cutoff:
00999             # no non-identical "pretty close" pair
01000             if eqi is None:
01001                 # no identical pair either -- treat it as a straight replace
01002                 for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
01003                     yield line
01004                 return
01005             # no close pair, but an identical pair -- synch up on that
01006             best_i, best_j, best_ratio = eqi, eqj, 1.0
01007         else:
01008             # there's a close pair, so forget the identical pair (if any)
01009             eqi = None
01010 
01011         # a[best_i] very similar to b[best_j]; eqi is None iff they're not
01012         # identical
01013 
01014         # pump out diffs from before the synch point
01015         for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
01016             yield line
01017 
01018         # do intraline marking on the synch pair
01019         aelt, belt = a[best_i], b[best_j]
01020         if eqi is None:
01021             # pump out a '-', '?', '+', '?' quad for the synched lines
01022             atags = btags = ""
01023             cruncher.set_seqs(aelt, belt)
01024             for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
01025                 la, lb = ai2 - ai1, bj2 - bj1
01026                 if tag == 'replace':
01027                     atags += '^' * la
01028                     btags += '^' * lb
01029                 elif tag == 'delete':
01030                     atags += '-' * la
01031                 elif tag == 'insert':
01032                     btags += '+' * lb
01033                 elif tag == 'equal':
01034                     atags += ' ' * la
01035                     btags += ' ' * lb
01036                 else:
01037                     raise ValueError('unknown tag %r' % (tag,))
01038             for line in self._qformat(aelt, belt, atags, btags):
01039                 yield line
01040         else:
01041             # the synch pair is identical
01042             yield '  ' + aelt
01043 
01044         # pump out diffs from after the synch point
01045         for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
01046             yield line
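
The search for a synch point in the listing above can be sketched as a standalone function. `best_similar_pair` is a hypothetical name, and the sketch covers only the pair search (the cheap upper bounds tried before the expensive ratio), not the output generation or the fallback to identical pairs.

```python
from difflib import SequenceMatcher


def best_similar_pair(a_lines, b_lines, cutoff=0.75):
    """Return (ratio, i, j) for the most similar non-identical pair, or None."""
    best_ratio, best_i, best_j = 0.74, None, None
    cruncher = SequenceMatcher()
    for j, bj in enumerate(b_lines):
        # SequenceMatcher caches details of seq2, so set it in the outer loop.
        cruncher.set_seq2(bj)
        for i, ai in enumerate(a_lines):
            if ai == bj:
                continue  # identical lines are not candidates here
            cruncher.set_seq1(ai)
            # Try the cheap upper bounds first; ratio() runs only if both pass.
            if (cruncher.real_quick_ratio() > best_ratio
                    and cruncher.quick_ratio() > best_ratio
                    and cruncher.ratio() > best_ratio):
                best_ratio, best_i, best_j = cruncher.ratio(), i, j
    if best_ratio < cutoff:
        return None
    return best_ratio, best_i, best_j


close = best_similar_pair(['abcDefghiJkl\n'], ['abcdefGhijkl\n'])
far = best_similar_pair(['aaa\n'], ['xyz\n'])
print(close)
print(far)
```

For the dissimilar pair, `quick_ratio()` already falls below the threshold, so the full `ratio()` computation is never reached.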

def difflib.Differ._plain_replace (   self,
  a,
  alo,
  ahi,
  b,
  blo,
  bhi 
) [private]

Definition at line 934 of file difflib.py.

00934 
00935     def _plain_replace(self, a, alo, ahi, b, blo, bhi):
00936         assert alo < ahi and blo < bhi
00937         # dump the shorter block first -- reduces the burden on short-term
00938         # memory if the blocks are of very different sizes
00939         if bhi - blo < ahi - alo:
00940             first  = self._dump('+', b, blo, bhi)
00941             second = self._dump('-', a, alo, ahi)
00942         else:
00943             first  = self._dump('-', a, alo, ahi)
00944             second = self._dump('+', b, blo, bhi)
00945 
00946         for g in first, second:
00947             for line in g:
00948                 yield line
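
The ordering rule in `_plain_replace` (shorter block first) can be observed through the public `compare` method. This is a sketch under the assumption that the blocks contain no similar or identical lines, so the replace falls through to the plain path; the sample lines and variable names are illustrative.

```python
import difflib

one_line = ['aaa\n']
three_lines = ['xyz\n', '123\n', 'qrs\n']

differ = difflib.Differ()

# One old line replaced by three new lines: the '-' block is not longer
# than the '+' block, so it is dumped first.
old_is_short = list(differ.compare(one_line, three_lines))

# Three old lines replaced by one new line: the '+' block is shorter,
# so it is dumped first.
old_is_long = list(differ.compare(three_lines, one_line))

print(''.join(old_is_short), end='')
print(''.join(old_is_long), end='')
```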

def difflib.Differ._qformat (   self,
  aline,
  bline,
  atags,
  btags 
) [private]
Format "?" output and deal with leading tabs.

Definition at line 1060 of file difflib.py.

01060 
01061     def _qformat(self, aline, bline, atags, btags):
01062         r"""
01063         Format "?" output and deal with leading tabs.
01064 
01065         Example:
01066 
01067         >>> d = Differ()
01068         >>> results = d._qformat('\tabcDefghiJkl\n', '\tabcdefGhijkl\n',
01069         ...                      '  ^ ^  ^      ', '  ^ ^  ^      ')
01070         >>> for line in results: print(repr(line))
01071         ...
01072         '- \tabcDefghiJkl\n'
01073         '? \t ^ ^  ^\n'
01074         '+ \tabcdefGhijkl\n'
01075         '? \t ^ ^  ^\n'
01076         """
01077 
01078         # Can hurt, but will probably help most of the time.
01079         common = min(_count_leading(aline, "\t"),
01080                      _count_leading(bline, "\t"))
01081         common = min(common, _count_leading(atags[:common], " "))
01082         common = min(common, _count_leading(btags[:common], " "))
01083         atags = atags[common:].rstrip()
01084         btags = btags[common:].rstrip()
01085 
01086         yield "- " + aline
01087         if atags:
01088             yield "? %s%s\n" % ("\t" * common, atags)
01089 
01090         yield "+ " + bline
01091         if btags:
01092             yield "? %s%s\n" % ("\t" * common, btags)
01093 
01094 # With respect to junk, an earlier version of ndiff simply refused to
01095 # *start* a match with a junk element.  The result was cases like this:
01096 #     before: private Thread currentThread;
01097 #     after:  private volatile Thread currentThread;
01098 # If you consider whitespace to be junk, the longest contiguous match
01099 # not starting with junk is "e Thread currentThread".  So ndiff reported
01100 # that "e volatil" was inserted between the 't' and the 'e' in "private".
01101 # While an accurate view, to people that's absurd.  The current version
01102 # looks for matching blocks that are entirely junk-free, then extends the
01103 # longest one of those as far as possible but only with matching junk.
01104 # So now "currentThread" is matched, then extended to suck up the
01105 # preceding blank; then "private" is matched, and extended to suck up the
01106 # following blank; then "Thread" is matched; and finally ndiff reports
01107 # that "volatile " was inserted before "Thread".  The only quibble
01108 # remaining is that perhaps it was really the case that " volatile"
01109 # was inserted after "private".  I can live with that <wink>.
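
The junk handling described in the closing comment can be reproduced with SequenceMatcher directly. A minimal sketch, assuming blanks are the only junk characters; the variable names are illustrative.

```python
from difflib import SequenceMatcher

before = "private Thread currentThread;"
after = "private volatile Thread currentThread;"

# Treat the blank as junk, as in the comment above.
matcher = SequenceMatcher(lambda ch: ch == " ", before, after)

# Collect the text of every 'insert' opcode.
inserted = [after[j1:j2]
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag == "insert"]
print(inserted)  # ['volatile ']
```

The single insertion is reported as "volatile " before "Thread", rather than as a fragment spliced into the middle of "private".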

def difflib.Differ.compare (   self,
  a,
  b 
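)
Compare two sequences of lines; generate the resulting delta.

Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the `readlines()` method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the `writelines()` method of a file-like object.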

Definition at line 887 of file difflib.py.

00887 
00888     def compare(self, a, b):
00889         r"""
00890         Compare two sequences of lines; generate the resulting delta.
00891 
00892         Each sequence must contain individual single-line strings ending with
00893         newlines. Such sequences can be obtained from the `readlines()` method
00894         of file-like objects.  The delta generated also consists of newline-
00895         terminated strings, ready to be printed as-is via the writelines()
00896         method of a file-like object.
00897 
00898         Example:
00899 
00900         >>> print(''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
00901         ...                                'ore\ntree\nemu\n'.splitlines(1))),
00902         ...       end="")
00903         - one
00904         ?  ^
00905         + ore
00906         ?  ^
00907         - two
00908         - three
00909         ?  -
00910         + tree
00911         + emu
00912         """
00913 
00914         cruncher = SequenceMatcher(self.linejunk, a, b)
00915         for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
00916             if tag == 'replace':
00917                 g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
00918             elif tag == 'delete':
00919                 g = self._dump('-', a, alo, ahi)
00920             elif tag == 'insert':
00921                 g = self._dump('+', b, blo, bhi)
00922             elif tag == 'equal':
00923                 g = self._dump(' ', a, alo, ahi)
00924             else:
00925                 raise ValueError('unknown tag %r' % (tag,))
00926 
00927             for line in g:
00928                 yield line
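
A short usage sketch for `compare`, reusing the inputs from the docstring example. It writes the delta with `writelines()` and then separates removed from added lines by their two-character prefix; the variable names are illustrative.

```python
import difflib
import sys

old = 'one\ntwo\nthree\n'.splitlines(keepends=True)
new = 'ore\ntree\nemu\n'.splitlines(keepends=True)

# compare() is a generator; materialize it so it can be reused.
delta = list(difflib.Differ().compare(old, new))

# Delta lines are newline-terminated, so they can be written as-is.
sys.stdout.writelines(delta)

# Each delta line starts with '- ', '+ ', '  ' or '? '.
removed = [line[2:] for line in delta if line.startswith('- ')]
added = [line[2:] for line in delta if line.startswith('+ ')]
hints = [line for line in delta if line.startswith('? ')]
```

Lines starting with `? ` are intraline hints and appear in neither input sequence, so they should be skipped when reconstructing either side.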


Member Data Documentation

difflib.Differ.charjunk

Definition at line 885 of file difflib.py.

difflib.Differ.linejunk

Definition at line 884 of file difflib.py.


The documentation for this class was generated from the following file: