moin  1.9.0~rc2
MoinMoin.support.difflib.Differ Class Reference

Public Member Functions

def __init__
def compare

Public Attributes

 linejunk
 charjunk

Private Member Functions

def _dump
def _plain_replace
def _fancy_replace
def _fancy_helper
def _qformat

Detailed Description

Definition at line 770 of file difflib.py.


Constructor & Destructor Documentation

def MoinMoin.support.difflib.Differ.__init__(self, linejunk = None, charjunk = None)
Construct a text differencer, with optional filters.

The two optional keyword parameters are for filter functions:

- `linejunk`: A function that should accept a single string argument,
  and return true iff the string is junk. The module-level function
  `IS_LINE_JUNK` may be used to filter out lines without visible
  characters, except for at most one splat ('#').  It is recommended
  to leave linejunk None; as of Python 2.3, the underlying
  SequenceMatcher class has grown an adaptive notion of "noise" lines
  that's better than any static definition the author has ever been
  able to craft.

- `charjunk`: A function that should accept a string of length 1. The
  module-level function `IS_CHARACTER_JUNK` may be used to filter out
  whitespace characters (a blank or tab; **note**: bad idea to include
  newline in this!).  Use of IS_CHARACTER_JUNK is recommended.

Definition at line 864 of file difflib.py.

    def __init__(self, linejunk=None, charjunk=None):
        """
        Construct a text differencer, with optional filters.

        The two optional keyword parameters are for filter functions:

        - `linejunk`: A function that should accept a single string argument,
          and return true iff the string is junk. The module-level function
          `IS_LINE_JUNK` may be used to filter out lines without visible
          characters, except for at most one splat ('#').  It is recommended
          to leave linejunk None; as of Python 2.3, the underlying
          SequenceMatcher class has grown an adaptive notion of "noise" lines
          that's better than any static definition the author has ever been
          able to craft.

        - `charjunk`: A function that should accept a string of length 1. The
          module-level function `IS_CHARACTER_JUNK` may be used to filter out
          whitespace characters (a blank or tab; **note**: bad idea to include
          newline in this!).  Use of IS_CHARACTER_JUNK is recommended.
        """

        self.linejunk = linejunk
        self.charjunk = charjunk
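
As a rough illustration of the two filter parameters, the sketch below constructs a differencer with the recommended settings. It assumes that the standard-library `difflib` module exposes the same interface as the copy bundled in `MoinMoin.support.difflib`, and it uses Python 3 syntax, whereas the listing above is Python 2.

```python
import difflib

# Assumption: the standard-library difflib stands in for
# MoinMoin.support.difflib here.
# Leave linejunk as None (recommended) and treat blanks and tabs as
# character junk.
differ = difflib.Differ(linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK)

# The constructor only stores the two filters as attributes.
print(differ.linejunk)
print(differ.charjunk is difflib.IS_CHARACTER_JUNK)

# IS_CHARACTER_JUNK accepts a blank or a tab, but not a newline.
for ch in (" ", "\t", "\n"):
    print(repr(ch), differ.charjunk(ch))
```

No filtering happens at construction time; the filters are only consulted later, when `compare` builds its `SequenceMatcher` objects.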


Member Function Documentation

def MoinMoin.support.difflib.Differ._dump(self, tag, x, lo, hi) [private]
Generate comparison results for a same-tagged range.

Definition at line 929 of file difflib.py.

    def _dump(self, tag, x, lo, hi):
        """Generate comparison results for a same-tagged range."""
        for i in xrange(lo, hi):
            yield '%s %s' % (tag, x[i])
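
The behaviour of `_dump` can be sketched as a standalone generator. The function name `dump` and the sample data are illustrative only, and the sketch uses Python 3's `range` in place of `xrange`.

```python
def dump(tag, x, lo, hi):
    """Prefix each line of x[lo:hi] with the tag and a space."""
    for i in range(lo, hi):
        yield "%s %s" % (tag, x[i])


# Hypothetical sample input: three newline-terminated lines.
lines = ["alpha\n", "beta\n", "gamma\n"]

# Mark the last two lines as deleted.
deleted = list(dump("-", lines, 1, 3))
print(deleted)
```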

def MoinMoin.support.difflib.Differ._fancy_helper(self, a, alo, ahi, b, blo, bhi) [private]

Definition at line 1047 of file difflib.py.

    def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
        g = []
        if alo < ahi:
            if blo < bhi:
                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
            else:
                g = self._dump('-', a, alo, ahi)
        elif blo < bhi:
            g = self._dump('+', b, blo, bhi)

        for line in g:
            yield line
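
The dispatch in `_fancy_helper` depends only on which of the two ranges is non-empty. The sketch below isolates that decision; `pick_action` is a hypothetical helper that returns a label instead of calling the real methods.

```python
def pick_action(alo, ahi, blo, bhi):
    """Name the branch _fancy_helper would take for the given ranges.

    Hypothetical helper for illustration; it is not part of difflib.
    """
    if alo < ahi:
        if blo < bhi:
            return "fancy_replace"  # both ranges non-empty
        return "dump -"             # only a has lines: deletions
    elif blo < bhi:
        return "dump +"             # only b has lines: insertions
    return "nothing"                # both ranges empty


for ranges in [(0, 2, 0, 1), (0, 2, 3, 3), (1, 1, 0, 4), (5, 5, 5, 5)]:
    print(ranges, pick_action(*ranges))
```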

def MoinMoin.support.difflib.Differ._fancy_replace(self, a, alo, ahi, b, blo, bhi) [private]

Definition at line 949 of file difflib.py.

    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        r"""
        When replacing one block of lines with another, search the blocks
        for *similar* lines; the best-matching pair (if any) is used as a
        synch point, and intraline difference marking is done on the
        similar pair. Lots of work, but often worth it.

        Example:

        >>> d = Differ()
        >>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
        ...                            ['abcdefGhijkl\n'], 0, 1)
        >>> print ''.join(results),
        - abcDefghiJkl
        ?    ^  ^  ^
        + abcdefGhijkl
        ?    ^  ^  ^
        """

        # don't synch up unless the lines have a similarity score of at
        # least cutoff; best_ratio tracks the best score seen so far
        best_ratio, cutoff = 0.74, 0.75
        cruncher = SequenceMatcher(self.charjunk)
        eqi, eqj = None, None   # 1st indices of equal lines (if any)

        # search for the pair that matches best without being identical
        # (identical lines must be junk lines, & we don't want to synch up
        # on junk -- unless we have to)
        for j in xrange(blo, bhi):
            bj = b[j]
            cruncher.set_seq2(bj)
            for i in xrange(alo, ahi):
                ai = a[i]
                if ai == bj:
                    if eqi is None:
                        eqi, eqj = i, j
                    continue
                cruncher.set_seq1(ai)
                # computing similarity is expensive, so use the quick
                # upper bounds first -- have seen this speed up messy
                # compares by a factor of 3.
                # note that ratio() is only expensive to compute the first
                # time it's called on a sequence pair; the expensive part
                # of the computation is cached by cruncher
                if cruncher.real_quick_ratio() > best_ratio and \
                      cruncher.quick_ratio() > best_ratio and \
                      cruncher.ratio() > best_ratio:
                    best_ratio, best_i, best_j = cruncher.ratio(), i, j
        if best_ratio < cutoff:
            # no non-identical "pretty close" pair
            if eqi is None:
                # no identical pair either -- treat it as a straight replace
                for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
                    yield line
                return
            # no close pair, but an identical pair -- synch up on that
            best_i, best_j, best_ratio = eqi, eqj, 1.0
        else:
            # there's a close pair, so forget the identical pair (if any)
            eqi = None

        # a[best_i] very similar to b[best_j]; eqi is None iff they're not
        # identical

        # pump out diffs from before the synch point
        for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
            yield line

        # do intraline marking on the synch pair
        aelt, belt = a[best_i], b[best_j]
        if eqi is None:
            # pump out a '-', '?', '+', '?' quad for the synched lines
            atags = btags = ""
            cruncher.set_seqs(aelt, belt)
            for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
                la, lb = ai2 - ai1, bj2 - bj1
                if tag == 'replace':
                    atags += '^' * la
                    btags += '^' * lb
                elif tag == 'delete':
                    atags += '-' * la
                elif tag == 'insert':
                    btags += '+' * lb
                elif tag == 'equal':
                    atags += ' ' * la
                    btags += ' ' * lb
                else:
                    raise ValueError, 'unknown tag %r' % (tag,)
            for line in self._qformat(aelt, belt, atags, btags):
                yield line
        else:
            # the synch pair is identical
            yield '  ' + aelt

        # pump out diffs from after the synch point
        for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
            yield line
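
Two steps of `_fancy_replace` can be tried in isolation: the tiered similarity test that selects a synch pair, and the construction of the intraline tag strings from the opcodes. The sketch below is a simplified approximation in Python 3, assuming the standard-library `SequenceMatcher` behaves like the bundled one. The helper names `is_close` and `intraline_tags` are hypothetical, and the sketch compares directly against the cutoff rather than tracking a running best ratio.

```python
from difflib import SequenceMatcher


def is_close(aline, bline, cutoff=0.75):
    """Check similarity, trying the cheap upper bounds before ratio()."""
    m = SequenceMatcher(None, aline, bline)
    return (m.real_quick_ratio() >= cutoff
            and m.quick_ratio() >= cutoff
            and m.ratio() >= cutoff)


def intraline_tags(aline, bline):
    """Build the marker strings that end up on the '?' lines."""
    atags = btags = ""
    m = SequenceMatcher(None, aline, bline)
    for tag, ai1, ai2, bj1, bj2 in m.get_opcodes():
        la, lb = ai2 - ai1, bj2 - bj1
        if tag == "replace":
            atags += "^" * la
            btags += "^" * lb
        elif tag == "delete":
            atags += "-" * la
        elif tag == "insert":
            btags += "+" * lb
        elif tag == "equal":
            atags += " " * la
            btags += " " * lb
        else:
            raise ValueError("unknown tag %r" % (tag,))
    return atags, btags


# The pair from the docstring example above.
aline, bline = "abcDefghiJkl\n", "abcdefGhijkl\n"
print(is_close(aline, bline))
atags, btags = intraline_tags(aline, bline)
print(repr(atags.rstrip()))
print(repr(btags.rstrip()))
```

The cheap bounds matter because `ratio()` is the expensive call; a pair that fails `real_quick_ratio()` or `quick_ratio()` can be rejected without computing it.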

def MoinMoin.support.difflib.Differ._plain_replace(self, a, alo, ahi, b, blo, bhi) [private]

Definition at line 934 of file difflib.py.

    def _plain_replace(self, a, alo, ahi, b, blo, bhi):
        assert alo < ahi and blo < bhi
        # dump the shorter block first -- reduces the burden on short-term
        # memory if the blocks are of very different sizes
        if bhi - blo < ahi - alo:
            first  = self._dump('+', b, blo, bhi)
            second = self._dump('-', a, alo, ahi)
        else:
            first  = self._dump('-', a, alo, ahi)
            second = self._dump('+', b, blo, bhi)

        for g in first, second:
            for line in g:
                yield line
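
The "shorter block first" ordering can be seen with a small standalone version. This is a Python 3 sketch: `plain_replace` and its nested `dump` are illustrative reimplementations rather than the class methods themselves, and the sample lines are made up.

```python
def plain_replace(a, alo, ahi, b, blo, bhi):
    """Emit both blocks in full, shorter block first."""
    assert alo < ahi and blo < bhi

    def dump(tag, x, lo, hi):
        for i in range(lo, hi):
            yield "%s %s" % (tag, x[i])

    if bhi - blo < ahi - alo:
        # b's block is shorter, so the insertions come first
        first, second = dump("+", b, blo, bhi), dump("-", a, alo, ahi)
    else:
        first, second = dump("-", a, alo, ahi), dump("+", b, blo, bhi)

    for g in (first, second):
        yield from g


# Hypothetical sample data: two old lines replaced by one new line.
old = ["x\n", "y\n"]
new = ["z\n"]
shorter_first = list(plain_replace(old, 0, 2, new, 0, 1))
print(shorter_first)
```

When both blocks have the same length, the deletions from `a` are emitted before the insertions from `b`.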

def MoinMoin.support.difflib.Differ._qformat(self, aline, bline, atags, btags) [private]

Definition at line 1060 of file difflib.py.

    def _qformat(self, aline, bline, atags, btags):
        r"""
        Format "?" output and deal with leading tabs.

        Example:

        >>> d = Differ()
        >>> results = d._qformat('\tabcDefghiJkl\n', '\t\tabcdefGhijkl\n',
        ...                      '  ^ ^  ^      ', '+  ^ ^  ^      ')
        >>> for line in results: print repr(line)
        ...
        '- \tabcDefghiJkl\n'
        '? \t ^ ^  ^\n'
        '+ \t\tabcdefGhijkl\n'
        '? \t  ^ ^  ^\n'
        """

        # Can hurt, but will probably help most of the time.
        common = min(_count_leading(aline, "\t"),
                     _count_leading(bline, "\t"))
        common = min(common, _count_leading(atags[:common], " "))
        atags = atags[common:].rstrip()
        btags = btags[common:].rstrip()

        yield "- " + aline
        if atags:
            yield "? %s%s\n" % ("\t" * common, atags)

        yield "+ " + bline
        if btags:
            yield "? %s%s\n" % ("\t" * common, btags)

# With respect to junk, an earlier version of ndiff simply refused to
# *start* a match with a junk element.  The result was cases like this:
#     before: private Thread currentThread;
#     after:  private volatile Thread currentThread;
# If you consider whitespace to be junk, the longest contiguous match
# not starting with junk is "e Thread currentThread".  So ndiff reported
# that "e volatil" was inserted between the 't' and the 'e' in "private".
# While an accurate view, to people that's absurd.  The current version
# looks for matching blocks that are entirely junk-free, then extends the
# longest one of those as far as possible but only with matching junk.
# So now "currentThread" is matched, then extended to suck up the
# preceding blank; then "private" is matched, and extended to suck up the
# following blank; then "Thread" is matched; and finally ndiff reports
# that "volatile " was inserted before "Thread".  The only quibble
# remaining is that perhaps it was really the case that " volatile"
# was inserted after "private".  I can live with that <wink>.
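
The leading-tab handling in `_qformat` can be reproduced with a standalone Python 3 sketch. Here `count_leading` is a hypothetical stand-in for the module-level `_count_leading` helper, whose source is not shown on this page; it is assumed to count how many times a character is repeated at the start of a string.

```python
def count_leading(line, ch):
    """Count consecutive occurrences of ch at the start of line.

    Assumed behaviour of the module's _count_leading helper.
    """
    i, n = 0, len(line)
    while i < n and line[i] == ch:
        i += 1
    return i


def qformat(aline, bline, atags, btags):
    """Yield the '-', '?', '+', '?' lines, keeping shared leading tabs."""
    common = min(count_leading(aline, "\t"), count_leading(bline, "\t"))
    common = min(common, count_leading(atags[:common], " "))
    atags = atags[common:].rstrip()
    btags = btags[common:].rstrip()

    yield "- " + aline
    if atags:
        yield "? %s%s\n" % ("\t" * common, atags)

    yield "+ " + bline
    if btags:
        yield "? %s%s\n" % ("\t" * common, btags)


# Inputs taken from the docstring example above.
result = list(qformat("\tabcDefghiJkl\n", "\t\tabcdefGhijkl\n",
                      "  ^ ^  ^      ", "+  ^ ^  ^      "))
for line in result:
    print(repr(line))
```

Replacing the first `common` tag characters with real tabs keeps the markers aligned under the text when the output is displayed with tab stops.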

def MoinMoin.support.difflib.Differ.compare(self, a, b)

Definition at line 888 of file difflib.py.

    def compare(self, a, b):
        r"""
        Compare two sequences of lines; generate the resulting delta.

        Each sequence must contain individual single-line strings ending with
        newlines. Such sequences can be obtained from the `readlines()` method
        of file-like objects.  The delta generated also consists of newline-
        terminated strings, ready to be printed as-is via the writelines()
        method of a file-like object.

        Example:

        >>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
        ...                                'ore\ntree\nemu\n'.splitlines(1))),
        - one
        ?  ^
        + ore
        ?  ^
        - two
        - three
        ?  -
        + tree
        + emu
        """

        cruncher = SequenceMatcher(self.linejunk, a, b)
        for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
            if tag == 'replace':
                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
            elif tag == 'delete':
                g = self._dump('-', a, alo, ahi)
            elif tag == 'insert':
                g = self._dump('+', b, blo, bhi)
            elif tag == 'equal':
                g = self._dump(' ', a, alo, ahi)
            else:
                raise ValueError, 'unknown tag %r' % (tag,)

            for line in g:
                yield line
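
A minimal usage sketch of `compare` in Python 3 follows. It assumes the standard-library `difflib` can stand in for `MoinMoin.support.difflib`, and it uses `difflib.restore`, a module-level function that is not part of this class, to check that both inputs can be recovered from the delta.

```python
import difflib
import sys

# Inputs from the docstring example above, split so that each line
# keeps its trailing newline.
a = "one\ntwo\nthree\n".splitlines(True)
b = "ore\ntree\nemu\n".splitlines(True)

# compare() is a generator; materialise it so it can be reused.
delta = list(difflib.Differ().compare(a, b))

# The delta lines are newline-terminated, so writelines() prints them as-is.
sys.stdout.writelines(delta)

# Both original sequences can be reconstructed from the delta.
recovered_a = list(difflib.restore(delta, 1))
recovered_b = list(difflib.restore(delta, 2))
```

Each delta line starts with a two-character code: `- ` for lines only in `a`, `+ ` for lines only in `b`, two spaces for lines common to both, and `? ` for intraline hint lines that appear in neither input.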


Member Data Documentation

Definition at line 886 of file difflib.py.

Definition at line 885 of file difflib.py.


The documentation for this class was generated from the following file: