python-biopython  1.60
Bio.triefind Namespace Reference

Functions

def match
def match_all
def find
def find_words

Variables

 DEFAULT_BOUNDARY_CHARS = string.punctuation+string.whitespace

Detailed Description

Given a trie, find all occurrences in a string of any word stored in the trie.

Like searching a string for a substring, except that the substring is
any word in a trie.

Functions:
match         Find longest key in a trie matching the beginning of the string.
match_all     Find all keys in a trie matching the beginning of the string.
find          Find keys in a trie matching anywhere in a string.
find_words    Find keys in a trie matching whole words in a string.

Function Documentation

def Bio.triefind.find(string, trie)
find(string, trie) -> list of tuples (key, start, end)

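Find all the keys in the trie that match anywhere in the string.
Each result is a tuple (key, start, end) such that string[start:end] == key.
Every start position is tried, so overlapping matches are all reported.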

Definition at line 49 of file triefind.py.

def find(string, trie):
    """find(string, trie) -> list of tuples (key, start, end)

    Find all the keys in the trie that match anywhere in the string.

    """
    results = []
    start = 0     # index to start the search
    while start < len(string):
        # Look for a match.
        keys = match_all(string[start:], trie)
        for key in keys:
            results.append((key, start, start+len(key)))
        start += 1
    return results
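The following is a minimal sketch of how find reports overlapping hits. It assumes only that the trie object provides has_prefix and has_key, as the listings on this page do. SetTrie is a hypothetical stand-in written for this example and is not part of Biopython; match_all and find are re-declared from the listings so the snippet runs on its own.

```python
class SetTrie:
    """Hypothetical stand-in exposing the two methods triefind relies on."""

    def __init__(self, keys):
        self._keys = set(keys)

    def has_key(self, s):
        # True if s is exactly one of the stored keys.
        return s in self._keys

    def has_prefix(self, s):
        # True if at least one stored key starts with s.
        return any(k.startswith(s) for k in self._keys)


def match_all(string, trie):
    # Collect every key that is a prefix of the string.
    matches = []
    for i in range(len(string)):
        substr = string[:i + 1]
        if not trie.has_prefix(substr):
            break
        if trie.has_key(substr):
            matches.append(substr)
    return matches


def find(string, trie):
    # Try match_all at every start position in the string.
    results = []
    for start in range(len(string)):
        for key in match_all(string[start:], trie):
            results.append((key, start, start + len(key)))
    return results


trie = SetTrie(["she", "he", "hers"])
hits = find("ushers", trie)
print(hits)  # [('she', 1, 4), ('he', 2, 4), ('hers', 2, 6)]
```

Because each start position costs a slice plus a scan, this approach is quadratic in the worst case; it favors simplicity over speed.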

def Bio.triefind.find_words(string, trie)
find_words(string, trie) -> list of tuples (key, start, end)

Find all the keys in the trie that match full words in the string.
Word boundaries are defined as any punctuation or whitespace.

Definition at line 67 of file triefind.py.

def find_words(string, trie):
    """find_words(string, trie) -> list of tuples (key, start, end)

    Find all the keys in the trie that match full words in the string.
    Word boundaries are defined as any punctuation or whitespace.

    """
    _boundary_re = re.compile(r"[%s]+" % re.escape(DEFAULT_BOUNDARY_CHARS))

    results = []
    start = 0     # index of word boundary
    while start < len(string):
        # Look for a match.
        keys = match_all(string[start:], trie)
        for key in keys:
            length = len(key)
            # Make sure it ends at a boundary.
            if start+length == len(string) or \
               _boundary_re.match(string[start+length]):
                results.append((key, start, start+length))
        # Move forward to the next boundary.
        m = _boundary_re.search(string, start)
        if m is None:
            break
        start = m.end()
    return results
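As a rough illustration of the whole-word rule, the sketch below shows find_words rejecting keys that end in the middle of a word. SetTrie is a hypothetical stand-in for a real trie, not a Biopython class. The functions are re-declared from the listings on this page, with the string parameter renamed to text (an assumption made here so it does not shadow the string module).

```python
import re
import string

DEFAULT_BOUNDARY_CHARS = string.punctuation + string.whitespace


class SetTrie:
    """Hypothetical stand-in exposing has_key and has_prefix."""

    def __init__(self, keys):
        self._keys = set(keys)

    def has_key(self, s):
        return s in self._keys

    def has_prefix(self, s):
        return any(k.startswith(s) for k in self._keys)


def match_all(text, trie):
    # Collect every key that is a prefix of text.
    matches = []
    for i in range(len(text)):
        substr = text[:i + 1]
        if not trie.has_prefix(substr):
            break
        if trie.has_key(substr):
            matches.append(substr)
    return matches


def find_words(text, trie):
    boundary_re = re.compile(r"[%s]+" % re.escape(DEFAULT_BOUNDARY_CHARS))
    results = []
    start = 0  # index just after a word boundary (or the string start)
    while start < len(text):
        for key in match_all(text[start:], trie):
            end = start + len(key)
            # Keep the key only if it ends at a boundary or at the end of text.
            if end == len(text) or boundary_re.match(text[end]):
                results.append((key, start, end))
        # Jump to the character after the next run of boundary characters.
        m = boundary_re.search(text, start)
        if m is None:
            break
        start = m.end()
    return results


trie = SetTrie(["cat", "cats", "dog"])
words = find_words("cats, dog; catalog", trie)
print(words)  # [('cats', 0, 4), ('dog', 6, 9)]
```

Here "cat" is never reported: at offset 0 it is followed by "s", and at offset 11 it is followed by "a", so neither occurrence ends on a boundary.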

def Bio.triefind.match(string, trie)
match(string, trie) -> longest key or None

Find the longest key in the trie that matches the beginning of the
string.

Definition at line 17 of file triefind.py.

def match(string, trie):
    """match(string, trie) -> longest key or None

    Find the longest key in the trie that matches the beginning of the
    string.

    """
    longest = None
    for i in range(len(string)):
        substr = string[:i+1]
        if not trie.has_prefix(substr):
            break
        if trie.has_key(substr):
            longest = substr
    return longest
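A minimal sketch of how match behaves, assuming a trie object that only needs to provide has_prefix and has_key. SetTrie is a hypothetical stand-in written for this example, not a Biopython class, and match is re-declared from the listing so the snippet is self-contained.

```python
class SetTrie:
    """Hypothetical stand-in exposing has_key and has_prefix."""

    def __init__(self, keys):
        self._keys = set(keys)

    def has_key(self, s):
        return s in self._keys

    def has_prefix(self, s):
        return any(k.startswith(s) for k in self._keys)


def match(string, trie):
    # Grow the candidate prefix one character at a time and remember
    # the last prefix that is itself a key.
    longest = None
    for i in range(len(string)):
        substr = string[:i + 1]
        if not trie.has_prefix(substr):
            break  # no key can start with this prefix, so stop early
        if trie.has_key(substr):
            longest = substr
    return longest


trie = SetTrie(["he", "hell", "hello", "world"])
hit = match("hello there", trie)
miss = match("xyz", trie)
print(hit, miss)  # hello None
```

The early break on has_prefix is what keeps the scan short: the loop stops as soon as no stored key can extend the current prefix.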

def Bio.triefind.match_all(string, trie)
match_all(string, trie) -> list of keys

Find all the keys in the trie that match the beginning of the
string. Keys are returned in order of increasing length.

Definition at line 33 of file triefind.py.

def match_all(string, trie):
    """match_all(string, trie) -> list of keys

    Find all the keys in the trie that match the beginning of the
    string.

    """
    matches = []
    for i in range(len(string)):
        substr = string[:i+1]
        if not trie.has_prefix(substr):
            break
        if trie.has_key(substr):
            matches.append(substr)
    return matches
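To contrast match_all with match, the sketch below returns every key that is a prefix of the input rather than only the longest one. SetTrie is a hypothetical stand-in for a real trie (not part of Biopython), and match_all is re-declared from the listing so the example runs on its own.

```python
class SetTrie:
    """Hypothetical stand-in exposing has_key and has_prefix."""

    def __init__(self, keys):
        self._keys = set(keys)

    def has_key(self, s):
        return s in self._keys

    def has_prefix(self, s):
        return any(k.startswith(s) for k in self._keys)


def match_all(string, trie):
    # Same scan as match, but every matching prefix is kept.
    matches = []
    for i in range(len(string)):
        substr = string[:i + 1]
        if not trie.has_prefix(substr):
            break
        if trie.has_key(substr):
            matches.append(substr)
    return matches


trie = SetTrie(["he", "hell", "hello", "world"])
prefix_keys = match_all("hello there", trie)
print(prefix_keys)  # ['he', 'hell', 'hello']
```

The result is ordered from shortest to longest because prefixes are tested in order of increasing length; an empty list means no key matched.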


Variable Documentation

Bio.triefind.DEFAULT_BOUNDARY_CHARS = string.punctuation+string.whitespace

Definition at line 65 of file triefind.py.
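As a small illustration of what this constant covers, the sketch below builds the same kind of boundary pattern that find_words compiles and uses it to split a sample sentence. The sample text and the variable names boundary_re and tokens are made up for this example.

```python
import re
import string

DEFAULT_BOUNDARY_CHARS = string.punctuation + string.whitespace

# One or more consecutive punctuation or whitespace characters form a boundary.
boundary_re = re.compile(r"[%s]+" % re.escape(DEFAULT_BOUNDARY_CHARS))

# Split on boundaries and drop the empty strings left at the edges.
tokens = [t for t in boundary_re.split("trie-based search, v1.60!") if t]
print(tokens)  # ['trie', 'based', 'search', 'v1', '60']
```

Note that hyphens and periods count as boundaries, so a key such as "trie-based" or "1.60" can never be matched as a single word when these default characters are used.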