
python-biopython  1.60
Bio.Nexus.Nexus Namespace Reference

Classes

class  NexusError
class  CharBuffer
class  StepMatrix
class  Commandline
class  Block
class  Nexus

Functions

def safename
def quotestrip
def get_start_end
def _sort_keys_by_values
def _make_unique
def _unique_label
def _seqmatrix2strmatrix
def _compact4nexus
def combine
def _kill_comments_and_break_lines
def _adjust_lines
def _replace_parenthesized_ambigs
def _get_command_lines

Variables

int INTERLEAVE = 70
list SPECIAL_COMMANDS
list KNOWN_NEXUS_BLOCKS = ['trees','data', 'characters', 'taxa', 'sets','codons']
string PUNCTUATION = '()[]{}/\,;:=*\'"`+-<>'
string MRBAYESSAFE = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_'
string WHITESPACE = ' \t\n'
list SPECIALCOMMENTS = ['&']
string CHARSET = 'chars'
string TAXSET = 'taxa'
string CODONPOSITIONS = 'codonpositions'
string DEFAULTNEXUS = '#NEXUS\nbegin data; dimensions ntax=0 nchar=0; format datatype=dna; end; '

Class Documentation

class Bio::Nexus::Nexus::NexusError

Definition at line 37 of file Nexus.py.


Function Documentation

def Bio.Nexus.Nexus._adjust_lines (   lines) [private]
Adjust linebreaks to match ';', strip leading/trailing whitespace.

list_of_commandlines=_adjust_lines(input_text)
Lines are adjusted so that no linebreaks occur within a commandline
(except within the matrix command line).

Definition at line 433 of file Nexus.py.

00433 
00434 def _adjust_lines(lines):
00435     """Adjust linebreaks to match ';', strip leading/trailing whitespace.
00436 
00437     list_of_commandlines=_adjust_lines(input_text)
00438     Lines are adjusted so that no linebreaks occur within a commandline 
00439     (except matrix command line)
00440     """
00441     formatted_lines=[] 
00442     for l in lines:
00443         #Convert line endings
00444         l=l.replace('\r\n','\n').replace('\r','\n').strip()
00445         if l.lower().startswith('matrix'):
00446             formatted_lines.append(l)
00447         else:
00448             l=l.replace('\n',' ')
00449             if l:
00450                 formatted_lines.append(l)
00451     return formatted_lines
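
The behaviour described above can be tried out with the following Python 3 sketch. The name adjust_lines and the sample input are assumptions made for illustration; they are not part of Bio.Nexus.

```python
def adjust_lines(lines):
    """Stand-alone sketch of the listing above (hypothetical copy)."""
    formatted_lines = []
    for line in lines:
        # Normalise Windows and old Mac line endings, then trim whitespace.
        line = line.replace('\r\n', '\n').replace('\r', '\n').strip()
        if line.lower().startswith('matrix'):
            # The matrix command keeps its internal line breaks.
            formatted_lines.append(line)
        else:
            line = line.replace('\n', ' ')
            if line:
                # Empty command lines are dropped.
                formatted_lines.append(line)
    return formatted_lines


commands = [
    '#NEXUS\r\nbegin data',
    '  dimensions ntax=2\n   nchar=4  ',
    '\nmatrix\nt1 ACGT\nt2 AC-T\n',
    '   ',
]
result = adjust_lines(commands)
```

Only the matrix entry keeps its newlines; the whitespace-only entry disappears from the result.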
    

def Bio.Nexus.Nexus._compact4nexus (   orig_list) [private]
Transform [1 2 3 5 6 7 8 12 15 18 20] (baseindex 0, used in the Nexus class)
into '2-4 6-9 13-19\\3 21' (baseindex 1, used in programs like Paup or MrBayes).

Definition at line 267 of file Nexus.py.

00267 
00268 def _compact4nexus(orig_list):
00269     """Transform [1 2 3 5 6 7 8 12 15 18 20] (baseindex 0, used in the Nexus class)
00270     into '2-4 6-9 13-19\\3 21' (baseindex 1, used in programs like Paup or MrBayes.).
00271     """
00272     
00273     if not orig_list:
00274         return ''
00275     orig_list=list(set(orig_list))
00276     orig_list.sort()
00277     shortlist=[]
00278     clist=orig_list[:]
00279     clist.append(clist[-1]+.5) # dummy value makes it easier 
00280     while len(clist)>1:
00281         step=1
00282         for i,x in enumerate(clist):
00283             if x==clist[0]+i*step:   # are we still in the right step?
00284                 continue
00285             elif i==1 and len(clist)>3 and clist[i+1]-x==x-clist[0]:
00286                 # second element, and possibly at least 3 elements to link,
00287                 # and the next one is in the right step 
00288                 step=x-clist[0]
00289             else:   # pattern broke, add all values before current position to new list
00290                 sub=clist[:i]
00291                 if len(sub)==1:
00292                     shortlist.append(str(sub[0]+1))
00293                 else:
00294                     if step==1:
00295                         shortlist.append('%d-%d' % (sub[0]+1,sub[-1]+1))
00296                     else:
00297                         shortlist.append('%d-%d\\%d' % (sub[0]+1,sub[-1]+1,step))
00298                 clist=clist[i:]
00299                 break
00300     return ' '.join(shortlist)
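
To make the compact notation concrete, the sketch below goes in the opposite direction and expands a range string back into 0-based indices. expand_nexus_ranges is a hypothetical helper, not part of Bio.Nexus, and it assumes only the three token shapes produced above (single position, first-last, first-last\step).

```python
def expand_nexus_ranges(text):
    """Expand a 1-based NEXUS range string into a sorted 0-based list.

    Hypothetical inverse of _compact4nexus, for illustration only.
    """
    positions = set()
    for token in text.split():
        if '-' in token:
            # 'first-last' or 'first-last\step'
            span, _, step = token.partition('\\')
            first, last = (int(x) for x in span.split('-'))
            positions.update(range(first - 1, last, int(step) if step else 1))
        else:
            positions.add(int(token) - 1)
    return sorted(positions)


indices = expand_nexus_ranges('2-4 6-9 13-19\\3 21')
```

With the docstring example as input, indices is the original list [1, 2, 3, 5, 6, 7, 8, 12, 15, 18, 20].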

def Bio.Nexus.Nexus._get_command_lines (   file_contents) [private]

Definition at line 1716 of file Nexus.py.

01716 
01717     def _get_command_lines(file_contents):
01718         lines=_kill_comments_and_break_lines(file_contents)
01719         commandlines=_adjust_lines(lines)
01720         return commandlines

def Bio.Nexus.Nexus._kill_comments_and_break_lines (   text) [private]
Delete []-delimited comments out of a file and break into lines separated by ';'.

stripped_text=_kill_comments_and_break_lines(text)
Nested and multiline comments are allowed. [ and ] symbols within single
or double quotes are ignored, and a newline ends a quote. Quotes have no effect
inside comments (so in [this character ']' ends a comment] the quoted ] ends the comment).
Special [&...] and [\...] comments remain untouched, if not inside a standard comment.
Quotes inside special [& and [\ comments are treated as normal characters,
and no nesting is allowed inside these special comments (like [&   [\   ]]).
';' is deleted from the end of each line.

NOTE: this function is very slow for large files, and obsolete when using the C extension cnexus.

Definition at line 372 of file Nexus.py.

00372 
00373 def _kill_comments_and_break_lines(text):
00374     """Delete []-delimited comments out of a file and break into lines separated by ';'.
00375     
00376     stripped_text=_kill_comments_and_break_lines(text):
00377     Nested and multiline comments are allowed. [ and ] symbols within single
00378     or double quotes are ignored, newline ends a quote, all symbols with quotes are
00379     treated the same (thus not quoting inside comments like [this character ']' ends a comment])
00380     Special [&...] and [\...] comments remain untouched, if not inside standard comment.
00381     Quotes inside special [& and [\ are treated as normal characters,
00382     but no nesting inside these special comments allowed (like [&   [\   ]]).
00383     ';' is deleted from end of line.
00384    
00385     NOTE: this function is very slow for large files, and obsolete when using C extension cnexus
00386     """
00387     contents=iter(text)
00388     newtext=[] 
00389     newline=[]
00390     quotelevel=''
00391     speciallevel=False
00392     commlevel=0
00393     #Parse with one character look ahead (for special comments)
00394     t2 = contents.next()
00395     while True:
00396         t = t2
00397         try:
00398             t2 = contents.next()
00399         except StopIteration:
00400             t2 = None
00401         if t is None:
00402             break
00403         if t==quotelevel and not (commlevel or speciallevel):            # matching quote ends quotation
00404             quotelevel=''
00405         elif not quotelevel and not (commlevel or speciallevel) and (t=='"' or t=="'"): # single or double quote starts quotation
00406             quotelevel=t
00407         elif not quotelevel and t=='[':                             # opening bracket outside a quote
00408             if t2 in SPECIALCOMMENTS and commlevel==0 and not speciallevel:
00409                 speciallevel=True
00410             else:
00411                 commlevel+=1
00412         elif not quotelevel and t==']':                             # closing bracket outside a quote
00413             if speciallevel:
00414                 speciallevel=False
00415             else:
00416                 commlevel-=1
00417                 if commlevel<0:
00418                     raise NexusError('Nexus formatting error: unmatched ]')
00419                 continue 
00420         if commlevel==0:                        # copy if we're not in comment
00421             if t==';' and not quotelevel:
00422                 newtext.append(''.join(newline))
00423                 newline=[]
00424             else:
00425                 newline.append(t)
00426     #level of comments should be 0 at the end of the file
00427     if newline:
00428         newtext.append('\n'.join(newline))
00429     if commlevel>0:
00430         raise NexusError('Nexus formatting error: unmatched [')
00431     return newtext
00432 
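
The nesting counter at the heart of the listing can be isolated in a much smaller Python 3 sketch. strip_nested_comments and split_commands are hypothetical names; unlike the function above, this simplified version ignores quotes, does not preserve special [&...] comments, and raises ValueError rather than NexusError.

```python
def strip_nested_comments(text):
    """Remove [ ... ] comments, honouring nesting (simplified sketch)."""
    depth = 0
    kept = []
    for char in text:
        if char == '[':
            depth += 1
        elif char == ']':
            depth -= 1
            if depth < 0:
                raise ValueError('unmatched ]')
        elif depth == 0:
            # Copy characters only while outside every comment.
            kept.append(char)
    if depth > 0:
        raise ValueError('unmatched [')
    return ''.join(kept)


def split_commands(text):
    """Split comment-free text at ';', dropping the ';' and empty parts."""
    parts = strip_nested_comments(text).split(';')
    return [part.strip() for part in parts if part.strip()]


commands = split_commands(
    'begin data; [outer [inner] comment] dimensions ntax=2; end;')
```

The nested comment is removed as a whole, leaving the three commands 'begin data', 'dimensions ntax=2' and 'end'.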

def Bio.Nexus.Nexus._make_unique (   l) [private]
Remove duplicate values from the list and return the pruned list in sorted order.

Definition at line 248 of file Nexus.py.

00248 
00249 def _make_unique(l):
00250     """Check that all values in list are unique and return a pruned and sorted list."""
00251     l=list(set(l))
00252     l.sort()
00253     return l

def Bio.Nexus.Nexus._replace_parenthesized_ambigs (   seq,
  rev_ambig_values 
) [private]
Replaces ambigs in xxx(ACG)xxx format by IUPAC ambiguity code.

Definition at line 452 of file Nexus.py.

00452 
00453 def _replace_parenthesized_ambigs(seq,rev_ambig_values):
00454     """Replaces ambigs in xxx(ACG)xxx format by IUPAC ambiguity code."""
00455 
00456     opening=seq.find('(')
00457     while opening>-1:
00458         closing=seq.find(')')
00459         if closing<0:
00460             raise NexusError('Missing closing parenthesis in: '+seq)
00461         elif closing<opening:
00462             raise NexusError('Missing opening parenthesis in: '+seq)
00463         ambig=[x for x in seq[opening+1:closing]]
00464         ambig.sort()
00465         ambig=''.join(ambig)
00466         ambig_code=rev_ambig_values[ambig.upper()]
00467         if ambig!=ambig.upper():
00468             ambig_code=ambig_code.lower()
00469         seq=seq[:opening]+ambig_code+seq[closing+1:]        
00470         opening=seq.find('(')
00471     return seq
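
A small worked example may help here. The sketch below restates the loop in Python 3 and applies it with a hand-written lookup table; REV_AMBIG is a hypothetical subset of the reverse IUPAC mapping, and ValueError stands in for NexusError so that the block runs on its own.

```python
# Hypothetical reverse lookup: sorted set of bases -> IUPAC code.
REV_AMBIG = {'AG': 'R', 'CT': 'Y', 'ACG': 'V', 'ACGT': 'N'}


def replace_parenthesized_ambigs(seq, rev_ambig_values):
    """Stand-alone sketch of the listing above."""
    opening = seq.find('(')
    while opening > -1:
        closing = seq.find(')')
        if closing < 0:
            raise ValueError('Missing closing parenthesis in: ' + seq)
        if closing < opening:
            raise ValueError('Missing opening parenthesis in: ' + seq)
        # Sort the bases so that (GA) and (AG) share one lookup key.
        ambig = ''.join(sorted(seq[opening + 1:closing]))
        code = rev_ambig_values[ambig.upper()]
        if ambig != ambig.upper():
            # Lower-case input yields a lower-case code.
            code = code.lower()
        seq = seq[:opening] + code + seq[closing + 1:]
        opening = seq.find('(')
    return seq


resolved = replace_parenthesized_ambigs('AC(GA)TT(tc)G', REV_AMBIG)
```

Here (GA) becomes R and the lower-case (tc) becomes y, so resolved is 'ACRTTyG'.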

def Bio.Nexus.Nexus._seqmatrix2strmatrix (   matrix) [private]
Converts a Seq-object matrix to a plain sequence-string matrix.

Definition at line 263 of file Nexus.py.

00263 
00264 def _seqmatrix2strmatrix(matrix):
00265     """Converts a Seq-object matrix to a plain sequence-string matrix."""
00266     return dict([(t,matrix[t].tostring()) for t in matrix])

def Bio.Nexus.Nexus._sort_keys_by_values (   p) [private]
Returns the keys of p sorted by their values in p; keys with false values are omitted.

Definition at line 241 of file Nexus.py.

00241 
00242 def _sort_keys_by_values(p):
00243     """Returns a sorted list of keys of p sorted by values of p."""     
00244     startpos=[(p[pn],pn) for pn in p if p[pn]]
00245     startpos.sort()
00246     # parenthesis added because of py3k
00247     return (zip(*startpos))[1]
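
Under Python 3, zip returns an iterator, so indexing its result as in the listing raises a TypeError. A possible Python 3 equivalent is sketched below; sort_keys_by_values is a hypothetical rewrite that returns a list instead of a tuple, and an empty list when no key qualifies.

```python
def sort_keys_by_values(p):
    """Return the keys of p ordered by value, skipping falsy values.

    Hypothetical Python 3 rewrite of the listing above.
    """
    # Sorting (value, key) pairs breaks ties between equal values by key.
    pairs = sorted((value, key) for key, value in p.items() if value)
    return [key for value, key in pairs]


order = sort_keys_by_values({'b': 3, 'a': 1, 'c': 2, 'skipped': 0})
```

The key 'skipped' is left out because its value is 0, which gives the order ['a', 'c', 'b'].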
    

def Bio.Nexus.Nexus._unique_label (   previous_labels,
  label 
) [private]
Returns a unique name if label is already in previous_labels.

Definition at line 254 of file Nexus.py.

00254 
00255 def _unique_label(previous_labels,label):
00256     """Returns a unique name if label is already in previous_labels."""
00257     while label in previous_labels:
00258         if label.split('.')[-1].startswith('copy'):
00259             label='.'.join(label.split('.')[:-1])+'.copy'+str(eval('0'+label.split('.')[-1][4:])+1)
00260         else:
00261             label+='.copy'
00262     return label
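
The listing parses the copy counter with eval('0'+digits), which Python 2 reads as an octal literal and Python 3 rejects once digits follow the leading zero. The sketch below shows the same renaming scheme with int() instead; unique_label is a hypothetical rewrite, not the library function.

```python
def unique_label(previous_labels, label):
    """Return label, or a '.copy'/'.copyN' variant not in previous_labels.

    Hypothetical rewrite that parses the counter with int() instead of eval().
    """
    while label in previous_labels:
        stem, _, suffix = label.rpartition('.')
        if suffix.startswith('copy'):
            # 'copy' counts as 0, 'copy3' as 3; then move to the next number.
            count = int(suffix[4:] or 0) + 1
            label = '%s.copy%d' % (stem, count)
        else:
            label += '.copy'
    return label


taken = ['t1', 't1.copy', 't1.copy1']
fresh = unique_label(taken, 't1')
```

Starting from 't1', the loop tries 't1.copy' and 't1.copy1' before settling on 't1.copy2'.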

def Bio.Nexus.Nexus.combine (   matrices)
Combine matrices in [(name,nexus-instance),...] and return new nexus instance.

combined_matrix=combine([(name1,nexus_instance1),(name2,nexus_instance2),...])
Character sets, character partitions and taxon sets are prefixed, readjusted
and present in the combined matrix. 

Definition at line 301 of file Nexus.py.

00301 
00302 def combine(matrices):
00303     """Combine matrices in [(name,nexus-instance),...] and return new nexus instance.
00304 
00305     combined_matrix=combine([(name1,nexus_instance1),(name2,nexus_instance2),...]
00306     Character sets, character partitions and taxon sets are prefixed, readjusted
00307     and present in the combined matrix. 
00308     """
00309     
00310     if not matrices:
00311         return None
00312     name=matrices[0][0]
00313     combined=copy.deepcopy(matrices[0][1]) # initiate with copy of first matrix
00314     mixed_datatypes=(len(set([n[1].datatype for n in matrices]))>1)
00315     if mixed_datatypes:
00316         combined.datatype='None'    # dealing with mixed matrices is application specific. You take care of that yourself!
00317     #    raise NexusError('Matrices must be of same datatype')
00318     combined.charlabels=None
00319     combined.statelabels=None
00320     combined.interleave=False
00321     combined.translate=None
00322 
00323     # rename taxon sets and character sets and name them with prefix
00324     for cn,cs in combined.charsets.iteritems():
00325         combined.charsets['%s.%s' % (name,cn)]=cs
00326         del combined.charsets[cn]
00327     for tn,ts in combined.taxsets.iteritems():
00328         combined.taxsets['%s.%s' % (name,tn)]=ts
00329         del combined.taxsets[tn]
00330     # previous partitions usually don't make much sense in combined matrix
00331     # just initiate one new partition parted by single matrices
00332     combined.charpartitions={'combined':{name:range(combined.nchar)}}
00333     for n,m in matrices[1:]:    # add all other matrices
00334         both=[t for t in combined.taxlabels if t in m.taxlabels]
00335         combined_only=[t for t in combined.taxlabels if t not in both]
00336         m_only=[t for t in m.taxlabels if t not in both]
00337         for t in both:
00338             # concatenate sequences and unify gap and missing character symbols
00339             combined.matrix[t]+=Seq(m.matrix[t].tostring().replace(m.gap,combined.gap).replace(m.missing,combined.missing),combined.alphabet)
00340         # replace data of missing taxa with symbol for missing data
00341         for t in combined_only:
00342             combined.matrix[t]+=Seq(combined.missing*m.nchar,combined.alphabet)
00343         for t in m_only:
00344             combined.matrix[t]=Seq(combined.missing*combined.nchar,combined.alphabet)+\
00345                 Seq(m.matrix[t].tostring().replace(m.gap,combined.gap).replace(m.missing,combined.missing),combined.alphabet)
00346         combined.taxlabels.extend(m_only)    # new taxon list
00347         for cn,cs in m.charsets.iteritems(): # adjust character sets for new matrix
00348             combined.charsets['%s.%s' % (n,cn)]=[x+combined.nchar for x in cs]
00349         if m.taxsets:
00350             if not combined.taxsets:
00351                 combined.taxsets={}
00352             # update taxon sets
00353             combined.taxsets.update(dict(('%s.%s' % (n,tn),ts) \
00354                                          for tn,ts in m.taxsets.iteritems()))
00355         # update new charpartition
00356         combined.charpartitions['combined'][n]=range(combined.nchar,combined.nchar+m.nchar)
00357         # update charlabels
00358         if m.charlabels:
00359             if not combined.charlabels:
00360                 combined.charlabels={}
00361             combined.charlabels.update(dict((combined.nchar+i,label) \
00362                                             for (i,label) in m.charlabels.iteritems()))
00363         combined.nchar+=m.nchar # update nchar and ntax
00364         combined.ntax+=len(m_only)
00365         
00366     # some prefer partitions, some charsets:
00367     # make separate charset for each initial dataset
00368     for c in combined.charpartitions['combined']:
00369         combined.charsets[c]=combined.charpartitions['combined'][c]
00370 
00371     return combined
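
The padding and offset logic is easier to follow on plain strings. The sketch below is a much-reduced illustration, assuming each matrix is a dict mapping taxon names to equal-length strings; combine_string_matrices is a hypothetical helper that ignores taxon sets, character labels, datatypes and gap-symbol unification, all of which combine() handles on Nexus instances.

```python
def combine_string_matrices(matrices, missing='?'):
    """Concatenate [(name, {taxon: sequence}), ...] into one matrix.

    Hypothetical sketch; returns (matrix, charsets) with 0-based charsets.
    """
    combined = {}
    charsets = {}
    nchar = 0
    for name, matrix in matrices:
        width = len(next(iter(matrix.values()))) if matrix else 0
        # Known taxa absent from this matrix are padded with the missing symbol.
        for taxon in combined:
            combined[taxon] += matrix.get(taxon, missing * width)
        # Taxa seen for the first time are padded on the left instead.
        for taxon, seq in matrix.items():
            if taxon not in combined:
                combined[taxon] = missing * nchar + seq
        # One character set per source matrix, shifted by the current width.
        charsets[name] = list(range(nchar, nchar + width))
        nchar += width
    return combined, charsets


matrix, charsets = combine_string_matrices([
    ('geneA', {'t1': 'ACGT', 't2': 'AC-T'}),
    ('geneB', {'t2': 'GG', 't3': 'TT'}),
])
```

In this example t1 is padded on the right ('ACGT??'), t3 on the left ('????TT'), and the character set for geneB covers positions 4 and 5.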

def Bio.Nexus.Nexus.get_start_end (   sequence,
  skiplist = ['-','?'] 
)
Return position of first and last character which is not in skiplist.

Skiplist defaults to ['-','?'].

Definition at line 222 of file Nexus.py.

00222 
00223 def get_start_end(sequence, skiplist=['-','?']):
00224     """Return position of first and last character which is not in skiplist.
00225 
00226     Skiplist defaults to ['-','?'])."""
00227 
00228     length=len(sequence)
00229     if length==0:
00230         return None,None
00231     end=length-1
00232     while end>=0 and (sequence[end] in skiplist):
00233         end-=1
00234     start=0
00235     while start<length and (sequence[start] in skiplist):
00236         start+=1
00237     if start==length and end==-1: # empty sequence
00238         return -1,-1
00239     else:
00240         return start,end 
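
The three return shapes are easiest to see side by side. The block below is a stand-alone copy of the listing for experimentation; the only deliberate change is the tuple default for skiplist, an assumption made here to avoid a mutable default argument.

```python
def get_start_end(sequence, skiplist=('-', '?')):
    """Stand-alone sketch of the listing above."""
    length = len(sequence)
    if length == 0:
        return None, None
    end = length - 1
    while end >= 0 and sequence[end] in skiplist:
        end -= 1
    start = 0
    while start < length and sequence[start] in skiplist:
        start += 1
    if start == length and end == -1:
        # Every character was in skiplist.
        return -1, -1
    return start, end


trimmed = get_start_end('--AC-GT??')
only_gaps = get_start_end('-?-')
empty = get_start_end('')
```

The results are (2, 6) for the padded sequence, (-1, -1) when every character is skipped, and (None, None) for an empty sequence. Interior gaps, such as the one at position 4, do not affect the result.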
    

def Bio.Nexus.Nexus.quotestrip (   word)
Remove quotes and/or double quotes around identifiers.

Definition at line 214 of file Nexus.py.

00214 
00215 def quotestrip(word):
00216     """Remove quotes and/or double quotes around identifiers."""
00217     if not word:
00218         return None
00219     while (word.startswith("'") and word.endswith("'")) or (word.startswith('"') and word.endswith('"')):
00220         word=word[1:-1]
00221     return word
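
A few inputs show how the loop behaves. The block below is a stand-alone copy of the listing, given only so the examples can be run; the sample identifiers are made up.

```python
def quotestrip(word):
    """Stand-alone sketch of the listing above."""
    if not word:
        return None
    # Keep peeling while the same quote character wraps both ends.
    while (word.startswith("'") and word.endswith("'")) or \
          (word.startswith('"') and word.endswith('"')):
        word = word[1:-1]
    return word


plain = quotestrip("'Homo sapiens'")
nested = quotestrip('"\'nested\'"')
unbalanced = quotestrip("'unbalanced")
```

Matching pairs are removed layer by layer, while an unbalanced leading quote is left in place. Note that an empty input returns None, whereas a string consisting only of quotes is stripped down to an empty string.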

def Bio.Nexus.Nexus.safename (   name,
  mrbayes = False 
)
Return a taxon identifier according to NEXUS standard.

Wrap quotes around names with punctuation or whitespace, and double
single quotes.

mrbayes=True: write names without quotes, whitespace or punctuation
for the mrbayes software package.

Definition at line 196 of file Nexus.py.

00196 
00197 def safename(name,mrbayes=False):
00198     """Return a taxon identifier according to NEXUS standard.
00199 
00200     Wrap quotes around names with punctuation or whitespace, and double
00201     single quotes.
00202 
00203     mrbayes=True: write names without quotes, whitespace or punctuation
00204     for the mrbayes software package.
00205     """
00206     if mrbayes:
00207         safe=name.replace(' ','_')
00208         safe=''.join([c for c in safe if c in MRBAYESSAFE])
00209     else:
00210         safe=name.replace("'","''")
00211         if set(safe).intersection(set(WHITESPACE+PUNCTUATION)):
00212             safe="'"+safe+"'"
00213     return safe
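
The two modes can be compared with the stand-alone sketch below. The constants are copied from the variable list of this module, with the backslash in PUNCTUATION written as an explicit escape; the taxon names are invented for the example.

```python
WHITESPACE = ' \t\n'
PUNCTUATION = '()[]{}/\\,;:=*\'"`+-<>'
MRBAYESSAFE = ('abcdefghijklmnopqrstuvwxyz'
               'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_')


def safename(name, mrbayes=False):
    """Stand-alone sketch of the listing above."""
    if mrbayes:
        # Spaces become underscores, everything else unsafe is dropped.
        safe = name.replace(' ', '_')
        safe = ''.join(c for c in safe if c in MRBAYESSAFE)
    else:
        # Double embedded single quotes, then quote the name if needed.
        safe = name.replace("'", "''")
        if set(safe).intersection(WHITESPACE + PUNCTUATION):
            safe = "'" + safe + "'"
    return safe


quoted = safename('Homo sapiens')
doubled = safename("O'Brien")
untouched = safename('Homo_sapiens')
for_mrbayes = safename('Homo sapiens (human)', mrbayes=True)
```

Names with whitespace or punctuation are wrapped in single quotes ('Homo sapiens' with its quotes, and 'O''Brien' with the inner quote doubled), a name made only of safe characters is returned unchanged, and the MrBayes mode yields Homo_sapiens_human.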


Variable Documentation

string Bio.Nexus.Nexus.CHARSET = 'chars'

Definition at line 33 of file Nexus.py.

string Bio.Nexus.Nexus.CODONPOSITIONS = 'codonpositions'

Definition at line 35 of file Nexus.py.

string Bio.Nexus.Nexus.DEFAULTNEXUS = '#NEXUS\nbegin data; dimensions ntax=0 nchar=0; format datatype=dna; end; '

Definition at line 36 of file Nexus.py.

int Bio.Nexus.Nexus.INTERLEAVE = 70

Definition at line 24 of file Nexus.py.

list Bio.Nexus.Nexus.KNOWN_NEXUS_BLOCKS = ['trees','data', 'characters', 'taxa', 'sets','codons']

Definition at line 27 of file Nexus.py.

string Bio.Nexus.Nexus.MRBAYESSAFE = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_'

Definition at line 29 of file Nexus.py.

string Bio.Nexus.Nexus.PUNCTUATION = '()[]{}/\,;:=*\'"`+-<>'

Definition at line 28 of file Nexus.py.

list Bio.Nexus.Nexus.SPECIAL_COMMANDS

Initial value:
['charstatelabels','charlabels','taxlabels', 'taxset', 'charset','charpartition','taxpartition',\
        'matrix','tree', 'utree','translate','codonposset','title']

Definition at line 25 of file Nexus.py.

list Bio.Nexus.Nexus.SPECIALCOMMENTS = ['&']

Definition at line 32 of file Nexus.py.

string Bio.Nexus.Nexus.TAXSET = 'taxa'

Definition at line 34 of file Nexus.py.

string Bio.Nexus.Nexus.WHITESPACE = ' \t\n'

Definition at line 30 of file Nexus.py.