python-biopython  1.60
Bio.PDB.PDBList.PDBList Class Reference

Public Member Functions

def __init__
def get_status_list
def get_recent_changes
def get_all_entries
def get_all_obsolete
def retrieve_pdb_file
def update_pdb
def download_entire_pdb
def download_obsolete_entries
def get_seqres_file

Public Attributes

 pdb_server
 local_pdb
 obsolete_pdb
 overwrite
 flat_tree

Static Public Attributes

string PDB_REF
string alternative_download_url = "http://www.rcsb.org/pdb/files/"

Detailed Description

This class provides quick access to the structure lists on the
PDB server or its mirrors. The structure lists contain
four-letter PDB codes indicating which structures are
new, have been modified or are obsolete. The lists are released
on a weekly basis.

It also provides a function to retrieve PDB files from the server.
Before using it, prepare a directory such as /pdb
in which the PDB files will be stored.

If you want to use this module from behind a proxy, add
the proxy variable to your environment, e.g. on Unix:
export HTTP_PROXY='http://realproxy.charite.de:888'
(This can also be added to ~/.bashrc.)

Definition at line 34 of file PDBList.py.


Constructor & Destructor Documentation

def Bio.PDB.PDBList.PDBList.__init__ (   self,
  server = 'ftp://ftp.wwpdb.org',
  pdb = os.getcwd(),
  obsolete_pdb = None 
)
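Initialize the class with the default server or a custom one.
pdb is the root of the local PDB file tree (by default the current working directory). If obsolete_pdb is not given, obsolete files are kept in an 'obsolete' subdirectory of pdb, which is created if it does not already exist.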

Definition at line 62 of file PDBList.py.

    def __init__(self,server='ftp://ftp.wwpdb.org', pdb=os.getcwd(), obsolete_pdb=None):
        """Initialize the class with the default server or a custom one."""
        # remote pdb server
        self.pdb_server = server

        # local pdb file tree
        self.local_pdb = pdb

        # local file tree for obsolete pdb files
        if obsolete_pdb:
            self.obsolete_pdb = obsolete_pdb
        else:
            self.obsolete_pdb = os.path.join(self.local_pdb, 'obsolete')
            if not os.access(self.obsolete_pdb,os.F_OK):
                os.makedirs(self.obsolete_pdb)

        # variables for command-line options
        self.overwrite = 0
        self.flat_tree = 0


Member Function Documentation

def Bio.PDB.PDBList.PDBList.download_entire_pdb (   self,
  listfile = None 
)
Retrieve all PDB entries not present in the local PDB copy.

Writes a list file containing all PDB codes (optional, if listfile is
given).
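
The optional list-file step can be sketched offline as follows. This is a minimal Python 3 illustration, not the library code: write_code_list is a hypothetical helper, and the two codes are placeholders standing in for the result of get_all_entries.

```python
import os
import tempfile


def write_code_list(entries, listfile):
    """Write one PDB code per line to listfile (hypothetical helper)."""
    with open(listfile, "w") as outfile:
        outfile.writelines(code + "\n" for code in entries)


# Placeholder codes; the real method would use the downloaded index.
codes = ["1FAT", "2HHB"]
path = os.path.join(tempfile.mkdtemp(), "all_entries.txt")
write_code_list(codes, path)

with open(path) as handle:
    written = handle.read()
```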

Definition at line 299 of file PDBList.py.

    def download_entire_pdb(self, listfile=None):
        """Retrieve all PDB entries not present in the local PDB copy.

        Writes a list file containing all PDB codes (optional, if listfile is
        given).
        """
        entries = self.get_all_entries()
        for pdb_code in entries:
            self.retrieve_pdb_file(pdb_code)
        # Write the list
        if listfile:
            outfile = open(listfile, 'w')
            outfile.writelines((x+'\n' for x in entries))
            outfile.close()

def Bio.PDB.PDBList.PDBList.download_obsolete_entries (   self,
  listfile = None 
)
Retrieve all obsolete PDB entries not present in the local obsolete
PDB copy.

Writes a list file containing all PDB codes (optional, if listfile is
given).

Definition at line 314 of file PDBList.py.

    def download_obsolete_entries(self, listfile=None):
        """Retrieve all obsolete PDB entries not present in the local obsolete
        PDB copy.

        Writes a list file containing all PDB codes (optional, if listfile is
        given).
        """
        entries = self.get_all_obsolete()
        for pdb_code in entries:
            self.retrieve_pdb_file(pdb_code, obsolete=1)

        # Write the list
        if listfile:
            outfile = open(listfile, 'w')
            outfile.writelines((x+'\n' for x in entries))
            outfile.close()

def Bio.PDB.PDBList.PDBList.get_all_entries (   self )
Retrieves a large index file listing all PDB entries
with some annotation for each.
Returns a list of the PDB codes in the index file.
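
The code extraction can be tried without a network connection. The sketch below is written for Python 3 and assumes only what the source listing shows: two header lines are skipped and the first four characters of each remaining line are taken. parse_entries_index is a hypothetical name, and the sample lines are made up for illustration rather than copied from the real index.

```python
def parse_entries_index(lines):
    """Return the four-character PDB codes from index-file lines.

    Skips the two header lines and ignores lines of four
    characters or fewer, as the source listing does.
    """
    return [line[:4] for line in lines[2:] if len(line) > 4]


# Made-up sample: two header lines, two entries, one blank line.
sample = [
    "IDCODE, HEADER, ACCESSION DATE, COMPOUND\n",
    "------- ------- --------------- --------\n",
    "100D\tEXAMPLE HEADER\t12/05/94\tEXAMPLE COMPOUND\n",
    "101D\tEXAMPLE HEADER\t12/14/94\tEXAMPLE COMPOUND\n",
    "\n",
]
codes = parse_entries_index(sample)
```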

Definition at line 125 of file PDBList.py.

    def get_all_entries(self):
        """Retrieves a big file containing all the 
        PDB entries and some annotation to them. 
        Returns a list of PDB codes in the index file.
        """
        print "retrieving index file. Takes about 5 MB."
        url = _urlopen(self.pdb_server +
                       '/pub/pdb/derived_data/index/entries.idx')
        return [line[:4] for line in url.readlines()[2:] if len(line) > 4]

def Bio.PDB.PDBList.PDBList.get_all_obsolete (   self )
Returns a list of all obsolete PDB codes that have ever been
in the PDB.

Gets and parses the obsolete.dat file from the PDB server.
Only the first PDB code column (the obsolete entry) is used;
any further columns name its successors. The file looks
like this:

 LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS
OBSLTE    31-JUL-94 116L     216L
...
OBSLTE    29-JAN-96 1HFT     2HFT
OBSLTE    21-SEP-06 1HFV     2J5X
OBSLTE    21-NOV-03 1HG6     
OBSLTE    18-JUL-84 1HHB     2HHB 3HHB 
OBSLTE    08-NOV-96 1HID     2HID
OBSLTE    01-APR-97 1HIU     2HIU
OBSLTE    14-JAN-04 1HKE     1UUZ
...
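
The parsing of this format can be sketched offline with a few of the sample lines above. This Python 3 sketch is an illustration under stated assumptions, not the library code: parse_obsolete is a hypothetical helper, and unlike get_all_obsolete (which returns only the obsolete codes) it also keeps the successor columns.

```python
def parse_obsolete(lines):
    """Map each obsolete PDB code to the list of its successors."""
    obsolete = {}
    for line in lines:
        if not line.startswith("OBSLTE "):
            continue  # skip the title line and anything else
        fields = line.split()
        code = fields[2]
        if len(code) != 4:
            raise ValueError("unexpected PDB code: %r" % code)
        obsolete[code] = fields[3:]
    return obsolete


sample = """\
 LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS
OBSLTE    31-JUL-94 116L     216L
...
OBSLTE    21-NOV-03 1HG6
OBSLTE    18-JUL-84 1HHB     2HHB 3HHB
""".splitlines()

mapping = parse_obsolete(sample)
codes = list(mapping)  # what get_all_obsolete would return
```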

Definition at line 135 of file PDBList.py.

    def get_all_obsolete(self):
        """Returns a list of all obsolete entries ever in the PDB.

        Returns a list of all obsolete pdb codes that have ever been
        in the PDB.

        Gets and parses the file from the PDB server in the format
        (the first pdb_code column is the one used). The file looks
        like this:

         LIST OF OBSOLETE COORDINATE ENTRIES AND SUCCESSORS
        OBSLTE    31-JUL-94 116L     216L
        ...
        OBSLTE    29-JAN-96 1HFT     2HFT
        OBSLTE    21-SEP-06 1HFV     2J5X
        OBSLTE    21-NOV-03 1HG6
        OBSLTE    18-JUL-84 1HHB     2HHB 3HHB
        OBSLTE    08-NOV-96 1HID     2HID
        OBSLTE    01-APR-97 1HIU     2HIU
        OBSLTE    14-JAN-04 1HKE     1UUZ
        ...

        """
        handle = _urlopen(self.pdb_server +
                          '/pub/pdb/data/status/obsolete.dat')
        # Extract pdb codes. Could use a list comprehension, but I want
        # to include an assert to check for mis-reading the data.
        obsolete = []
        for line in handle:
            if not line.startswith("OBSLTE ") : continue
            pdb = line.split()[2]
            assert len(pdb)==4
            obsolete.append(pdb)
        handle.close()
        return obsolete

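def Bio.PDB.PDBList.PDBList.get_recent_changes (   self )
Returns three lists of PDB codes from the newest weekly status files
(added, modified, obsolete).

Reads the directory listing of changed entries from the PDB server,
selects the most recent status directory (the one with the
largest numerical name) and returns a list of three lists holding the
PDB codes of the new, modified and obsolete entries.
The docstring states that None is returned if something goes wrong, but the
source listing contains no such handling, so errors surface as exceptions.

Contents of the data/status dir (20031013 would be used):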
drwxrwxr-x   2 1002     sysadmin     512 Oct  6 18:28 20031006
drwxrwxr-x   2 1002     sysadmin     512 Oct 14 02:14 20031013
-rw-r--r--   1 1002     sysadmin    1327 Mar 12  2001 README
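
Choosing the status directory can be sketched offline from a listing like the one above. This is a Python 3 illustration with a hypothetical helper name, latest_status_dir. One deliberate difference is an assumption on my part: the source listing takes the last all-digit name in listing order, whereas this sketch uses max() so that the documented "largest numerical name" rule holds even for an unsorted listing.

```python
def latest_status_dir(listing_lines):
    """Return the all-digit directory name with the largest value.

    Returns None when the listing contains no such name.
    """
    names = [line.split()[-1] for line in listing_lines if line.strip()]
    dated = [name for name in names if name.isdigit()]
    if not dated:
        return None
    return max(dated, key=int)


# The sample listing from the text, deliberately out of order.
listing = [
    "drwxrwxr-x   2 1002     sysadmin     512 Oct 14 02:14 20031013",
    "drwxrwxr-x   2 1002     sysadmin     512 Oct  6 18:28 20031006",
    "-rw-r--r--   1 1002     sysadmin    1327 Mar 12  2001 README",
]
recent = latest_status_dir(listing)
```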

Definition at line 100 of file PDBList.py.

    def get_recent_changes(self):
        """Returns three lists of the newest weekly files (added,mod,obsolete).

        Reads the directories with changed entries from the PDB server and
        returns a tuple of three URL's to the files of new, modified and
        obsolete entries from the most recent list. The directory with the
        largest numerical name is used.
        Returns None if something goes wrong.

        Contents of the data/status dir (20031013 would be used);
        drwxrwxr-x   2 1002     sysadmin     512 Oct  6 18:28 20031006
        drwxrwxr-x   2 1002     sysadmin     512 Oct 14 02:14 20031013
        -rw-r--r--   1 1002     sysadmin    1327 Mar 12  2001 README
        """
        url = _urlopen(self.pdb_server + '/pub/pdb/data/status/')
        recent = filter(str.isdigit,
                        (x.split()[-1] for x in url.readlines())
                        )[-1]
        path = self.pdb_server+'/pub/pdb/data/status/%s/'%(recent)
        # Retrieve the lists
        added = self.get_status_list(path+'added.pdb')
        modified = self.get_status_list(path+'modified.pdb')
        obsolete = self.get_status_list(path+'obsolete.pdb')
        return [added,modified,obsolete]

def Bio.PDB.PDBList.PDBList.get_seqres_file (   self,
  savefile = 'pdb_seqres.txt' 
)
Retrieves a (big) file containing all the sequences of PDB entries
and writes it to a file.
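
Because the sequence file is large, one possible variation is to copy it to disk in chunks instead of reading every line into memory first. The Python 3 sketch below shows that idea only; it is not how the method is implemented. save_stream is a hypothetical helper, and an in-memory buffer with made-up content stands in for the server handle.

```python
import io
import os
import shutil
import tempfile


def save_stream(handle, savefile, chunk_size=64 * 1024):
    """Copy a binary file-like object to savefile chunk by chunk."""
    with open(savefile, "wb") as outfile:
        shutil.copyfileobj(handle, outfile, chunk_size)


# Made-up payload standing in for the downloaded sequence file.
payload = b">example_A placeholder record\nMVLSEGEWQLV\n"
target = os.path.join(tempfile.mkdtemp(), "pdb_seqres.txt")
save_stream(io.BytesIO(payload), target)

with open(target, "rb") as handle:
    saved = handle.read()
```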

Definition at line 331 of file PDBList.py.

    def get_seqres_file(self,savefile='pdb_seqres.txt'):
        """Retrieves a (big) file containing all the sequences of PDB entries
        and writes it to a file.
        """
        print "retrieving sequence file. Takes about 15 MB."
        handle = _urlopen(self.pdb_server + 
                          '/pub/pdb/derived_data/pdb_seqres.txt')
        lines = handle.readlines()
        outfile = open(savefile, 'w')
        outfile.writelines(lines)
        outfile.close()
        handle.close()

def Bio.PDB.PDBList.PDBList.get_status_list (   self,
  url 
)
Retrieves a list of PDB codes from the weekly PDB status file
at the given URL. Used by get_recent_changes.

The list files parsed by this method now simply contain
one PDB code per line.
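
A status file in this one-code-per-line form can be parsed as sketched below. This is a Python 3 illustration with a hypothetical helper, parse_status_list. As an assumption of the sketch, blank lines are skipped and a malformed entry raises ValueError, whereas the source listing asserts that every line holds exactly four characters. The sample codes are placeholders.

```python
def parse_status_list(lines):
    """Return the PDB codes from a one-code-per-line status file."""
    codes = []
    for line in lines:
        code = line.strip()
        if not code:
            continue  # tolerate blank lines in this sketch
        if len(code) != 4:
            raise ValueError("unexpected entry: %r" % code)
        codes.append(code)
    return codes


codes = parse_status_list(["1abc\n", "2xyz\n", "\n"])
```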

Definition at line 83 of file PDBList.py.

    def get_status_list(self,url):
        """Retrieves a list of pdb codes in the weekly pdb status file
        from the given URL. Used by get_recent_files.

        Typical contents of the list files parsed by this method is now
        very simply one PDB name per line.
        """
        handle = _urlopen(url)
        answer = []
        for line in handle:
            pdb = line.strip()
            assert len(pdb)==4
            answer.append(pdb)
        handle.close()
        return answer

def Bio.PDB.PDBList.PDBList.retrieve_pdb_file (   self,
  pdb_code,
  obsolete = 0,
  compression = None,
  uncompress = None,
  pdir = None 
)
Retrieves a PDB structure file from the PDB server and
stores it in a local file tree.
The path of the decompressed local file is returned.
If obsolete==1, the file will be saved in a special file tree.
If overwrite is not set and the decompressed file already exists,
the download is skipped.
The compression and uncompress parameters are deprecated and do nothing:
all archives on the server are in .gz format and are decompressed with
Python's gzip module. Passing either one issues a BiopythonDeprecationWarning.

@param pdir: put the file in this directory (default: create a PDB-style directory tree) 
@type pdir: string

@return: filename
@rtype: string
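
How a PDB code maps to a server URL and a local path can be sketched without downloading anything. The Python 3 helpers below, remote_url and local_path, are hypothetical names; the URL pattern and directory rules are taken from the source listing, and /pdb is just the example directory from the class description.

```python
import os


def remote_url(pdb_code, server="ftp://ftp.wwpdb.org", obsolete=False):
    """Build the URL of the gzipped entry for pdb_code."""
    code = pdb_code.lower()
    tree = "obsolete" if obsolete else "divided"
    return "%s/pub/pdb/data/structures/%s/pdb/%s/pdb%s.ent.gz" % (
        server, tree, code[1:3], code)


def local_path(pdb_code, root, flat_tree=False):
    """Build the path of the decompressed file below root."""
    code = pdb_code.lower()
    name = "pdb%s.ent" % code
    if flat_tree:
        return os.path.join(root, name)
    # PDB-style tree: subdirectory named after the middle two characters
    return os.path.join(root, code[1:3], name)


url = remote_url("1FAT")
nested = local_path("1FAT", "/pdb")
flat = local_path("1FAT", "/pdb", flat_tree=True)
```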

Definition at line 172 of file PDBList.py.

    def retrieve_pdb_file(self, pdb_code, obsolete=0, compression=None,
            uncompress=None, pdir=None):
        """ Retrieves a PDB structure file from the PDB server and
        stores it in a local file tree.
        The PDB structure is returned as a single string.
        If obsolete==1, the file will be saved in a special file tree.
        If uncompress is specified, a system utility will decompress the .gz
        archive. Otherwise, Python gzip utility will handle it.
        compression does nothing, as all archives are already in .gz format

        @param pdir: put the file in this directory (default: create a PDB-style directory tree) 
        @type pdir: string

        @return: filename
        @rtype: string
        """
        # Alert the user about deprecated parameters
        if compression is not None:
            warnings.warn("PDB file servers now only host .gz archives: "
                    "the compression parameter will not do anything"
                    , BiopythonDeprecationWarning)
        if uncompress is not None:
            warnings.warn("Decompression is handled with the gzip module: "
                    "the uncompression parameter will not do anything"
                    , BiopythonDeprecationWarning)

        # Get the structure
        code=pdb_code.lower()
        filename="pdb%s.ent.gz"%code
        if not obsolete:
            url=(self.pdb_server+
                 '/pub/pdb/data/structures/divided/pdb/%s/pdb%s.ent.gz'
                 % (code[1:3],code))
        else:
            url=(self.pdb_server+
                 '/pub/pdb/data/structures/obsolete/pdb/%s/pdb%s.ent.gz'
                 % (code[1:3],code))

        # In which dir to put the pdb file?
        if pdir is None:
            if self.flat_tree:
                if not obsolete:
                    path=self.local_pdb
                else:
                    path=self.obsolete_pdb
            else:
                # Put in PDB-style directory tree
                if not obsolete:
                    path=os.path.join(self.local_pdb, code[1:3])
                else:
                    path=os.path.join(self.obsolete_pdb,code[1:3])
        else:
            # Put in specified directory
            path=pdir

        if not os.access(path,os.F_OK):
            os.makedirs(path)

        filename=os.path.join(path, filename)
        # the final uncompressed file
        final_file=os.path.join(path, "pdb%s.ent" % code)

        # Skip download if the file already exists
        if not self.overwrite:
            if os.path.exists(final_file):
                print "Structure exists: '%s' " % final_file
                return final_file

        # Retrieve the file
        print "Downloading PDB structure '%s'..." % pdb_code
        lines = _urlopen(url).read()
        open(filename,'wb').write(lines)

        # Uncompress the file
        gz = gzip.open(filename, 'rb')
        out = open(final_file, 'wb')
        out.writelines(gz.read())
        gz.close()
        out.close()
        os.remove(filename)

        return final_file

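def Bio.PDB.PDBList.PDBList.update_pdb (   self )
Updates the local PDB copy from the most recent weekly lists.
It gets the weekly lists of new and modified PDB entries and
downloads the corresponding PDB files, then moves the local files of
entries listed as obsolete into the obsolete file tree.
This method is suited to being run from a weekly cron job.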
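
The handling of obsolete entries can be tried in a temporary directory. The Python 3 sketch below follows the three cases of the source listing (file moved, file already moved, file missing) but is only an illustration: archive_obsolete is a hypothetical helper that returns a status string instead of printing, and the file content is a placeholder.

```python
import os
import shutil
import tempfile


def archive_obsolete(local_pdb, obsolete_pdb, pdb_code, flat_tree=False):
    """Move one entry's file into the obsolete tree; report what happened."""
    name = "pdb%s.ent" % pdb_code
    if flat_tree:
        old_file = os.path.join(local_pdb, name)
        new_dir = obsolete_pdb
    else:
        old_file = os.path.join(local_pdb, pdb_code[1:3], name)
        new_dir = os.path.join(obsolete_pdb, pdb_code[1:3])
    new_file = os.path.join(new_dir, name)
    if os.path.isfile(old_file):
        os.makedirs(new_dir, exist_ok=True)
        shutil.move(old_file, new_file)
        return "moved"
    if os.path.isfile(new_file):
        return "already moved"
    return "missing"


# Build a tiny PDB-style tree holding one placeholder file.
local = os.path.join(tempfile.mkdtemp(), "pdb")
obsolete_dir = os.path.join(local, "obsolete")
os.makedirs(os.path.join(local, "hh"))
with open(os.path.join(local, "hh", "pdb1hhb.ent"), "w") as handle:
    handle.write("placeholder\n")

first = archive_obsolete(local, obsolete_dir, "1hhb")
second = archive_obsolete(local, obsolete_dir, "1hhb")
third = archive_obsolete(local, obsolete_dir, "9zzz")
```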

Definition at line 255 of file PDBList.py.

    def update_pdb(self):
        """
        I guess this is the 'most wanted' function from this module.
        It gets the weekly lists of new and modified pdb entries and
        automatically downloads the according PDB files.
        You can call this module as a weekly cronjob.
        """
        assert os.path.isdir(self.local_pdb)
        assert os.path.isdir(self.obsolete_pdb)

        new, modified, obsolete = self.get_recent_changes()

        for pdb_code in new+modified:
            try:
                self.retrieve_pdb_file(pdb_code)
            except Exception:
                print 'error %s\n' % pdb_code
                # you can insert here some more log notes that
                # something has gone wrong.

        # Move the obsolete files to a special folder
        for pdb_code in obsolete:
            if self.flat_tree:
                old_file = os.path.join(self.local_pdb,
                                        'pdb%s.ent' % pdb_code)
                new_dir = self.obsolete_pdb
            else:
                old_file = os.path.join(self.local_pdb, pdb_code[1:3],
                                        'pdb%s.ent' % pdb_code)
                new_dir = os.path.join(self.obsolete_pdb, pdb_code[1:3])
            new_file = os.path.join(new_dir, 'pdb%s.ent' % pdb_code)
            if os.path.isfile(old_file):
                if not os.path.isdir(new_dir):
                    os.mkdir(new_dir)
                try:
                    shutil.move(old_file, new_file)
                except Exception:
                    print "Could not move %s to obsolete folder" % old_file
            elif os.path.isfile(new_file):
                print "Obsolete file %s already moved" % old_file
            else:
                print "Obsolete file %s is missing" % old_file


Member Data Documentation

string Bio.PDB.PDBList.PDBList.alternative_download_url = "http://www.rcsb.org/pdb/files/" [static]

Definition at line 59 of file PDBList.py.

Bio.PDB.PDBList.PDBList.flat_tree

Definition at line 80 of file PDBList.py.

Bio.PDB.PDBList.PDBList.local_pdb

Definition at line 68 of file PDBList.py.

Bio.PDB.PDBList.PDBList.obsolete_pdb

Definition at line 72 of file PDBList.py.

Bio.PDB.PDBList.PDBList.overwrite

Definition at line 79 of file PDBList.py.

string Bio.PDB.PDBList.PDBList.PDB_REF [static]

Initial value:
"""
The Protein Data Bank: a computer-based archival file for macromolecular structures.
F.C.Bernstein, T.F.Koetzle, G.J.B.Williams, E.F.Meyer Jr, M.D.Brice, J.R.Rodgers, O.Kennard, T.Shimanouchi, M.Tasumi
J. Mol. Biol. 112 pp. 535-542 (1977)
http://www.pdb.org/.
"""

Definition at line 52 of file PDBList.py.

Bio.PDB.PDBList.PDBList.pdb_server

Definition at line 65 of file PDBList.py.


The documentation for this class was generated from the following file:
PDBList.py