nordugrid-arc-nox  1.1.0~rc6
xmltree.py
""" The XMLTree class provides a way to convert from XML to native Python structures and vice versa.

Examples
--------

if you have an XMLNode:

>>> x = arc.XMLNode('<soap-env:Envelope xmlns:hash="urn:hash" \
xmlns:soap-enc="http://schemas.xmlsoap.org/soap/encoding/" \
xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/" \
xmlns:xsd="http://www.w3.org/2001/XMLSchema" \
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\
<soap-env:Body>\
<hash:get>\
<hash:IDs>\
<hash:ID>0</hash:ID>\
<hash:ID>1</hash:ID>\
<hash:ID>2</hash:ID>\
</hash:IDs>\
</hash:get>\
</soap-env:Body>\
</soap-env:Envelope>')

you can convert it to an XMLTree:

>>> t = XMLTree(x)
>>> t.get()
[('soap-env:Envelope',
  [('soap-env:Body',
      [('hash:get',
            [('hash:IDs',
                    [('hash:ID', '0'), ('hash:ID', '1'), ('hash:ID', '2')])])])])]

you can specify a path:

>>> t.get('/soap-env:Envelope/soap-env:Body/hash:get')
[('hash:get', [('hash:IDs', [('hash:ID', '0'), ('hash:ID', '1'), ('hash:ID', '2')])])]

this is not XPath, just a plain path of tag names;
an empty tag name matches any node at that level, so these are the same as the previous example:

>>> t.get('/soap-env:Envelope//hash:get')
[('hash:get', [('hash:IDs', [('hash:ID', '0'), ('hash:ID', '1'), ('hash:ID', '2')])])]

>>> t.get('///hash:get')
[('hash:get', [('hash:IDs', [('hash:ID', '0'), ('hash:ID', '1'), ('hash:ID', '2')])])]
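
the matching rule can be tried without the arc module. The following is only a
minimal sketch, assuming the tree is a plain (name, value) tuple like the ones
returned by get(); the names 'traverse' and 'find' are made up for this
illustration and are not part of XMLTree:

```python
def traverse(path, node):
    # 'node' is a (name, value) pair; 'value' is a string or a list of child pairs
    name, value = node
    # an empty path element matches any tag name at this level
    if path[0] != '' and path[0] != name:
        return []
    # the last path element matched: this node is a result
    if len(path) == 1:
        return [node]
    # more path elements remain, but a string node has no children
    if not isinstance(value, list):
        return []
    matches = []
    for child in value:
        matches.extend(traverse(path[1:], child))
    return matches


def find(tree, path):
    # hypothetical helper: split '/a/b/c' into ['a', 'b', 'c'] and match from the root
    if not path.startswith('/'):
        raise ValueError('invalid path (%s)' % path)
    return traverse(path[1:].split('/'), tree)


tree = ('root', [('object', [('key1', 'value1')]),
                 ('object', [('key1', 'value3')])])

print(find(tree, '/root/object/key1'))  # [('key1', 'value1'), ('key1', 'value3')]
print(find(tree, '///key1'))            # same result: the empty names match 'root' and 'object'
print(find(tree, '//key1'))             # []: every path element consumes exactly one level
```

note that an empty element stands for exactly one level, so the number of
slashes has to match the depth of the node you are looking for.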

there are some other query methods, e.g. get_value and get_values:

>>> t.get('/////hash:ID')
[('hash:ID', '0'), ('hash:ID', '1'), ('hash:ID', '2')]

>>> t.get_value('/////hash:ID')
'0'

>>> t.get_values('/////hash:ID')
['0', '1', '2']

if you have an XML document with key-value pairs, e.g.:

>>> t = XMLTree(from_string = '<root><object><key1>value1</key1><key2>value2</key2></object>\
    <object><key1>value3</key1><key2>value4</key2></object></root>')
>>> t.get()
[('root',
  [('object', [('key1', 'value1'), ('key2', 'value2')]),
     ('object', [('key1', 'value3'), ('key2', 'value4')])])]

now you can use the get_dict and get_dicts methods:

>>> t.get('/root/object')
[('object', [('key1', 'value1'), ('key2', 'value2')]),
 ('object', [('key1', 'value3'), ('key2', 'value4')])]

>>> t.get_dict('/root/object')
{'key1': 'value1', 'key2': 'value2'}

>>> t.get_dicts('/root/object')
[{'key1': 'value1', 'key2': 'value2'}, {'key1': 'value3', 'key2': 'value4'}]

you can specify the needed keys, and rename them:

>>> t.get_dicts('/root/object', {'key1':'new name'})
[{'new name': 'value1'}, {'new name': 'value3'}]
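
the filtering and renaming step can be illustrated on its own. This is a
hedged sketch that works on a plain list of (key, value) pairs, i.e. the value
part of one 'object' node; the function name 'to_dict' is an assumption made
for this example and is not a method of XMLTree:

```python
def to_dict(pairs, keys=None):
    # 'pairs' is a list of (key, value) tuples
    # 'keys' maps the wanted original keys to their new names
    if keys:
        # keep only the pairs whose key is listed, stored under the new name
        return dict((keys[k], v) for (k, v) in pairs if k in keys)
    # no filter given: keep every pair under its original key
    return dict(pairs)


pairs = [('key1', 'value1'), ('key2', 'value2')]

print(to_dict(pairs))                        # both keys are kept
print(to_dict(pairs, {'key1': 'new name'}))  # {'new name': 'value1'}
```

if a key occurs more than once in the pairs, the last occurrence wins, because
the result is an ordinary dictionary.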

you can specify a default value with get_value:

>>> t.get_value('///key1','default value')
'value1'
>>> t.get_value('///key3','default value')
'default value'

you can add an XMLTree to an XMLNode with the add_to_node method:

>>> x = arc.XMLNode('<start/>')
>>> x.GetXML()
'<start/>'
>>> t.get('/root/object')
[('object', [('key1', 'value1'), ('key2', 'value2')]),
 ('object', [('key1', 'value3'), ('key2', 'value4')])]
>>> t.add_to_node(x,'/root/object')
>>> x.GetXML()
'<start><object><key1>value1</key1><key2>value2</key2></object>\
    <object><key1>value3</key1><key2>value4</key2></object></start>'

you can create an XMLTree from the tree structure:

>>> t2 = XMLTree(from_tree = ('object', [('key1', 'value5'), ('key2', 'value6')]))
>>> t2.get()
[('object', [('key1', 'value5'), ('key2', 'value6')])]

or you can add a new subtree to an XMLTree:

>>> t2.add_tree(('key3','valuex'),'/object')
>>> t2.get()
[('object', [('key1', 'value5'), ('key2', 'value6'), ('key3', 'valuex')])]

this actually adds it to the first node which matches the path, e.g.:

>>> t.get('/root/object')
[('object', [('key1', 'value1'), ('key2', 'value2')]),
 ('object', [('key1', 'value3'), ('key2', 'value4')])]

>>> t.add_tree(('key3','valuex'),'/root/object')
>>> t.get()
[('root',
  [('object', [('key1', 'value1'), ('key2', 'value2'), ('key3', 'valuex')]),
     ('object', [('key1', 'value3'), ('key2', 'value4')])])]

you can create a list of subtrees with the get_trees method:

>>> t.get_trees('/root/object')
[<hash.xmltree.XMLTree instance at 0x17a6300>,
 <hash.xmltree.XMLTree instance at 0x17a6558>]

str() gives a string representation of an XMLTree:

>>> [str(i) for i in t.get_trees('/root/object')]
["('object', [('key1', 'value1'), ('key2', 'value2'), ('key3', 'valuex')])",
 "('object', [('key1', 'value3'), ('key2', 'value4')])"]

finally, you can create complex XML structures easily with XMLTree
(this example is from the 'get' method of the ahash.AHashService class;
'resp' is a list of (ID, lines) pairs,
where 'lines' is a list of (section, property, value) tuples):

# create the 'getResponse' node; its 'objects' child is added below
response_node = out.NewChild('hash:getResponse')
# create an XMLTree from the results
tree = XMLTree(from_tree =
    ('hash:objects',
        [('hash:object', # for each object
            [('hash:ID', ID),
            ('hash:lines',
                [('hash:line', # for each line in the object
                    [('hash:section', section),
                    ('hash:property', property),
                    ('hash:value', value)]
                ) for (section, property, value) in lines]
            )]
        ) for (ID, lines) in resp]
    ))
print(tree)
# convert the tree to XML by adding it to the 'getResponse' node
tree.add_to_node(response_node)

this generates XML like this:

<hash:getResponse>
    <hash:objects>
        <hash:object>
            <hash:ID>0</hash:ID>
            <hash:lines>
                <hash:line>
                    <hash:section>1</hash:section>
                    <hash:property>2</hash:property>
                    <hash:value>3</hash:value>
                </hash:line>
                <hash:line>
                    <hash:section>a</hash:section>
                    <hash:property>b</hash:property>
                    <hash:value>c</hash:value>
                </hash:line>
                <hash:line>
                    <hash:section>su</hash:section>
                    <hash:property>bi</hash:property>
                    <hash:value>du</hash:value>
                </hash:line>
            </hash:lines>
        </hash:object>
    </hash:objects>
</hash:getResponse>
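
to see how such a nested tuple maps to indented XML text without an XMLNode,
here is a small standalone sketch. The function name 'to_xml_lines' is made up
for this illustration, and the 'resp' data is reconstructed from the XML shown
above; it is not the code of the ahash service:

```python
def to_xml_lines(nodes, indent='    ', level=0):
    # 'nodes' is a list of (name, value) pairs;
    # 'value' is either a string or a list of child pairs
    lines = []
    pad = indent * level
    for name, value in nodes:
        if isinstance(value, list):
            # a node with children: opening tag, indented children, closing tag
            lines.append('%s<%s>' % (pad, name))
            lines.extend(to_xml_lines(value, indent, level + 1))
            lines.append('%s</%s>' % (pad, name))
        else:
            # a leaf node: the value is written as is (no XML escaping in this sketch)
            lines.append('%s<%s>%s</%s>' % (pad, name, value, name))
    return lines


# sample input: one object with ID '0' and three (section, property, value) lines
resp = [('0', [('1', '2', '3'), ('a', 'b', 'c'), ('su', 'bi', 'du')])]

tree = ('hash:objects',
    [('hash:object',
        [('hash:ID', ID),
         ('hash:lines',
            [('hash:line',
                [('hash:section', section),
                 ('hash:property', prop),
                 ('hash:value', value)]
            ) for (section, prop, value) in lines]
         )]
    ) for (ID, lines) in resp]
)

for line in to_xml_lines([tree]):
    print(line)
```

the sketch returns a list of lines instead of one string, and it does not
escape special characters such as '<' or '&' in the values.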
"""

class XMLTree:
    def __init__(self, from_node = None, from_string = '', from_tree = None, rewrite = {}, forget_namespace = False, xmlnode_class = None):
        """ Constructor of the XMLTree class

        XMLTree(from_node = None, from_string = '', from_tree = None, rewrite = {}, forget_namespace = False, xmlnode_class = None)

        'from_tree' could be a tree structure or an XMLTree object
        'from_string' could be an XML string
        'from_node' could be an XMLNode
        'rewrite' is a dictionary: if an XML node has a name which is a key in this dictionary,
            then it will be renamed to the value of that key
        'forget_namespace' is a boolean: if it is true, the XMLTree will not contain the namespace prefixes
        'xmlnode_class' is the class used to parse 'from_string'; if it is not given, arc.XMLNode is used

        'from_tree' has the highest priority: if it is given,
            then the other two are ignored.
        If 'from_tree' is not given but 'from_node' is, then 'from_string' is ignored.
        If only 'from_node' is given, then it will be the chosen one.
        In this case you may simply use:
            tree = XMLTree(node)
        """
        if from_tree:
            # if a tree structure is given, set the internal variable with it
            # if this is an XMLTree object, get just the data from it
            if isinstance(from_tree,XMLTree):
                self._data = from_tree._data
            else:
                self._data = from_tree
        else:
            if from_node:
                # if no from_tree is given, and we have an XMLNode, just use it
                x = from_node
            else:
                # if neither from_tree nor from_node is given, try to parse the string
                if not xmlnode_class:
                    from arc import XMLNode
                    xmlnode_class = XMLNode
                x = xmlnode_class(from_string)
            # set the internal tree structure to (<name of the root node>, <rest of the document>)
            # where <rest of the document> is a list of the child nodes of the root node
            self._data = (self._getname(x, rewrite, forget_namespace), self._dump(x, rewrite, forget_namespace))

    def _getname(self, node, rewrite = {}, forget_namespace = False):
        # gets the name of an XMLNode, with namespace prefix if it has one
        if not forget_namespace and node.Prefix():
            name = node.FullName()
        else: # and without namespace prefix if it has none or if prefixes are not wanted
            name = node.Name()
        return rewrite.get(name,name)

    def _dump(self, node, rewrite = {}, forget_namespace = False):
        # recursive method for converting an XMLNode to an XMLTree structure
        size = node.Size() # get the number of children of the node
        if size == 0: # if it has no child, get the string
            return str(node)
        children = [] # collect the children
        for i in range(size):
            children.append(node.Child(i))
        # call itself recursively for each child
        return [(self._getname(n, rewrite, forget_namespace), self._dump(n, rewrite, forget_namespace)) for n in children ]

    def add_to_node(self, node, path = None):
        """ Adds a tree structure to an XMLNode.

        add_to_node(node, path = None)

        'node' is the XMLNode we want to add to
        'path' selects the part of the XMLTree we want to add
        """
        # select the part we want
        data = self.get(path)
        # call the recursive helper method
        self._add_to_node(data, node)

    def _add_to_node(self, data, node):
        # recursively add the tree structure to the node
        for element in data:
            # we want to avoid empty tags in XML
            if element[0]:
                # for each child in the tree create a child in the XMLNode
                child_node = node.NewChild(element[0])
                # if the node has children:
                if isinstance(element[1],list):
                    self._add_to_node(element[1], child_node)
                else: # if it has no child, create a string from it
                    child_node.Set(str(element[1]))

    def pretty_xml(self, indent = ' ', path = None, prefix = ''):
        """ Returns the selected part of the XMLTree as an indented XML string.

        pretty_xml(indent = ' ', path = None, prefix = '')

        'indent' is the string repeated once for each nesting level
        'path' selects the part of the XMLTree we want to print
        'prefix' is put at the start of every line
        """
        data = self.get(path)
        return self._pretty_xml(data, indent, level = 0, prefix = prefix)

    def _pretty_xml(self, data, indent, level, prefix ):
        # recursively build the indented XML text, skipping nodes with an empty name
        out = []
        for element in data:
            if element[0]:
                if isinstance(element[1], list):
                    out.append(
                        prefix + indent * level + '<%s>\n' % element[0] +
                            self._pretty_xml(element[1], indent, level+1, prefix) + '\n' +
                        prefix + indent * level +'</%s>' % element[0]
                    )
                else:
                    out.append(prefix + indent * level + '<%s>%s</%s>' % (element[0], element[1], element[0]))
        return '\n'.join(out)

    def __str__(self):
        return str(self._data)

    def _traverse(self, path, data):
        # helper function for recursively traversing the tree
        # 'path' is a list of the node names, e.g. ['root','key1']
        # 'data' is the data of a tree node,
        # e.g. ('root', [('key1', 'value'), ('key2', 'value')])
        # the node matches if the first element of the path equals its name,
        #   or if that element is empty (an empty element matches every node name)
        # if it does not match, we are done here, return an empty list
        if path[0] != data[0] and path[0] != '':
            return []
        # if there are no more path elements, then we are done:
        # we have just found what we were looking for
        if len(path) == 1:
            return [data]
        # if there are more path elements, but this is a string node,
        # then no luck, we cannot proceed, return an empty list
        if isinstance(data[1],str):
            return []
        # if there are more path elements, and this node has children
        ret = []
        for d in data[1]:
            # recursively ask each child whether it has any matches
            # and collect the matches
            ret.extend( self._traverse(path[1:], d) )
        # return the matches
        return ret

    def get(self, path = None):
        """ Returns the parts of the XMLTree which match the path.

        get(path = None)

        if 'path' is not given, it defaults to the root node
        """
        if path: # if a path is given
            # it must start with a slash
            if not path.startswith('/'):
                raise Exception('invalid path (%s)' % path)
            # remove the starting slash
            path = path[1:]
            # split the path into a list of strings
            path = path.split('/')
        else: # if no path is given
            # set it to the root node
            path = [self._data[0]]
        # get the parts which are selected by this path
        return self._traverse(path, self._data)

    def get_trees(self, path = None):
        """ Returns an XMLTree object for each subtree which matches the path.

        get_trees(path = None)
        """
        # get the parts matching the path and convert them to XMLTree objects
        return [XMLTree(from_tree = t) for t in self.get(path)]

    def get_value(self, path = None, *args):
        """ Returns the value of the selected part.

        get_value(path = None, [default])

        Returns the value of the first node matching the path.
        This is one level deeper than the value returned by the 'get' method.
        If there is no such node, and a default is given,
        it will return the default.
        """
        try:
            # use the get method then get the value of the first result
            return self.get(path)[0][1]
        except Exception:
            # there was an error
            if args: # if any more arguments are given
                # the first will be the default
                return args[0]
            raise

    def add_tree(self, tree, path = None):
        """ Adds a new subtree to the first node matching the path.

        add_tree(tree, path = None)
        """
        # if this is a real XMLTree object, get just the data from it
        if isinstance(tree,XMLTree):
            tree = tree._data
        # get the first node selected by the path and append the new subtree to it
        self.get(path)[0][1].append(tree)

    def get_values(self, path = None):
        """ Gets all the values selected by a path.

        get_values(path = None)

        Like get_value, but gets all the values, not just the first.
        This has no default value; on any error it returns an empty list.
        """
        try:
            # get just the value of each node
            return [d[1] for d in self.get(path)]
        except Exception:
            return []

    def _dict(self, value, keys):
        # helper method for filtering and renaming keys
        if keys:
            # if keys is given, use only the keys which are in it
            # and translate them to new keys (the values of the 'keys' dictionary)
            return dict([(keys[k],v) for (k,v) in value if k in keys])
        else: # if keys is empty, use all the data
            return dict(value)

    def get_dict(self, path = None, keys = {}):
        """ Returns a dictionary from the first node the path matches.

        get_dict(path = None, keys = {})

        'keys' is a dictionary which filters and translates the keys
            e.g. if keys is {'hash:line':'line'}, it will only return
            the 'hash:line' nodes, and will call them 'line'
        """
        return self._dict(self.get_value(path,[]),keys)

    def get_dicts(self, path = None, keys = {}):
        """ Returns a list of dictionaries from all the nodes the path matches.

        get_dicts(path = None, keys = {})

        'keys' is a dictionary which filters and translates the keys
            e.g. if keys is {'hash:line':'line'}, it will only return
            the 'hash:line' nodes, and will call them 'line'
        """
        return [self._dict(v,keys) for v in self.get_values(path)]