XMLWalker and XMLSplitter

This tool walks through XML documents and stops at defined xpaths where you can execute your custom code and access the XML structure. One usage of the XMLWalker is the XMLSplitter component which is a memory efficient tool that allows to split huge XML files with low memory usage, whereas a DOM oriented approach (e.g. XSLT) uses around 10-20 times the size of the XML file in memory. Imagine a 10GB XML file, which cannot be handled in DOM fashion by most modern systems currently

Interest

To mark parts of a XML document that are relevant for you we have interests. An interest marks a path where a function given as parameter should be executed

Interest(<path>, <callback function>)

It can be registered to an FastXMLCallbackWalker

Paths

Paths are like xpaths, currently "/" is allowed to mark subnodes and "[]" to mark conditions on the nodes.

FastXMLCallbackWalker

The following code is an example for the usage of the FastXMLCallbackWalker and the registration of a function that triggers on the xpath /a/b:

def printme(**kwargs)
    print(kwargs["element"].tag)

from xml_walker.NodeInterest import Interest
walker = FastXMLCallbackWalker()
walker.register_interest(Interest("/a/b", printme))
walker.walk_tree("obj.xml")

The above code execute on the following file "obj.xml"

<a>
    <b>
        hello
    </b>
    <b>
        world
    </b>
</a>

would print "bb"

The callback function has the following kwargs (key-value arguments):

interest: This is the interest that triggered the function
walker: The walker where the interest is registered
element: The element where the interest matched

XMLWriter

XMLWriter is able to split a XML document into several documents by given paths. The paths are assigned to documents and written into the documents.

writer = XMLWriter()
writer.split("inputfile.xml", "outputfile", {"/*[0]/*[1]":{1,2}, "/*[0]/*[2]":{1,3}})

The above code will lead to three documents: "outputfile1.xml", "outputfile2.xml" "outputfile3.xml". The first document "outputfile1.xml" contains /*[0]/*[1] and /*[0]/*[2] of the source document, "outputfile2.xml" contains /*[0]/*[1] and "outputfile3.xml" *[0]/*[2].

The above code executed on the following file "inputfile.xml" would lead to

<a>
    <b>
        hello
    </b>
    <b>
        world
    </b>
    <b>
        again
    </b>
    <b>
        and again
    </b>
</a>

the following three output files:

outputfile1.xml

<a>
    <b>
        world
    </b>
    <b>
        again
    </b>    
</a>

outputfile2.xml

<a>
    <b>
        world
    </b>
</a>

outputfile3.xml

<a>
    <b>
        again
    </b>
</a>

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.idea		.idea
test		test
xml_splitter		xml_splitter
xml_walker		xml_walker
.gitignore		.gitignore
README.md		README.md
relation_genif2.xml		relation_genif2.xml
small_example.xml		small_example.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

XMLWalker and XMLSplitter

Interest

Paths

FastXMLCallbackWalker

XMLWriter

About

Releases

Packages

Languages

mschaffenroth/XMLsplit

Folders and files

Latest commit

History

Repository files navigation

XMLWalker and XMLSplitter

Interest

Paths

FastXMLCallbackWalker

XMLWriter

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages