Initialize!
I first read a config file to get the needed DB connection and directory parameters. I also read in optional pre- and post-stored-procedure parameters in case we decide to do some DB-side dirty work later.
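Here is a minimal sketch of that initialization step. It assumes a plain `java.util.Properties` file. The key names (`db.url`, `db.user`, `db.password`, `dir.download`, `proc.pre`, `proc.post`) and the `LoaderConfig` class are hypothetical and not taken from the original code. Required keys fail fast. The two stored-procedure keys are optional and come back as `null` when they are absent or blank.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Startup settings. Every key name below is hypothetical. */
class LoaderConfig {
    final String dbUrl;
    final String dbUser;
    final String dbPassword;
    final String downloadDir;
    final String preProc;  // optional; null when not configured
    final String postProc; // optional; null when not configured

    private LoaderConfig(Properties p) {
        dbUrl = required(p, "db.url");
        dbUser = required(p, "db.user");
        dbPassword = required(p, "db.password");
        downloadDir = required(p, "dir.download");
        preProc = optional(p, "proc.pre");
        postProc = optional(p, "proc.post");
    }

    /** Reads key=value pairs (java.util.Properties format) from the given source. */
    static LoaderConfig load(Reader source) throws IOException {
        Properties p = new Properties();
        p.load(source);
        return new LoaderConfig(p);
    }

    // Fail fast at startup if a mandatory setting is missing or blank.
    private static String required(Properties p, String key) {
        String v = optional(p, key);
        if (v == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        return v;
    }

    // Blank values are treated the same as missing ones.
    private static String optional(Properties p, String key) {
        String v = p.getProperty(key);
        return (v == null || v.trim().isEmpty()) ? null : v.trim();
    }

    public static void main(String[] args) throws IOException {
        String sample = "db.url=jdbc:example://dbhost/loader\n"
                + "db.user=loader\n"
                + "db.password=secret\n"
                + "dir.download=/data/xml\n"
                + "proc.pre=\n"; // blank means "no pre-load procedure"
        LoaderConfig cfg = load(new StringReader(sample));
        System.out.println("Download dir: " + cfg.downloadDir);
        System.out.println("Pre-load proc: " + cfg.preProc);
    }
}
```

In the real loader you would pass a reader over the actual file, for example `Files.newBufferedReader(Paths.get(...))`, instead of the inline sample string.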
Getting the XML file
A previous post had a nifty utility for getting files from a website using Perl. That was nice, but for this exercise we'll need to use Java so we can share common logging and classes with other Java apps.
Using the DB connection from the initialization step, I look up the URLs I need to pull. This saves me from having to change code later for hard-coded crap like file names (which are likely to change a lot).
Next, I loop through each file name and store the file. The example below shows the actual pull:
https://gist.github.com/730902
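The gist above holds the actual pull code. The hypothetical `XmlFetcher.fetch` below is an independent, rough sketch of the same idea using only the JDK. It streams one URL into a directory and names the local file after the last segment of the URL path. That naming rule is an assumption, since your file names might come from the DB instead. Because it relies on `URL.openStream()`, it handles both `http(s):` and `file:` URLs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class XmlFetcher {
    /**
     * Streams the resource at sourceUrl into targetDir, naming the local copy
     * after the last segment of the URL path, and returns the saved path.
     */
    static Path fetch(String sourceUrl, Path targetDir) throws IOException {
        URL url = URI.create(sourceUrl).toURL();
        String path = url.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (fileName.isEmpty()) {
            throw new IOException("No file name in URL: " + sourceUrl);
        }
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: in the real flow the URLs come from the DB lookup
        // and the directory comes from the config file.
        Path dir = Paths.get("downloads");
        for (String u : args) {
            System.out.println("Saved " + fetch(u, dir));
        }
    }
}
```

No timeouts or retries are configured here. A production version would more likely use an `HttpURLConnection` and call `setConnectTimeout`/`setReadTimeout` before reading.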
Parsing the XML file
You can't count on knowing what XML you'll get on each run: attributes get added, structures change, etc. The XML I'm dealing with doesn't even define any keys. So the first thing we need to do is define the parent/child relationships. If those don't match what we expect, we should fail or perform some error handling. Otherwise, we at least know that the data we are loading conforms to our structure.
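One way to enforce the expected parent/child relationships before loading anything is a recursive DOM check. This sketch rests on several assumptions. The `company` > `manager` > `employee` nesting is hypothetical, and so is the `StructureCheck` class. It also assumes the data lives in attributes, so any element not listed in the map is rejected.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class StructureCheck {
    // Hypothetical expected structure: the root element, plus a map from each
    // child element name to the only parent it may appear under.
    static final String ROOT = "company";
    static final Map<String, String> EXPECTED_PARENT =
            Map.of("manager", "company", "employee", "manager");

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Throws IllegalStateException on the first element that breaks the expected structure. */
    static void validate(Document doc) {
        Element root = doc.getDocumentElement();
        if (!ROOT.equals(root.getTagName())) {
            throw new IllegalStateException("Unexpected root <" + root.getTagName() + ">");
        }
        checkChildren(root);
    }

    private static void checkChildren(Element parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // ignore whitespace text, comments, etc.
            }
            Element child = (Element) n;
            String expected = EXPECTED_PARENT.get(child.getTagName());
            if (expected == null) {
                throw new IllegalStateException("Unknown element <" + child.getTagName() + ">");
            }
            if (!expected.equals(parent.getTagName())) {
                throw new IllegalStateException("<" + child.getTagName() + "> is under <"
                        + parent.getTagName() + ">, expected <" + expected + ">");
            }
            checkChildren(child);
        }
    }

    public static void main(String[] args) throws Exception {
        validate(parse("<company name='Acme'><manager id='m1'>"
                + "<employee id='e1'/></manager></company>"));
        System.out.println("Structure OK");
    }
}
```

Failing on the first mismatch keeps the rule simple: either the whole file conforms, or nothing is loaded and the exception message says where it broke.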
So, for mapping the XML to DB tables, think of it like this:
https://gist.github.com/730939
Not great, I know. It's better to have company_fk on the manager/employee tables than to have an org table. Managers and employees could also be stored in the same table. However, since we don't know what the XML will look like the next time we run this, the structure we create suffices for now. (After this process, I have a stored procedure that gets the data into the preferred format.)
So, how do I do this in Java?
https://gist.github.com/730985
This chapter makes more use of the TreeWalker methods than I do, and it really helped me out:
http://oreilly.com/catalog/jenut2/chapter/ch19.html
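To connect this to the TreeWalker approach in that chapter, here is a small sketch that is not the code from the gist. It uses `org.w3c.dom.traversal.TreeWalker` to visit every element in document order and turn each one into a pending row. The XML has no keys of its own, so the sketch assigns each element a hypothetical surrogate `id` and records its parent's `id`, which is what you would later write into a foreign-key column. The `Row` class, the table-per-element-name mapping, and the attribute-to-column mapping are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

class TreeWalkerRows {
    /** One pending insert: target table, surrogate key, parent key, column values. */
    static final class Row {
        final String table;
        final int id;
        final Integer parentId; // null for the root element
        final Map<String, String> columns;

        Row(String table, int id, Integer parentId, Map<String, String> columns) {
            this.table = table;
            this.id = id;
            this.parentId = parentId;
            this.columns = columns;
        }

        @Override
        public String toString() {
            return table + "#" + id + " parent=" + parentId + " " + columns;
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    static List<Row> collect(Document doc) {
        // The JDK's built-in DOM implementation also implements DocumentTraversal.
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        Map<Node, Integer> ids = new IdentityHashMap<>();
        List<Row> rows = new ArrayList<>();
        int nextId = 1;
        // Document order guarantees a parent is visited (and numbered) before its children.
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            Element e = (Element) n;
            int id = nextId++;
            ids.put(e, id);
            // The root's parent is the Document node, which has no id, so this yields null.
            Integer parentId = ids.get(e.getParentNode());
            Map<String, String> columns = new LinkedHashMap<>();
            NamedNodeMap attrs = e.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                columns.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
            }
            rows.add(new Row(e.getTagName(), id, parentId, columns));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<company name='Acme'><manager id='m1'>"
                + "<employee id='e1'/><employee id='e2'/></manager></company>");
        for (Row r : collect(doc)) {
            System.out.println(r);
        }
    }
}
```

Passing `NodeFilter.SHOW_ELEMENT` keeps whitespace text nodes out of the walk, so the loop only ever sees elements and the cast to `Element` is safe.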