The flat file has a line-at-a-time format with the first token on a line determining the type of the data on the line. The lines are in a hierarchical data structure, with various first-position tokens specifying the hierarchy.
I made an object that contained an XML::Writer object and a hash of anonymous subs, keyed by the first-position token. The code in each anonymous sub parsed a line of the flat file and then sent the data to XML::Writer to create the XML-formatted text. I used four types of calls to XML::Writer: emptyTag, startTag, endTag, and within_element.
The emptyTag calls were easiest. No hierarchy, just a single tag with parameters.
The startTag calls open up a section of hierarchy. This is also easy.
The endTag calls were slightly trickier. My code could detect where a piece of hierarchy was supposed to end, but it also had to know which closing tag was needed. For that, it used the within_element call to check whether a particular tag was currently open. This approach wouldn't work for multiple levels of hierarchy, but this format doesn't have that. Other tools with different formats may have this requirement, so this may need to be revisited someday.
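Here is a minimal sketch of the writer side as described above. The flat-file format (GROUP and ITEM tokens), the tag names, and the close_group helper are all made up for illustration and are not the real format or the real code. It also assumes an XML::Writer recent enough to accept a scalar reference for OUTPUT.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical flat file: the first token on each line names the record type.
my @lines = (
    'GROUP alpha',
    'ITEM 1 red',
    'ITEM 2 blue',
    'GROUP beta',
    'ITEM 3 green',
);

my $xml    = '';
my $writer = XML::Writer->new(OUTPUT => \$xml);

# Close the currently open <group>, if there is one.
sub close_group {
    $writer->endTag('group') if $writer->within_element('group');
}

# Dispatch table: first-position token => anonymous sub that handles the line.
my %handler = (
    GROUP => sub {
        my ($name) = @_;
        close_group();    # a new GROUP line ends the previous group
        $writer->startTag('group', name => $name);
    },
    ITEM => sub {
        my ($id, $color) = @_;
        $writer->emptyTag('item', id => $id, color => $color);
    },
);

$writer->startTag('file');
for my $line (@lines) {
    my ($token, @fields) = split ' ', $line;
    my $code = $handler{$token} or die "Unknown token '$token'\n";
    $code->(@fields);
}
close_group();            # end of input also ends the last group
$writer->endTag('file');
$writer->end;

print "$xml\n";
```

Because close_group only asks whether a group is open, it handles one level of nesting. Deeper hierarchy would need something like a stack of open tags.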
Any good translator should make a lossless round-trip with the data, unlike Babelfish. I used XML::Twig to process the data and recreate the flat file. I used a hash of TwigHandlers, which called separate subs for each type of tag. I noticed that there is symmetry between the parser and the writer, particularly in the code that has to read the flat file and understand the order of the fields. The same ordering is needed to take the XML field values and put them back into the flat file. I was not able to take advantage of this symmetry, so I ended up with code that I feel could be improved somehow. I also ended up with the fields being described in the module documentation, so now I have the order in three places instead of one. Darn!
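One way the field-order symmetry might be exploited is to keep the order in a single table that both directions consult. This is only a sketch of the idea, not what my code does. The %FIELDS table, the record layouts, and the parse_line and format_line helpers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record layouts: one ordered field list per first-position token.
my %FIELDS = (
    GROUP => [qw(name)],
    ITEM  => [qw(id color)],
);

# Flat-file line -> (token, hash of named fields)
sub parse_line {
    my ($line) = @_;
    my ($token, @values) = split ' ', $line;
    my $names = $FIELDS{$token} or die "Unknown token '$token'\n";
    my %record;
    @record{@$names} = @values;    # hash slice pairs names with values in order
    return ($token, \%record);
}

# (token, hash of named fields) -> flat-file line, using the same field order
sub format_line {
    my ($token, $record) = @_;
    my $names = $FIELDS{$token} or die "Unknown token '$token'\n";
    return join ' ', $token, @{$record}{@$names};
}

my ($token, $record) = parse_line('ITEM 7 red');
print format_line($token, $record), "\n";    # prints "ITEM 7 red"
```

The named fields in the hash could be passed straight to XML::Writer as attributes on the way out and filled from the XML attributes on the way back, so the order would live in one place.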
I used the XPath approach to parse the XML. I had the problem that the flat file data was not available until the closing tags were parsed, so things tended to come out in an order reminiscent of reverse Polish notation. I used some local variables to store things so that they could be written out in the correct order once the closing tag was detected. This is analogous to, and possibly symmetric with, the endTag manipulations in the XML writer. Once again, it will cause problems when deeper hierarchy is needed, and it is an opportunity to remove redundancy from the code.
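The ordering problem can be sketched like this. XML::Twig calls a handler only once the element's closing tag has been parsed, so the child items arrive before their enclosing group and have to be held until the group closes. The tag names, attributes, and flat-file layout here are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical XML, shaped like the output of a writer for a one-level format.
my $xml = '<file><group name="alpha">'
        . '<item id="1" color="red"/><item id="2" color="blue"/>'
        . '</group></file>';

my @flat;       # finished flat-file lines, in flat-file order
my @pending;    # ITEM lines seen before their enclosing <group> has closed

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called when each <item/> has been completely parsed.
        item => sub {
            my ($t, $elt) = @_;
            push @pending, join ' ', 'ITEM', $elt->att('id'), $elt->att('color');
        },
        # Called only at </group>, which is after all of its items.
        group => sub {
            my ($t, $elt) = @_;
            push @flat, 'GROUP ' . $elt->att('name'), @pending;
            @pending = ();
        },
    },
);
$twig->parse($xml);

print "$_\n" for @flat;
```

A single @pending buffer is enough for one level of hierarchy. Nested groups would need one buffer per open level.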
The biggest challenge in this project was determining the proper type of calls to use in XML::Twig. There are many to choose from! XML::Writer was much easier. This follows the general principle that it is easier to transmit than to receive.
New Modules and other activities
Installed Spreadsheet::WriteExcel with cpan.
Install okay.
Tried test program from previous version (0.39)
It broke compatibility with gnumeric, so I reported
the problem to jmcnamara with msg on perlmonks.
I hope he fixes it, I really like both WriteExcel
and gnumeric.
Installed Math::SnapTo with cpan
Install was okay, except I got an old version
so I reinstalled by hand, which worked fine.
Tried a bunch of test cases. I wouldn't use this
module; it seems to have many bugs.
Posted about the problems in a new snippet. Noted that the root cause of the rounding problems was typing in lots of digits of pi instead of using 4*atan2(1,1).
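A small sketch of the difference. The hand-typed constant below is hypothetical and is not the value from the module. It only shows how a typed-in pi can drift from the value the math library computes.

```perl
use strict;
use warnings;

# atan2(1,1) is pi/4, so this gives pi to full double precision.
my $pi_computed = 4 * atan2(1, 1);

# Hypothetical hand-typed constant (the 3589 should read 3589793...).
my $pi_typed = 3.1415926535898;

printf "computed:   %.15f\n", $pi_computed;
printf "typed:      %.15f\n", $pi_typed;
printf "difference: %.3e\n",  abs($pi_computed - $pi_typed);
```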
SAX (Score:2)
Not that you'd re-write all this now you have
Re:SAX (Score:1)
Perl & XML has a chapter on SAX. I'll start there, and I have ordered the New Riders book. Let me know if you have more recommendations for good SAX tutorials or documentation! Otherwise I'll just start slogging through XML::SAX::Intro and friends.
As I mentioned, I am particularly interested in exploiting the sy
It should work perfectly the first time! - toma
Re:SAX (Score:2)
However there's nothing stopping you unifying some of the code if it's relevant to do that. You could put functions that you would use for both reading and writing in a separate package.
A good example to look at for readers is Pod::SAX. Also check out XML::Generator::DBI. As far as writers go, there's not much detail on them. I tend t