These file sizes are spooky. I think of XML as one-big-happy-file-that-describes-a-thing. Perhaps "the-thing" is too complicated for a single file. If so, I will need a new approach. I may need to learn about namespaces or some other way to partition a large XML dataset.
I thought up a way to eliminate the redundancy in the XML readers and writers for my flat/lumpy files. I can describe each flat-file format in a small XML specification, and the redundant portions of the reader and writer can then be generated from that specification.
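Here is a minimal sketch of the idea, written in Python rather than Perl just to keep it short. Everything named in it is made up for illustration: the spec vocabulary (the flatfile and field elements, the record and delimiter attributes) and the functions load_spec, flat_to_xml, and xml_to_flat. It also assumes simple delimited records with no escaping, which my real files will not all satisfy.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec: one delimited flat-file format described in XML.
# The element and attribute names are invented for this sketch.
SPEC_XML = """
<flatfile name="parts" record="part" delimiter="|">
  <field name="id"/>
  <field name="description"/>
  <field name="quantity"/>
</flatfile>
"""

def load_spec(spec_xml):
    """Parse the XML spec into a plain dict shared by reader and writer."""
    root = ET.fromstring(spec_xml.strip())
    return {
        "root": root.get("name"),
        "record": root.get("record"),
        "delimiter": root.get("delimiter"),
        "fields": [f.get("name") for f in root.findall("field")],
    }

def flat_to_xml(spec, text):
    """Convert flat-file text to an XML string, one element per record."""
    root = ET.Element(spec["root"])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split(spec["delimiter"])
        if len(values) != len(spec["fields"]):
            raise ValueError("wrong number of fields: %r" % line)
        record = ET.SubElement(root, spec["record"])
        for name, value in zip(spec["fields"], values):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")

def xml_to_flat(spec, xml_text):
    """Convert the XML produced by flat_to_xml back to flat-file text."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.findall(spec["record"]):
        values = [record.findtext(name, default="") for name in spec["fields"]]
        lines.append(spec["delimiter"].join(values))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    spec = load_spec(SPEC_XML)
    xml_text = flat_to_xml(spec, "1|widget|10\n2|gadget|3\n")
    print(xml_text)
    print(xml_to_flat(spec, xml_text), end="")
```

The point of the design is that the field list lives in exactly one place. Adding another file format should mean writing another spec, not another reader and writer pair.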
It would be nice if someone had already written this. There are many tradeoffs in the design of such a thing, and I don't want to get bogged down in it. I will look at some of the SAX drivers for non-XML data sources.
I think removing reader and writer redundancy will be worthwhile, since I have at least a dozen and perhaps thirty of these file formats to translate to and from XML. As my buddy Steve says, "Make things that are the same the same and things that are different different."
One of the things I like about PerlMonks is that I get new ideas that have nothing to do with what I am working on. Today, for example, I downloaded, built, and ran TWiki. Suddenly I get it, and I hope to use TWiki for something useful and yet disruptive. At work there is a large dataset of free-text startup content, which is duct-taped to the side of an exquisitely normalized database. This text is the output of an extensive ongoing collaboration. It looks like a great opportunity for a wiki.
The main challenge will be scalability. I plan to evaluate whether TWiki can handle a dataset of that size within the next few months.
I am still trying to get new-user creation working in TWiki. I didn't have any email set up on the machine where I was running TWiki, and that seemed to be the problem. I got the email working, but I still have the same problem. In the process I rebuilt Perl 5.8.0 and updated a bunch of modules, as recommended by the output of a command in the cpan program.