The output of my program for modules is similar to the results of h2xs -aXn, but it does not create Makefile.PL or the rest of the nice framework created by h2xs.
I use my program in conjunction with h2xs, replacing the file module_name.pm with the output from my program.
I welcome comments on the code and ideas for enhancements.
I wrote these while procrastinating on developing a new module that I want to write. My shiny new templates will make this at least 1% easier.
I plan on doing schema development using the SQLite perl module, then porting the work to a big-iron server. I like to minimize my work on the big machine, except for developing queries. The nice part about using DBI is that I can develop the schema on SQLite and then port it easily. I'm looking forward to seeing how well this approach works!
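Here's a minimal sketch of the kind of portability I'm counting on, assuming the DBD::SQLite driver; the table is a hypothetical example:

use DBI;

# Develop the schema against a local SQLite file. When porting to
# the big-iron server, only the DSN (and driver) should change.
my $dbh = DBI->connect('dbi:SQLite:dbname=dev.db', '', '',
                       { RaiseError => 1 });

# A hypothetical table, for illustration only.
$dbh->do(<<'SQL');
CREATE TABLE parts (
    part_id INTEGER PRIMARY KEY,
    name    VARCHAR(64),
    value   VARCHAR(32)
)
SQL

$dbh->disconnect;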
It should work perfectly the first time! - toma
Previous measurements were thrown out due to pilot error, and the new results reveal that the speed of the two modules is nearly identical in my application.
I learned a lot in the process, and I will be writing more about this topic in the future.
The result was that XML::Twig ran 17 times faster, which surprised me.
The Dispatcher code was cleaner than the Twig code, because I was able to remove the code I had written to get my Twig return values to come out in the correct order. The order of the data from Dispatcher worked the way that I had originally hoped Twig would work.
The speed is a big deal for me, because the Twig code is actually already slower than I would like it to be. The Dispatcher code is probably not fast enough for my application. I'm tempted to write the code again and use a format other than XML to see how fast it runs.
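For the record, this is roughly how such a timing comparison can be done with the standard Benchmark module; parse_with_twig() and parse_with_dispatcher() are hypothetical stand-ins for the two implementations:

use Benchmark qw(cmpthese);

sub parse_with_twig       { }   # stand-in for the Twig version
sub parse_with_dispatcher { }   # stand-in for the Dispatcher version

# Run each parser 100 times and print a comparison table.
cmpthese(100, {
    twig       => sub { parse_with_twig('testdata.xml') },
    dispatcher => sub { parse_with_dispatcher('testdata.xml') },
});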
It would be nice to have a program that automatically measures the complexity of a perl program, so that I could compare the complexity of the implementations numerically.
If anyone wants to see the two approaches and the test data, let me know and I'll post it on tomacorp (We're not a corporation).
New Module Testing
I installed and tried PerlBean, which looks useful for automating the generation of perl objects. Before I use it in a real project, I need to understand whether there is a way to use it so that the classes can be redesigned without losing work.

The straightforward way looks like you would have to edit the class by hand after the initial run of the module, and if you want to run it again you would have to cut and paste the custom methods back in. Perhaps there is a way around this. PerlBean would make a good core for a perl IDE, I think.
I sent a bug report to the author of PerlBean. It looks like the tutorial didn't get an update after an API change.
The book is useful but does not provide much new info for me. It spells out clearly, with line-by-line code walkthroughs, a few things that are otherwise hard to figure out, though it still leaves a few gaps.
I liked the section on XML Schemas, since I didn't know anything about them. I can't say that the book helped with today's particular problem of interest.
Trying SAX Modules
Installed Pod::SAX with cpan. This has many dependencies, including XML::SAX::Writer. It installed okay, but the pod documentation for the functions is missing. This style of code doesn't look like the kind of code that I like to write.
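For reference, the event-callback style in question looks roughly like this minimal sketch (the handler class and file name are made up):

package MyHandler;
use base 'XML::SAX::Base';

# SAX pushes parse events at you; any state has to live in the
# handler object rather than in the flow of the code.
sub start_element {
    my ($self, $el) = @_;
    print "start: $el->{Name}\n";
}

sub characters {
    my ($self, $data) = @_;
    print "text: $data->{Data}\n";
}

package main;
use XML::SAX::ParserFactory;

my $parser = XML::SAX::ParserFactory->parser(Handler => MyHandler->new);
$parser->parse_uri('example.xml');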
Installed XML::Generator::DBI with cpan. It failed tests looking for DBD/Pg.pm; there is some manual configuration that needs to be done. I didn't pursue this further because I have no database on this machine. The code is interesting to read, though, both as an example of using XML::Handler::YAWriter and as a nifty, flexible DBI query.
Other activity
I installed psh and fooled around with it. It looks like fun, but possibly dangerous, since I don't know what I'm doing. My shell needs to be very reliable, e.g., when running rm commands.
I installed File::List and wrote an example program using File::Flat with it. I posted the snippet as an answer to a question at perlmonks.
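A sketch of how the two fit together; the paths here are hypothetical, and I'm assuming File::List's find() returns a listref and File::Flat's copy() creates any missing directories along the way:

use File::List;
use File::Flat;

# Walk a tree with File::List, then copy what we find with File::Flat.
my $search = File::List->new('/tmp/src');
my @files  = @{ $search->find('\.txt$') };

foreach my $file (@files) {
    (my $dest = $file) =~ s{^/tmp/src}{/tmp/backup};
    File::Flat->copy($file, $dest);
}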
These file sizes are spooky. I think of XML as one-big-happy-file-that-describes-a-thing. Perhaps "the-thing" is too complicated for a single file. If so, I will need a new approach. I may need to learn about namespaces or some other way to partition a large XML dataset.
I thought up a way to eliminate the redundancy in the XML reader/writer for my flat/lumpy files. I can have a data structure that specifies the flat file in XML. Redundant portions of the XML reader and writer can be generated from this file.
It would be nice if someone had already written this. There are many tradeoffs in the design of such a thing, and I don't want to get bogged down in it. I will look at some of the SAX drivers for non-XML data sources.
I think removing reader and writer redundancy will be worthwhile, since I have at least a dozen and perhaps thirty of these file formats to translate to and from XML. As my buddy Steve says, "Make things that are the same the same and things that are different different."
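To make the idea concrete, here is a rough sketch with hypothetical tokens and fields: the spec is a single data structure, and both directions walk it, so the field order lives in exactly one place.

# One spec drives both reader and writer. Each first-position
# token maps to its field names, in order.
my %spec = (
    NET => [qw(name width layer)],
    VIA => [qw(x y drill)],
);

# Flat line -> token plus hash of named fields.
sub parse_line {
    my ($line) = @_;
    my ($token, @values) = split ' ', $line;
    my %fields;
    @fields{ @{ $spec{$token} } } = @values;
    return ($token, \%fields);
}

# Token plus hash of named fields -> flat line, same ordering.
sub emit_line {
    my ($token, $fields) = @_;
    return join ' ', $token, @{$fields}{ @{ $spec{$token} } };
}

my ($token, $fields) = parse_line('NET signal1 8 top');
print emit_line($token, $fields), "\n";    # round-trips the line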
TWiki
One of the things I like about PerlMonks is that I get new ideas that have nothing to do with what I am working on. Today, for example, I downloaded, built, and ran TWiki. Suddenly I get it, and I hope that I will be using TWiki for something that will be useful and yet disruptive. At work there is a large dataset of free-text startup content, which is duct-taped to the side of an exquisitely normalized database. This text is the output from an extensive ongoing collaboration. It looks like a great opportunity for a wiki.
The main challenge will be scalability. I plan on evaluating this within the next few months.
New Modules
I am still trying to get TWiki working for creating new users. I didn't have any email set up on the machine where I was running TWiki, and that seemed to be a problem. I got the email working, but I still have the same problem. I rebuilt perl 5.8.0 in the process, and updated a bunch of modules as recommended by the results of running the r command in the cpan program.
The flat file has a line-at-a-time format, with the first token on a line determining the type of the data on that line. The lines form a hierarchical data structure, with various first-position tokens specifying the hierarchy.
I made an object that contained an XML::Writer object and a hash of anonymous subs, where the hash key was the first-position token. The code in the anonymous subs parsed a line of the flat file and then sent the data to XML::Writer to create the XML-formatted text. I used four types of calls to XML::Writer: emptyTag, startTag, endTag, and within_element.
The emptyTag calls were easiest. No hierarchy, just a single tag with parameters.
The startTag calls open up a section of hierarchy. This is also easy.
The endTag calls were slightly trickier. My code could detect where a piece of hierarchy was supposed to end; to figure out which closing tag was needed, a within_element call detects whether a particular tag has been opened. This approach wouldn't work for multiple levels of hierarchy, but this format doesn't have that. Other tools with different formats may have this requirement, so this may need to be revisited someday.
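Boiled down, the writer side looks something like this sketch; the tokens, tags, and fields are hypothetical, and the real code has more handlers:

use XML::Writer;

my $writer = XML::Writer->new(OUTPUT => \*STDOUT,
                              DATA_MODE => 1, DATA_INDENT => 2);
$writer->startTag('TRACES');

# Dispatch table: the first token on a line picks the handler.
my %handler = (
    # Self-contained record: a single emptyTag, no hierarchy.
    VIA => sub {
        my ($x, $y) = @_;
        $writer->emptyTag('VIA', x => $x, y => $y);
    },
    # Opens a section of hierarchy; close the previous one first.
    NET => sub {
        my ($name) = @_;
        $writer->endTag('NET') if $writer->within_element('NET');
        $writer->startTag('NET', name => $name);
    },
);

while (my $line = <>) {
    my ($token, @fields) = split ' ', $line;
    $handler{$token}->(@fields) if $handler{$token};
}

$writer->endTag('NET') if $writer->within_element('NET');
$writer->endTag('TRACES');
$writer->end;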
Any good translator should make a lossless round-trip with the data, unlike babelfish. I used XML::Twig to process the XML and recreate the flat file, with a hash of TwigHandlers that called separate subs for each type of tag. I noticed that there is symmetry between the parser and the writer, particularly in the code that has to read the flat file and understand the order of the fields; the same ordering is needed to take the XML field values and put them back into the flat file. I was not able to take advantage of this symmetry, so I ended up with code that I feel could be improved somehow. I also ended up with the fields being described in the module documentation, so now I have the order in three places instead of one. Darn!
I used the XPath approach to parse the XML. I had the problem that the flat file data was not available until the closing tags were parsed, so things tended to come out in an order reminiscent of reverse polish notation. I used some local variables to store things so that they could be written out in the correct order once the closing tag was detected. This is analogous and possibly symmetric with the endTag manipulations in the XML writer. Once again, it will cause problems when deeper hierarchy is needed and is an opportunity for removal of redundancy in the code.
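The reader side, as a matching sketch with the same hypothetical tags: since a handler fires when its closing tag is parsed, child elements complete before their parent, so the child lines get buffered and flushed when the parent's handler runs.

use XML::Twig;

my @pending;    # flat-file lines for the current NET's children

my $twig = XML::Twig->new(
    twig_handlers => {
        VIA => sub {
            my ($t, $elt) = @_;
            push @pending, join ' ', 'VIA', $elt->att('x'), $elt->att('y');
        },
        NET => sub {
            my ($t, $elt) = @_;
            # The NET line goes first, then its buffered children.
            print join(' ', 'NET', $elt->att('name')), "\n";
            print "$_\n" for @pending;
            @pending = ();
            $t->purge;    # free the elements we have handled
        },
    },
);
$twig->parsefile('traces_small.xml');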
The biggest challenge in this project was determining the proper type of calls to use in XML::Twig. There are many to choose from! XML::Writer was much easier. This follows the general principle that it is easier to transmit than to receive.
New Modules and other activities
Installed Spreadsheet::WriteExcel with cpan. The install was okay. I tried my test program from the previous version (0.39); the new version broke compatibility with gnumeric, so I reported the problem to jmcnamara with a msg on perlmonks. I hope he fixes it; I really like both WriteExcel and gnumeric.
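A minimal test of the sort I mean looks like this (the file name and cell contents are arbitrary); write the file, then open it in gnumeric:

use Spreadsheet::WriteExcel;

# Write a tiny workbook, then open test.xls in gnumeric to check
# compatibility.
my $workbook  = Spreadsheet::WriteExcel->new('test.xls');
my $worksheet = $workbook->add_worksheet();

$worksheet->write(0, 0, 'Hello');
$worksheet->write(1, 0, 3.14159);

$workbook->close();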
Installed Math::SnapTo with cpan. The install was okay, except that I got an old version, so I reinstalled by hand, which worked fine. I tried a bunch of test cases; I wouldn't use this module - it seems to have many bugs.
Posted on the problems with a new snippet. Noted that the root cause of the rounding problems was typing lots of digits of pi instead of using 4*atan2(1,1).
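The idiom, for reference:

# Let the math library supply pi at full machine precision,
# instead of a hand-typed constant that may be short a digit.
my $pi = 4 * atan2(1, 1);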
I've been using XML as a file format to represent CAD data. I translate a proprietary CAD vendor format into XML, do something to the data, then write it back out in the proprietary format. I want to make sure that if I happen to reorder the XML tags in this process, I don't create invalid data when I write it back out, because the proprietary format is order-dependent.
I can imagine a few ways to handle this, and as usual there is a speed/memory/program-complexity tradeoff.
I am using the most excellent XML::Twig module, with the online tutorial and the O'Reilly book, Perl & XML. One thing the docs are a little thin on is examples of using the XPath capabilities of XML::Twig. Here is an example that I made:
use XML::Twig;

my $fn = 'traces_small.xml';
my ($tag, $att, $value) = ('NET', 'name', '/P5C');

my @example;
push @example, sprintf('TRACES/STFIRST');
push @example, sprintf('%s', $tag);
push @example, sprintf('%s[@%s]', $tag, $att);
push @example, sprintf('%s[@%s="%s"]', $tag, $att, $value);

foreach my $pattern (@example)
{
    print "Matching XPath expression $pattern\n";
    print "TwigRoots\n";
    my $xml = XML::Twig->new(
        TwigRoots     => { $pattern => 1 },
        error_context => 1,
    );
    $xml->set_pretty_print('indented');
    $xml->parsefile($fn);
    $xml->print;
    print "--------------------\n";
}
foreach my $pattern (@example)
{
    print "Matching XPath expression $pattern\n";
    print "start_tag_handlers, original_string\n";
    my $xml = XML::Twig->new(
        start_tag_handlers => {
            $pattern => sub { print $_[0]->original_string, "\n" },
        },
        error_context => 1,
    );
    $xml->parsefile($fn);
    print "--------------------\n";
}
Now I want to make a module to create my CAD data in a certain order, independent of the order of my input data.
Approach 1
Put tags on the data that say what order they should be in.

I don't like this approach because I want my XML format to work for different CAD tools that have different requirements for the order of their data. One of the main purposes of the XML format is to have it be CAD tool independent. So order properties are right out.
Approach 2
One pass per section

In this approach, I would parse the XML file as many times as needed, each time printing only the next section of the CAD data. For example, if it were HTML, I might have one pass for the header and one pass for the body. This is easy to code and takes a minimal amount of memory, but it is CPU intensive.
Approach 3
One pass, store data in an array

Here I would store the data into elements of an array. At the end of parsing, the array would be printed out to the file, and the order of the elements in the array would take care of the ordering of the output file. In the HTML example, $array[0] would hold the header and $array[1] would hold the body. This approach is memory intensive, since I have to store all the CAD data in an array.
Approach 4
One pass, store the data in an array of files

This approach is like the array, except each element of the array is an open file handle, and the data gets printed to the different files. At the end of the processing, the files are appended to each other. This approach is file-handle intensive, and is probably not a good idea when there are, say, 100 or so file handles open at a time. This type of approach tends to run into the kernel limit on the maximum number of open files for a process.
Since the trend these days is to throw RAM at problems, I'll try approach 3 first, and perhaps have an option in my code to use approach 4 or possibly 2.
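A sketch of approach 3 with XML::Twig, using hypothetical section tags: each handler appends its element's text to its section's accumulator, and a fixed list of section names controls the output order, independent of the input order.

use XML::Twig;

my @order   = qw(HEADER NETS TRACES);      # required output order
my %section = map { $_ => '' } @order;     # accumulated text per section

my $twig = XML::Twig->new(
    twig_handlers => {
        HEADER => sub { $section{HEADER} .= $_[1]->sprint . "\n"; $_[0]->purge },
        NETS   => sub { $section{NETS}   .= $_[1]->sprint . "\n"; $_[0]->purge },
        TRACES => sub { $section{TRACES} .= $_[1]->sprint . "\n"; $_[0]->purge },
    },
);
$twig->parsefile('traces_small.xml');

# Memory-hungry, but the order of @order is guaranteed.
print $section{$_} for @order;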
I've often posted on perlmonks, but I haven't revealed much of what I actually do with perl. So a journal should be handy. I tend to write journal fragments throughout my Mandrake 8.2 system, recording my adventures in coding. I typically investigate about five perl modules per month, and one major open-source package. I'm going to take a shot at recording my adventures here.
Other activity can be found on tomacorp (We're not a corporation), which is mine.
At work I use perl quite a bit. I'm back in the CAD business, trying to make life better and more productive for a few hundred electrical engineers. I have switched between designing hardware and software over the past twenty-some years, and I have recently switched back to software.
At home I use perl as a hobby and I have also been teaching it as a high school course to a very small class.