toma's Journal
http://use.perl.org/~toma/journal/
toma's use Perl Journal
use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners.
2012-01-25T02:38:54+00:00
Templates for programs and modules
http://use.perl.org/~toma/journal/10095?from=rss
I have released two new perl template generators.
These programs create the basic structure
of a new module or program.
This saves typing and makes it easier
to get started.
<p>
The program for modules is similar to the results
of <code>h2xs -aXn</code>, but it does not create
Makefile.PL or the rest of the nice framework
created by h2xs.
</p><p>
I use my program in conjunction with h2xs, replacing
the file module_name.pm with the output from my program.
</p><ul>
<li> <a href="http://tomacorp.com/perl/template.html">The Program Template</a> </li>
<li> <a href="http://tomacorp.com/perl/template_pm.html">The Module Template</a> </li>
</ul><p>
I welcome comments on the code and ideas for enhancements.
</p><p>
I wrote these while procrastinating on developing
a new module. I want to:
</p><ul>
<li>Improve my SQL by practicing on SQLite.</li>
<li>Translate some of the XML structures
that I have been working on into an SQL schema.</li>
<li>Compare the approaches
of using XPath queries on an XML dataset with
a relational database approach.</li>
</ul><p>
My shiny new templates will make this at
least 1% easier :-).
</p><p>
I plan on doing schema development using the SQLite
perl module, then porting the work to a big-iron
server. I like to minimize my work on the big
machine, except for developing queries. The
nice part about using DBI will be that I can
develop the schema on SQLite, and then port
easily. I'm looking forward to seeing how well
this approach will work!
</p><p>
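The DBI side of this plan can be sketched roughly as follows. This is a minimal illustration, assuming DBI and DBD::SQLite are installed; the <code>net</code> table and its columns are invented for the example. Only the DSN string names SQLite, which is what should make the port easy, at least as far as the SQL itself stays generic.</p>

```perl
use strict;
use warnings;
use DBI;

# Only the DSN is SQLite-specific; the table and column names are invented.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Plain SQL that most database servers should accept unchanged.
$dbh->do('CREATE TABLE net (id INTEGER PRIMARY KEY, name VARCHAR(64) NOT NULL)');

# Placeholders keep the quoting portable across drivers.
my $sth = $dbh->prepare('INSERT INTO net (name) VALUES (?)');
$sth->execute($_) for qw(/P5C GND);

my $names = $dbh->selectcol_arrayref('SELECT name FROM net ORDER BY name');
print "$_\n" for @$names;

$dbh->disconnect;
```

<p>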
<i>It should work perfectly the first time! - toma</i></p>
toma, 2003-01-21T07:57:30+00:00, journal

XML::Twig and XML::Filter::Dispatcher tie in speed
http://use.perl.org/~toma/journal/10009?from=rss
<a href="http://tomacorp.com/perl/xml/saxvstwig.html">
Performance Comparison Between SAX XML::Filter::Dispatcher and XML::Twig
</a> with test data is available.<p>
Previous measurements were thrown out due to
pilot error, and the new results reveal that
the speed of the two modules is
nearly identical in my application.</p><p>
I learned a lot in the process, and I will be
writing more about this topic in the future.</p>
toma, 2003-01-16T08:34:34+00:00, journal

Comparing XML::Twig and XML::Filter::Dispatcher
http://use.perl.org/~toma/journal/9920?from=rss
<b>Comparing Twig and Dispatcher</b> <br>
I rewrote my XML::Twig program
to use XML::Filter::Dispatcher
in order to compare the approaches.
I compared the simplicity of the code
necessary to do the job,
and the speed of execution.<p>
The result was that XML::Twig ran 17 times faster,
which surprised me.</p><p>
The Dispatcher code was cleaner
than the Twig code. This is because I
was able to remove the code I wrote to get
my Twig return values to come out in the
correct order. The order of the data from
Dispatcher worked the way that I had
originally hoped that Twig would work.</p><p>
The speed is a big deal for me,
because the Twig code is actually already
slower than I would like it to be. The
Dispatcher code is probably
not fast enough for my application.
I'm tempted to write the code again and
use a format other than XML to see
how fast it runs.</p><p>
It would be nice if I had a program that
would automatically measure the complexity
of a perl program. I would like to be
able to compare the complexity of the
implementations with a numerical technique.</p><p>
If anyone wants to see the two approaches
and the test data, let me know and
I'll post it on <a href="http://tomacorp.com/">
tomacorp (We're not a corporation)</a>.</p><p>
<b>New Module Testing</b> <br>
I installed and tried PerlBean, which looks
useful for automating the generation of
perl objects. Before I use it in a real
project, I need to understand if there is
a way to use it so that the classes can be
redesigned without losing work.
The straightforward way looks like you would
have to edit the class by hand after the
initial run of the module, and if you want
to run it again you would have to cut and
paste the custom methods in again.</p><p>
Perhaps there is a way around this. PerlBean
would make a good core for a perl IDE,
I think.</p><p>
I sent a bug report to the author of
PerlBean. It looks like the tutorial didn't
get an update after an API change.</p>
toma, 2003-01-13T05:12:25+00:00, journal

A new XML book and more XML modules
http://use.perl.org/~toma/journal/9761?from=rss
<b>XML and Perl</b> <br>
I received and read part of "XML and Perl" (New Riders).
This isn't a book review; I leave that to people who
understand the subject better than I do. These
are just my notes!<p>
The book is useful but does not provide much new info for me.
It spells a few things out clearly that are otherwise
hard to figure out, with line-by-line code walkthrough.
</p><p>
Here are a few of the gaps:
</p><ol>
<li>It says that XSLT can be used to transform
XML into CSV and other non-tagged formats,
but the example code just shows the usual
XML to HTML translation.</li>
<li>The SAX code has a bunch of case statements in the
handlers for which kind of tag is being processed.
I would prefer to see this coded another
way with SAX. I try to avoid big case statements.
</li></ol><p>
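One way to avoid the case statements in the second item is a dispatch table keyed by element name. This is a minimal sketch, assuming XML::SAX is installed; the <code>NetCounter</code> class and the element names are invented for illustration and are not taken from the book.</p>

```perl
use strict;
use warnings;

package NetCounter;
use base 'XML::SAX::Base';

# Dispatch table keyed by element name, instead of a case statement
# inside start_element. The element names are invented.
my %on_start = (
    NET => sub { my ($self, $el) = @_; $self->{nets}++ },
    SEG => sub { my ($self, $el) = @_; $self->{segs}++ },
);

sub start_element {
    my ($self, $el) = @_;
    my $code = $on_start{ $el->{Name} };
    $code->($self, $el) if $code;          # elements without a handler are ignored
    $self->SUPER::start_element($el);      # pass the event on down the SAX chain
}

package main;
use XML::SAX::ParserFactory;

my $handler = NetCounter->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string('<TRACES><NET><SEG/><SEG/></NET><NET/></TRACES>');

print "nets=$handler->{nets} segs=$handler->{segs}\n";
```

<p>Supporting a new tag then means adding one entry to the table rather than another branch in the handler.</p><p>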
I liked the section on XML Schemas,
since I didn't know anything about them.
I can't say that the book helped with today's particular
problem of interest.</p><p>
<b>Trying SAX Modules</b> <br>
Installed Pod::SAX with cpan. This has many dependencies,
including XML::SAX::Writer. Installed okay. Pod documentation
for the functions is missing.
This style of code doesn't look like the kind of code
that I like to write.</p><p>
Installed XML::Generator::DBI with cpan.
It failed tests looking for DBD/Pg.pm; there
is some manual configuration that needs to
be done. I didn't pursue this further because
I have no database on this machine. The code
is interesting to read though, both as an example
of using XML::Handler::YAWriter and a nifty
flexible DBI query.</p><p>
<b>Other activity</b> <br>
I installed psh and fooled around with it.
It looks like fun, but possibly dangerous
since I don't know what I'm doing.
My shell needs to be very reliable, e.g. for rm commands.
</p><p>
I installed File::List, and wrote an example
program using File::Flat with it.
I posted this snippet as an answer to a question at
perlmonks.</p>
toma, 2003-01-04T18:41:12+00:00, journal

XML: SAX and Twig, also TWiki
http://use.perl.org/~toma/journal/9703?from=rss
<b>XML: SAX and Twig</b> <br>
I have been reading about XML::SAX, and it is
starting to make sense. I am concerned about
memory usage as compared to XML::Twig. I need
to process an XML file that is at least 10MB
and possibly as large as 100MB. I would like
to limit the RAM usage to less than 256MB,
although 512MB might be okay.
I have a hard limit of 2GB of RAM,
since I am using 32 bit perl.
It looks like XML::Twig can be set up to work
with SAX, so this might help to solve
the memory problem.<p>
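Separately from the SAX route, XML::Twig can also keep memory bounded on its own by handling one element at a time and purging what has already been processed. This is a minimal sketch of that technique, not of the SAX setup; the element and attribute names are invented, and a short string stands in for the large file.</p>

```perl
use strict;
use warnings;
use XML::Twig;

# Tiny stand-in for a large file; element and attribute names are invented.
my $xml = <<'XML';
<TRACES>
  <NET name="/P5C"><SEG len="10"/><SEG len="5"/></NET>
  <NET name="GND"><SEG len="7"/></NET>
</TRACES>
XML

my %length;    # total segment length per net

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each NET element has been completely parsed.
        NET => sub {
            my ($t, $net) = @_;
            $length{ $net->att('name') } += $_->att('len')
                for $net->children('SEG');
            $t->purge;    # release everything parsed so far
        },
    },
);
$twig->parse($xml);    # parsefile($path) would be used for a real file

print "$_: $length{$_}\n" for sort keys %length;
```

<p>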
These file sizes are spooky.
I think of XML as
one-big-happy-file-that-describes-a-thing.
Perhaps "the-thing" is too complicated
for a single file.
If so, I will need a new approach. I may need
to learn about namespaces or some other
way to partition a large XML dataset.</p><p>
I thought up a way to eliminate the redundancy
in the XML reader/writer for my flat/lumpy
files. I can have a data structure that
specifies the flat file in XML.
Redundant portions of the XML reader and writer
can be generated from this file.</p><p>
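The idea can be sketched roughly like this. It is only an illustration under assumptions: the spec is a plain Perl hash rather than XML, and the <code>PIN</code> and <code>VIA</code> tokens, the field names, and the <code>parse_line</code> and <code>format_line</code> helpers are all hypothetical. The point is that the field order is written down once and used by both directions.</p>

```perl
use strict;
use warnings;

# Hypothetical record spec: first token => ordered field names.
# This is the single place where field order is written down.
my %spec = (
    PIN => [qw(ref number)],
    VIA => [qw(x y drill)],
);

# Reader: one flat-file line -> (token, hash of named fields).
sub parse_line {
    my ($line) = @_;
    my ($token, @values) = split ' ', $line;
    my $names = $spec{$token} or die "Unknown token '$token'\n";
    my %field;
    @field{@$names} = @values;
    return ($token, \%field);
}

# Writer: (token, hash of named fields) -> one flat-file line.
sub format_line {
    my ($token, $field) = @_;
    my $names = $spec{$token} or die "Unknown token '$token'\n";
    return join ' ', $token, @{$field}{@$names};
}

my ($token, $field) = parse_line('VIA 100 250 12');
print format_line($token, $field), "\n";   # round trip through named fields
```

<p>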
It would be nice if someone had already
written this. There are
many tradeoffs in the design of such a thing,
and I don't want to get bogged down in it.
I will look at some of the SAX drivers for
non-XML data sources.</p><p>
I think removing reader and writer
redundancy will be worthwhile,
since I have at least a dozen and perhaps
thirty of these file formats to translate
to and from XML.
As my buddy Steve says,
"Make things that are the same the same and
things that are different different."</p><p>
<b>TWiki</b> <br>
One of the things I like about
<a href="http://www.perlmonks.org/">PerlMonks</a>
is that I get new ideas that have nothing to do with what
I am working on. Today, for example, I
downloaded, built, and ran TWiki. Suddenly I
<i>get it</i> and I hope that I will be using
TWiki for something that will be useful and yet
disruptive. At work there is a large dataset
of free-text startup content,
which is duct-taped to the side
of an exquisitely normalized database.
This text is the output from an extensive
ongoing collaboration. It looks like a great
opportunity for a wiki.</p><p>
The main challenge will be scalability. I plan
on evaluating this within the next few months.</p><p>
<b>New Modules</b> <br>
I am still trying to get TWiki working for
creating new users. I didn't have any email set
up on the machine where I was running TWiki,
and that seemed to be a problem. I got the email
working, but I still have the same problem.
I rebuilt perl 5.8.0 in the process, and updated
a bunch of modules as recommended by the
results of running the <code>r</code>
command in the cpan program.</p>
toma, 2003-01-01T10:10:16+00:00, journal

Flat file to XML round trip via XML::Writer and XML::Twig
http://use.perl.org/~toma/journal/9634?from=rss
I wrote a translator (described yesterday)
that converts a particular
flat file format to XML, and another from the
XML format back into the flat file.<p>
The flat file has a line-at-a-time format with
the first token on a line determining the type
of the data on the line. The lines are in a
hierarchical data structure, with various
first-position tokens specifying the hierarchy.
</p><p>
I made an object that contained an XML::Writer
object and a hash of anonymous subs, where the
key to the hash is the first-position token.
The code in the anonymous subs parsed the line
of the flat file, and then sent this data to
XML::Writer to create the XML formatted text.
I used four types of calls to XML::Writer:
emptyTag, startTag, endTag, and within_element.</p><p>
The emptyTag calls were easiest. No hierarchy,
just a single tag with parameters.</p><p>
The startTag calls open up a section of hierarchy.
This is also easy.</p><p>
The endTag calls were slightly trickier. My code
could detect where a piece of hierarchy was
supposed to end. To remember what kind of closing
tag is needed, the within_element call detects
if a particular tag has been opened. This
approach wouldn't work for multiple levels of
hierarchy, but this format doesn't have that.
Other tools with different formats may have
this requirement, so this may need to be
revisited someday.</p><p>
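The writer described above can be sketched roughly as follows. This is an illustration under assumptions, not the real translator: the <code>UNITS</code>, <code>NET</code>, and <code>PIN</code> tokens, the attribute names, and the <code>flat_to_xml</code> helper are made up, and capturing output through a string reference assumes a reasonably recent XML::Writer.</p>

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical flat-file format (not the real CAD format):
#   UNITS <name>        -> empty tag
#   NET <name>          -> opens a NET section, closing any open one
#   PIN <ref> <number>  -> empty tag inside the current NET
sub flat_to_xml {
    my (@lines) = @_;
    my $xml = '';
    my $w = XML::Writer->new(OUTPUT => \$xml);   # capture output in $xml

    # Close the current NET section, if one is open.
    my $close_net = sub { $w->endTag('NET') if $w->within_element('NET') };

    # Dispatch table keyed by the first token on the line.
    my %handler = (
        UNITS => sub { $close_net->(); $w->emptyTag('UNITS', name => $_[0]) },
        NET   => sub { $close_net->(); $w->startTag('NET', name => $_[0]) },
        PIN   => sub { $w->emptyTag('PIN', ref => $_[0], number => $_[1]) },
    );

    $w->startTag('DESIGN');
    for my $line (@lines) {
        my ($token, @fields) = split ' ', $line;
        next unless defined $token;              # skip blank lines
        my $code = $handler{$token} or die "Unknown token '$token'\n";
        $code->(@fields);
    }
    $close_net->();
    $w->endTag('DESIGN');
    $w->end;
    return $xml;
}

print flat_to_xml('UNITS mil', 'NET /P5C', 'PIN U1 3', 'NET GND', 'PIN U1 7'), "\n";
```

<p>As in the text, the single <code>within_element</code> check only copes with one level of hierarchy; deeper nesting would need a stack of open sections.</p><p>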
Any good translator should make a lossless
round-trip with the data, unlike babelfish.
I used XML::Twig to process the data and recreate
the flat file. I used a hash of TwigHandlers,
which called separate subs for each type of
tag. I noticed that there is symmetry in the
code with the parser and the writer of the
data, particularly in the code that has to
read the flat file and understand the order of
the fields. This same ordering is needed to
take the XML field values and put them into
the flat file. I was not able to take advantage
of this symmetry, so I ended up with code that
I feel could be improved somehow. I also ended
up with the fields being described in the
module documentation, so now I have the order
in three places instead of one. Darn!</p><p>
I used the XPath approach to parse the XML.
I had the problem that the flat file data was
not available until the closing tags were
parsed, so things tended to come out in an order
reminiscent of reverse polish notation. I used
some local variables to store things so that
they could be written out in the correct order
once the closing tag was detected. This is
analogous and possibly symmetric with the endTag
manipulations in the XML writer. Once again,
it will cause problems when deeper hierarchy is
needed and is an opportunity for removal of
redundancy in the code.</p><p>
The biggest challenge in this project was
determining the proper type of calls to use
in XML::Twig. There are many to choose from!
XML::Writer was much easier. This follows
the general principle that it is easier to
transmit than to receive.</p><p>
<b>New Modules and other activities</b> <br>
Installed Spreadsheet::WriteExcel with cpan.<br>
Install okay.<br>
Tried the test program from the previous version (0.39).
The new release broke compatibility with gnumeric, so I reported
the problem to jmcnamara with a msg on perlmonks.
I hope he fixes it; I really like both WriteExcel
and gnumeric.</p><p>
Installed Math::SnapTo with cpan.<br>
Install was okay, except I got an old version,
so I reinstalled by hand, which worked fine.<br>
Tried a bunch of test cases. I wouldn't use this
module; it seems to have many bugs.</p><p>
Posted on problems with a new snippet. Noted that
the root cause of the rounding problems was
typing lots of digits of pi instead of
using 4*atan2(1,1).</p>
toma, 2002-12-26T21:32:57+00:00, journal

Order of tags in XML files
http://use.perl.org/~toma/journal/9629?from=rss
One of the things I look for in an application that
stores its data in XML is whether or not the application
fails if the ordering of the tags in the XML is changed.
Some applications that use XML format don't seem
to even look at the values inside the tags. They
just throw out everything between &lt; and &gt; and
instead depend on the order in the file to
determine the meaning of the data. I sure don't
want this to happen to any of my applications!<p>
I've been using XML as a file format to represent
CAD data. I translate from a proprietary format
from a CAD vendor into XML, do something to the
data, then write it back out in the proprietary
format. I want to make sure that if I happen to
reorder the XML tags in this process that I don't
create invalid data when I write it back out,
because the proprietary format has order-dependency.</p><p>
I can imagine a few ways to handle this, and as
usual there is a speed/memory/program-complexity
tradeoff.</p><p>
I am using the most excellent
<a href="http://www.xmltwig.com/">
XML::Twig</a> module, using the
<a href="http://www.xmltwig.com/xmltwig/tutorial/">
online tutorial</a> and the O'Reilly book,
<a href="http://www.oreilly.com/catalog/perlxml/">
Perl & XML</a>. One thing the docs are a little
thin on is examples of using the XPath capabilities
of XML::Twig. Here is an example that I made:</p><blockquote><div><p> <tt>use XML::Twig;<br> <br>my $fn= 'traces_small.xml';<br> <br>my ($tag, $att, $value) = ('NET','name','/P5C');<br> <br>my @example;<br>push @example, sprintf('TRACES/STFIRST');<br>push @example, sprintf('%s', $tag);<br>push @example, sprintf('%s[@%s]', $tag, $att);<br>push @example, sprintf('%s[@%s="%s"]', $tag, $att, $value);<br> <br>foreach my $pattern (@example)<br>{<br> print "Matching XPath expression $pattern\n";<br> print "TwigRoots\n";<br> my $xml= new XML::Twig(<br> TwigRoots => {$pattern => 1},<br> error_context => 1,<br> );<br> <br> $xml->set_pretty_print('indented');<br> $xml->parsefile($fn);<br> $xml->print;<br> print "--------------------\n";<br>}<br> <br>foreach my $pattern (@example)<br>{<br> print "Matching XPath expression $pattern\n";<br> print "start_tag_handlers, original_string\n";<br> my $xml= new XML::Twig(<br> start_tag_handlers => { $pattern =><br> sub<br> {<br> print $_[0]->original_string,"\n"<br> }<br> },<br> error_context => 1,<br> );<br> $xml->parsefile($fn);<br> print "--------------------\n";<br>}</tt></p></div> </blockquote><p>Now I want to make a module to create my CAD
data in a certain order, independent of the
order of my input data.</p><p>
<b>Approach 1</b> <br>
<i>Put tags on the data that say what order they
should be in.</i> <br>
I don't like this approach because I want my
XML format to work for different CAD tools that
have different requirements for the order of
their data. One of the main purposes of the XML
format is to have it be CAD tool independent.
So order properties are right out.</p><p>
<b>Approach 2</b> <br>
<i>One pass per section</i> <br>
In this approach, I would parse the XML file
as many times as needed, each time printing only
the next section of the CAD data. Example:
if it were HTML, I might have one pass for the
header and one pass for the body. This is easy
to code and takes a minimal amount of memory,
but it is CPU intensive.</p><p>
<b>Approach 3</b> <br>
<i>One pass, store data in an array</i> <br>
Here I would store the data into elements of
an array. At the end of parsing the array would
be printed out to the file, and the order of
the elements in the array would take care of
the ordering of the output file. In the
HTML example, $array[0] would hold the header
and $array[1] would hold the body. This approach
is memory intensive, since I have to store all
the CAD data in an array.</p><p>
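Approach 3 can be sketched roughly as follows, using core Perl only. The section names, the record layout, and the <code>reorder_sections</code> helper are hypothetical; in the real module the records would come from the XML parser's handlers rather than from a literal list.</p>

```perl
use strict;
use warnings;

# Hypothetical output order for one CAD tool; another tool would get its own list.
my @section_order = qw(HEADER LAYERS NETS);

# Collect (section, text) records arriving in any order, and return the
# text grouped by section in the order the output format requires.
sub reorder_sections {
    my ($order, @records) = @_;
    my %slot;
    @slot{@$order} = (0 .. $#$order);        # section name -> array index

    my @buffer = map { [] } @$order;         # one bucket per section
    for my $rec (@records) {
        my ($section, $text) = @$rec;
        die "Unknown section '$section'\n" unless exists $slot{$section};
        push @{ $buffer[ $slot{$section} ] }, $text;
    }
    return map { @$_ } @buffer;              # flatten in output order
}

my @out = reorder_sections(
    \@section_order,
    [ NETS   => 'net /P5C' ],
    [ HEADER => 'units mil' ],
    [ NETS   => 'net GND' ],
    [ LAYERS => 'layer top' ],
);
print "$_\n" for @out;
```

<p>Because the order lives in a separate list, the XML itself stays free of tool-specific ordering, which is the objection to approach 1.</p><p>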
<b>Approach 4</b> <br>
<i>One pass, store the data in an array of files</i> <br>
This approach is like the array, except each
element of the array is an open file handle, and
the data gets printed to the different files.
At the end of the processing the files are
appended to each other. This approach is file-handle
intensive, and is probably not a good idea when
there are, say, 100 or so file handles open at
a time. This type of approach tends to run into
the kernel parameter for the maximum number of
open files for a process.</p><p>
Since the trend these days is to throw RAM at
problems, I'll try approach 3 first, and perhaps
have an option in my code to use approach 4
or possibly 2.</p>
toma, 2002-12-26T03:09:06+00:00, journal

Metadata for toma's journal
http://use.perl.org/~toma/journal/9624?from=rss
I've had one person request that I keep a journal,
and I've often wanted to keep one, so I figure
I have two potential readers so far.
<p>
I've often posted on perlmonks, but I haven't
revealed much of what I actually do with perl.
So a journal should be handy. I tend to write
journal fragments throughout my Mandrake 8.2 system,
recording my adventures in coding. I typically
investigate about five perl modules per month,
and one major open-source package. I'm going
to take a shot at recording my adventures here.
</p><p>
Other activity can be found on
<a href="http://tomacorp.com/">tomacorp</a>
(We're not a corporation), which is mine.</p><p>
At work I use perl quite a bit. I'm back in
the CAD business, trying to make life better
and more productive for a few hundred electrical
engineers. I have switched between designing
hardware and software over the past twenty-some
years, and I have recently switched back to software.
</p><p>
At home I use perl as a hobby and I have also
been teaching it as a high school course to a
very small class.
</p>
toma, 2002-12-25T07:27:27+00:00, journal