Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

runrig (3385)

runrig
  dougwNO@SPAMcpan.org

Just another perl hacker somewhere near Disneyland

I have this homenode [perlmonks.org] of little consequence on Perl Monks [perlmonks.org] that you probably have no interest in whatsoever.

I also have some modules [cpan.org] on CPAN [cpan.org] some of which are marginally [cpan.org] more [cpan.org] useful [cpan.org] than others.

Journal of runrig (3385)

Friday January 18, 2008
07:15 PM

XSLT sort puzzle

[ #35426 ]

Following up on previous XML sorting efforts, where I mostly wanted to sort by tag name, then by NAME attribute, but for some elements, there were other attributes that I wanted to sort by. I first did the tag_name/NAME_attribute sort with XML::Filter::XSLT then sorted by the other odd elements using XML::Filter::Sort.

Out of curiosity, I wanted to try to do it all with XSLT, so I came up with:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<xsl:output method="xml" version="1.0"
encoding="iso-8859-1" indent="yes" omit-xml-declaration="no"/>

<xsl:template match="/">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="//*">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>

    <xsl:apply-templates select="D">
      <xsl:sort select="@FOO"/>
    </xsl:apply-templates>

    <xsl:apply-templates select="*[not(self::D)]">
      <xsl:sort select="name()"/>
      <xsl:sort select="@NAME"/>
    </xsl:apply-templates>

  </xsl:copy>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Which sorts "D" elements by the FOO attribute, and everything else by tag name then NAME attribute. The difference between this and my previous method is that now "D" tags will come before "A" tags (and any others), whereas before they were still sorted by tag name. I think I can get the former two-pass approach behavior in one XSLT document, but for now, it's more trouble than it's worth. It would be easy if you could define a sort callback function.... :-)

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • You’re defining two identity copies, one on the root element and one for everything else; why? Next, you write a template for //* which is equivalent to matching * (but I haven’t slept and it’s 6AM, so I might be making a mistake). And finally, you say making the D nodes sort together with all the others would be a lot of work, when actually it’s less work than you’re already doing. Overall:

    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     

    • Thanks. Tested and it works. I bow to your superior XSLT-fu (mine needs a lot of work). My only consolation is that I've seen worse code than mine from someone else who was trying to do the same sort of thing (and who wasn't even yet trying to sort by name()). I still find XSLT non-intuitive, e.g., how does it decide which templates to match, does it matter what in what order the templates come, and so why doesn't the first (identity) template above just match and copy everything unchanged?
      • Don’t worry, you’re in good company. Almost no one groks XSLT properly. (This is probably because every single XSLT tutorial is terrible junk.) Most of the transforms I see are seriously suboptimal, and a very large fraction of them are really atrocious. There’s a lot of cargo cult and programming by coincide in the XSLT world. Your attempt was quite decent by those standards. :-)

        Actually, the way XSLT processing happens is very simple, but because of the implicit default templates, it

        • Ugh, even a multitude of previewings did not protect me from submitting the comment with a whole bunch of small grammatical and spelling mistakes. “It’s value?” Argh. My face is melting off. I’m not stupid – I swear!

    • Huh, the order of templates is not supposed to matter, but if I switch the order of the templates above, then it doesn't work anymore. If you change the "*" template to "//*", then it works again. Which probably explains why I had "//*" in my first template. Code:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use XML::LibXML;
      use XML::LibXSLT;

      my $parser = XML::LibXML->new();
      my $xslt = XML::LibXSLT->new();

      my $source = $parser->parse_string(<<'EOT');
      <?xml version="1.0" encoding="UTF-8"?>
      <F

      • Aha...from the docs [w3.org]:

        It is an error if this leaves more than one match. An XSLT processor may signal the error; if it does not signal the error, it must recover by choosing, from amongst the matches that are left, the one that occurs last in the stylesheet.
        • So it would probably be better to put an explicit priority on the "*" template.
          • Or use something more restrictive than node() in the identity transform to avoid matching elements, eliminating the conflict in the first place. The types a node can have are element, text, comment and processing instruction. Matching any node except elements therefore translates to XPath as “text()|comment()|processing-instruction()”. Another, possibly better way to write that (certainly a shorter one) is “node()[not(self::*)]”. (The predicate “[]” constrains the node()