
Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday July 31, 2009
04:21 AM

Why I Don't Like YAML

[ #39383 ]

Why don't I like YAML? Well, have you read the spec? It's so awful that most YAML parsers disagreed about what YAML was. If I recall correctly, it was Why's libsyck which effectively set the default YAML standard. That doesn't mean that having a standard is a good thing, though (Prolog has an ISO standard which is universally ignored because it breaks Prolog). For example, do you know what this is?

file: Yes

Let's see what the three most popular Perl modules do with this.

#!/usr/bin/env perl

use strict;
use warnings;

use YAML;
use YAML::Syck;
use YAML::Tiny;
use Data::Dumper::Names;

my $yaml_string = <<END;
file1: Yes
file2: off
END

my $yaml = YAML::Load($yaml_string);
my $syck = YAML::Syck::Load($yaml_string);
my $tiny = YAML::Tiny->read_string($yaml_string);
print Dumper $yaml, $syck, $tiny;

And they all output:

$yaml = {
          'file2' => 'off',
          'file1' => 'Yes'
        };
$syck = {
          'file2' => 'off',
          'file1' => 'Yes'
        };
$tiny = bless( [
                 {
                   'file2' => 'off',
                   'file1' => 'Yes'
                 }
               ], 'YAML::Tiny' );

It's a shame they do that because those are not strings, they're boolean values. In short, the most popular YAML parsers all ignore the spec. I won't fault YAML::Tiny because it's supposed to be a subset of YAML. I also won't fault YAML::Syck because it's a wrapper around libsyck., on the other hand ...

So big deal. Who cares if we're violating the spec? This Ruby programmer does. The parser he's using follows the spec, but the Perl and Python generators don't properly quote the boolean. And why should they have to build in a special case for yet another string? And if you read the spec for the booleans in YAML, it's almost case sensitive, but not quite. "False", "FALSE" and "false" are all false, but "FalSe" is a string, which ironically would be interpreted as true in Perl.
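To make the "almost case sensitive" rule concrete, here is a minimal sketch of a check for the YAML 1.1 boolean spellings. The helper name is_yaml11_bool is made up for illustration and is not part of any of the modules above; the list of spellings is my reading of the YAML 1.1 boolean type, so treat it as an assumption rather than the last word.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from YAML, YAML::Syck or YAML::Tiny):
# returns 1 if $str is one of the YAML 1.1 boolean spellings, else 0.
# Each word is accepted only in lowercase, Capitalized or UPPERCASE form.
sub is_yaml11_bool {
    my ($str) = @_;
    return 0 unless defined $str;
    return $str =~ /\A(?: y|Y|yes|Yes|YES|n|N|no|No|NO
                        | true|True|TRUE|false|False|FALSE
                        | on|On|ON|off|Off|OFF )\z/x ? 1 : 0;
}

# Try the values from this post, plus the mixed-case oddball.
for my $candidate (qw(Yes off false FALSE FalSe)) {
    printf "%-6s %s\n", $candidate,
        is_yaml11_bool($candidate) ? 'boolean' : 'plain string';
}
```

Of the five candidates, only "FalSe" should come back as a plain string.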

Just try and read the spec. You probably won't finish it. And do you want to see a grammar for YAML? Apparently there's one hidden in the spec and you can install Schwern's Greasemonkey script to see it.

Oh, and if you really want to have fun, try playing around with anchors and see if you can make 'em recursive and see how various parsers fail to handle it. I don't want a mega-spec. I don't want something which has all sorts of special meanings which different implementations fail on that I have to keep track of. I don't want the One True Way which would be able to serialize everything if only there were parsers to handle it.

JSON. One Page. Done. Any questions?
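For comparison, here is a small sketch of the same data going through JSON, using the core JSON::PP module. The hash and the key names "flag" and "word" are just illustrative. The point is that a string is always quoted and a boolean is always the bare word true or false, so there is nothing to guess.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical;

# Strings stay strings: "Yes" and "off" mean nothing special in JSON.
my $encoded = $json->encode({ file1 => 'Yes', file2 => 'off' });
print "$encoded\n";

# A boolean and a string that merely spells "true" are told apart.
my $decoded = $json->decode('{"flag":true,"word":"true"}');
for my $key (qw(flag word)) {
    printf "%s is a %s\n", $key,
        JSON::PP::is_bool($decoded->{$key}) ? 'boolean' : 'string';
}
```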

  • JSON is much simpler, but to be fair it only supports a subset of what YAML does. For example it's not good for serialization, because you don't have any way to specify type names.

    Anyway, JSON being much simpler is why I wrote a JSON parser and generator for Perl 6 (with some help from Johan Viklund), see []. As far as I can tell it implements JSON to 100%.

    • The fact that JSON only supports a subset of what YAML does is why I really like it. I know YAML is powerful and feature-rich, but JSON is much more predictable.

    • Somehow I don’t think that trying to be a serialisation format – for several languages with significant differences – and human-readable and -writeable, all at the same time, is an argument in favour of a format.

    • I had this argument with Ingy way back when he was coming up with YAML. He made it overly complex (IMHO), resulting in a spec which made XML easier to implement than YAML (32 pages of spec vs nearly 200), which makes no sense since YAML was supposed to be EASIER to read and write than XML.

      It's a bit more human readable than XML, but not much, and edge cases like this are going to make it a LOT harder to debug an issue with YAML than with XML.

  • Until your post, I'd never discovered the part of the specification that isn't shown in the actual specification that describes those special strings, so I'd never known to implement them.

    2 hours later, it's now fixed. YAML::Tiny will always quote the strings listed in those secondary type specifications.

    BTW, it's not case insensitive.

    It's apparently ( $str eq uc($str) or $str eq lc($str) or $str eq ucfirst(lc($str)) )

    Still ugh though.

    • I'll bet you'll break a lot of code with this. I've previously worked with a module (can't recall, but I think it was SOAP related) which was guessing my data types and silently converting things for me. No end of headaches :(

      I didn't report this to you because I assumed you had deliberately kept things simple :)

      • The YAML-Tiny language specification does not contain support for these strings, and so the PARSER half of YAML::Tiny (and thus Parse::CPAN::Meta) will not support it.

        However, in emitting YAML it's quite appropriate that I make sure to avoid causing problems for others.

        So all I'm doing is detecting that the emitted string in a hash or array is one of these magic strings, and then escaping them (when I previously wasn't).

  • For document interchange stuff I use XML and for data interchange stuff I use JSON. That works for me the majority of the time.

  • As a user (i.e. not a parser writer) I still very much prefer YAML over JSON most of the time due to it being more readable.

    But yeah, unfortunately many of the earlier Perl YAML parsers do not behave in a standard (YAML-ish) way, not to mention that they used to crash a lot too.

    For YAML::Syck, there's $YAML::Syck::ImplicitTyping = 1; which IIRC was turned on by default in some previous release but then Audrey reverted it back to 0 to avoid a compatibility break.

    How about everybody just use YAML::XS, then?

  • The boolean documentation you link to is from the 1.1 spec. It's a bit insane to have that many special values. 1.2 took a lawn mower to it [] and now there are just two bool values, "true" and "false".

    YAML::XS gets it right. is being gutted.
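
The emitter-side fix discussed in the comments above (quote a string on output if a YAML 1.1 parser might read it as a boolean) can be sketched roughly like this. This is not YAML::Tiny's actual code: the emit_scalar helper and the word list are assumptions for illustration, and a real emitter would also have to handle strings that need quoting for other reasons.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Lowercase forms of the words the YAML 1.1 boolean type recognises
# (assumed list, taken from my reading of that type definition).
my $YAML11_BOOL = qr/\A(?:y|yes|n|no|true|false|on|off)\z/;

# Hypothetical helper: return a scalar ready to emit as a YAML value,
# single-quoting it when it is a boolean word in lowercase, UPPERCASE
# or Capitalized form. Anything else is passed through untouched.
sub emit_scalar {
    my ($str) = @_;
    my $case_ok = ( $str eq lc($str) )
               || ( $str eq uc($str) )
               || ( $str eq ucfirst( lc($str) ) );
    my $is_magic = $case_ok && ( lc($str) =~ $YAML11_BOOL );
    return $is_magic ? "'$str'" : $str;
}

for my $value (qw(Yes off FalSe report.txt)) {
    printf "file: %s\n", emit_scalar($value);
}
```

With this, "Yes" and "off" are emitted quoted, while "FalSe" and "report.txt" are left alone, which matches the case rule quoted earlier in the thread.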