The Bratislava PM web site now has an RSS feed. The feed is currently generated from a custom-made YAML file that is transformed into RSS with XML::RSS. This approach is simple and quite flexible, but it has some quirks.
First, it's almost impossible to verify that the YAML file follows the expected template without writing our own validation. For instance, if a feed entry is missing the title, the link or the date, there's no built-in mechanism to inform us of these errors.
Secondly, the main content of each feed item is allowed to contain HTML. In fact, all of our feed items include HTML. Mixing HTML into a YAML file doesn't make the input file very pleasant to work with, since it now holds two markup languages. Of course, one can argue that YAML Ain't Markup Language (tm); nevertheless, it is weird to embed HTML in YAML.
Finally, converting YAML to XML seems strange. YAML is mainly used for data structures, configuration files or data serialization. Using it for content manipulation might be pushing it too far.
For this particular context XML seems more appropriate. One advantage is that an XML file can be validated with a DTD, an XML Schema or RELAX NG. HTML and XML can coexist without problems, especially if XHTML is used. And transforming an XML file into an RSS feed can easily be done through XSLT.
Using XML as the input file provides some interesting advantages. First, thanks to a DTD, not only can we validate each feed entry in the input file, but we can also validate the HTML that's embedded in the feed's description.
By using some clever XML and DTD hacks it's possible to create a custom-made feed format that can be validated without too much effort. Let's assume that an RSS feed contains events and that each event has a title, a link, a description, a subject, a creator, a date and an id.
The following DTD describes and validates a feed input file:
<!ELEMENT ba:events (ba:event*)>
<!ATTLIST ba:events
xmlns:ba CDATA #FIXED "http://bratislava.pm.org/dtd/events-1.0.dtd"
>
<!ELEMENT ba:event (ba:title, ba:link, ba:description, ba:subject, ba:creator, ba:date, ba:id)>
<!ELEMENT ba:title (#PCDATA)>
<!ELEMENT ba:link (#PCDATA)>
<!ELEMENT ba:description (#PCDATA)>
<!ELEMENT ba:subject (#PCDATA)>
<!ELEMENT ba:creator (#PCDATA)>
<!ELEMENT ba:date (#PCDATA)>
<!ELEMENT ba:id (#PCDATA)>
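As an illustration, a minimal input file that this DTD would accept might look like the following. The DTD file name in the DOCTYPE and every value inside the event are made up for the example, and the date format is only a guess; the element names and their order are the only parts taken from the DTD above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input file; "events-1.0.dtd" is an assumed local file name -->
<!DOCTYPE ba:events SYSTEM "events-1.0.dtd">
<ba:events xmlns:ba="http://bratislava.pm.org/dtd/events-1.0.dtd">
  <ba:event>
    <!-- The children must appear in exactly this order -->
    <ba:title>Example meeting</ba:title>
    <ba:link>http://example.org/meetings/1</ba:link>
    <ba:description>Plain text only, markup is not allowed here.</ba:description>
    <ba:subject>meeting</ba:subject>
    <ba:creator>Example Author</ba:creator>
    <ba:date>2009-01-01</ba:date>
    <ba:id>example-meeting-1</ba:id>
  </ba:event>
</ba:events>
```

Removing ba:title, or swapping two of the children, should make a validating parser report an error, which is exactly the check the YAML version lacked. With libxml2 installed, something like xmllint --valid --noout events.xml (events.xml being a hypothetical file name) is one way to run that validation.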
Although this DTD can be used for simple feed elements, it has a problem: it doesn't allow any HTML inside the element ba:description! That defeats the purpose of replacing YAML with XML. But all is not lost, as this can easily be fixed by importing the XHTML DTD into our DTD and redefining the element ba:description so that it accepts any HTML tag that a div accepts:
<!-- Import the XHTML DTD -->
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
<!ELEMENT ba:events (ba:event*)>
<!ATTLIST ba:events
xmlns:ba CDATA #FIXED "http://bratislava.pm.org/dtd/events-1.0.dtd"
>
<!ELEMENT ba:event (ba:title, ba:link, ba:description, ba:subject, ba:creator, ba:date, ba:id)>
<!ELEMENT ba:title (#PCDATA)>
<!ELEMENT ba:link (#PCDATA)>
<!ELEMENT ba:subject (#PCDATA)>
<!ELEMENT ba:creator (#PCDATA)>
<!ELEMENT ba:date (#PCDATA)>
<!ELEMENT ba:id (#PCDATA)>
<!-- The definition of 'ba:description' is the same as a 'div' -->
<!ELEMENT ba:description %Flow;>
<!ATTLIST ba:description
xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml"
>
Thanks to this new DTD, the element ba:description can include any HTML element that's allowed within a div element. Validating against the DTD ensures that only valid HTML appears inside the element. For instance, adding the element body to ba:description will be rejected: even though body is a valid HTML element, it is not allowed within a div.
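A hypothetical event written against the new DTD could then look like the sketch below. As before, the DTD file name and all of the values are invented for illustration; only the element names and the XHTML namespace come from the DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input file; "events-1.0.dtd" is an assumed local file name -->
<!DOCTYPE ba:events SYSTEM "events-1.0.dtd">
<ba:events xmlns:ba="http://bratislava.pm.org/dtd/events-1.0.dtd">
  <ba:event>
    <ba:title>Example meeting</ba:title>
    <ba:link>http://example.org/meetings/1</ba:link>
    <!-- xmlns is #FIXED in the DTD, so a parser that reads the DTD
         supplies it even when it is omitted here -->
    <ba:description xmlns="http://www.w3.org/1999/xhtml">
      <p>Markup that a <code>div</code> accepts is <em>valid</em> here:</p>
      <ul>
        <li>paragraphs and lists</li>
        <li><a href="http://example.org/">links</a></li>
      </ul>
      <!-- A body element at this point would fail validation -->
    </ba:description>
    <ba:subject>meeting</ba:subject>
    <ba:creator>Example Author</ba:creator>
    <ba:date>2009-01-01</ba:date>
    <ba:id>example-meeting-1</ba:id>
  </ba:event>
</ba:events>
```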
The element ba:description is declared in our DTD the same way that the element div is in the XHTML DTD. Furthermore, the element sets the default namespace to XHTML through a fixed xmlns attribute, so all child elements of ba:description belong to the XHTML namespace. This is very handy when processing the XML file later on.
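To give an idea of that later processing step, here is a rough XSLT 1.0 sketch that turns an events file into an RSS 2.0 document. It is not the stylesheet used by the site. The channel title, link and description are placeholders, the mapping of ba:subject, ba:creator and ba:date onto Dublin Core elements is an assumption, and the description is copied as plain text only; emitting the embedded XHTML as escaped markup would need extra work that is left out here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ba="http://bratislava.pm.org/dtd/events-1.0.dtd"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="ba">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- The root element becomes the RSS channel -->
  <xsl:template match="/ba:events">
    <rss version="2.0">
      <channel>
        <!-- Placeholder channel metadata, not taken from the real feed -->
        <title>Example events feed</title>
        <link>http://example.org/</link>
        <description>Placeholder channel description</description>
        <xsl:apply-templates select="ba:event"/>
      </channel>
    </rss>
  </xsl:template>

  <!-- Each event becomes one RSS item -->
  <xsl:template match="ba:event">
    <item>
      <title><xsl:value-of select="ba:title"/></title>
      <link><xsl:value-of select="ba:link"/></link>
      <!-- The id is not assumed to be a URL, hence isPermaLink="false" -->
      <guid isPermaLink="false"><xsl:value-of select="ba:id"/></guid>
      <dc:subject><xsl:value-of select="ba:subject"/></dc:subject>
      <dc:creator><xsl:value-of select="ba:creator"/></dc:creator>
      <dc:date><xsl:value-of select="ba:date"/></dc:date>
      <!-- Text content only: the XHTML tags are dropped in this sketch -->
      <description><xsl:value-of select="normalize-space(ba:description)"/></description>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

Because the children of ba:description live in the XHTML namespace, a fuller stylesheet could match them separately (for example with a prefix bound to http://www.w3.org/1999/xhtml) without any risk of clashing with the ba: elements.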
It's not difficult to see why the new version of the feed will be generated from an XML file: XML is quite advantageous here.