Sunday, May 16, 2010

The Joy of YAML

I had been working so hard on the next release of Qore for so long that I had to take a short break, during which I wrote the new yaml module. The module is currently very small in terms of source code; it allows Qore data types (except objects) to be serialized to and deserialized from YAML. It uses libyaml to do the real work.

The great thing about YAML is that it is much better suited than XML to representing data as text, because it is much more concise and readable for humans. Moreover, with just one custom YAML tag (!duration), all native Qore types (objects aside) can be serialized and deserialized as YAML with no information loss.

Compared to XML-RPC, YAML supports time zone information and time resolution to the microsecond (in fact, YAML's !!timestamp type supports arbitrary fractional seconds), and our custom !duration type supports relative date/time values in an ISO-8601-like format (extended so that values may be negative, with an additional character to specify microseconds). Of course, YAML is also much more readable and concise than XML-RPC.
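To make the !duration idea a bit more concrete, here is a minimal Python sketch of a parser for an ISO-8601-like duration string such as the P2M3DT10H14u value used in the example further down. The exact grammar is my assumption (optionally negative integer fields, with a trailing "u" field for microseconds in the time part); it is not taken from the module's source, and parse_duration is a hypothetical helper name.

```python
import re

# Assumed grammar (not taken from the Qore yaml module):
#   P[nY][nM][nD][T[nH][nM][nS][nu]]
# where each n is an integer that may be negative and "u" marks microseconds.
_DURATION_RE = re.compile(
    r"P(?:(?P<years>-?\d+)Y)?(?:(?P<months>-?\d+)M)?(?:(?P<days>-?\d+)D)?"
    r"(?:T(?:(?P<hours>-?\d+)H)?(?:(?P<minutes>-?\d+)M)?"
    r"(?:(?P<seconds>-?\d+)S)?(?:(?P<microseconds>-?\d+)u)?)?"
)


def parse_duration(text):
    """Return a dict holding only the fields present in the duration string."""
    match = _DURATION_RE.fullmatch(text)
    fields = {}
    if match is not None:
        fields = {
            name: int(value)
            for name, value in match.groupdict().items()
            if value is not None
        }
    if not fields:
        # Covers both malformed input and empty durations such as "P" or "PT".
        raise ValueError("not a valid duration: %r" % (text,))
    return fields


example = parse_duration("P2M3DT10H14u")
negative = parse_duration("P-1DT-30M")
```

Here example holds 2 months, 3 days, 10 hours and 14 microseconds, and negative shows that each field carries its own sign.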

Compared to JSON, YAML is of course very similar, but it is extensible and supports more data types out of the box. JSON has no equivalent of !!timestamp or !!binary (base64-encoded binary data). JSON is as concise and readable as YAML, which is no surprise since YAML (at least YAML 1.2) is a superset of JSON.

Take the following Qore data structure (valid with Qore 0.8.0+):
(1, "two", NOTHING, 2010-05-05T15:35:02.100, False, 1970-01-01Z,
(hash(), (), "three \"things\""), P2M3DT10H14u, now_us(),
binary("hello, how's it going, this is a long string, you know XXXXXXXXXXXXXXXXXXXXXXXX"),
("a" : 2.0,
"b" : "hello",
"key" : True))

Here's how the serialization looks:
[1, "two", null, '2010-05-05 15:35:02.1 +02:00', false, 1970-01-01, [{}, [], "three
\"things\""], '0000-02-03 10:00:00.000014 Z', '2010-05-16 13:27:15.859195 +02:00',
!!binary "aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==",
{"a": 2, "b": "hello", "key": true}]

XML-RPC (note that the duration is not serialized correctly, and that time zone and microsecond information is lost):
<struct><member><name>data</name><value><array><data><value><i4>1</i4></value><value><string>two</string></value><value/><value><dateTime.iso8601>20100505T15:35:02</dateTime.iso8601></value><value><boolean>0</boolean></value><value><dateTime.iso8601>19700101T00:00:00</dateTime.iso8601></value><value><array><data><value><struct></struct></value><value><array><data/></array></value><value><string>three "things"</string></value></data></array></value><value><dateTime.iso8601>00000203T10:00:00</dateTime.iso8601></value><value><dateTime.iso8601>20100516T13:31:23</dateTime.iso8601></value><value><base64>aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==</base64></value><value><struct><member><name>a</name><value><double>2.000000</double></value></member><member><name>b</name><value><string>hello</string></value></member><member><name>key</name><value><boolean>1</boolean></value></member></struct></value></data></array></value></member></struct>

JSON (minus the binary data, which cannot be serialized; note that dates are serialized as strings):
[ 1, "two", null, "2010-05-05 15:35:02.100 Wed +02:00 (CEST)", false, "1970-01-01 00:00:00 Thu Z (UTC)", [ { }, [ ], "three \"things\"" ], "", "2010-05-16 13:32:44.792114 Sun +02:00 (CEST)", { "a" : 2, "b" : "hello", "key" : true } ]
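The same limitation is easy to reproduce outside Qore. The following Python sketch, using only the standard library, shows that a plain JSON encoder rejects dates and binary data, and that the usual workaround (turning them into strings) loses the type information on the way back. The helper to_json_safe is a hypothetical name, and the string formats it produces are Python's, not Qore's.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def to_json_safe(value):
    """Recursively replace values JSON cannot represent with plain strings."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    return value


cest = timezone(timedelta(hours=2))
data = [1, "two", None, datetime(2010, 5, 5, 15, 35, 2, 100000, tzinfo=cest), b"hello"]

# The standard encoder has no JSON type for datetime or bytes values.
try:
    json.dumps(data)
    native_ok = True
except TypeError:
    native_ok = False

# The workaround serializes, but the receiver only ever sees strings.
encoded = json.dumps(to_json_safe(data))
decoded = json.loads(encoded)
```

After the round trip, decoded[3] and decoded[4] are ordinary strings; nothing in the JSON text says that one was a timestamp and the other binary data, which is exactly what the !!timestamp and !!binary tags preserve in YAML.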

The incredible thing was that I could not find any standard YAML-RPC protocol definition. The closest I could find was a partially documented protocol called !okay/rpc. So I just implemented a simple YAML-RPC handler and client based on JSON-RPC 1.1, and it works great. I will probably simplify it a bit more to make it a little more like !okay/rpc, but with a described fault message, then document it and put it online for people to review.
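Since the protocol is not documented yet, the following is only a rough Python sketch of what a JSON-RPC-1.1-style request/response exchange can look like, not the actual wire format used by the Qore handler. The request member names (version, method, params, id) follow the JSON-RPC 1.1 working draft as I understand it; the shape of the error member and the function names are hypothetical. The messages are encoded with Python's json module, and because YAML 1.2 is a superset of JSON, the resulting text should also be readable by a YAML 1.2 parser.

```python
import json


def make_request(method, params, request_id):
    """Build a JSON-RPC-1.1-style call envelope (assumed layout)."""
    return json.dumps(
        {"version": "1.1", "method": method, "params": params, "id": request_id}
    )


def handle_request(text, methods):
    """Dispatch a call to a table of Python callables and build the reply."""
    request = json.loads(text)
    response = {"version": "1.1", "id": request.get("id")}
    try:
        handler = methods[request["method"]]
        response["result"] = handler(*request["params"])
    except Exception as exc:
        # Hypothetical fault layout: a name plus a human-readable message.
        response["error"] = {"name": type(exc).__name__, "message": str(exc)}
    return json.dumps(response)


methods = {"add": lambda a, b: a + b}
ok_reply = json.loads(handle_request(make_request("add", [1, 2], 7), methods))
bad_reply = json.loads(handle_request(make_request("nope", [], 8), methods))
```

The point of the described fault message is visible in bad_reply: the caller gets a structured error it can inspect instead of having to guess what went wrong.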

I'm really happy I found YAML; its conciseness, extensibility, and readability make it a far superior alternative to XML and XML-RPC for data serialization in Qore. In the future I will look at making it possible to serialize and deserialize objects as well. If a class supports writing out its state as Qore data that can then be passed to the appropriate constructor (or other such method) on the remote end, that would solve the problem.
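The object serialization idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, under my own assumptions: the class Point, the to_data method, the REGISTRY table and the from_data function are all hypothetical names, and json stands in for the YAML encoder.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to_data(self):
        """Write out the object's state as plain data plus a class name."""
        return {"class": "Point", "args": {"x": self.x, "y": self.y}}


# The receiving end needs to know which classes it is willing to rebuild.
REGISTRY = {"Point": Point}


def from_data(data):
    """Pass the serialized state back to the matching constructor."""
    cls = REGISTRY[data["class"]]
    return cls(**data["args"])


wire_text = json.dumps(Point(3, 4).to_data())
restored = from_data(json.loads(wire_text))
```

Only plain data crosses the wire, so any serializer that handles hashes, strings and numbers is enough; the explicit registry also keeps the remote end from instantiating arbitrary classes.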

Note that Qore's YAML module is still undocumented, but it is stable and somewhat tested. It requires qore 0.8.0+ (still unreleased; only in svn).