Monday, February 27, 2012

Qore Plus Plus

I've implemented a Qore Pre-Processor (qpp) for writing language bindings, that is, the C++ code that binds the internal class and function implementations to the Qore language.

The problem was that the language bindings were complex and error-prone. I could have made the bindings themselves easier to write, but I think the qpp solution is better, because with qpp:
  • the language bindings can be abstracted from the actual/current implementation
  • language documentation from doxygen-style comments can be generated directly from the language's source code
  • a great deal of repetitive code can be generated automatically
  • (later) the documentation comments can be incorporated internally to provide additional information in reflection-like internal and external APIs
Currently qpp processes "qpp" files into two targets: cpp files (the Qore language C++ binding files) and dox.h files (the Doxygen source files).

In a qpp file, functions, constants, and classes are defined in a Qore-like syntax (with some additional information for internal tags, functional domains, etc.). The body of each function or class method is then written in C++ (hence Qore Plus Plus).

For example, here is the qpp implementation of the Dir::path() method:
//! Returns the path of the Dir object or @ref nothing if no path is set
/** This path does not necessarily need to exist; the path is adjusted to remove \c "." and \c ".." from the path if present

@return the path of the Dir object or @ref nothing if no path is set

@par Example:
@code
my *string $mypath = $d.path();
@endcode
*/
*string Dir::path() {
    return d->dirname();
}
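As a rough illustration of how one qpp definition can feed both targets, here is a minimal C++ sketch that splits a definition like the one above into its doc comment (for the dox.h output), its Qore signature, and its C++ body (for the cpp output). This is not qpp's actual implementation: the QppDefinition struct, the splitQppDefinition() function, and the simplified rules (the doc comment ends at the first "*/", the signature runs up to the next "{", the body runs from there to the last "}") are assumptions made for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result of splitting one qpp definition (not qpp's real data model).
struct QppDefinition {
    std::string doc;        // doxygen comment text, destined for the dox.h target
    std::string signature;  // Qore-style signature, e.g. "*string Dir::path()"
    std::string body;       // C++ code, destined for the cpp binding target
};

// Trims leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Simplified splitting rules (assumptions for this sketch):
//  - the doc comment ends at the first "*/" in the text (if any)
//  - the signature runs from there to the next "{"
//  - the body runs from that "{" to the last "}"
QppDefinition splitQppDefinition(const std::string& src) {
    std::string::size_type docEnd = src.find("*/");
    std::string::size_type sigStart = (docEnd == std::string::npos) ? 0 : docEnd + 2;
    std::string::size_type open = src.find('{', sigStart);
    std::string::size_type close = src.rfind('}');
    if (open == std::string::npos || close == std::string::npos || close < open)
        throw std::invalid_argument("no function body found");

    QppDefinition def;
    def.doc = trim(src.substr(0, sigStart));
    def.signature = trim(src.substr(sigStart, open - sigStart));
    def.body = trim(src.substr(open + 1, close - open - 1));
    return def;
}
```

A real pre-processor needs a proper tokenizer for the cases this sketch ignores, such as "*/" or "{" appearing inside comments or string literals.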
The current pre-release documentation based on Doxygen for Qore 0.8.4 can be found here: http://qore.org/tmp/qore-0.8.4-svn/

I won't be able to get to many qpp-based optimizations in this release, because I first want to clean up the namespace code and make some related changes. However, qpp lays the groundwork for making infrastructure changes to Qore easily in the future: we'll be able to implement new solutions in Qore and then apply them globally to all language bindings by extending qpp.

Sunday, November 27, 2011

Qore Optimizations

Work on Qore 0.8.4 is progressing slowly but surely.

Here's a rundown of what's in svn right now, targeted for 0.8.4.

Pseudo-Methods
Qore 0.8.4 will support pseudo-methods: methods that can be called on a value of any type, where the set of available methods depends on the value's type. These methods belong to pseudo-classes arranged in a (relatively flat) hierarchy; the base pseudo-class is "any", and there is a pseudo-class for each data type that inherits from it.

Pseudo-methods are convenient, and they also support operations that are very inefficient to do the traditional Qore way. For example, to get the first key from a hash, previously one had to write an expression like this:

    my *string $firstKey = (keys $hash)[0];

For a large hash, this is terribly inefficient, creating a list of all keys in the hash and then returning the first element of the list. Now, with the new pseudo-methods, it can be done like this:

    my *string $firstKey = $hash.firstKey();

Very efficient and clear. Also, to check the data type, the traditional Qore way would be the following:

    if (type($value) == Type::String) {
    }

Here the type() function returns a string and Type::String is "string", so a string comparison is made. The new way is to use the "any"::typeCode() pseudo-method like this:

    if ($value.typeCode() == NT_STRING) {
    }

Here NT_STRING is the integer code for a string (3, coincidentally the same value as the C++ NT_STRING constant), so this is a much more efficient expression.

Currently there are only a handful of pseudo-methods implemented for each type, but more will be implemented before the 0.8.4 release. In particular I plan on adding more date/time methods to get quick information about date/time values.
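Why firstKey() is so much cheaper than the (keys $hash)[0] idiom is easiest to see in a data-structure sketch. Assuming the hash keeps its keys in insertion order, the first key can be read in constant time, whereas the keys-based expression must first build a list of every key. The OrderedHash class below is a hypothetical C++ illustration, not Qore's internal hash implementation.

```cpp
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered hash: a linked list keeps key order,
// and an index maps each key to its list node for O(1) lookup.
class OrderedHash {
    using Entry = std::pair<std::string, int>;
    std::list<Entry> entries_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;

public:
    void set(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;  // an existing key keeps its position
            return;
        }
        entries_.emplace_back(key, value);
        index_[key] = std::prev(entries_.end());
    }

    // Like (keys $hash)[0]: copies every key (O(n) time and memory)
    // only to return the first one.
    std::optional<std::string> firstKeyViaKeys() const {
        std::vector<std::string> keys;
        keys.reserve(entries_.size());
        for (const auto& e : entries_) keys.push_back(e.first);
        if (keys.empty()) return std::nullopt;
        return keys[0];
    }

    // Like $hash.firstKey(): reads the head of the list in O(1).
    std::optional<std::string> firstKey() const {
        if (entries_.empty()) return std::nullopt;
        return entries_.front().first;
    }
};
```

The empty std::optional plays the role of NOTHING for an empty hash, matching the *string type used in the Qore examples.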

Of course pseudo-methods are typed, and when they are resolved at parse time, run-time optimizations are used, particularly for integer operations. In the last example above (using "string"::typeCode()), no memory allocations will occur: the operands of the boolean comparison operator are evaluated in an integer context, meaning that native integer values are returned internally (and not, for example, QoreBigIntNode structures), so there are no memory allocations and no atomic reference-count operations (which could cause SMP cache invalidations, etc.). These kinds of operations are very fast and scalable; such run-time optimizations are possible because the data types were available at parse time.
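The difference between the two type checks can be sketched in C++. The hypothetical Value type below stores an integer type code; typeName() mimics type() by producing a string that then has to be compared character by character, while typeCode() returns a plain integer, so the check is a single integer comparison. Only NT_STRING = 3 comes from the text above; the other codes and type names are illustrative assumptions.

```cpp
#include <string>

// Hypothetical type codes; only NT_STRING = 3 is taken from the post.
enum TypeCode : int { NT_NOTHING = 0, NT_INT = 1, NT_FLOAT = 2, NT_STRING = 3 };

struct Value {
    TypeCode code;

    // Like type($value): produces a new string value the caller must compare.
    std::string typeName() const {
        switch (code) {
            case NT_NOTHING: return "nothing";
            case NT_INT:     return "integer";
            case NT_FLOAT:   return "float";
            case NT_STRING:  return "string";
        }
        return "unknown";
    }

    // Like $value.typeCode(): a native int, usable in an integer context.
    int typeCode() const { return code; }
};

// Traditional check: build a string, then compare it with "string".
bool isStringByName(const Value& v) { return v.typeName() == "string"; }

// Pseudo-method check: one integer comparison, no string handling at all.
bool isStringByCode(const Value& v) { return v.typeCode() == NT_STRING; }
```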

And this leads to the second big feature for Qore 0.8.4:

Performance and Memory Optimizations
Qore 0.8.4 will feature major performance and memory optimizations. I've already implemented a big part of this with optimized handling for integer local variables. Locally-scoped variables are already thread-local (and therefore do not require any locking for reads and writes), and now local variables restricted to integers (declared as int or softint) are also stored as native integers. Assignment operations are also evaluated in an integer context; as above, only native C++ integer values are passed around and stored, with no memory allocations or atomic reference-count operations.

Additionally, I've started porting all the operators to Qore's new operator framework (subclasses of QoreOperatorNode, which replace expressions made with QoreTreeNode; QoreTreeNode will be removed from Qore when the migration is complete). The ported operators also support optimized integer operations: when the operand types are known to be integer at parse time, run-time variants of the operators are used that perform native integer operations (again without any memory allocations or atomic operations).
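Choosing an integer-only operator variant at parse time can be sketched as follows. Everything here is hypothetical and far simpler than QoreOperatorNode: each node reports its parse-time type, an int-typed local variable stores a native int64_t, and the parse step for "+" returns a specialized node when both operand types are known to be integers. The generic path boxes results in std::shared_ptr objects (whose reference counts are atomic) as a stand-in for the allocations and atomic operations the optimization avoids.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Parse-time type information (only what this sketch needs).
enum class ParseType { Int, Any };

// Stand-in for a heap-allocated, reference-counted value node on the generic
// path. This sketch only models integer values, even on the generic path.
struct Boxed { int64_t i; };
using BoxedPtr = std::shared_ptr<Boxed>;

struct Node {
    virtual ~Node() = default;
    virtual ParseType parseType() const = 0;
    virtual BoxedPtr eval() const = 0;       // generic evaluation: boxed result
    virtual int64_t bigIntEval() const = 0;  // integer context: native result
};
using NodePtr = std::shared_ptr<Node>;

// Local variable declared "int": the value is stored as a native integer.
struct IntLocalVar : Node {
    int64_t value;
    explicit IntLocalVar(int64_t v) : value(v) {}
    ParseType parseType() const override { return ParseType::Int; }
    BoxedPtr eval() const override { return std::make_shared<Boxed>(Boxed{value}); }
    int64_t bigIntEval() const override { return value; }
};

// Untyped local variable: holds a boxed value, type unknown at parse time.
struct AnyLocalVar : Node {
    BoxedPtr value;
    explicit AnyLocalVar(int64_t v) : value(std::make_shared<Boxed>(Boxed{v})) {}
    ParseType parseType() const override { return ParseType::Any; }
    BoxedPtr eval() const override { return value; }  // copy bumps the atomic refcount
    int64_t bigIntEval() const override { return value->i; }
};

// Generic "+": evaluates both operands as boxed values and boxes the result.
struct PlusNode : Node {
    NodePtr l, r;
    PlusNode(NodePtr a, NodePtr b) : l(std::move(a)), r(std::move(b)) {}
    ParseType parseType() const override { return ParseType::Any; }
    BoxedPtr eval() const override {
        return std::make_shared<Boxed>(Boxed{l->eval()->i + r->eval()->i});
    }
    int64_t bigIntEval() const override { return eval()->i; }
};

// Integer-only "+": chosen when both operands are int at parse time.
struct IntPlusNode : Node {
    NodePtr l, r;
    IntPlusNode(NodePtr a, NodePtr b) : l(std::move(a)), r(std::move(b)) {}
    ParseType parseType() const override { return ParseType::Int; }
    BoxedPtr eval() const override { return std::make_shared<Boxed>(Boxed{bigIntEval()}); }
    int64_t bigIntEval() const override { return l->bigIntEval() + r->bigIntEval(); }
};

// Parse-time resolution: select the operator variant from the operand types.
NodePtr makePlus(NodePtr a, NodePtr b) {
    if (a->parseType() == ParseType::Int && b->parseType() == ParseType::Int)
        return std::make_shared<IntPlusNode>(std::move(a), std::move(b));
    return std::make_shared<PlusNode>(std::move(a), std::move(b));
}
```

With two int variables, makePlus() returns an IntPlusNode whose bigIntEval() only adds two native integers; mixing in an untyped variable falls back to PlusNode, which allocates a boxed result on every evaluation.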

This all leads to major performance improvements with integer operations.

I've also added some of the necessary infrastructure for optimizing floating-point operations (support for local variables, optimized operator variants), but have not finished this work yet.

I also have in mind an LLVM back-end for Qore in the future; I plan on adding more optimizations and propagating additional information about the code during parsing which will be used when generating compiled code for further optimizations. By that I mean not just type restrictions (which can obviously lead to major optimizations like the above) but also, for example, marking lvalues as constant in certain scopes, enforcing the QC_CONST flag for functions and methods and more. But at the moment that's a ways down the road - I won't be able to get to LLVM integration before the 0.8.4 release for sure.

Saturday, August 27, 2011

Qore 0.8.3

The next release of Qore is coming very soon; the major new feature for this release will be Windows support.

Qore can now be built as a native DLL for Windows (XP and up), finally without Cygwin (which actually made for a pretty slow binary). This was made possible by the MinGW cross-compiling environment (http://mingw-cross-env.nongnu.org/).

The main blocking point for this port (which has been requested many times over the years) was my lack of familiarity with Windows development tools (and my dislike of the Windows command line). This software allowed me to use Linux and OSX to make the Windows port, and it turned out to be a lot easier than I expected. The MinGW pthreads library worked perfectly (I only needed to make minor changes because pthread_t is not a pointer there); the socket code was fairly easy to update (mainly, the error-checking code had to retrieve the Windows error messages instead of using errno); and I had to write new time zone code that reads zone information from the Windows registry instead of using the zoneinfo DB as on UNIX systems. Also, the dlfcn library for Windows (http://code.google.com/p/dlfcn-win32/) allowed for seamless loadable module handling with the same code as on UNIX.
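The socket change described above, reading the Windows socket error instead of errno, typically comes down to a small portable helper. The sketch below is an assumption about how such a helper could look, not Qore's actual code, and lastSocketErrorMessage() is a hypothetical name. On Windows it uses WSAGetLastError() and FormatMessageA(); elsewhere it uses errno and strerror().

```cpp
#include <string>

#ifdef _WIN32
#include <winsock2.h>  // must come before windows.h
#include <windows.h>
#else
#include <cerrno>
#include <cstring>
#endif

// Hypothetical helper: describes the most recent socket error portably.
std::string lastSocketErrorMessage() {
#ifdef _WIN32
    // Winsock does not report errors through errno; use WSAGetLastError().
    int err = WSAGetLastError();
    char buf[512] = {0};
    DWORD len = FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                               nullptr, static_cast<DWORD>(err), 0, buf, sizeof(buf), nullptr);
    if (len == 0) return "socket error " + std::to_string(err);
    std::string msg(buf, len);
    // System messages usually end with "\r\n"; strip trailing whitespace.
    while (!msg.empty() && (msg.back() == '\r' || msg.back() == '\n' || msg.back() == ' '))
        msg.pop_back();
    return msg;
#else
    // On UNIX, socket functions report errors through errno.
    // Note: strerror() is not guaranteed to be thread-safe.
    return std::strerror(errno);
#endif
}
```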

The release is basically ready; I just want to port a few more modules to Windows before I make it public (so far I've got the xml, json, yaml, and uuid modules also working on Windows).

Other than a large number of bug fixes since 0.8.2 and a few minor improvements here and there, the only other feature of note is support for simple conditional parsing based on C/C++ preprocessor-style directives such as %define, %ifdef, and %ifndef.
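Conditional parsing of this kind can be modelled as a line filter with a stack of active/inactive states. The sketch below is a hypothetical C++ illustration, not Qore's parser. It assumes each directive sits on its own line, that %define takes a single name (any value after it is ignored), and it handles only %define, %ifdef, %ifndef, %else and %endif.

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical filter: returns the lines of src that survive conditional parsing.
std::string preprocess(const std::string& src) {
    std::set<std::string> defines;
    // One frame per open %ifdef/%ifndef: is this block active, and was its parent?
    struct Frame { bool active; bool parentActive; };
    std::vector<Frame> stack;
    auto active = [&] { return stack.empty() || stack.back().active; };

    std::istringstream in(src);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string directive, name;
        words >> directive >> name;

        if (directive == "%define") {
            if (active()) defines.insert(name);  // defines in skipped blocks are ignored
        } else if (directive == "%ifdef" || directive == "%ifndef") {
            bool isDefined = defines.count(name) > 0;
            bool cond = (directive == "%ifdef") ? isDefined : !isDefined;
            bool parent = active();
            stack.push_back({parent && cond, parent});
        } else if (directive == "%else") {
            if (stack.empty()) throw std::runtime_error("%else without %ifdef/%ifndef");
            Frame& f = stack.back();
            f.active = f.parentActive && !f.active;  // flip, unless the parent is skipped
        } else if (directive == "%endif") {
            if (stack.empty()) throw std::runtime_error("%endif without %ifdef/%ifndef");
            stack.pop_back();
        } else if (active()) {
            out << line << '\n';  // ordinary source line in an active block
        }
    }
    if (!stack.empty()) throw std::runtime_error("missing %endif");
    return out.str();
}
```

Tracking the parent's state in each frame is what keeps an %else inside a skipped outer block from switching its branch back on.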

Current (tentative) release notes:

Saturday, December 25, 2010

Qore 0.8.1 Released

Qore 0.8.1 has just been released with a ton of bugfixes and new features. Major new features are: SQL prepared statement API (currently only supported by the soon-to-be released oracle driver v2.0), a much improved type system, support for class constants and static class variables, and a more standard syntax for declaring function and method return types by allowing the type name to be declared at the beginning of the function or method signature, as in C/C++ or Java, for example.

Additionally, there are new parse options that allow for programming without the "$" and "$." signs for variables, class method calls, and object member references.

This last change hopefully will make a lot of people happy - I had a lot of requests to do away with the "$" signs, and now it's possible. Unfortunately, the code highlighting solutions out there will have to be updated again to handle the new %allow-bare-refs and %new-style parse options. %new-style combines both of the new parse options %allow-bare-refs and %assume-local, the latter meaning that all variables are assumed to have local scope unless declared global.
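One natural way to model a combined option such as %new-style is as a bitmask, where the combined option is simply the union of its parts. The flag names and values below are hypothetical, chosen only to mirror the options named above; they are not Qore's actual parse option constants. The sketch also assumes that, without %assume-local, a variable used without a declaration is treated as global.

```cpp
#include <cstdint>

// Hypothetical parse-option flags (values are illustrative only).
enum ParseOption : uint32_t {
    PO_ALLOW_BARE_REFS = 1u << 0,  // %allow-bare-refs: no "$" or "$." needed
    PO_ASSUME_LOCAL    = 1u << 1,  // %assume-local: variables local unless declared global
    PO_NEW_STYLE       = PO_ALLOW_BARE_REFS | PO_ASSUME_LOCAL,  // %new-style
};

// True if every bit in 'wanted' is set in 'options'.
constexpr bool hasAll(uint32_t options, uint32_t wanted) {
    return (options & wanted) == wanted;
}

enum class Scope { Local, Global };

// Scope for a variable used without a declaration (assumed default: global).
constexpr Scope scopeForUndeclared(uint32_t options) {
    return hasAll(options, PO_ASSUME_LOCAL) ? Scope::Local : Scope::Global;
}
```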

Here is an example with %new-style:
%new-style

int sub do_something(int p1, string str, *hash h) {
    for (int x = 0; x < p1; ++x) {
        stderr.printf("error: %s\n", str);
    }
    return p1 + 2;
}
Backwards compatibility is a priority and has been maintained. We'll see if the decision to allow for this new programming style is a good one; sometimes too much choice can just lead to confusion and therefore is counterproductive. However at least some people are very happy with it.

Wednesday, June 23, 2010

Current Qore Status

Qore 0.8.0 has been released along with all updated modules; in most cases the modules have been updated to take advantage of new APIs (mostly regarding typing, date/time improvements, and new Datasource/DatasourcePool methods).

There was a slight delay before the 0.8.0 release to improve the type system; now the type system is internally capable of supporting very flexible types, where one or more types are accepted and one or more types are returned, making types such as "nothing or string" possible to implement internally.

However, Qore's user type declaration support in the parser and the function and class library were not updated to take advantage of the new flexible typing support: everything was stable and tested, and applying the new support would probably have delayed the release by a month or two. At least the internal changes are in place and are part of the Qore library's API and ABI.
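The internal idea of a type that accepts one or more base types and returns one or more base types can be sketched as a pair of bitsets. The TypeInfo struct and the bit values below are a hypothetical C++ illustration, not the Qore library's actual type API; the "nothing or string" type corresponds to what later Qore code writes as *string.

```cpp
#include <cstdint>

// Hypothetical base-type bits.
enum BaseType : uint32_t {
    BT_NOTHING = 1u << 0,
    BT_INT     = 1u << 1,
    BT_FLOAT   = 1u << 2,
    BT_STRING  = 1u << 3,
};

// A type is the set of base types it accepts (arguments, assignments)
// and the set of base types it may return.
struct TypeInfo {
    uint32_t accepts;
    uint32_t returns;

    bool acceptsType(BaseType t) const { return (accepts & t) != 0; }

    // True if every base type this type can return is accepted by 'target'.
    bool returnsCompatibleWith(const TypeInfo& target) const {
        return (returns & ~target.accepts) == 0;
    }
};

// "string": accepts and returns only strings.
constexpr TypeInfo STRING_TYPE{BT_STRING, BT_STRING};
// "nothing or string".
constexpr TypeInfo STRING_OR_NOTHING_TYPE{BT_STRING | BT_NOTHING, BT_STRING | BT_NOTHING};
```

A check like returnsCompatibleWith() is the kind of test that lets a parser accept passing a string where "nothing or string" is expected, but flag the reverse as possibly returning NOTHING.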

One of the coolest new developments to make use of Qore's new typing support is the new qt4 module, which allows Qore code to implement sophisticated platform-independent Qt4-based GUIs (note that the qt4 module does take advantage of Qore's new flexible type system, allowing NOTHING to be passed for some classes to simulate a NULL pointer, for example).

Here are some things to expect in future releases of Qore: implicit typing and other parser improvements (0.8.0 is a great improvement in this area already), improved execution speed, and SMP scalability. And most interestingly JIT compilation support using the LLVM project. This is the most exciting part of future development of Qore that once again should take the Qore language to another level. I am astounded at what an awesome project LLVM is; how well documented it is, how well supported and active it is, what it can do and what a language designer can do with it. This will take some time, but will be some of the most interesting work to date done with Qore, and the results should be nothing short of amazing.

I wish I could give a timeline for any of these new developments, but I cannot; they will be done as time permits.

Sunday, May 16, 2010

The Joy of YAML

I had been working so hard on the next release of Qore for so long that I had to take a short break, during which I made the new yaml module. The yaml module is currently very small in terms of source code; it allows Qore data types (except objects) to be serialized and deserialized in YAML format, using libyaml to do the real work.

The great thing about YAML is that it is much better suited to representing data in text format than XML because it's much more concise and readable for humans. Additionally, with the addition of one custom YAML tag (!duration), all native Qore types can be serialized and deserialized as YAML with no information loss.

Compared to XML-RPC, YAML supports time zone information and time resolution to the microsecond (YAML's !!timestamp type actually supports arbitrary fractional seconds), and our custom !duration type supports relative date/time values in an ISO-8601-like format (with the additions that time values may be negative and an extra character specifies microseconds). And of course YAML is much more readable and concise than XML-RPC.
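To make the relative date/time format concrete, here is a hypothetical C++ formatter that produces ISO-8601-like strings such as the P2M3DT10H14u value in the example data structure below. The rules are assumptions inferred from that example, not the yaml module's actual code: zero components are omitted, each component may be negative, "T" separates the date and time parts, and "u" marks microseconds.

```cpp
#include <cstdint>
#include <string>

// Hypothetical relative date/time value; each field may be negative.
struct Duration {
    int64_t years = 0, months = 0, days = 0;
    int64_t hours = 0, minutes = 0, seconds = 0, microseconds = 0;
};

// Appends "<value><unit>" when the value is non-zero.
static void appendPart(std::string& out, int64_t value, char unit) {
    if (value != 0) out += std::to_string(value) + unit;
}

// Formats a Duration as an ISO-8601-like string with a 'u' microsecond suffix.
std::string formatDuration(const Duration& d) {
    std::string date, time;
    appendPart(date, d.years, 'Y');
    appendPart(date, d.months, 'M');
    appendPart(date, d.days, 'D');
    appendPart(time, d.hours, 'H');
    appendPart(time, d.minutes, 'M');  // 'M' after 'T' means minutes, as in ISO 8601
    appendPart(time, d.seconds, 'S');
    appendPart(time, d.microseconds, 'u');
    if (date.empty() && time.empty()) return "PT0S";  // arbitrary choice for this sketch
    std::string out = "P" + date;
    if (!time.empty()) out += "T" + time;
    return out;
}
```

A deserializer would parse the same components in reverse, and the emitter would tag the resulting scalar as !duration.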

Compared to JSON, YAML is of course very similar, but it is extensible and supports more data types out of the box; JSON is missing !!timestamp and !!binary (a base-64 encoded binary type). JSON is as concise and readable as YAML (YAML, at least YAML 1.2, is a superset of JSON).
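The !!binary type carries its payload as base-64 text, which is what the long strings beginning with aGVsbG8s in the serialization examples below are. For reference, here is a minimal standard (RFC 4648) base-64 encoder in C++; the function name is our own and the code is an illustrative sketch, not the yaml module's implementation. Encoding "hello, how's it going" with it yields exactly the first 28 characters of that example payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard base-64 (RFC 4648 alphabet) with '=' padding.
std::string base64Encode(const std::string& in) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Each full group of 3 bytes (24 bits) becomes 4 characters of 6 bits each.
    for (; i + 2 < in.size(); i += 3) {
        uint32_t n = (uint32_t(uint8_t(in[i])) << 16) |
                     (uint32_t(uint8_t(in[i + 1])) << 8) |
                     uint32_t(uint8_t(in[i + 2]));
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }

    // A trailing 1 or 2 bytes are padded with '=' to a full 4-character group.
    std::size_t rest = in.size() - i;
    if (rest == 1) {
        uint32_t n = uint32_t(uint8_t(in[i])) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        uint32_t n = (uint32_t(uint8_t(in[i])) << 16) | (uint32_t(uint8_t(in[i + 1])) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```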

Take the following Qore data structure (valid with Qore 0.8.0+):
(1, "two", NOTHING, 2010-05-05T15:35:02.100, False, 1970-01-01Z,
 (hash(), (), "three \"things\""), P2M3DT10H14u, now_us(),
 binary("hello, how's it going, this is a long string, you know XXXXXXXXXXXXXXXXXXXXXXXX"),
 ("a" : 2.0,
  "b" : "hello",
  "key" : True))


Here's how the serialization looks:
YAML:
[1, "two", null, '2010-05-05 15:35:02.1 +02:00', false, 1970-01-01, [{}, [], "three
\"things\""], '0000-02-03 10:00:00.000014 Z', '2010-05-16 13:27:15.859195 +02:00',
!!binary "aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==",
{"a": 2, "b": "hello", "key": true}]

XML-RPC (note that the duration is not serialized correctly, and time zone and microsecond information is lost):
<struct><member><name>data</name><value><array><data><value><i4>1</i4></value><value><string>two</string></value><value/><value><dateTime.iso8601>20100505T15:35:02</dateTime.iso8601></value><value><boolean>0</boolean></value><value><dateTime.iso8601>19700101T00:00:00</dateTime.iso8601></value><value><array><data><value><struct></struct></value><value><array><data/></array></value><value><string>three "things"</string></value></data></array></value><value><dateTime.iso8601>00000203T10:00:00</dateTime.iso8601></value><value><dateTime.iso8601>20100516T13:31:23</dateTime.iso8601></value><value><base64>aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==</base64></value><value><struct><member><name>a</name><value><double>2.000000</double></value></member><member><name>b</name><value><string>hello</string></value></member><member><name>key</name><value><boolean>1</boolean></value></member></struct></value></data></array></value></member></struct>

JSON (minus the binary data that cannot be serialized, note dates are serialized as strings):
[ 1, "two", null, "2010-05-05 15:35:02.100 Wed +02:00 (CEST)", false, "1970-01-01 00:00:00 Thu Z (UTC)", [ { }, [ ], "three \"things\"" ], "", "2010-05-16 13:32:44.792114 Sun +02:00 (CEST)", { "a" : 2, "b" : "hello", "key" : true } ]

The incredible thing was that I could not find any standard YAML-RPC protocol definition. The closest I could find was a partially-documented protocol called !okay/rpc. So I just implemented a simple YAML-RPC handler and client based on JSON-RPC 1.1 and it works great. I will probably simplify it a bit more to be a little more like !okay/rpc but with a described fault message, then document it and put it online for people to review.

I'm really happy I found YAML; its conciseness, extensibility, and readability make it a far superior alternative to XML and XML-RPC for data serialization in Qore. In the future I will look at making it possible to serialize and deserialize objects as well: if a class supports writing out its state as Qore data that can then be passed to the appropriate constructor or other such method on the remote end, that would solve this problem.

Note that Qore's YAML module is still undocumented, but it is stable and somewhat tested. It requires Qore 0.8.0+ (still unreleased, only in svn).

Tuesday, April 20, 2010

Time Zone Support

Time zone support has recently been added to Qore in svn; version 0.8.0 will support time zones. Qore uses the system's zoneinfo database (if it can find it) to load information about daylight saving time, time zone names, etc.

The entire date/time implementation was internally reimplemented for this change, although the old APIs remain for backwards-compatibility. Basically now all absolute date/time values not having an explicit time zone identifier will be assumed to be local time.

During the change I also extended the precision of relative and absolute date/time values to the microsecond (one millionth of a second); previously it was only to the millisecond.
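One compact way to picture such a value is seconds since the UNIX epoch plus a microsecond field and a UTC offset. The sketch below is an assumption for illustration, not Qore's internal layout, and AbsoluteTime and makeTime() are hypothetical names. It uses Howard Hinnant's well-known days-from-civil algorithm to convert a calendar date to epoch days, and it takes the UTC offset as an explicit parameter, whereas Qore determines the local offset from the zoneinfo database when no zone is given.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
constexpr int64_t daysFromCivil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Hypothetical absolute date/time value with microsecond resolution.
struct AbsoluteTime {
    int64_t epochSeconds;  // seconds since 1970-01-01T00:00:00Z
    int32_t microseconds;  // 0..999999
    int32_t utcOffset;     // seconds east of UTC for the value's time zone
};

// Builds a value from wall-clock fields in a zone with the given UTC offset.
AbsoluteTime makeTime(int64_t y, unsigned mo, unsigned d, int h, int mi, int s,
                      int32_t us, int32_t utcOffset) {
    int64_t local = daysFromCivil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
    return AbsoluteTime{local - utcOffset, us, utcOffset};
}
```

For example, 2010-05-05T15:35:02.100 at +02:00 becomes makeTime(2010, 5, 5, 15, 35, 2, 100000, 7200): the stored epoch value corresponds to 13:35:02 UTC, and the .100 seconds are kept as 100000 microseconds rather than 100 milliseconds.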