Qore Programming Language

Less is More in Enterprise Development

2016-06-19T10:03:00.001-07:00

Enterprise Development of Mission-Critical Functionality is Complex, Time-Consuming, and Very Rarely Cheap

"Enterprise" development means developing software for the mission-critical IT support processes of large companies. Basically, if you are developing software that supports a mission-critical process for a large business, then you are doing enterprise development.

IT managers and enterprise architects are under pressure to provide IT solutions with a low total cost of ownership that allow for maximum business flexibility with high quality; in other words, solutions have to be cheap, fast, and good.

In these kinds of situations, the stakes are high, and decision makers are allergic to risk. This is where the old saying, "Nobody ever got fired for buying IBM" comes from - the idea being that it's acceptable to spend more money in the name of reducing risk.

In mission-critical developments for large companies, the timing of the delivery and the functionality take the front seat, and the development is anything but cheap.

The main question I would like to explore is:

How can enterprise development be affordable, fast, and good?

I believe that the only way to do this is to reduce the complexity of the implementation of the solution. Complex solutions are inherently expensive, either in development, in operations, or most commonly both.

This is the crux of the Less is More argument for enterprise development:

less complexity = more chance for success

I would like to propose several ways to reduce the complexity of enterprise solutions:

Use proven off-the-shelf applications
Establish and enforce process and technical integration standards
Favor solutions based on configurability over coding
Favor solutions that focus on operations rather than development

The first two points are hopefully self-explanatory, so I'll focus on the last two.

Favor solutions based on configurability over coding

This can also be formulated in a counterintuitive way: a superficially "weaker" solution with simpler development requirements can provide more value to your business than an extremely powerful, complex, open-ended solution, because of the corresponding reduction in complexity.

A solution that favors configuration over development and simplifies the minimal development necessary can significantly reduce complexity and therefore risk vs an open-ended solution with complex development requirements.

Favor solutions that focus on operations rather than development

The idea here is that a solution focused on operations (the long-term view) will have a lower total cost of ownership than a solution that offers a large and complex development environment but little in the way of operational support.

One critical measure of success for any project is how well it actually performs in operations; therefore, a focus on operations should also lead directly to higher operational quality.

Simplicity is Underrated in Enterprise Development

These last two points hold particularly true for the enterprise integration layer, where problems can have customer visibility and can cost a significant amount of money in manual troubleshooting and in extreme cases in lost market share.

To conclude, enterprise development can be done with dramatically less risk by using solutions that reduce the complexity of the development, and by focusing on operations, such solutions can provide long-term value to customers with the foresight to deploy them.

Data Serialization

2016-06-18T07:59:00.005-07:00

A data serialization standard is required any time you need to represent data outside of a program.

Two of the most popular data serialization standards are XML and JSON. Both are human-readable formats, a feature that has undoubtedly contributed to their popularity.

XML is used heavily in enterprise contexts, although JSON currently seems to be gaining in popularity over XML in this space as well. XML is a document markup language and was not originally designed for data serialization. Without a schema such as an XSD or external type encoding rules, XML documents do not innately store data type information.
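To make the point about type information concrete, here is a small Python sketch using only the standard library (`xml.etree.ElementTree` and `json`). The element names are made up for illustration. Parsed plain XML hands back text, and the consumer has to know from somewhere else that "25" is an integer. JSON carries that distinction itself.

```python
import json
import xml.etree.ElementTree as ET

# The same two values expressed in plain XML and in JSON
xml_doc = "<person><age>25</age><isAlive>true</isAlive></person>"
json_doc = '{"age": 25, "isAlive": true}'

person_xml = ET.fromstring(xml_doc)
xml_age = person_xml.find("age").text          # element text is always a str
xml_alive = person_xml.find("isAlive").text    # "true" is just four characters

person_json = json.loads(json_doc)
json_age = person_json["age"]                  # decoded as an int
json_alive = person_json["isAlive"]            # decoded as a bool

print(repr(xml_age), repr(xml_alive))          # '25' 'true'
print(repr(json_age), repr(json_alive))        # 25 True
```

An XSD or a convention like XML-RPC's type elements is what closes this gap for XML.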

The SOAP protocol relies on XML Schema (XSD) for data typing; it is a very complex protocol with a hierarchy of complex specifications. SOAP originally stood for Simple Object Access Protocol, but as of SOAP 1.2 the name is no longer treated as an acronym. SOAP ticks a lot of boxes that make enterprise architects happy, such as:

Transport independence
Distributed processing model
Design by contract (interface specification separate from the implementation)
Synchronous and asynchronous messaging
Universal standard

In my opinion, there is nothing simple about SOAP. Its status as a "universal standard" is also questionable in practice, because the complexity of the specifications has led to incompatibilities between different vendor implementations.

I once had a conversation with an engineer from a global tier-1 integration software company (whose name will be left unmentioned) who told me how his company's SOAP solution was so good and so standards-compliant that they had recently won a lawsuit. One of their clients had sued them for negligence, alleging incompatibilities in their SOAP implementation after a critical interface or set of interfaces could not be made to work. According to this engineer, they were able to prove in court that their SOAP implementation was correct, and they won the suit. The fact that the issue had to go to court at all is a testament to the complexity of SOAP; taking pride in having your customer sue you because you couldn't make an interface work (basically your only purpose as an integration ISV) and then winning the suit is another subject better left alone.

One simple extension of XML that supports data typing is XML-RPC. It provides several data types, including lists (array) and key-value pairs (struct), which combine the simpler types into arbitrarily complex data structures. It's straightforward and simple: easy to read, and easy to serialize and deserialize. It doesn't have SOAP's list of enterprise features, but to me it looks like a simple protocol designed by a single engineer. SOAP, by contrast, looks like a typical protocol "designed by committee", where good engineering ("keep it simple, stupid") takes a back seat to politics, and complexity is not avoided but rather embraced (a great way to destroy a technology or project in my experience!).

Nowadays you see more and more JSON. JSON is an elegant data serialization format that was designed for the purpose it's used for. JSON is human-readable and is the native format for JavaScript, a language that's grown in popularity along with the web.

For example, here is a JSON string:

{
 "firstName": "John",
 "lastName": "Smith",
 "isAlive": true,
 "age": 25,
 "address": {
   "streetAddress": "21 2nd Street",
   "city": "New York",
   "state": "NY",
   "postalCode": "10021-3100"
 }
}

The same data with XML-RPC encoding looks like this:

<struct>
  <member>
    <name>firstName</name>
    <value>
      <string>John</string>
    </value>
  </member>
  <member>
    <name>lastName</name>
    <value>
      <string>Smith</string>
    </value>
  </member>
  <member>
    <name>isAlive</name>
    <value>
      <boolean>1</boolean>
    </value>
  </member>
  <member>
    <name>age</name>
    <value>
      <i4>25</i4>
    </value>
  </member>
  <member>
    <name>address</name>
    <value>
      <struct>
        <member>
          <name>streetAddress</name>
          <value>
            <string>21 2nd Street</string>
          </value>
        </member>
        <member>
          <name>city</name>
          <value>
            <string>New York</string>
          </value>
        </member>
        <member>
          <name>state</name>
          <value>
            <string>NY</string>
          </value>
        </member>
        <member>
          <name>postalCode</name>
          <value>
            <string>10021-3100</string>
          </value>
        </member>
      </struct>
    </value>
  </member>
</struct>

Because it was designed for data serialization, JSON is better suited to this task than XML, even with simple extensions for data types like with XML-RPC.
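To compare the two encodings of the record above directly, here is a short Python sketch built on the standard `json` and `xmlrpc.client` modules. Python's marshaller writes `<int>` rather than `<i4>` for integers, and it adds no whitespace indentation. Apart from that, it produces the same kind of `<struct>`/`<member>` encoding shown above. Both formats round-trip the typed values. The size comparison is only indicative, since neither output is pretty-printed.

```python
import json
import xmlrpc.client

person = {
    "firstName": "John",
    "lastName": "Smith",
    "isAlive": True,
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100",
    },
}

json_text = json.dumps(person)

# dumps() takes a tuple of parameters; with no method name it returns
# just the <params> fragment containing the typed <struct> encoding
xmlrpc_text = xmlrpc.client.dumps((person,))

# Both encodings preserve strings, booleans, integers, and nesting
json_back = json.loads(json_text)
xmlrpc_back = xmlrpc.client.loads(xmlrpc_text)[0][0]  # (params, methodname)

print(len(json_text), "bytes of JSON vs", len(xmlrpc_text), "bytes of XML-RPC")
```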

Even better than JSON, but not as popular, is YAML (the subject of another blog post some time ago). YAML 1.2 is backwards-compatible with JSON and is also extensible. With YAML you get an elegant, human-readable data serialization standard that allows itself to be extended to support application or platform types.

The above data structure looks as follows with YAML (using block style for formatting):

firstName: "John"
lastName: "Smith"
isAlive: true
age: 25
address:
  streetAddress: "21 2nd Street"
  city: "New York"
  state: "NY"
  postalCode: "10021-3100"

As with JSON, despite the lack of explicit type information, types are unambiguous and are preserved in the YAML string above.

YAML is extremely suitable for data serialization and therefore also for use in data exchange protocols.

Qore supports XML, JSON, and YAML for data serialization, but YAML is the preferred mechanism wherever it can be used.

Qore's YAML support is provided by the yaml module. Qore implements a YAML-RPC protocol, which is basically JSON-RPC 1.1 but using YAML for data serialization. Additionally, the DataStream protocol is a point-to-point protocol using YAML serialization for sending large volumes of data over HTTP with a small memory footprint.

From an engineering perspective, YAML is simple, elegant, extensible, and does a very important job very well.

While there are compelling arguments to be made for binary data serialization standards in some contexts, when you need an elegant, human-readable data serialization approach, I would highly recommend taking a look at YAML.

The Application as a Meta Programming Language

2016-06-07T05:28:00.002-07:00

Introduction

One of Qore's fundamental design goals is to provide a platform that allows programmers to dramatically increase productivity (reduce development time) while producing high-quality results.

In a given complex functional domain, one approach for achieving this goal is to implement an application framework whose configuration and execution can also be influenced/governed by embedding appropriately-sandboxed user-defined code in the application itself.

Example Telecommunications Billing Application

Consider an application that implements complex functionality, such as a billing system for telecommunications services. With the approach described here, the application allows itself to be extended by supporting embedded, sandboxed user-defined code as attributes of configuration objects. The application itself executes this code in the appropriate context, which allows for a very high level of functional customization.

In this hypothetical telecommunications billing system, this user-defined code might be attached to a tariff or discount, for example, and the billing system would then execute the code when calculating fees for consumed services.

The billing system would supply an internal API that could be used by embedded code for monetary calculations and for accessing contextual information, such as the customer's configuration, service usage data, etc. Furthermore, fees and discounts calculated would be subject to taxation, and the calculation and application of taxation rules could similarly support runtime extensions through code attributes.

To allow for the highest level of user productivity and functional flexibility, our billing system should support simple and standard billing actions through (pure, non-code) configuration while allowing for more complex behavior to be defined through the execution of code attributes on appropriate configuration objects.
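As a rough illustration of this split between configuration and code attributes, here is a minimal Python sketch. `Tariff`, `BillingApi`, `calculate_fee`, and the loyalty rule are all hypothetical names invented for this example. A tariff with no code attribute is billed purely from configuration. A tariff with a code attribute gets that code executed with only the current fee and a small billing API in scope. Python's `exec()` with an empty `__builtins__` is **not** a security sandbox; it stands in here for Qore's sandboxed logic containers only to show the control flow.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass
class Tariff:
    name: str
    rate_per_minute: Decimal          # pure configuration
    code: Optional[str] = None        # optional user-defined code attribute

class BillingApi:
    """Hypothetical internal API exposed to embedded code."""
    def __init__(self, customer: dict) -> None:
        self.customer = customer      # contextual information for the rule

    def percent(self, amount: Decimal, pct: int) -> Decimal:
        return (amount * pct / 100).quantize(Decimal("0.01"))

def calculate_fee(tariff: Tariff, minutes: int, api: BillingApi) -> Decimal:
    fee = tariff.rate_per_minute * minutes    # standard, configuration-only path
    if tariff.code is None:
        return fee
    # Expose only the fee and the API; NOT a real sandbox (illustration only)
    namespace = {"__builtins__": {}, "fee": fee, "api": api}
    exec(tariff.code, namespace)              # user code may reassign "fee"
    return namespace["fee"]

# Hypothetical loyalty rule: 20% off for customers of five years or more
LOYALTY_CODE = (
    "if api.customer['years'] >= 5:\n"
    "    fee = fee - api.percent(fee, 20)\n"
)
```

For example, `calculate_fee(Tariff("loyalty", Decimal("0.10"), LOYALTY_CODE), 100, BillingApi({"years": 6}))` bills 100 minutes at 0.10 and then applies the embedded discount. The simple tariff needs no code at all, which is the balance the text argues for.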

Vertical Support in Qore

Qore supports this sort of application development through its sandboxed code / logic containers, which allow applications such as the one above to be extended and customized with user-defined logic also written in Qore. These logic containers can be created and destroyed at runtime, which allows the behavior of the system to be updated without restarting the server itself.

"Vertical" in this sense refers to the fact that both the server and the embedded code are written in the same programming language. The nature of a vertical approach means that the programming language itself must feature enough power and flexibility to support the implementation of server applications while also supporting strong sandboxing controls to allow user-defined code to be safely embedded and executed to extend and customize the behavior of the application at runtime.

Specifically, this is implemented by the Program class in Qore. Sandboxing controls are implemented through Parse Options, which are combinations of restriction, capability, parsing, and/or execution rules that define the sandboxing rules for the logic container.

Custom APIs can be imported into Program objects through methods such as Program::importClass() and Program::importFunction(). Access to system functionality in sandboxed logic containers can be restricted by functional domain through Parse Options, or the Program object can be created with no Qore system API (i.e. standard functions, classes, and/or constants are not available), and then the desired API can be imported manually.
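The pattern of "start with no system API, then import only what the embedded code may use" can be sketched in Python as follows. This is not Qore's Program class. `LogicContainer`, `import_function`, `parse`, `call`, and `Host` are hypothetical names that loosely mirror `Program::importFunction()` and runtime container replacement. As before, `exec()` with an empty `__builtins__` only hides the standard functions. It does not enforce real isolation.

```python
from typing import Any, Callable, Dict, List, Optional

class LogicContainer:
    """Hypothetical stand-in for a logic container with no system API."""
    def __init__(self) -> None:
        # Empty builtins: len(), open(), print() etc. are not visible to user code
        self._namespace: Dict[str, Any] = {"__builtins__": {}}

    def import_function(self, name: str, func: Callable[..., Any]) -> None:
        # The host decides exactly which functions user code can call
        self._namespace[name] = func

    def parse(self, source: str) -> None:
        exec(compile(source, "<user-code>", "exec"), self._namespace)

    def call(self, name: str, *args: Any) -> Any:
        return self._namespace[name](*args)

class Host:
    """Hypothetical server that swaps logic containers at runtime."""
    def __init__(self) -> None:
        self.messages: List[str] = []
        self.container: Optional[LogicContainer] = None

    def log(self, msg: str) -> None:
        self.messages.append(msg)

    def load_logic(self, source: str) -> None:
        new = LogicContainer()
        new.import_function("log", self.log)   # the only API user code gets
        new.parse(source)
        self.container = new  # replace the old container; no server restart
```

Loading a new version of the user code simply builds a fresh container and drops the old one, which is the runtime-upgrade property described in the conclusion below.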

Conclusion

Using sandboxed Program objects in Qore programs allows for applications to be written in Qore that support runtime extensions also written in Qore. The fact that these logic containers can be created and destroyed at runtime also allows for such applications to support runtime changes and logic upgrades without the need to restart the system for the changes to take effect.

Programmer productivity in such an application is maximized through the introduction of simple APIs for use in embedded code executed in specific application contexts, which, if properly implemented, allow for the behavior of an application to be customized in ways that could not be imagined when the application was designed and created, and therefore also allow for maximum flexibility.

Finding the right balance between configuration and coding to maximize productivity is important; allowing too much freedom (for example, if the balance of coding vs configuration is shifted too far towards coding) can increase programming complexity and steepen the learning curve, while allowing for too little customization through embedded code might cripple the application (particularly if the application is not very configurable through standard configuration).

In this way, such an application can be considered a special purpose meta programming language; programmers use a combination of configuration and embedded code to develop their solutions in the application.

Exception-Safe Programming, RAII, and Deterministic GC

2014-10-18T23:47:00.001-07:00

I'd like to make a follow-up post with more information on Qore's new Deterministic Garbage Collection support including more details and more information on why it's important for the language.

First of all, regarding the language: deterministic garbage collection was always a goal for Qore because of its focus on the RAII idiom (see Wikipedia) for exception-safe programming and resource management (I also found the following very interesting link on this subject: http://www.hackcraft.net/raii/).

Some examples of RAII in Qore are (a subset of possible examples):

the AutoLock class releases the Mutex in the destructor (this class is designed to be used with scope-bound exception-safe resource management)
the Datasource class closes any open connection in the destructor, and, if a transaction is still in progress, the transaction is rolled back automatically and an exception is thrown before the connection is closed
the File class closes the file in the destructor if it's open
the RWLock class throws an exception if destroyed while any thread is still holding the lock; note that in this case the underlying object is only destroyed when all threads holding locks have released their locks; this is handled with Qore's thread resource handling and strong references to the underlying RWLock object while thread resources are held; thread resource handling is another potential topic for a blog post
the Socket class first shuts down any TLS/SSL connection and then closes the connection in the destructor if it's open
the ThreadPool class detaches all currently in-progress worker threads, cancels pending tasks not yet executed (by calling their cancellation closure, if any), terminates the worker thread and destroys the thread pool

Basically, one of Qore's primary design points is to free Qore programmers from worrying about the details of memory and resource management, and the use of the RAII idiom is a large part of this. The RWLock case above also shows negative feedback being given to programmers when mistakes are made: deleting an RWLock object while a lock is held raises an exception. (Note that Qore also provides scope-related resource management support in the form of the on_exit, on_success, and on_error statements.)
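For readers coming from other languages, here is a rough Python analogue of these ideas. Python does not guarantee when `__del__` runs, so scope-bound release is normally written as a context manager. The `AutoLock` class below is a hypothetical imitation of Qore's AutoLock, not its API. The `try`/`except`/`else`/`finally` clauses only approximately correspond to `on_error`, `on_success`, and `on_exit`. Qore runs those statements at scope exit, in reverse order of declaration, which Python's clause order does not reproduce exactly.

```python
import threading
from typing import List

class AutoLock:
    """Hypothetical scope-bound guard: acquire on entry, always release on exit."""
    def __init__(self, mutex: threading.Lock) -> None:
        self._mutex = mutex

    def __enter__(self) -> "AutoLock":
        self._mutex.acquire()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._mutex.release()   # runs on normal exit and when an exception escapes
        return False            # never swallow the exception

def transfer(mutex: threading.Lock, fail: bool, events: List[str]) -> None:
    with AutoLock(mutex):
        try:
            if fail:
                raise ValueError("transfer failed")
        except ValueError:
            events.append("on_error")    # roughly Qore's on_error
            raise
        else:
            events.append("on_success")  # roughly Qore's on_success
        finally:
            events.append("on_exit")     # roughly Qore's on_exit
```

Whether `transfer()` returns normally or raises, the mutex is released, which is the exception-safety guarantee RAII is meant to provide.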

Therefore since support for the RAII idiom is a critical feature of Qore's design, the language should always guarantee that objects are destroyed and therefore their resources are managed and associated memory freed when objects go out of scope. This is tricky when there are circular references involved in the objects, particularly since Qore uses a reference-counted solution with atomic references due to its multi-threaded nature. Consider the following post on this subject regarding .NET: http://blogs.msdn.com/b/brada/archive/2005/02/11/371015.aspx. This gives a lot of background to the same problem that Qore (and Java) have for providing for deterministic garbage collection.

To summarize, RAII is not supported in Java and .NET because it's hard to do. Consider also VB6: if you have recursive object references in VB6, those objects will never be destroyed or collected by the system, and the programmer has to delete objects with recursive references manually in order to destroy them. This is basically the same situation Qore was in before the introduction of deterministic GC.

The current status of the deterministic GC support in Qore svn is stable; it appears to finally be working in large complex object-oriented multi-threaded programs with good performance and (at the moment - knock on wood) free of deadlocks. In another post I described the high-level approach with emphasis on the deadlock-avoidance approach since this was one of the trickiest parts to solve. I'd like to provide more detail here on the recursive graph detection and recursive reference algorithm.

To find recursive graphs and calculate the number of recursive references for each member of the graph, Qore scans from some starting object. The recursive graph detection algorithm has to produce the same results for any given recursive graph regardless of which node the scan starts from. At a high level, Qore scans all objects reachable from the start object and, when a recursive reference is found, increments the recursive count for that object. (In fact, this is done transactionally: all changes are committed atomically to the entire recursive graph at the end of the transaction, or rolled back in case of certain kinds of lock contention in order to avoid deadlocks; this is described in my previous blog post.) If a "path" is found from any given object back to itself, the recursive count is set to one for all elements of the path, and each further recursive reference increments the recursive count of the object it points to. Due to the recursive nature of the algorithm, new elements can be added to the recursive set while an object is being scanned. Processing a path therefore also has to check whether parent objects are reachable from existing recursive sets, and adjust their recursive counts accordingly. For example, if one of the objects in the path being processed already has a non-zero pending recursive count, the pending recursive count of the preceding object in the path is incremented only if that preceding object was not already in the set. Additionally, if any predecessor of the current recursive node in the current path has a different recursive set and is reachable from the current recursive set, the recursive sets are merged into one.
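The goal of this scan, sets of mutually reachable objects plus a per-object count of references coming from inside the same set, can be illustrated in Python with a standard technique: Tarjan's strongly connected components algorithm. This is a swapped-in textbook algorithm, **not** Qore's incremental, transactional scan. It only shows what the result should be. The graph representation (object name to list of referenced object names) is an assumption for the example. As required above, the result does not depend on where the scan starts.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # object -> objects referenced by its members

def recursive_sets(graph: Graph) -> List[Set[str]]:
    """Tarjan's SCC algorithm, keeping only components that contain a cycle."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[Set[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            # a lone object is recursive only if it references itself
            if len(component) > 1 or v in graph.get(v, []):
                result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result

def recursive_counts(graph: Graph) -> Dict[str, int]:
    """Count, for each object in a recursive set, the references from that same set."""
    counts: Dict[str, int] = {}
    for component in recursive_sets(graph):
        for v in component:
            counts.setdefault(v, 0)
        for v in component:
            for w in graph.get(v, []):
                if w in component:
                    counts[w] += 1
    return counts
```

For `{"A": ["B"], "B": ["A", "C"], "C": []}`, A and B form one recursive set, each with a recursive count of 1, and C is not part of any cycle.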

STL sets are used to provide O(log n) searches of the recursive sets found in the scan; however, this may not be the ideal data structure for small sets, due to the overhead of managing the red-black balanced binary trees normally used to implement sets in the STL. These sets are also iterated several times, so fast iteration is important. I believe that some analysis of the usage of data structures in the recursive scan could provide some optimizations.

Recursive scans are performed when changes are made to objects that could potentially create a new recursive set or change an existing recursive set. The approach also features several checks designed to limit the cases where the recursive graph detection algorithm needs to be applied; I believe more cases can be found to further improve performance.

I have not made a systematic performance test of Qore with deterministic garbage collection enabled (it's currently enabled by default in Qore svn trunk), but from subjective testing on existing code, it seems fast.

You can see if deterministic garbage collection is enabled in Qore by typing qore -V; you should see a line like this if it's enabled:

version 0.8.12-6768 (builtin features: sql, threads, DGC)

If you see "DGC" in the output, then it's supported, otherwise not (currently requires svn trunk - 0.8.12 has not yet been released).
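If you want to perform this check from a script, here is a small Python sketch. It assumes the version line has the form shown above, `(builtin features: ...)`. It also assumes that `qore -V` can be run from `PATH`. Which output stream the version line goes to is not confirmed, so both are searched. `builtin_features` and `qore_has_dgc` are hypothetical helper names.

```python
import re
import subprocess
from typing import Set

def builtin_features(version_output: str) -> Set[str]:
    """Extract the comma-separated list after 'builtin features:' (assumed format)."""
    match = re.search(r"builtin features:\s*([^)]*)\)", version_output)
    if match is None:
        return set()
    return {f.strip() for f in match.group(1).split(",") if f.strip()}

def qore_has_dgc() -> bool:
    # Assumption: the version line may appear on stdout or stderr, so check both
    result = subprocess.run(["qore", "-V"], capture_output=True, text=True)
    return "DGC" in builtin_features(result.stdout + result.stderr)
```

Given the sample line above, `builtin_features()` returns `{"sql", "threads", "DGC"}`.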

In Qore code, there is a parse define, HAVE_DETERMINISTIC_GC, that can be used to optionally parse code depending on the presence of deterministic garbage collection or not; for example, there are now regression tests in Qore that are executed when deterministic garbage collection is present.

Deterministic Garbage Collection

2014-09-30T11:26:00.004-07:00

Qore has had a big design problem in its memory management regarding object collection; basically it was possible to make circular references to objects, and those objects would not be destroyed automatically, resulting in a memory leak.

The reason for this is that Qore used a simple strong reference count for object scope. If object A pointed to object B and object B pointed to object A, a circular reference existed, and neither object would be destroyed (or collected) when the objects would otherwise have gone out of scope. Note that Qore treats objects like Java does, in that objects are always passed as a copy of a reference to the object instead of by value. Weak references to objects are also supported, but weak references do not affect object collection (i.e. when the destructor is run on the object); normally the object's destructor is run only when the strong reference count reaches zero (or when the object is manually deleted with the delete operator).
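CPython happens to be a convenient place to observe this exact problem, because it also uses reference counting and only reclaims cycles through a separate, non-deterministic cycle collector. The Python sketch below disables that collector so that only reference counting remains, and uses `weakref.finalize` to record when each object is actually finalized. The behavior shown is CPython-specific. Other Python implementations manage memory differently. `Node` and `cycle_demo` are illustrative names.

```python
import gc
import weakref
from typing import List, Optional, Tuple

class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.other: Optional["Node"] = None  # strong reference to another node

def cycle_demo() -> Tuple[List[str], List[str]]:
    """Return (objects finalized right after del, objects finalized after gc.collect())."""
    finalized: List[str] = []
    was_enabled = gc.isenabled()
    gc.disable()                          # leave only reference counting active
    try:
        a, b = Node("A"), Node("B")
        a.other, b.other = b, a           # A -> B -> A: a strong reference cycle
        weakref.finalize(a, finalized.append, "A")
        weakref.finalize(b, finalized.append, "B")
        del a, b                          # no outside references remain...
        after_del = list(finalized)       # ...but each refcount is still 1
        gc.collect()                      # explicit, non-deterministic-in-general sweep
        return after_del, sorted(finalized)
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, `cycle_demo()` reports that nothing was finalized by `del` alone, and both objects were finalized only after the explicit collection. Deterministic garbage collection aims to remove that gap.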

Also, due to Qore's multi-threaded nature, objects can go out of scope at any time, since they can be deleted in another thread at any time. Qore objects are thread-safe, and references to objects are wrapped in a read-write lock, as are all shared lvalues in Qore. In fact, access to all lvalues in Qore is wrapped in read-write locks, except for access to "pure" local variables: local variables that are neither referenced nor used in a closure. Either of those cases makes it possible to use the local variable in another thread, so such a variable is created specially so that accesses to it are wrapped in read-write locks.

I considered using something like Java's garbage collection approach with a dedicated thread that would scan and collect objects, but I always wanted to do a deterministic garbage collector so that Qore's resource management approach with objects could be applied deterministically. I finally have an initial working implementation of a deterministic garbage collection algorithm for Qore that is thread-safe, does not require a dedicated collector thread, has a solid deadlock-avoidance mechanism, and exhibits acceptable performance (which I expect can be further improved).

With this new approach, objects are collected immediately when they only have strong references that belong to a recursive cycle; so resources managed by the object are released in a deterministic way even when the objects participate in a recursive reference cycle.

I would like to describe the algorithm here including the locking approach and the deadlock-avoidance mechanism.

Basically the idea is to determine the number of strong references to an object that belong to a circular reference. In fact, it is more complicated than this, because you have to consider the entire recursive directed graph as a whole and not a single object. The recursive directed graph in this case is the set of all objects participating in the recursive reference chain. If any object is reachable in the chain and also contains link(s) (i.e. members) to other objects in the chain, then it is a member of the recursive directed graph. So the idea is to maintain the recursive strong reference count of every object in the recursive directed graph and then, when any member of the graph is strongly dereferenced, to check the entire graph to see if it can be collected, meaning that each member of the graph is checked to see if its strong reference count is equal to its recursive reference count. If any single member of the graph has non-recursive references, then no object in the graph can be collected; only when the strong reference count equals the recursive reference count for all objects in the graph can the objects in the graph be collected.
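The collection rule in the last two sentences can be written down directly. Below is a minimal Python sketch. It assumes the recursive set is already known, and it models references from outside the object graph (local variables, globals, and so on) as a separate per-object count. `can_collect` and the data layout are hypothetical. The check itself, strong count equal to recursive count for every member, is the one described above.

```python
from typing import Dict, List, Set

def can_collect(graph: Dict[str, List[str]],
                external: Dict[str, int],
                recursive_set: Set[str]) -> bool:
    """True if every strong reference to the set's members comes from the set itself."""
    strong = {v: external.get(v, 0) for v in recursive_set}  # e.g. local variables
    recursive = {v: 0 for v in recursive_set}
    for src, targets in graph.items():
        for dst in targets:
            if dst in recursive_set:
                strong[dst] += 1            # every object-to-object reference counts
                if src in recursive_set:
                    recursive[dst] += 1     # ...but only these are recursive
    # A single non-recursive reference anywhere keeps the whole graph alive
    return all(strong[v] == recursive[v] for v in recursive_set)
```

A two-object cycle `{"A": ["B"], "B": ["A"]}` is collectible once no variable holds A or B. It stops being collectible as soon as a local variable or an unrelated object C still references either member.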

This is accomplished by performing a scan of all reachable objects whenever a potentially-relevant change is made to an object. In this case the recursive reference counts are calculated for all objects in the graph.

To accomplish this, each object is locked specially. Objects now have a special form of read-write lock that includes a special form of the read lock, called an rsection lock. The rsection lock in a Qore object is a read lock that is unique in that the read lock is acquired first, and then only one thread can hold the rsection lock at a time. This allows objects to be scanned for recursive graphs while still allowing them to be read in other threads, which maximizes concurrency. Recursive scanning acquires multiple rsection locks (one for each object) and holds them for the duration of the scan. Holding multiple locks can lead to deadlocks, scans can be (and are) performed in multiple threads simultaneously in multi-threaded object-oriented Qore programs, and otherwise Qore avoids holding multiple locks simultaneously. The deadlock-avoidance approach here is therefore to treat the rsection scan as a transaction: if any rsection lock cannot be acquired, the transaction is rolled back, and the thread waits for confirmation from the other thread that the contentious rsection lock has been released. A transaction counter is also maintained in each object. If, after waiting for a contentious rsection lock to be released, we see that the transaction count for the root object has changed, then we know that the rsection scan has already taken place, so we exit the scan immediately.
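The deadlock-avoidance idea (try every lock without blocking, roll back on contention, wait while holding nothing, then consult the root's transaction counter) can be sketched in Python as follows. This is a simplification under stated assumptions. A plain `threading.Lock` stands in for the rsection lock, so there is no concurrent read side. The transaction counter is read without synchronization. `ScanNode`, `try_lock_all`, and `scan` are hypothetical names, not Qore internals.

```python
import threading
from typing import Callable, List, Optional

class ScanNode:
    """Hypothetical object with an rsection-like lock and a transaction counter."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.rsection = threading.Lock()  # plain mutex standing in for the rsection lock
        self.tx_count = 0

def try_lock_all(nodes: List[ScanNode]) -> Optional[ScanNode]:
    """Take every lock without blocking; on contention roll back and return the busy node."""
    held: List[ScanNode] = []
    for node in nodes:
        if not node.rsection.acquire(blocking=False):
            for h in reversed(held):      # roll back: release everything already held
                h.rsection.release()
            return node
        held.append(node)
    return None

def scan(root: ScanNode, nodes: List[ScanNode], commit: Callable[[], None]) -> bool:
    """Return True if this thread committed the scan, False if another thread did."""
    start = root.tx_count
    while True:
        busy = try_lock_all(nodes)
        if busy is None:
            try:
                commit()                  # apply buffered changes while all locks are held
                for node in nodes:
                    node.tx_count += 1
            finally:
                for node in nodes:
                    node.rsection.release()
            return True
        with busy.rsection:               # wait for the release while holding no other lock
            pass
        if root.tx_count != start:        # another thread completed the scan meanwhile
            return False
```

Because a thread never blocks while holding one of these locks, two concurrent scans cannot deadlock on each other. At worst, one of them retries or discovers that its work has already been done.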

Basically all changes to objects are stored in temporary data structures and then only committed to the objects in the graph if all rsection locks are acquired.

Also all containers (lists, hashes, and objects) contain a reachable object count, which is a sum of the children (list elements, hash keys, or object members) that have at least one object reachable through them. This turned out to be efficient to calculate. This allows us to ignore any container that has no reachable objects when performing rsection scans.
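Here is a small Python illustration of the reachable object count. It assumes lists, dicts, and a stand-in `QObject` class model Qore lists, hashes, and objects. Qore maintains the count incrementally as lvalues change. This sketch simply recomputes it, to show what the number means and why a count of zero lets a scan skip a container entirely. `QObject`, `has_reachable_object`, and `reachable_count` are hypothetical names.

```python
from typing import Any

class QObject:
    """Stand-in for a Qore object; its members are the container's children."""
    def __init__(self, **members: Any) -> None:
        self.members = members

def has_reachable_object(value: Any) -> bool:
    if isinstance(value, QObject):
        return True                       # an object is itself reachable
    return reachable_count(value) > 0     # a container counts if it leads to one

def reachable_count(container: Any) -> int:
    """Number of direct children through which at least one object is reachable."""
    if isinstance(container, list):
        children = container              # list elements
    elif isinstance(container, dict):
        children = container.values()     # hash values under each key
    elif isinstance(container, QObject):
        children = container.members.values()
    else:
        return 0                          # scalars contain nothing
    return sum(1 for child in children if has_reachable_object(child))
```

A list such as `[1, "x", [2, 3]]` has a count of 0 and would never need to be entered during an rsection scan.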

Additionally if an rsection scan fails due to lock contention after an lvalue change, the scanned objects are marked with an invalid rsection so that a deterministic scan is made on the next strong dereference. When performing rsection scans due to a strong dereference, the scan is repeated after an rsection rollback until it is successful to guarantee deterministic collection when only recursive references remain.

There is still certainly a lot of room for improvement to this algorithm. For example, the rsection transaction could be compared to any existing rsection graph and left in place if identical results are found, or possibly a delta operation could be performed on an existing rsection graph.

While this algorithm is complicated, the goal of achieving deterministic garbage collection is a valid one in my opinion.

Knowing exactly when your objects will be collected even in the case of participation in recursive directed graphs of strong references provides an advantage to Qore programmers regarding resource management with objects.

Currently deterministic garbage collection is enabled by default in Qore trunk, and I plan on continuing to work on it.

Feedback on this subject would be appreciated.

REST vs RPC in Web Service Development

2013-03-07T05:24:00.000-08:00

I've been implementing web services using lightweight web service protocols such as XML-RPC, JSON-RPC and (my favorite, although proprietary) YAML-RPC for some time now.

Last year I had some discussions about web service development using REST, a concept I had been hearing about for some time, but I did not see its advantages. I knew that it had become the predominant web development architecture model, but for some reason I could not see the advantages of REST over pure lightweight RPC approaches.

My first experience with an RPC protocol over HTTP was with SOAP, a technology I've grown to strongly dislike. I now consider SOAP a horrible enterprise technology, since it's cumbersome and expensive to implement and often suffers from serious compatibility issues between different vendor implementations (Qore also has SOAP support, both server and client, but I would recommend avoiding it if you can). My great appreciation for the lightweight RPC-over-HTTP protocols listed above is to no small degree based on my dislike of SOAP, so when hearing and reading about REST, I could not see that it would be as great a leap as the one from SOAP to XML-RPC, for example.

(As a footnote Wikipedia lists XML-RPC as the precursor to SOAP, which makes SOAP an absolutely classic case of overengineering by committee - since from my point of view XML-RPC with all its faults is vastly superior to SOAP. Also here is a link which humorously sums up my thoughts on SOAP: http://harmful.cat-v.org/software/xml/soap/simple)

However, finally I've gotten into REST a little deeper, and having developed an initial RestHandler for the Qore HTTP server, the advantages of a REST architecture are completely clear and compelling.

Using REST forces you to have a logical and straightforward organization of your web service development. Each type of object will have its own URL, and (in the Qore RestHandler at least) it will have a dedicated class to handle requests to objects of that type.

For example, in the Qore RestHandler, if you register a class with the name "workflows" at the root level of your REST handler, then a request like

GET /workflows

will be directed to your workflow class's get() method, which will handle the request.

Classes (and therefore URLs) can be stacked, so a request like

GET /workflows/1/instances/2?action=stop

will be directed to the appropriate class's "getStop()" method. If the method name derived from the request (the HTTP method name plus any action verb given as a URI argument) is not implemented in the class, the RestHandler returns a user-friendly 400 response.

This easy and logical hierarchical organization of the objects corresponding to the URLs along with the simple dispatching to handler methods of objects of the appropriate class makes implementing the REST service trivial in Qore.
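To make the dispatching rule concrete, here is a minimal sketch in Python (not Qore, and not the RestHandler's actual code). The handler classes, the `children` registry, and the rule that a path segment following a class name is an object ID are all assumptions made for illustration; only the derivation of method names such as `getStop` from the HTTP method and the `action` argument follows the description above.

```python
from urllib.parse import parse_qs, urlsplit


class InstancesHandler:
    """Hypothetical handler for /workflows/<id>/instances[/<id>]."""

    def get(self, ids):
        return 200, {"instances": ids}

    def getStop(self, ids):
        # Reached via GET .../instances/<id>?action=stop
        return 200, {"stopped": ids}


class WorkflowsHandler:
    """Hypothetical handler for /workflows[/<id>]; child classes stack below it."""

    children = {"instances": InstancesHandler()}

    def get(self, ids):
        return 200, {"workflows": ids}


# Classes registered at the root level of the hypothetical REST service.
ROOT = {"workflows": WorkflowsHandler()}


def dispatch(http_method, url):
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    registry, handler, ids = ROOT, None, []
    i = 0
    while i < len(segments):
        if segments[i] not in registry:
            return 404, {"error": "unknown resource: " + segments[i]}
        handler = registry[segments[i]]
        registry = getattr(handler, "children", {})
        i += 1
        # A segment after a class name that is not a child class is an object ID.
        if i < len(segments) and segments[i] not in registry:
            ids.append(segments[i])
            i += 1
    if handler is None:
        return 404, {"error": "no resource given"}
    # Method name = HTTP method + optional action verb, e.g. "get" + "Stop".
    name = http_method.lower()
    action = parse_qs(parts.query).get("action", [""])[0]
    if action:
        name += action[0].upper() + action[1:]
    method = getattr(handler, name, None)
    if not callable(method):
        return 400, {"error": "%s() is not supported by %s" % (name, type(handler).__name__)}
    return method(ids)
```

With this sketch, `dispatch("GET", "/workflows/1/instances/2?action=stop")` reaches `InstancesHandler.getStop()` with the IDs `["1", "2"]`, while an unimplemented action or HTTP method yields a 400 response.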

Additionally, the RestHandler transparently takes care of deserializing request bodies based on the HTTP Content-Type header and serializing any response body based on the client's HTTP Accept header.

This also makes the REST approach superior to the lightweight RPC protocols mentioned above.
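The Content-Type/Accept handling can likewise be sketched in a few lines of Python. This is a hypothetical illustration, not the RestHandler's implementation: only JSON and form-encoded bodies are supported here, and Accept header quality values are ignored for brevity.

```python
import json
from urllib.parse import parse_qsl, urlencode

# Hypothetical (de)serializer tables keyed by media type.
DESERIALIZERS = {
    "application/json": json.loads,
    "application/x-www-form-urlencoded": lambda body: dict(parse_qsl(body)),
}
SERIALIZERS = {
    "application/json": json.dumps,
    "application/x-www-form-urlencoded": urlencode,
}


def decode_request(content_type, body):
    # Ignore parameters such as "; charset=utf-8" when choosing the parser.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in DESERIALIZERS:
        raise ValueError("unsupported Content-Type: " + media_type)
    return DESERIALIZERS[media_type](body)


def encode_response(accept, data):
    # Use the first listed media type we support; q-values are ignored here.
    for candidate in accept.split(","):
        media_type = candidate.split(";")[0].strip().lower()
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](data)
        if media_type in ("*/*", ""):
            return "application/json", json.dumps(data)
    raise ValueError("no acceptable response type in: " + accept)
```

The handler code itself never sees the wire format: it receives decoded data and returns plain data structures, which is the property that makes this approach convenient compared to a protocol tied to one encoding.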

It's easy to see why REST is the dominant web service architecture; Qore's RestHandler will continue to be under active development for the upcoming release of Qore (0.8.8), and I hope it will be useful for others as well.

Class Inheritance vs Class Wrapping

2013-03-02T02:02:00.000-08:00

Qore supports "standard" class inheritance for polymorphism, but also supports another construct that may be unique to Qore. In Qore there is the possibility of "wrapping" a class and providing a compatible interface with the wrapped class without using inheritance.

There are several advantages and also some drawbacks to this approach. Let me first explain what I mean by "wrapping" a class and also explain how it's done in Qore.

To "wrap" a class, you embed an object of the desired class as a member of the new class and implement a methodGate() method, which is called whenever code outside the class calls a method that the new class does not implement, and which redirects the call. Additionally, a memberGate() method can be implemented to intercept accesses from outside the class to unknown members and return an arbitrary value for the member access.

For example, you can wrap a class to provide logging for all method calls or provide external thread synchronization to an unsynchronized class. The following is an example that does both.

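class MySocket {
    private {
        # the wrapped object
        Socket $.sock();
        # serializes all calls to the wrapped object
        Mutex $.m();
    }

    # prints a timestamped log message; any extra format arguments are in $argv
    static log(string $fmt) {
        vprintf(now_us().format("YYYY-MM-DD HH:mm:SS.us Z") + ": " + $fmt, $argv);
    }

    # called for any method that MySocket does not implement itself
    any methodGate(string $method_name) {
        MySocket::log("MySocket::methodGate() method called: %y args: %y\n", $method_name, $argv);
        $.m.lock();
        on_exit $.m.unlock();

        # forward the call with its original arguments to the wrapped Socket
        return callObjectMethodArgs($.sock, $method_name, $argv);
    }
}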

This can be very useful for testing and for other purposes, requires very little typing to implement, and its purpose is clear from the implementation (even without comments). Additionally, this approach is completely independent of the definition of the wrapped class.

The main disadvantage is that it doesn't claim compatibility with the Socket class, so you cannot pass an object of class MySocket to code that expects a Socket object.

Currently there is no way in Qore to wrap a class in this way and remain type-compatible (i.e. to appear to be a subclass of the wrapped class). To be type-compatible, the wrapper has to inherit the wrapped class and explicitly wrap every method individually, and if the wrapped class's method interface changes (for example, a new method is added), the subclass has to be changed accordingly.

However despite the drawbacks, it's still a very useful construct. Possibly a future version of Qore will allow such wrapped classes to claim class compatibility with the wrapped class and therefore address the typing issues.
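Python's `__getattr__` hook behaves much like `methodGate()`: it is only invoked for attributes the class does not define itself. The following sketch mirrors the Qore example above, with a hypothetical stand-in `Socket` class and an in-memory log instead of `vprintf()`; it also shows the same type-compatibility limitation.

```python
import threading
from datetime import datetime


class Socket:
    """Hypothetical stand-in for the wrapped class (not Qore's Socket)."""

    def send(self, data):
        return len(data)


class MySocket:
    """Wraps a Socket without inheriting from it."""

    def __init__(self):
        self._sock = Socket()
        self._lock = threading.Lock()
        self.log = []

    def __getattr__(self, name):
        # Like methodGate(): only called for attributes MySocket does not define.
        target = getattr(self._sock, name)

        def gated(*args):
            stamp = datetime.now().isoformat(sep=" ")
            self.log.append("%s: %s() called with %r" % (stamp, name, args))
            with self._lock:  # external serialization of the wrapped object
                return target(*args)

        return gated
```

Here `MySocket().send(b"abc")` is logged, serialized with the lock, and forwarded to the wrapped object, yet `isinstance(MySocket(), Socket)` is `False`, which corresponds to the limitation described above.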

Exceptions vs Error Codes

2013-02-03T11:27:00.000-08:00

Recently I was reading about throwing exceptions vs returning error codes, and I realized that this is a very interesting topic in programming language design that had been in the back of my head, but deserves to be properly thought through.

I don't believe that one approach is better than the other for all cases; from my point of view the biggest argument for exceptions over return codes is that with exceptions you avoid situations where errors are silently ignored (resulting in confusing results for users and difficult-to-debug situations for the programmer).

Basically Qore tries to be as intuitive as possible for the programmer (and obviously fails sometimes, as seen from the occasional design and interface changes as the language progresses), and therefore should try to get this right.

After thinking about it a bit, it seems to me that Qore builtin code should always throw exceptions where errors are unlikely to occur (and are therefore more likely to be ignored). Examples of this would be Socket send methods (changed from returning error codes to throwing exceptions in 0.8.6) and File write methods (where errors are even less likely; already changed to throw exceptions in svn, to be released in 0.8.7 shortly).

It also seems that code that is more likely to encounter an error could justifiably return error codes instead of throwing exceptions, since ignoring the return code would clearly be bad programming practice. If the code in question already returns a result code, this model fits even better.
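The difference between the two styles can be shown with a small Python sketch; the buffer-writing functions and the `WriteError` class are hypothetical and only illustrate how an ignored error code lets a failure pass silently while an exception cannot be overlooked.

```python
class WriteError(Exception):
    """Hypothetical error raised by write_or_raise()."""


def write_with_code(buf, data, capacity):
    # Error-code style: returns 0 on success or -1 on failure; easy to ignore.
    if len(buf) + len(data) > capacity:
        return -1
    buf.extend(data)
    return 0


def write_or_raise(buf, data, capacity):
    # Exception style: a failure cannot pass silently.
    if len(buf) + len(data) > capacity:
        raise WriteError("buffer full: %d + %d bytes > %d" % (len(buf), len(data), capacity))
    buf.extend(data)


buf = bytearray()
write_with_code(buf, b"abcd", 3)   # return value ignored: nothing written, no error
print(len(buf))                    # prints 0
```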

Making a language as intuitive as possible for all programmers is not easy. Basically I can only try to do it for myself and try to think through and make educated guesses about what should work for others. However at the end of the day, it's all about maximizing productivity. Having silent errors in an important program could cost a lot of money to debug. If the programming language is not intuitive, then it also costs money every time programmers get stuck on some weirdness with the language or have to debug problems caused by the same. Intuitive programming languages are also more fun to program in.

This last point is the main reason why Qore's design was originally based on Perl: I found Perl very intuitive and wanted to make a language like Perl that was suitable for enterprise development. I know that quite a few programmers do not like Perl, but possibly many of them have never programmed much in it. Anyway, I believe Qore is getting better with each release; I find it more intuitive, and I hope those brave enough to try Qore do too.

Solving the Java Classloader Class Compatibility Problem in Qore

2012-08-24T02:10:00.000-07:00

In Java, if you load the same class from two different classloaders, the JVM will see those as two different classes, and not even casting will work to force the JVM to see them as compatible classes.

Qore has a similar problem; Qore uses the Program class to encapsulate code; multiple Program objects can exist in a program (or script), and classes created from the same source code but from different Program objects were (before Qore 0.8.4) not recognized as the same class, analogous to the classloader problem in Java.

Qore 0.8.4 solved this problem by implementing a "class signature" for every user-created class, which is a string describing the interface of the class; the class signature string contains a listing of all the private and public methods and members of the class along with their attributes (types, access control, etc) and the parent classes as well (names and access control plus signatures for user classes and unique class IDs for builtin classes). Then an SHA-1 hash of this string is made, which is used for comparisons for class compatibility.

Before Qore 0.8.4, classes were compatible only if their internal unique class IDs matched. Now, if the class IDs do not match but the class names and signatures do (and the parent Program objects are different), the classes are assumed to be compatible.

Note that method implementations are not included in the signature, therefore it would be possible to have two classes with the same signature but different implementations, however (aside from hash collisions), assuming the signatures match, the classes should be compatible.

So far the implementation seems to work in practice; there may be some more modifications made to how the signatures are created, but this is a purely internal process in the Qore library and can be changed at any time without affecting backwards compatibility (assuming I only improve the implementation to detect more false positives).

There could also be security implications in some scenarios; if necessary, I can implement a flag to turn this feature off on a per-Program object basis.
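A simplified version of the signature idea can be sketched in Python with `hashlib.sha1`. The exact signature format, the dictionaries describing members and methods, and the `compatible()` rule below are assumptions that follow the description above, not Qore's internal format; as described, method bodies are deliberately left out of the signature.

```python
import hashlib


def class_signature(name, parents, members, methods):
    """Hypothetical signature: a canonical text description of the interface, hashed.

    parents: list of strings, e.g. "public Base <parent signature or class ID>"
    members: dict of member name -> (access, type)
    methods: dict of method name -> (access, return type, tuple of parameter types)
    Method bodies are not part of the signature.
    """
    lines = ["class " + name]
    lines += ["inherits " + p for p in parents]
    lines += ["member %s %s %s" % (access, typ, mname)
              for mname, (access, typ) in sorted(members.items())]
    lines += ["method %s %s %s(%s)" % (access, ret, mname, ",".join(params))
              for mname, (access, ret, params) in sorted(methods.items())]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def compatible(a, b):
    """a and b are dicts with "id", "name", "program", and "sig" keys."""
    if a["id"] == b["id"]:
        return True
    # Same name and signature from different Program objects: assume compatible.
    return a["program"] != b["program"] and a["name"] == b["name"] and a["sig"] == b["sig"]
```

Sorting the members and methods makes the signature independent of declaration order, so two parses of the same source in different Program objects produce the same hash.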

Universal References

2012-06-12T10:25:00.000-07:00

References have always been a problem in Qore. The main reason was that there was no unified lvalue handling, and also having local variables thread-local meant that there would either have to be no references to thread-local variables, or nonintuitive restrictions (for example by disallowing a reference to an lvalue expression anchored by a thread-local variable to be assigned to a variable with greater scope). Up until now, Qore took the wimpy way out and only allowed references to be passed to function and method calls.

Today I committed support for universal references, addressing (another) long-standing deficiency in the language. I was motivated by writing some Qore code that needed to reformat a complex data structure (parsed from XML data - BTW, did I ever mention that I prefer YAML?). Basically, I needed to set the number of order items on the last order in a list, and that list was itself an attribute of the last record of another list.

The code looked as follows:

# finalize last order - set numberofitems
recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order[ol.size() - 1].numberofitems = oh.items.size();
recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order += get_order(e, h);

I was frustrated by Qore's lack of references to simplify the above code. Then I realized that I already had the unified lvalue infrastructure (implemented in Qore 0.8.4) and had also solved the problem of multi-threaded access to thread-local variables when I implemented closures. When a local variable is bound into a closure, it is no longer purely thread-local: it gets a mutual-exclusion lock, and its lifetime is reference counted, lasting only as long as it's bound in a runtime closure or until its local scope expires. Because a runtime closure could be used in multiple threads, all local variables bound in the closure when it's created at runtime are protected by the mutual-exclusion lock to ensure consistency and atomicity.

Therefore I simply applied this same approach to local variables that are referenced and removed all restrictions on the use of the lvalue reference operator ('\') in Qore.

The resulting code is cleaner and more consistent, and I actually found a segfault-inducing memory error in the old kludgy implementation at the same time.

The above code now reads:

# finalize last order - set numberofitems
reference orders = \recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order;
reference lord = \orders[orders.size() - 1];
lord.numberofitems = lord.items.size();
orders += get_order(e, h);

That's two lines longer than the first one without references, but much easier to read, understand, and maintain.
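The locking behavior described above (a referenced local variable gets a mutual-exclusion lock so it can be shared safely between threads) can be illustrated with a small Python sketch. `LValueRef` is a hypothetical class that refers to a storage location (`container[key]`) rather than to a value; it is an analogy, not how Qore implements references.

```python
import threading


class LValueRef:
    """Hypothetical reference to the storage location container[key].

    Reads and writes go through a mutual-exclusion lock, so the reference
    can be shared between threads, loosely mirroring how Qore protects
    referenced (or closure-bound) local variables.
    """

    def __init__(self, container, key):
        self._container = container
        self._key = key
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._container[self._key]

    def set(self, value):
        with self._lock:
            self._container[self._key] = value

    def update(self, fn):
        # Atomic read-modify-write, e.g. ref.update(lambda v: v + 1).
        with self._lock:
            self._container[self._key] = fn(self._container[self._key])
```

For example, `LValueRef(orders, len(orders) - 1)` plays roughly the role of `\orders[orders.size() - 1]`, and `update()` performs an atomic read-modify-write through it.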

Qore 0.8.5 should be out before too long. I discovered a bug in a new library API that is only used by modules built with qpp, so I mainly want to get Qore 0.8.5 out relatively quickly and then update all the binary modules in common use.

This feature is already in svn and looks to be stable. The only other feature I plan on adding for 0.8.5 is support for abstract class methods, so that Java-style interfaces can be implemented by defining a class with abstract methods; I've been wanting this for a while, and it doesn't look too hard to do, so I hope to get it done in the next few days.

Weak and Strong Destructors

2012-05-22T03:19:00.002-07:00

Qore has C++-style constructors and destructors (with a different naming convention; Qore uses "constructor" and "destructor" while C++ uses names derived from the class's name), however due to Qore's unique memory-management approach, while destructors mostly appear to work like in C++, there are some differences in the details of the implementation regarding how destructors work on built-in classes (either from the Qore library itself or from classes provided by binary modules).

The main data structure that represents an object in Qore is the C++ class "QoreObject". Objects of this class have access to their internal state serialized with a mutex (thread-safety is a fundamental design principle of Qore); so for a Qore class implemented only with user code (and not inheriting any built-in classes), after any user destructor method is run, the object is marked as deleted and its internal data structures are cleared and all resources are released back to the operating system. If any copies of references to the object (Qore is like Java in the sense that passing the object by value or assigning the object to another lvalue actually passes/assigns a copy of a reference to the object) are accessed after the object is deleted, an OBJECT-ALREADY-DELETED exception is raised.

This is fairly straightforward and results in predictable behavior for the programmer. Where "weak" and "strong" destructors come in is with built-in classes.

When a user class inherits a built-in class, the built-in class's constructor links a C++ object containing the internal state of the built-in class to the Qore-language object (a "QoreObject" data structure, as mentioned above). Whenever a method of the built-in class is executed on the object, the Qore runtime atomically checks the status of the object to see if it's still valid, then, if so, it finds the built-in C++ object for the built-in class linked to the Qore object (let's call this object a "class state object") and atomically increments its reference count, and then calls the C++ function that implements the method being called with the class state object, a pointer to the "QoreObject" (representing "self") and a (possibly-empty) list of Qore-language arguments to the method call.

When the call is complete, the class state object is dereferenced and the return value of the method call is returned to the caller (or an exception state is returned, if applicable). If the class state object's reference count reaches zero then the class state object is deleted. This normally happens immediately (and synchronously) after the destructor is processed (at which time all linked class state objects are dereferenced), however it can happen afterwards with "weak" destructors.

A "weak" destructor for a built-in object is a destructor that does not implement any further serialization or gated state checking when method calls are made; it just relies on Qore's object state checking. In this case, it's possible for one or more threads to call a method on the object while another thread deletes the object in parallel. If the built-in class does not implement a "strong" destructor, then the built-in class state object will only be deleted after the destructor has been executed (removing the initial reference count for the class state object) and any in-progress methods terminate, which could be quite a while after the actual object destructor has been called depending on the method.
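The lifetime rules for a "weak" destructor can be modeled in Python. In this hypothetical sketch, `ClassState` stands in for the C++ class state object and `WrappedObject` for the QoreObject: each method call takes an extra reference and the destructor only drops the initial one, so the state object is freed when the last in-progress call finishes, not when the destructor runs.

```python
import threading


class ClassState:
    """Hypothetical stand-in for a built-in class's C++ "class state object"."""

    def __init__(self):
        self.refs = 1          # initial reference held by the owning object
        self.freed = False
        self._lock = threading.Lock()

    def ref(self):
        with self._lock:
            self.refs += 1

    def deref(self):
        with self._lock:
            self.refs -= 1
            if self.refs == 0:
                self.freed = True   # stands in for deleting the C++ object


class WrappedObject:
    """Hypothetical sketch of an object whose built-in part has a weak destructor."""

    def __init__(self):
        self._state = ClassState()
        self._valid = True
        self._lock = threading.Lock()

    def call(self, method, *args):
        # Atomically check validity and take a reference for the call's duration.
        with self._lock:
            if not self._valid:
                raise RuntimeError("OBJECT-ALREADY-DELETED")
            state = self._state
            state.ref()
        try:
            return method(state, *args)
        finally:
            state.deref()

    def destroy(self):
        with self._lock:
            if not self._valid:
                return
            self._valid = False
        self._state.deref()  # drop the initial reference; in-flight calls keep it alive
```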

Most built-in classes in Qore implement "weak" destructors because they are simpler to develop and execute faster at runtime (since there's no additional thread synchronization or gating). Furthermore in normal use, a "strong" destructor is not functionally necessary for most classes; it's normally not a problem that such a race condition (where methods are in progress while the object is explicitly deleted in a separate thread) does not cause additional exceptions to be thrown. For example, in Qore the Socket class has a "weak" destructor. However Qore's Queue class has a strong destructor, and, in Qore 0.8.4, the Program class now has a strong destructor in order to enforce stricter discipline on memory resources used.

The Queue class's state object, for example, is implemented by the C++ class "QoreQueue". All methods in this object that cause state changes are explicitly protected by a mutex (logically as this is a thread-synchronization and messaging class). The "strong" destructor grabs the mutex and then checks if other threads are waiting on the Queue (either for reading or writing); if so, an appropriate exception is thrown and the waiting threads are notified that the object has been deleted (and exceptions will be thrown in the waiting threads as well). The race condition with the destructor described earlier is a serious error with the Queue class, particularly due to its critical role in Qore's threading infrastructure, therefore it has a "strong" destructor. The same goes for the Mutex, RWLock, Gate classes, etc.
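A "strong" destructor of the kind described for the Queue class can be sketched in Python with a condition variable. `StrongQueue` and `QueueDeleted` are hypothetical names; the sketch only shows the pattern: the destructor takes the mutex, marks the queue deleted, wakes any waiting threads (which then raise), and raises itself if threads were waiting.

```python
import threading
from collections import deque


class QueueDeleted(Exception):
    """Hypothetical error for operations on a deleted queue."""


class StrongQueue:
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._deleted = False
        self._waiting = 0

    def push(self, item):
        with self._cond:
            if self._deleted:
                raise QueueDeleted("queue already deleted")
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout=None):
        with self._cond:
            self._waiting += 1
            try:
                while not self._items and not self._deleted:
                    if not self._cond.wait(timeout):
                        raise TimeoutError("timed out waiting for data")
                if self._deleted:
                    raise QueueDeleted("queue deleted while waiting")
                return self._items.popleft()
            finally:
                self._waiting -= 1

    def waiting(self):
        with self._cond:
            return self._waiting

    def destroy(self):
        # Strong destructor: serialize with the mutex, wake waiters, report them.
        with self._cond:
            self._deleted = True
            waiting = self._waiting
            self._cond.notify_all()
        if waiting:
            raise QueueDeleted("%d thread(s) were waiting on the deleted queue" % waiting)
```

The cost compared to a "weak" destructor is visible here: every operation has to go through the mutex and check the deleted flag.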

The Program object now has a "strong" destructor so that whenever it is deleted, all its memory is immediately freed, and any objects or code references created in the Program and exported out of it will cause PROGRAM-ERROR exceptions to be raised if they try to access the already-deleted Program. This was necessary because with a "weak" destructor, in a program using lots of dynamically-created Program objects, objects exported from a Program would keep that Program alive for as long as the exported objects existed, even if the Program itself had been explicitly deleted.

Also on a completely different subject, I implemented support for the "final" keyword when declaring classes and class methods for 0.8.4, which should be the very last feature to go into 0.8.4 before its release, which is now imminent.

Preparing for the 0.8.4 Release

2012-04-30T09:35:00.000-07:00

I just finished a major rewrite of the internal lvalue handling in Qore. Now most types of lvalues are stored in a union consisting of one of a 64-bit int, a double, a bool, or a pointer to a generic Qore value object.

The thing with Qore is that at the beginning, all values were dynamically allocated objects derived from a common virtual base. This was to allow for atomic reference counts and a copy-on-write approach to managing data, which is very efficient for large data structures. In this way you can pass a large data structure (such as a hash, list, or object) to a function by value, and the value is only copied if it's changed. Even then, only the top level of the data structure is copied, because each of the values is also a reference-counted object, so, unless they change as well, they are only copied by reference (meaning a pointer is copied and its reference count is incremented).
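The copy-on-write behavior described above can be sketched in Python with an explicit reference count. `CowValue` is a hypothetical class; unlike Qore's atomic reference counts, the counter here is not thread-safe, and Python objects are of course themselves reference counted by the interpreter.

```python
class CowValue:
    """Hypothetical copy-on-write handle around a dict (not thread-safe)."""

    def __init__(self, data):
        self._box = {"data": data, "refs": 1}

    def share(self):
        # "Pass by value": share the same data and bump the reference count.
        other = CowValue.__new__(CowValue)
        other._box = self._box
        self._box["refs"] += 1
        return other

    def read(self):
        return self._box["data"]

    def write(self, key, value):
        if self._box["refs"] > 1:
            # Shared: detach by copying only the top level; nested values stay shared.
            self._box["refs"] -= 1
            self._box = {"data": dict(self._box["data"]), "refs": 1}
        self._box["data"][key] = value
```

After `b = a.share()`, writing through `b` copies only the top-level dict; nested values such as lists remain shared by reference between `a` and `b`.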

However this approach is not efficient for small, discrete values such as integers and floating-point values. It's even worse for boolean values, which can be stored in as little as 1 bit.

Qore has an optimization for special values such as True and False (and some others): there is only a single instance of each in the Qore library, and it is not subject to reference counting. However, this is not possible with ints and floats, which can hold any value in their range.

The solution that I implemented for lvalues is to use the union as described above; the type of the union is set based on the lvalue's type restriction -- so if you declare a local variable as "int" or "softint", then it will be internally stored and operated on only as an integer (the same with "float" or "softfloat").

This allows Qore to store and operate directly on the base data type, instead of always working with another level of indirection (a pointer to a generic value object) and also eliminates the associated dynamic memory management. So this approach has both memory and speed benefits.
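The union layout can be illustrated with Python's `ctypes.Union`. This is only a sketch of the idea: the field names and the `TypedLValue` class are hypothetical, and Python itself still allocates an object for every value it returns; what the sketch shows is that all four members share the same 8 bytes and that the declared type restriction selects which member is used.

```python
import ctypes


class LValueUnion(ctypes.Union):
    """Hypothetical layout mirroring the described lvalue union."""
    _fields_ = [
        ("as_int", ctypes.c_int64),
        ("as_float", ctypes.c_double),
        ("as_bool", ctypes.c_bool),
        ("as_ptr", ctypes.c_void_p),  # pointer to a generic, reference-counted value
    ]


# The declared type restriction decides which union member holds the value.
_FIELD = {"int": "as_int", "softint": "as_int",
          "float": "as_float", "softfloat": "as_float", "bool": "as_bool"}


class TypedLValue:
    def __init__(self, type_name):
        self.type_name = type_name
        self._field = _FIELD[type_name]
        self._u = LValueUnion()

    def assign(self, value):
        t = self.type_name
        if t == "int" and type(value) is not int:
            raise TypeError("int lvalue cannot hold %s" % type(value).__name__)
        if t == "float" and type(value) is not float:
            raise TypeError("float lvalue cannot hold %s" % type(value).__name__)
        # "soft" types convert the value; all types store it directly in the union.
        convert = {"as_int": int, "as_float": float, "as_bool": bool}[self._field]
        setattr(self._u, self._field, convert(value))

    def get(self):
        return getattr(self._u, self._field)
```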

This work showed me a clear way forward for some very cool optimizations in Qore's value handling. Long-term, I plan on making all Qore values some sort of union like this, which will allow Qore to operate directly on the base data type whenever possible. This will be necessary before starting LLVM integration as well.

This will be a lot of work and will start some time after the upcoming 0.8.4 release.

I also implemented user thread initialization - you can now set a closure or call reference to be executed any time a new thread is started in the current Program object (or any time another Program object accesses the current Program object in a new thread) - this can be used to initialize thread-local data in the Program.

Also I implemented an optional maximum size for the Queue class - if a maximum size is set then writes to the Queue will block if the Queue already has the maximum allowed number of elements in it. In this way, Qore Queue objects can be used like a buffered Go channel.
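Python's standard `queue.Queue` with a `maxsize` behaves the same way and can serve as an illustration of these semantics (it is an analogy, not Qore's Queue class): `put()` blocks while the queue is full, much like a send on a buffered Go channel.

```python
import queue
import threading

# A bounded queue with room for two elements, like a Go channel with buffer size 2.
q = queue.Queue(maxsize=2)
q.put("a")
q.put("b")

try:
    q.put("c", timeout=0.1)   # queue is full: blocks, then gives up after 0.1 s
except queue.Full:
    print("writer blocked: queue full")

consumer = threading.Thread(target=lambda: print("read", q.get()))
consumer.start()
q.put("c")                    # blocks until the consumer has removed "a"
consumer.join()
```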

At the moment, Qore 0.8.4 is feature complete and stable in svn, however there's still some more packaging work etc to be done before the actual release, which hopefully will be pretty soon (I'm aiming for sometime in the next 2 weeks).

User Modules

2012-04-21T00:50:00.004-07:00

I've recently committed support for user modules in Qore; this will allow the language to be extended in an organized and predictable way with Qore-language code.

Before this was only possible with modules written in C++.

The current documentation for user modules is online here: http://qore.org/manual/current/lang/html/qore_modules.html#user_modules (note: edited to reflect a perma-link for the user module documentation in the latest qore docs)

User modules have the following features:

code embedding safety: modules work with Qore's functional domain permission/protection framework so that embedded code can only use modules that use authorized functionality
encapsulation: only symbols marked as public are exported; everything else is private to the module
uniqueness: multiple pieces (source files) can "require" a module safely - also when embedding Qore code, when multiple Programs use a module, there is only one copy of the module and of its private data (single global state)

Note that also there has been a nearly complete rewrite of the namespace code and handling to facilitate user modules - particularly to enable public and private symbols in modules. For example, now global variables are also contained in namespaces (hence it's possible to have more than 1 copy of a global variable with the same name in different namespaces).

The next step will be to integrate a separate program called "qdx", which converts Qore code to a C++-like format for Doxygen parsing so that Doxygen documentation can be generated from Qore modules and integrated into Qore's reference documentation (at least for the modules that will be shipped with Qore - this will be the start of Qore's Qore-language runtime library).

I have already added a couple of user modules to the Qore source in svn (HttpServer.qm and Smtp.qm) and updated the build and packaging code accordingly.

The new directory location for the runtime library is the Qore version string as a subdirectory of "qore-modules" (where binary/c++ modules are installed). For example on UNIX this might be:

/usr/lib64/qore-modules/0.8.4

This directory is automatically added to the QORE_MODULE_DIR search path.

I hope this will enable more collaboration on Qore and, of course, make the language more transparent and useful for more people.

Qore Plus Plus

2012-02-27T08:39:00.002-08:00

I've implemented a Qore Pre-Processor (qpp) for writing language bindings, that is, the C++ code that binds the internal class and function implementations to the Qore language.

The problem was that the language bindings were complex and error-prone, and, while I could make the language bindings easier to use, I think the qpp solution is better, because with qpp:

the language bindings can be abstracted from the actual/current implementation
language documentation from doxygen-style comments can be generated directly from the language's source code
a great deal of repetitive code can be generated automatically
(later) the documentation comments can be incorporated internally to provide additional information in reflection-like internal and external APIs

Currently qpp processes "qpp" files into two targets: cpp files (the Qore language C++ binding files) and dox.h files (the Doxygen source files).

In the qpp file, functions, constants, and classes are defined in a Qore-like syntax (with some additional information for internal tags, functional domains, etc). The bodies of each function or class method are then written in C++ (hence Qore Plus Plus).

For example, here is the qpp implementation of the Dir::path() method:

//! Returns the path of the Dir object or @ref nothing if no path is set
/** This path does not necessarily need to exist; the path is adjusted to remove \c "." and \c ".." from the path if present

   @return the path of the Dir object or @ref nothing if no path is set

   @par Example:
   @code
my *string $mypath = $d.path();
   @endcode
*/
*string Dir::path() {
  return d->dirname();
}

The current pre-release documentation based on Doxygen for Qore 0.8.4 can be found here: http://qore.org/manual/qore-0.8.4/lang/html/index.html

I won't be able to get to many optimizations using qpp in this release, because first I want to clean up the namespace code and some related changes. However, qpp lays the groundwork for making easy infrastructure changes to Qore in the future - we'll be able to implement new solutions in Qore and then apply them globally to all language bindings by extending qpp.

Qore Optimizations

2011-11-27T00:16:00.000-08:00

Work on Qore 0.8.4 is progressing slowly but surely.

Here's a rundown of what's in svn right now - targeted for 0.8.4.

Pseudo-Methods

Qore 0.8.4 will support pseudo-methods; these are methods that can run on any type; basically any value will support a set of methods depending on its type. These pseudo-classes are in a (relatively-flat) hierarchy; the base pseudo-class is "any", then there is a class for each data type that inherits from this class.

Pseudo-methods can be used for convenience and will also support operations that are very inefficient to do the traditional Qore way. For example, to get the first key from a hash, previously one had to make an evaluation like this:

    my *string $firstKey = (keys $hash)[0];

For a large hash, this is terribly inefficient, creating a list of all keys in the hash and then returning the first element of the list. Now, with the new pseudo-methods, it can be done like this:

    my *string $firstKey = $hash.firstKey();

Very efficient and clear. Also, to check the data type, the traditional Qore way would be the following:

    if (type($value) == Type::String) {
    }

Here the type() function returns a string, and Type::String = "string", so a string comparison is made. The new way would be to use the "any"::typeCode() pseudo-method like this:

    if ($value.typeCode() == NT_STRING) {
    }

Here NT_STRING is the integer code for a string (3, the same as the C++ NT_STRING constant, coincidentally enough), so this is a much more efficient expression.
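The efficiency argument behind the firstKey() example above is easy to demonstrate in Python, assuming insertion-ordered dicts (Python 3.7+) as the analog of Qore hashes. Both hypothetical helpers return the same key, but the first builds a list of every key, like `(keys $hash)[0]`, while the second stops after the first key, like `$hash.firstKey()`.

```python
def first_key_via_list(d):
    # Like (keys $hash)[0]: materializes every key first, O(n) time and memory.
    keys = list(d.keys())
    return keys[0] if keys else None


def first_key(d):
    # Like $hash.firstKey(): stops after the first key, O(1).
    return next(iter(d), None)
```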

Currently there are only a handful of pseudo-methods implemented for each type, but more will be implemented before the 0.8.4 release. In particular I plan on adding more date/time methods to get quick information about date/time values.

Of course pseudo-methods are typed, and when they are resolved at parse time, run-time optimizations are used, particularly for integer operations. In the last example above (using "string"::typeCode()), no memory allocations will occur; the operands of the boolean comparison operator will be evaluated in an integer context, meaning that internally native integer values are returned (and not, for example, QoreBigIntNode structures), so there are no memory allocations or atomic reference-count operations (which could cause SMP cache invalidations, etc). These kinds of operations are very fast and scalable: runtime optimizations made possible because data types were available at parse time.

And this leads to the second big feature for Qore 0.8.4:

Performance and Memory Optimizations

Qore 0.8.4 will feature major performance and memory optimizations. I've already implemented a big part of this by implementing optimized handling for integer local variables. Local[ly-scoped] variables are already thread-local (and therefore do not require any locking for reads and writes), and now local variables restricted to integers (declared as int or softint) are also stored as native integers. Assignment operations are also evaluated in an integer context; as above, only native C++ integer values are passed around and stored; there are no memory allocations or atomic reference counts.

Additionally, I've started porting all the operators over to Qore's new operator framework (subclasses of QoreOperatorNode - to replace expressions made with QoreTreeNode which will be removed from Qore when the migration is complete), and the operators ported also support optimized integer operations; ie when types are known to be integer at parse-time, the run-time variants of the operators are used that use the optimized integer operations (again without any memory allocations or atomic operations).

This all leads to major performance improvements with integer operations.

I've also added some of the necessary infrastructure for optimizing floating-point operations (support for local variables, optimized operator variants), but have not finished this work yet.

I also have in mind an LLVM back-end for Qore in the future; I plan on adding more optimizations and propagating additional information about the code during parsing which will be used when generating compiled code for further optimizations. By that I mean not just type restrictions (which can obviously lead to major optimizations like the above) but also, for example, marking lvalues as constant in certain scopes, enforcing the QC_CONST flag for functions and methods and more. But at the moment that's a ways down the road - I won't be able to get to LLVM integration before the 0.8.4 release for sure.

Qore 0.8.3

2011-08-27T22:47:00.000-07:00

The next release of Qore is coming very soon; the major new feature for this release will be Windows support.

Qore is now capable of being built as a native DLL for Windows (XP and up) - finally without Cygwin (which made for a pretty slow binary actually). This was made possible through the MinGW cross compiling environment (http://mingw-cross-env.nongnu.org/).

The main blocking point for this port (which has been requested many times over the years) was my lack of familiarity with Windows development tools (and my dislike of the Windows command line). This software allowed me to use Linux and OSX to make the Windows port. It turned out to be a lot easier than I expected; the MinGW pthreads library worked perfectly (I only needed to make minor changes as pthread_t is not a pointer); the socket code was fairly easy to update (the main thing I had to do there was update the code checking for errors and then getting the Windows error messages instead of using errno); and then I had to write new code for time zone handling to read zone information from the Windows registry instead of using the zoneinfo DB (as on UNIXes). Also the dlfcn library for Windows (http://code.google.com/p/dlfcn-win32/) allowed for seamless loadable module handling with the same code as on UNIX.

The release is basically ready; I just want to port a few more modules to Windows before I make it public (so far I've got the xml, json, yaml, and uuid modules also working on Windows).

Other than a large number of bug fixes since 0.8.2 and a few minor improvements here and there, the only other feature of note is support for simple conditional parsing based on C/C++ preprocessor-type %define and %ifdef, %ifndef, etc directives.
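To illustrate what simple conditional parsing of this kind involves, here is a hypothetical Python sketch (not Qore's parser). It handles `%define`, `%ifdef`, and `%ifndef` as mentioned above; `%else` and `%endif` are assumed here as the remaining directives, and values for defines are not supported.

```python
def preprocess(lines, defines=()):
    """Hypothetical sketch of simple conditional parsing; not Qore's parser."""
    defined = set(defines)
    stack = []    # one boolean per open %ifdef/%ifndef: is that branch taken?
    output = []
    for lineno, line in enumerate(lines, 1):
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("%ifdef", "%ifndef"):
            is_defined = words[1] in defined
            stack.append(is_defined if directive == "%ifdef" else not is_defined)
        elif directive == "%else":
            if not stack:
                raise SyntaxError("line %d: %%else without %%ifdef/%%ifndef" % lineno)
            stack[-1] = not stack[-1]
        elif directive == "%endif":
            if not stack:
                raise SyntaxError("line %d: %%endif without %%ifdef/%%ifndef" % lineno)
            stack.pop()
        elif not all(stack):
            continue  # inside a branch that is not taken (at any nesting level)
        elif directive == "%define":
            defined.add(words[1])
        else:
            output.append(line)
    if stack:
        raise SyntaxError("unterminated conditional block")
    return output
```

Keeping one flag per nesting level and requiring all of them to be true means a nested block inside a skipped branch stays skipped even when its own condition holds.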

Current (tentative) release notes:

http://qore.svn.sourceforge.net/viewvc/qore/qore/trunk/RELEASE-NOTES?view=markup

Current (tentative) Windows README:

http://qore.svn.sourceforge.net/viewvc/qore/qore/trunk/README-WINDOWS?view=markup

Qore 0.8.1 Released

2010-12-25T09:31:00.000-08:00

Qore 0.8.1 has just been released with a ton of bugfixes and new features. Major new features are: SQL prepared statement API (currently only supported by the soon-to-be released oracle driver v2.0), a much improved type system, support for class constants and static class variables, and a more standard syntax for declaring function and method return types by allowing the type name to be declared at the beginning of the function or method signature, as in C/C++ or Java, for example.

Additionally, there are new parse options that allow for programming without the "$" and "$." signs for variables, class method calls, and object member references.

This last change hopefully will make a lot of people happy - I had a lot of requests to do away with the "$" signs, and now it's possible. Unfortunately, the code highlighting solutions out there will have to be updated again to handle the new %allow-bare-refs and %new-style parse options. %new-style combines both of the new parse options %allow-bare-refs and %assume-local, the latter meaning that all variables are assumed to have local scope unless declared global.
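One way to picture "%new-style combines both options" is as bit flags, where the combined option is just the union of the two individual bits. This Python sketch uses hypothetical names that mirror the directives; they are not Qore's internal constants.

```python
from enum import Flag


class ParseOption(Flag):
    """Hypothetical parse-option bits; the names mirror the directives, not Qore's C++ constants."""
    NONE = 0
    ALLOW_BARE_REFS = 1  # %allow-bare-refs: no "$" or "$." needed
    ASSUME_LOCAL = 2     # %assume-local: variables are local unless declared global
    NEW_STYLE = ALLOW_BARE_REFS | ASSUME_LOCAL  # %new-style: both options at once


DIRECTIVES = {
    "%allow-bare-refs": ParseOption.ALLOW_BARE_REFS,
    "%assume-local": ParseOption.ASSUME_LOCAL,
    "%new-style": ParseOption.NEW_STYLE,
}


def options_for(source: str) -> ParseOption:
    """Collect the parse options enabled by directive lines in a program."""
    opts = ParseOption.NONE
    for line in source.splitlines():
        opts |= DIRECTIVES.get(line.strip(), ParseOption.NONE)
    return opts
```

With this model, a program starting with `%new-style` has both bits set, so any check for "bare refs allowed" or "assume local" succeeds without the program naming either option separately.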

Here is an example with %new-style:

%new-style

int sub do_something(int p1, string str, *hash h) {
   for (int x = 0; x < p1; ++x) {
      stderr.printf("error: %s\n", str);
   }
   return p1 + 2;
}

Backwards compatibility is a priority and has been maintained. We'll see if the decision to allow for this new programming style is a good one; sometimes too much choice can just lead to confusion and therefore is counterproductive. However at least some people are very happy with it.

Current Qore Status

2010-06-23T00:50:00.001-07:00

Qore 0.8.0 has been released along with all updated modules; in most cases the modules have been updated to take advantage of the new APIs (mostly regarding typing, date/time improvements, and new Datasource/DatasourcePool methods).

There was a slight delay before the 0.8.0 release to improve the type system; now the type system is internally capable of supporting very flexible types, where one or more types are accepted and one or more types are returned, making types such as "nothing or string" possible to implement internally.
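The idea of a type that accepts one of several base types can be sketched in a few lines. This is a Python model, not Qore's C++ implementation: it only covers the "accepted types" half (not the returned types), and it uses Python's `None` as a stand-in for NOTHING.

```python
from dataclasses import dataclass
from typing import Tuple

NOTHING = None  # Python stand-in for Qore's NOTHING


@dataclass(frozen=True)
class AcceptType:
    """A type that accepts a value if it matches any one of several base types."""
    name: str
    accepts: Tuple[type, ...]

    def check(self, value):
        if not isinstance(value, self.accepts):
            got = "nothing" if value is NOTHING else type(value).__name__
            raise TypeError(f"expected {self.name}, got {got}")
        return value

    def __or__(self, other: "AcceptType") -> "AcceptType":
        # Combining two types yields a type accepting the union of both.
        return AcceptType(f"{self.name} or {other.name}", self.accepts + other.accepts)


NOTHING_T = AcceptType("nothing", (type(None),))
STRING_T = AcceptType("string", (str,))
NOTHING_OR_STRING = NOTHING_T | STRING_T  # the "nothing or string" type
```

Representing a type as a set of alternatives, rather than a single tag, is what makes "nothing or string" expressible without a special case.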

However, neither the parser's support for user type declarations nor the function and class library was updated to take advantage of the new flexible typing. Everything was stable and tested, and applying the new support would probably have delayed the release by a month or two. At least the internal changes are in place and are part of the Qore library's API and ABI.

One of the coolest new developments to make use of Qore's new typing support is the new qt4 module, which allows Qore code to implement sophisticated platform-independent QT4-based GUIs (note that the qt4 module does take advantage of Qore's new flexible type system, allowing NOTHING to be passed for some classes to simulate a NULL pointer, for example).

Here are some things to expect in future releases of Qore: implicit typing and other parser improvements (0.8.0 is a great improvement in this area already), improved execution speed, and SMP scalability. And most interestingly JIT compilation support using the LLVM project. This is the most exciting part of future development of Qore that once again should take the Qore language to another level. I am astounded at what an awesome project LLVM is; how well documented it is, how well supported and active it is, what it can do and what a language designer can do with it. This will take some time, but will be some of the most interesting work to date done with Qore, and the results should be nothing short of amazing.

I wish I could give a timeline for any of these new developments, but I cannot; they will be done as time permits.

The Joy of YAML

2010-05-16T04:11:00.001-07:00

I had been working so hard on the next release of qore for so long that I had to take a short break, during which I made the new yaml module. The yaml module is currently very small in terms of source code; it allows qore data types (except objects) to be serialized to and deserialized from YAML format. It uses libyaml to do the real work.

The great thing about YAML is that it is much better suited to representing data in text format than XML because it's much more concise and readable for humans. Additionally, with the addition of one custom YAML tag (!duration), all native Qore types can be serialized and deserialized as YAML with no information loss.

Compared to XML-RPC, YAML supports time zone information and time resolution to the microsecond (actually YAML's !!timestamp type supports arbitrary fractional seconds), and with our custom !duration type it also supports relative date/time values in an ISO-8601-like format (extended so that time values may be negative, plus an additional character to specify microseconds). And of course YAML is much more readable and concise than XML-RPC.
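As a rough model of that duration format, here is a Python parser for strings like P2M3DT10H14u (2 months, 3 days, 10 hours, 14 microseconds). It assumes `u` is the only extra suffix and that any field may carry a minus sign; Qore's real grammar may differ (for example in weeks, fractional seconds, or a millisecond suffix), so treat this as a sketch.

```python
import re
from dataclasses import dataclass

# P[nY][nM][nD][T[nH][nM][nS][nu]]: "M" means months before the "T" and minutes after it.
_DURATION = re.compile(
    r"P(?:(?P<years>-?\d+)Y)?(?:(?P<months>-?\d+)M)?(?:(?P<days>-?\d+)D)?"
    r"(?:T(?:(?P<hours>-?\d+)H)?(?:(?P<minutes>-?\d+)M)?"
    r"(?:(?P<seconds>-?\d+)S)?(?:(?P<microseconds>-?\d+)u)?)?"
)


@dataclass
class RelativeDate:
    years: int = 0
    months: int = 0
    days: int = 0
    hours: int = 0
    minutes: int = 0
    seconds: int = 0
    microseconds: int = 0


def parse_duration(text: str) -> RelativeDate:
    """Parse an ISO-8601-like duration that allows negative fields and a 'u' (microsecond) suffix."""
    m = _DURATION.fullmatch(text)
    # Reject non-matching text and the empty forms "P" / "PT".
    if m is None or not any(m.groupdict().values()):
        raise ValueError(f"invalid duration: {text!r}")
    return RelativeDate(**{k: int(v) for k, v in m.groupdict().items() if v is not None})
```

The position of the `T` separator is what disambiguates `M`: `P5M` is five months, while `PT5M` is five minutes.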

Compared to JSON, YAML is of course very similar, but it is extensible and supports more data types out of the box: JSON has no equivalent of !!timestamp or !!binary (a base64-encoded binary type). JSON is as concise and readable as YAML (because YAML, at least YAML 1.2, is a superset of JSON).

Take the following Qore data structure (valid with Qore 0.8.0+):
(1, "two", NOTHING, 2010-05-05T15:35:02.100, False, 1970-01-01Z,
(hash(), (), "three \"things\""), P2M3DT10H14u, now_us(),
binary("hello, how's it going, this is a long string, you know XXXXXXXXXXXXXXXXXXXXXXXX"),
("a" : 2.0,
"b" : "hello",
"key" : True))

Here's how the serialization looks:
YAML:
[1, "two", null, '2010-05-05 15:35:02.1 +02:00', false, 1970-01-01, [{}, [], "three
\"things\""], '0000-02-03 10:00:00.000014 Z', '2010-05-16 13:27:15.859195 +02:00',
!!binary "aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==",
{"a": 2, "b": "hello", "key": true}]

XML-RPC (note that the duration is not serialized correctly, and the time zone and microsecond information is lost):
<struct><member><name>data</name><value><array><data><value><i4>1</i4></value><value><string>two</string></value><value/><value><dateTime.iso8601>20100505T15:35:02</dateTime.iso8601></value><value><boolean>0</boolean></value><value><dateTime.iso8601>19700101T00:00:00</dateTime.iso8601></value><value><array><data><value><struct></struct></value><value><array><data/></array></value><value><string>three "things"</string></value></data></array></value><value><dateTime.iso8601>00000203T10:00:00</dateTime.iso8601></value><value><dateTime.iso8601>20100516T13:31:23</dateTime.iso8601></value><value><base64>aGVsbG8sIGhvdydzIGl0IGdvaW5nLCB0aGlzIGlzIGEgbG9uZyBzdHJpbmcsIHlvdSBrbm93IFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWA==</base64></value><value><struct><member><name>a</name><value><double>2.000000</double></value></member><member><name>b</name><value><string>hello</string></value></member><member><name>key</name><value><boolean>1</boolean></value></member></struct></value></data></array></value></member></struct>

JSON (minus the binary data that cannot be serialized, note dates are serialized as strings):
[ 1, "two", null, "2010-05-05 15:35:02.100 Wed +02:00 (CEST)", false, "1970-01-01 00:00:00 Thu Z (UTC)", [ { }, [ ], "three \"things\"" ], "", "2010-05-16 13:32:44.792114 Sun +02:00 (CEST)", { "a" : 2, "b" : "hello", "key" : true } ]

The incredible thing was that I could not find any standard YAML-RPC protocol definition. The closest I could find was a partially-documented protocol called !okay/rpc. So I just implemented a simple YAML-RPC handler and client based on JSON-RPC 1.1 and it works great. I will probably simplify it a bit more to be a little more like !okay/rpc but with a described fault message, then document it and put it online for people to review.
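For readers unfamiliar with JSON-RPC 1.1, here is a rough Python sketch of what a request/response envelope modeled on it can look like. This is not the Qore handler: the field names follow JSON-RPC 1.1 as an assumption, and the error name and codes are made up for illustration. Because JSON text is also valid YAML 1.2 flow syntax, the sketch uses the standard `json` module to stay dependency-free; a real YAML-RPC endpoint would use a YAML library (such as one built on libyaml) so that timestamps and binary data survive.

```python
import json

# Hypothetical envelope modeled on JSON-RPC 1.1; "RPCError" and the codes 404/500 are made up.


def make_request(method, params, req_id=1):
    """Build a request envelope (JSON flow syntax, which a YAML 1.2 parser also accepts)."""
    return json.dumps({"version": "1.1", "method": method, "params": params, "id": req_id})


def handle_request(raw, methods):
    """Dispatch one request to a handler in `methods` and build the response envelope."""
    req = json.loads(raw)
    resp = {"version": "1.1", "id": req.get("id")}
    handler = methods.get(req.get("method"))
    if handler is None:
        resp["error"] = {"name": "RPCError", "code": 404,
                         "message": f"unknown method: {req.get('method')}"}
        return json.dumps(resp)
    try:
        resp["result"] = handler(*req.get("params", []))
    except Exception as exc:  # report any handler failure as a fault, not a crash
        resp["error"] = {"name": "RPCError", "code": 500, "message": str(exc)}
    return json.dumps(resp)
```

The response echoes the request `id` so a client can match replies to calls, and it carries either `result` or a described `error` object, never both.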

I'm really happy I found YAML; its conciseness, extensibility, and readability make it a far superior alternative to XML and XML-RPC for data serialization in Qore. In the future I will look at making it possible to serialize and deserialize objects as well: if a class can write out its state as Qore data that can then be passed to the appropriate constructor or other such method on the remote end, that would solve this problem.

Note that Qore's YAML module is still undocumented, but it is stable and somewhat tested. It requires qore 0.8.0+ (still unreleased; only in svn).

Time Zone Support

2010-04-20T06:41:00.000-07:00

Time zone support has recently been added to qore in svn; version 0.8.0 will support time zones. Qore uses the system's zoneinfo database (if it can find it) to load information about daylight saving time, time zone names, etc.

The entire date/time implementation was internally reimplemented for this change, although the old APIs remain for backwards-compatibility. Basically now all absolute date/time values not having an explicit time zone identifier will be assumed to be local time.

During the change I also extended the precision of relative and absolute date/time values to the microsecond (1,000,000th of a second); previously it was only to the millisecond.
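The two rules above (no explicit zone means local time, and microsecond precision) can be illustrated with Python's standard `datetime`. This is only an analogy: Python relies on the platform's local time rules rather than Qore's own zoneinfo reader, and `datetime.fromisoformat` accepts a narrower set of formats than Qore's date literals.

```python
from datetime import datetime, timezone


def parse_absolute(text: str) -> datetime:
    """Parse an ISO-8601 date/time with up to microsecond precision.

    Values without an explicit UTC offset are assumed to be in local time.
    """
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # For a naive datetime, astimezone() interprets it as system local time
        # and attaches the local UTC offset.
        dt = dt.astimezone()
    return dt


def to_utc(dt: datetime) -> datetime:
    """Convert an aware date/time to UTC, keeping the microseconds."""
    return dt.astimezone(timezone.utc)
```

So `2010-05-16T13:27:15.859195+02:00` keeps its explicit offset and converts to 11:27:15.859195 UTC, while `2010-05-05T15:35:02.100` picks up whatever offset the local zone has on that date.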

Hard Typing in Qore 0.8.0

2010-03-10T12:42:00.000-08:00

Qore 0.8.0 development is at an advanced stage. One of the major features of this release will be the addition of hard typing. This is an interesting subject because it's not always obvious how hard typing should be added to a language that previously was entirely dynamically typed.

Several issues have presented themselves so far. One is how to handle variables declared with a certain type but not assigned any value. Qore's syntax for typed variable declarations looks like this:
my int $x;

One of the reasons for adding hard (aka strong) typing to the language is that it allows many more errors to be caught at parse time that otherwise could only be caught at run time. Consider the following code:
sub func(int $i) {
}

my int $x;
func($x);

The dilemma is: should the typed variable declaration give $x a default value for the type, or should the value of $x be undefined (in Qore: NOTHING) and therefore still cause a run time type error to be thrown? I've just checked how Perl6 is supposed to handle this, and it appears to take both approaches: with a lower-case "int" as above, $x gets the default value for the type (0); with an upper-case "Int", the value is undefined.

Qore's current implementation in svn (subject to change) is that $x holds NOTHING and a run time type exception will be thrown when func(int) is called.
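The current behavior can be modeled in a few lines of Python. The names `TypedVar` and `call_typed` are hypothetical, and `None` stands in for NOTHING: a declared-but-unassigned variable holds NOTHING, and the parameter type check fires only when the call is actually made.

```python
NOTHING = None  # Python stand-in for Qore's NOTHING


class TypedVar:
    """A variable declared with a type: it starts out holding NOTHING, not a default."""

    def __init__(self, qtype: type):
        self.qtype = qtype
        self.value = NOTHING

    def assign(self, value):
        if not isinstance(value, self.qtype):
            raise TypeError(f"cannot assign {type(value).__name__} to {self.qtype.__name__}")
        self.value = value


def call_typed(func, param_type: type, arg):
    """Check an argument against a declared parameter type at run time, then call."""
    if not isinstance(arg, param_type):
        got = "NOTHING" if arg is NOTHING else type(arg).__name__
        raise TypeError(f"parameter expects {param_type.__name__}, got {got}")
    return func(arg)
```

Passing `TypedVar(int).value` straight to `call_typed(f, int, ...)` raises, which corresponds to the run time type exception described above; after `assign(5)` the same call succeeds.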

Qore currently contains some other logic that allows such code to work:
my list $l;
$l += "str";

This will cause $l to be a list with "str" as the sole element; so even though $l was not initialized with a value in the declaration, the += operator still behaves as expected.
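The special case for += amounts to treating NOTHING as an empty list on the left-hand side. Here is a Python model of just the case described (a single non-list element on the right); the function name is hypothetical, and what happens when the right-hand operand is itself a list is deliberately not modeled.

```python
NOTHING = None  # Python stand-in for Qore's NOTHING


def list_plus_assign(current, element):
    """Model `l += element` for an lvalue declared as `list`.

    If the variable still holds NOTHING it is treated as an empty list, so the
    result is a one-element list; otherwise the element is appended.
    """
    base = [] if current is NOTHING else list(current)
    base.append(element)
    return base
```

So `list_plus_assign(NOTHING, "str")` yields `["str"]`, matching the example, and the variable still reads as unassigned until the first `+=`.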

The idea is to allow the programmer to check if variables have been assigned a value or not. When we had Qore assigning default values in every declaration without an assignment, we found that it broke a lot of code that we otherwise expected to work.

Furthermore, some types cannot have a default value, such as objects (at least of classes that do not have a constructor taking no arguments), references, and code. In order to make type handling consistent, we decided that variables and object members will only get values if they are explicitly assigned.

Now that I'm writing this down, it seems to me that the best solution would be to implement some kind of magic to cover the run time error above and pass the default value for the type to func(int) above - a solution similar to the one implemented for the += operator. That would be great, because then we can avoid run time type errors even though we've declared all our types (something hard typing should give us) and still be able to tell if we've assigned the variable or not. Otherwise we could modify the parser to track if a variable's been assigned or not when it's used, however this would not be a perfect solution because the parser cannot always know this with certainty (halting problem), so in some cases run time type errors would still occur.
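The proposed alternative could look roughly like this Python sketch: when NOTHING is passed for a parameter whose type has a natural default, that default is substituted; for types without one (objects, references, code), the run time type error remains. `DEFAULT_FACTORIES` and `coerce_arg` are hypothetical names, and the table of defaults is an assumption.

```python
NOTHING = None  # Python stand-in for Qore's NOTHING

# Types that have a natural default; calling the factory produces it (int() == 0, str() == "", ...).
DEFAULT_FACTORIES = {int: int, float: float, str: str, bool: bool, list: list, dict: dict}


def coerce_arg(param_type: type, arg):
    """Substitute the type's default for NOTHING where one exists; otherwise type-check."""
    if arg is NOTHING and param_type in DEFAULT_FACTORIES:
        return DEFAULT_FACTORIES[param_type]()
    if not isinstance(arg, param_type):
        got = "NOTHING" if arg is NOTHING else type(arg).__name__
        raise TypeError(f"expected {param_type.__name__}, got {got}")
    return arg
```

Using factories rather than stored values means each call gets a fresh empty list or hash instead of sharing one mutable default, while the variable itself still holds NOTHING, so the program can still tell it was never assigned.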

It wasn't totally clear to me what would happen in Perl6 if an undefined Int value is passed to a function that expects an Int as a value.

Another issue is making the included function and class library friendly for strongly-typed code. There are a lot of functions and class methods that can return values of different types depending on the argument types. Since qore 0.8.0 in svn now supports overloading, we simply tag each variant with its parameter types and return type (note that an overloaded version of a function or method is called a variant in Qore). However, there are cases when the return type depends on the values of the arguments, and not just the types (for example, if you pass a string with an invalid URL to the parseURL() function, it will return NOTHING). In these cases we're writing alternate, new versions of the functions that either throw an exception or return a default value for the return type. In this last example, there is now a parse_url() function that always returns a hash if the URL can be parsed; otherwise an exception is thrown.

Many of Qore's functions simply return NOTHING if the arguments are not of the expected type. In Qore 0.8.0, these functions have a default, untyped variant mapped to f_noop(), a function that simply returns NOTHING, and all other variants are mapped to the actual functions that perform the useful work. This allows qore to recognize function calls that do not make sense and report at parse time that they return NOTHING (as long as types are declared in all the relevant places). parseURL() is an example of this type of function; a new version, parse_url(), has been added that only accepts argument types that can produce a result.
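A compact Python model ties the last two paragraphs together: variants are registered by parameter types, unmatched argument types fall through to an f_noop default, and a value-dependent parseURL is paired with a strict parse_url. The `Overloaded` class, exact-type matching (no conversions or subclass matching), the "valid URL means scheme plus host" rule, and the hash keys are all assumptions for illustration, not Qore's implementation.

```python
from urllib.parse import urlsplit

NOTHING = None  # Python stand-in for Qore's NOTHING


def f_noop(*args):
    """Default untyped variant: whatever the arguments, return NOTHING."""
    return NOTHING


class Overloaded:
    """A function with variants selected by exact argument types."""

    def __init__(self, name):
        self.name = name
        self.variants = {}  # tuple of parameter types -> implementation

    def variant(self, *param_types):
        def register(fn):
            self.variants[param_types] = fn
            return fn
        return register

    def resolve(self, *arg_types):
        # Usable "at parse time" when argument types are declared: if this
        # returns f_noop, the call is known to return NOTHING.
        return self.variants.get(arg_types, f_noop)

    def __call__(self, *args):
        # Run-time matching on the actual argument types.
        return self.resolve(*(type(a) for a in args))(*args)


def _split_url(url):
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return NOTHING
    return {"protocol": parts.scheme, "host": parts.hostname,
            "port": parts.port, "path": parts.path}


parseURL = Overloaded("parseURL")


@parseURL.variant(str)
def _parse_url_string(url):
    return _split_url(url)  # the result type depends on the value: a hash or NOTHING


def parse_url(url: str) -> dict:
    """Always returns a hash; raises instead of returning NOTHING."""
    result = _split_url(url)
    if result is NOTHING:
        raise ValueError(f"cannot parse URL: {url!r}")
    return result
```

Here `parseURL.resolve(int)` already tells you, without running anything, that an integer argument can only produce NOTHING, while `parse_url()` gives typed code a function whose return type never varies.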

Most of the other issues that have come up have been solved satisfactorily: method overloading in class trees (implemented successfully), default arguments (implemented for user and builtin code), and matching variants at parse time and at run time (Qore tries to match the variant at parse time if types are available; otherwise variant matching is performed at run time).

However, another issue is that the type system is very simple and flat; you may declare a variable of type list, but you cannot restrict the element type of the list. The same goes for hash keys and references (you cannot currently declare a variable as a reference to a particular type). It's also currently not possible to declare code references to code with a particular signature (the new builtin type "code" only restricts lvalues to being assigned a closure or call reference).

It's all subject to improvement before the 0.8.0 release. The libsmoke-based qt4 module has been converted to use hard typing and has been a very valuable test bed for the hard typing implementation.

Qore in svn is currently stable; if you want to check out qore with hard typing, grab the source from svn and compile it. Qore in svn also currently compiles on Windows with Cygwin, although without modules and only as a monolithic binary.

To download qore from svn, use the following command:

svn co https://qore.svn.sourceforge.net/svnroot/qore/qore/trunk qore

Comments are very much appreciated!

Qore Stuff

2010-02-27T04:05:00.000-08:00

I finally got first post, and I just had to start my own blog to do it. Here I will be talking about what's going on with the Qore Programming Language. At the moment a major new release is in development, and it's pretty interesting right now, because hard typing has been added to the language.