Friday, August 24, 2012

Solving the Java Classloader Class Compatibility Problem in Qore

In Java, if you load the same class from two different classloaders, the JVM will see those as two different classes, and not even casting will work to force the JVM to see them as compatible classes.

Qore has a similar problem. Qore uses the Program class to encapsulate code, and multiple Program objects can exist in a program (or script). Before Qore 0.8.4, classes created from the same source code but from different Program objects were not recognized as the same class, analogous to the classloader problem in Java.

Qore 0.8.4 solved this problem by implementing a "class signature" for every user-created class, which is a string describing the interface of the class; the class signature string contains a listing of all the private and public methods and members of the class along with their attributes (types, access control, etc) and the parent classes as well (names and access control plus signatures for user classes and unique class IDs for builtin classes).  Then an SHA-1 hash of this string is made, which is used for comparisons for class compatibility.

Before Qore 0.8.4, classes were compatible only if their internal unique class IDs matched. Now, if the class IDs do not match but the class names and signatures match (and the parent Program objects are different), the classes are assumed to be compatible.

Note that method implementations are not included in the signature, so it is possible to have two classes with the same signature but different implementations. However (aside from hash collisions), if the signatures match, the classes should be compatible.
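As a rough illustration of the idea (not Qore's actual signature format, which is internal to the library), the following Python sketch builds a signature string from a class description and compares SHA-1 digests. The ClassInfo structure, the layout of the signature string, and the compatible() helper are all hypothetical names invented for this example.

```python
import hashlib


class ClassInfo:
    """Hypothetical description of a user class; not Qore's internal layout."""

    def __init__(self, name, class_id, program_id, members=None, methods=None, parents=None):
        self.name = name
        self.class_id = class_id          # unique class ID (assumed unique per Program here)
        self.program_id = program_id      # identifies the owning Program
        self.members = members or {}      # member name -> (access, type)
        self.methods = methods or {}      # method name -> (access, signature text)
        self.parents = parents or []      # list of (access, parent signature or builtin ID)

    def signature(self):
        # Interface only: method bodies are deliberately not part of the string.
        parts = ["class " + self.name]
        parts += ["parent %s %s" % (acc, sig) for acc, sig in self.parents]
        parts += ["member %s %s %s" % (acc, typ, n)
                  for n, (acc, typ) in sorted(self.members.items())]
        parts += ["method %s %s%s" % (acc, n, sig)
                  for n, (acc, sig) in sorted(self.methods.items())]
        return "\n".join(parts)

    def digest(self):
        return hashlib.sha1(self.signature().encode("utf-8")).hexdigest()


def compatible(a, b):
    # Same class ID: always compatible.
    if a.class_id == b.class_id:
        return True
    # Otherwise: different Programs, same name, same signature hash.
    return (a.program_id != b.program_id
            and a.name == b.name
            and a.digest() == b.digest())
```

In this sketch, two ClassInfo objects built from the same description in different Programs compare as compatible, while changing the access control of a single member changes the digest and breaks compatibility.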

So far the implementation seems to work in practice; there may be some more modifications made to how the signatures are created, but this is a purely internal process in the Qore library and can be changed at any time without affecting backwards compatibility (assuming I only improve the implementation to detect more false positives).

There could also be security implications in some scenarios; if necessary, I can implement a flag to turn this feature off on a per-Program object basis.

Tuesday, June 12, 2012

Universal References

References have always been a problem in Qore.  The main reason was that there was no unified lvalue handling, and also having local variables thread-local meant that there would either have to be no references to thread-local variables, or nonintuitive restrictions (for example by disallowing a reference to an lvalue expression anchored by a thread-local variable to be assigned to a variable with greater scope).  Up until now, Qore took the wimpy way out and only allowed references to be passed to function and method calls.

Today I just committed support for universal references, addressing (another) long-standing deficiency in the language.  I was motivated by writing some Qore code that needed to reformat a complex data structure (that was parsed from XML data - BTW did I ever mention that I prefer YAML?).  Basically I needed to add the number of order items to the last order in a list, where that list was itself an attribute of the last record of another list.

The code looked as follows:

# finalize last order - set numberofitems
recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order[ol.size() - 1].numberofitems = oh.items.size();
recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order += get_order(e, h);

I was frustrated by Qore's lack of references to simplify the above code.  Then I realized that I had the unified lvalue infrastructure (implemented in Qore 0.8.4) and had also solved the thread-local multi-threaded access problem when I implemented closures; when a local variable is bound into a closure, the local variable is not only thread-local anymore; it has a mutual-exclusion lock on it and its lifetime is reference counted - it lasts only as long as it's bound in a runtime closure or until its local scope expires.  Because a runtime closure could be used in multiple threads, all local variables bound in the closure when it's created at runtime are protected by the mutual-exclusion lock to ensure consistency and atomicity.

Therefore I simply applied this same approach to local variables that are referenced and removed all restrictions on the use of the lvalue reference operator ('\') in Qore.
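A minimal Python sketch of the underlying idea, a variable promoted to a lock-protected cell that several threads can update through the same reference. The Cell class and bump() function are hypothetical and only model the mutual-exclusion part; lifetime management is left to Python's own reference counting rather than anything Qore-specific.

```python
import threading


class Cell:
    """Hypothetical stand-in for a local variable that has been bound by a reference."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def get(self):
        with self._lock:
            return self._value

    def update(self, func):
        # Read-modify-write under the lock so concurrent updates are not lost.
        with self._lock:
            self._value = func(self._value)
            return self._value


def bump(cell, times):
    for _ in range(times):
        cell.update(lambda v: v + 1)


counter = Cell(0)
threads = [threading.Thread(target=bump, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())  # 4000
```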

The resulting code is cleaner and more consistent, and I actually found a segfault-inducing memory error in the old kludgy implementation at the same time.

The above code now reads:
# finalize last order - set numberofitems
reference orders = \recs[recs.size() - 1].eventdata.ufulfildespatchconfirm.order;
reference lord = \orders[orders.size() - 1];
lord.numberofitems = lord.items.size();
orders += get_order(e, h);

That's two lines longer than the first one without references, but much easier to read, understand, and maintain.

Qore 0.8.5 should be out before too long. I discovered a bug in a new library API that is only used by modules built with qpp, so I want to get Qore 0.8.5 out relatively quickly and then update all the binary modules in common use.

This feature is already in svn and looks to be stable. The only other feature I plan on adding to Qore for 0.8.5 is support for abstract class methods, so that Java-style interfaces can be implemented by defining a class with abstract methods. I've been wanting this for a while, and it doesn't look too hard to do, so I hope to get that done in the next few days.

Tuesday, May 22, 2012

Weak and Strong Destructors

Qore has C++-style constructors and destructors (with a different naming convention; Qore uses "constructor" and "destructor" while C++ uses names derived from the class's name). However, due to Qore's unique memory-management approach, destructors only mostly work like in C++; there are some differences in the details of how destructors work on built-in classes (either from the Qore library itself or from classes provided by binary modules).

The main data structure that represents an object in Qore is the C++ class "QoreObject".  Objects of this class have access to their internal state serialized with a mutex (thread-safety is a fundamental design principle of Qore); so for a Qore class implemented only with user code (and not inheriting any built-in classes), after any user destructor method is run, the object is marked as deleted and its internal data structures are cleared and all resources are released back to the operating system.  If any copies of references to the object (Qore is like Java in the sense that passing the object by value or assigning the object to another lvalue actually passes/assigns a copy of a reference to the object) are accessed after the object is deleted, an OBJECT-ALREADY-DELETED exception is raised.

This is fairly straightforward and results in predictable behavior for the programmer.  Where "weak" and "strong" destructors come in is with built-in classes.

When a user class inherits a built-in class, the built-in class's constructor links a C++ object containing the internal state of the built-in class to the Qore-language object (a "QoreObject" data structure, as mentioned above).  Whenever a method of the built-in class is executed on the object, the Qore runtime atomically checks the status of the object to see if it's still valid, then, if so, it finds the built-in C++ object for the built-in class linked to the Qore object (let's call this object a "class state object") and atomically increments its reference count, and then calls the C++ function that implements the method being called with the class state object, a pointer to the "QoreObject" (representing "self") and a (possibly-empty) list of Qore-language arguments to the method call.

When the call is complete, the class state object is dereferenced and the return value of the method call is returned to the caller (or an exception state is returned, if applicable).  If the class state object's reference count reaches zero then the class state object is deleted.  This normally happens immediately (and synchronously) after the destructor is processed (at which time all linked class state objects are dereferenced), however it can happen afterwards with "weak" destructors.

A "weak" destructor for a built-in object is a destructor that does not implement any further serialization or gated state checking when method calls are made; it just relies on Qore's object state checking.  In this case, it's possible for one or more threads to call a method on the object while another thread deletes the object in parallel.  If the built-in class does not implement a "strong" destructor, then the built-in class state object will only be deleted after the destructor has been executed (removing the initial reference count for the class state object) and any in-progress methods terminate, which could be quite a while after the actual object destructor has been called depending on the method.
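The "weak" destructor behavior can be modeled with a small Python sketch. All names here (StateObject, ObjectSketch, ObjectAlreadyDeleted) are hypothetical and only mirror the description above; the `freed` flag stands in for deleting the C++ class state object. To keep the example deterministic, the deletion "from another thread" is simulated by calling delete() from inside the running method.

```python
import threading


class StateObject:
    """Hypothetical 'class state object' with a manual reference count."""

    def __init__(self):
        self.refs = 1              # initial reference held by the owning object
        self.freed = False
        self._lock = threading.Lock()

    def ref(self):
        with self._lock:
            self.refs += 1

    def deref(self):
        with self._lock:
            self.refs -= 1
            if self.refs == 0:
                self.freed = True  # stands in for deleting the C++ object


class ObjectAlreadyDeleted(Exception):
    pass


class ObjectSketch:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = StateObject()

    def call(self, method):
        # Atomically check validity and take a reference for the duration of the call.
        with self._lock:
            if self._state is None:
                raise ObjectAlreadyDeleted()
            state = self._state
            state.ref()
        try:
            return method(state)
        finally:
            state.deref()

    def delete(self):
        # "Weak" destructor: invalidate the object and drop only the initial reference.
        with self._lock:
            state, self._state = self._state, None
        if state is not None:
            state.deref()
        return state


obj = ObjectSketch()
seen = {}


def slow_method(state):
    # Simulate another thread deleting the object while this method is running.
    seen["state"] = obj.delete()
    seen["freed_during_call"] = state.freed
    return "done"


result = obj.call(slow_method)
```

In this model the state object is still alive while slow_method() runs, even though the object has already been deleted, and it is only freed when the in-progress call drops its reference. Any later call on the object raises ObjectAlreadyDeleted.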

Most built-in classes in Qore implement "weak" destructors because they are simpler to develop and execute faster at runtime (since there's no additional thread synchronization or gating).  Furthermore in normal use, a "strong" destructor is not functionally necessary for most classes; it's normally not a problem that such a race condition (where methods are in progress while the object is explicitly deleted in a separate thread) does not cause additional exceptions to be thrown.  For example, in Qore the Socket class has a "weak" destructor.  However Qore's Queue class has a strong destructor, and, in Qore 0.8.4, the Program class now has a strong destructor in order to enforce stricter discipline on memory resources used.

The Queue class's state object, for example, is implemented by the C++ class "QoreQueue".  All methods in this object that cause state changes are explicitly protected by a mutex (logically as this is a thread-synchronization and messaging class).  The "strong" destructor grabs the mutex and then checks if other threads are waiting on the Queue (either for reading or writing); if so, an appropriate exception is thrown and the waiting threads are notified that the object has been deleted (and exceptions will be thrown in the waiting threads as well).   The race condition with the destructor described earlier is a serious error with the Queue class, particularly due to its critical role in Qore's threading infrastructure, therefore it has a "strong" destructor.  The same goes for the Mutex, RWLock, Gate classes, etc.
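For contrast, here is a hedged Python sketch of what a "strong" destructor for a queue might look like: the destructor takes the same lock as the other methods, marks the queue as deleted, and wakes every waiting thread so that each one raises an exception. QueueSketch and QueueDeleted are hypothetical names, and this is not the QoreQueue implementation.

```python
import threading
import time
from collections import deque


class QueueDeleted(Exception):
    pass


class QueueSketch:
    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()
        self._deleted = False
        self.waiting = 0           # number of threads blocked in get()

    def push(self, item):
        with self._cond:
            if self._deleted:
                raise QueueDeleted()
            self._items.append(item)
            self._cond.notify()

    def get(self):
        with self._cond:
            self.waiting += 1
            try:
                while not self._items and not self._deleted:
                    self._cond.wait()
                if self._deleted:
                    raise QueueDeleted()
                return self._items.popleft()
            finally:
                self.waiting -= 1

    def delete(self):
        # "Strong" destructor: take the lock, mark the queue deleted, wake all waiters.
        with self._cond:
            had_waiters = self.waiting > 0
            self._deleted = True
            self._cond.notify_all()
        if had_waiters:
            raise QueueDeleted("deleted while threads were waiting")


q = QueueSketch()
errors = []


def reader():
    try:
        q.get()
    except QueueDeleted:
        errors.append("reader woke with an exception")


t = threading.Thread(target=reader)
t.start()
while True:                        # wait until the reader is actually blocked in get()
    with q._cond:
        if q.waiting:
            break
    time.sleep(0.001)
try:
    q.delete()
except QueueDeleted:
    errors.append("deleter got an exception")
t.join()
```

Both the deleting thread and the blocked reader end up with an exception, so neither side can silently continue using a deleted queue.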

The Program object now has a "strong" destructor so that whenever it is deleted, all its memory is immediately freed, and any objects or code references created in the Program that have been exported out of the Program will cause PROGRAM-ERROR exceptions to be raised if they try to access the already-deleted Program.  This was necessary because otherwise, in a program using lots of dynamically-created Program objects, objects exported from the Programs would cause the parent Program to live for as long as the exported objects even if the Program itself were explicitly deleted if it had a "weak" destructor.

Also on a completely different subject, I implemented support for the "final" keyword when declaring classes and class methods for 0.8.4, which should be the very last feature to go into 0.8.4 before its release, which is now imminent.

Monday, April 30, 2012

Preparing for the 0.8.4 Release

I just finished doing a major rewrite of the internal lvalue handling in Qore.  Basically, most types of lvalues are now stored in a union which consists of one of a 64-bit int, a double, a bool, or a pointer to a generic Qore value object.

The thing with Qore is that at the beginning, all values were dynamically allocated objects derived from a common virtual base.  This was to allow for atomic reference counts and a copy-on-write approach to managing data, which is very efficient for large data structures.  In this way you can pass a large data structure (such as a hash, list, or object) to a function by value, and the value is only copied if it's changed.  Even then, only the top level of the data structure is copied, because each of the values is also a reference-counted object, so, unless they change as well, they are only copied by reference (meaning a pointer is copied and its reference count is incremented).
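The copy-on-write behavior described above can be sketched in a few lines of Python. Node and Handle are hypothetical names, the reference count is a plain integer rather than an atomic counter, and only the "copy the top level on first write" rule is modeled.

```python
class Node:
    """Hypothetical reference-counted value holding a list of child values."""

    def __init__(self, items):
        self.items = items
        self.refs = 1


class Handle:
    """A variable's view of a Node; copies only the top level on first write."""

    def __init__(self, node):
        self.node = node

    def share(self):
        # "Pass by value": copy the pointer and bump the reference count.
        self.node.refs += 1
        return Handle(self.node)

    def set(self, index, value):
        if self.node.refs > 1:
            # Shared: detach by copying the top level only; children stay shared.
            self.node.refs -= 1
            self.node = Node(list(self.node.items))
        self.node.items[index] = value


big_child = ["x"] * 3
a = Handle(Node([big_child, 1, 2]))
b = a.share()                       # no copy yet
shared_before = a.node is b.node    # True: both handles see the same Node
b.set(1, 99)                        # the top level is copied here
```

After the write, `a` still sees its original value at index 1, `b` sees 99, and both still point at the same `big_child` object because only the top level was copied.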

However this approach is not efficient for small, discrete values such as integers and floating-point values.  It's even worse for boolean values, which can be stored in as little as 1 bit.

Qore has an optimization for special values like True and False and some others, whereby there is only a single instance of each such value in the Qore library, which is not subject to reference counting.  However this is not possible with ints and floats.

The solution that I implemented for lvalues is to use the union as described above; the type of the union is set based on the lvalue's type restriction -- so if you declare a local variable as "int" or "softint", then it will be internally stored and operated on only as an integer (the same with "float" or "softfloat").

This allows Qore to store and operate directly on the base data type, instead of always working with another level of indirection (a pointer to a generic value object) and also eliminates the associated dynamic memory management.   So this approach has both memory and speed benefits.
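To make the storage layout concrete, here is a small sketch using Python's ctypes module to declare a union with the four members listed above. The names LValueData and LValue are hypothetical, the pointer member is declared but not used, and the real implementation is C++ inside the Qore library.

```python
import ctypes


class LValueData(ctypes.Union):
    # One of: 64-bit int, double, bool, or pointer to a generic value object.
    _fields_ = [
        ("i", ctypes.c_int64),
        ("f", ctypes.c_double),
        ("b", ctypes.c_bool),
        ("p", ctypes.c_void_p),
    ]


class LValue:
    """Hypothetical typed slot: the declared type selects the active union member."""

    _FIELD = {"int": "i", "float": "f", "bool": "b"}

    def __init__(self, declared_type):
        self.field = self._FIELD[declared_type]
        self.data = LValueData()

    def assign(self, value):
        setattr(self.data, self.field, value)

    def get(self):
        return getattr(self.data, self.field)


x = LValue("int")
x.assign(41)
x.assign(x.get() + 1)               # operates on the raw 64-bit integer
size = ctypes.sizeof(LValueData)    # all members share the same 8 bytes
```

The whole union occupies 8 bytes regardless of which member is active, and an "int" slot never allocates a separate value object.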

This work showed me a clear way forward for doing some very cool optimizations in Qore regarding value handling - basically long-term I plan on making all Qore values some sort of union like this, which will allow Qore to operate directly on the base data type whenever possible.  This will be necessary before starting LLVM integration as well.

This will be a lot of work and will start some time after the upcoming 0.8.4 release.

I also implemented user thread initialization - you can now set a closure or call reference to be executed any time a new thread is started in the current Program object (or any time another Program object accesses the current Program object in a new thread) - this can be used to initialize thread-local data in the Program.
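The source does not name the Qore API for registering the initialization code, so the following is only an analogy in Python: the standard ThreadPoolExecutor accepts an `initializer` callable that runs once in each new worker thread, which is the same pattern of setting up thread-local data before any other code runs in that thread. The names init_thread() and work() are made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

tls = threading.local()


def init_thread():
    # Runs once in each new worker thread before it executes any task.
    tls.buffer = []


def work(n):
    tls.buffer.append(n)      # would raise AttributeError without init_thread()
    return len(tls.buffer)


with ThreadPoolExecutor(max_workers=2, initializer=init_thread) as pool:
    results = list(pool.map(work, range(6)))
```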

Also I implemented an optional maximum size for the Queue class - if a maximum size is set then writes to the Queue will block if the Queue already has the maximum allowed number of elements in it.  In this way, Qore Queue objects can be used like a buffered Go channel.
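The blocking behavior is the same as that of any bounded queue. As an illustration in Python rather than Qore (the Qore method names are not given here, so none are assumed), the standard queue.Queue with a `maxsize` behaves like this:

```python
import queue
import threading
import time

q = queue.Queue(maxsize=2)     # at most 2 buffered elements
q.put("a")
q.put("b")

try:
    q.put_nowait("c")          # queue is full, so a non-blocking write fails
    overflowed = False
except queue.Full:
    overflowed = True


def consumer():
    time.sleep(0.05)
    q.get()                    # frees one slot and unblocks the writer


t = threading.Thread(target=consumer)
t.start()
q.put("c")                     # blocks until the consumer removes an element
t.join()
remaining = [q.get(), q.get()]
```

The third write only completes once the consumer has taken "a" off the queue, which is the back-pressure behavior a buffered channel provides.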

At the moment, Qore 0.8.4 is feature complete and stable in svn, however there's still some more packaging work etc to be done before the actual release, which hopefully will be pretty soon (I'm aiming for sometime in the next 2 weeks).

Saturday, April 21, 2012

User Modules

I've recently committed support for user modules in Qore; this will allow the language to be extended in an organized and predictable way with Qore-language code.

Previously, this was only possible with modules written in C++.

The current documentation for user modules is online here: http://qore.org/manual/current/lang/html/qore_modules.html#user_modules (note: edited to reflect a perma-link for the user module documentation in the latest qore docs)

User modules have the following features:
  • code embedding safety: modules work with Qore's functional domain permission/protection framework so that embedded code can only use modules that use authorized functionality
  • encapsulation: only symbols marked as public are exported; everything else is private to the module
  • uniqueness: multiple pieces (source files) can "require" a module safely - also when embedding Qore code, when multiple Programs use a module, there is only one copy of the module and of its private data (single global state)
Note that there has also been a nearly complete rewrite of the namespace code and handling to facilitate user modules - particularly to enable public and private symbols in modules. For example, global variables are now also contained in namespaces (hence it's possible to have more than 1 copy of a global variable with the same name in different namespaces).

The next step will be to integrate a separate program called "qdx" which converts Qore code to a c++-like format for doxygen parsing so that doxygen documentation can be generated from Qore modules and those can be integrated into Qore's reference documentation (at least for the modules that will be shipped with Qore - this will be the start of Qore's Qore-language runtime library).

I have already added a couple of user modules to the Qore source in svn (HttpServer.qm and Smtp.qm) and updated the build and packaging code accordingly.

The new directory location for the runtime library is the Qore version string as a subdirectory of "qore-modules" (where binary/c++ modules are installed). For example on UNIX this might be:

/usr/lib64/qore-modules/0.8.4

This directory is automatically added to the QORE_MODULE_DIR search path.

I hope this will enable more collaboration to be made on Qore and of course for the language to be more transparent and useful for more people.

Monday, February 27, 2012

Qore Plus Plus

I've implemented a Qore Pre-Processor (qpp) for writing language bindings - that is, the C++ code that binds the internal class and function implementations to the Qore language.

The problem was that the language bindings were complex and error-prone, and, while I could make the language bindings easier to use, I think the qpp solution is better, because with qpp:
  • the language bindings can be abstracted from the actual/current implementation
  • language documentation from doxygen-style comments can be generated directly from the language's source code
  • a great deal of repetitive code can be generated automatically
  • (later) the documentation comments can be incorporated internally to provide additional information in reflection-like internal and external APIs
Currently qpp processes "qpp" files to 2 targets - cpp (the Qore language C++ binding files) and dox.h files (the Doxygen source files).

In the qpp file, functions, constants, and classes are defined in a Qore-like syntax (with some additional information for internal tags, functional domains, etc). The bodies of each function or class method are then written in C++ (hence Qore Plus Plus).

For example, here is the qpp implementation of the Dir::path() method:
//! Returns the path of the Dir object or @ref nothing if no path is set
/** This path does not necessarily need to exist; the path is adjusted to remove \c "." and \c ".." from the path if present

   @return the path of the Dir object or @ref nothing if no path is set

   @par Example:
   @code
my *string $mypath = $d.path();
   @endcode
*/
*string Dir::path() {
  return d->dirname();
}
The current pre-release documentation based on Doxygen for Qore 0.8.4 can be found here: http://qore.org/manual/qore-0.8.4/lang/html/index.html

I won't be able to get to many optimizations using qpp in this release, because first I want to clean up the namespace code and some related changes. However, qpp lays the groundwork for making easy infrastructure changes to Qore in the future - we'll be able to implement new solutions in Qore and then apply them globally to all language bindings by extending qpp.