Saturday, October 18, 2014

Exception-Safe Programming, RAII, and Deterministic GC

I'd like to make a follow-up post with more information on Qore's new Deterministic Garbage Collection support including more details and more information on why it's important for the language.

First of all regarding the language. deterministic garbage collection was always a goal of Qore's because of it's focus on the RAII idiom (wikipedia) for exception-safe programming and resource management (I also found the following very interesting link on this subject:

Some examples of RAII in Qore are (a subset of possible examples):
  • the AutoLock class releases the Mutex in the destructor (this class is designed to be used with scope-bound exception-safe resource management)
  • the Datasource class closes any open connection in the destructor, and, if a transaction is still in progress, the transaction is rolled back automatically and an exception is thrown before the connection is closed
  • the File class closes the file in the destructor if it's open
  • the RWLock class throws an exception if destroyed while any thread is still holding the lock; note that in this case the underlying object is only destroyed when all threads holding locks have released their locks; this is handled with Qore's thread resource handling and strong references to the underlying RWLock object while thread resources are held; thread resource handling is another potential topic for a blog post
  • the Socket class first shuts down any TLS/SSL connection and then closes the connection in the destructor if it's open
  • the ThreadPool class detaches all currently in-progress worker threads, cancels pending tasks not yet executed (by calling their cancellation closure, if any), terminates the worker thread and destroys the thread pool
Basically one of Qore's primary design points is to free Qore programmers from worrying about the details of memory and resource management; the use of the RAII idiom is a large part of this; also above you can see an example of negative feedback provided to programmers when mistakes are made - deleting a RWLock object while a lock is held (note that there is also scope-related resource management support in Qore in the form of the on_exit, on_success, and on_error statements).

Therefore since support for the RAII idiom is a critical feature of Qore's design, the language should always guarantee that objects are destroyed and therefore their resources are managed and associated memory freed when objects go out of scope.  This is tricky when there are circular references involved in the objects, particularly since Qore uses a reference-counted solution with atomic references due to its multi-threaded nature.  Consider the following post on this subject regarding .NET:  This gives a lot of background to the same problem that Qore (and Java) have for providing for deterministic garbage collection.

Basically to summarize, RAII is not supported in Java and .NET because it's hard to do.  Consider also VB6, basically if you have recursive object references in VB6, those objects will never be destroyed or collected by the system; the programmer has to delete objects with recursive references manually in order to destroy them.  This is basically the same situation as Qore before the introduction of deterministic GC.

The current status of the deterministic GC support in Qore svn is stable; it appears to finally be working in large complex object-oriented multi-threaded programs with good performance and (at the moment - knock on wood) free of deadlocks.  In another post I described the high-level approach with emphasis on the deadlock-avoidance approach since this was one of the trickiest parts to solve.  I'd like to provide more detail here on the recursive graph detection and recursive reference algorithm.

To find recursive graphs and calculate the number of recursive references for each member of the graph, Qore does a scan of some starting object.  Note that the recursive graph detection algorithm has to produce the same results for any given recursive graph independently of the starting node in the graph.  So from a high level, Qore scans through all reachable objects from a start object, and, when a recursive reference is found, increments the recursive count for that object (in fact this is done in a transactional way where all changes are committed atomically to the entire recursive graph at the end of the transaction or rolled back in case of certain kinds of lock contention in order to avoid deadlocks, but this is described in my previous blog post).  If a "path" is found from any given object to itself, the recursive count is set to one for all elements of the path, and any recursive reference then has its recursive count incremented.  Due to the recursive nature of the algorithm, during path detection new elements can be added to the recursive set while scanning an object, so processing the path has to also take this into account by checking if parent objects are reachable from existing recursive sets and must process their recursive counts appropriately (for example, if, while processing a path, one of the objects in the path already has a non-zero pending recursive count, then if the preceding object in the path was not already in the set, then its pending recursive count is incremented, otherwise it is not).  Additionally, if any predecessor in the current path of the current recursive node having a different recursive set is reachable from the current recursive set, then the recursive sets are merged into one.

STL sets are used to provide for O(ln(n)) searches of recursive sets found in the scan, however this may not be the ideal algorithm for small sets due to the overhead in managing red-black balanced binary trees normally used to implement sets in STL.  Also these sets are iterated several times, so fast iterator performance is important.  I believe that some analysis of the usage of data structures in the recursive scan could provide some optimizations.

Recursive scans are performed when changes are made to objects that could potentially create a new recursive set or change an existing recursive set.   The approach also features several checks designed to limit the cases where the recursive graph detection algorithm needs to be applied; I believe more cases can be found to further improve performance.

I have not made a systematic performance test of Qore with deterministic garbage collection enabled (it's currently enabled by default in Qore svn trunk), but from subjective testing on existing code, it seems fast.

You can see if deterministic garbage collection is enabled in Qore by typing qore -V; you should see a line like this if it's enabled:
version 0.8.12-6768 (builtin features: sql, threads, DGC)

If you see "DGC" in the output, then it's supported, otherwise not (currently requires svn trunk - 0.8.12 has not yet been released).

In Qore code, there is a parse define, HAVE_DETERMINISTIC_GC, that can be used to optionally parse code depending on the presence of deterministic garbage collection or not; for example, there are now regression tests in Qore that are executed when deterministic garbage collection is present.

Tuesday, September 30, 2014

Deterministic Garbage Collection

Qore has had a big design problem in its memory management regarding object collection; basically it was possible to make circular references to objects, and those objects would not be destroyed automatically, resulting in a memory leak.

The reason for this is that Qore used a simple strong reference count for object scope, so if object A pointed to object B, object B pointed to object A, then a circular reference would exist and neither object would be destroyed (or collected) when the objects would otherwise go out of scope.  Note that Qore treats objects like Java does in that objects are always passed as a copy of a reference to the object instead of by value.  Weak references to objects are also supported, but weak object references do not affect object collection (ie when the destructor is run on the object); normally the object's destructor is run on the object only when the strong reference count reaches zero (or the object is manually deleted with the delete operator).

Also due to Qore's multi-threaded nature, objects can go out of scope at any time since they can be deleted in another thread at any time.  Qore objects are thread-safe and references to an objects are wrapped in a read-write lock - as are all shared lvalues in Qore - basically access to all lvalues in Qore are wrapped in read-write locks except accesses to "pure" local variables, which are local variables where references are not taken of them and also not used in a closure, either of which case makes it possible to use the local variable in another thread and therefore causes the local variable to be created specially so that accesses are wrapped in read-write locks.

I considered using something like Java's garbage collection approach with a dedicated thread that would scan and collect objects, but I always wanted to do a deterministic garbage collector so that Qore's resource management approach with objects could be applied deterministically.  I finally have an initial working implementation of a deterministic garbage collection algorithm for Qore that is thread-safe, does not require a dedicated collector thread, has a solid deadlock-avoidance mechanism, and exhibits acceptable performance (which I expect can be further improved).

With this new approach, objects are collected immediately when they only have strong references that belong to a recursive cycle; so resources managed by the object are released in a deterministic way even when the objects participate in a recursive reference cycle.

I would like to describe the algorithm here including the locking approach and the deadlock-avoidance mechanism.

Basically the idea is to determine the number of strong references to an object that belong to a circular reference.  In fact, it is more complicated than this, because you have to consider the entire recursive directed graph as a whole and not a single object.  The recursive directed graph in this case is the set of all objects participating in the recursive reference chain.  If any object is reachable in the chain and also contains link(s) (ie members) to other objects in the chain, then it is a member of the recursive directed graph.  So the idea is to maintain the recursive strong reference count of every object in the recursive directed graph and then, when any member of the graph is strongly dereferenced, then the entire graph is checked to see if it can be collected, meaning that each member of the graph is checked to see if the strong reference count is equal to its recursive reference count.  If any single member of the graph has non-recursive references, then no object in the graph can be collected; only when the strong reference count equals the recursive reference count for all objects in the graph can the objects in the graph be collected.

This is accomplished by performing a scan of all reachable objects whenever a potentially-relevant change is made to an object.  In this case the recursive reference counts are calculated for all objects in the graph.

To accomplish this, each object is locked specially.  In fact objects now have a special form of read-write lock that includes a special form of the read lock called an rsection lock.  The rsection lock in a Qore object is a read lock that is unique in that first the read lock is acquired and then only one thread can hold the rsection lock at a time.  This allows objects to be scanned for recursive graphs while also allowing them to be read in other threads to maximize concurrency.  Additionally, since multiple rsection locks are acquired when performing recursive scanning (one for each object) and held for the duration of the scan, and since holding multiple locks could lead to deadlocks, and since this can and is performed in multiple threads simultaneously in multi-threaded object-oriented Qore programs, and since otherwise Qore avoids holding multiple locks simultaneously, the deadlock avoidance approach here is to apply a transaction-handling approach to the rsection scan and if any rsection lock cannot be acquired, the transaction is rolled back and we wait for a confirmation from the other thread that the contentious rsection lock has been released.  Also a transaction counter in each object is maintained, and, after waiting for a contentious rsection lock to be released, we see that the transaction count for the root object has been changed, then we know that the rsection scan has already taken place, so we exit the scan immediately.

Basically all changes to objects are stored in temporary data structures and then only committed to the objects in the graph if all rsection locks are acquired.

Also all containers (lists, hashes, and objects) contain a reachable object count, which is a sum of the children (list elements, hash keys, or object members) that have at least one object reachable through them.  This turned out to be efficient to calculate.  This allows us to ignore any container that has no reachable objects when performing rsection scans.

Additionally if an rsection scan fails due to lock contention after an lvalue change, the scanned objects are marked with an invalid rsection so that a deterministic scan is made on the next strong dereference.  When performing rsection scans due to a strong dereference, the scan is repeated after an rsection rollback until it is successful to guarantee deterministic collection when only recursive references remain.

There is still certainly a lot of room for improvement to this algorithm.  For example, the rsection transaction could be compared to any existing rsection graph and left in place if identical results are found, or possibly a delta operation could be performed on an existing rsection graph.

While this algorithm is complicated, the goal of achieving deterministic garbage collection is a valid one in my opinion.

Knowing exactly when your objects will be collected even in the case of participation in recursive directed graphs of strong references provides an advantage to Qore programmers regarding resource management with objects.

Currently deterministic garbage collection is enabled by default in Qore trunk, and I plan on continuing to work on it.

Feedback on this subject would be appreciated.