.TH "md_doc_help_elektra-data-structures" 3elektra "Sun May 29 2016" "Version 0.8.14" "Elektra" \" -*- nroff -*-
.ad l
.nh
.SH NAME
md_doc_help_elektra-data-structures \- elektra-data-structures(7) -- data structures 
For an introduction, please \fBread first about elektra classes\fP\&. You might want to read \fBabout architecture first\fP\&.
.PP
Data structures define the common layer in Elektra\&. They are used to transfer configuration between Elektra and applications, but also between plugins\&.
.PP
.SS "ADT"
.PP
Both the \fCKeySet\fP and the interface to metadata within a \fCKey\fP are actually \fBADT\fP (abstract data types)\&. The API is designed so that different implementations of the data structures can be used internally\&.
.PP
A \fBhash\fP data structure presents a good candidate as alternative data structure, especially for the metadata interface\&. It is believed to be much faster on lookup, but considerably slower on sorted enumeration\&.
.PP
A \fBAVL tree\fP also serve as a competitor\&. AVL trees are expected to be faster for inserting keys at any place, but may be slower for appending because of the needed reorganisations\&. Their disadvantage is that they need to allocate a large number of small pieces of memory\&. Further investigations, namely implementation and benchmarks, are required to decide\&.
.PP
A \fBtrie\fP could also be used for lookup of key names\&. It performs well for lookup, but needs more memory allocations\&.
.PP
Currently the \fCKeySet\fP is implemented as a sorted array\&. It is fast on appending and iterating, and has nearly no size-overhead\&. To improve the lookup-time, an additional \fBhash\fP will be used\&.
.PP
.SS "ABI compatibility"
.PP
Application binary interface, or ABI, is the interface to all data structures of an application or library directly allocated or accessed by the user\&.
.PP
Special care has been taken in Elektra to support all changes within the data structures without any ABI changes\&. ABI changes would entail the recompilation of applications and plugins using Elektra\&. The functions \fC\fBkeyNew()\fP\fP, \fC\fBksNew()\fP\fP and \fC\fBkdbOpen()\fP\fP allocate the data structures for the applications\&. The user only gets pointers to them\&. It is not possible for the user to allocate or access these data structures directly when only using the public header file \fC<kdb\&.h>\fP\&. The functions \fC\fBkeyDel()\fP\fP, \fC\fBksDel()\fP\fP and \fC\fBkdbClose()\fP\fP free the resources after use\&. Using the C++ binding deallocation is done automatically\&.
.PP
.SS "Meta data"
.PP
Read \fBhere\fP\&.
.PP
.SS "KeySet"
.PP
This subsection describes what has changed between 0\&.7 and 0\&.8 and deals with some general implementation issues\&.
.PP
.SS "Operations"
.PP
\fCKeySet\fP resembles the classical mathematical set\&. Operations like union, intersection or difference are well defined\&. In mathematics typically every operation yields a new set\&. Instead, we try to reuse sets in the following ways:
.PP
.IP "\(bu" 2
A completely new and independent \fCKeySet\fP as return value would resemble the mathematical ideal closely\&. This operation would be expensive\&. Every \fCKey\fP needs to be duplicated and inserted into a new \fCKeySet\fP\&.
.PP
Such a \fBdeep duplication\fP was only needed in \fC\fBkdbSet()\fP\fP\&.
.IP "\(bu" 2
The resulting \fCKeySet\fP is created during the operation, but only a flat copy is made\&. This means that the keys in it are actually not duplicated, but only their reference counter is increased\&. This method is similar to the mathematical model\&. Compared with a deep copy it can achieve good performance\&. But all changes to the values of keys in the resulting \fCKeySet\fP affect the original \fCKeySet\fP, too\&.
.PP
\fC\fBksDup(const KeySet *source)\fP\fP produces a new \fCKeySet\fP that way\&. The \fCsource\fP is not changed as shown by the \fCconst\fP modifier\&.
.IP "\(bu" 2
The result of the operation is applied to the \fCKeySet\fP passed as argument directly\&. This is actually quite common, but for this situation other names of the operations are more suitable\&.
.PP
For example, a union which changes the \fCKeySet\fP is called \fC\fBksAppend()\fP\fP\&.
.IP "\(bu" 2
A new \fCKeySet\fP is created, but the \fCKeySet\fP passed as parameter is reduced by the keys needed for the new \fCKeySet\fP\&. This is useful in situations where many operations have to be applied in a sequence reducing the given \fCKeySet\fP until no more keys are left\&. None of the reference pointers changes in this situation\&.
.PP
\fC\fBksCut(KeySet *ks, const Key *cutpoint)\fP\fP works that way\&. All keys below the \fCcutpoint\fP are moved from \fCks\fP to the returned key set\&.
.PP
.PP
.SS "Consistency"
.PP
There are several ways to define consistency relations on key sets\&. For \fBstrict consistency\fP every parent key must exist before the user can append a key to a key set\&. For example, the key set with the keys 
.PP
.nf
    system
    system/elektra
    system/elektra/mountpoints

.fi
.PP
.PP
would allow the key \fCsystem/elektra/mountpoints/tcl\fP to be added, but not the key \fCsystem/apps/abc\fP because \fCsystem/apps\fP is missing\&. File systems enforce this kind of consistency\&.
.PP
These semantics are however not useful for configurations\&. Especially for user configurations often only some keys need to be overwritten\&. It is not a good idea to copy all parent keys to the users configuration\&. For this reason we use a less strict definition of consistency supporting such holes\&.
.PP
We also evaluated a form of \fBweak consistency\fP\&. It avoids adding some unnecessary keys\&. A constraint is that a key can be added only if it has a parent key\&. But the constraint does not apply if no other key exists above the key about to be inserted\&. From that moment it will serve as parent key for other keys\&. With the current implementation of \fCKeySet\fP, however, it is not possible to decide this constraint in constant time\&. Instead its worst-case complexity would be $log(n) * x$ where $n$ is the number of keys currently in the key set and $x$ is the \fIdepth\fP of the key\&. The depth is the number of \fC/\fP in the key name\&. The worst-case of the complexity applies when the inserting works without a parent key\&. For example, with the keys 
.PP
.nf
    user/sw/apps/abc/current/bindings
    user/sw/apps/abc/current/bindings/key1
    user/sw/apps/abc/current/bindings/key2

.fi
.PP
.PP
the weak consistency would allow inserting \fCuser/sw/apps/abc/current/bindings/key3\fP because it is directly below an existing key\&. It would also allow adding \fCuser/sw/apps/xyz/current\fP because it does not have any parent key\&. But it would not allow \fCuser/sw/apps/abc/current/bindings/dir/key1\fP to add\&. The worst-case complexity was found to be too expensive, and hence \fCKeySet\fP has \fBno consistency\fP check at all\&.
.PP
This means any key with a valid key name can be inserted into \fCKeySet\fP\&. The \fCKeySet\fP is changed so that it is now impossible to append keys without a name\&. \fCksAppendKey(ks, Key *toAppend)\fP takes ownership of the key \fCtoAppend\fP and will delete the key in that case\&. The caller does not have to free \fCtoAppend\fP: either it is in the key set or it is deleted\&.
.PP
Binary search determines the position where to insert a key\&. The C version of binary search \fCbsearch()\fP cannot tell where to insert a key when it is not found\&. So the algorithm has to be reimplemented\&. Java's binary search \fCbinarySearch()\fP uses a trick to both indicate where a key is found and where to insert it with the same return code by returning the negative value \fC((-insertion point) - 1)\fP indicating where the new value should be inserted when the key is not found\&. Elektra now also uses this trick internally\&.
.PP
.SS "Internal Cursor"
.PP
\fCKeySet\fP supports an \fBexternal iterator\fP with the two functions \fC\fBksRewind()\fP\fP to go to the beginning and \fC\fBksNext()\fP\fP to advance the \fIinternal cursor\fP to the next key\&. This side effect is used to indicate a position for operations on a \fCKeySet\fP without any additional parameter\&. This technique is comfortable to see which key has caused an error after an unsuccessful key database operation\&.
.PP
Elektra only has some functions to change the cursor of a key set\&. But these allow the user to compose powerful functions\&. Plugins do that extensively as we will see later in \fCksLookupRE()\fP\&. The user can additionally write more such functions for his or her own purposes\&. To change the internal cursor, it is sufficient to iterate over the \fCKeySet\fP and stop at the wanted key\&. With this technique, we can, for example, realise lookup by value, by specific metadata and by parts of the name\&. Without an additional index, it is not possible that such operations perform more efficiently than by a linear iteration key by key\&. For that reason, Elektra's core does not provide such functions\&. The function \fC\fBksLookupByName()\fP\fP, however, uses the more efficient binary search because the array inside the \fCKeySet\fP is ordered by name\&.
.PP
.SS "External Cursor"
.PP
External cursor is an alternative to the approach explained above\&. Elektra provides a limited \fIexternal cursor\fP through the interface \fC\fBksGetCursor()\fP\fP and \fC\fBksSetCursor()\fP\fP\&. It is not possible to advance this cursor\&. The difference to the internal cursor is that the position is not stored within \fCKeySet\fP\&.
.PP
We considered providing an external cursor for performance reasons\&. But we found out that the speed of iteration is mainly limited because of safety checks\&. The investigated methods become slower proportional to the ease of use and safety\&. When using null pointers and range checks, the function is noticeably slower than without\&. With the same amount of checks, using an external cursor is not much faster than the \fC\fBksNext()\fP\fP\&. External cursor with checks is in a benchmark about 10% faster\&.
.PP
But an external cursor directly accessing the array can be much faster\&. Using an unchecked external cursor can be about 50% faster than using the internal cursor with \fBksNext()\fP\&. For this endeavour, Elektra's private header files need to be included\&. Including private header files, however, should not be done with levity because ABI compatibility will be gone on any changes of the data structures\&. This fact means the application or plugin needs to be recompiled when any of the internal structures of Elektra are changed\&. We strongly discourage including these header files\&.
.PP
Nevertheless, the overall performance impact for iteration is minimal and should not matter too much\&. Even if only a single \fC\fBkeySetMeta()\fP\fP is used inside the iteration loop, the iteration costs are insignificant\&. Only for trivial actions such as just changing a variable, counter or marker for every key the iteration costs become the lion's share\&. In such situations an \fIinternal iterator\fP yields better results\&. For example, \fCksForEach()\fP applies a user defined function for every key in a \fCKeySet\fP without having null pointer or out of range problems\&.
.PP
.SS "Trie vs\&. Split"
.PP
Up to now, we have discussed external data structures visible to the user of the library\&. The application and plugin programmer needs them to access configuration\&. Last, but not least, we will show two internal data structures\&. The user will not see them\&. To understand the algorithm, however, the user needs to understand them as well\&.
.PP
.SS "Trie"
.PP
\fITrie\fP or prefix tree is an ordered tree data structure\&. In Elektra, it provides the information to decide in which backend a key resides\&. The algorithm, presented in \fBalgorithm\fP, also needs a list of all backends\&. The initial approach was to iterate over the \fCTrie\fP to get a list of all backends\&. But the transformation of a \fCTrie\fP to a list of backends, contained many bugs caused by corner cases in connection with the default backend and cascading mount points\&.
.PP
.SS "Split"
.PP
So, instead of transforming the trie to a list of backends, we introduced a new data structure [Split\\lstinline{Split}]{Split}\&. The name \fCSplit\fP comes from the fact that an initial key set is split into many key sets\&. These key sets are stored in the \fCSplit\fP object\&. \fCSplit\fP advanced to the central data structure for the algorithm: 
.PP
.nf
    typedef struct _Split   Split;

    struct _Split {
            size_t size;
            size_t alloc;
            KeySet **keysets;
            Backend **handles;
            Key **parents;
            int *syncbits;
    };

.fi
.PP
.PP
The data structure \fCSplit\fP contains the following fields:
.PP
.IP "\(bu" 2
\fBsize\fP: contains the number of key sets currently in \fCSplit\fP\&.
.IP "\(bu" 2
\fBalloc\fP: allows us to allocate more items than currently in use\&.
.IP "\(bu" 2
\fBkeysets\fP represents a list of key sets\&. The keys in one of the key sets are known to belong to a specific backend\&.
.IP "\(bu" 2
\fBhandles\fP: contains a list of handles to backends\&.
.IP "\(bu" 2
\fBparents\fP: represents a list of keys\&. Each \fCparentKey\fP contains the root key of a backend\&. No key of the respective key set is above the \fCparentKey\fP\&. The key name of \fCparentKey\fP contains the mount point of a backend\&. The resolver writes the file name into the value of the \fCparentKey\fP\&.
.IP "\(bu" 2
\fBsyncbits\fP: are some bits that can be set for every backend\&. The algorithm uses the \fCsyncbits\fP to decide if the key set needs to be synchronised\&.
.PP
.PP
Continue reading \fBwith the error handling\fP\&.