.TH "md_doc_help_elektra-data-structures" 3elektra "Sun May 29 2016" "Version 0.8.14" "Elektra" \" -*- nroff -*- .ad l .nh .SH NAME md_doc_help_elektra-data-structures \- elektra-data-structures(7) -- data structures For an introduction, please \fBread first about elektra classes\fP\&. You might want to read \fBabout architecture first\fP\&. .PP Data structures define the common layer in Elektra\&. They are used to transfer configuration between Elektra and applications, but also between plugins\&. .PP .SS "ADT" .PP Both the \fCKeySet\fP and the interface to metadata within a \fCKey\fP are actually \fBADT\fP (abstract data types)\&. The API is designed so that different implementations of the data structures can be used internally\&. .PP A \fBhash\fP data structure presents a good candidate as alternative data structure, especially for the metadata interface\&. It is believed to be much faster on lookup, but considerably slower on sorted enumeration\&. .PP A \fBAVL tree\fP also serve as a competitor\&. AVL trees are expected to be faster for inserting keys at any place, but may be slower for appending because of the needed reorganisations\&. Their disadvantage is that they need to allocate a large number of small pieces of memory\&. Further investigations, namely implementation and benchmarks, are required to decide\&. .PP A \fBtrie\fP could also be used for lookup of key names\&. It performs well for lookup, but needs more memory allocations\&. .PP Currently the \fCKeySet\fP is implemented as a sorted array\&. It is fast on appending and iterating, and has nearly no size-overhead\&. To improve the lookup-time, an additional \fBhash\fP will be used\&. .PP .SS "ABI compatibility" .PP Application binary interface, or ABI, is the interface to all data structures of an application or library directly allocated or accessed by the user\&. .PP Special care has been taken in Elektra to support all changes within the data structures without any ABI changes\&. ABI changes would entail the recompilation of applications and plugins using Elektra\&. The functions \fC\fBkeyNew()\fP\fP, \fC\fBksNew()\fP\fP and \fC\fBkdbOpen()\fP\fP allocate the data structures for the applications\&. The user only gets pointers to them\&. It is not possible for the user to allocate or access these data structures directly when only using the public header file \fC\fP\&. The functions \fC\fBkeyDel()\fP\fP, \fC\fBksDel()\fP\fP and \fC\fBkdbClose()\fP\fP free the resources after use\&. Using the C++ binding deallocation is done automatically\&. .PP .SS "Meta data" .PP Read \fBhere\fP\&. .PP .SS "KeySet" .PP This subsection describes what has changed between 0\&.7 and 0\&.8 and deals with some general implementation issues\&. .PP .SS "Operations" .PP \fCKeySet\fP resembles the classical mathematical set\&. Operations like union, intersection or difference are well defined\&. In mathematics typically every operation yields a new set\&. Instead, we try to reuse sets in the following ways: .PP .IP "\(bu" 2 A completely new and independent \fCKeySet\fP as return value would resemble the mathematical ideal closely\&. This operation would be expensive\&. Every \fCKey\fP needs to be duplicated and inserted into a new \fCKeySet\fP\&. .PP Such a \fBdeep duplication\fP was only needed in \fC\fBkdbSet()\fP\fP\&. .IP "\(bu" 2 The resulting \fCKeySet\fP is created during the operation, but only a flat copy is made\&. This means that the keys in it are actually not duplicated, but only their reference counter is increased\&. 
.SS "Meta data"
.PP
Read \fBhere\fP\&.
.PP
.SS "KeySet"
.PP
This subsection describes what has changed between 0\&.7 and 0\&.8 and deals with some general implementation issues\&.
.PP
.SS "Operations"
.PP
\fCKeySet\fP resembles the classical mathematical set\&. Operations like union, intersection or difference are well defined\&. In mathematics, typically every operation yields a new set\&. Instead, we try to reuse sets in the following ways:
.PP
.IP "\(bu" 2
A completely new and independent \fCKeySet\fP as return value would resemble the mathematical ideal closely\&. This operation would be expensive\&. Every \fCKey\fP needs to be duplicated and inserted into a new \fCKeySet\fP\&.
.PP
Such a \fBdeep duplication\fP is only needed in \fC\fBkdbSet()\fP\fP\&.
.IP "\(bu" 2
The resulting \fCKeySet\fP is created during the operation, but only a flat copy is made\&. This means that the keys in it are not actually duplicated; only their reference counter is increased\&. This method is similar to the mathematical model\&. Compared with a deep copy it can achieve good performance\&. But all changes to the values of keys in the resulting \fCKeySet\fP affect the original \fCKeySet\fP, too\&.
.PP
\fC\fBksDup(const KeySet *source)\fP\fP produces a new \fCKeySet\fP that way\&. The \fCsource\fP is not changed, as indicated by the \fCconst\fP modifier\&.
.IP "\(bu" 2
The result of the operation is applied directly to the \fCKeySet\fP passed as argument\&. This is actually quite common, but for this situation other names for the operations are more suitable\&.
.PP
For example, a union which changes the \fCKeySet\fP is called \fC\fBksAppend()\fP\fP\&.
.IP "\(bu" 2
A new \fCKeySet\fP is created, but the \fCKeySet\fP passed as parameter is reduced by the keys needed for the new \fCKeySet\fP\&. This is useful in situations where many operations have to be applied in a sequence, reducing the given \fCKeySet\fP until no more keys are left\&. None of the reference counters changes in this situation\&.
.PP
\fC\fBksCut(KeySet *ks, const Key *cutpoint)\fP\fP works that way\&. All keys below the \fCcutpoint\fP are moved from \fCks\fP to the returned key set\&.
.PP
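The sketch below exercises the operations just listed; it is a minimal example with made-up key names and without error handling:
.PP
.nf
#include <kdb.h>

int main (void)
{
    KeySet *ks = ksNew (0, KS_END);
    ksAppendKey (ks, keyNew ("user/sw/apps/abc/current/key1", KEY_VALUE, "v1", KEY_END));
    ksAppendKey (ks, keyNew ("user/sw/apps/abc/current/key2", KEY_VALUE, "v2", KEY_END));

    /* flat copy: the keys are shared, only their reference counters increase */
    KeySet *flat = ksDup (ks);

    /* "union" that modifies ks in place */
    KeySet *other = ksNew (0, KS_END);
    ksAppendKey (other, keyNew ("user/sw/apps/xyz/current", KEY_END));
    ksAppend (ks, other);

    /* move all keys below the cutpoint out of ks into a new key set */
    Key *cutpoint = keyNew ("user/sw/apps/abc", KEY_END);
    KeySet *below = ksCut (ks, cutpoint);

    ksDel (below);
    keyDel (cutpoint);
    ksDel (other);
    ksDel (flat);
    ksDel (ks);
    return 0;
}
.fi
.PP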
.SS "Consistency"
.PP
There are several ways to define consistency relations on key sets\&. For \fBstrict consistency\fP every parent key must exist before the user can append a key to a key set\&. For example, the key set with the keys
.PP
.nf
system
system/elektra
system/elektra/mountpoints
.fi
.PP
would allow the key \fCsystem/elektra/mountpoints/tcl\fP to be added, but not the key \fCsystem/apps/abc\fP, because \fCsystem/apps\fP is missing\&. File systems enforce this kind of consistency\&.
.PP
These semantics, however, are not useful for configurations\&. Especially for user configuration, often only some keys need to be overwritten\&. It is not a good idea to copy all parent keys to the user's configuration\&. For this reason we use a less strict definition of consistency that supports such holes\&.
.PP
We also evaluated a form of \fBweak consistency\fP\&. It avoids adding some unnecessary keys\&. The constraint is that a key can only be added if it has a parent key\&. The constraint does not apply, however, if no other key exists above the key about to be inserted; from that moment on it serves as parent key for other keys\&. With the current implementation of \fCKeySet\fP, however, it is not possible to decide this constraint in constant time\&. Instead, its worst-case complexity would be \fIlog(n) * x\fP, where \fIn\fP is the number of keys currently in the key set and \fIx\fP is the \fIdepth\fP of the key\&. The depth is the number of \fC/\fP in the key name\&. The worst case occurs when the key is inserted without any parent key present\&. For example, with the keys
.PP
.nf
user/sw/apps/abc/current/bindings
user/sw/apps/abc/current/bindings/key1
user/sw/apps/abc/current/bindings/key2
.fi
.PP
weak consistency would allow inserting \fCuser/sw/apps/abc/current/bindings/key3\fP because it is directly below an existing key\&. It would also allow adding \fCuser/sw/apps/xyz/current\fP because it does not have any parent key\&. But it would not allow adding \fCuser/sw/apps/abc/current/bindings/dir/key1\fP\&. The worst-case complexity was found to be too expensive, and hence \fCKeySet\fP has \fBno consistency\fP check at all\&.
.PP
This means any key with a valid key name can be inserted into a \fCKeySet\fP\&. The \fCKeySet\fP was changed so that it is now impossible to append keys without a name\&. \fCksAppendKey(ks, Key *toAppend)\fP takes ownership of the key \fCtoAppend\fP and will delete it if it cannot be appended\&. The caller does not have to free \fCtoAppend\fP: either it is in the key set or it is deleted\&.
.PP
Binary search determines the position where to insert a key\&. The C version of binary search, \fCbsearch()\fP, cannot tell where to insert a key when it is not found, so the algorithm had to be reimplemented\&. Java's \fCbinarySearch()\fP uses a trick to indicate, with a single return value, both where a key was found and where it should be inserted otherwise: when the key is not found, the negative value \fC((-insertion point) - 1)\fP is returned\&. Elektra now uses the same trick internally\&.
.PP
.SS "Internal Cursor"
.PP
\fCKeySet\fP supports an \fBexternal iterator\fP with the two functions \fC\fBksRewind()\fP\fP, to go to the beginning, and \fC\fBksNext()\fP\fP, to advance the \fIinternal cursor\fP to the next key\&. This side effect is used to indicate a position for operations on a \fCKeySet\fP without any additional parameter\&. This technique is convenient, for example, to see which key has caused an error after an unsuccessful key database operation\&.
.PP
Elektra itself only has a few functions to change the cursor of a key set, but these allow the user to compose powerful functions\&. Plugins do that extensively, as we will see later in \fCksLookupRE()\fP\&. The user can additionally write more such functions for his or her own purposes\&. To change the internal cursor, it is sufficient to iterate over the \fCKeySet\fP and stop at the wanted key\&. With this technique, we can, for example, realise lookup by value, by specific metadata and by parts of the name\&. Without an additional index, such operations cannot perform better than a linear iteration key by key\&. For that reason, Elektra's core does not provide such functions\&. The function \fC\fBksLookupByName()\fP\fP, however, uses the more efficient binary search because the array inside the \fCKeySet\fP is ordered by name\&.
.PP
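As a sketch of such a composed function, the following lookup by value is built on \fC\fBksRewind()\fP\fP and \fC\fBksNext()\fP\fP; the helper name \fClookupByValue()\fP is made up for illustration and is not part of Elektra's API:
.PP
.nf
#include <kdb.h>
#include <string.h>

/* Return the first key whose string value equals value, or NULL if
 * no such key exists. The internal cursor of ks is changed. */
Key *lookupByValue (KeySet *ks, const char *value)
{
    Key *current;

    ksRewind (ks);                            /* start at the beginning */
    while ((current = ksNext (ks)) != NULL)   /* advance the internal cursor */
    {
        if (strcmp (keyString (current), value) == 0) return current;
    }
    return NULL;
}

int main (void)
{
    KeySet *ks = ksNew (0, KS_END);
    ksAppendKey (ks, keyNew ("user/sw/myapp/a", KEY_VALUE, "hello", KEY_END));

    Key *found = lookupByValue (ks, "hello");  /* returns the key user/sw/myapp/a */

    ksDel (ks);
    return found ? 0 : 1;
}
.fi
.PP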
.SS "External Cursor"
.PP
An external cursor is an alternative to the approach explained above\&. Elektra provides a limited \fIexternal cursor\fP through the interface \fC\fBksGetCursor()\fP\fP and \fC\fBksSetCursor()\fP\fP\&. It is not possible to advance this cursor\&. The difference to the internal cursor is that the position is not stored within the \fCKeySet\fP\&.
.PP
We considered providing an external cursor for performance reasons\&. But we found out that the speed of iteration is mainly limited by safety checks\&. The investigated methods become slower in proportion to their ease of use and safety\&. With null pointer and range checks, a function is noticeably slower than without them\&. With the same amount of checks, using an external cursor is not much faster than \fC\fBksNext()\fP\fP\&. In a benchmark, an external cursor with checks is only about 10% faster\&.
.PP
But an external cursor directly accessing the array can be much faster\&. Using an unchecked external cursor can be about 50% faster than using the internal cursor with \fBksNext()\fP\&. For this endeavour, Elektra's private header files need to be included\&. Including private header files, however, should not be done lightly, because ABI compatibility is lost on any change of the data structures\&. This means the application or plugin needs to be recompiled whenever any of Elektra's internal structures change\&. We strongly discourage including these header files\&.
.PP
Nevertheless, the overall performance impact for iteration is minimal and should not matter too much\&. Even if only a single \fC\fBkeySetMeta()\fP\fP is used inside the iteration loop, the iteration costs are insignificant\&. Only for trivial actions, such as just changing a variable, counter or marker for every key, do the iteration costs become the lion's share\&. In such situations an \fIinternal iterator\fP yields better results\&. For example, \fCksForEach()\fP applies a user-defined function to every key in a \fCKeySet\fP without any null pointer or out-of-range problems\&.
.PP
.SS "Trie vs\&. Split"
.PP
Up to now, we have discussed external data structures visible to the user of the library\&. The application and plugin programmer needs them to access configuration\&. Last, but not least, we will show two internal data structures\&. The user will not see them\&. To understand the algorithm, however, the user needs to understand them as well\&.
.PP
.SS "Trie"
.PP
A \fITrie\fP or prefix tree is an ordered tree data structure\&. In Elektra, it provides the information to decide in which backend a key resides\&. The algorithm, presented in \fBalgorithm\fP, also needs a list of all backends\&. The initial approach was to iterate over the \fCTrie\fP to get a list of all backends\&. But the transformation of a \fCTrie\fP to a list of backends contained many bugs, caused by corner cases in connection with the default backend and cascading mount points\&.
.PP
.SS "Split"
.PP
So, instead of transforming the trie to a list of backends, we introduced a new data structure called \fCSplit\fP\&. The name \fCSplit\fP comes from the fact that an initial key set is split into many key sets\&. These key sets are stored in the \fCSplit\fP object\&. \fCSplit\fP advanced to become the central data structure for the algorithm:
.PP
.nf
typedef struct _Split Split;

struct _Split
{
    size_t size;
    size_t alloc;
    KeySet **keysets;
    Backend **handles;
    Key **parents;
    int *syncbits;
};
.fi
.PP
The data structure \fCSplit\fP contains the following fields:
.PP
.IP "\(bu" 2
\fBsize\fP: contains the number of key sets currently in \fCSplit\fP\&.
.IP "\(bu" 2
\fBalloc\fP: allows us to allocate more items than currently in use\&.
.IP "\(bu" 2
\fBkeysets\fP: represents a list of key sets\&. The keys in one of the key sets are known to belong to a specific backend\&.
.IP "\(bu" 2
\fBhandles\fP: contains a list of handles to backends\&.
.IP "\(bu" 2
\fBparents\fP: represents a list of keys\&. Each \fCparentKey\fP contains the root key of a backend\&. No key of the respective key set is above the \fCparentKey\fP\&. The key name of \fCparentKey\fP contains the mount point of a backend\&. The resolver writes the file name into the value of the \fCparentKey\fP\&.
.IP "\(bu" 2
\fBsyncbits\fP: are some bits that can be set for every backend\&. The algorithm uses the \fCsyncbits\fP to decide if the key set needs to be synchronised\&.
.PP
A conceptual sketch of this splitting, using only the public interface, is shown at the end of this page\&.
.PP
Continue reading \fBwith the error handling\fP\&.
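.PP
As an illustration of the splitting idea mentioned above, the sketch below divides one key set into per-mount-point key sets using only the public \fC\fBksCut()\fP\fP interface\&. It is a conceptual sketch, not Elektra's internal code, and the mount points are made up:
.PP
.nf
#include <kdb.h>

int main (void)
{
    KeySet *all = ksNew (0, KS_END);
    ksAppendKey (all, keyNew ("system/hosts/localhost", KEY_VALUE, "127.0.0.1", KEY_END));
    ksAppendKey (all, keyNew ("user/sw/apps/abc/current/key1", KEY_VALUE, "v1", KEY_END));

    /* parents[] and keysets[] mirror the corresponding fields of Split */
    Key *parents[2];
    KeySet *keysets[2];
    parents[0] = keyNew ("system/hosts", KEY_END);
    parents[1] = keyNew ("user/sw/apps/abc", KEY_END);

    for (int i = 0; i < 2; ++i)
    {
        /* move everything below the mount point into its own key set */
        keysets[i] = ksCut (all, parents[i]);
    }

    for (int i = 0; i < 2; ++i)
    {
        ksDel (keysets[i]);
        keyDel (parents[i]);
    }
    ksDel (all);
    return 0;
}
.fi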