C-Index/II Home Page


Chapter 7 Advanced Usage of C-Index/II

This chapter discusses the advanced features of programming with C-Index/II.

Intermixing Single and Multi-key Functions

The Multi-Key Functions depend exclusively on the Single-Key Functions for their operation. In this sense, the two types of functions are compatible. Under most circumstances, the only reason you may want to mix single-key routines with multi-key routines is to set the integrity level and flush the virtual memory index buffers. Otherwise, it is advisable to use multi-key and single-key routines with separate files in order to avoid conflicts with the operation of the multi-key routines.

It is entirely possible, however, to mix single-key and multi-key functions in one file, provided that certain rules are followed. These rules allow each kind of function to be used for the tasks to which it is best suited.

The basic rule to remember when mixing single-key and multi-key routines is that the single-key routines may be used freely except for operations that use:

 

  • any key index specified in a multi-key datalist
  • indexes 15-20 (reserved for multi-key use)
  • the csetrec function (which would confuse record numbering)

 

Using Single-Key Functions for Secondary-Key Access

In most cases, the Multi-Key Functions do all that is needed for record management. The Single-Key Functions do have some advantages, however, and for certain applications the extra work involved in using them is well worth the effort.

The single-key functions physically store the data with the keys. For records with one key, it is much faster to store the data record with the key using single-key functions so that reading the data is immediate. Next and previous operations are performed much faster with this method.

For records that are usually retrieved by one "primary" key, or in some cases, by one or more "secondary" keys, the following single-key functions should be used:

 

1. Add the data record with the primary key in index #1. For example, a key value of "JONES" and a record of 0 would be added in the following manner:

 

long recnum; /* for duplicate keys */

int cc; /* return condition code */

 

recnum = cnextrec(&file1);

cc = cdupadd(&file1, 1, "JONES", recnum, datarec, datalen);

 

2. Add the secondary key in index #2. For example, if the record just added has a secondary key value of "NEW YORK" (the city where Jones lives), an entry would be added in index #2 with a key value of "NEW YORK". The data would be the key of index # 1 ("JONES"):

 

cc = cdupadd(&file1,2,"NEW YORK",recnum,"JONES",6);

 

3. When you access the file from index #2, first read the key of "NEW YORK" and then perform a find operation in index #1 with the data portion (which was set up to reflect the key value of "JONES"):

 

char keybuf[10]; /* for found key */

long recnum; /* for found rec # */

char databuf[100]; /* for found data */

 

/* find somebody in new york */

cc = cfind(&file1, 2, "NEW YORK", 0L, keybuf, 10);

if (cc == OK || cc == KEYMATCH) /* found somebody */

{

    recnum = file1.rec; /* need for exact match */

    cc = cfind(&file1, 1, keybuf, recnum, databuf, 100);

}

 

The second cfind call looks in index #1 for the record that matches the key found in index #2. The record number is passed along with the key because it identifies the record in index #1 exactly, even when several entries share the same key value.
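The two-step lookup can be illustrated without the library. The following is a minimal sketch that uses in-memory arrays as stand-ins for index #1 and index #2; the struct entry type and the find_entry and lookup_by_city functions are hypothetical names invented for this illustration and are not part of C-Index/II. In this sketch only, a record number of 0 is treated as "match any record".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one single-key index entry. */
struct entry {
    const char *key;   /* key value */
    long        rec;   /* record number stored with the key */
    const char *data;  /* data stored with the key */
};

/* Return the first entry whose key matches, or NULL if none does.
   In this sketch, rec == 0 means "match any record number". */
const struct entry *find_entry(const struct entry *index, size_t count,
                               const char *key, long rec)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (strcmp(index[i].key, key) == 0 &&
            (rec == 0 || index[i].rec == rec))
            return &index[i];
    }
    return NULL;
}

/* Secondary lookup: the secondary index yields the primary key (as its
   data) and the record number; together they select exactly one entry
   in the primary index. Returns the primary data, or NULL. */
const char *lookup_by_city(const struct entry *primary, size_t np,
                           const struct entry *secondary, size_t ns,
                           const char *city)
{
    const struct entry *s = find_entry(secondary, ns, city, 0L);
    const struct entry *p;

    if (s == NULL)
        return NULL;
    p = find_entry(primary, np, s->data, s->rec);
    return p ? p->data : NULL;
}
```

With two "JONES" entries in the primary array, the record number carried over from the secondary entry is what distinguishes them; the key alone would always select the first one.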

Full Tree Deletion

C-Index/II has the ability to reuse nodes that have been made empty through deletion of keys. For example, this allows the program to add any combination of keys, delete all of the keys, and then re-add any other combination of keys without any wasted space being left in the file.

When files are created or opened, the default mode of the file is to NOT perform full deletion (i.e. this feature is disabled). To invoke full tree deletion in your application, add the following two lines after opening or creating a C-Index file:

 

psp->fulldel = TRUE; /* do full tree deletion */

psp->reuse = FALSE; /* do not reuse record numbers */

 

where psp is the parameter structure pointer. The psp->reuse flag set to FALSE indicates that multi-key record numbers should not be reused. The default is for record numbers of deleted records to be reused as a method for reclaiming data space. With full tree deletion turned on, record reuse reduces the performance and space efficiency of the system.

Byte Flipping

Provision has been made to read and write files on machines whose architecture differs from Intel-type machines. This involves changing the byte order of shorts, ints, and longs as required by the specific machine reading the file. For example, 68000 and RISC processors use high-to-low byte ordering, whereas x86 (Intel) processors use low-to-high byte ordering. C-Index/II data files can be created on any computer and read on any computer. Byte flipping is performed automatically for all internal C-Index data structures, and for short, int, long, float, and double field and key types.

For this processing to work correctly, the C-Index compiler-specific header must define INTELFMT with the correct value:

 

#define INTELFMT 0 /* non-Intel processors (High/Low byte order) */

 

or

 

#define INTELFMT 1 /* Intel processors (Low/High byte order) */

 

When the file is created, the application specifies which format the file will use, regardless of the processor it was created on. The correct choice for file type will usually be NATIVEMODE or the mode that matches the processor, INTELMODE for Intel processors, and NONINTELMODE for non-Intel processors. For more details about creating files, see the descriptions of bcreate and dbcreate in the C-Index/II Reference Guide.
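To make the idea of byte flipping concrete, here is a minimal sketch of reversing the byte order of 16-bit and 32-bit values. The function names (flip16, flip32, host_is_low_high, from_low_high32) are hypothetical and are not C-Index/II routines; the library performs the equivalent conversion internally.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the two bytes of a 16-bit value. */
uint16_t flip16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t flip32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 1 on a low/high (Intel-style) host, 0 on a high/low host. */
int host_is_low_high(void)
{
    uint16_t probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Convert a 32-bit value that was stored low/high in a file into the
   byte order of the host that is reading it. */
uint32_t from_low_high32(uint32_t stored)
{
    return host_is_low_high() ? stored : flip32(stored);
}
```

Flipping is its own inverse, so the same routine serves for both reading and writing a file whose byte order differs from the host.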

Faster Addition of Pre-Sorted Keys

C-Index/II is optimized for adding Single-Key entries in sorted order, such as when mastering a large read-only file (like a CDROM). In this situation it uses fewer key comparisons for insertion by employing a modified binary search of the index nodes instead of a linear search. By pre-sorting single-key entries before addition, the application will add entries many times faster than when adding keys in random order.
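One way to pre-sort string keys before a batch load is the standard qsort routine. This is only a sketch: presort_keys and compare_keys are hypothetical names, and it assumes that strcmp ordering matches the ordering of the target index, which should be verified for the key type in use.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparison callback for an array of C string pointers. */
static int compare_keys(const void *a, const void *b)
{
    const char *const *ka = (const char *const *)a;
    const char *const *kb = (const char *const *)b;
    return strcmp(*ka, *kb);
}

/* Sort the key pointers in place. The caller then adds the entries to
   the index in array order, so each addition lands at the end. */
void presort_keys(const char **keys, size_t count)
{
    qsort(keys, count, sizeof keys[0], compare_keys);
}
```

After sorting, the application would loop over the array and add each key with the usual single-key add call.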

Record Counting Primitives

Two new single-key routines assist in keeping track of the number of records that are stored in the file. These primitives are used by the Multi-Key routines to maintain the number of active records in a file.

 

/* set the count of active records in a file */

int csetcnt(psp, count)

CFILE *psp; /* file parameter structure */

long count;

 

/* return the count of active records in a file */

int cgetcnt(psp)

CFILE *psp;

 

The csetcnt function sets a value in the header which can be read using the cgetcnt function. The cgetcnt function returns the record count value in the psp->rec variable.

Setting the Multi-Key Record Number

Normally C-Index/II assigns a record number to each record as it is added with dadd. The application can also override this automatic assignment. Before calling dadd, assign the desired record number to the psp->setdrec variable. This value is cleared when the file is opened and after each dadd operation. Care must be taken to ensure that the record number is unique; otherwise dadd returns a FAIL error. The record number may be any number greater than zero. Negative and zero values are ignored.

Multiple Root Nodes

The default action of C-Index/II is to put all index (and data) information into a single B+Tree structure. This is satisfactory for most simple uses of C-Index. However, when an application starts making more significant demands on the index system, performance and space utilization can degrade. A unique feature of C-Index/II is the ability to specify that more than one B+Tree structure will be used. These additional tree structures are referred to as "multiple roots" or "alternate roots" because each tree structure has its own "root node" (top of the tree).

The following functions can be used to manage index files containing multiple root nodes (separate tree structures in a single file):

 

addroot add alternate root node

droproot drop alternate root node

 

The addroot function is called any time after creating the file. One or more indexes may be placed into an alternate root, provided that there are no entries in that index at the time of calling addroot. If a call to addroot is made when key entries are already in the specified index, those entries will be "lost". In fact, they remain in the file, but cannot be accessed.

The droproot function is called at any time to remove an alternate root tree structure from the file.

In multi-user applications, setting the SEMAINDEX to a separate tree structure will improve the performance of locking operations. This is because the number of locks on a file tends to be very small compared to the number of records in the file. By having its own tree, semaphore routines do not need to search as far to find semaphore keys.

When files become very large, placing each index in a separate tree structure may improve performance by improving the buffering of root nodes, and possibly reducing the height of the tree.

When indexes tend to have entries added in order, to the end of the index (such as pre-sorted single-key entries added by a batch process), the space utilization will be optimized when each index is stored in its own tree structure. C-Index/II recognizes addition to the end of the tree structure as being a special case and adjusts its searching and splitting routines to increase insertion performance and space utilization.

When using single-key routines, it may be desirable to remove all the entries in an index in one fast operation. This can be done by placing the index into a separate root tree structure. To remove the index, call the droproot function. This removes all entries and space allocation for this tree structure. Compared to individually deleting each entry, this is a very fast operation.

Fast Next Key Retrieval

Normally an application would use the cnext or dseq functions to locate the next key value in an index. The cnextrep and cnextrep2 functions allow reading a series of next keys very quickly. This is useful for situations requiring fast searches through a group of contiguous keys, or for filling a user interface list box with a set of selections.

The calling sequence is similar to cnext, with the addition of a callback function to control how many next operations will be performed in repetition:

 

int cnextrep(psp, keyn, data, dlen, nextproc);

 

or

int cnextrep2(psp, keyn, data, dlen, nextproc);

 

where psp is the CFILE parameter structure pointer, keyn is the single-key index number, data is a pointer to the data buffer, dlen is the length of the data buffer, and nextproc is a callback function.

 

The calling sequence of the callback function is:

 

int nextproc(psp, keyn, data, dlen);

 

where the parameters passed to the callback function are the same as those passed to the cnextrep or cnextrep2 function. The data found in the cnextrep function is passed to the callback function in the specified work buffer. The key value of the located entry is in the psp->key variable, just as if a call to cnext had been made. The callback function must return non-zero if the processing should continue, or zero if the processing should stop.

The cnextrep and cnextrep2 functions will return the same return codes as cnext. For example, if the function encounters the end of the index, it will return FAIL. Or, if the callback function stops processing in the middle of the index, it will return CCOK.

There are two differences between cnextrep and cnextrep2. The callback function called by cnextrep may not call any C-Index/II functions. Doing so may result in significant corruption of the file. The cnextrep2 function does not have this limitation. Any C-Index/II function (except cnextrep2, which is not reentrant) may be called from the cnextrep2 callback function. This allows for other reads and writes to the file while inside the callback function. The other difference is that cnextrep2 passes information to the callback starting with the current key location, instead of the next key location. This simplifies many uses where cnextrep requires special workarounds to process the first of a sequence of keys. In most cases you will want to use cnextrep2.
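The callback contract (return non-zero to continue, zero to stop) can be sketched independently of the library. In the following illustration, walk_keys, visit_fn, fill_state, and fill_list are hypothetical names, and a sorted array stands in for an index; the real nextproc callback receives (psp, keyn, data, dlen) rather than the simplified arguments used here. Passing the current position as the start index mimics the cnextrep2 behavior, and passing the position after it mimics cnextrep.

```c
#include <stddef.h>

/* Hypothetical callback type for this sketch only. */
typedef int (*visit_fn)(const char *key, void *ctx);

/* Visit keys[start..count-1] until the callback returns zero.
   Returns 1 if the callback stopped the walk, 0 if the end of the
   array was reached first. */
int walk_keys(const char *const *keys, size_t count, size_t start,
              visit_fn visit, void *ctx)
{
    size_t i;
    for (i = start; i < count; i++) {
        if (visit(keys[i], ctx) == 0)
            return 1;
    }
    return 0;
}

/* State for the example callback: how many entries have been seen and
   how many the caller wants (for example, the rows in a list box). */
struct fill_state {
    int seen;
    int limit;
};

/* Example callback: count each entry and stop once the limit is hit. */
int fill_list(const char *key, void *ctx)
{
    struct fill_state *st = (struct fill_state *)ctx;
    (void)key;                     /* a real callback would store the key */
    st->seen++;
    return st->seen < st->limit;   /* non-zero: continue; zero: stop */
}
```

The state structure is passed through a context pointer here for convenience; with the documented four-argument callback, an application would have to keep such state elsewhere.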

 

Write Queue File Locking

A run-time option controls the order in which processes access a file. After opening the file, the application can set the "writeque" variable in the CFILE parameter structure to TRUE. This tells C-Index to use a queue of write locks to gain access instead of the default single write lock. The write queue technique forces each process to wait for other processes ahead of it in a queue of lock bytes. The process finds the first byte in the write queue (in the FLOCKBYTE region of the header area) which has not been locked. It then repeatedly tries to lock the next higher byte in the write queue. When the next higher byte can be locked, it frees its current lock position. When the process is able to lock the first byte in the write queue, it performs its access to the file.

The benefit of this approach is that it improves the fairness of access to the file under some conditions. For some operating systems this prevents processes from being locked out of the file for long periods of time under conditions of high file access contention. This is generally true for MSDOS-based LAN systems.

With the default C-Index behavior running under MSDOS, each process will try to lock the single lock byte once every second. If there is high contention to the file, it is possible for a single process to dominate access to the file because it will be able to unlock the file, and then relock the file before any other process has time to retry the lock. The write queue reduces that problem by forcing locking contention to be limited to checking the process ahead in the queue, and by doing the lock test repeatedly, instead of once every second.

The write queue method does not improve performance in all multi-user conditions, and is turned off by default. You need to carefully test this feature for your application to ensure that it provides the desired performance improvements for a particular environment.

User Defined Header Area

Ten bytes have been set aside in the C-Index header area for use by applications. The offset is specified by the define USRBYTE. Applications can write, read and lock bytes in this area of the header without affecting operation of the system. Applications should not read or write any other part of the file directly.

 

To access an open C-Index file, use the file descriptor variable defined in the parameter structure, "psp->fd". For cross-platform compatibility, use the file operations provided in the compiler specific sources (such as ciwrite and ciread in \ci2\msc6\cndxmsc.c or \ci2\unix\cndxunix.c).

Maximum Number of Users

The maximum number of users (processes) that may access a file at any given time is defined by MAXUSERS, which has a default value of 234. In general, C-Index is not designed to efficiently handle over approximately 20 users with light to moderate levels of file access, even though it is possible to have a larger number of users open a file at once. The more users that have opened a file, the longer it will take for a new user to open the file. More importantly, file access speed is reduced when there is significant contention for reading and writing the file. You will need to experiment with your application and computing environment to determine the realistic limit on the number of users who can access the file simultaneously.

Node Size

With changes to the source code, the node size may be changed to any arbitrary size that is at least 1K. The typical application of this feature is in CDROM access where having larger nodes improves access time from the extremely slow optical disk media. To change node size, change the define statement in the compiler specific header file which reads as follows:

 

#define NODESZE 1024

 

For example, to have 8K nodes the define would be changed to read as follows:

 

#define NODESZE 8192

 

Fixed Binary Data Type

C-Index/II includes one special data type to represent fixed length binary data. The FIXIND and FIXFLD types are a fixed length value with the length set when compiling the library. The default value is 6 bytes. The sorting order is controlled by the compiler specific functions cikeycmp and custincrkeyval. The default sorting order assumes that the binary data is a series of unsigned bytes with decreasing significance (high order byte first).

To change the length of the fixed binary value, add a define for FIXINDLEN to the compiler specific header file (such as cndxmsc.h or cndxtc.h). The following example sets the fixed length to 100 bytes:

 

#define FIXINDLEN 100 /* 100 byte fixed length */

 

To change the sorting order of a fixed binary key value, change the code in the compiler specific C file (such as cndxmsc.c or cndxtc.c). The code in the cikeycmp and custincrkeyval functions which refers to FIXIND key type must be changed to reflect the desired type of binary data.
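The default ordering described above (unsigned bytes, high-order byte first) is the same ordering that the standard memcmp function produces. The sketch below shows a comparison of that kind; FIXLEN and fixed_key_compare are hypothetical names, and this is not the library's cikeycmp code.

```c
#include <string.h>

#define FIXLEN 6   /* hypothetical name; mirrors the documented 6-byte default */

/* Compare two fixed-length binary keys as unsigned bytes with the most
   significant byte first. Returns a value less than, equal to, or
   greater than zero, in the manner of memcmp. */
int fixed_key_compare(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, FIXLEN);
}
```

A different binary layout, such as a low-order-first integer, would need a comparison that walks the bytes in the opposite direction.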

 

ReadShare

ReadShare is an optional compile-time feature designed to improve performance of multi-user access. When multiple processes are reading from a file, they all have access to the file simultaneously when this feature has been enabled. Without ReadShare, only one process has access to the file when reading or writing. With ReadShare, C-Index/II is able to read from the file without locking out other processes, provided that no write operation is pending. Since most applications spend the majority of time reading from a file, this can dramatically improve access times.

The ReadShare feature is not enabled by default in the supplied C-Index/II sources. This is because it requires special processing to handle critical errors. For operating systems which require application specific critical error handling (such as MSDOS), the code supplied in the initcindex function (of the compiler specific code module) will initialize the critical error handling as required for the test routines. However, this may not be the type of error handling desired for your application.

To enable the ReadShare feature, you will need to review the critical error handling code and enable the READSHARE define in the compiler specific header file.

Additionally, it is essential that all applications that access a file use the same method of locking. You cannot mix ReadShare and non-ReadShare applications when accessing a file. For this reason it may be desirable to modify the C-Index/II header to enforce access with a new version of the library. This can be accomplished by modifying the crtnwhdr routine in \ci2\src\cindexmu.c and the chkhdr routine in \ci2\src\read.c. These functions set and check bytes in the header to indicate that it is a valid C-Index/II file. One approach is to change the last check byte in the header from a value of '1' to a value of '2'. This will also require writing a simple conversion utility to update this byte in existing files.

SpeedRead

SpeedRead is a runtime option available to the application to enhance performance. When SpeedRead is turned on, C-Index/II will try to satisfy any read operations from information that is already in the local virtual memory disk buffers, without performing any file locks or checking if the file has been changed on the server. If the entire read operation cannot be processed from disk buffers, it will retry using the usual method (i.e. it will verify the status of the buffers, lock the file for reading, and update the buffers in memory as required).

SpeedRead improves the performance of cnext, cprev, cgetcur, cfind, and dseq. Although the main benefit of SpeedRead is for multi-user applications, it also improves the performance of single-user file access.

An example of the performance improvement can be shown using the testman manual test utility in EXCL mode. With 10,000 entries of 10 bytes each, cnext with SpeedRead turned on is three times faster than cnext with SpeedRead turned off. Alternatively, cnextrep and cnextrep2 are nine times faster than cnext with SpeedRead turned off. Since cnextrep and cnextrep2 keep the file locked while reading, they require extra care when used in a multi-user application. With SpeedRead, however, the application does not need to be concerned with locking out other users, since it does not lock the file at all in most cases. The performance benefit for multi-user access will depend on a number of factors, including the mix of C-Index/II calls being made and the number of users accessing the file.

For more information, consult the documentation in the Reference Guide for the functions cispeedon and cispeedoff.

Relative Key Positioning

Usually applications need to retrieve records by exact key values. In some cases, however, it may be required to locate a key based on its relative position in the index. Two functions provide the ability to work with indexes by relative key position. The cigetrel function returns a value indicating the relative position of the current key in an index. It is useful for showing relative completion of tasks that traverse an index, such as printing of reports. The cisetrel function will set the current key position to a key which matches the relative key position specified. These functions can be used together to manage scroll bar elevators in GUI based applications.
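As an illustration of the scroll bar use, the sketch below converts between an entry's ordinal position in an index and a scroll thumb position. The functions rel_to_scroll and scroll_to_rel are hypothetical, and the sketch assumes the relative position is available as a 0-based ordinal out of a known entry count; the actual units used by cigetrel and cisetrel are described in the Reference Guide and may differ.

```c
/* Map a 0-based ordinal position among `count` entries to a scroll
   thumb position in the range 0..range. Assumes ordinal * range does
   not overflow a long. */
long rel_to_scroll(long ordinal, long count, long range)
{
    if (count <= 1)
        return 0;
    return (ordinal * range) / (count - 1);
}

/* Inverse mapping: convert a thumb position in 0..range back to the
   nearest 0-based ordinal among `count` entries. */
long scroll_to_rel(long thumb, long count, long range)
{
    if (count <= 1 || range <= 0)
        return 0;
    return (thumb * (count - 1) + range / 2) / range;
}
```

An application would call the first function after reading the current relative position, and the second when the user drags the thumb, before repositioning in the index.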

Deleting Fields in a Datalist

For many applications a datalist does not require modification after it has been used to add records to the C-Index/II file. In some applications, however, fields need to be deleted as a result of changes in the application. In addition, deleting a field will allow special handling of a record.

Deleted fields are not included in records added or updated. When a record is updated, the system also determines if the deleted field of the existing record was included in an index. If so, the key value in that index is deleted.

To mark a field as deleted, logically OR the definition DELETEDFLD with the value for fldtype in the datalist. For example, if a field in the datalist is set as follows:

datalist.fldtype = SEGFLD;

 

then deleting this field from the datalist would be:

datalist.fldtype = SEGFLD | DELETEDFLD;

 

This feature has a number of applications. The most obvious is to remove fields from use as they are not needed in the application. This requires setting psp->checkflg to FALSE (so that C-Index/II will not verify that the datalist is the same as when the record was written to disk). That way new fields can be added to the end of the datalist with old fields being removed from the records by marking them as deleted. Never simply remove fields from the middle of a datalist as this will cause the field number correspondence with later fields to be incorrect (since field numbers are assigned based on position in the datalist).

Another use is to allow more flexibility in how a record is indexed. First, create several fields as segmented key fields and reserve them for future use by marking them as deleted. At the time of calling dadd or dupdate the application can turn on and off the segmented key fields depending on the nature of the record, allowing a record from a single datalist to be indexed differently depending on its content. Since segmented key fields are virtual fields, this changing of the datalist does not affect dread. On dupdate your application can set the pattern of deleted segmented key fields to reflect changes in the nature of the record. The ddelete function is also not affected by the changes to deleted fields, since it keeps track of which fields are indexed using information in the record itself.
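Turning a field on and off amounts to setting and clearing a flag in its fldtype value. The following sketch shows that bit manipulation with stand-in constants: DEMO_SEGFLD and DEMO_DELETEDFLD carry hypothetical values chosen for this example, the helper functions are not part of C-Index/II, and the sketch assumes that DELETEDFLD occupies bits that no field type value uses.

```c
/* Hypothetical values for illustration only; the real SEGFLD and
   DELETEDFLD definitions come from the C-Index/II headers. */
#define DEMO_SEGFLD      0x0007
#define DEMO_DELETEDFLD  0x0100

/* Mark a field type as deleted (safe to apply more than once). */
int mark_deleted(int fldtype)
{
    return fldtype | DEMO_DELETEDFLD;
}

/* Restore a field type to active use by clearing the deleted flag. */
int clear_deleted(int fldtype)
{
    return fldtype & ~DEMO_DELETEDFLD;
}

/* Return non-zero if the field type carries the deleted flag. */
int is_deleted(int fldtype)
{
    return (fldtype & DEMO_DELETEDFLD) != 0;
}
```

Before calling dadd or dupdate, an application could apply mark_deleted or clear_deleted to the fldtype of each reserved segmented key field according to the content of the record.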

Cross-Platform Files

C-Index/II allows for applications to process information in a file even if the file was created for a different computing platform. A C-Index/II file can even be shared between two processes running simultaneously on two platforms. This feature does require that care be taken in developing your application.

C-Index/II will automatically convert internal data to accommodate byte orders which differ between platforms. In addition, C-Index/II will convert byte orders for numeric fields in a multi-key datalist. All of this is transparent to your application. It is important to avoid using an int data value. C-Index/II does not perform any cross-platform conversion between 16 bit and 32 bit int values because of truncation problems. Use short or long field and key values instead, which will be converted correctly.

The internal byte order of a file is determined when it is created (using bcreate or dbcreate). The choice of byte-order format can affect performance. Although disk i/o activity dominates the processing time for C-Index/II operations, performance will be slightly faster when the byte order of the file matches the byte order of the computer processing the file. Experimentation with different byte orders will help determine the optimum choice for your particular computing environment.

Block File Extend Feature

When C-Index/II finds that a file needs to be extended, it does this by writing blank nodes to the end of the file. The behavior in the past was to write a minimum of 1 node (1K), with the actual number of nodes written determined by a formula to prevent an operation from failing in the middle because of running out of disk space.

 

With Release 5.0, this remains the default behavior. However, a new option is available to improve performance in some applications. C-Index/II will now extend by a minimum number of nodes as specified by the parameter structure variable "minextend". This value must be set in the psp each time the file is opened. For example, you may set psp->minextend to 50. This will cause the file to be extended by a minimum of 50 nodes whenever the file needs to be enlarged. This reduces the time spent telling the operating system to update allocation tables each time the file is extended by a small amount.
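The effect of a minimum extension can be sketched as simple arithmetic. The library's actual formula for the number of nodes needed is not given here, so the sketch below takes that number as an input and only applies the minimum; nodes_to_extend, bytes_to_extend, and DEMO_NODESZE are hypothetical names.

```c
#define DEMO_NODESZE 1024L   /* mirrors the documented default node size */

/* Number of nodes to append: at least `needed`, and never fewer than
   `minextend`. A minextend below 1 falls back to one node. */
long nodes_to_extend(long needed, long minextend)
{
    if (minextend < 1)
        minextend = 1;
    return needed > minextend ? needed : minextend;
}

/* The same extension expressed in bytes. */
long bytes_to_extend(long needed, long minextend)
{
    return nodes_to_extend(needed, minextend) * DEMO_NODESZE;
}
```

With a minimum of 50 nodes and 1K nodes, a request that needs only 3 nodes would grow the file by 51,200 bytes, so the next several extensions would not require another allocation.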

This feature can be disabled by defining NO_BLOCK_EXTEND in the compiler specific header. To change the default minimum number of nodes to extend data files, define MINDATAEXTEND in the compiler specific header file.

To change the default minimum for image backup files, define MINIMGEXTEND.

For example, to set the default minimum extension of the data file to 50 nodes, define the following:

 

#define MINDATAEXTEND 50 /* extend file by 50 nodes */

 




www.triosystems.com © Copyright 1996 - 1999 Trio Systems LLC

C-Index/II User Guide © Copyright 1983-1997 Trio Systems LLC

User Guide Revision Date: 5/2/96

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Trio Systems LLC.

Trio Systems is a registered trademark of Trio Systems LLC. C-Index, C-Index/II, SpeedRead, ReadShare and PowerFail Protection are exclusive trademarks of Trio Systems LLC.