Chapter 2 Using the Multi-key Functions
This chapter provides a general discussion of the multi-key functions of C-Index/II. You must read this chapter before using these functions. A detailed description of each function is provided in the C-Index/II Reference Guide.

Description of the Multi-Key Functions

The multi-key functions automate the process of managing multi-keyed records. These functions use an in-memory data definition to create the keys and data record. The data definition includes key descriptions and pointers to each field. The functions do all the work of building and extracting records, as well as error checking during the multi-key operations. The functions are easy to use and consistent in their syntax. When you have learned to use them effectively, you will find that they satisfy most of your data management needs. If a data storage system different from that provided by the multi-key functions is needed, however, the lower level single-key functions may be suitable. The multi-key functions should be tried first, as most applications can be easily handled by them. Features of the multi-key functions include:
A concise set of powerful functions
Automatic building of keys and records from data definition
Variable length keys and data
Storage of keys and data in same file, reducing open files
Full error checking during multi-key adding
Fields at any location in memory
Multiple record types allowed in same file
Deleted record space reclaimed automatically
An Example of Usage: A Name and Phone List

The best way to describe how to use the multi-key functions is to provide an example. We will progressively show the basic structure of a program which uses these functions. This example is kept extremely simple for clarity. See the example program, example.c, on the C-Index/II diskette for a more complex example of how to use the multi-key routines.

The example program will maintain a name and phone number database. The last name will be used as the key to search through the file. The cleanest way to handle data records in C is to define a structure that represents the data record. The structure can then be used to supply the data to be added, and to receive the data when reading. The structure for this example is:
struct nap {                 /* name and phone data structure */
    char lastname[10];       /* last name, key */
    char firstname[20];      /* first name */
    char phone[20];          /* phone number */
};
All the functions in the following examples use the above definition. Next we must declare a structure of this type to be used:
struct nap naprec;
We now can use naprec for input and output of data records of this type. In C-Index/II, the default key type is a standard C language zero terminated string, that is, a series of characters ending with a final byte of binary zero. In this example we use the last name field for the only key to access the records.
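Because every field in naprec is a fixed-size, zero terminated string, data copied into it must fit and must keep its terminating zero byte. The following is a minimal sketch of one way to do that; set_field and fill_nap are hypothetical helper names introduced here for illustration and are not part of C-Index/II.

```c
#include <assert.h>
#include <string.h>

struct nap {                 /* same layout as the chapter's example */
    char lastname[10];       /* last name, key */
    char firstname[20];      /* first name */
    char phone[20];          /* phone number */
};

/* Hypothetical helper (not a C-Index/II function): copy src into a
 * fixed-size field, truncating if needed and always leaving a
 * terminating zero byte. Returns 1 if src fit, 0 if it was truncated. */
int set_field(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dstsize > 0);
    len = strlen(src);
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);   /* keep room for the terminator */
        dst[dstsize - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);           /* copies the terminator too */
    return 1;
}

/* Hypothetical helper: fill a whole record. Returns 1 only if every
 * field fit without truncation. */
int fill_nap(struct nap *rec, const char *last,
             const char *first, const char *phone)
{
    int ok = 1;

    ok &= set_field(rec->lastname,  sizeof rec->lastname,  last);
    ok &= set_field(rec->firstname, sizeof rec->firstname, first);
    ok &= set_field(rec->phone,     sizeof rec->phone,     phone);
    return ok;
}
```

Using sizeof on each member keeps the copy limits in step with the structure definition, so a later change to a field size does not leave a stale constant behind.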
NOTE: C-Index/II also supports more complex key types, including numeric (int, long, double, etc.), binary, custom binary, and "segmented" (concatenated) keys. This is a more complex feature and is discussed in detail in Chapter 8, "Advanced Key Types."

A datalist is a description of the data record to be used. The purpose of the datalist is to provide all the needed information for data record reading, writing, and deleting. The datalist defines which fields in a record will be keys, and what type the fields and keys are. It also defines maximum field lengths, and pointers to each field. Typically, it points to a data structure for the record in memory, although the individual fields of a record may be stored anywhere in memory.

Physically, a datalist is an array of structures, each structure being a description of an individual field and/or key in the data record. The individual structure is of type DATALIST and its definition looks like this:
struct flddef {
    char  fldtype;     /* type of data in field */
    char  keytype;     /* type of key */
    char  dupflag;     /* duplicate or unique key */
    short fldlen;      /* maximum length of field */
    char  fldindex;    /* index (key) number */
    char *fldptr;      /* pointer to (& of) field */
};

typedef struct flddef DATALIST;

To declare a datalist for a data record with 3 fields (as in our example), the statement:
DATALIST naplist[4];
creates an array of 4 entries. We use one more than the number of data fields because the last entry is used for specifying the end. This is accomplished by specifying:
naplist[3].fldtype = ENDLIST;
and will be explained in more detail later. Once the datalist has been created, it must be initialized with the appropriate values in order to correctly describe the data record. The simplest method is to use one function which is called at the beginning of the program. The function to initialize could appear as follows:
struct nap naprec;           /* globally defined */
DATALIST naplist[4];

initdlist()                  /* initialize nap list */
{
    /* initialize last name field */
    naplist[0].fldtype  = STRINGFLD;
    naplist[0].keytype  = STRINGIND;
    naplist[0].dupflag  = DUPKEY;
    naplist[0].fldlen   = 10;
    naplist[0].fldindex = 1;
    naplist[0].fldptr   = naprec.lastname;

    /* initialize first name field */
    naplist[1].fldtype  = STRINGFLD;
    naplist[1].keytype  = NONKEY;
    naplist[1].dupflag  = NONKEY;
    naplist[1].fldlen   = 20;
    naplist[1].fldindex = 0;
    naplist[1].fldptr   = naprec.firstname;

    /* initialize phone field */
    naplist[2].fldtype  = STRINGFLD;
    naplist[2].keytype  = NONKEY;
    naplist[2].dupflag  = NONKEY;
    naplist[2].fldlen   = 20;
    naplist[2].fldindex = 0;
    naplist[2].fldptr   = naprec.phone;

    /* signal end of list */
    naplist[3].fldtype  = ENDLIST;
}
The following is a brief description of the above example:
naplist[0].fldtype = STRINGFLD;
Indicates that the first field is a null terminated C string.

naplist[0].keytype = STRINGIND;
Declares that the field value will be indexed using a null terminated C string key.

naplist[0].dupflag = DUPKEY;
Declares that duplicate name keys are allowed. With a field that is a unique identifier, use the UNQKEY type.

naplist[0].fldlen = 10;
Indicates that the maximum length of the field is 10, and is used to prevent writing past the end of the field buffer. This helps to avoid many common and hard to find C errors generated by memory corruption. The length is the size of the buffer used to hold the string field, including the null terminator.

naplist[0].fldindex = 1;
Tells C-Index/II to store the name key in index 1. When searching for a record by name key (using dfind), the key number indicates which index of keys to search.

naplist[0].fldptr = naprec.lastname;
Tells C-Index/II where the field is (when adding a record) and where to put the found field (on reading a data record).

naplist[1].fldtype = STRINGFLD;
Declares that the second field is a null terminated C string.

naplist[1].keytype = NONKEY;
Declares that the field is a non-key field.

naplist[1].dupflag = NONKEY;
Although it is a non-key field, this statement is required for integrity checking. C-Index/II always checks the datalist for valid types to ensure that each field has been specified correctly.

naplist[1].fldindex = 0;
Again, although this is a non-key field, this statement is another check on the definition of the list.
Also required:
naplist[2].fldtype  = STRINGFLD;
naplist[2].keytype  = NONKEY;
naplist[2].dupflag  = NONKEY;
naplist[2].fldlen   = 20;
naplist[2].fldindex = 0;
naplist[2].fldptr   = naprec.phone;
These lines do the same thing as the naplist[1] fields, with the field pointer set to the naprec.phone variable.
naplist[3].fldtype = ENDLIST;
Signals the end of the datalist. You MUST have this declaration in your datalist, or the program will work incorrectly or return an error indicating an invalid datalist.

How C-Index/II Uses The Datalist

The multi-key routines manage variable length records. Only essential information in the data record is stored on disk. Null fields (or 0 fields for numeric types) are not physically stored at all. This can greatly increase the space efficiency for storing data records.

C-Index/II also stores very little of the datalist definition in the record. Because of this fact, C-Index/II requires that you always use the same datalist for reading a record that was used for adding or changing a record. A checksum of the datalist used on adding a record is stored with the record on disk, and this value is checked against the supplied datalist on reads. If the checksums do not match, an error is returned on reading the record.
Before C-Index/II adds or reads a data record, it must have a buffer in which to build the record. On opening or creating a file, the program specifies the location and size of a record buffer. To determine the maximum size of the data buffer required for any data structure, the following formula should be used:
bufsize = (max length of all fields) + (7 bytes overhead) + ( (number of fields) * 2) + ( (number of binary fields) * 2)
This would mean for our example:
bufsize = (10 + 20 + 20)    /* max fld length */
        + 7                 /* record overhead */
        + 3 * 2             /* field overhead */
        + 0 * 2             /* binary overhead */
bufsize = 50 + 7 + 6 + 0
bufsize = 63
Thus, a buffer of at least 63 bytes is necessary. This can be accomplished by:
char workbuf[100];
Notice that there are 37 extra bytes allocated for workbuf. It is advisable to add a few extra bytes to compensate for possible miscalculation. C-Index/II checks that it does not overflow the buffer, but it must use the length that is specified. If this length is wrong and the buffer overflows, the program will probably malfunction in unpredictable ways. If in doubt, be generous in the allocation of buffer space.
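The buffer size formula above can also be evaluated in code rather than by hand. The following is a minimal sketch; calc_bufsize and the three macro names are hypothetical names chosen for this illustration, while the constants 7, 2 and 2 are taken directly from the formula.

```c
#include <assert.h>
#include <stddef.h>

#define RECORD_OVERHEAD 7   /* bytes of overhead per record */
#define FIELD_OVERHEAD  2   /* bytes of overhead per field */
#define BINARY_OVERHEAD 2   /* extra bytes per binary field */

/* Hypothetical helper (not a C-Index/II function): apply the chapter's
 * formula to an array of maximum field lengths.
 *   fldlens  maximum length of each field
 *   nfields  number of fields in the record
 *   nbinary  how many of those fields are binary fields */
size_t calc_bufsize(const size_t *fldlens, size_t nfields, size_t nbinary)
{
    size_t total = RECORD_OVERHEAD;
    size_t i;

    assert(fldlens != NULL || nfields == 0);
    assert(nbinary <= nfields);

    for (i = 0; i < nfields; i++)
        total += fldlens[i];             /* sum of maximum field lengths */
    total += nfields * FIELD_OVERHEAD;
    total += nbinary * BINARY_OVERHEAD;
    return total;
}
```

For the name and phone record (lengths 10, 20 and 20, no binary fields) this yields the same 63 bytes as the hand calculation. The result is a minimum, so any extra margin is still added when the buffer is declared.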
At this point, the main function of the sample program can now be produced:
/* example of a name and phone list */
#include "cndx.h" /* required header file */
struct nap {
    char lastname[10];       /* last name, key */
    char firstname[20];      /* first name */
    char phone[20];          /* phone number */
};

struct nap naprec;           /* declare data record */
CFILE napfile;               /* psp */
FIELD naplist[4];            /* our datalist */
char workbuf[70];            /* work buffer */

int main()
{
    initdlist();             /* see definition above */

    /* main body of code goes here */

    return 0;
}
The above statements are all that is necessary to begin to use the file in this example. This preparatory work must only be written once, and it contains all the information C-Index/II must know about your file. With C-Index/II, it is not necessary to run a separate utility program to define your file information. There are 9 basic file operations needed to maintain a C-Index/II file:
dbcreate    create file
dbopen      open file
mclose      close file
dadd        add record
dfind       find record by key
dseq        find record sequentially
dread       read record that has been found
dupdate     update record
ddelete     delete record
Additional multi-key functions are provided for multi-user file access; these are discussed in Chapter 6, "Multi-User Functions." With only one flexible function for each operation, it is simple to access C-Index/II files. In addition, writing programs with C-Index/II multi-key routines encourages a consistent programming style.

Creating, Opening and Closing a File

The three functions to create, open and close a file are: dbcreate, dbopen, and mclose. The dbcreate function is defined as follows:
int dbcreate(psp, filename, workbuf, buflen, sharemode, bytemode, indexlist)
CFILE *psp;            /* pointer to psp */
char *filename;        /* pointer to name of file to open */
char *workbuf;         /* pointer to work buffer */
int buflen;            /* length of work buffer */
int sharemode;         /* EXCL or SHARED file access */
int bytemode;          /* type of byte ordering in file */
NDXLIST *indexlist;    /* pointer to index type list */
The file must not already exist or there will be an error, so be sure to remove an existing file before creating a new one. The usage for dbcreate looks like:
ret = dbcreate(&napfile, "nap.dat", workbuf, 63, EXCL, NATIVEMODE, NULL);
The file is now open and ready to access. If the file has already been previously created but is not currently open, the dbopen function is used to open the file. It is defined as follows:
int dbopen(psp, filename, workbuf, buflen, sharemode, readmode, indexlist)
CFILE *psp;            /* pointer to psp */
char *filename;        /* pointer to name of file to open */
char *workbuf;         /* pointer to work buffer */
int buflen;            /* length of work buffer */
int sharemode;         /* EXCL or SHARED */
int readmode;          /* read mode flag */
NDXLIST *indexlist;    /* index type list */
Remember, in order to open a file, it must have been previously created by C-Index/II. The usage for dbopen is:
int ret;
ret = dbopen(&napfile, "nap.dat", workbuf, 63, EXCL, CRDWRITE, NULL);
The variable ret is an integer return code. All functions return an integer, and should always be checked. (See reference section for all return codes.) Notice, the parameter &napfile is used here. The & character must be used in this case in order to pass a pointer to the psp rather than to pass the psp itself. The mclose function is defined as follows:
int mclose(psp)
CFILE *psp;            /* pointer to an open psp */
A file must have been opened before it can be closed. Also, if a psp was used to open a file, that file must be closed before the psp can be used again to open or create another file. The usage for mclose:
int ret;
ret = mclose(&napfile);
There is one way to add a multi-key record to a C-Index/II file using the dadd function. The dadd function is defined as follows:
int dadd(psp, datalist)
CFILE *psp;            /* pointer to an open psp */
FIELD *datalist;       /* pointer to datalist */
Remember, the datalist must have been initialized and the file must have been opened before any information may be added to it. After a routine has completed setting the values of a data record, dadd may then be called. The datalist specifies the type and location of fields in memory. The dadd function uses the datalist to build a variable length record and add the record with its associated keys to the file. For example, suppose a function called getdata obtains all the fields of information from the user, and puts them in the name and phone fields of the example program structure naprec. To add a record, the following routine could be written:
addrec()                     /* add a record to the file */
{
    int ret;                 /* return code from dadd */

    getdata();                       /* first get data */
    ret = dadd(&napfile, naplist);   /* do add */
}
Note that the routine getdata would only have to fill the naprec data structure, and would never change the datalist, naplist. Again, once the datalist has been initialized, there is no need to access it. Of course, in the above example the return code should be checked for any possible errors.

C-Index/II uses two methods of finding records. These two methods are referred to as find operations. The two kinds of find operations are: random and sequential. To perform a random find, the dfind function is used; to perform a sequential find, the dseq function is used. The dfind function is defined as follows:

int dfind(psp, indexnum, key, keytype, findtype)
CFILE *psp;            /* pointer to open file psp */
int indexnum;          /* index number to search */
char *key;             /* value to look for */
int keytype;           /* index key type */
int findtype;          /* type of find - see below */
The type of random find operation (findtype) may be one of the following:
For example, assuming an entire list of names has been added by using dadd, it may now be desirable to find certain records based on a name key. In this case, we want the first record with the last name SMITH. Note that the default C-Index/II key type (STRINGIND) uses a case-sensitive string compare, strcmp().
ret = dfind(&napfile, 1, "SMITH", STRINGIND, EQUAL);
This statement is a request to look in index# 1 (defined by the datalist as the name key) for the first record with a last name of "SMITH". If the find was successful, the return code will be CCOK. If it is unsuccessful, the return code will be FAIL, indicating that no records were found with a name of "SMITH". Any other return code indicates that an error occurred. As another example of the use of a random find, a record may be found with a last name that starts with the letter "T".
ret = dfind(&napfile, 1, "T", STRINGIND, GREATEQ);
This statement is a request to look in index# 1 for any record that is equal to or greater than "T". C-Index/II uses the standard ASCII collating sequence, and therefore any name that starts with a "T" (including the name "T") will be equal to or greater than "T". A successful return code in this case would be either CCOK or GREATER. CCOK indicates that an exact match was found; GREATER indicates that a greater value was found. In this example, a return code of GREATER could also indicate a record starting with letters higher than "T", such as "U" or "Z". For the types GREATEQ and LESSEQ, C-Index/II will always first try to find an exact match, and if none is found, it will look for a greater or less than match. The other random find types in C-Index/II may be used in a similar manner. The second type of find operation in C-Index/II is sequential find (dseq). It is used to locate records in sequential order. A sequential find does not locate using a key value, as does a random find, but rather, from the current position in the file. In other words, a sequential find may obtain the next key in sequence, the previous key, the first key (from the beginning) or the last key (from the end). The dseq function is defined as follows:
int dseq(psp, indexnum, seqtype)
CFILE *psp;            /* pointer to open file */
int indexnum;          /* index number to read from */
int seqtype;           /* sequential type */
The allowed types of sequential finds are FIRST, LAST, NEXT and PREV.
To use the FIRST and LAST sequential find types, no previous positioning is needed; the first or last record in the index is retrieved. For example:
ret = dseq(&napfile, 1, FIRST);
finds the first record in the index (such as the first alphabetic name in an index containing name keys). Note that if the return code is FAIL, there are no records to be found using that index. When using the NEXT and PREV sequential find types, the key located by dseq() will depend on what processing of this index has already occurred. If no operation has set a current key position prior to this call, the NEXT type will return the first key in the index (same result as specifying FIRST). The first access to an index using the PREV type will return FAIL (no key found). Otherwise, the next or previous key relative to the last index operation will be returned. For example:
ret = dfind(&napfile, 1, "SMITH", STRINGIND, EQUAL);
if (ret == CCOK)
    ret = dseq(&napfile, 1, NEXT);
finds the record with a key value higher than the first SMITH entry in the index. Note that if the return code is FAIL, there are no more records with key values higher than the last access (or lower key values in the case of PREV).
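The ordering that these find types rely on can be illustrated without the library. The following is a minimal sketch, not C-Index/II code: it scans a sorted array of string keys with strcmp() to mimic the described behaviour of a GREATEQ find (an exact match if one exists, otherwise the first greater key). The function sketch_greateq and the SK_ result names are hypothetical and stand in for dfind and its return codes only for the purpose of this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only; they are not the
 * C-Index/II return codes. */
enum sketch_result { SK_EXACT, SK_GREATER, SK_NONE };

/* Find the first key that is equal to or greater than 'want'.
 * 'keys' must already be sorted in strcmp() order, which is the
 * case-sensitive ASCII collating sequence. On SK_EXACT or SK_GREATER
 * the position of the key found is stored in *pos. */
enum sketch_result sketch_greateq(const char *const *keys, size_t nkeys,
                                  const char *want, size_t *pos)
{
    size_t i;

    assert(keys != NULL && want != NULL && pos != NULL);
    for (i = 0; i < nkeys; i++) {
        int cmp = strcmp(keys[i], want);
        if (cmp >= 0) {                  /* first key not less than want */
            *pos = i;
            return cmp == 0 ? SK_EXACT : SK_GREATER;
        }
    }
    return SK_NONE;                      /* every key is less than want */
}
```

With the sorted keys "JONES", "SMITH", "SMITH", "TAYLOR", "smith", a search for "T" lands on "TAYLOR" as a greater key, a search for "SMITH" lands on the first of the two duplicates as an exact match, and a search for "Smith" also lands on "TAYLOR", because the comparison is case sensitive and upper case letters sort before lower case ones in ASCII.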
The find examples above are incomplete, however, because no datalist appears in the dfind and dseq calls. It might be asked, "Without the datalist, how can my record be read?" That would be a very good question, and the answer is "It can't," because an important function has been omitted up to this point.

Why not read the record in directly on finds, then? Remember that it is possible to find a record greater than or less than the one desired, or to find nothing at all. The exact record needed is not always found. In addition, there may be times when only a key count is desired, and in this case time is saved by not reading the data record. C-Index/II therefore provides the capability of deciding when to actually read the record. The functions dfind and dseq prepare C-Index/II to read the record requested, but the function dread must be used to actually read the data record. The dread function is defined as follows:

int dread(psp, datalist)
CFILE *psp;            /* pointer to open file */
FIELD *datalist;       /* pointer to datalist */
A typical usage:
ret = dfind(&napfile, 1, "JONES", STRINGIND, EQUAL);
if (ret == OK)                           /* found something */
    ret = dread(&napfile, naplist);
Deleting a record is accomplished with one function, ddelete. The definition of ddelete is as follows:
int ddelete(psp, datalist)
CFILE *psp;            /* pointer to open file */
FIELD *datalist;       /* pointer to datalist */
In order to delete, a current record must have been set by a previously successful dfind or dseq. When deleting, the datalist is required because C-Index/II must know how to construct the keys before it is able to delete them. An example usage of ddelete is:
/* first find name to delete */
ret = dfind(&napfile, 1, "BAD", STRINGIND, EQUAL);
if (ret == OK)
    ret = ddelete(&napfile, naplist);    /* delete on good find */
As an elaboration on the above example, it might be desirable to call dread before deleting, display the record on the console, and verify with the user that it really should be deleted.

There is one function to update a record. Updating is best described as a single function call that deletes the current record and then adds a new one in its place. First a record must be made current, then a new data record built, and then dupdate called. The dupdate function is defined as follows:
int dupdate(psp, datalist)
CFILE *psp;
FIELD *datalist;
For example, to update the record for "JONES":
/* find record */
ret = dfind(&napfile, 1, "JONES", STRINGIND, EQUAL);
if (ret == OK)
    ret = dread(&napfile, naplist);      /* read record */
if (ret == OK) {
    getdata();                           /* get new data for this record */
    ret = dupdate(&napfile, naplist);    /* change it */
}
All the functions needed for fully maintaining a database have now been discussed. For more detailed information, refer to the reference section that follows and study the example program supplied on disk. The file "example.exe" is an executable version of the example source code contained in "example.c". Experimenting with this program will give a sense of how to use the functions that have been discussed.

The basic multi-key functions discussed above are very flexible and should accommodate most programming requirements. When more flexibility is needed, however, C-Index/II includes some additional capabilities for these multi-key routines.

Accessing the Parameter Structure

As stated, the parameter structure (psp) is used to transfer information between the application program and the C-Index/II functions. In most cases the program need not examine the information in the psp when using multi-key functions. Those situations in which it might be useful to set or examine the psp information are described below.

Examining the Current Record Number

The data record number that is returned by the dfind and dseq functions (the "current" record number) is contained in the psp long int variable currec (i.e. psp->currec). For our previous example, we might look at the record number like this:
if (dfind(&napfile, 1, "Test", STRINGIND, EQUAL) == OK)
    printf("Current record number = %ld", napfile.currec);
It may be desirable for the program to save the current record number for reading a data record at a later time. This is accomplished by saving the current record number, for example after a dfind, and then restoring it in the psp before calling dread. Setting the currec value in the psp does not reset the location of the index. If you want to reposition the current key pointer in an index which contains unique keys, use the dfind function. To reposition to a duplicate key, you must use the single-key function cfind. This is discussed in Chapter 7, "Advanced Usage of C-Index/II." The key that is found by the dfind and dseq functions is returned in the psp pointer variable key (i.e., psp.key). This is always set by the locating functions dfind and dseq which can be useful when using non-equal finds by dfind or sequential operations with dseq, and deciding whether or not to read the data record based on the key found. Again from the previous example, the method for examining the found key might be as follows:
if (dfind(&napfile, 1, "Test", STRINGIND, GREATEQ) == GREATER)
    printf("Greater key found = %s", napfile.key);
Note here that after dread, the current key from a find operation is a null string. This occurs because dread also uses the psp, and in looking for the data record, it resets the key. Therefore, be sure to look at the key before reading the record. Because the keys are built from the data record, once it has been read, it is possible to examine the key in the appropriate field of the data record itself.

Normally the DATALIST structure array is not modified by C-Index/II. The one exception to this is when a field is of the type BINARYFLD. In this case dread sets the variable fldlen to the length of the returned binary field. For this reason, it is necessary to set the binary field length every time a dread, dadd or dupdate function is called.

There may be occasions when it is necessary to have different datalists in the same file. Since C-Index/II always checks the supplied datalist against the datalist of the record, it must be known in advance which datalist to supply to dread. C-Index/II has an additional piece of information encoded with each data record called the idbyte. It is also located in the char psp variable psp.idbyte. The idbyte is an identifying byte value stored near the front of each data record. For instance, in the case of two datalists, naplist1 and naplist2, the idbyte for naplist1 could be set to 1 and the idbyte for naplist2 could be set to 2. The value for naplist1 could be set in the psp before adding or updating records as follows:
/* have assembled a record already */
napfile.idbyte = 1;                  /* this record has idbyte of 1 */
ret = dadd(&napfile, naplist1);
The idbyte may then be retrieved by using the dgetid function, which has the following definition:
int dgetid(psp)
CFILE *psp;
After a find and before reading the record using dread, the dgetid function can be used to read the idbyte. Based on its value, the appropriate datalist can be used in a subsequent dread of the record. The function dgetid returns a condition code, and if the code is OK the idbyte will be in the Parameter Structure Variable idbyte. An example of usage is:
ret = dfind(&napfile, 1, "Jones", STRINGIND, EQUAL);
if (ret == OK)
    if (dgetid(&napfile) == OK) {        /* always check return */
        if (napfile.idbyte == 1)
            dread(&napfile, naplist1);
        else
            dread(&napfile, naplist2);
    }
Because this is a more advanced feature, it can be ignored if it is not necessary to mix multiple record types within a single-key index. The idbyte value will always be set to whatever is in the idbyte variable at the time of add. It is meant only for the programmer's reference, and does not influence the internal functioning of C-Index/II.

Auto Initialization of the Datalist

Since the datalist is an array, it can be initialized as a static array at the top of the source code file as is any other array. Initializing an array of structures requires thought but it can eliminate the coding involved in setting up the datalist. For example, the previous example's datalist might have been specified as follows:
#include "cndx.h"
/* assumes naprec already defined */
FIELD naplist[] = {
/*  fldtype    keytype    duptype  len  index#  fldptr */
    STRINGFLD, STRINGIND, DUPKEY,  10,  1,      naprec.lastname,
    STRINGFLD, NONKEY,    NONKEY,  20,  0,      naprec.firstname,
    STRINGFLD, NONKEY,    NONKEY,  20,  0,      naprec.phone,
    LASTFIELD
};
Note: The comment line showing the field types helps keep track of the type being defined. The above definition of naplist accomplishes exactly the same thing as the initdlist() function. With a large datalist a great deal of programming time can be saved since the compiler does all the work. Notice that the macro LASTFIELD sets the appropriate values for the last item in a datalist, and is defined in the header file cndx.h.

The main problem with this approach to defining the datalist is that some compilers will not support this syntax. In addition, some compilers will not allow a global variable to be initialized with a pointer cast to a type other than char *. Experimentation is the best way to find out what each compiler allows. A good compromise is to initialize the datalist supplying null field pointers, and code a function that sets only the field pointers. Refer to the example program, example.c, to see this method of initializing a datalist.

Example Program for the Multi-key Functions

An example program is supplied to illustrate uses for the multi-key functions. This is an interactive program that is designed to run only on IBM PC/AT computers and 100% compatibles, since it both writes to screen memory and uses BIOS calls. The example implements a simple database type program that allows full screen data entry and display of a record with 10 fields of information. The information is stored with three key values. Using function keys, all the multi-key functions can be performed, including add, delete, update, search on keyed value and next/previous/first/last sequential searches.
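The compromise described under Auto Initialization of the Datalist (a static initializer with null field pointers, plus a function that sets only the pointers) might look like the following minimal sketch. It is written to compile on its own, so it declares stand-ins for what cndx.h would normally provide: the DATALIST structure is copied from this chapter, but the numeric values given to ENDLIST, STRINGFLD, NONKEY, STRINGIND and DUPKEY are placeholders, not the library's values. The function names setfldptrs and count_fields are hypothetical, and the end entry is written out by hand where a real program would use the LASTFIELD macro.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles alone. In a real program these names
 * come from cndx.h; the values here are placeholders only. */
enum { ENDLIST, STRINGFLD, NONKEY, STRINGIND, DUPKEY };

struct flddef {              /* layout as given in this chapter */
    char  fldtype;
    char  keytype;
    char  dupflag;
    short fldlen;
    char  fldindex;
    char *fldptr;
};
typedef struct flddef DATALIST;

struct nap {
    char lastname[10];
    char firstname[20];
    char phone[20];
};

struct nap naprec;

/* Everything except the field pointers is fixed at compile time. */
DATALIST naplist[] = {
/*    fldtype    keytype    dupflag  len  index#  fldptr */
    { STRINGFLD, STRINGIND, DUPKEY,  10,  1,      NULL },
    { STRINGFLD, NONKEY,    NONKEY,  20,  0,      NULL },
    { STRINGFLD, NONKEY,    NONKEY,  20,  0,      NULL },
    { ENDLIST,   0,         0,       0,   0,      NULL }  /* end of list */
};

/* Hypothetical function: set only the field pointers, once at startup. */
void setfldptrs(void)
{
    naplist[0].fldptr = naprec.lastname;
    naplist[1].fldptr = naprec.firstname;
    naplist[2].fldptr = naprec.phone;
}

/* Hypothetical check: count entries up to the ENDLIST marker. */
int count_fields(const DATALIST *list)
{
    int n = 0;

    assert(list != NULL);
    while (list[n].fldtype != ENDLIST)
        n++;
    return n;
}
```

Grouping each entry in its own braces lets the compiler catch a row with too many values, and keeping the pointers out of the initializer avoids the compiler restrictions on pointer initializers mentioned in that section.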
Supplied Files for Example Program
Setting the datalist correctly is absolutely essential because C-Index/II must have an accurate image of the record that is being used. The following describes each variable, its uses, and allowable settings:
C-Index/II Home Page: www.triosystems.com © Copyright 1996 - 1999 Trio Systems LLC