sphlib C code documentation

Overview

sphlib is a library which contains implementations of various cryptographic hash functions. These pages have been generated with doxygen and document the API for the C implementations.

The API is described in appropriate header files, which are available in the "Files" section. Each hash function family has its own header, whose name begins with "sph_" and contains the family name. For instance, the API for the RIPEMD hash functions is available in the header file sph_ripemd.h.

API structure and conventions

Input/output conventions

In full generality, hash functions operate over strings of bits. Individual bits are rarely encountered in C programming or actual communication protocols; most protocols converge on the ubiquitous "octet", which is a group of eight bits. Data is thus expressed as a stream of octets. The C programming language has the notion of a "byte", a data unit handled through the type "unsigned char". The C standard prescribes that a byte holds at least eight bits, but possibly more. Most modern architectures, even in the embedded world, feature eight-bit bytes, i.e. map bytes to octets.

Nevertheless, for some of the implemented hash functions, an extra API has been added, which allows the input of arbitrary sequences of bits: when the computation is about to be closed, 1 to 7 extra bits can be added. The functions for which this API is implemented include the SHA-2 functions and all SHA-3 candidates.

sphlib defines hash functions which hash octet streams, i.e. streams of bits where the number of bits is a multiple of eight. The data input functions in the sphlib API expect data as anonymous pointers ("const void *") with a length (of type "size_t") which gives the input data chunk length in bytes. A byte is assumed to be an octet; the sph_types.h header contains a compile-time test which prevents compilation on architectures where this property does not hold.
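The sketch below shows the kind of compile-time test just mentioned. It relies only on the standard CHAR_BIT macro from <limits.h>. The actual test in sph_types.h may be written differently, and the helper octets_to_bits is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Refuse to compile on platforms where a byte is not an octet.
   (Illustrative only; sph_types.h contains its own version of this test.) */
#if CHAR_BIT != 8
#error "8-bit bytes (octets) are required"
#endif

/* Hypothetical helper: message length in bits for an input of len bytes.
   This is only valid because a byte is known to be an octet. */
static unsigned long long octets_to_bits(size_t len)
{
    return (unsigned long long)len * 8u;
}
```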

The hash function output is also converted into bytes. All currently implemented hash functions have an output width which is a multiple of eight, and this is likely to remain true for new designs.

Most hash functions internally convert input data into 32-bit or 64-bit words, using either little-endian or big-endian conversion. The hash output also often consists of such words, which are encoded into output bytes with a similar endianness convention. Some hash functions are only loosely specified on that point; where necessary, sphlib has been tested against the published "reference" implementations so as to use the same conventions.
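To make the two conventions concrete, this sketch decodes the same four input bytes into a 32-bit word both ways and encodes a word back into bytes in big-endian order. The function names dec32le, dec32be and enc32be are hypothetical and are not presented as sphlib's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian decoding: the first byte is the least significant. */
static uint32_t dec32le(const unsigned char *p)
{
    return (uint32_t)p[0]
        | ((uint32_t)p[1] << 8)
        | ((uint32_t)p[2] << 16)
        | ((uint32_t)p[3] << 24);
}

/* Big-endian decoding: the first byte is the most significant. */
static uint32_t dec32be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
        | ((uint32_t)p[1] << 16)
        | ((uint32_t)p[2] << 8)
        | (uint32_t)p[3];
}

/* Big-endian encoding of a word into four output bytes. */
static void enc32be(unsigned char *p, uint32_t w)
{
    p[0] = (unsigned char)(w >> 24);
    p[1] = (unsigned char)(w >> 16);
    p[2] = (unsigned char)(w >> 8);
    p[3] = (unsigned char)w;
}
```

With input bytes 01 02 03 04, little-endian decoding gives 0x04030201 and big-endian decoding gives 0x01020304.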

Function short name

Each implemented hash function has a "short name" which is used internally to derive the identifiers for the functions and context structure it uses. For instance, MD5 has the short name "md5". The short names of the implemented hash functions are listed in the "Implemented functions" section. In the following sections, the short name is written as "XXX": replace it with the actual short name to get the C identifier.

Note: some functions within the same family share the same core elements, such as update function or context structure. Correspondingly, some of the defined types or functions may actually be macros which transparently evaluate to another type or function name.
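The naming scheme can be illustrated with token pasting. The macros below are hypothetical and not part of sphlib. They build the identifiers for a given short name, so SPH_CLOSE(md5) expands to sph_md5_close. The STR helpers only turn the expansion into a string so the result can be checked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros deriving sphlib-style identifiers from a short name. */
#define SPH_CONTEXT(name) sph_##name##_context
#define SPH_INIT(name)    sph_##name##_init
#define SPH_UPDATE(name)  sph_##name
#define SPH_CLOSE(name)   sph_##name##_close

/* Two-level stringification, so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Name of the close function for the short name "md5". */
static const char *md5_close_name(void)
{
    return STR(SPH_CLOSE(md5));
}
```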

Context structure

Each implemented hash function has its own context structure, available under the type name "sph_XXX_context" for the hash function with short name "XXX". This structure holds all the state needed for a running hash computation.

The contents of these structures are meant to be opaque and private to the implementation. However, they are declared in full in the header files, so that application code using sphlib knows the size of those structures (e.g. through sizeof) and can allocate them.

The caller is responsible for allocating the context structure, whether by dynamic allocation (malloc() or equivalent), static allocation (a global permanent variable), as an automatic variable ("on the stack"), or by any other means which ensures proper structure alignment. sphlib code performs no dynamic allocation by itself.
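The three allocation styles can be sketched as follows. A real program would use an actual sphlib type such as sph_sha1_context. To keep the example self-contained, this sketch substitutes a hypothetical stand-in type, toy_context, which holds one 32-bit word with an arbitrary initial value. sizeof applies to it just as it would to a fully declared sphlib context.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sphlib context structure. */
typedef struct { uint32_t h; } toy_context;

/* Hypothetical init: sets an arbitrary initial state. */
static void toy_init(toy_context *cc) { cc->h = 0x12345678u; }

static toy_context global_ctx;                     /* static allocation */

/* Initializes one context of each kind; returns 1 if all three reach the
   same initial state, 0 if not, and -1 if malloc() fails. */
static int allocation_demo(void)
{
    toy_context local;                             /* automatic ("on the stack") */
    toy_context *heap = malloc(sizeof *heap);      /* dynamic; freed below */
    int same;

    if (heap == NULL)
        return -1;
    toy_init(&global_ctx);
    toy_init(&local);
    toy_init(heap);
    same = global_ctx.h == local.h && local.h == heap->h;
    free(heap);
    return same;
}
```

malloc() returns memory suitably aligned for any standard object type, which satisfies the alignment requirement mentioned above.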

The context must be initialized before use, using the sph_XXX_init() function. This function sets the context state to proper initial values for hashing.

Since all state data is contained within the context structure, sphlib is thread-safe and reentrant: several hash computations may run in parallel, provided that they do not operate on the same context. Moreover, a running computation can be cloned by copying the context (with a simple memcpy()): the context and its clone are then independent and may be updated with new data and/or closed without interfering with each other. Similarly, a context structure can be moved in memory at will: context structures contain no pointers, in particular no pointer to themselves.
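This sketch shows cloning. A shared prefix is hashed once, the context is copied with memcpy(), and each copy then receives a different suffix. To stay self-contained, the sketch uses a hypothetical toy_fnv32 context that implements 32-bit FNV-1a, a simple non-cryptographic hash. The toy_fnv32_* names only mimic the sph_XXX_init / sph_XXX / sph_XXX_close pattern and are not sphlib functions. The copy is valid for the same reason given above for sphlib contexts: the structure holds no pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy context: 32-bit FNV-1a (not a cryptographic hash). */
typedef struct { uint32_t h; } toy_fnv32_context;

static void toy_fnv32_init(toy_fnv32_context *cc)
{
    cc->h = 2166136261u;                   /* FNV-1a offset basis */
}

static void toy_fnv32(toy_fnv32_context *cc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len-- > 0) {
        cc->h ^= *p++;
        cc->h *= 16777619u;                /* FNV prime */
    }
}

/* Writes the 4-byte big-endian result, then reinitializes the context. */
static void toy_fnv32_close(toy_fnv32_context *cc, unsigned char out[4])
{
    out[0] = (unsigned char)(cc->h >> 24);
    out[1] = (unsigned char)(cc->h >> 16);
    out[2] = (unsigned char)(cc->h >> 8);
    out[3] = (unsigned char)cc->h;
    toy_fnv32_init(cc);
}

/* Hashes prefix||s1 into out1 and prefix||s2 into out2, processing the
   shared prefix only once. */
static void hash_with_shared_prefix(const char *prefix, const char *s1,
                                    const char *s2, unsigned char out1[4],
                                    unsigned char out2[4])
{
    toy_fnv32_context base, copy;

    toy_fnv32_init(&base);
    toy_fnv32(&base, prefix, strlen(prefix));
    memcpy(&copy, &base, sizeof base);     /* clone the running state */
    toy_fnv32(&base, s1, strlen(s1));      /* the two contexts now diverge */
    toy_fnv32(&copy, s2, strlen(s2));
    toy_fnv32_close(&base, out1);
    toy_fnv32_close(&copy, out2);
}
```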

Data input

Hashed data is input with the sph_XXX() function, which takes as parameters a pointer to the context, a pointer to the data to hash, and the number of data bytes to hash. The context is updated with the new data.

Data can be input in one or several calls, with arbitrary input lengths. However, for performance it is best to input data in relatively large chunks (say, a few kilobytes), because this lets sphlib optimize processing and avoid internal copying.

When all data has been input, the context can be closed with sph_XXX_close(). The hash output is computed and written into the provided buffer. The caller must provide a buffer of appropriate length; e.g., SHA-1 produces a 20-byte output, so the output buffer must be at least 20 bytes long.
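The init / input / close sequence can be sketched with the same kind of hypothetical stand-in: a toy_fnv32 context implementing 32-bit FNV-1a, whose names merely mimic the sphlib pattern. The usage function feeds a message in fixed-size chunks, which shows that the result does not depend on how the input is split. With sphlib itself, the corresponding calls would be sph_XXX_init(), sph_XXX() and sph_XXX_close() on an sph_XXX_context, with an output buffer of the documented size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy context: 32-bit FNV-1a (not a cryptographic hash). */
typedef struct { uint32_t h; } toy_fnv32_context;

static void toy_fnv32_init(toy_fnv32_context *cc)
{
    cc->h = 2166136261u;                   /* FNV-1a offset basis */
}

static void toy_fnv32(toy_fnv32_context *cc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len-- > 0) {
        cc->h ^= *p++;
        cc->h *= 16777619u;                /* FNV prime */
    }
}

/* Writes the 4-byte big-endian result, then reinitializes the context. */
static void toy_fnv32_close(toy_fnv32_context *cc, unsigned char out[4])
{
    out[0] = (unsigned char)(cc->h >> 24);
    out[1] = (unsigned char)(cc->h >> 16);
    out[2] = (unsigned char)(cc->h >> 8);
    out[3] = (unsigned char)cc->h;
    toy_fnv32_init(cc);
}

/* Hashes msg by feeding it in chunks of at most chunk bytes (chunk > 0). */
static void hash_in_chunks(const unsigned char *msg, size_t len,
                           size_t chunk, unsigned char out[4])
{
    toy_fnv32_context cc;

    toy_fnv32_init(&cc);                   /* required before first use */
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        toy_fnv32(&cc, msg, n);
        msg += n;
        len -= n;
    }
    toy_fnv32_close(&cc, out);             /* out must hold 4 bytes */
}
```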

For some hash functions, the sph_XXX_addbits_and_close() function can be used instead of sph_XXX_close(). This function takes a few extra bits to be added at the end of the input message, which allows hashing messages whose bit length is not a multiple of 8. The extra bits are provided as an unsigned integer value and a bit count. The bit count must be between 0 and 7, inclusive. The extra bits occupy bits 7, 6, 5... of the integer (numerical values 128, 64, 32... down to 1), in that order: the first extra bit is bit 7. For instance, to add three bits of value 1, 1 and 0, the unsigned integer will have value 192 (1*128 + 1*64 + 0*32) and the bit count will be 3.
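This sketch packs trailing message bits into the integer form just described. The bits are given in message order as 0/1 values. The function name pack_extra_bits is hypothetical.

```c
#include <assert.h>

/* Packs n (0..7) trailing message bits, given in message order as 0/1
   values, so that the first bit lands in bit 7 (value 128), the next in
   bit 6 (value 64), and so on. The bit count to pass along is n itself. */
static unsigned pack_extra_bits(const unsigned char *bits, unsigned n)
{
    unsigned ub = 0;
    unsigned i;

    assert(n <= 7);
    for (i = 0; i < n; i++)
        if (bits[i])
            ub |= 0x80u >> i;
    return ub;
}
```

The resulting value and the count n would then be passed to sph_XXX_addbits_and_close(). Check the corresponding header for the exact parameter order.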

The SPH_SIZE_XXX macro is defined for each hash function; it evaluates to the function output size, expressed in bits. For instance, SPH_SIZE_sha1 evaluates to 160.
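Because SPH_SIZE_XXX is expressed in bits, a caller divides by 8 to size an output buffer. In this sketch, the fallback #define exists only so it compiles without sphlib; with the real header included, sphlib's own definition is used. digest_bytes and sha1_digest are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback for compiling without sphlib; per the documentation above,
   the real SPH_SIZE_sha1 evaluates to 160. */
#ifndef SPH_SIZE_sha1
#define SPH_SIZE_sha1 160
#endif

/* Output size in bytes for a size in bits (outputs are whole bytes). */
static size_t digest_bytes(unsigned bits)
{
    assert(bits % 8 == 0);
    return bits / 8;
}

/* Buffer type sized at compile time for a SHA-1 output: 160 / 8 = 20 bytes. */
typedef unsigned char sha1_digest[SPH_SIZE_sha1 / 8];
```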

When closed, the context is automatically reinitialized and can be immediately used for another computation. It is not necessary to call sph_XXX_init() after a close. Note that sph_XXX_init() can still be called to "reset" a context, i.e. forget previously input data, and get back to the initial state.
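The sketch below reuses one context for several messages, calling the init function only once. It relies on the automatic reinitialization at close, which the toy_fnv32 stand-in reproduces. That stand-in is a hypothetical 32-bit FNV-1a context, not a sphlib type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy context: 32-bit FNV-1a (not a cryptographic hash). */
typedef struct { uint32_t h; } toy_fnv32_context;

static void toy_fnv32_init(toy_fnv32_context *cc)
{
    cc->h = 2166136261u;                   /* FNV-1a offset basis */
}

static void toy_fnv32(toy_fnv32_context *cc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len-- > 0) {
        cc->h ^= *p++;
        cc->h *= 16777619u;                /* FNV prime */
    }
}

/* Writes the 4-byte big-endian result, then reinitializes the context. */
static void toy_fnv32_close(toy_fnv32_context *cc, unsigned char out[4])
{
    out[0] = (unsigned char)(cc->h >> 24);
    out[1] = (unsigned char)(cc->h >> 16);
    out[2] = (unsigned char)(cc->h >> 8);
    out[3] = (unsigned char)cc->h;
    toy_fnv32_init(cc);
}

/* Hashes each of the n NUL-terminated messages with a single context;
   no init call is needed between messages because close reinitializes. */
static void hash_each(const char *const *msgs, size_t n,
                      unsigned char (*out)[4])
{
    toy_fnv32_context cc;
    size_t i;

    toy_fnv32_init(&cc);                   /* once, before first use */
    for (i = 0; i < n; i++) {
        toy_fnv32(&cc, msgs[i], strlen(msgs[i]));
        toy_fnv32_close(&cc, out[i]);
    }
}
```

Calling toy_fnv32_init() in the middle of a computation likewise discards the data input so far, mirroring the reset use of sph_XXX_init().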

Data alignment

"Alignment" is a property of data, which is said to be "properly aligned" when its location in memory allows the data to be read optimally by full words. What this means depends on the type of access: some hash functions read data by 32-bit or 64-bit words, for which the best addresses are multiples of 4 or 8, respectively. sphlib does not mandate such alignment for input data, but using aligned data can substantially improve performance.

As a rule, it is best to input data by chunks whose length (in bytes) is a multiple of eight, and which begins at "generally aligned" addresses, such as the base address returned by a call to malloc().
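This sketch shows how a caller might check an address for alignment and pick chunk lengths that are multiples of eight. It assumes uintptr_t is available, which C99 makes optional but mainstream platforms provide. Both helper names are hypothetical, and sphlib does not require callers to perform such checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Largest multiple of 8 not exceeding len. Feeding chunks of this length
   keeps each following chunk at the same 8-byte alignment as the first. */
static size_t chunk_len8(size_t len)
{
    return len & ~(size_t)7;
}
```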

Implemented functions

We give here the list of implemented functions. They are grouped by family; to each family corresponds a specific header file. Each individual function has its associated "short name". Please refer to the documentation for that header file to get details on the hash function denomination and provenance.

Note: the functions marked with a '(64)' in the list below are available only if the C compiler provides an integer type at least 64 bits wide. Such a type is mandatory in the C99 standard (ISO/IEC 9899:1999) and is present in several older compilers as well, so it is very likely to be available.
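The sketch below shows one way application code might check for a 64-bit type, assuming a C99 <stdint.h>, where UINT64_MAX is defined exactly when uint64_t exists. sphlib presumably performs its own detection, possibly with a different method. HAVE_U64 and rotl64 are hypothetical names. rotl64 is only an example of code needing the type: a 64-bit rotation, an operation common in hash functions built on 64-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature test: UINT64_MAX is defined exactly when the
   exact-width type uint64_t is provided. */
#ifdef UINT64_MAX
#define HAVE_U64 1
#else
#define HAVE_U64 0
#endif

#if HAVE_U64
/* Rotate left by n bits (0 <= n < 64); the mask keeps the right shift
   in range when n is 0. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return (x << n) | (x >> ((64 - n) & 63));
}
#endif
```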

The fourteen second-round SHA-3 candidates are also implemented:

For the second-round SHA-3 candidates, the functions are as specified for round 2, i.e. with the "tweaks" that some candidates added between round 1 and round 2. Also, some of the submitted packages for round 2 contained errors, in the specification, reference code, or both. sphlib implements the corrected versions.

Generated on Mon Jun 21 17:48:04 2010 for sphlib by doxygen 1.6.3