Programming languages are constantly evolving and diversifying. The C language is no exception, especially due to its increased popularity in recent years. The original specification document for C, The C Programming Language by Brian Kernighan and Dennis Ritchie, commonly referred to as K&R, is now almost ten years old. K&R has served as the C programmer's "bible", the de facto standard for C. But as the language has evolved, the need for a formal language standard has become apparent.
Under the auspices of the International Organization for Standardization (ISO), the American National Standards Institute (ANSI) began the preparation of such a standard through the leadership of the X3J11 Technical Committee. The proposed standard is now in final draft form and is expected to be approved by ANSI and ISO in 1988.
This document is a summary of the proposed standard's major changes to the language as it pertains to the C programmer. All information is drawn from official X3J11 documents: the Draft Proposed American National Standard for Information Systems - Programming Language C and the accompanying Rationale for Draft Proposed National Standard for Information Systems - Programming Language C. (These publications will be referred to as the Standard and the Rationale, respectively.)
Few programmers have either the time or the interest to wade through the entire text of the draft Standard. What interests them are its major points, the changes it makes to what was defined in K&R, and how those changes affect their current programs. By summarizing these changes, this document is intended to provide a quick reference that the average C programmer can read and understand in one session. In keeping with this goal, the Standard Library is only briefly mentioned. Readers interested in the specifics of the Library should consult the Standard itself or the documentation accompanying any ANSI-conforming compiler.
This document is not a criticism or a justification of the Standard, only a commentary. Nor is it a tutorial on the C programming language. Readers should also be aware that changes may occur to the Standard before its final acceptance by ANSI.
The Standard is not creating a new language definition. Its purpose, to quote the Rationale, is to "codify common existing practice". This means that the fundamental structure and syntax of the language as described in K&R has been left unchanged. The Standard has instead tried to unify the diverse extensions and dialects that have grown over the years (the existing practice) into a single cohesive language definition. Existing practice is often inconsistent, however, so many compromises have had to be made.
Perhaps the most important thing to remember about the Standard is that it is not intended to invalidate existing C code. Existing programs should compile with only minor changes when using an ANSI-conformant compiler.
Throughout this document the word implementation will be used to refer to any particular implementation of an ANSI-conformant C language interpreter or compiler.
The following keywords have been added to the language: const, enum, signed, void and volatile.
Explanations for each keyword follow in Sections 2 and 4. In addition, the identifier "entry" has been deleted from the list of reserved identifiers as it was never implemented by K&R or subsequent versions of C.
Each keyword is a reserved identifier; programs that currently use these keywords as variable names must be changed to compile under an ANSI-conformant implementation.
Major language changes occur with respect to data types. The trend in the Standard has been to provide the language with stronger typing facilities.
The list of available integer types has been expanded to include "signed char" and the following variations:
long int and short int may also be used as variations of long and short, respectively. As would be expected, the declarations:
signed x;
unsigned y;
may be used as shorthand for signed int and unsigned int. All logical combinations for integer types are now allowed.
Whether or not a simple char is considered to be signed or unsigned is left up to the implementation.
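A program that depends on the signedness of plain char can test for it portably rather than assuming one behaviour. The sketch below relies only on the <limits.h> macro CHAR_MIN, which is 0 when char is unsigned and negative when it is signed. The helper name char_is_signed is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
   implementation and 0 if it is unsigned. CHAR_MIN equals SCHAR_MIN
   (negative) for a signed char and 0 for an unsigned one. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```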
While int is still the default type for variables and functions, at least one storage class (auto, register, static, extern) or type specifier must be present when declaring a variable. A declaration of the form:
x;
is no longer allowed and must be replaced with:
int x;
to compile.
The new type long double has been added to provide extended precision. As with the other long types, however, it is only guaranteed to offer at least the range and precision of double.
The type long float (a previous synonym for double) is now invalid. The only acceptable floating-point types are float, double and long double.
Each structure and union now has its own name space for member names. Two different structures or unions may therefore contain members with the same name without conflict.
Structures and unions may now:
Already available in most compilers, enumerations have been added to the language. An enumeration is a way of declaring a set of integer constants. The declaration:
enum colours { RED, BLUE, GREEN };
would declare colours as an enumeration tag representing the integer constants RED, BLUE and GREEN. These enumeration constants are given integer values starting at 0 and increasing by 1 with each identifier.
An enumeration constant may be used wherever an integer is expected. As far as the values of the constants are concerned, the enumeration above is equivalent to:
#define RED   0
#define BLUE  1
#define GREEN 2
Enumeration constants are not restricted to upper case, but upper case is a widely recognized convention for constants.
Variables may be declared to have enumeration type. The declaration:
enum colours x, y;
declares x and y to be integer variables capable of holding an enumeration constant of type colours. In practice, little or no checking is done to make sure enumeration constants are used, so the following assignments are equivalent:
x = BLUE;
x = 1;     /* defeats purpose of enum */
Constant values may be directly assigned within an enumeration as well:
enum relation { EQUAL = 1, LESS_THAN = 2, GREATER_THAN = 4 };
If no value is specified for a given identifier, the constant is taken to have the value of the previous constant plus one.
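The following sketch shows the numbering rule with explicit and implicit values mixed in one enumeration. It also uses a constant where an int is expected, here as an array size. The enumeration level and its constants are hypothetical.

```c
/* Hypothetical enumeration mixing explicit and implicit values. */
enum level {
    LOW,            /* first constant, no value given: 0 */
    MEDIUM = 5,     /* explicit value */
    HIGH,           /* previous constant plus one: 6 */
    TOP = 10,       /* explicit value */
    BEYOND          /* previous constant plus one: 11 */
};

/* Enumeration constants may appear wherever an int is expected,
   for example in an array size or in arithmetic. */
int level_table[TOP + 1];

int level_span(void)
{
    return BEYOND - LOW;    /* 11 - 0 */
}
```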
The size of an enumeration type has been left unspecified. The implementation is free to store it in whatever way is most efficient, provided that it always behaves like an int.
The void data type has been added to indicate that an expression has no value. No variables can be declared with such a type, but expressions may be cast to void.
For example, the following statement:
(void) printf( "hello world" );
specifically indicates to the compiler that the return value from printf (an integer) is to be ignored. As such, the following statement is illegal:
a = (void) func(); /* illegal! */
since the assignment operator expects a value to be returned for assignment.
Void pointers and void functions are discussed below and in Section 4.
Pointers are no longer synonymous with the int type. Pointers may only be compared with or assigned:
Any other comparison or assignment involving a pointer will generate a diagnostic message at compile time. Many assignment statements will therefore require an explicit cast of the right-hand value to avoid these messages.
A void pointer is a pointer that has no base type; that is, it points to an object of unspecified type. It is declared using the syntax:
void *ptr;
Indirection through a void pointer is not allowed; it must be cast to an appropriate pointer type first. Its main use is as a generic pointer.
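As an illustration of the generic-pointer use, the minimal sketch below exchanges two objects of any type through void pointers. Before doing any indirection, it converts each void * to unsigned char *. The function name swap_bytes is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: swaps two objects of `size` bytes each.
   The void * parameters accept a pointer to any object type; they are
   converted to unsigned char * before indirection, because a void *
   cannot be dereferenced directly. */
void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;    /* implicit conversion from void * */
    unsigned char *pb = b;

    while (size-- > 0) {
        unsigned char tmp = *pa;
        *pa++ = *pb;
        *pb++ = tmp;
    }
}
```

The same function can exchange two ints, two doubles or two structures. The caller supplies the size, for example `swap_bytes(&x, &y, sizeof x)`.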
Arrays with storage class auto may now be initialized. If specified, the size of an array must be an integral constant expression greater than zero.
The Standard makes available the two attributes const and volatile for use as type modifiers.
An object declared to be const cannot be modified (assigned to, incremented or decremented) by a program. Thus the following code is invalid:
const int x;
/* ..... */
x = 2;     /* illegal! */
Initialization, however, is allowed:
const unsigned char masks[] = { 0x00, 0xff };
A const object (if it is of static storage duration) may be data that is put into read-only memory. Declaring such data with the const attribute allows the compiler to diagnose any attempts at modifying the data. Function parameters may also be declared as const to indicate that they are not modified by the function. This provides both extra documentation and, when function prototypes (described in Section 4) are used properly, consistent error-checking.
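In practice, the documentation benefit comes mostly from pointer parameters, as with the src parameter of strcpy. The sketch below declares its string parameter as a pointer to const. This promises that the function only reads the characters, and the compiler would diagnose an assignment through the pointer. The function count_char is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: counts occurrences of c in the string s.
   Because s points to const char, a statement such as *s = 'x' in
   this body would be rejected; the pointer itself may still move. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```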
A volatile object is one that may be modified outside of program control; memory-mapped I/O ports are a typical example. Declaring an object as volatile tells the compiler that it must always generate code to fetch the object's value from its actual memory location, since the value may have changed since the program last accessed it. (This disallows optimizations that could keep the value in a register and so return stale results.)
volatile char *port1 = (volatile char *) 0x00f3;   /* ptr to I/O port */

while( *port1 & DATA_FLAG )    /* needs to be volatile */
    clear_io();                /* DATA_FLAG and clear_io() defined elsewhere */
The const and volatile modifiers may be used (singly or together) in combination with any other valid type specifiers.
Pointers may also be declared to be const or volatile through the use of special syntax:
int const *a;
int *const b;
int *const *c;
In the example, a is declared to be a pointer to a const integer, whereas b is declared to be a const pointer to an integer. The distinction lies in the placement of the const attribute. The declaration for c is even more confusing: it declares a pointer to a const pointer to an integer. Consider the following statements:
a = NULL;      /* ok */
*a = 0;        /* error */
b = NULL;      /* error */
*b = 0;        /* ok */
c = &b;        /* ok */
*c = NULL;     /* error */
**c = 0;       /* ok */
Because a is a pointer to a const int, the value it points to may not be changed. Similarly, because b is a const pointer to an int, it may not be modified, though the value it points to may. The pointer c may be modified, but not the pointer *c, though **c (an integer) is modifiable itself.
The volatile modifier may also be used with pointers in conjunction with or separate from const.
Bit-fields may now be of type int, unsigned int or signed int. Whether or not the high-order bit of an int bit-field is to be considered a sign bit is implementation-defined.
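The sketch below uses all three permitted bit-field types in one structure. Only the explicitly signed field is guaranteed to hold negative values. The plain int field's signedness is implementation-defined, so portable code should not store negative values in it. The structure status and the function delta_roundtrip are hypothetical.

```c
/* Hypothetical status word using the three permitted bit-field types. */
struct status {
    unsigned int ready : 1;   /* unsigned: holds 0 or 1 */
    signed int   delta : 4;   /* explicitly signed: holds -8 to 7 */
    int          mode  : 3;   /* plain int: signedness is implementation-defined */
};

/* Stores v in the signed 4-bit field and reads it back.
   v should lie in the range -8 to 7. */
int delta_roundtrip(int v)
{
    struct status s;

    s.ready = 1;
    s.delta = v;
    s.mode = 2;               /* small non-negative value: portable */
    return s.delta;
}
```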
A vacuous definition consisting only of a struct or union specifier with a tag name is now allowed. Its purpose is to hide any outer declaration of the same name in the current block, as the definitions for struct a demonstrate in this example:
struct a { int x; };

int func()
{
    struct a st1;      /* struct defined above */
    struct a;          /* vacuous definition: it "clears" the current
                          defn of struct a to make way for a new one */
    /* references to struct a now refer to the new definition
       within this block */
    struct b {
        struct a *y;   /* refers to NEXT struct */
    } st2;
    struct a {
        struct b *z;
    } st3;

    st1.x = 1;
    st2.y = &st3;      /* &st1 would give warning */
    st3.z = &st2;
    return 0;
}                      /* old struct defn now back in scope */
Here the member y of st2 is a pointer to the second struct a, which is defined below it. If the vacuous definition
struct a;
had not been present, y would instead have been a pointer to the struct a defined in the previous (enclosing) scope level outside the function.
The Standard defines the integral promotions as follows: a char, short or bit-field (with or without the signed or unsigned modifiers) may be used wherever an int is expected. Its value is converted to int if an int can represent all values of the original type; otherwise it is converted to unsigned int.
The usual arithmetic conversions used with most binary operators have been modified to reflect the new types available to the programmer. Of particular note: expressions of type float are no longer automatically converted to double for arithmetic purposes. Such arithmetic may therefore be performed in single precision, and so less accurately.
The Standard also specifies other rules regarding conversions. Where signed and unsigned integer values are concerned, the Standard adopts value preserving rather than unsigned preserving rules. Under these rules, unsigned char and unsigned short values are promoted to int if an int can represent all their values; otherwise they are promoted to unsigned int. Floating-point values must now be truncated towards zero when converted to integral types. No rounding need occur when a double is demoted to float. Otherwise, the rules in K&R are unchanged.
Any program comparing or performing arithmetic on values of different types should be closely screened for possible changes in behaviour.
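The sketch below shows where value preserving makes a visible difference. It assumes int can represent every unsigned char value, which holds on virtually every implementation. Under that assumption, an unsigned char operand is promoted to int and compares as expected against -1. An unsigned int operand is not promoted, so -1 is converted to a large unsigned value instead. The last function shows truncation towards zero. All function names are hypothetical.

```c
/* unsigned char is promoted to int (value preserving), so the
   comparison is done in signed arithmetic: 255 > -1 is true.
   An unsigned preserving compiler would promote uc to unsigned int,
   convert -1 to a large unsigned value, and yield false. */
int uchar_gt(unsigned char uc, int i)
{
    return uc > i;
}

/* unsigned int is not promoted to int; the usual arithmetic
   conversions turn i into unsigned int, so -1 becomes UINT_MAX
   and 255u > -1 is false. */
int uint_gt(unsigned int ui, int i)
{
    return ui > i;
}

/* Conversion from a floating type to an integral type truncates
   towards zero: 2.7 becomes 2 and -2.7 becomes -2. */
int truncate_to_int(double d)
{
    return (int) d;
}
```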
Any compiler conforming to the Standard must also respect the following limits with respect to the range of values any particular type may accept. Note that these are lower limits: an implementation is free to exceed any or all of these. Note also that the minimum range for a char is dependent on whether or not a char is considered to be signed or unsigned.
| Type | Minimum Range |
|---|---|
| signed char | -127 to +127 |
| unsigned char | 0 to 255 |
| short int | -32767 to +32767 |
| unsigned short int | 0 to 65535 |
| int | -32767 to +32767 |
| unsigned int | 0 to 65535 |
| long int | -2147483647 to +2147483647 |
| unsigned long int | 0 to 4294967295 |
| Type | Minimum Precision |
|---|---|
| float | 6 decimal digits |
| double | 10 decimal digits |
| long double | 10 decimal digits |
The Standard also specifies that these limits be available as preprocessor macros. The integer limits are defined in the header file <limits.h>, and the floating-point characteristics in <float.h>.
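A program can check the minimums from the range and precision tables against the macros its implementation actually provides. The integer limits come from <limits.h>. The floating-point precisions FLT_DIG, DBL_DIG and LDBL_DIG come from the companion header <float.h>. This is a sketch, and the function name meets_minimum_limits is hypothetical.

```c
#include <limits.h>
#include <float.h>

/* Hypothetical check: returns 1 if this implementation meets the
   minimum ranges and precisions required by the Standard. */
int meets_minimum_limits(void)
{
    return SCHAR_MAX >= 127 && SCHAR_MIN <= -127
        && UCHAR_MAX >= 255
        && SHRT_MAX >= 32767 && SHRT_MIN <= -32767
        && USHRT_MAX >= 65535
        && INT_MAX >= 32767 && INT_MIN <= -32767
        && UINT_MAX >= 65535
        && LONG_MAX >= 2147483647L && LONG_MIN <= -2147483647L
        && ULONG_MAX >= 4294967295UL
        && FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10;
}
```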
Changes in this area have occurred mainly with respect to variable (object) linkage and initialization.
Objects declared as either static or auto may be initialized by following the declaration with an equals sign, '=', and an initialization expression. External (inter-module) objects are discussed below.
If no initialization is given for a static object, all arithmetic types in the object are assigned 0 and all the pointers are set to NULL. If no initialization is given to an auto object, its initial value is undefined. These rules are unchanged from K&R.
All initializers for either static objects or auto arrays, unions and structures must be constant expressions.
Unions may now be initialized: the initialization value is assigned to the first member of the union.
The initialization expression for a scalar (integral, floating-point or pointer) object may optionally be enclosed in braces. Braces must enclose the initialization expressions for arrays, structures and unions. There can be no more initializers in an initialization list than there are objects to be initialized. There may be fewer, though, and any remaining objects are initialized as described above for static objects.
An array of char or pointer to char may be initialized with a string constant.
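The sketch below gathers the initialization rules above into one file. It shows a union initialized through its first member and an array with fewer initializers than elements. It also shows a char array and a char pointer initialized from string constants, and a scalar with an optional brace-enclosed initializer. All the object names are hypothetical.

```c
#include <string.h>

/* Union initialization: the value initializes the first member, i. */
union number {
    int    i;
    double d;
};
static union number n = { 42 };

/* Fewer initializers than elements: the remaining elements are 0. */
static int partial[5] = { 1, 2 };

/* A char array and a pointer to char initialized from string constants. */
static char greeting[] = "hello";      /* 6 chars, including the '\0' */
static const char *farewell = "bye";   /* points to a string constant */

/* Braces around a scalar initializer are optional. */
static int scalar = { 7 };

/* Returns 1 if every object above holds the expected value. */
int initialization_demo_ok(void)
{
    return n.i == 42
        && partial[1] == 2 && partial[4] == 0
        && sizeof greeting == 6
        && strcmp(farewell, "bye") == 0
        && scalar == 7;
}
```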
In this section, object refers to an object declared outside of any function.
The linkage of an object determines its scope within the program. An object with external linkage is known to all files in a program. An object with internal linkage is known only to the file in which it is declared. Current C compilers often differentiate between the two in incompatible ways, an issue which the Standard resolves.
An object is said to be defined if it includes an initializer. A defined object has internal linkage if the storage class static is specified; otherwise it has external linkage. An object can only be defined once.
Any object declaration without the extern modifier and without an initializer constitutes what is known as a tentative definition of the object. If an actual definition for the object is encountered in the same file, all tentative definitions are considered to be simple declarations referring to that object. Otherwise the first tentative definition is considered to be an actual definition with initializer equal to 0.
/* example drawn from the Standard */
int i1 = 1;           /* definition, external linkage */
static int i2 = 2;    /* definition, internal linkage */
extern int i3 = 3;    /* definition, external linkage */
int i4;               /* tentative definition */
static int i5;        /* tentative definition */

int i1;               /* tentative def., refers to previous */
int i2;               /* invalid -- linkage disagreement */
int i3;               /* tentative def., refers to previous */
int i4;               /* tentative def., refers to previous */
int i5;               /* invalid -- linkage disagreement */

extern int i1;        /* these are all valid references */
extern int i2;
extern int i3;
extern int i4;
extern int i5;
These complex rules provide the most flexibility and allow the majority of current C code to be compatible with the Standard.
The simplest way to declare an externally-linked object is to define it in one file (with or without initializer) and reference it in all others through the use of an appropriate extern declaration.
Additions to the language definition occur in the area of function declarations, function definitions and variable parameter lists.
The Standard now allows the types of formal parameters to be specified within the actual function declaration at the start of the function definition. This new-style definition form more closely resembles languages such as Pascal and Modula-2:
int main( int argc, char *argv[] ){ /* ... */ }
If this style is used, a type must be specified separately for each formal parameter in the argument list. Mixing the new-style with the K&R-style in the same definition is not allowed.
This style is intended by ANSI to become the favored style, and a future Standard may disallow the K&R-style definition. For the moment, however, both styles may be used interchangeably.
Functions may also be defined as explicitly having no return values. Such functions are called void functions and are defined using the type void:
void func( int a ){ /* ... */ return; }
Though the use of the return statement is allowed, such functions must not return an expression. If no type is explicitly specified, the function return type still defaults to int to retain compatibility with K&R.
Function type declarations are also consistent with the new-style function definitions and may include a list of formal parameters. These parameters consist of type declarations with or without identifiers. Identifiers are cosmetic only and need only be included for readability. Some examples are:
int main( int, char *[] ); extern char *strcpy( char *dst, const char *src );
Note that declarations must be consistent as they will be checked by the compiler. Each declaration of a function should agree with all previous declarations in both the number and types of parameters.
The following declarations illustrate two special cases:
extern int func1( void ); extern int func2();
The first case explicitly declares that the function func1 takes no parameters; that is, its parameter list is empty (void).
The second declaration states that no information is known on the number and types of any formal parameter. This is to provide compatibility with K&R.
A function declaration that provides the number and types of parameters is called a function prototype. The addition of prototypes to C allows for stricter type-checking by the compiler. When a prototype for a function has been declared, each subsequent call to that function is checked to make sure that the correct number of arguments has been supplied. As well, the type of each argument is compared with what was declared in the prototype. If different, the argument is converted to the required type as if it had been assigned to an object of that type. The default argument promotions (char and short to int, float to double) are not performed when a prototype has been declared. (Note: The default argument promotions are separate from the usual arithmetic conversions.)
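The sketch below shows the conversion in action. With the prototype in scope, the int argument 3 is converted to double (3.0) before the call, as if assigned to a double. Without a prototype, a K&R compiler would pass an int where the function expects a double. The functions half and half_of_three are hypothetical.

```c
/* Prototype: declares the number and types of the parameters. */
double half(double x);

/* Calls after the prototype are checked against it; the int argument
   3 is converted to the parameter type double before the call. */
double half_of_three(void)
{
    return half(3);
}

/* New-style definition that agrees exactly with the prototype. */
double half(double x)
{
    return x / 2.0;
}
```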
If a function prototype occurs in the same file as the definition of that function, both the prototype and the definition must agree exactly if the definition is of the new style. In K&R-style definitions, the formal parameters are first widened by the default argument promotions and then compared to the prototype(s). If no prototype occurs in the file, the function definition itself serves as a prototype for the code following it.
Certain C functions are designed to take a variable number of parameters. Unfortunately, some compilers use different schemes for handling such situations and what works in one implementation may not work elsewhere. The Standard therefore provides for the explicit declaration of such functions and portable facilities for handling them. A function that takes a variable number of parameters is defined by ending the parameter list (new-style only) with an ellipsis:
int printf( const char *format, ... ){ /* ... */ }
Thus the only thing known about printf is that it takes at least one parameter, the type of which is a pointer to const char. Prototypes may also be declared in this fashion:
extern int sprintf( char *dest, const char *format, ... );
The compiler will then make sure that each call to sprintf has at least two arguments, the first two of which must be pointers to char.
The arguments themselves are accessed through the use of special macro facilities defined in the header file <stdarg.h>, part of the ANSI Standard Library.
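A minimal sketch of those facilities follows: va_start, va_arg and va_end from <stdarg.h>. The function sum_ints is hypothetical. It relies on an assumed convention that the first argument gives the number of int arguments that follow, because the callee has no other way to know how many were passed.

```c
#include <stdarg.h>

/* Hypothetical variadic function: returns the sum of the `count`
   int arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* count is the last named parameter */
    while (count-- > 0)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

For example, `sum_ints(3, 1, 2, 3)` sums the three arguments after the count.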
The C preprocessor, long since recognized as an integral part of the language, has benefitted from a number of additions and clarifications in the Standard.
The #elif directive has been added as a shorthand form of the #else #if preprocessor sequence.
The identifier defined is reserved during an #if or #elif so that:
#if defined( NULL )
#if !defined( TRUE )
are equivalent to:
#ifdef NULL
#ifndef TRUE
Also new on the list are the directives #error and #pragma. The former produces an error message at compile-time; the latter is implementation-defined in its use and effects.
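The sketch below combines the new directives. #elif chains the tests without nesting #else #if pairs, the defined operator tests for a macro, and #error stops compilation with a message if no branch applies. The macro WORD_CLASS and the function int_width_class are hypothetical.

```c
#include <limits.h>

/* Classify int by its range using an #if / #elif / #else chain. */
#if INT_MAX >= 2147483647
#define WORD_CLASS 32
#elif INT_MAX >= 32767
#define WORD_CLASS 16
#else
#error "int is narrower than the Standard permits"
#endif

/* defined( NULL ) is equivalent to #ifdef NULL. */
#if !defined( NULL )
#include <stddef.h>     /* make sure NULL is available */
#endif

int int_width_class(void)
{
    return WORD_CLASS;
}
```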
Besides the two allowable forms:
#include <fname1>
#include "fname2"
a third form:
#include fname3
is acceptable, provided that fname3 is a macro which expands into one of the other two forms.
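A small sketch of the third form follows. The hypothetical macro STRING_HEADER expands to <string.h>, which is one of the two ordinary forms, so the #include behaves as if the header name had been written directly. The function name_length is also hypothetical.

```c
/* The macro expands to one of the two ordinary forms. */
#define STRING_HEADER <string.h>
#include STRING_HEADER

/* strlen and size_t are declared by the header included above. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

This form is mainly useful when the header to include is chosen by earlier preprocessor logic, for example by an #if / #elif chain.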
Two new operators have been added for use within a macro replacement list. The ## (concatenation) operator joins two adjacent preprocessing tokens into a single token. Preprocessing tokens are identifiers, numbers, character constants, string literals, operators and punctuators. The # (stringize) operator converts the macro argument corresponding to the parameter that follows it into a string literal. For example, consider the following definition:
#define debug( s ) printf( "x" # s " = %d\n", x ## s )
The following macro call:
debug( 1 );
expands to:
printf( "x" "1" " = %d\n", x1 );
which after string concatenation (see Section 7) gives the final result:
printf( "x1 = %d\n", x1 );
Program debugging through the use of macros has thus been made simpler.
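To make the expansion checkable, the sketch below adapts the debug macro into a hypothetical variant, debug_to, that writes into a buffer with sprintf instead of printing. The # operator produces "1" and the ## operator pastes x and 1 into the identifier x1. The adjacent string literals are then concatenated into a single format string. The function debug_demo and its local x1 are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Variant of the debug macro: formats into buf instead of printing.
   # s     turns the argument into a string literal ("1");
   x ## s  pastes x and the argument into one identifier (x1). */
#define debug_to( buf, s )  sprintf( buf, "x" # s " = %d\n", x ## s )

/* Hypothetical demo: formats the value of a local variable named x1
   and returns the number of characters written. */
int debug_demo(char *buf)
{
    int x1 = 42;

    /* expands to: sprintf( buf, "x" "1" " = %d\n", x1 ) */
    return debug_to(buf, 1);
}
```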
Five new macros are predefined in the Standard, all of which are expanded to their appropriate values upon file compilation.
| Macro | Expands To |
|---|---|
| `__DATE__` | date of compilation |
| `__TIME__` | time of compilation |
| `__FILE__` | name of the current source file |
| `__LINE__` | current line number |
| `__STDC__` | non-zero value |
The definition of the __STDC__ macro indicates an ANSI-conformant compiler.
None of these macros may be redefined by a program.
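The sketch below uses several of the predefined macros. It assumes the string formats the Standard specifies: __DATE__ has the form "Mmm dd yyyy" (11 characters) and __TIME__ has the form "hh:mm:ss" (8 characters). All function names are hypothetical.

```c
#include <string.h>

/* Returns the source line on which __LINE__ appears. */
int current_line(void)
{
    return __LINE__;
}

/* Returns 1 when compiled by an implementation that defines
   __STDC__ to a non-zero value, as a conforming compiler must. */
int compiled_as_stdc(void)
{
#if defined( __STDC__ ) && __STDC__
    return 1;
#else
    return 0;
#endif
}

/* __DATE__ looks like "Mmm dd yyyy"; __TIME__ looks like "hh:mm:ss". */
size_t date_length(void) { return strlen(__DATE__); }
size_t time_length(void) { return strlen(__TIME__); }
```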
A standardized library of routines aids the programmer and enhances portability. The Standard defines such a library, which is too large to describe here in any detail. The Standard Library is based on the library compiled by /usr/group, a UNIX users' group, with all UNIX dependencies removed.
The Standard Library also provides a set of standard library headers. These headers provide function prototypes for the set of routines that make up the library and define commonly-used macros. As well, the functions and their prototypes have been changed so as to be invariant under the default argument promotions: all are declared using promoted types (such as int and double) for their parameters. Thus arguments passed to a library function will always have the same type, whether or not a prototype is in scope.
Macros may also be defined in a header file to take the place of actual calls to library routines. However, the library routines themselves must still exist, since the user may remove any such macro with an #undef directive at any time.
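The sketch below reaches the actual library function even where <ctype.h> also provides a macro. It uses #undef, as described above, and also shows a related technique the Standard permits: enclosing the function name in parentheses suppresses expansion of a function-like macro. The wrapper names are hypothetical.

```c
#include <ctype.h>

/* A parenthesized name is not followed by '(' directly, so any
   function-like macro toupper is not expanded and the real library
   function is called. */
int upper_via_function(int c)
{
    return (toupper)(c);
}

/* #undef removes any macro version for the rest of the file; the
   real function must still exist for this call to work. */
#undef isdigit
int digit_via_function(int c)
{
    return isdigit(c) != 0;
}
```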
Among the most notable additions to the library are variable argument handling, numeric limits information, and locale (the current environment) information.
K&R library functions have also been converted to the new style and syntax, so that malloc, for example, now returns a void * as opposed to a char *.
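Because malloc now returns void *, its result can be assigned to any object pointer type without a cast. The minimal sketch below shows this. The function sum_of_first is hypothetical, and the convention of returning -1 on allocation failure is an assumption made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, fills them with 0 .. n-1 and
   returns their sum. Returns -1 if the allocation fails. */
long sum_of_first(size_t n)
{
    int *p = malloc(n * sizeof *p);   /* void * converts to int * implicitly */
    long total = 0;
    size_t i;

    if (p == NULL)
        return -1;
    for (i = 0; i < n; i++)
        p[i] = (int) i;
    for (i = 0; i < n; i++)
        total += p[i];
    free(p);
    return total;
}
```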
Numerous other minor changes and additions have occurred throughout the language:
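Trigraph sequences have been added so that C source can be written using character sets that lack some of the characters C requires. Early in translation, each trigraph below is replaced by the corresponding character:

| Character | Trigraph |
|---|---|
| `#` | `??=` |
| `[` | `??(` |
| `\` | `??/` |
| `]` | `??)` |
| `^` | `??'` |
| `{` | `??<` |
| `\|` | `??!` |
| `}` | `??>` |
| `~` | `??-` |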
Identifiers in current programs that are now reserved by the Standard will have to be altered to be portable across compilers.