Friday, June 26, 2020

Critiques on C Language

Critique
 

Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax. They are also among its most frequently criticized features, and often serve as stumbling blocks to beginners. In both cases, historical accidents or mistakes have exacerbated their difficulty. The most important of these was the tolerance of C compilers to errors in type. As should be clear from the above history, C evolved from typeless languages. It did not suddenly appear to its earliest users and developers as an entirely new language with its own rules; instead, we continually had to adapt existing programs as the language developed and make allowance for an existing body of code. (Later, the ANSI X3J11 C standardization committee would face the same problem.)


Compilers in 1977, and even well after, did not complain about usages such as assigning between integers and pointers, or using objects of the wrong type to refer to structure members. Although the language definition presented in the first edition of K&R was reasonably (though not entirely) consistent in its treatment of type rules, the book admitted that existing compilers did not enforce them. Moreover, some rules designed to ease early transitions contributed to later confusion. For example, consider the empty square brackets in the function declaration

int f(a) int a[]; { ... }

They are a living fossil, a remnant of NB's way of declaring a pointer; a is, in this special case only, interpreted in C as a pointer. The notation survived partly for the sake of compatibility, and partly under the rationalization that it would allow programmers to communicate to their readers an intent to pass f a pointer generated from an array, rather than a reference to a single integer. Unfortunately, it serves as much to confuse the learner as to alert the reader.
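As a minimal sketch of this special case, the following uses modern prototype syntax rather than the old-style definition quoted above. The function name sum_elements and its parameter n are hypothetical, chosen only for illustration. The two declarations are accepted as the same function because the empty brackets are adjusted to a pointer.

```c
/* Both declarations describe the same function type: in a parameter list,
   "int a[]" is adjusted to "int *a". (sum_elements is a hypothetical name.) */
int sum_elements(int a[], int n);
int sum_elements(int *a, int n);

int sum_elements(int a[], int n)
{
    int total = 0;
    while (n-- > 0)
        total += *a++;   /* a is a pointer here, so it may be incremented;
                            a genuine array name could not be */
    return total;
}
```

Calling sum_elements(v, 3) with an array v holding 1, 2 and 3 would pass a pointer to v's first element and return 6.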

 

In K&R C, it was the programmer's responsibility to supply arguments of the proper type to a function call, and the existing compilers did not check for type agreement. The failure of the original language to include argument types in the type signature of a function was a significant weakness, indeed the one that required the X3J11 committee's boldest and most painful innovation to repair. The early design is explained (if not justified) by my avoidance of technological problems, especially cross-checking between separately compiled source files, and by my incomplete assimilation of the implications of moving from an untyped to a typed language. The lint program mentioned above tried to alleviate the problem: among its other functions, lint checks the consistency and coherency of an entire program by scanning a set of source files, comparing the types of function arguments used in calls with those in their definitions.
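The effect of putting argument types into the function's type can be sketched as follows. The names half and half_of_seven are hypothetical. With the prototype in scope, the compiler knows the parameter is a double and converts the integer argument. Under the original rules no such information was available at the call site.

```c
/* Prototype: the parameter type is part of the function's type.
   (half and half_of_seven are hypothetical names.) */
double half(double x);

double half(double x)
{
    return x / 2.0;
}

double half_of_seven(void)
{
    /* The int constant 7 is converted to double because the prototype is
       visible; without one, the caller would have passed an int and the
       mismatch would have gone undiagnosed. */
    return half(7);
}
```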



 

An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing.

 

For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively. The style used in expressions carries through to declarations, so the names might be declared

int *fp();

int (*pf)();
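A small sketch of the two declarations in use follows. It writes (void) parameter lists instead of the empty parentheses shown above, and the names value, answer and demo_sum are hypothetical additions for illustration.

```c
static int value = 42;           /* hypothetical object for fp to point at */

/* fp is a function returning a pointer to int. */
int *fp(void)
{
    return &value;
}

/* answer is an ordinary function that pf can designate (hypothetical). */
int answer(void)
{
    return 7;
}

/* pf is a pointer to a function returning int. */
int (*pf)(void) = answer;

int demo_sum(void)
{
    int through_result  = *fp();    /* call fp, then dereference its result */
    int through_pointer = (*pf)();  /* dereference pf, then call the function */
    return through_result + through_pointer;
}
```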

 

In more ornate but still realistic cases, things get worse:

int *(*pfp)();

 

It is a pointer to a function that returns a pointer to an integer. Two effects are occurring. Most importantly, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C (Algol 68, for example) describe objects that are equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an 'inside-out' style that many find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of a prefix, but by then it was too late to change. Despite its difficulties, I believe that C's approach to declarations remains plausible, and I am comfortable with it; it is a useful unifying principle.
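One way to see the inside-out reading of pfp is to build the same type in steps with a typedef. This is only a sketch: the names cell, get_cell, int_ptr_fn, pfp2 and read_both are hypothetical, and (void) parameter lists are used in place of the empty parentheses above.

```c
static int cell = 5;             /* hypothetical object */

int *get_cell(void)              /* a function returning pointer to int */
{
    return &cell;
}

/* Read from the name outward: pfp is a pointer (*pfp) to a function (...)()
   returning a pointer to int (int *...). */
int *(*pfp)(void) = get_cell;

/* The same type assembled in two steps. */
typedef int *int_ptr_fn(void);   /* function returning pointer to int */
int_ptr_fn *pfp2 = get_cell;     /* pointer to such a function */

int read_both(void)
{
    /* Call through each pointer, then dereference the returned pointer. */
    return *(*pfp)() + *pfp2();
}
```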

 

The other characteristic feature of C, its treatment of arrays, is more suspect on practical grounds, although it also has real virtues. Although the relationship between pointers and arrays is unusual, it can be learned. Moreover, the language shows considerable power to describe important concepts, such as vectors whose length varies at run time, with only a few basic rules and conventions. In particular, character strings are handled by the same mechanisms as any other array, plus the convention that a null character terminates a string. It is interesting to compare C's approach with that of two nearly contemporaneous languages, Algol 68 and Pascal [Jensen 74].
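The relationship between arrays and pointers can be sketched with two equivalent loops. The function names sum_indexed and sum_walked are hypothetical. Both receive a pointer to the first element, and the subscript form v[i] means the same as *(v + i).

```c
#include <stddef.h>

/* Sum n elements using subscripts. */
int sum_indexed(const int v[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];               /* v[i] is defined as *(v + i) */
    return total;
}

/* Sum the same elements by walking a pointer from v to v + n. */
int sum_walked(const int *v, size_t n)
{
    int total = 0;
    for (const int *p = v; p != v + n; p++)
        total += *p;
    return total;
}
```

Because only a pointer and a length are passed, the same functions work for vectors of any length, including lengths known only at run time.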

 

Arrays in Algol 68 either have fixed bounds or are 'flexible': considerable mechanism is required both in the language definition and in compilers to accommodate flexible arrays (and not all compilers fully implement them). Original Pascal had only fixed-size arrays and strings, and this proved confining [Kernighan 81]. Later, this was partially fixed, although the resulting language is not yet universally available.

 

C treats strings as arrays of characters conventionally terminated by a marker. Apart from one special rule about initialization by string literals, the semantics of strings are fully subsumed by the more general rules governing all arrays and, as a result, the language is simpler to describe and to translate than one that incorporates the string as a unique data type. Some costs arise from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user. Nevertheless, C's approach to strings works well.
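The cost of searching for the end of a string can be seen in a sketch of a length routine. The name my_length is hypothetical, and the standard library's strlen does the same job. The routine has to scan every character up to the terminating null because the array carries no stored length.

```c
#include <stddef.h>

/* Count the characters before the terminating '\0'.
   (my_length is a hypothetical stand-in for the library's strlen.) */
size_t my_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')   /* linear scan: cost grows with the string's length */
        p++;
    return (size_t)(p - s);
}
```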

 



On the other hand, C's treatment of arrays in general (not just strings) has unfortunate implications both for optimization and for future extensions. The prevalence of pointers in C programs, whether explicitly declared or arising from arrays, means that optimizers must be cautious and must use careful dataflow techniques to achieve good results. Sophisticated compilers can understand what most pointers can possibly change, but some important usages remain difficult to analyze. Functions with pointer arguments derived from arrays, for example, are hard to compile into efficient code on vector machines, because it is seldom possible to determine that one argument pointer does not overlap data also referred to by another argument, or accessible externally. More fundamentally, the definition of C so specifically describes the semantics of arrays that changes or extensions treating arrays as more primitive objects, and permitting operations on them as wholes, become hard to fit into the existing language. Even extensions to permit the declaration and use of multidimensional arrays whose size is determined dynamically are not entirely straightforward [MacDonald 89] [Ritchie 90], although they would make it much easier to write numerical libraries in C. Thus, C covers the most important uses of strings and arrays arising in practice by a uniform and simple mechanism, but leaves problems for highly efficient implementations and for extensions.
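The overlap problem can be sketched with a hypothetical function add_into. Nothing in its declaration tells the compiler whether dst and src refer to overlapping storage, so it cannot freely reorder or vectorize the loop without changing the result for callers who pass overlapping arguments.

```c
#include <stddef.h>

/* Add src[i] into dst[i] for i in [0, n). (add_into is a hypothetical name.)
   If dst and src overlap, an earlier store to dst can change a later load
   from src, so the iterations are not independent. */
void add_into(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

For instance, with an array v of four ones, the call add_into(v + 1, v, 3) leaves v holding 1, 2, 3, 4, because each iteration reads an element that the previous iteration just modified. With non-overlapping arguments the last three elements would all have become 2.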

 

There are, of course, many smaller infelicities in the language and its description besides those discussed above. There are also general criticisms to be made that go beyond detailed points. Chief among these is that the language and its generally expected environment provide little help for writing very large systems. The naming structure provides only two main levels, 'external' (visible everywhere) and 'internal' (within a single procedure). An intermediate level of visibility (within a single file of data and procedures) is only weakly tied to the language definition. There is therefore little direct support for modularization, and project designers are forced to create their own conventions.
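The intermediate, per-file level of visibility is usually obtained with the static keyword at file scope. A minimal sketch follows, imagining the code as the contents of a single source file. The names count, bump and next_id are hypothetical.

```c
/* Imagined contents of one source file. */

static int count;          /* file-level: visible only within this file */

static void bump(void)     /* file-level helper, also hidden from other files */
{
    count++;
}

int next_id(void)          /* external: visible to every file in the program */
{
    int id;                /* internal: visible only inside this function */
    bump();
    id = count;
    return id;
}
```

Anything beyond this, such as grouping several files into one module with a shared private interface, is left to project conventions such as header files and name prefixes.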

 

Similarly, C itself provides two durations of storage: 'automatic' objects that exist while control resides in or below a procedure, and 'static' objects that exist throughout the execution of a program. Off-stack, dynamically allocated storage is provided only by a library routine, and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.
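The two built-in durations and library-provided dynamic storage can be sketched together. The function names calls and make_array are hypothetical. malloc and free are the standard library routines.

```c
#include <stdlib.h>

/* Returns how many times it has been called. (calls is a hypothetical name.) */
int calls(void)
{
    static int n = 0;   /* static: exists for the whole run, keeps its value */
    int step = 1;       /* automatic: exists only while calls() is active */
    n += step;
    return n;
}

/* Returns a dynamically allocated array holding 0, 1, ..., n-1, or NULL on
   failure. The storage outlives this call; the caller must release it with
   free(), since nothing in the language reclaims it automatically. */
int *make_array(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p != NULL) {
        for (size_t i = 0; i < n; i++)
            p[i] = (int)i;
    }
    return p;
}
```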
