Origins of C languages

BCPL was designed by Martin Richards in the mid-1960s while he was visiting MIT, and was used during the early 1970s for several interesting projects, among them the OS6 operating system at Oxford [Stoy 72] and parts of the seminal Alto work at Xerox PARC [Thacker 79]. We became familiar with it because the MIT CTSS system [Corbato 62], on which Richards worked, was used for Multics development. The original BCPL compiler was transported both to Multics and to the GE-635 GECOS system by Rudd Canaday and others at Bell Labs [Canaday 69]. During the final throes of Multics's life at Bell Labs, and immediately afterwards, it was the language of choice among the group of people who would later become involved with Unix.

BCPL, B, and C all fit firmly into the traditional procedural family typified by Fortran and Algol 60. They are particularly oriented towards system programming, are small and compactly described, and can be translated by simple compilers. They are close to the machine in that the abstractions they introduce are readily grounded in the concrete data types and operations supplied by conventional computers, and they rely on library routines for input-output and other interactions with the operating system. With less success, they also use library procedures to specify interesting control constructs, such as coroutines and procedure closures. At the same time, their abstractions lie at a sufficiently high level that, with care, portability between machines can be achieved.

BCPL, B, and C differ syntactically in many details, but broadly they are similar. Programs consist of a sequence of global declarations and function (procedure) declarations. Procedures can be nested in BCPL, but may not refer to non-static objects defined in containing procedures. B and C avoid this restriction by imposing a more severe one: no nested procedures at all. Each of the languages (except for the earliest versions of B) recognizes separate compilation and provides a means for including text from named files.

Several syntactic and lexical mechanisms of BCPL are more elegant and regular than those of B and C. For example, BCPL's procedure and data declarations have a more uniform structure, and it supplies a more complete set of looping constructs. Although BCPL programs are notionally supplied from an undelimited stream of characters, clever rules allow most semicolons to be elided after statements that end on a line boundary.

B and C omit this convenience, and end most statements with semicolons. Despite the differences, most of the statements and operators of BCPL map directly onto their B and C counterparts.

Some of the structural differences between BCPL and B stemmed from limitations on intermediate memory. For example, BCPL declarations may take the form

let P1 be command
and P2 be command
and P3 be command
...

Here the program text represented by the commands contains whole procedures. The subdeclarations connected by and occur simultaneously, so the name P3 is known inside procedure P1. Similarly, BCPL can package a group of declarations and statements into an expression that yields a value, for example

E1 := valof ( declarations ; commands ; resultis E2 ) + 1

The BCPL compiler readily handled such constructs by storing and analyzing a parsed representation of the entire program in memory before producing any output. Storage limitations on the B compiler demanded a one-pass technique in which output was generated as soon as possible, and the syntactic redesign that made this possible was carried forward into C.

Certain less pleasant aspects of BCPL owed to its own technological problems and were consciously avoided in the design of B. For example, BCPL uses a 'global vector' mechanism to communicate between separately compiled programs.

Other fiddles in the transition from BCPL to B were introduced as a matter of taste, and some remain controversial, such as the decision to use the single character = for assignment instead of :=. Similarly, B uses /* */ to enclose comments, whereas BCPL uses // to ignore text up to the end of the line. The legacy of PL/I is evident here. (C++ has resurrected the BCPL comment convention.)

Fortran influenced the syntax of declarations: B declarations begin with a specifier like auto or static, followed by a list of names, and C not only followed this style but ornamented it by placing its type keywords at the start of declarations.

Not every difference between the BCPL language documented in Richards's book [Richards 79] and B was deliberate; we started from an earlier version of BCPL [Richards 67]. For example, the endcase that escapes from a BCPL switchon statement was not present in the language when we learned it in the 1960s, so the overloading of the break keyword to escape from the B and C switch statement owes to divergent evolution rather than conscious change.

In contrast to the pervasive syntax variation that occurred during the creation of B, the core semantic content of BCPL (its type structure and expression evaluation rules) remained intact. Both languages are typeless, or rather have a single data type, the 'word' or 'cell', a fixed-length bit pattern. Memory in these languages consists of a linear array of such cells, and the meaning of the contents of a cell depends on the operation applied. The + operator, for example, simply adds its operands using the machine's integer add instruction, and the other arithmetic operations are equally unconscious of the actual meaning of their operands. Because memory is a linear array, the value in a cell can be interpreted as an index into this array, and BCPL supplies an operator for this purpose. In the original language it was spelled rv, and later !, while B uses the unary *. Thus, if p is a cell containing the index of (or the address of, or a pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.

Because pointers in BCPL and B are merely integer indices into the memory array, arithmetic on them is meaningful: if p is the address of a cell, then p+1 is the address of the next cell. This convention is the basis for the semantics of arrays in both languages. When one writes in BCPL

let V = vec 10

or in B,

auto V[10];

the effect is the same: a cell named V is allocated, then another group of 10 contiguous cells is set aside, and the memory index of the first of these is placed into V. By a general rule, in B the expression

*(V+i)

adds V and i, and refers to the i-th location after V. Both BCPL and B add special notation to sweeten such array accesses; in B, the equivalent expression is

V[i]

and in BCPL

V!i

Even at that time, this approach to arrays was unusual; C would later assimilate it in an even less conventional manner.

None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements the general rules with a few conventions. In both BCPL and B, a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count, and strings are terminated by a special character, which B spelled '*e'. This change was made partly to avoid the limitation on the length of a string caused by holding the count in an 8-bit or 9-bit slot, and partly because, in our experience, maintaining the count seemed less convenient than using a terminator.

Individual characters in a BCPL string were usually manipulated by spreading the string out into another array, one character per cell, and then repacking it later; B provided corresponding routines, but people more often used other library functions that accessed or replaced individual characters in a string.
