B

THE PROGRAMMING LANGUAGE B

THE PROGRAMMING LANGUAGE B


S. C. Johnson
B. W. Kernighan
Bell Laboratories
Murray Hill, New Jersey


ABSTRACT

B is a computer language designed by D. M. Ritchie and K. L.
Thompson, for primarily non-numeric applications such as system
programming. These typically involve complex logical
decision-making, and processing of integers, characters, and bit strings.
On the H6070 TSS system, B programs are usually much easier to
write and understand than assembly language programs, and object
code efficiency is almost as good. Implementation of simple TSS
subsystems is an especially appropriate use for B.
This technical report contains a description of the MH-TSS
(Honeywell 6070) version of B (by S. C. Johnson), and a tutorial
introduction to most of the features of the language (by B. W.
Kernighan).


DMR note, June 1997:

This WWW page is a rendition of
Bell Laboratories Computing Science Technical
Report #8: The Programming Language B
, January, 1973.
It was scanned using Adobe OCR software and
the version here was edited by Dennis Ritchie.
It is divided into two sections, each in several formats:

For scholars, the page images are also available:

  • PDF Tutorial is a scanned PDF image of
    the tutorial. Caution: 1.2MB in size.
  • PDF Reference is a scanned PDF image of
    the reference. Caution: 1.4MB in size.

The document seems to exist only on (partially) original
paper printed on a Teletype model 37 terminal. It uses underlining
for emphasis. You need to look at the PDF scans to verify any
typos I might have introduced in cleaning up the OCR, which was
pretty good except where there was underlining or double-quote
characters; they tended to merge into the line above. I
avoided the urge to redact the original except for a few
obvious mistakes, in particular some missing semicolons
in the syntax for some of the commands.

When this CSTR was issued, which was probably
some months after the papers were written,
the use of B was growing
on the local Honeywell GCOS system.
Its time-sharing facility was
called MH-TSS here, and it was then the main computation
facility at Bell Labs in Murray Hill, NJ.
By this time, use of B in the early Unix system was already pretty
much at an end; early C had already taken over
(see The Development of the
C Language
). In fact, by September 1973, the operating system
had already been translated into C and most of the B utilities
converted.

The memos shown here were based on an earlier document,
User’s Reference to B,
by Ken Thompson.
The Unix dialect of B closely followed the Honeywell version
described here–the compiler front-ends were the same,
but of course the Unix system calls were present, and the
GCOS-specific I/O stuff absent. The basic library must
have looked very similar.

Indeed, there were never very many B programs on early Unix.
The compiler itself was written in B, and a few of the utilities,
for example the first /etc/glob, which expanded wild-card
characters for the shell.
This is because no compiler to machine code for the PDP-7 or PDP-11 was
ever built; the Unix B compilers produced interpreted, threaded
code that wasn’t efficient enough to write a whole system in.

On the other hand, B, with a real compiler, flourished in a modest way on the Honeywell
machines, as indicated by this CSTR.
Moreover, it had direct use and even progeny elsewhere, especially
at the University of Waterloo in Ontario.
It apparently lives on today: see
Thinkage Ltd. UW Tools Package,
for example.

A final note of possible historical interest or amusement:
so far as Brian and I can remember, the Tutorial contains the
first instance of a “Hello, world” program.

A TUTORIAL INTRODUCTION
TO THE LANGUAGE B

B. W. Kernighan
Bell Laboratories
Murray Hill, New Jersey

1. Introduction

B is a new computer language designed and implemented at Murray
Hill. It runs and is actively supported and documented on the
H6070 TSS system at Murray Hill.

B is particularly suited for non-numeric computations, typified
by system programming. These usually involve many complex logical decisions, computations on integers and fields of words,
especially characters and bit strings, and no floating point. B
programs for such operations are substantially easier to write
and understand than GMAP programs. The generated code is quite
good. Implementation of simple TSS subsystems is an especially
good use for B.

B is reminiscent of BCPL [2] , for those who can remember. The
original design and implementation are the work of K. L. Thompson
and D. M. Ritchie; their original 6070 version has been substantially improved by S. C. Johnson, who also wrote the runtime
library.

This memo is a tutorial to make learning B as painless as possible. Most of the features of the language are mentioned in passing, but only the most important are stressed. Users who would
like the full story should consult A User’s Reference to B on
MH-TSS, by S. C. Johnson [1], which should be read for details
anyway.

We will assume that the user is familiar with the mysteries of
TSS, such as creating files, text editing, and the like, and has
programmed in some language before.

Throughout, the symbolism (->n) implies that a topic will be
expanded upon in section n of this manual. Appendix E is an
index to the contents.

2. General Layout of Programs

   main( ) {
   -- statements --
   }

   newfunc(arg1, arg2) {
   -- statements --
   }

   fun3(arg) {
   -- more statements --
   }

All B programs consist of one or more “functions”, which are
similar to the functions and subroutines of a Fortran program, or
the procedures of PL/I.
main
is such a function, and in fact all
B programs must have a
main.
Execution of the program begins at
the first statement of
main,
and usually ends at the last.
main
will usually invoke other functions to perform its job, some coming from the same program, and others from libraries. As in Fortran, one method of communicating data between functions is by
arguments. The parentheses following the function name surround
the argument list; here
main
is a function of no arguments, indicated by ().
The {} enclose the statements of the function.

Functions are totally independent as far as the compiler is concerned, and
main
need not be the first,
although execution starts
with it. This program has two other functions:
newfunc
has two
arguments, and
fun3
has one. Each function consists of one or
more statements which express operations on variables and
transfer of control. Functions may be used recursively at little
cost in time or space (->30).

Most statements contain expressions, which in turn are made up of
operators, names, and constants, with parentheses to alter the
order of evaluation. B has a particularly rich set of operators
and expressions, and relatively few statement types.

The format of B programs is quite free; we will strive for a uniform style as a good programming practice. Statements can be
broken into more than one line at any reasonable point, i.e.,
between names, operators, and constants. Conversely, more than
one statement can occur on one line. Statements are separated by
semicolons. A group of statements may be grouped together and
treated as a single statement by enclosing them in {}; in this
case, no semicolon is used after the ‘}’ . This convention is
used to lump together the statements of a function.

3. Compiling and Running B Programs

   (a) SYSTEM? filsys cf bsource,b/1,100/<br />   (b) SYSTEM? filsys cf bhs,b/6,6/,m/r/<br />   (c) SYSTEM? [create source program with any text editor]<br />   (d) SYSTEM? ./bj (w) bsource bhs<br />   (e) SYSTEM? /bhs<br />

(a) and (b) need only be done once; they create file space for
the source program and the loaded program. (d) compiles the
source program from
bsource,
and after many machinations, creates a program in
bhs.
(e) runs that program. ./bj is more fully
described in [1].

You may from time to time get error messages from the B compilers
These look like either of

   ee n<br />   ee name n<br />

where ee is a two-character error message, and
n
is the approximate source line number of the error. (->Appendix A)

4. A Working B Program; Variables

main( ) {<br />  auto a, b, c, sum;<br /><br />  a = 1; b = 2; c = 3;<br />  sum = a+b+c;<br />  putnumb(sum);<br />}<br />

This simple program adds three numbers, and prints the sum. It
illustrates the use of variables and constants, declarations, a
bit of arithmetic, and a function call. Let us discuss them in
reverse order.

putnumb
is a library function of one argument, which will print a
number on the terminal (unless some other destination is specified (->28)).
The code for a version of
putnumb
can be found in
(->30). Notice that a function is invoked by naming it, followed
by the argument(s) in parentheses. There is no CALL statement as
in Fortran.

The arithmetic and the assignment statements are much the same as
in Fortran, except for the semicolons. Notice that we can put
several on a line if we want. Conversely, we can split a statement among lines if it seems desirable – the split may be between
any of the operators or variables, but not in the middle of a
variable name.

One major difference between B and Fortran is that B is a
typeless
language: there is no analog in B of the Fortran IJKLMN
convention. Thus a, b, c, and sum are all 36-bit quantities, and
arithmetic operations are integer. This is discussed at length
in the next section (->5).

Variable names have one to eight ascii characters, chosen from
A-Z, a-z, ., _, 0-9, and start with a non-digit. Stylistically,
it’s much better to use only a single case (upper or lower) and
give functions and external variables (->7) names that are unique
in the first six characters. ( Function and external variable
names are used by batch GMAP, which is limited to six character
single-case identifiers. )

The statement “auto …” is a declaration, that is, it defines
the variables to be used within the function. auto in particular
is used to declare local variables, variables which exist only
within their own function
(main
in this case). Variables may be
distributed among auto declarations arbitrarily,
but all declarations must precede executable statements.
All variables must be
declared, as one of auto, extrn (->7), or implicitly as function arguments (->8).

auto
variables are different from Fortran local variables in one
important respect – they appear when the function is entered, and
disappear when it is left. They cannot be initialized at compile
time, and they have no memory from one call to the next.

5. Arithmetic; Octal Numbers

All arithmetic in B is integer, unless special functions are
written. There is no equivalent of the Fortran IJKLMN convention, no floating point, no data types, no type conversions, and
no type checking. Users of double-precision complex will have to
fend for themselves. (But see ->32 for how to call Fortran
subroutines.)

The arithmetic operators are the usual ‘+’, ‘-’, ‘*’, and ‘/’
(integer division). In addition we have the remainder operator
‘%’:

   x = a%b<br />

sets x to the remainder after a is divided by b (both of which
should be positive).

Since B is often used for system programming and bit-manipulation, octal numbers are an important part of the
language. The syntax of B says that any number that begins with
0 is an octal number (and hence can’t have any 8′s or 9′s in it).
Thus 0777 is an octal constant, with decimal value 511.

6. Characters; putchar; Newline

main( ) {<br />  auto a;<br />  a= 'hi!';<br />  putchar(a);<br />  putchar('*n' );<br />}<br />

This program prints “hi!” on the terminal it illustrates the use
of character constants and variables. A “character” is one to
four ascii characters, enclosed in single quotes. The characters
are stored in a single machine word, right-justified and zero-filled. We used a variable to hold “hi!”, but could have used a
constant directly as well.

The sequence “*n” is B Jargon for “newline character”, which,
when printed, skips the terminal to the beginning of t:e next
line. No output occurs until putchar encounters a “*n” or the
program terminates. Omission of the “*n”, a common error, causes
all output to appear on one line, which is usually not the
desired result. There are several other escapes like “*n” for
representing hard-to-get characters (->Appendix B). Notice that
“*n” is a single character: putchar(‘hi!*n’) prints
four characters (the maximum with a single call).

Since B is a typeless language, arithmetic on characters is quite
legal, and even makes sense sometimes:

c = c+'A'-'a';<br />

converts a single character stored in c to upper case (making use
of the fact that corresponding ascii letters are a fixed distance
apart).

7. External Variables

main( ) {<br /> extrn a, b, c;<br /> putchar(a); putchar(b); putchar(c); putchar('!*n');<br />}<br /><br />a 'hell';<br />b 'o, w';<br />c 'orld';<br />

This example illustrates
external variables,
variables which are
rather like Fortran COMMON, in that they exist external to all
functions, and are (potentially) available to all functions. Any
function that wishes to access an external variable must contain
an
extrn
declaration for it. Furthermore, we must define all
external variables outside any function% For our example
variables a, b, and c, this is done in the last three lines.

External storage is static, and always remains in existence, so
it can be initialized. (Conversely,
auto
storage appears
whenever the function is invoked, and disappears when the
function is left.
auto
variables of the same name in different
functions are unrelated. ) Initialization is done as shown: the
initial value appears after the name, as a constant. We have
shown character constants; other forms are also possible. External variables are initialized to zero if not set explicitly.

8. Functions

main( ) {<br />  extrn a,b,c,d;<br />  put2char(a,b) ;<br />  put2char(c,d) ;<br />}<br /><br />put2char(x,y) {<br />putchar(x);<br />putchar(y);<br />}<br /><br />a 'hell'; b 'o, w'; c 'orld'; d '!*n';<br />

This example does the same thing as the last, but uses a separate
(and quite artificial) function of two arguments,
put2char.
The
dummy names x and y refer to the arguments throughout putchar;
they are not declared in an auto statement.
We could equally
well have made a,b,c,d
auto
variables, or have called
put2char
as

put2char( 'hell','o, w');<br />  etc.<br />

Execution of
put2char
terminates when it runs into the closing ‘}’,
as in
main;
control returns to the next statement of
the calling program.

9. Input; getchar

main( ) {<br />  auto c;<br />  c= getchar();<br />  putchar(c);<br />}<br />

getchar
is another library IO function. It fetches one character
from the input line each time it is called, and returns that
character, right adjusted and zero filled, as the value of the
function. The actual
implementation of
getchar
is to read an
entire line, and then
hand out the characters one at a time.
After the ‘*n’ at the
end of the line has been returned, the next
call to getchar will cause an entire new line to be read, and its
first character returned. Input is generally from the terminal (->28).

10. Relational Operators

	== equal to (".EQ." to Fortraners)<br />	!= not equal to<br />	> greater than<br />	< less than<br />	>= greater than or equal to<br />	<= less than or equal to<br />

These are the relational operators; we will use ‘!=’ (“not equal
to) in the next example. Don’t forget that the equality test is
‘==’; using just one ‘=’ leads to disaster of one sort or another. Note: all comparisons are arithmetic, not logical.

11 if; goto; Labels; Comments

main( ) {<br />  auto c;<br />read:<br />  c= getchar();<br />  putchar(c);<br />  if(c != '*n') goto read;<br />}               /* loop if not a newline */<br />

This function reads and writes one line. It illustrates several
new things. First and easiest is the comment, which is anything
enclosed in /*…*/, just as in PL/I.

read
is a label – a name followed by a colon ‘:’, also just as in
PL/I. A statement can have several labels if desired. (Good
programming practice dictates using few labels, so later examples
will strive to get rid of them.) The goto references a label in
the same function.

The
if
statement is one of four conditionals in B. (while (->14),
switch (->26), and “?:” (->16) are the others.) Its general form is

	if (expression) statement<br />

The condition to be tested is any expression enclosed in
parentheses. It is followed by a statement. The expression is
evaluated, and if its value is non-zero, the statement is
executed. One of the nice things about B is that the statement
can be made arbitrarily complex (although we’ve used very simple
ones here) by enclosing simple statements in {} (->12)

12. Compound Statements

      if(a<b) {<br />	t = a;<br />	a = b;<br />	b = t;<br />      }<br />

As a simple example of compound statements, suppose we want to
ensure that a exceeds b, as part of a sort routine. The interchange of a and be takes three statements in B; they are grouped
together by {} in the example above.

As a general rule in B, anywhere you can use a simple statement,
you can use any compound statement, which is just a number of
simple or compound ones enclosed in {}. Compound statements are
not followed by semicolons.

The ability to replace single statements by complex ones at will
is one thing that makes B much more pleasant to use than Fortran.
Logic which requires several GOTO’s and labels in Fortran can be
done in a simple clear natural way using the compound statements
of B.

13. Function Nesting

        read:<br />           c = putchar(getchar());<br />           if (c != '*n') goto read;<br />

Functions which return values (as all do, potentially ->24) can
be used anywhere you might normally use an expression. Thus we
can use
getchar
as the argument of
putchar,
nesting the functions.
getchar
returns exactly the character that
putchar
needs
as an argument.
putchar
also passes its argument back as a
value, so we assign this to c for later reference.

Even better, if c won’t be needed later, we can say

         read:<br />          if (putchar(getchar()) != '*n') goto read;<br />

14. While statement; Assignment within an Expression

     while ((c=getchar()) != '*n') putchar(c);<br />

Avoiding
isgoto‘s
Thisgoto‘s.
the previous one. (The difference? This one doesn’t print the
terminating ‘*n’.) The
while
statement is basically a loop statement, whose general form is

   while (expression) statement<br />

As in the
if
statement, the expression and the statement can both
be arbitrarily complicated, although we haven’t seen that yet.
The action of while is

    (a) evaluate the expression<br />    (b) if its value is true (i. e., not zero), do the<br />        statement and return to (a)<br />

Our example gets the character and assigns it to c, and tests to
see if it’s a ‘*n’. If it isn’t, it does the statement part of
the
while,
a
putchar,
and then repeats.

Notice that we used an assignment statement “c=getchar()’” within
an expression. This is a handy notational shortcut, usually produces better code, and is sometimes necessary if a statement is
to be compressed. This works because an assignment statement has
a value, just as any other expression does. This also implies
that we can successfully use multiple assignments like

  x = y = f(a)+g(b)<br />

As an aside, the extra pair of parentheses in the assignment
statement within the conditional were really necessary: if we had
said

  c = getchar() != '*n'<br />

c would be set to 0 or 1 depending on whether the character
fetched was a newline or not. This is because the binding
strength of the assignment operator ‘=’ is lower than the relational operator ‘!=’. (-> Appendix D)

15. More on Whiles; Null Statement; exit

main( ) {<br />  auto c;<br />  while (1) {<br />    while ( (c=getchar()) != ' ')<br />      if (putchar(c) == '*n') exit();<br />    putchar( '*n' );<br />    while ( (c=getchar()) == ' '); /* skip blanks */<br />    if (putchar(c)=='*n') exit(); /* done when newline */<br />    }<br />}<br />

This terse example reads one line from the terminal, and prints
each non-blank string of characters on a separate line: one or
more blanks are converted into a single newline.

Line four fetches a character, and if it is non-blank, prints it,
and uses the library function
exit
to terminate execution if it’s
a newline. Note the use of the assignment within an expression
to save the value of c.

Line seven skips over multiple blanks. It fetches a character,
and if it’s blank, does the statement ‘;’. This is called a null
statement – it really does nothing. Line seven is just a very
tight loop, with all the work done in the expression part of the
while clause.

The slightly odd-looking construction

  while(1){<br />    ---<br />  }<br />

is a way to get an infinite loop. “1″ is always non-zero; so the
statement part (compound here!) will always be done. The only
way to get out of the loop is to execute a goto (->11) or break
(->26), or return from the function with a
return
statement
(->24), or terminate execution by
exit.

16. Else Clause; Conditional expressions

      if (expression) statement else statement2<br />

is the most general form of the
if
- the
else
part is sometimes
useful. The canonical example sets x to the minimum of a and b:

      if (a<b) x=a; else x=b;<br />

If’s and else’s can be used to construct logic that branches one
of several ways, and then rejoins, a common programming structure, in this way:

      if(---)<br />        { -----<br />          ----- }<br />       else if(---)<br />        { -----<br />          ----- }<br />       else if(----)<br />        { -----<br />          ----- }<br />

etc.

The conditions are tested in order, and exactly one block is executed – the first one whose if is satisfied. When this block is
finished, the next statement executed is the one after the last
“else if” block.

B provides an alternate form of conditional which is often more
concise and efficient. It is called the “conditional expression”
because it is a conditional which actually has a value and can be
used anywhere an expression can. The last example can best be
expressed as

 x = a<b ? a : b;<br />

The meaning is: evaluate the expression to the left of ‘?’. If it
is true, evaluate the expression between ‘?’ and ‘:’ and assign
that value to x. If it’s false, evaluate the expression to the
right of ‘:’ and assign its value to x.

17. Unary Operators

  auto k;<br />  k = 0;<br />  while (getchar() != '*n') ++k;<br />

B has several interesting unary operators in addition to the
traditional ‘-’. The program above uses the increment operator
‘++’ to count the number of characters in an input line. “++k”
is equivalent to “k=k+1″ but generates better code. ‘++’ man
also be used after a variable, as in

          k++<br />

The only difference is this. suppose k is 5. Then

            x = ++k<br />

sets k to 6 and then sets x to k, i.e., to 6. But

             x = k++<br />

sets x to 5, and then k to 6.

Similarly, the unary decrement operator ‘–’ can be used either
before or after a variable to decrement by one.

18. Bit Operators

   c = o;<br />   i = 36;<br />   while ((i = i-9) >= 0) c = c | getchar()<<i;<br />

B has several operators for logical bit-manipulation. The example above illustrates two, the OR ‘|’ , and the left shift <<.
This code packs the next four 9-bit characters from the input
into the single 36-bit word c, by successively left-shifting the
character by i bit positions to the correct position and OR-ing
it into the word.

The other logical operators are AND ‘&’, logical right shift
‘>>’, exclusive OR ‘^’, and 1′s complement ‘~’.

  x = ~y&0777<br />

sets x to the last 9 bits of the 1′s complement of y, and

 x = x&077<br />

replaces x by its last six bits.

The order of evaluation of unparenthesized expressions is determined by the relative binding strengths of the operators involved
(-> Appendix D).

19. Assignment operators

An unusual feature of B is that the normal binary operators like
‘+’, ‘-’, etc. can be combined with the assignment operator =
to form new assignment operators. For example,

  x =- 10<br />

means “x=x-10″. ‘=-’ is an assignment operator. This convention
is a useful notational shorthand, and usually makes it possible
for B to generate better code. But the spaces around the operator are critical! For instance,

  x = -10<br />

sets x to -10, while

  x =- 10<br />

subtracts 10 from x. When no space is present,

  x=-10<br />

also decreases x by 10. This is quite contrary to the experience
of most programmers.

Because assignment operators have lower binding strength than any
other operator, the order of evaluation should be watched
carefully:

  x = x<<y|z<br />

means “shift x left y places, then OR in z, and store in x”. But

  x =<< y|z<br />

means “shift x left by y|z places”, which is rather different.

The character-packing example can now be written better as

  c= 0;<br />  i = 36;<br />  while ((i =- 9) >= 0) c =| getchar()<<i;<br />

20. Vectors

A vector is a set of consecutive words, somewhat like a one-dimensional array in Fortran. The declaration

  auto v[10];<br />

allocates 11 consecutive words? v[0], v[1], …, v[10] for v.
Reference to the i-th element is made by v[i] ; the [] brackets
indicate subscripting. (There is no Fortran-like ambiguity
between subscripts and function call.)

  sum = o;<br />  i = 0;<br />  while (i<=n) sum =+ V[i++];<br />

This code sums the elements of the vector v. Notice the increment
within the subscript – this is common usage in B. Similarly, to
find the first zero element of v,

  i = -1;<br />  while (v[++i]);<br />

21. External Vectors

Vectors can of course be external variables rather than
auto.
We declare v external within the function (don’t mention a size) by

  extrn v;<br />

and then outside all functions write

  v[10];<br />

External vectors may be initialized just as external simple
variables:

  v[10]  'hi!', 1, 2, 3, 0777;<br />

sets v[0] to the character constant ‘hi!’, and V[1] through v[4]
to various numbers. v[5] through v[10] are not initialized.

22. Addressing

To understand all the ramifications of vectors, a digression into
addressing is necessary. The actual implementation of a vector
consists of a word v which contains in its right half the address
of the 0-th word of the vector; i.e., v is really a pointer to
the actual vector.

This implementation can lead to a fine trap for the unwary. If
we say

  auto u[10], v[10];<br />  ---<br />  u = v;<br />  v[0] = ...<br />  v[1] = ...<br />  ---<br />

the unsuspecting might believe that u[0] and u[1] contain the old
values of v[0] and v[1]; in fact, since u is just a pointer to
v[0] , u actually refers to the new content. The statement “u=v”
does not cause any copy of information into the elements of u;
they may indeed be lost because u no longer points to them.

The fact that vector elements are referenced by a pointer to the
zeroeth entry makes subscripting trivial – the address of v[i] is
simply v+i. Furthermore, constructs like v[i][j] are quite
legal: v[i] is simply a pointer to a vector, and v[i][j] is its
j-th entry. You have to set up your own storage, though.

To facilitate manipulation of addresses when it seems advisable,
B provides two unary ad~ress operators, ‘*’ and ‘&’. ‘&’ is the
address operator so &x is the address of x, assuming it has
one. ‘*’ is the indirection operator; “*x” means “use the content of x as an address.”

Subscripting operations can now be expressed differently:

  x[0]  is equivalent to *x<br />  x[1]  is equivalent to *(x+1)<br />  &x[0] is equivalent to x<br />

23. Strings

A string in B is in essence a vector of characters. It is written as

  "this is a string"<br />

and, in internal representation, is a vector with characters
packed four to a word, left justified, and terminated by a mysterious character known as ‘*e’ (ascii EOT). Thus the string

  "hello"<br />

has six characters, the last of which is ‘*e’. Characters are
numbered from zero. Notice the difference between strings and
character constants. Strings are double-quoted, and have zero or
more characters; they are left justified and terminated by an
explicit delimiter, ‘*e’. Character constants are single-quoted,
have from one to four characters, and are right justified and
zero-filled.

Some characters can only be put into strings by “escape sequences” (-> Appendix B ). For instance, to get a double quote
into a string, use ‘*”‘, as in

  "He said, *"Let's go.*""<br />

External variables may be initialized to string values simply by
following their names by strings, exactly as for other constants.
These definitions initialize a, b, and three elements of v:

   a "hello"; b "world'*;<br />   v[2] "now is the time", "for all good men",<br />       "to come to the aid of the party";<br />

B provides several library functions for manipulating strings and
accessing characters within them in a relatively machine-independent way.
These are all discussed in more detail in section 8.2 of [1]. The two most commonly used are
char
and
lchar.
char(s,n) returns the n-th character of s, right justified and
zero-filled. lchar(s,n,c) replaces the n-th character of s by c,
and returns c. (The code for
char
is given in the next section.)

Because strings are vectors, writing “s1=s2″ does not copy the
characters in s2 into s1, but merely makes s1 point to s2. To
copy s2 into s1, we can use char and lchar in a function like
strcopy:

   strcopy(sl ,s2){<br />   auto i;<br />   i = 0;<br />   while (lchar(sl,i,char(s2,i)) != '*e') i++;<br />  }<br />

24. Return statement

How do we arrange that a function like char returns a value to
its caller? Here is
char:

   char(s, n){<br />     auto y,sh,cpos;<br />     y = s[n/4];        /* word containing n-th char */<br />     cpos = n%4;        /* position of char in word */<br />     sh = 27-9*cpos;    /* bit positions to shift */<br />     y =  (y>>sh)&0777; /* shift and mask all but 9 bits */<br />     return(y);         /* return character to caller */<br />   }<br />

A
return
in a function causes an immediate return to the calling
program. If the
return
is followed by an expression in
parentheses, this expression is evaluated, and the value returned
to the caller. The expression can be arbitrarily complex, of
course:
char
is actually defined as

   char(s,n) return((s[n/4]>>(27-9*(n%4)))&0777);<br />

Notice that char is written without {}, because it can be expressed as a simple statement.

25. Other String Functions

concat(s,al,a2,…,a10) concatenates the zero or more strings ai
into one big string s, and returns s (which means a pointer to
S[0]). s must be declared big enough to hold the result
(including the terminating ‘*e’).

getstr(s) fetches the entire next line from the input unit into
s, replacing the terminating ‘*n’ by ‘*e’. In B, it is

   getstr(s){<br />   auto c,i;<br />   while ((c=getchar()) != '*n') lchar(s,i++,c);<br />   lchar(s,i,'*e');<br />   return(s) ;<br />   }<br />

Similarly putstr(s) puts the string s onto the output unit,
except for the final ‘*e’. Thus

   auto s[20] ;<br />   putstr(getstr(s)); putchar('*n');<br />

copies an input line to the output.

26. Switch statement; break

     loop:<br />       switch (c=getchar()){<br /><br />        pr:<br />              case 'p':  /* print */<br />              case 'P':<br />                print();<br />                goto loop;<br /><br />              case 's':  /* subs */<br />              case 'S':<br />                subs() ;<br />                goto pr;<br /><br />              case 'd':  /* delete */<br />              case 'D':<br />                delete();<br />                goto loop;<br /><br />              case '*n': /* quit on newline */<br />                break;<br /><br />              default:<br />                error();  /* error if fall out */<br />                goto loop;<br /><br />        }<br />

This fragment of a text-editor program illustrates the
switch
statement, which is a multi-way branch.
Here, the branch is controlled by a character read into c. If c is ‘p’ or ‘P’, function
print
is invoked. If c is an ‘s’ or ‘S’,
subs
and
print
are both
invoked. Similar actions take place for other values.
If c does
not mention any of the values mentioned in the case statements,
the default statement (in this case, an error function) is invoked.

The general form of a switch is

      switch (expression) statement<br />

The statement is always complex, and contains statements of the
form

         case constant:<br />

The constants in our example are characters, but they could also
be numbers. The expression is evaluated, and its value matched
against the possible cases. If a match is found, execution continues at that case. (Execution may of course fall through from
one case to the next, and may transfer around within the switch,
or transfer outside it.) Notice we put multiple cases on several
statements.

If no match is found, execution continues at the statement labeled
default:
if it exists; if there is no match and no
default,
the entire statement part of the switch is skipped.

The
break
statement forces an immediate transfer to the next
statement after the switch statement, i.e., the statement after
the {} which enclose the statement part of the
switch.
break
may also be used in while statements in an analogous manner.
break
is a useful way to avoid useless labels.

27. Formatted IO

In this section, we point to the function
printf,
which is used
for producing formatted IO.
This function is so useful that it
is actually placed in Appendix C for ready reference.

28. Lots More on IO

File IO is described in much more detail in [1], section 8. This
discussion does only the easy things.

Basically, all IO in B is character-oriented, and based on
getchar and putchar. By default, these functions refer to the
terminal, and IO goes there unless explicit steps are taken to
divert it. The diversion is easy and systematic, so IO unit
switching is quite feasible.

At any time there is one input unit and one output unit;
getchar
and
putchar
use these for their input and output by implication.
The unit is a number between -1 and 10 (rather like Fortran file
numbers). The current values are stored in the external variables
rd.unit
and
wr.unit
so they can be changed easily, although
most programs need not use them directly. When execution of a B
program begins,
rd.unit
is 0, which means “read from terminal”,
and
wr.unit
is -1, meaning “write on terminal”.

To do input from a file, we set
rd.unit
to a value, and assign a
file name to that unit. Then, until the assignment or the value
is changed,
getchar
reads its characters from that file. Thus,
to open for reading an ascii permanent file called “cat/file”, as
input unit 5,

   openr(5, "cat/file")<br />

The first argument is the unit number, and the second is a
cat/file description. (An empty string means the terminal.)
rd.unit
is set to 5 by
openr
so successive calls to
getchar,
getstr,
etc., will fetch characters from this file. If the file
is non-existent or if we read to end of file, all successive
calls produce ‘*e’.

Writing is much the same:

   openw(u,s)<br />

opens ascii file s as unit u for writing. Successive calls to
putchar,
putstr,
printf,
etc., will place output on file s.

If the IO unit mentioned in
openr
or
openw
is already open, it is
closed before the next use. That is, any output for output files
is flushed and end of file written; any input from input files is
thrown away. Files are also closed this way upon normal termination.

Programs which have only one input and one output unit open
simultaneously need not reference
rd.unit
and
wr.unit
explicitly,
since the opening routines will do It automatically. Otherwise,
don’t forget to declare them in an
extrn
statement.

The function
flush
closes the current output unit without writing
a terminating newline character. If this file is the terminal,
this is the way to force a prompting question before reading a
response on the same line:

   putstr("old or new?");<br />   flush();<br />   getstr(s);<br />

prints

<tt>         old or new?<br /></tt>

and reads the response from the same line.

File names may only be “cat/file”, “/file”, or “file”: no permissions, passwords subcatalogs or alternate names are allowed.
File names containing a ‘/’ are assumed to be permanent files; if
you try to open a permanent file which is already accessed, it is
released and reaccessed. If it can’t be found, the access fails.
A file name without ‘/’ may or may not be a permanent file, so if
a file of this name is already accessed, it is used without question. If it is not already accessed, a search of the user’s
catalog is made for it. If it can’t be found there either, and
writing is desired, a temporary file is made.

The function
reread
is used to re-read a line. Its primary function is that if it is called before the first call to
getchar,
it
retrieves the command line with which the program was called.

<tt>   reread();<br />   getstr(s);<br /></tt>

fetches the command line, and puts it into string s.

29. System Call from TSS

TSS users can easily execute other TSS subsystems from within a
running B program, by using the
library function
system.
For
instance,

<tt>   system("./rj (w) ident;file");<br /></tt>

acts exactly as if we had typed

<tt>   ./rj (w) ident;file<br /></tt>

in response to “SYSTEM?”. The argument of
system
is a string,
which is executed as if typed at “SYSTEM?” level. (No ‘*n’ is
needed at the end of the string.) Return is to the B program
unless disaster strikes. This feature permits B programs to use
TSS subsystems and other programs to do major functions without
bookkeeping or core overhead.

We can also use
printf
(->Appendix C) to write on the
“system” IO
unit, which is predefined as -1. All output written on unit -1
acts as if typed at “SYSTEM?” level. Thus,

<tt>   extrn wr.unit;<br />   ---<br />   wr.unit = -1;<br />   opts = "(w)";<br />   fname = "file";<br />   printf("./rj %s ident;%s*n", opts,fname');<br /></tt>

will set the current output unit to -1, the system, and then
print on it using
printf
to format the string. This example does
the same thing as the last, but the RUNJOB options and the file
names are
variables, which can be changed as desired. Notice that
this time a ‘*n’ is necessary to terminate the line.

Because of serious design failings in TSS, there is generally no
decent way to send more than the command line as input to a program called this way, and no way to retrieve output from it.

30. Recursion

<tt>   putnumb(n) {<br />     auto a;<br />     if(a=n/10)   /* assignment, not equality test */<br />        putnumb(a);  /* recursive */<br />        putchar(n%10 + '0');<br />    }<br /></tt>

This simplified version of the library function
putnum
illustrates the use of recursion. (It only works for n>0.) “a” is set
to the quotient of n/10; if this is non-zero, a recursive call on
putnumb is made to print the quotient. Eventually the high-order
digit of n will be printed, and then as each invocation
of putnumb is popped up,
the next digit will be added on.

31. Argument Passing for Functions

There is a subtlety in function usage which can trap the unsuspecting Fortran programmer. Argument passing is call by
value”, which means that the called function is given the actual
values of its arguments, and doesn’t know their addresses. This
makes it very hard to change the value of one of the input
arguments!

For instance,
consider a function flip(x,y) which is to interchange its arguments.
We might naively write

<tt>      flip(x,y){<br />         auto t;<br />         t = x;<br />         x = y;<br />         y = t;<br />       }<br /></tt>

but this has no effect at all.
(Non-believers: try it!) What we
have to do is pass the arguments as explicit addresses, and use
them as addresses.
Thus we invoke
flip
as

<tt>   flip(&a,&b);<br /></tt>

and the definition of flip is

<tt>   flip(x,y){<br />   auto t;<br />    t = *y;<br />    *x = *y;<br />    *y = t;<br />    }<br /></tt>

(And don’t leave out the spaces on the right of the equal signs.)
This problem doesn’t arise if the arguments are vectors, since a
vector is represented by its address.

32. Calling Fortran and GMAP Subroutines from B Programs

As we mentioned, B doesn’t have floating
point, multi-dimensional arrays,
and similar things that Fortran does better. A B program,
callf,
allows B programs to call Fortran and GMAP
subroutines, like this:

<tt>   callf(name, a1,a2, . . .,a10)<br /></tt>

Here
name
is the Fortran function or subroutine to be used, and
the a’s are the zero or more arguments of the subroutine.

There are some restrictions on
callf,
related to the “call by
value” problem mentioned above. First, variables that are not
vectors or strings must be referenced by address: use “&x” instead of “x” as an argument. Second, no constants are allowed
among the arguments, because the construction “&constant” is
illegal in B. We can’t say

<tt>   tod = callf(itimez,&0)<br /></tt>

to get the time of day; instead we have to use

<tt>   t=0;<br />   tod = callf(itimez,&t) ;<br /></tt>

Third, it is not possible to return floating-point function
values from Fortran programs, since the results are in the wrong
register. They must be explicitly passed back as arguments. Thus
we can’t say

<tt>   s  = callf(sin,&x)<br /></tt>

but rather

<tt>   callf(sin2,&x,&s)<br /></tt>

where sin2 is defined as

<tt>   subroutine sin2(x,y)<br />   y = sin(x)<br />   return<br />   end<br /></tt>

Fourth,
callf
isn’t blindingly efficient, since there’s a lot of
overhead converting calling sequences. (Very short GMAP routines
to interface with B programs can often be coded quite efficiently
see [1], section 11.)

1273-bwk-pdp

B. W. KERNIGHAN

Attached

References

Appendix A,B,C,D

References

[1]
Johnson, S. C. A User’s Reference to B on MH-TSS. Bell
Laboratories memorandum.
[2]
Canaday, R. H., Ritchie, D. M. Bell Laboratories BCPL. Bell
Laboratories memorandum.

Appendix A: B Compiler Diagnostics

Diagnostics consist of two letters, an optional name, and a
source line number. Because of the free form input, the source
line number may be too high.

<tt>$)	-	{ } imbalance<br />()	-	( ) imbalance<br />*/	-	/**/ imbalance<br />[]	-	[] imbalance<br />>c	-	case table overflow (fatal)<br />>e	-	expression stack overflow (fatal)<br />>i	-	label table overflow (fatal)<br />>s	-	symbol table overflow (fatal)<br />ex	-	expression syntax error<br />lv	-	address expected<br />rd	name	name redeclaration - multiple definition<br />sx	keyword	statement syntax error<br />un	name	undefined name -  line number is last line<br />		of function where definition is missing<br />xx	-	syntax error in external definition<br /></tt>

Appendix B: Escape Sequences and Typography

Escape sequences are used in character constants and strings to
obtain characters which for one reason or another are hard to
represent directly. They consist of ‘*-’, where ‘-’ is as below:

<tt>	*0 	null (ascii NUL = 000)<br />	*e	end of string (ascii EOT = 004)<br />	*(	{<br />	*)	}<br />	*t	tab<br />	**	*<br />	*'	'<br />	*"	"<br />	*n	newline<br /></tt>

Users of non-ascii devices like model 33/35 Teletypes or IBM
2741′s should use a reasonably clever text-editor like QED to
create special characters for B programs.

Appendix C: printf for Formatted IO

printf(fmt,a1,a2,a3,.. . ,a10)

This function writes the arguments ai (from 0 to 10 of which may
be present) onto the current output stream under control of the
format string
fmt.
The format conversion is controlled by
two-letter sequences of the form ‘%x’ inside the string
fmt,
where x
stands for one of the following:

<tt>c -- character data (ascii)<br />d -- decimal number<br />o -- octal number<br />s -- character string<br /></tt>

The characters in
fmt
which do not appear in one of these two
character sequences are copied without change to the output.

Thus the call

<tt>   printf("%d + %o is %s or %c*n", 1,-1,"zero",'0');<br /></tt>

leads to the output line

<tt>   1 + 777777777777 is zero or 0<br /></tt>

As another example, a permanent file whose name is in the string
s can be created with size n blocks and general read permission
by using
printf
together with the system output unit (->29) and
the “filsys” subsystem:

<tt>   wr.unit = -1 ;<br />   printf("filsys cf %s,b/%d/,r*n",s,n);<br /></tt>

Appendix D: Binding Strength of Operators

Operators are listed from highest to lowest binding strength;
there is no order within groups. Operators of equal strength
bind left to right or right to left as indicated.

<tt>     ++  --  *  &  -  ! ~ (unary) [RL]<br />     *  /  % (binary)    [LR]<br />     +  -  [LR]<br />     >>  << [LR]<br />     >  <  <=  >=<br />     ==   !=<br />     &   [LR]<br />     ^   [LR]<br />     |   [LR]<br />     ?:  [RL]<br />     =   =+  =-  etc. (all assignment operators) [RL]<br /></tt>

Appendix E: Index

By programming on January 31, 2011 | B