# HALSTEAD’S SOFTWARE SCIENCE—AN ANALYTICAL TECHNIQUE

**HALSTEAD’S SOFTWARE SCIENCE—AN ANALYTICAL TECHNIQUE **

Halstead’s software science is an analytical technique to measure size, development effort, and development cost of software products.

Halstead used a few primitive program parameters to develop the expressions for overall program length, potential minimum volume, actual volume, language level, effort, and development time.

For a given program,

let: h1 be the number of unique operators used in the program,

h2 be the number of unique operands used in the program,

N1 be the total number of operators used in the program,

N2 be the total number of operands used in the program.

Although the terms operators and operands have intuitive meanings, a precise definition of these terms is needed to avoid ambiguities.

But, unfortunately we would not be able to provide a precise definition of these two terms.

There is no general agreement among researchers on what is the most meaningful way to define the operators and operands for different programming languages.

However, a few general guidelines regarding identification of operators and operands for any programming language can be provided.

For instance, assignment, arithmetic, and logical operators are usually counted as operators.

A pair of parentheses, as well as a block begin —block end pair, are considered as single operators.

A label is considered to be an operator, if it is used as the target of a GOTO statement.

The constructs if … then … else … endif and a while … do are considered as single operators.

A sequence (statement termination) operator ’;’ is considered as a single operator.

Subroutine declarations and variable declarations comprise the operands.

Function name in a function call statement is considered as an operator, and the arguments of the function call are considered as operands.

However, the parameter list of a function in the function declaration statement is not considered as operands. We list below what we consider to be the set of operators and operands for the ANSI C language.

Operators and Operands for the ANSI C language ( [ . , -> * + – ~ ! ++ — * / % + – << >> < > <= >= != == & ^ | && || = *= /= %= += -= <<= >>= &= ^= |= : ? { ; CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE BREAK RETURN and a function name in a function call Operands are those variables and constants which are being used with operators in expressions. Note that variable names appearing in declarations are not considered as operands.

**Example** **-1**

Consider the expression a = &b; a, b are the operands and =, & are the operators.

**Example**-2

The function name in a function definition is not counted as an operator. int func ( int a, int b ) { . . . } For the above example code, the operators are: {}, ( ) We do not consider func, a, and b as operands, since these are part of the function definition.

**Example**-3

Consider the function call statement: func (a, b);. In this, func ‘ ,’ and ; are considered as operators and variables a, b are treated as operands.

**Length and Vocabulary**

The length of a program as defined by Halstead, quantifies total usage of all operators and operands in the program.

Thus, length N = N1 + N2. Halstead’s definition of the length of the program as the total number of operators and operands roughly agrees with the intuitive notion of the program length as the total number of tokens used in the program.

The program vocabulary is the number of unique operators and operands used in the program. Thus, program vocabulary h = h1 + h2.

**Program Volume**

The length of a program (i.e., the total number of operators and operands used in the code) depends on the choice of the operators and operands used.

In other words, for the same programming problem, the length would depend on the programming style. This type of dependency would produce different measures of length for essentially the same problem when different programming languages are used.

Thus, while expressing program size, the programming language used must be taken into consideration: V = N log2 h

Let us try to understand the important idea behind this expression.

Intuitively, the program volume V is the minimum number of bits needed to encode the program.

In fact, to represent h different identifiers uniquely, we need at least log2 h bits (where h is the program vocabulary).

In this scheme, we need N log2 h bits to store a program of length N.

Therefore, the volume V represents the size of the program by approximately compensating for the effect of the programming language used.

**Potential Minimum Volume**

The potential minimum volume V* is defined as the volume of the most succinct program in which a problem can be coded.

The minimum volume is obtained when the program can be expressed using a single source code instruction, say a function call like foo();. In other words, the volume is bound from below due to the fact that a program would have at least two operators and no less than the requisite number of operands.

Note that the operands are the input and output data items.

Thus, if an algorithm operates on input and output data d1, d2, … dn, the most succinct program would be f(d1, d2 , …, dn);

for which, h1 = 2, h2 = n.

Therefore,

V* = (2 + h2) log2 (2 + h2).

The program level L is given by

L = V*/V.

The concept of program level L has been introduced in an attempt to measure the level of abstraction provided by the programming language.

Using this definition, languages can be ranked into levels that also appear intuitively correct.

The above result implies that the higher the level of a language, the less effort it takes to develop a program using that language.

This result agrees with the intuitive notion that it takes more effort to develop a program in assembly language than to develop a program in a high-level language to solve a problem.

**Effort and Time**

The effort required to develop a program can be obtained by dividing the program volume with the level of the programming language used to develop the code.

Thus, effort E = V /L,

where E is the number of mental discriminations required to implement the program and also the effort required to read and understand the program.

Thus, the programming effort

E = V2/V* (since L = V*/V) varies as the square of the volume.

Experience shows that E is well correlated to the effort needed for maintenance of an existing program.

The programmer’s time T = E/S, where S is the speed of mental discriminations. The value of S has been empirically developed from psychological reasoning, and its recommended value for programming applications is 18.

**Length Estimation**

Even though the length of a program can be found by calculating the total number of operators and operands in a program, Halstead suggests a way to determine the length of a program using the number of unique operators and operands used in the program.

Using this method, the program parameters such as length, volume, cost, effort, etc., can be determined even before the start of any programming activity.

His method is summarised below. Halstead assumed that it is quite unlikely that a program has several identical parts— in formal language terminology identical substrings—of length greater than h(h being the program vocabulary). In fact, once a piece of code occurs identically at several places, it is usually made into a procedure or a function. Thus, we can safely assume that any program of length N consists of N/h unique strings of length h. Now, it is a standard combinatorial result that for any given alphabet of size K, there are exactly Kr different strings of length r. Thus, Since operators and operands usually alternate in a program, we can further refine the upper bound into N ≤ hh1h1 h2h3. Also, N must include not only the ordered set of N elements, but it should also include all possible subsets of that ordered set, i.e. the power set of N strings (This particular reasoning of Halstead is hard to justify!).

Experimental evidence gathered from the analysis of a large number of programs suggests that the computed and actual lengths match very closely. However, the results may be inaccurate when small programs are considered individually.

**Example**

Let us consider the following C program:

main() { int a,b,c,avg; scanf(“%d %d %d”,&a,&b,&c); avg=(a+b+c)/3; printf(“avg= %d”,avg); }

The unique operators are: main, (), {}, int, scanf, &, “,”, “;”, =, +, /, printf The unique operands are: a,b,c,&a,&b,&c,a+b+c,avg,3,”%d %d %d”, “avg=%d” Therefore, In conclusion,

Halstead’s theory tries to provide a formal definition and quantification of such qualitative attributes as program complexity, ease of understanding, and the level of abstraction based on some low-level parameters such as the number of operands, and operators appearing in the program.

Halstead’s software science provides gross estimates of properties of a large collection of software, but extends to individual cases rather inaccurately.