Difference between revisions of "Implementation"
Revision as of 20:31, 14 November 2013
Coding Standards
When the design has been completed, coding begins. Code must be written in such a way that people can read it. Compare:
    void calc (double m[],char *g){ double tm=0.0; for (int t=0;t<MAX_TASKS;t++) tm+=m[t]; int i=int(floor(12.0*(tm-MIN)/(MAX-MIN))); strcpy(g,let[i]);}

and

    void calculateGrade (double marks[], char *grade)
    {
        double totalMark = 0.0;
        for (int task = 0; task < MAX_TASKS; task++)
            totalMark += marks[task];
        int gradeIndex = int(floor(12.0 * (totalMark - minMarks) / (maxMarks - minMarks)));
        strcpy(grade, letterGrades[gradeIndex]);
    }
These two function definitions are identical as far as the compiler is concerned. For a human reader, however, the difference is like night and day. Although a programmer can understand what calc() is doing in a technical sense, the function has very little “meaning” to a person.
The second version of the function differs in only two ways from the first: it makes use of “white space” (tabs, line breaks, and blanks) to format the code, and it uses descriptive identifiers. With these changes, however, we can quite easily guess what the program is doing, and might even find mistakes in it without any further documentation.
In practice, we would include comments even in a function as simple as this. But note how the choice of identifiers reduces the need for comments. The three basic rules of coding are:
- Use a clear and consistent layout.
- Choose descriptive and mnemonic names for constants, types, variables, and functions.
- Use comments when the meaning of the code by itself is not completely obvious and unambiguous.
These are generalities; in the following sections we look at specific issues.
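As a minimal sketch of the three rules applied together, here is a commented, self-contained version of calculateGrade. The values of MAX_TASKS, MIN_MARKS, and MAX_MARKS, the thirteen-entry LETTER_GRADES table, and the range check on gradeIndex are assumptions made for illustration; the fragment above does not define them.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Assumed values: the original fragment does not define these.
const int MAX_TASKS = 4;
const double MIN_MARKS = 0.0;
const double MAX_MARKS = 100.0;

// Assumed table of thirteen grades, so that indexes 0 to 12 are all valid.
const int GRADE_COUNT = 13;
const char *const LETTER_GRADES[GRADE_COUNT] =
{
    "F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"
};

// Store in 'grade' the letter grade for the sum of the task marks.
// 'marks' must hold MAX_TASKS values; 'grade' needs room for three characters.
void calculateGrade (const double marks[], char *grade)
{
    assert(marks != NULL && grade != NULL);

    double totalMark = 0.0;
    for (int task = 0; task < MAX_TASKS; task++)
        totalMark += marks[task];

    // Scale the total onto the range 0 to 12 and round down.
    int gradeIndex =
        int(std::floor(12.0 * (totalMark - MIN_MARKS) / (MAX_MARKS - MIN_MARKS)));

    // Guard against totals that fall outside [MIN_MARKS, MAX_MARKS].
    if (gradeIndex < 0)
        gradeIndex = 0;
    if (gradeIndex >= GRADE_COUNT)
        gradeIndex = GRADE_COUNT - 1;

    std::strcpy(grade, LETTER_GRADES[gradeIndex]);
}
```

With these assumed values, four marks of 25 give a total of 100 and the grade "A+". Most of the comments describe intent or preconditions, which the identifiers alone cannot express.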
Layout
The most important rule for layout is that code must be indented according to its nesting level. The body of a function must be indented with respect to its header; the body of a for, while, or switch statement must be indented with respect to its first line; and similarly for if statements and other nested structures. You can choose the amount of indentation but you should be consistent. A default tab character (eight spaces) is too much: three or four spaces is sufficient. Most editors and programming environments allow you to set the width of a tab character appropriately. Bad indentation makes a program harder to read and can also be a source of obscure bugs. A programmer reading:
    while (*p)
        p->processChar();
        p++;
assumes that p++ is part of the loop. In fact, it isn’t, and this loop would cause the program to “hang” unaccountably. Although indentation is essential, there is some freedom in placement of opening and closing braces. Most experienced C++ programmers position the braces for maximum visibility, as in this example:
    Entry *addEntry (Entry * & root, char *name)
        // Add a name to the binary search tree of file descriptors.
    {
        if (root == NULL)
        {
            root = new Entry(name);
            if (root == NULL)
                giveUp("No space for new entry", "");
            return root;
        }
        else
        {
            int cmp = strcmp(name, root->getName());
            if (cmp < 0)
                return addEntry(root->left, name);
            else if (cmp > 0)
                return addEntry(root->right, name);
            else
                // No action needed for duplicate entries.
                return root;
        }
    }
Other programmers prefer to reduce the number of lines by moving the opening braces (“{”) to the previous line, like this:
    Entry *addEntry (Entry * & root, char *name) {
        // Add a name to the binary search tree of file descriptors.
        if (root == NULL) {
            root = new Entry(name);
            if (root == NULL)
                giveUp("No space for new entry", "");
            return root;
        } else {
            int cmp = strcmp(name, root->getName());
            if (cmp < 0)
                return addEntry(root->left, name);
            else if (cmp > 0)
                return addEntry(root->right, name);
            else
                // No action needed for duplicate entries.
                return root;
        }
    }
The amount of paper that you save by writing in this way is minimal and, on the downside, it is much harder to find corresponding pairs of braces. In addition to indentation, you should use blank lines to separate parts of the code. Here are some places where a blank line is often a good idea but is not essential:
- between method declarations in a class declaration;
- between variable declarations in a class declaration;
- between major sections of a complicated function.
Here are some places where some white space is essential. Put one or more blank lines between:
- public, protected, and private sections of a class declaration;
- class declarations (if you have more than one class declaration in a file, which is not usually a good idea);
- function and method declarations.
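The layout rules can be sketched with a small function. The function countChars is hypothetical and is not part of the original text. It puts braces on their own lines and indents by nesting level, so that the increment of p is visibly inside the loop, unlike the misleading while loop shown earlier.

```cpp
#include <cassert>
#include <cstddef>

// Count the characters in a zero-terminated string (hypothetical example).
std::size_t countChars (const char *p)
{
    assert(p != NULL);

    std::size_t count = 0;
    while (*p)
    {
        count++;
        p++;    // Inside the braces, so the loop advances and terminates.
    }

    return count;
}
```

Here the indentation and the braces agree, so a reader's first impression of the loop body matches what the compiler sees.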
Naming Conventions
Various kinds of name occur within a program:
- constants;
- types;
- local variables;
- attributes (data members);
- functions;
- methods (member functions).
It is easier to understand a program if you can guess the kind of a name without having to look for its declaration which may be far away or even in a different file. There are various conventions for names. You can use:
- one of the conventions described here;
- your own convention;
- your employer’s convention.
You may not have an option: some employers require their programmers to follow the company style even if it is not a good style. As a general rule, the length of a name should depend on its scope. Names that are used pervasively in a program, such as global constants, must have long descriptive names. A name that has a small scope, such as the index variable of a one-line for statement, can be short: one letter is often sufficient.
Another rule that is used by almost all C++ programmers is that constant names are written with upper case letters and may include underscores.
Brown University Conventions
Brown University has a set of coding standards used for introductory software engineering courses. Most of the conventions are reasonable, but you can invent your own variations. Here are some of the key points of the “Brown Standard”, with my (P.G.) comments in [square brackets]:
File names use lower case characters only.
UNIX systems distinguish cases in file names: mailbox.h and MailBox.h are different files. One way to avoid mistakes is to use lower case letters only in file names. Windows does not distinguish letter case in file names. This can cause problems when you move source code from one system to another. If you use lower case letters consistently, you should not have too many problems moving code between systems. Note, however, that some Windows programs generate default extensions that have upper case letters!
Types and classes start with the project name.
An abbreviated project name is allowed. For example, if you are working on a project called MagicMysteryTour, you could abbreviate it to MMT and use this string as a prefix to all type and class names: MMTInteger, MMTUserInterface, and so on. [I (P.G.) don’t think this is necessary for projects. The components of a project are usually contained within a single directory, or tree of directories, and this is sufficient indication of ownership. The situation is different for a library, because it must be possible to import library components without name collisions.]
Methods start with a lower case letter and use upper case letters to separate words.
Examples: getScore, isLunchTime. [I (P.G.) use this notation for both methods and attributes (see below). In the code, you can usually distinguish methods and attributes because method names are followed by parentheses.]
Attributes start with a lower case letter and use underscores to separate words.
Examples: start_time, current_task.
Class constants use upper case letters with underscores between words.
Examples: MAXIMUM_TEMPERATURE, MAIN_WINDOW_WIDTH.
Global names are prefixed with the project name.
Example: MMTstandardDeviation. [I (P.G.) use the same convention for methods and global functions. When a method is used in its own class, this can lead to ambiguity. In many cases, however, you can recognize methods because they appear as o.method() or p->method(). My rule for global variables is avoid them.]
Local variables are written entirely in lower case without underscore.
Examples: index, nextitem.
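The following sketch shows how these conventions might look when applied together. The class MMTScoreBoard and all of its members are hypothetical names invented for illustration, using the MMT prefix from the MagicMysteryTour example.

```cpp
#include <cassert>

// Type name: starts with the (abbreviated) project name.
class MMTScoreBoard
{
public:
    // Class constant: upper case letters with underscores between words.
    static const int MAXIMUM_SCORE = 100;

    MMTScoreBoard () : total_score(0), game_count(0) {}

    // Methods: lower case first letter, upper case letters separate words.
    void addScore (int score)
    {
        // Local variable: entirely lower case, no underscores.
        int clipped = score;
        if (clipped > MAXIMUM_SCORE)
            clipped = MAXIMUM_SCORE;
        total_score += clipped;
        game_count++;
    }

    bool hasScores () const { return game_count > 0; }

    // Precondition: at least one score has been added.
    double getAverageScore () const
    {
        assert(hasScores());
        return double(total_score) / game_count;
    }

private:
    // Attributes: lower case letters with underscores between words.
    int total_score;
    int game_count;
};
```

A reader who knows the convention can tell from the spelling alone that total_score is an attribute, clipped is a local variable, and MAXIMUM_SCORE is a constant.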
Whatever convention you use, it is helpful to distinguish attributes from other variables. In Large-Scale C++ Software Design, John Lakos suggests prefixing all attributes with d_. This has several advantages; one of them is that it becomes easy to write constructors without having to invent silly variations. For example:
    Clock::Clock (int hours, int minutes, int seconds)
    {
        d_hours = hours;
        d_minutes = minutes;
        d_seconds = seconds;
    }
Hungarian Notation
Hungarian notation was popularized at Microsoft, where it was used in the Windows and OS/2 programming interfaces. It is called “Hungarian” because its inventor, Charles Simonyi, is Hungarian. Also, identifiers that use this convention are hard to pronounce, like Hungarian words (if you are not Hungarian, that is).
If you do any programming in the Windows environment, you will find it almost essential to learn Hungarian notation. Even if you have no intention of ever doing any Windows programming, you should consider adopting this convention, because it is quite effective.
Hungarian variable names start with a small number of lower case letters that identify the type of the variable. These letters are followed by a descriptive name that uses an upper case letter at the beginning of each word.
For example, a Windows programmer knows that the variable lpszMessage contains a long pointer to a string terminated with a zero byte. The name suggests that the string contains a message of some kind. The following table shows some commonly used Hungarian prefixes.
prefix | meaning |
---|---|
c | character |
by | unsigned char or byte |
n | short integer (usually 16 bits) |
i | integer (usually 32 bits) |
x, y | integer coordinate |
cx, cy | integer used as length (“count”) in X or Y direction |
b | boolean |
f | flag (equivalent to boolean) |
w | word (unsigned short integer) |
l | long integer |
dw | double word (unsigned long integer) |
fn | function |
s | string |
sz | zero-terminated string |
h | handle (for Windows programming) |
p | pointer |
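As a small sketch of how the prefixes in the table combine, the hypothetical function below uses sz for a zero-terminated string, c for a character, i for an integer, b for a boolean, and pc for a pointer to a character. The function and all of its names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Count the occurrences of cTarget in the zero-terminated string szText.
int countOccurrences (const char *szText, char cTarget)
{
    assert(szText != NULL);

    int iCount = 0;
    for (const char *pcNext = szText; *pcNext != '\0'; pcNext++)
    {
        bool bMatch = (*pcNext == cTarget);
        if (bMatch)
            iCount++;
    }
    return iCount;
}
```

For example, countOccurrences("banana", 'a') returns 3.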
Commenting Conventions
Comments are an essential part of a program, but you should not overuse them. The following rule will help you to avoid unnecessary comments: Comments should not provide information that can be easily inferred from the code. There are two ways of applying this rule.
Use it to eliminate pointless comments.

    counter++; // Increment counter.

    // Loop through all values of index.
    for (index = 0; index < MAXINDEX; index++)
    {
        ....
    }
Use it to improve existing code.
When there is a comment in your code that is not pointless, ask yourself “How can I change the code so that this comment becomes pointless?” You can often improve variable declarations by applying this rule. Change
    int np;        // Number of pixels counted.
    int flag;      // 1 if there is more input, otherwise 0.
    int state;     // 0 = closed, 1 = ajar, 2 = open.
    double xcm;    // X-coordinate of centre of mass.
    double ycm;    // Y-coordinate of centre of mass.
to
    int pixelCount;
    bool moreInput;
    enum { CLOSED, AJAR, OPEN } doorStatus;
    Point massCentre;
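One way to see the benefit of the enumeration is in the code that uses it. In the sketch below, the type name DoorStatus and the function describeDoor are assumptions added for illustration. Each case names its state, so no comment is needed to decode a number.

```cpp
#include <cassert>

// Named type for the three door states (the name DoorStatus is assumed).
enum DoorStatus { CLOSED, AJAR, OPEN };

// Return a printable description of the door state.
const char *describeDoor (DoorStatus doorStatus)
{
    switch (doorStatus)
    {
        case CLOSED: return "closed";
        case AJAR:   return "ajar";
        case OPEN:   return "open";
    }

    // Not reached for the three valid values above.
    assert(false);
    return "";
}
```

Compare this with a switch on an int state, where the cases would be 0, 1, and 2 and each would need a comment.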
There should usually be a comment of some kind at the following places:
- At the beginning of each file (header or implementation), there should be a comment giving the project that the file belongs to and the purpose of this file.
- Each class declaration should have a comment explaining what the class is for. It is often better to describe an instance rather than the class itself:
    class ChessPiece
    {
        // An instance of this class is a chess piece that has a position,
        // has a colour, and can move in a particular way.
        ....
- Each method or function should have comments explaining what it does and how it works. In many cases, it is best to put a comment explaining what the function does in the header file or class declaration and a different comment explaining how the function works with the implementation. This is because the header file will often be read by a client but the implementation file should be read only by the owner. For example:
    class MathLibrary
    {
    public:
        // Return square root of x.
        double sqrt (double x);
        ....
    };

    double MathLibrary::sqrt (double x)
    {
        // Use Newton-Raphson method to compute square root of x.
        ....
    }
- Each constant and variable should have a comment explaining its role unless its name makes its purpose obvious.
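To make the split between "what" and "how" comments concrete, here is one possible completion of the MathLibrary example. The constants TOLERANCE and MAX_ITERATIONS, the precondition check, and the choice of starting estimate are assumptions; the original elides the body.

```cpp
#include <cassert>
#include <cmath>

class MathLibrary
{
public:
    // Return square root of x. Precondition: x >= 0.
    double sqrt (double x);
};

// Assumed settings for the iteration; not taken from the original text.
const double TOLERANCE = 1e-12;
const int MAX_ITERATIONS = 1100;

double MathLibrary::sqrt (double x)
{
    // Use Newton-Raphson method to compute square root of x:
    // repeatedly replace the estimate e by the average of e and x / e.
    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;

    // Start at or above the root so that the estimates decrease steadily.
    double estimate = (x > 1.0) ? x : 1.0;
    for (int iteration = 0; iteration < MAX_ITERATIONS; iteration++)
    {
        double next = 0.5 * (estimate + x / estimate);

        // Stop when successive estimates agree to within the tolerance.
        if (std::fabs(next - estimate) <= TOLERANCE * next)
            return next;
        estimate = next;
    }
    return estimate;
}
```

A client reading the class declaration sees only the contract. The description of the method and its stopping rule stays with the implementation, where the owner of the code will look for it.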
Block Comments
You can use “block comments” at significant places in the code: for example, at the beginning of a file. Here is an example of a block comment:
    /***************************************/
    /*                                     */
    /*        The Orbiting Menagerie       */
    /*        Author: Peter Grogono        */
    /*        Date: 24 May 1984            */
    /*        (c) Peter Grogono            */
    /*                                     */
    /***************************************/
Problems with block comments include:
- they take time to write;
- they take time to maintain;
- they become ugly if they are not maintained properly (alignment is lost, etc.);
- they take up a lot of space.
Nevertheless, if you like block comments and have the patience to write them, they effectively highlight important divisions in the code.
Inline Comments
“Inline comments” are comments that are interleaved with code in the body of a method or function. Use inline comments whenever the logic of the function is not obvious. Inline comments must respect the alignment of the code. You can align them to the current left margin or to one indent with respect to the current left margin. Comments must always confirm or strengthen the logical structure of the code; they must never weaken it. Here is an example of a function with inline comments.
    void collide (Ball *a, Ball *b, double time)
    {
        // Process a collision between two balls.

        // Local time increment suitable for ball impacts.
        double DT = PI / (STEPS * OM_BALL);

        // Move balls to their positions at time of impact.
        a->pos += a->vel * a->impactTime;
        b->pos += b->vel * a->impactTime;

        // Loop while balls are in contact.
        int steps = 0;
        while (true)
        {
            // Compute separation between balls and force separating them.
            Vector sep = a->pos - b->pos;
            double force = (DIA - sep.norm()) * BALL_FORCE;
            Vector separationForce;
            if (force > 0.0)
            {
                Vector normalForce = sep.normalize() * force;

                // Find relative velocity at impact point and deduce tangential force.
                Vector aVel = a->vel - a->spinVel * (sep * 0.5);
                Vector bVel = b->vel + b->spinVel * (sep * 0.5);
                Vector rVel = aVel - bVel;
                Vector tangentForce = rVel.normalize() * (BALL_BALL_FRICTION * force);
                separationForce = normalForce + tangentForce;
            }

            // Find forces due to table.
            Vector aTableForce = a->ballTableForce();
            Vector bTableForce = b->ballTableForce();
            if (   separationForce.iszero()
                && aTableForce.iszero()
                && bTableForce.iszero()
                && steps > 0)
                // No forces: collision has ended.
                break;

            // Effect of forces on ball a.
            a->acc = (separationForce + aTableForce) / BALL_MASS;
            a->vel += a->acc * DT;
            a->pos += a->vel * DT;
            a->spin_acc = ((sep * 0.5) * separationForce + bottom * aTableForce) / MOM_INT;
            a->spinVel += a->spin_acc * DT;
            a->updateSpin(DT);

            // Effect of forces on ball b.
            b->acc = (- separationForce + bTableForce) / BALL_MASS;
            b->vel += b->acc * DT;
            b->pos += b->vel * DT;
            b->spin_acc = ((sep * 0.5) * separationForce + bottom * bTableForce) / MOM_INT;
            b->spinVel += b->spin_acc * DT;
            b->updateSpin(DT);

            steps++;
        }

        // Update motion parameters for both balls.
        a->checkMotion(time);
        b->checkMotion(time);
    }
Standard Types
Type representations vary between C++ implementations. For example, the type int may be represented as 16, 32, or 64 bits. Consequently, it is common practice for experienced C++ programmers to define a set of types for a program and use them throughout the program. Here is one such set:
    typedef int Integer;
    typedef char Boolean;
    typedef char Character;
    typedef double Float;
    typedef char * Text;
    typedef const char * ConstText;
To see why this is helpful, consider the following scenario. Programmer A develops a program using these types for a system with 64-bit integers. The program requires integer values greater than 2^32. Programmer B is asked to modify the source code so that the program will run on processors for which int is represented by 32-bit integers. Programmer B changes one line:
typedef long Integer;
and recompiles the program. Everything will work, provided that long is implemented using 64 bits. (If it isn’t, B can type long long instead.)
Other applications include: a programmer using an up-to-date compiler can redefine Boolean as bool (which is included in the C++ Standard), and can redefine Text to allow Unicode characters. The need for typedef statements is perhaps less than it used to be, because there is less variation between architectures than there used to be. With the final demise of DOS and its segmented address space, most processors provide 32-bit integers and pointers. The distinction between 16 and 32 bits was important because 2^16 = 65,536 is not a large number. The distinction between 32 and 64 bits is less important since 2^32 = 4,294,967,296 is large enough for many purposes. Since the introduction of bool (and the constants true and false), it is not really necessary to define Boolean. Nevertheless, you may wish to use typedefs to improve the portability of your code.
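One variation on this idea, sketched here under the assumption of a C++11 or later compiler, defines Integer in terms of a fixed-width type from the standard header <cstdint>, so that its size does not depend on how the platform represents int or long. The function powerOfTwo is hypothetical and only demonstrates a value greater than 2^32.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumption: the implementation provides the 64-bit type std::int64_t.
typedef std::int64_t Integer;

// Fail at compile time, not at run time, if the assumption is wrong.
static_assert(sizeof(Integer) * CHAR_BIT == 64, "Integer must be 64 bits wide");

// Return 2 raised to the given exponent, which must lie in [0, 62].
Integer powerOfTwo (int exponent)
{
    assert(exponent >= 0 && exponent <= 62);
    return Integer(1) << exponent;
}
```

The rest of the program still refers only to Integer, so the single-line change described above remains available if the definition ever has to be altered.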
Programming Class Declaration and Implementation Files
Some languages, such as C++, split the implementation of a class across two files:
- the header file that contains the class declaration. It is basically defining a prototype for the class and all its elements.
- the implementation file that contains the actual code defining the behavior of all operations of the class.
The two files have different purposes, though it is possible to write all your code in either.
Programming Class Declaration Files
The class declaration file (in C++, ending with a .h suffix) is called the header file. It is supposed to contain a generic description, or template, of the class(es) implemented in the corresponding implementation file. It is actually a straight mapping of the class descriptions developed in the design phase. Its goal is to provide quick and essential reference information to both the compiler and the programmer.
It should be formatted in a very clear manner, so that programmers can use it for quick reference to all the members of the class(es) involved. This is where all class members (attributes and functions) are declared (but not implemented). To help make this file as readable as possible, and the corresponding class as usable and maintainable as possible, several concerns have to be taken care of:
- Data members should be private members. If some data is to be made accessible, either in read or write mode, implement public functions to get or set them.
- Operations not called externally should be declared private. Only allow other classes or components to access information that is really needed. This follows the very important concept of information hiding. Explicitly hiding information (attributes or functions) has two advantages: (1) it prevents the programmer from accessing forbidden information in future development or maintenance; (2) it makes it clear to the programmer what is accessible and what should remain hidden.
- Data and operation members should be listed separately. Intermingling of attributes and functions can lead to confusion. I suggest you put attributes on top, followed by functions.
- Comments should be used to describe all members. Using meaningful identifiers is a good way to make things clear, but remember that the class declaration file is going to be used for reference. Clarity is one of its most important qualities.
- For function prototypes, the parameters should be named so that their purpose is clear. Function parameters are internal variables in the function, so equal care should be taken to name them appropriately, and consistently with any naming conventions used for regular variables.
- Operations that do not change the state of attributes should be declared explicitly as constant. This both provides valuable information to future programmers and prevents them from changing the code in a way that alters the state of the objects of the class.
- Constant function parameters should explicitly be declared as such. This both provides valuable information to future programmers and prevents them from changing the code in a way that alters the value of a parameter declared as constant.
- Constructor(s) and destructor functions should be redefined in most cases. C++ objects very often rely on dynamic memory allocation. In all such cases, the default constructor and destructor cannot be used to create and destroy objects without memory leaks.
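The guidelines above can be sketched as a single class declaration. The class TextBuffer and its members are hypothetical; the attribute names use the d_ prefix mentioned earlier. The member definitions would normally live in the implementation file and are included here only so that the sketch is complete. Because the class owns dynamically allocated memory, it also declares a copy constructor and an assignment operator, which is an addition beyond the constructor and destructor named in the guideline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// An instance of this class owns a private copy of a zero-terminated string.
class TextBuffer
{
private:
    char *d_text;            // Owned copy of the text, zero-terminated.
    std::size_t d_length;    // Number of characters, excluding the terminator.

public:
    // Create a buffer holding a copy of 'source', which must not be null.
    explicit TextBuffer (const char *source);

    // Create an independent copy of 'other'.
    TextBuffer (const TextBuffer &other);

    // Replace the contents with a copy of the contents of 'other'.
    TextBuffer &operator= (const TextBuffer &other);

    // Release the owned copy of the text.
    ~TextBuffer ();

    // Return the number of characters stored.
    std::size_t getLength () const;

    // Return read-only access to the stored text.
    const char *getText () const;
};

// Definitions (normally placed in the implementation file).

TextBuffer::TextBuffer (const char *source)
    : d_text(NULL), d_length(0)
{
    assert(source != NULL);
    d_length = std::strlen(source);
    d_text = new char[d_length + 1];
    std::strcpy(d_text, source);
}

TextBuffer::TextBuffer (const TextBuffer &other)
    : d_text(new char[other.d_length + 1]), d_length(other.d_length)
{
    std::strcpy(d_text, other.d_text);
}

TextBuffer &TextBuffer::operator= (const TextBuffer &other)
{
    if (this != &other)
    {
        // Allocate and copy first, so this object is unchanged if new fails.
        char *copy = new char[other.d_length + 1];
        std::strcpy(copy, other.d_text);
        delete[] d_text;
        d_text = copy;
        d_length = other.d_length;
    }
    return *this;
}

TextBuffer::~TextBuffer ()
{
    delete[] d_text;
}

std::size_t TextBuffer::getLength () const
{
    return d_length;
}

const char *TextBuffer::getText () const
{
    return d_text;
}
```

The attributes are private and listed first, every member carries a comment, and the two accessors are declared const because they do not change the state of the object.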
Programming Class Implementation Files
The class implementation file (in C++, ending with a .cpp suffix) is supposed to contain the implementation code of each of the functions declared in the corresponding header file. It is actually a straight mapping of the design algorithm for each function specified in the detailed design phase.
The main complexity of the programs developed is contained in these files. Readable, understandable code is essential for easy maintenance and reuse. It is thus of prime importance to make the code in these files as clear as possible. Strictly following clear coding conventions is of great help in achieving code clarity and understandability. To help make this file as understandable as possible, and the corresponding class as usable and maintainable as possible, several concerns have to be taken care of:
- Constructors should be defined that initialize all the data elements used by the objects created. You should never rely on the default constructor and destructor, even when no dynamic memory allocation is involved in the class. Always remember that dynamic memory management is the most error-prone part of C and C++.
- If several constructors are defined, their behavior should be consistent.
- Update functions should be coded using defensive coding. Be careful of input values outside of the valid range, or of combinations of input values that might lead to failure, e.g. division by zero. Always try to trap errors before they happen. It is always better to have the system output an error message than to have the whole system crash.
- Variable declarations should all be put at the top of the function body, except for the occasional local scope variables such as loop counters. Always try to limit the scope of your variables, and declare them at the beginning of their scope.
- Whenever possible, simple variable declarations should include an initial value. Especially when using pointer variables, failure to initialize a variable can lead to random program execution, which can be extremely hard to debug.
- Constant variables should be declared explicitly. Defining something (variable, parameter, or function) as constant always has the double advantage of giving meaningful information to future programmers, and preventing them from making “unlawful” changes to the code.
- Comments should be used to give more information on the meaning of local variables, and to highlight important steps in the algorithm implementation. But always remember that spaghetti code or other kinds of unclear code can never be effectively fixed with comments.
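Several of these points, in particular full initialization in the constructor and defensive coding in update functions, can be sketched with a small class. The class RunningAverage, its limits MIN_SAMPLE and MAX_SAMPLE, and the choice to report errors through a bool return value are all assumptions made for illustration.

```cpp
#include <cassert>

// Assumed valid range for samples.
const double MIN_SAMPLE = 0.0;
const double MAX_SAMPLE = 100.0;

// An instance of this class accumulates samples and reports their average.
class RunningAverage
{
private:
    double d_total;    // Sum of all accepted samples.
    int d_count;       // Number of accepted samples.

public:
    // Create an empty accumulator; every attribute gets an initial value.
    RunningAverage ();

    // Add 'sample' if it lies in [MIN_SAMPLE, MAX_SAMPLE]; return true if
    // the sample was accepted, false if it was rejected.
    bool addSample (const double sample);

    // Store the average in 'average' and return true, or return false and
    // leave 'average' unchanged if no samples have been accepted.
    bool getAverage (double &average) const;
};

RunningAverage::RunningAverage ()
    : d_total(0.0), d_count(0)
{
}

bool RunningAverage::addSample (const double sample)
{
    // Written as a negated range test so that a NaN is rejected as well.
    if (!(sample >= MIN_SAMPLE && sample <= MAX_SAMPLE))
        return false;

    d_total += sample;
    d_count++;
    return true;
}

bool RunningAverage::getAverage (double &average) const
{
    assert(d_count >= 0);    // Internal invariant.

    // Trap the division by zero before it happens.
    if (d_count == 0)
        return false;

    average = d_total / d_count;
    return true;
}
```

A caller checks the return values and can report an out-of-range sample or an empty accumulator with an error message, which is the behaviour the defensive-coding guideline asks for.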
Quality Issues
Coding and Testing
Code Verification
It is often more productive to read the code than to run tests on it. After you have observed a fault, you have to read the code anyway, to see what went wrong. Often, you can avoid the need for debugging by spotting the error before running the tests. There are several ways of reading code.
Desk Checking. Get into the habit of reading code that you have just written before running it. This often saves time because simple errors sometimes lead to very strange symptoms.
Code Walkthrough. During a “walkthrough”, a programmer presents code to a small group line by line, explaining the purpose of each line. Members of the group read the lines as they are presented to ensure that the code actually does what the presenter claims it does. It is important to place emphasis on the code itself, not on the person who wrote it. When people question features of the code, the programmer’s job is not to defend it but to admit that it is wrong or explain why it is correct although it appears to be wrong. It is best not to attempt to correct errors during the walkthrough; the important thing is to identify errors for subsequent correction.
Code Inspection. Code inspection was introduced in 1976 at IBM by Mike Fagan. An inspection is like a walkthrough in that code is studied carefully but an inspection involves another set of documents, which can be requirements, specifications, or designs. The purpose of the inspection is to check that the code does in fact meet the requirements, satisfy the specifications, or implement the designs. The people present include the originators of the documents as well as the implementors of the code. Inspections are sometimes conducted as three-phase processes. There is a preliminary meeting at which the goals of the inspection are identified. Then team members study the code individually, noting any errors that they find. Finally, there is another meeting of the inspection team during which the errors are presented and discussed.
There has been a substantial amount of research into the effectiveness of code reviews. Most of the results confirm that reading and inspecting code is more effective than debugging. That is, the number of faults identified for each person-hour spent is greater when the people are inspecting code than when they are debugging it.