Coding Standards

A few years ago I was asked to comment on how source code should be formatted. Following is my reply, reformatted only slightly for web presentation. Bear in mind that this predates the ascendance of C#, but I think all of the comments apply equally to it. Someday I'll expand on all of this, but until then this will have to do.


From: Ted Burghart
To:   a software development manager
Sent: Tuesday, May 22, 2001 9:09 PM

Coding standards, what a rat's nest. Nevertheless, I'll throw in my two cents' worth, knowing full well that I'll undoubtedly offend someone. I saw a quote once that suggested that the goal of a coding standard should be to offend everyone it applies to equally ;) Please keep in mind there's nothing personal in this; we all have our own opinions.

Language

Accommodate the multi-lingual nature of long-term projects. While there may be a preponderance of Java, there will also be (possibly multiple dialects of) IDL, C++ and possibly others (C#, CORBAscript, Python, Pascal, Perl, ... ). Initially focus on a standard that makes sense for Java, OMG IDL and C++, but try not to preclude similar styles for other languages. The SGML derivatives are different enough to warrant a separate standard for them anyway, so I won't address them here, though clearly some of the general comments apply.

Javadoc

Keywords should be chosen that can be used across multiple javadoc-like tools. Doxygen is widely used, and I've used doc++ in the past, both of which parse javadoc-style comments in IDL and/or C++. But they don't all support the same tags as javadoc. In particular, I know they didn't support the @throws tag when I started using them, so I standardized on @exception. I suspect they've caught up to javadoc by now, but do the legwork and choose a set of tools and tags that can be used across languages. Also use the @since tag - I've found it invaluable a number of times.
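
As a rough sketch of those tag choices, here is a small Java class documented with @exception rather than @throws, and with @since on both the class and the method. The class name, method name, and version numbers are hypothetical, made up purely for illustration.

```java
/**
 * Parses retry settings from configuration text.
 * (Hypothetical class; the names and version numbers are illustrative only.)
 *
 * @since 1.2
 */
final class RetryPolicy
{
    /**
     * Converts a textual retry count into a number.
     *
     * @param text the retry count as decimal digits
     * @return the parsed retry count, never negative
     * @exception IllegalArgumentException if text is null, not a number, or negative
     * @since 1.3
     */
    static int parseRetryCount( String text )
    {
        if ( text == null )
        {
            throw new IllegalArgumentException( "retry count is missing" );
        }
        try
        {
            int count = Integer.parseInt( text.trim() );
            if ( count < 0 )
            {
                throw new IllegalArgumentException( "retry count is negative: " + count );
            }
            return count;
        }
        catch ( NumberFormatException e )
        {
            // Report bad input with the same exception type the comment documents.
            throw new IllegalArgumentException( "retry count is not a number: " + text, e );
        }
    }
}
```

Whether a given documentation tool accepts each of these tags is exactly the legwork mentioned above; check the tools you settle on before adopting the set.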

Variable names (including parameters)

The notion of indicating in a variable name its type or location is contrary to the tenets of distributed and object-oriented systems.

Another reason why any form of name decoration, whether it be Hungarian notation denoting type or other monikers denoting location (aXXX, gXXX, mXXX, myXXX, sXXX, _XXX, XXX_, etc.), should be avoided is that it's an invitation to getting names out of sync with what they're supposed to be telling the programmer. Say you're debugging some code that's got to be released, and you find that you need to change foo's type or move it from one name decoration scope to another. Are you going to change the name everywhere it's referenced (presuming that you can even reliably find them all)? No, you're going to make the smallest change you can to avoid side effects, which is exactly what you should do. Of course, one side effect of that is that you've just broken the name decoration scheme. I have never seen this not happen where such schemes are used in real-world systems.

As for embedding the type as a/the major part of a name, the same applies. A variable's name should tell you what the value represents, not how or where it's implemented. The how and where may change, and should be able to do so without affecting the semantics, or name, of the operation. Encoding a type in the name, as in

    Foo A.getFoo( Bar aBar )

is a) simply redundant and b) exposing implementation details that should be opaque. The better way is to use the semantics, not the implementation, to name things, as in

    Foo A.getFailureMode( Bar failed_operation )
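
Spelled out as compilable Java, the same idea might look like the sketch below. The FailureMode, Operation, and Diagnostics types and the timeout rule are hypothetical, invented only to give the names something to describe.

```java
/** Hypothetical result type, invented for this illustration. */
enum FailureMode
{
    TIMEOUT,
    UNKNOWN
}

/** Hypothetical record of an operation that did not complete. */
final class Operation
{
    final String description;
    final long elapsed_millis;

    Operation( String description, long elapsed_millis )
    {
        this.description = description;
        this.elapsed_millis = elapsed_millis;
    }
}

final class Diagnostics
{
    // A decorated signature would read getFailureMode( Operation aOperation, long lTimeout ).
    // The names below say what each value represents instead of what type it has.
    static FailureMode getFailureMode( Operation failed_operation, long timeout_millis )
    {
        if ( failed_operation.elapsed_millis > timeout_millis )
        {
            return FailureMode.TIMEOUT;
        }
        return FailureMode.UNKNOWN;
    }
}
```

If timeout_millis later becomes some richer duration type, the name still reads correctly; a type-decorated name would not.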

Class/Interface names

Don't differentiate, for the same reasons listed above. It may become desirable at some point to change an interface to a class or vice versa for sound reasons; don't force every method that accepts one as a parameter to change too. Also, the approach of naming concrete implementations InterfaceNameImpl thwarts polymorphism - the implementation's name should express its strategy, not merely decorate the interface's name. Not only that, but what do you name the second, third, etc. alternate implementations?
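
Here is a minimal Java sketch of that naming approach. The Cache interface and both implementations are hypothetical examples; the point is only that each implementation is named for its strategy, so a second or third alternative has an obvious name.

```java
/** Hypothetical interface; note there is no "I" prefix and no CacheImpl. */
interface Cache
{
    void put( String key, String value );

    String get( String key );
}

/** Evicts the least recently used entry once the capacity is exceeded. */
final class LeastRecentlyUsedCache implements Cache
{
    private final java.util.LinkedHashMap<String, String> entries;

    LeastRecentlyUsedCache( final int capacity )
    {
        // An access-ordered map; removeEldestEntry enforces the capacity on insert.
        entries = new java.util.LinkedHashMap<String, String>( 16, 0.75f, true )
        {
            @Override
            protected boolean removeEldestEntry( java.util.Map.Entry<String, String> eldest )
            {
                return size() > capacity;
            }
        };
    }

    public void put( String key, String value )
    {
        entries.put( key, value );
    }

    public String get( String key )
    {
        return entries.get( key );
    }
}

/** Never evicts anything; grows without bound. */
final class UnboundedCache implements Cache
{
    private final java.util.HashMap<String, String> entries = new java.util.HashMap<String, String>();

    public void put( String key, String value )
    {
        entries.put( key, value );
    }

    public String get( String key )
    {
        return entries.get( key );
    }
}
```

Callers declare their variables and parameters as Cache, so nothing in their signatures changes if Cache later becomes an abstract class or gains another implementation.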

Import/Using statements

... Are plainly evil. Java's import and C++'s using statements (outside of member promotion within a class) make names ambiguous within the body of code they apply to. They make debugging difficult and interpreting code with which you're not familiar excruciatingly painful. I have yet to see a justification for either that outweighs the suffering they cause. Names outside the current namespace (package) should be fully scoped. No, I'm not a zealot; I know people whose opinions on this are much stronger than mine ;)
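
A small Java sketch of what full scoping buys you, using two standard classes that share a simple name. The AuditRecord class and its method are hypothetical, made up for the example.

```java
final class AuditRecord
{
    // With both "import java.util.*;" and "import java.sql.*;" in effect, a bare
    // "Date" is ambiguous and will not compile. With single-type imports it compiles,
    // but the reader has to scroll to the top of the file to learn which Date is meant.
    // Fully scoped names answer that question at the point of use.
    static java.sql.Date toSqlDate( java.util.Date event_time )
    {
        return new java.sql.Date( event_time.getTime() );
    }
}
```

The cost is some extra typing and longer lines; the payoff is that every reference is unambiguous without consulting anything else in the file.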

Brace and whitespace style

Many of the coding standards floating about have their roots in AT&T's coding standard for their Indian Hill labs. Not surprisingly, since AT&T begat Unix, C and K&R, this standard has gained relevance simply by virtue of the preponderance of examples of its use. That Sun and their minions recommend a preferred style for Java code at all is preposterous; their adoption of this relic as their base compounds the offense.

The Indian Hill style of brace positioning and whitespace elimination hails from the days when terminals were line-buffered, communication speeds were measured in bytes/sec and memory devices were measured in KB. Whitespace used to cost real money; it's virtually free now. The driving forces behind the Indian Hill coding standard just no longer apply.

Cognitive research has shown that code with vertically-aligned braces is easier to ‘chunk’, which is how our brains work. Surveys have shown that a majority (from slim to large, depending on who's asking) of programmers prefer vertically-aligned bracing because they find it easier to follow, effectively validating the researchers. And of course, the proponents of braces vertically aligned under the start of their controlling statement (Whitehead? style) point out that this is the only format that's consistent across all levels of code from outer scope inward. My personal experience brings me to the same conclusion.

There are also proponents of the GNU standard that places the vertically aligned braces half an indent in from the controlling statement, though I confess I've never seen the point.

Along those same lines, whitespace sets off tokens in a statement, making them more easily discernible. Hence,

    if ( evaluate( some_value ) > SOME_VALUE_MAX )

is easier to parse (for humans) than

    if(evaluate(some_value)>SOME_VALUE_MAX)
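
Putting the brace and whitespace preferences together, a complete Java class in that style might look like the following sketch. The RangeCheck class, the body of evaluate, and the value of SOME_VALUE_MAX are hypothetical placeholders built around the statement above.

```java
final class RangeCheck
{
    static final int SOME_VALUE_MAX = 100;

    // Placeholder computation, assumed only so the example compiles and runs.
    static int evaluate( int some_value )
    {
        return some_value * 2;
    }

    static String classify( int some_value )
    {
        // Each brace sits directly under the start of its controlling statement,
        // at every level from the class down to the innermost block.
        if ( evaluate( some_value ) > SOME_VALUE_MAX )
        {
            return "too large";
        }
        else
        {
            return "ok";
        }
    }
}
```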

Tabs

Tab characters have no place in source code. All indentation should be with spaces, period. Programmers are an independent lot, so you can count on them all having their tab stops set differently, meaning code with tabs will look terrible on most other people's displays. All reasonably competent editors have the capability of inserting spaces instead of tabs when indenting, and the presence of tabs in anything but makefiles (where they're required) should be grounds for rejecting checkin of sources. Again, whitespace is pretty much free at current speeds and capacities; there's just no justification for tabs any more.
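
The checkin rule is easy to automate. Below is a minimal Java sketch of the check itself; the TabCheck class is hypothetical, and wiring it into a particular version control system, as well as exempting makefiles, is assumed to be handled by whatever calls it.

```java
final class TabCheck
{
    /** Returns the 1-based numbers of the lines that contain a tab character. */
    static java.util.List<Integer> linesWithTabs( String source_text )
    {
        java.util.List<Integer> offending_lines = new java.util.ArrayList<Integer>();
        // A limit of -1 keeps trailing empty lines so the numbering stays accurate.
        String[] lines = source_text.split( "\n", -1 );
        for ( int index = 0; index < lines.length; ++index )
        {
            if ( lines[ index ].indexOf( '\t' ) >= 0 )
            {
                offending_lines.add( Integer.valueOf( index + 1 ) );
            }
        }
        return offending_lines;
    }
}
```

A checkin gate would reject the file whenever the returned list is not empty, and could print the line numbers so the offender knows where to look.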

There's much more I could go on about, but time's tight and I've touched on the major issues.

  - Ted