Stop Reinventing C++ Wheels!

Posted by · August 9, 2012 12:06 pm

Necessity is the mother of invention

One thing that always amazes me is how often I see C++ programmers reinvent the wheel. The times I’ve seen engineers writing yet another sort routine or implementing their own version of a linked list, because they have a “special case” that necessitates it, just beggars belief. A “special case”? Really? Come on!

The C++ programming language was designed from the ground up (more or less) to support generic programming via templates & metaprogramming as well as having a complete set of generic algorithms and data structures in the guise of the STL (Standard Template Library). Although the language specification is set in stone the functionality is extensible via a plethora of proven 3rd party libraries, such as the excellent Boost library. In fact, this library is so good that parts of it are now part of the C++11 standard.

With a few very rare exceptions, there is no such thing as a “special case” if you have designed your code right to start with. For example, the STL provides a nice simple design pattern that allows one to apply standard algorithms to any data container. That’s right, any container. Not just those that are part of the STL. The pattern is called the iterator pattern and to take advantage of this all a container need do is provide support for iterators.

This design pattern is such an amazingly and deceptively simple concept that it’s actually easy to forget just how useful it is. It allows you to apply pretty much any algorithm to any container in a consistent manner. In fact, the only special case is the one where the programmer has implemented their own data structure and has failed to provide iterator support.

The STL provides a rich set of generic containers. It is a rare case indeed that one would need to look elsewhere. One possible “special case”, prior to C++11, was the need for a hash table. The STL, prior to this version of the standard, did not contain such a container; however there are plenty of tried and tested 3rd party hash_map implementations that could be used instead. My personal favorite is the Sparsehash Package, originally developed by Google. So, is this really such a special case? No, not really.

So, let’s get back to the “special case”. What’s so special about it that can’t be solved by implementing the appropriate less-than inequality comparison operation for use with the std::sort algorithm or iterators for transforming that linked list using std::transform? In other words, if you write your code from the ground up to interface with the STL you really shouldn’t have any thing like a special case.

Growing your own herbal tea

Ok, ok. So, what’s such a big deal that has made me climb onto my soapbox? What’s wrong with ignoring the STL or other tried and tested libraries (such as Boost or Spasehash). Why shouldn’t you implement your own home-grown algorithms or data structures? Why should it matter if you write your own sort routine, implement your own linked list or (and this is the one that really scares me) code up your own encryption routines? After all, yours is a special case, right?

That’s a great question and I’m glad you asked it. There are a number of problems but they can all be summarized in short sentence. It’s a long walk down a road that only leads to Defect City. Why bother when you can take the bus to Bugfree Ville, an altogether far happier place to visit?

Before we consider why this is the case I’d like you to consider this for a moment: Do you have a special kind of sleeping pattern that means you need to build your own bed? Or maybe you drink a special kind of herbal tea that requires you to grow it yourself? Or, perhaps you have a special driving style that necessitates you building your own car?

That’d be silly, right?

You just go and choose a bed from the store that best suits your needs. You buy the brand of tea that meets your high standards. You visit a number of car showrooms until you find a car that suits your driving style. So is it too unreasonable to think that a programmer might choose an existing design pattern, implementation of an algorithm or data structure that best suits their needs?

Great programmers steal

There’s a great quote that you’ll see posted on a great number of [open source] sites. It basically says:

“Good programmers write good code; great programmers steal great code.”

It’s a bit tongue in cheek but basically it’s suggesting that there’s no point in trying to write great code when someone has already done the hard work for you. Eric S. Raymond puts it slightly more eloquently in his book “The Cathedral and the Bazaar”:

“Good programmers know what to write. Great ones know what to rewrite (and reuse).”

Of course, sometimes code needs reworking. There’s no getting away from the fact that every programmer will, at some point or another, come across code and realise that the best thing they can do is put it out of its misery.

This is often referred to as “refactoring” and it is a standard part of the development cycle of any product. There is, of course, a right way and wrong way to go about this but that is a discussion for a different time. Sufficed to say, attempting to refactor code without good unit test coverage is not the way to go.

This is not what I’m talking about here. I am referring more specifically to the fact that way too many programmers will ignore existing design patterns, algorithms, data structures and tried and tested libraries in the (often) misguided belief that they can do better or that their case is way too special to use an “off the shelf” implementation.

Fake stones

In my time as an expert on Experts Exchange I’ve come across plenty of questions where the asker wants to know how to implement something that is already part of the C++ Standard or available as an Open Source library. Often these are academic questions and the asker is learning how to do things from scratch. No harm, no foul. This is all part of the academic learning process.

Every so often; however, we’ll get a question that is clearly not academic but is posted by a seasoned engineer who is looking to carve their very own wheel. The worse of these, without doubt, are the ones asking how to implement encryption. Often, these questions are asked by someone who, clearly, doesn’t have a clue as to what true encryption is all about.

The most common fallacy committed by inexperienced programmers who attempt encryption is that implementing a proprietary method for scrambling data is far more secure than using a well known method. The argument being that no one will know how the data scrambling mechanism works. This is flawed thinking on two fronts:

1. Compiled code can be disassembled to see exactly how the data is scrambled.
2. What inexperienced programmers consider encryption rarely is.

Generally, there are two ways of preventing a 3rd party from reading private data; namely, encryption and obfuscation. Unfortunately, all too often inexperienced programmers will confuse the two and think they are one and the same. They are most definitely not.

The main difference between encryption and obfuscation is the key, or rather the lack of. Both mechanisms will scramble “plain text” so that it is, to all intents and purposes, unreadable; however, with obfuscation it is possible to reverse the process if you know the original method used to scramble. This is not true with encryption. Even if the mechanism for scrambling the data is known it is impossible to unscramble it again without the key required to do so.

Let’s put this another way. Consider, we have an important paper document that contains confidential information that must not be seen by prying eyes. Well, encryption would be like putting the paper in a safe, locking the door and putting the key in a safe place. Obfuscation, on the other hand, is like hiding the paper in a cardboard box and putting it in the cupboard hoping no one will find it.

So, back to askers who want to know how to implement their own “encryption” Let’s be honest, if they did they wouldn’t be asking and the fact they are asking means they probably don’t understand the potential consequences of what they are planning to do. True strong encryption is a highly complex subject matter and all but a few mathematicians truly understand it.

The most common “home grown” solution is to XOR data with an arbitrary value. The result is a scrambled mess and the best bit is that arbitrary value is a key, so this is strong encryption, right? Wrong. Seriously wrong. Congratulations, you’ve just leaked all your confidential information to the world! This is, at best, convoluted obfuscation.

Just because you can’t think of a way to ‘crack’ your XOR’d data doesn’t mean no one else can. It turns out that it’s relatively easy to figure out the “key” because the nature of XOR means it’s actually now part of the scrambled message. All that is required to figure out the key is to XOR a plain text message with a scrambled version of the same message; the result is the key. Once the key is known it can then be used to decipher any content encrypted with that key.

Yes, you read that right!

It’s a bit like those fake stones you can get that allow you to hide a spare front door key inside. You put the key in the fake stone and then put the fake stone in your front garden. As long as no one can be bothered to look through the stones to find the fake one you’re safe. Of course, if a bad guy decides to look the chances are they will find the key.

Swiss Army Knife

Ok, so what should we do then I can hear you ask?

That’s simple, before you go off and reinvent the wheel take a look at what already exists. Most problems you will try to solve as a programmer have already been solved over and over in one guise or another. So much so that over the years programmers have developed standard algorithms and design patterns that simplify solving these problems.

Find a library that is tried and trusted and does what it is you need. In the case of encryption there are a number of different libraries that provide support for standard proven encryption algorithms. My favorite of these is Crypto++, which is like the Swiss Army Knife of encryption. It supports just about every standard you can think of.

For other things the first place to look is the Boost Library. This is a set of high quality, peer reviewed libraries that cover just about everything you could need when writing C++. It has a plethora of different containers and algorithms for just about every occasion. If Boost isn’t already a part of your standard development toolkit it should be!

There are a plethora of different standard algorithms. For example, there are plenty of proven sort algorithms, each has its own time/space constraints and characteristics. Pick the one that is best suited to your problem. Don’t try and invent a new way to sort data, at least not until you’re sure none of the standard sorting algorithms will do it for you (and if you reach that conclusion you are almost certainly wrong).

Be aware of the various design patterns that exist. These are like recipes for solving known problems. They are generally programing language agnostic, so finding one to suit your needs shouldn’t be a problem.

For example, do you need to implement an “undo” mechanism in your program? Considered using the Memento design pattern or, maybe, the Command design pattern? Both of these provide a nice simple and elegant framework for implementing “undo” semantics.

Performing lots of dynamic memory allocations and worried about exception safety? Cursing C++ for not providing a “finally” clause or garbage collection? Littering your code with lots of try/catch and duplicate lines of code to delete resources in the event that an exception is thrown? Well don’t! Just make use of the RAII pattern and use a smart pointer.

Or, maybe you need to implement a discriminatory union to implement a variant type? Of course, you’ll need to have switch/case statements scattered throughout your code to convert from the variant to the concrete type? Did you know there is a nice design pattern you could use that would allow you to use the real C++ type system and still be able to safely convert from the static to dynamic type without all those switch/cases?

Moreover, do you realise that this design pattern allows for compile-time rather than run-time type checking. This means type inconsistencies will be caught as you compile your code and not at runtime where there isn’t much you can do about it.

This design pattern is called the Visitor pattern and it is probably one of the most useful design patterns for C++ programmers as it allows us to mix static and dynamic polymorphism at compile time in a way that is normally very hard to achieve.

Follow the design pattern and you’ll have a solution that works. Try and cobble something together on your own and you’ll end up with something that works but how many edge cases have you tested to ensure it always behaves as you expect?

But wait. It gets better. You don’t need to lift a finger to implement a type safe variant type. Boost has already done it for you. And, guess what, it uses the visitor pattern to discriminate between the times it represents.

Go make some big bucks

So, what’s the moral here? It’s pretty simple really. Don’t reinvent the wheel, choose one that already exists and spend your valuable time focusing on the business logic of your application. After all, that’s what’ll end up making you the big bucks, right?