Monday, May 28, 2012

Silver Bullets - Information Hiding

This is another post in the Silver Bullets series. This series presents a set of best practices, design and implementation principles to combat software complexity.

In this post I'll discuss a well-known pillar of good software design. Information hiding is cool. Nobody can mess with your stuff if they can't find it. You should practice information hiding at every level and reap the following benefits:


1. Make the system simpler for the user (the user can be a human or another component) - Users don't have to understand, deal with or consider the implications of something that is hidden from them (but watch out for leaky abstractions).

2. Flexibility - You may change any hidden part at will. This can go a long way and allow radical changes, like completely replacing your persistence layer or moving computation from in-process to a separate process or even to the cloud.

3. Testability - The interactions with your components are defined by what's visible. Your external testing surface will be smaller.

4. Performance - You can optimize, or even seriously rework, your design and implementation without impacting users, as long as the code you modify is hidden.

5. Security - Duh!
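
To make the flexibility point concrete, here is a minimal Java sketch. All the names (UserStore, InMemoryUserStore, Greeter) are made up for illustration. The client only knows the interface, so the in-memory store could later be replaced with a database-backed or remote one without touching the client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; this illustrates the idea, not a real API.

// The only thing callers are allowed to see.
interface UserStore {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// One hidden implementation. A database- or cloud-backed one could replace it later.
final class InMemoryUserStore implements UserStore {
    private final Map<String, String> names = new HashMap<>();

    @Override
    public void save(String id, String name) {
        names.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(names.get(id));
    }
}

// Client code depends on the interface only, so swapping the store does not touch it.
final class Greeter {
    private final UserStore store;

    Greeter(UserStore store) {
        this.store = store;
    }

    String greet(String id) {
        return store.findName(id).map(n -> "Hello, " + n).orElse("Hello, stranger");
    }
}
```

Greeter never learns where the names live. That is the whole trick.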

Programming languages support information hiding at different levels. Let's explore some of them. Most object-oriented languages support encapsulation via access levels (public, protected, private). This is a pretty basic form of information hiding. If a C++/Java/C# class declares a field or method private, then only that class's own methods can access it (leaving aside escape hatches such as C++ friends). C# adds the nuance of properties, where the set and get accessors can have different access levels.
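
Here is a small Java sketch of that basic form. The Account class is hypothetical. The field is private, so the only way to change it is through a method that can enforce the rules, and the public getter with no public setter gives roughly the effect of a C# property with a public get and a private set.

```java
// Hypothetical class for illustration.
final class Account {
    private long balanceCents;   // private: only Account's own methods can touch it

    public void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceCents += cents;
    }

    // Read-only view of the hidden field; there is no public setter.
    public long balanceCents() {
        return balanceCents;
    }
}
```

Code in any other class that tries to write account.balanceCents = -1 directly will not compile.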

What about class definitions and types? C and C++ use header files to group definitions together, and in general if you don't #include the header file that contains the definition of a class you can't call its methods, even if you get a pointer or reference to it from somewhere. C# provides the internal keyword, which allows you to make classes visible inside their assembly only (ignoring reflection), and Java provides package-level scoping as the default.

A higher level of information hiding is at the build/deployment level. Suppose your system contains some debugging code, test frameworks and test cases. You don't need to and don't want to deploy them in production. This kind of code is often very intrusive and can wreak havoc on your system if executed accidentally in production. The solution is to isolate it as much as possible into separate modules/assemblies/jars/DLLs that are used during development only and never deployed in production. This is not always possible, especially with monolithic C/C++ systems that are composed of many static libraries linked together to form one executable. In these cases, you have to rely on special builds (Debug, Release, etc.).

Let's talk a little bit more about hiding classes. What's wrong with making all your classes public? A lot. Once you make a class public, anybody can instantiate it or sub-class it. When that happens in a large system you can forget about making any changes to the public interface or the semantics of this class. You will break this foreign code. Many people say that if you want to test a class from the outside (and you should) then it's much easier if it's public. They are right. It is a lot easier, but it doesn't justify exposing the class to the world. I will talk a lot about testing in future posts, but the first rule of testing is that you should not put test support code in production code or make design decisions just for testability. It turns out that well-designed code is also testable code.
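
One way to hide a class in Java, sketched with made-up names (Shape, Shapes, Circle): expose an interface and a factory, and keep the concrete class private. I use a private nested class here so the sketch fits in one file. A package-private top-level class works along the same lines.

```java
// Hypothetical names. Callers see only the Shape interface and the Shapes factory.
interface Shape {
    double area();
}

final class Shapes {
    private Shapes() {}  // no instances; this class only hosts factory methods

    static Shape circle(double radius) {
        return new Circle(radius);
    }

    // Private nested class: nothing outside Shapes can name it, instantiate it or sub-class it.
    private static final class Circle implements Shape {
        private final double radius;

        Circle(double radius) {
            this.radius = radius;
        }

        @Override
        public double area() {
            return Math.PI * radius * radius;
        }
    }
}
```

An external test exercises Circle the same way any other caller would, through Shapes.circle(...) and the Shape interface, so nothing had to be made public just for the test.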

Another common mistake I see is that when people implement a base class they often make every non-public method and field protected. This is done in the name of reuse. These people claim that they don't know what information is going to be relevant to sub-classes. This is a mistake. Once a protected method or field is used by a sub-class, you can't change it or its semantics without breaking the sub-class. A better approach is to keep everything private and provide elevated access levels only when needed.

Implementation inheritance (as opposed to interface inheritance) is supported by virtually (pun intended) every object-oriented programming language and is often taught as the main feature of object-oriented design. This stems from the common myth that good OO design models the world via object hierarchies. In practice this is usually one of the worst architectural choices you can make due to the fragile base class problem.
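
Here is a minimal Java sketch of the fragile base class problem, a variation on a well-known textbook example. The classes (Bag and the two counting variants) are hypothetical. The base class has an implementation detail, addAll calling add, that the sub-class can't see but silently depends on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration.
class Bag {
    private final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    // Implementation detail: addAll happens to call add. Sub-classes cannot see
    // this, yet their behavior depends on it.
    public void addAll(List<String> more) {
        for (String item : more) {
            add(item);
        }
    }

    public int size() {
        return items.size();
    }
}

// Inheritance: tries to count insertions, but double counts because of Bag's hidden self-call.
class CountingBagByInheritance extends Bag {
    private int added;

    @Override
    public void add(String item) {
        added++;
        super.add(item);
    }

    @Override
    public void addAll(List<String> more) {
        added += more.size();
        super.addAll(more);   // dispatches back to the overridden add
    }

    public int added() {
        return added;
    }
}

// Composition: wraps a Bag and uses only its public interface.
final class CountingBagByComposition {
    private final Bag bag = new Bag();
    private int added;

    public void add(String item) {
        added++;
        bag.add(item);
    }

    public void addAll(List<String> more) {
        added += more.size();
        bag.addAll(more);     // Bag calls its own add, not this wrapper's
    }

    public int added() {
        return added;
    }

    public int size() {
        return bag.size();
    }
}
```

Adding three items via addAll makes the inheriting version report 6 insertions while the composing version reports 3. If Bag later stops calling add from addAll, the inheriting version changes behavior again, and nobody touched its code.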

Take Home Points:


1. Information Hiding rules
2. Carefully consider what to expose at each level (class, assembly/package, dynamic/shared library)
3. Hide everything else, including classes if your language allows it.
4. Testing is external and the code under test should NOT be aware of being tested
5. OOP is awesome, but early on its proponents got a lot of stuff wrong.

Silver Bullets

There is No Silver Bullet - Fred Brooks


Hi, I'm Gigi and this is my first post ever. You can expect some deep essays, sometimes spreading over multiple posts, as well as lots of crazy coding and mixing a bunch of technologies. Today is all about fixing the Software Crisis and eradicating software complexity.

First, let's see what we are up against. Fred Brooks, in his mythical Mythical Man-Month book, eloquently distinguished between accidental difficulty (complexity that results from bad design, implementation, choice of tools, development processes, etc.) and essential difficulty (complexity that results from how hard the problem you are trying to solve is). He drew the somewhat depressing conclusion that there is no silver bullet (for developing complex software). Even if you manage to remove all the accidental difficulty you still have to deal with the essential difficulty. Since we humans are such discombobulated creatures, we can't expect to develop software that is too fancy, because the essential complexity will bring us to our knees in no time.

Well, I disagree. Humans are not as discombobulated as Fred Brooks claims. They are way, way more discombobulated than that and have attention spans that match the half-life of materials with a very short half-life. Nowadays, self-respecting introverts are also expected to update their Facebook status every 15 minutes, tweet, check in, follow, like, +1, stumble upon and read a bunch of blogs like you're doing right now.

But, all is not lost. I believe we can still develop great software and not succumb to complexity. You already know the answer: Divide and Conquer. By breaking the system into multiple sub-systems or components that communicate using well-defined protocols you can control the complexity of each component and the complexity of interaction between components. The good news (if you are into job security) is that it's hard. Really really hard. The reason it's so hard is that you have countless ways to screw up everything in every decision you make.

I'll tell you all about it in a series of essays. The short of it is: architecture, processes and automation. Oh, and you need a few good programmers and at least one exceptional programmer.

Finally, is there a silver bullet? No, there isn't. There are many silver bullets and you will have to hit all of them.