Monday, May 28, 2012

Silver Bullets - Information Hiding

This is another post in the Silver Bullets series. This series presents a set of best practices, design and implementation principles to combat software complexity.

In this post I'll discuss a well-known pillar of good software design. Information hiding is cool. Nobody can mess with your stuff if they can't find it. You should practice information hiding at every level and reap the following benefits:


1. Make the system simpler for the user (the user can be a human or another component) - Users don't have to understand, deal with or consider the implications of something that is hidden from them (but watch out for leaky abstractions).

2. Flexibility - You may change any hidden part at will. This can go a long way and allow radical changes, like completely replacing your persistence layer or moving computation from in-process to a separate process, or even to the cloud.

3. Testability - The interactions with your components are defined by what's visible. Your external testing surface will be smaller.

4. Performance - You can seriously modify your design and implementation without impacting users if the code you modify is hidden.

5. Security - Duh!

Programming languages support information hiding at different levels. Let's explore some of them. Most object-oriented languages support encapsulation via access levels (public, protected, private). This is a pretty basic form of information hiding. If a C++/Java/C# class defines a field or method as private, then only code inside that class can access it. C# adds the nuance of properties, where the get and set accessors can have different access levels.
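Here is a minimal Java sketch of that basic form of hiding. The Account class and its members are hypothetical names made up for illustration; the point is only that the field is private, so every change to it has to go through the one public method that checks the rules.

```java
// Hypothetical example: Account and its members are illustrative names only.
final class Account {
    // Hidden state: only code inside Account can read or write this field.
    private long balanceCents;

    Account(long openingBalanceCents) {
        if (openingBalanceCents < 0) {
            throw new IllegalArgumentException("opening balance must not be negative");
        }
        this.balanceCents = openingBalanceCents;
    }

    // Visible operation: the only way to change the balance,
    // so the validation lives in exactly one place.
    public void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceCents = Math.addExact(balanceCents, cents);
    }

    // Read-only view of the hidden field.
    public long getBalanceCents() {
        return balanceCents;
    }
}
```

Because nothing outside Account can touch balanceCents, the representation could later change (say, to a BigDecimal) without any caller noticing.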

What about class definitions and types? C and C++ use header files to group definitions together, and in general if you don't #include the header file that contains the definition of a class you can't call its methods, even if you get a pointer or reference to it from somewhere. C# provides the internal keyword, which allows you to make classes visible only inside their assembly (ignoring reflection), and Java provides package-level scoping as the default.
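Hiding a whole class can be sketched in Java too. Package-level scoping needs several files to demonstrate, so this single-file sketch uses a private nested class instead, which has a similar effect: callers can name the interface and the factory method, but never the concrete class. Storage, KeyValueStore, open and InMemoryStore are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical example: all names here are illustrative only.
final class Storage {
    private Storage() {
        // Not instantiable; this class is just a namespace for the factory.
    }

    // The only type callers can name.
    public interface KeyValueStore {
        void put(String key, String value);

        Optional<String> get(String key);
    }

    // Factory: callers receive the interface, never the concrete class.
    public static KeyValueStore open() {
        return new InMemoryStore();
    }

    // Hidden implementation. It could be replaced by a file-backed or
    // database-backed store without touching any calling code.
    private static final class InMemoryStore implements KeyValueStore {
        private final Map<String, String> entries = new HashMap<>();

        @Override
        public void put(String key, String value) {
            entries.put(key, value);
        }

        @Override
        public Optional<String> get(String key) {
            return Optional.ofNullable(entries.get(key));
        }
    }
}
```

Since nobody outside Storage can instantiate or sub-class InMemoryStore, it can be renamed, rewritten or deleted freely. Tests can still exercise it from the outside through Storage.open().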

A higher level of information hiding is at the build/deployment level. Suppose your system contains some debugging code, test frameworks and test cases. You don't need to and don't want to deploy them in production. This kind of code is often very intrusive and can wreak havoc on your system if executed accidentally in production. The solution is to isolate it as much as possible into separate modules/assemblies/jars/DLLs that are used during development only and never deployed in production. This is not always possible, especially with monolithic C/C++ systems that are composed of many static libraries linked together to form one executable. In these cases, you have to rely on special builds (Debug, Release, etc.).

Let's talk a little bit more about hiding classes. What's wrong with making all your classes public? A lot. Once you make a class public, anybody can instantiate it or sub-class it. When that happens in a large system you can forget about making any changes to the public interface or the semantics of this class. You will break this foreign code. Many people say that if you want to test a class from the outside (and you should) then it's much easier if it's public. They are right. It is a lot easier, but it doesn't justify exposing the class to the world. I will talk a lot about testing in future posts, but the first rule of testing is that you should not put test support code in production code or make design decisions just for testability. It turns out that well-designed code is also testable code.

Another common mistake I see is that when people implement a base class they often make every non-public method and field protected. This is done in the name of reuse. These people claim that they don't know what information is going to be relevant to sub-classes. This is a mistake. Once a protected method or field is used by a sub-class, you can't change it or its semantics without breaking the sub-class. A better approach is to keep everything private and provide elevated access levels only when needed.

Implementation inheritance (as opposed to interface inheritance) is supported by virtually (pun intended) every object-oriented programming language and is often taught as the main feature of object-oriented design. This stems from the common myth that good OO design models the world via object hierarchies. In practice this is usually one of the worst architectural choices you can make, due to the fragile base class problem.
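To make the fragile base class problem concrete, here is a small Java sketch in the spirit of the well-known "counting collection" example. Bag, CountingBagByInheritance and CountingBag are hypothetical classes invented for this illustration. The sub-class quietly depends on an implementation detail of the base class (addAll happens to call add), while the composition version depends only on public behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: all class names are illustrative only.
class Bag {
    private final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    // Implementation detail: addAll happens to be written in terms of add.
    // Sub-classes that override add can observe (and depend on) this.
    public void addAll(List<String> more) {
        for (String item : more) {
            add(item);
        }
    }

    public int size() {
        return items.size();
    }
}

// Implementation inheritance: correctness depends on how Bag.addAll is written.
class CountingBagByInheritance extends Bag {
    private int added;

    @Override
    public void add(String item) {
        added++;
        super.add(item);
    }

    @Override
    public void addAll(List<String> more) {
        added += more.size();
        super.addAll(more); // calls the overridden add, so items are counted twice
    }

    public int addedCount() {
        return added;
    }
}

// Composition: depends only on Bag's public behavior.
final class CountingBag {
    private final Bag bag = new Bag();
    private int added;

    public void add(String item) {
        added++;
        bag.add(item);
    }

    public void addAll(List<String> more) {
        added += more.size();
        bag.addAll(more);
    }

    public int addedCount() {
        return added;
    }

    public int size() {
        return bag.size();
    }
}
```

After addAll with three items, the inheritance version reports an addedCount of 6 while the composition version reports 3. Worse, if Bag.addAll were later rewritten to append directly to its list, the inheritance version would suddenly report 3 as well: a harmless-looking change to the base class changes the behavior of the sub-class.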

Take Home Points:


1. Information Hiding rules
2. Carefully consider what to expose at each level (class, assembly/package, dynamic/shared library)
3. Hide everything else, including classes if your language allows it.
4. Testing is external and the code under test should NOT be aware of being tested
5. OOP is awesome, but early on its proponents got a lot of stuff wrong.
