Abstract: Software re-engineering is concerned with re-implementing older systems to improve them or make them more maintainable. Refactoring is re-engineering within an object-oriented context. Object-oriented programs differ from programs written in other programming languages mainly in that they are organized around data. Re-engineering an object-oriented program therefore has wider ramifications than re-engineering other software, because it involves fundamental changes to that organization. This article examines, for a given object-oriented program, the opportunities where refactoring can be applied: class misuse, violation of the principle of encapsulation, lack of use of inheritance, misuse of inheritance and misplaced polymorphism.
Indexing Terms: Refactoring, Re-engineering in C++, Object-oriented re-engineering
Software re-engineering is the transformation of a system from one representation to another at the same relative abstraction level, while preserving the system’s external behavior. Re-engineering a software system has two key advantages over more radical approaches to system evolution:
- Reduced risk: Redeveloping software that is an essential backbone of the organization carries a high risk. Errors may be made in the system specification, development problems may arise, the financial risk may be high, and so on.
- Reduced cost: The cost of re-engineering is significantly less than the costs of developing new software.
Refactoring is re-engineering within an object-oriented context. Software refactoring can be defined as “the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure”. Improvements to the internal structure amount to improvements in the quality of the software system, for example “making the code easier to understand and cheaper to modify” (Fowler et al., 1999). An object-oriented program offers many opportunities where refactoring can be applied. The following sections examine five of them.
1. Class misuse
Before object-oriented programming (OOP) was invented, many projects were approaching (or exceeding) the point where the structured approach no longer worked. Object-oriented methods were created to help programmers break through these barriers. In an object-oriented language one defines the data together with the routines that are permitted to act on that data, so a data type defines precisely which operations can be applied to its data. The most fundamental mistake made when developing a program in an OOP context is failing to design a class properly. A single badly designed class can spoil the entire program, because its defects have a cumulative effect on the other classes that depend on it. The first step in developing a good object-oriented program is therefore to design its basic construct, the class, correctly: its attributes (both data and functions) must be defined adequately, its interfaces properly laid down and its relationships with other classes correctly defined, taking UML concepts into account.
The question that arises is what happens if a single class is not defined correctly. Consider example 1, a program relating to a company. The class employee contains the salary and office details, the class emp_address contains the address details and the class emp_name contains the name details. To distinguish objects of the same class, the attribute employee_id is used as a differentiator in all three classes.
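A minimal sketch of the design in example 1 follows. Only the class names, employee_id and the print functions print(), printk() and printl() come from the text; the remaining data members (office, city, name) and the assignment of each print function to a class are assumptions made for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical reconstruction of example 1: the data of ONE employee is
// scattered over three classes, and each class repeats employee_id so the
// three objects can be matched up again.
class employee {                       // salary and office details
public:
    employee(int id, double sal, std::string off)
        : employee_id(id), salary(sal), office(std::move(off)) {}
    void print(std::ostream& os = std::cout) const {
        os << employee_id << ' ' << salary << ' ' << office << '\n';
    }
private:
    int employee_id;
    double salary;
    std::string office;
};

class emp_address {                    // address details
public:
    emp_address(int id, std::string c) : employee_id(id), city(std::move(c)) {}
    void printk(std::ostream& os = std::cout) const {
        os << employee_id << ' ' << city << '\n';
    }
private:
    int employee_id;                   // duplicated key
    std::string city;
};

class emp_name {                       // name details
public:
    emp_name(int id, std::string n) : employee_id(id), name(std::move(n)) {}
    void printl(std::ostream& os = std::cout) const {
        os << employee_id << ' ' << name << '\n';
    }
private:
    int employee_id;                   // duplicated key
    std::string name;
};
```

Showing everything about one employee already takes three objects and three differently named calls, which is the clumsiness the next paragraph describes.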
The basic mistake in this design is that all the information pertains to a single employee, so it should be stored in a single class. If an employee leaves the organization and his records have to be destroyed, this design requires destroying three objects of three different classes instead of a single object of one class. Likewise, obtaining the information about a particular employee requires accessing three objects instead of one. Instances of class misuse are excellent opportunities for refactoring. Combining all the classes into one class through inheritance often does not solve the problem in example 1, because some variables are repeated in more than one class. The variable employee_id is declared in each of the three classes, so a class inheriting from all three contains three separate copies of employee_id, and the compiler rejects any unqualified reference to it as ambiguous. Also, since a printing function is defined in each of the three classes, printing the values of the inherited class requires calling the three functions one after the other, i.e. print(), printk() and printl(). This is clumsy, and defining yet another function to print all the inherited attributes only repeats code. The only real option is to redesign the classes into a single class that contains all the functionality, instead of spreading it over many classes.
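The refactoring recommended above can be sketched as one class that holds all of an employee's data. The member names are the same hypothetical ones assumed for example 1; the point is that employee_id is stored once and a single print() replaces print(), printk() and printl().

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Refactored sketch: everything about one employee lives in one class.
// Field names are assumptions, not taken from the original listing.
class employee {
public:
    employee(int id, std::string n, std::string c, double sal, std::string off)
        : employee_id(id), name(std::move(n)), city(std::move(c)),
          salary(sal), office(std::move(off)) {}

    int id() const { return employee_id; }

    // One print function covers what the three separate ones did.
    void print(std::ostream& os = std::cout) const {
        os << employee_id << ' ' << name << ' ' << city << ' '
           << salary << ' ' << office << '\n';
    }

private:
    int employee_id;   // stored exactly once
    std::string name;
    std::string city;
    double salary;
    std::string office;
};
```

Removing an employee's records now means destroying a single object, and looking up an employee means reading a single object.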
2. Violation of the principle of encapsulation
Encapsulation is the mechanism that binds together code and the data it manipulates, and keeps both safe from outside interference and misuse. When code and data are linked together in this fashion, an object is created; in other words, an object is the device that supports encapsulation. Violations of encapsulation arise mainly from class misuse.
When classes are not designed correctly, friend functions often have to be used. A friend function is a nonmember function that has access to all private members of the class of which it is a friend. In the previous example, when data that belongs in a single class is spread over several classes, a function that needs the private members of these classes has to be made a friend of each of them. Such a friend function can also modify private variables belonging to different classes, so the private variables of one class can indirectly affect those of another class through the friend function.
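The following sketch shows how one friend function can reach into the private data of two classes. The classes employee and emp_grade and the function promote() are hypothetical, chosen only to mirror the split design of example 1.

```cpp
#include <cassert>

class emp_grade;  // forward declaration so employee can refer to it

class employee {
public:
    explicit employee(double s) : salary(s) {}
    double get_salary() const { return salary; }
    friend void promote(employee& e, emp_grade& g);  // friend of employee
private:
    double salary;
};

class emp_grade {
public:
    explicit emp_grade(int g) : grade(g) {}
    int get_grade() const { return grade; }
    friend void promote(employee& e, emp_grade& g);  // also friend of emp_grade
private:
    int grade;
};

// A nonmember friend of both classes: it rewrites private data in each,
// so employee's salary now depends on emp_grade's private state.
void promote(employee& e, emp_grade& g) {
    g.grade += 1;
    e.salary += 1000.0 * g.grade;
}
```

Neither class's own interface reveals that promote() couples their private members, which is the indirect effect described above.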
Once a class has been designed, executed and tested, it becomes a black box whose inputs and outputs are known. The presence of friend functions leads to ineffective design, because it alters how such classes are used. As a rule, unnecessary friend functions should be avoided, and where they exist they should be removed by refactoring. Modifying existing classes merely by adding friend functions often results in unwarranted side effects that are not known initially and are discovered much later, after considerable time and effort have been spent. The very idea of encapsulation is then lost. Such programs are a constant source of bugs, and the more one tries to fix them, the more errors arise. These problems offer great scope for refactoring.
To refactor programs containing such problems, new classes have to be designed and the friend functions replaced by member functions. Friend functions should be used only when members of two or more different classes have to be accessed and manipulated at the same time and the classes cannot be combined into a single unit. Even for operator overloading, it is best to overload using member functions.
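Under the same hypothetical names, a minimal sketch of this refactoring merges the salary and grade data into one class, so that promote() becomes an ordinary member function and no friend is needed.

```cpp
#include <cassert>

// Refactored sketch: salary and grade belong to the same class, so the
// former friend function becomes a member with access via 'this'.
class employee {
public:
    employee(double s, int g) : salary(s), grade(g) {}

    void promote() {
        grade += 1;
        salary += 1000.0 * grade;
    }

    double get_salary() const { return salary; }
    int get_grade() const { return grade; }

private:
    double salary;
    int grade;
};
```

All code that can change salary or grade is now listed inside the class itself, so the class can again be tested as a black box.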
There is only one situation in which overloading with a friend increases the flexibility of an overloaded operator. When a binary operator is overloaded with a member function, the object on the left side of the operator generates the call to the operator function, and a pointer to that object is passed in the ‘this’ pointer. Assume some class defines a member ‘operator+( )’ function that adds an object of the class to an integer. Given an object of that class called ‘Ob’, the following expression is valid: Ob + 100.
In this case, ‘Ob’ generates the call to the overloaded + function, and the addition is performed. But a problem arises if the expression is written as 100 + Ob.
Here the integer appears on the left. Since an integer is a built-in type, no operation between an integer and an object of Ob’s type is defined, so the compiler will not compile this expression. In some applications, always having to position the object on the left could be a significant burden. The solution is to overload addition using friend functions rather than a member function. Then both arguments are passed explicitly to the operator function, so to allow both ‘object + integer’ and ‘integer + object’, the function is simply overloaded twice, one version for each situation. When an operator is overloaded with two friend functions in this way, the object may appear on either side of the operator. As far as possible, this is the only case in which friend functions should be used.
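The two-friend approach can be sketched as follows. The class name counter and its single int member are assumptions; the text refers only to "some class" with an object ‘Ob’.

```cpp
#include <cassert>

// Hypothetical class that can be added to an integer in either order.
class counter {
public:
    explicit counter(int v = 0) : value(v) {}
    int get() const { return value; }

    // A member 'counter operator+(int) const' would accept Ob + 100 only.
    // Two nonmember friends receive both operands explicitly, so the
    // object may appear on either side of '+'.
    friend counter operator+(const counter& ob, int n) {
        return counter(ob.value + n);
    }
    friend counter operator+(int n, const counter& ob) {
        return counter(n + ob.value);
    }

private:
    int value;
};
```

With both overloads in place, Ob + 100 and 100 + Ob compile and yield the same result.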
3. Lack of usage of inheritance concept
In C++, inheritance is supported by allowing one class to incorporate another class into its declaration. The process begins by defining a base class, which defines the qualities common to all objects derived from it; the base class represents the most general description. The classes derived from the base are referred to as derived classes. A derived class includes all features of the generic base class and adds qualities specific to itself.
Let us assume that a class X has been defined initially. Later, another class Y has to be defined that contains the variable k and the method showk() in addition to the variables and methods defined in X. Instead of defining Y from scratch, Y is built up as a derived class on X as its base class.
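A minimal sketch of X and Y is shown below. Only k and showk() are named in the text; the members i, j, set_ij(), showij() and set_k() are hypothetical stand-ins for whatever X originally contained.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Base class with hypothetical members.
class X {
public:
    void set_ij(int a, int b) { i = a; j = b; }
    void showij(std::ostream& os) const { os << i << ' ' << j << '\n'; }
private:
    int i = 0;
    int j = 0;
};

// Y reuses everything in X and adds only k and showk().
class Y : public X {
public:
    void set_k(int c) { k = c; }
    void showk(std::ostream& os) const { os << k << '\n'; }
private:
    int k = 0;
};
```

Only set_k() and showk() are new code in Y; the inherited members come from X, which has already been tested.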
The advantage of using inheritance is that once the base class has been tested and debugged, only the elements defined in the derived class have to be tested and debugged, which greatly reduces testing time and the size of the source code. This idea extends to class libraries, in which classes are defined and tested once, and future classes can then be derived from them with far less effort.
However, inheritance is often not used where it should be. Programmers tend to define their classes from scratch, so the same code gets duplicated in more than one class. If a duplicated method has to be changed, the change must be made in every class that contains the code, which duplicates effort. Had inheritance been used, the change would have to be made only in the one class where the method is defined. Because inheritance can extend over many generations of classes, such code replication can be avoided throughout a class hierarchy. This is an excellent opportunity for refactoring: wherever inheritance can be applied it should be, so that the code of a particular member function is implemented in only one class. This makes the software more maintainable and prevents problems in the future.
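The following sketch illustrates moving duplicated code into a single base class. The classes staff, manager and clerk and their members are hypothetical; the assumption is that, before refactoring, manager and clerk each carried their own copy of the name field and print_name().

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// The formerly duplicated member now lives in exactly one class.
class staff {
public:
    explicit staff(std::string n) : name(std::move(n)) {}
    void print_name(std::ostream& os) const { os << "Name: " << name << '\n'; }
private:
    std::string name;
};

class manager : public staff {
public:
    manager(std::string n, int r) : staff(std::move(n)), reports(r) {}
    int direct_reports() const { return reports; }
private:
    int reports;
};

class clerk : public staff {
public:
    clerk(std::string n, int d) : staff(std::move(n)), desk(d) {}
    int desk_number() const { return desk; }
private:
    int desk;
};
```

A later change to how names are printed is now made once, in staff, and both derived classes pick it up.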
4. Misuse of inheritance
Inheritance is a powerful concept when used correctly, but it is often misused. Suppose a class A defines a variable ‘i’ along with other members, and a class B needs A’s other members but not ‘i’.
Programmers tend to use inheritance in such a case even though the variable ‘i’ is not part of class B. If class B is derived from class A, B acquires an extra variable ‘i’ that it does not need. Such dangling variables are dangerous and pose a problem for the future maintainability of the software: the extra variable does not appear in B’s specification, yet it has to be initialized correctly to some value, or it may lead to bizarre errors. Programmers in a hurry to use inheritance often make such mistakes.
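A sketch of the problem, with hypothetical members: A holds ‘i’ plus a member ‘j’ that B also needs. The remedy shown (extracting the shared member into a common base class) is one common option and an assumption here, not a technique prescribed by the text.

```cpp
#include <cassert>

class A {
public:
    int i = 0;  // meaningful for A only
    int j = 0;  // needed by both A and B
};

// Misuse: B_misuse silently carries i, which its specification never
// mentions; only the default initializer in A keeps it from being garbage.
class B_misuse : public A {
public:
    int k = 0;
};

// One possible remedy: move the genuinely shared member into a common
// base and derive both classes from it, so B no longer carries i.
class shared_part {
public:
    int j = 0;
};

class A_refactored : public shared_part {
public:
    int i = 0;
};

class B_refactored : public shared_part {
public:
    int k = 0;
};
```

After the change, B_refactored contains only the members its specification calls for.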
A base class should be inherited by a derived class only when all the contents of the base class are used fully, and as a whole, by the derived class. Often, however, inheritance is applied merely for code reuse rather than for polymorphism. Consider a class C that has some member variables and the functions set() and showall().
Class C should be inherited by class D only if D uses the functions set() and showall(), along with the variables, in their entirety. Suppose set() in class C assigns all of C’s variables in a single operation.
In such a case, set() should not be broken into two separate functions merely to make inheritance possible. Inheritance should not be used for its own sake; it should be used to make the program easier to develop, implement and understand.
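A sketch of the acceptable case follows, with hypothetical variables a, b and c (only set() and showall() are named in the text). D uses C’s set() and showall() unchanged and extends them, rather than C::set() being split into smaller functions to suit D.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

class C {
public:
    void set(int x, int y) { a = x; b = y; }  // sets all of C's state together
    void showall(std::ostream& os) const { os << a << ' ' << b; }
private:
    int a = 0;
    int b = 0;
};

// D needs C's set() and showall() exactly as they are, so it inherits C
// and builds on those functions instead of taking them apart.
class D : public C {
public:
    void set(int x, int y, int z) { C::set(x, y); c = z; }
    void showall(std::ostream& os) const { C::showall(os); os << ' ' << c; }
private:
    int c = 0;
};
```

Note that D::set hides C::set for callers using a D object; adding `using C::set;` inside D would make both overloads visible if that is wanted.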
5. Misplaced Polymorphism
Polymorphism is the attribute that allows one interface to control access to a general class of actions; the specific action selected is determined by the exact nature of the situation. Function overloading is the use of the same name for two or more functions. The key to overloading is that each redefinition of the function must use either different parameter types or a different number of parameters. For example, a program may define three different types of stacks: one for integer values, one for character values and one for floating-point values. Because of polymorphism, a single set of names, push() and pop(), can be used for all three stacks. The program creates three specific versions of these functions, one for each type of stack, but the function names are the same, and the compiler automatically selects the right function based on the types of the arguments. Thus the interface to a stack, the functions push() and pop(), is the same no matter which type of stack is used, while the individual versions of these functions define the specific implementation for each type of data.
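A minimal sketch of the three-stack example, assuming hypothetical stack types int_stack, char_stack and double_stack built on std::vector. push() and pop() are overloaded free functions, so the compiler picks the version from the argument types.

```cpp
#include <cassert>
#include <vector>

// Distinct wrapper types let overload resolution tell the stacks apart.
struct int_stack    { std::vector<int>    items; };
struct char_stack   { std::vector<char>   items; };
struct double_stack { std::vector<double> items; };

// Shared helper: remove and return the top element.
template <typename T>
T take_top(std::vector<T>& items) {
    assert(!items.empty());  // popping an empty stack is a caller error
    T top = items.back();
    items.pop_back();
    return top;
}

// One interface (push/pop), three versions chosen by argument types.
void push(int_stack& s, int v)       { s.items.push_back(v); }
void push(char_stack& s, char v)     { s.items.push_back(v); }
void push(double_stack& s, double v) { s.items.push_back(v); }

int    pop(int_stack& s)    { return take_top(s.items); }
char   pop(char_stack& s)   { return take_top(s.items); }
double pop(double_stack& s) { return take_top(s.items); }
```

Because C++ cannot overload on return type alone, each pop() takes its stack as a parameter; that parameter is what lets the compiler select the right version.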
However, polymorphism is often not applied properly, because classes are constructed ad hoc rather than step by step. Consider the example in the previous paragraph. If the stack is developed initially for integer values and, much later, modified to handle character values, the push operation for characters may be given a different name. This results in similar (though not identical) code in differently named functions, i.e. functions with different names providing the same interface for different types. Such cases are an excellent opportunity for refactoring. Introducing polymorphism while refactoring is a slow task: the functions have to be scanned and the action each one performs identified, and functions performing the same action on different data types should be given the same name. Polymorphism reduces complexity by allowing the same interface to be used to access a general class of actions.
Addressing the issues discussed above while refactoring an object-oriented program leads to much better software. The objective of refactoring is to improve the system structure and make it easier to understand, and refactoring extends the useful lifetime of a program. If not applied properly, however, refactoring may lead to further deterioration of the software. There are also practical limits to the extent to which a system can be improved by refactoring. Major architectural changes or radical reorganization of the system’s data management cannot be carried out automatically and would involve high additional costs. The costs of refactoring obviously depend on the extent of the work that is carried out.