Object-Oriented Design Concepts in UML

The Unified Modeling Language™ (UML®) is an inherently object-oriented modeling language and was designed for use in object-oriented software applications. These applications could be based on the object-oriented technologies recommended by the Object Management Group (OMG), which owns UML. The initial versions of UML (UML 1.x) were based on three leading object-oriented methods (Booch, OMT, and OOSE) and were meant to represent "the culmination of best practices in practical object-oriented modeling". UML 2.x is still object-oriented at its core (though there were some apparently unsuccessful attempts to extend UML to support other development methods).

Here is an illustration of this object orientation, taken from the UML 1.4.2 Specification:

A frequently asked question has been: Why doesn’t UML support data-flow diagrams? Simply put, data-flow and other diagram types that were not included in the UML do not fit as cleanly into a consistent object-oriented paradigm.

This might explain why UML does not bother to support, for example, database modeling, which is still mostly based on relational models describing relational databases. It does not explain why UML ignores modeling of object-relational databases or of Graphical User Interfaces (GUI). GUI designs were, and still are, prominent examples of object-oriented design and programming, yet they are completely neglected by UML.

To understand and use UML as intended by its authors, software architects and developers should be familiar with the general concepts and methods of object-oriented analysis and design (OOAD) and/or of object-oriented development (OOD), and with how those were applied to UML itself. There is one problem with this requirement: though OOAD/OOD has been in use for several decades, there is still no consensus on what OOAD is, or even on what the fundamental concepts ("quarks") of OOAD are. See one nice but pedantic attempt to define fundamental OOD concepts in [DJA 06].

One problem is that the UML specification uses some OOD concepts as if they had generally accepted definitions, which is a mistaken assumption. To make things even worse, the Glossary that was present in the UML 1.x specifications, and that included terms from OMG standards and object-oriented analysis and design methods in addition to UML- and MOF-specific terminology, was removed from all UML 2.x specifications. Removing the Glossary from the UML specification was a really wicked decision.

OK, so we are really in trouble: UML specifications use OOAD concepts that have no clear and generally accepted definitions, without providing their own interpretations or definitions of those concepts. Wait, sometimes they do provide some definitions. Let's see what we have.

Object-Oriented Design

Whether you agree or not, there is no commonly accepted definition of Object-Oriented Design (or Development, or Programming) (OOD, OOP). So I will make up a definition:

Object-Oriented Design is a software development approach in which a software system is designed and implemented as a collection of interacting stateful objects with specified structure and behavior.

There are several fundamental concepts defining OOD, but there is no agreement on the exact list of those concepts, their definitions, or their taxonomy (classification). We will take a look at some OOD concepts that seem relevant to UML:

Class and Object

Classes were introduced in Simula 67, and their origin was computer simulation.

The classes in Simula 67 (called processes in Simula) [SK 03] have a list of statements whose execution starts when an object is created. When the execution of the statements ends, the object becomes terminated. The object's local data and procedures can still be accessed from outside the object after termination. During execution, an object may choose to suspend its execution temporarily (even if it is currently inside one or more procedure calls) and let another object take control. If control is later returned to the object, it resumes execution from the point where it was suspended. To a Java or C# programmer this concept will sound like a Thread.

Note that the concept of the process in Simula was based on the block in Algol-60. A block in Algol-60 allowed the declaration of local variables, local procedures, and a list of executable statements. But the execution of a block could not be suspended, and once it finished, its local variables and procedures were removed from the execution stack and were no longer available.

Class and Object in UML

A UML class is a classifier which describes a set of objects that share the same features, constraints, and semantics.

A class may be modeled as being active, meaning that an instance of the class has some autonomous behavior.

In all versions of UML, from UML 1.x to UML 2.5, the essence of an object is the same:

Object is an instance of a class.

The Glossary of the now obsolete UML 1.4.2 Specification defined an object as

An entity with a well defined boundary and identity that encapsulates state and behavior. State is represented by attributes and relationships, behavior is represented by operations, methods, and state machines. An object is an instance of a class.

UML 2.5 describes an object as

An object is an individual [thing] with a state and relationships to other objects. The state of an object identifies the values for that object of properties of the classifier of the object.
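Both definitions can be made a bit more tangible with a minimal C++ sketch. The Book class, its members, and the sample values are hypothetical and are not taken from any UML specification; the point is only that two objects of the same class can hold identical state and still be distinct individuals.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class, used only for illustration.
class Book {
public:
    Book(std::string title, int copies)
        : title_(std::move(title)), copies_(copies) {}

    // State: the values this object holds for the properties of its class.
    const std::string& title() const { return title_; }
    int copies() const { return copies_; }

    // Behavior: an operation that changes the state of this object only.
    void addCopies(int n) { copies_ += n; }

private:
    std::string title_;
    int copies_;
};

// Usage sketch: same class, same initial state, different identity.
void objectIdentityExample() {
    Book a("Smalltalk-80", 1);
    Book b("Smalltalk-80", 1);

    assert(a.title() == b.title() && a.copies() == b.copies());  // equal state
    assert(&a != &b);                                            // distinct identity

    a.addCopies(2);            // changes the state of a, not of b
    assert(a.copies() == 3);
    assert(b.copies() == 1);
}
```

Here the address of the object stands in for identity, which is a C++ convention rather than something UML prescribes.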

Message

The message concept is probably one of the most confusing in OOAD, especially for software developers familiar with modern messaging systems and APIs such as the Java Message Service (JMS) or Microsoft Message Queuing (MSMQ), which allow separate, uncoupled applications or components to communicate reliably and asynchronously. The message concept in OOAD and UML is quite a different thing.

If we search the language specifications of C++ [BSC 77], Java [JGJ 05], or C# [CSH 10], we find no concept of a message. So let's travel back in time to Smalltalk [A. Goldberg and D. Robson. Smalltalk-80: The Language and its Implementation.].

Messages in Smalltalk-80 represent two-way communication between the objects of the system. Note that "in Smalltalk everything is an object", including primitive values and classes. A message requests an operation from the receiver. The selector and arguments of the message tell the receiver what type of response to make. It is up to the receiver to decide how to respond to the message. The receiver returns an object that becomes the value of the message expression.

Let's consider a simple Smalltalk example of a binary message that requests an arithmetic operation:

3 + 4

In this case, the object '3' is the receiver of the message '+ 4'. The message contains the selector '+' and the argument '4'. The selector of a message is a name (or symbol) for the type of interaction required from the receiver. The receiver of the message ('3') returns an object ('7') that becomes the value of the message expression.

If a message expression includes an assignment prefix, the object returned by the receiver will become the new object referred to by the variable. Even if no information needs to be communicated back to the sender, a receiver always returns a value for the message expression. Returning a value indicates that the response to the message is complete. For example,

sum ← 3 + 4

makes 7 the new value of the variable sum.
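C++ has no notion of a message, but a rough analogue of the receiver, selector, and argument roles can be sketched with a member operator function. The Integer wrapper class below is hypothetical. Unlike Smalltalk, C++ resolves this particular call at compile time, so the sketch shows only the roles, not the run-time dispatch.

```cpp
#include <cassert>

// Hypothetical wrapper so that an integer can act as a "receiver".
class Integer {
public:
    explicit Integer(int value) : value_(value) {}
    int value() const { return value_; }

    // Roughly the "+" selector: the receiver is *this, the argument is rhs,
    // and the returned object is the value of the whole expression.
    Integer operator+(const Integer& rhs) const {
        return Integer(value_ + rhs.value_);
    }

private:
    int value_;
};

void messageExample() {
    Integer three(3);
    Integer four(4);

    Integer sum = three + four;            // infix form, like 3 + 4
    Integer same = three.operator+(four);  // explicit receiver.selector(argument)

    assert(sum.value() == 7);
    assert(same.value() == 7);
}
```

The explicit call three.operator+(four) makes it visible that '3' is the receiver and '4' is the argument, as in the Smalltalk reading of the expression.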

Object Creation Message

Classes are components of the Smalltalk-80 system and, like all other components, they are represented by objects. Each class has a name that describes the type of component its instances represent. A class name is a way for instances to identify themselves, and it provides a way to refer to the class in expressions. The class name becomes the name of a globally shared variable, and it must be capitalized.

New objects in Smalltalk-80 are created by sending messages to classes. Note that this approach resolves the dilemma of sending a create message to a nonexistent object so that it creates itself. In Smalltalk the message is actually sent to the class object, which creates an instance of the class. Most classes respond to the unary message new by creating a new instance of themselves. For example,

users ← Dictionary new

sends the 'new' message to the Dictionary class, which creates and returns an instance of itself to be assigned to the variable users.

Note that some Smalltalk classes can create instances in response to other messages. For example, the standard class Date responds to the message today with an instance representing the current day. In C# a similar result can be achieved by using the static property Today of the DateTime structure:

DateTime thisDay = DateTime.Today;

Message in UML

Messages are intrinsic elements of UML interaction diagrams. A message defines a specific kind of communication between lifelines of an interaction. A communication can be, for example, invoking an operation, replying back, creating or destroying an instance, raising a signal. It also specifies the sender and the receiver of the message.

A create message is shown as a dashed line with an open arrowhead pointing to the head of the created lifeline.

Online Bookshop creates Account.

Note that this weird convention of sending a message to a nonexistent object so that it creates itself is used in both UML 1.x and 2.x. As we saw above, in Smalltalk-80 new objects are created by sending messages to classes, with an instance of the class created and returned. So one way to interpret the UML create message notation is probably as a shortcut for these actions.
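This "message to the class" reading can be sketched in C++ with a static factory function. The class names follow the Online Bookshop and Account lifelines of the diagram above; the create and openAccount functions and the owner attribute are my own assumptions, made up for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Account {
public:
    // Plays the role of a Smalltalk class-side "new": the request is addressed
    // to the class, which creates an instance and returns it to the sender.
    static std::unique_ptr<Account> create(std::string owner) {
        return std::unique_ptr<Account>(new Account(std::move(owner)));
    }

    const std::string& owner() const { return owner_; }

private:
    // Private constructor: instances can only be obtained through create().
    explicit Account(std::string owner) : owner_(std::move(owner)) {}
    std::string owner_;
};

class OnlineBookshop {
public:
    // Corresponds to the create message drawn from the bookshop lifeline
    // to the head of the new account lifeline.
    std::unique_ptr<Account> openAccount(const std::string& owner) const {
        return Account::create(owner);
    }
};

void createMessageExample() {
    OnlineBookshop shop;
    std::unique_ptr<Account> account = shop.openAccount("alice");
    assert(account != nullptr);
    assert(account->owner() == "alice");
}
```

The request never targets an Account that does not exist yet: it goes to the class, and the sender receives the new instance as the result.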

Operation and Method

Operation and Method in UML

Operation is defined in UML 1.4.2 as a service that can be requested from an object to effect behavior. An operation has a signature, which may restrict the actual parameters that are possible.

Method is defined as the implementation of an operation. It specifies the algorithm or procedure associated with an operation.
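In C++ terms, and only as a rough mapping, an operation resembles a pure virtual member function declaration, while a method resembles a definition that implements it. The Shape, Rectangle, and Square classes below are hypothetical.

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() = default;

    // Operation: a service with a signature; no algorithm is given here.
    virtual double area() const = 0;
};

class Rectangle : public Shape {
public:
    Rectangle(double width, double height) : width_(width), height_(height) {}

    // Method: one implementation of the area operation.
    double area() const override { return width_ * height_; }

private:
    double width_;
    double height_;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}

    // Method: another implementation of the same operation.
    double area() const override { return side_ * side_; }

private:
    double side_;
};

void operationAndMethodExample() {
    Rectangle r(2.0, 3.0);
    Square s(4.0);
    const Shape& first = r;
    const Shape& second = s;
    assert(first.area() == 6.0);    // one operation ...
    assert(second.area() == 16.0);  // ... two different methods
}
```

One operation can therefore have several methods, which is the distinction the two UML 1.4.2 definitions draw.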

Encapsulation

Encapsulation is one of the loosely defined OOAD concepts. The term has been known in software development for many years, but I can't find any reliable origin. Encapsulation was mentioned in the article [CLU 77], which describes abstraction mechanisms in the programming language CLU, in the context of hiding implementation details.

CLU restricted access to the implementation by allowing the use of only the (public) cluster operations, i.e. the public interface. It promoted design practices where abstractions are used to define and simplify the connections between system modules and to encapsulate implementation decisions that are likely to change.

If we look up the English word encapsulate in a dictionary, we will find two meanings: (1) to encase or become enclosed in a capsule (2) to express in a brief summary, epitomise. Both of these meanings of encapsulation seem appropriate in the context of OOAD.

Let's assume that the definition of encapsulation in OOAD is something like:

Encapsulation is a development technique which includes
  • creating new data types (classes) by combining both information (structure) and behaviors, and
  • restricting access to implementation details.

Encapsulation is very close to the abstraction concept. The difference is probably in "direction": encapsulation is more about hiding (encapsulating) implementation details, while abstraction is about finding and exposing public interfaces. Both concepts are supported by access control.

Access control makes it possible both to hide the implementation (implementation hiding or information hiding) and to expose the public interface of a class.
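Here is a minimal C++ sketch of both parts of the definition above: structure and behavior combined in one class, with access control hiding the implementation. The BoundedCounter class is hypothetical and assumes a non-negative limit.

```cpp
#include <cassert>

class BoundedCounter {
public:
    explicit BoundedCounter(int limit) : limit_(limit) {}

    // Public interface: the only way clients can change or read the state.
    bool increment() {
        if (count_ >= limit_) {
            return false;  // request rejected, state unchanged
        }
        ++count_;
        return true;
    }

    int value() const { return count_; }

private:
    // Implementation details: inaccessible to clients, so count_ can never
    // exceed limit_ (assuming limit_ is not negative).
    int count_ = 0;
    int limit_;
};

void encapsulationExample() {
    BoundedCounter counter(2);
    bool first = counter.increment();
    bool second = counter.increment();
    bool third = counter.increment();  // limit reached

    assert(first && second && !third);
    assert(counter.value() == 2);
    // counter.count_ = 99;  // would not compile: count_ is private
}
```

Because clients depend only on increment and value, the private representation could be changed later without affecting them.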

Encapsulation in UML

UML specifications provide no definition of encapsulation but use it loosely in several contexts.

For example, in UML 1.4 an object is defined as an entity with a well defined boundary and identity that encapsulates state (attributes and relationships) and behavior (operations, methods, and state machines). Elements in peer packages are encapsulated and are not a priori visible to each other.

In UML 2.4 and 2.5 a component represents a modular part of a system that encapsulates its contents and whose manifestation is replaceable within its environment. In addition, a Component is encapsulated and ... as a result, Components and subsystems can be flexibly reused and replaced by connecting ("wiring") them together.

An encapsulated classifier in UML 2.4 and 2.5 is a structured classifier isolated from its environment (encapsulated?) by using ports. Each port specifies a distinct interaction point between the classifier and its environment.

Library Services is a classifier encapsulated through the searchPort port.
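C++ has no ports, so the following is only a loose sketch of the idea, under my own assumptions: the port is modeled as a separate interface object that the classifier hands out, and everything else stays private. The SearchPort interface, the search operation, and the catalog contents are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface offered at the port.
class SearchPort {
public:
    virtual ~SearchPort() = default;
    virtual std::vector<std::string> search(const std::string& keyword) const = 0;
};

class LibraryServices {
public:
    LibraryServices() : port_(*this) {}
    LibraryServices(const LibraryServices&) = delete;
    LibraryServices& operator=(const LibraryServices&) = delete;

    // The only interaction point exposed to the environment.
    SearchPort& searchPort() { return port_; }

private:
    // Forwards requests arriving at the port to the hidden internals.
    class SearchPortImpl : public SearchPort {
    public:
        explicit SearchPortImpl(const LibraryServices& owner) : owner_(owner) {}
        std::vector<std::string> search(const std::string& keyword) const override {
            return owner_.findTitles(keyword);
        }
    private:
        const LibraryServices& owner_;
    };

    std::vector<std::string> findTitles(const std::string& keyword) const {
        std::vector<std::string> hits;
        for (const std::string& title : catalog_) {
            if (title.find(keyword) != std::string::npos) {
                hits.push_back(title);
            }
        }
        return hits;
    }

    // Made-up sample data.
    std::vector<std::string> catalog_{"Smalltalk-80", "Simula 67", "CLU Reference"};
    SearchPortImpl port_;
};

void portExample() {
    LibraryServices library;
    std::vector<std::string> hits = library.searchPort().search("Simula");
    assert(hits.size() == 1);
    assert(hits[0] == "Simula 67");
}
```

Clients see only the SearchPort interface; the catalog and the lookup algorithm remain inside the classifier.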

The UML 2.4 specification also used the term "completely encapsulated" without providing any explanation. The term was removed in UML 2.5.

Abstraction

There is no single commonly accepted definition of abstraction in OOD. Some sources define abstraction as a way or mechanism to represent complex reality using a simplified model. It could also be defined as a way to capture only those details about an object that are relevant to the current perspective. We can try to go back to the origins, sometime in the 1970s, when programming languages such as CLU, Alphard, and Modula-2 introduced abstraction mechanisms.

Data abstractions in CLU were introduced with an abstract data type construct called a cluster. Data abstraction required that the behavior of the data objects be completely characterized by a set of operations. The classical example is the definition of a stack cluster using only push and pop operations.

CLU also introduced separation of abstraction from its implementation(s):

An abstraction isolates use from implementation: an abstraction can be used without knowledge of its implementation and implemented without knowledge of its use.

The description unit of a cluster is the interface specification of the abstraction. For data abstractions it included the number and types of parameters, constraints on type parameters, and the name and interface specification of each operation. The implementation involved both selecting a representation for the data objects and implementing each cluster operation in terms of that data representation. Ultimately, there can be many implementations of an abstraction. Each implementation must satisfy the interface specification of the cluster.
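The stack example translates fairly directly into C++, used here in place of CLU. In the sketch below the abstract class plays the role of the interface specification, and two classes with different data representations both satisfy it. All names are hypothetical, and the empty operation is an addition of mine that the text above does not mention.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Interface specification: the behavior is characterized by operations only.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int value) = 0;
    virtual int pop() = 0;           // precondition: the stack is not empty
    virtual bool empty() const = 0;  // added for this sketch
};

// First implementation: the representation is a contiguous array.
class VectorStack : public IntStack {
public:
    void push(int value) override { items_.push_back(value); }
    int pop() override {
        assert(!items_.empty());
        int top = items_.back();
        items_.pop_back();
        return top;
    }
    bool empty() const override { return items_.empty(); }
private:
    std::vector<int> items_;
};

// Second implementation: the representation is a linked list.
class ListStack : public IntStack {
public:
    void push(int value) override { items_.push_front(value); }
    int pop() override {
        assert(!items_.empty());
        int top = items_.front();
        items_.pop_front();
        return top;
    }
    bool empty() const override { return items_.empty(); }
private:
    std::list<int> items_;
};

// Client code written against the abstraction, without knowledge of
// any particular implementation.
int popAllAndSum(IntStack& stack) {
    int sum = 0;
    while (!stack.empty()) {
        sum += stack.pop();
    }
    return sum;
}

void stackAbstractionExample() {
    VectorStack a;
    ListStack b;
    IntStack* stacks[] = {&a, &b};
    for (IntStack* stack : stacks) {
        stack->push(1);
        stack->push(2);
        stack->push(3);
        int top = stack->pop();
        assert(top == 3);                   // last in, first out
        assert(popAllAndSum(*stack) == 3);  // the remaining 2 + 1
        assert(stack->empty());
    }
}
```

popAllAndSum is written once and works with either representation, which is the isolation of use from implementation described in the quote above.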

Abstraction in UML

Abstraction in UML corresponds to the concept of abstraction in OOD (as described above). UML provides different kinds (subclasses) of abstraction, including realizations (i.e. implementations).

Abstraction is a dependency relationship that relates two elements or sets of elements (called client and supplier) representing the same concept but at different levels of abstraction or from different viewpoints.

Realization is a specialized abstraction relationship between two sets of model elements, one representing a specification (the supplier) and the other representing an implementation of that specification (the client).

Inheritance

In OOAD and in UML 1.4, inheritance is defined as a mechanism by which more specific classes (called subclasses or derived classes) incorporate the structure and behavior of more general classes (called superclasses or base classes).

Inheritance in UML 1.4

The Glossary of UML 1.4.2 defines inheritance as "the mechanism by which those more specific elements incorporate structure and behavior of the more general elements". Inheritance supplements the generalization relationship.

Generalization is defined as a taxonomic relationship between a more general element and a more specific element. The more specific element is fully consistent with the more general element and contains some additional information. An instance of the more specific element may be used where the more general element is allowed.

Inheritance was explained in UML 1.4.2 using the concepts of a full descriptor and a segment descriptor. A full descriptor contains a description of all of the attributes, associations, operations, and constraints that the object contains, and is usually implicit because it is built out of incremental segments combined together using inheritance.

In an object-oriented language, the description of an object is built out of incremental segments that are combined using inheritance to produce a full descriptor for an object. The segments are the modeling elements that are actually declared in a model. They include elements such as class and other generalizable elements. Each generalizable element contains a list of features and other relationships that it adds to what it inherits from its ancestors.

Each kind of generalizable element has a set of inheritable features. For any model element, these include constraints. For classifiers, these include features (attributes, operations, signal receptions, and methods) and participation in associations. [UML 1.4.2 Specification]

If a generalizable element has more than one parent (multiple inheritance), then its full descriptor contains the union of the features from its own segment descriptor and the segment descriptors of all of its ancestors.

Attributes in UML 1.4 could not be redefined, but a method could be declared in more than one subclass. A method declared in any segment supersedes and replaces a method with the same signature declared in any ancestor.
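The segment and full descriptor idea maps loosely onto C++ class declarations. In the sketch below (the Document and Report classes are hypothetical, and the data members are public only for brevity), each class declaration is one segment, and an object of the subclass carries the combination of both.

```cpp
#include <cassert>
#include <string>

// Segment declared for the more general element.
class Document {
public:
    virtual ~Document() = default;

    std::string title = "untitled";  // attribute
    virtual std::string describe() const { return "document"; }  // method
};

// Segment declared for the more specific element: it lists only what it
// adds to, or replaces in, what it inherits.
class Report : public Document {
public:
    std::string author = "unknown";  // additional attribute

    // Same signature as in the ancestor: supersedes the inherited method.
    std::string describe() const override { return "report"; }
};

void inheritanceExample() {
    Report report;

    // The "full descriptor" of report combines both segments.
    assert(report.title == "untitled");  // from the Document segment
    assert(report.author == "unknown");  // from the Report segment

    // An instance of the more specific element used where the more
    // general element is allowed; the superseding method is the one invoked.
    const Document& general = report;
    assert(general.describe() == "report");
}
```

The attribute title is inherited unchanged, while describe is replaced, which matches the UML 1.4 rule that attributes cannot be redefined but methods can be superseded.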

Inheritance in UML 2.x

UML 2.4 and the newest UML 2.5 specifications provide no definition of inheritance. The UML 2.x specifications say that, with generalization, the specializing classifier inherits the features of the more general classifier. Any constraint applying to instances of the general classifier also applies to instances of the specific classifier.

UML 2.5 provides a somewhat vague and incomplete explanation of how inheritance works in UML:

When a Classifier is generalized, certain members of its generalizations are inherited, that is they behave as though they were defined in the inheriting Classifier itself. For example, an inherited member that is an attribute has a value or collection of values in any instance of the inheriting Classifier, and an inherited member that is an Operation may be invoked on an instance of the inheriting Classifier. [UML 2.5 Specification]

Polymorphism

As is usual with OOAD concepts, polymorphism is also poorly defined. You can find all kinds of strange definitions of polymorphism, and there is no agreement on which one is the best. To make things even worse, I will add my own definition of polymorphism:

Polymorphism is the ability to apply different meanings (semantics, implementations) to the same symbol (message, operation) in different contexts.

When the context is defined at compile time, it is called static or compile-time polymorphism. When the context is defined during program execution, it is dynamic or run-time polymorphism.

It is believed that the term "polymorphism" was introduced by Strachey in 1967 [CS 67] to describe operations and functions that could be applied to more than one type of argument.

A typical example of static "ad hoc" polymorphism in procedural languages like ALGOL-60 or ALGOL-68 is:

sum := x + y;

In this example "+" is a polymorphic operation which could be used with different types of operands: integer, real, string, complex, vector, etc. The specific static context, that is, the types of the operands x and y, determines at compile time which implementation of "+" is to be used.

This kind of static polymorphism is usually called overloading and means using the same operation symbol or function name on different types. Note that overloading also allows a different number of parameters and sometimes (ALGOL-68) even different priorities.
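Overloading is easy to show in C++, where the compiler picks the implementation from the static types and the number of the arguments. The function name add is hypothetical.

```cpp
#include <cassert>
#include <string>

// The same name, different parameter types: the compiler selects the
// implementation at compile time from the static types of the arguments.
int add(int x, int y) { return x + y; }
double add(double x, double y) { return x + y; }
std::string add(const std::string& x, const std::string& y) { return x + y; }

// Overloading also allows a different number of parameters.
int add(int x, int y, int z) { return x + y + z; }

void overloadingExample() {
    assert(add(3, 4) == 7);        // int version
    assert(add(1.5, 2.5) == 4.0);  // double version
    assert(add(std::string("ab"), std::string("cd")) == "abcd");  // string version
    assert(add(1, 2, 3) == 6);     // three-parameter version
}
```

Nothing is decided at run time here: each call site is bound to exactly one function when the program is compiled.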

Another kind of static polymorphism is parametric polymorphism [BSC 77], which in C++ is based on templates. In the C++ example below, the Vector template defines a generic method 'elem' and an operator '[ ]':

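template<class T> class Vector {
  T* v;       // pointer to the element storage
  int size;   // number of elements
public:
  Vector();                        // default constructor
  Vector(int);                     // constructor taking an int (presumably the size)
  T& elem(int i) { return v[i]; }  // unchecked access to element i
  T& operator[] (int i);           // subscript operator, declared only
  /* ... */
};
Vector<int> vi;      // T is int
Vector<Shape*> vps;  // T is Shape*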

In OOAD, polymorphism means dynamic polymorphism and is commonly referred to as late binding or dynamic binding. It could be defined as:

(Dynamic) Polymorphism is the ability of objects of different classes to respond to the same message in different ways.

Virtual Functions

The procedural language Algol-60 allowed a procedure to be passed as a parameter to another procedure. In Simula 67, classes may have formal parameters, as procedures do, but procedures were not allowed as parameters to classes. The Simula 67 authors found another way to make the same statements have slightly different effects in different objects [SK 03]: by declaring a procedure in a class C as "virtual", it could be redefined (overridden) in a subclass D. ... Thus, the same procedure call could activate different “versions” of the procedure, at least in objects of different subclasses.

The dynamic binding mechanism in C++, Java, and C# determines which behavior (implementation) is invoked in response to a message received by a specific object. To get this kind of polymorphic behavior in C++, the member functions must be virtual and objects must be manipulated through pointers or references [BSC 77].
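The requirement about pointers and references is worth a small illustration. In the sketch below (hypothetical Shape and Circle classes), the same virtual function gives a different answer depending on whether the object is passed by reference or copied into a base-class value.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

// Through a reference the call is bound at run time to the dynamic type.
std::string nameByReference(const Shape& shape) { return shape.name(); }

// Passing by value copies only the Shape part of the argument ("slicing"),
// so the object inside the function really is a plain Shape.
std::string nameByValue(Shape shape) { return shape.name(); }

void dynamicBindingExample() {
    Circle circle;
    const Shape* pointer = &circle;

    assert(nameByReference(circle) == "circle");  // dynamic binding
    assert(pointer->name() == "circle");          // dynamic binding
    assert(nameByValue(circle) == "shape");       // sliced copy
}
```

Java and C# do not have this particular pitfall for class instances, since objects of classes are always handled through references there.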

Polymorphism in UML

There is no definition of polymorphism in UML specifications but there are some differences in how this term is used in different versions of UML.

In UML 1.4.2 an operation declares, by using the isPolymorphic attribute, whether or not it may be realized by a different method in a subclass (which looks very similar to virtual functions in C++). Methods realizing a polymorphic operation have the same signature as the operation and have a body implementing the specification of the operation. Methods in descendants override and replace methods inherited from ancestors.

The isPolymorphic attribute is no longer present in UML 2.4 and UML 2.5, and no explanation is given. Does it mean that all operations are now polymorphic (virtual), as if inspired by the Java language? The UML 2.4.1 specification had one obscure statement mentioning polymorphism in Chapter 11, Actions, and this statement has been removed from UML 2.5:
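To show what such a flag amounts to in practice, here is a C++ sketch that treats a virtual function as the counterpart of isPolymorphic = true and a non-virtual function as the counterpart of isPolymorphic = false. This mapping is my own analogy, not something the UML specification states, and the class and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;

    // Analogous to isPolymorphic = true: subclasses may supply another method.
    virtual std::string polymorphicOp() const { return "Base"; }

    // Analogous to isPolymorphic = false: calls made through Base are
    // bound statically to this method.
    std::string fixedOp() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string polymorphicOp() const override { return "Derived"; }

    // Hides Base::fixedOp for callers that know the static type Derived,
    // but does not override it.
    std::string fixedOp() const { return "Derived"; }
};

void isPolymorphicExample() {
    Derived derived;
    const Base& viewedAsBase = derived;

    assert(viewedAsBase.polymorphicOp() == "Derived");  // run-time selection
    assert(viewedAsBase.fixedOp() == "Base");           // compile-time selection
    assert(derived.fixedOp() == "Derived");             // static type is Derived
}
```

In Java, by contrast, instance methods are overridable unless they are declared final, private, or static, so no such flag is needed on the common path.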

Operations are specified in the model and can be dynamically selected only through polymorphism.

I remember that, a long time ago, the Pascal language was fiercely promoted as a significant simplification compared to the huge, too formal, and complex Algol-68. The result was a plain, restrained, and poorly defined language with ambiguous semantics. I hope the UML 2.5 simplification effort will not end up the same way: simple is not always lucid.