Tuesday, November 13, 2007

Toward a Next Generation Data Modeling Facility: Neither the Entity-Relationship Model nor UML Meet the Need

In this article, we define five purposes of a data model and describe a typical data modeling problem. We then evaluate the Entity-Relationship and Unified Modeling Language data models against those five purposes in the context of the example problem. We find severe limitations with both data models. We conclude the article with a survey of the characteristics needed for a new data model.

A database is a model of the users' perceptions of the objects in their business environment. Databases succeed or fail on how well they match the users' perceptions. Database designs that do not support the users' perceptions will be judged to be "difficult to use" or "not really what I need." In some cases, database designs that conflict with the users' perceptions can be made usable by complicating the logic of application programs to transform the given database structure into the user-perceived application components. Such programs are needlessly expensive to develop and a nightmare to maintain.

For all but the simplest databases, it is too difficult to express the users' perceptions in terms of a particular database model such as the relational model. Instead, the users' perceptions are normally first expressed in terms of a data model, which is an abstraction of the users' view. Data models thus serve as an intermediary between the users' requirements on the one hand and the DBMS database design on the other. The data model is normally constructed during the requirements stage of a database project and is converted into a database design during the design stage.

We cannot overemphasize that the primary purpose of a data model is to describe and document the users' view of their world. A data model is not a tool for recording a database design. The primary purpose of a data model is not to define the tables that will appear in the database. A relational schema is a representation of a DBMS storage definition, not of the users' perceptions. Unfortunately, the table model is not rich enough to represent the users' needs. Consequently, without a suitable data modeling facility, developers contort the users' requirements into a relational schema and, in the process, lose many important requirements.

We believe that neither the existing versions of the entity-relationship model nor the UML data model is adequate as a data model for documenting user requirements. We believe that both have significant limitations and that either a new data model or a substantially extended version of E-R or UML is needed.

Our argument proceeds as follows: We begin by defining characteristics of a desirable data model. We then describe an example problem and demonstrate, in subsequent sections, how neither the E-R model nor UML adequately describes that example. We conclude with a description of what we believe are the minimum requirements for an appropriate data model.

1.1 Needed Characteristics of a Data Model

In our view, a data model should have the following characteristics:

1. Sufficiently robust to readily express the users' perceptions

2. As simple as possible

3. Independent of any physical database model

4. Able to utilize domains with inheritable properties

5. Able to readily support database migration

Let us now consider each of the criteria in turn.

1.2 Sufficiently Robust

The features and functions of a data model must be rich enough to support the users' perceptions of the objects in their world. Of course this means that a data model should represent the entities and their relationships, but additional features and functions should allow the modeling of many other semantic constructs as well.

Consider two examples. First, suppose the user wants to keep track of customers and indicates that those customers have an Address that consists of Street, City, State, and Zip. Additionally, the user states that Address is not required, but that if any portion of the Address is provided, then all of the elements of Address become required. Thus, in a data entry form, the user need not enter any part of Address, but if the user enters a value for, say, City, then the attributes Street, State, and Zip all become required.
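
To make the all-or-nothing rule concrete, the following is a minimal Python sketch of how an application might check it. The `Address` class, the field names, and the `missing_address_fields` helper are hypothetical names introduced here for illustration; they are not part of any data model discussed in this article.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical attribute names standing in for Street, City, State, and Zip.
ADDRESS_FIELDS = ("street", "city", "state", "zip_code")


@dataclass
class Address:
    street: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None


def missing_address_fields(address: Address) -> List[str]:
    """Return the fields that still need a value.

    An entirely empty address is acceptable, so it yields no missing
    fields. As soon as any one field has a value, every empty field
    is reported as missing.
    """
    values = {name: getattr(address, name) for name in ADDRESS_FIELDS}
    if not any(values.values()):
        return []  # nothing entered: Address is optional as a whole
    return [name for name, value in values.items() if not value]
```

For instance, `missing_address_fields(Address(city="Seattle"))` reports `street`, `state`, and `zip_code`, while an empty `Address()` reports nothing. Note that the rule applies to the group of attributes as a unit, which is exactly what a per-column "required" flag cannot express.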

A second example is more subtle. Suppose the same user states that each customer has a contact person. For each contact, the user wants to record Name, Email, and Phone. A customer must have at least one contact, but may have as many as 3. Name is required, and both Email and Phone are optional. The question then becomes: is a contact simply an attribute of a customer, or is a contact a separate thing, independent of the customer, that has a relationship to a customer? We will return to this question in a moment.
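
The contact requirements can be sketched in the same way. The Python below is only an illustration under one assumption: it treats contacts as a multi-valued group nested inside the customer, which is one of the two readings the question above leaves open. The class names and the `contact_errors` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str                    # required
    email: Optional[str] = None  # optional
    phone: Optional[str] = None  # optional


@dataclass
class Customer:
    name: str
    # Assumption: contacts are held inside the customer rather than
    # modeled as independent things related to it.
    contacts: List[Contact] = field(default_factory=list)


def contact_errors(customer: Customer) -> List[str]:
    """Check the stated rules: 1 to 3 contacts, each with a Name."""
    errors = []
    if not 1 <= len(customer.contacts) <= 3:
        errors.append("a customer needs between 1 and 3 contacts")
    for position, contact in enumerate(customer.contacts, start=1):
        if not contact.name:
            errors.append(f"contact {position} is missing Name")
    return errors
```

If a contact were instead a separate thing with its own identity, the same cardinality rule would attach to a relationship between two objects rather than to an attribute group, and the check would be written against that relationship.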

An easy way to visualize these requirements is to suppose that we have constructed a prototype customer form and we record the underlying structure of that form. Figure 1 shows such a form-based schematic. The dotted subscript notation indicates the minimum and maximum cardinalities of each attribute, respectively. The 1.1 subscript on Name means that exactly one value of Name is required and allowed. The 0.1 subscript on Description indicates that no value for Description is required, but that a maximum of one value is allowed.
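
As a rough illustration of how the subscript notation can be read, the following Python sketch parses a `min.max` subscript and tests whether a list of values satisfies it. The `Cardinality` type and `parse_cardinality` function are hypothetical helpers introduced here, and the `1.3` subscript used in the usage note is our inference from the "at least one, at most 3" contact rule rather than a value quoted from the figure.

```python
from typing import NamedTuple, Sequence


class Cardinality(NamedTuple):
    minimum: int
    maximum: int

    def allows(self, values: Sequence) -> bool:
        """True when the number of values lies within [minimum, maximum]."""
        return self.minimum <= len(values) <= self.maximum


def parse_cardinality(subscript: str) -> Cardinality:
    """Split a dotted subscript such as '0.1' into its two bounds."""
    minimum, maximum = subscript.split(".")
    return Cardinality(int(minimum), int(maximum))
```

Under this reading, `parse_cardinality("1.1").allows([])` is false because one value is required, `parse_cardinality("0.1").allows([])` is true because the attribute is optional, and a subscript of `1.3` would accept one, two, or three contacts.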