Welcome to Modeling | Modeling News | Modeling Videos | Latest Modeling Trends


Saturday, October 27, 2007

Essential Skills of Data Modeling, The

The critical data modeling issue is learning to think like a data modeler. The representation method is a less important concern, because all dialects of these methods capture the same core data. For data modeling teachers, there are two issues. First, what representation method enables quick sketching of models on a board? Second, what method should students use to capture the fine detail for their assignments? Other issues related to teaching data modeling are also discussed, including the argument for intertwining the teaching of data modeling and SQL

Approximately 20 years of teaching data modeling have provided ample evidence to me that the critical issue is not how to represent entities and relationships. The essential skill is learning to identify entities and the correct relationships among them. Students demonstrate over and over again the difficulty that they have in learning to think like a data modeler. The various representation methods, such as Chen's E-R diagram (Chen 1976) and UML (Rumbaugh, Jacobson, and Booch 1999), are easily learned. However, proficiency with a representation method does not make a data modeler. Data modeling is a higher-level skill than drawing a diagram.

I reiterate continually to my students, and in my textbook (Watson 2006), that the purpose of a data model is to capture reality so that the database based on the data model can be used to answer questions about reality. A model that fails to represent reality is likely to fail at some point because a client's question cannot be converted to SQL. When realworld entities or relationships are not represented in a data model, then real-world queries about these missing entities and relationships cannot be answered. I learned this need to focus on reality representation early in my career when I was given data collected by pharmacy researchers who needed some statistical analysis. The very first analysis they asked me to run, which was central to their research, could not be answered because they had not collected data on the relationship between two central entities. The data model did not represent the reality they wanted to study, and I could not answer a key question about that reality.

Some of the typical errors that students make include:

* Not recognizing that an attribute is an entity

* Failing to generalize several entities as a single entity

* Not reading a relationship both ways and thus making a cardinality mistake

* Ignoring exceptions that result in a failure to represent reality.

These are problems of domain understanding and not representation. It does not matter how you represent errors of domain interpretation. They are still errors.

As far as I am concerned, reality is far more important than representation. Even if there were a single best representation, and I don't believe there is, as I will argue shortly, it is worthless if the resulting data model does not capture the domain of interest's reality. It is all about reality and not so much about representation.

2. REPRESENTATION CHOICES

From a teaching perspective, there are two issues with regard to representation. First, I need a method that enables me to quickly sketch a model on a board. Second, I need a method that my students can use to capture the fine detail of a model and generate SQL create and constraint statements.

Thus, I don't believe that one can argue that one data modeling dialect is superior to another without considering the context in which it is used or the tools available. If a student comes to my office to ask for help with a model, I find it far easier to work with them using a sheet of paper or whiteboard than sitting at a computer using a CASE tool.

2.1 Minimalist, Quick Modeling

For quick modeling on a board or sheet of paper, when dealing with a class or small group of students respectively, a data modeling dialect is required that quickly records entities, identifiers, attributes, and relationships. It should also be a system that is quickly amended in response to class questions or recognition of exceptions. Thus, I don't worry about modality (optional or mandatory), but I am concerned about recognition of weak entities (represented by a '+') because of their impact on primary keys and maybe foreign keys. My models (see Figure 1) sparsely capture the essentials.

2.2 case Tools

In my current class, students use DB Visual Architect (DB VA) under the auspices of Visual Paradigm's academic partnership. DB VA is typical of the E-R modeling tools in common use in industry. Models can be drawn quickly and fine detail recorded, such as modality and data type. When the model is complete, students can generate SQL statements for creating the database.

As Table 1 shows, translation between the two representations' dialects is a simple task, and my students have had little difficulty in making the transition between my board drawings and DB VA. Indeed, some students directly translate my drawings into DB VA format in class so they have a record of all the models I have discussed. I had to cover two aspects of DB VA with the class that were not immediately obvious: representation of recursive relationships and weak entities.