The Effect of the Number of Concepts on the Readability of Schemas: An Empirical Study with Data Models
Akhilesh Bajaj
The College of Business Administration
The University of Tulsa
Tulsa, OK 74104, USA
Email: akhilesh-bajaj@utulsa.edu
Phone: (918) 631-2786
Fax: (918) 631-2164
ABSTRACT
The number of concepts in a model has frequently been used in the literature to measure the ease of creating model schemas. However, to the best of our knowledge, no one has examined its effect on the readability of the model schemas after they have been created. In this work, we operationalize readability along three dimensions: effectiveness, efficiency and learnability; and study the effects of the number of concepts in a data model on these dimensions. Our work makes the following contributions: a) it extends the operationalization of the readability construct, and b) it proposes an empirical methodology that isolates the effect of a model-independent variable (the number of concepts) on readability. From a practical perspective, our findings have implications both for creators of new models and for practitioners who use currently available models to create schemas that communicate requirements during the entire lifecycle of the system.
KEYWORDS: cognitive psychology, conceptual modeling, model size, readability, experiment
1. INTRODUCTION
Conceptual models play an important role in the area of requirements modeling. Essentially, a conceptual model is a method of documenting elements of an underlying reality. In the area of modeling organizational requirements for an information system (IS), the underlying reality may be described by an ontology that includes concepts like entities, relationships, properties, processes and roles [1]. Conceptual model schemas are used as a) a method of either informally or formally documenting end-user requirements, which are initially articulated in a natural language like English, and/or b) a method of optimally designing the subsequent IS. A commonly used example of both a) and b) is the use of the Entity Relationship Model (ERM) [2] to capture end-user requirements for constructing a relational database application. Once the requirements are documented in an ERM schema, the ERM schema can then be mapped, using well-known rules, to a measurably good relational schema design. Over a hundred conceptual models have been proposed for requirements modeling [3]. Next, we examine past work that has empirically evaluated conceptual models.
Dependent Variables In Earlier Empirical Work
A survey of the literature on the evaluation of modeling methods reveals several desirable attributes for conceptual modeling methods, which have been used as dependent variables in past empirical studies. These include a) the adequacy or completeness of the modeling method in being able to represent the underlying reality [4-10], b) the readability of the modeling method's schemas [11, 12], and c) how easy it is to use the modeling method to represent requirements [8, 13-18]. Batra and Srinivasan [19] present an excellent summary of the early work in the area. More recently, Wand and Weber [20] and Gemino and Wand [21] have highlighted several dimensions along which empirical work can be pursued in the area, while Topi and Ramesh [22] present a summary of recent empirical studies. In this work, we consider the readability of a model as the dependent variable.
The readability of a modeling method essentially indicates how easy it is to read a model schema and reconstruct the underlying domain reality from it. Readability is desirable in situations where model schemas are created by one team of analysts and then need to be read and interpreted by other analysts, system developers or maintenance administrators during the course of the system's lifecycle. For example, if a new database administrator requires an understanding of the schemas of existing database applications in the organization, then the readability of the model schemas that were created during the earlier analysis phases of those projects becomes important.
Next, we examine the independent variables that have been considered in earlier work.
Independent Variables In Earlier Empirical Work
The first independent variable is the level of experience and familiarity of the subjects with the conceptual model used. Readers who are more experienced with the underlying conceptual model are thought to perform better at interpreting its schemas. In most studies [7, 11, 23, 24], this variable has been controlled by using subjects with similar backgrounds for all treatment levels. Second, past studies have attempted to control for the level of familiarity with the domain by utilizing domains that are reasonably familiar to all subjects, and further by randomly allocating subjects across treatment levels. A random allocation reduces the likelihood of small differences in domain familiarity between subjects in different treatment levels. A third variable is the underlying complexity of the requirements for a particular situation, where a more complex set of requirements is harder to reconstruct than a simpler set. This is controlled by utilizing the same requirements case across treatments [15, 24, 25].
Table 1 summarizes some illustrative examples of past empirical work in measuring the readability of conceptual model schemas. Based on table 1, we note that the independent variable whose effect has been studied is a fourth one: the choice of modeling method, with the variables discussed earlier being controlled. While the results of earlier empirical studies have shown whether one model's schema is more readable than that of another model, there has been very little attempt to explain why any differences were observed. There has been a lack of a theoretical basis for the hypotheses that were examined in empirical work, and for explaining findings. For example, finding that the extended ERM (EER) schema is more or less readable than the object-oriented (OO) model [26] schema for a case does not indicate why this was observed. The problem is that existing models view reality in differing ways, and hence differ from each other along several dimensions. Hence, it is difficult to isolate what aspect of a model may cause more or less readability.
| Study | Independent Variables | Measures | Results |
|---|---|---|---|
| [7] | a) Hierarchical vs. relational models, b) user experience | Questions on domain | Hierarchical schemas were easier for novice users to read |
| [25] | Semantic vs. non-semantic models | Questions on domain | Semantic-model subjects identified relationships and cardinalities better |
| [23] | O-O vs. non-O-O | Questions on domain | O-O subjects performed better |
| [12] | EER vs. OO | True/false questions on domain | EER subjects interpreted ternary relationships more correctly |
| [11] | EER vs. OMT | Ability to understand and time to understand | OO subjects were significantly faster at answering questions than EER subjects |
| [24] | OPM/T vs. OMT/T | True/false questions on domain | OPM/T subjects better at comprehension |

Table 1. Illustrative past work on the readability of conceptual models
One possible solution is to identify a set of universal attributes of all models, and then consider treatments that differ along one of these universal attributes. One major step in this direction is the ontological framework called the Bunge-Wand-Weber (BWW) framework [1, 27]. The BWW framework utilizes an underlying ontology for all information systems. It then compares existing information system models on the basis of the degree to which concepts (or constructs) in the model and the ontology match. For example, a model that does not contain sufficient concepts to capture all of the underlying reality is said to have construct deficit.
A complementary approach for identifying a set of universal attributes is to consider measurable properties of all models. The most obvious example of this kind of universal attribute is the number of concepts in a model: a property which is common to all models and easily measured. In this work, we take the first step towards isolating potential causes of readability of models, by investigating the effect of the number of concepts in the model on the readability of the schema.
The rest of this work is organized as follows. In section 2, we operationalize the variables used in the current study and present the hypotheses. In section 3, we describe the research study, and present the findings. We conclude in section 4, with a discussion, limitations and implications for future research.
2. OPERATIONALIZATION OF VARIABLES AND RESEARCH MODEL
The independent variable in this work is the number of concepts (NOC) in a model, which we define to be a numerical count of the number of distinct syntactic constructs in the model. For example, a simple version of the ERM consisting of entity sets, relationship sets, attributes of entity sets, attributes of relationship sets and primary keys of entity sets will have an NOC = 5. The NOC has been widely posited to affect several properties of a model, including the ease of use of creating model schemas [28-30] and the completeness of the model, as demonstrated by the addition of several new concepts in recent versions of the Unified Modeling Language (UML) [31]. Thus, increasing the number of concepts in a model makes it harder to create model schemas from an analyst's standpoint, but enables the model to capture more elements of the underlying domain. However, to the best of our knowledge, no one has yet empirically investigated the effect of NOC on the readability of model schemas.
2.1 Operationalization Of NOC
We operationalize NOC to be a numerical count of the number of concepts in a model. We represent this count for each treatment level (conceptual model) i as n_i. In this study, the number of conceptual models used as treatments is 2.
2.2 Task
The task performed by the subjects in this study was to read a given schema and then answer questions about the underlying requirements, as implied by the schema. This is very similar to the tasks in earlier work on readability, as shown in table 1.
2.3 Dimensions of Readability
Table 1 also illustrates that the most common operationalization of readability is the mean percentage of correct responses of the subjects in each treatment level, when questioned about the schema. In one case, the amount of time taken by the subjects to answer the questions was also considered. A second contribution of our work is that it extends the operationalization of readability and defines it along three quantifiable dimensions: effectiveness, efficiency and learnability. This need for extended operationalization of dependent variables is recognized in [20], who state: "A method must enable stakeholders to elicit knowledge about a domain... The effectiveness and efficiency of a method in accomplishing this task is an important issue for empirical research."
We define readability effectiveness to be the percentage of correct answers given when asked questions about the domain. Readability efficiency is defined as the inverse of the time it takes to answer questions regarding schemas. In addition to these two dimensions, we consider the learnability of the task of interpreting the model schemas when given a particular treatment. Learnability has a strong basis in traditional human-computer interaction. For example, [32] considers learnability, or ease of learning, one of the five basic attributes of usability in his classic text. Learnability is also recognized by [33] as an important metric when tasks are performed using a system. In the context of this study, we define learnability to be the improvement in the effectiveness and efficiency dimensions of readability over successive tasks. Our study teases out the effects of NOC on these three dimensions of readability.
Next, we operationalize these three dimensions of readability, and develop the hypotheses that were tested in this study.
2.3.1 Readability Effectiveness (REF)
We operationalize REF as the percentage of questions about the domain that the subject answers correctly. Thus, for each treatment level i,
REF = (number of questions answered correctly / total number of questions) × 100
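As an illustration of this operationalization, REF can be computed with a small helper (a hypothetical sketch in Python; the function name and question counts are ours, not part of the study materials):

```python
def readability_effectiveness(num_correct, num_questions):
    """REF: percentage of domain questions the subject answered correctly."""
    return 100.0 * num_correct / num_questions

# A subject answering 8 of 10 questions correctly scores REF = 80.0
print(readability_effectiveness(8, 10))  # 80.0
```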
In hypothesizing the effect of NOC on REF, we draw from two different areas of work. First, the BWW framework indicates that a model with a construct deficit will lead to schemas with more ambiguous information, since a portion of the domain will not be depicted in the schema. Several studies from psychology [34, 35] and economics [36, 37] indicate that subjects become overconfident about their relative abilities when faced with ambiguous information, especially in domains with which they have some familiarity. Based on this work, subjects in our study who are treated with a model with construct deficit will tend to over-estimate their knowledge, and answer incorrectly when asked questions about the information that is lacking in the schema. In other words, a model with a lower NOC will lead to a lower REF.
Second, support for this hypothesis comes from the signal detection theory of recognition in cognitive psychology, which originated in [38] and is more fully described in [39]. Egan's study showed subjects a set of items in a trial. The subjects were subsequently given another set of items and asked to identify which items in the new set had been shown in the trial. A false alarm occurred when subjects falsely believed that an item in the new set was also in the old set. It was found that items with which the subjects had greater familiarity had a greater tendency to raise false alarms. This theory indicates that ambiguous information leads to higher error rates when subjects are more familiar with a domain.
Further support for a positive hypothesized correlation between NOC and REF can be found in findings in the area of reading comprehension which indicate that longer sentences with greater causal links actually tend to improve comprehension versus shorter sentences [40].
Based on this earlier work, we propose hypothesis 1:
Hypothesis 1:
H1: A higher NOC will lead to a higher REF
Next, we discuss readability efficiency.
2.3.2 Readability Efficiency (REN)
We operationalize REN to be the inverse of the amount of time a subject decides to use to answer the questions in a study, given some reasonable incentive to answer these questions correctly. Thus, REN = 1 / (time taken to answer the questions).
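Correspondingly, REN is just the reciprocal of the time used (a hypothetical Python sketch; the time unit is our assumption, as the text does not specify one):

```python
def readability_efficiency(time_taken):
    """REN: inverse of the time a subject used to answer the questions."""
    return 1.0 / time_taken

# A subject who used 2.5 time units has REN = 0.4
print(readability_efficiency(2.5))  # 0.4
```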
In order to hypothesize the effect of NOC on REN, we draw inferences from ACT, a semantic network theory in cognitive psychology, described in [41, 42]. These studies established that subjects tend to take longer to analyze concept pairs in a semantic network that have richer semantics. For example, the sentence "A hippie is in the park" cognitively activates the hippie concept and the park concept. Efficiency in processing this link depends on how many other concepts are linked to hippie and park, as well as the number of concepts between hippie and park in the semantic network. Thus a network with richer semantics has more alternatives and a longer path, leading to lower efficiency of analysis by the subjects. Based on these findings we propose hypothesis 2:
Hypothesis 2:
H2: A higher NOC will lead to a lower REN.
Next we discuss the learnability dimension.
2.3.3 Readability Learnability (RLN)
As mentioned earlier, learnability is the improvement in the REF and REN, over successive tasks. We operationalize RLN to be the slope of the curves of REN and REF, over successive tasks, for the same subject.
Thus, RLN(REF) = the slope of the least-squares regression line of REF against x, where x is the order of the task in a sequence of m within-subject reading tasks, with x = 1..m.
Similarly, RLN(REN) = the slope of the least-squares regression line of REN against x, where x is the order of the task in a sequence of m within-subject reading tasks, with x = 1..m.
A lower slope value indicates lower gains in REF or REN, over successive tasks.
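Since RLN is obtained by fitting a linear regression to each subject's per-task scores, the slope computation can be sketched as follows (illustrative Python; the function name is ours):

```python
def rln_slope(scores):
    """Least-squares slope of REF (or REN) scores over task order x = 1..m."""
    m = len(scores)
    xs = range(1, m + 1)
    x_mean = sum(xs) / m
    y_mean = sum(scores) / m
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, scores))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# REF improving by 10 points on each successive task gives slope 10.0
print(rln_slope([50, 60, 70, 80]))  # 10.0
```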
We hypothesize that schemas of models with a higher NOC will take longer to learn to interpret. Support for this hypothesis can be found in the literature on learning curves, where more complex languages are considered harder to learn [42, 43]. Based on this, we develop hypothesis 3:
Hypothesis 3:
H3(a): A higher NOC will lead to a lower RLN(REF)
H3(b): A higher NOC will lead to a lower RLN(REN)
Figure 1 displays the research model that is proposed in this work, and the hypotheses that we test.
Figure 1. Research Model and Directions of Hypotheses
Having developed the hypotheses, we next describe the experimental study and results.
3. EXPERIMENTAL STUDY
3.1 Subject Selection And Controls
The experimental design was single-factor and between-subjects, with two levels of the independent variable (two models with different values of NOC) being applied. The subjects for this study were master's-level MIS students at a university in the northeastern USA. As subjects signed up for the experiment, they were randomly assigned to one of the two treatment levels. All the subjects were in the age range 22-30, and had one year of experience (two courses) in using conceptual data models, with no previous usage of conceptual data models in the workplace. As such, the subjects in this study represent beginner professional-level systems analysts.
As mentioned in research and statistics texts, such as [44, 45], random assignment of subjects to different treatment levels eliminates several potential biases attributable to subjects, such as intelligence, previous learning, cultural differences, and language skills. To illustrate this, in the base level of NOC, there were two females out of eight subjects and two non-native English speakers (Chinese) with credible English skills. In the higher level of NOC, there were three females out of eight subjects, with two non-native English speakers (Chinese) with credible English skills. The same tasks (question sets) were given to subjects in both levels. All of these controls are similar to those used in the earlier empirical studies in table 1.
3.2 Independent and Dependent Variables in the Experiment
In order to maintain internal validity, we needed to vary the NOC while utilizing models that were otherwise at least reasonably similar. One solution was to use two models such that one was a subset of the other. We selected two versions of the ERM. The sets of concepts in each version are shown below.
ERMbase_level_NOC = {entity sets, relationship sets, attributes of entity sets, attributes of relationship sets, primary keys of entity sets}
ERMhigher_level_NOC = {entity sets, relationship sets, attributes of entity sets, attributes of relationship sets, primary keys of entity sets, cardinalities of relationship sets, inheritance, optionality of relationship sets, weak entity sets}
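Under this operationalization, NOC is simply the size of each model's concept set; a minimal sketch (Python, with the set elements taken verbatim from the definitions above):

```python
erm_base = {"entity sets", "relationship sets", "attributes of entity sets",
            "attributes of relationship sets", "primary keys of entity sets"}
erm_higher = erm_base | {"cardinalities of relationship sets", "inheritance",
                         "optionality of relationship sets", "weak entity sets"}

print(len(erm_base), len(erm_higher))  # NOC = 5 and NOC = 9
assert erm_base < erm_higher  # the base model is a proper subset of the richer one
```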
Note that the ERM with the higher value of NOC has all the concepts of the other model, with extra concepts used to increase the value of NOC. In order to measure the learnability, each level was assigned four tasks. To implement this, four schemas were prepared in each ERM, from four different domains by one researcher. The four domains were: a library, an academic conference management organization, an MIS department in an organization and a hotel. As an example, the schemas of both models for the library domain are shown in Appendix 1. The schemas for the other three domains were similarly constructed.
After the schemas were prepared, a set of 10 true/false/can't-tell questions and an answer key were prepared, in advance, for each schema. For example, subjects in both levels received the same question set (shown in appendix 1) for the library schema, but saw different schemas (also shown in appendix 1). Thus, while the tasks were the same, the solutions in each level were different. This is similar to earlier studies that used different models for each level (see table 1) and that also had different solutions for each level.
We controlled for the possibility that subjects in different levels may have a different a priori understanding of the domains by a) selecting domains with which most people in our society have a reasonable degree of familiarity, and b) minimizing small differences that may have existed across treatments by randomly allocating subjects to the two treatment groups.
Subjects had to answer the questions after looking at the ERM schema for their treatment level. As mentioned, the question sets for the two treatment levels were the same. The percentage of correct answers each subject obtained was the REF value for that subject. The inverse of the time each subject took was their REN value. Each subject was given the four schemas, one after the other. For the two experimental groups, the sequence was the same: library, academic conference management organization, MIS department in an organization, and finally a hotel.
To minimize researcher bias, the previously prepared answer key was shown to another expert in data modeling, who concurred that the answer key for each level did indeed provide the correct answers, given the schema. The same researcher scored the completed tasks for all subjects. RLN(REF) and RLN(REN) for each subject were measured by performing a linear regression on the REF and REN scores respectively for that subject (each subject had four values of REF and four values of REN, one each for each domain) and by using the slope of the regression line.
3.3 Experimental Process
All the subjects had been trained for a year in the basic concepts behind data modeling. However, at the start of each treatment, the subjects were first refreshed regarding the concepts behind the ERM that they would be using. For the base NOC level ERM, the instruction took 20 minutes, while, for the higher NOC level, the instruction took approximately 40 minutes. For each model, instruction was stopped after all subjects indicated that they were comfortable with the model, and had worked through the same practice examples for each model. The same instructor was used to instruct both treatment levels.
After the instruction, the subjects were told they would be given four schemas, one after the other (only one schema at a time). As an incentive, the subjects were offered $20 to participate in the study, an additional $8 if they had a correct score of over 90% (across 40 questions) and $2 if they finished the study in less than 60% of the maximum time that was allocated. The reasoning behind the incentive scheme was to model the motivation that drives analysts in the real world when reading schemas on projects. Essentially, subjects had a higher incentive to get answers right, and a lower, but still finite incentive to take less time than was allocated. The reason for the latter incentive was that if it did not exist, then subjects would essentially take the maximum time they could to perform the study, thereby disallowing the measurement of REN. The entire protocol was pilot tested and worked adequately.
3.4 Data Analysis
Table 2 shows the raw REF and REN scores for each experimental group, across the four domains.
NOC Baseline                      NOC Higher Level
REF         REN                   REF         REN
[32 rows of subject-by-domain scores; individual values not legible in the source]
Means:
65          0.440154              76.25       0.369036
Table 2. Raw REF and REN measures for the two treatment levels across domains
A 2-tailed t-test analysis of the difference between the mean REF and REN scores of the two groups across the four domains is shown in table 3. This analysis is equivalent to testing the null hypothesis that the means are equal. Analyzing a single mean across all the domains reduces the bias that may have occurred if only one domain (e.g., only the library) had been used, with only one schema for each treatment level.
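For readers who wish to reproduce this kind of analysis, a pooled-variance two-sample t statistic can be computed as below. This is a minimal stdlib sketch with made-up data, not the study's actual code; a real analysis would use a statistics package to obtain the two-tailed p-value from the t distribution with n_a + n_b - 2 degrees of freedom.

```python
import math

def two_sample_t(a, b):
    """Student's pooled-variance two-sample t statistic for the
    difference between the means of groups a and b."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical REF scores for the baseline and higher-NOC groups.
t = two_sample_t([50, 60, 40, 70], [80, 90, 70, 100])  # negative t: baseline mean lower
```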
Dependent Variable Tested    t-value    2-tailed p-value
REF                          -2.5       0.014
REN                           3.2       0.002
Table 3. Results of 2-tailed t-tests for REF and REN means across the treatment groups
The results in table 3 show statistically significant support for both hypotheses H1 and H2.
Table 4 shows the raw slope scores for the REF and REN for the two models.
NOC Baseline                        NOC Higher Level
REF Slopes    REN Slopes            REF Slopes    REN Slopes
[per-subject slope values not legible in the source]
Means:
2.25          0.048                 [not legible]  0.030125
Table 4. Raw REF and REN slope values for all subjects in both treatment levels.
A 2-tailed t-test analysis of the difference between the mean REF and REN slopes of the two groups is shown in table 5. This analysis is equivalent to testing the null hypothesis that the means are equal. Note that we do not graph lines with the mean slopes of both treatment groups, since we are interested in the differences between the slopes for the two groups.
Dependent Variable Tested    t-value    2-tailed p-value
RLN(REF)                     -1.825     0.089
RLN(REN)                      0.97      0.35
Table 5. Results of 2-tailed t-tests for REF and REN slope means across the treatment groups
The results in table 5 show statistically significant support, at the 10% level, for the opposite of hypothesis H3(a), and a lack of support for hypothesis H3(b). Next, we discuss the significance of the findings, the limitations of this study, and directions for future research.
4. DISCUSSION, LIMITATIONS AND FUTURE RESEARCH
4.1 Discussion and Implications
The support for H1 implies that models with more concepts lead to more accurate conceptualizations of the underlying domain. This finding is in accordance with the theoretical areas we examined. The BWW framework indicates that models with construct deficit will lead to more ambiguous schemas. Since many of the questions in the lower NOC treatment dealt with concepts that were not in the schema, the correct answer in many cases was "can't tell from the model schema". The fact that subjects chose a definite answer (either "true" or "false") in the face of this uncertain information is to be expected, given earlier work in psychology and economics on overconfidence in decision making under uncertainty when the subjects are in a familiar domain. A clear implication is that when users or analysts are faced with a leaner schema in a domain with which they have some familiarity, they will tend to make erroneous inferences. In such cases, it may be preferable to adopt a richer schema by utilizing a model with a higher NOC. Support for H1 is also consistent with the signal detection theory of recognition, where familiarity with a domain leads to an exaggerated emphasis on the objects therein.
The significant support for H2 implies that adding more concepts to a model increases the time it takes to map a schema back to the problem domain. This finding is in accordance with ACT theory, which holds that richer semantic networks take longer to traverse cognitively. Taken together, these findings imply that while schemas of models with a larger NOC take longer to read, they lead to a more accurate interpretation of the underlying domain reality.
The support, at the 10% level, for the opposite of H3(a) indicates that models with a larger number of concepts are learned faster than models with fewer concepts. In the context of this experiment, one possible explanation is that there was little learning associated with the basic model, since it had a low value of NOC, whereas there was some learning associated with the model with the higher NOC, and this learning occurred at a faster rate. This result may differ in a replication if the NOC value becomes much higher than that used in this study.
From table 4, we note that the time required to process the schemas decreases for both treatment levels (a positive slope) as subjects gain experience in processing schemas. The lack of support for H3(b) indicates that this reduction in time is not affected by the number of concepts in the model, i.e., no significant difference was found between the rates of learning across the two treatment levels.
From a theoretical perspective, this work makes two distinct contributions. First, it takes a step towards empirical validation of theory regarding the readability of conceptual models, building on the work in [46]. As table 1 indicates, past studies used models that differed in many ways (many had different ways of interpreting the underlying reality), and this led to a lack of generalizability of past work. In this work, we present an empirical methodology with better controls that can hopefully be used to study the effects of other factors on readability, and indeed other dependent variables as well. Second, this work extends the operationalization of the readability construct to incorporate learnability, effectiveness and efficiency. In past studies, efficiency was not investigated because of a potential conflict with effectiveness (to maximize effectiveness, subjects will take the maximum time allocated for the study if the entire reward scheme is based on effectiveness). The incentive scheme for subjects in this study indicates one potential method of incorporating the tradeoff that analysts face in the real world between being correct and being timely in the performance of their tasks.
From a practical standpoint, the findings in this study have implications both for developers of new models and for practitioners who use existing models to analyze requirements. For developers of new models, the findings indicate that adding more concepts to a model (such as, for example, UML or the ERM) will actually add to the readability effectiveness of the model schemas. Of course, adding more concepts could make the models harder to use when actually creating schemas [28]; [29]; [30]. Adding more concepts will also mean that the time taken to process the model schemas will increase. However, if the goal is to create model schemas that can be useful in communicating with other analysts, or with future maintainers of the system, then adding more concepts is an appropriate strategy.
From an analyst perspective, our findings assist in strategies at the beginning of the analysis phase, when decisions are made regarding how many concepts to actually use to create the schemas. Our findings indicate that simplifying the number of concepts that are chosen to be used will reduce the accuracy of the model schemas created, when these schemas are interpreted by other personnel. If one of the goals of using the model is as a communication tool with system maintenance personnel or with other analysts, then a good strategy is to increase the number of concepts that are chosen to be used. For example, instead of using a simplified version of the class diagram in UML, it may make sense to use some of the more recently added concepts, even if they make it harder to create the model schema. Thus, this study introduces the notion of a tradeoff between ease of use when creating the schemas versus the readability of the final schema.
4.2 Limitations of the Study and Future Research
The findings of this study used two levels of NOC for a data model, and their interpretation should be taken within this context. First, it is quite possible that information overload may creep in if too many concepts are used in a model, leading to a negative effect on the readability effectiveness of the schemas; this is a potential topic for future research. Second, as the value of NOC increases, the effect on learnability may also change if the value of NOC becomes excessive. Third, the models used here were data models; different results may be obtained for other types of conceptual models (such as process or workflow models), and replicating this study with other types of models will lend greater external validity to theory building in the area. Finally, NOC is just one variable whose effects we have investigated here. Studying the impacts of other variables, such as the experience of the subjects, the ontological fit, and the complexity of the requirements, is an important topic for future empirical work in this area.
REFERENCES
[1] Y. Wand and R. Weber, "On the Deep Structure of Information Systems," Information Systems Journal, vol. 5, pp. 203-223, 1995.
[2] P. P. Chen, "The Entity-Relationship Model: Towards a Unified Model of Data," ACM Transactions on Database Systems, vol. 1, pp. 9-36, 1976.
[3] T. W. Olle, "Proceedings of the IFIP WG 8.1 Working Conference on the Comparative Review of ISD Methodologies: Improving the Practice," North-Holland, 1986.
[4] K. Siau, "Informational and Computational Equivalence in Comparing Information Modeling Methods," Journal of Database Management, vol. 15, pp. 73-86, 2004.
[5] M. A. Amberg, "A Pattern Oriented Approach to a Methodical Evaluation of Modeling Methods," Australian Journal Of Information Systems, vol. 4, pp. 3-10, 1996.
[6] A. Bajaj and S. Ram, "A Content Specification for Business Process Models," Australian Journal of Information Systems, vol. 4, pp. 22-31, 1996.
[7] M. Brosey and B. Schneiderman, "Two Experimental Comparisons of Relational and Hierarchical Database Models," International Journal of Man Machine Studies, vol. 10, pp. 625-637, 1978.
[8] B. Kramer and Luqi, "Towards Formal Models of Software Engineering Processes," Journal of Systems and Software, vol. 15, pp. 63-74, 1991.
[9] R. W. Mantha, "Data Flow and Data Structure modeling for database requirements determination: a comparative study," MIS Quarterly, pp. 531-545, 1987.
[10] A. Moynihan, "An attempt to compare OO and functional decomposition in communicating information system functionality to users," presented at Workshop on evaluation of modeling methods in systems analysis and design: CAiSE*96., 1996.
[11] B. C. Hardgrave and N. Dalal, "Comparing Object Oriented and Extended Entity Relationship Models," Journal of Database Management, vol. 6, pp. 15-21, 1995.
[12] P. Shoval and I. Frumermann, "OO and EER Schemas: A Comparison of User Comprehension," Journal of Database Management, vol. 5, pp. 28-38, 1994.
[13] K. Siau, J. Erickson, and L. Lee, "Complexity of UML: Theoretical versus Practical Complexity," presented at 12th Workshop on Information Technology and Systems (WITS '02), Barcelona, Spain, 2002.
[14] D. Bock and T. Ryan, "Accuracy in Modeling with Extended Entity Relationship and O-O Data Models," Journal of Database Management, vol. 4, pp. 30-39, 1993.
[15] Y.-G. Kim and S. T. March, "Comparing Data Modeling Formalisms," Communications of the ACM, vol. 38, pp. 103-113, 1995.
[16] P. Shoval and M. Even-Chaime, "Database Schema design: An Experimental Comparison Between Normalization and Information Analysis," Database, vol. 18, pp. 30-39, 1987.
[17] K. Siau and Q. Cao, "Unified Modeling Language (UML)-A Complexity Analysis," Journal of Database Management, pp. 26-34, 2001.
[18] K. Siau and Y. Tian, "The Complexity of Unified Modeling Language-A GOMS Approach," presented at Fourteenth International Conference on Information Systems (ICIS '01), New Orleans, 2001.
[19] D. Batra and A. Srinivasan, "A Review and Analysis of the Usability of Data Modeling Environments," International Journal of Man-Machine Studies, vol. 36, pp. 395-417, 1992.
[20] Y. Wand and R. Weber, "Information Systems and Conceptual Modeling: A Research Agenda," Information Systems Research, vol. 13, pp. 363-376, 2002.
[21] A. Gemino and Y. Wand, "Towards Common Dimensions in Empirical Comparisons of Conceptual Modeling Techniques," presented at Seventh CAiSE/IFIP-WG8.1 International Workshop on the Evaluation of Modeling Methods in Systems Analysis and Design, Toronto, Canada, 2001.
[22] H. Topi and V. Ramesh, "Human Factors Research on Data Modeling: A Review of Prior Research, an Extended Framework and Future Research Directions," Journal of Database Management, vol. 13, pp. 3-19, 2002.
[23] P. C. Palvia, C. Liao, and P.-L. To, "The Impact of Conceptual Models on End-User Performance," Journal of Database Management, vol. 3, pp. 4-15, 1992.
[24] M. Peleg and D. Dori, "The Model Multiplicity Problem: Experimenting with Real Time Specification Methods," IEEE Transactions on Software Engineering, vol. 26, pp. 1-18, 2000.
[25] S. Juhn and J. D. Naumann, "The Effectiveness of Data Representation Characteristics on User Validation," presented at International Conference on Information Systems, Indianapolis, IN, 1985.
[26] G. Booch, Object Oriented Analysis and Design with Applications. Redwood City: Benjamin/Cummings, 1994.
[27] R. Weber, Ontological Foundations of Information Systems. Melbourne: Coopers and Lybrand, 1997.
[28] E. Marcos, J. Cervera, and L. Fernandez, "Evaluation of Data Models: A Complexity Metric," presented at Fourth CAiSE/IFIP 8.1 International Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Heidelberg, Germany, 1999.
[29] M. Rossi and S. Brinkkemper, "Complexity Metrics for Systems Development Methods and Techniques," Information Systems, vol. 21, pp. 209-227, 1996.
[30] X. Castellini, "Evaluation of Models Defined with Charts of Concepts: Application to the UML Model," presented at Third Workshop on the Evaluation of Modeling Methods in Systems Analysis and Design, in conjunction with CAiSE, Pisa, Italy, 1998.
[31] G. Booch, I. Jacobson, and J. Rumbaugh, UML Distilled: Addison Wesley, 1997.
[32] J. Nielsen, Usability Engineering: Academic Press, 1993.
[33] B. Shneiderman, Designing the User Interface, Third ed: Addison Wesley Longman, 1998.
[34] O. Svenson, "Are we all less risky and more skillful than our fellow drivers?," Acta Psychologica, vol. 47, pp. 143-148, 1981.
[35] D. Kahnemann, P. Slovic, and A. Tversky, Judgment under uncertainty: heuristics and biases: Cambridge University Press, 1982.
[36] K. Daniel, D. Hirshleifer, and A. Subrahmanyam, "Investor Psychology and Security Market Under- and Over-Confidence," Journal of Finance, vol. 53, pp. 1839-85, 1998.
[37] B. J. D. Long, A. Shleifer, L. H. Summers, and R. Waldmann, "Noise Trader Risk in Financial Markets," Journal of Political Economy, vol. 98, pp. 703-38, 1990.
[38] J. P. Egan, "Recognition Memory and the Operating Characteristics," Indiana University, Bloomington, Technical Note AFCR-TN-58-51, 1958.
[39] S. K. Reed, Cognition: Theory and Applications. Belmont, California: Brooks/Cole Publishing, 1988.
[40] P. D. Pearson and D. D. Johnson, Teaching Reading Comprehension. New York: Holt, 1978.
[41] J. R. Anderson, "Arguments Concerning Representations for Mental Imagery," Psychological Review, vol. 85, pp. 249-277, 1978.
[42] J. R. Anderson, Cognitive Psychology and its Implications, Fifth ed. New York: W.H. Freeman, 1995.
[43] W. W. Reeves, Cognition and complexity: The cognitive science of managing complexity. Lanham, MD, USA: Scarecrow Press Inc., 1996.
[44] H. L. Fromkin and S. Streufert, "Laboratory Experimentation," in Handbook of Industrial Psychology, M. D. Dunnette, Ed. Chicago: Rand-Mcnally, 1976, pp. 415-465.
[45] T. H. Wonnacott and R. J. Wonnacott, Introductory Statistics for Business & Economics, Third ed: John Wiley & Sons, Inc., 1984.
[46] K. Siau, "Information Modeling and Method Engineering: A Psychological Perspective," Journal of Database Management, vol. 10, pp. 44-50, 1999.
APPENDIX 1
Figure 1 shows the library schema for the base line NOC model. Figure 2 shows the library schema for the higher level NOC model. The questionnaire following figure 2 was used to measure the mapping from the schema to the underlying domain.
Figure 1. Base Line NOC model library schema
Figure 2. Higher Level NOC model library schema
Questionnaire to Test Mapping to Underlying Domain
For each question, please select the right choice (only one choice per question). Your answers should not be based on actual library systems, but on what is represented in the model schema:
1. Every book needs to have at least one subject area
True ____
False ____
Can't tell from the model schema ____
2. Users can reserve and check out the same book at the same time
True ____
False ____
Can't tell from the model schema ____
3. An author can write books in multiple subject areas
True ____
False ____
Can't tell from the model schema ____
4. A book can be on multiple shelves at the same time
True ____
False ____
Can't tell from the model schema ____
5. A reading area can be near multiple shelves
True ____
False ____
Can't tell from the model schema ____
6. From the schema, we can determine which user is sitting on which chair in the library
True ____
False ____
Can't tell from the model schema ____
7. We can find out the number of books checked out by a user in a year
True ____
False ____
Can't tell from the model schema ____
8. A book can have multiple checkouts on the same date
True ____
False ____
Can't tell from the model schema ____
9. We can tell which users are interested in which subject areas
True ____
False ____
Can't tell from the model schema ____
10. We can tell which author has the most checkouts
True ____
False ____
Can't tell from the model schema ____
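Scoring a completed questionnaire against the answer key reduces to a percent-correct computation per domain schema. The sketch below is illustrative only: the 'T'/'F'/'C' answer encoding and the function name are our own conventions, not part of the study materials.

```python
def ref_score(answers, key):
    """Percent of questionnaire items answered correctly: the REF
    measure for one domain schema. Both arguments map question
    numbers to 'T' (true), 'F' (false), or 'C' (can't tell)."""
    correct = sum(1 for q, a in answers.items() if key.get(q) == a)
    return 100 * correct / len(key)

# One hypothetical subject on a three-question key: 2 of 3 correct.
score = ref_score({1: 'T', 2: 'F', 3: 'C'}, {1: 'T', 2: 'F', 3: 'F'})
```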
In this work, the terms "conceptual model" or "model" refer to the modeling method. We refer to the application of a modeling method to a particular situation as a "model schema".