Talk:Data model

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:
WikiProject Computer science (Rated C-class, High-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
WikiProject Computing (Rated C-class)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.
WikiProject Databases / Computer science (Rated C-class, High-importance)
This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
This article is supported by WikiProject Computer science (marked as High-importance).
WikiProject Systems (Rated C-class, High-importance)
This article is within the scope of WikiProject Systems, which collaborates on articles related to systems and systems science.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
This article is within the field of Software engineering.

Merge data modeling into data model

This must seem like an odd merge proposal, because the current data model article here is no more than a stub. Now, I am not proposing a full merge. I would like to:

  1. merge most of the content about data models from the data modeling article into this one
  2. and refocus the data modeling article specifically on data modelling.

At the moment both articles seem to be entirely about data models, and I think both could be improved by specializing them further. -- Marcel Douwe Dekker (talk) 13:37, 7 October 2008 (UTC)

I started merging both articles and have improved this article some more. Three sections still need to be created or improved: history, database models and data modeling. I think I will first try to improve those articles some more before I get back here. -- Marcel Douwe Dekker (talk) 00:09, 8 October 2008 (UTC)

Distinction between structure and function in relation to data

I suggest we separate models that define the structure of data (this article) from models that define process or function (see function model, process model, business model and enterprise model). I would also distinguish between the terms 'model' and 'diagram' or 'notation', following the ISO distinction between 'meaning' and 'expression'. A data flow diagram is a particular notation for a model that describes a function involving some transformation or movement of data. It complements data models but is not a data model itself. Therefore I took the liberty of moving the DFD section to 'related models' and added a link to 'function model' in the introductory paragraph. Equilibrioception (talk) 22:32, 10 January 2009 (UTC)

Thanks for the move. I had already made this distinction in this article, based on these arguments:
  1. A data model is defined as a model that describes how data is represented and accessed.
  2. A data flow diagram doesn't fit this definition.
  3. A data flow diagram shows how data flows through an enterprise.
  4. The "flow of data" is a separate subject from "representation and access", so the two models are complementary.
I guess I agree with your other suggestions. I don't know if you noticed, but I have recently been rewriting all five of these articles (data model, function model, process model, business model, enterprise model), as well as database model, view model and many more. In fact I created the function model and view model articles, and recreated the business process modelling and database model articles, to get a clear division of "basic software engineering models" in Wikipedia. One way or another, these models all seem to come together in the Enterprise Architecture Frameworks.
Back to this article: I noticed one other thing that I am not yet sure what to make of. The entity-relationship model is currently listed as a database model. I wonder if it shouldn't be listed as a separate type of data model. What do you think?
-- Marcel Douwe Dekker (talk) 22:55, 10 January 2009 (UTC)
I just read some other text by William Olle (1996) here:
The term "data model" has been the source of confusion. It is most widely used to refer to a model for a specific business area (order processing, insurance claims, airline seat reservations) prepared using to a data modelling technique. Unfortunately, the term "data model" was hijacked in the early seventies and used in the sense of "the network data model, the relational data model and the hierarchical data model"... This use of the term has been widely taught and causes confusion whenever one is in a group which needs to reference both interpretations...
I wonder whether this confusion is also present in this article. For example, does the listed geographic data model qualify as a data model? And isn't the data structure diagram a type of entity-relationship model? -- Marcel Douwe Dekker (talk) 23:29, 10 January 2009 (UTC)
I agree with your approach. I noticed the lists of articles at your user page (very impressive, by the way) - this can be very useful to provide an overview for the e.g. entire field of models, which can make a difference in the uniformity of coverage. I agree with you that there is some confusion in the usage of the concepts data model and even data structure. Equilibrioception (talk) 03:43, 13 January 2009 (UTC)
My approach leaves me with a lot of unanswered questions, which gives me reason to proceed. I am not sure I understand your suggestion about providing "an overview for the e.g. entire field of models". However, this is, or maybe has been, one of my prime objectives, and it is why I started the scientific modeling article three years ago.
Creating uniformity within a Wikipedia article is called "wikification" here. Creating uniformity in the coverage of a subject across Wikipedia is another ballgame, involving creating, merging and deleting articles. Two months ago, for example, I proposed merging the Logical data model, Logical schema and Semantic data model articles.
I still wonder if the Entity-relationship model should be listed as Database model or not. What do you think?
-- Marcel Douwe Dekker (talk) 23:10, 13 January 2009 (UTC)
By "an overview for an entire field" I meant a list of related articles, such as the ones on your user page, or Wikipedia's topic lists, like the list of mathematics lists, or topic outlines, for example the topic outline of information science. Regarding the Entity-Relationship Model, in my opinion it is definitely a Data Model. I will add it to the list of "see also" links for Data Model.
-- Equilibrioception (talk) 14:28, 25 January 2009 (UTC)
Thanks, but I think you missed my question. At the moment the Entity-relationship model is already listed under Types of data models / Database model, as the seventh item. My question is whether this is right or wrong.
As I mentioned, in my opinion the Entity-Relationship model does belong to the Data Model article, because it represents a certain kind of a Data Model.
-- Equilibrioception (talk) 16:07, 26 January 2009 (UTC)
As to that overview of the field of modelling: doesn't the scientific modeling article give such an overview? -- Marcel Douwe Dekker (talk) 23:17, 25 January 2009 (UTC)
Yes, it does a very good job of providing an overview of the field. I'll make a few suggestions at the Talk:Scientific modelling page. Given your interest in models and modeling, what are your thoughts about proposing a WikiProject on Modeling as a subproject under Computing?
-- Equilibrioception (talk) 16:16, 26 January 2009 (UTC)
OK, thanks. I don't know about such a subproject. I am more of a systems engineer, which is why I initiated WikiProject Systems. I created a separate field of scientific modelling within that project because of my own interest, but there is not much activity there. -- Marcel Douwe Dekker (talk) 22:32, 26 January 2009 (UTC)

View model

I am making corrections to the articles that relate to my day job, and since over the next few months I am going to be responsible for reconciling the terminology between two ISO standards, one of them being ISO/IEC 42010, IEEE Recommended Practice for Architectural Description of Software-Intensive Systems, I am willing to put some time into the articles on models and viewpoints.

I looked at the view model article. My first impression was that it is about models of user interfaces, in other words, that it is related to the model-view-controller architecture pattern. I like the content of the article, with one exception: don't you think the section about perspective and projection in maps is only indirectly related to models? It looked somewhat anomalous; it is only an analogy (maybe a useful one).
I do not think there is a view model in the same sense as, e.g., a data model. The concept of a view belongs to a higher meta-level, as it provides some organization to models.
Here is what ISO/IEC 42010 says:
In the conceptual framework of this recommended practice, an architectural description is organized into one or more constituents called (architectural) views. Each view addresses one or more of the concerns of the system stakeholders. The term view is used to refer to the expression of a system’s architecture with respect to a particular viewpoint.
NOTE—This recommended practice does not use terms such as functional architecture, physical architecture, and technical architecture, as are frequently used informally. In the conceptual framework of the recommended practice, the approximate equivalents of these informal terms would be functional view, physical view, and technical view, respectively.
Other information, not contained in any constituent view, may appear in an AD, as a result of an organization's documentation practices. Examples of such information are the system overview, the system context, the system stakeholders and their key concerns, and the architectural rationale.
A viewpoint establishes the conventions by which a view is created, depicted and analyzed. In this way, a view conforms to a viewpoint. The viewpoint determines the languages (including notations, model, or product types) to be used to describe the view, and any associated modeling methods or analysis techniques to be applied to these representations of the view. These languages and techniques are used to yield results relevant to the concerns addressed by the viewpoint.
An architectural description selects one or more viewpoints for use. The selection of viewpoints typically will be based on consideration.
What do you think?

Equilibrioception (talk) 03:43, 13 January 2009 (UTC)

I have copied this question to the Talk:View model#View model page and have given a response over there. -- Marcel Douwe Dekker (talk) 21:12, 13 January 2009 (UTC)

Data Models in Telecommunications and Networking

Data models, i.e. formal approaches to describing how data is represented and accessed, are one of the key topics in telecommunications and network protocols.

  • One of the key International Standards in this area is ASN.1, defined in 1984 by ITU-T. ASN.1 is to my knowledge one of the first formal models of data. The emphasis of ASN.1 is efficient encoding and cross-platform interoperability, and it was the dominant standard in the pre-XML days. It is integrated in Z-series specification languages, standardized by ITU.
    • the current article on ASN.1 is ok, but can be improved
  • Another standard related to organization of data in the interoperability context is the CORBA IDL, part of the CORBA specification by OMG. CORBA IDL is the foundation for several OMG specifications.
    • the current article on CORBA IDL is inadequate
  • ISO has another standard related to data organization, called ISO/IEC 11404 Language-Independent Data Types
  • A very important standard for data model interoperability is called OMG Common Warehouse Metamodel (CWM). This is also an ISO standard, known as ISO/IEC DIS 19504. It is part of the ISO Open Distributed Processing (ODP) Stack.
    • the current article on CWM is inadequate

All of the above standards address complex data types and references (aka foreign keys). Some food for thought.

-- Equilibrioception (talk) 04:31, 5 February 2009 (UTC)
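To make the point about ASN.1's emphasis on encoding concrete, here is a minimal sketch of the tag-length-value (TLV) layout used by the Distinguished Encoding Rules (DER), written in Python purely for illustration. It hand-encodes only INTEGER, UTF8String and SEQUENCE with single-octet tags, and the `Customer` type is a hypothetical example, not taken from any of the standards listed above; a real project would use a complete ASN.1 toolkit.

```python
# Minimal DER (Distinguished Encoding Rules) sketch covering three ASN.1 types.
# Hypothetical ASN.1 type used in the example below:
#   Customer ::= SEQUENCE { id INTEGER, name UTF8String }

TAG_INTEGER = 0x02
TAG_UTF8STRING = 0x0C
TAG_SEQUENCE = 0x30  # universal tag 16 with the "constructed" bit set


def encode_length(n: int) -> bytes:
    """Short form for lengths below 128, long form (0x80 | byte count) otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Concatenate the tag, length and value octets."""
    return bytes([tag]) + encode_length(len(content)) + content


def encode_integer(value: int) -> bytes:
    """Minimal-length big-endian two's-complement content, as DER requires."""
    magnitude = value if value >= 0 else ~value
    size = (magnitude.bit_length() + 8) // 8  # always leaves room for the sign bit
    return tlv(TAG_INTEGER, value.to_bytes(size, "big", signed=True))


def encode_utf8string(text: str) -> bytes:
    return tlv(TAG_UTF8STRING, text.encode("utf-8"))


def encode_sequence(*elements: bytes) -> bytes:
    """A SEQUENCE's content is simply its already-encoded elements, in order."""
    return tlv(TAG_SEQUENCE, b"".join(elements))


if __name__ == "__main__":
    record = encode_sequence(encode_integer(5), encode_utf8string("Ann"))
    print(record.hex())  # 30080201050c03416e6e
```

The sketch shows why a formal data model helps interoperability here: once the abstract type is fixed, the byte layout on the wire is fully determined, independent of the platform that produced it.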

Interesting links. I wonder how this can be integrated into the article. At first I thought you could add this whole section (without the secondary remarks) to the article as a new "Data models in Telecommunications and Networking" section. This paragraph could be added to the "types of data models" section, because it is about one type of "data model" and not about "related models".
On the other hand, these standards seem rather applied, and it seems to me there must be similar standards in related fields as well. The current "types of data models" section is more about theoretical data models. So maybe, with the texts about these applied standards, we could create a separate "application" section? What do you think? -- Marcel Douwe Dekker (talk) 12:27, 5 February 2009 (UTC)
There exists a useful categorization of data based on its use within an enterprise system:
  • data at rest - aka persistent data in a database or in the data repository
  • data in use - data that is being used by an information system, usually for the purpose of presenting it to the user, and
  • data in transit - data in transactions, also data in events, messages, in the network packets, etc.
This is a rather neat way of looking at data, isn't it? When it comes to information assurance, each of the above categories requires specific techniques. So, this categorization may be the way to categorize data models. Obviously my notes were specifically focused on the 'data in transit'. Now, 'data in use' is mostly about the data structures in programming languages. And it is related to the "user interface models".
Let me trace the exact origins of this categorization (it must be somewhere in one of the enterprise architecture frameworks, like TOGAF, UPDM, etc.)
-- Equilibrioception (talk) 05:10, 6 February 2009 (UTC)
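One way to read this categorization from a data-modeling angle is that a single logical record takes a different physical representation in each state. The following Python sketch assumes a hypothetical `Customer` entity and uses an in-memory SQLite table for data at rest and a JSON message body for data in transit; these representation choices are illustrative only and are not part of the categorization itself.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical entity used only for this illustration."""
    id: int
    name: str


def store_and_load(customer: Customer) -> Customer:
    """Data at rest: persist the record in a (here in-memory) SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (customer.id, customer.name))
        row = conn.execute("SELECT id, name FROM customer WHERE id = ?", (customer.id,)).fetchone()
    finally:
        conn.close()
    return Customer(*row)


def to_wire(customer: Customer) -> bytes:
    """Data in transit: serialize the record to a JSON message body."""
    return json.dumps(asdict(customer), sort_keys=True).encode("utf-8")


def from_wire(payload: bytes) -> Customer:
    """Rebuild the in-memory record from a received message body."""
    return Customer(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    in_use = Customer(1, "Ann")  # data in use: a live object in program memory
    at_rest = store_and_load(in_use)
    in_transit = to_wire(in_use)
    print(in_transit)  # b'{"id": 1, "name": "Ann"}'
    assert at_rest == in_use == from_wire(in_transit)
```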
I found (only) two sources mentioning this
  • Keith D. Willett (2008), in Information Assurance Architecture, page 159, in the chapter "Data State", explains: "Data states are at rest, in transit, and in use. Data at rest is on a permanent storage medium... Data in transit refers to data traversing a network... Data in use is in the virtual storage..."
  • Fred Cohen (2007) in IT Security Governance Guidebook with Security Program Metrics on CD-ROM, p.189 in a paragraph on the structure of information protection states: "Users tend to deal with data life cycles and information at rest, in use, and in transit, and are subject to organizational effects, mandates, and awareness..."
From a Wikipedia point of view, none of these concepts (data at rest, data in use, data in transit) seems very notable. An overview article like data model is not the place to introduce such new concepts. -- Marcel Douwe Dekker (talk) 08:46, 6 February 2009 (UTC)
OK, it makes sense; these could be security-related concepts. In that case, they may not add much value to data models.
However, there are quite a few uses of these concepts 'in the wild'. Nothing earth-shattering, I agree, but they may be worth considering in the Computer Security project.
In particular, data at rest seems a common term.
  • There is a ComputerWorld article 'Encrypting Data at Rest', March 27, 2006;
  • there is a GSA award for data at rest encryption, see [1]
  • webopedia does have a definition for data at rest [2]
  • data at rest is part of the US Army Training and Doctrine Command manual [3]
and so on.
-- Equilibrioception (talk) 18:38, 6 February 2009 (UTC)
I guess you are right, these concepts are more data security and/or information security related. -- Marcel Douwe Dekker (talk) 19:33, 6 February 2009 (UTC)

Section "Method related models" removed

I removed the "Method related models" section, which stated:

A lot of the existing data modeling methods, software development methodologies, and other modeling languages in the field of computer science have defined their own type of models.

I think this statement is confusing. -- Marcel Douwe Dekker (talk) 19:24, 8 March 2009 (UTC)

The "Zachman Framework" section removed

I removed the following section from the article

Zachman Framework Perspectives of Data Focus
In an alternative framework, called the Zachman Framework, a data model instance may be one of six kinds (according to John Zachman, 1987, 1992, 2005, 2007):
  • a contextual data model (list) identifies entity classes (representing things of significance to the organization).
  • a conceptual data model (semantics) defines the meaning of the things in an organization. This consists of relationships (assertions about associations between pairs of entity classes).
  • a logical data model (schema) describes the logical representation of the properties without regard to a particular data manipulation technology. This consists of descriptions of the attributes (the role a data element plays in relation to the thing (entity) it represents).
  • a physical data model (blueprint) describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.
  • a data definition (configuration) is the actual language coding of the database schema in the chosen development platform.
  • a data instantiation holds the values of properties applied to the data in the schema.
The significance of this approach, according to John Zachman, is that it allows the six perspectives to be relatively independent of each other and have different contributors, audiences and purposes. In each case, of course, the structures must remain consistent with the other model instances although the details change. The table/column structure may be different from a direct translation of the entity classes, relationships and attributes, but it must ultimately carry out the objectives of the contextual entity class structure and conceptual relationship structure. Zachman regards each perspective as a separate and distinct vantage point of the data: his view is not a methodology but rather a way of classifying the parts. However, development projects and software tools often proceed from the contextual list to the conceptual data model, followed by the logical data model. In later stages, when the data platform is known (whether it be database software or filing cabinets), this model may be translated into a physical data model followed by the data definition. When the database actually stores values and is operational, data manipulation can take place.

I think this section doesn't explain the Zachman Framework and its relation to data model, and data modeling. I also can't find the listing given here in the Zachman 1987 article.

-- Marcel Douwe Dekker (talk) 19:34, 8 March 2009 (UTC)
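The progression the removed text describes (logical data model, then physical data model, then data definition, then data instantiation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zachman's own material: the entities (`customer`, `purchase_order`), the surrogate `id` key and the SQLite type mapping are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Attribute:
    name: str
    sql_type: str  # a physical choice; a purely logical model would leave this open


@dataclass
class Entity:
    name: str
    attributes: List[Attribute]
    # attribute name -> referenced entity (a relationship, realized as a foreign key)
    references: Dict[str, str] = field(default_factory=dict)


def to_ddl(entity: Entity) -> str:
    """Data definition: render one entity as SQL DDL (a surrogate 'id' key is assumed)."""
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{a.name} {a.sql_type} NOT NULL" for a in entity.attributes]
    columns += [f"FOREIGN KEY ({col}) REFERENCES {target} (id)"
                for col, target in entity.references.items()]
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"


# Hypothetical logical model: a customer places purchase orders.
MODEL = [
    Entity("customer", [Attribute("name", "TEXT")]),
    Entity("purchase_order",
           [Attribute("customer_id", "INTEGER"), Attribute("total_cents", "INTEGER")],
           references={"customer_id": "customer"}),
]


def build(model: List[Entity]) -> sqlite3.Connection:
    """Apply the generated data definition to an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    for entity in model:
        conn.execute(to_ddl(entity))
    return conn


if __name__ == "__main__":
    conn = build(MODEL)
    # Data instantiation: actual values stored against the schema.
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
    conn.execute("INSERT INTO purchase_order (id, customer_id, total_cents) VALUES (10, 1, 2500)")
    print(conn.execute(
        "SELECT c.name, o.total_cents FROM purchase_order o JOIN customer c ON c.id = o.customer_id"
    ).fetchall())  # [('Ann', 2500)]
```

Generating the DDL from the model, rather than writing it by hand, is one way to keep the table/column structure consistent with the entity classes and relationships, which is the consistency requirement the removed text mentions.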

Renewed first image

The removal of the first image here reminded me that the image had to be redrawn and the caption had to be improved. So I did both and re-added the image (with the data modelling part highlighted), because I do think this image is particularly appropriate in that article. It gives an overview of both the data modelling process and the context of enterprise modelling. -- Marcel Douwe Dekker (talk) 13:02, 2 July 2009 (UTC)

Text about W3C, RDF and OWL removed

I removed the following text (see here), which User:EddyVanderlinden had added both here and in the data modeling article:

The end of the 1990s provided W3C standards (ref : Standards on RDF [4] and recommendations on OWL [5]) which enabled ontologies to unite 4 modelling functions in 1 knowledge model: the knowledge representation (in RDF(S) and OWL), the knowledge generation through inferences, the conceptual model through ontologies and the physical model through triple stores.
The latest developments allow applications to be generated straight from the knowledge systems (ontologies) (ref : See the finance semantic web application [6]).

I removed this text about W3C, RDF, OWL because it doesn't explain much itself, and doesn't explain the link with data modeling. -- Marcel Douwe Dekker (talk) 09:44, 28 September 2009 (UTC)

Copy-paste registration

-- Mdd (talk) 20:36, 6 November 2009 (UTC)

Lead sentence

I have undone the change in the lead sentence for the simple reason that it was not wikified. In a frequently viewed article like this one, it should be. I don't oppose a change in the lead sentence, but it does have to fit the Wikipedia rules about layout. The current article states:

A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest.

Now a new lead sentence is proposed by user:

Data is the documentation of real world entities (a person, place or thing), on a specific date. A data model is a plan that is used to document data with rigor. The data model is then used to specify how to store and retrieve the data from the appropriate place. A favorite saying of data architects is "a place for everything and everything in its place".

Now one of the rules is that the article has to start with the subject. So it could become:

A data model is a plan that is used to document data with rigor. The data model is then used to specify how to store and retrieve the data from the appropriate place. A favorite saying of data architects is "a place for everything and everything in its place".

As someone who is not a native speaker of American English, I don't know what the expression "to document data with rigor" means. -- Mdd (talk) 19:54, 13 January 2011 (UTC)