Saturday, 25 April 2009

Database Normalization


Database Design Process

There are several steps that must be carried out to design a database:

· Gathering user / business requirements

· Developing the ER model based on those requirements

· Converting the ER model into a set of relations (tables)

· Normalizing the relations to remove anomalies

· Implementing the database by creating a table for each fully normalized relation

Database Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Normalizing up to 3NF or BCNF is usually sufficient to produce tables of good quality.


First Normal Form

First Normal Form (1NF) sets the very basic rules for an organized database:

  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
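As a concrete sketch of these rules, the snippet below (with made-up student data) splits a column that stores several phone numbers, a repeating group, into rows with one atomic value each:

```python
# Hypothetical data: each student row crams several phone numbers into
# one column -- a repeating group that violates 1NF.
unnormalized = [
    ("S001", "Alice", "555-1111, 555-2222"),
    ("S002", "Bob", "555-3333"),
]

# 1NF fix: one atomic phone value per row, identified by the
# composite key (student_id, phone).
student_phone = [
    (student_id, phone.strip())
    for student_id, _name, phones in unnormalized
    for phone in phones.split(",")
]
print(student_phone)
```

Each resulting row holds exactly one phone number, so queries like "which students have phone 555-2222?" no longer need string parsing.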

Second Normal Form

Second normal form (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
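A small sketch of a 2NF decomposition (table and attribute names are invented): the key of the original relation is (order_id, product_id), but product_name depends on product_id alone, a partial dependency that 2NF forbids.

```python
# Hypothetical un-normalized order detail rows:
# (order_id, product_id, product_name, quantity)
order_details = [
    (1, "P10", "Keyboard", 2),
    (1, "P20", "Mouse", 1),
    (2, "P10", "Keyboard", 5),
]

# Move the partially dependent attribute into its own table...
products = {(pid, name) for _oid, pid, name, _qty in order_details}

# ...and keep only the attributes that depend on the whole key,
# with product_id acting as a foreign key into products.
order_items = [(oid, pid, qty) for oid, pid, _name, qty in order_details]

print(sorted(products))
print(order_items)
```

After the split, "Keyboard" is stored once in `products` instead of once per order line.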

Third normal form

Third normal form (3NF) goes one large step further:

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
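The same idea for 3NF, sketched with invented employee data: dept_name depends on dept_id rather than on the employee key, a transitive dependency that 3NF removes.

```python
# Hypothetical rows: (emp_id, name, dept_id, dept_name).
employees = [
    ("E1", "Ana", "D10", "Sales"),
    ("E2", "Ben", "D10", "Sales"),
    ("E3", "Cho", "D20", "Support"),
]

# Split the transitively dependent attribute into its own table.
departments = {(dept_id, dept_name)
               for _eid, _name, dept_id, dept_name in employees}
employee = [(eid, name, dept_id)
            for eid, name, dept_id, _dept_name in employees]

print(sorted(departments))  # each department name stored once
print(employee)
```

Renaming a department now means updating one row in `departments`, not every employee row.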

Fourth Normal Form

Finally, fourth normal form (4NF) has one additional requirement:

  • Meet all the requirements of the third normal form.
  • A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.

Normalization must be done because:

· It optimizes table structures

· It can increase access speed

· It keeps incoming data consistent

· It makes more efficient use of storage media

· It reduces redundancy

· It avoids anomalies (insertion anomalies, deletion anomalies, update anomalies)

A table is said to be good (efficient) or normal if it fulfils the following three criteria:

· If the table is decomposed (split), the decomposition must be safe (Lossless-Join Decomposition). That is, after the table is split into new tables, those new tables can be joined to reproduce the original table exactly.

· It preserves functional dependencies when data changes (Dependency Preservation).

· It does not violate Boyce-Codd Normal Form (BCNF).

If the third criterion (BCNF) cannot be met, the table should at least not violate Third Normal Form (3NF).
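The lossless-join criterion can be checked mechanically on sample data. A toy sketch (rows invented): project the relation onto two schemas, natural-join the projections, and compare with the original.

```python
# Hypothetical relation R(emp, dept, dept_name).
original = {("E1", "D10", "Sales"), ("E2", "D20", "Support")}

# Decompose R into R1(emp, dept) and R2(dept, dept_name).
r1 = {(e, d) for e, d, _n in original}
r2 = {(d, n) for _e, d, n in original}

# Natural join of the projections on the shared attribute `dept`.
rejoined = {(e, d, n) for e, d in r1 for d2, n in r2 if d == d2}
print(rejoined == original)  # True -> the decomposition is safe
```

A decomposition that produced extra ("spurious") tuples in `rejoined` would not be lossless.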

Normalization is required because redundant relations are not "good" relations. Why?

  • The main reason is the possibility of "update anomalies" (on insert, delete, or update), which can lead to inconsistent data
  • Another reason is wasted storage space (hard disk)

An update anomaly. Employee 519 is shown as having different addresses on different records.

An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.

A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
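The update anomaly above can be reproduced in a few lines (the employee number matches the caption; the skills and addresses are illustrative):

```python
# Hypothetical denormalized rows: employee 519's address is repeated
# on every skill record.
rows = [
    {"emp": 519, "skill": "SQL", "addr": "12 Main St"},
    {"emp": 519, "skill": "Java", "addr": "12 Main St"},
]

# Updating the address on only one record leaves the other stale:
rows[0]["addr"] = "99 Elm Ave"
addresses = {r["addr"] for r in rows if r["emp"] == 519}
print(len(addresses))  # employee 519 now has 2 different addresses
```

In a normalized design the address would live in one row of an employee table, so this contradiction could not arise.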

How to handle anomalies?

1. The anomaly is handled by the programming language used to build the database application. The designer should note the anomaly and inform the programmer.

2. The anomaly is not handled by the system but is left to the operator, who must be careful when modifying, inserting, and deleting data. This carries the risk of human error.

3. The anomaly is avoided altogether by performing normalization.

Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP Applications are characterized by a high volume of small transactions such as updating a sales record at a super market checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has never been proven that this denormalization itself provides any increase in performance, or if the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.

Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is crucial.

Breaking the Rules: When to Denormalize

Sometimes it's necessary to break the rules of normalization and create a database that is deliberately less normal than it otherwise could be. You'll usually do this for performance reasons or because the users of the database demand it. While this won't get you any points with database design purists, ultimately you have to deliver a solution that satisfies your users. If you do break the rules, however, and decide to denormalize your database, it's important that you follow these guidelines:

· Break the rules deliberately; have a good reason for denormalizing.

· Be fully aware of the tradeoffs this decision entails.

· Thoroughly document this decision.

· Create the necessary application adjustments to avoid anomalies.

This last point is worth elaborating on. In most cases, when you denormalize, you will be required to create additional application code to avoid insertion, update, and deletion anomalies that a more normalized design would avoid. For example, if you decide to store a calculation in a table, you'll need to create extra event procedure code and attach it to the appropriate event properties of forms that are used to update the data on which the calculation is based.

If you're considering denormalizing for performance reasons, don't always assume that the denormalized approach is the best. Instead, I suggest you first fully normalize the database (to Third Normal Form or higher) and then denormalize only if it becomes necessary for reasons of performance.

If you're considering denormalizing because your users think they need it, investigate why. Often they will be concerned about simplifying data entry, which you can usually accomplish by basing forms on queries while keeping your base tables fully normalized.

Here are several scenarios where you might choose to break the rules of normalization:

· You decide to store an indexed computed column, Soundex, in tblCustomer to improve query performance, in violation of 3NF (because Soundex is dependent on LastName). The Soundex column contains the sound-alike code for the LastName column. It's an indexed column (with duplicates allowed) and is calculated using a user-defined function. If you wish to perform searches on the Soundex column with any but the smallest tables, you'll find a significant performance advantage to storing the Soundex column in the table and indexing this computed column. You'd likely use an event procedure attached to a form to perform the Soundex calculation and store the result in the Soundex column. To avoid update anomalies, you'll want to ensure that this column cannot be updated by the user and that it is updated every time LastName changes.

· In order to improve report performance, you decide to create a column named TotalOrderCost that contains a sum of the cost of each order item in tblOrder. This violates 2NF because TotalOrderCost is dependent on the primary key of tblOrderDetail, not on tblOrder's primary key. TotalOrderCost is calculated on a form by summing the column TotalCost for each item. Since you often create reports that need to include the total order cost, but not the cost of individual items, you've broken 2NF to avoid having to join these two tables every time this report needs to be generated. As in the last example, you have to be careful to avoid update anomalies. Whenever a record in tblOrderDetail is inserted, updated, or deleted, you will need to update tblOrder, or the information stored there will be erroneous.

· You decide to include a column, SalesPerson, in the tblInvoice table, even though SalesId is also included in tblInvoice. This violates 3NF because the two non-key columns are mutually dependent, but it significantly improves the performance of certain commonly run reports. Once again, this is done to avoid a join to the tblEmployee table, but introduces redundancies and adds the risk of update anomalies.
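One way to make the second scenario (the TotalOrderCost column) safe is a database-side trigger rather than form code. The sketch below is a minimal SQLite version, using the hypothetical tblOrder / tblOrderDetail names from the text; it covers inserts only, and analogous triggers would be needed for updates and deletes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblOrder (OrderId INTEGER PRIMARY KEY,
                       TotalOrderCost REAL NOT NULL DEFAULT 0);
CREATE TABLE tblOrderDetail (OrderId INTEGER REFERENCES tblOrder,
                             Item TEXT, TotalCost REAL);
-- Keep the redundant total consistent on every insert.
CREATE TRIGGER trgDetailInsert AFTER INSERT ON tblOrderDetail
BEGIN
  UPDATE tblOrder
     SET TotalOrderCost = (SELECT COALESCE(SUM(TotalCost), 0)
                             FROM tblOrderDetail
                            WHERE OrderId = NEW.OrderId)
   WHERE OrderId = NEW.OrderId;
END;
""")
con.execute("INSERT INTO tblOrder (OrderId) VALUES (1)")
con.execute("INSERT INTO tblOrderDetail VALUES (1, 'Widget', 10.0)")
con.execute("INSERT INTO tblOrderDetail VALUES (1, 'Gadget', 2.5)")
print(con.execute("SELECT TotalOrderCost FROM tblOrder").fetchone()[0])
```

Pushing the maintenance into the database means every application that inserts order details gets a consistent total, not just the forms you remembered to wire up.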

Functional dependency

Attribute B has a functional dependency on attribute A (i.e., A → B) if, for each value of attribute A, there is exactly one value of attribute B. If value of A is repeating in tuples then value of B will also repeat. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain. For example, an Employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow Employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.
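Whether a candidate dependency actually holds in a concrete table can be tested directly. A small sketch (rows invented, attribute names following the example above):

```python
def holds(rows, lhs, rhs):
    """True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

employees = [
    {"EmployeeID": 1, "EmployeeAddress": "London"},
    {"EmployeeID": 2, "EmployeeAddress": "London"},
    {"EmployeeID": 1, "EmployeeAddress": "London"},
]
# EmployeeID -> EmployeeAddress holds; the reverse does not,
# because two employees share the London address.
print(holds(employees, "EmployeeID", "EmployeeAddress"))  # True
print(holds(employees, "EmployeeAddress", "EmployeeID"))  # False
```

Note the caveat from the text: such a check only tells you about the sample data; the real dependencies come from the problem domain.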

Another way to look at the above is by reviewing basic mathematical functions:

Let F(x) be a mathematical function of one independent variable. The independent variable is analogous to the attribute A. The dependent variable (the dependent attribute in the terminology above), and hence the term functional dependency, is the value F(A); A is the independent attribute. As we know, a mathematical function can have only one output. Notationally, this relationship is expressed as F(A) = B, which in dependency notation is written A → B.

There are also functions of more than one independent variable—commonly, this is referred to as multivariable functions. This idea represents an attribute being functionally dependent on a combination of attributes. Hence, F(x,y,z) contains three independent variables, or independent attributes, and one dependent attribute, namely, F(x,y,z). In multivariable functions, there can only be one output, or one dependent variable, or attribute.

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is

· functionally dependent on X, and

· not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X → Z holds only by virtue of X → Y and Y → Z.

Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey

A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate key

A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee Id, Skill} would be a candidate key for the "Employees' Skills" table.
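These two definitions translate directly into code. A sketch on invented "Employees' Skills" rows: a superkey projects to distinct tuples, and a candidate key is a superkey none of whose proper subsets is one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(projected)  # no duplicates

def is_candidate_key(rows, attrs):
    if not is_superkey(rows, attrs):
        return False
    # Minimal: no proper subset may itself be a superkey.
    return not any(is_superkey(rows, list(sub))
                   for n in range(1, len(attrs))
                   for sub in combinations(attrs, n))

skills = [
    {"EmployeeID": 1, "Skill": "SQL", "EmployeeAddress": "London"},
    {"EmployeeID": 1, "Skill": "Java", "EmployeeAddress": "London"},
    {"EmployeeID": 2, "Skill": "SQL", "EmployeeAddress": "Paris"},
]
print(is_superkey(skills, ["EmployeeID", "Skill", "EmployeeAddress"]))  # True
print(is_candidate_key(skills, ["EmployeeID", "Skill"]))                # True
```

The full attribute set is a superkey but not a candidate key, since {EmployeeID, Skill} inside it already identifies rows.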

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.


First Normal Form (1NF)
  • Eliminate repeating groups from the same table
  • Aggregate similar data in separate tables and identify each row with a unique identifier

In simple terms, for 1NF every attribute of the relation must be atomic in nature.


Second Normal Form (2NF)

Moving on, let's take a look at the rules that govern 2NF. We get one step closer to removing duplicate records.

  • Remove data that apply to multiple rows and place them in a separate table
  • Relate these tables to the originals with foreign keys


Third Normal Form (3NF)

This is the most commonly targeted normal form for most databases.

  • Eliminate all fields that do not depend on the primary key

Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.


Note: All these normal forms are cumulative in nature. I reiterate this point.

There is also Boyce-Codd normal form (BCNF), a slightly stronger version of 3NF. I will not go deeply into this form, as such a requirement is rarely needed in practice. The rule is: a table is in BCNF if and only if every determinant is a candidate key.

References:

1. ER Ngurah Agus Sanjaya. Slide Part 6 - NORMALISASI.

2. Fundamentals of Relational Database Design, by Paul Litwin (http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx)

3. Normalizing Techniques, extremeexperts (http://www.extremeexperts.com/sql/articles/Normalizing.aspx)

4. Database Normalization Basics, by Mike Chapple (http://databases.about.com/od/specificproducts/a/normalization.htm)

5. Answers.com (http://www.answers.com/topic/database-normalization)

Sunday, 19 April 2009

Database and ERD


Database

A database is a set of data stored on magnetic disk, optical disk, or other secondary storage. A database can also be defined as a collection of interrelated data describing the activities of one or more organizations. The database can be a collection of integrated, related data belonging to an enterprise (a company, a government body, or a private institution).


DBMS (Database Management System)

A database management system is a database combined with the application software that is based on it. These application programs are used to access and maintain the database. The main goal of a DBMS is to provide an efficient and convenient environment for storing and retrieving data and information.

A Relational Database Management System (RDBMS) implements the features of the relational model. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way, namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables."

BIT, BYTE, FIELD
- A bit is the smallest unit of data, holding a value of 0 or 1.
- A byte is a group of related bits.
- A field is a group of related bytes; in databases it is referred to as an attribute.

Field example:

Fields can be one of a range of different data types, including: Text, Integer, Real number, Yes/No (Boolean), Date, Time, Sound, and Video.
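As a small sketch of fields with different data types, the SQLite table below (hypothetical student columns) declares text, integer, real, Boolean, and date fields; SQLite has no native Boolean or date type, so those are stored as 0/1 integers and ISO-8601 text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE student (
    nim      TEXT PRIMARY KEY,   -- text field
    name     TEXT,
    credits  INTEGER,            -- integer field
    gpa      REAL,               -- real-number field
    active   INTEGER,            -- Boolean stored as 0/1
    enrolled TEXT                -- date stored as ISO-8601 text
)""")
con.execute(
    "INSERT INTO student VALUES ('0708001', 'Ayu', 24, 3.5, 1, '2009-04-19')"
)
print(con.execute("SELECT name, gpa FROM student").fetchone())
```

Other DBMSs offer richer types (DATE, BOOLEAN, BLOB for sound and video), but the field concept is the same.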


ATTRIBUTE


An attribute is a property or characteristic of an entity that provides detail about that entity. A relationship can also have attributes. Example attributes:

  1. STUDENTS: NIM, Name, Faculty
  2. CAR: Colour, Brand, Type, CC

TYPES OF ATTRIBUTE

- single vs. multivalued

  • single: can contain at most 1 value
  • multivalued: can contain more than 1 value of the same kind

- atomic vs. composite

  • atomic: cannot be divided into smaller attributes
  • composite: a combination of smaller attributes

- derived attribute: an attribute whose value can be derived from other attribute values; for example, age can be derived from the date of birth

- null-value attribute: an attribute that may have no value for a record

- mandatory attribute: an attribute that must have a value


RECORD / TUPLE
A record is a row of data in a relation. It consists of the set of attributes that together fully describe an entity / relationship. The main advantage of using records instead of tuples is that fields in a record are accessed by name, whereas fields in a tuple are accessed by position. To illustrate the difference, suppose that we want to represent a person with the tuple {Name, Address, Phone}.

We must remember that the Name field is the first element of the tuple, the Address field is the second element, and so on, in order to write functions which manipulate this data. For example, to extract data from a variable P which contains such a tuple we might write the following code and then use pattern matching to extract the relevant fields.

Name = element(1, P),
Address = element(2, P),
...

Code like this is difficult to read and understand and errors occur if we get the numbering of the elements in the tuple wrong. If we change the data representation by re-ordering the fields, or by adding or removing a field, then all references to the person tuple, wherever they occur, must be checked and possibly modified.

Records allow us to refer to the fields by name and not position. We use a record instead of a tuple to store the data. If we write a record definition of the type shown below, we can then refer to the fields of the record by name.

-record(person, {name, phone, address}).

For example, if P is now a variable whose value is a person record, we can write the following to access the name and address fields of the record:

Name = P#person.name,
Address = P#person.address,
...


ENTITY / FILE

A file / entity is a collection of similar records that have the same elements and attributes but different data values.

In processing applications, files / entities can be categorized into several types as follows:

- Master file
- Transaction file
- Report file
- History file
- Protection (backup) file
- Work file


DOMAIN

A domain is the set of values allowed to appear in one or more attributes. Every attribute in a relational database is defined over a domain.


KEY DATA ELEMENT

A key is a data element used to find a record at access time; it can also be used to uniquely identify each entity / record / row.

TYPES OF KEY

Super key

A super key is one or more attributes that can be used to uniquely identify an entity/record in a table (not every set of attributes is a super key).

Candidate key

A candidate key is a super key with a minimal set of attributes. A candidate key may not contain attributes from another table; every candidate key is a super key, but not every super key is a candidate key.

Primary key

A primary key is one of the candidate keys, chosen according to 3 criteria:

  1. the key is more natural to use as a reference
  2. the key is simpler
  3. the key is guaranteed to be unique

Alternate key

An alternate key is a candidate key that was not chosen as the primary key.

Foreign key

A foreign key is an attribute that refers to the primary key of another table. Foreign keys occur in relations with one-to-many or many-to-one cardinality, and a foreign key is normally placed in the table on the "many" side.
An external key is a lexical attribute (or combination of lexical attributes) whose value always identifies one object instance.
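A foreign key can be declared and enforced by the DBMS itself. The sketch below uses SQLite with the hypothetical tblEmployee / tblInvoice names from the earlier denormalization example; note that SQLite only enforces foreign keys when the pragma is switched on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE tblEmployee (SalesId INTEGER PRIMARY KEY, SalesPerson TEXT);
CREATE TABLE tblInvoice  (InvoiceId INTEGER PRIMARY KEY,
                          SalesId INTEGER NOT NULL
                                  REFERENCES tblEmployee(SalesId));
""")
con.execute("INSERT INTO tblEmployee VALUES (7, 'Dewi')")
con.execute("INSERT INTO tblInvoice VALUES (1, 7)")  # OK: parent row exists
try:
    con.execute("INSERT INTO tblInvoice VALUES (2, 99)")  # no such employee
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the DBMS refused the orphan invoice
```

The invoice table sits on the "many" side of the one-to-many relation, which is why the foreign key column lives there.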


ERD (Entity Relationship Diagram)


An ERD is a network model that describes, at an abstract level, the arrangement of the data stored in a system.


The difference between a DFD and an ERD:
- A DFD is a model of the network of functions that will be implemented by the system.
- An ERD is a model of the network of data that emphasizes the structure of, and relationships between, the data.

The elements of ERD

Entity

  • Something that exists in the real or abstract system about which data is stored.
  • Symbolized as a rectangle. Line symbols link entity sets to relationships, and entity sets to their attributes.

Relationship

  • A natural association that occurs between entities.
  • Generally named with a simple verb, which makes the relationship easy to read.
  • Symbolized as a diamond

Relationship degree

  • The number of entities that participate in a relationship.
  • Degree is frequently used in ERDs

Attribute

  • A characteristic of each entity or relationship
  • Symbolized as a circle

Cardinality

  • Indicates the maximum number of tuples that can be related to the entity on the other side of the relationship.

Degrees of relationship

- Unary relationship
a relationship between entities that come from the same entity set.
- Binary relationship
a relationship between 2 entities.
- Ternary relationship
a relationship between instances of 3 types of entities.

Examples of cardinality types




NOTATIONS OF E-R DIAGRAM

The symbolic notations in an ER diagram are:

1. A rectangle represents an entity set
2. A circle represents an attribute
3. A diamond represents a relationship set
4. Lines represent the links between relationship sets and entity sets, and between entity sets and their attributes

Entity Relationship Diagram Notations
Peter Chen developed ERDs in 1976. Since then, Charles Bachman and James Martin have added some slight refinements to the basic ERD principles.


Entity


An entity is an object or concept about which you want to store information.


Weak Entity

A weak entity is an entity that must be defined through a relationship with another entity, as it cannot be uniquely identified by its own attributes alone.

Attribute

Attributes are the properties or characteristics of an entity.


Key attribute


A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.


Multivalued attribute

A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values.


Derived attribute

A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary.


Relationships


Relationships illustrate how two entities share information in the database structure.

References:

1. ER Ngurah Agus Sanjaya. Slide Part 5 - DATABASE DAN ER-DIAGRAM.

2. Elmasri & Navathe, Fundamental of Database Systems, 5th Edition, Chapter 3, 2007.

3. http://erlang.org/doc/programming_examples/records.html

4. http://www.smartdraw.com/tutorials/software/erd/tutorial_01.htm

Sunday, 05 April 2009

Data Flow Diagram (DFD)






Data Flow Diagram
DEFINITION
A data flow diagram is a well-known approach to visualizing data processing in the field of business analysis. A data flow diagram is strong at illustrating the relationships between processes, data stores, and external entities in a business information system.

ZERO DIAGRAM

- Describes the processes in the DFD

- Provides a view of the overall system, showing the main functions or processes, the flow of data, and the external entities

- Data stores may appear at this level

- For processes that are not detailed further at the next level, the symbol '*' or 'P' is added at the end of the process

- The balance of inputs and outputs between diagram 0 and the context diagram must be maintained

The goal of the zero diagram is to "break down" the system into the "processes" that must be carried out. Phrased as a question: "What are the processes that must be done in order to achieve the system?"
Thus, this diagram continues the context diagram by "expanding the bubble"; the terminators, and the data flows to and from them (in both number and content), must remain unchanged.

DETAILED DIAGRAM
A detailed diagram elaborates the processes shown in the zero diagram, or in the diagram at the level above it.

Numbering levels in the DFD:

Level   Diagram Name   Process Numbers
0       Context        (none)
1       Diagram 0      1.0, 2.0, 3.0, …
2       Diagram 1.0    1.1, 1.2, 1.3, …
3       Diagram 1.1    1.1.1, 1.1.2, …



EXTERNAL ENTITIES

Something that is outside the system but provides data to the system, or receives data from the system, is represented with a box notation. An external entity is not part of the system. When an information system is designed for one section (department), the other sections it interacts with are external entities.

Guidelines for external entities:

- The name takes the form of a noun
- Terminals may not have the same name unless the object is the same (drawn twice to make the diagram clearer). If so, the terminal should be marked with a forward slash in the top left corner.

PROCESS SPECIFICATION

- Each process in the DFD must have a process specification.

- At the top level, a process can be described with descriptive sentences.

- At more detailed levels, namely the bottom levels (functional primitives), a more structured specification is required.

- The process specification becomes the guideline for the programmer writing the program (coding).

- Methods used in process specification: narrative process description, decision tables, decision trees.


DATA FLOW

A data flow consists of a group of related data elements that move logically from one process to another.

- Depicted as a line connecting components of the system.

- A data flow is shown with an arrow indicating direction and a name describing the data that flows.

- Data flows between processes, data stores, and external entities, and carries input data into the system.

- Naming guidelines:

1. The name of a data flow that consists of several words should be connected along the flow line.

2. No two data flows may have the same name, and each name should reflect its content.

3. A data flow that consists of several elements can be named after the group of elements.

4. Avoid using the words 'data' and 'information' in data flow names.

5. Wherever possible, write the data flow name in full.

6. The name of a data flow entering a process may not be the same as the name of the data flow leaving that process.

7. Data flows entering or leaving a data store need not be named if:
- the data flow is simple and easy to understand
- the data flow describes all the data items

8. There can be no data flow from a terminal to a data store, or vice versa, because a terminal is not part of the system; the connection between a terminal and a data store must pass through a process.


PROCESS


Logical process models omit any processes that do nothing more than move or route data , thus leaving the data unchanged.

Valid processes include those that:

· Perform computations (e.g., calculate grade point average)

· Make decisions (determine availability of ordered products)

· Sort, filter or otherwise summarize data (identify overdue invoices)

· Organize data into useful information (e.g., generate a report or answer a question)

· Trigger other processes (e.g., turn on the furnace or instruct a robot)

· Use stored data (create, read, update or delete a record)


Guidelines of the process:

  1. The name of a process consists of a verb and a noun that reflect the function of the process
  2. Do not use the word 'process' as part of a bubble's name
  3. No two processes may have the same name
  4. Processes should be numbered. Where possible, the numbering follows the flow or sequence of the processes, but the numbers do not mean that the processes must execute in strict chronological order

The Process Symbol
Processes transform or manipulate data. Each box has a unique number as an identifier (top left) and a unique name (an imperative statement, e.g. 'do this', in the main box area). The top line is used for the location of, or the people responsible for, the process.
Processes are 'black boxes': we don't know what is inside them until they are decomposed.
Processes transform or manipulate input data to produce output data. Except in rare cases, you can't have one without the other.


DFD Levels

The Context and Top Level diagrams in the example start to describe a 'Home Catalogue' type sales system. The two diagrams are just the first steps in creating a model of the system. (By model we mean a co-ordinated set of diagrams which describe the target system and provide answers to questions we need to ask about that system.) As suggested, the diagrams presented in the example will be reworked and amended many times, until all parties are satisfied. But the two diagrams by themselves are not enough; they only provide a high-level description. On the other hand, the initial diagrams do start to break down, or decompose, what might be quite a complex system into manageable parts.

A revision of the example Top Level DFD



The next step - the Next Level(s)

Each Process box in the Top Level diagram will itself be made up of a number of processes, and will need to be decomposed as a second level diagram.


Decomposition stops when a process box can be described with an Elementary Process Description using ordinary English, later on the process will be described more formally as a Function Description using, for example, pseudocode.


DATA STORAGE

Data stores are locations where data is held temporarily or permanently.
In physical DFDs there can be 4 types.
D = computerised Data
M = Manual, e.g. filing cabinet.
T = Transient data file, e.g. temporary program file
T(M) = Transient Manual, e.g. in-tray, mail box.
As with external entities, it is common practice to have duplicates of data stores to make a diagram less cluttered


DFD SYMBOLS



DATA DICTIONARY
A data dictionary is a collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them. A first step in analyzing a system of objects with which users interact is to identify each object and its relationship to other objects. This process is called data modeling and results in a picture of object relationships. After each data object or item is given a descriptive name, its relationship is described (or it becomes part of some structure that implicitly describes relationship), the type of data (such as text or image or binary value) is described, possible predefined values are listed, and a brief textual description is provided. This collection can be organized for reference into a book called a data dictionary.

When developing programs that use the data model, a data dictionary can be consulted to understand where a data item fits in the structure, what values it may contain, and basically what the data item means in real-world terms. For example, a bank or group of banks could model the data objects involved in consumer banking. They could then provide a data dictionary for a bank's programmers. The data dictionary would describe each of the data items in its data model for consumer banking (for example, "Account holder" and "Available credit").
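In code, a data dictionary entry is simply structured metadata about a data item. The toy sketch below (field names and values invented, echoing the consumer-banking example) shows the shape such entries might take:

```python
# Hypothetical data dictionary entries: each item records its type,
# allowed values, relationships, and a short description.
data_dictionary = {
    "Account holder": {
        "type": "text",
        "related_to": ["Account"],
        "description": "Full legal name of the account's owner",
    },
    "Available credit": {
        "type": "decimal",
        "allowed_values": ">= 0",
        "description": "Credit remaining on the account",
    },
}
print(sorted(data_dictionary))
```

Real DBMSs expose the same idea through system catalogs that programmers can query.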

BALANCING IN DATA FLOW DIAGRAM
The flow of data into and out of a process must match the flow of data into and out of the detailed diagram of that process at the level(s) below it. The names of the data flows into and out of a process must match the names of the flows into and out of its details. The number and names of the external entities attached to a process must equal the number and names of the external entities in its details. Issues to consider in a DFD with more than one level:
1. There must be a balance between the input and output of one level and the next.
2. The balance between level 0 and level 1 is seen in the input / output data streams to or from the terminals at level 0, while the balance between level 1 and level 2 is seen in the input / output streams to / from the process concerned.
3. The names of data flows, data stores, and terminals at each level must be the same if they refer to the same object.


PROHIBITIONS IN A DFD

1. Data may not flow from one external entity directly to another external entity without passing through a process

2. Data may not flow from a data store directly to an external entity without passing through a process

3. Data may not flow from one data store directly to another data store without passing through a process

4. Data flowing from one process directly to another process without passing through a data store is allowed, but should be avoided where possible





References:

ER Ngurah Agus Sanjaya, S.KOM, M.KOM. Slide Part 4 - DATA FLOW DIAGRAM.

HM, Jogiyanto. 2005. Analisis & Desain Sistem Informasi. Yogyakarta: ANDI.

http://www.edrawsoft.com/Data-Flow-Diagram-Symbols.php

http://www.cems.uwe.ac.uk/~tdrewry/dfds.htm

http://searchsoa.techtarget.com/sDefinition/0,,sid26_gci211896,00.html

http://facweb.cs.depaul.edu/yele/Course/IT215/DFD%20Mechanics.ppt