Commonwealth Computer Navigator's Certificate/Milestone 4/Re-structure of template/Example Tutorial How a database is organised

From WikiEducator
Jump to: navigation, search
Tutorial.png Database Concepts 

What is a Database | How a Database is Organized | Data Types | Normalization | Self Assessment | Summary & FAQs

Tutorial - How a Database is Organized

Icon objectives.jpg


Upon completion of this tutorial the learner will:
  • Understand how a database is organized in terms of tables, records and fields
  • Identify entities and attributes for a database design
  • Discuss and apply different data types to attributes
  • Describe the difference between mandatory and optional attributes
  • Identify and describe relationships between entities
  • Compare and contrast first normal form with second normal form
  • Normalize entities to second normal form

Tables, Records and Fields (or Entities, Data and Attributes)

A database is a collection or tables. The tables contain records that are organized into fields. Or using database terminology; A database is a collection of entities. The tables contain data that are organized into attributes. The use of these two terminologies is frequent when working with databases; become familiar with the terms table and entity, data and record and field and attribute being used interchangeably. Another important concept is the idea of a design view and a query view. Consider the two tables below, one being design view the second being query view;

A design view is a look at the structure of a table as it is being designed.

Students (in Design View)
field name data type length allow nulls
first_name Text 32 No
last_name Text 64 No
birth_date Date 10 No
gender Char 1 Yes
Icon present.gif
Tip: Basic data types
  • field name is the name given to the field as it will be stored in the database.
  • data type is the type of data stored in the database. Usually; Numeric, Text, Date or Char.
  • length is the number of characters long the field will be.
  • allow nulls indicates if the field can be left blank or empty

A query view is a look at the data in rows and columns. Notice the field names from design view are the column headers in query view.

Students (in Query View)
first_name last_name birth_date gender
Peter Rawsthorne 28/12/1963 M
David Rawsthorne 26/10/1961
Lisa Rawsthorne 11/03/1965 F
Malcolm Rawsthorne 03/05/1987 M
Hannah Rawsthorne 23/09/1989 F
Note: it is common for field names to not contain spaces. There are a number of different conventions
for naming fields which can change depending on the standards defined by the organization you are
working. The convention used in this course is to be clear with your field names and use underscores
for spaces. (i.e. 'first name' becomes first_name)

Icon activity.jpg
Design an address entity;
  • Using a pencil and paper write down all the fields for an address entity
  • List the fields in an order as they would happen on a form. It is important to keep fields names and order as they occur in the 'real' world.
  • Assess the fields and give each field a data type and length. Ask yourself if the field can be left blank.

Identifying entities and attributes

This is where we start to use the small school scenario introduced at the beginning of this module. From now on all the analysis and design we do will be based upon this scenario.

Identifying entities and attributes could be considered both and art and a science. It is an art for you need to bring together and analyze information from many different sources. The ability to understand a subject domain and identify the entities with their attributes challenges you to see something virtual from many perspectives and groupings, this can be a very creative process. It is also a science for we use proven analysis processes based upon relational algebra, this process refines how our database is structured into entities and attributes. Analyzing a subject domain for the purpose of designing a database means you identify three things; the entities, which are objects you seek information, the attributes, which are the data collected for an entity and the relationships among entities.

This simple diagram shows the relationship between the students and addresses entities.


And once we begin to add attributes to the entities, the diagram will look like this;


The symbols used to create these entity relationship diagrams are as follows;


Icon activity.jpg
Read over the small school scenario and using a paper and pencil identify all the entities described.
  1. Keep it simple, just identify an entity, and write down a word that describes the entity
  2. Draw a box around each entity
  3. Try and group related entities

Data types

When designing a database table you will need to decide on the data type for each field. There are three primary data types available, they are;

  • Text
  • Numbers
  • Date Time
  • Binary

These data types then break down into more specific types within each primary type. For the OpenOffice Database the following types are available;

Text - character data

  • fixed - a fixed length string of text characters
  • varchar - a variable length string of text characters

Numbers - numeric data

  • Boolean - single digit representing either yes or no
  • Integer - and integer or whole number (i.e. without decimal places)
  • Float - a number with decimal places

Date Time date and time data

  • Date - a character string specific for storing dates
  • Time - a character string specific for storing time
  • Date/Time (timestamp) - a field specifically designed to store a date and time stamp

Binary - large amounts of binary data

  • Image - a large amount of binary data in the form of an image (i.e. gif, jpg, etc.)
  • Memo - a large amount of text
  • Binary - similar to an image field, used to store anything binary (i.e. mp3, ogg, avi, etc.)

It should be noted that this is not a full description of the available data types and how they can be used. It is the beginning of an extensive topic and beyond the scope of this course.

Mandatory and optional fields

When designing a database table you should always ask yourself if the field (or attribute) is mandatory or optional. In other words, can the field be left empty? The answer to this question varies from database to database depending on the information needs. An example would be a persons birth date; if the situation requires that decisions are made on the age of the person, then birth date would be a mandatory field. The other side of this issue would be when would you like a field to be optional? Answer: When not all data is available at time of data entry... Remember to give the mandatory and optional issue some thought as you build database tables.

Entity relationships

An understanding of a subject domain increases as entities and their attributes are identified. The relationships among the entities becomes an essential part of your database design. When identifying entities and their relationships, think in terms of nouns and verbs. The entities are nouns and relationships are verbs. In the previous example of the Students and their Addresses; "Students" and "Addresses" are the entities (nouns) and "live at" and "contain" are the relationships (verbs).

Icon inter.gif

Web Resources

Wikipedia has a number of entity relationship and normalization based entries, here are a few of the most relevant;

A review of object role modeling can also provide a good method for identifying entities and relationships.

Exploring first and second normal form

A solid understanding of normalization and normals forms is essential for database design. During database normalization you analyze entities and attributes through a series of rules known as normal forms. There are five basic normal forms, and each builds upon the rules of the previous. In other words, you cannot have a database design in third normal form without also meeting all the rules of first and second normal form. Consider the process of normalization as the process of simplification and the removal of redundancy. For our first review of normalization we are only going to look at the first and second normal forms. What you learn from these two normal forms will provide enough foundation to begin normalizing a database design.

First Normal Form

A database is said to be in first normal form when the tables;

  1. are guaranteed to not have any duplicate records (i.e. have a way to identify uniqueness within each record)
  2. do not have any repeating groups (i.e. {grade-id, course1, course2, course3} where course is the repeating group)

Second Normal Form

A database is said to be in second normal form when the tables;

  1. are in first normal form
  2. have fields (or attributes) that create a logical entity. And the fields relate back to the unique identifier of the record. (i.e. Students consist of first-name, last-name, gender, and birth-date. A phone number field would NOT be a part of the Student)

Normalize this

Normalize the following table into second normal form.

person_id first_name last_name birth_date gender phone_numbers
001 Bill Smith 28/12/1963 M home: (604) 555-1234; fax : (604) 555-2345
001 Bill Smith 28/12/1963 M cell: (778) 555-2345
005 Fiona Jones 26/10/1961 F home: (604) 555-1234; pager (605) 555-9988
006 John Morrison 14/07/1951 M home: (604) 555-0987; fax: (604) 555-2345; cell: (877) 555-7654
012 Lisa Ballard 11/03/1965 F home: (604) 555-1234

Icon activity.jpg
Using these 5 steps modify the above table into first, then second normal form;
  1. Create an entity relationship diagram for the above table
  2. Field entries should be singular (i.e. one field should not contain more than one entry)
  3. Identify the columns that would create uniqueness
  4. Identify duplicate data
  5. Groups fields within entities so duplicate records can be reduced to one record

After Step 1 (entity relationship diagram)


After Step 2 (first normal form)

person_id first_name last_name birth_date gender phone_type phone_number
001 Bill Smith 28/12/1963 M home (604) 555-1234
001 Bill Smith 28/12/1963 M fax (604) 555-2345
001 Bill Smith 28/12/1963 M cell (778) 555-2333
005 Fiona Jones 26/10/1961 F home (604) 555-4455
005 Fiona Jones 26/10/1961 F pager (604) 555-9988
006 John Morrison 14/07/1951 M home (604) 555-0987
006 John Morrison 14/07/1951 M fax (604) 555-0096
006 John Morrison 14/07/1951 M cell (877) 555-7654
012 Lisa Ballard 11/03/1965 F home (604) 555-6543

After Step 5 (second normal form)

person_id first_name last_name birth_date gender
001 Bill Smith 28/12/1963 M
005 Fiona Jones 26/10/1961 F
006 John Morrison 14/07/1951 M
012 Lisa Ballard 11/03/1965 F
phone_type phone_number person_id
home (604) 555-1234 001
fax (604) 555-2345 001
cell (778) 555-2333 001
home (604) 555-4455 005
pager (604) 555-9988 005
home (604) 555-0987 006
fax (604) 555-0096 006
cell (877) 555-7654 006
home (604) 555-6543 012
Icon present.gif
Tip: person_id is a key
  • You will notice that the person_id is present in both the People and PhoneNumbers tables. This allows you to join the phone numbers to their people 'owners'. More on keys in the next section.

If your getting bored with all this theory

Well we've covered enough material where you could jump into actually building a database. So go to tutorial 5.2 (Designing and creating tables) and get your hands dirty. I would strongly recommend you come back to this section as we are going to get deeper into normalization and how to build a database in third normal form.