| Organizing Data: Tables |
The basic storage unit of a database is the table. Although much more sophisticated, database tables bear a superficial resemblance to the familiar worksheets produced by spreadsheet applications such as Excel and Lotus 1-2-3. Because tables form the foundation of every database application, you need to give as much care to planning and designing them as an architect gives to drawing up a new building plan. Many database application failures can be traced to poorly designed or poorly implemented tables, and to missing or improper relationships between them.
A well-designed relational database rewards its developer and users with the following benefits:
- Reliable updates and retrievals of the data managed by the database can be made quickly and efficiently.
- Maintenance and improvements to the application are easier to make.
- The application itself is at least partially self-documenting. The purpose of each table is immediately clear to a developer assigned to providing maintenance or updates to the basic application.
|
| What is normalization? Explain the different levels of normalization. |
Normalization is the process of organizing data in a database. This
includes creating tables and establishing relationships between those
tables according to rules designed both to protect the data and to
make the database more flexible by eliminating two factors: redundancy and
inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If
data that exists in more than one place must be changed, the data must
be changed in exactly the same way in all locations. A customer
address change is much easier to implement if that data is stored only
in the Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user
to look in the Customers table for the address of a particular
customer, it may not make sense to look there for the salary of the
employee who calls on that customer. The employee's salary is related
to, or dependent on, the employee and thus should be moved to the
Employees table. Inconsistent dependencies can make data difficult to
access; the path to find the data may be missing or broken.
There are a few rules for database normalization. Each rule is called
a "normal form." If the first rule is observed, the database is said
to be in "first normal form." If the first three rules are observed,
the database is considered to be in "third normal form." Although
other levels of normalization are possible, third normal form is
considered the highest level necessary for most applications.
As with many formal rules and specifications, real world scenarios do
not always allow for perfect compliance. In general, normalization
requires additional tables and some customers find this cumbersome. If
you decide to violate one of the first three rules of normalization,
make sure that your application anticipates any problems that could
occur, such as redundant data and inconsistent dependencies.
NOTE: The following descriptions include examples.
First Normal Form
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.
Do not use multiple fields in a single table to store similar data.
For example, to track an inventory item that may come from two
possible sources, an inventory record may contain fields for Vendor
Code 1 and Vendor Code 2.
But what happens when you add a third vendor? Adding a field is not
the answer; it requires program and table modifications and does not
smoothly accommodate a dynamic number of vendors. Instead, place all
vendor information in a separate table called Vendors, then link
inventory to vendors with an item number key, or vendors to inventory
with a vendor code key.
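A minimal sketch of the separate-table approach, using Python's built-in sqlite3 module. It assumes an item can have any number of vendors and a vendor can supply many items, so the link between them lives in its own table. The table and column names (Inventory, Vendors, ItemVendors, item_no, vendor_code) are hypothetical.

```python
import sqlite3

# Hypothetical schema: instead of Vendor Code 1 / Vendor Code 2 columns on the
# inventory record, each (item, vendor) pair is one row in a linking table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (
    item_no     INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE Vendors (
    vendor_code TEXT PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE ItemVendors (
    item_no     INTEGER NOT NULL REFERENCES Inventory(item_no),
    vendor_code TEXT    NOT NULL REFERENCES Vendors(vendor_code),
    PRIMARY KEY (item_no, vendor_code)
);
""")
conn.execute("INSERT INTO Inventory VALUES (1, 'Widget')")
conn.executemany("INSERT INTO Vendors VALUES (?, ?)",
                 [("V1", "Acme"), ("V2", "Globex"), ("V3", "Initech")])
# A third (or tenth) vendor is just another row; no ALTER TABLE is needed.
conn.executemany("INSERT INTO ItemVendors VALUES (?, ?)",
                 [(1, "V1"), (1, "V2"), (1, "V3")])

vendors_for_widget = [row[0] for row in conn.execute("""
    SELECT v.name
    FROM ItemVendors iv JOIN Vendors v ON v.vendor_code = iv.vendor_code
    WHERE iv.item_no = 1
    ORDER BY v.name
""")]
print(vendors_for_widget)  # ['Acme', 'Globex', 'Initech']
```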
Second Normal Form
- Create separate tables for sets of values that apply to multiple records.
- Relate these tables with a foreign key.
Records should not depend on anything other than a table's primary key
(a compound key, if necessary). For example, consider a customer's
address in an accounting system. The address is needed by the
Customers table, but also by the Orders, Shipping, Invoices, Accounts
Receivable, and Collections tables. Instead of storing the customer's
address as a separate entry in each of these tables, store it in one
place, either in the Customers table or in a separate Addresses table.
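The "store it in one place" rule can be sketched with sqlite3: Orders and Invoices carry only the customer's key, so an address change is a single-row update that every related table sees through a join. The tables, columns, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL   -- stored here and nowhere else
);
CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
);
CREATE TABLE Invoices (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
);
""")
conn.execute("INSERT INTO Customers VALUES (7, 'Contoso', '1 Old Road')")
conn.execute("INSERT INTO Orders VALUES (100, 7)")
conn.execute("INSERT INTO Invoices VALUES (500, 7)")

# The address change touches exactly one row.
conn.execute("UPDATE Customers SET address = '2 New Street' WHERE customer_id = 7")

def ship_to(order_id):
    """Current address for an order, found through the foreign key."""
    return conn.execute("""
        SELECT c.address FROM Orders o
        JOIN Customers c ON c.customer_id = o.customer_id
        WHERE o.order_id = ?""", (order_id,)).fetchone()[0]

def bill_to(invoice_id):
    """Current address for an invoice, found through the foreign key."""
    return conn.execute("""
        SELECT c.address FROM Invoices i
        JOIN Customers c ON c.customer_id = i.customer_id
        WHERE i.invoice_id = ?""", (invoice_id,)).fetchone()[0]

print(ship_to(100), bill_to(500))  # 2 New Street 2 New Street
```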
Third Normal Form
- Eliminate fields that do not depend on the key.
Values in a record that do not depend on that record's key do not
belong in the table. In general, any time the contents of a group of
fields may apply to more than a single record in the table, consider
placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate's
university name and address may be included. But you need a complete
list of universities for group mailings. If university information is
stored in the Candidates table, there is no way to list universities
with no current candidates. Create a separate Universities table and
link it to the Candidates table with a university code key.
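A small sqlite3 sketch of the Universities example. With university data in its own table, a LEFT JOIN can list universities that currently have no candidates, which is impossible when that data exists only inside Candidates rows. All names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Universities (
    university_code TEXT PRIMARY KEY,
    name            TEXT NOT NULL,
    address         TEXT NOT NULL
);
CREATE TABLE Candidates (
    candidate_id    INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    university_code TEXT REFERENCES Universities(university_code)
);
INSERT INTO Universities VALUES
    ('U1', 'North State', '10 College Ave'),
    ('U2', 'South Tech',  '20 Campus Dr'),
    ('U3', 'East Poly',   '30 Quad Ln');
INSERT INTO Candidates VALUES
    (1, 'Ana', 'U1'),
    (2, 'Ben', 'U1'),
    (3, 'Cho', 'U2');
""")

# Full mailing list: every university, whether or not it has candidates.
mailing_list = [r[0] for r in conn.execute(
    "SELECT name FROM Universities ORDER BY university_code")]

# Universities with no current candidates: unmatched rows of the left join.
no_candidates = [r[0] for r in conn.execute("""
    SELECT u.name
    FROM Universities u
    LEFT JOIN Candidates c ON c.university_code = u.university_code
    WHERE c.candidate_id IS NULL
""")]
print(mailing_list)   # ['North State', 'South Tech', 'East Poly']
print(no_candidates)  # ['East Poly']
```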
EXCEPTION: Adhering to the third normal form, while theoretically
desirable, is not always practical. If you have a Customers table
and you want to eliminate all possible interfield dependencies, you
must create separate tables for cities, ZIP codes, sales
representatives, customer classes, and any other factor that may
be duplicated in multiple records. In theory, normalization is
worth pursuing; however, many small tables may degrade performance
or exceed open file and memory capacities.
It may be more feasible to apply third normal form only to data that
changes frequently. If some dependent fields remain, design your
application to require the user to verify all related fields when any
one is changed.
Other Normalization Forms
Boyce-Codd normal form (BCNF, a stricter version of third normal
form), fourth normal form, and fifth normal form do exist, but are
rarely considered in practical design. Disregarding these rules may
result in less than perfect database design, but should not affect
functionality.
**********************************
Examples of Normalized Tables
**********************************
Normalization Examples:
Unnormalized table:
Student#   Advisor   Adv-Room   Class1   Class2   Class3
--------------------------------------------------------
1022       Jones     412        101-07   143-01   159-02
4123       Smith     216        201-01   211-02   214-01
- First Normal Form: NO REPEATING GROUPS
Tables should have only two dimensions. Since one student has
several classes, these classes should be listed in a separate
table. Fields Class1, Class2, & Class3 in the above record are
indications of design trouble.
Spreadsheets often use the third dimension, but tables should not.
Another way to look at this problem: with a one-to-many
relationship, do not put the one side and the many side in the same
table. Instead, create another table in first normal form by
eliminating the repeating group (Class#), as shown below:
Student#   Advisor   Adv-Room   Class#
--------------------------------------
1022       Jones     412        101-07
1022       Jones     412        143-01
1022       Jones     412        159-02
4123       Smith     216        201-01
4123       Smith     216        211-02
4123       Smith     216        214-01
- Second Normal Form: ELIMINATE REDUNDANT DATA
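Note that Advisor and Adv-Room repeat for every Class# value of a
student in the above table. The key of this table is the combination
of Student# and Class#, but Advisor and Adv-Room depend only on
Student#, which is just part of that key. Because of this partial
dependency, the table is not in second normal form.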
The following two tables demonstrate second normal form:
Students:
Student#   Advisor   Adv-Room
-----------------------------
1022       Jones     412
4123       Smith     216
Registration:
Student#   Class#
-----------------
1022       101-07
1022       143-01
1022       159-02
4123       201-01
4123       211-02
4123       214-01
- Third Normal Form: ELIMINATE DATA NOT DEPENDENT ON KEY
In the last example, Adv-Room (the advisor's office number) is
functionally dependent on the Advisor attribute. The solution is to
move that attribute from the Students table to the Faculty table,
as shown below:
Students:
Student#   Advisor
------------------
1022       Jones
4123       Smith
Faculty:
Name    Room   Dept
-------------------
Jones   412    42
Smith   216    42
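The three steps above can be sketched end to end with Python's built-in sqlite3 module. This is an illustration, not part of the original example: StudentNo and ClassNo stand in for Student# and Class# (which are not valid unquoted SQL identifiers), and the new room number 500 is a hypothetical value.

```python
import sqlite3

# The unnormalized rows from the first example (Class1..Class3 repeat).
unnormalized = [
    (1022, "Jones", 412, "101-07", "143-01", "159-02"),
    (4123, "Smith", 216, "201-01", "211-02", "214-01"),
]

# Step 1 (1NF): one row per (student, class); the repeating group becomes rows.
first_nf = [(sid, adv, room, cls)
            for sid, adv, room, *classes in unnormalized
            for cls in classes if cls]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty (Name TEXT PRIMARY KEY, Room INTEGER, Dept INTEGER);
CREATE TABLE Students (StudentNo INTEGER PRIMARY KEY,
                       Advisor TEXT REFERENCES Faculty(Name));
CREATE TABLE Registration (StudentNo INTEGER REFERENCES Students(StudentNo),
                           ClassNo TEXT,
                           PRIMARY KEY (StudentNo, ClassNo));
""")

# Steps 2 and 3: student facts go to Students, advisor facts go to Faculty,
# and each enrollment goes to Registration. INSERT OR IGNORE collapses the
# student and advisor values that repeat in the 1NF rows.
for sid, adv, room, cls in first_nf:
    conn.execute("INSERT OR IGNORE INTO Faculty (Name, Room) VALUES (?, ?)",
                 (adv, room))
    conn.execute("INSERT OR IGNORE INTO Students VALUES (?, ?)", (sid, adv))
    conn.execute("INSERT INTO Registration VALUES (?, ?)", (sid, cls))
# Dept is not in the unnormalized data; 42 comes from the Faculty table above.
conn.execute("UPDATE Faculty SET Dept = 42")

# Moving an advisor's office is now a one-row update...
conn.execute("UPDATE Faculty SET Room = 500 WHERE Name = 'Jones'")

# ...and a join rebuilds the first-normal-form view on demand.
rebuilt = conn.execute("""
    SELECT s.StudentNo, s.Advisor, f.Room, r.ClassNo
    FROM Students s
    JOIN Faculty f      ON f.Name = s.Advisor
    JOIN Registration r ON r.StudentNo = s.StudentNo
    ORDER BY s.StudentNo, r.ClassNo
""").fetchall()
for row in rebuilt:
    print(*row)
```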
|
| What is denormalization and when would you go for it? |
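As the name indicates, denormalization is the reverse of normalization: the controlled introduction of redundancy into the database design. It can improve query performance because fewer joins are needed to retrieve the data. Consider it for read-heavy workloads, such as reporting, where a normalized design forces frequent, expensive joins, and only when you can keep the redundant copies consistent with the source data.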
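A hedged sqlite3 sketch of the trade-off: a hypothetical StudentReport table stores the advisor's room redundantly so that reports need no join, but it goes stale until the copy is refreshed. All table names and values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty  (Name TEXT PRIMARY KEY, Room INTEGER);
CREATE TABLE Students (StudentNo INTEGER PRIMARY KEY,
                       Advisor TEXT REFERENCES Faculty(Name));
INSERT INTO Faculty  VALUES ('Jones', 412), ('Smith', 216);
INSERT INTO Students VALUES (1022, 'Jones'), (4123, 'Smith');

-- Denormalized copy for read-heavy reporting: the advisor's room is stored
-- redundantly so reports can read it without a join.
CREATE TABLE StudentReport AS
    SELECT s.StudentNo, s.Advisor, f.Room AS AdvRoom
    FROM Students s JOIN Faculty f ON f.Name = s.Advisor;
""")

def report_room(student_no):
    """Read the advisor room from the denormalized table (no join)."""
    return conn.execute("SELECT AdvRoom FROM StudentReport WHERE StudentNo = ?",
                        (student_no,)).fetchone()[0]

conn.execute("UPDATE Faculty SET Room = 500 WHERE Name = 'Jones'")
stale = report_room(1022)   # still 412: the redundant copy was not updated

# The price of denormalization: changes must also be applied to the copy.
conn.execute("""
    UPDATE StudentReport
    SET AdvRoom = (SELECT Room FROM Faculty WHERE Name = StudentReport.Advisor)
""")
fresh = report_room(1022)   # now 500
print(stale, fresh)  # 412 500
```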
|
| Relationships between tables |
Relationships established at the table level take precedence over those established at the query level.
- One-to-one - though rarely used in database systems, can be a very useful way to link two tables together.
- One-to-many - is used to relate one record in a table with many records in another.
- Many-to-one (often called the lookup table relationship) - tells the database that many records in the table will be related to a single record in another table. Normally, the linking field on the many side is not that table's primary key.
- Many-to-many - Think of it generally as a pair of one-to-many relationships between two tables.
|
| How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables? |
- A one-to-one relationship is usually implemented as a single table, and only occasionally as two tables linked by a primary key/foreign key relationship in which the foreign key is also unique.
- A one-to-many relationship is implemented by splitting the data into two tables linked by a primary key/foreign key relationship, with the foreign key in the table on the many side.
- A many-to-many relationship is implemented with a junction table whose composite primary key is formed from the keys of both related tables.
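The three patterns can be sketched as a single sqlite3 schema. The tables are hypothetical; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and that the UNIQUE constraint is what turns an ordinary foreign key into a one-to-one link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- One-to-many: each order row carries the key of its one customer.
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
);

-- One-to-one split across two tables: the UNIQUE foreign key allows at most
-- one profile row per customer.
CREATE TABLE CustomerProfiles (
    customer_id INTEGER UNIQUE NOT NULL REFERENCES Customers(customer_id),
    notes       TEXT
);

-- Many-to-many: a junction table whose composite primary key is made of the
-- keys of both related tables.
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE OrderLines (
    order_id   INTEGER REFERENCES Orders(order_id),
    product_id INTEGER REFERENCES Products(product_id),
    PRIMARY KEY (order_id, product_id)
);
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO CustomerProfiles VALUES (1, 'prefers email')")

try:
    conn.execute("INSERT INTO CustomerProfiles VALUES (1, 'second profile')")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True   # UNIQUE constraint rejected the duplicate

try:
    conn.execute("INSERT INTO Orders VALUES (1, 99)")  # there is no customer 99
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True           # foreign key constraint rejected the orphan

print(one_to_one_enforced, fk_enforced)  # True True
```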
|
| The mystery of joins in Queries |
To view data in two tables, you must join them through a link, which you establish via a common field (or group of fields) between the two tables. The method of linking the tables is known as joining. When you add tables to the Query window, the database displays join lines when it finds corresponding fields in two tables in the query. A join line indicates that a relationship exists between the two tables. Within a query, you can create new joins or change an existing join line; just as there are different types of relationships, there are different types of joins.
- Inner join (the default) - Sometimes called an equi-join, this type of join returns only those records for which the same key field value exists in both tables. If a record in either table does not have a matching record in the other table, it is not included in the query's result set.
- Left outer join - With this type of join, the query returns records from the first table (the table on the left side in the Query window), even if there are no matching records in the second (right-side) table.
- Right outer join - This has the same effect as a left outer join except that it works in the opposite direction: the query returns records from the second (right-side) table, even if there are no matching records in the first table. Depending on which table you add to the query first, you might need to create either a left or right outer join.
- Self join - In rare cases it may be necessary to include the same table more than once in a query. Some tables contain "recursive" data in which a field in one record may reference another record in the same table. If you need to build a query that combines data from two records within the same table, you create a "self join" from the table to a copy of itself within the query.
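The join types above can be compared in one sqlite3 session. The Departments and Employees tables are hypothetical. Because RIGHT JOIN only arrived in SQLite 3.39.0, the sketch writes the right outer join as a left join with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employees (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT,
    dept_id    INTEGER,   -- may be NULL, i.e. no matching department
    manager_id INTEGER    -- recursive data: points at another Employees row
);
INSERT INTO Departments VALUES (1, 'Sales'), (2, 'R&D');
INSERT INTO Employees VALUES
    (10, 'Ada', 1,    NULL),
    (11, 'Bob', 1,    10),
    (12, 'Cy',  NULL, 10);
""")

# Inner join: only employees whose dept_id matches a department.
inner = conn.execute("""
    SELECT e.name, d.name FROM Employees e
    JOIN Departments d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id""").fetchall()

# Left outer join: every employee, with NULL where no department matches.
left = conn.execute("""
    SELECT e.name, d.name FROM Employees e
    LEFT JOIN Departments d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id""").fetchall()

# Right outer join of Employees to Departments, written as a left join with
# the tables swapped: every department, even one with no employees.
right_equiv = conn.execute("""
    SELECT d.name, e.name FROM Departments d
    LEFT JOIN Employees e ON e.dept_id = d.dept_id
    ORDER BY d.dept_id, e.emp_id""").fetchall()

# Self join: Employees joined to a second copy of itself (aliased m).
self_join = conn.execute("""
    SELECT e.name, m.name FROM Employees e
    JOIN Employees m ON m.emp_id = e.manager_id
    ORDER BY e.emp_id""").fetchall()

print(inner)        # [('Ada', 'Sales'), ('Bob', 'Sales')]
print(left)         # [('Ada', 'Sales'), ('Bob', 'Sales'), ('Cy', None)]
print(right_equiv)  # [('Sales', 'Ada'), ('Sales', 'Bob'), ('R&D', None)]
print(self_join)    # [('Bob', 'Ada'), ('Cy', 'Ada')]
```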
|
| What's the difference between a primary key and a unique key? |
Both a primary key and a unique key enforce uniqueness of the column(s) on which they are defined. In SQL Server, however, a primary key creates a clustered index by default (if the table does not already have one), whereas a unique key creates a nonclustered index by default. Another major difference is that a primary key doesn't allow NULLs, while a SQL Server unique key allows a single NULL (this is SQL Server behavior; some other database systems allow multiple NULLs in a unique column).
|
| What are user-defined datatypes and when should you go for them? |
User-defined datatypes let you extend the base SQL Server datatypes by giving a descriptive name and format to a base type. They are useful when the same column definition recurs across many tables and must stay consistent. For example, suppose a column called Flight_Num appears in many tables of your database and must be varchar(8) in all of them. You could create a user-defined datatype called Flight_num_type based on varchar(8) and use it across all of those tables.
|
| What is the bit datatype and what information can be stored in a bit column? |
The bit datatype is used to store Boolean information such as 1 or 0 (true or false). In SQL Server 6.5 and earlier, a bit column could hold only 1 or 0, with no support for NULL. From SQL Server 7.0 onwards, a bit column can also represent a third state, NULL.
|
| Define candidate key, alternate key and composite key. |
A candidate key is a minimal set of one or more columns that uniquely identifies each row of a table. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them is chosen as the primary key, and the rest are called alternate keys.
A key formed by combining two or more columns is called a composite key.
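The definitions above can be checked mechanically against sample rows. This pure-Python sketch finds the minimal column sets that are unique in a hypothetical Registration sample with an added surrogate RegID column. Keep in mind that sample data can only rule a key out; whether a column set really is a key is a business rule about all possible data.

```python
from itertools import combinations

def is_unique(rows, cols):
    """True if no two rows share the same values in cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, columns):
    """Minimal column sets that are unique across the given sample rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            # Minimality: skip supersets of a key already found.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_unique(rows, cols):
                keys.append(cols)
    return keys

# Hypothetical sample; RegID is an illustrative surrogate column.
registration = [
    {"RegID": 1, "StudentNo": 1022, "ClassNo": "101-07"},
    {"RegID": 2, "StudentNo": 1022, "ClassNo": "143-01"},
    {"RegID": 3, "StudentNo": 4123, "ClassNo": "101-07"},
]
keys = candidate_keys(registration, ["RegID", "StudentNo", "ClassNo"])
print(keys)  # [('RegID',), ('StudentNo', 'ClassNo')]
```

If RegID is chosen as the primary key, the composite key (StudentNo, ClassNo) becomes an alternate key.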
|
| What are defaults? Is there a column to which a default can't be bound? |
A default is the value a column receives when an INSERT supplies no value for that column. In SQL Server, IDENTITY columns and timestamp columns can't have defaults bound to them.
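A quick sqlite3 sketch of column defaults. It uses a column DEFAULT clause; SQL Server also supports this style, alongside separately bound default objects, and the IDENTITY/timestamp restriction mentioned above is SQL Server behavior that this SQLite example does not reproduce. The Orders table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        status   TEXT NOT NULL DEFAULT 'pending',
        quantity INTEGER DEFAULT 1
    )""")
# No value supplied for status or quantity, so the defaults are used.
conn.execute("INSERT INTO Orders (order_id) VALUES (1)")
# Explicit values override the defaults.
conn.execute(
    "INSERT INTO Orders (order_id, status, quantity) VALUES (2, 'shipped', 5)")
rows = conn.execute("SELECT * FROM Orders ORDER BY order_id").fetchall()
print(rows)  # [(1, 'pending', 1), (2, 'shipped', 5)]
```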