Data referential integrity is often not well understood in database design. When integrity is not properly applied, it can cause a lot of inefficiency to data retrieval. Eventually the table data needs to be fixed. Duplicate records must be located manually by using queries, and removed. “Orphan” records (due to deleted records) must be repaired.
All this effort requires time and energy, and costs money to fix. It is much more cost-effective to properly plan the design and implementation of the database in the first place.
This article reviews table relationships, focuses on correct table linking and identifies the consequences for failing to do so.
Implement in Correct Sequence
A database design to correctly plan and implement table relationships with referential integrity has previously been described by this author. The design begins with identifying entities from business data; tables are designed following conventions such as normalization, and then linked with each other. Only then are tables populated with data.
Table Relationship Types
In theory, there are three types of relationships:
- Many-to many
In practice, the many-to-many relationship breaks down into a number of one-to-many relationships. The one-to-one relationship is not useful and is usually absorbed into another table. Therefore, we are left with just the one-to-many relationship to contemplate.
Fortunately, this table arrangement is easy to conceptualize. Figure 1 shows a simple one-to-many example that anyone can relate to. Using the Access database, a table named tblSong is used to record unique songs (i.e. one), while songs that are being played one or more times (i.e. many) are recorded in the linked table named tblSongPlayed.
Enabling Referential Integrity
The link in Figure 1 is achieved by dragging the primary key SongID onto the corresponding “foreign key” in tblSongPlayed. This causes a panel to show giving the option to check the “Enforce Referential integrity” checkbox.
If this is not checked, the link is still made but a thin line is shown. If checked, a heavy line results (see Figure 1). If there is existing data already in the tables, Access checks that there are matching IDs in both tables, otherwise a dialog message will be returned advising of the inability to make the connection with referential integrity.
How to “Clean up” Data
When data exists in tables and a sound link cannot be made, the data has to be corrected. There are often two types of problems to consider:
- Duplicate records have been allowed
- “Orphan” records exist due to deleted records
A query needs to be run to check for and locate duplicate records. When duplicates are found, they need to be removed manually or with a delete query. Careful judgement must be exercised and decisions to remove records made. The degree of difficulty for these tasks depend on
- complexity of the query (number of tables involved)
- number of records in participating tables
Similarly a check must be made to locate orphan records. This requires a different query to be run. It should be obvious by now that it is advisable to start any new database by leaving tables empty, then connecting tables with referential integrity, then lastly, populating the tables.
Summary and Conclusion
In an Access database, the heavy line between two tables indicates that referential integrity is in place. This is achieved by clicking the appropriate checkbox in the panel that opens up when the link is made.
Using a query, one may detect the soundness of such table connections, by checking whether the line is a thin or heavy line. In the case of an unsound link, the data in tables should be “cleaned up”, and tables connected with proper referential integrity. The latter prevents data corruption in the database, ensuring no attendant, costly exercise to clean up data. Therefore, it makes sense to implement an initial database that has proper data integrity before any of the tables are populated.