SQL: How to Structure a Database

This tutorial will use MySQL; but most of the concepts we will cover apply more generally to most SQL databases, sometimes with slightly different syntax.

The Scenario

We are going to design a database for image galleries. There will be four tables, galleries, gallery_types, gallery_images, and gallery_gallery_images. (You can probably guess what they're for.)

Here's our schema:

CREATE TABLE `gallery_types` (
  `gallery_type_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `gallery_type_title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`gallery_type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `galleries` (
  `gallery_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `gallery_type_id` int(11) unsigned NOT NULL,
  `gallery_title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`gallery_id`),
  KEY `gallery_type_id` (`gallery_type_id`),
  CONSTRAINT `galleries_gallery_types`
    FOREIGN KEY (`gallery_type_id`)
    REFERENCES `gallery_types` (`gallery_type_id`)
    ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `gallery_images` (
  `gallery_image_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `gallery_image_title` varchar(255) DEFAULT NULL,
  `gallery_image_file` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`gallery_image_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `gallery_gallery_images` (
  `gallery_gallery_image_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `gallery_id` int(11) unsigned NOT NULL,
  `gallery_image_id` int(11) unsigned NOT NULL,
  `gallery_gallery_image_index` int(11) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`gallery_gallery_image_id`),
  UNIQUE KEY `gallery_id` (`gallery_id`,`gallery_image_id`),
  KEY `gallery_image_id` (`gallery_image_id`),
  CONSTRAINT `gallery_gallery_images_gallery_images`
    FOREIGN KEY (`gallery_image_id`)
    REFERENCES `gallery_images` (`gallery_image_id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `gallery_gallery_images_galleries`
    FOREIGN KEY (`gallery_id`)
    REFERENCES `galleries` (`gallery_id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Overview

You will first notice that the database is pretty well-"normalized". To put it this way is a bit reductive, but that basically means that we are not duplicating fields between tables and we have tried to remove redundancy where possible.

For example, instead of storing the gallery_type as repeated values of an ENUM or VARCHAR column on the galleries table, we have instead given them their own table. That way, when we update the gallery_type_title of one gallery_type, we update it once on the gallery_types table and don't need to worry about what its value is on any related tables. In application code, it often much more convenient to query the possible values for a gallery_type from its own table.

The galleries table represents each individual gallery. Each gallery is assigned to a gallery_type through the field gallery_type_id.

The gallery_images table represents the images that may appear in each gallery. A single image may be assigned to multiple galleries through the gallery_gallery_images table. The gallery_gallery_image_index determines the order that image should appear in the gallery.

A table name like gallery_gallery_images might seem odd and ugly, but I find it's best to follow your conventions strictly. If you start making exceptions because something is "ugly", when you go back to it in the future (or even in a few days) you will not remember what you considered ugly at the time or how you "solved" it. Every exception is something more to remember. Best to always stay consistent (even at the expense of typing).

Naming Conventions

In this tutorial, we've used all lower-cased names, with words separated by underscores.

All table names are be plural, and all fields within a table will are prefixed by the singular name of the table. I prefer this convention; because as long as you aren't joining the same table twice in a query, there are unique names for each field on each table, even if they are both a "title", "index", etc.

PRIMARY KEYs will be the singular name of the table suffixed by _id. Every table will having an AUTO_INCREMENT-ing PRIMARY KEY.

FOREIGN KEYs will match the name of the PRIMARY KEY on their related tables.

For tables whose items are children of another table, we prefix the name of that table with the singular version of the parent table name.

For relational tables, we prefix with the singular table name of one followed by the plural version of the other table. These tables should contain FOREIGN KEYs with columns that match the names of the PRIMARY KEYs of their related tables.

Try to keep naming conventions for similar items between tables. For example, I almost always use "tablename_title" instead of "tablename_name", and I always use "tablename_index" instead of "tablename_position" or "tablename_order".

I also try to abbreviate as little as possible, though I do make exceptions for common abbreviations. For example, I may be willing to abbreviate "tablename_date_of_birth" as "tablename_dob", but I'd never abbreviate something to a few letters I made up. (You've seen this before... things like "job" as "jb", "distributor" as "distrib", etc.)

Naming conventions are not universal (particularly the letter-case of the identifiers, which I have used all lowercase to eliminate case-insensitivity issues when migrating between platforms on MySQL). However, I have found that these conventions work very well.

The Data

INSERT INTO `gallery_types` (`gallery_type_id`, `gallery_type_title`)
VALUES
	(1, 'Image Gallery'),
	(2, 'Slideshow');

INSERT INTO `galleries` (`gallery_id`, `gallery_type_id`, `gallery_title`)
VALUES
	(1, 1, 'Image Gallery 1'),
	(2, 1, 'Image Gallery 3'),
	(3, 2, 'Slideshow 1');

INSERT INTO `gallery_images` (`gallery_image_id`, `gallery_image_title`, `gallery_image_file`)
VALUES
	(1, 'Image 1', 'image1.jpg'),
	(2, 'Image 2', 'image2.jpg'),
	(3, 'Image 3', 'image3.jpg'),
	(4, 'Image 4', 'image4.jpg'),
	(5, 'Image 5', 'image5.jpg');

INSERT INTO `gallery_gallery_images` (`gallery_gallery_image_id`, `gallery_id`, `gallery_image_id`, `gallery_gallery_image_index`)
VALUES
	(2,  1, 1, 0),
	(3,  1, 2, 1),
	(4,  1, 5, 2),
	(5,  2, 2, 0),
	(6,  2, 3, 1),
	(7,  2, 4, 2),
	(8,  3, 1, 0),
	(9,  3, 3, 1),
	(10, 3, 4, 2);

Constraints

For those unfamiliar with CONSTRAINTs, they restrict the value of a field in the database, and in some cases can trigger other actions in response to INSERT, UPDATE, and DELETE queries. The syntax and semantics of how these work varies wildly between different SQL databases, so you will want to read up more on this if you use some dialect of SQL other than MySQL.

This schema only uses FOREIGN KEY CONSTRAINTs. These are the most common that you will use. However there are other types of constraints which vary by the dialect of SQL, and many even allow you to write your own custom constraints.

FOREIGN KEY CONSTRAINTs mark that a column on one table must match the value in a column on another table. It states that two tables are related through particular columns.

Let take a look at them:

galleries.galleries_gallery_types

CONSTRAINT `galleries_gallery_types`
  FOREIGN KEY (`gallery_type_id`)
  REFERENCES `gallery_types` (`gallery_type_id`)
  ON UPDATE CASCADE

This CONSTRAINT relates galleries to gallery_types through the field galleries.gallery_type_id. This constraint is set to CASCADE on UPDATEs. That means if I update the gallery_type_id on the gallery_types table, that change will CASCADE to all related rows on the galleries table.

Try this:

UPDATE gallery_types
SET gallery_type_id = 3
WHERE gallery_type_id = 1;

You will now notice that "magically", the galleries.gallery_type_id has also updated to 3 where it used to be 1. Great, but notice we didn't specify ON DELETE CASCADE on this CONSTRAINT. Let's try this:

DELETE FROM gallery_types
WHERE gallery_type_id = 2;

You will receive an error message similar to the following:

Cannot delete or update a parent row: a foreign key constraint fails (`test`.`galleries`, CONSTRAINT `galleries_gallery_types` FOREIGN KEY (`gallery_type_id`) REFERENCES `gallery_types` (`gallery_type_id`) ON UPDATE CASCADE)

We tried to delete a gallery_type that still had galleries assigned to it. But we don't want that to happen, because we will end up with "orphaned" galleries which would reference a type that no longer exists. This is a good error. We don't want to accidentally DELETE a gallery_type, then have that DELETE CASCADE to the galleries table. We could end up DELETE-ing galleries unintentionally. (In some cases, we DO want DELETEs to CASCADE to a related table. There will be an example of this following.)

If we want to DELETE a gallery_type, we first need to delete or reassign all of the galleries of that gallery_type.

Let's say we want to get rid of the gallery_type, and all it's associated galleries. This will work:

DELETE FROM galleries
WHERE gallery_type_id = 2;

DELETE FROM gallery_types
WHERE gallery_type_id = 2;

gallery_gallery_images.gallery_gallery_images_gallery_images / gallery_gallery_images.gallery_gallery_images_galleries

CONSTRAINT `gallery_gallery_images_gallery_images`
  FOREIGN KEY (`gallery_image_id`)
  REFERENCES `gallery_images` (`gallery_image_id`)
  ON DELETE CASCADE
  ON UPDATE CASCADE,

CONSTRAINT `gallery_gallery_images_galleries`
  FOREIGN KEY (`gallery_id`)
  REFERENCES `galleries` (`gallery_id`)
  ON DELETE CASCADE
  ON UPDATE CASCADE

In some cases, we want both our DELETEs and UPDATEs to CASCADE to a related table. This happens often for relational tables, such as gallery_gallery_images. When we delete either a gallery or a gallery_image, we want their related record to be removed from the gallery_gallery_images table. Since they are related through another table, we don't have to worry about DELETE-ing a gallery_image unintentionally DELETE-ing all of its related galleries, or vice-versa.

Try this:

DELETE FROM gallery_images
WHERE gallery_image_id = 2;

Now if you look at the gallery_gallery_images table, all records that used to have a gallery_image_id of 2 have automatically been DELETE-ed.

Notes:

FOREIGN KEY CONSTRAINTs can be NULL-able. That means you can choose not to assign columns to the related table, and set them to NULL instead. Oftentimes you will see these as 0's in databases with no FOREIGN KEY CONSTRAINTs. That will not work with the CONSTRAINT because 0 doesn't refer to a real row on the other table.

Troubleshooting Constraints:

Some people have trouble with relations because they can cause errors at seemingly unexpected times, and usually have terrible, unhelpful error messages. But don't give up when this happens. Here are a few common scenarios that cause these errors and how to fix them.

Sometimes you will get an error when defining a new relation, you will receive an error that is caused by a mismatch of types between two columns you are trying to set up a FOREIGN KEY CONSTRAINT for. Here's a few things to look at:

Are they both the same type and size? Are you trying to relate an INT to a VARCHAR, or something similar?

If it is an INT type, are is one of them signed and the other is unsigned? As a rule of thumb, their types must match exactly.

Other times you will get errors when you try to define CONSTRAINTs on columns that already would fail the constraint. Are you trying to create a relation between two tables with data already in them which would fail if you were inserting it into the table with the CONSTRAINT?

You also need to take care when trying to clear data out from multiple related tables. The order you DELETE from the tables matters, because the CONSTRAINTs must always hold (so long as FOREIGN_KEY_CHECKS = 1 in MySQL). Otherwise, you'll get an error. (Again, a good error.)

Indexes

Indexes are used to tell the database to cache information about the location of various values in the database, which can greatly improve performance if they are used correctly. I'll describe briefly how these work and when to use them, but I encourage you to look further into them if you are interested in SQL optimization.

Essentially, they store information in hidden tables, ordered by the fields that you define for your index (the important word here is ORDERED). You can have indexes on more than one column, which we will see below. Indexes allow the database engine to find rows based on the value of a particular column or columns basically through a binary search algorithm (which performs in O(log n) time). In other words, the performance benefits of this increase exponentially with the number of rows on the table.

There are three common types of index used: PRIMARY KEY, KEY and UNIQUE KEY.

Only one PRIMARY KEY can be defined per table. It is a unique value that is the absolute identifier of a given row in a table.

A KEY can be assigned on any field that isn't unique, but is often used in JOIN, WHERE, or HAVING conditions and sometimes in ORDER BY clauses. As a general rule, every column that holds a FOREIGN KEY should definitely have an index.

A UNIQUE KEY must be unique in the table, meaning it can appear only once. For example you would want a UNIQUE KEY on the username field on a users table, because you don't want more than one person to share the same username.

If you try to INSERT or UPDATE a row with the same value occurring multiple times, the database engine will give you an error. Again, this is a good error. We do not want to ruin the integrity of our data. I've run into databases without proper UNIQUE keys, and you can run into all sorts of problems. What does it mean for more than one user to have the same (supposedly unique) username? When you try to authenticate them, which row should you check if their password matches?

Sometime you define a KEY on more than one field. For example, we defined a UNIQUE KEY 'gallery_id' on the columns gallery_gallery_images.gallery_id, gallery_gallery_images.gallery_image_id. We do not want the same gallery_image assigned to a gallery more than once.

The order here is important. The index table will be store ordered by the first column, then the second, etc. So there is no need for another KEY that is only defined on gallery_gallery_images.gallery_id. The database engine is smart enough to figure out it can use the KEY because the first column is ordered exactly the same as if it was defined on just the gallery_id column.

In MySQL there are some other features that utilize UNIQUE keys for their behavior, such as INSERT IGNORE, and UPDATE ... ON DUPLICATE KEY. It's a good idea to look into these further, because they do some in handy.

Conclusion

If you made it this far, thanks for reading! I will likely follow this article with a tutorial in more advanced SQL topics, such as optimization.

Grid Image