Database Style Guide

Defer to the PostgreSQL wiki about what not to do.

The PostgreSQL Wiki has a handy “Don’t Do This” page. While there are exceptions when most of these discouraged practices may be safe or even necessary, we should avoid them unless we’re sure we’re the exception.

Enforce data integrity with unique indexes and not-null constraints

Bugs resulting from invalid data can be tricky to detect and it can be time-consuming to clean up—best to prevent the possibility of it in the first place. PostgreSQL gives us a range of tools to ensure invalid data isn’t just unlikely but impossible:

  • Use UNIQUE CONSTRAINTs/INDEXes (unique: true in migrations) to prohibit “duplicate” data. For instance, if it doesn’t make sense for a Lesson to be tagged with “Sales” twice, we want a unique composite index on the tag and taggable.
  • Use NOT NULL (null: false in migrations) to guarantee necessary relationships (e.g. make company_id non-nullable on tables that must be associated with a company) or attributes (if a title is required, make it NOT NULL).
  • However, avoid using FOREIGN KEY constraints, as we’ve run into performance issues deleting from large tables like users that have very many of them. If there is an overriding concern about data fidelity to justify them, be mindful of performance when adding foreign key constraints: see this migration for an illustration of the challenges and a way around them. Rails’ dependent: :destroy and dependent: :nullify callbacks do not contribute to database-level slowness with deleting records, so prefer them to foreign-key constraints for maintaining data fidelity.

Avoid nullable booleans.

We expect boolean values typically to be either true or false. Make them NOT NULL with a default value when we don’t intend to make use of the default third null state.

Use transactions only when necessary.

Transactions are sometimes essential for ensuring data integrity, but when we include operations within the transaction that don’t need to be atomic, it makes transactions take longer and increases the likelihood of locks, deadlocks, and other performance issues resulting from slow queries (example).

Identify all rows with a required company_uuid column on all tables

In code and reporting, we often need to know the company/tenant to whom data belongs. This can be difficult (and slow) when we need to join on a series of tables to find it. And it can be ambiguous when there are multiple pathways to a company which should in theory agree but might not. Denormalizing a reference to the company’s UUID on to all tables will help in both cases. Do not add a forign-key constraint, however, as too many of those will slow down writes to the companies table. The supporting Rails associations will require two extra options:

# app/models/widget.rb
belongs_to :company, primary_key: :uuid, foreign_key: :company_uuid

# app/models/company.rb
has_many :widgets, primary_key: :uuid, foreign_key: :company_uuid

The timestamps at the bottom of the structure.sql file are in chronological order

Rails generates structure.sql automatically based on the state of the database when it runs migrations, adding new migration timestamps in chronological order. However, if we hand edit the file due to merge conflicts the timestamps can get out of order.

Keeping the timestamps in chronological order makes it easier to see what migration is included without searching for it (ie. if we know the date we can easily see if the migration is there if it is chronological order).

Use UUID or BIGINT for new primary and foreign keys.

INTEGER-type columns support a maximum value of approximately 2 billion, which is plausible for us to exhaust. For this reason, BIGINT (which maxes out at 9 quintillion) has been the default for primary keys since Rails 5.1 and should be used instead of INTEGER. While foreign key columns (e.g. foos.bar_id) are typically required to match the type of the primary keys they reference (e.g., a BIGINT foreign key can reference an INTEGER primary key, so always use BIGINT for numeric foreign keys. While we’ve also begun using UUID-type primary keys, we don’t currently have a recommendation for UUID vs. BIGINT.