June 30, 2021
At Channable we store most of our data in a relational database. Relational databases are an amazing tool. Not only do they reliably store data, they also provide efficient access to the stored data, and they make sure that the data adheres to predefined invariants. The data format, efficient access, and invariants are configured in the database schema in the form of tables, indexes, and constraints respectively. Configuring these properly can be tricky, and from time to time we encounter performance or correctness problems related to the schema design. In this blog post we’ll discuss how we used our new linting tool, dbcritic, to make it less likely that we reintroduce these problems.
A problem that we encountered several times was delete statements taking a very long time to complete. A delete statement deletes rows from a table that match a certain predicate. Normally, the time it takes for a delete statement to run scales linearly with the number of rows being deleted. But this did not seem to be the case for us; it was much slower than expected.
One relational database feature we use is foreign key constraints. Like other types of constraints, foreign key constraints help us prevent bad data from ending up in the database. For instance, in the following diagram there are two tables, ads and ad_statuses, along with a single foreign key constraint highlighted in blue.
The foreign key constraint links the id column in the ads table to the ad_id column in the ad_statuses table, making sure that for every row in the ad_statuses table, there exists a corresponding row in the ads table. In other words, there are no ad statuses that belong to a non-existent ad. The hypothetical ad_statuses row with ad_id = 4, highlighted in red in the diagram, would violate this constraint; therefore it could never exist while the foreign key constraint is in place.
There are various ways a foreign key constraint could be violated, but the one we are interested in is the following: trying to delete a row from the ads table whose id value still occurs in the ad_id column of the ad_statuses table (i.e. an ad for which ad statuses still exist). When that happens, the database refuses to delete the row and reports an error instead.
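This behavior is easy to see in any database that enforces foreign keys. Below is a minimal sketch of the ads / ad_statuses schema from the diagram, using SQLite via Python's sqlite3 module for illustration (the blog's database is Postgres; the column types and sample data here are made up):

```python
import sqlite3

# Illustrative schema: an ads table referenced by an ad_statuses table,
# mirroring the diagram (sample rows are invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE ad_statuses (
        ad_id  INTEGER NOT NULL REFERENCES ads (id),
        status TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO ads VALUES (1, 'summer sale'), (2, 'clearance')")
conn.execute("INSERT INTO ad_statuses VALUES (1, 'active')")

# Ad 1 still has a status row, so deleting it would violate the constraint:
# the database refuses the delete and reports an error instead.
try:
    conn.execute("DELETE FROM ads WHERE id = 1")
except sqlite3.IntegrityError as exc:
    print("delete refused:", exc)

# Ad 2 is not referenced by any status row, so deleting it succeeds.
conn.execute("DELETE FROM ads WHERE id = 2")
```

The delete of ad 1 fails with an integrity error, while the delete of the unreferenced ad 2 goes through.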
The way the database makes sure that no foreign key constraints are ever violated
is by performing a check every time rows are inserted, updated, or deleted.
These checks are not free: the database will actually go and look up the row in the table. This lookup essentially works the same way a SELECT statement would. For instance, if you were to delete the row with id = 42 from the ads table, the database will check that the foreign key constraint isn’t violated using a query like this:

SELECT FROM ad_statuses WHERE ad_id = 42

If this query returns any rows, then the foreign key constraint would be violated.
The way the lookup is performed in the most general case is that the database looks at each row in the ad_statuses table, in turn, until it has found a row that matches the criteria. The time this takes is linear in the number of rows, and the lookup is performed for each row being deleted. So if you delete n rows from a table involved in a foreign key constraint, the database performs n checks to make sure that the rows being deleted are not still referenced by any rows in the other table. If the table at the other end of the foreign key constraint contains m rows, we have an overall O(m × n) operation, which is very slow if there are many rows in either table.
An easy “solution” to this problem would be to drop the foreign key constraints. When the foreign key constraints are removed, the database will cease to perform the checks, and the deletes will be fast once again. However, we find that foreign key constraints are invaluable: without them, it is just too easy to accidentally introduce data inconsistencies, which must later be detected and resolved manually, and whose cause (some bug in the code base) is not immediately apparent. With the foreign key constraints in place, any buggy code that would cause a violation crashes and logs a neat stack trace that makes it easy to debug and resolve.
If only there were a way to make these checks run in better than linear time. If a foreign key constraint could be checked in just logarithmic time, the overall complexity of a batch delete would be O(n × log(m))! Logarithmic time algorithms are faster than linear time algorithms for sufficiently large numbers of rows (and for insufficiently large numbers of rows, the running time is low either way, so it doesn’t matter).
Luckily, it is very easy to make the database perform lookups in logarithmic time. All you have to do is add a suitable index to the database schema. An index is a tree structure that organizes rows by the value of a certain column. A suitable index for the above example would be one on the ad_id column of the ad_statuses table.
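The effect of such an index is visible in the query planner. Here is a sketch using SQLite's EXPLAIN QUERY PLAN via Python (Postgres has EXPLAIN for the same purpose); the index name ix_ad_statuses_ad_id is made up for this example:

```python
import sqlite3

# Recreate the two tables (types simplified), then compare query plans for
# the foreign-key-style lookup before and after adding the index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE ad_statuses (ad_id INTEGER REFERENCES ads (id), status TEXT)"
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable plan text in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index on ad_id, the lookup is a linear scan over the table.
print(plan("SELECT * FROM ad_statuses WHERE ad_id = 42"))
# e.g. "SCAN ad_statuses"

# With the index, the same lookup becomes a logarithmic tree search.
conn.execute("CREATE INDEX ix_ad_statuses_ad_id ON ad_statuses (ad_id)")
print(plan("SELECT * FROM ad_statuses WHERE ad_id = 42"))
# e.g. "SEARCH ad_statuses USING INDEX ix_ad_statuses_ad_id (ad_id=?)"
```

The first plan reports a scan of the whole table; the second reports a search using the new index, which is exactly the speedup the foreign key checks need.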
Remembering to create an index every time you create a foreign key constraint is just as error-prone as remembering not to introduce data inconsistencies. Our approach to preventing data inconsistencies was automated checks, so we took the same approach to preventing missing indexes: automated checks. The dbcritic tool we wrote looks for missing indexes on columns involved in foreign key constraints. If they are missing, and dbcritic is not instructed to ignore them, it will report warnings.
Postgres has two types that seem very similar: timestamp and timestamptz. The timestamp type is a simple pair of a date and a time of day. The timestamptz type represents moments in time. These are not the same thing: a date–time pair may represent different moments in time depending on the time zone in effect, and in time zones with DST changes it may not exist at all or may be ambiguous. Because in the vast majority of our use cases we want to store specific, unambiguous moments in time, we pretty much always want to use the timestamptz type. However, the name of the timestamp type is the “obvious” one; it is easy to forget that it’s not what we want to use. dbcritic will therefore complain if it finds any timestamp columns, and tell the user to use timestamptz instead.
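To illustrate the ambiguity, here is a small Python sketch; Europe/Amsterdam is picked purely as an example zone, and 2021-10-31 is the night the EU switched from summer time (CEST, UTC+2) back to standard time (CET, UTC+1):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A naive datetime is just a (date, time) pair, like Postgres's `timestamp`.
# When clocks fell back on 2021-10-31, the wall-clock time 02:30 occurred
# twice in Europe/Amsterdam, so the pair alone does not pin down a moment.
ams = ZoneInfo("Europe/Amsterdam")
wall = datetime(2021, 10, 31, 2, 30)  # no zone attached: ambiguous

first = wall.replace(tzinfo=ams, fold=0)   # the earlier 02:30 (still CEST)
second = wall.replace(tzinfo=ams, fold=1)  # the later 02:30 (already CET)

print(first.astimezone(timezone.utc))   # 2021-10-31 00:30:00+00:00
print(second.astimezone(timezone.utc))  # 2021-10-31 01:30:00+00:00
```

The same (date, time) pair maps to two different moments in time, an hour apart. A timestamptz column sidesteps this by storing the unambiguous instant itself.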
dbcritic is not useful if it is not actually run, so we run it as part of our automated CI pipeline. This way, any PR that causes a violation of dbcritic’s checks cannot be merged, and the reported issues must be resolved. This prevents them from happening in production, and increases confidence when deploying. 🥳 Each issue generated by dbcritic provides a list of possible problems caused by the violation, as well as a list of possible ways to resolve the issue:
ISSUE #1: The constraint ‘cache_log.cache_log_job_id_fkey’ is a foreign key constraint without an index on its referencing side.

Until you solve this issue, you will encounter the following problems:
 • Updating rows in the referenced table causes a sequential scan of the referencing table.
 • Deleting rows in the referenced table causes a sequential scan of the referencing table.

You can solve this issue by taking one of the following measures:
 • Create an index on ‘cache_log (job_id)’.
 • Silence this issue: ‘silence index_fk_ref cache_log cache_log_job_id_fkey’.
 • Silence this kind of issue: ‘silence index_fk_ref’.
Ever since we introduced this CI step, we have not encountered any slow deletes!
Sometimes you want to deviate from dbcritic’s advice. For example, maybe you never do updates or deletes on a specific table, and therefore you do not need the index to make updates and deletes faster. Or maybe you really want to represent timestamps as mere (date, time) pairs. In those cases dbcritic can be told to silence specific issues using a configuration file.
dbcritic does not perform a whole lot of checks right now, but it is pretty easy to add more checks. A check is simply a SQL query that consults the pg_catalog schema, and generates the issues with their list of problems and solutions.
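As an illustration of the kind of query such a check could be, here is a hypothetical sketch against pg_catalog (this is not dbcritic's actual query, and it is simplified: it only checks that some index contains the referencing columns, whereas a real check would require them to be a leading prefix of the index):

```sql
-- Hypothetical sketch: foreign key constraints on tables that have no index
-- covering the referencing columns.
SELECT con.conrelid::regclass AS referencing_table,
       con.conname            AS constraint_name
FROM pg_catalog.pg_constraint AS con
WHERE con.contype = 'f'  -- foreign key constraints only
  AND NOT EXISTS (
        SELECT 1
        FROM pg_catalog.pg_index AS idx
        WHERE idx.indrelid = con.conrelid          -- index on the same table
          AND con.conkey <@ idx.indkey::int2[]     -- FK columns appear in it
      );
```

Each row returned would become one issue, to which the check attaches its list of problems and suggested measures.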
dbcritic came to life as a Channable Hackathon project. dbcritic is written in Idris, a functional programming language, to get an idea of what the language and tooling are like. It is quite similar to Haskell, which we use for some of our larger projects, but it has some neat extra features, such as the early returns in do expressions, and of course dependent types (which we didn’t really use in dbcritic).