Clean Up WordPress Revisions Using a MySQL Multi-Table Delete

I’ve been doing some research on the revisions WordPress creates automatically whenever you edit or update a post. The revision feature is sparsely documented in the WordPress Codex, so I was pleased to find an article from SitePoint on the subject: How to Control Post Revisions in WordPress. It’s an excellent write-up overall, and I recommend reading it. However, I did notice a flaw in the SQL statement given for deleting old revisions. Since this flaw could potentially cause unintended data loss, and since comments on the article are closed, I wanted to explain the problem here.

In order to thoroughly remove all traces of old revisions from the WordPress database, it is necessary to clean up the wp_term_relationships table. This is because WordPress prior to version 3.0 stores the category relationships for revisions in that table. The SQL statement given in the SitePoint article to purge revision data is as follows:

DELETE a,b,c
FROM wp_posts a
LEFT JOIN wp_term_relationships b ON (a.ID = b.object_id)
LEFT JOIN wp_postmeta c ON (a.ID = c.post_id)
WHERE a.post_type = ‘revision’;

Notice that the SQL statement is joining the wp_posts and wp_term_relationships tables, and then performing a multiple table delete to remove all rows from both tables where post_type equals ‘revision’ (and from the wp_postmeta table as well). It’s an elegant statement designed to clean up the entire database in one step.

Here’s the flaw: there are two types of category relationships stored in wp_term_relationshps — post categories, and link categories. The object_id field can contain either an ID from the wp_posts table, or a link_id from the wp_links table. It is entirely possible that a revision and a link could have the same object_id, as was in fact the case in my own WordPress database:

| object_id | term_taxonomy_id | term_order |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 9 | 1 | 0 | <-- same object_id
| 9 | 2 | 0 | <--
| 10 | 1 | 0 | <-- same object_id
| 10 | 2 | 0 | <--

When I ran the above SQL statement on my database (after having made a backup, of course), it deleted the category relationships for both revisions and links, in all cases where they happened to have the same object_id. Oops.

How can this be? It is important to understand that wp_term_relationships uses a composite primary key. If we run ‘describe’ on that table, we can see that both the object_id and term_taxonomy_id fields are marked ‘PRI’:

| Field | Type | Null | Key | Default |
| object_id | bigint(20) unsigned | NO | PRI | 0 |
| term_taxonomy_id | bigint(20) unsigned | NO | PRI | 0 |
| term_order | int(11) | NO | | 0 |

The wp_term_relationships table sits between wp_posts and wp_term_taxonomy, and defines the many-to-many relationships between them. This means that, in order to select a unique row from that table, we must join it with both wp_posts and wp_term_taxonomy, which stores the distinction between post categories and link categories in the ‘taxonomy’ field:

| term_taxonomy_id | term_id | taxonomy | . . .
| 1 | 1 | category | . . .
| 2 | 2 | link_category | . . .

Armed with this knowledge, we can write a revised SQL statement that removes all revisions, without the potential for accidentally deleting link relationships. When I ran the revised query on my database, it worked exactly as intended:

DELETE a,b,c
FROM wp_posts a
LEFT JOIN wp_term_relationships b ON ( a.ID = b.object_id)
LEFT JOIN wp_postmeta c ON ( a.ID = c.post_id )
LEFT JOIN wp_term_taxonomy d ON ( b.term_taxonomy_id = d.term_taxonomy_id)
WHERE a.post_type = 'revision'
AND d.taxonomy != 'link_category';

Granted, this is not a catastrophic flaw; many people who use WordPress as a CMS don’t even use the wp_links table, and for those who do, any damage done would be relatively easy to fix. However, since it did affect me personally, and since it is a interesting case study in database design and the use of composite keys, I felt it was worth pointing out.

Addendum

My thanks to Craig Buckler, author of the SitePoint article, for graciously responding to my message concerning the flaw, and posting a correction.

More From the Blog

Comments

  1. Gravity says:

    Thank you for posting this. Exactly what I was googling for.

  2. Terescia says:

    Thank you for writing this up. I actually landed on the Sitepoint article but followed the link here. I had used a similar query in the past and had this exact same thing happen to me. Several of my categories disappeared from a group of my links and I had no idea this was the cause. I’ve saved this bit of code and will be using it in the future instead. Thanks again!

  3. nemnhe says:

    Error
    SQL query:
    DELETE a,
    b,
    c FROM dod_posts a LEFT JOIN dod_term_relationships b ON ( a.ID = b.object_id ) LEFT JOIN dod_postmeta c ON ( a.ID = c.post_id ) WHERE a.post_type = ‘revision’;
    MySQL said: Documentation
    #1054 – Unknown column ‘‘revision’’ in ‘where clause’

    and
    result for:
    DELETE a,
    b,c FROM dod_posts a LEFT JOIN dod_term_relationships b ON ( a.ID = b.object_id ) LEFT JOIN dod_postmeta c ON ( a.ID = c.post_id ) LEFT JOIN dod_term_taxonomy d ON ( b.term_taxonomy_id = d.term_taxonomy_id ) WHERE a.post_type = ‘revision’ AND d.taxonomy != ‘link_category’;
    is “Deleted rows: 0 (Query took 0.0128 sec)”
    (my prefix: dod_)
    Please help..
    What about export the SQL (exclude revision) then import ?
    then using bigdump.php for large database.

  4. nemnhe says:

    It works now. it should be: ‘revision’, not ‘revision’.
    result: “Deleted rows: 19 (Query took 0.0158 sec)”
    I use SQL statement given in the SitePoint.
    the second statement result error. My WP version 3.2.1
    then i add:
    define(‘WP_POST_REVISIONS’, false); // Disable Revision
    on my wp-config.php or
    define(‘AUTOSAVE_INTERVAL’, 120 ); // Default value is 60 seconds.
    define(’WP_POST_REVISIONS’, 5); // Number of revisions to save.
    thanks :)

  5. Chris says:

    What do you do if you followed another tutorial that simply deleted WHERE post_type = ‘revision’?

    Now there is no reference :/ doh.

  6. ambrosite says:

    @Chris There is a thread here discussing methods for removing orphaned records from wp_term_relationships:

    http://wordpress.org/support/topic/plugin-wp-optimize-wp_term_relationships?replies=10

  7. Jose Morales says:

    WOW the script is GREAT! if you have a site with 250k articles or more it will clean it. I had 500k+ in wp_postmeta among and it left 1700 and the site is working 100%. I will run this query on all my WP sites…

  8. Andi says:

    DELETE a,b,c
    FROM wp_posts a
    LEFT JOIN wp_term_relationships b ON ( a.ID = b.object_id)
    LEFT JOIN wp_postmeta c ON ( a.ID = c.post_id )
    LEFT JOIN wp_term_taxonomy d ON ( b.term_taxonomy_id = d.term_taxonomy_id)
    WHERE a.post_type = ‘revision’
    AND d.taxonomy != ‘link_category’;

    DOESN\T WORK. It says 0 rows affected.

  9. ambrosite says:

    @Andi If it says “0 rows affected”, that means you did not have any data in your tables that needed to be cleaned up.

Comments are closed.