Use Joins to Detect Orphaned Data in Postgres

Brett Cassette
InstructorBrett Cassette
Share this video with your friends

Social Share Links

Send Tweet
Published 8 years ago
Updated 6 years ago

Unfortunately, sometimes we don’t add the right constraints to our data. A user gets deleted and old data doesn’t get cleaned up. Primary information is deleted, but its audit trail isn’t. With left and right joins, we can flag anomalous data for deletion.

[00:01] You can select * from bid histories and do a left join on bids.ID with bid histories.bidID.

[00:15] We can see that there are nulls in this table. Those are the ones that we're actually going to be interested in, in this case. Let's say where the bid.ID is null -- what that means is that there was a record in the bid histories table, we left-joined into bids, there was not a record in the bids table.

[00:37] This means that we have some history object that has no corresponding bid object. This shouldn't be the case in our database. We've detected this anomalous condition where we see a bid history still lying around that wasn't cleaned up.

[00:57] One of the first things I would try to do here is to diagnose what happened. I might look at other related information. I might say let's left join onto users, where the users ID is equal to bid histories.history userID. Left join on users.ID is null.

[01:16] We see that we get another related record here, where the user for this bid history is gone. Probably, what we could determine is in our system, we deleted a user. That managed to delete all of their bids, but it did not manage to delete all of the related bid histories.

[01:36] Oftentimes, we would use a left or a right join in our production data in order to figure out which data is stranded or orphaned, so we know which data is safe to clean up.

[01:49] We can audit that process, and figure out whether we should be cleaning this up, or if we accidentally deleted some data. It gives us something to follow up on.