Use Joins to Detect Orphaned Data in Postgres

Brett Cassette
InstructorBrett Cassette
Share this video with your friends

Social Share Links

Send Tweet
Published 7 years ago
Updated 5 years ago

Unfortunately, sometimes we don’t add the right constraints to our data. A user gets deleted and old data doesn’t get cleaned up. Primary information is deleted, but its audit trail isn’t. With left and right joins, we can flag anomalous data for deletion.

[00:01] You can select * from bid histories and do a left join on bids.ID with bid histories.bidID.

[00:15] We can see that there are nulls in this table. Those are the ones that we're actually going to be interested in, in this case. Let's say where the bid.ID is null -- what that means is that there was a record in the bid histories table, we left-joined into bids, there was not a record in the bids table.

[00:37] This means that we have some history object that has no corresponding bid object. This shouldn't be the case in our database. We've detected this anomalous condition where we see a bid history still lying around that wasn't cleaned up.

[00:57] One of the first things I would try to do here is to diagnose what happened. I might look at other related information. I might say let's left join onto users, where the users ID is equal to bid histories.history userID. Left join on users.ID is null.

[01:16] We see that we get another related record here, where the user for this bid history is gone. Probably, what we could determine is in our system, we deleted a user. That managed to delete all of their bids, but it did not manage to delete all of the related bid histories.

[01:36] Oftentimes, we would use a left or a right join in our production data in order to figure out which data is stranded or orphaned, so we know which data is safe to clean up.

[01:49] We can audit that process, and figure out whether we should be cleaning this up, or if we accidentally deleted some data. It gives us something to follow up on.

egghead
egghead
~ 25 minutes ago

Member comments are a way for members to communicate, interact, and ask questions about a lesson.

The instructor or someone from the community might respond to your question Here are a few basic guidelines to commenting on egghead.io

Be on-Topic

Comments are for discussing a lesson. If you're having a general issue with the website functionality, please contact us at support@egghead.io.

Avoid meta-discussion

  • This was great!
  • This was horrible!
  • I didn't like this because it didn't match my skill level.
  • +1 It will likely be deleted as spam.

Code Problems?

Should be accompanied by code! Codesandbox or Stackblitz provide a way to share code and discuss it in context

Details and Context

Vague question? Vague answer. Any details and context you can provide will lure more interesting answers!

Markdown supported.
Become a member to join the discussionEnroll Today