This Lesson is for Members

Subscribe today and get access to all lessons! Plus direct HD download for offline use, enhances transcripts, member comment forums, and iTunes "podcast" RSS feed. Level up your skills now!

Unlock This Lesson

Already subscribed? Sign In

Autoplay

    Use Joins to Detect Orphaned Data in Postgres

    Brett CassetteBrett Cassette

    Unfortunately, sometimes we don’t add the right constraints to our data. A user gets deleted and old data doesn’t get cleaned up. Primary information is deleted, but its audit trail isn’t. With left and right joins, we can flag anomalous data for deletion.

    Code

    Code

    Become a Member to view code

    You must be a Member to view code

    Access all courses and lessons, track your progress, gain confidence and expertise.

    Become a Member
    and unlock code for this lesson
    Transcript

    Transcript

    00:01 You can select * from bid histories and do a left join on bids.ID with bid histories.bidID.

    00:15 We can see that there are nulls in this table. Those are the ones that we're actually going to be interested in, in this case. Let's say where the bid.ID is null -- what that means is that there was a record in the bid histories table, we left-joined into bids, there was not a record in the bids table.

    00:37 This means that we have some history object that has no corresponding bid object. This shouldn't be the case in our database. We've detected this anomalous condition where we see a bid history still lying around that wasn't cleaned up.

    00:57 One of the first things I would try to do here is to diagnose what happened. I might look at other related information. I might say let's left join onto users, where the users ID is equal to bid histories.history userID. Left join on users.ID is null.

    01:16 We see that we get another related record here, where the user for this bid history is gone. Probably, what we could determine is in our system, we deleted a user. That managed to delete all of their bids, but it did not manage to delete all of the related bid histories.

    01:36 Oftentimes, we would use a left or a right join in our production data in order to figure out which data is stranded or orphaned, so we know which data is safe to clean up.

    01:49 We can audit that process, and figure out whether we should be cleaning this up, or if we accidentally deleted some data. It gives us something to follow up on.

    Discuss

    Discuss