Join egghead, unlock knowledge.

Want more egghead?

This lesson is for members. Join us? Get access to all 3,000+ tutorials + a community with expert developers around the world.

Unlock This Lesson
Become a member
to unlock all features

Level Up!

Access all courses & lessons on egghead today and lock-in your price for life.


    Use Joins to Detect Orphaned Data in Postgres


    Unfortunately, sometimes we don’t add the right constraints to our data. A user gets deleted and old data doesn’t get cleaned up. Primary information is deleted, but its audit trail isn’t. With left and right joins, we can flag anomalous data for deletion.



    Become a Member to view code

    You must be a Pro Member to view code

    Access all courses and lessons, track your progress, gain confidence and expertise.

    Become a Member
    and unlock code for this lesson
    orLog In




    You can select * from bid histories and do a left join on bids.ID with bid histories.bidID.

    We can see that there are nulls in this table. Those are the ones that we're actually going to be interested in, in this case. Let's say where the bid.ID is null -- what that means is that there was a record in the bid histories table, we left-joined into bids, there was not a record in the bids table.

    This means that we have some history object that has no corresponding bid object. This shouldn't be the case in our database. We've detected this anomalous condition where we see a bid history still lying around that wasn't cleaned up.

    One of the first things I would try to do here is to diagnose what happened. I might look at other related information. I might say let's left join onto users, where the users ID is equal to bid histories.history userID. Left join on users.ID is null.

    We see that we get another related record here, where the user for this bid history is gone. Probably, what we could determine is in our system, we deleted a user. That managed to delete all of their bids, but it did not manage to delete all of the related bid histories.

    Oftentimes, we would use a left or a right join in our production data in order to figure out which data is stranded or orphaned, so we know which data is safe to clean up.

    We can audit that process, and figure out whether we should be cleaning this up, or if we accidentally deleted some data. It gives us something to follow up on.