Join egghead, unlock knowledge.

Want more egghead?

This lesson is for members. Join us? Get access to all 3,000+ tutorials + a community with expert developers around the world.

Unlock This Lesson
Become a member
to unlock all features

    Detect Duplicates in Postgres with Cartesian Product

    Brett CassetteBrett Cassette

    The cartesian product is a humdinger -- it relates all data in one table to all data in another! Now that’s a lot of rows, and fast. Surprisingly, cartesian products are as fast as inner joins at duplicate detection, thanks to the magic of the query planner!



    Become a Member to view code

    You must be a Member to view code

    Access all courses and lessons, track your progress, gain confidence and expertise.

    Become a Member
    and unlock code for this lesson


    00:00 Let's look at an inner join. We'll inner join backers when projects.ID equal backers.projectID. This gives us all the projects where there's a backer for that project. A cross join doesn't need a clause. It takes every single project and matches it to every single backer regardless of whether or not they backed that project.

    00:30 Here we see the second project matched again to every single backer, and again, and again. This gives us the entire Cartesian product of the projects table matched up with the backers table. We could even cross join onto ourselves. P1 cross joins onto P2.

    00:53 Maybe we'd use this to detect some duplicates. Perhaps, where the P1.ID is not equal to the P2.ID and the is equal to the This shows us two different projects that have the same name.