This Lesson is for Members

Subscribe today and get access to all lessons! Plus direct HD download for offline use, enhances transcripts, member comment forums, and iTunes "podcast" RSS feed. Level up your skills now!

Unlock This Lesson

Already subscribed? Sign In

Autoplay

    Detect Duplicates in Postgres with Cartesian Product

    Brett CassetteBrett Cassette

    The cartesian product is a humdinger -- it relates all data in one table to all data in another! Now that’s a lot of rows, and fast. Surprisingly, cartesian products are as fast as inner joins at duplicate detection, thanks to the magic of the query planner!

    postgresPostgreSQL
    Code

    Code

    Become a Member to view code

    You must be a Member to view code

    Access all courses and lessons, track your progress, gain confidence and expertise.

    Become a Member
    and unlock code for this lesson
    Transcript

    Transcript

    00:00 Let's look at an inner join. We'll inner join backers when projects.ID equal backers.projectID. This gives us all the projects where there's a backer for that project. A cross join doesn't need a clause. It takes every single project and matches it to every single backer regardless of whether or not they backed that project.

    00:30 Here we see the second project matched again to every single backer, and again, and again. This gives us the entire Cartesian product of the projects table matched up with the backers table. We could even cross join onto ourselves. P1 cross joins onto P2.

    00:53 Maybe we'd use this to detect some duplicates. Perhaps, where the P1.ID is not equal to the P2.ID and the P1.name is equal to the P2.name. This shows us two different projects that have the same name.

    Discuss

    Discuss