When working with database tables, we want to make sure we keep our data as simple as possible. This can sometimes mean having multiple tables that share some data columns. Using the join statement, we can combine data between two tables on similar column data to gather more information in one query. Let’s work through some common join statements like left, right, inner, and cross join.
Instructor: [00:00] If this is the first time getting your hands dirty with SQL and databases, you might find yourself thinking that your tables need to hold as many columns that are relevant as possible. For example, when dealing with users, as in our case, we could add more columns for email, address, phone number for each person, and then gone even deeper with their user history, like if they've changed their email from what to what, and then a column for when that happened.
[00:25] Before long, you'll find that your table has too many columns to keep track of. Eventually, this one table will have to deal with a lot of transactions at the same time, especially, as you scale your application. Instead, we want to keep our tables simple and modular. We have our users table with three users. I've also gone ahead and created a purchases table that has two rows of data.
[00:46] Each one of these rows represents a purchase that someone from our users table made. Let's say that we were making a dashboard within our application that displayed the user's information as well as their purchase history, which is really common. We could make two separate calls to our database to get this info, one, to get all the info about the user and the other, to get all of their purchases.
[01:07] Instead what we can do is get it all out in one query by joining these two tables together. In our first example, we'll use the left outer join. Now let's walk through what's happening here. We still use the select * which will pull out all of our columns returned from the rest of our query. We say from our users table, and this U is aliasing the users table, kind of like making it a variable that we can reference throughout the rest of this query.
[01:34] We then say left outer join with the purchases table aliased as P, on the matching user handles between the two tables. Now let's get more into the join. If we zoom out enough, you'll see that we now have our two tables' columns added together to form one large table. All of the users columns are accounted for, as well as purchases. You can see we have two user handle columns because they're both shared.
[01:58] When joining tables together, we tell it what condition to join on. This is why we need the on statement in our query, so our tables know where to fuse. You'll notice that Debbie does not have any purchases, so all the columns are holding the NULL value for each of her purchase columns. Left outer join is saying, "Get the table and all of its rows stated to the left of the join statement," in our case, users, as our starting set of data.
[02:24] With this data, match it up with the following purchases table rows, filling in NULL values for any missing rows, which again is why Debbie's purchase columns has NULL values, because she doesn't have a row in purchases.
[02:39] Now let's change our left outer join to be a right outer join, and we'll see what we get. We have to zoom way out to see everything. Now with our right outer join, we're not starting with a users dataset first. We take the right table's data. In our case, the two purchases table rows and match up the users rows with them. We start with two rows, and that's what we end up with in the end.
[03:02] As you can see, Debbie doesn't show up in this list, because she's not in the purchases table. Now let's insert a row into our purchases table and have it generate a random user handle that does not exist in our users table. We'll rerun our join statement and see that because there's no matching user handle for this newly inserted row that matches our users table, we fill in NULL values for the users table data columns.
[03:28] What if we want both missing rows of data in our combined ultimate table? We want to see Debbie in here, as well as this purchases row that does not have a matching user. We do that by using the full outer join statement. The result is we have four total rows with NULL values inserted for missing columns on both sides.
[03:49] There's also inner join, which is the opposite of full outer join. It only returns an ultimate joined table, where there's a matching user handle on both tables. Finally, there's cross join, which doesn't take a join on statement, because it takes the first table's rows and assigns it each row of the joining purchases table.
[04:12] As you can see, I'm in here three times because the purchases table has three rows of data. My users row is duplicated and matched up with each row in purchases, which is why cross join doesn't accept any on conditions.
[04:29] There are a couple of final points to know when working with joins. If we specifically ask for a column that is shared between two joining tables, we'll get an error asking us which column from which table to pull out. This is why we need to alias our tables, so that we can specify which table column combination we want to pull out. Another tidbit is we only had one join on clause. You can add more conditions by using AND.