Select Grouped and Aggregated Data with SQL

Tyler Clark
InstructorTyler Clark
Share this video with your friends

Social Share Links

Send Tweet

Another powerful SQL skill is understanding how to group rows together by column values. Once the data is grouped together, it can be difficult to understand how to actually work with the groupings. In this lesson, we will use the group by clause, as well as the count, sum, avg, min, and max aggregate functions.

Counts with create date and first name example:

SELECT u.total, u.create_date, first_name 
FROM Users us 
INNER JOIN (SELECT count(create_date) AS total, create_date 
   FROM Users
   GROUP BY create_date) u ON u.create_date = us.create_date;

Min create date with first name example:

SELECT create_date, first_name 
FROM Users 
WHERE create_date = (select min(create_date) FROM Users);

Aggregate functions:

https://www.postgresql.org/docs/9.5/functions-aggregate.html https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html https://docs.microsoft.com/en-us/sql/t-sql/functions/aggregate-functions-transact-sql https://docs.oracle.com/database/121/SQLRF/functions003.htm

Instructor: [00:00] Our users table currently has four rows of data. We've got a Tyler, Debbie, Mary, and Patrick. They each have unique user handles, but two of them share the same create date, 2019 January 10th. One has one in 2019 and the other is February 1st of 2019.

[00:17] When we select out data, I want to be able to group common column values together. For instance, I want to get a count of shared create dates. The group by clause does exactly what you would assume. It combines the rows returned from the select statement into groups.

[00:32] This is telling us that there are three groups of rows. Even though there are four rows in our users table, since we're grouping by create date, and two rows share the same value, we get three groups. Just to show both sides, when none of the rows share, let's say, a user handle, we'll get those four rows.

[00:51] Just seeing the count isn't really helpful, because we don't know which create dates are actually shared. Let's pull that out as well inside of our select statement. Now we know that the January 10th date is the one that has two rows grouped together. Let's also pull out the first name column values as well.

[01:09] As you can see, we're getting an error. This is telling us that the first name column needs to be added to the group by in order to pull this out. Remember, our rows are being grouped together by create date.

[01:20] Because more than one row can be in a group, as in our January 10th example, our database doesn't know which first name value to pull out of the grouping, which is why we're getting this error.

[01:32] If we add first name to the group by, we no longer get this error, but instead get all four rows back. This is because our group by is now looking to create groups that share create date and first name. Because none of our rows share these two columns, we get all four rows out.

[01:50] Keeping this mentality of grouping in mind, most SQL databases have built-in aggregate functions. For example, if we wanted to find what the lowest value create date was within our table, we'd use the min function.

[02:03] Aggregate functions are closely related to grouping operations like group by, which is why we'll get the same error as we did before when trying to pull out our first name column with the min function.

[02:15] As a side note, if you're curious on how to accomplish the task I talked about in this lesson, I'll post those two queries in the notes to this video. We need to do a little bit more than just group by and aggregate the create date values.

[02:26] Some other common aggregate functions are max, which does the opposite of min, there's sum, which sums up values, and average. Be sure to check the notes to this video for links to popular SQL databases that implement these, as well as their own functions.