Combine Window Functions to Answer Complex Questions

Brett Cassette
InstructorBrett Cassette
Share this video with your friends

Social Share Links

Send Tweet
Published 8 years ago
Updated 6 years ago

Individual window functions are useful, but sometimes you’ll need to combine them to answer complex questions. For instance, which 10 star movies were released in the year that had the most 10 star movies released?

[00:00] We're going to ask a fairly complicated question that's going to require us to compose window functions. That question is, essentially, which year has the most one-star ratings, two-star ratings, three-star ratings? Then, what were the titles of the movies that had those ratings in that year?

[00:18] The first thing that we're going to do is get counts for each year by each rating. I'm rounding the rating, because I don't, necessarily, want to know for 1.1, 1.2. I just want to know for one.

[00:30] We'll round the ratings by year. We'll order by the year descending. These are just the row numbers, we're going to see where we're starting here. In 2004, you can see we have 7, 8, 9, 10, 11, 12 one-star ratings.

[00:49] In 2003, we keep coming until we get to this last value. We've just said, "last value." That should tell us which window function we're going to reach into our pocket for next. The first thing I want to do is to split these up. Let's call these "Our rankings," that each of these queries can be isolated in itself.

[01:10] We'll say, "Select star from rankings." We'll also add onto this the last value of the ranking column over the partition by...we'll use rating and year.

[01:24] Now, we're using this rating, we don't need to round it anymore. We'll call this "The count." This should be the number of this many star movies in this particular year.

[01:36] As we've seen before with last value, it's important for us to give ourselves an order bias. We're ordering by the ranking. We have to grab the rows between unbounded preceding and unbounded following to make sure that we're getting the correct count here.

[01:56] Now that we have the counts, we want to make sure that we're grabbing the years that have the highest count for each particular rating. Let's call these "The rated years." We're going to rank them in some way. We will select from the rated years. We'll pick out the outliers in a second.

[02:19] For right now, let's just say, "Select star from the counts." We're going to want to get a ranking of some kind, I think DENSE_RANK makes sense. We'll say, "This is over the partition by the rating, ordered by the count descending." That's going to be the rank.

[02:39] We'll see what this looks like here. The rank for 1997, you can see is one for the rating of one, two for 2004 etc. The ones we want are the ones where the rank is one. These are the first place. These are the outliers.

[03:01] We see 1997 has 16 one-star rated movies. That is the highest number. We can see what the names of the movies are here. We can see 1997, also, has 85 two-star rated movies. 1997 seems to have produced a lot of bad movies, also for three.

[03:22] 1999 produced the most four-star rated movies and six-star etc. We can get down to 10 in 2004. It looks like a great year, with 70 10-star rated movies.