In our last lesson, we learnt the basics of MongoDB aggregation and saw how we can find and manipulate data. In this continuation, we will see how we can sort, skip and limit records during our aggregation stages.
The aggregation stages/operators covered in this video are $sort, $limit and $skip.
You can see the list of all the available operators in the MongoDB docs.
Kamran Ahmed: [0:00] Here, in our tweets collection, we have a field called retweet_count which tells us the number of times this tweet was retweeted. To sort the tweets by this field, we will do db.tweets.aggregate().
[0:11] We're going to pass in the array of stages, where our first stage is the $sort operator. We're sorting by retweet_count in descending order. Now, if we run this query, you will see that the first tweet that we get has the most retweets.
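A minimal sketch of the query described so far. The variable name sortPipeline and the typeof db guard are additions for illustration (the guard only lets the snippet load outside the mongo shell); the collection and field names come from the lesson.

```javascript
// Pipeline with a single $sort stage: -1 sorts descending, 1 ascending.
const sortPipeline = [
  { $sort: { retweet_count: -1 } }
];

// `db` exists only inside the mongo shell, so the call is guarded here.
if (typeof db !== "undefined") {
  // Prints every document, most retweeted first.
  db.tweets.aggregate(sortPipeline).forEach(printjson);
}
```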
[0:23] Let's say that we want to get tweets only from verified users. We can do that by simply adding another stage called $match, where we keep only the documents that have user.verified set to true. If we run this query, you will see that in all the documents user.verified is true.
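One way the pipeline might look with the $match stage added. The transcript does not say where the stage is placed; putting it first is an assumption here, and the same documents come back either way. The name verifiedPipeline and the typeof db guard are illustrative.

```javascript
// Nested fields are addressed with dot notation, quoted as a string key.
const verifiedPipeline = [
  { $match: { "user.verified": true } },   // keep verified users only
  { $sort: { retweet_count: -1 } }         // most retweeted first
];

// `db` exists only inside the mongo shell, so the call is guarded here.
if (typeof db !== "undefined") {
  db.tweets.aggregate(verifiedPipeline).forEach(printjson);
}
```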
[0:37] If you look at the results, you will see that we are getting a lot of retweets also. Let's say that we want to get only the original tweets. We'll add one more condition to our $match stage, searching for tweets where retweeted_status is null. Now, if we run it, our first tweet is not a retweet anymore.
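A sketch of the extended $match stage. In MongoDB, a condition of the form { field: null } matches documents where the field is null and also documents where the field is missing, which suits tweets that carry no retweeted_status at all. The name originalsPipeline, the stage order, and the typeof db guard are assumptions for illustration.

```javascript
const originalsPipeline = [
  {
    $match: {
      "user.verified": true,     // verified users only
      retweeted_status: null     // field is null or absent, so not a retweet
    }
  },
  { $sort: { retweet_count: -1 } }
];

// `db` exists only inside the mongo shell, so the call is guarded here.
if (typeof db !== "undefined") {
  db.tweets.aggregate(originalsPipeline).forEach(printjson);
}
```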
[0:52] If you look at the results, we have a lot of data that is not relevant to us. Let's get rid of all of this by adding another stage called $project where we're getting tweet from the $text property, getting username from $user.screen_name, and we're getting retweet_count as it is. Now, if we run this, in our results we only have the three fields.
[1:10] Let's say that we want to get only the top five tweets. We can do that by adding another stage called $limit, where we limit the results to five. Now, if we run this query, you will see that we're getting only the five most retweeted tweets.
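A sketch of the pipeline with $limit appended. Stages run in order, so $limit has to come after $sort for the five documents to be the five most retweeted. The name topFivePipeline, the _id: 0 projection, and the typeof db guard are assumptions for illustration.

```javascript
const topFivePipeline = [
  { $match: { "user.verified": true, retweeted_status: null } },
  { $sort: { retweet_count: -1 } },
  { $project: { _id: 0, tweet: "$text", username: "$user.screen_name", retweet_count: 1 } },
  { $limit: 5 }   // pass along only the first five documents
];

// `db` exists only inside the mongo shell, so the call is guarded here.
if (typeof db !== "undefined") {
  db.tweets.aggregate(topFivePipeline).forEach(printjson);
}
```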
[1:22] There's also a $skip operator, which we can use to skip results in our aggregation. Let's say that we want to get only the second most retweeted tweet, which is this one. To do that, I will change my limit to one. I will add another stage, $skip, to skip the top tweet and get the second one. Now, if we run this, you'll see that we're getting the second tweet.
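A sketch of the final pipeline. The transcript does not state where the $skip stage goes; it is placed before $limit here because stages run in order, and limiting to one document first and then skipping one would leave nothing. The name secondTweetPipeline, the _id: 0 projection, and the typeof db guard are assumptions for illustration.

```javascript
const secondTweetPipeline = [
  { $match: { "user.verified": true, retweeted_status: null } },
  { $sort: { retweet_count: -1 } },
  { $project: { _id: 0, tweet: "$text", username: "$user.screen_name", retweet_count: 1 } },
  { $skip: 1 },    // drop the most retweeted tweet
  { $limit: 1 }    // keep the next one, i.e. the second most retweeted
];

// `db` exists only inside the mongo shell, so the call is guarded here.
if (typeof db !== "undefined") {
  db.tweets.aggregate(secondTweetPipeline).forEach(printjson);
}
```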