Installing MongoDB on AWS EC2 and turning on zlib compression

At the time of writing, AWS doesn’t offer MongoDB as an RDS engine, so to run a MongoDB server on AWS you have to install it yourself on an EC2 instance.

The full instructions for installing MongoDB on an EC2 instance are at https://docs.mongodb.com/v3.0/tutorial/install-mongodb-on-amazon/. Here’s a quick summary.

First, launch an Amazon Linux EC2 instance (the repository below targets Amazon Linux). Once it’s running, log in to it over SSH and drop into a root shell:

sudo su

Next, create a repository file so that yum can download the prebuilt MongoDB packages. The file goes at /etc/yum.repos.d/mongodb-org-3.0.repo:

vi /etc/yum.repos.d/mongodb-org-3.0.repo

Paste in the following repository definition:

[mongodb-org-3.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1

Save and exit vi, then install the packages:

yum install -y mongodb-org

That’s it: MongoDB is now installed on your EC2 instance.

Next, to turn on compression, edit /etc/mongod.conf:

vi /etc/mongod.conf

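Scroll down to the existing “storage” section and nest this configuration under it (YAML is indentation-sensitive, and any settings already under storage, such as dbPath, stay as they are):

storage:
  engine: "wiredTiger"
  wiredTiger:
    collectionConfig:
      blockCompressor: "zlib"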

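Collections created from now on will be compressed with zlib, which gives a better compression ratio than WiredTiger’s default snappy compressor, at the cost of some extra CPU. This assumes a fresh install: mongod won’t start if its data directory already contains data files created by a different storage engine (MMAPv1 is the default in 3.0).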

Start your MongoDB instance with this command:

service mongod start

From here you’ll probably want to customize your MongoDB configuration further (or not); there are plenty of guides and tutorials online.

Mongoose model hasOwnProperty() workaround

When you query a document with a Mongoose model, you’ll notice that you can’t use hasOwnProperty() to check whether it has a certain property. For example, if you have a User model whose schema has a property called “verified”, calling user.hasOwnProperty('verified') returns false even when the field is set!

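This is because the document you get back from a Mongoose query isn’t a plain object: its schema fields are exposed through getters and setters defined on the model’s prototype, and the values themselves live in an internal object, so they are not own properties of the document.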

A quick and easy workaround is to convert the document to a plain JavaScript object with toObject() first:

var userObject = user.toObject();
if (userObject.hasOwnProperty('verified')) {
  // ...
}

Now hasOwnProperty() returns true, as long as the document actually has a value for verified.
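To make the difference concrete, here is a deliberately simplified, dependency-free stand-in for a Mongoose document. FakeUserDocument, its `_doc` field and its toObject() are hypothetical and only mimic the relevant idea (prototype accessors backed by an internal object); this is not Mongoose’s real implementation.

```javascript
// Simplified stand-in for a Mongoose document; NOT Mongoose's code.
// Values live in an internal `_doc` object, and each schema path is
// exposed through a getter/setter defined on the prototype.
function FakeUserDocument(data) {
  this._doc = Object.assign({}, data);
}

["verified", "email"].forEach(function (path) {
  Object.defineProperty(FakeUserDocument.prototype, path, {
    get: function () { return this._doc[path]; },
    set: function (value) { this._doc[path] = value; }
  });
});

// Mimics toObject(): returns a plain copy of the stored values.
FakeUserDocument.prototype.toObject = function () {
  return Object.assign({}, this._doc);
};

var user = new FakeUserDocument({ verified: true, email: "a@example.com" });

var ownOnDocument = user.hasOwnProperty("verified");              // false: accessor is on the prototype
var viaGetter = user.verified;                                     // true: reading the field still works
var ownOnPlainObject = user.toObject().hasOwnProperty("verified"); // true: plain own property

console.log(ownOnDocument, viaGetter, ownOnPlainObject); // prints: false true true
```

The getter keeps normal reads like user.verified working, which is why the problem only shows up with own-property checks.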

Mongoose save() to update value in array

This one had me scratching my head for a while until I figured it out. Say you have a MongoDB model with an array in it. If you fetch a document with Mongoose, change a value in the array, and then call document.save(), the array change is not saved!

It turns out you need to call document.markModified(arrayName) before calling document.save(), so that Mongoose knows the array has changed.

Here’s an example. Let’s say your model looks like this:

var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var UserSchema = new Schema({
  //...
  emailSubscriptions: {
    type: Array,
    'default': ["all"]
  }
});
exports.model = mongoose.model('User', UserSchema);

So after you get a user and modify an array (in this case emailSubscriptions), you’ll need to mark it as modified before saving it:

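User.findById(userId, function(err, user) {
  if (err || !user) { return; } // handle the error / missing user here
  user.emailSubscriptions = ['none'];
  // Tell Mongoose the array changed so that save() writes it out
  user.markModified('emailSubscriptions');
  user.save(function(err, user) {
    //...
  });
});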
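A common way to hit this is an in-place change such as assigning to an array index: the change happens on the array object itself, so no setter on the document runs and the change tracker never hears about it. The sketch below is a simplified, dependency-free model of setter-based dirty tracking to illustrate that; TrackedDocument and its set/get/markModified/pendingUpdate methods are hypothetical, not Mongoose’s internals.

```javascript
// Simplified model of setter-based change tracking; NOT Mongoose's code.
function TrackedDocument(data) {
  this._doc = Object.assign({}, data);
  this._modifiedPaths = new Set();
}

// Assignments that go through the document's setter are recorded.
TrackedDocument.prototype.set = function (path, value) {
  this._doc[path] = value;
  this._modifiedPaths.add(path);
};

TrackedDocument.prototype.get = function (path) {
  return this._doc[path];
};

// Explicit opt-in for changes the setter could not see.
TrackedDocument.prototype.markModified = function (path) {
  this._modifiedPaths.add(path);
};

// Stands in for the update a save() would send: only the modified paths.
TrackedDocument.prototype.pendingUpdate = function () {
  var update = {};
  this._modifiedPaths.forEach(function (path) {
    update[path] = this._doc[path];
  }, this);
  return update;
};

var doc = new TrackedDocument({ emailSubscriptions: ["all"] });

// Mutating the array in place bypasses the setter entirely.
doc.get("emailSubscriptions")[0] = "none";
var beforeMark = doc.pendingUpdate();   // {}: nothing would be written

doc.markModified("emailSubscriptions");
var afterMark = doc.pendingUpdate();    // { emailSubscriptions: ["none"] }
```

Whether reassigning the whole array is detected without markModified() can depend on the schema type and Mongoose version; calling markModified() before save() is harmless either way.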

Paginating documents/items in MEAN

If you’ve ever scrolled through the Facebook news feed, you’ve noticed that the topmost stories are the most recent ones, and as you near the bottom, older ones keep getting loaded as you scroll.

This feature looks slick and fits perfectly in a single-page application. It’s also useful for performance, since not all of the documents (the items your page displays, such as Facebook news stories, classified ads, search results, etc.) need to be loaded at once when the user first lands on the page.

Paginating documents in a MEAN application is fairly easy, though not necessarily obvious, so I thought I’d write about the process I followed and the code I wrote to get it done.

Let’s start with the AngularJS side. I used the ngInfiniteScroll module (https://github.com/sroze/ngInfiniteScroll) for the continuous scrolling effect. It’s simple to configure, so read through its documentation. Essentially, you put it on an element wrapping an Angular ng-repeat and give it a function to call to fetch more documents when the bottom of the page is reached (ngInfiniteScroll does all the scroll calculations internally). Here is an example of what it looks like for getting more “classifieds” from the database and adding them to the view:

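[Example markup: an element with infinite-scroll="getMorePosted()" wrapping the ng-repeat over postedClassifieds, plus a “loading-classifieds” indicator shown while more classifieds load.]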

In the example above, the getMorePosted() function in your controller is called whenever ngInfiniteScroll detects that the user is at the bottom of the page. Note that ngInfiniteScroll will most likely trigger as soon as the user lands on the page, unless you preload some documents in your controller. I chose to have getMorePosted() fetch both the initial set of documents and every set after it. Depending on your setup this may or may not matter, but it did for me.
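The bottom-of-page check boils down to comparing how much content is left below the viewport with a threshold. The function below is a hypothetical, simplified version of that idea, not ngInfiniteScroll’s code; its distanceInViewports parameter mirrors the module’s infinite-scroll-distance attribute, which is measured in multiples of the window height. It also shows why the handler fires on page load: with nothing loaded, there is nothing below the fold.

```javascript
// Hypothetical sketch of the "near the bottom" test behind infinite scrolling.
// distanceInViewports plays the role of infinite-scroll-distance
// (a multiple of the window height; 0 means "at the very bottom").
function shouldLoadMore(scrollTop, viewportHeight, contentHeight, distanceInViewports) {
  var remaining = contentHeight - (scrollTop + viewportHeight); // px below the viewport
  return remaining <= viewportHeight * distanceInViewports;
}

// Nothing loaded yet: the content is no taller than the window.
var onEmptyPage = shouldLoadMore(0, 800, 800, 0);        // true
// Plenty of content left below the viewport.
var midList = shouldLoadMore(1000, 800, 10000, 0);       // false: 8,200px still below
// Within half a window height of the end.
var nearEnd = shouldLoadMore(9000, 800, 10000, 0.5);     // true: 200px left <= 400px
```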

My getMorePosted() function in the controller looks like this. It uses a factory called Classified, defined later, to fetch the classifieds from the API (Express/MongoDB on the server side of MEAN):

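$scope.initialLoadDone = false;
$scope.loadingClassifieds = false;
$scope.noMoreClassifieds = false;
$scope.getMorePosted = function() {
  // Ignore triggers while a request is in flight or once the end is reached
  if ($scope.loadingClassifieds || $scope.noMoreClassifieds) return;

  $scope.loadingClassifieds = true;

  if (!$scope.initialLoadDone) {
    // First call: load the first page
    Classified.getPosted(function (postedClassifieds) {
      $scope.postedClassifieds = postedClassifieds;
      $scope.loadingClassifieds = false;
      $scope.initialLoadDone = true;
    });
  } else {
    // Later calls: append the next (older) page
    Classified.getMorePosted(function (err, numberOfClassifiedsGotten) {
      $scope.loadingClassifieds = false;
      // An empty page means there is nothing older left to load
      if (!err && numberOfClassifiedsGotten === 0)
        $scope.noMoreClassifieds = true;
    });
  }
};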

A couple of things to note here. While classifieds are loading, the $scope.loadingClassifieds flag is true, which keeps ngInfiniteScroll from firing off more requests while the user sits at the bottom of the page; the same flag can drive a “loading” message for the user (in case loading isn’t near-instant on a slow connection). getMorePosted() also uses the $scope.noMoreClassifieds flag to record when the end has been reached (if ever, depending on how many thousands or millions of documents are in your database and how far down the user scrolls). It does this by checking how many documents came back: zero means the end of the list has been reached.
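Stripped of AngularJS, those two flags amount to a small guard around a fetch function. Here is a minimal, framework-free sketch of the same idea; createPager and fetchPage are hypothetical names, and fetchPage is assumed to call back with (err, items).

```javascript
// Hypothetical, framework-free version of the controller's two guards.
function createPager(fetchPage) {
  var state = { items: [], loading: false, done: false };

  state.loadMore = function () {
    // Same role as loadingClassifieds / noMoreClassifieds: ignore triggers
    // while a request is in flight or after the last page arrived.
    if (state.loading || state.done) return false;
    state.loading = true;
    fetchPage(function (err, items) {
      state.loading = false;
      if (err) return;                          // keep state; a later scroll can retry
      if (items.length === 0) state.done = true;
      state.items = state.items.concat(items);
    });
    return true;                                // a request was started
  };

  return state;
}

// Usage with a fake fetch whose responses we deliver by hand.
var pending = null;
var pager = createPager(function (cb) { pending = cb; });

var first = pager.loadMore();      // true: request started
var second = pager.loadMore();     // false: ignored while loading
pending(null, ["a", "b"]);
pager.loadMore(); pending(null, ["c"]);
pager.loadMore(); pending(null, []);   // empty page: end reached
var afterEnd = pager.loadMore();   // false: nothing more to load
```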

This is what getPosted() and getMorePosted() look like in the Classified factory:

app.factory('Classified', function Classified(ClassifiedResource, ...) {
  var postedClassifieds = [];
  var postedClassifiedsLoaded = false;
  //...
  return {
    //...
    // Returns the cached list, or fetches the first page from the API
    getPosted: function(callback) {
      var cb = callback || angular.noop;
      if (postedClassifiedsLoaded) {
        return cb(postedClassifieds);
      } else {
        return ClassifiedResource.Posted.query(
          function(_postedClassifieds) {
            postedClassifieds = _postedClassifieds;
            postedClassifiedsLoaded = true;
            return cb(postedClassifieds);
          },
          function(err) {
            return cb(err);
          }).$promise;
      }
    },
    // Fetches the page of classifieds posted before the oldest one we have
    getMorePosted: function(callback) {
      var cb = callback || angular.noop;
      if (!postedClassifiedsLoaded)
        return cb();
      if (postedClassifieds.length === 0)
        return cb(null, 0); // nothing loaded, so nothing older to ask for
      return ClassifiedResource.Posted.query({
          // Cursor: posted time of the last (oldest) classified, in ms
          startTime: new Date(postedClassifieds[postedClassifieds.length - 1].posted).getTime()
        },
        function(_postedClassifieds) {
          // replaceOrInsertInArray() is a helper defined elsewhere in the app
          for (var i = 0; i < _postedClassifieds.length; i++)
            replaceOrInsertInArray(postedClassifieds, _postedClassifieds[i], true);
          return cb(null, _postedClassifieds.length);
        },
        function(err) {
          return cb(err);
        }).$promise;
    },
    //...
  };
});

And this is what ClassifiedResource looks like:

app.factory('ClassifiedResource', function ($resource) {
  return {
    // GET /api/classified/getPosted[/:startTime]; query() expects a JSON array
    Posted: $resource('/api/classified/getPosted/:startTime')
  };
});

Note that in my setup, the service loads the list of documents (postedClassifieds) and keeps it in memory. getPosted() fetches the first set of documents, or returns the list if it’s already loaded. getMorePosted() is where the magic happens: it takes the timestamp of the last (oldest) classified in the list and sends it to the API (Express, on the server side), which returns the next “page” of documents (classifieds in this case): the ones posted before that timestamp.
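The factory relies on a replaceOrInsertInArray() helper whose definition isn’t shown. As a rough illustration only, here is a hypothetical stand-in, mergeById(), that assumes each classified has a unique _id; it replaces an existing entry with the same _id or appends a new one. nextStartTime() (also hypothetical) computes the millisecond cursor the same way getMorePosted() does.

```javascript
// Hypothetical stand-in for the unshown replaceOrInsertInArray() helper:
// replace the element with the same _id, otherwise append it.
function mergeById(list, item) {
  for (var i = 0; i < list.length; i++) {
    if (list[i]._id === item._id) {
      list[i] = item;          // refresh an item we already had
      return list;
    }
  }
  list.push(item);             // new (older) item goes to the end
  return list;
}

// Cursor for the next request: `posted` of the last (oldest) item, in ms.
// Returns null for an empty list, where there is nothing to page past.
function nextStartTime(list) {
  if (list.length === 0) return null;
  return new Date(list[list.length - 1].posted).getTime();
}

var postedClassifieds = [
  { _id: "c3", posted: "2015-06-03T10:00:00Z" },
  { _id: "c2", posted: "2015-06-02T10:00:00Z" }
];
var startTime = nextStartTime(postedClassifieds);   // ms timestamp of "c2"

// Pretend the API returned an older page that overlaps on "c2".
[
  { _id: "c2", posted: "2015-06-02T10:00:00Z" },
  { _id: "c1", posted: "2015-06-01T10:00:00Z" }
].forEach(function (item) { mergeById(postedClassifieds, item); });
// postedClassifieds is now c3, c2, c1 with no duplicate
```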

Before we look at the server side, note that you’ll need a field to sort by in descending order (or ascending, if you want the oldest documents up front). A timestamp works well. A MongoDB ObjectId could work too, since ObjectIds are roughly increasing (they encode their creation time, to the second). It will depend on your data. In my case, a timestamp called “posted” was available in my data and very consistent: documents could be removed, but never added with a timestamp in the past (and even that wouldn’t be a huge problem). So it works fine with this pagination approach.
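Here is a small sketch that reads the creation time back out of an ObjectId’s hex string: the first four bytes (eight hex characters) are seconds since the Unix epoch. objectIdToMillis() and fakeObjectIdAt() are hypothetical helpers written for illustration; the real drivers expose this as a getTimestamp() method on ObjectId.

```javascript
// Reads the creation time embedded in an ObjectId's first 4 bytes.
// One-second resolution: ids created in the same second, or by machines
// with skewed clocks, are not strictly ordered by creation time.
function objectIdToMillis(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  return parseInt(hexId.slice(0, 8), 16) * 1000;
}

// Builds an id-shaped string for a known time, to show the round trip.
function fakeObjectIdAt(millis) {
  var hexSeconds = Math.floor(millis / 1000).toString(16).padStart(8, "0");
  return hexSeconds + "0000000000000000"; // remaining 8 bytes zeroed
}

var id = fakeObjectIdAt(Date.UTC(2015, 5, 1));
var recovered = objectIdToMillis(id);   // same instant as Date.UTC(2015, 5, 1)
```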

Here is what the server side looks like in Express/NodeJS:

var Classified = require('./classified.model');

exports.getPosted = function(req, res) {
  // startTime (ms since epoch) arrives as a string route parameter, if present
  var startTime = req.params.startTime ? Number(req.params.startTime) : null;

  var query = Classified.find(
      {posted: { $ne: null }}
  );
  query.sort('-posted -_id');   // newest first; _id breaks ties
  query.limit(20);
  if (startTime) {
    // Only classifieds posted before the cursor sent by the client
    query.where({posted: {$lt: new Date(startTime)}});
  }
  query
    .exec(function (err, classifieds) {
      if (err) { return res.status(500).send(err); }
      return res.status(200).json(classifieds);
    });
};

Note that “Classified” is my Mongoose model. I limit the number of documents returned to 20, which works well for my application, and the query is sorted in descending order by the “posted” timestamp. The where clause keeps only the classifieds posted before the time sent from the UI (“startTime”), so together with the sort it returns the next 20 classifieds before “startTime”. Also note that I send the timestamp in milliseconds, which gives a nice clean number to pass from the UI to the API.
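One edge case worth knowing about: the query sorts by posted and then _id, but the cursor only carries posted, so if several classifieds share the exact same timestamp at a page boundary, the $lt condition skips the ones that didn’t fit on the previous page. A common remedy is a compound (posted, _id) cursor. The sketch below shows that variant on an in-memory array instead of MongoDB; nextPage(), the cursor shape and the numeric ids are hypothetical simplifications, not part of the original code.

```javascript
// In-memory sketch of keyset pagination with a compound (posted, _id) cursor:
// newest first, fixed page size, ties broken on _id so none are skipped.
function nextPage(items, cursor, pageSize) {
  return items
    .filter(function (item) {
      if (item.posted == null) return false;           // like {posted: {$ne: null}}
      if (!cursor) return true;                         // first page
      return item.posted < cursor.posted ||
        (item.posted === cursor.posted && item._id < cursor._id);
    })
    .sort(function (a, b) {                             // like sort('-posted -_id')
      return (b.posted - a.posted) || (b._id - a._id);
    })
    .slice(0, pageSize);                                // like limit(pageSize)
}

// Three items share posted = 200 and straddle a page boundary of size 2.
var items = [
  { _id: 1, posted: 100 }, { _id: 2, posted: 200 }, { _id: 3, posted: 200 },
  { _id: 4, posted: 200 }, { _id: 5, posted: 300 }, { _id: 6, posted: null }
];

var seen = [];
var cursor = null;
var page;
do {
  page = nextPage(items, cursor, 2);
  page.forEach(function (item) { seen.push(item._id); });
  if (page.length > 0) {
    var last = page[page.length - 1];
    cursor = { posted: last.posted, _id: last._id };
  }
} while (page.length > 0);
// seen is [5, 4, 3, 2, 1]; a posted-only cursor would yield [5, 4, 1]
```

With Mongoose, the equivalent condition would be an $or of {posted: {$lt: t}} and {posted: t, _id: {$lt: lastId}}, which means the UI would also have to send the last _id along with startTime.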

And that’s it!

One more thing: on the client side (in AngularJS), if too many documents accumulate in your ng-repeat, performance degrades badly, because ngInfiniteScroll keeps every loaded item on the page even when it’s scrolled out of view. Another module, https://github.com/angular-ui/ui-scroll, instead destroys and re-creates items as they leave and enter the visible part of the page while the user scrolls. This can greatly improve performance when a lot of documents are loaded.
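The core of that approach, often called virtual scrolling or windowing, is working out which rows are near the viewport and rendering only those. Here is a minimal sketch assuming fixed-height rows; visibleRange() and its parameters are hypothetical and do not reflect ui-scroll’s own API.

```javascript
// Generic windowing calculation: which row indices need DOM nodes right now.
// Assumes every row has the same height; `buffer` rows are kept on each side
// so that small scrolls don't immediately expose blank space.
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows, buffer) {
  var first = Math.floor(scrollTop / rowHeight) - buffer;
  var last = Math.ceil((scrollTop + viewportHeight) / rowHeight) + buffer;
  return {
    start: Math.max(0, first),        // first row index to render
    end: Math.min(totalRows, last)    // one past the last row to render
  };
}

// 10,000 loaded classifieds, 50px rows, 600px viewport, scrolled to 25,000px:
var range = visibleRange(25000, 600, 50, 10000, 5);
// Only rows 495..516 (22 rows) need to exist; the rest can be destroyed.
```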