Monday, October 24, 2011

Star Rating System - Part 1: Google App Engine

We have been looking for ways for our users to share their experiences playing our games.  We tried things like Game Center, HeyZay, Twitter, Facebook, and prompting people to rate the app on the App Store, all with moderate success.

One day, I had this "inspiration" to let people easily rate individual puzzles and share those ratings.  At the time, I was working on Kento: A Distorted Jigsaw, so I decided to just do it.

This is a very simple star rating system that allows puzzles to be rated from 1 to 5 stars.  That information is sent to a server where it is combined with other people's ratings.  The server keeps track of the total of each puzzle's ratings along with the number of users who rated it.  With this information, an average can be calculated.
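As a quick illustration of the math above (a hypothetical helper, not code from the actual service):

```python
def average_rating(rating_total, rating_count):
    # Guard against division by zero for puzzles nobody has rated yet.
    if rating_count == 0:
        return 0.0
    return rating_total / rating_count

# e.g. three ratings of 4, 5, and 3 stars
print(average_rating(4.0 + 5.0 + 3.0, 3))  # → 4.0
```

Keeping only the total and count means the server never has to walk the individual ratings to answer "what's the average?".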

I broke the idea up into 3 parts:
  1. Server to collect and share ratings (Google App Engine)
  2. iOS API to access the server
  3. iOS User Interface
This first article focuses on writing a simple web service with Google App Engine.  I'm not going to go into how to create the full web service; there are lots of good examples on the App Engine site that show how to do that.  Instead, I will focus on some of the tools and tips I've learned doing this project, mostly around dealing with the datastore and transactions.


I'm just using a free Google App Engine account.  You can download a development environment for either PC or Mac, and it supports either Java or Python.  I haven't done much with Python, so I chose it as a good learning experience.

In the past, I have tried using a plain text editor for Python, which is such a throwback to the 80s.  This time, I remembered my carpool buddy, Chris, talking about an IDE for Python called PyCharm by JetBrains.

While not my favorite IDE, PyCharm has support for Google App Engine development.  This includes code completion and a DEBUGGER.  That was a huge timesaver given my limited Python skills.  Make sure you check out their quick start guide for Google App Engine.  Also make sure you run the GoogleAppEngineLauncher downloaded from Google before trying to set up PyCharm.  You don't need it running while using PyCharm, but the launcher finishes the Google App Engine install and sets up symlinks on your system that PyCharm needs.


You can configure your app to use either the High Replication Datastore (HRD) or the master/slave datastore.  I wanted to keep the application as cheap as possible, and this application doesn't require rock solid availability.  If the service goes down for maintenance for a short period of time, the end users won't notice or care.  This made the master/slave option the ideal choice.
The Master/Slave datastore uses a master-slave replication system, which asynchronously replicates data as you write it to a physical data center. Since only one data center is the master for writing at any given time, this option offers strong consistency for all reads and queries, at the cost of periods of temporary unavailability during data center issues or planned downtime. This option also offers the lowest storage and CPU costs for storing data. 
You should also be aware that the App Engine datastores are not SQL databases.

Entity Group

One of the tricky things with Master/Slave transactions is that a transaction can't cross entity groups.  In other words, each table involved in the transaction must be part of the same entity group.  That leads to an interesting question: how do you put tables into the same entity group?

I made sure that the tables involved in the transaction had the same parent (they were siblings).  That was enough to put them into the same entity group.
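A way to picture this: a datastore key is a path of (kind, name) pairs, and the entity group is determined by the root of that path.  This toy model (plain Python, with made-up IDs) shows why giving both tables the same parent lands them in the same group:

```python
def entity_group(key_path):
    # The root (first) element of a key path identifies the entity group.
    return key_path[0]

parent_key  = [("AppPuzzleDB", "puzzle(kento_pack1_p7)")]
average_key = parent_key + [("AppPuzzleRatingAverageDB", "avg")]
rating_key  = parent_key + [("AppPuzzleRatingDB", "user42")]

# Siblings under the same parent share a root, hence an entity group.
print(entity_group(average_key) == entity_group(rating_key))  # → True
```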

Transaction Errors

If a transaction fails, any changes will be rolled back.  However, just because a transaction throws an exception doesn't mean it has failed.
If your app receives an exception when submitting a transaction, it does not always mean that the transaction failed. You can receive Timeout, TransactionFailedError, or InternalError exceptions in cases where transactions have been committed and eventually will be applied successfully. Whenever possible, make your datastore transactions idempotent so that if you repeat a transaction, the end result will be the same.
When writing this service, it was important that if you run the same transaction twice the "right" thing will happen.  This means that we need to keep the average data accurate even if the same rating is applied multiple times.
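Because the transactions are idempotent, it is safe to simply retry them on those ambiguous errors.  A minimal retry wrapper might look like this (a sketch, not the service's actual code; RuntimeError stands in for the App Engine exception types):

```python
def run_idempotent(txn, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return txn()           # safe to repeat: txn is idempotent
        except RuntimeError as e:  # stand-in for Timeout / TransactionFailedError
            last_error = e
    raise last_error
```

The key point is that retrying is only harmless because running the same transaction twice produces the same end state.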

The Tables
I used 3 datastore tables for this project.

The AppPuzzleDB table is used to represent each puzzle in a game.  Since I want to be able to use this for multiple games, there is a gameId unique to each game.  I also tend to package groups of puzzles together, so there is a packageId to represent a grouping of puzzles.  Finally, there is the puzzleId itself, which uniquely identifies a single puzzle.
class AppPuzzleDB (db.Model):
    gameId    = db.StringProperty(indexed=True)
    packageId = db.StringProperty(indexed=True)
    puzzleId  = db.StringProperty(indexed=True)
The server itself doesn't calculate the average.  It just holds the data necessary to calculate an average.  The AppPuzzleRatingAverageDB table keeps the current running total and number of ratings (count) for each AppPuzzleDB.
class AppPuzzleRatingAverageDB (db.Model):
    gameId          = db.StringProperty(indexed=True)
    packageId       = db.StringProperty(indexed=True)
    puzzleId        = db.StringProperty(indexed=True)
    puzzleItem      = db.ReferenceProperty(AppPuzzleDB)
    ratingCount     = db.IntegerProperty()
    ratingTotal     = db.FloatProperty()
I know it is a waste of storage and bad design to repeat the gameId, packageId, and puzzleId within this table.  However, doing so reduces the number of queries I have to perform and just makes it easier to access this information.  I know, poor excuses, but I get to be lazy sometimes!

The last table, AppPuzzleRatingDB, holds the individual ratings supplied by each user.  We keep this data around for a while so that if the user re-rates the puzzle we can back out their old rating and apply their new one.
class AppPuzzleRatingDB (db.Model):
    puzzleRatingAverage = db.ReferenceProperty(AppPuzzleRatingAverageDB)
    userId              = db.StringProperty(indexed=True)
    rating              = db.FloatProperty()
    dateCreated         = db.DateTimeProperty(auto_now_add=True)
This design allows for really fast retrieval of the current averages; we don't need to do a bunch of lookups and summations.  It also allows us to purge the individual user ratings in the event we need to free up space.  In addition, this is what allows us to handle the same request being sent multiple times to the server.  We use this table to back out any old values before applying the new ones.
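The back-out logic itself is simple arithmetic.  A sketch (hypothetical helper, not the service's code) of updating the running total when a user rates or re-rates a puzzle:

```python
def apply_rating(total, count, new_rating, old_rating=None):
    # If this user rated before, remove their old contribution first.
    if old_rating is not None:
        total -= old_rating
        count -= 1
    return total + new_rating, count + 1

total, count = apply_rating(10.0, 4, 5.0)                       # first rating → (15.0, 5)
total, count = apply_rating(total, count, 3.0, old_rating=5.0)  # re-rate     → (13.0, 5)
```

Note that replaying the re-rate (backing out 3.0 and applying 3.0 again) leaves the totals unchanged, which is what makes the transaction safe to repeat.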

Making keys

In order to do fast lookups of the tables, I used custom keys.  For example, the following code snippet creates a key for the AppPuzzleDB table:
def appPuzzleKey(gameId, packageId, puzzleId):
    keyName = "puzzle(" + gameId + "_" + packageId + "_" + puzzleId + ")"
    return db.Key.from_path( "AppPuzzleDB", keyName, parent=None )
It gets a bit more interesting for the child tables. You can see below that the AppPuzzleRatingAverageDB uses the puzzle key to refer back to its parent.
def appPuzzleRatingAverageKey(puzzleKey):
    keyName = "puzzleRatingAverage(" + puzzleKey.name() + ")"
    return db.Key.from_path( "AppPuzzleRatingAverageDB", keyName, parent=puzzleKey )
Finally, the AppPuzzleRatingDB also has to take into account a userId as each user can have their own rating per puzzle.
def appPuzzleRatingKey(puzzleKey, userId):
    keyName = "puzzleRating(" + userId + "_" + puzzleKey.name() + ")"
    return db.Key.from_path( "AppPuzzleRatingDB", keyName, parent=puzzleKey )
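The key names these functions build are just readable composite strings, so any (gameId, packageId, puzzleId) triple maps to exactly one key.  A plain-Python sketch of the naming scheme (the IDs here are made up):

```python
def puzzle_key_name(gameId, packageId, puzzleId):
    # Mirrors the string format used by appPuzzleKey above.
    return "puzzle(%s_%s_%s)" % (gameId, packageId, puzzleId)

print(puzzle_key_name("kento", "pack1", "p7"))  # → puzzle(kento_pack1_p7)
```

Because the name is deterministic, a lookup by key never needs a query, and re-sending the same rating request always targets the same entities.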

Running a Transaction

In order to run a transaction, just call run_in_transaction with a function reference.  For example:
puzzleItem = db.run_in_transaction( transactionGetAppPuzzleDBItem, gameId, packageId, puzzleId )
As a convention, we prefix functions like transactionGetAppPuzzleDBItem with transaction to indicate that they are called in the context of a transaction.  We do this in a transaction so that when we create a new AppPuzzleDB entry, we can also create a corresponding AppPuzzleRatingAverageDB that is initialized correctly and ready to go for when we add ratings.
def transactionGetAppPuzzleDBItem(gameId, packageId, puzzleId):
    puzzleKey  = appPuzzleKey( gameId, packageId, puzzleId )
    puzzleItem = AppPuzzleDB.get( puzzleKey )

    if puzzleItem is None:
        puzzleItem = AppPuzzleDB( key_name = puzzleKey.name(),
                                  gameId=gameId,
                                  packageId=packageId,
                                  puzzleId=puzzleId )
        puzzleItem.put()

        # Create a new AppPuzzleRatingAverageDB to go along with the puzzleItem
        puzzleRatingAverageKey  = appPuzzleRatingAverageKey( puzzleItem.key() )
        puzzleRatingAverageItem = AppPuzzleRatingAverageDB( key_name = puzzleRatingAverageKey.name(),
                                                            parent = puzzleItem.key(),
                                                            gameId = gameId,
                                                            packageId = packageId,
                                                            puzzleId = puzzleId,
                                                            ratingCount = 0,
                                                            ratingTotal = 0.0 )
        puzzleRatingAverageItem.put()

    return puzzleItem


The rating system had been live for a short while when we got featured again by FreeAppMagic.  This time we ended up being #1 in our category on the UK store.  We got a big influx of downloads and users, and checking the status showed that we had reached 70% of our CPU allocation.

The server was using a Django template to generate XML to send back to the client, and it was doing this every time a client asked for the average data.  Fortunately, the client was written to only ask for this information once a day.  Even so, it was taking a good amount of CPU time to do this work.

To decrease CPU usage, we modified the server to cache the generated XML in memcache with a 6 hour expiration.  That way we can serve the cached value without regenerating the XML.
class AppPuzzleRatingAveragePackageXML(webapp.RequestHandler):
    def get(self):
        gameId    = self.request.get('gameId')
        packageId = self.request.get('packageId')

        # See if we already have the XML document cached
        # If we do, this should save a ton of CPU time as we won't have to keep regenerating the XML document
        xmlRatingKey = appPuzzleRatingAverageXMLMemCacheKeyName( gameId, packageId )
        xmlRating = memcache.get( xmlRatingKey )

        # If not, then we need to generate a new one
        if xmlRating is None:
            puzzleRatingAverageQuery = db.GqlQuery( "SELECT * FROM AppPuzzleRatingAverageDB where gameId = :1 and packageId = :2", gameId, packageId )

            template_values = {
                'gameId' : gameId,
                'packageId' : packageId,
                'puzzleRatingAverageQuery' : puzzleRatingAverageQuery,
            }

            oneHour = 3600     # 60 * 60 = 3600
            path = os.path.join(os.path.dirname(__file__), "AppPuzzleRatingAveragePackage.xml" )
            xmlRating = template.render(path, template_values)
            memcache.add(key=xmlRatingKey, value=xmlRating, time=oneHour*6)

        self.response.headers["Content-Type"] = "text/xml"
        self.response.out.write( xmlRating )

This did have a significant effect on the overall CPU usage.
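The cache-aside pattern above (check the cache, regenerate on a miss, store with a TTL) can be modeled in plain Python.  This toy stand-in for memcache takes an explicit clock so the expiry is easy to see; it is an illustration, not the service's code:

```python
class TTLCache:
    def __init__(self):
        self._store = {}

    def get(self, key, now):
        item = self._store.get(key)
        if item is None or now >= item[1]:
            return None          # missing or expired
        return item[0]

    def add(self, key, value, time, now):
        # Store the value alongside its absolute expiry timestamp.
        self._store[key] = (value, now + time)

cache = TTLCache()
cache.add("xml:kento:pack1", "<ratings/>", time=6 * 3600, now=0)
print(cache.get("xml:kento:pack1", now=3600))      # → <ratings/>
print(cache.get("xml:kento:pack1", now=7 * 3600))  # → None
```

A 6 hour TTL means a burst of clients in the same window costs one template render instead of one per request, which is where the CPU savings came from.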

Wrap Up

It took about a day to get this implemented on the server side, and we have had over 4,000 ratings in just a few weeks.  Do you think a service like this is worthwhile?  Also be sure to check out Kento and our other iOS apps.


1 comment:

  1. Very interesting!! Looking forward to the next parts! Thanks for sharing. Intended to build something very similar!

    -- Umberto