Zite is constantly trying to create a better product. Sometimes, as with our launch of Zite 2.1, we make a lot of visible changes to the product that you, our users, have requested. But there are also a lot of less visible improvements to the intelligence of Zite that give you a better product. This is the story of one of those changes.
When we launched Zite 2.0 back in December, we included a new feature that let you view the most popular stories being read on Zite. This module is extremely popular and now accounts for 7 percent of total article clicks. We’ve recently made some updates to the algorithm and wanted to share some of the behind-the-scenes geekery that went into those improvements. (For additional background, we wrote a post on how Zite works.)
Popular stories were originally calculated by showing the most clicked-on stories in the last 24 hours. This led to interesting stories, but unfortunately also surfaced stories that users had thumbed down. Stories about celebrity wardrobe malfunctions, for example, often showed up even though they weren’t well liked. We didn’t want to manually specify topics and domains to exclude from Popular on Zite, so instead we used the massive amount of data we have about stories and our users to improve the articles being chosen.
The second issue we faced was that the stories being displayed were often fairly old. Because we were displaying the most clicked-on stories from the last 24 hours, older stories had more time to accumulate clicks in the app. This meant that old stories would stick around well after they had stopped being interesting.
To get rid of the sensationalistic celebrity gossip stories, we integrated the thumbs-up and thumbs-down data for each story. We tested a couple of variations on this idea, and the one that performed best combined the raw interest in the document with the document’s average thumb rating. By multiplying these two factors together, we completely filtered out results that were universally disliked and halved the score of ‘controversial’ items that had roughly equal numbers of thumbs up and thumbs down.
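As a rough illustration of the multiplication described above, here is a minimal Python sketch. It assumes the average thumb rating is the fraction of thumbs that were up (a value between 0 and 1), which is the reading under which disliked stories score zero and evenly split stories score half. The function names and the neutral default for stories with no thumbs are hypothetical, not taken from Zite's code.

```python
def average_rating(thumbs_up: int, thumbs_down: int, default: float = 0.5) -> float:
    """Fraction of thumbs that were up, in [0, 1].

    `default` is a hypothetical choice for stories nobody has rated yet.
    """
    total = thumbs_up + thumbs_down
    if total == 0:
        return default
    return thumbs_up / total


def rated_interest(clicks: int, thumbs_up: int, thumbs_down: int) -> float:
    """Raw interest (clicks) scaled by the average thumb rating."""
    return clicks * average_rating(thumbs_up, thumbs_down)


# Same raw interest, three different receptions.
universally_disliked = rated_interest(1000, thumbs_up=0, thumbs_down=40)
controversial = rated_interest(1000, thumbs_up=20, thumbs_down=20)
well_liked = rated_interest(1000, thumbs_up=40, thumbs_down=0)
```

With identical click counts, the universally disliked story drops out entirely, the controversial one keeps half its score, and the well-liked one keeps all of it.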
To bring more recent content into this module, we also tested an age-based exponential decay of the interest score. The idea is to give each story a half-life. We ended up using a half-life of 6 hours, which means that a story’s interest score is halved at 6 hours and quartered at 12 hours.
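The half-life idea can be sketched in a few lines of Python. Only the 6-hour half-life comes from this post; the function and constant names are illustrative.

```python
HALF_LIFE_HOURS = 6.0  # the half-life described in this post


def decay_factor(age_hours: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Multiplier in (0, 1] that halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def decayed_interest(interest: float, age_hours: float) -> float:
    """Interest score after applying the age-based exponential decay."""
    return interest * decay_factor(age_hours)


fresh = decayed_interest(800.0, age_hours=0)          # full score
six_hours = decayed_interest(800.0, age_hours=6)      # halved
twelve_hours = decayed_interest(800.0, age_hours=12)  # quartered
```

Because the decay is continuous, a story loses ground gradually and is never cut off abruptly at a fixed age.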
Combining the average rating and the age-based decay leads to the following formula for ranking documents in Popular on Zite:
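One way to write a ranking consistent with the description in this post, offered as a reading and not necessarily the exact production formula, uses illustrative symbols: $I$ for the total interest (clicks) in a story, $R$ for its average thumb rating between 0 and 1, $t$ for its age in hours, and $h = 6$ for the half-life. Applying the decay as a multiplier outside the logarithm is an assumption.

$$
\text{score} = \log(I) \cdot R \cdot 2^{-t/h}
$$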
Since document views follow a power-law distribution, we take the logarithm of the total interest before combining it with the average rating. We also calculate the average rating for a story using the lower bound of the Wilson score confidence interval, so that we don’t overestimate the rating of documents that have only a few thumb ratings.
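To make the pieces concrete, here is a hedged Python sketch of the Wilson lower bound and of one way the three factors might be combined. The Wilson score formula is standard, but the 95 percent confidence level (`z = 1.96`), the use of `log1p` to keep zero-click stories defined, the placement of the decay outside the logarithm, and all function names are assumptions made for illustration.

```python
import math


def wilson_lower_bound(thumbs_up: int, thumbs_down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the thumbs-up fraction.

    z = 1.96 (about 95 percent confidence) is an assumed value.
    """
    n = thumbs_up + thumbs_down
    if n == 0:
        return 0.0  # assumption: unrated stories get no rating credit
    p = thumbs_up / n
    denominator = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, (centre - margin) / denominator)


def popularity_score(clicks: int, thumbs_up: int, thumbs_down: int,
                     age_hours: float, half_life_hours: float = 6.0) -> float:
    """Log-damped interest x conservative rating x age decay (assumed combination)."""
    interest = math.log1p(clicks)  # log(1 + clicks), defined for zero clicks
    rating = wilson_lower_bound(thumbs_up, thumbs_down)
    decay = 0.5 ** (age_hours / half_life_hours)
    return interest * rating * decay


# One thumbs-up is weaker evidence than a hundred, so it earns a lower rating.
few_votes = wilson_lower_bound(1, 0)
many_votes = wilson_lower_bound(100, 0)
```

The lower bound rewards stories whose good ratings are backed by many thumbs: a single thumbs-up yields a rating of roughly 0.2, while a hundred unanimous thumbs-up yields a rating above 0.95.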
We ran an A/B test on this algorithm for about 3 weeks, and this ranking scheme increased click-throughs in this module by over 10 percent. About 70 percent of this increase can be attributed to incorporating the average rating, with the rest coming from promoting more recent stories.
There’s a lot that happens in Zite under the hood, and we hope to post more stories like this in the future to give you an idea of what’s going on beneath the surface.