If you’re already a Zite user, you’ve experienced the
delivery of personalized content that is updated every time you open
the app. Making that feel effortless and transparent to you takes a lot of
work. The Zite team brings together decades of software development experience
in artificial intelligence, machine learning and natural language
technologies, plus more than six years of product development,
to blend and tune the experience for you. In short, Zite works by:
- mining content from your social web
- modeling that content
- modeling the community that interacts with it
- modeling your interests
- matching your interests to the content and your community, to help you discover content you’ll want to see.
Here’s a technical description, a look under the covers
for those of you who are interested in the complex technology
behind Zite.
(Graphic courtesy of DDO)
Finding "What's interesting"
There are tens of billions of web pages out there and more
than two million terabytes of text, images and more are created
every hour. So, where in this deluge does Zite start looking for
what’s interesting
to you? Zite observes what’s
happening around the social web, because the community, in aggregate,
creates a strong signal for what’s interesting. User-generated
content, sharing, commenting and bookmarking have overtaken email
and web pages in sheer volume of data created and total time spent
online – eMarketer expects 115 million people in the U.S. to be
creating content by 2013. What’s important is either happening
on, or reported through, social media. What’s more, mining
the social web makes it possible to personalize content
from the moment you start using Zite for the first time.
To take advantage of the social web in order to find and choose
great content for you, Zite:
- Monitors URLs that are shared
through a wide range of social streams that you choose to connect to Zite, such
as Twitter and Delicious, which begin to tell Zite about your interests and focus.
- Throws out spam using
adaptive pattern matching heuristics and other techniques.
- Associates each URL with the
users who share it and calculates the credibility of each of those
users, because a URL from someone who has a lot of followers or is often
re-tweeted, for example, is usually more credible.
- Combines the credibility
scores of all the users who share a particular URL to calculate an overall
quality score for that URL.
- Carries forward URLs with
scores above a certain threshold as potential content to show, depending on
later calculations.
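The credibility and quality steps above can be sketched in a few lines. This is a minimal illustration, not Zite's actual scoring: the `Sharer` fields (`followers`, `retweet_rate`), the 50/50 weighting, the noisy-OR combination in `url_quality`, and the 0.5 threshold in `vet_urls` are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Sharer:
    """Hypothetical per-user signals available when a URL is shared."""
    followers: int
    retweet_rate: float  # fraction of this user's posts that others re-share (0..1)


def credibility(user: Sharer) -> float:
    """Squash reach and re-share rate into a credibility score in [0, 1).

    Assumed weighting: half log-scaled follower reach, half re-share rate.
    """
    reach = 1.0 - 1.0 / (1.0 + math.log1p(user.followers))
    return 0.5 * reach + 0.5 * min(max(user.retweet_rate, 0.0), 1.0)


def url_quality(sharers: list[Sharer]) -> float:
    """Noisy-OR: treat each sharer as independent evidence that the URL is good."""
    miss = 1.0
    for s in sharers:
        miss *= 1.0 - credibility(s)
    return 1.0 - miss


def vet_urls(shares: dict[str, list[Sharer]], threshold: float = 0.5) -> dict[str, float]:
    """Carry forward only URLs whose combined quality clears the threshold."""
    scores = {url: url_quality(users) for url, users in shares.items()}
    return {url: q for url, q in scores.items() if q >= threshold}


shares = {
    "https://example.com/a": [Sharer(50_000, 0.20), Sharer(300, 0.05)],
    "https://example.com/b": [Sharer(3, 0.0)],
}
vetted = vet_urls(shares)
```

With these made-up inputs, only the first URL survives: two reasonably credible sharers push its combined score above the threshold, while a single low-reach sharer does not. Noisy-OR is one reasonable way to "combine" credibility, since every additional credible sharer can only raise a URL's score.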
The result is millions of new and vetted URLs put into the Zite
pipeline every day.
Modeling content
Each vetted URL points to
text and graphics that Zite could potentially show you, but it takes a lot more
processing to find out what’s worth your time. So, Zite:
- Strips out all the
extraneous, non-readable content at a URL. This includes HTML formatting, file
“includes,” scripting code, and so on. That’s all removed via syntactic
analysis, leaving a document that a machine can analyze for its content and one
that you can read (if Zite figures it’s worthwhile).
- Analyzes each document via
text mining and term extraction techniques, inferring the terms that succinctly
capture and summarize what the content is about.
- Parses out the places, names
and dates via entity extraction techniques.
- Characterizes the writing
style, patterns of speech, and the length of sentences, phrases and words, all
via semantic classifiers.
- Lastly, collects metadata
such as the author’s name, modifiers from user-added tags and comments, Twitter hashtags, etc.
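To make the first two steps concrete, here is a small sketch that strips markup with Python's standard `html.parser` and then ranks the remaining words by TF-IDF. The post doesn't say which term-extraction method Zite uses. TF-IDF is a common stand-in, and the names `TextExtractor`, `strip_html`, `top_terms` and the tiny stopword list are illustrative only.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

STOPWORDS = {"a", "an", "and", "as", "in", "is", "it", "of", "on", "the", "to"}


class TextExtractor(HTMLParser):
    """Collects readable text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes and collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


def tokenize(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(doc: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank a document's terms by TF-IDF against a reference corpus."""
    tf = Counter(tokenize(doc))
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    n = len(corpus)

    def weight(term: str) -> float:
        # Smoothed IDF: a term found in every corpus document scores zero.
        return tf[term] * math.log((1 + n) / (1 + df[term]))

    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(tf, key=lambda t: (-weight(t), t))[:k]


html = ("<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
        "<body><p>Election polls shift as the election nears.</p></body></html>")
text = strip_html(html)
corpus = [text, "Polls open early in the city.", "The city budget vote."]
terms = top_terms(text, corpus)
```

Here `terms` comes out as `["election", "nears", "shift"]`: "election" appears twice, while "polls" is down-weighted because another document in the corpus also uses it.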
All these features—terms,
entities, styles, metadata—define a model of what’s in a document, and they are
carried forward with the document itself.
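One way to picture "carried forward with the document" is a single record that travels through the pipeline. The `DocumentModel` fields and the two crude style signals in `style_features` below are hypothetical. The post names the feature families (terms, entities, styles, metadata) but not how they are stored.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DocumentModel:
    """Hypothetical record bundling a cleaned document with its extracted features."""
    url: str
    text: str
    terms: list[str] = field(default_factory=list)
    entities: dict[str, list[str]] = field(default_factory=dict)  # e.g. {"place": ["Paris"]}
    style: dict[str, float] = field(default_factory=dict)
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. author, hashtags


def style_features(text: str) -> dict[str, float]:
    """Two crude style signals: average words per sentence and characters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "avg_sentence_words": len(words) / max(len(sentences), 1),
        "avg_word_chars": sum(map(len, words)) / max(len(words), 1),
    }


doc = DocumentModel(
    url="https://example.com/a",
    text="Polls open today. Voters wait.",
    terms=["polls", "voters"],
    metadata={"author": "J. Doe"},
)
doc.style = style_features(doc.text)
```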
Modeling community
The aggregated habits and interests of a community of users can
provide valuable recommendations for its members. You’ve
likely experienced this via collaborative filtering at
Amazon or Netflix. These systems correlate the habits of many
users who are like you in order to predict what you
will find relevant. Using a similar technique, Zite:
- Correlates relationships
across millions of users and billions of documents, based on vetted data that
Zite has captured from the social web. This creates a huge matrix of
document-user relationships, derived from both Zite users and external data.
- Condenses these
relationships into a few hundred features that characterize each user and each
document. Later on, these features become the basis for matching each incoming
document to your individual interests.
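"Condensing" a user-document matrix into a handful of features per user and per document is what matrix factorization does. The sketch below uses plain stochastic gradient descent on a toy matrix with `k=2` features instead of a few hundred. The post doesn't name Zite's factorization method, so treat `factorize`, `score`, and every hyperparameter as illustrative assumptions.

```python
import random


def factorize(interactions, n_users, n_docs, k=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn k latent features per user and per document by SGD matrix factorization.

    interactions: observed (user_index, doc_index, value) triples, e.g. 1.0 for
    "shared" and 0.0 for "seen but not shared".
    """
    rng = random.Random(seed)
    users = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    docs = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_docs)]
    for _ in range(epochs):
        for u, d, value in interactions:
            err = value - score(users, docs, u, d)
            for f in range(k):
                uf, dv = users[u][f], docs[d][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * dv - reg * uf)
                docs[d][f] += lr * (err * uf - reg * dv)
    return users, docs


def score(users, docs, u, d):
    """Predicted affinity: dot product of the two latent feature vectors."""
    return sum(a * b for a, b in zip(users[u], docs[d]))


# Two taste clusters: users 0-1 share docs 0-1, users 2-3 share docs 2-3.
interactions = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 1, 1.0), (1, 3, 0.0),
    (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0),
    (3, 2, 1.0), (3, 3, 1.0), (3, 1, 0.0),
]
users, docs = factorize(interactions, n_users=4, n_docs=4)
```

Because only a few features describe each row and column, the factorization can't memorize every cell. It also assigns scores to pairs that were never observed (user 0 and document 3, say), which is one concrete form of the "blur" described next.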
The process of condensing
tends to “blur” the data a bit, and this is a good thing—it enables Zite to
show you documents that are a little outside your direct interests, adding an
element of serendipity and helping you to discover new things.
Modeling you
The more your friends and
colleagues learn about you, the more enjoyable your conversations become. Zite
works the same way: the more you interact with it, the smarter it gets about
you, and the better it gets at bringing you “what’s interesting”. To do this,
Zite:
- Tracks the specific topics
you say you’re interested in and lets you create a Section in your Zite app for
each one.
- Quietly watches what you
read and don’t read, and uses machine learning to infer your degree of interest
in each document.
- Asks for feedback in the
form of thumbs-up / thumbs-down ratings as well as labeled click-boxes so you
can ask for more stories from specific sources, specific authors, or on
specific topics. These could be popular sites or lesser-known blogs, news items
or editorials, and so on.
So, let’s say you
“thumbs-up” multiple stories about upcoming political elections. Zite will show
you more stories about that. Or, if you repeatedly “thumbs-down” certain
stories on the same general topic, Zite will develop a rule to stop showing you
similar ones. But how does Zite know what “similar” means? Why do you like or
dislike a particular story? Is it because it’s about foreign policy, or written
by a specific author, or about a fringe candidate? (You might not even realize
why yourself.) Automatically figuring that out, without pestering you to answer
a lot of questions, isn’t easy. Zite uses the hundreds of features in its
models of content, community, and you, to find the fine-grained patterns in
your ratings that represent your preferences. That way, what it shows you
accurately reflects your interests, without much effort on your
part.
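Finding which features explain your thumbs-up and thumbs-down ratings is a classic supervised-learning problem. As a stand-in for whatever learner Zite actually uses, here is a minimal online logistic regression. The three features (about foreign policy, by a specific author, about a fringe candidate) mirror the example above and are purely hypothetical, as are `PreferenceModel` and its learning-rate and regularization defaults.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class PreferenceModel:
    """Online logistic regression: thumbs-up (True) / thumbs-down (False) -> like probability."""

    def __init__(self, n_features: int, lr: float = 0.5, l2: float = 0.001) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.l2 = l2

    def predict(self, x: list[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x: list[float], liked: bool) -> None:
        # Gradient step on log-loss; the L2 term keeps weights from growing unchecked.
        err = (1.0 if liked else 0.0) - self.predict(x)
        self.b += self.lr * err
        for i, xi in enumerate(x):
            self.w[i] += self.lr * (err * xi - self.l2 * self.w[i])


# Hypothetical features: [about foreign policy, by a specific author, about a fringe candidate]
ratings = [
    ([1.0, 0.0, 0.0], True), ([1.0, 1.0, 0.0], True), ([0.0, 0.0, 1.0], False),
    ([1.0, 0.0, 1.0], False), ([0.0, 1.0, 1.0], False), ([1.0, 1.0, 0.0], True),
]
model = PreferenceModel(n_features=3)
for _ in range(20):
    for x, liked in ratings:
        model.update(x, liked)
```

After training, the weight on the "fringe candidate" feature is negative. The model has worked out that this, rather than foreign policy or the author, is what the thumbs-downs have in common, without asking the user a single question.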
In short, Zite gets better
every time you use it, just by using it. And the more you tell Zite what you
like and dislike, the more accurate its choices become.
(Note: Although Zite builds
a model of your interests, your name and email address are never shared or
sold. Your usage data is used internally by Zite only to get you “what’s
interesting” specifically for you. We do share some usage data with our
partners, but only when aggregated with other users—no one ever sees your
individual data on its own.)
Matching "What's interesting" to your interests
Zite now has everything it
needs to narrow down the daily deluge of content into focused, personalized,
and up-to-date stories. To do this, Zite:
- Looks at the incoming stream
of new documents since you last opened Zite, and keeps the ones that match your
Zite Sections, sorting them by the quality score.
- Makes a fine-grained
comparison of the highest-scored documents to you and your interests, using the
hundreds of features calculated for each document. This yields a
content-matching score for how
closely a story fits your interests.
- Factors the age of a story
into its score. As a story gets older, it often becomes less interesting, so
Zite lowers its score proportionally.
- Applies your blocked-source
settings to eliminate sources you don’t want to see.
- Sorts the stories according
to their scores with the most relevant first.
- Lastly, Zite flows these
stories onto the screen of your iPad or iPhone, populating each Section
according to topic, and using the best of those to populate your Top Stories.
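The matching steps can be strung together into a small ranking function. This is a sketch under stated assumptions: `Story`, `rank`, the cosine-similarity content match, the shortlist size, and the 24-hour half-life are all hypothetical. The post doesn't specify how steeply scores fall with age, so exponential decay here is a guess. Blocked sources are filtered up front rather than after scoring, which gives the same ranking without spending shortlist slots on stories that would be thrown away.

```python
import math
from dataclasses import dataclass


@dataclass
class Story:
    url: str
    source: str
    section: str
    quality: float         # quality score from the vetting step
    features: list[float]  # latent document features
    age_hours: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(stories, user_features, sections, blocked_sources,
         shortlist=100, half_life_hours=24.0):
    # Keep stories in the user's Sections from unblocked sources, best quality first.
    candidates = [s for s in stories
                  if s.section in sections and s.source not in blocked_sources]
    candidates.sort(key=lambda s: s.quality, reverse=True)
    scored = []
    for s in candidates[:shortlist]:
        match = cosine(s.features, user_features)         # fine-grained content match
        decay = 0.5 ** (s.age_hours / half_life_hours)     # assumed exponential age decay
        scored.append((match * decay, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)   # most relevant first
    return [s for _, s in scored]


stories = [
    Story("a", "blog1", "Politics", 0.90, [1.0, 0.0], 1.0),
    Story("b", "blog2", "Politics", 0.80, [1.0, 0.0], 48.0),
    Story("c", "spam", "Politics", 0.95, [1.0, 0.0], 0.0),
    Story("d", "blog3", "Cooking", 0.99, [1.0, 0.0], 0.0),
    Story("e", "blog4", "Politics", 0.70, [0.0, 1.0], 0.0),
]
ranked = rank(stories, [1.0, 0.0], {"Politics"}, {"spam"})
```

Story "c" is dropped as a blocked source and "d" falls outside the user's Sections. Among the rest, the fresh, well-matched "a" outranks the two-day-old "b", and the off-topic "e" comes last.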
Delivering your slice of the Zeitgeist
So that’s how Zite blends advanced technologies
to create a unique and powerful experience on your iPad
or iPhone. We’re planning to keep pushing the
technology and user experience, so stay connected by
signing up for
our blog feed. And let us know what you think of Zite and make
suggestions by commenting on this post.