This is the first in a series of blog posts analyzing how people use Zite to discover content in new ways. For our first stab at data analysis, we'll look at how reading interests differ across regions of the United States.
Note that a topic ranked #1 for a state isn't necessarily the most popular topic in that state. It is simply the most distinctive topic for that state: people there are interested in it far more strongly than people in the rest of the USA are.
We noticed some interesting patterns. As you'd expect, there's a cluster of traditionally culturally conservative topics in the Southern United States, while the East Coast tends to be interested in finance and investing. Several topics on information security and the intelligence industry show up in the D.C. metro area, offering a glimpse into the news interests of people who work in government. The Midwest has a high concentration of beer- and brewing-related sections, while the Mountain West seems to prefer outdoor activities. Perhaps a testament to Hawaii's laid-back reputation, medical marijuana is one of the top interests in that state. Technology topics show up in several hotspots across America, not just in California (as many of us Silicon Valley types would like to imagine), and the top spot for the Golden State actually belongs to surfing. Perhaps the one thing NorCal and SoCal residents can agree on! Find anything else interesting? Leave a comment below and point it out to us!
Here's an infographic highlighting some of the more interesting aspects we found:
Since Zite keeps track of which sections are clicked on in each region, getting the raw data isn't terribly hard. The question is how to turn it into an interesting, usable list that accurately reflects regional differences in America today. To start, we used a basic ratio: the number of clicks on a given section in a particular state, compared with the number of clicks on that section across the U.S. as a whole. We added a smoothing factor to reduce noise.
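The post doesn't show the exact formula, so here is a minimal sketch assuming additive (Laplace-style) smoothing: a section's share of a state's clicks divided by its share of national clicks, with a constant `alpha` added to every count. The function name `smoothed_ratio`, the `alpha` parameter, and the click counts are all hypothetical, for illustration only.

```python
def smoothed_ratio(
    state_clicks: dict[str, int],
    national_clicks: dict[str, int],
    section: str,
    alpha: float = 1.0,
) -> float:
    """Return (state share of `section`) / (national share of `section`).

    Assumption: additive smoothing with a hypothetical constant `alpha`,
    so sections with few or zero clicks don't produce extreme ratios.
    """
    n_sections = len(national_clicks)
    state_total = sum(state_clicks.values())
    national_total = sum(national_clicks.values())

    # Smoothed fraction of this state's clicks that went to `section`.
    state_share = (state_clicks.get(section, 0) + alpha) / (state_total + alpha * n_sections)
    # Smoothed fraction of all U.S. clicks that went to `section`.
    national_share = (national_clicks.get(section, 0) + alpha) / (national_total + alpha * n_sections)
    return state_share / national_share


# Made-up counts: surfing is over-represented in this state, finance is not.
national = {"surfing": 100, "finance": 900}
california = {"surfing": 30, "finance": 70}
print(smoothed_ratio(california, national, "surfing"))  # ≈ 3.02
print(smoothed_ratio(california, national, "finance"))  # ≈ 0.77
```

A score above 1 means the section gets a larger share of clicks in that state than nationally; the smoothing keeps a state with zero clicks on a section from producing a ratio of exactly 0 (or a division by zero).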
This simple ratio turned out to bias our selection toward sections that merely have a few very active users in a particular state, so we switched to computing the scores in log space.
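The post doesn't spell out the log-space formula. Taking the log of the final ratio alone wouldn't change the ranking, since log is monotonic, so here is one interpretation (an assumption, not necessarily what Zite did) that does address the heavy-user problem: log-transform each user's click count before summing, so a single user contributes `log(1 + n)` instead of `n`. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def raw_section_counts(user_clicks):
    """Sum raw click counts per section.

    user_clicks: iterable of (user_id, section, n_clicks), one row per
    (user, section) pair in a single region (hypothetical format).
    """
    totals = defaultdict(int)
    for _user_id, section, n_clicks in user_clicks:
        totals[section] += n_clicks
    return dict(totals)


def log_section_counts(user_clicks):
    """Sum log(1 + n_clicks) per section, damping very active users.

    Assumption: this per-user log transform is one possible reading of
    "looking at the data in log space", not a confirmed method.
    """
    totals = defaultdict(float)
    for _user_id, section, n_clicks in user_clicks:
        totals[section] += math.log1p(n_clicks)
    return dict(totals)


# One very active user vs. twenty moderately active users (made-up numbers).
clicks = [("heavy_user", "beekeeping", 1000)] + [
    (f"user{i}", "hiking", 5) for i in range(20)
]
print(raw_section_counts(clicks))  # {'beekeeping': 1000, 'hiking': 100}
print(log_section_counts(clicks))  # beekeeping ≈ 6.91, hiking ≈ 35.84
```

With raw counts the lone enthusiast makes beekeeping look ten times bigger than hiking; after the log transform, the section with broad interest comes out ahead. These per-section totals could then be fed into a ratio like the one described above.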
Running this log-space calculation for every state and every section in Zite yielded a ranked list of each state's most distinctively popular sections. For the final results, we dropped sections whose scores fell below a significance cutoff, as well as sections that weren't popular enough across Zite as a whole for us to say much about them. A few states didn't have enough data for us to draw any conclusions, and some states have only one or two significant topics.
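The post doesn't give the actual cutoff values, so the sketch below treats them as hypothetical parameters: `min_score` stands in for the significance cutoff and `min_national_clicks` for the overall-popularity filter. It ranks one state's sections by score and drops anything that fails either test.

```python
def top_sections(
    scores: dict[str, float],
    national_clicks: dict[str, int],
    min_score: float,
    min_national_clicks: int,
) -> list[str]:
    """Return one state's sections, highest score first, after filtering.

    scores: section -> distinctiveness score for this state.
    min_score, min_national_clicks: hypothetical cutoffs (values not given
    in the post).
    """
    kept = [
        (section, score)
        for section, score in scores.items()
        # Keep only sections that are distinctive enough in this state AND
        # popular enough across the whole service to say something about.
        if score >= min_score and national_clicks.get(section, 0) >= min_national_clicks
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [section for section, _score in kept]


# Made-up scores: "falconry" scores highest but is too rare nationally.
scores = {"surfing": 3.0, "finance": 0.8, "tech": 1.6, "falconry": 9.0}
national = {"surfing": 500, "finance": 900, "tech": 2000, "falconry": 3}
print(top_sections(scores, national, min_score=1.5, min_national_clicks=50))
# → ['surfing', 'tech']
```

A state with sparse data simply ends up with a short (or empty) list, which matches the observation that some states have only one or two significant topics.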