Pixazza makes images interactive. This means that the scale of what we do is ultimately the scale of images on the web. How big is this scale?
Facebook is the king of image hosting companies. Justin Mitchell, a Facebook engineer specializing in photos and videos, disclosed on Quora in January that Facebook had 90,000,000,000 images, and was adding 200,000,000 new ones per day. Staggering.
But it's not the raw number of images that matters at Pixazza. What matters is how many images people actually see, and how often they see them. Most images on the web pass quickly. A fraction of images stick around for longer. This holds not only on Facebook, but also on blogs, news sites, and image-sharing apps.
So, what is the real, practical scale of images on the web? Which images are people really interacting with, and how much?
At Pixazza, we have a window into this question. Pixazza technology is applied to images that, taken together, are viewed more than 100,000,000 times per day. We need to know which images are transient and which get stable traffic, because that distinction is the basis for many of our statistics and optimizations, such as our ad targeting.
One way we measure the persistence of images is with the concept of entropy, borrowed from thermodynamics. An image that is viewed uniformly over time has high entropy. An image that is viewed only during a short interval has low entropy. Pixazza-enabled images with high entropy can be optimized over long periods, helping us figure out what works best for our users.
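As a rough sketch of how an entropy measured in "days" could be computed (the exact formula is not spelled out here, so this is an assumption): treat an image's daily view counts as a probability distribution, take its Shannon entropy, and exponentiate it. The result reads as the effective number of days over which the views are spread. The function name effective_days and the sample view counts below are hypothetical, chosen only for illustration.

```python
import math


def effective_days(daily_views):
    """Return exp(Shannon entropy) of an image's daily view distribution.

    Assumption: entropy "in days" means the exponential of the entropy
    (natural log) of the share of views falling on each day.
    """
    total = sum(daily_views)
    if total <= 0:
        return 0.0  # an image with no views has no lifetime to measure
    entropy = 0.0
    for views in daily_views:
        if views > 0:  # days with zero views contribute nothing
            p = views / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


# Hypothetical 30-day view counts for two images.
steady = [100] * 30                 # viewed evenly all month
burst = [500, 300, 200] + [0] * 27  # viewed only during the first three days

print(round(effective_days(steady), 1))  # 30.0
print(round(effective_days(burst), 1))   # 2.8
```

Under this reading, an image viewed evenly across the month scores the full 30 days, while an image whose views are packed into its first three days scores just under 3.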
Here is the distribution of images in the Pixazza network in April 2011 by their entropy, measured in "days":
Predictably, most of our images go whizzing past in a few days. Two or three days seems to be the typical lifetime of an internet image. But a healthy chunk sticks around for longer, and many were getting traffic all month. Image entropy lets us pick those out.