Can you recognize these objects?
What about these?
This seems pretty easy -- the first set of images shows a bolt and a screw, and the second a grapefruit and an orange. It probably took you less than a second to figure this out, but this simple task could stump a computer. Why?
Turns out, “learning to see” is surprisingly more complex than learning to read or solve mathematical equations. It seems easy because humans are born with several unique tools that help us see.
- The human eye acts more like a video camera than a still camera, so we have multiple “snapshots” of any object.
- Having two eyes lets us see objects in stereo, giving us 3D perception.
- Hands let us manipulate objects so we can see how color, shadows, texture, and contours change from different angles and under different lighting conditions.
Since computers learn from static photos, they need hundreds (or, for really complex objects like lettuce, millions) of images from all angles to start recognizing "simple" differences, like oranges versus grapefruits and screws versus bolts. In fact, objects that have been photographed a lot, like faces and popular tourist destinations, are precisely the ones that computers have gotten pretty good at recognizing.
Computer vision still has a long way to go, but advances are being made every day. We’re proud to use this technology in our apps, but we also know that it’s not perfect yet, which is why we supplement it with human computing power to give our audience the best possible experience.