I've recently been spending a great deal of time with Apple's iPhoto and Aperture photo organization and editing software, especially playing around with the 'Faces' facial recognition feature. I've tagged lots of people across tens of thousands of photos, and while I've been really impressed with the results, there's a lot of room for improvement. From a software consultancy standpoint, I believe the usability and marketability of these features could be greatly improved with the recommendations below. Apple can harp on about how nifty Faces and Places are as-is, but unless the underlying logic that identifies that metadata can actually draw inferences on its own, it's going to frustrate users who have thousands of photos (and as everything gets a camera slapped onto it, all users will soon have thousands of photos). Frustrated or unimpressed users are never good for winning software markets. Let me share some examples of how that underlying logic could be made more 'inferential':
iPhoto and Aperture both use an embedded database called SQLite to index all the photos and their metadata (faces, places, date taken, camera model, etc.). These databases are 'relational,' meaning that records in one table are linked to records in other tables through keys, which allows complex groupings of photos according to their various properties. This schema needs to be extended considerably: the applications should store and manipulate a much larger set of metadata so they can make better inferences about the faces and places in photos, which could make identifying these properties dramatically faster. Let me illustrate what I mean:
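To make that concrete, here is a minimal sketch of what such a relational schema could look like, using Python's built-in sqlite3 module. Every table and column name here is my own invention for illustration, not Apple's actual iPhoto/Aperture schema:

```python
import sqlite3

# Toy relational schema for photo metadata. All names are hypothetical,
# not Apple's actual iPhoto/Aperture schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE persons (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    school     TEXT,                            -- e.g. "UCLA", from contacts
    partner_id INTEGER REFERENCES persons(id)   -- "dating" / "married to"
);

CREATE TABLE photos (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL,
    taken_at  TEXT,                             -- ISO-8601 timestamp
    latitude  REAL,                             -- geodata for Places
    longitude REAL
);

-- Each detected face relates a photo to a (possible) person.
CREATE TABLE faces (
    id         INTEGER PRIMARY KEY,
    photo_id   INTEGER NOT NULL REFERENCES photos(id),
    person_id  INTEGER REFERENCES persons(id),  -- NULL until confirmed
    confidence REAL                             -- recognizer's score, 0..1
);
""")
db.commit()
```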
If the software's database logic could read contact metadata such as "School Attended" or "Married to…," it could make inferences that would radically speed up the face and place recognition process. For example, if one of my contacts attended UCLA, the database should assign a very low prior probability of him showing up in any of my photos from the "Berkeley" album (or tagged with Berkeley geodata), and deprioritize that friend as a possible match when running facial recognition.
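A rough sketch of that inference, assuming we already know a contact's home or school city and the photo's place tag; the 0.1 and 1.5 weighting factors are arbitrary numbers I picked for illustration:

```python
def location_prior(candidate_city, photo_place):
    """Scale a face-match score by how plausible the candidate's
    known location makes their presence in this photo.
    The weights are purely illustrative."""
    if not candidate_city or not photo_place:
        return 1.0        # no information, no adjustment
    if candidate_city.lower() == photo_place.lower():
        return 1.5        # locations agree: boost slightly
    return 0.1            # e.g. a UCLA friend in a "Berkeley" album

raw_score = 0.6           # the recognizer's raw match confidence
adjusted = raw_score * location_prior("Los Angeles", "Berkeley")
print(adjusted)           # ~0.06 -- effectively ruled out
```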
In another example, if the database knows that "Bob" is dating "Lisa," it should ascribe a very high probability to the two of them appearing together in a photo. When a photo includes a face that the software recognizes as possibly belonging to Bob, it should increase the confidence of that match if it also knows that Lisa (or a close match to Lisa) is in the photo.
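The same idea works in the other direction, as a boost rather than a penalty. In this sketch the names, the 2.0 multiplier, and the scores are all made up for illustration:

```python
def relationship_boost(candidate, people_in_photo, partners):
    """Boost a candidate match when their known partner has already
    been identified in the same photo. Multiplier is illustrative."""
    return 2.0 if partners.get(candidate) in people_in_photo else 1.0

partners = {"Bob": "Lisa", "Lisa": "Bob"}   # from contact metadata
in_photo = {"Lisa"}                         # faces already matched in this photo
raw_score = 0.45                            # borderline raw match for Bob
adjusted = min(1.0, raw_score * relationship_boost("Bob", in_photo, partners))
print(adjusted)                             # 0.9 -- now a strong candidate
```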
Incorporating these critical pieces of metadata is the next big step this type of software has to take. The good news is that it shouldn't be all that difficult to implement. There are two links in this data chain that need to be strengthened: contact information and social networks.
The first step is to greatly tighten integration with the Address Book and its contacts: the photo software should be able to pull in all the metadata from a contact's entry in the Address Book database. Second, to ensure those contacts are heavily populated with useful metadata of the aforementioned kinds, the Address Book should have much tighter integration with social networks such as Facebook. In an age where users so willingly volunteer personal information to these services anyway, why would anyone go through the laborious process of manually entering data like who someone is dating or where they went to school (especially if you have hundreds of contacts)?
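Here is a sketch of what that sync might look like, assuming the social network exposes profile fields through some API; the fetch_profile helper and its field names are hypothetical stand-ins for a real API call:

```python
import sqlite3

def fetch_profile(contact_name):
    """Hypothetical stand-in for a social-network API call
    that returns a contact's profile fields."""
    fake_api = {
        "Bob":  {"school": "UC Berkeley", "partner": "Lisa", "city": "Berkeley"},
        "Lisa": {"school": "UCLA", "partner": "Bob", "city": "Los Angeles"},
    }
    return fake_api.get(contact_name, {})

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE contacts (
    name TEXT PRIMARY KEY, school TEXT, partner TEXT, city TEXT)""")

# Instead of typing this in by hand for hundreds of contacts,
# pull it down automatically for each one.
for name in ("Bob", "Lisa"):
    profile = fetch_profile(name)
    db.execute("INSERT OR REPLACE INTO contacts VALUES (?, ?, ?, ?)",
               (name, profile.get("school"), profile.get("partner"),
                profile.get("city")))
db.commit()
print(db.execute("SELECT * FROM contacts").fetchall())
```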
In summary: you allow the Address Book to sync with your Facebook (or other service) account and download all the metadata, then allow the photo software to sync with that now-useful Address Book database. In the final step, the software's database logic is beefed up to make inferences using all that new metadata. All of a sudden, users are able to tag faces and places at a speed previously unthinkable. Remember: users will only find 'useful features' useful if they're not too time-consuming. It's really not practical to sit there and slowly tag every face in all your photos; with the help of some artificial intelligence, things can be sped up quite a bit.
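Put together, the whole inference step might look something like this; every weight, name, and score below is illustrative:

```python
def rank_candidates(raw_scores, photo_place, contacts, confirmed):
    """Re-rank the recognizer's raw scores using contact metadata.
    All weights are illustrative, not tuned values."""
    ranked = []
    for name, score in raw_scores.items():
        info = contacts.get(name, {})
        if info.get("city") and photo_place and info["city"] != photo_place:
            score *= 0.1           # location conflict: strong penalty
        if info.get("partner") in confirmed:
            score *= 2.0           # partner already in the photo: boost
        ranked.append((min(1.0, score), name))
    return sorted(ranked, reverse=True)

contacts = {"Bob":  {"city": "Berkeley", "partner": "Lisa"},
            "Dave": {"city": "Los Angeles", "partner": None}}
print(rank_candidates({"Bob": 0.45, "Dave": 0.5}, "Berkeley", contacts, {"Lisa"}))
# [(0.9, 'Bob'), (0.05, 'Dave')] -- the metadata flips the ordering
```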
Lastly, I want to discuss the cloud computing corollary here. We're already seeing some of the best online photo services take advantage of the processing power of the cloud to run their facial recognition features, such as Facebook's photo app. I believe these cloud-based offerings may be more successful than the desktop software discussed above in winning over users because, ultimately, they have direct access to all the useful metadata I mentioned. They've already built out their address book and populated it with useful metadata (or rather, their users have done so for them). If you're a service like Facebook, you just need to build some more inquisitive logic into your facial recognition algorithm to take advantage of that metadata, and you've one-upped most of the desktop photo competition (at least in terms of organization, if not editing).