When photographer Chase Jarvis coined the famous saying, “the best camera is the one you have with you,” he was revealing an unspoken truth: Even professionals carried point-and-shoot cameras despite owning DSLRs and dedicated video cameras. His message was that great photographers create compelling images with whatever they have on hand, but the sentiment wound up setting the stage for a massive disruption of traditional imaging — one that saw famed portrait photographer Annie Leibovitz embrace Google’s Pixel still cameras and filmmaker Steven Soderbergh start shooting movies with iPhones.
2020 will be remembered for many negative reasons, but it should also be marked as the year when technology caught up with and redefined Jarvis’s saying. Thanks in large part to improved sensors and the neural cores in mobile processors made by Qualcomm and Apple, this was the year when standalone photo and video cameras were surpassed by smartphones in important ways, such that “the one you have with you” will now actually be either your best or most capable camera. Unlike single-purpose cameras, the latest smartphones now create 3D scans of objects and rooms, AI-optimized images, and cinema-quality Dolby Vision HDR videos that even professional cameras can’t replicate. As improbable as this might once have seemed, iPhones and Android phones are now changing the face of imaging for businesses and consumers.
We’re now at the beginning of a new age for content recording, where conventional photography and videography will evolve as professional 2D production tools are democratized, and new tools will make creating 3D content effortless for augmented and mixed reality purposes. Machine learning is at the core of this change: In the literal blink of an eye, trained AI running on a neural core can now sew the best parts of 10 photographs into one idealized image, or guarantee that a video’s lights, shadows, and colors will look truer to life than ever before. It remains to be seen whether people will change the ways they use cameras and consume what others have captured, or whether everyone will just fall back to old norms, but after a particularly dark year in history, the future of imaging looks brighter than ever.
To provide some hands-on insight into what’s changing, I’ve spent three weeks with Apple’s iPhone 12 Pro and a weekend with the iPhone 12 Pro Max, devices that are capable of most but not all of the aforementioned innovations, thanks to their 16-core neural processors and updated camera sensors. Here are my big-picture thoughts on where we stand today and where we’re going.
3D capture, Lidar, and digital twins
Last year, select Samsung Galaxy phones began to include 3D depth-scanning cameras, an exciting hardware feature with limited software support. A Samsung app called 3D Scanner lets a person move around a real object (such as a stuffed animal) to scan it into a 3D model, then translates a moving person’s 3D depth data to animate the model’s limbs. The concept wasn’t super practical, especially since it required the person’s movements to be recorded on the back side of the device, such that the phone’s user would have to find someone else to “perform.” Samsung pulled the hardware from this year’s Note20 phones, reportedly after seeing that the Sony-developed Lidar 3D scanner found in Apple’s latest iPad Pros was on an entirely different level.
Now Apple has added Lidar scanning to the iPhone 12 Pro and Pro Max, and though Apple itself has done too little to showcase the technology, Christopher Heinrich’s app Polycam impressively demonstrates what 3D scanning makes possible. Using Lidar and AI, a pocketable device can now effortlessly create 3D scans of both objects and entire environments — say, a room or multiple rooms in a house — that can later be viewed and moved through in three dimensions. As the images at the top of this article depict, you simply press a record button, then walk around the object or space as the screen turns from blue to white to indicate successful capture. Using Lidar for depth measurement and Apple’s processor for rapid recalculations, Polycam’s triangular mesh is refined in real time to more properly reflect contours in nearby and far away objects, while the camera fills texturing gaps as you move. A minute after you stop recording, you’ll have a complete and amazingly accurate 3D model that can be saved and shared.
The end results of this amazing combination of computational and imaging technologies are what today are known as “Digital Twins”: 3D models of real spaces and objects that can be explored from any angle and are expected to spread everywhere over the next 5-10 years. These true 3D captures contrast profoundly with the work of Lytro, which spent the last decade trying to convince people that the future of photography was in refocusable still images, but never gained traction with businesses or consumers. Instead, phones are now creating photorealistic scans of entire rooms.
Imagine the possibilities for truly 3D photography: You could create a 3D version of your home or office and use it to remotely control lights, appliances, TVs, or computers from afar, use it as a virtual gathering place for socially distanced meetings, or turn your favorite real world space into the backdrop for a game. Further 3D scanning applications for both augmented reality and human avatar creation are already in the process of being tested all over the world. We’re just beginning to see where this new 3D photographic tech will take us, and the fact that it will be in tens of millions of devices within a very short period of time hopefully means that it won’t just be ignored by developers.
Big sensors and next-generation computational photography
When Qualcomm announced the Snapdragon 865 last year, one of the most eye-catching selling points was support for big camera sensors: 100 and 200 megapixels, compared with the 12 megapixels commonly found on iPhone cameras. Qualcomm and its partners predicted a rise in super-zoom level detail, such that a phone camera would capture the same amount of detail with 1x zoom as a larger standalone camera might gather with a 10x or 20x zoom lens. Xiaomi and Samsung subsequently shipped 108-megapixel cameraphones, but the 200-megapixel sensor hasn’t yet made its way into a device.
These high-resolution sensors are enabling smartphone cameras to compete with the most expensive DSLR standalone cameras on pixel quantity. But improving pixel quality matters, too, and that’s the direction Apple took this year.
The iPhone 12 Pro Max features the physically largest camera system yet in an iPhone, not because there are more pixels being captured, but because larger pixels enable the same-sized image to be more true to life than before. Instead of increasing the camera’s resolution from 12 megapixels, Apple increased each pixel’s size to 1.7 microns, compared with the 0.8-micron size of Samsung’s 108-megapixel sensor. In other words, four Samsung pixels can fit in the same square space occupied by one of Apple’s — a difference that lets Samsung increase detail, while Apple works to improve light sensitivity and color accuracy.
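To make the pixel-size comparison above concrete, here is a minimal Python sketch of the arithmetic. It assumes square pixels whose side length equals the quoted micron figure (1.7 and 0.8); the function names are illustrative only and do not come from Apple or Samsung.

```python
import math


def pixel_area_um2(pitch_um: float) -> float:
    """Area of one square pixel in square microns (assumes square pixels)."""
    return pitch_um ** 2


def area_ratio(large_pitch_um: float, small_pitch_um: float) -> float:
    """How many times larger the big pixel's area is than the small pixel's."""
    return pixel_area_um2(large_pitch_um) / pixel_area_um2(small_pitch_um)


def whole_pixels_that_fit(large_pitch_um: float, small_pitch_um: float) -> int:
    """Number of whole small pixels that tile inside one large pixel."""
    per_side = math.floor(large_pitch_um / small_pitch_um)
    return per_side ** 2


# Pixel sizes quoted in the article, in microns.
APPLE_PITCH_UM = 1.7
SAMSUNG_PITCH_UM = 0.8

ratio = area_ratio(APPLE_PITCH_UM, SAMSUNG_PITCH_UM)
whole = whole_pixels_that_fit(APPLE_PITCH_UM, SAMSUNG_PITCH_UM)

print(f"Area ratio: {ratio:.2f}x")
print(f"Whole small pixels per large pixel: {whole}")
```

Under these assumptions, each larger pixel covers about 4.5 times the area of a smaller one, and four whole smaller pixels fit inside it. Real sensors differ in ways this ignores, such as pixel binning and microlens design, so the ratio is a rough guide to light-gathering area rather than a measure of image quality.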
In Apple’s flagship iPhone 12 Pro Max, this enables 2D images to look even cleaner than they did before — virtually indistinguishable from DSLR quality under common lighting conditions, and with noticeably less noise and grain than rival solutions (or older iPhones) in low light. During my testing, I was impressed by the increasing ratio of usable, clean photos to blurry ones, which was the ultimate result of combining bigger lenses, better light gathering, and automatic AI-assisted selection of the best image snapped during the split-second exposure process.
But even the smaller iPhone 12 Pro, which doesn’t include the big camera system, benefits from a number of improvements, particularly on the software side. Apple has introduced Smart HDR 3, which uses machine learning and image segmentation to determine proper exposures for separate parts of an image, such as skies and landscapes, across multiple instantly-shot but different exposures of the same image. It also uses Deep Fusion to composite the sharpest details from those exposures into the final photograph.
Very little of this is possible on traditional cameras, and some professional photographers might argue that the physics-defying “mad science” of computational photography is cheating, particularly as it becomes leagues smarter and faster than the humans whose photographs trained it. Google, Qualcomm, and Apple seem to be wholly unconcerned, as the captures their products produce continue to improve in perceptible quality every year while conventional cameras remain largely stuck where they were five years ago. Apple is even catering to professionals now with a new image format called ProRAW, enabling granular tweaking of augmented RAW images shot by iPhone 12 Pro and Pro Max cameras, including the ability to turn off or adjust AI-based adjustments.
The end result is clear: Whether you choose an Android phone or iPhone, the need for huge standalone cameras is evaporating. And the concept of conventional 2D photography, where a photographer tries to make the most of whatever the camera’s shutter happens to expose during a single click, is rapidly becoming obsolete.
4K video, now with Dolby Vision HDR
When Qualcomm showed off the Snapdragon 865 last year, it claimed the processor was the first to live-capture Dolby Vision 4K HDR footage — video with professional cinema-quality color, brightness, and shadow detail. But as is Qualcomm’s burden as a chip supplier, truly being “first” required a partner company to manufacture a smartphone with Snapdragon 865, the right camera, and the right software. That hasn’t yet happened in 2020.
As it turned out, Apple actually was first to commercialize Dolby Vision-recording cameraphones using a completely different chip: A14 Bionic. In October, Apple proclaimed that the iPhone 12 Pro and Pro Max were the “first camera[s] ever to record in Dolby Vision,” capable of not only 10-bit HDR recording, but also live editing and simple streaming to Apple TV 4K devices and smart TVs with AirPlay 2 support. Creating Dolby Vision previously was a post-production process that required standalone computers, but the A14 Bionic’s speed enables it to happen as the video is being filmed, at up to 60 frames per second.
Apple’s implementation is far from perfect. Videos created with Dolby Vision are color-accurate on iPhone screens, but don’t seem to look quite right when uploaded to third-party video services or displayed on some television sets, as the required Dolby Vision software version is newer than what’s included with most HDTVs. Trying to share videos on Instagram, for instance, will result in colors that look either faded or blotchy, rather than as saturated or richly detailed as intended. Until that gets sorted out, some — perhaps even most — iPhone 12 users may prefer to record without Dolby Vision HDR.
Granted, the average person mightn’t need or even care about creating home movies that look as good as Hollywood blockbusters. Under some circumstances, the differences won’t be noticeable. But in situations where there are extreme variations between light and dark within a single frame — bright highlights, dark shadows — or tons of colors that might otherwise be rendered as blotchy, with inaccurate, unrealistic details — HDR cameras will offer superior nuance, and raise the floor for smartphone videography. The fact that they’re already doing it at high resolutions and frame rates bodes well for future cameraphones, as well as support across multiple televisions.
AI as a quiet enabler, and resulting camera accessory evolution
It’s hard to overstate AI’s impact on photography. Many of the innovations above are being made possible by neural cores in smartphone chips, powered by machine learning that enables the cameras to aggressively recognize (or “segment”) elements in photos, videos, and 3D scans so that the end user output looks optimal. Even common accessories such as selfie sticks are gaining AI powers: Zhiyun’s Smooth XS gimbal uses a combination of motors and AI software to automatically change the position of a smartphone’s video or photo camera based on the movement of tracked subjects, going beyond the capabilities of DJI’s similar Osmo Mobile 3 at a much lower price.
On the other hand, conventional photography accessories, such as Moment’s series of excellent add-on lenses and their supporting software, are still in the process of being rejiggered to accommodate the latest changes to Apple’s devices. The substantial-feeling metal and glass lenses now have to contend with the iPhone 12 Pro’s and Pro Max’s now-separate camera systems, which required the company to produce both new cases, shown below, and “drop in” lens mounts that are still in production. They’re expected to be released in December.
Interestingly, Apple’s decision to use a Lidar-based autofocus system for the new iPhones — a feature that is designed to help a conventional camera focus faster and more accurately by augmenting its pixel-based perceptive abilities with depth understanding — could also present challenges for add-on lens makers. Obstructing the Lidar scanner with a lens could stop the autofocus from working properly, requiring Moment’s software to disable Lidar and use an alternative means of focusing.
It remains to be seen how some of these small issues will be resolved over the next year, as well as what larger improvements we should expect to see from the next generations of Android phones and iPhones. We’ll likely get a preview of some of 2021’s technologies at Qualcomm’s next Tech Summit in early December, though we’ll then have to wait and see which companies actually adopt the latest Android technologies over the next year. However it shakes out, I’m excited to see what future chips and camera sensors will make possible, and hope you’ll follow along for all the news as it breaks heading into the next year.