Generative AI is the new camera
We can initiate AI with text prompts or by clicking a camera shutter. Why is the output of only one of them eligible for copyright?
I’m taking a break from my tour of copyright history as I head to Albuquerque to give the keynote speech about how to save the world with digital twins at the New Mexico Technology Summit.
Before I go, I’d like to share one recent development that illustrates how rapidly generative AI is influencing our thinking about copyright law even as it challenges some long-established norms.
Below you’ll find four seemingly unrelated items.
Next week, Apple will introduce a new generation of iPhones.
My interview with IP attorney Amir Ghavi on The Futurists podcast.
The Copyright Office is seeking public comment about Generative AI.
Om Malik wrote a dire prognosis for manufacturers of traditional cameras.
In this article, I will attempt to show how, together, all four point to intriguing possibilities.
Another new iPhone. Again.
Next week, in what will probably rank as the most under-anticipated product launch of the year, Apple will reveal the newest generation of iPhones. The company performs this perfunctory unveiling ritual every September, like clockwork.
Ten years ago, this news would have stirred up some excitement, possibly even a frisson of eager anticipation. In the early years of the smartphone, some fans waited outside of the AT&T store overnight to be the first to get their hands on the newest model.
Today we can’t be bothered to watch Tim Cook go through the motions of channeling Steve Jobs.
iPhone sales still account for 52% of Apple’s revenue. The form factor of the handset has not changed significantly in a decade, yet Apple always manages to persuade plenty of people to upgrade.
How do they manage to do that? For many users, it’s all about the camera.
According to rumors, the camera on the high-end iPhones that will be revealed next week will have 5X zoom capability, a sizable increase over the current 3X zoom.
Full disclosure: I am probably going to upgrade to the new iPhone Pro Max for the camera. (Go ahead, call me a sucker.)
Snark aside, I still think that there is room for Apple, or another smartphone maker, to astonish us. And I love to be astounded.
But the surprises are more likely to be found in the software, not the hardware.
My discussion with an IP attorney about Generative AI
I was reminded of the iPhone when I interviewed Amir Ghavi two weeks ago for The Futurists podcast that I host with my friend Brett King.
Amir is an IP attorney and a self-proclaimed “copyright nerd.” He currently is defending some of the leading generative AI companies in copyright infringement lawsuits.
On the podcast, Amir was even-handed and did not delve into the specifics of his clients or cases. Like any good attorney, he can see both sides of an issue. He presented a useful assessment of the current landscape of copyright issues that pertain to generative AI. It’s a good interview, and I hope you will listen to it here or on any podcasting platform.
Near the very end of the interview, at 0:54, I asked Amir whether artwork or written text that is generated by AI is eligible for copyright.
I expected him to refer to the guidance that was issued by the US Copyright Office in March which clarified that copyright is for human authors only, not for machines.
Background: On March 16, 2023, the US Copyright Office issued a new policy titled “Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence”.
That document sets forth the reasoning behind the decision to deny copyright to owners of works that were generated by an AI. The Copyright Office established a guideline that authors who register a work for copyright must disclose whether artificial intelligence was used to create it.
According to the policy, any part of that work that was generated by an AI will not be eligible for copyright. That’s because copyright is only for human authors, not machines or animals (yes, someone tried to obtain a copyright for work created by a monkey).
But Amir went considerably further than I anticipated. He pointed out that there is lively debate about the Copyright Office’s current standard of human authorship, and whether there is even textual support for that rule in the statutes.
After making that observation, he veered in an unexpected direction.
Amir mentioned the camera on the smartphone. He asked me to consider the difference between a photo taken by a smartphone camera and a text prompt for an image generated by AI.
Is there really any difference?
My gut instinct was “Of course, there is a big difference!” but that is not where Amir was heading.
Instead he wanted to make an entirely different point: artificial intelligence already comprises a significant part of the digital camera in your phone. When you click on the button to release the (virtual) shutter, AI instantly takes over the process.
The upshot? Clicking on your iPhone camera is the equivalent of writing a prompt for generative AI. In both cases, you do a modicum of original work, and the machine does the heavy lifting. You can get a copyright for the iPhone photo, but you cannot currently get a copyright for work generated by an AI system in the cloud.
At some point, probably soon, generative AI may be seamlessly integrated into your smartphone camera. When that happens, there will no longer be as much distinction between a camera and generative AI.
Generative AI is the new camera.
Does that statement seem right to you? Or does that seem preposterous?
A few months ago, I would have said that idea is preposterous. Now I am beginning to change my mind. I think that AI might be the new camera.
Lately I have been reading a lot (and on The Futurists podcast, hearing a lot) from folks like Amir who make the argument that generative AI is like a camera, at least in terms of copyright law.
Now the government seems to be listening to these arguments.
The Copyright Office has now invited the public to submit comments about generative AI: it seems they are trying to gauge the broader sentiment in an effort to tune the March guidance.
Why do IP attorneys make the comparison to a camera? Because when the camera was first invented in the 19th century, the prevailing opinion at the time was that it was not a truly creative tool. All the photographer did was point the camera and click.
The conventional wisdom at the time was that the camera did all the work to create the image.
There’s also an element of chance or unpredictability in photography. There are a lot of factors outside the photographer’s control: the layout of the scene or landscape, the angle of the sun, the time of day, the season, a passing cloud might cast a shadow, a person in the shot might turn away, any stray element might come between the lens and the image. These random factors were thought to affect the final product by lessening the control of the human author. This was an argument that photography was not eligible for copyright.
Today we see it differently. We understand that there are countless nuances in photography, each of which is available to the photographer as an artistic choice at the time that she decides to press the shutter on her camera: beginning with the selection of camera (and sensor if digital), the lens, exposure, focal length, framing, mise en scene, composition, contrast, colors, filters, not to mention arranging the elements contained in a particular shot or directing the people in a shot to act or pose a certain way.
Those are all artistic choices made by the photographer. They have a big influence over the final result. That is more than enough to satisfy the "modicum of originality" which is required to obtain a copyright.
Lo and behold, today a photograph is eligible for copyright.
The prompt is the creative act in GenAI
Something similar to the debate about photography seems to be afoot with generative AI.
Proponents of GenAI have maintained from the very beginning that the act of writing the prompt that drives the AI represents the modicum of originality that is required to qualify for copyright (which, by the way, is a very low bar).
Until recently, I was skeptical of this view, because in my understanding the user had zero influence over how the AI would interpret those commands. And in my personal experience, the output was unpredictable and uncontrollable.
You cannot reliably “steer” the AI towards a desired outcome. And you don’t always get the same result twice from a single prompt.
The Copyright Office seemed to be on the same page. Rejecting an attempt to register copyright for AI-generated artwork, the Office opined, “Rather than a tool that Ms. Kashtanova controlled and guided to reach her desired image, Midjourney generates images in an unpredictable way.”
But "predictability of output" is quite likely the wrong standard. As copyright nerds have pointed out, action painters like Jackson Pollock had little control over their output. Pollock most certainly couldn't predict the ultimate result when he began a painting, since the paint would splatter and blend and run and blur on the canvas beyond the artist's control. And yet Pollock and his peers of the 1950s had no difficulty obtaining copyright for these random works of art.
As explained in this article, the Copyright Office ought to be focused on the "originality of the input", not the "predictability of the output". Now, the author of that article happens to be the attorney who is representing Kristina Kashtanova, the artist who attempted to register the work generated by Midjourney. He is still pushing for the copyright. Not exactly unbiased testimony.
Still, he makes an excellent point. The input of the human is where the creativity that matters for copyright occurs, not the output of the tool (whether that tool is a paintbrush or camera or AI).
In the case of both the camera and action painting, the input is where the creativity happens, not the output (which is sometimes quite random).
And if the output from a point-and-shoot camera or a random paint splatter is copyrightable, then shouldn't the output for generative AI also be eligible?
Who is really taking the photo? You or the iPhone?
This is where we return to Amir Ghavi’s observation. As he explained, the distinction between generative AI and the camera in your smartphone is blurring rapidly.
Yes, of course, you still need to point your smartphone at a scene in the real world in order to capture an image, whereas with generative AI you must type in a text command. That is indeed different. But both represent the human input in the equation.
But what happens after the human user snaps the photo or submits the text prompt? In both cases, something quite similar happens inside the machine whether it is a smartphone or a generative AI hosted in the cloud: algorithms go to work after the human input is done. In both cases, those algorithms greatly influence the output.
You might say that the algorithms determine the output.
In the case of the smartphone, the algorithms inside your phone do a lot of work to improve the image. They apply image stabilization, noise reduction, sharpening, sometimes filters and adjustments for low lighting or backlighting. These make a huge difference to the final photo, generally for the better.
As a user, I have zero influence over those things (unless I decide to manually override the settings, which very few users do).
In fact, I have even less influence than I think I have over my iPhone camera.
AI is baked into the silicon of the iPhone. For a detailed explanation, read this blog post about panoptic segmentation by the developers at Apple who used the Detection Transformer as the architecture for the deep neural network powered by the Apple Neural Engine coprocessor on your iPhone.
AI is responsible for many of the iPhone’s signature touches. The fake bokeh feature introduced in iPhone’s Portrait Mode is the result of an AI technique that relies on Apple’s patented depth map to generate the illusion of depth of field on an image captured by a shallow lens that couldn’t possibly generate real bokeh effects. The user has no control over the bokeh effect.
In fact, the pictures that end up in your Photo Album might not even be taken by you at all. Apple uses a type of neural image processing called Deep Fusion to capture a total of nine images, including two sets of four which are taken by the AI before you even click on the camera button. Then it takes one more longer exposure when you press the button.
The Neural Engine, a machine learning system, selects the best image from the bunch. Not you. You are not even aware that this happens.
For some purists, this much AI is way too much. They decry Apple’s push to supplant the DSLR, asking “Has your iPhone camera become too smart?”
GenAI is unpredictable but that won’t stop people from attempting to master promptcraft
In the case of generative AI, the process is the same. A set of algorithms goes to work after I type in my prompt, just as it does when I press the virtual shutter button on my iPhone.
Similar to the smartphone camera, I have no influence or control over those algorithms. In many cases I cannot predict accurately the range of effects or styles that will be applied by the machine.
Often, when I use GenAI tools, it feels like playing scratch-off lottery tickets. You only find out what you got when the result is revealed. Feel lucky, punk?
Conclusion: "originality of the input" seems like it could emerge as the standard for copyright, not "predictability of the output".
A growing consensus around the idea that generative AI is the new camera.
Call it a "camera-less camera". Unlike a traditional camera, AI is not generating images by capturing light reflected from a scene in the real world. But, much like the smartphone camera, the AI is applying a lot of computational power to manipulate the human input and generate a result.
First of all, let's take a moment to appreciate this modern marvel of computation. A computer that can create an image in seconds from a few words. Impressive.
And what happens inside your smartphone is at least an equally impressive feat of computational photography. In both instances, software is doing most of the creative work.
Now consider the inevitable next step, which is the merging of the two functions. When smartphone cameras include a "generative AI enhancement" feature, the user will have the ability to improve their photos with entirely fake elements created by AI.
Today, if you want to manipulate the photos you took on your camera with GenAI, it’s possible but it requires a cumbersome multi-step process. It is pretty easy to imagine that, soon, generative AI will just be another feature or option embedded in the camera app in your smartphone. There are already plenty of weak AI filter apps that do that.
Which means that the human input for Generative AI will be the same as it is for a smartphone camera, just snapping a picture.
Makes me wonder about the race to add ever-better sensors and lenses to smartphones. When you buy a top-of-the-line smartphone, a big part of what you are paying for is the camera, meaning the sensor and the lens. That adds at least $300 of cost on top of a standard smartphone, basically the equivalent in price of buying a pretty good digital camera.
Many people are willing to pay the extra $300 for the convenience of having a camera-equivalent sensor and lens in the smartphone in their pocket, rather than in the DSLR stuffed in a drawer at home.
But what if you could use a standard-issue sensor and lens, and then send the image to the cloud for a digital upgrade via generative AI? In that case, you might be able to use a lower-resolution photo taken on a basic smartphone as the input, and the AI would do the rest, including upscaling the image to higher resolution.
If that happens, maybe we won’t need to upgrade our phones as often as we do today. If we have the ability to upgrade the image processing software, we might be able to squeeze a little more lifespan out of an aging smartphone.
I see adding generative AI as a potentially appealing upgrade to the camera of the near future. Limitless creative potential, because you’ll basically be starting with the photo as a prompt and then cranking out endless iterations, perhaps veering off into some surprising directions with GenAI enhancement.
Purists who love photography will decry it, but they have been complaining since the days of Photoshop and Digital Darkroom, and that had no effect. The purists have probably been resisting change since the Brownie camera was introduced.
The dog may bark but the caravan moves on.
What about real cameras?
Om Malik has a grim prognosis for the companies that manufacture traditional cameras. He believes that the old school camera makers will soon be dead.
His main reason? Hardware companies are terrible at building usable software. In addition to being a longtime tech industry analyst, Om is a professional photographer, so his critique hits home.
Om’s article made me think further about the camera's relationship to the smartphone.
As a longtime fan of Sony cameras, I must admit I agree with Om. My two RX100 cameras are in a drawer someplace in my house. I gave away my big DSLR to my niece who appreciated its retro style. It is a hassle to carry cameras. Syncing them with the computer is another hassle.
I probably still have photos from before the pandemic that I haven’t yet transferred to my MacBook sitting on the memory card of one of my cameras.
The last two or three iPhones were so good that I lost the urge to carry my digital camera anymore, even when I am traveling. Especially when I am traveling! Old school cameras are too bulky, too heavy, they require special cables, and the process of exporting images to the computer is cumbersome.
It is easier to use the smartphone, even if the lens is not quite as good.
In addition to a large brilliant display and low bulk, one thing that makes the smartphone superior to the old school camera is software.
Image processing software works unobtrusively to turn a mediocre photographer into a good one. Most people do not even notice it; they just feel good about the quality of the images that they can shoot. Many assume that they have great skill.
The race between the phone and the camera companies is all about software now, and the camera makers are losing because they are terrible at software. Just like previous hardware companies (consumer electronics firms, PC makers, even car manufacturers).
This makes me wonder what the future of photography will be all about. I expect it to be defined by software. Beyond the current notion of “computational photography.”
What is a "software defined camera"?
We are already getting a glimpse of what a “software defined document” might be. Generative AI is reshaping our understanding of what a "document" is. It's no longer fixed and immutable.
We are moving from the 500-year-old notion of a "fixed" document printed on paper to something like a "procedural" document written in software that can evolve depending upon outside circumstances.
Some journalists have already begun to experiment with "procedural" journalism whereby the article on a web site may continue to evolve long after it was written. This represents a pretty significant break with our understanding of authorship, authority, fixed media, and the trustworthiness of print.
Will we also have "procedural" photos?
Will our future photography produce images that evolve and respond to viewers, brought to life by hidden links to generative AI systems in the cloud?
Midjourney and Stability may be evolving into some kind of virtual camera, maybe a software defined camera, or a camera without any physical presence at all. The vaporized camera.
I’d be surprised if Apple does not someday introduce a new kind of “live” photo that is endlessly animated with AI. It’s already manipulated and filtered by AI, so why not take the next step?
This is new territory. It's fun to explore it.
All images in this newsletter are generated by Midjourney. As such, they are not currently copyrightable.
I think that the sense of ownership of intellectual property derives from deep features of consciousness and personal identity that are very real but also esoteric and difficult to discuss.
Rather than being driven by explicit decisions that a photographer makes, or is assumed to make, I think that the photographer's proprietary connection to their photographic image is a reflection of a moment in shared conscious experience - an aesthetic-participatory phenomenon in which something of nature or humanity is revealed through the sensitivity and motivation of the photographer. The photographer is showing the audience what they saw, or could have seen if they were able to see as quickly as a camera can register physical photosensitivity.
The photographer succeeds when they capture a moment that is rich in visible Significance, or said another way, a moment where Significance (aesthetic-participatory 'saturation') is presented visibly, making the ephemeral moment and the significance of it potentially eternal.
What I'm trying to get at is that I think that the proprietary essence of the artist is hinted at holographically in their art. A single tiny gesture can reveal it if the audience is familiar with their body of work. In the past, that might be called a soul, but I think that AI (among other things) is pushing us to really understand what that word means in scientific or at least legal terms.
Before the digital era, a person's signature was used because the nuances of the gesture made with ink on paper were so subtle and seemingly unique that they were not easy to forge perfectly, and also because the relationship between a person and their own name has a particular significance. Graphology was quite popular in the past for that reason, even though it is now popularly held to be pseudoscientific and unreliable.
The questions that AI brings up for me relate to the ability of generated content to carry that esoteric, gestural, holographic quality of personhood. Looking at something like a Rothko painting, there seems to be some numinous quality that comes through even in the supremely simple-seeming presentations.
In the case of an AI prompt, what seems to be happening is that while the invention of the prompt itself can carry some proprietary qualities, most of the value actually comes from the larger context of shared consciousness. At this point it seems that some people are just especially 'good at it'. In the hands of an artistic person, AI can act as a camera does for a photographer, but instead of capturing an intimate fragment of direct relationship to nature, there is an intermediate Baudrillardian/simulacra layer of data that is ultimately extracted, often unwittingly, through public and private surveillance. The "light" of the AI camera does not belong to nature but to human beings, both in the public community and in the ranks of developers and engineers who have built it, often for commercial or other politically charged purposes.
It is hard to imagine any kind of satisfactory solution to the AI/IP legal tsunami. If I had to guess, I would think that whoever is willing to spend the most money or bring the most attention to their individual case will come out ahead for the foreseeable future. Our legacy legal and political systems are not equipped to deal with the politics of human consciousness as it actually is, but I think that is what is ultimately necessary. We need a 21st century Manhattan Project on the intersection of consciousness and mental health, technology, money, and politics.