Everyone knows the story of the first YouTube video, a grainy 19-second clip of co-founder Jawed Karim at the zoo, remarking on the elephants behind him. That video was a pivotal moment in the digital space, and in some ways, it’s a reflection, or at least an inverted mirror image, of today as we digest the arrival of Veo 3.
Part of Google Gemini, Veo 3 was unveiled at Google I/O 2025 and is the first generative video platform that can, with a single prompt, generate a video with synced dialogue, sound effects, and background noises. Most of these 8-second clips arrive in under 5 minutes after you enter the prompt.
I’ve been playing with Veo 3 for a few days, and for my latest challenge, I tried to go back to the beginning of social video and that YouTube “Me at the Zoo” moment. Specifically, I wondered if Veo 3 could recreate that video.
As I’ve written, the key to a good Veo 3 result is the prompt. Without detail and structure, Veo 3 tends to make the choices for you, and you usually don’t end up with what you want. For this experiment, I wondered how I could possibly describe all the details I wanted to draw from that short video and deliver them to Veo 3 in the form of a prompt. So, naturally, I turned to another AI.
Google Gemini 2.5 Pro isn’t currently capable of analyzing a URL, but Google AI Mode, the brand-new form of search that’s quickly spreading across the US, is.
Here’s the prompt I dropped into Google’s AI Mode:
Google AI Mode almost instantly returned a detailed description, which I took and dropped into the Gemini Veo 3 prompt field.
I did do some editing, mostly removing phrases like “The video appears…” and the final analysis at the end, but otherwise, I left most of it and added this at the top of the prompt:
“Let’s make a video based on these details. The output should be 4:3 ratio and look like it was shot on 8MM videotape.”
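For anyone who wants to repeat that cleanup step in code rather than by hand, here is a minimal Python sketch. The function name `build_prompt`, the list of hedging phrases, and the assumption that the description's last paragraph is the closing analysis are all mine for illustration; none of this is a Google API, and the result is simply text to paste into the prompt field.

```python
# Hypothetical helper for assembling a Veo 3 prompt from an AI-written description.
# Nothing here calls a Google service; it only prepares text.

HEADER = (
    "Let's make a video based on these details. "
    "The output should be 4:3 ratio and look like it was shot on 8MM videotape."
)

# Assumed examples of hedging phrases; adjust to whatever the description contains.
HEDGES = ("The video appears to show ", "The video appears to be ")


def build_prompt(description: str, drop_last_paragraph: bool = True) -> str:
    """Strip hedging phrases, optionally drop the closing analysis, prepend HEADER."""
    paragraphs = [p.strip() for p in description.split("\n\n") if p.strip()]

    # Assumption: the final paragraph is the closing analysis we don't want.
    if drop_last_paragraph and len(paragraphs) > 1:
        paragraphs = paragraphs[:-1]

    cleaned = []
    for paragraph in paragraphs:
        for hedge in HEDGES:
            paragraph = paragraph.replace(hedge, "")
        # Re-capitalize in case a hedge was removed from the start of the paragraph.
        cleaned.append(paragraph[:1].upper() + paragraph[1:])

    return "\n\n".join([HEADER] + cleaned)
```

Keeping the header as a separate constant makes it easy to change the aspect ratio or film-stock instruction without touching the description itself.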
It took a while for Veo 3 to generate the video (I think the service is getting hammered right now), and, because it only creates 8-second chunks at a time, it was incomplete, cutting off the dialogue mid-sentence.
Still, the result is impressive. I wouldn’t say that the main character looks anything like Karim. To be fair, the prompt doesn’t describe, for instance, Karim’s haircut, the shape of his face, or his deep-set eyes. Google’s AI Mode’s description of his outfit was also probably insufficient. I’m sure it would have done a better job if I had fed it a screenshot of the original video.
Note to self: You can never provide enough detail in a generative prompt.
8 seconds at a time
The Veo 3 video zoo is nicer than the one Karim visited, and the elephants are much farther away, though they are in motion back there.
Veo 3 got the film quality right, giving it a nice 2005 look, but not the 4:3 aspect ratio. It also added archaic and unnecessary labels at the top that thankfully disappear quickly. I realize now I should have removed the “Title” bit from my prompt.
The audio is particularly good. Dialogue syncs well with my main character and, if you listen closely, you can hear the background noises, as well.
The biggest issue is that this was only half of the brief YouTube video. I wanted a full recreation, so I decided to go back in with a much shorter prompt:
Continue with the same video and add him looking back at the elephants and then looking at the camera as he’s saying this dialogue:
“fronts and that is that is cool.” “And that is just about all there’s to say.”
Veo 3 complied with the setting and main character but lost some of the plot, dropping the old-school grainy video of the first generated clip. That means that when I present them together (as I do above), we lose considerable continuity. It’s like a film crew time jump, where they suddenly got a much better camera.
I’m also a bit frustrated that all my Veo 3 videos have nonsensical captions. I need to remember to ask Veo 3 to remove them, hide them, or put them outside the video frame.
I think about how hard it probably was for Karim to film, edit, and upload that first short video and how I just made essentially the same clip without the need for people, lighting, microphones, cameras, or elephants. I didn’t have to transfer footage from tape or even from an iPhone. I just conjured it out of an algorithm. We have truly stepped through the looking glass, my friends.
I did learn one other thing through this project. As a Google AI Pro member, I have two Veo 3 video generations per day. That means I can do this again tomorrow. Let me know in the comments what you’d like me to create.