Everything I'm Learning About AI Filmmaking
This also appears as a Medium post, which includes some video examples.
First and foremost, let me clarify that I am not a filmmaker. I have no illusions of making it big in Hollywood or directing commercials for major brands. My background lies in Web Development, UX Design, and Graphic Design education. The last time I dabbled in video was back in 2004 when I purchased an expensive prosumer video camera. However, I quickly gave up when I couldn’t figure out how to use it. Fast forward 20 years, and I have acquired some new skills, making it easier for me to learn new things. With that said, let’s jump into my journey of understanding AI filmmaking.
TL;DR: Claude >> Midjourney >> Pika Labs >> Eleven Labs >> Google Test Kitchen
When you have money: HeyGen, Topaz Labs
Starting from scratch, I began by learning the basics of shooting, lighting, and familiarizing myself with the terminology. YouTube became my go-to resource for this initial learning phase. Next, I delved into video editing. I started with Premiere Rush, a simpler consumer version of Adobe Premiere. Although it took me embarrassingly long to grasp, it provided a solid foundation for me to transition to the full version of Adobe Premiere. Additionally, I taught myself how to use DaVinci Resolve, thanks to YouTube once again. However, as these are professional tools that I don’t require for every project, I eventually settled on CapCut, a more accessible and simplified editing software that suits my needs. While I am aware that CapCut is developed by ByteDance, the owners of TikTok, and some may argue it contributes to the downfall of civilization, I’m really enjoying it.
In June 2023, I started using RunwayML and their Gen-1 video generation tool. RunwayML actually offers a range of photo and video tools on their website, although they are not often mentioned. Armed with some prompts, I used up a significant amount of credits to mimic what the cool kids were doing on Twitter. I saved my precious 4-second clips to my desktop until I had a collection of eight. Then, I added AI-generated music (thanks to Google Test Kitchen) and stitched them together in CapCut. Exciting as it was to generate video with AI, the quality reminded me of the early days of Midjourney and DALL-E 2. In other words, it was not great.
However, my interest was piqued when I learned about Gen-2, which enabled image-to-video generation. After reading a few Medium articles and watching YouTube videos, I discovered that I could generate cinematic stills in a 16:9 aspect ratio using Midjourney, and then upload them to Runway Gen-2. Some of the results looked awesome, but not consistently. So, I kept uploading and re-rolling until I had enough that looked great. Although I provided ratings and feedback on each image to help the developers refine the process, I had no control over how the image was animated. I simply accepted what I received.
A month later, feeling somewhat disillusioned, I moved on to Kaiber. It seemed like the place where the cool kids hang out, with its vibrant and psychedelic visuals. The process was similar to Runway, but with a few more controls. I generated an image in Midjourney, uploaded it to Kaiber, wrote a prompt, selected from existing styles, chose camera movements (zoom, pan, etc.), and hit ‘Generate’. While the initial results were exhilarating, the novelty wore off quickly. Everything started to resemble a bad acid trip, and despite an ‘Intensity’ control slider, I couldn’t prevent my characters from sprouting extra heads and hands.
My path took a turn for the better when I discovered Pika Labs. Even though I had to work in Discord, the quality of the output was significantly higher. I also began using ChatGPT to write my scripts, which I then lightly edited and pasted into Eleven Labs for voice-overs. As RunwayML introduced new features, Pika Labs kept pace while maintaining superior quality. Currently, Pika supports camera movements such as pan and zoom, a degree of motion (ranging from 1 to 5, with varying results), aspect ratio adjustments, and the ability to superimpose text into videos. For a complete guide to Pika’s syntax, click here.
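To give a flavor of what prompting Pika in Discord looks like, here is a rough sketch based on my own sessions; the exact flag names (-camera, -motion, -ar) may have changed since I wrote this, so treat it as illustrative rather than a definitive reference:

```text
/create prompt: a lighthouse on a sea cliff at dusk, cinematic lighting, 35mm film
  -camera zoom in -motion 2 -ar 16:9
```

Everything after the prompt text is an optional flag: the camera move, the degree of motion, and the aspect ratio of the resulting clip.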
Needless to say, YouTube has played a crucial role in my progress. Here is a shortlist of YouTube content creators who have been immensely helpful in my journey to become an AI filmmaker:
Theoretically Media: I have been following Tim for about six months, and his extensive knowledge of audio and video production shines through as he effortlessly references both technical and cinematic terms. Moreover, he is always hunting down and test-driving new tools, providing clear and concise instructions on how to use them.
Curious Refuge: Shelby and Caleb are the talented masterminds behind the viral ‘Wes Anderson style Star Wars and LOTR trailers’ from last spring. I didn’t realize this at first, as I was just watching their tutorials, but I’ve since learned that they are accomplished filmmakers who also run an AI film school. Their channel never fails to inspire me.
The Reel Robot: Dale offers a wealth of information that fills the gaps in my understanding of cinematography, while quickly getting to the point of using new AI tools. He demonstrates how ChatGPT can help maintain consistency with camera angles, color grading, and film types when writing scripts and prompts, and he clearly connects all the dots.
Matt Wolfe: Although Matt does not exclusively discuss AI filmmaking, he frequently touches on the topic, along with a dozen other AI-related subjects. His channel has truly expanded my horizons regarding what is possible with AI content creation, and it was his content that initially sparked my interest in this field.
While I don’t always use Adobe Premiere, especially for smaller experiments, I am eager to return to it and After Effects once I feel ready to create more polished projects. Adobe has been incorporating a range of new AI features, and I am excited to explore them. In the meantime, I am learning new tricks from Premiere Gal, who is always at the forefront of the latest developments.
My current workflow consists of Claude >> Midjourney >> Pika Labs >> Eleven Labs >> CapCut. However, I want to mention a couple of additional tools. Decoherence is a browser-based tool that produces videos slightly better than Kaiber, with a psychedelic-stable diffusion vibe. It also offers a timeline editor where you can drag and drop clips, adjust their length, add audio, and more. Although it is superior to RunwayML, it falls just short of being a truly great tool. I would use it too, but I have already exceeded my budget for testing out various tools.
On the topic of budget, when the time comes to invest in a lip-sync tool, I will likely choose D-ID if I’m on a tight budget or HeyGen if I’m fully committed. However, convincing lip-syncing is not yet perfected, so I am hesitant to use it for what are still experimental projects. Based on the current pace of progress, I anticipate that my skillset and these tools will be ready for prime time by early 2024. Most of these tools spit out mediocre video where the dimensions are too small or the frame rate is too low. If you want to increase your frame rate, Descript is an excellent free option, and Veed is a great online tool when you’re ready to spend a little money.
RunwayML also solves some of these problems if you’ve decided to buy into their ecosystem. However, once the tools get a little better and I get a little more skilled, I’m going to spend $250 on a copy of Topaz Video AI. It upscales, it interpolates, it just makes everything look better. And all the cool people I read and watch keep mentioning it.
Now, you may wonder, why am I embarking on this journey, and what’s the purpose? I strongly believe that AI-generated videos will become huge in 2024, and right now, it is as thrilling as the internet was back in 1994. My goal is to gain a deep understanding of these tools and provide my students with a competitive advantage. Perhaps I’ll even create a few short films for my own enjoyment.