I knew they were coming, and I’d pledged to stop using superlatives like “mind blown” and “game changer,” but Pika Labs 1.0 and Midjourney V6 entered my world this morning and I haven’t been the same since. I’ll try to spend some time with my family in the days leading up to Christmas, but to be honest, these are a monumental distraction. I’ll give a brief rundown here and a full report after the holiday.
Pika Labs 1.0 is Web Based, Easy to Use, and Still Free!
If you’re excited about generative video, get on the waiting list at pika.art so you can start using it before they start charging for it. You can generate video from a text prompt, an uploaded image, or an uploaded video. While many of my initial generations were awful, the 3-second clips rendered quickly, and extending them to 7 seconds was effortless. The controls are clear and intuitive, you can easily upscale to 2560 x 1440, and the quality is stunning. Everything you see below is pure text-to-video, with no Midjourney images like in previous projects.
Midjourney V6 has Arrived with Text Rendering
It’s by no means perfect yet and requires a lot of trial and error, but then again, it’s marked as an alpha release. Be prepared to prompt and re-prompt a lot. The text rendering isn’t as good as Ideogram’s…yet. To get started, enter /settings and switch to V6.
You’ll also probably (depending on what you’re seeking) see a big jump in quality. Photographic renderings are frequently stunning and often indistinguishable from real photos. As noted Instagram user worlds_wurst_dad has pointed out:
v6 is not perfect with text but it's a great start, but the images also got a major upgrade with details... some of them are very difficult to tell from non-AI images.
There is now a whole new prompting syntax to learn, but it is simpler and more straightforward. As Midjourney CEO David Holz puts it:
“Prompting with V6 is significantly different than V5. You will need to ‘relearn’ how to prompt.
V6 is MUCH more sensitive to your prompt. Avoid ‘junk’ like “award winning, photorealistic, 4k, 8k”
Be explicit about what you want. It may be less vibey but if you are explicit it’s now MUCH better at understanding you.
If you want something more photographic / less opinionated / more literal you should probably default to using
--style raw”
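To illustrate Holz’s advice, here’s a rough before-and-after sketch. The subject and exact wording are my own invention, not from Midjourney’s documentation; the point is dropping the “junk” modifiers and being explicit:

```
# V5-style prompt, padded with the "junk" V6 ignores or punishes:
award winning photo of a lighthouse, photorealistic, 4k, 8k

# V6-style prompt: explicit and literal, with --style raw for a more photographic, less opinionated look:
a white lighthouse on a rocky coastline at dusk, overcast sky, waves breaking below --style raw
```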
If you’d like the full report from VentureBeat, check it out.
RunwayML Adds Text to Speech Generation
Okay, I have mixed feelings about RunwayML. Text-to-speech is a huge leap forward on paper, and it could mean eliminating other audio tools from my roster, since audio and video generation could happen under the same URL. However, I generated a bunch of audio files and was not impressed with the quality, so I’m sticking with Eleven Labs. Their video quality, on the other hand, seems to have improved: I was really impressed with the results, and the Pan/Tilt/Zoom controls are excellent. I generated a few more video samples and BOOM, I was out of credits again. Still, I’m committed to getting better at this, so I’ll probably stock up again.
That’s it for today, December 22. Have a great Holiday. See you on the 26th!