Text-to-video (API available)
Basically Stable Diffusion for videos
Hi folks! 👋🏻 This is The Prompt, the Indiana Jones of AI:
We uncover valuable insights that others have missed. ⚱️
Today's exciting finds:
Fine-tune your videos
Audio is all you need
ChatGPT enters Congress
GPT as a backend
+ more
Create & Fine-tune videos ⏯
The Tune-A-Video model was released last year, and it can learn to generate videos after being fine-tuned on only one text-video pair.
The tech builds on pretrained text-to-image models and adapts them for video generation.
So, basically Stable Diffusion/DreamBooth for videos.🎥
This weekend they released the official implementation, and you can find an open API on this Replicate link.
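If you'd like to poke at the API from code, here is a minimal sketch, assuming the model is served through Replicate's standard predictions endpoint. The version ID and the `prompt` input field are placeholders (hypothetical, not taken from the model page), so check the Replicate link for the real values before running anything.

```python
import json
import urllib.request

# Replicate's general-purpose endpoint for starting a prediction.
REPLICATE_URL = "https://api.replicate.com/v1/predictions"

# Hypothetical placeholder: copy the real version ID from the model's Replicate page.
MODEL_VERSION = "<tune-a-video-version-id>"


def build_prediction_request(prompt, token, version=MODEL_VERSION):
    """Build (but do not send) an HTTP request that starts a prediction."""
    payload = {
        "version": version,
        # Assumed input schema: the actual model may expect different field names.
        "input": {"prompt": prompt},
    }
    return urllib.request.Request(
        REPLICATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_prediction(prompt, token, version=MODEL_VERSION):
    """Send the request and return the decoded JSON response."""
    request = build_prediction_request(prompt, token, version)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `start_prediction("a panda surfing", token)` with your own API token would kick off a job. Video generation takes a while, so expect to poll the returned prediction for the finished output rather than get it back right away.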
Text-to-music/audio papers hit the right note 🎵
In just 3 days, we got 5 papers that outperform previous models like Riffusion (which we covered here).
These new papers do so many new things:
📝 Text → 🎵 (different kinds of music/sounds/humming -- all of it)
📸 Image → 🎵
🎥 Video → 🎵
🖌️ Inpainting → 🎵
All links are included in the "Latest AI papers" section below👇🏻
📚 Learning lounge
[Short article] Overview of GPT-as-a-Backend
🪀 Top Headlines
🛼 Toolbox
Steamship: Build and deploy Prompt APIs in seconds 🤯
ScribePod: 1.5 hours of dialogue about ML papers
Text2SQL: Generate SQL with AI
MakeLog: Automate your change-log with GPT-3
🤓 Latest Audio/Music papers
1. Make-An-Audio: You can create audio from text, images, or videos. Plus you can do audio inpainting.
2. AudioLDM: You can generate speech, music, and everyday sounds from text descriptions (basically any sound you want). Great news - they will open-source this one!
3. Noise2Music: generate high-quality 30-second music clips from text prompts.
4. MusicLM: a model by Google that can generate high-fidelity music spanning several minutes from text descriptions. This model can also transform whistled and hummed melodies according to the style described in a text caption. Sadly, Google doesn't plan to release this model for ethical reasons.
5. Moûsai: generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Code on this GitHub link.
📸 AI Photo of the day
Pope getting some fresh ink
❤️ If you like The Prompt, and want to support my work:
Share The Prompt with a friend, and invite them to subscribe here.
Book an ad in The Prompt (reply to this email if you’re interested)
What'd you think of today's edition?
Thank you for reading! ✌🏼