Text-to-video (API available)
Basically Stable Diffusion for videos
Hi folks! This is The Prompt, the Indiana Jones of AI:
We uncover valuable insights that others have missed.
Today's exciting finds:
Fine-tune your videos
Audio is all you need
ChatGPT enters Congress
GPT as a backend
+ more
Create & Fine-tune videos

The Tune-A-Video model was released last year, and it can generate videos after being fine-tuned on only one text-video pair.
The tech is based on pretrained text-to-image models, adapted for video generation.
So, basically Stable Diffusion/DreamBooth for videos.
This weekend they released the official implementation, and you can find an open API on this Replicate link.
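If you want to try the API from code, here is a minimal sketch in Python. It assumes Replicate's general HTTP predictions endpoint and Bearer-token auth; the version id, the token, and the `prompt` input name are placeholders (hypothetical, not taken from the Tune-A-Video model page), so check the model's own schema before running anything for real. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Replicate's general endpoint for starting a prediction.
REPLICATE_PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(api_token, model_version, prompt):
    """Build (but do not send) a POST request that would start a prediction."""
    payload = {
        # Placeholder: copy the real version id from the model page.
        "version": model_version,
        # Assumed input name; the real model may expect different fields.
        "input": {"prompt": prompt},
    }
    return urllib.request.Request(
        REPLICATE_PREDICTIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_prediction_request(
        api_token="YOUR_REPLICATE_API_TOKEN",  # placeholder
        model_version="MODEL_VERSION_ID",      # placeholder
        prompt="a rabbit is surfing, cartoon style",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually start the job, pass `request` to urllib.request.urlopen(...).
```

Keeping the request-building step separate from the network call makes it easy to inspect the payload before spending any API credits.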
Text-to-music/audio papers hit the right note
In just 3 days, we got 5 papers that outperform previous models like Riffusion (which we covered here).
These new papers do so many new things:
Text → music/audio (different kinds of music/sounds/humming, all of it)
Image → music/audio
Video → music/audio
Inpainting → music/audio
All links are included in the "Latest Audio/Music papers" section below.
Learning lounge
[Short article] Overview of GPT-as-a-Backend
Top Headlines
Toolbox
Steamship: Build and deploy Prompt APIs in seconds
ScribePod: 1.5 hours of dialogue about ML papers
Text2SQL: Generate SQL with AI
MakeLog: Automate your change-log with GPT-3
Latest Audio/Music papers
1. Make-An-Audio: You can create audio from text, images, or videos. Plus you can do audio inpainting.
2. AudioLDM: You can generate sound, speech, and music with text descriptions. This model can also generate other everyday sounds from the text description (basically any sound you want). Great news - they will open-source this one!
3. Noise2Music: generate high-quality 30-second music clips from text prompts.
4. MusicLM: a model by Google that can generate high-fidelity music spanning several minutes from text descriptions. This model can also transform whistled and hummed melodies according to the style described in a text caption. Sadly, Google doesn't plan to release this model for ethical reasons.
5. Moûsai: generate multiple minutes of high-quality stereo music at 48 kHz from textual descriptions. Code on this GitHub link.
AI Photo of the day
Pope getting some fresh ink
If you like The Prompt, and want to support my work:
Share The Prompt with a friend, and invite them to subscribe here.
Book an ad in The Prompt (reply to this email if you're interested)
Thank you for reading!