Text-to-video (API available)
Basically Stable Diffusion for videos
Hi folks! This is The Prompt, the Indiana Jones of AI:
We uncover valuable insights that others have missed.
Today's exciting finds:
Fine-tune your videos
Audio is all you need
ChatGPT enters Congress
GPT as a backend
+ more
Create & Fine-tune videos

The Tune-A-Video model was released last year. It can generate videos after being tuned on just one text-video pair.
The tech builds on pretrained text-to-image models, adapted for video generation.
So, basically Stable Diffusion/DreamBooth for videos.
This weekend they released the official implementation, and you can find an open API on this Replicate link.
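If you'd rather call the hosted model from code than from the web page, here is a minimal sketch of starting a prediction through Replicate's HTTP predictions endpoint. Assumptions to flag: `MODEL_VERSION` is a placeholder (copy the real version ID from the model's Replicate page), the `prompt` input name is hypothetical (check the model's input schema), and `build_prediction_request` is just a helper name made up for this example.

```python
import json
import os
import urllib.request

REPLICATE_PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"

# Placeholder: copy the real version ID from the model's Replicate page.
MODEL_VERSION = "<model-version-id>"


def build_prediction_request(version, model_input, api_token):
    """Build (but do not send) a POST request that starts a prediction."""
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        REPLICATE_PREDICTIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    token = os.environ.get("REPLICATE_API_TOKEN")
    if token and not MODEL_VERSION.startswith("<"):
        # "prompt" is a hypothetical input name; the real model may differ.
        request = build_prediction_request(
            MODEL_VERSION, {"prompt": "a panda surfing a wave"}, token
        )
        with urllib.request.urlopen(request) as response:
            prediction = json.load(response)
        # Predictions run asynchronously: poll the returned prediction until it finishes.
        print(prediction["id"], prediction["status"])
    else:
        print("Set REPLICATE_API_TOKEN and MODEL_VERSION to run this sketch.")
```

The request is built separately from the network call so you can inspect the payload before spending any credits.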
Text-to-music/audio papers hit the right note 🎵
In just 3 days, we got 5 papers that outperform previous models like Riffusion (which we covered here).
These new papers do so many new things:
Text → 🎵 (different kinds of music/sounds/humming, all of it)
Image → 🎵
Video → 🎵
Inpainting → 🎵
All links are included in the "Latest Audio/Music papers" section below.
Learning lounge
[Short article] Overview of GPT-as-a-Backend
Top Headlines
Toolbox
Steamship: Build and deploy Prompt APIs in seconds
ScribePod: 1.5 hours of dialogue about ML papers
Text2SQL: Generate SQL with AI
MakeLog: Automate your change-log with GPT-3
Latest Audio/Music papers
1. Make-An-Audio: You can create audio from text, images, or videos. Plus you can do audio inpainting.
2. AudioLDM: You can generate sound, speech, and music with text descriptions. This model can also generate other everyday sounds from the text description (basically any sound you want). Great news - they will open-source this one!
3. Noise2Music: generate high-quality 30-second music clips from text prompts.
4. MusicLM: a model by Google that can generate high-fidelity music from text descriptions, with pieces that span several minutes. This model can also transform whistled and hummed melodies according to the style described in a text caption. Sadly, Google doesn't plan to release this model, citing ethical reasons.
5. Moûsai: generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Code on this GitHub link.
AI Photo of the day
Pope getting some fresh ink
❤️ If you like The Prompt, and want to support my work:
Share The Prompt with a friend, and invite them to subscribe here.
Book an ad in The Prompt (reply to this email if you're interested)
Thank you for reading!