This tool uses TTT-MLP (Test-Time Training with a multi-layer perceptron) to make text-to-video generation easy. It learns while generating the video, making the story flow better.
Think one-minute video clips with consistent characters and smooth action. It's a relief when you have multi-scene, story-like prompts: for those, it does a good job keeping scenes tight and the action smooth.
It was built by researchers from Stanford, NVIDIA, and Berkeley, and it's been shown to outperform older model types on coherence. Often you get natural animation in one take! (It's still a research proof-of-concept; there are still some quirks.)
How It Works
You start with a base model (CogVideoX 5B) that on its own can only make short, 3-second clips. Small MLP layers are added to that already capable model, and those layers keep learning during inference.
They learn at test time from self-supervised signals during inference. The video is processed in small segments (about 3 seconds each), and local attention handles each segment on its own.
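As a toy illustration of that segmenting, here's how a minute of frames might be chopped into 3-second windows for local attention. The frame rate and helper name are hypothetical, just to show the idea, not the paper's actual pipeline:

```python
# Hypothetical segmenting of a 60-second clip into 3-second windows,
# the granularity at which local attention operates in this sketch.
fps = 16                          # illustrative frame rate
frames_per_segment = 3 * fps      # 3-second windows

def split_into_segments(frames):
    # Chop a long frame sequence into fixed-size local-attention windows.
    return [frames[i:i + frames_per_segment]
            for i in range(0, len(frames), frames_per_segment)]

frames = list(range(60 * fps))    # stand-in for 960 decoded frames
segments = split_into_segments(frames)
```

Local attention then only ever looks inside one of these windows; something else has to carry context between them, which is where the MLP layers come in.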
The MLP layers handle the global story context. Because of that, characters stay consistent and motion stays smooth, even across the whole minute-long video.
There's a gate that mixes the learnable TTT part with the base model's output, so generation stays stable. And the whole thing runs on local GPU hardware, though not lightweight hardware.
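To make the test-time learning and the gate concrete, here is a minimal NumPy sketch of a TTT-MLP layer with a gated residual. This is not the paper's implementation: the shapes, learning rate, ReLU MLP, and token-reconstruction loss are all illustrative assumptions; the point is only that the inner weights take a gradient step during inference, and that a gate initialized at zero keeps the base model's output untouched until the TTT path proves useful.

```python
import numpy as np

# Toy test-time-training MLP (all sizes illustrative).
rng = np.random.default_rng(0)
d, h = 8, 16                      # feature dim, hidden dim
W1 = rng.normal(0, 0.1, (d, h))   # inner MLP weights, updated at test time
W2 = rng.normal(0, 0.1, (h, d))
alpha = 0.0                       # gate parameter, starts at zero for stability
lr = 0.05                         # inner-loop learning rate

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2

def ttt_step(x, W1, W2, lr):
    # One gradient step on a self-supervised inner loss:
    # reconstruct each token from its own features, L = 0.5*||mlp(x) - x||^2.
    hpre = x @ W1
    hrelu = np.maximum(hpre, 0.0)
    out = hrelu @ W2
    err = out - x                          # dL/dout
    gW2 = hrelu.T @ err                    # dL/dW2
    gW1 = x.T @ ((err @ W2.T) * (hpre > 0))  # dL/dW1 through the ReLU
    return W1 - lr * gW1, W2 - lr * gW2

x = rng.normal(size=(4, d))                # 4 tokens of one segment
loss_before = 0.5 * np.sum((mlp(x, W1, W2) - x) ** 2)
W1, W2 = ttt_step(x, W1, W2, lr)           # the layer "learns while generating"
loss_after = 0.5 * np.sum((mlp(x, W1, W2) - x) ** 2)

# Gated residual: base output plus a gated TTT contribution.
# With alpha = 0, the gate is closed and the base output passes through unchanged.
y = x + np.tanh(alpha) * mlp(x, W1, W2)
```

The zero-initialized gate is the stability trick: early on, the layer contributes nothing, so the base model's behavior is preserved while the inner weights adapt.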
What It Offers
You input a storyboard or a prompt, and it creates a one-minute animated video. Characters look consistent, motion is smooth, and the result feels logical and natural.
There isn't any editing, chopping, or stitching you have to do; it does everything in one shot. It's particularly good at multi-scene, cartoon-like video, Tom & Jerry-style animation.
The video holds together very well. You don't see jumpy, mismatched elements or bizarre flicker. The model adapts on the fly, essentially learning in real time as it generates. That's what makes it awesome.
Pricing Chat
In terms of pricing, there is none as far as the research goes: it's open, free-access code on GitHub, so there's basically no cost to try it out. If someone commercializes the software later, that might mean a free tier or pay-per-use access. For now, though, it's research code on a GitHub site, so it's very low cost.
Monthly Visitors
Their demo site probably gets a few hundred to low thousands of visitors a month. I'd put it at about 1,000 monthly visitors. Just a rough guess for a niche research demo.
Social Media Followers
There's no public social profile for this tool, but the Stanford and NVIDIA social accounts give it nice boosts. By and large, about 10,000 followers across all platforms likely see the posts.
So let’s say ~10,000 social followers total.
Pros:
Keeps characters the same over an entire minute.
Motion is fluid; it doesn’t have weird jumps.
It learns while generating, which is a genuinely fresh concept.
It manages multi-scene, story-like videos.
There is no stitching or manual work on top of generated content.
Cons:
Still a proof of concept; you may see small artifacts.
Lighting or motion can wobble or glitch sometimes.
It needs a decent amount of GPU power (H100 class).
Not as polished or easy to use as a finished product.
No official pricing or support.
Final Thoughts
The TTT MLP Video Generator is a fun, clever tool. It learns while it's creating, helping the video maintain its flow. One minute of animation from nothing but a text prompt.
You get the same character throughout, the movements stay fluid, and there's no extra video editing. The rough edges are still present, but the concept is there.
If you're technical, you can even experiment with it on GitHub for free. It feels fresh.
It feels like a smart tool that is becoming more and more savvy as it creates in real time, which is much closer to how humans make tiny tweaks and changes as they create. Clean, quick, friendly – that’s the vibe.