Recent progress in diffusion models has greatly enhanced video generation quality, yet these models still require fine-tuning to improve specific dimensions such as instance preservation, motion rationality, composition, and physical plausibility. Existing fine-tuning approaches often rely on human annotations and large-scale computational resources, which limits their practicality. In this work, we propose GigaVideo-1, an efficient fine-tuning framework that advances video generation without additional human supervision. Rather than injecting large volumes of high-quality data from external sources, GigaVideo-1 unlocks the latent potential of pre-trained video diffusion models through automatic feedback. Specifically, we focus on two key aspects of the fine-tuning process: data and optimization. To improve the fine-tuning data, we design a prompt-driven data engine that constructs diverse, weakness-oriented training samples. On the optimization side, we introduce a reward-guided training strategy that adaptively weights samples using feedback from pre-trained vision-language models, subject to a realism constraint. We evaluate GigaVideo-1 on the VBench-2.0 benchmark across 17 evaluation dimensions, using Wan2.1 as the baseline. Experiments show that GigaVideo-1 consistently improves performance on almost all dimensions, with an average gain of ~4%, using only 4 GPU-hours. Requiring no manual annotations and minimal real data, GigaVideo-1 demonstrates both effectiveness and efficiency. Code, model, and data will be made publicly available.
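The abstract describes the prompt-driven data engine only at a high level. The following minimal Python sketch shows one way such an engine could assemble a prompt pool, under stated assumptions. It is not the authors' implementation. It ranks dimensions by a baseline score, requests weakness-oriented prompts for the lowest-scoring ones, and mixes in real-world captions. The names `build_prompt_pool`, `weakest_dimensions`, `generate_prompts`, and the `real_ratio` mixing parameter are hypothetical. `generate_prompts` stands in for the LLM call.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PromptSample:
    text: str
    source: str               # "synthetic" (LLM-written) or "real" (real-world caption)
    dimension: Optional[str]  # targeted weak dimension; None for real captions


def weakest_dimensions(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k dimensions with the lowest baseline scores."""
    return sorted(scores, key=scores.get)[:k]


def build_prompt_pool(
    scores: Dict[str, float],
    real_captions: List[str],
    generate_prompts: Callable[[str, int], List[str]],  # hypothetical LLM wrapper
    k: int = 3,
    per_dim: int = 4,
    real_ratio: float = 0.25,  # hypothetical target share of real captions (must be < 1)
    seed: int = 0,
) -> List[PromptSample]:
    rng = random.Random(seed)
    # Weakness-oriented synthetic prompts for the k lowest-scoring dimensions.
    pool = [
        PromptSample(text, "synthetic", dim)
        for dim in weakest_dimensions(scores, k)
        for text in generate_prompts(dim, per_dim)
    ]
    # Pick n_real so that real captions make up about real_ratio of the final pool.
    n_real = round(len(pool) * real_ratio / (1.0 - real_ratio))
    for text in rng.sample(real_captions, min(n_real, len(real_captions))):
        pool.append(PromptSample(text, "real", None))
    rng.shuffle(pool)
    return pool
```

Suppose `k=2` and `per_dim=3` with the default `real_ratio=0.25`. The pool then holds 6 synthetic prompts plus 2 real captions. Each prompt would next be rendered by the pre-trained T2V model to produce a training video.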
GigaVideo-1 Training Pipeline. Our pipeline consists of two components: a prompt-driven data engine and reward-guided optimization. On the left, we use LLMs to generate synthetic prompts targeting weak dimensions and synthesize training videos from them with a pre-trained T2V model. These are combined with samples based on real captions to balance diversity and realism. On the right, a frozen MLLM scores each video on dimension-specific criteria, and these scores guide training through a weighted denoising loss. For videos synthesized from real-world caption prompts, an extra realism constraint is applied for distribution alignment. GigaVideo-1 enables efficient, automatic fine-tuning without manual labels or extra data collection.
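The caption states that MLLM scores weight the denoising loss but gives no formula. Below is one plausible reading, sketched in plain Python as an assumption rather than the paper's exact objective. First, per-sample scores become batch weights through a temperature-scaled softmax, rescaled so the weights average to 1. Second, for real-caption samples, each weight is further multiplied by a realism score, which could come from the same MLLM or another judge. The names `sample_weights`, `weighted_denoising_loss`, and `tau` are hypothetical, as is the multiplicative form of the realism term. Noise predictions are flattened to lists of floats for brevity.

```python
import math
from typing import List, Optional, Sequence


def sample_weights(
    scores: Sequence[float],
    realism: Optional[Sequence[Optional[float]]] = None,
    tau: float = 1.0,
) -> List[float]:
    """Map per-sample MLLM scores to loss weights (hypothetical scheme).

    scores:  dimension-specific scores from the frozen MLLM, assumed in [0, 1].
    realism: per-sample realism scores in [0, 1] for real-caption samples,
             None for samples without the realism constraint.
    tau:     softmax temperature; smaller values sharpen the weighting.
    """
    n = len(scores)
    realism = list(realism) if realism is not None else [None] * n
    logits = [s / tau for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    weights = [n * e / z for e in exps]  # softmax rescaled so weights average to 1
    # Hypothetical realism constraint: down-weight real-caption samples judged unrealistic.
    return [w * (r if r is not None else 1.0) for w, r in zip(weights, realism)]


def weighted_denoising_loss(
    eps_pred: Sequence[Sequence[float]],
    eps_true: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Batch mean of per-sample noise-prediction MSE, scaled by per-sample weights."""
    per_sample = [
        sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)
        for pred, true in zip(eps_pred, eps_true)
    ]
    return sum(w * loss_i for w, loss_i in zip(weights, per_sample)) / len(per_sample)
```

Rescaling the weights to mean 1 keeps the loss on the same scale as the unweighted objective. That is a convenience choice: weights directly proportional to the scores would be an equally reasonable variant.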
GigaVideo-1 improves the performance of our baseline Wan2.1 across different real-world dimensions.
"Alhambra, zoom out"
Wan2.1
GigaVideo-1
"Garden, First-person perspective, oblique shot, airborne dolly movement"
Wan2.1
GigaVideo-1
"Machu Picchu, zoom in"
Wan2.1
GigaVideo-1
"Pyramid, pan right"
Wan2.1
GigaVideo-1
"The camera enters a golden autumn forest, where the leaves have turned brilliant shades from gold to orange-red. A few leaves drift down with the wind. Sunlight filters through the..."
Wan2.1
GigaVideo-1
"The camera begins in a vast grassland, where the lush green grass sways gently in the breeze, the air fresh, and the soft rustling of the leaves fills the space. As the camera move..."
Wan2.1
GigaVideo-1
"The camera gently descends, passing through the layers of waves, entering the deep underwater world. The surrounding coral reefs are vibrant and colorful, with a variety of tropica..."
Wan2.1
GigaVideo-1
"The camera moves through the vast expanse of the universe, where stars twinkle against the dark backdrop, the Milky Way arcing like a silver river across the sky. Nebulae slowly ro..."
Wan2.1
GigaVideo-1
"In an ancient tomb deep in the mountains, the tomb raiders discovered a massive dragon bone structure. Legend had it that the skeleton belonged to an ancient dragon god. Captain Ol..."
Wan2.1
GigaVideo-1
"It is said that the ancient Dragon Tribe left behind a vast treasure, accessible only through a series of trials. Young adventurer Lucas set off on his own journey to find this leg..."
Wan2.1
GigaVideo-1
"Little Red Riding Hood brings food to visit her sick grandmother and encounters a cunning wolf along the way. The wolf pretends not to know her and guides her down a longer path. A..."
Wan2.1
GigaVideo-1
"The race began, and the runners quickly started. The Team A runner took the lead initially due to a powerful start. However, the Team B runner did not rush to chase but instead ste..."
Wan2.1
GigaVideo-1
"A crocodile with the arms of a gorilla, the legs of a cheetah, the scales of a snake, and the eyes of a chameleon, an apex predator in both land and water."
Wan2.1
GigaVideo-1
"A giraffe with the body of a whale, the legs of a kangaroo, and the tail of a flamingo, allowing it to leap from one ocean wave to another with remarkable grace."
Wan2.1
GigaVideo-1
"A giraffe with the wings of a bat, soaring above the trees in a mysterious flight."
Wan2.1
GigaVideo-1
"A lion with the wings of an eagle, soaring through the sky with majestic ease."
Wan2.1
GigaVideo-1
"The wooden toy turned into a glass toy."
Wan2.1
GigaVideo-1
"A snowman changes from large to small."
Wan2.1
GigaVideo-1
"An ant gradually grow big."
Wan2.1
GigaVideo-1
"A star changes from faint to bright."
Wan2.1
GigaVideo-1
"A cat is on the left of a chair, then the cat runs to the front of the chair."
Wan2.1
GigaVideo-1
"A cat is on the right of a rock, then the cat runs to the left of the rock."
Wan2.1
GigaVideo-1
"A kangaroo is in front of a basket, then the kangaroo jumps to the right of the basket."
Wan2.1
GigaVideo-1
"A squirrel is behind a rock, then the squirrel jumps to the left of the rock."
Wan2.1
GigaVideo-1
"A man is playing basketball."
Wan2.1
GigaVideo-1
"A man is playing football."
Wan2.1
GigaVideo-1
"A person is sitting in a chair, then they suddenly get up and start stretching."
Wan2.1
GigaVideo-1
"Two people are exchanging keys."
Wan2.1
GigaVideo-1
"One person places a blanket over another person."
Wan2.1
GigaVideo-1
"One person adjusts the collar of another person’s shirt."
Wan2.1
GigaVideo-1
"One person adjusts the glasses of another."
Wan2.1
GigaVideo-1
"A man is playing badminton."
Wan2.1
GigaVideo-1
"A man is doing yoga."
Wan2.1
GigaVideo-1
"A person is working on a project, then they suddenly start cooking dinner."
Wan2.1
GigaVideo-1
"A person is sitting at the table, then they suddenly start drawing on a notepad."
Wan2.1
GigaVideo-1
"A person is drinking a glass of water, then they suddenly start cleaning the windows."
Wan2.1
GigaVideo-1
"A person is drinking tea, then they suddenly start folding the laundry."
Wan2.1
GigaVideo-1
"A person is reading the news, then they suddenly start watering the plants."
Wan2.1
GigaVideo-1
"A man and a woman is doing yoga."
Wan2.1
GigaVideo-1
"A clear glass of baking soda is gently poured into a glass of vinegar."
Wan2.1
GigaVideo-1
"A clear glass of coffee is gently poured into a glass of milk."
Wan2.1
GigaVideo-1
"A clear glass of flour is gently poured into a glass of water."
Wan2.1
GigaVideo-1
"A small burning candle was thrown into a pile of dry twigs."
Wan2.1
GigaVideo-1
"A bottle of water is opened in the space station, and the water starts to float out in irregular shapes."
Wan2.1
GigaVideo-1
"A cork is placed on the surface of a bucket filled with water."
Wan2.1
GigaVideo-1
"A metal coin is gently placed on the surface of a shallow pool of water."
Wan2.1
GigaVideo-1
"A plastic toy is placed on the surface of a pond filled with water."
Wan2.1
GigaVideo-1
"A dog is lying in the sun, then it suddenly jumps up and starts playing with its owner."
Wan2.1
GigaVideo-1
"A horse is standing in the stable, then it suddenly starts chewing hay."
Wan2.1
GigaVideo-1
"A dog is running in the yard, then it suddenly starts sitting under a tree."
Wan2.1
GigaVideo-1
"A person is drinking a smoothie from a glass."
Wan2.1
GigaVideo-1
"A person is painting."
Wan2.1
GigaVideo-1
"A person is pouring olive oil into a frying pan."
Wan2.1
GigaVideo-1
"The camera orbits around. Bathtub, the camera circles around."
Wan2.1
GigaVideo-1
"The camera orbits around. Birdhouse, the camera circles around."
Wan2.1
GigaVideo-1
"The camera orbits around. Castle, the camera circles around."
Wan2.1
GigaVideo-1
"The camera orbits around. Clock Tower, the camera circles around."
Wan2.1
GigaVideo-1
"A timelapse captures the gradual transformation of a block of cheese as the temperature rises to 60°C"
Wan2.1
GigaVideo-1
"A timelapse captures the gradual transformation of a piece of ice as the temperature rises to 10°C"
Wan2.1
GigaVideo-1
"A timelapse captures the transformation as steam from a boiling pot comes into contact with a cold tile wall"
Wan2.1
GigaVideo-1
"A timelapse captures the transformation of water in a pot as the temperature reaches 130°C"
Wan2.1
GigaVideo-1
If you use our work in your research, please cite:
@article{gigavideo1,
title={GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning},
author={Bao, Xiaoyi and Lv, Jindi and Wang, Xiaofeng and Zhu, Zheng and Chen, Xinze and Zhou, Yukun and Lv, Jiancheng and Wang, Xingang and Huang, Guan},
journal={arXiv preprint arXiv:2506.10639},
year={2025}
}