
Kling AI Video Generator: The New King of AI Video Generation Tools

Alan | Updated on Sep 28, 2024


Kling AI, Kuaishou's latest video generator, has drawn nearly 700,000 user applications with its high-quality, customizable, and lifelike AI video creation experience.

Could the era of AI-generated short films be upon us?

Recently, demos from various AI video generation tools have been dazzling. From meme creation to physically plausible motion, each tool seems to be vying to outdo Sora. Amid this competition, one contender has quietly taken a significant step forward, achieving "cinematic-level" results:

From realistic lighting effects:

lighting effect

To imaginative scenes of every kind:

Batman and Joker

Some users are already applying it to more complex projects: by combining a video-generation AI and a music-generation AI with some Photoshop and After Effects work, they have produced complete music videos.

AI creates MV

Its smooth, detailed output has drawn plenty of likes, and a quick browse of social media turns up many short videos made with it.

netizen comment

Users point out that its main advantage is handling large movements without awkward artifacts, for example when turning a still image into a video of a running centaur:

a running centaur

Behind these videos is Kling AI, Kuaishou's large video generation model, which began making waves online worldwide a few weeks ago, and access to it was soon in high demand.

This isn't just a slide-deck demo but a product people can actually use. Kling AI has now launched a web version designed for simplicity and ease of use.

According to the latest figures, applications to use Kling AI are approaching 700,000, making it the hottest video generation model online.

The Rapid Evolution of Kling AI

This year has been hailed as the first year of AI video generation: as early as February, OpenAI's Sora raised the bar for the field. Yet Chinese tech companies were the first to put such products into users' hands.

Since its debut on June 6, Kuaishou's Kling AI, the first large Chinese video generation model to attract significant attention overseas, has received three major updates in just one month.

Kling AI launched with text-to-video, and two weeks later added image-to-video, video continuation, and multiple aspect ratios, becoming steadily more capable and complete. One by one, common video generation needs are being met.

Last weekend, at the World Artificial Intelligence Conference (WAIC) 2024, Kling AI received its third major upgrade: a series of new features that markedly improve video quality, aesthetics, and usability, delivering another leap in the creative experience.

Gai Kun, Kuaishou's Senior Vice President in charge of the main-site business and the community science division, introduced three key upgrades: High-Resolution Mode, First and Last Frame Control, and Camera Movement Control.

First, the base Kling AI video model has been upgraded again and now offers a sharper High-Resolution Mode. After the upgrade, video quality is dramatically better than with the previous model.

Thanks to training at a higher spatiotemporal resolution, Kling AI now generates noticeably better detail, composition, motion aesthetics, and lighting.

quality enhancement

Second, Kling AI added the much-requested "First and Last Frame Control" feature to image-to-video, which generates a coherent video that begins and ends on frames the user chooses.

By supplying their own start and end images, users can precisely control the transitions between video segments and achieve effects such as seamless single-take shots. In actual results, motion stays natural and smooth while image quality is preserved. The feature makes editing more intuitive and convenient and serves personalized image-to-video needs.

Finally, Kling AI introduced Camera Movement Control along with automatic "master-level" camera moves. In filmmaking, a richer repertoire of camera moves captures more varied shots and strengthens a video's overall expressiveness.

Kling AI offers six classic preset camera movements: Roll, Tilt, Pan, Vertical, Horizontal, and Zoom (in/out), giving users options for different scenes. Users can also set each movement's parameter to a positive or negative value, adjusting the intensity or smoothness of the motion and reversing its direction. In addition, the master-level movements help produce eye-catching, cinematic shots.
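The article does not say how these parameters are represented internally. As a rough illustration only, the sketch below models one preset movement as a signed value, where the sign picks the direction and the magnitude sets the strength. The `CameraMove` class, the lowercase movement names, and the -10 to 10 range are hypothetical assumptions, not Kling AI's actual interface.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the six preset movements named in the article.
CAMERA_MOVES = {"roll", "tilt", "pan", "vertical", "horizontal", "zoom"}

# Assumed parameter range; the article does not state the real bounds.
MIN_INTENSITY, MAX_INTENSITY = -10.0, 10.0


@dataclass(frozen=True)
class CameraMove:
    """One preset camera movement with a signed intensity (hypothetical model)."""

    kind: str
    intensity: float  # sign = direction, magnitude = strength

    def __post_init__(self) -> None:
        # Reject unknown movement names and out-of-range values.
        if self.kind not in CAMERA_MOVES:
            raise ValueError(f"unknown camera move: {self.kind!r}")
        if not MIN_INTENSITY <= self.intensity <= MAX_INTENSITY:
            raise ValueError(
                f"intensity {self.intensity} outside [{MIN_INTENSITY}, {MAX_INTENSITY}]"
            )

    def reverse_direction(self) -> "CameraMove":
        """Return the same movement with the opposite sign, i.e. played in reverse."""
        return CameraMove(self.kind, -self.intensity)

    def describe(self) -> str:
        """Summarize strength and direction as a short string."""
        if self.intensity > 0:
            direction = "positive"
        elif self.intensity < 0:
            direction = "negative"
        else:
            direction = "none"
        return f"{self.kind}: strength {abs(self.intensity)}, direction {direction}"
```

For example, `CameraMove("zoom", 5.0).reverse_direction()` yields a zoom with intensity `-5.0`; under this sketch's assumptions, only the sign changes, so the strength stays the same while the direction flips.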

With these new features, Kling AI has made visible improvements in video clarity, aesthetic performance, and content customization control.

The officially launched Kling AI web version also integrates text-to-image and text-to-video, with video editing coming soon, making it a ready-to-use, one-stop visual content creation platform.

Kling home page

The newly added "First and Last Frame Control" and "Camera Movement Control" features are currently available on the web version. Interested users can apply now!

It is no exaggeration to call this upgrade "full of sincerity," and it reflects Kuaishou's continuous innovation and breakthroughs in video generation technology.

"Cinematic" AI Video Generation, Powered by Technology

Compared with image generation, which is already mature, video generation is far more complex. It must deal with realism, continuity of action, smoothness between frames, accuracy of detail, consistency of scenes, characters, and lighting, physical accuracy, and limits on duration.

How well a model handles these challenges determines how practical and usable it is, and the upgraded Kling AI shows a transformative change on each of them. In short, Kling AI has seven key capabilities.

Wan Pengfei, Head of Visual Generation and Interaction at Kuaishou, detailed these seven capabilities, which form Kling AI's core strengths: video quality, image-to-video, motion generation, generation duration, physical accuracy, command response, and video controllability. Looking ahead, he said that video generation quality is improving so quickly that it is approaching the level of graphics rendering and camera footage, which opens new opportunities for the broader video industry.

We have already seen several of these capabilities in action: the High-Resolution Mode, First and Last Frame Control, and Camera Movement Control features described above advance Kling AI's cinematic high-definition generation, its leading image-to-video performance, and its video controllability.

Cinematic high-definition generation can present both grand and intimate scenes vividly and accurately, from majestic natural landscapes to the movements and expressions of people and animals, with a strong cinematic feel.

the cinematic high-definition illustration

The image-to-video capability brings still images to life, turning them into 5-second clips. Combined with different text prompts, it becomes even more creative and versatile.

Strong controllability puts finer-grained creation in users' hands. Beyond the new Camera Movement Control, upcoming features include voice-face matching, consistent character identity, and controlling scene and layout changes with simple brush strokes. Model training for these features is already complete, and they will be available soon.

Meanwhile, Kling AI has also been upgraded in motion generation, generation duration, physical accuracy, and command response.

First, Kling AI can generate large yet plausible motion. By modeling complex spatiotemporal movement, it produces large-scale movement that still obeys the laws of motion.

Thanks to more comprehensive training, Kling AI's motion is more agile overall, supporting a wider range of action without becoming implausible. For example, a cat's turning and walking posture look natural and physically plausible:

cat turning illustration

Second, it can generate minute-long videos. This ability has become an essential benchmark for video generation models, since it requires better multi-shot handling, longer storytelling, and more coherent motion extension.

Currently, Kling AI can generate up to 2 minutes of 1080p video at 30fps. It also supports video continuation guided by user commands: each continuation extends the motion by 4 to 5 seconds, and multiple continuations can be chained to produce videos of up to 3 minutes. Users can specify where the story goes with each continuation, which keeps the feature easy to use.

With this upgrade, deep optimization of Kling AI's algorithms and engineering has extended the length of a single generated clip from 5 seconds to 10 seconds, the longest among products available to users at the time, giving creators more room to work.
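The continuation figures lend themselves to quick arithmetic. The sketch below estimates how many continuations it takes to reach a target length, assuming each continuation adds a fixed number of seconds (the article gives only a 4 to 5 second range) and capping the total at the 3-minute limit the article states. The function name, its defaults, and the fixed-step assumption are illustrative, not part of any Kling AI interface.

```python
import math

MAX_TOTAL_S = 180.0  # 3-minute ceiling on continued videos, per the article


def continuations_needed(target_s: float, initial_s: float = 5.0,
                         per_step_s: float = 4.0) -> int:
    """Estimate how many continuations extend an initial clip to target_s seconds.

    Assumes every continuation adds exactly per_step_s seconds; the article
    only says each one adds roughly 4 to 5 seconds.
    """
    if not 0 < target_s <= MAX_TOTAL_S:
        raise ValueError(f"target must be in (0, {MAX_TOTAL_S}] seconds")
    if initial_s <= 0 or per_step_s <= 0:
        raise ValueError("clip lengths must be positive")
    remaining = max(0.0, target_s - initial_s)
    # Round up: a partial continuation still costs one full generation.
    return math.ceil(remaining / per_step_s)


if __name__ == "__main__":
    # Compare a 5 s and a 10 s starting clip at 4 s and 5 s per continuation.
    for initial in (5.0, 10.0):
        for step in (4.0, 5.0):
            n = continuations_needed(180.0, initial, step)
            print(f"start {initial:.0f}s, +{step:.0f}s each: {n} continuations")
```

Under these assumptions, reaching 3 minutes from a 5-second clip takes between 35 and 44 continuations, and starting from a 10-second clip saves one continuation in each case.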

Third, Kling AI can simulate complex properties of the physical world. Since Sora, video generation models have focused on producing physically plausible video, an ability that sets the ceiling on what a model can do.

Since launch, Kling AI has been able to model and simulate real-world properties accurately, so its videos look realistic.

cat bath illustration

Now, with more comprehensive training, its ability to model and simulate physical interactions has advanced further.

Fourth, Kling AI is strong at combining concepts and responding to commands. Through a deep understanding of cross-modal semantics from text to video, it can turn users' imaginative ideas into concrete video scenes.

The upgraded model was trained with better text data and an improved text encoding scheme, which strengthens its command response and improves visual rendering.

All these capabilities stem from Kling AI's accumulated expertise and innovations in its generation architecture (a Diffusion Transformer, or DiT), model design (such as latent-space encoding/decoding, temporal information modeling, and text extension and encoding), data quality (such as a multi-dimensional tagging system and a video captioning model), computational efficiency (such as distributed training clusters and phased training strategies), and capability extension (such as temporal video extension and multi-modal input control).

In short, today's Kling AI is technologically advanced and reliable, so it is no wonder it has been so sought after since its release.

The Era of Generative AI: How Kuaishou Prepared

Over the past year, competition among large models has been fierce. Last year the discussion centered on building foundation models; this year it has shifted to applications, and the recent WAIC conference saw debates between the "model builders" and the "application builders."

How has Kuaishou positioned itself in this wave?

First, Kuaishou builds a complete, end-to-end system. From the underlying IDC computing centers, network architecture, and AI platform, through the core foundation models in the middle layer, to the application explorations at the top layer, Kuaishou has developed everything in-house. Zhang Di, Kuaishou's Vice President and head of its large model team, believes that steady investment in in-house research will create a "technological snowball" effect and significant cost advantages over the long term. One of Kuaishou's major advantages is its abundance of AI application scenarios at the top layer, which give its models many opportunities for real-world use.

Zhang Di

Second, Kuaishou pursues a dual strategy of fundamental model research and commercial application. Foundation models set the upper limit of AI capability, and continued research investment can bring qualitative leaps. Commercial applications, in turn, keep the technology snowball rolling: technology is deployed in phases, feedback flows back continuously, and a virtuous cycle gradually forms.

Last year, Kuaishou introduced the "KwaiYi" large model, which quickly grew from an early 13-billion-parameter version to 175 billion parameters, and multimodal versions followed. After multiple iterations, KwaiYi now plays a role in Kuaishou's internal content creation, AI interaction, and material production. In June this year, the daily consumption of AIGC marketing materials built on KwaiYi exceeded 20 million.

With a foundation model in place, Kuaishou has gradually built differentiated capabilities for different scenarios.

In text-to-image, Kuaishou's "Ketu" (Kolors) has become one of the industry's top models, with strong semantic understanding and instruction following. Thanks to innovations in text representation and extensive image-text alignment work, Ketu produces photo-realistic images, and reinforcement learning is used to align its output with human aesthetic preferences.

In video generation, Kling AI has sparked a new round of global competition. It supports text-to-video and image-to-video, offers rich image editing capabilities, and remains among the industry leaders in controllability, quality, aesthetics, and motion plausibility. Kuaishou's engineers continue to optimize its algorithms and engineering to lower the barrier to AI video generation.

Lowering that barrier means putting new technology into practical use, one of the critical challenges facing generative AI. As a short-video app with a nationwide audience, Kuaishou has the advantage of numerous AI application scenarios that offer opportunities for real-world deployment.

In terms of technology implementation, Kuaishou has achieved a series of milestones:

- "AI Xiaokuai," a dialogue model being tested in the app's comment sections, can understand video content and interact with users, and has gained over 10 million followers.

- In e-commerce live streams, the text-to-image model "Ketu" lets users virtually try on clothes using their own photos, and even see the result in motion.

- Since its release, the Kling AI video model has been widely embraced: users have generated more than 7 million videos with it, and it now powers a one-stop content creation platform.

From content production and understanding to recommendation, and from individual creators to e-commerce, Kuaishou's generative AI capabilities now cover its entire core business and continue to strengthen the Kuaishou ecosystem.

Kuaishou's broad deployment shows that AI-driven productivity is quietly changing our lives.