Sora, Veo 3, and Kling: How to Detect AI-Generated Video in 2026

Netanel Ossi
Founder, FauxLens
The Video Threshold Has Been Crossed
For most of AI's history, video was the frontier that would not fall. Images were convincing by 2022. Audio was convincing by 2023. But video—with its requirement for temporal consistency across thousands of frames, realistic physics simulation, and coherent lighting across moving scenes—was considered the last reliable signal of authenticity.
That changed in 2024. OpenAI's Sora demonstrated that a text prompt could produce 60-second video clips with cinematic quality. By 2025, Google's Veo 2 and the Chinese model Kling had joined the field, each producing video that major news organizations initially published without verification. In 2026, Veo 3 can generate video with synchronized audio—footsteps, ambient sound, dialogue—all synthesized from a single text description.
The era of trusting your eyes for video has ended. But the forensic tools are keeping pace.
Why Video Deepfakes Are Harder to Detect
A static image analysis looks for spatial inconsistencies: lighting from the wrong angle, noise patterns that don't match hardware signatures, compression artifacts that reveal manipulation. Video introduces a time dimension that multiplies both the detection challenge and the number of available signals.
AI video models are trained to produce frames that are individually plausible. The failure modes appear in the transitions between frames—in the physics of how objects move, in the consistency of lighting as a camera pans, and in the subtle flickering of fine details like hair, fabric texture, and background elements.
The Key Forensic Tells of AI Video
1. Temporal Flickering
Look at a single small region of a suspected AI video: a patch of wall, a section of sky, a square centimeter of clothing texture. In a real video, this region will be stable between frames (allowing for camera shake, sensor noise, and motion blur). In an AI-generated video, static regions often exhibit subtle frame-to-frame variation, a barely perceptible flickering that suggests the model is resynthesizing detail in each frame rather than carrying it over consistently from the previous one.
This is most visible on solid surfaces. Step through the video frame by frame and compare consecutive frames. Real footage is consistent. AI footage often 'breathes.'
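The patch comparison described above can be roughed out in code. The following is a minimal sketch, not a forensic tool: it assumes the clip has already been decoded into grayscale frames stored as nested lists, and the function name `patch_flicker_score` and its parameters are illustrative. A higher score only means more frame-to-frame change in the patch; where to draw a threshold is left open.

```python
from statistics import mean


def patch_flicker_score(frames, top, left, size):
    """Mean absolute frame-to-frame change inside one square patch.

    frames: list of same-shaped 2D grayscale frames (lists of rows of numbers).
    top, left, size: location and side length of the patch, in pixels.
    These names are illustrative assumptions, not part of any library.
    """
    def patch(frame):
        # Flatten the square patch into a single list of pixel values.
        return [px for row in frame[top:top + size] for px in row[left:left + size]]

    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        a, b = patch(prev), patch(curr)
        diffs.append(mean(abs(x - y) for x, y in zip(a, b)))
    return mean(diffs) if diffs else 0.0


# Synthetic 4x4 clips: one perfectly stable, one alternating by 3 gray levels.
stable = [[[100] * 4 for _ in range(4)] for _ in range(3)]
flicker = [[[100 + (i % 2) * 3] * 4 for _ in range(4)] for i in range(3)]

print(patch_flicker_score(stable, 0, 0, 4))   # 0 for the stable clip
print(patch_flicker_score(flicker, 0, 0, 4))  # 3 for the flickering clip
```

In practice the patch should sit on a region that is static in the scene, since camera motion and sensor noise also raise the score in real footage.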
2. Physics and Fluid Dynamics
Current AI video models perform poorly at simulating the behavior of fluids, fabric, and particulates. Water does not flow with the correct viscosity. Fabric does not follow realistic drape physics under gravity. Smoke and dust expand according to aesthetic rules rather than thermodynamic ones.
This is perhaps the most reliable tell because it is extremely difficult to fake. Physics simulation requires solving differential equations that neural networks currently approximate poorly. A video of someone walking through rain, for example, will often show rain that bounces at incorrect angles from surfaces, droplets that are uniform in size (real rain has a wide size distribution), and puddle ripples with the wrong radius-to-frequency ratio.
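The droplet-size observation lends itself to a simple numeric check. The sketch below assumes droplet diameters have already been measured from a frame by some separate segmentation step (not shown and not specified in this article), and uses the coefficient of variation as one possible measure of how uniform the sizes are. The function name `size_variation` is hypothetical.

```python
from statistics import mean, pstdev


def size_variation(droplet_diameters):
    """Coefficient of variation (standard deviation / mean) of droplet diameters.

    Values near 0 mean the droplets are nearly uniform in size.
    The input is assumed to come from a separate measurement step.
    """
    if len(droplet_diameters) < 2:
        raise ValueError("need at least two droplets")
    avg = mean(droplet_diameters)
    if avg <= 0:
        raise ValueError("diameters must be positive")
    return pstdev(droplet_diameters) / avg


uniform = [2.0, 2.0, 2.0, 2.0]   # suspiciously identical droplets
varied = [1.0, 3.0]              # a wider spread of sizes

print(size_variation(uniform))  # 0.0
print(size_variation(varied))   # 0.5
```

A low value is only a hint: measurement resolution and motion blur can also make droplets look more uniform than they are.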
3. The 'Boiling' Background Effect
AI video models allocate most of their computational attention to the primary subject—the person or object specified in the prompt. Background elements are rendered with lower fidelity and often exhibit what researchers call 'boiling': a subtle, chaotic shifting of texture and detail that makes background foliage, crowds, or architectural detail appear to vibrate slightly.
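One way to put a number on 'boiling' is to compare how much pixels vary over time in the background versus the subject. The sketch below assumes grayscale frames as nested lists and a subject mask supplied by the analyst or by a separate segmentation step; `temporal_std_map` and `background_to_subject_ratio` are illustrative names, not an established method.

```python
from statistics import mean, pstdev


def temporal_std_map(frames):
    """Per-pixel standard deviation over time for same-shaped grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pstdev([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]


def background_to_subject_ratio(frames, subject_mask):
    """Ratio of average temporal variation in the background to that in the subject.

    subject_mask: 2D list of booleans, True where the primary subject is.
    The mask is an assumed input; producing it is outside this sketch.
    """
    std_map = temporal_std_map(frames)
    subject, background = [], []
    for y, row in enumerate(std_map):
        for x, value in enumerate(row):
            (subject if subject_mask[y][x] else background).append(value)
    subject_level, background_level = mean(subject), mean(background)
    if subject_level == 0:
        return float("inf") if background_level > 0 else 0.0
    return background_level / subject_level


# Two 1x2 frames: the left pixel is the subject, the right pixel is background.
frames = [[[10, 10]], [[12, 20]]]
mask = [[True, False]]
print(background_to_subject_ratio(frames, mask))  # 5.0
```

Real backgrounds can move for legitimate reasons (wind in foliage, a crowd shifting), so a high ratio is a prompt for closer inspection rather than a verdict.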
4. Compression Artifact Patterns
When an AI-generated video is encoded for distribution (typically with H.264 or H.265 compression), the codec interacts with AI-generated content in distinctive ways. The blocky artifacts that appear in highly compressed real video follow the structure of the underlying image. In AI video, these blocks often cluster along the edges between synthesized objects, creating a characteristic artifact pattern different from natural compression.
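A crude blockiness measure makes the idea of locating compression artifacts more concrete. The sketch below compares brightness jumps at block boundaries with jumps elsewhere in a grayscale frame. It assumes a fixed 8-pixel grid, which is a simplification: real H.264 and H.265 encoders use several block sizes and apply deblocking filters. The function name `horizontal_blockiness` is hypothetical.

```python
from statistics import mean


def horizontal_blockiness(frame, block=8):
    """Ratio of mean brightness jump at block boundaries to the jump elsewhere.

    frame: 2D grayscale frame (list of rows of numbers).
    block: assumed block width in pixels (8 is an illustrative default).
    A ratio well above 1 suggests visible vertical block edges.
    """
    boundary, interior = [], []
    for row in frame:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            # Columns that start a new block count as boundary positions.
            (boundary if x % block == 0 else interior).append(step)
    if not boundary or not interior:
        raise ValueError("frame needs both boundary and interior columns")
    interior_level = mean(interior)
    if interior_level == 0:
        return float("inf") if mean(boundary) > 0 else 1.0
    return mean(boundary) / interior_level


smooth = [list(range(16))]                              # even gradient, no block edge
blocky = [list(range(8)) + [20 + i for i in range(8)]]  # jump of 13 at column 8

print(horizontal_blockiness(smooth))  # 1.0
print(horizontal_blockiness(blocky))  # 13.0
```

Running the same measure on cropped regions of a frame would show where artifacts concentrate, which is the comparison the paragraph above describes.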
5. Facial Microexpressions and Blink Patterns
Early face-swap deepfakes rarely blinked—a tell that trained observers quickly learned to look for. Modern systems blink convincingly. But they still fail on microexpressions: the sub-100-millisecond muscle contractions around the eyes, mouth, and forehead that accompany speech and emotion in real humans. Forensic video analysis can measure the timing and symmetry of these microexpressions. AI faces are often too symmetrical—real faces are subtly asymmetric, and that asymmetry is dynamic.
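The symmetry point can be illustrated with a small calculation. The sketch below assumes that a separate face-landmark step (not shown) has produced, for every frame, one distance on the left side of the face and the matching distance on the right, for example from each mouth corner to the facial midline. It returns both the average asymmetry and how much that asymmetry changes over time. The function name `asymmetry_profile` and the input layout are illustrative assumptions.

```python
from statistics import mean, pstdev


def asymmetry_profile(left_distances, right_distances):
    """Return (mean, spread) of per-frame left/right asymmetry.

    Each input holds one distance per frame, measured on one side of the face.
    Per-frame asymmetry is |left - right| / (left + right), so 0 means
    perfectly symmetric. The spread is the standard deviation over frames.
    """
    scores = []
    for left, right in zip(left_distances, right_distances):
        total = left + right
        scores.append(abs(left - right) / total if total else 0.0)
    if not scores:
        raise ValueError("need at least one frame")
    return mean(scores), pstdev(scores)


# A face that stays perfectly symmetric across frames.
print(asymmetry_profile([10, 10, 10], [10, 10, 10]))  # (0.0, 0.0)

# A face whose asymmetry changes from frame to frame.
level, spread = asymmetry_profile([12, 10], [8, 10])
print(round(level, 3), round(spread, 3))  # 0.1 0.1
```

A mean and spread both close to zero would match the 'too symmetrical' pattern described above, though landmark noise and head pose also affect these numbers.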
The Arms Race: Where Video Detection Is Heading
Video detection is an active research field. Academic groups at MIT, Stanford, and the University at Buffalo publish updated detection benchmarks quarterly. The current state of the art for AI video detection achieves roughly 80-85% accuracy on high-quality AI video. This is better than human performance (approximately 50%, roughly what guessing would achieve on a balanced real-or-fake test), but far from a solved problem.
The key emerging approach is temporal consistency network analysis: rather than analyzing individual frames for spatial artifacts, these systems analyze entire clips for physics and continuity violations across time. This approach is model-agnostic—it does not need to know which AI system generated the video, only that the video violates the laws of physical reality in measurable ways.
What You Should Do Right Now
If you encounter a dramatic or emotionally charged video on social media—particularly one that appears to show a crisis, a political figure, or a celebrity in an unusual situation—apply skepticism proportional to the stakes. Reverse image search individual frames using Google Lens. Check whether the audio is synchronized with lip movements at a frame-by-frame level. Look for the tells described above. And run suspicious clips through algorithmic analysis before sharing or reporting them.
The burden of verification has been democratized. Professional forensic tools are no longer confined to research labs. They are available to anyone willing to spend 60 seconds before hitting the share button.