Model Guide | 4 min read | Apr 16, 2026

Veo 3.1: 4K, Native Audio, and the $0.40/sec Cost Math

Veo 3.1 is the only video model on fal.ai that ships 4K and synthesized audio together, but at $0.40/sec an 8-second 4K clip costs $3.20.

What Veo 3.1 is

Veo 3.1 is Google DeepMind's flagship video model. On fal.ai it runs as fal-ai/veo3.1 for text-to-video and fal-ai/veo3.1/image-to-video for image-to-video. Both endpoints cost the same and share most of their parameter surface. It's the only model on the platform that delivers true 4K output alongside natively synthesized audio (dialogue, ambient, SFX) in a single pass.
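Since the two endpoints differ only in their ID and the presence of a source image, routing between them can be a small lookup. The sketch below uses the two endpoint IDs from the paragraph above; the helper name `veoEndpoint` and the `hasSourceImage` flag are illustrative assumptions, not part of the fal client.

```typescript
// The two endpoint IDs named in the text above.
const VEO_ENDPOINTS = {
  textToVideo: "fal-ai/veo3.1",
  imageToVideo: "fal-ai/veo3.1/image-to-video",
} as const;

type VeoMode = keyof typeof VEO_ENDPOINTS;

// Hypothetical helper: route to image-to-video when a source image is supplied.
function veoEndpoint(hasSourceImage: boolean): string {
  const mode: VeoMode = hasSourceImage ? "imageToVideo" : "textToVideo";
  return VEO_ENDPOINTS[mode];
}

console.log(veoEndpoint(false));
console.log(veoEndpoint(true));
```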

What it does well

Physical realism and lip-sync dialogue are where Veo 3.1 separates from the pack. You can script a line of dialogue in the prompt and get back a clip with matched mouth movement. Ambient sound tracks the scene automatically - rain prompts produce rain audio, not silence.

The resolution ladder goes 720p, 1080p, 4k. That 4K tier is native generation, not an upscale. Duration options are 4s, 6s, 8s, passed as strings rather than integers: send "8s", not 8. Image-to-video also accepts an auto aspect ratio that infers from the input image.
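One way to keep the string-vs-integer mistake out of calling code is to model the enums as string-literal types and convert at the edge. This is a minimal sketch: the type names and the `toVeoDuration` and `isVeoResolution` helpers are hypothetical, and only the option values come from the text above.

```typescript
// Option values as listed above; the type and helper names are illustrative.
type VeoDuration = "4s" | "6s" | "8s";
type VeoResolution = "720p" | "1080p" | "4k";

const DURATIONS: readonly VeoDuration[] = ["4s", "6s", "8s"];
const RESOLUTIONS: readonly VeoResolution[] = ["720p", "1080p", "4k"];

// Convert a number of seconds into the string form the API expects,
// rejecting anything outside the fixed enum (for example 5).
function toVeoDuration(seconds: number): VeoDuration {
  const candidate = `${seconds}s`;
  const match = DURATIONS.find((d) => d === candidate);
  if (match === undefined) {
    throw new RangeError(
      `Unsupported duration ${seconds}; use one of ${DURATIONS.join(", ")}`
    );
  }
  return match;
}

// Type guard for resolution strings arriving from untyped input.
function isVeoResolution(value: string): value is VeoResolution {
  return RESOLUTIONS.some((r) => r === value);
}

console.log(toVeoDuration(8));
console.log(isVeoResolution("4k"));
```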

What it can't do

The ceiling is 8 seconds per call. There's no frame-chaining or end-image parameter, which means multi-shot sequences have to be stitched post-generation. Aspect ratios are restricted to 16:9 and 9:16 (plus auto for I2V) - no 1:1 or 21:9. Duration is a fixed enum, not free-form; you cannot ask for a 5-second clip.
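If you do need a longer sequence, the stitching plan can be worked out before any generation happens. The sketch below splits a target length into clips drawn from the 4s/6s/8s enum, preferring 8s clips to keep the call count low. The function name `planSegments` is an assumption, and concatenating the resulting clips (with a video tool of your choice) is outside its scope. Only even totals of 4 seconds or more can be built from these three lengths, so anything else is rejected.

```typescript
type ClipDuration = "4s" | "6s" | "8s";

// Hypothetical planner: cover totalSeconds using only the allowed clip lengths.
function planSegments(totalSeconds: number): ClipDuration[] {
  if (
    !Number.isInteger(totalSeconds) ||
    totalSeconds < 4 ||
    totalSeconds % 2 !== 0
  ) {
    throw new RangeError(
      "Total must be an even whole number of seconds, at least 4"
    );
  }

  const segments: ClipDuration[] = [];
  let remaining = totalSeconds;

  // Take 8s clips unless doing so would strand a 2-second remainder,
  // which no allowed duration can cover.
  while (remaining >= 8 && remaining - 8 !== 2) {
    segments.push("8s");
    remaining -= 8;
  }

  // At this point remaining is 0, 4, 6, or 10.
  if (remaining === 10) {
    segments.push("6s", "4s");
  } else if (remaining === 6) {
    segments.push("6s");
  } else if (remaining === 4) {
    segments.push("4s");
  }
  return segments;
}

console.log(planSegments(20)); // ["8s", "8s", "4s"]
console.log(planSegments(18)); // ["8s", "6s", "4s"]
```

Each segment is a separate billed call, so a 20-second sequence planned this way is three generations, not one.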

Parameters that matter

Parameter	Type	Default	Options
`prompt`	string	required	-
`duration`	string	`8s`	`4s`, `6s`, `8s`
`resolution`	string	`720p`	`720p`, `1080p`, `4k`
`aspect_ratio`	string	`16:9`	`16:9`, `9:16` (`auto` on I2V only)
`generate_audio`	boolean	`true`	-
`safety_tolerance`	string	`4`	`1` to `6`
`negative_prompt`	string	-	-
`auto_fix`	boolean	`true` T2V / `false` I2V	rewrites blocked prompts
`seed`	integer	-	-

safety_tolerance is a string, not an integer: pass "4", not 4. Level 1 is the strictest and 6 is the most permissive. The default of 4 is usually fine for mixed content.
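If your own config stores the level as a number, a small conversion step avoids sending the wrong type. This is a sketch under that assumption; `toSafetyTolerance` is a hypothetical helper, and the 1 to 6 range is taken from the table above.

```typescript
// Hypothetical helper: validate a numeric level and return the string
// form that the safety_tolerance parameter expects.
function toSafetyTolerance(level: number): string {
  if (!Number.isInteger(level) || level < 1 || level > 6) {
    throw new RangeError("safety_tolerance must be a whole number from 1 to 6");
  }
  return String(level);
}

console.log(toSafetyTolerance(4)); // "4"
```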

Pricing

$0.40 per second, flat across 720p, 1080p, and 4K. An 8-second 4K clip is $3.20. A 4-second 720p draft is $1.60. Resolution is free on the cost axis, duration isn't.
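The arithmetic is simple enough to encode directly. The sketch below works in whole cents to sidestep floating-point drift and deliberately takes no resolution argument, since the rate quoted above is flat across tiers. The constant and function names are illustrative.

```typescript
// $0.40 per second, expressed in cents so the multiplication stays exact.
const VEO_CENTS_PER_SECOND = 40;

// Cost of one clip in USD; resolution is not an input because it does
// not change the price.
function veoClipCostUsd(seconds: 4 | 6 | 8): number {
  return (seconds * VEO_CENTS_PER_SECOND) / 100;
}

console.log(veoClipCostUsd(8)); // 3.2
console.log(veoClipCostUsd(4)); // 1.6
```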

If you're iterating prompts, use fal-ai/veo3.1/lite at $0.05/sec for drafts - same schema minus 4K - then promote the winning prompt to full Veo 3.1 for the final render. That's an 8x cost delta per iteration.
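To see what the draft-then-promote workflow saves over a whole session, compare N drafts on Lite plus one final render against running every attempt on full Veo 3.1. This is a rough sketch using the two per-second rates quoted above; the function and field names are assumptions.

```typescript
// Per-second rates from the text, in cents.
const FULL_CENTS_PER_SECOND = 40;
const LITE_CENTS_PER_SECOND = 5;

interface IterationCost {
  draftOnLiteUsd: number; // drafts on Lite, then one final on full Veo 3.1
  allOnFullUsd: number;   // every draft and the final on full Veo 3.1
}

// Hypothetical estimator: all clips are assumed to share the same length.
function promptIterationCost(draftCount: number, seconds: 4 | 6 | 8): IterationCost {
  const draftCents = draftCount * seconds * LITE_CENTS_PER_SECOND;
  const finalCents = seconds * FULL_CENTS_PER_SECOND;
  const allFullCents = (draftCount + 1) * seconds * FULL_CENTS_PER_SECOND;
  return {
    draftOnLiteUsd: (draftCents + finalCents) / 100,
    allOnFullUsd: allFullCents / 100,
  };
}

// Ten 8-second drafts plus one final render.
console.log(promptIterationCost(10, 8)); // { draftOnLiteUsd: 7.2, allOnFullUsd: 35.2 }
```

The final render dominates the Lite path's total, so the saving grows with the number of drafts you expect to burn through.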

[Figure: Veo 3.1 cost ladder with receipt breakdown]

Working example

TYPESCRIPT

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/veo3.1", {
  input: {
    prompt: "A chef flames a copper pan in a professional kitchen, close-up on sizzling ingredients, warm tungsten key light",
    resolution: "1080p",
    duration: "8s", // string enum, not the number 8
    aspect_ratio: "16:9",
    generate_audio: true,
    negative_prompt: "shaky camera, blown highlights, text overlay",
    safety_tolerance: "4", // also a string
  },
});

// URL of the generated clip.
console.log(result.data.video.url);

When to pick Veo 3.1

Pick it when 4K is a hard requirement, or when you need native dialogue with lip sync. Skip it for rapid iteration (use Lite), multi-shot sequences longer than 8 seconds (use Kling v3 Pro's multi_prompt), or budget-constrained batch work (LTX 2.3 at $0.08/sec or Seedance are cheaper). The quality is there; you're paying for it.
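For batch pipelines, the same decision can be written down as a function. Treat this as a sketch of the rules of thumb above, not a definitive router: the `JobNeeds` shape, the order in which the rules are checked, and the fallback to Veo 3.1 are all assumptions, and the returned strings are labels rather than endpoint IDs.

```typescript
// Hypothetical description of a job; field names are illustrative.
interface JobNeeds {
  needs4k: boolean;
  needsLipSyncDialogue: boolean;
  isDraftIteration: boolean;
  totalSeconds: number;
  budgetConstrainedBatch: boolean;
}

// Returns a human-readable suggestion following the guidance above.
// Hard requirements (4K, lip sync) are checked first by assumption.
function suggestModel(job: JobNeeds): string {
  if (job.needs4k || job.needsLipSyncDialogue) return "Veo 3.1";
  if (job.isDraftIteration) return "Veo 3.1 Lite";
  if (job.totalSeconds > 8) return "Kling v3 Pro (multi_prompt)";
  if (job.budgetConstrainedBatch) return "LTX 2.3 or Seedance";
  return "Veo 3.1"; // assumed default when nothing argues against it
}

console.log(
  suggestModel({
    needs4k: false,
    needsLipSyncDialogue: false,
    isDraftIteration: true,
    totalSeconds: 8,
    budgetConstrainedBatch: false,
  })
);
```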
