
Model Guide · 5 min read

Kling v3 Pro: multi_prompt, elements, Native Audio

Kling v3 Pro is the only model on fal.ai with first-class multi-shot scripting (multi_prompt) and character consistency bindings (elements).


What Kling v3 Pro is

Kling v3 Pro is Kuaishou's flagship video model. On fal.ai it is exposed as two endpoints: fal-ai/kling-video/v3/pro/text-to-video and fal-ai/kling-video/v3/pro/image-to-video. It's the only model on the platform with native multi-shot scripting via multi_prompt and character/object consistency via elements bindings. That combination makes it the strongest single-call option for producing full scenes rather than isolated clips.

What it does well

multi_prompt takes an array of {prompt, duration} objects and generates a multi-shot video in one API call. Each shot gets its own prompt text, and you can reference pinned characters across shots using @Element1, @Element2 syntax. Total duration caps at 15 seconds across all shots, with a 3-second minimum per shot.
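The two limits above (at least 3 seconds per shot, at most 15 seconds in total) are easy to check before spending a render. A minimal sketch: `Shot` and `validateShots` are hypothetical names, not part of @fal-ai/client, and durations are plain numbers here as in the working example later in this guide.

```typescript
// Hypothetical pre-flight check for a multi_prompt array.
// The limits come from the text above; the helper itself is illustrative.
interface Shot {
  prompt: string;
  duration: number; // seconds
}

const MIN_SHOT_SECONDS = 3;
const MAX_TOTAL_SECONDS = 15;

function validateShots(shots: Shot[]): string[] {
  const errors: string[] = [];
  if (shots.length === 0) {
    errors.push("multi_prompt needs at least one shot");
  }
  // Per-shot minimum
  shots.forEach((shot, i) => {
    if (shot.duration < MIN_SHOT_SECONDS) {
      errors.push(`shot ${i + 1}: ${shot.duration}s is below the ${MIN_SHOT_SECONDS}s minimum`);
    }
  });
  // Cap on the summed duration of all shots
  const total = shots.reduce((sum, shot) => sum + shot.duration, 0);
  if (total > MAX_TOTAL_SECONDS) {
    errors.push(`total duration ${total}s exceeds the ${MAX_TOTAL_SECONDS}s cap`);
  }
  return errors;
}
```

An empty array back means the sequence fits; otherwise each string names one violated limit.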

Kling multi_prompt three-shot sequence diagram

elements (I2V only) locks a character or object into the generation. Each element is either a frontal_image_url plus up to 3 reference_image_urls for different angles, or a video_url from which the model extracts subject appearance. This is the cleanest solution to the identity-drift problem across shots.
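The two element shapes described above can be sketched as a small union type. The field names frontal_image_url, reference_image_urls, and video_url come from the text; the type names, helper functions, and example.com URLs are hypothetical, and the assumption that @Element1 maps to the first array entry is not confirmed by this guide.

```typescript
// Illustrative types for the elements array (I2V endpoint only).
type ImageElement = { frontal_image_url: string; reference_image_urls?: string[] };
type VideoElement = { video_url: string };
type KlingElement = ImageElement | VideoElement;

const MAX_REFERENCE_IMAGES = 3;

// Build an image-based element: one frontal image plus up to 3 extra angles.
function imageElement(frontal: string, references: string[] = []): ImageElement {
  if (references.length > MAX_REFERENCE_IMAGES) {
    throw new Error(`at most ${MAX_REFERENCE_IMAGES} reference images per element`);
  }
  return references.length > 0
    ? { frontal_image_url: frontal, reference_image_urls: references }
    : { frontal_image_url: frontal };
}

// Build a video-based element: the model extracts the subject from the clip.
function videoElement(videoUrl: string): VideoElement {
  return { video_url: videoUrl };
}

// Placeholder URLs; assumed order: first entry is @Element1, second is @Element2.
const elements: KlingElement[] = [
  imageElement("https://example.com/courier-front.png", [
    "https://example.com/courier-left.png",
    "https://example.com/courier-right.png",
  ]),
  videoElement("https://example.com/delivery-bike.mp4"),
];
```

The resulting array would be passed as the elements field of an image-to-video request, alongside the required start_image_url.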

Native audio is on by default and supports Chinese and English voice with lip sync. Other languages get auto-translated. cfg_scale (0-1, default 0.5) tunes prompt adherence numerically: higher values follow the prompt more literally. negative_prompt defaults to "blur, distort, and low quality", a reasonable baseline.

Image-to-video also supports start_image_url (required) and end_image_url (optional) for frame anchoring.

Kling elements character reference system

What it can't do

Resolution isn't user-configurable in the schema. On text-to-video, aspect ratios are limited to 16:9, 9:16, and 1:1. duration is a string enum of whole-second values from "3" through "15", and the total across shots also caps at 15 seconds when using multi_prompt. You use prompt OR multi_prompt, not both.
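These restrictions can be caught client-side before a request goes out. A minimal sketch for the text-to-video input: `T2VInput` and `checkInput` are hypothetical names, and the checks only encode what this section states (exactly one of prompt or multi_prompt, a string duration from "3" to "15", one of three aspect ratios).

```typescript
// Hypothetical client-side check; not part of @fal-ai/client.
const ASPECT_RATIOS: readonly string[] = ["16:9", "9:16", "1:1"];
const DURATION_PATTERN = /^(?:[3-9]|1[0-5])$/; // matches "3" through "15"

interface T2VInput {
  prompt?: string;
  multi_prompt?: { prompt: string; duration: number }[];
  duration?: string;
  aspect_ratio?: string;
}

function checkInput(input: T2VInput): string[] {
  const errors: string[] = [];
  const hasPrompt = typeof input.prompt === "string" && input.prompt.length > 0;
  const hasMulti = Array.isArray(input.multi_prompt) && input.multi_prompt.length > 0;
  // prompt and multi_prompt are mutually exclusive, and one of them is needed
  if (hasPrompt === hasMulti) {
    errors.push("set exactly one of prompt or multi_prompt");
  }
  if (input.duration !== undefined && !DURATION_PATTERN.test(input.duration)) {
    errors.push('duration must be a string from "3" to "15"');
  }
  if (input.aspect_ratio !== undefined && ASPECT_RATIOS.indexOf(input.aspect_ratio) === -1) {
    errors.push("aspect_ratio must be 16:9, 9:16, or 1:1");
  }
  return errors;
}
```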

Parameters that matter

| Parameter | Type | Default | Options / notes |
|---|---|---|---|
| `prompt` | string | - | mutually exclusive with `multi_prompt` |
| `multi_prompt` | array | - | `[{prompt, duration}]`, up to 15s total |
| `duration` | string | `5` | `3`-`15` |
| `aspect_ratio` | string | `16:9` | `16:9`, `9:16`, `1:1` (T2V) |
| `cfg_scale` | number | 0.5 | 0-1 |
| `generate_audio` | boolean | `true` | Chinese/English voice |
| `negative_prompt` | string | `blur, distort, and low quality` | - |
| `shot_type` | string | `customize` | `customize`, `intelligent` |
| `elements` | array | - | I2V only |
| `start_image_url` | string | - | I2V required |
| `end_image_url` | string | - | I2V optional |
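To make the effective settings of a request explicit, the defaults from the table can be merged with per-call overrides. This is only a sketch: `DEFAULTS` and `resolveOptions` are hypothetical names, the API presumably applies the same defaults on its side, and clamping cfg_scale is a local choice rather than documented behavior.

```typescript
// Defaults copied from the parameter table above.
const DEFAULTS = {
  duration: "5",
  aspect_ratio: "16:9",
  cfg_scale: 0.5,
  generate_audio: true,
  negative_prompt: "blur, distort, and low quality",
  shot_type: "customize",
};

type Options = Partial<typeof DEFAULTS>;

// Merge overrides onto the defaults and keep cfg_scale inside the 0-1 range.
function resolveOptions(overrides: Options = {}): typeof DEFAULTS {
  const merged = { ...DEFAULTS, ...overrides };
  merged.cfg_scale = Math.min(1, Math.max(0, merged.cfg_scale));
  return merged;
}

// Example: stricter prompt adherence, audio off, everything else default.
const options = resolveOptions({ cfg_scale: 0.8, generate_audio: false });
```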

Pricing

$0.14 per second of output. 5-second clip: $0.70. 15-second maximum: $2.10. For multi_prompt sequences, you pay for the total duration across all shots, not per shot. A 15-second video with three 5-second shots costs the same as a single 15-second clip.
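The arithmetic above is simple enough to script when budgeting a batch. A minimal sketch, assuming the $0.14 per second rate quoted here; `estimateCostUsd` is a hypothetical helper, and it works in cents to avoid floating-point drift.

```typescript
// Rate taken from this section: $0.14 per second of output.
const PRICE_CENTS_PER_SECOND = 14;

// Cost depends only on total seconds, not on how they are split into shots.
function estimateCostUsd(shotDurations: number[]): number {
  const totalSeconds = shotDurations.reduce((sum, d) => sum + d, 0);
  return (PRICE_CENTS_PER_SECOND * totalSeconds) / 100;
}

console.log(estimateCostUsd([5]).toFixed(2)); // "0.70"
console.log(estimateCostUsd([5, 5, 5]).toFixed(2)); // "2.10"
console.log(estimateCostUsd([15]).toFixed(2)); // "2.10"
```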

Kling v3 Pro pricing ladder

Working example

```typescript
import { fal } from "@fal-ai/client";

// Three shots in one call: 4 + 5 + 4 = 13 seconds, within the 15-second cap.
const result = await fal.subscribe("fal-ai/kling-video/v3/pro/text-to-video", {
  input: {
    multi_prompt: [
      { prompt: "Wide shot of a rain-soaked Tokyo street at night, neon reflections in puddles", duration: 4 },
      { prompt: "Medium shot of a courier stepping off a bus, collar raised, rain dripping from hat", duration: 5 },
      { prompt: "Close-up of the courier's face, determined expression, neon on his cheek", duration: 4 },
    ],
    aspect_ratio: "16:9",
    cfg_scale: 0.6, // slightly above the 0.5 default
    generate_audio: true,
    negative_prompt: "blur, distort, low quality",
  },
});

// URL of the rendered video
console.log(result.data.video.url);
```

When to pick Kling v3 Pro

Pick it when you need multi-shot narrative structure in one call, or when character consistency across shots is non-negotiable (use elements on the I2V endpoint). Pick it for projects where native bilingual audio matters. Skip it when 4K is required (use LTX 2.3 or Veo 3.1). Skip it for high-volume draft iteration, where $0.14/sec adds up: Pixverse at $0.03-$0.12/sec (tiered) or Veo Lite at $0.05/sec are cheaper for throwaway renders. For producing a full scene in one call, Kling v3 Pro is the most capable option here.