Blog

Prompting3 min read

Seedance 2.0: Director-Style Shot Lists in One Call

Seedance 2.0 accepts inline timing markers that split a single generation across multiple shots. Written right, one API call becomes a 12-second sequence with its own cuts.


What the model actually does with timing markers

Most video models treat a prompt as one scene. Seedance 2.0 treats it as a shot list. Drop bracketed timestamps into the prompt string and the model reads them as scene transitions inside a single generation:

CODE
1[0-5s] A chef seasons a cast iron pan over high flame, close on the hand, kitchen smoke curling up.
2[5-10s] Cut to the finished dish plated on dark slate, overhead shot, garnish added by hand.

One API call. Two distinct shots. The transition happens inside the clip because the model's multi-shot training activates when it sees timestamp blocks. This is the most useful thing about Seedance 2.0 and the easiest thing to get wrong.

The one rule: one action per segment

Each bracketed segment gets one primary action and one camera position. If you stack actions inside a single segment, the model either picks one or blurs both:

Bad:

CODE
1[0-5s] A runner ties her shoes, then stands up, stretches, and begins running down the street

Four actions in five seconds. The model will rush all of them or abandon some.

Good:

CODE
1[0-3s] A runner ties her shoes on a city curb at dawn, close on hands and laces.
2[3-8s] She stands and begins running toward camera, wide shot, empty street behind her.

Two actions, two camera positions, clear cut point.

Bracketed shot list of two timed segments
Bracketed shot list of two timed segments

Auto duration is a planning tool

duration: "auto" lets the model infer total length from how many segments you describe. Three five-second segments yields roughly 12 to 15 seconds. One dense segment without markers yields four to six. Use auto on your first pass, then lock to an explicit duration string ("4" through "15", passed as a string not an integer) once you know what the content wants to be.

Physics gets its own paragraph

Seedance 2.0 has a physics prior that activates when you describe force, resistance, and material contact. Compare:

Underpowered: The ball hits the wall

Activates the physics layer: A rubber ball strikes a brick wall with force, bounces back with reduced velocity, dust kicked up on impact

Name the material, the contact, and what happens after. The output motion becomes visibly heavier and more plausible. You cannot get this by asking for "realistic physics" as a tag, the model needs the verbs.

Weak versus strong physics verb chart
Weak versus strong physics verb chart

What Seedance does not have

No negative_prompt field. None. If you want to exclude something, write it into the positive prompt with exclusionary language: clean frame, no crowd, no text overlay, single subject only. The model reads the constraint because it is positive-voice. A negative_prompt key in your call is ignored.

No prompt expansion. What you write is what runs. This makes Seedance more predictable than Wan 2.7 seed-to-seed, but it means your prompt has to carry all the intent.

A real two-shot call

TYPESCRIPT
1const result = await fal.subscribe("bytedance/seedance-2.0/text-to-video", {
2 input: {
3 prompt: "[0-5s] Heavy raindrops strike a still forest pond, concentric ripples expanding and intersecting, low angle just above the water surface, muted green canopy light. Audio: rain intensifying.\n[5-10s] Cut to wide overhead of the pond surface, rain continuing, a single leaf drops and floats, camera slowly rising.",
4 aspect_ratio: "16:9",
5 resolution: "720p",
6 duration: "auto",
7 generate_audio: true,
8 seed: 42,
9 },
10});

Audio is inline

Audio cues live in the same prompt string. There is no separate audio field. the crowd cheers generates crowd audio. footsteps echo on marble generates the echo. she whispers generates whispered dialogue even without a full transcript. When generate_audio: true, the prompt is doing double duty.

What to strip

  • Stacked style words: cinematic, dramatic, stunning, epic, beautiful, each one after the first does nothing
  • Passive voice for action: the door is opened, use a hand pushes the door open
  • Ambiguous pronouns across segments, if the subject changes, say the noun again
  • Missing light source, say where light comes from

480p first

Resolution options are 480p and 720p. 480p renders 40 to 60 percent faster. Iterate timing markers and physics language at 480p, switch to 720p when the shot list is approved.