Prompting, 3 min read

Pixverse C1: Prompts Built Around the Action Engine

Pixverse C1 has no negative prompt, no style presets, no thinking mode. What it has is an action-trained backbone that reads physical motion better than almost anything else in its $0.03-$0.12/sec (tiered) price range.


C1 is a motion model wearing video clothes

The parameter surface tells you everything. No negative_prompt. No style. No thinking_type. No cfg_scale. Just prompt, duration, resolution, aspect ratio, seed, and an audio switch. Pixverse C1 was not built to need dials. It was built to render physical, action-heavy motion cleanly.
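Because the schema is this small, it can help to pin it down as a type and strip anything else before a request goes out. The sketch below is illustrative only: the field names follow the ones used in this post, but the value types, the `C1Input` type, the `SUPPORTED_KEYS` list, and the `toC1Input` helper are hypothetical and not part of any SDK.

```typescript
// Hypothetical input type: field names follow this post, value types are assumptions.
interface C1Input {
  prompt: string;
  duration?: number;
  resolution?: string;
  aspect_ratio?: string;
  seed?: number;
  generate_audio_switch?: boolean;
}

const SUPPORTED_KEYS: string[] = [
  "prompt",
  "duration",
  "resolution",
  "aspect_ratio",
  "seed",
  "generate_audio_switch",
];

// Copies only the supported keys and reports every key that was dropped,
// so a leftover negative_prompt from another model's config is easy to spot.
function toC1Input(raw: Record<string, unknown>): { input: C1Input; dropped: string[] } {
  const prompt = raw["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  const kept: Record<string, unknown> = {};
  const dropped: string[] = [];
  for (const key of Object.keys(raw)) {
    if (SUPPORTED_KEYS.indexOf(key) !== -1) {
      kept[key] = raw[key];
    } else {
      dropped.push(key);
    }
  }
  return { input: kept as unknown as C1Input, dropped };
}
```

Passing a config carried over from another model, for example one with `negative_prompt` or `cfg_scale`, returns those names in `dropped` instead of silently sending them.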

That changes how you prompt. Mood-first lyrical prompting that works on Wan 2.7 or Kling underperforms on C1. What C1 responds to is motion-first prompts with specific verbs, physical weight, and camera instruction.

The five-part motion prompt

Order matters. This order activates the action engine reliably:

  1. Subject: who or what, described physically
  2. Action: an active verb, not a passive description
  3. Motion quality: velocity, weight, impact character
  4. Environment: spatial context the physics can live in
  5. Camera: shot type, lens, angle, movement

Bad: A swordsman in a fortress fighting

Too abstract. Nothing here triggers the motion engine, so the model defaults to slow drift.

Good:

CODE
A female swordsman parries a downward strike then ripostes with a two-handed thrust, blade clashing with sparks and dust, each motion carrying visible weight and momentum, inside a crumbling stone fortress lit by torchlight, low-angle tracking shot with a 35mm lens

All five parts are present: subject, active verbs (parries, ripostes), motion quality (visible weight and momentum), environment, and camera with lens.
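The five-part order can also be enforced in code, so a part is never skipped or shuffled. This is a minimal sketch: `MotionPromptParts` and `buildMotionPrompt` are hypothetical names invented for illustration, and joining the parts with commas is just one reasonable convention.

```typescript
// Hypothetical helper: assembles the five parts in the order given in this post.
interface MotionPromptParts {
  subject: string;       // who or what, described physically
  action: string;        // active verbs
  motionQuality: string; // velocity, weight, impact character
  environment: string;   // spatial context
  camera: string;        // shot type, lens, angle, movement
}

function buildMotionPrompt(parts: MotionPromptParts): string {
  const ordered: Array<[string, string]> = [
    ["subject", parts.subject],
    ["action", parts.action],
    ["motionQuality", parts.motionQuality],
    ["environment", parts.environment],
    ["camera", parts.camera],
  ];
  const cleaned: string[] = [];
  for (const pair of ordered) {
    const value = pair[1].trim();
    if (value === "") {
      throw new Error("missing prompt part: " + pair[0]);
    }
    cleaned.push(value);
  }
  // Subject and action form one clause; the remaining parts follow as comma-separated phrases.
  const lead = cleaned[0] + " " + cleaned[1];
  return [lead].concat(cleaned.slice(2)).join(", ");
}

// Rebuilds the "Good" example above from its five parts.
const swordsman = buildMotionPrompt({
  subject: "A female swordsman",
  action: "parries a downward strike then ripostes with a two-handed thrust, blade clashing with sparks and dust",
  motionQuality: "each motion carrying visible weight and momentum",
  environment: "inside a crumbling stone fortress lit by torchlight",
  camera: "low-angle tracking shot with a 35mm lens",
});
```

Throwing on an empty part is a deliberate choice: a prompt with no camera or no motion quality still runs, but it is the kind of prompt this section warns against.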

Five layer motion prompt stack diagram

Verbs that wake the engine

Human motion: strikes, parries, kicks, throws, ducks, lunges, pivots, spins, rolls, leaps

Environmental: shatters, explodes, ignites, collapses, disintegrates

Kinematics: accelerates, decelerates, impacts, ricochets, recoils

Motion quality phrases: visible weight and momentum, high-velocity, slow-motion at impact, impact causes visible recoil

What dies in the output: feels dangerous, looks cool, very epic. Emotions about motion are not motion. The model cannot render feeling without an action to hang it on.
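A quick pre-flight check can catch prompts that lean on feeling instead of action. The sketch below is an assumption-heavy illustration: `lintPrompt` is a hypothetical helper, the word lists are copied from this section rather than from any official vocabulary, and matching is exact, so "parry" will not match "parries".

```typescript
// Word lists copied from this section; they are examples, not an exhaustive vocabulary.
const MOTION_VERBS: string[] = [
  "strikes", "parries", "kicks", "throws", "ducks", "lunges", "pivots", "spins", "rolls", "leaps",
  "shatters", "explodes", "ignites", "collapses", "disintegrates",
  "accelerates", "decelerates", "impacts", "ricochets", "recoils",
];
const VAGUE_PHRASES: string[] = ["feels dangerous", "looks cool", "very epic"];

// Hypothetical lint: reports which listed verbs and which vague phrases appear in a prompt.
function lintPrompt(prompt: string): { motionVerbs: string[]; vaguePhrases: string[] } {
  const lower = prompt.toLowerCase();
  // Split on anything that is not a letter or hyphen so punctuation does not hide a verb.
  const words = lower.split(/[^a-z-]+/);
  const motionVerbs = MOTION_VERBS.filter((verb) => words.indexOf(verb) !== -1);
  const vaguePhrases = VAGUE_PHRASES.filter((phrase) => lower.indexOf(phrase) !== -1);
  return { motionVerbs, vaguePhrases };
}
```

A prompt that returns an empty `motionVerbs` list and a non-empty `vaguePhrases` list is a good candidate for a rewrite before spending a render on it.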

Grid of motion verbs stamped on swatches

No negative prompt, exclusions go positive

C1 ignores negative_prompt entirely because the field does not exist in the schema. The workaround is exclusion language inside the positive prompt:

CODE
sharp focus, no motion blur
clean single frame, no crowd, no text overlay
clear blue sky, no overexposure

Positive clause first, exclusion second. On its own, "no motion blur" does nothing. "sharp focus, no motion blur" works because the positive clause gives the model something to render and the exclusion narrows it.
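The positive-first rule is easy to encode so that an exclusion never ends up standing alone. A minimal sketch, with `withExclusions` as a hypothetical helper name:

```typescript
// Hypothetical helper: attaches "no ..." exclusions to a required positive clause.
function withExclusions(positive: string, exclusions: string[]): string {
  const base = positive.trim();
  if (base === "") {
    // Per the rule above, a bare exclusion gives the model nothing to render.
    throw new Error("an exclusion needs a positive clause to attach to");
  }
  const negatives = exclusions
    .map((item) => item.trim())
    .filter((item) => item !== "")
    .map((item) => "no " + item);
  return [base].concat(negatives).join(", ");
}

const focusClause = withExclusions("sharp focus", ["motion blur"]);
const frameClause = withExclusions("clean single frame", ["crowd", "text overlay"]);
```

Here `focusClause` and `frameClause` reproduce the first two example lines above, and the resulting strings can be appended to the main prompt like any other phrase.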

Camera language is the biggest free win

C1 knows cinematography terminology. Phrases like "wide establishing shot, 14mm lens"; "close-up tracking shot, 85mm"; "low-angle hero shot, looking up"; "over-the-shoulder view, subject facing camera-left"; "handheld tracking shot, following at waist height"; and "slow dolly push-in on subject's face" all read as direct instructions. A well-written action prompt without camera direction still looks flat. Spend the words.

No presets, describe style explicitly

No anime/clay/cyberpunk toggle. Write it into the prompt: cel-shaded animation style with bold ink outlines, neon-lit cyberpunk environment with rain reflections, painted comic-book aesthetic with halftone shading. C1 renders these when told directly.

Audio

Setting generate_audio_switch to true generates BGM, SFX, and dialogue. Keep it off during iteration and turn it on for final renders. Sound-producing events go in the prompt: "blade clash rings out", "footsteps echo on stone". Environmental acoustics help too: "reverberant stone chamber", "open wind-swept plain".
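One way to keep the draft and final settings from drifting apart is to derive both from a single stage flag. This is only a sketch: `withAudio` and the `Stage` type are hypothetical, and leaving sound cues out of draft prompts is an assumption made here for simplicity, not something the model requires.

```typescript
type Stage = "draft" | "final";

// Hypothetical helper: audio off for drafts, on for finals, with sound cues appended only when audio is on.
function withAudio(
  prompt: string,
  stage: Stage,
  soundCues: string[] = [],
): { prompt: string; generate_audio_switch: boolean } {
  const audioOn = stage === "final";
  // Assumption: cues are only worth the prompt space when audio is actually generated.
  const cues = audioOn ? soundCues.map((cue) => cue.trim()).filter((cue) => cue !== "") : [];
  return {
    prompt: [prompt.trim()].concat(cues).join(", "),
    generate_audio_switch: audioOn,
  };
}

const draftRender = withAudio("A swordsman parries a downward strike", "draft", ["blade clash rings out"]);
const finalRender = withAudio("A swordsman parries a downward strike", "final", [
  "blade clash rings out",
  "reverberant stone chamber",
]);
```

The returned object can be spread into the request input alongside duration, resolution, aspect ratio, and seed.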

A full call

TYPESCRIPT
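// Assumes the @fal-ai/client package is installed, credentials are configured,
// and the file runs as an ES module (top-level await).
import { fal } from "@fal-ai/client";

// Text-to-video request: audio stays off while iterating, and the seed is fixed so reruns are comparable.
const result = await fal.subscribe("fal-ai/pixverse/c1/text-to-video", {
  input: {
    prompt: "A martial artist in a white gi executes a spinning heel kick, dust particles rising from the mat on impact, visible momentum through the hip rotation, dramatic side key light with deep shadows, low-angle tracking shot with a 24mm lens, slow-motion at the point of contact",
    duration: 6,
    resolution: "720p",
    aspect_ratio: "16:9",
    generate_audio_switch: false,
    seed: 1001,
  },
});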

Duration

C1 supports 1 to 15 seconds. Under 3 seconds, action sequences cut off before they resolve. The sweet spot for action is 5 to 8 seconds. At $0.03-$0.12/sec (tiered), a 15-second 1080p clip with no audio is $1.425 (15 x $0.095). Extending a clip costs real money, so it is only worth it if the motion actually needs the time.
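The cost arithmetic is simple enough to check before submitting a job. The sketch below uses hypothetical helpers (`estimateCostMills`, `formatUsd`) and works in thousandths of a dollar to avoid floating-point rounding. The only rate taken from this post is $0.095/sec for 1080p without audio, so rates for other tiers have to be supplied by the caller.

```typescript
const MIN_SECONDS = 1;
const MAX_SECONDS = 15;

// Hypothetical helper: cost in mills (thousandths of a dollar) for a clip of the given length.
function estimateCostMills(durationSec: number, rateMillsPerSec: number): number {
  if (durationSec < MIN_SECONDS || durationSec > MAX_SECONDS) {
    throw new RangeError("duration must be between 1 and 15 seconds");
  }
  return Math.round(durationSec * rateMillsPerSec);
}

// Formats mills as a dollar string with three decimals.
function formatUsd(mills: number): string {
  return "$" + (mills / 1000).toFixed(3);
}

// 1080p without audio at $0.095/sec, the rate quoted above.
const RATE_1080P_NO_AUDIO_MILLS = 95;
const fullLength = estimateCostMills(15, RATE_1080P_NO_AUDIO_MILLS); // 1425 mills
const sweetSpot = estimateCostMills(6, RATE_1080P_NO_AUDIO_MILLS);   // 570 mills
```

At that rate a 6-second clip comes to 570 mills, or $0.570, against $1.425 for the full 15 seconds, which is the gap the paragraph above is pointing at.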