Pixverse C1: Prompts Built Around the Action Engine
Pixverse C1 has no negative prompt, no style presets, no thinking mode. What it has is an action-trained backbone that reads physical motion better than almost anything at $0.03-$0.12/sec (tiered).
C1 is a motion model wearing video clothes
The parameter surface tells you everything. No negative_prompt. No style. No thinking_type. No cfg_scale. Just prompt, duration, resolution, aspect ratio, seed, and an audio switch. Pixverse C1 was not built to need dials. It was built to render physical, action-heavy motion cleanly.
That changes how you prompt. Mood-first lyrical prompting that works on Wan 2.7 or Kling underperforms on C1. What C1 responds to is motion-first prompts with specific verbs, physical weight, and camera instruction.
The five-part motion prompt
Order matters. This order activates the action engine reliably:
- Subject, who or what, described physically
- Action, an active verb, not a passive description
- Motion quality, velocity, weight, impact character
- Environment, spatial context the physics can live in
- Camera, shot type, lens, angle, movement
Bad: A swordsman in a fortress fighting
Too abstract. No motion engine triggers fire. The model defaults to slow drift.
Good:
1A female swordsman parries a downward strike then ripostes with a two-handed thrust, blade clashing with sparks and dust, each motion carrying visible weight and momentum, inside a crumbling stone fortress lit by torchlight, low-angle tracking shot with a 35mm lens
Subject. Active verbs (parries, ripostes). Motion quality (visible weight and momentum). Environment, camera, lens.

Verbs that wake the engine
Human motion: strikes, parries, kicks, throws, ducks, lunges, pivots, spins, rolls, leaps
Environmental: shatters, explodes, ignites, collapses, disintegrates
Kinematics: accelerates, decelerates, impacts, ricochets, recoils
Motion quality phrases: visible weight and momentum, high-velocity, slow-motion at impact, impact causes visible recoil
What dies in the output: feels dangerous, looks cool, very epic. Emotions about motion are not motion. The model cannot render feeling without an action to hang it on.

No negative prompt, exclusions go positive
C1 ignores negative_prompt entirely. Field does not exist in the schema. The workaround is exclusion language inside the positive prompt:
1sharp focus, no motion blur2clean single frame, no crowd, no text overlay3clear blue sky, no overexposure
Positive clause first, exclusion second. no motion blur alone does nothing. sharp focus, no motion blur works because the positive clause gives the model something to render and the exclusion narrows it.
Camera language is the biggest free win
C1 knows cinematography terminology. wide establishing shot, 14mm lens, close-up tracking shot, 85mm, low-angle hero shot, looking up, over-the-shoulder view, subject facing camera-left, handheld tracking shot, following at waist height, slow dolly push-in on subject's face, all read as direct instructions. A well-written action prompt without camera direction still looks flat. Spend the words.
No presets, describe style explicitly
No anime/clay/cyberpunk toggle. Write it into the prompt: cel-shaded animation style with bold ink outlines, neon-lit cyberpunk environment with rain reflections, painted comic-book aesthetic with halftone shading. C1 renders these when told directly.
Audio
generate_audio_switch: true generates BGM, SFX, and dialogue. Off during iteration, on for final renders. Sound-producing events go in the prompt: blade clash rings out, footsteps echo on stone. Environmental acoustics help: reverberant stone chamber, open wind-swept plain.
A full call
1const result = await fal.subscribe("fal-ai/pixverse/c1/text-to-video", {2 input: {3 prompt: "A martial artist in a white gi executes a spinning heel kick, dust particles rising from the mat on impact, visible momentum through the hip rotation, dramatic side key light with deep shadows, low-angle tracking shot with a 24mm lens, slow-motion at the point of contact",4 duration: 6,5 resolution: "720p",6 aspect_ratio: "16:9",7 generate_audio_switch: false,8 seed: 1001,9 },10});
Duration
C1 supports 1 to 15 seconds. Under 3 seconds and action sequences cut off before resolution. The sweet spot for action is 5 to 8 seconds. At $0.03-$0.12/sec (tiered), a 15-second 1080p clip with no audio is $1.425 (15 x $0.095), extending costs real money, only useful if the motion actually needs the time.