
Integration · 3 min read

Concurrency Control with p-queue and fal

How to submit 200 generations at a controlled rate so the queue stays responsive and your quota does not spike.


Two hundred jobs and a quota that does not spike

200 prompts, one five-second Pixverse v6 clip each: 1,000 seconds of video. At $0.03 to $0.12 per second depending on the resolution tier, that is $30 total at the cheapest 360p tier and $120 at the most expensive one. Still cheap. What is not cheap is what happens when you fire all 200 submissions at once with Promise.all.

You hit your account's concurrency limit and fal returns 429s. Most calls fail, the few that succeed finish out of order, and your database writes are a mess.

The fix is not more aggressive retries. It is a bounded submitter, and p-queue is a small library that does it well.

Install and a tiny runner

```bash
npm install p-queue @fal-ai/client
```

```typescript
// p-queue v7+ is ESM-only, so run this file as an ES module.
import PQueue from "p-queue";
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

// At most 8 tasks run at once; everything else waits in the queue.
const queue = new PQueue({ concurrency: 8 });

async function drain(prompts: string[]) {
  // Every prompt is enqueued immediately, but only 8 fal.subscribe calls are in flight.
  return Promise.all(
    prompts.map((prompt) =>
      queue.add(() =>
        fal.subscribe("fal-ai/pixverse/v6/text-to-video", {
          input: { prompt, duration: 5, resolution: "1080p" },
        }),
      ),
    ),
  );
}
```

At most eight requests are in flight; the other 192 wait in memory. As long as eight is under your account's concurrency limit, there are no 429s and no lost work.
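The gate itself is simple. Here is a dependency-free sketch of the same idea, a fixed pool of workers pulling from a shared index, which shows why no more than the limit is ever in flight. mapWithConcurrency is a hypothetical name, and this is a simplified illustration rather than p-queue's actual implementation; in a real run, use p-queue.

```typescript
// Bounded-concurrency map: at most `limit` calls to `fn` run at once,
// and results come back in input order (like Promise.all).
// Hypothetical helper for illustration only.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to start; safe because JS runs one callback at a time

  // Each worker loops: claim the next index, await its job, repeat until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  // Start `limit` workers (or fewer if there are fewer items) and wait for all of them.
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Unlike p-queue, this has no pause, clear, or interval support, and if one call rejects, the returned promise rejects while the other workers keep going.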

Pick a concurrency number from your account

[Figure: concurrency dial with throughput graph]

Start at 4 and watch for a minute. If you see zero 429s and throughput grows linearly, move to 8. Keep doubling until the first 429, then halve. On most accounts the sweet spot is 8 to 16.

Above your ceiling, you are paying in retries, not generations. The throughput curve flattens, then drops.
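The doubling rule is easy to automate. A sketch, assuming you write a probe that runs a short trial batch (about a minute, per the advice above) at a given concurrency and returns how many 429 responses it saw. findConcurrency and the probe are hypothetical, and the cap of 64 is an arbitrary safety limit.

```typescript
// Hypothetical probe you supply: run a trial batch at `concurrency`, return the 429 count.
type Probe = (concurrency: number) => Promise<number>;

// Start at `start`, double while a trial sees zero 429s, halve at the first 429.
async function findConcurrency(probe: Probe, start = 4, max = 64): Promise<number> {
  if (start < 1 || start > max) throw new RangeError("need 1 <= start <= max");
  let c = start;
  while (c <= max) {
    const throttled = await probe(c);
    if (throttled > 0) {
      // First 429 at this level: halve it (never below 1). Note the halved value
      // may not have been probed itself if the very first trial was throttled.
      return Math.max(1, Math.floor(c / 2));
    }
    c *= 2;
  }
  // No 429 up to the cap: return the largest level that tested clean.
  return c / 2;
}
```

With an account that starts throttling above 12, the trials run at 4, 8, and 16, and the function settles on 8.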

Add backoff underneath

p-queue limits concurrency, but it does not retry transient 429s. Wrap the fal call in your own backoff helper (withBackoff below stands in for whatever retry wrapper you use):

```typescript
// Inside prompts.map((prompt) => ...); withBackoff is your own retry helper.
queue.add(() =>
  withBackoff(() =>
    fal.subscribe("fal-ai/pixverse/v6/text-to-video", { input: { prompt, duration: 5 } }),
  ),
);
```

The two are complementary: the queue keeps you under the limit, and backoff catches the occasional flake.
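If you do not have a withBackoff helper yet, here is a minimal sketch: exponential backoff with full jitter that retries only errors that look like a 429 or a 5xx. It assumes the thrown error exposes the HTTP status as a numeric status property, which @fal-ai/client's ApiError appears to do; verify that for your client version, or pass your own isRetryable.

```typescript
type BackoffOptions = {
  retries?: number; // retries after the first attempt
  baseMs?: number; // delay ceiling for the first retry
  maxMs?: number; // cap on any single delay
  isRetryable?: (err: unknown) => boolean;
};

// Assumption: HTTP errors carry a numeric `status`; 429 and 5xx are worth retrying.
function defaultIsRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions = {}): Promise<T> {
  const { retries = 5, baseMs = 500, maxMs = 30_000, isRetryable = defaultIsRetryable } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Full jitter: wait a random time in [0, min(maxMs, baseMs * 2^attempt)).
      const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Because the retry loop runs inside the function passed to queue.add, a job keeps its queue slot while it sleeps, so backoff never pushes you past the concurrency you picked.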

Intervals

Some endpoints care about requests per minute, not just how many are in flight. p-queue can enforce both:

```typescript
const queue = new PQueue({
  concurrency: 8, // at most 8 running at once
  intervalCap: 60, // at most 60 tasks may start...
  interval: 60_000, // ...per 60,000 ms window: 60 requests per minute
});
```

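intervalCap is how many tasks may start per window, and interval is the window length in milliseconds. The queue throttles starts per window regardless of when tasks complete, while concurrency still caps how many run at once.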
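A quick sanity check for an interval setting is the earliest time the last job can start. This sketch approximates the limit as back-to-back fixed windows and ignores concurrency and job duration, so treat it as a back-of-the-envelope lower bound; lastStartMs is a hypothetical name.

```typescript
// Earliest start time (ms after the first start) of the last of `jobs` tasks
// when at most `intervalCap` tasks may start per `interval` ms.
// Approximation: back-to-back fixed windows; concurrency and job runtime ignored.
function lastStartMs(jobs: number, intervalCap: number, interval: number): number {
  if (jobs <= 0) return 0;
  const windows = Math.ceil(jobs / intervalCap); // windows needed to start every job
  return (windows - 1) * interval; // the last window opens after the earlier ones
}

// 200 jobs at 60 per minute need 4 windows, so the last job starts no earlier than 3 minutes in.
console.log(lastStartMs(200, 60, 60_000) / 60_000); // 3
```

The whole run then finishes no earlier than that last start plus the last job's own runtime, and later still if concurrency, not the interval, is the tighter limit.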

Progress, cancellation, pause

[Figure: pause and resume control with queue depth gauge]

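```typescript
// "active" fires each time a task starts; pending = running, size = waiting.
queue.on("active", () => console.log(`in-flight: ${queue.pending}, waiting: ${queue.size}`));
// "idle" fires when nothing is waiting and nothing is running.
queue.on("idle", () => console.log("all done"));

queue.pause(); // stop starting new tasks (running ones continue)
queue.start(); // resume after pause()
queue.clear(); // drop tasks that have not started yet
```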

Log pending and size every few seconds, persist progress as jobs finish, and support resume. If your process dies 120 jobs into a 200-job run, you should not regenerate the first 120.
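One way to get resume cheaply is an append-only checkpoint file: write a job id when its generation succeeds, and skip those ids on startup. A minimal sketch, assuming each job has a stable id you choose (for example, hypothetically, the prompt's index) and that results are stored elsewhere. loadDone, markDone, and logProgress are illustrative names, not part of p-queue or fal.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Read the set of job ids that finished in a previous run (empty if no checkpoint yet).
function loadDone(path: string): Set<string> {
  if (!existsSync(path)) return new Set();
  const lines = readFileSync(path, "utf8").split("\n");
  return new Set(lines.map((l) => l.trim()).filter((l) => l.length > 0));
}

// Append one id per line. A crash can at worst leave one partial line,
// which will not match any real id, so that job simply runs again.
function markDone(path: string, id: string): void {
  appendFileSync(path, `${id}\n`);
}

// Log in-flight and waiting counts on a timer; returns a function that stops it.
// Typed structurally, so a PQueue instance (which has pending and size) fits.
function logProgress(
  queue: { pending: number; size: number },
  everyMs = 5_000,
  log: (line: string) => void = console.log,
): () => void {
  const timer = setInterval(() => log(`in-flight: ${queue.pending}, waiting: ${queue.size}`), everyMs);
  return () => clearInterval(timer);
}
```

In the runner, skip any id already in loadDone(...) before calling queue.add, call markDone only after fal.subscribe resolves and the result is saved, and call the stop function from the queue's idle handler.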

Client gate versus queue.submit

p-queue is a client-side gate. It decides when your process sends requests; it does not change fal's server-side queue.

If your jobs are all long-running (Veo 3.1 at 4K, Kling v3 Pro, Wan 2.7 at 15 s), skip subscribe, which keeps each call waiting until the generation finishes, and use fal.queue.submit with a webhook instead. p-queue then only gates the submit POST, which returns quickly, so you are not holding hundreds of long-lived waits open.

```typescript
queue.add(() =>
  // Resolves once fal has accepted the job; the result is delivered to the webhook.
  fal.queue.submit("fal-ai/veo3.1", {
    input: { prompt, duration: "8s", resolution: "4k" },
    webhookUrl: `${process.env.APP_URL}/api/fal/webhook`,
  }),
);
```
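The webhook URL needs a handler on your side. This sketch only parses and validates the decoded JSON body; it assumes fal's webhook body carries request_id, status ("OK" or "ERROR"), payload on success, and error on failure, so check those field names against fal's webhook documentation. The HTTP framework and any signature verification are left out, and parseFalWebhook is a hypothetical name.

```typescript
// Assumed shape of a fal webhook body, normalized into a small tagged union.
type FalWebhook =
  | { requestId: string; ok: true; payload: unknown }
  | { requestId: string; ok: false; error: string };

// Parse an already JSON-decoded body, or throw if it does not look like a fal webhook.
function parseFalWebhook(body: unknown): FalWebhook {
  if (typeof body !== "object" || body === null) throw new Error("webhook body is not an object");
  const b = body as Record<string, unknown>;
  const id = b["request_id"];
  if (typeof id !== "string") throw new Error("missing request_id");
  if (b["status"] === "OK") return { requestId: id, ok: true, payload: b["payload"] };
  if (b["status"] === "ERROR") {
    const error = typeof b["error"] === "string" ? b["error"] : "unknown error";
    return { requestId: id, ok: false, error };
  }
  throw new Error(`unexpected status: ${String(b["status"])}`);
}
```

Match requestId against the request id you stored from fal.queue.submit, and write the checkpoint only for jobs that arrive with ok: true.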

Cost sanity before you press go

```typescript
function estimate(jobs: { seconds: number; pricePerSec: number }[]) {
  // Sum in cents and round once, so floating-point drift does not show in the dollar figure.
  const cents = Math.round(jobs.reduce((s, j) => s + j.seconds * j.pricePerSec * 100, 0));
  return `$${(cents / 100).toFixed(2)}`;
}

// 200 Pixverse v6 clips, 5 s each, at the cheapest tier ($0.03/s)
estimate(Array.from({ length: 200 }, () => ({ seconds: 5, pricePerSec: 0.03 })));
// -> "$30.00"
```

Log the estimate before the queue starts. If it says $500 and you thought $5, you catch it before you burn it.
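To make that check enforce itself, refuse to start when the estimate is over budget. A small sketch; assertWithinBudget and the budget figures are hypothetical, and the per-second prices are whatever you pass in, so keep them in sync with the model's pricing page.

```typescript
type Job = { seconds: number; pricePerSec: number };

// Total estimated cost in whole cents, rounded once at the end.
function estimateCents(jobs: Job[]): number {
  return Math.round(jobs.reduce((sum, j) => sum + j.seconds * j.pricePerSec * 100, 0));
}

// Throw before anything is queued if the batch would exceed the budget; otherwise log it.
function assertWithinBudget(jobs: Job[], budgetUsd: number): number {
  const cents = estimateCents(jobs);
  const budgetCents = Math.round(budgetUsd * 100);
  if (cents > budgetCents) {
    throw new Error(
      `Estimated $${(cents / 100).toFixed(2)} exceeds budget $${(budgetCents / 100).toFixed(2)}`,
    );
  }
  console.log(`Estimated cost: $${(cents / 100).toFixed(2)}`);
  return cents;
}
```

Call it once, before the first queue.add, with the same job list you are about to enqueue.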

The whole pattern

One bounded queue, one backoff wrapper, fal.queue.submit with a webhook when jobs are long, and a logged in-flight count. That is what turns a Promise.all disaster into a predictable twelve-minute run.
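"Predictable" can be made concrete: with a fixed concurrency, the run takes roughly ceil(jobs / concurrency) waves of one average job each. A back-of-the-envelope sketch; the average job time is an input you measure on your own runs, not a fal figure, and real jobs vary, so treat the result as a rough guide. estimateMinutes is a hypothetical name.

```typescript
// Rough wall-clock estimate for a bounded queue: jobs run in waves of `concurrency`,
// each wave taking about one average job. Ignores variance, retries, and interval limits.
function estimateMinutes(jobs: number, concurrency: number, avgJobSeconds: number): number {
  const waves = Math.ceil(jobs / concurrency);
  return (waves * avgJobSeconds) / 60;
}

// Hypothetical: 200 jobs at concurrency 8 with a measured 30 s average job.
console.log(estimateMinutes(200, 8, 30)); // 12.5
```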