Integration · 3 min read · Apr 6, 2026

Concurrency Control with p-queue and fal

How to submit 200 generations at a controlled rate so the queue stays responsive and your quota does not spike.

Two hundred jobs and a quota that does not spike

200 prompts, one five-second Pixverse v6 clip each. At $0.03-$0.12 per second (tiered), that is $30 total at the cheapest 360p tier, which is still cheap. What is not cheap is what happens when you fire all 200 submissions at once with Promise.all.

You hit your account's concurrency limit, fal returns 429, and most calls fail. The few that succeed finish out of order, and your database writes are a mess.

The fix is not harder retries. It is a bounded submitter. p-queue is the smallest library that does it well.

Install and a tiny runner

BASH

npm install p-queue @fal-ai/client

TYPESCRIPT

import PQueue from "p-queue";
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

// At most 8 tasks run at once; the rest wait inside the queue.
const queue = new PQueue({ concurrency: 8 });

async function drain(prompts: string[]) {
  // Enqueue every prompt up front; Promise.all returns results in input order.
  return Promise.all(
    prompts.map((prompt) =>
      queue.add(() =>
        fal.subscribe("fal-ai/pixverse/v6/text-to-video", {
          input: { prompt, duration: 5, resolution: "1080p" },
        }),
      ),
    ),
  );
}

At most eight calls are in flight. The other 192 wait in memory. As long as eight is under your account limit, you get no 429s and no lost work.
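
One caveat with the Promise.all wrapper: a single rejected task rejects the whole batch, even though the other tasks keep running. Below is a minimal sketch of collecting a per-prompt outcome instead. `runAll` and `Outcome` are hypothetical names introduced here, not part of p-queue or the fal client.

```typescript
// Result of one job: either a value or the error it failed with.
type Outcome<T> =
  | { prompt: string; ok: true; value: T }
  | { prompt: string; ok: false; error: unknown };

// Runs `run` for every prompt and never rejects as a whole:
// each failure is captured next to the prompt that caused it.
async function runAll<T>(
  prompts: string[],
  run: (prompt: string) => Promise<T>,
): Promise<Outcome<T>[]> {
  return Promise.all(
    prompts.map((prompt) =>
      run(prompt).then(
        (value): Outcome<T> => ({ prompt, ok: true, value }),
        (error): Outcome<T> => ({ prompt, ok: false, error }),
      ),
    ),
  );
}
```

With the queue from above, `run` would be something like `(prompt) => queue.add(() => fal.subscribe(...))`. Failed prompts can then be retried or logged without discarding the clips that did finish.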

Pick a concurrency number from your account

Start at 4. Watch for a minute. Zero 429s, linear throughput, move to 8. Keep doubling until the first 429, then halve. The sweet spot on most accounts is 8 to 16.

Above your ceiling, you are paying for retries, not generations. The curve flattens and gets worse.
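
The doubling rule above can be written down as a tiny state machine. This is a sketch only: `ProbeState` and `nextProbe` are hypothetical names, and deciding whether a 429 occurred during the observation window is left to your own logging.

```typescript
type ProbeState = { concurrency: number; settled: boolean };

// One step of "double until the first 429, then halve".
// Once a 429 has been seen, the value is halved and frozen.
function nextProbe(state: ProbeState, saw429: boolean): ProbeState {
  if (state.settled) return state;
  if (saw429) {
    return { concurrency: Math.max(1, Math.floor(state.concurrency / 2)), settled: true };
  }
  return { concurrency: state.concurrency * 2, settled: false };
}

// Example walk: 4 -> 8 -> 16 -> first 429 -> settle at 8.
let probe: ProbeState = { concurrency: 4, settled: false };
probe = nextProbe(probe, false);
probe = nextProbe(probe, false);
probe = nextProbe(probe, true);
```

p-queue lets you assign `queue.concurrency` on a live queue, so each step can be applied without rebuilding it.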

Add backoff underneath

p-queue limits concurrency. It does not handle transient 429s. Wrap the fal call in your backoff helper:

TYPESCRIPT

queue.add(() =>
  withBackoff(() =>
    fal.subscribe("fal-ai/pixverse/v6/text-to-video", { input: { prompt, duration: 5 } }),
  ),
);

The two are complementary: the queue keeps you under the limit, and backoff catches flakes.
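
`withBackoff` is not defined in this post. If you do not already have one, here is a minimal sketch of what such a helper might look like: exponential backoff with full jitter, retrying only on 429. It assumes the thrown error carries a numeric `status` field, so check how your client version reports rate limits. The option names are illustrative.

```typescript
type BackoffOptions = {
  retries?: number; // retry attempts after the first call
  baseMs?: number; // delay ceiling for the first retry
  maxMs?: number; // upper bound on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
};

// Assumption: rate-limit errors expose `status === 429`.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions = {}): Promise<T> {
  const { retries = 5, baseMs = 500, maxMs = 30_000, sleep = defaultSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Anything other than a 429, or running out of retries, is rethrown.
      if (!isRateLimited(err) || attempt >= retries) throw err;
      // Full jitter: wait a random time between 0 and the capped exponential ceiling.
      const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

The jitter matters here because all eight in-flight tasks tend to hit the limit at the same moment. Without it they would all retry at the same moment too.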

Intervals

Some endpoints care about requests per minute.

TYPESCRIPT

const queue = new PQueue({
  concurrency: 8,
  intervalCap: 60,
  interval: 60_000, // 60 requests per minute
});

Pair intervalCap with interval in milliseconds. The queue throttles across the minute regardless of completion times.
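
A quick back-of-the-envelope check shows what an interval cap does to a batch. This sketch assumes fixed windows in which at most `intervalCap` tasks start per `interval`; the function name is hypothetical, and the result is a lower bound, not a prediction of total run time.

```typescript
// Earliest moment (in ms after the first start) at which the last job can begin,
// if at most `intervalCap` jobs may start in each window of `intervalMs`.
function earliestLastStartMs(jobs: number, intervalCap: number, intervalMs: number): number {
  if (jobs <= 0) return 0;
  return Math.floor((jobs - 1) / intervalCap) * intervalMs;
}

// 200 jobs at 60 per minute: windows of 60, 60, 60, 20,
// so the last job cannot start before the 3-minute mark.
const lastStart = earliestLastStartMs(200, 60, 60_000); // 180000
```

If that floor is longer than the run you expected, the interval cap is your bottleneck, not the concurrency setting.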

Progress, cancellation, pause

[Figure: pause and resume control with queue depth gauge]

TYPESCRIPT

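// `pending` counts running tasks; `size` counts tasks still waiting.
queue.on("active", () => console.log(`in-flight: ${queue.pending}, waiting: ${queue.size}`));
queue.on("idle", () => console.log("all done"));

queue.pause(); // stop starting new tasks; in-flight ones keep running
queue.start(); // resume after a pause
queue.clear(); // drop everything still waiting; in-flight tasks are not cancelled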

Log pending and size every few seconds, persist progress, support resume. If your process dies halfway through 200 generations, you should not re-generate the first 120.
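
Resume support can be as small as an append-only log of finished job ids. The sketch below shows only the bookkeeping. `Job`, `parseCheckpoint`, and `remainingJobs` are hypothetical names, and where the log lives (a file, a database table) is up to you.

```typescript
type Job = { id: string; prompt: string };

// Parse a newline-delimited checkpoint log into the set of finished job ids.
function parseCheckpoint(log: string): Set<string> {
  return new Set(
    log
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  );
}

// Jobs that still need to run after a restart, in their original order.
function remainingJobs(jobs: Job[], done: Set<string>): Job[] {
  return jobs.filter((job) => !done.has(job.id));
}

// Example: jobs "a" and "b" finished before the crash, so only "c" is left.
const todo = remainingJobs(
  [
    { id: "a", prompt: "sunrise" },
    { id: "b", prompt: "harbor" },
    { id: "c", prompt: "forest" },
  ],
  parseCheckpoint("a\nb\n"),
);
```

Append a job's id to the log only after its result is safely stored. A crash between generation and storage then costs you one clip, not the record of it.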

Client gate versus queue.submit

p-queue is a client-side gate. It decides when your process opens HTTP connections. It does not change fal's server-side queue.

If your jobs are all long (Veo 3.1 4K, Kling v3 Pro, Wan 2.7 at 15s), do not use subscribe; use fal.queue.submit with a webhook. Then p-queue only gates the submit POST, which is fast. That saves hundreds of dangling HTTP connections.

TYPESCRIPT

queue.add(() =>
  fal.queue.submit("fal-ai/veo3.1", {
    input: { prompt, duration: "8s", resolution: "4k" },
    webhookUrl: `${process.env.APP_URL}/api/fal/webhook`,
  }),
);
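
The other half of this pattern is the webhook receiver. Here is a minimal sketch of normalizing the callback body, assuming it carries `request_id`, a `status` of "OK" on success, a `payload` with the model output, and an `error` string on failure. Those field names are an assumption, so check them against the current fal webhook documentation. `JobUpdate` and `toJobUpdate` are hypothetical names.

```typescript
// Assumed shape of the webhook body; every field is validated before use.
type FalWebhookBody = { request_id?: unknown; status?: unknown; payload?: unknown; error?: unknown };

type JobUpdate =
  | { requestId: string; state: "completed"; output: unknown }
  | { requestId: string; state: "failed"; reason: string };

// Returns null for bodies that do not look like a webhook callback at all.
function toJobUpdate(body: unknown): JobUpdate | null {
  if (typeof body !== "object" || body === null) return null;
  const { request_id, status, payload, error } = body as FalWebhookBody;
  if (typeof request_id !== "string") return null;
  if (status === "OK") {
    return { requestId: request_id, state: "completed", output: payload };
  }
  return {
    requestId: request_id,
    state: "failed",
    reason: typeof error === "string" ? error : "unknown error",
  };
}
```

Store the request id returned by the submit call next to your own job row, so the `requestId` in the update can be matched back to the prompt that produced it.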

Cost sanity before you press go

TYPESCRIPT

function estimate(jobs: { seconds: number; pricePerSec: number }[]) {
  // Sum in cents and round once so floating-point drift does not leak into the total.
  const cents = Math.round(jobs.reduce((s, j) => s + j.seconds * j.pricePerSec * 100, 0));
  return `$${(cents / 100).toFixed(2)}`;
}

// 200 Pixverse v6 clips, 5s each, at the $0.03 per second tier quoted above
estimate(Array.from({ length: 200 }, () => ({ seconds: 5, pricePerSec: 0.03 })));
// -> "$30.00"

Log the estimate before the queue starts. If it says $500 and you thought $5, you catch it before you burn it.
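
Logging helps only if someone reads the log. A hard cap that refuses to start the run is a cheap extra guard. This sketch assumes you keep the estimate as a number of dollars; `assertUnderBudget` is a hypothetical name.

```typescript
// Throws before any job is queued if the estimate is above the cap
// (or is not a finite number, which usually means a pricing bug).
function assertUnderBudget(estimatedUsd: number, capUsd: number): void {
  if (!Number.isFinite(estimatedUsd) || estimatedUsd > capUsd) {
    throw new Error(
      `Estimated cost $${estimatedUsd.toFixed(2)} exceeds cap $${capUsd.toFixed(2)}; refusing to start`,
    );
  }
}

// Passes silently: a $5 run under a $50 cap.
assertUnderBudget(5, 50);
```

Call it right before the first `queue.add`, so a mistyped price or duration stops the run while it is still free.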

The whole pattern

One bounded queue, one backoff wrapper, submit POSTs when jobs are long, and a log of the in-flight count. That is what turns a Promise.all disaster into a predictable twelve-minute run.
