Troubleshooting · 3 min read

Debugging IN_PROGRESS Forever Jobs

When a fal job never completes, there are five checks to run before you cancel and resubmit. Cancel is always last.


IN_PROGRESS does not mean broken

A job sitting at IN_PROGRESS longer than you expect is not automatically stuck. Before you cancel and resubmit, run five checks. Cancel is last.

Five step status ladder with magnifier

Check one: logs at the endpoint

fal.queue.status with logs: true returns whatever the worker has printed. Ninety percent of "stuck" jobs tell you exactly what they are waiting on.

TYPESCRIPT
const status = await fal.queue.status("fal-ai/veo3.1", { requestId, logs: true });

console.log(status.status);
status.logs?.forEach((l) => console.log(l.timestamp, l.message));

Recent timestamps mean it is alive. Last log 30 seconds old at 90 percent? Wait. Last log five minutes old at 10 percent? Continue.
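That staleness heuristic can be sketched as a tiny helper. The 60 second and 80 percent thresholds below are assumptions extrapolated from the examples above, not fal guidance; tune them per endpoint.

```typescript
// Decide whether a job with recent-ish logs deserves more patience.
// Thresholds are assumptions based on the examples above, not fal guidance.
type LogVerdict = "wait" | "investigate";

function logVerdict(lastLogMs: number, nowMs: number, progressPct: number): LogVerdict {
  const staleSec = (nowMs - lastLogMs) / 1000;
  if (staleSec <= 60) return "wait"; // fresh logs: the worker is alive
  if (progressPct >= 80) return "wait"; // nearly done: give it a moment
  return "investigate"; // stale and early: continue down the checklist
}
```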

Check two: the endpoint status

A single slow endpoint during a traffic burst is common for Veo 3.1 4K, Kling v3 Pro, or Wan 2.7 at 15 seconds. Queue times of 45 seconds to a minute on those endpoints during peak are not a bug.

If the community is quiet and the endpoint looks healthy, the issue is your payload. Continue.

Check three: the input payload

Two classes of silent-ish failure.

  • Oversized inputs: image-to-video with a 12 MB reference, or audio with a 20 MB WAV. The worker is downloading your asset. Slow storage = long IN_PROGRESS.
  • Ambiguous prompts: extremely long prompts or content that triggers Veo 3.1's auto_fix=true rewrites. Result finishes, but slower.

Look for log messages like "downloading image_url" or "prompt expanded".
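Those markers can be grepped out of the status logs mechanically. A sketch; the marker strings are just the two examples above, not an exhaustive list:

```typescript
// Pull payload-related hints out of fal status logs.
// Marker strings are the two examples from this post; extend as you find more.
function payloadHints(logs: { message: string }[]): string[] {
  const markers = ["downloading image_url", "prompt expanded"];
  return logs.map((l) => l.message).filter((m) => markers.some((k) => m.includes(k)));
}
```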

Check four: region and account concurrency

If you just fired 30 jobs and this is the 31st, you are queued behind a hidden worker limit. Status says IN_QUEUE or IN_PROGRESS but nothing is actively computing.

Count your own in-flight jobs:

SQL
SELECT count(*) FROM generations WHERE status IN ('IN_QUEUE','IN_PROGRESS');

Near your ceiling? That is your answer. Otherwise, submit a tiny Pixverse v6 five second draft (around $0.15 at 360p, no audio). If that flies through, your infra is fine and the original is just waiting its turn.
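The decision in this check reduces to comparing your in-flight count against your plan's ceiling. A sketch, where the ceiling value and the 80 percent canary cutoff are both assumptions:

```typescript
// Classify check four: are you queued behind your own concurrency limit?
// The ceiling comes from your fal plan; the 0.8 canary cutoff is an assumption.
type ConcurrencyVerdict = "queued-behind-self" | "run-canary" | "not-concurrency";

function concurrencyVerdict(inFlight: number, ceiling: number): ConcurrencyVerdict {
  if (inFlight >= ceiling) return "queued-behind-self";
  if (inFlight >= ceiling * 0.8) return "run-canary"; // a cheap draft job settles it
  return "not-concurrency";
}
```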

Check five: the webhook

If you submitted with webhookUrl and your DB still says IN_QUEUE, the job may have completed and you missed the notification. Two ways that breaks.

  • Handler returned a non-2xx. fal retries, then gives up: the job is COMPLETED upstream, but your DB never gets the update.
  • Webhook URL changed during a deploy. Delivery fires at a 404.

Poll directly.

TYPESCRIPT
const final = await fal.queue.result("fal-ai/veo3.1", { requestId });

If this succeeds, the job is done. Fix the webhook, update your DB with the result, move on. No re-generation.
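A reconciliation pass over locally stuck rows can backfill results the webhook dropped. This is a sketch with the queue client and DB injected as minimal interfaces, modeled on the snippets in this post rather than the real SDK types:

```typescript
// Backfill COMPLETED jobs whose webhook delivery was lost.
// QueueClient/Db are hypothetical minimal interfaces, not the real SDK types.
interface QueueClient {
  result(endpoint: string, opts: { requestId: string }): Promise<unknown>;
}
interface Db {
  query(sql: string, params: unknown[]): Promise<void>;
}

async function reconcileStuckRows(
  queue: QueueClient,
  db: Db,
  rows: { endpoint: string; requestId: string }[],
): Promise<string[]> {
  const fixed: string[] = [];
  for (const row of rows) {
    try {
      // If result() resolves, the job finished; only the notification was lost.
      await queue.result(row.endpoint, { requestId: row.requestId });
      await db.query(
        "UPDATE generations SET status='COMPLETED' WHERE request_id=$1",
        [row.requestId],
      );
      fixed.push(row.requestId);
    } catch {
      // result() throwing is treated here as "genuinely still running".
    }
  }
  return fixed;
}
```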

Now, and only now, cancel

Cancel as last resort, new request id stamped

If the logs are stale, the endpoint is healthy, the payload is sane, concurrency is not the culprit, and no webhook result is pending, then cancel.

TYPESCRIPT
await fal.queue.cancel("fal-ai/veo3.1", { requestId });
await db.query("UPDATE generations SET status='CANCELLED' WHERE request_id=$1", [requestId]);

Resubmit with a different strategy:

  • Veo 3.1 at 4K: draft first on Veo 3.1 Lite at $0.05/sec.
  • Wan 2.7 at 15s: split into two 8 second clips.
  • Image-to-video with a large asset: upload to fal.storage first.
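When you automate resubmission, the three strategies above can be folded into one helper. A sketch; the 10 MB cutoff and the endpoint-name substring matching are assumptions for illustration:

```typescript
// Pick a resubmit strategy from the failed job's shape.
// The 10 MB cutoff and substring matching are assumptions for illustration.
function resubmitStrategy(job: {
  endpoint: string;
  durationSec?: number;
  assetBytes?: number;
}): string {
  if ((job.assetBytes ?? 0) > 10 * 1024 * 1024) {
    return "upload the asset to fal.storage first, resubmit with the hosted URL";
  }
  if (job.endpoint.includes("veo3.1")) return "draft first on Veo 3.1 Lite";
  if (job.endpoint.includes("wan") && (job.durationSec ?? 0) >= 15) {
    return "split into two 8 second clips";
  }
  return "resubmit as-is";
}
```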

A debug helper you will reuse

TYPESCRIPT
async function debugJob(endpoint: string, requestId: string) {
  const s = await fal.queue.status(endpoint, { requestId, logs: true });
  return {
    status: s.status,
    lastLogTime: s.logs?.at(-1)?.timestamp,
    lastLogMsg: s.logs?.at(-1)?.message,
    inFlight: await db.oneOrNone(
      "SELECT count(*)::int AS c FROM generations WHERE status IN ('IN_QUEUE','IN_PROGRESS')",
    ),
  };
}

Four data points. Most of the time, one obvious cause.