
Exponential Backoff for Transient fal API Errors

A tiny retry wrapper that handles the most common transient errors without burning a single extra generation.


Retry, but not on everything

The most avoidable bug is retrying a 400 a dozen times. A 400 means bad input, so every retry fails the same way. Retrying it costs time and, if you hit a partial generation path, money.

Retry only on the transient class.

Retryable status codes grid with pink retry stamps
  • 408 request timeout
  • 429 too many requests (concurrency or rate limit)
  • 500, 502, 503, 504 server errors
  • Network-layer failures: DNS lookup errors, TCP resets, TLS handshake failures, connections closed mid-response

Everything else is a bug to fix, not a flake to retry. Do not retry 401, 403, 404 or 422.

The algorithm in one paragraph

Start with a short base delay (500 ms). Double after each failure. Add random jitter so callers do not wake up together. Cap the delay. Cap the total attempts.

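Formula: sleep = min(base * 2^attempt, cap) * random(0.5, 1.5), where attempt is 0 for the first retry. The jitter is applied after the cap, so a single sleep can reach just under 1.5 * cap.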
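Here is the formula as a standalone function, as a minimal sketch. backoffDelayMs and its injectable rand parameter are hypothetical names for illustration, not part of any fal SDK. Pinning rand to 0.5 makes the jitter factor exactly 1, which prints the raw schedule.

```typescript
// Hypothetical helper. Sketch of: sleep = min(base * 2^attempt, cap) * random(0.5, 1.5),
// where `attempt` is 0 for the first retry.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 15_000,
  rand: () => number = Math.random, // must return a value in [0, 1)
): number {
  const raw = Math.min(baseMs * 2 ** attempt, capMs); // exponential growth, then cap
  return raw * (0.5 + rand()); // jitter factor in [0.5, 1.5)
}

// Pin the jitter to its midpoint (factor 1.0) to print the raw schedule for 8 retries.
const schedule = Array.from({ length: 8 }, (_, i) => backoffDelayMs(i, 500, 15_000, () => 0.5));
console.log(schedule.join(", ")); // 500, 1000, 2000, 4000, 8000, 15000, 15000, 15000
```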

Jitter curve with scattered retry bars

Without jitter, every caller that failed at the same moment retries at the same moment. With it, they spread.
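To see the difference, you can simulate a burst of callers that all failed at the same instant and count how many land in the same 50 ms window on their first retry. This is a toy model under assumptions of our own: every caller uses a 500 ms base, only the first retry is simulated, and the window size is arbitrary. The helper names are hypothetical.

```typescript
// Toy model (hypothetical helper): `callers` clients all failed at t = 0 and schedule
// their first retry. Without jitter each waits exactly baseMs; with jitter, baseMs * [0.5, 1.5).
function firstRetryTimes(callers: number, baseMs: number, jitter: boolean): number[] {
  return Array.from({ length: callers }, () => (jitter ? baseMs * (0.5 + Math.random()) : baseMs));
}

// Size of the biggest burst: the most retries that land in any single window.
function peakPerWindow(times: number[], windowMs = 50): number {
  const buckets = new Map<number, number>();
  for (const t of times) {
    const bucket = Math.floor(t / windowMs);
    buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
  }
  return Math.max(...Array.from(buckets.values()));
}

const callers = 1_000;
console.log("no jitter:", peakPerWindow(firstRetryTimes(callers, 500, false))); // 1000, all in one window
console.log("jitter:", peakPerWindow(firstRetryTimes(callers, 500, true))); // random, typically a little over 100
```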

TypeScript wrapper

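```typescript
type Fn<T> = () => Promise<T>;

interface BackoffOptions {
  maxAttempts?: number; // total tries, including the first one
  baseMs?: number; // delay before the first retry, before jitter
  capMs?: number; // upper bound on the un-jittered delay
  isRetryable?: (err: unknown) => boolean;
}

// Errno-style codes for transient network failures: resets, refusals, DNS, closed sockets.
const TRANSIENT_CODES = new Set([
  "ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EPIPE",
  "EAI_AGAIN", "ENOTFOUND",
  "UND_ERR_SOCKET", "UND_ERR_CONNECT_TIMEOUT",
]);

const defaultRetryable = (err: unknown): boolean => {
  const e = err as { status?: number; code?: string; cause?: { code?: string } };
  // Node's built-in fetch usually wraps socket errors in a TypeError and puts the code on `cause`.
  const code = e?.code ?? e?.cause?.code;
  if (code && TRANSIENT_CODES.has(code)) return true;
  if (e?.status === 408 || e?.status === 429) return true;
  if (typeof e?.status === "number" && e.status >= 500 && e.status < 600) return true;
  return false;
};

export async function withBackoff<T>(
  fn: Fn<T>,
  opts: BackoffOptions = {},
): Promise<T> {
  const { maxAttempts = 5, baseMs = 500, capMs = 15_000, isRetryable = defaultRetryable } = opts;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      attempt += 1;
      // Give up on the last attempt, or on errors that will not fix themselves.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // attempt - 1 is the zero-based retry index, so the first retry waits ~baseMs.
      const raw = Math.min(baseMs * 2 ** (attempt - 1), capMs);
      // Multiplicative jitter: sleep between 0.5x and 1.5x of raw.
      await new Promise((resolve) => setTimeout(resolve, raw * (0.5 + Math.random())));
    }
  }
}
```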

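Wrap your fal calls with it. One caveat: if the connection drops after the job was accepted, retrying subscribe submits a second job; the running-job section below covers that case.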

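```typescript
import { fal } from "@fal-ai/client";
import { withBackoff } from "./backoff"; // hypothetical path: wherever you keep the wrapper above

const result = await withBackoff(() =>
  fal.subscribe("fal-ai/pixverse/v6/text-to-video", {
    input: { prompt: "A fox trotting through snow, wide shot", duration: 5 },
  }),
);
```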

Pixverse v6 is a good endpoint to test this on: pricing starts at $0.03/sec (360p, no audio) and scales to $0.12/sec (1080p with audio), so a 5-second draft at 360p without audio costs $0.15.
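The arithmetic behind that number, plus the worst case the rest of this post is about. This is a sketch that assumes pure per-second billing at the quoted rates and that every blind resubmit is billed as a full new generation; the function and key names are hypothetical.

```typescript
// Quoted Pixverse v6 rates, in cents per second so the arithmetic stays exact.
// The key names are illustrative labels, not API parameters.
const CENTS_PER_SEC = { draft360pNoAudio: 3, hd1080pWithAudio: 12 } as const;

// Assumes pure per-second billing: rate * clip length.
function clipCostCents(centsPerSec: number, seconds: number): number {
  return centsPerSec * seconds;
}

// Assumed worst case: every attempt submits (and pays for) a fresh generation instead of resuming one.
function worstCaseCents(centsPerSec: number, seconds: number, attempts: number): number {
  return clipCostCents(centsPerSec, seconds) * attempts;
}

console.log(clipCostCents(CENTS_PER_SEC.draft360pNoAudio, 5)); // 15, i.e. $0.15
console.log(clipCostCents(CENTS_PER_SEC.hd1080pWithAudio, 5)); // 60, i.e. $0.60
console.log(worstCaseCents(CENTS_PER_SEC.hd1080pWithAudio, 5, 5)); // 300, i.e. $3.00
```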

429 specifically

A 429 usually means you hit the concurrency limit. Retrying works eventually, but the better fix is to stop firing every job at once. Put a p-queue in front of your calls, and move long jobs to fal.queue.submit with a webhook so workers are not held open on subscribe.
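p-queue handles this for you. To show what it does, here is a minimal hand-rolled limiter; createLimiter is a hypothetical name, and this is a sketch of the idea, not p-queue's API.

```typescript
// Hypothetical helper: minimal FIFO concurrency limiter. At most `limit` tasks run at once,
// the rest wait their turn. A stand-in for p-queue's `concurrency` option, not its API.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Start the next queued task if a slot is free.
  const next = (): void => {
    if (active >= limit) return;
    const start = waiting.shift();
    if (start) start();
  };

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      waiting.push(() => {
        active += 1; // claim the slot synchronously so next() sees it
        Promise.resolve()
          .then(task) // also turns a synchronous throw into a rejection
          .then(resolve, reject)
          .finally(() => {
            active -= 1;
            next(); // hand the freed slot to whoever is waiting
          });
      });
      next();
    });
  };
}
```

Put run(...) outside withBackoff(...), so a job that is backing off keeps its slot and its retries cannot push you back over the limit.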

Do not retry a running job

The most expensive mistake is retrying a job that is already running because the HTTP connection dropped. The job keeps running on fal's side; resubmitting starts a second one, and you pay for both.

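```typescript
import { fal } from "@fal-ai/client";

const { request_id } = await fal.queue.submit("fal-ai/wan/v2.7/text-to-video", {
  input: { prompt: "...", duration: 5 },
});

// Save request_id BEFORE polling, so a dropped connection can resume instead of resubmitting.
// `db` stands in for your own persistence layer.
await db.insert("generations", { request_id, status: "IN_QUEUE" });
```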

If subscribe fails with a network error mid-job, look up the saved request_id and resume with fal.queue.status and fal.queue.result. Do not resubmit.
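A minimal sketch of the resume path. It assumes fal.queue.status resolves to an object with a status field (IN_QUEUE, IN_PROGRESS or COMPLETED) and fal.queue.result to one with a data field; the QueueLike interface encodes that assumption so the loop can be tested with a fake. resumeJob, pollMs and maxPolls are hypothetical.

```typescript
// Assumed structural type for the two read-only queue calls used here. It is meant to
// match the shape of fal.queue.status / fal.queue.result, but treat it as an assumption.
interface QueueLike {
  status(endpointId: string, opts: { requestId: string }): Promise<{ status: string }>;
  result(endpointId: string, opts: { requestId: string }): Promise<{ data: unknown }>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Hypothetical helper: wait on an already-submitted job and fetch its output.
// It never calls submit, so a dropped connection cannot start a second generation.
async function resumeJob(
  queue: QueueLike,
  endpointId: string,
  requestId: string,
  pollMs = 2_000,
  maxPolls = 300,
): Promise<unknown> {
  for (let i = 0; i < maxPolls; i++) {
    const { status } = await queue.status(endpointId, { requestId });
    if (status === "COMPLETED") {
      const { data } = await queue.result(endpointId, { requestId });
      return data;
    }
    await sleep(pollMs); // IN_QUEUE or IN_PROGRESS: the job is alive, keep waiting
  }
  throw new Error(`request ${requestId} not completed after ${maxPolls} polls`);
}
```

With the real client you would pass fal.queue (or a thin adapter around it) plus the request_id you saved at submit time. Because status and result only read state, they are also safe to wrap in withBackoff; submit is the one call you never retry blindly.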

Python version

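```python
import random
import time

import httpx


def with_backoff(fn, *, max_attempts=5, base=0.5, cap=15.0):
    """Call fn(), retrying transient failures with capped, jittered exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except httpx.HTTPStatusError as e:
            s = e.response.status_code
            # Only 408, 429 and 5xx are transient; re-raise anything else immediately.
            if not (s in (408, 429) or 500 <= s < 600) or attempt + 1 >= max_attempts:
                raise
        except (httpx.TimeoutException, httpx.NetworkError, httpx.RemoteProtocolError):
            # Timeouts, DNS/TCP/TLS failures and connections closed mid-response.
            if attempt + 1 >= max_attempts:
                raise
        # Reached only for a retryable error with attempts left. A bare `raise` must stay
        # inside the except blocks: outside them there is no active exception to re-raise.
        attempt += 1
        time.sleep(min(base * 2 ** (attempt - 1), cap) * (0.5 + random.random()))
```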

Caps that make sense

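Five attempts, cap 15 seconds, base 500 ms. The sleeps between those attempts total about 7.5 seconds (0.5 + 1 + 2 + 4, before jitter); anything beyond that in a roughly one-minute budget is time spent on the failed calls themselves, such as waiting out timeouts. That is long enough to ride through transient dips without leaving someone stuck watching a spinner. The 15-second cap only engages once you allow seven or more attempts. Cut to three attempts if your UX cannot tolerate a minute. Do not push past eight.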
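To check a policy before shipping it, sum the sleeps. This is a sketch that mirrors withBackoff's schedule and leaves out the time each failed call takes before it errors; sleepBudgetMs is a hypothetical helper.

```typescript
// Hypothetical helper: total time spent sleeping between attempts (calls themselves excluded).
// Same schedule as withBackoff: before retry k (k = 0, 1, ...) sleep min(base * 2^k, cap) * jitter.
function sleepBudgetMs(maxAttempts: number, baseMs = 500, capMs = 15_000) {
  let nominal = 0;
  for (let k = 0; k < maxAttempts - 1; k++) {
    nominal += Math.min(baseMs * 2 ** k, capMs);
  }
  // Jitter multiplies each sleep by a factor in [0.5, 1.5), so the total scales the same way.
  return { nominal, min: nominal * 0.5, max: nominal * 1.5 };
}

for (const attempts of [3, 5, 8]) {
  const { nominal, min, max } = sleepBudgetMs(attempts);
  console.log(`${attempts} attempts: ~${nominal / 1000}s of sleep (${min / 1000}s to ${max / 1000}s with jitter)`);
}
// 3 attempts: ~1.5s of sleep (0.75s to 2.25s with jitter)
// 5 attempts: ~7.5s of sleep (3.75s to 11.25s with jitter)
// 8 attempts: ~45.5s of sleep (22.75s to 68.25s with jitter)
```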