Model Guide · 4 min read · Feb 19, 2026

LTX 2.3: Native 4K Output and Audio From One Forward Pass

LTX 2.3 generates up to 2160p (4K) video with synchronized audio in a single pass at $0.08/sec, one-fifth the per-second price of Veo 3.1's 4K tier.

What LTX 2.3 is

LTX 2.3 is Lightricks' open-source Diffusion Transformer for video, running on fal.ai as fal-ai/ltx-2.3/text-to-video and fal-ai/ltx-2.3/image-to-video. It's one of two models on the platform that produce native 4K (the other is Veo 3.1), and it generates audio and video together in the same forward pass rather than stitching them together post hoc.

What it does well

The resolution ceiling is 2160p (4K), with 1440p and 1080p as lower tiers. Audio is on by default and is generated alongside the video in the same pass: ambient sound, environmental audio, atmospheric elements. Frame rate is configurable at 24, 25, 48, or 50 fps, which matters when you're matching cinematic convention (24), European broadcast (25), or high-motion footage (48/50).
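To make the frame-rate choice concrete, here is a small sketch that maps a delivery convention to one of the four accepted `fps` values and computes the nominal frame count for a clip. The convention labels, the `pickFps` and `nominalFrameCount` helpers, and the pairing of 48 with 24 and 50 with 25 (each is double its base rate) are our own assumptions for illustration. The post does not state how many frames the endpoint actually returns, so treat the count as nominal.

```typescript
// Frame rates accepted by LTX 2.3's `fps` parameter.
type LtxFps = 24 | 25 | 48 | 50;

// Hypothetical labels for the two base conventions named in this post.
type BaseConvention = "cinema" | "european-broadcast";

// Assumption: high-motion output doubles the base rate (24 -> 48, 25 -> 50).
function pickFps(base: BaseConvention, highMotion: boolean): LtxFps {
  if (base === "cinema") {
    return highMotion ? 48 : 24;
  }
  return highMotion ? 50 : 25;
}

// Nominal frame count: clip length in seconds times frames per second.
// Assumption: the output is exactly duration x fps frames long.
function nominalFrameCount(durationSec: 6 | 8 | 10, fps: LtxFps): number {
  return durationSec * fps;
}

console.log(pickFps("cinema", false)); // 24
console.log(nominalFrameCount(10, 50)); // 500
console.log(nominalFrameCount(8, 24)); // 192
```

Doubling the frame rate doubles the frames the model has to produce for the same duration, which is worth keeping in mind when you reach for 48 or 50 fps.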

Image-to-video supports end_image_url for start-to-end interpolation. That's the cleanest path to a controlled transition between two known compositions.

What it can't do

Duration is a fixed enum: 6, 8, or 10 seconds. Nothing shorter, nothing longer. Aspect ratios are limited to 16:9 and 9:16 on text-to-video (image-to-video also accepts auto); there is no 1:1 or ultrawide option. There is no negative prompt field and no seed control in the OpenAPI schema. The missing seed hurts most for reproducibility testing.
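Because these limits are enums, it can pay to check a request before spending a call on it. The sketch below is a hypothetical pre-flight helper (`checkLtxInput` is our name, not part of `@fal-ai/client`) that encodes the constraints described in this post. It is not generated from the live OpenAPI schema, so re-check the values against the endpoint's schema if they change.

```typescript
type Mode = "text-to-video" | "image-to-video";

// Limits as described in this post; not fetched from the live schema.
const LIMITS = {
  duration: [6, 8, 10],
  resolution: ["1080p", "1440p", "2160p"],
  fps: [24, 25, 48, 50],
} as const;

// `auto` is only accepted on image-to-video.
function allowedAspectRatios(mode: Mode): readonly string[] {
  return mode === "image-to-video" ? ["16:9", "9:16", "auto"] : ["16:9", "9:16"];
}

const oneOf = (value: unknown, allowed: readonly unknown[]): boolean =>
  allowed.includes(value);

// Hypothetical pre-flight check: returns the problems found, or [] if the input fits.
// Fields left undefined are skipped, since the API applies its own defaults.
function checkLtxInput(mode: Mode, input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const prompt = input.prompt;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    problems.push("prompt is required");
  }
  if (mode === "image-to-video" && typeof input.image_url !== "string") {
    problems.push("image_url is required for image-to-video");
  }
  for (const key of ["duration", "resolution", "fps"] as const) {
    if (input[key] !== undefined && !oneOf(input[key], LIMITS[key])) {
      problems.push(`${key}=${String(input[key])} is outside ${LIMITS[key].join(", ")}`);
    }
  }
  if (input.aspect_ratio !== undefined && !oneOf(input.aspect_ratio, allowedAspectRatios(mode))) {
    problems.push(`aspect_ratio=${String(input.aspect_ratio)} is not supported for ${mode}`);
  }
  // Neither field exists in the LTX 2.3 schema, so flag them rather than send them.
  for (const key of ["seed", "negative_prompt"]) {
    if (key in input) {
      problems.push(`${key} is not in the LTX 2.3 schema`);
    }
  }
  return problems;
}

console.log(checkLtxInput("text-to-video", { prompt: "glacier at dawn", duration: 12 }));
// [ 'duration=12 is outside 6, 8, 10' ]
```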

Parameters that matter

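| Parameter | Type | Default | Options |
|---|---|---|---|
| `prompt` | string | required | - |
| `duration` | integer (seconds) | `6` | `6`, `8`, `10` |
| `resolution` | string | `1080p` | `1080p`, `1440p`, `2160p` |
| `aspect_ratio` | string | `16:9` (T2V), `auto` (I2V) | `16:9`, `9:16`; `auto` on I2V only |
| `fps` | integer | `25` | `24`, `25`, `48`, `50` |
| `generate_audio` | boolean | `true` | - |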

On image-to-video, image_url is required and end_image_url is optional for transition videos.
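For a start-to-end transition, the request only differs from a single-image request by the optional `end_image_url`. A minimal sketch of a typed payload builder follows. The `ImageToVideoInput` interface and `buildTransition` helper are hypothetical; the field names and allowed values come from the table above. You would pass the returned object as `input` to `fal.subscribe("fal-ai/ltx-2.3/image-to-video", { input })`.

```typescript
// Hypothetical helper type; field names and values follow the parameter table in this post.
interface ImageToVideoInput {
  prompt: string;
  image_url: string;
  end_image_url?: string;
  duration?: 6 | 8 | 10;
  resolution?: "1080p" | "1440p" | "2160p";
  aspect_ratio?: "16:9" | "9:16" | "auto";
  fps?: 24 | 25 | 48 | 50;
  generate_audio?: boolean;
}

// Builds an image-to-video request. end_image_url is attached only when an end frame
// is supplied, so the same helper covers both plain and start-to-end requests.
function buildTransition(prompt: string, startUrl: string, endUrl?: string): ImageToVideoInput {
  const input: ImageToVideoInput = { prompt, image_url: startUrl };
  if (endUrl !== undefined) {
    input.end_image_url = endUrl;
  }
  return input;
}

const transition = buildTransition(
  "Slow dolly from the empty studio to the finished set",
  "https://example.com/start.png", // placeholder URLs
  "https://example.com/end.png",
);
console.log(transition);
```

Omitting the key entirely, rather than sending `end_image_url: undefined`, keeps the serialized request identical to a plain single-image call.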

Pricing

$0.08 per second, so a 10-second clip costs $0.80 at any resolution. Veo 3.1 charges $0.40/sec for 4K, so the same 10-second 4K clip costs $4.00 there: LTX 2.3 comes in at one-fifth the price. If your workflow needs 4K output and you don't need Veo's native dialogue lip-sync, LTX 2.3 is the cost-efficient choice.

Resolution doesn't affect price - 1080p, 1440p, and 2160p all cost the same per second.
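The pricing arithmetic is simple enough to keep in code next to a job queue. This sketch uses only the two per-second rates quoted in this post, stored in integer cents to avoid floating-point drift. The constant and function names are ours, and the rates should be re-checked against current fal.ai pricing before you rely on them.

```typescript
// Per-second rates quoted in this post, in US cents.
const LTX_23_CENTS_PER_SEC = 8; // $0.08/sec at 1080p, 1440p, or 2160p
const VEO_31_4K_CENTS_PER_SEC = 40; // $0.40/sec, Veo 3.1 4K tier

type LtxDuration = 6 | 8 | 10;

// Resolution is deliberately not a parameter: every tier bills the same per second.
function ltxCostCents(duration: LtxDuration): number {
  return duration * LTX_23_CENTS_PER_SEC;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

for (const d of [6, 8, 10] as const) {
  const ltx = ltxCostCents(d);
  const veo = d * VEO_31_4K_CENTS_PER_SEC;
  console.log(`${d}s: LTX 2.3 ${formatUsd(ltx)} vs Veo 3.1 4K ${formatUsd(veo)} (${veo / ltx}x)`);
}
// 6s: LTX 2.3 $0.48 vs Veo 3.1 4K $2.40 (5x)
// 8s: LTX 2.3 $0.64 vs Veo 3.1 4K $3.20 (5x)
// 10s: LTX 2.3 $0.80 vs Veo 3.1 4K $4.00 (5x)
```

Because both models bill linearly per second, the 5x ratio holds at every allowed duration.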

Working example

```typescript
import { fal } from "@fal-ai/client";

// The client reads credentials from the FAL_KEY environment variable by default.
const result = await fal.subscribe("fal-ai/ltx-2.3/text-to-video", {
  input: {
    prompt: "Aerial shot over a glacier at dawn, ice fissures catching first light, slow forward push, atmospheric haze",
    resolution: "2160p", // 4K, billed at the same per-second rate as 1080p
    duration: 8, // must be 6, 8, or 10
    aspect_ratio: "16:9",
    fps: 24, // cinematic convention
    generate_audio: true,
  },
});

console.log(result.data.video.url);
```

When to pick LTX 2.3

Pick it when you need 4K output and don't need Veo's native dialogue: it's the most price-efficient way to get broadcast-quality resolution on the platform. Pick it when you want a configurable frame rate (24 for cinema, 48/50 for high-motion sports or action). Skip it if you need a duration other than 6, 8, or 10 seconds, or if you rely on negative prompts and seed reproducibility. For a 4K ambient landscape with matching audio, this is the tool.
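The guidance above boils down to a handful of disqualifying conditions. Here is a sketch that encodes them as a checklist. The `ClipRequirements` shape and `ltx23Fit` function are hypothetical and not part of any fal.ai API; they simply restate the skip conditions from this post so a pipeline can route jobs automatically.

```typescript
// Hypothetical requirement flags for one clip; not part of any fal.ai API.
interface ClipRequirements {
  durationSec: number;
  needsDialogueLipSync: boolean;
  needsSeedReproducibility: boolean;
  needsNegativePrompt: boolean;
}

const LTX_23_DURATIONS: readonly number[] = [6, 8, 10];

// Encodes the skip conditions from this post: any listed reason means LTX 2.3 is a poor fit.
function ltx23Fit(req: ClipRequirements): { pick: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (!LTX_23_DURATIONS.includes(req.durationSec)) {
    reasons.push("duration must be exactly 6, 8, or 10 seconds");
  }
  if (req.needsDialogueLipSync) {
    reasons.push("native dialogue lip-sync points to Veo 3.1");
  }
  if (req.needsSeedReproducibility) {
    reasons.push("no seed control in the schema");
  }
  if (req.needsNegativePrompt) {
    reasons.push("no negative prompt field");
  }
  return { pick: reasons.length === 0, reasons };
}

// A 10-second 4K ambient landscape with no dialogue.
console.log(
  ltx23Fit({
    durationSec: 10,
    needsDialogueLipSync: false,
    needsSeedReproducibility: false,
    needsNegativePrompt: false,
  }),
);
```

Returning the reasons, not just a boolean, makes it easy to log why a job was routed to a different model.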
