12.7.1 - Flow Matching: Straight Paths to Generation#

Duration: 40-50 minutes (core) + 4-8 hours training (Exercise 3)

Level: Advanced

Overview#

What if there is a more direct path from noise to images than diffusion’s wandering trajectory?

Flow Matching answers this question with an elegant insight: instead of learning to reverse a noisy corruption process like DDPM, we can learn to follow straight paths from noise to data [Lipman2023]. This simple change has profound implications: Flow Matching typically requires only 10-50 integration steps for high-quality generation, compared to 250-1000 steps for diffusion models.

In this exercise, we apply Flow Matching to generate African fabric patterns using the same dataset from Modules 12.1.2 (DCGAN), 12.1.3 (StyleGAN), and 12.3.1 (DDPM). This enables direct comparison of four different generative approaches on identical data.

Training dataset samples

Training Data (African Fabric Patterns)#

Flow Matching generated fabric patterns

Generated Output (50 ODE integration steps)#

Learning Objectives#

By the end of this exercise, you will:

  1. Understand the Flow Matching paradigm: How learning velocity fields creates more efficient generative models

  2. Implement the Conditional Flow Matching objective: The simple training loss that makes this approach tractable

  3. Compare with DDPM: Understand why straight paths require fewer integration steps

  4. Train a Flow Matching model from scratch: Complete pipeline on the African fabric dataset

Connection to DDPM Basics (12.3.1)#

If you completed Module 12.3.1 (DDPM Basics), you have already learned:

  • How diffusion models corrupt images with noise (forward process)

  • How neural networks learn to reverse this corruption (reverse process)

  • The U-Net architecture for noise prediction

  • DDIM for accelerated sampling

Flow Matching builds on these foundations but takes a fundamentally different approach:

| Aspect | DDPM (Module 12.3.1) | Flow Matching (This Module) |
|---|---|---|
| Training Target | Predict noise \(\epsilon\) | Predict velocity \(v\) |
| Path Type | Curved, stochastic (SDE) | Straight, deterministic (ODE) |
| Sampling Steps | 50-1000 steps typical | 10-50 steps typical |
| Forward Process | Noise schedule \(\beta_t\) | Linear interpolation |
| Math Framework | Score matching | Optimal transport |

Quick Start#

Before diving into theory, let us see what Flow Matching can generate.

Note

Model Available

The trained model is available at models/flow_matching_fabrics.pt.

  • Training time: ~1.7 hours on RTX 5070Ti GPU

  • Final loss: 0.1979

  • Model size: 19 MB

If you want to train your own model, see Exercise 3.

Once you have a trained model, generate fabric patterns:

import torch
from flow_model import SimpleFlowUNet

# Load model
model = SimpleFlowUNet(in_channels=3, base_channels=64)
checkpoint = torch.load('models/flow_matching_fabrics.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Generate from noise via ODE integration
x = torch.randn(4, 3, 64, 64)  # Start from noise
dt = 1.0 / 50                   # 50 integration steps

with torch.no_grad():
    for i in range(50):
        t = torch.full((4,), i / 50)
        v = model(x, t)          # Get velocity
        x = x + dt * v           # Euler step

x = x.clamp(-1, 1)  # x now contains generated fabric patterns in the training range [-1, 1]
Flow Matching generated African fabric patterns

16 African fabric patterns generated by Flow Matching in just 50 ODE integration steps. Compare with DDPM output from Module 12.3.1 which requires 250+ steps.#

Core Concepts#

Concept 1: From Diffusion to Flow#

In DDPM, we learned that generative modeling can be framed as reversing a corruption process. We add noise gradually over 1000 steps, then train a network to predict that noise and subtract it.

But consider this: the noise-adding process follows a curved, stochastic trajectory through high-dimensional space. Reversing this requires careful step-by-step denoising.

Flow Matching asks a different question: What if we could take the direct route?

Comparison of diffusion and flow matching trajectories

Left: DDPM follows a curved, stochastic trajectory requiring ~1000 denoising steps. Right: Flow Matching follows a straight, deterministic path requiring only ~50 ODE integration steps. The linear interpolation formula \(x_t = (1-t)x_0 + tx_1\) defines the optimal transport path.#

The key insight is elegant:

  • Define a straight line from noise \(x_0 \sim \mathcal{N}(0, I)\) to data \(x_1\)

  • Train a network to predict the velocity along this line

  • At inference, follow the velocity field from noise to data
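A quick numeric check makes the constant-velocity claim concrete: along the straight path \(x_t = (1-t)x_0 + tx_1\), the finite-difference velocity equals \(x_1 - x_0\) at every \(t\). Toy tensors only; no trained model is needed.

```python
import torch

torch.manual_seed(0)
x0 = torch.randn(3, 64, 64)  # noise endpoint
x1 = torch.rand(3, 64, 64)   # stand-in for a data sample

def interp(t: float) -> torch.Tensor:
    """Straight-line path x_t = (1 - t) * x0 + t * x1."""
    return (1 - t) * x0 + t * x1

dt = 1e-3
deviations = []
for t in (0.1, 0.5, 0.9):
    v = (interp(t + dt) - interp(t)) / dt  # finite-difference velocity
    deviations.append((v - (x1 - x0)).abs().max().item())

max_deviation = max(deviations)
print(max_deviation)  # ~0: the velocity is x1 - x0 everywhere on the path
```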

This approach was introduced independently by several groups in 2022-2023 [Lipman2023] [Liu2023] and has become the foundation for state-of-the-art systems like FLUX.1 [Esser2024].

Did You Know?

Flow Matching is closely related to optimal transport theory, a mathematical framework dating back to Gaspard Monge in 1781 [Villani2009]. Straight paths correspond to the optimal way to transport mass between distributions!

Concept 2: The Flow Matching Objective#

The training objective for Flow Matching is remarkably simple. Given a data sample \(x_1\):

Step 1: Sample Noise

\[x_0 \sim \mathcal{N}(0, I)\]

Step 2: Sample Time

\[t \sim \text{Uniform}(0, 1)\]

Step 3: Compute Interpolation (Straight Path)

\[x_t = (1 - t) \cdot x_0 + t \cdot x_1\]

This defines a straight line from noise (\(t=0\)) to data (\(t=1\)).

Step 4: Compute Target Velocity

The velocity along a straight line from \(x_0\) to \(x_1\) is simply:

\[v_{\text{target}} = x_1 - x_0\]

This is the direction from noise to data!

Step 5: Train Network to Predict Velocity

\[\mathcal{L} = \mathbb{E}_{x_0, x_1, t} \left[ \| v_\theta(x_t, t) - v_{\text{target}} \|^2 \right]\]

The complete training loop:

import torch
import torch.nn.functional as F

def flow_matching_loss(model, x_1):
    batch_size = x_1.shape[0]

    # Sample noise
    x_0 = torch.randn_like(x_1)

    # Sample time
    t = torch.rand(batch_size, 1, 1, 1)

    # Linear interpolation
    x_t = (1 - t) * x_0 + t * x_1

    # Target velocity
    v_target = x_1 - x_0

    # Predict velocity
    v_pred = model(x_t, t.squeeze())

    # MSE loss
    loss = F.mse_loss(v_pred, v_target)
    return loss

Compare this to DDPM’s training: no noise schedules \(\beta_t\), no cumulative products \(\bar{\alpha}_t\), no posterior variance calculations. Just straight lines and MSE loss!
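The loss drops into an ordinary PyTorch loop with nothing extra. In the minimal sketch below, the tiny convolutional network and the random "dataset" are stand-ins for SimpleFlowUNet and the fabric loader, included only so the sketch runs on its own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVelocityNet(nn.Module):
    """Stand-in for SimpleFlowUNet: maps (x, t) to a velocity field."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # Append the time as an extra constant channel
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, x.shape[2], x.shape[3])
        return self.net(torch.cat([x, t_map], dim=1))

def flow_matching_loss(model, x_1):
    x_0 = torch.randn_like(x_1)            # noise endpoint
    t = torch.rand(x_1.shape[0], 1, 1, 1)  # continuous time in [0, 1]
    x_t = (1 - t) * x_0 + t * x_1          # straight-line interpolation
    v_target = x_1 - x_0                   # constant target velocity
    return F.mse_loss(model(x_t, t.squeeze()), v_target)

torch.manual_seed(0)
model = TinyVelocityNet()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(20):
    # Toy batch in [-1, 1]; the real loop would draw fabric images here
    x_1 = torch.rand(8, 3, 16, 16) * 2 - 1
    loss = flow_matching_loss(model, x_1)
    opt.zero_grad()
    loss.backward()
    opt.step()

final_loss = loss.item()
print(final_loss)
```

There is no schedule to tune and no timestep table to precompute; the entire training state is the model and the optimizer.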

Concept 3: ODE Integration for Sampling#

Once trained, generating samples involves integrating the learned velocity field from \(t=0\) to \(t=1\). This is an ordinary differential equation (ODE):

\[\frac{dx}{dt} = v_\theta(x, t)\]

We can solve this using the Euler method:

\[x_{t+\Delta t} = x_t + \Delta t \cdot v_\theta(x_t, t)\]

Starting from \(x_0 \sim \mathcal{N}(0, I)\) and taking 50 steps:

import torch

@torch.no_grad()
def sample_flow(model, num_samples, num_steps=50):
    # Start from noise
    x = torch.randn(num_samples, 3, 64, 64)
    dt = 1.0 / num_steps

    # Integrate ODE with Euler steps
    for i in range(num_steps):
        t = torch.full((num_samples,), i / num_steps)  # time as a batch tensor
        v = model(x, t)
        x = x + dt * v

    return x

Why do straight paths need fewer steps?

  • DDPM’s curved trajectory requires careful following of a winding path

  • Flow Matching’s straight trajectory is easier to integrate accurately

  • Euler steps traverse a perfectly straight line exactly; ~50 steps suffice because the learned field is only approximately straight

  • The same 50 steps would poorly approximate DDPM’s complex curve

This is why Flow Matching achieves similar quality with 5-10x fewer steps!
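A one-dimensional toy calculation illustrates the point (an analogy only, not the actual DDPM dynamics): Euler integration is exact on a constant-velocity path, but accumulates visible error on a curved path with the same endpoint and the same number of steps.

```python
import math

def euler_integrate(v, steps):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1 starting at x=0."""
    x, dt = 0.0, 1.0 / steps
    for i in range(steps):
        x += dt * v(x, i * dt)
    return x

# Straight path: constant velocity, true endpoint x(1) = 1
straight_err = abs(euler_integrate(lambda x, t: 1.0, steps=10) - 1.0)

# Curved path x(t) = sin(pi * t / 2): same endpoint x(1) = 1,
# but the velocity varies along the way
curved_err = abs(
    euler_integrate(lambda x, t: (math.pi / 2) * math.cos(math.pi * t / 2),
                    steps=10) - 1.0
)

print(straight_err)  # ~0: Euler is exact on the straight path
print(curved_err)    # ~0.08: 10 coarse steps noticeably miss on the curve
```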

Concept 4: Comparison with DDPM#

Let us directly compare the two approaches on our African fabric dataset:

Training Comparison

| Aspect | DDPM | Flow Matching |
|---|---|---|
| Loss function | \(\|\epsilon - \epsilon_\theta(x_t, t)\|^2\) | \(\|v - v_\theta(x_t, t)\|^2\) |
| Target computation | Sample noise \(\epsilon\), add to image | Compute direction \(x_1 - x_0\) |
| Time sampling | Discrete steps 1…1000 | Continuous \([0, 1]\) |
| Noise schedule | Required (linear, cosine) | Not needed |

Sampling Comparison

| Steps | DDPM Quality | Flow Matching Quality |
|---|---|---|
| 10 | Very poor | Usable |
| 20 | Poor | Good |
| 50 | Acceptable | Excellent |
| 250 | Good (DDIM) | Excellent (unnecessary) |

When to Use Each

  • Use Flow Matching when: Speed is important, you need fast iteration, deploying to resource-constrained environments

  • Use DDPM when: You have existing diffusion infrastructure, need classifier-free guidance (though FM supports this too)

Both approaches can use the same U-Net architecture, differing only in training objective and sampling procedure.

Hands-On Exercises#

Exercise 1: Generate Fabric Patterns (Execute)#

Download exercise1_generate.py

Goal: Run the pre-trained Flow Matching model to generate African fabric patterns.

Prerequisites: Trained model at models/flow_matching_fabrics.pt

python exercise1_generate.py

What to observe:

  1. Generation uses only 50 steps (compare to DDPM’s 250)

  2. Output shows 16 unique fabric patterns

  3. Compare quality and diversity with your DDPM outputs from Module 12.3.1

16 Flow Matching generated African fabric patterns

16 unique African fabric patterns generated by Flow Matching with 50 ODE integration steps.#

Reflection Questions
  1. How does the generation speed compare to DDPM (Exercise 1 in Module 12.3.1)?

    • DDPM: ~250 steps needed

    • Flow Matching: ~50 steps needed

    • What is the speedup factor?

  2. Compare the visual quality:

    • Pattern coherence

    • Color saturation

    • Detail sharpness

  3. Why might straight paths lead to faster generation?

Exercise 2: Explore Flow Parameters (Modify)#

Download exercise2_explore.py

Goal: Understand how different parameters affect Flow Matching generation.

python exercise2_explore.py

This script runs three explorations:

Part A: Sampling Steps Comparison

Generates the same pattern with 5, 10, 20, 50, and 100 steps.

Comparison of generation quality at different step counts

Quality comparison at 5, 10, 20, 50, and 100 integration steps. Flow Matching achieves usable results with as few as 10 steps, and excellent results at 20-50 steps.#

What to Expect
  • 5 steps: May show artifacts but recognizable patterns

  • 10 steps: Good quality for many applications

  • 20-50 steps: Excellent quality

  • 100 steps: Diminishing returns

Part B: Flow Trajectory Visualization

Shows the step-by-step transformation from noise to image.

Flow trajectory from t=0 to t=1

Straight-line transformation from pure noise (t=0) to fabric pattern (t=1). The transformation is gradual and smooth, following the learned velocity field.#

Animated flow trajectory

Animated version showing the continuous transformation from noise to fabric pattern.#

Part C: Velocity Field Visualization

Visualizes the learned velocity field at different times.

Velocity field visualization

Velocity field magnitude at t=0.0, 0.25, 0.5, and 0.75. Early in the flow (t near 0), velocities are large as the flow pushes noise toward data. Later (t near 1), velocities decrease as we approach the target distribution.#

Suggested Modifications

Try these experiments:

  1. Different seeds: Change RANDOM_SEED to see other patterns

  2. Very low steps: Try step_counts = [1, 2, 3, 5] to find the minimum

  3. Compare integration methods: Implement midpoint method instead of Euler
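For modification 3, the midpoint (second-order Runge-Kutta) method can be sketched as below. This is an illustrative implementation, not the exercise script's; the function name and the stand-in "model" are made up for the example.

```python
import torch

@torch.no_grad()
def sample_flow_midpoint(model, num_samples, num_steps=25, size=(3, 64, 64)):
    """Midpoint (2nd-order Runge-Kutta) integration of dx/dt = v(x, t).

    Costs two model calls per step, but its higher accuracy typically
    allows roughly half the steps of plain Euler for similar quality.
    """
    x = torch.randn(num_samples, *size)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((num_samples,), i * dt)
        v_start = model(x, t)              # slope at the start of the step
        x_mid = x + 0.5 * dt * v_start     # half-step lookahead
        v_mid = model(x_mid, t + 0.5 * dt) # slope at the midpoint
        x = x + dt * v_mid                 # full step using the midpoint slope
    return x

# Smoke test with a stand-in "model" (exponential decay toward zero)
torch.manual_seed(0)
fake_model = lambda x, t: -x
out = sample_flow_midpoint(fake_model, num_samples=2, num_steps=25, size=(3, 8, 8))
print(tuple(out.shape))
```

Swapping this in for the Euler sampler lets you compare quality at matched compute (e.g. 25 midpoint steps vs 50 Euler steps, both 50 model calls).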

Exercise 3: Train Your Own Flow Matching Model (Create)#

Download exercise3_train.py

Goal: Train a Flow Matching model from scratch on the African fabric dataset.

Time required: 4-8 hours on RTX 5070Ti GPU

Step 1: Verify Dataset

The training uses the preprocessed African fabric dataset from Module 12.1.2:

python exercise3_train.py --verify

Expected output: “Found 1059 images”

Step 2: Start Training

python exercise3_train.py --train

Training Configuration:

| Setting | Value |
|---|---|
| Image Size | 64x64 (consistent with DCGAN/StyleGAN/DDPM) |
| Batch Size | 32 |
| Learning Rate | 1e-4 (AdamW optimizer) |
| Training Steps | 100,000 |
| Model Parameters | 4,662,851 (SimpleFlowUNet) |

What to Monitor:

  • Loss should decrease steadily (no oscillation like GANs)

  • Sample images saved every 1000 steps in training_results/

  • Training is stable; simpler than DDPM or GAN training

Step 3: Generate from Your Model

After training completes:

python exercise1_generate.py

Your model weights are saved at models/flow_matching_fabrics.pt.

Training Results: Observing Flow Matching Learning#

Actual Training Results (Completed)

Training completed successfully!

| Metric | Value |
|---|---|
| Total Steps | 100,000 |
| Training Time | 1.68 hours (RTX 5070Ti) |
| Final Loss | 0.1979 (started at ~0.29) |
| Model Size | 19 MB |
| Parameters | 4,662,851 |
| Checkpoints Saved | 20 (every 5,000 steps) |
| Sample Images | 100 (every 1,000 steps) |

Training Loss Progression
Flow Matching training loss over 100,000 steps

Training loss decreased steadily from 0.29 to 0.20 over 100,000 steps (1.68 hours on RTX 5070Ti). Unlike GANs, Flow Matching training is stable without mode collapse or oscillation.#

Visual Progression During Training

Watch how the model learns to generate fabric patterns over training:

Step 1,000 - Early training (mostly noise with hints of structure)

Step 10,000 - Structure emerging (color patterns visible)

Step 50,000 - Clear fabric patterns (recognizable designs)

Step 99,000 - Final quality (sharp, detailed patterns)
Troubleshooting Common Issues

Issue 1: “Model file not found” error

Cause: The models/flow_matching_fabrics.pt file doesn’t exist.

Solution:

  1. Complete Exercise 3 training first (~1.7 hours on RTX 5070Ti GPU)

  2. The trained model should be at models/flow_matching_fabrics.pt

Issue 2: “Dataset not found” error

Cause: African fabric dataset not at expected location.

Solution:

  1. Complete Module 12.1.2 (DCGAN Art) to create the dataset

  2. Verify path in exercise3_train.py matches your setup

  3. Expected location: content/Module_12_.../12.1.2_dcgan_art/african_fabric_dataset/

Issue 3: Training is very slow

Cause: Running on CPU instead of GPU.

Solution:

  • Verify GPU availability:

    import torch
    print(torch.cuda.is_available())  # Should print True
    
  • Install CUDA-enabled PyTorch if False

Issue 4: Out of memory error

Cause: Batch size too large for GPU memory.

Solution:

  • Edit exercise3_train.py and reduce BATCH_SIZE from 32 to 16

  • Training quality is not significantly affected

Issue 5: Generated images look noisy

Cause: Training not complete or model not loaded correctly.

Solution:

  1. Verify training ran for at least 50,000 steps

  2. Check that the correct checkpoint is loaded

  3. Try increasing num_steps in sampling from 50 to 100

Showcase Animation#

A 15-second morphing animation demonstrates the model’s capability to generate diverse fabric patterns through smooth latent space interpolation.

Flow Matching fabric pattern morphing animation

Smooth morphing between 5 keyframe patterns using SLERP interpolation in noise space. Each frame requires only 50 ODE integration steps, demonstrating Flow Matching’s efficiency advantage over DDPM (which requires 250+ steps).#

To generate the animation:

python generate_flow_morph.py

Animation Parameters:

Duration

15 seconds (seamless loop)

Resolution

256x256 (upscaled from 64x64)

Keyframes

5 distinct patterns

Total Frames

450 (30 FPS)

Sampling Steps

50 per frame (vs DDPM’s 250)

Interpolation

Spherical Linear (SLERP)

Comparison with DDPM Animation (Module 12.3.1):

The Flow Matching morph animation uses the same SLERP technique as the DDPM version, but with a key difference: each frame requires only 50 ODE integration steps instead of 250 DDIM steps. This represents a 5x speedup in generation, while producing comparable visual quality.
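The morphing script itself is not reproduced here, but the core SLERP step can be sketched as follows (the function name is illustrative; generate_flow_morph.py may differ in detail). Each interpolated noise tensor would then be pushed through the 50-step sampler to produce one animation frame.

```python
import torch

def slerp(z0, z1, alpha):
    """Spherical linear interpolation between two noise tensors.

    Unlike plain linear interpolation, SLERP keeps intermediate points
    near the same Gaussian shell as the endpoints, avoiding washed-out
    low-norm frames in the middle of the morph.
    """
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    cos_omega = torch.dot(z0_flat, z1_flat) / (z0_flat.norm() * z1_flat.norm())
    omega = torch.arccos(torch.clamp(cos_omega, -1.0, 1.0))
    so = torch.sin(omega)
    return (torch.sin((1 - alpha) * omega) / so) * z0 \
         + (torch.sin(alpha * omega) / so) * z1

torch.manual_seed(0)
z_a, z_b = torch.randn(3, 64, 64), torch.randn(3, 64, 64)

# Interpolated noise keeps a norm close to the endpoints' norms
norms = [slerp(z_a, z_b, a).norm().item() for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(norms)
```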

Summary#

Key Takeaways

  1. Flow Matching learns velocity fields: Instead of predicting noise like DDPM, we predict the direction from noise to data

  2. Straight paths are efficient: Linear interpolation enables generation with 10-50 steps instead of 250-1000

  3. Training is simple: Just MSE loss between predicted and target velocity

  4. Same architecture works: The U-Net from DDPM can be used for Flow Matching

Common Pitfalls

Warning

  1. Using too few steps: While Flow Matching needs fewer steps than DDPM, using fewer than 10 may produce artifacts

  2. Forgetting to clamp outputs: Generated values should be clamped to [-1, 1]

  3. Incorrect time range: Time should go from 0 (noise) to 1 (data), not 0 to 1000 like DDPM
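Pitfall 2 can be handled with a small post-processing helper. A sketch, assuming the training data was scaled to [-1, 1] as in the earlier fabric modules:

```python
import torch

def to_image(x):
    """Map raw sampler output back to displayable [0, 1] pixels,
    clamping stray values outside the training range [-1, 1]."""
    return (x.clamp(-1.0, 1.0) + 1.0) / 2.0

raw = torch.tensor([-1.5, -1.0, 0.0, 1.0, 1.7])
print(to_image(raw))  # values: 0.0, 0.0, 0.5, 1.0, 1.0
```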

Next Steps

Flow Matching is the foundation for modern generative systems:

  • FLUX.1: Black Forest Labs’ state-of-the-art text-to-image model [Esser2024]

  • Stable Diffusion 3: Incorporates flow matching ideas

  • Rectified Flow [Liu2023]: Further straightening for near-single-step generation

References#

[Lipman2023]

Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2023). Flow Matching for Generative Modeling. International Conference on Learning Representations (ICLR 2023). https://arxiv.org/abs/2210.02747

[Liu2023]

Liu, X., Gong, C., & Liu, Q. (2023). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. International Conference on Learning Representations (ICLR 2023). https://arxiv.org/abs/2209.03003

[Tong2023]

Tong, A., Malkin, N., Huguet, G., Zhang, Y., Rector-Brooks, J., Fatras, K., & Bengio, Y. (2023). Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport. arXiv preprint. https://arxiv.org/abs/2302.00482

[Ho2020]

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems, 33, 6840-6851. https://arxiv.org/abs/2006.11239

[Esser2024]

Esser, P., et al. (2024). Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. arXiv preprint. https://arxiv.org/abs/2403.03206

[Villani2009]

Villani, C. (2009). Optimal Transport: Old and New. Springer. ISBN: 978-3-540-71049-3

[CambridgeMLG2024]

Cambridge MLG. (2024). An Introduction to Flow Matching. Cambridge Machine Learning Group Blog. https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html

[MetaFM2024]

Meta AI. (2024). Flow Matching Library. GitHub repository. https://github.com/facebookresearch/flow_matching (Apache-2.0 License)

[torchcfm2023]

Tong, A. (2023). torchcfm: Conditional Flow Matching Library. GitHub repository. https://github.com/atong01/conditional-flow-matching

[Ronneberger2015]

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234-241.