Bridging Local Observation and Global Simulation in Closed-Loop Traffic Modeling

Ziyan Wang¹,²* · Tan Xiang²* · Peng Chen² · Xintao Yan¹†

¹The University of Hong Kong · ²Beihang University · *Equal contribution · †Corresponding author

CRAFT, a Contextual pReference Alignment Framework for Traffic Simulation, bridges the mismatch between locally observed driving logs and globally observable simulation through self-supervised failure discovery and preference-guided test-time alignment.


31.2% lower scene-level collision rate
for CAT-K

33.2% lower traffic-violation rate
for SMART

48.7% of abnormal behaviors flagged
at least 1.0 s early

23.79% lower abnormal-agent rate
at the 9 s horizon

Overview

Can policies learned from locally observed driving logs remain behaviorally rational when executed in globally contextualized closed-loop simulation?

Autoregressive traffic simulators learn from ego-centric driving logs, where surrounding context can be incomplete due to perception limits and occlusions. When these models are rolled out in globally observable simulation, incomplete context-action associations can lead to abnormal stops, unsafe interactions, and rule violations.

Figure: Local-to-global context mismatch and CRAFT alignment. Incomplete ego-centric logs can induce flawed context-action mappings. CRAFT mitigates this mismatch by learning complete-context preferences and guiding simulation toward more rational behaviors.

Method

CRAFT turns simulator-induced failures into preference guidance.

CRAFT treats the simulator itself as a globally observable sandbox. Starting from logged initial states, the base simulator generates diverse what-if rollouts that self-expose failure modes induced by incomplete observational context. These failures are grounded with human-aligned driving priors and converted into preference supervision for a Contextual Preference Evaluator (CPE), which assesses the rationality of generated behaviors under complete scene context.

Training

Construct grouped complete-context what-if rollouts.

The base simulator performs stochastic rollouts with diverse random seeds from the same logged initial scene. The resulting preference dataset is organized as grouped alternative futures, supporting token-level supervision and inter-rollout preference learning.
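The grouping described above might be organized as in the following minimal sketch. Everything in it is illustrative: `Rollout`, `RolloutGroup`, `toy_simulator`, and the rule that token 0 marks a failure are hypothetical stand-ins, and the toy simulator only samples random tokens in place of the real base simulator.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Rollout:
    seed: int
    tokens: list          # sampled motion tokens, one per step
    token_labels: list    # 1 = acceptable token, 0 = token flagged as a failure

    @property
    def failed(self) -> bool:
        # A rollout counts as failed if any of its tokens was flagged.
        return 0 in self.token_labels


@dataclass
class RolloutGroup:
    scene_id: str
    rollouts: list = field(default_factory=list)

    def preference_pairs(self):
        """Pair each non-failing rollout with each failing one: (preferred, rejected)."""
        good = [r for r in self.rollouts if not r.failed]
        bad = [r for r in self.rollouts if r.failed]
        return [(g, b) for g in good for b in bad]


def toy_simulator(scene_id: str, seed: int, steps: int = 5, vocab: int = 8) -> Rollout:
    """Stand-in for the base simulator: draws random tokens from a seeded RNG."""
    rng = random.Random(f"{scene_id}-{seed}")
    tokens = [rng.randrange(vocab) for _ in range(steps)]
    # Hypothetical failure rule, for illustration only: token 0 is treated as abnormal.
    labels = [0 if t == 0 else 1 for t in tokens]
    return Rollout(seed=seed, tokens=tokens, token_labels=labels)


def build_group(scene_id: str, seeds) -> RolloutGroup:
    """Roll out the same initial scene once per seed and keep the results together."""
    return RolloutGroup(scene_id, [toy_simulator(scene_id, s) for s in seeds])


group = build_group("scene_0001", seeds=range(8))
pairs = group.preference_pairs()
```

Keeping all alternative futures of one scene in a single group gives both kinds of signal mentioned above: the per-token labels serve token-level supervision, and the (preferred, rejected) pairs serve inter-rollout preference learning.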

Inference

Reweight autoregressive decoding before action execution.

At each simulation step, CPE evaluates next-step candidates through one-step lookahead scenes and calibrates the simulator's next-token distribution before action execution.
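One way to picture this calibration step is an exponential tilt of the base distribution by the preference scores. The sketch below assumes that form, p'(a) ∝ p(a) · exp(β · s(a)); the page does not state the exact rule, so the function `reweight`, the parameter name `beta`, and the example numbers are assumptions for illustration only.

```python
import numpy as np


def reweight(base_probs, pref_scores, beta=1.0):
    """Return p'(a) proportional to p(a) * exp(beta * s(a)) over the candidates."""
    base_probs = np.asarray(base_probs, dtype=float)
    pref_scores = np.asarray(pref_scores, dtype=float)
    # Work in log space; the small epsilon guards against log(0).
    logits = np.log(base_probs + 1e-12) + beta * pref_scores
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy example: three candidate next tokens with base probabilities and
# hypothetical preference scores in [0, 1] from one-step lookahead.
base = np.array([0.5, 0.3, 0.2])
scores = np.array([0.1, 0.9, 0.8])
calibrated = reweight(base, scores, beta=4.0)
```

With `beta=0.0` the base distribution is returned unchanged, and larger values shift probability mass toward higher-scoring candidates before the action is sampled and executed.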

Qualitative Rollouts

CRAFT guides closed-loop rollouts toward globally coherent traffic behavior.

Each scenario compares the baseline simulator with CRAFT under the same initial scene. CRAFT uses complete-context preference guidance to suppress locally plausible but globally inconsistent actions.

[Rollout videos: Baseline vs. CRAFT]

Traffic-signal compliance

The baseline selects an action that violates the traffic signal, while CRAFT suppresses the violating candidate and maintains rule-compliant behavior.

[Rollout videos: Baseline vs. CRAFT]

Abnormal-stop mitigation

The baseline produces an unexplained stop under the complete simulated context, whereas CRAFT guides the rollout toward smoother forward motion and preserves traffic flow.

[Rollout videos: Baseline vs. CRAFT]

Safer multi-agent interaction

The baseline rollout leads to an unsafe interaction pattern, while CRAFT recalibrates candidate actions using contextual preference scores and produces a more coherent multi-agent behavior.

[Rollout videos: Baseline vs. CRAFT]

Self-recovery under initial perturbations

The baseline struggles to recover from the initial perturbation, while CRAFT guides the vehicle back toward normal driving behavior under the complete simulated context.

Open-loop evaluation

Open-loop contextual preference evaluation

CPE estimates agent-time preference scores for generated behaviors under the complete simulated scene. In this visualization, color encodes the score on a continuous green-to-red scale: green for scores close to 1, yellow for intermediate scores around 0.5, and red for scores close to 0.
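The color coding can be reproduced with a simple piecewise-linear map from score to RGB. This is a sketch of one plausible mapping, not necessarily the one used to render the visualization; the function name `score_to_rgb` is hypothetical.

```python
def score_to_rgb(score: float) -> tuple:
    """Map a preference score in [0, 1] to an (R, G, B) color.

    0.0 -> red, 0.5 -> yellow, 1.0 -> green, linearly interpolated in between.
    """
    s = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    if s <= 0.5:
        r, g = 1.0, 2.0 * s          # red to yellow: ramp green up
    else:
        r, g = 2.0 * (1.0 - s), 1.0  # yellow to green: ramp red down
    return (round(255 * r), round(255 * g), 0)


low, mid, high = score_to_rgb(0.0), score_to_rgb(0.5), score_to_rgb(1.0)
```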

Main results

CRAFT improves closed-loop behavioral rationality.

These rationality gains come with competitive distributional realism. CPE functions as a plug-in alignment module for different simulators, amortizing human-aligned criteria into dense contextual preference scores and enabling more stable probability calibration during decoding.

Base Model	Strategy	JSD Spd. (× 10⁻²) ↓	JSD Ang. (× 10⁻²) ↓	JSD Dist. (× 10⁻²) ↓	Collision P-Ag. (%) ↓	Collision P-Sc. (%) ↓	Offroad P-Ag. (%) ↓	Offroad P-Sc. (%) ↓	Traffic P-Sc. (%) ↓
Log	Log replay	0.00	0.00	0.00	0.56	5.20	1.59	14.40	20.70
GUMP	Top-K	4.78	5.41	11.36	4.06	36.30	3.80	26.90	32.70
SMART	Top-K	1.07	3.23	0.56	3.13	15.10	2.52	19.10	38.80
CAT-K	Top-K	1.02	2.96	0.53	3.41	15.70	2.54	19.50	36.50
R1Sim	Top-K	1.05	3.16	0.56	3.36	15.50	2.23	16.30	36.60
CAT-K	Auto-labeler	1.19	3.02	0.67	3.21	15.50	2.57	20.30	36.70
CAT-K	CRAFT	0.92	2.20	0.69	2.46	10.80	2.26	16.60	26.60
SMART	CRAFT	0.91	2.20	0.72	2.60	11.90	2.06	17.10	25.90

CAT-K collision P-Sc. 15.70 → 10.80 (31.2% reduction)

CAT-K traffic P-Sc. 36.50 → 26.60 (27.1% reduction)

SMART traffic P-Sc. 38.80 → 25.90 (33.2% reduction)
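The percentages above are relative reductions, (before − after) / before, computed from the P-Sc. values in the table. A short check, with `relative_reduction` as an illustrative helper name:

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


# (before, after) pairs copied from the Top-K and CRAFT rows of the table.
highlights = {
    "CAT-K collision P-Sc.": (15.70, 10.80),
    "CAT-K traffic P-Sc.": (36.50, 26.60),
    "SMART traffic P-Sc.": (38.80, 25.90),
}

reductions = {
    name: round(relative_reduction(before, after), 1)
    for name, (before, after) in highlights.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% reduction")
# CAT-K collision P-Sc.: 31.2% reduction
# CAT-K traffic P-Sc.: 27.1% reduction
# SMART traffic P-Sc.: 33.2% reduction
```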

Test-time alignment analysis

CPE identifies risky action distributions before failures materialize and mitigates long-horizon error accumulation.

This proactive signal enables test-time correction before execution, whereas rule-based evaluators usually react only after explicit violations have occurred.

Figure: Abnormal-agent warning rate over different warning lead times. Proactive risk assessment: CPE flags 48.7% of abnormal behaviors at least 1.0 s in advance and 17.7% at least 2.0 s in advance.
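A warning rate at a given lead time can be computed as the share of abnormal events whose first warning precedes the failure by at least that lead. The sketch below assumes each event is recorded as a (warning time, failure time) pair, with `None` when no warning was raised; the event list is toy data, not results from the paper, and `warning_rate` is a hypothetical helper.

```python
def warning_rate(events, min_lead_s: float) -> float:
    """Fraction of abnormal events warned at least `min_lead_s` seconds early.

    events: list of (warn_time_s or None, failure_time_s) pairs.
    """
    if not events:
        return 0.0
    hits = sum(
        1
        for warn, fail in events
        if warn is not None and fail - warn >= min_lead_s
    )
    return hits / len(events)


# Toy events: lead times of 1.5 s, 0.5 s, no warning, and 2.0 s.
events = [(2.0, 3.5), (4.0, 4.5), (None, 6.0), (1.0, 3.0)]

rate_1s = warning_rate(events, min_lead_s=1.0)  # 2 of 4 events -> 0.5
rate_2s = warning_rate(events, min_lead_s=2.0)  # 1 of 4 events -> 0.25
```

Because a longer required lead time can only exclude events, the rate is non-increasing in the lead time, which matches the drop from 48.7% at 1.0 s to 17.7% at 2.0 s reported above.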

Figure: Abnormal-agent rate curves of CRAFT and CAT-K over closed-loop rollout time. Error accumulation mitigation: at the 9 s horizon, CRAFT achieves a 23.79% lower abnormal-agent rate than CAT-K.

Robustness

Training scale, guidance strength, and candidate size jointly affect test-time alignment.

Increasing the preference-training data consistently improves behavioral metrics, indicating that CPE benefits from broader coverage of simulator-induced failure modes. At inference time, the guidance scale β controls the strength of calibration: a small β provides limited correction, while an overly large β over-calibrates the base distribution and degrades performance. Increasing the candidate pool from 16 to 32 improves guidance quality, but enlarging it further introduces noisier candidates and reduces stability.

Relative performance under different preference-data sizes, guidance scales, and candidate-pool sizes

Contributions

From incomplete log imitation to complete-context preference alignment.

Local-to-global mismatch

Imitation learning on ego-centric logs leads to incomplete context-action associations that degrade closed-loop behavior.

Contextual Preference Evaluator

CPE learns complete-context behavioral preferences from simulator-induced failures grounded by human-aligned driving priors.

Plug-in test-time guidance

CPE is deployed as a plug-in test-time guidance module for autoregressive decoding, improving behavioral rationality while preserving competitive realism.

Citation

BibTeX

@inproceedings{anonymous2026craft,
  title     = {Bridging Local Observation and Global Simulation in Closed-Loop Traffic Modeling},
  author    = {Ziyan Wang and Tan Xiang and Peng Chen and Xintao Yan},
  booktitle = {},
  year      = {2026}
}