ICLR 2026

CoDA: Agentic Systems for Collaborative Data Visualization

Specialized LLM agents collaborate through metadata analysis, task planning, code generation, and self-reflection to automate data visualization.

1 UC Santa Barbara 2 Google Cloud AI Research 3 Google

* Work done during a research internship at Google Cloud AI Research.

+41.5%
Improvement
8
Agents
~2x
vs SOTA
OVERVIEW

Abstract

Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. However, current systems struggle with complex datasets containing multiple files and iterative refinement. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task, focusing on initial query parsing while failing to robustly manage data complexity, code errors, or final visualization quality. In this paper, we reframe this challenge as a collaborative multi-agent problem. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection. We formalize this pipeline, demonstrating how metadata-focused analysis bypasses token limits and quality-driven refinement ensures robustness. Extensive evaluations show CoDA achieves substantial gains in the overall score, outperforming competitive baselines by up to 41.5%. This work demonstrates that the future of visualization automation lies not in isolated code generation but in integrated, collaborative agentic workflows.

ARCHITECTURE

How CoDA Works

Four collaborative phases powered by specialized agents.

01

Understanding

Query intent parsing and dataset metadata extraction, without uploading raw data.
Query Analyzer · Data Processor
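The metadata-only strategy in this phase can be sketched as below. This is a minimal illustration, assuming a pandas-readable tabular file; `extract_metadata` is a hypothetical name, not CoDA's actual Data Processor API. The key point is that only the compact summary, never the raw table, would be passed to the LLM, which is how the pipeline sidesteps token limits on large datasets.

```python
import io
import pandas as pd

def extract_metadata(df: pd.DataFrame, sample_rows: int = 3) -> dict:
    """Summarize a DataFrame into compact metadata: shape, per-column
    dtypes, missing-value counts, and a tiny sample for grounding."""
    return {
        "shape": df.shape,
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": df.isna().sum().to_dict(),
        "sample": df.head(sample_rows).to_dict(orient="records"),
    }

# Toy stand-in for the browser-share dataset used in the trace below.
csv = io.StringIO("browser,version,share\nChrome,119,42.1\nFirefox,118,6.3\n")
meta = extract_metadata(pd.read_csv(csv))
```

A 10-million-row file and a 10-row file produce summaries of roughly the same size, so the downstream agents see a constant-cost description of the data.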
02

Planning

Example code retrieval, visual mapping, and design optimization.
VizMapping · Design Explorer · Search
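Example retrieval of the kind the Search agent performs (the trace below reports similarity scores of 0.89, 0.85, 0.81) can be sketched with a toy similarity search. This is an illustrative bag-of-words cosine ranker, not the paper's actual retrieval method, and the corpus entries are invented stand-ins:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: dict[str, str], k: int = 3):
    """Rank corpus descriptions against the query; return top-k
    (score, name) pairs, highest similarity first."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(doc.lower().split())), name)
              for name, doc in corpus.items()]
    return sorted(scored, reverse=True)[:k]

examples = {
    "nested_pie": "nested pie chart with ax.pie concentric rings",
    "stacked_donut": "stacked donut sunburst chart with wedges",
    "label_rotation": "rotate tick labels on bar chart axis",
}
hits = retrieve("sunburst chart with nested pie rings", examples)
```

A production version would swap the word-count vectors for dense embeddings, but the retrieve-and-rank shape is the same.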
03

Generation

Code generation with best practices and automated debugging.
Code Generator · Debug Agent
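The execute-and-repair cycle between the Code Generator and the Debug Agent can be sketched as below. This is a minimal stand-in: `fix_fn` plays the role of the Debug Agent's LLM repair call (not shown), and the function names are illustrative, not CoDA's API.

```python
import subprocess
import sys
import tempfile

def run_script(code: str) -> tuple[bool, str]:
    """Execute generated code in an isolated subprocess; return
    (succeeded, stderr) so tracebacks can be fed back for repair."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=60)
    return proc.returncode == 0, proc.stderr

def debug_loop(code: str, fix_fn, max_attempts: int = 3) -> str:
    """Retry execution, asking fix_fn to repair the script from the
    captured traceback on each failure."""
    for _ in range(max_attempts):
        ok, err = run_script(code)
        if ok:
            return code
        code = fix_fn(code, err)
    raise RuntimeError("could not repair script")

broken = "print(undefined_name)"  # raises NameError on first run
fixed = debug_loop(broken, lambda code, err: "print('ok')")
```

Running in a subprocess rather than `exec()` keeps a crashing or hanging generated script from taking the orchestrator down with it.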
04

Self-Reflection

Quality evaluation with iterative feedback loops for refinement.
Visual Evaluator · Feedback Loop
Figure 1. Overview of the CoDA framework: natural language queries are decomposed into Understanding (metadata analysis), Planning (task decomposition), Generation (code writing with a search agent), and Self-Reflection (visual evaluation) phases with quality-driven feedback loops, orchestrated by 8 specialized LLM agents.
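The four-phase, quality-driven loop can be sketched as an orchestrator. This is a schematic under stated assumptions: the agent callables are stubs, `VizState` and the threshold name `THETA_Q` are invented for illustration, and the stub scores reproduce the two-iteration trace shown next (45, then 92 on a 0-100 scale).

```python
from dataclasses import dataclass, field

THETA_Q = 85  # assumed quality threshold on the 0-100 scale of the trace

@dataclass
class VizState:
    plan: str = ""
    code: str = ""
    score: int = 0
    feedback: list = field(default_factory=list)

def run_pipeline(query, understand, plan, generate, evaluate, max_iters=5):
    """Understanding runs once; Planning and Generation are re-run with
    Eval feedback until the score clears THETA_Q or iterations run out."""
    meta = understand(query)
    state = VizState()
    for _ in range(max_iters):
        state.plan = plan(query, meta, state.feedback)
        state.code = generate(state.plan, state.feedback)
        state.score, state.feedback = evaluate(state.code)
        if state.score >= THETA_Q:
            break
    return state

# Stub agents reproducing the two-iteration trace (score 45, then 92).
scores = iter([(45, ["label overlap"]), (92, [])])
result = run_pipeline(
    "sunburst of browser share",
    understand=lambda q: {"schema": "browser, version, share"},
    plan=lambda q, m, fb: "nested sunburst",
    generate=lambda p, fb: "script.py",
    evaluate=lambda c: next(scores),
)
```

The loop halts as soon as the evaluator's score meets the threshold, which is exactly the HALT decision shown at the end of the trace.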
TRACE

Agent Collaboration Trace

A browser-usage sunburst query walks through the full pipeline, including diagnostic feedback loops.

Iteration 1: full-pipeline pass
Understanding

Query Agent + Data Agent

Decompose the user query into tasks and extract data schema & hierarchy without loading raw data.
Query Agent — Input

"Create a sunburst chart showing browser market share by version from the provided dataset."

Query Agent — Output

Task list: load data, parse hierarchy (browser → version), compute share %. Viz type: sunburst. Key columns: browser, version, share.

Data Agent — Output

Schema: 5 browsers × 22 versions, 2-level hierarchy. Total share sums to 100%. No missing values.

Planning

Design Agent + Search Agent

Design the sunburst layout and color scheme; retrieve relevant matplotlib code examples.
Design Agent — Output

Chart: nested sunburst. Palette: distinct hue per browser (Chrome=blue, Firefox=orange, Safari=grey, Edge=green, Opera=red). Labels: radial text on outer ring, percentage on inner ring.

Search Agent — Output

Retrieved 3 examples: nested pie with ax.pie(), stacked-donut sunburst, label rotation patterns. Similarity: 0.89, 0.85, 0.81.

Generation

Code Agent + Debug Agent

Generate a 142-line Python script and execute it. Output image rendered successfully.
Code Agent — Output

142-line script: data loading, hierarchy parsing, nested ax.pie() rings with per-browser color maps, radial labels, percentage annotations.

Debug Agent — Output

Execution successful. No runtime errors. Image rendered.
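The nested `ax.pie()` construction the Code Agent describes can be sketched as follows. The data, palette, and output filename are illustrative toys, not the benchmark dataset or CoDA's generated 142-line script; the point is the two concentric `ax.pie()` calls with `wedgeprops(width=...)` that form a sunburst.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt
from matplotlib import colormaps
from pathlib import Path

# Toy browser -> per-version shares (hypothetical numbers).
browsers = {"Chrome": [40.0, 12.0], "Firefox": [5.0, 3.0], "Safari": [25.0, 15.0]}
inner = [sum(v) for v in browsers.values()]        # per-browser totals
outer = [s for v in browsers.values() for s in v]  # per-version shares

cmap = colormaps["tab10"]
inner_colors = [cmap(i) for i in range(len(browsers))]
outer_colors = [cmap(i) for i, v in enumerate(browsers.values()) for _ in v]

fig, ax = plt.subplots()
size = 0.3
# Outer ring: one wedge per version; width turns the pie into a ring.
ax.pie(outer, radius=1.0, colors=outer_colors,
       wedgeprops=dict(width=size, edgecolor="w"))
# Inner ring: one wedge per browser, labeled at mid-radius.
ax.pie(inner, radius=1.0 - size, colors=inner_colors,
       labels=list(browsers), labeldistance=0.5,
       wedgeprops=dict(width=size, edgecolor="w"))
ax.set(aspect="equal")
fig.savefig("sunburst.png")
size_bytes = Path("sunburst.png").stat().st_size
```

Because the inner values are the sums of each browser's outer slices, the version wedges stay radially aligned under their parent browser wedge, giving the 2-level hierarchy the Data Agent reported.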

Browser sunburst chart — iteration 1 output with label overlap issues
Self-Reflection

Eval Agent

Score below threshold — detected layout and labeling issues. Routing feedback to upstream agents.
Diagnosis

Issues: (1) outer-ring labels overlap at small slices, (2) inner-ring text collides with wedge borders, (3) color contrast too low for Safari segments.

Routing

Design Agent ← low aesthetics & layout  |  Code Agent ← label collision fix

Score: 45 / 100 — below θq
Feedback routed to Design Agent & Code Agent
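The routing step above, in which the Eval Agent sends each diagnosed issue only to the agents that can fix it, can be sketched as a lookup table. The issue taxonomy and agent names here are illustrative, drawn from this trace rather than from a documented CoDA interface:

```python
# Hypothetical mapping from diagnosed issue to responsible agents.
ROUTES = {
    "label_overlap": ["Design Agent", "Code Agent"],
    "low_contrast": ["Design Agent"],
    "runtime_error": ["Code Agent", "Debug Agent"],
    "wrong_chart_type": ["Design Agent"],
}

def route_feedback(issues: list[str]) -> dict[str, list[str]]:
    """Group diagnosed issues by the agent that must be re-triggered,
    so only those agents run in the next iteration (unknown issues
    default to the Code Agent)."""
    plan: dict[str, list[str]] = {}
    for issue in issues:
        for agent in ROUTES.get(issue, ["Code Agent"]):
            plan.setdefault(agent, []).append(issue)
    return plan

plan = route_feedback(["label_overlap", "low_contrast"])
```

This selective re-triggering is what makes iteration 2 below a targeted refinement rather than a full-pipeline re-run.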
Iteration 2: targeted refinement — only re-triggered agents run
Planning

Design Agent (re-triggered)

Revises label placement and color contrast based on Eval feedback.
Revised output

Hide text for slices < 3%, leader lines for 3–5%. Safari: light-grey → steel-blue. Inner ring: percentages repositioned outside wedge borders.

Generation

Code Agent + Debug Agent (re-triggered)

Regenerate script with revised label logic (+18 lines). Execution successful, labels no longer overlap.
Code Agent — Changes

+18 lines: conditional label hiding, annotate() with leader lines, Safari color map updated, inner-ring text offset.

Debug Agent — Output

Execution successful. No runtime errors.
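The conditional label policy from the revised design (hide slices under 3%, leader lines for 3-5%, direct labels otherwise, with thresholds taken from the trace above) can be sketched as a small decision function. The function name is illustrative, not part of the generated script:

```python
def label_mode(share_pct: float) -> str:
    """Label policy for a sunburst slice: 'hidden' for tiny slices,
    'leader_line' for small ones, 'direct' for the rest."""
    if share_pct < 3.0:
        return "hidden"
    if share_pct <= 5.0:
        return "leader_line"
    return "direct"

modes = [label_mode(p) for p in (1.2, 4.0, 17.5)]
```

In the regenerated script, the `leader_line` branch would correspond to `ax.annotate()` calls with arrows, which is how matplotlib draws labels offset from small wedges.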

Browser sunburst chart — iteration 2, labels fixed after feedback loop
Self-Reflection

Eval Agent (converged)

All quality criteria met. Decision: HALT — no further refinement needed.
Evaluation

Correct chart type, accurate hierarchy, clean labels with leader lines, proper color contrast. All dimensions above θq = 0.85.

Score: 92 / 100 — HALT
EVALUATION

Experimental Results

Comprehensive evaluation across multiple benchmarks and human expert studies.

MatplotBench & Qwen Code Interpreter

gemini-2.5-pro
Method        | MatplotBench        | Qwen Code Interpreter
              | EPR    VSR    OS    | EPR    VSR    OS
MatplotAgent  | 97.0   56.7   55.0  | 81.6   79.7   65.0
VisPath       | 75.0   37.3   38.0  | 86.5   94.3   81.6
CoML4VIS      | 76.0   69.7   53.0  | 87.1   90.9   79.1
CoDA (Ours)   | 99.0   79.8   79.5  | 93.3   95.4   89.0

DA-Code Benchmark (Overall Score %)

SWE-level
Method        | Backbone         | Overall Score
CoDA (Ours)   | Gemini-2.5-Pro   | 39.0
DS-STAR       | Gemini-2.5-Pro   | 20.5
DA-Agent      | Gemini-2.5-Pro   | 19.2
DA-Agent      | GPT-4o           | 17.0
DA-Agent      | GPT-4            | 16.0
DA-Agent      | Deepseek-Coder   | 11.0

Human Expert Evaluation on MatplotBench

3 experts · 200 charts
Method        | Elo   | Harmony | Balance | Color | Simplicity | Query Align.
MatplotAgent  | 1506  | 3.65    | 3.65    | 3.53  | 4.31       | 3.63
VisPath       | 1484  | 2.71    | 2.71    | 2.65  | 2.92       | 2.78
CoML4VIS      | 1309  | 3.16    | 3.22    | 3.22  | 4.00       | 3.59
CoDA (Ours)   | 1701  | 4.82    | 4.73    | 4.96  | 4.94       | 4.86
Overall Score vs. refinement iterations: quality improves from ~60 to ~85 across 5 self-reflection iterations.
Impact of the Search Agent: consistent improvement across quality metrics when enabled.
Impact of the Global TODO List: improved coordination across agents.
Key Performance Highlights
CoDA sets new state-of-the-art across all benchmarks and human evaluation.
+24.5% MatplotBench OS (vs. 55.0% best baseline)
+7.4% Qwen OS (vs. 81.6% best baseline)
#1 Elo rating (1701 vs. 1506 next best)
~2x SOTA on DA-Code OS (39.0% vs. 20.5% DS-STAR)
REFERENCE

Citation

BibTeX
@inproceedings{chen2026coda,
  title     = {{CoDA}: Agentic Systems for Collaborative Data Visualization},
  author    = {Chen, Zichen and Chen, Jiefeng and Ar{\i}k, Sercan {\"O}. and Sra, Misha and Pfister, Tomas and Yoon, Jinsung},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

Key Findings