VectorArk: Learning Practical Image Vectorization
with Rounded Polygon Representation

Tarun Gehlaut, Difan Liu, Charu Bansal, Krutik Malani,
Souymodip Chakraborty, Ankit Phogat, Matthew Fisher, Vineet Batra
Adobe
CVPR 2026

Abstract

Recent vision-language model (VLM)-based approaches have achieved impressive results on image vectorization tasks. However, they are typically evaluated on synthetic benchmarks, where clean SVGs are rasterized at high resolution and then re-vectorized. As a result, these methods generalize poorly to real-world scenarios, such as images with unknown rasterization methods or those generated by text-to-image models.

We introduce VectorArk, a new VLM-based model designed for robust and practical image vectorization. VectorArk employs a novel rounded polygon representation that simplifies the learning process while naturally producing smooth, visually appealing primitives. We also propose a degradation model that enhances robustness across diverse and imperfect inputs.
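This page does not spell out how a rounded polygon is parameterized. As a minimal sketch of one plausible reading, assume each primitive is a closed polygon with one corner radius per vertex, where every corner is replaced by a circular arc tangent to its two edges. The function name `rounded_polygon_path`, the argument layout, and the clamping rule below are illustrative assumptions, not VectorArk's actual format.

```python
import math


def rounded_polygon_path(vertices, radii):
    """Build an SVG path for a closed polygon whose corners are circular arcs.

    Hypothetical helper for illustration only.
    vertices: list of (x, y) tuples in SVG coordinates (y grows downward).
    radii:    one non-negative corner radius per vertex (0 keeps a sharp corner).
    """
    n = len(vertices)
    if n < 3 or len(radii) != n:
        raise ValueError("need at least 3 vertices and one radius per vertex")

    commands = []
    for i in range(n):
        ax, ay = vertices[i - 1]          # previous vertex
        px, py = vertices[i]              # corner being rounded
        bx, by = vertices[(i + 1) % n]    # next vertex
        len_a = math.hypot(ax - px, ay - py)
        len_b = math.hypot(bx - px, by - py)
        if len_a == 0 or len_b == 0:
            raise ValueError("consecutive vertices must be distinct")

        # Unit vectors from the corner toward its two neighbours.
        ux, uy = (ax - px) / len_a, (ay - py) / len_a
        vx, vy = (bx - px) / len_b, (by - py) / len_b
        cos_theta = max(-1.0, min(1.0, ux * vx + uy * vy))
        half_tan = math.tan(math.acos(cos_theta) / 2)
        cross = (px - ax) * (by - py) - (py - ay) * (bx - px)

        move = "M" if i == 0 else "L"
        if radii[i] <= 0 or half_tan < 1e-9 or abs(cross) < 1e-12:
            # Sharp, degenerate, or collinear corner: pass through the vertex.
            commands.append(f"{move} {px:.3f} {py:.3f}")
            continue

        # Distance from the corner to each tangent point, clamped so that
        # neighbouring arcs never overlap along a shared edge (an assumption).
        d = min(radii[i] / half_tan, len_a / 2, len_b / 2)
        r = d * half_tan                   # radius actually used after clamping
        sx, sy = px + d * ux, py + d * uy  # arc start, on the incoming edge
        ex, ey = px + d * vx, py + d * vy  # arc end, on the outgoing edge
        sweep = 1 if cross > 0 else 0      # arc bends the way the outline turns
        commands.append(f"{move} {sx:.3f} {sy:.3f}")
        commands.append(f"A {r:.3f} {r:.3f} 0 0 {sweep} {ex:.3f} {ey:.3f}")

    commands.append("Z")
    return " ".join(commands)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
path = rounded_polygon_path(square, [2, 2, 2, 2])
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="-1 -1 12 12">'
    f'<path d="{path}"/></svg>'
)
```

Under this assumed encoding a shape is described by a handful of vertex coordinates plus radii rather than by free-form Bézier control points, which is one way such a representation could yield smooth primitives from a short token sequence.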

Our experiments show that VectorArk outperforms previous methods in geometric completeness and artifact suppression across multiple datasets, and comprehensive ablations validate the contribution of each component.

Qualitative Comparisons

VectorArk produces faithful vectorizations on challenging inputs, including text-to-image model outputs, and outperforms prior methods in geometric completeness and visual fidelity.

[Figure: eight examples, each showing the input raster alongside the vectorizations produced by StarVector, OmniSVG, GPT-4o, and Ours.]

Quantitative Results

VectorArk consistently outperforms all baselines across both benchmarks and difficulty tiers.
Higher SSIM / DINO is better. Lower LPIPS / MSE is better.
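Of the four metrics, MSE is the simplest to state exactly. The sketch below computes it for two equally sized grayscale images stored as nested lists of floats in [0, 1]. That storage format and value range are assumptions, since the page does not describe its evaluation protocol. SSIM, LPIPS, and DINO similarity depend on windowed statistics or pretrained networks and are better taken from their reference implementations.

```python
def mse(image_a, image_b):
    """Mean squared error between two equally sized images.

    Images are nested lists (rows of pixel values), assumed to be in [0, 1].
    Lower is better; identical images give 0.0.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


reference = [[0.0, 1.0], [1.0, 0.0]]
rendered = [[0.0, 1.0], [0.0, 0.0]]   # one of four pixels is wrong
error = mse(reference, rendered)
```

In this toy case one pixel differs by 1.0 out of four, so the error is 0.25.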

SArena Benchmark

Method      SSIM↑   LPIPS↓  MSE↓    DINO↑     SSIM↑   LPIPS↓  MSE↓    DINO↑     SSIM↑   LPIPS↓  MSE↓    DINO↑
GPT-4o      0.681   0.205   0.095   0.972     0.530   0.284   0.133   0.958     0.470   0.334   0.151   0.931
Gemini      0.622   0.253   0.121   0.944     0.493   0.323   0.163   0.932     0.441   0.382   0.169   0.897
OmniSVG     0.823   0.099   0.036   0.980     0.600   0.251   0.125   0.903     0.518   0.324   0.123   0.898
StarVector  0.876   0.069   0.032   0.969     0.750   0.142   0.062   0.949     0.626   0.252   0.101   0.902
Ours        0.937   0.031   0.011   0.992     0.895   0.058   0.013   0.981     0.857   0.093   0.022   0.975

SVGenius Benchmark

Method      SSIM↑   LPIPS↓  MSE↓    DINO↑     SSIM↑   LPIPS↓  MSE↓    DINO↑     SSIM↑   LPIPS↓  MSE↓    DINO↑
GPT-4o      0.673   0.190   0.094   0.976     0.572   0.280   0.087   0.942     0.566   0.295   0.079   0.928
Gemini      0.611   0.244   0.121   0.951     0.539   0.328   0.097   0.914     0.536   0.326   0.092   0.918
OmniSVG     0.840   0.070   0.027   0.985     0.674   0.204   0.058   0.940     0.638   0.248   0.065   0.918
StarVector  0.890   0.046   0.019   0.993     0.710   0.203   0.058   0.921     0.672   0.258   0.059   0.893
Ours        0.944   0.028   0.008   0.995     0.868   0.080   0.015   0.977     0.830   0.120   0.023   0.958

BibTeX

@inproceedings{gehlaut2026vectorark,
  title     = {VectorArk: Learning Practical Image Vectorization
               with Rounded Polygon Representation},
  author    = {Gehlaut, Tarun and Liu, Difan and Bansal, Charu and
               Malani, Krutik and Chakraborty, Souymodip and
               Phogat, Ankit and Fisher, Matthew and Batra, Vineet},
  booktitle = {Proceedings of the IEEE/CVF Conference on
               Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}