Yurong Liu (刘雨绒)

Ph.D. Student, VIDA Center
Department of Computer Science, New York University

Email: yurong.liu [at] nyu [dot] edu
Office: 370 Jay Street, 11th floor

GitHub | Google Scholar | LinkedIn | Twitter


I am a second-year Ph.D. student in computer science at New York University, where I am fortunate to be advised by Prof. Juliana Freire. I have also had the pleasure of collaborating with Prof. Christopher Musco and Prof. Chinmay Hegde.

I am broadly interested in data management, particularly in applying large language models to data discovery and integration tasks, with an emphasis on post-training methods and agentic frameworks. My research also involves table representation learning and explainable AI.

Previously, I completed an Honors B.S. in Computer Science and an Honors B.A. in Mathematics at the University of Rochester in 2023, where I was fortunate to work with Prof. Fatemeh Nargesian, Prof. Jiebo Luo, and Prof. Daniel Štefankovič.

Publications

(* = equal contribution; (α-β) = alphabetical order)

AutoDDG: Automated Dataset Description Generation using Large Language Models
Haoxiang Zhang*, Yurong Liu*, Wei-Lun (Allen) Hung, Aécio Santos, Juliana Freire
SIGMOD 2026 [paper] [code]

Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values
R. Teal Witter, Yurong Liu, Christopher Musco
NeurIPS 2025 [paper] [blog]

Magneto: Combining Small and Large Language Models for Schema Matching
Yurong Liu*, Eduardo Pena*, Aécio Santos, Eden Wu, Juliana Freire
VLDB 2025 [paper] [code]

Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
Yurong Liu*, R. Teal Witter*, Flip Korn, Tarfah Alrashed, Dimitris Paparas, Christopher Musco, Juliana Freire
Preprint [paper] [code]

Large Language Models for Data Discovery and Integration: Challenges and Opportunities
(α-β) Juliana Freire, Grace Fan, Benjamin Feuer, Christos Koutras, Yurong Liu, Eduardo Pena, Aécio Santos, Cláudio Silva, Eden Wu
IEEE Data Engineering Bulletin 2025 [paper]

Enhancing Biomedical Schema Matching with LLM-based Training Data Generation
Yurong Liu, Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Eden Wu, Juliana Freire
NeurIPS 2024 Third Table Representation Learning Workshop [paper]

ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire
VLDB 2024 [paper] [code]

Sampling over Union of Joins
Yurong Liu*, Yunlong Xu*, Fatemeh Nargesian
SIGMOD 2023 (Companion) [paper]

Experience

Microsoft Research
Researcher Intern, May 2025 - Aug 2025.
Advisor: Dr. Yeye He
Worked on large language models for data preparation in the Data Systems Group.

Google Research
Student Researcher, May 2024 - Aug 2024.
Advisor: Dr. Flip Korn
Worked on dataset search and game-theoretic explainable AI.

Misc

Outside of research, I am an amateur basketball player 🏀 and ACG lover. My favorite anime are One Piece and Cowboy Bebop. Recently, I’ve also really enjoyed Vinland Saga: Season 1.