Biography
I am a third-year PhD student in computer science at New York University,
where I am fortunate to be advised by Prof. Juliana Freire.
I have also had the pleasure of collaborating with Prof. Christopher Musco and Prof. Chinmay Hegde.
Previously, I completed an Honors B.S. in Computer Science and an Honors B.A. in Mathematics at the University of Rochester in 2023,
where I was pleased to work with
Prof. Fatemeh Nargesian, Prof. Jiebo Luo,
and Prof. Daniel Štefankovič.
I am broadly interested in data management,
particularly applying large language models to data discovery and integration tasks,
with emphasis on post-training methods and agentic frameworks.
Publications (* = equal contribution)
BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization
Roque Lopez, Yurong Liu, Christos Koutras, Juliana Freire
SIGMOD 2026 (Demo)
[paper] [code]
AutoDDG: Automated Dataset Description Generation using Large Language Models
Haoxiang Zhang*, Yurong Liu*, Wei-Lun (Allen) Hung, Aécio Santos, Juliana Freire
SIGMOD 2026
[paper] [code]
Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values
R. Teal Witter, Yurong Liu, Christopher Musco
NeurIPS 2025
[paper] [code] [blog]
Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
Yurong Liu*, R. Teal Witter*, Flip Korn, Tarfah Alrashed, Dimitris Paparas, Christopher Musco, Juliana Freire
Under review at TMLR
[paper] [code]
Magneto: Combining Small and Large Language Models for Schema Matching
Yurong Liu*, Eduardo Pena*, Aécio Santos, Eden Wu, Juliana Freire
VLDB 2025
[paper] [code]
Large Language Models for Data Discovery and Integration: Challenges and Opportunities
Juliana Freire, Grace Fan, Benjamin Feuer, Christos Koutras, Yurong Liu, Eduardo Pena, Aécio Santos, Cláudio Silva, Eden Wu
IEEE Data Engineering Bulletin 2025
[paper]
Enhancing Biomedical Schema Matching with LLM-based Training Data Generation
Yurong Liu, Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Eden Wu, Juliana Freire
NeurIPS 2024 Third Table Representation Learning Workshop
[paper]
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire
VLDB 2024
[paper] [code]
Sampling over Union of Joins
Yurong Liu*, Yunlong Xu*, Fatemeh Nargesian
SIGMOD 2023 (SRC)
[paper]
Misc
Outside of research, I am an amateur basketball player 🏀 and an ACG (anime, comics, and games) lover.
My favorite anime are One Piece, Gintama, and Cowboy Bebop. Recently, I've also really enjoyed Vinland Saga: Season 1.