I am a second-year PhD student in computer science at New York University,
where I am fortunate to be advised by Prof. Juliana Freire.
I have also had the pleasure of collaborating with Prof. Christopher Musco and Prof. Chinmay Hegde.
I am broadly interested in data-centric aspects of AI, especially large language models for data discovery and data explainability.
My research also involves representation learning for structured data.
Previously, I completed an Honors B.S. in Computer Science and an Honors B.A. in Mathematics at the University of Rochester in 2023,
where I am pleased to have worked with
Prof. Fatemeh Nargesian, Prof. Jiebo Luo,
and Prof. Daniel Štefankovič.
|
"Perhaps you will not only have some appreciation of this culture;
it is even possible that you may want to join in the greatest adventure that the human mind has ever begun."
-- Richard Feynman. The Feynman Lectures on Physics (1964)
|
Publications
(* = equal contribution; (α-β) = alphabetical order)
|
AutoDDG: Automated Dataset Description Generation using Large Language Models
Haoxiang Zhang, Yurong Liu, Wei-Lun (Allen) Hung, Aécio Santos, Juliana Freire
In Submission
[paper] [code]
"A framework for automated dataset description generation tailored for tabular data."
|
Magneto: Combining Small and Large Language Models for Schema Matching
Yurong Liu*, Eduardo Pena*, Aécio Santos, Eden Wu, Juliana Freire
In Submission
[paper] [code]
"A cost-effective solution for schema matching that combines the advantages of SLMs and LLMs to address their limitations."
|
Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
Yurong Liu*, R. Teal Witter*, Flip Korn, Tarfah Alrashed, Dimitris Paparas, Christopher Musco, Juliana Freire
In Submission
[paper] [code]
"A novel linear regression-based estimator for Banzhaf values in interpretable machine learning."
|
Large Language Models for Data Discovery and Integration: Challenges and Opportunities
(α-β) Juliana Freire, Grace Fan, Benjamin Feuer, Christos Koutras, Yurong Liu, Eduardo Pena, Aécio Santos, Cláudio Silva, Eden Wu
IEEE Data Engineering Bulletin, 2025
[paper]
"How are LLMs being applied across data integration and discovery tasks?"
|
Enhancing Biomedical Schema Matching with LLM-based Training Data Generation
Yurong Liu, Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Eden Wu, Juliana Freire
TRL@NeurIPS, 2024
[paper]
"Preliminary version of Magneto."
|
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire
VLDB, 2024
[paper] [code]
"A framework that uses large language models for column type annotation and supports both fine-tuning and zero-shot settings."
|
Sampling over Union of Joins
Yurong Liu*, Yunlong Xu*, Fatemeh Nargesian
SIGMOD, 2023 (Companion)
[paper]
"Random sampling over the set union and disjoint union of joins, with sample uniformity and independence guarantees."
|
Google Research
Student Researcher, May 2024 - Aug 2024.
Advisor: Flip Korn.
Worked on dataset search and game-theoretic explainable AI.
|