Publications

Mapping the Impact of Foundation Models on the UN Sustainable Development Goals

J. Sivaloganathan, A. Trišović, N. Thompson. (2025). 21st IEEE International Conference on e-Science (eScience 2025)

Description
We analyze scientific publications that use foundation models and assess their alignment with the UN Sustainable Development Goals, revealing a concentration of work on a few goals and large gaps in others.
LLMs in Citation Intent Classification: Progress, Precision, and Reproducibility Challenges

A. Fogelson, A. Trišović, N. Thompson. (2025). ACM Conference on Reproducibility and Replicability (ACM REP)

Description
We investigate the use of LLMs for multi-class citation intent classification, highlighting striking inter-model disagreement among state-of-the-art systems and revealing key challenges in the robustness, transparency, and reproducibility of LLM-based research.
Co-exposure to Extreme Heat, Wildfire Burn Zones, and Wildfire Smoke in the Western US from 2006 to 2020

J. K. Hu*, A. Trišović*, A. Bakshi, D. Braun, F. Dominici, J. A. Casey. (2025). Science Advances

Description
We analyze 15 years of daily census tract–level data to quantify coexposure to extreme heat, wildfire burn zones, and smoke across 11 Western U.S. states. Coexposures—especially between heat and smoke—increased over time and disproportionately affected vulnerable and Indigenous populations.
Predicting Concurrence of Heatwaves, Droughts, and Wildfires with Spatiotemporal Deep Learning

A. Trišović, G. Miller, D. Bertsimas, J. K. Hu. (2025). Tackling Climate Change with Machine Learning at ICLR

Description
We introduce a framework to identify a state-of-the-art multi-task model for jointly predicting heatwaves, droughts, and wildfires, capturing shared risk factors across these climate extremes.
A Practical Guide to Climate Econometrics: Navigating Key Decision Points in Weather and Climate Data Analysis

(α-β) J. A. Rising, A. Hussain, K. Schwarzwald, A. Trisovic. (2024). Journal of Open Source Education (JOSE)

Description
We present a free and open-source tutorial on the practical aspects of climate econometrics, which includes data collection, analysis design, and result presentation (available at climateestimate.net).
SpaCE: The Spatial Confounding Environment

M. Tec, A. Trisovic, M. Audirac, S. Woodward, J. K. Hu, N. Khoshnevis, F. Dominici. (2024). The 12th International Conference on Learning Representations (ICLR)

Description
We introduce SpaCE - The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding.
SpaCE: The Spatial Confounding (Benchmarking) Environment

M. Tec, A. Trisovic, M. Audirac, F. Dominici. (2023). Causal Learning and Reasoning (CLeaR)

Description
The article introduces SpaCE datasets as a benchmarking tool to assist in developing novel methods to address outstanding challenges in spatial and network causal inference.
Air Pollution and Acute Kidney Injury in the US Medicare Population: A Longitudinal Cohort Study

W. Lee, X. Wu, S. Heo, J. M. Kim, K. C. Fong, J. Son, M. B. Sabath, A. Trisovic, D. Braun, J. Y. Park, Y. C. Kim, J. P. Lee, J. Schwartz, H. Kim, F. Dominici, Z. Al-Aly, M. L. Bell. (2023). Environmental Health Perspectives

Description
The article investigates the association between short-term exposure to air pollution and acute kidney injury (AKI) in the US Medicare population.
Advancing Software Citation Implementation

D. Bouquin, A. Trisovic, O. Bertuch, E. Colón-Marrero. (2023). Software Citation Workshop 2022 (ArXiv)

Description
Software's pivotal role in research progress is not mirrored in traditional acknowledgments. This report captures insights from 51 global experts on unresolved software citation issues. It aims to pinpoint and tackle these challenges, benefiting the GLAM community, repository managers, software developers, and publishers.
Nine Best Practices for Research Software Registries and Repositories

D. Garijo, H. Ménager, L. Hwang, A. Trisovic, M. Hucka, T. Morrell, A. Allen, Task Force on Best Practices for Software Registries, SciCodes Consortium. (2022). PeerJ Computer Science

Description
As the FORCE11 Software Citation Implementation Working Group, we describe the best practices for software repositories and registries which include defining the scope, policies, and governing rules, along with the background, examples, and collaborative work that went into their development.
Cluster Analysis of Open Research Data: A Case for Replication Metadata

A. Trisovic. (2023). International Journal of Digital Curation

Description
The article presents a cluster analysis of 1,000+ open research datasets from the Harvard Dataverse repository to identify the most common replication metadata elements.
A Large-Scale Study on Research Code Quality and Execution

A. Trisovic, T. Pasquier, M. K. Lau, M. Crosas. (2022). Nature Scientific Data

Description
This paper presents a large-scale study of the quality, programming literacy, and reproducibility of over 2,100 datasets that contain research code in R from the Harvard Dataverse data repository.
Packaging Research Artefacts with RO-Crate

(α-β) S. Soiland-Reyes, P. Sefton, M. Crosas, L. J. Castro, F. Coppens, J. M. Fernández, D. Garijo, B. Grüning, M. La Rosa, S. Leo, E. Ó Carragáin, M. Portier, A. Trisovic, RO-Crate Community, P. Groth, C. Goble. (2021). Data Science

Description
We introduce RO-Crate, an open, community-driven, lightweight approach to packaging research artifacts with metadata, including their identifiers, provenance, relations, and annotations.
Towards FAIR Principles for Open Hardware

N. Miljković, A. Trisovic, L. Peer. (2021). Conference on Application of Free Software and Open Hardware (PSSOH)

Description
We elaborate on the complexity of disseminating and reusing open hardware, present examples of its unique demands, and propose leveraging FAIR principles to make open hardware findable, accessible, interoperable, and reusable.
Why Do We Plot Data?

(α-β) K. Blumenthal, A. Goeva, S. Stoudt, A. Trisovic, P. Trisovic. (2021). Harvard Data Science Review

Description
An explainer zine for the article "Designing for Interactive Exploratory Data Analysis Requires Theories of Graphical Inference."
Recipes for Connector Courses From the Early-Career Board Kitchen

(α-β) A. Goeva, P. Jones, S. Stoudt, A. Trisovic. (2021). Harvard Data Science Review

Description
We propose a handful of connector courses for data science, inspired by the article "Interleaving Computational and Inferential Thinking: Data Science for Undergraduates at Berkeley."
Repository Approaches to Improving the Quality of Shared Data and Code

A. Trisovic, K. Mika, C. Boyd, S. Feger, M. Crosas. (2021). Data

Description
We propose three approaches based on computational reproducibility, data curation, and gamified design elements that can be used to indicate and improve the quality of shared data and code in data repositories.
Kaleidoscopic Perspectives on Practicum-Based Data Science Education

(α-β) S. Frost, A. Goeva, J. Pombra, W. Seaton, S. Stoudt, A. Trisovic, C. Wang, C. Zucker. (2021). Harvard Data Science Review

Description
Early-Career Board members of the Harvard Data Science Review discuss the acquisition of practical data science skills and share their experiences from a number of disciplines.
Toward Reproducible and Extensible Research: From Values to Action

(α-β) A. Goeva, S. Stoudt, A. Trisovic. (2020). Harvard Data Science Review

Description
This paper discusses the National Academies' report "Reproducibility and Replicability in Science," advocating for reusability and the need for actionable and hierarchical steps for researchers.
Early-Career View on Data Science Challenges: Responsibility, Rigor, and Accessibility

(α-β) S. Frost, A. Goeva, W. Seaton, S. Stoudt, A. Trisovic. (2020). Harvard Data Science Review

Description
Early-Career Board members of the Harvard Data Science Review present their view of top research challenge areas in data science.
Advancing Computational Reproducibility in the Dataverse Data Repository Platform

A. Trisovic, P. Durbin, T. Schlatter, G. Durand, S. Barbosa, D. Brooke, M. Crosas. (2020). 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS)

Description
The Dataverse repository software has undertaken integrations with the platforms Code Ocean, Whole Tale, Renku, and Jupyter Binder, which will help capture research code dependencies and advance reproducibility.
Real-Time HEP Analysis With FuncX – a High-Performance Platform for Function as a Service

A. E. Woodard, A. Trisovic, Z. Li, Y. Babuji, R. Chard, T. Skluzacek, B. Blaiszik, D. S. Katz, I. Foster, K. Chard. (2020). 24th International Conference on Computing in High Energy & Nuclear Physics (CHEP)

Description
We present how the function-as-a-service paradigm can address CERN's computing challenges with efficient and scalable experimental data processing on heterogeneous resources.
Provenance Tracking in the LHCb Software

A. Trisovic, C. R. Jones, B. Couturier, M. Clemencic. (2020). Computing in Science & Engineering (CISE)

Description
We argue that reproducibility needs to be incorporated into the existing infrastructure and present a new functionality in the CERN software that captures all information within a resulting dataset necessary to reproduce it.
Open is Not Enough

(α-β) X. Chen, S. Dallmeier-Tiessen, R. Dasler, S. Feger, P. Fokianos, J. B. Gonzalez, H. Hirvonsalo, D. Kousidis, A. Lavasa, S. Mele, D. R. Rodriguez, T. Šimko, T. Smith, A. Trisovic, A. Trzcinska, I. Tsanaktsidis, M. Zimmermann, K. Cranmer, L. Heinrich, G. Watts, M. Hildreth, L. Lloret Iglesias, K. Lassila-Perini, S. Neubert. (2019). Nature Physics

Description
The platforms CERN Analysis Preservation and Reusable Analyses (REANA) were created to facilitate reproducible research for the LHC experiments at CERN. The CERN Open Data project disseminates particle-physics data that can be used for research.
Graph Mining at the High-Energy Physics Experiment LHCb

A. Trisovic. (2018). 7th International Symposium on Industrial Engineering

Description
The paper presents a number of challenges, questions, and use-cases that can be addressed by exploring and analyzing the LHCb graph database that captures its data and software.
Recording the LHCb Data and Software Dependencies

A. Trisovic, B. Couturier, V. Gibson, C. Jones. (2017). 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP)

Description
We present the design and development of the LHCb graph database that captures the scientific software stack, its software and hardware dependencies, and its products, which are simulation and experimental data.
If These Data Could Talk

T. Pasquier, M. K. Lau, A. Trisovic, E. R. Boose, B. Couturier, M. Crosas, A. M. Ellison, V. Gibson, C. R. Jones, M. Seltzer. (2017). Nature Scientific Data

Description
A lack of formalism hampers reporting in computational research, which in turn hinders reproducibility. Data provenance can help address this problem, as showcased in two use cases: physics (CERN) and ecology (Harvard Forest).
Measuring the D0 Lifetime at the LHCb Masterclass

A. Trisovic. (2016). 37th International Conference on High Energy Physics (ICHEP)

Description
The paper presents the design of a stand-alone educational application that displays proton-proton collisions in the LHCb experiment created for the International Masterclass in Physics.

Data, Software & Tools



Projects & Activities


Research Areas

AI, Science & Society
AI Methodology
Climate Change, Health & AI
Research Software & Reproducibility
Data Quality, FAIR Principles & Repositories
Data Visualization & Statistical Analysis
Open Science & Education
High Energy Physics