Featured Publications

SpaCE: The Spatial Confounding Environment

Tec M., Trisovic A., Audirac M., Woodward S., Khoshnevis N., and Dominici F.
The 12th International Conference on Learning Representations (ICLR), 2024

Air Pollution and Acute Kidney Injury in the US Medicare Population: A Longitudinal Cohort Study

Lee W., Wu X., Heo S., Kim J. M., Fong K. C., Son J., Sabath M. B., Trisovic A., Braun D., Park J. Y., Kim Y. C., Lee J. P., Schwartz J., Kim H., Dominici F., Al-Aly Z., and Bell M. L.
Environmental Health Perspectives, 2023

A Large-Scale Study on Research Code Quality and Execution

Trisovic A., Pasquier T., Lau M. K., and Crosas M.
Nature Scientific Data, 2022

Nine best practices for research software registries and repositories

Garijo D., Ménager H., Hwang L., Trisovic A., Hucka M., Morrell T., Allen A., Task Force on Best Practices for Software Registries, SciCodes Consortium.
PeerJ Computer Science, 2022

Toward Reproducible and Extensible Research: From Values to Action

Goeva, A., Stoudt, S., & Trisovic, A.
Harvard Data Science Review, 2020

Open is not enough

Chen, X., Dallmeier-Tiessen, S., Dasler, R., Feger, S., Fokianos, P., Gonzalez, J. B., Hirvonsalo, H., Kousidis, D., Lavasa, A., Mele, S., Rodriguez, D. R., Šimko, T., Smith, T., Trisovic, A., Trzcinska, A., Tsanaktsidis, I., Zimmermann, M., Cranmer, K., Heinrich, L., Watts, G., Hildreth, M., Lloret Iglesias, L., Lassila-Perini, K., & Neubert, S.
Nature Physics, 2019

Publications

SpaCE: The Spatial Confounding Environment

Tec M., Trisovic A., Audirac M., Woodward S., Khoshnevis N., and Dominici F.
The 12th International Conference on Learning Representations (ICLR), 2024

Description
We introduce SpaCE (The Spatial Confounding Environment), the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding.

SpaCE: The Spatial Confounding (Benchmarking) Environment

Tec M., Trisovic A., Audirac M., and Dominici F.
Causal Learning and Reasoning (CLeaR), 2023

Description
The article introduces SpaCE datasets as a benchmarking tool to assist in developing novel methods to address outstanding challenges in spatial and network causal inference.

Air Pollution and Acute Kidney Injury in the US Medicare Population: A Longitudinal Cohort Study

Lee W., Wu X., Heo S., Kim J. M., Fong K. C., Son J., Sabath M. B., Trisovic A., Braun D., Park J. Y., Kim Y. C., Lee J. P., Schwartz J., Kim H., Dominici F., Al-Aly Z., and Bell M. L.
Environmental Health Perspectives, 2023

Description
The article investigates the association between short-term exposure to air pollution and acute kidney injury (AKI) in the US Medicare population.

Advancing Software Citation Implementation

Bouquin D., Trisovic A., Bertuch O., Colón-Marrero E.
Software Citation Workshop 2022 (ArXiv), 2023

Description
Software's pivotal role in research progress is not mirrored in traditional acknowledgments. This report from the Software Citation Workshop 2022 captures insights from 51 global experts on unresolved software citation issues. It aims to pinpoint and tackle these challenges, benefiting the GLAM community, repository managers, software developers, and publishers.

Nine best practices for research software registries and repositories

Garijo D., Ménager H., Hwang L., Trisovic A., Hucka M., Morrell T., Allen A., Task Force on Best Practices for Software Registries, SciCodes Consortium.
PeerJ Computer Science, 2022

Description
As the FORCE11 Software Citation Implementation Working Group, we describe the best practices for software repositories and registries which include defining the scope, policies, and governing rules, along with the background, examples, and collaborative work that went into their development.

Cluster Analysis of Open Research Data: A Case for Replication Metadata

Trisovic A.
International Journal of Digital Curation, 2023

Description
The article presents a cluster analysis of 1,000+ open research datasets from the Harvard Dataverse repository to identify the most common replication metadata elements.

A Large-Scale Study on Research Code Quality and Execution

Trisovic A., Pasquier T., Lau M. K., and Crosas M.
Nature Scientific Data, 2022

Description
This paper presents a large-scale study of the quality, programming literacy, and reproducibility of over 2100 datasets that contain research code in R from the Harvard Dataverse data repository.

Packaging Research Artefacts with RO-Crate

Soiland-Reyes S., Sefton P., Crosas M., Castro L. J., Coppens F., Fernández J. M., Garijo D., Grüning B., La Rosa M., Leo S., Ó Carragáin E., Portier M., Trisovic A., RO-Crate Community, Groth P., and Goble C.
Data Science, 2021

Description
We introduce RO-Crate, an open, community-driven, lightweight approach to packaging research artifacts with metadata, including their identifiers, provenance, relations, and annotations.

Towards FAIR Principles for Open Hardware

Miljković, N., Trisovic, A., & Peer, L.
Conference on Application of Free Software and Open Hardware (PSSOH), 2021

Description
We elaborate on open hardware dissemination and reuse complexity, present examples of unique demands, and propose leveraging FAIR principles to make it findable, accessible, interoperable, and reusable.

Why Do We Plot Data?

Blumenthal, K., Goeva, A., Stoudt, S., Trisovic, A., & Trisovic, P. (alphabetical)
Harvard Data Science Review, 2021

Description
Explainer Zine for the article "Designing for interactive exploratory data analysis requires theory of graphical inference."

Recipes for Connector Courses From the Early-Career Board Kitchen

Goeva, A., Jones, P., Stoudt, S., & Trisovic, A. (alphabetical)
Harvard Data Science Review, 2021

Description
We propose a handful of connector courses for data science, inspired by the article "Interleaving Computational and Inferential Thinking: Data Science for Undergraduates at Berkeley."

Repository Approaches to Improving the Quality of Shared Data and Code

Trisovic, A., Mika, K., Boyd, C., Feger, S., & Crosas, M.
Data, 2021

Description
We propose three approaches based on computational reproducibility, data curation, and gamified design elements that can be used to indicate and improve the quality of shared data and code in data repositories.

A Practical Guide to Climate Econometrics: Navigating Key Decision Points in Weather and Climate Data Analysis

Rising, J. A., Hussain, A., Schwarzwald, K., & Trisovic, A.
Journal of Open Source Education (JOSE), 2021

Description
We present a free and open-source tutorial on the practical aspects of climate econometrics, which includes data collection, analysis design, and result presentation (available at climateestimate.net).

Kaleidoscopic Perspectives on Practicum-Based Data Science Education

Frost, S., Goeva, A., Pombra, J., Seaton, W., Stoudt, S., Trisovic, A., Wang, C., & Zucker, C. (alphabetical)
Harvard Data Science Review, 2021

Description
Early-Career Board members of the Harvard Data Science Review discuss the acquisition of practical data science skills and share their experiences from a number of disciplines.

Toward Reproducible and Extensible Research: From Values to Action

Goeva, A., Stoudt, S., & Trisovic, A. (alphabetical)
Harvard Data Science Review, 2020

Description
This paper discusses the National Academies' report "Reproducibility and Replicability in Science," advocating for reusability and the need for actionable and hierarchical steps for researchers.

Early-Career View on Data Science Challenges: Responsibility, Rigor, and Accessibility

Frost, S., Goeva, A., Seaton, W., Stoudt, S., & Trisovic, A. (alphabetical)
Harvard Data Science Review, 2020

Description
Early-Career Board members of the Harvard Data Science Review present their view of top research challenge areas in data science.

Advancing Computational Reproducibility in the Dataverse Data Repository Platform

Trisovic, A., Durbin, P., Schlatter, T., Durand, G., Barbosa, S., Brooke, D., & Crosas, M.
3rd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS), 2020

Description
The Dataverse repository software has undertaken integrations with the platforms Code Ocean, Whole Tale, Renku, and Jupyter Binder, which will help capture research code dependencies and advance reproducibility.

Real-Time HEP Analysis With FuncX – a High-Performance Platform for Function as a Service

Woodard, A. E., Trisovic, A., Li, Z., Babuji, Y., Chard, R., Skluzacek, T., Blaiszik, B., Katz, D. S., Foster, I., & Chard, K.
24th International Conference on Computing in High Energy & Nuclear Physics (CHEP), 2020

Description
We present how the function-as-a-service paradigm can address CERN's computing challenges with efficient and scalable experimental data processing on heterogeneous resources.

Provenance Tracking in the LHCb Software

Trisovic, A., Jones, C. R., Couturier, B., & Clemencic, M.
Computing in Science & Engineering (CISE), 2020

Description
We argue that reproducibility needs to be incorporated into the existing infrastructure and present new functionality in the CERN software that captures, within a resulting dataset, all the information necessary to reproduce it.

Open is not enough

Chen, X., Dallmeier-Tiessen, S., Dasler, R., Feger, S., Fokianos, P., Gonzalez, J. B., Hirvonsalo, H., Kousidis, D., Lavasa, A., Mele, S., Rodriguez, D. R., Šimko, T., Smith, T., Trisovic, A., Trzcinska, A., Tsanaktsidis, I., Zimmermann, M., Cranmer, K., Heinrich, L., Watts, G., Hildreth, M., Lloret Iglesias, L., Lassila-Perini, K., & Neubert, S.
Nature Physics, 2019

Description
The platforms CERN Analysis Preservation and Reusable Analyses (REANA) were created to facilitate reproducible research for the LHC experiments at CERN. The CERN Open Data project disseminates particle-physics data that can be used for research.

Graph Mining at the High-Energy Physics Experiment LHCb

Trisovic, A.
7th International Symposium on Industrial Engineering, 2018

Description
The paper presents a number of challenges, questions, and use cases that can be addressed by exploring and analyzing the LHCb graph database that captures its data and software.

Recording the LHCb Data and Software Dependencies

Trisovic, A., Couturier, B., Gibson, V., & Jones, C.
22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP), 2017

Description
We present the design and development of the LHCb graph database that captures the scientific software stack, its software and hardware dependencies, and its products, which are simulation and experimental data.

If These Data Could Talk

Pasquier, T., Lau, M. K., Trisovic, A., Boose, E. R., Couturier, B., Crosas, M., Ellison, A. M., Gibson, V., Jones, C. R., & Seltzer, M.
Nature Scientific Data, 2017

Description
A lack of formalism hinders reporting in computational research and, in turn, reproducibility. Data provenance can help address this problem, as showcased in two use cases: physics (CERN) and ecology (Harvard Forest).

Measuring the D0 Lifetime at the LHCb Masterclass

Trisovic, A.
37th International Conference on High Energy Physics (ICHEP), 2016

Description
The paper presents the design of a stand-alone educational application that displays proton-proton collisions in the LHCb experiment created for the International Masterclass in Physics.