About me

I am a computer scientist with primary research interests in computational reproducibility, data engineering, provenance and open science. I am currently a Research Associate in the National Studies on Air Pollution and Health (NSAPH) group at the Department of Biostatistics, Harvard T.H. Chan School of Public Health. I am also affiliated with the Institute for Quantitative Social Science of Harvard's Faculty of Arts and Sciences, where I work on the Harvard Data Commons and the Dataverse project.

I completed my Ph.D. in 2018 at the University of Cambridge and CERN, where I worked on the LHCb experiment, CERN Open Data and CERN Analysis Preservation.

Software, Workflows and Containers (SWC) Working Group
Practical Guide to Climate Econometrics
CLIR Webinar: Toward Open, Reproducible and Reusable Research
GitHub Action for uploading files to Dataverse
Dataverse DOI Badge Maker
Selected Talks
"Cluster Analysis of Open Research Data" at the 17th International Digital Curation Conference
"Dataverse Integration with GitHub via a GitHub Action" at the Dataverse Community Meeting 2022
"Computational Reproducibility: Expectations, Challenges and Recommendations" at the Journal Editors Discussion Interface (JEDI) workshop
"How to Conduct a Big Data Analysis on Air Pollution and Health" at the Mathematical Institute of the Serbian Academy of Sciences and Arts
"The Landscape of Data Sharing and Computational Reproducibility for Social Research" at Pew Research Center
"Toward FAIR principles for Open Hardware" at the Gathering for Open Science Hardware (GOSH)
"Research software review as part of the publication process" at the Consortium of Scientific Software Registries and Repositories
"Evidence-based steps toward a culture for replicability and reproducibility" at Metascience 2021
"The Dataverse Project: Data sharing, Reproducibility, Research, Development and the Community" at the CNSTAT Expert Meeting on Guidance on Data Sharing for NIA Longitudinal Studies
"Improving FAIRness with containers" at SORSE - Series of Online Research Software Events
"The Last Presentation" at LHCb Computing
Publications TLDR
Trisovic, A., Pasquier, T., Lau, M. K., & Crosas, M. (2022). A Large-Scale Study on Research Code Quality and Execution. Nature Scientific Data. This paper presents a large-scale study of the quality, programming literacy, and reproducibility of research code in over 2100 datasets from the Harvard Dataverse data repository that contain R code.
Soiland-Reyes, S., Sefton, P., Crosas, M., Castro, L. J., Coppens, F., Fernández, J. M., Garijo, D., Grüning, B., La Rosa, M., Leo, S., Ó Carragáin, E., Portier, M., Trisovic, A., RO-Crate Community, Groth, P., & Goble, C. (2021). Packaging Research Artefacts with RO-Crate. Data Science. We introduce RO-Crate, an open, community-driven, lightweight approach to packaging research artifacts with metadata, including their identifiers, provenance, relations, and annotations.
Miljković, N., Trisovic, A., & Peer, L. (2021). Towards FAIR Principles for Open Hardware. Conference on Application of Free Software and Open Hardware (PSSOH). We discuss the complexity of disseminating and reusing open hardware, present examples of its unique demands, and propose applying the FAIR principles to make open hardware findable, accessible, interoperable, and reusable.
Blumenthal, K., Goeva, A., Stoudt, S., Trisovic, A., & Trisovic, P. (2021). Why Do We Plot Data? Harvard Data Science Review. An explainer zine for the article "Designing for interactive exploratory data analysis requires theory of graphical inference."
Goeva, A., Jones, P., Stoudt, S., & Trisovic, A. (2021). Recipes for Connector Courses From the Early-Career Board Kitchen. Harvard Data Science Review. We propose a handful of connector courses for data science, inspired by the article "Interleaving Computational and Inferential Thinking: Data Science for Undergraduates at Berkeley."
Trisovic, A., Mika, K., Boyd, C., Feger, S., & Crosas, M. (2021). Repository Approaches to Improving the Quality of Shared Data and Code. Data. We propose three approaches based on computational reproducibility, data curation, and gamified design elements that can be used to indicate and improve the quality of shared data and code in data repositories.
Rising, J. A., Hussain, A., Schwarzwald, K., & Trisovic, A. (2021). A Practical Guide to Climate Econometrics: Navigating Key Decision Points in Weather and Climate Data Analysis. Accepted for publication in the Journal of Open Source Education (JOSE). We present a free and open-source tutorial on the practical aspects of climate econometrics, covering data collection, analysis design, and presentation of results (available at climateestimate.net).
Frost, S., Goeva, A., Pombra, J., Seaton, W., Stoudt, S., Trisovic, A., Wang, C., & Zucker, C. (2021). Kaleidoscopic Perspectives on Practicum-Based Data Science Education. Harvard Data Science Review. Early-Career Board members of the Harvard Data Science Review discuss the acquisition of practical data science skills and share their experiences from a number of disciplines.
Goeva, A., Stoudt, S., & Trisovic, A. (2020). Toward Reproducible and Extensible Research: From Values to Action. Harvard Data Science Review. This paper discusses the National Academies' report "Reproducibility and Replicability in Science," advocating for reusability and the need for actionable and hierarchical steps for researchers.
Frost, S., Goeva, A., Seaton, W., Stoudt, S., & Trisovic, A. (2020). Early-Career View on Data Science Challenges: Responsibility, Rigor, and Accessibility. Harvard Data Science Review. Early-Career Board members of the Harvard Data Science Review present their view of top research challenge areas in data science.
Trisovic, A., Durbin, P., Schlatter, T., Durand, G., Barbosa, S., Brooke, D., & Crosas, M. (2020). Advancing Computational Reproducibility in the Dataverse Data Repository Platform. 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS). We describe the integrations of the Dataverse repository software with the Code Ocean, Whole Tale, Renku, and Jupyter Binder platforms, which help capture the dependencies of research code and advance reproducibility.
Woodard, A. E., Trisovic, A., Li, Z., Babuji, Y., Chard, R., Skluzacek, T., Blaiszik, B., Katz, D. S., Foster, I., & Chard, K. (2020). Real-Time HEP Analysis With FuncX – a High-Performance Platform for Function as a Service. 24th International Conference on Computing in High Energy & Nuclear Physics (CHEP). We present how the function-as-a-service paradigm can address CERN's computing challenges with efficient and scalable experimental data processing on heterogeneous resources.
Trisovic, A., Jones, C. R., Couturier, B., & Clemencic, M. (2020). Provenance Tracking in the LHCb Software. Computing in Science & Engineering (CISE). We argue that reproducibility needs to be built into existing infrastructure, and we present a new feature of the LHCb software that records, within each resulting dataset, all the information needed to reproduce it.
Chen, X., Dallmeier-Tiessen, S., Dasler, R., Feger, S., Fokianos, P., Gonzalez, J. B., Hirvonsalo, H., Kousidis, D., Lavasa, A., Mele, S., Rodriguez, D. R., Šimko, T., Smith, T., Trisovic, A., Trzcinska, A., Tsanaktsidis, I., Zimmermann, M., Cranmer, K., Heinrich, L., Watts, G., Hildreth, M., Lloret Iglesias, L., Lassila-Perini, K., & Neubert, S. (2019). Open is not enough. Nature Physics. The CERN Analysis Preservation and Reusable Analyses (REANA) platforms were created to facilitate reproducible research for the LHC experiments at CERN, while the CERN Open Data project disseminates particle-physics data that can be used for research.
Trisovic, A. (2018). Graph Mining at the High-Energy Physics Experiment LHCb. 7th International Symposium on Industrial Engineering. The paper presents a number of challenges, questions, and use-cases that can be addressed by exploring and analyzing the LHCb graph database that captures its data and software.
Trisovic, A., Couturier, B., Gibson, V., & Jones, C. (2017). Recording the LHCb Data and Software Dependencies. 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP). We present the design and development of the LHCb graph database, which captures the scientific software stack, its software and hardware dependencies, and its products, namely simulated and experimental data.
Pasquier, T., Lau, M. K., Trisovic, A., Boose, E. R., Couturier, B., Crosas, M., Ellison, A. M., Gibson, V., Jones, C. R., & Seltzer, M. (2017). If These Data Could Talk. Nature Scientific Data. The lack of formalism in reporting computational research hinders reproducibility. Data provenance can help address this problem, as shown in two use cases: physics (CERN) and ecology (Harvard Forest).
Trisovic, A. (2016). Measuring the D0 Lifetime at the LHCb Masterclass. 37th International Conference on High Energy Physics (ICHEP). The paper presents the design of a stand-alone educational application, created for the International Masterclass in Physics, that displays proton-proton collisions in the LHCb experiment.