Numerous arguments strongly support the practice of open science, which offers several societal and individual benefits. For individual researchers, sharing research artifacts such as data can increase trust and transparency, improve the reproducibility of one's own work, and catalyze new collaborations. Despite a general appreciation of the benefits of data sharing, research data are often only available to the original investigators. For data that are shared, lack of useful metadata and documentation make them challenging to reuse. In this paper, we argue that a lack of incentives and infrastructure for making data useful is the biggest barrier to creating a culture of widespread data sharing. We compare data with code, examine computational environments in the context of their ability to facilitate the reproducibility of research, provide some practical guidance on how one can improve the chances of their data being reusable, and partially bridge the incentive gap. While previous papers have focused on describing ideal best practices for data and code, we focus on common-sense ideas for sharing tabular data for a target audience of academics working in data science adjacent fields who are about to submit for publication.
Keywords: DSML 4: Production: Data science output is validated, understood, and regularly used for multiple domains/platforms.
© 2021 The Authors.