Data commons collocate data with cloud computing infrastructure and software services, applications, and tools to create powerful resources for the large-scale management, analysis, harmonization, and sharing of data. Unlike data warehouses, data lakes, and other systems that support an organization’s business analytics, a data commons is focused on providing a resource for a community or collaboration, or in support of multiple communities.
The term data commons derives from the more general concept of a “commons,” a term that was popularized by Elinor Ostrom, who defined it as a natural, cultural, or digital resource accessible to all members of a community, or, more broadly, of a society. An example is a pasture for animals to graze in a village. These resources are held in common, through a partnership, a not-for-profit, or another entity, but not owned privately for commercial gain.
An example of a data commons is the UK Biobank, a large-scale biomedical database with 500,000 participants who have consented to their data being used for the specific purpose of bona fide research on the diagnosis, prevention, and/or treatment of serious and life-threatening illnesses. Data commons can be powerful catalysts for innovation and transformation in the social impact sector, especially when collaborative action is needed to strengthen epidemic preparedness and response, advance financial inclusion, and enable access to opportunity.
Data commons can be challenging to set up and even more challenging to maintain successfully. In this guide, we share more examples of working data commons and three essential steps to maximize your chances of success.
In this Guide
- Get inspired: Explore examples of existing data commons
- Define your why and how: Understand the importance of governance in data commons
- Bootstrap: Investigate technology platforms for data commons
- Plan for the long term: Consider issues of data ingest and harmonization
Data Commons in Practice
Data commons have been developed to address a diverse set of data sharing and research needs. The following are some examples:
- NCI Genomic Data Commons – a data commons for cancer genomics data
- European Open Science Cloud – a European partnership supporting open science
- Australian BioCommons – a data commons supporting Australian bioscience
- datacommons.org – a data commons operated by Google
Step 1: Governance and Agreements
Establishing effective governance is critical to the success of a data commons. Specific considerations depend on the nature of the data hosted by the commons and the community it supports, and may include agreements for contributing data, permissible use of data, intellectual property rights, publishing and citation guidelines, and operational principles. It’s advisable to look for specialized guidelines for the sector your data comes from (for example, Health Information Exchange is a relevant standard if your platform focuses on health). Do your research and identify the standards that apply to your data scope. Below are some general resources that can help you in this step:
Step 2: Choosing a Platform
Underlying a data commons is a software infrastructure that manages access to data storage, imposes structure on data, and may offer analysis and/or visualization tools. A data commons platform may support some level of interoperability with other data commons, allowing for possible participation in a broader data ecosystem or data mesh. Examples of software platforms for building data commons include:
- Gen3 – a general-purpose open-source platform supporting ad hoc analysis
- Terra – a platform for biomedical data
- Figshare – a general data platform for storing, sharing, and discovering research data
- Dataverse – open-source research data repository software
Step 3: Getting the data
Successful commons curate and harmonize the data and produce data products of broad interest to the community. Curating and harmonizing data is time-consuming, expensive, and labor-intensive; much of the value of a data commons lies in centralizing this effort so that it is done once rather than repeated by each group that needs the data. Here are some useful resources that describe principles, standards, and best practices for working with research data in data commons:
- The FAIR data principles
- The CARE Principles for Indigenous Data Governance
- Research Data Alliance
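To make the idea of harmonizing data once, centrally, more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two contributing sites, their field names (`weight_lb`, `age_months`, and so on), and the common schema `HarmonizedRecord` are hypothetical and do not come from any particular data commons platform. A real commons would typically rely on an agreed data dictionary and far more validation.

```python
from dataclasses import dataclass
from typing import Iterable, List

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


@dataclass(frozen=True)
class HarmonizedRecord:
    """Hypothetical common schema shared by all contributors."""
    participant_id: str
    age_years: int
    weight_kg: float
    source: str


def from_site_a(row: dict) -> HarmonizedRecord:
    """Hypothetical Site A reports 'id', 'age' (years), and 'weight_lb'."""
    return HarmonizedRecord(
        participant_id=f"A-{row['id']}",
        age_years=int(row["age"]),
        weight_kg=round(float(row["weight_lb"]) * LB_TO_KG, 1),
        source="site_a",
    )


def from_site_b(row: dict) -> HarmonizedRecord:
    """Hypothetical Site B reports 'subject', 'age_months', and 'kg'."""
    return HarmonizedRecord(
        participant_id=f"B-{row['subject']}",
        age_years=int(row["age_months"]) // 12,  # whole years only
        weight_kg=round(float(row["kg"]), 1),
        source="site_b",
    )


def harmonize(site_a_rows: Iterable[dict],
              site_b_rows: Iterable[dict]) -> List[HarmonizedRecord]:
    """Map every source row into the common schema, Site A rows first."""
    records = [from_site_a(r) for r in site_a_rows]
    records += [from_site_b(r) for r in site_b_rows]
    return records


if __name__ == "__main__":
    combined = harmonize(
        [{"id": 7, "age": "42", "weight_lb": "150"}],
        [{"subject": "x9", "age_months": 510, "kg": "70.5"}],
    )
    for record in combined:
        print(record)
```

The point of the sketch is the design choice rather than the details: each source gets one small mapping function into a shared schema, so unit conversions and field renaming are written once by the commons instead of separately by every group that analyzes the data.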
Data commons support a community’s management, analysis, and sharing of data. For this reason, data commons require a governance framework that supports the community’s values and goals. Furthermore, these frameworks are key to ensuring resources are used for the specific purposes for which they were made available. This, in turn, enables more resources to be safely mobilized for the benefit of the community. Successful data commons tend to carefully curate and harmonize the data they contain, which reduces the time and effort users need to analyze the data, especially when the data comes from multiple sources. This is particularly helpful for emergency research and innovation.
Please feel free to suggest any other guides you found helpful by contacting us and we may incorporate them.
Elinor Ostrom. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press, 1990.
Grateful for the contribution of Robert Grossman, Ph.D., Frederick H. Rawson Distinguished Service Professor of Medicine and Computer Science, and the Jim and Karen Frank Director of the Center for Translational Data Science (CTDS) at the University of Chicago.