Charting the ‘Data for Good’ Landscape

There is huge potential for data science and AI to play a productive role in advancing social impact. However, the field of “data for good” is not only overshadowed by public conversations about the risks that rampant data misuse poses to civil society; it is also a fractured and disconnected space. There are myriad interpretations of what it means to “use data for good” or “use AI for good”, which leads to duplicated efforts, nonstrategic initiatives, and confusion about what a successfully data-driven social sector could look like. To add to that, funding is scarce for a field that requires expensive tools and skills to do well. As a result of these enduring challenges, work gets done at the activity and project level, but it does not add up to a coherent set of building blocks for a strong and healthy field capable of solving a new class of systems-level problems.

We are taking one tiny step toward a more coherent Data for Good space with a landscape that makes clear what various Data for Good initiatives (and AI for Good initiatives) are trying to achieve, how they do it, and what makes them similar to or different from one another. One of the major points of confusion in talking about “Data for Good” is that the term treats all efforts as similar by the mere fact that they use “data” and seek to do something “good”. This term is so broad as to be practically meaningless; it is as unhelpful as saying “Wood for Good”. We would laugh at a term as vague as “Wood for Good”, which would lump together activities as different as building houses, burning wood in cook stoves, and making paper, combining architecture with carpentry and forestry with fuel. Yet we are content to say “Data for Good”, along with related phrases like “we need to use our data better” or “we need to be data-driven”, when data is arguably even more general than something like wood.

We are trying to bring clarity to the conversation by going beyond mapping organizations into arbitrary groups and instead defining the dimensions of what it means to do data for good. By creating an ontology for what Data for Good initiatives seek to achieve, in which sector, and by what means, we can gain a better understanding of the underlying fundamentals of using data for good, as well as create a landscape of what initiatives are doing.
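To make the idea of an ontology concrete, here is a minimal sketch of how one initiative type could be recorded along a few dimensions and rendered as a sentence in the “We (…) (…) so that (…) so that (…)” pattern that opens each group description later in this article. The class name and field names are hypothetical choices made for illustration; they are not the schema behind the actual landscape, and the optional “by” clause is simplified to a single field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InitiativeStatement:
    """One initiative type described along illustrative (hypothetical) dimensions."""

    action: str                   # what the initiative does, e.g. "create more"
    target: str                   # what it acts on, e.g. "data"
    intermediate_outcome: str     # what this enables for social impact organizations
    final_outcome: str            # the ultimate aim
    means: Optional[str] = None   # optional "by ..." clause (a simplification)

    def render(self) -> str:
        """Render the statement in the article's parenthesized sentence pattern."""
        by_clause = f" by ({self.means})" if self.means else ""
        return (
            f"We ({self.action}) ({self.target}){by_clause} "
            f"so that ({self.intermediate_outcome}) "
            f"so that ({self.final_outcome})."
        )


# Values taken from the Dataset Providers statement in this article.
dataset_providers = InitiativeStatement(
    action="create more",
    target="data",
    intermediate_outcome="social impact organizations can create more data solutions",
    final_outcome="social benefit increases",
)

print(dataset_providers.render())
```

Recording each dimension separately, rather than storing the sentence as free text, is what would allow initiatives to be compared or filtered by what they do and what they aim for.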

We hope that this landscape of initiatives will help to bring some more nuance and clarity to the field, as well as identify which initiatives are out there and what purpose they serve. Specifically, we hope this landscape will help:

  • Data for Good field practitioners align on a shared language for the outcomes, activities, and aims of the field.
  • Purpose-driven organizations that are interested in applying data and computing to their missions better understand what they might need and whom they might approach to get it.
  • Funders make more strategic decisions about funding in the data/AI space based on activities that align with their interests and the amount of funding already devoted to that area.
  • Organizations with Data for Good initiatives find one another and collaborate based on similarity of mission and activities.

Below you will find a very preliminary landscape map, along with a description of the different kinds of groups in the Data for Good ecosystem and why you might need to engage with them. A few notes on the construction of the landscape:

  • We are focused on initiatives that are enabling and advancing the field of Data for Good, not trying to catalog every data science project every nonprofit or company is doing (though that would be an interesting landscape to see as well). Sticking with our Wood for Good analogy, we are mapping the initiatives like Home Depot, carpentry schools, and forestry management initiatives that allow others to make things from wood. We are not including Habitat for Humanity, which applies wood as part of its mission. There is one group in the landscape, Solution Designers, that comes closest to standing in for that much larger set of projects.
  • We chose to categorize initiatives, not organizations. It would be impossible to classify “Microsoft”, when their activities span funding through Microsoft Philanthropies, creating open satellite imagery through Microsoft AI for Earth, providing infrastructure through nonprofit licenses of Microsoft Azure, and so on. Therefore you may see organizations appear multiple times across the landscape.
  • The current database of organizations is minuscule – it is nowhere near exhaustive. We have only mapped enough initiatives to show the results for feedback, but we encourage you to recommend other initiatives you know of but don’t see here with this form.
  • More importantly, these groupings are just a start. As more organizations are added and more feedback is given, these will evolve. Big groups will split into smaller subgroups, and some groups will cluster together under shared headings. Treat this as a product in beta testing, which will get us to the real landscape map.
  • While we’ve provided some analysis on the various groupings below, you can play with the landscape yourself and find views of it that suit your purpose. Explore, investigate! There is no one way to view this landscape, and we look forward to seeing what you find.
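The choice to categorize initiatives rather than organizations, described in the notes above, can be sketched as a small data model. This is only an illustration: the `Initiative` class and its fields are hypothetical, the entries restate the Microsoft example from the notes, and the short name given to the Azure licensing initiative is our own label rather than an official one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    """A single initiative; the parent organization is just one attribute."""

    organization: str
    name: str
    activity: str


# Entries restate the Microsoft example from the notes above.
initiatives = [
    Initiative("Microsoft", "Microsoft Philanthropies", "funding"),
    Initiative("Microsoft", "Microsoft AI for Earth", "creating open satellite imagery"),
    # Hypothetical label for the nonprofit licensing of Microsoft Azure.
    Initiative("Microsoft", "Microsoft Azure nonprofit licenses", "providing infrastructure"),
]

# Group initiative names by activity, the unit the landscape classifies.
by_activity = defaultdict(list)
for item in initiatives:
    by_activity[item.activity].append(item.name)

# One organization, several places on the map.
organizations = {item.organization for item in initiatives}

for activity, names in by_activity.items():
    print(f"{activity}: {', '.join(names)}")
```

Because the initiative is the unit of classification, a single organization can legitimately land in several parts of the landscape at once.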

If you just want to get to the landscape, read on. However, we feel that the methodology and the design of the landscape are equally, if not more, important, so we highly recommend reading some of the background materials.

This landscape is very much a work in progress, and you can send any and all feedback to jake@data.org. You can also add initiatives that you think should be on the landscape or offer reclassifications (perhaps we categorized your initiative incorrectly!) through this Google form.

Enjoy!

View the Landscape Map on Kumu

The Groups That Form the “Data for Good” and “AI for Good” Landscape


The Dataset Providers

We (create more) (data) so that (social impact organizations can create more data solutions) so that (social benefit increases).

You can’t start a data project without data, and these initiatives will start you off on the right foot. They’re committed to providing datasets that civil society can use to learn from or build algorithms atop. If you’re a nonprofit looking to do work with data that you may not have, you’ll want to look here. You are most likely going to want to work with one specializing in your issue area, but there are general data providers, like the World Bank, that aim to provide national or global datasets that could apply to many issue areas. If you’re a funder, you may want to fund them to create or maintain a dataset that you or your grantees need.

Activities:
  • Releasing data they’ve collected or that others have collected
How would you interact with them?
  • If you’re a social impact organization looking to build a data solution, they could provide the data you need to do so
  • If you’re a funder who believes we need more data available for a certain topic, you might fund them to get it or maintain it
Example initiatives:

The Storage Providers

We (create more) (data storage) so that (social impact organizations can create more data solutions) so that (social benefit increases).

Once people collect data, it’s got to go somewhere. These initiatives provide data storage solutions so that organizations can store, sort, and share their data. This category has the most overlap with the for-profit space, in that most of the tools the nonprofit space is using are simply free versions of for-profit data storage (e.g., Salesforce.org for nonprofits, Microsoft Excel, Google Sheets). Good or bad, most initiatives in this category are general enough to apply to any sector. You’ll no doubt use one of these providers if you’re collecting any data yourself. As a funder, you may want to work with these initiatives to provide your grantees with affordable storage. If you’re a field-builder, you might want to see if these tools meet the field’s needs and, if not, propose social-sector-specific alternatives.

Activities:
  • Providing software that allows organizations to store and arrange their data
How would you interact with them?
  • If you’re a social organization looking to build a data solution, they could provide the storage you need to do so
  • If you’re a funder who believes we need more storage available in the social sector, you might fund them to provide it
Example initiatives:

*Icon credit: data storage by ProSymbols from the Noun Project


The Data Governors

We (advocate) for better (data collection and access standards) so that (social impact organizations use data more effectively) so that (harm is reduced and benefit increases).

The data governors are a unique group. They care about new and better uses and applications of data and data storage, but they don’t provide datasets like the “dataset providers”, nor do they provide storage like the “storage providers”. Instead, they guide the field with research on, and best practices for, how data and storage are provided. For example, you’ll find GovLab’s Open Data Impact report here alongside The Engine Room’s Responsible Data Guide, both of which provide analysis and best practices for using data in the social sector. You’ll also find think tanks like Development Initiatives providing analysis on how data and evidence should be driving impact for certain problems, and groups like Brighthive actively trying to demonstrate new ways of sharing and using data collectively. Lastly, you’ll find advocates arguing for whole new ways of collecting or using data, like Data for Black Lives and Data2X.

Activities:
  • Advocating for changes to data standards
  • Releasing guides of best practices for data sharing and standardization
  • Analyzing open data and standards in the social sector
  • Ensuring observational data is put to use
How would you interact with them?
  • If you are a social organization collecting or sharing data, their guides could teach you best practices
  • If you are a social organization collecting or sharing data, some of these groups could consult to help you do it better.
  • If you believe in their vision, you might fund them to create new standards and norms around data collection and sharing, e.g. Data2X increasing access to gender data.
Example initiatives:

*Icon credit: Data Steward by H Alberto Gongora from the Noun Project


The Data Talent Providers

We (create more) (data outputs) by (applying existing) (data talent) so that (social impact organizations have more data solutions at their disposal) so that (social benefit increases).

No matter how much data literacy the nonprofit and government sectors gain, there are invariably going to be times when we’ll need specialists to come help. This group of initiatives provides data scientists to help the social sector with a range of needs, from problem scoping to solution creation. They’ll likely use data from the client (and from the data providers), but the solutions they create will be unique to the needs of the client. They’re great to consult with if you’re planning a project or don’t have the capacity to do it in-house, and funders could support their grantees by funding these initiatives to provide that capacity.

Activities:
  • Consulting with social sector organizations on their data needs
  • Providing data scientists to solve a wide range of data/AI problems with nonprofits
How would you interact with them?
  • If you’re a social organization looking to build a data solution, they could potentially design and build it with you. More importantly, they could validate whether the scope is realistic.
  • If you’re a funder who wants to see more solutions exist for you or your grantees, you may fund these groups to provide them
  • If you are a data scientist who wants to work with the social sector, you might consider working with them.
Example initiatives:

*Icon credit: Data Scientist by Thibault Geffroy from the Noun Project


The Solution Designers

We (create more) (data outputs) so that (social impact organizations have more data solutions at their disposal) so that (social benefit increases).

The Solution Designers are the most prevalent actors in the Data/AI for Good space. If the other groups are the trainers that allow the sector to create more data solutions, these are the athletes out there doing it. These initiatives seek to create more data solutions in the world, regardless of who builds them. J-PAL, for example, does the analyses and then provides the results to the social sector to use. The Rockefeller Innovation Portfolio funds the creation of more solutions. UN Global Pulse turns internal prototypes it has built into products. Like the other groups, these organizations want to increase the effectiveness of the social sector at solving problems with data science and AI, but here they’re directly creating the solutions that make that possible. As an actor in the field, you might want to use some of the products they create, or you might want to collaborate with them on building new solutions. As a funder, you might want to commission the creation of data solutions to solve specific problems.

Activities:
  • Creating and scaling data solutions to social problems
How would you interact with them?
  • If you’re a social organization with a problem that data could solve, you might want to use one of their solutions.
  • If you’re an organization that builds data and ML solutions for social impact, you might want to talk to the funders in this group.
  • If you’re a funder you might want to commission solutions to a problem from them.
Example initiatives:

*Icon credit: Idea by IconMark from the Noun Project


The DIY Software Providers

We enable organizations to (create more) (data outputs) by (applying existing) (outputs) so that (social impact organizations have more data solutions at their disposal) so that (social benefit increases).

The DIY Software Providers help mission-driven organizations create data solutions themselves. They provide a way for organizations to enter data and then visualize it, analyze it, or sometimes even build algorithms on top of it. For better or for worse, this software is often identical to or adapted from software used by for-profit ventures. If you’re a mission-driven organization that wants to do data analysis on its own, you might consider looking at these options (TechSoup often provides free versions of software like this). If you’re a funder, you might want to consider providing these tools to your grantees or subsidizing them. If you’re a field-builder, you might want to see if these tools meet the field’s needs and, if not, propose social-sector-specific alternatives.

Activities:
  • Providing software that allows people to visualize, model, or otherwise apply data to create a new data solution.
How would you interact with them?
  • If you’re a social organization looking to build a data solution, their software could allow you to build it yourself in-house.
  • If you’re a funder who wants your grantees to be able to build their own solutions you might look into these tools.
  • If you are a field builder, you might see if these tools meet the sector’s needs.
Example initiatives:

*Icon credit: Data by Gregor Cresnar from the Noun Project


The Data Strategy Providers

We (create more) (of the entire pipeline) so that (social impact organizations are able to apply data more effectively overall) so that (social benefit increases).

Up until now, the groups have focused largely on helping civil society utilize data and talent to create new solutions. But what if you just want to be more strategic in how you use data in your operations generally? That’s what the data strategy providers are here for. These initiatives build capacity in civil society, helping nonprofits and government actors design data strategies for their organizations. If you’re an organization looking to build a strategy or undergo digital transformation, these groups might be for you. As a funder, you may want to hire these groups to help your grantees create data-driven theories of change and to ensure you’re using data ethically and responsibly.

Activities:
  • Consulting with social sector organizations on their data strategies
How would you interact with them?
  • If you’re a social organization looking to plan out your use of data, you may consider consulting with one of these groups.
  • If you’re a funder who wants to support your grantees with data strategy work, you might subsidize it through these groups.
  • If you are interested in seeing more nonprofits with better data strategies, you might partner with these organizations or help scale their services.
Example initiatives:

*Icon credit: Data Strategy by Nithinan Tatah from the Noun Project


The Data Talent Trainers

We (create more) (data talent) so that (social impact organizations have more data talent available to work in-house) so that (social benefit increases).

Getting data scientists from the Data Talent Providers can get you some short-term capacity, but what if you want to learn to do data science yourself? Or, as a field builder, perhaps you recognize a painful lack of data talent available for parts of civil society? In that case you’ll want to turn to the Data Talent Trainers. These initiatives create new data scientists and AI experts, in this case specifically for the social sector. If you think more talent is needed or want to try to gain some data skills yourself, look to this group.

Activities:
  • Training new data scientists / increasing people’s data science skills
  • Training data scientists to work in the social sector
How would you interact with them?
  • If you’re looking to learn data science, some of these initiatives provide courses
  • If your strategy relies on there being more data talent in a certain part of the social sector, you may want to work with or fund these groups.
Example initiatives:

*Icon credit: classroom by Rflor from the Noun Project


The Community Builders

We (apply existing) (data talent) by bringing data scientists together so that (data talent in the social sector is supported and happy) so that (they stay in the sector and do good work).

The Community Builders care about supporting data scientists and AI engineers in the field. The “for good” activities of these groups vary by outcome. Some, like Data Analysts for Good, are specifically focused on supporting data talent that works in the social sector, giving them a place to learn, share best practices, and get emotional support from like-minded analysts facing the difficulty of working in the social sector. Others, like Black in AI, care about seeing representation for black people in the field of AI. They provide opportunities for black practitioners to advance in the field, and they form a community for black AI engineers around the world to learn, network, and support one another. As supporters, these organizations often teach, fund, and provide networking opportunities for data scientists of all stripes.

Activities:
  • Provide a community space for affinity groups within data science and AI
  • Advocate for more representation of their groups within the field of AI writ large
  • Run conferences and social events to bring the community together
How would you interact with them?
  • If you’re a data or AI practitioner looking to be in community with folks like yourself, you could join these groups
  • If you’re a field builder, it will be important to know about these communities and what they care about
  • If you’re a funder you may want to support the communities aligned with the outcomes you care about.
Example initiatives:

*Icon credit: Community by Oksana Latysheva from the Noun Project


The Social Data Thought Leaders

We (research and advocate for changes to) (the entire data pipeline) so that (social impact organizations understand the current state of data and so that policies enforce best practice) so that (social benefit increases and harm is reduced).

This is a fairly heterogeneous group of people providing thought leadership to civil society on best practices around data science and AI. They don’t usually specialize in just one part of the data science pipeline, but instead do research and write about whatever trends across the space are relevant. For example, think tanks like Data & Society regularly publish articles and guides about the latest uses of modeling and prediction in civil society. UN Global Pulse creates guides about data privacy, modeling, and algorithms for multinational organizations like the UN. USAID publishes reports on guidelines for investing in ML and AI projects. If you are a civil society leader looking for guidance on the latest in data science and AI, these groups could help get you up to speed.

Activities:
  • Publishing research on the state of data science and AI in civil society
  • Consulting with leaders in civil society on the latest trends in the space
How would you interact with them?
  • As a member of the general public, you may read their publications to understand the latest trends in data and society.
  • As a leader in civil society you may read their publications or consult with them to inform your own programming.
  • As a funder you may fund them to conduct research in the areas you care about.
Example initiatives:

*Icon credit: Thinking by Tippawan Sookruay from the Noun Project


The Responsible AI Advocates

We (research and advocate for changes to) (data outputs and their use) so that (civil society can critically engage with for-profit tech) so that (harms are reduced).

Our final group is primarily focused on reforming the for-profit sector. While their aims are broad enough to extend to civil society, their underlying tenet is that technology built by tech companies is causing social harms as a result of its use, and we must reduce those harms. Here you’ll find folks advocating for regulation of tech companies, shedding light on the ways sentencing algorithms or facial recognition are biased, and making changes in the way business is done at tech companies so that the public can live safe and healthy lives. Their work is probably what the public is most familiar with, from the work Timnit Gebru was doing in Google’s Ethics team, to the Algorithmic Justice League’s movie Coded Bias, to the reports from MIT Press on the harms of AI. This group uses the term AI most prevalently, though many of the issues they highlight stem from traditional statistics (e.g. issues in collecting biased data).

Activities:
  • Reporting on the risks of the misuse of AI
  • Advocating for changes to law and policy to make AI work better for humans
How would you interact with them?
  • If you’re a public citizen, you could read their reports to understand what the state of AI is and what harms it may cause.
  • If you’re an organization using AI, you may use their reports to inform your strategies to reduce harms.
  • If you’re a funder looking to commission AI research, you could fund one of these groups.
Example initiatives:

*Icon credit: ai ethics by Symbolon from the Noun Project


Preliminary Analysis

Hopefully the groupings above help distinguish the types of activities occurring across the “Data for Good” and “AI for Good” space. What now are we to make of these results? Here are a few preliminary observations:

  • The Storage Providers group is almost entirely dominated by corporations. I’m reminded of comments from Kate Crawford and Lucy Bernholz about how the social sector is dealing with some of the most important and sensitive data in the world, on refugees, the underprivileged, and those persecuted by governments, yet almost all of that data lives on Western companies’ servers. The alternative is to keep it all on an individual’s computer, which brings its own security risks. This is not necessarily bad, but it is the one group that is nearly fully dominated by the private sector, so it is worth noting.
  • A lot of groups here – the Solution Designers, the Data Talent Providers, and the DIY Software Providers – have a similar activity of “creating something new out of data”. They differ only in their approach. Some give nonprofits tools to build their own solutions, some provide consultants to co-create solutions with nonprofits, and others build data solutions themselves for the whole sector to use. There seems to be a need to distinguish more clearly between these types of “solution creators”.
  • The term “AI” has supplanted “data” and “data science” in many spaces, yet it’s not clear what distinction the new term draws. In the landscape, “AI” shows up most commonly when talking about thought leadership and research (e.g. AI Now) and in creating guidelines (e.g. government initiatives on AI). Yet the challenges people have with AI today – biased data, lack of control over algorithms, and lack of talent – also show up under initiatives that use the word “data”, like The Responsible Data Forum. The distinction between the disciplines is unclear, and that shows in this chart.

Next Steps

Future research will entail adding more organizations to this landscape and splitting or merging clusters where they need clearer distinction. We also hope to add lenses to the landscape, such as amount of funding in each group, connections between groups, and relative influence of groups, to help better understand the current state of the landscape. Of course we also want to talk about what the future state of the landscape should look like, but that will come after this first version feels a little more baked. You can help with that process by sending general feedback to jake@data.org or adding / reclassifying initiatives in this map through this Google form. Thanks in advance for your feedback!

Acknowledgments

I want to thank everyone who signed up to be an advisor on this project, as well as those who were willing to give me some of their time to craft this first version of the landscape. Huge appreciative thank yous to Olubayo Adekanmbi, Aman Ahuja, Carol Andrade, Afua Bruce, Peter Bull, Kriss Deiglmeier, Maria Dyshel, Chapin Flynn, Matt Gee, Josh Greenberg, Elizabeth Grossman, Mark Hansen, Perry Hewitt, Brigitte Hoyer Gosselink, Claudia Juech, Zia Khan, Tariq Khokhar, Juan Mateos Garcia, Andrew Means, Danil Mikhailov, Josh Nesbit, Craig Nowell, Ben Pierson, Uttam Pudasaini, Giulio Quaggiotto, George Richardson, Michelle Shevin, Sarah Stone, Evan Tachovsky, Jenny Toomey, Stefaan Verhulst, Sherry Wong, Chris Wiggins, Chris Worman, and Ginger Zielinskie.


Jake Porway is a research fellow at data.org. He co-founded and served as Executive Director at DataKind, a non-profit dedicated to using data science in the service of humanity.