Why, How, and What of Data Science for Social Impact


WHY

Data science, an unknown term a decade ago, is now inexorably linked to our daily lives. Data about the spaces we live in, the streets we travel, the food we eat, the air we breathe, and the purchasing choices we make are collected, stored, and analyzed to predict what future needs we may have. This is true for a nomadic herdswoman in the Sahel, uninvolved with the vast network of government- or philanthropy-funded satellites tracking climate change and market shifts to better understand her needs and choices, and for the Silicon Valley tycoon seeking out these technologies and vetting her next investment.

With data science driving decisions made by the systems that govern us and shaping the choices we make ourselves, we must determine: how will data science prioritize public good? How do we shape a generation of leaders that will need to use and work with data to shape our future? What can we do to channel this next wave of innovation to put social impact first? And who gets to determine what social impact means?

Currently, a market-first mentality governs the decisions and priorities about what data science can do. While there is benefit in anticipating greater consumerism to grow wealth, these forces are not underrepresented in our design for the future. Instead, investment is needed in data science for social impact, which seeks to answer fundamental questions of human well-being. What problems could we solve, what diseases could we eradicate, if the brightest minds and greatest resources were invested in solving a global health crisis? What data and information about solutions and people involved in the solution could recover coral reefs, and protect wetlands? How can we track and stop the hiding of wealth by the corrupt refusing to reinvest in the public good? What tools can we build to relieve back-breaking labor and leave room for rest, for creativity, and for education?

Gaps in the capacity of non-profit and other mission-driven organizations to embrace technology and data science are often taken up by outside contractors, by corporate social responsibility programs of technology organizations, or by short-term volunteers, but we need solutions that are far more sustainable and independent. Social impact organizations are often charged with ensuring accountability or advocating for changed behavior of those very institutions or companies that are the leading providers of data science support. We need the non-profit and civil society sectors to have the capacity to pursue their own data science needs with as innovative and cutting-edge talent, technology, and tools as any other organization.

We seek a world where the data and information used to anticipate future actions are a mirror, not of the world as it exists now, but of a world that we want to live and thrive in. We know that each of us is more innovative, more prosperous, and our solutions more sustainable when we are diverse, when we take the time to weave in the perspectives, experiences, and expertise of many people, and when we prioritize those most vulnerable. By placing these values, social impact, at the forefront of any investment in data science, we will shape a future that will reflect who we wish to be.

HOW

Knowing why we need to drive data science to prioritize social impact, to create a mirror of the world we wish to live in, we’re then faced with how to achieve it.

Reaching this new prioritization requires reshaping an entire field. Seemingly the greatest need is to hire more technologists and data scientists into government and mission-driven organizations. This is admirable work, and absolutely needed, but to reshape the field a multisector approach is required across education, employment, policy, and culture. What follows is an outline of how to approach this work.

Education
Primary & Secondary

Before considering a redesign of tertiary education and ethics education for data scientists, it is worth revisiting early childhood education and primary education, where for too many, ethics, math, and art are separated. Children are too often driven to choose between maths and hard science, or humanities and soft sciences. By valuing and investing in both, we will support the kind of interdisciplinary mindset and mentality needed to be great data scientists and great leaders. There’s a need to reshape our vocabulary and our shorthand for this: “I’m a linear thinker,” “I have an engineer mindset,” or “I’m a creative thinker, I see all parts.” Reshaping the field will happen with leaders who are able to use all sides of their brain, and approach problems from many different perspectives and expertise.

Post-Secondary

Education in interdisciplinary studies provides for a world and workforce with a better understanding of the complexities of difficult problems, but even students who are now better positioned to take advantage of higher level education need training. Working with professors and leaders in universities, we can ensure that those entering into data science, computational science, and statistics paths are also required to study public good, philosophy, and ethics. Conversely, we must also ensure our future leaders in public policy and mission-driven organizations are oriented to data science and concepts, to be able to communicate across expertise. Programs exist and are growing at the University of Pretoria, Boston University, Washington University, and the Lee Kuan Yew School of Public Policy, along with courses and initiatives at NYU, Virginia Tech, and many others. More sharing of curricula and outcomes is needed to support additional programs.

Employment
Fellowships

Students ready to enter the workforce will seek a career that is personally fulfilling, that can help others, but that they can expect will grow their talent and expertise to ensure a living wage. Currently, it is rare that an organization is wholly mission driven and has the infrastructure and capacity to nurture and grow technical talent. Most NGOs are in a technical transition period, plugging gaps where they can and trying to reimagine how to become data driven while keeping their core work as their priority. To begin, fellowships have a demonstrated benefit in introducing talent to new fields and nurturing talent in a way that an individual organization may not have capacity to do. Rayid Ghani’s Data Science for Social Good fellowship has had success with early career fellows. Programs from Mozilla Foundation, Aspen Tech Fellows, or Atlas Corps Tech Fellows are ripe for connecting mid-level leaders to NGOs in transition and seeking expertise. More programs are needed, and to ensure that education is successful more fellowships are needed that are tied in with universities, to support early career exploration of technologists into social impact sectors.

Data Science Roles

Currently, individuals and organizations poised to benefit from, and influence, the field are driven into the private sector. This is partly due to the larger salaries and job stability that are available, but also due to the fact that companies have recognized the need to invest in their technology infrastructure, the education of their employees, and the quality of their data to be able to be influential. The shift to integrate technologists in non-profits and other mission-driven organizations requires a massive expenditure and investment up front. While more recent mission-driven organizations like Community Solutions, Policing Equity, or Code for Africa can build data and data science into their architecture at the outset, long-standing NGOs and civil society organizations too often lack the financial capacity for organization-wide change. Instead such institutions defer needed upgrades and upskilling of staff in favor of stewarding the work with partnerships and relationships that they have cultivated and built trust with over years and decades. Engaging in discussion and recognition of how to transform an organization can be daunting, and it can be tempting to just hire data scientists and hope everything will be solved. But the work of transformation requires a whole-of-organization shift. Most early work is simply in cleaning, inventorying, and categorizing data to get a sense of what data is collected and useful, how an organization makes decisions, and what data informs these decisions. Organizations need to do the hard work themselves of diagnosing what talent they need to be on the path of digitization and transformation, and hiring a data scientist is likely not the first step. The field itself needs support for organizations seeking transformation, and more clearly defined roles within the organizations for those with technical talent to have their hard-developed skills and talent well-used.

Transformation is hard. It can be easier to start an organization from the beginning than to try to reshape an existing system. But because so much of the work and relationships are entrenched, and trust has been built and shaped over decades, it is worth the effort to update these large institutions. Many organizations have tried and stumbled in their transformation, but it is encouraging to see INGOs like World Resources International and DonorsChoose set up infrastructure and hire leaders to make these investments.

Leadership

None of these transformations would be possible without leaders who recognize and integrate evidence- and data-driven decision making into their organizations and across their teams. Not all leaders need to be data scientists, but every good leader must value data and data scientists enough to place them into positions of authority and power. Organizations like NetHope have sought to do this and have worked hard on digital transformation of organizations, but too often those with data expertise are placed into a vertical of digitization, and not included in leadership decisions. Leaders that include data science in their decision making and empower technical leads across the organization are needed to reshape the field.

Policy

We need local and global policies in place that prioritize social impact and protect the most vulnerable from harm and bias in data science. Our most sweeping policies, including the EU’s General Data Protection Regulation (GDPR), prioritize protections from corporate overreach, which is absolutely right, but neglect the development of policies that prioritize social impact. Protections currently in place, for example, place a far greater burden on researchers seeking information for a public health response than for a private company conducting market research. We need deliberate efforts to place social impact first at all levels of data collection, storage, analysis, and application.

This applies not only to global policies but also to the organizational and operational policies of individual organizations. In Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing, a team of current leading AI ethics researchers identifies the shortcomings of an organization’s statement of principles, and advocates for operational policies that reinforce accountability in aspiring for social impact in a field with shifting boundaries and untested ideals.

The framing of “social impact” is also worthy of scrutiny. Who defines social impact and social good, and how do these definitions evolve? Fortunately, as the founder of the Engine Room Alix Dunn writes, the field of human rights has laid a growing foundation of how to identify and steward social good, who should protect the definitions, and how to refine and improve them. Those organizations and individuals working in this field for decades have developed language around what values to prioritize and protect: dignity, inclusion, fairness, equity. We need tools to be driven by those with the expertise and experience, those who have dedicated their lives to combating domestic violence, homelessness, pollution, and bigotry in all forms. We need data science leadership from those steeped in fighting for equity in access to food, education, work, health, and liberty, what the United Nations Human Rights Council calls the rights “that make life worth living.”

Culture

In this moment, the promise and potential is in how the world, we the people, will use technology. As our lives move swiftly online, whether or not we choose to connect, the facts about ourselves are getting digitized and quantified, stored and analyzed to provide insights about imagined prosperities and futures. The world we work toward must be informed by and reflect the world we wish to live in. This means that the people involved in decision-making, and the data that represents them, must represent the diverse background of experiences, interests, expertise, and goals that are integral to our future, as we expand beyond a world where a few represent the interests of the many.

To realize this complexity, we need language that cuts across many disciplines. A data scientist is by definition interdisciplinary, someone who has expertise in mathematics, computer science, and a subject matter. We need a better common language and vocabulary that is simple and resonates with multiple fields of expertise.

With this greater familiarity with the concepts of data science, of evidence in decision making, of predictive analytics, of flaws in training data and bias in algorithms, of equity, fairness, and transparency, we will build toward data science for social impact.

WHAT

Civil society is overdue for its own data revolution. And we all need it to undergo one. What we may find in this transformation is a rise of ethicists, those at the front lines who have built their careers and expertise on working with and listening to the most vulnerable, the most affected, the voices who are too often lost or intentionally cut out of decision-making. Faced with a very real person in front of you, it can be hard or impossible to imagine reducing their story, their life, to a datapoint. But without civil society serving to translate, governments will continue to build their conclusions based on their own understanding, often too many steps removed from frontline experience, and guided by a private sector that has been able to build incredible tools and machinery with the data that is possible to gather and understand. In A World That Counts, the UN publication that helped launch the Global Partnership for Sustainable Development Data, charged with helping countries improve their data capacity to achieve the Sustainable Development Goals, the researchers and authors identify that the world we are building is being built for those whose needs can be reflected in the data. As we continue to integrate algorithms and AI into decision-making about where roads are built, how vaccines are distributed, and who should receive a longer prison sentence, we need an equipped civil society advocating, influencing, and writing these algorithms. Dunn writes that the conversations about ethics in data do not require new frameworks but a return to the principles of human rights. These fundamental principles are continually refined in how they apply to new issues and new concepts, including any conversation and debate on data and data use. The human rights advocates and those steeped in this work must be at the forefront of the conversation, should they only have the literacy to do so.

What can be done? This work is ongoing, and the gaps and needs will ebb and shift as more people and more organizations recognize the need for and invest in data science for social impact. To continue to grow and build this field, there are some concrete actions that can be taken:

Telling Stories: We must share the stories of what is possible with data science, and flood the market with stories of how data science is used for social good and not for ill. This is not to discount shining a light and transparency on the great harms currently underway with data science, from bias in policing to test taking to healthcare, but we must balance these stories with the potential benefits of data science to attract a generation to contribute to good. Preventing student drop-outs, anticipating a crop infestation, and investing in municipal responses to flooding to curb cholera outbreaks are all active and needed uses of data science for social impact.

Sharing Examples of Commendable Leadership: Data scientists come from anywhere, and many have not followed a linear path to serve as a data scientist focusing on social impact. Students entering the field must learn the origin stories of those making an impact, to demystify what might otherwise seem a daunting journey.

Invest in Capacity Building: Students, researchers, mission-driven organizations, civil servants, and the private sector all need to grow capacity in data science for social impact. Whether from a technical background, social background, or interdisciplinary, all sectors will benefit from greater training and upskilling for fair and equitable data science. Many of these trainings and toolkits already exist, and will only be taken up when those who have experience participating in training and building capacity.

Public Goods: Like any sector, data science for social impact needs basic infrastructure to grow. Online courses anyone with partial internet access and a laptop can access, developed by and for multiple languages. Better training data to curb current biases in algorithms. Improved assessments of whether data science is the best tool to use. Models and best practices of how to transform an historic mission-driven organization to incorporate data-driven decisions and data science leadership. Better openly shared policies to address the use of personal data by private companies and public institutions alike. A public commons of tools, resources, trainings, and mentorship is vital to moving the field forward.

How this field is built is also vital. We cannot together build a field that prioritizes social impact if it is not part of the work that we do every day. Some key principles, though there are many to include and expand:

Multisector Partnerships: Genuine partnerships, meaning not just an invitation to the proverbial table but designing the table, setting the agenda, and updating the invite list, are needed in any meaningful work to support data science for social impact. Partnerships are complex and can be long and messy work, but meaningful engagement with multi-sector organizations that can confidently state their perspectives were valued, considered, and where possible integrated, leads to greater trust and more sustained impact for any work.

Transparency in Design and Execution: Building a field is not a straight and knowable path. The choices made are specific to the needs and context of a time, and will change as the field itself changes and grows. Taking on the principles of open source, any work should be knowable, the decisions made, whether agreed with or not, should be understandable, and the work should be able to be picked up and continued by others.

Invest in the Community: Whenever working at the edge of a field it’s always tempting to go with the experts, those who have worked for decades and have established success. But just as we need good data to create good models, we need a next generation of data scientists for social impact that understand the current work to know best how to improve upon it. Including students, advocates, volunteers, and early career professionals in the work of building the field will help in ensuring the next generation are exposed to the potential of data science for social impact, and invest in advancing their career to be leaders in the work ahead.

Feedback Loops: Ensure that any work is available for scrutiny and improvement by those directly affected, particularly the most vulnerable. This improves the work and product overall, builds trust in the work, and attracts more talent to help future improvements.

The work of data science for social impact is current, evolving, and expanding. Data science is the field that is shaping and will shape the future of our lives, how we work, how we collaborate, how we govern ourselves, and how we grow. As we build our shared future, we have the opportunity to design and prioritize systems that represent the best of ourselves, that serve as a mirror not of the society we are now but of who we hope to become. To build a more perfect future, we must put social impact at the forefront of all our data science work.

This long-form piece represents the personal views and reflections of Kat Townsend, Executive Director, Open Data Collaboratives, and is informed by her many years of work within policy and data, including being COO of data.org between January 2020-2021. It’s the first in data.org’s series of thought pieces by our partners, collaborators, and contacts in the DSSI sector.