“Where do I apply this?”
The explosion of online learning platforms, from YouTube channels by star educators to massive open online courses (MOOCs) to platforms like DataCamp and Zindi, has democratized how tech-savvy youth learn data science skills. A simple internet search reveals countless blog posts by data enthusiasts showcasing their technical skills in statistics and machine learning using readily available datasets that may not be domain-specific or representative of global issues. However, talking to many of the same enthusiasts reveals a common frustration: how do I apply these skills to solve “real,” meaningful challenges?
Take the case of Stacy Kozlovska, a data enthusiast from Chernivtsi, Ukraine, who began her data science journey as an active member of Google Summer of Code. After earning an undergraduate degree in marketing and philology, Stacy taught herself data science and joined online forums to build her data skills. She wanted to apply those skills in education, a sector she deeply cares about, and with Ukrainian youth in particular. However, the lack of access to datasets and mentorship, along with scant research opportunities, was a hindrance. These problems are exacerbated in teaching or learning Data Science for Social Impact (DSSI). “It’s one thing to learn a set of tools and apply them to well-curated datasets available online. It’s another game to deal with real-world datasets that are messy and incomplete. A data science education is incomplete without engaging with these real-world datasets,” she says.
Integrating Data Science Education with Domain Expertise
As the need for data scientists across domains—from healthcare to climate change to finance to the public sector—continues to grow around the world, there is an urgent need to re-imagine how data science is taught. While most data science courses focus on core statistical tools, they fail to provide adequate exposure to domain knowledge for context-specific challenges in gathering, cleaning, and understanding data. Thus a data professional who is quite skilled at applying the latest tools to carefully curated datasets used for instruction might find it daunting when faced with messy datasets in the real world. Besides, understanding the nuances of data collection and cleaning is central to gleaning the correct insights from data. Additionally, domain expertise would allow one to ask the most relevant questions about the data being worked on.
“Understanding ‘first problems’ in data (how it is sampled, cleaned, and labeled) is the barrier to becoming an effective data scientist,” says Dr. Bhasi Nair, Director of Data Science at EquiTech Futures, where he develops new project-based data science curricula. “Training the next generation of effective data scientists requires us to equip them not only with the latest statistical tools but also with the ability to ‘think like a scientist’ about how data is collected and to have the intellectual agility to learn about a new domain with which they may not be familiar.”
It takes time to help new talent build an interdisciplinary lens for engaging with data. Workforce Wanted: Data Talent for Social Impact, a recent report released by data.org, the Patrick J. McGovern Foundation, and Dalberg, draws on a review of nearly 200 data talent initiatives and expert interviews with over 30 leaders. It finds that short-term training programs, boot camps, or fellowships lasting under three months have limited impact on learning outcomes. These program types may not run long enough to cultivate a “data mindset,” which requires experiential learning through engagement with sector-specific challenges.
Emphasizing Experiential Learning
Many online and tech training programs have been criticized for their lack of alignment with the demands of impact-focused organizations. Programs that offer internship placements or practical capstone projects to complement in-class teaching demonstrate a greater propensity to support professionals in securing work. For example, Laboratoria in Latin America engages its extensive network of partner companies throughout the training period in delivering talks, providing feedback on student-run projects, judging the final hackathons, and participating in recruiting events. The trust this builds with employers over time and the opportunity it grants them to interact with students on multiple occasions are key drivers of Laboratoria’s high placement rates in high-quality jobs.
An experiential curriculum can be a crucial differentiator between learners who are job-ready and those who are not. Connecting learners with social impact organizations (SIOs) or government agencies to engage with real-world data challenges early on has dual benefits: it helps students learn soft skills and gauge the professional world, while helping the participating SIOs become more data-driven.
The IDEA Challenge: Democratize DS education while facilitating effective learning
A study by Dalberg and Intel, Decoding Diversity: The Financial and Economic Returns to Diversity in Tech, shows that improving ethnic and gender diversity in the workforce could create USD 470 billion to USD 570 billion in new value for the tech industry and could add 1.2 to 1.6% to national GDP. Cathy O’Neil’s book, Weapons of Math Destruction, highlights how mathematical models or algorithms that claim to quantify important traits often have harmful outcomes and reinforce inequality (e.g., by encoding racism or other biases into algorithms).
The legal scholar Kimberlé Crenshaw, through her theoretical framework of intersectionality, posits that multiple social categories (e.g., race, ethnicity, gender, sexual orientation, socioeconomic status) intersect at the micro level of individual experience to reflect multiple interlocking systems of privilege and oppression at the macro, social-structural level (e.g., racism, sexism, heterosexism). While the underlying drivers of exclusion are many, very few data talent initiatives intentionally focus on building and shaping an IDEA (Inclusivity, Diversity, Equity, Access) ecosystem. IDEA can be a foundational asset in advancing access to, the value of, and the impact of DSSI talent initiatives.
An example of this is the Financial Inclusion Accelerator, which adopts an IDEA lens at its core. It was established by data.org and the University of Chicago’s Data Science Institute with a consortium of seven other diverse higher education partners, including Hispanic-Serving Institutions (HSIs), Minority-Serving Institutions (MSIs), and Historically Black Colleges & Universities (HBCUs). These partners are working collaboratively to create a modular, experiential curriculum that will be accessible to a diverse community of students and that could fundamentally change who ‘sits behind the computer’ and has the power to process the data, analyze it, and tell stories through insights from that data.
Looking Ahead
As the Workforce Wanted: Data Talent for Social Impact report highlights, if the latent demand in the social impact space is stimulated, there is potential to cultivate 3.5 million data professionals in the next 10 years. It is up to us to create a diverse and inclusive workforce by ensuring interdisciplinary teaching with domain expertise along with technical skills, experiential learning through engagement with SIOs, and being intentional about IDEA. The opportunity is up for grabs as we build the field of data science for social impact.
About the Authors
Priyank Hirani is the Director of Capacity Building at data.org, where he strategizes and implements initiatives to democratize data skills and enable social impact organizations to be data-driven.
Dr. Abhilash Mishra is the founder of EquiTech Futures. He is also the founding director of the Xu Initiative on Science, Technology, and Public Policy, a research center at the Harris School of Public Policy, University of Chicago.