In A Nutshell
Responsible for maintaining data management systems and deploying machine learning models within those systems.
Responsibilities
Design, build, test, and maintain machine learning pipeline architectures (70%)
- Produce high-quality, reusable code for data ingestion, validation, and processing pipelines
- Architect and implement end-to-end ML pipelines including training, retraining, and inference systems for schools using the SST
- Design and build APIs to easily access, integrate, and manage data from different sources
- Ensure data infrastructure is in compliance with data governance and security policies
- Create comprehensive documentation for data infrastructure and ML pipelines, tailored for both technical and non-technical stakeholders
- Advance internal analytics reporting and automation capabilities as needed
Provide direct data support to partners (15%)
- Manage initial data lifecycle processes for new school onboarding including ingestion, transfer, audit, and validation
- Collaborate with data platform partners on integration and data transfer pipelines
- Provide technical guidance to partners on how to share data formatted in alignment with our data model and with appropriate data governance measures
- Address partner concerns regarding data security and ensure their specific requirements are satisfied
- Support data science initiatives through processing, cleaning, and analyzing data as needed
Collaborate and contribute across DataKind (15%)
- Support other data team members through code reviews and knowledge sharing across products
- Collaborate with the Product, Engineering, and Research teams to ensure seamless integration and alignment of work
- Effectively communicate project status and manage expectations with internal teams and partner organizations
- Maintain accurate and current project information in project management tools like Asana
Skillset
Required
- Alignment with DataKind’s mission and values, including our commitment to anti-racism
- Experience working across lines of difference (culture, identity, and time zone)
- At least 3 years of professional work experience in developing and deploying a machine learning product at scale
- Foundational understanding of machine learning and statistical methods for predictive modeling
- Expert in Python
- Experience with cloud computing (GCP preferred)
- Experience with databases and query languages (e.g., SQL, Postgres) and distributed data processing tools (e.g., PySpark)
- Experience with Databricks or a similar data intelligence platform
- Experience with data warehousing, orchestration, integration, and ETL tools
- Experience with modern source code management and software repository systems (e.g., Git)
- Experience documenting and implementing RESTful APIs
- Proven track record of successfully managing full life-cycle machine learning implementation projects with multiple stakeholders
- Solid understanding of Software Engineering principles and best practices and the data science project life-cycle
- Comfort and skill in communicating highly technical information to semi-technical and non-technical audiences
- Self-motivated, results-driven, and persistent in the face of challenges
Preferred
- Experience integrating data from SaaS providers
- Experience in the nonprofit sector and/or in a small startup organization
- Experience scaling machine learning products, including managing data quality and volume
- Certifications in cloud computing
- Advanced experience in machine learning, with confidence in applying, tuning, and evaluating a wide variety of algorithms
- Experience with software development and/or web development work (frontends, dashboards, etc.)
- Track record of strong technical writing for a variety of audiences
- Proven track record of (internal or external) client service orientation