In A Nutshell
Design and operationalise pipelines that bring together structured health data, conversational AI, and predictive analytics, translating complex systems into usable, ethical, and field-ready digital tools.
Responsibilities
- Design and implement scalable data architectures integrating Google Sheets, Excel, and government systems into cloud databases (PostgreSQL, BigQuery).
- Develop APIs and ETL workflows for data ingestion, transformation, and retrieval across GCP-based systems (a minimal ingestion sketch follows this list).
- Design and orchestrate Gemini/LLM pipelines for conversational reasoning, data interpretation, and predictive insights.
- Build ASR–LLM–TTS pipelines optimised for multilingual, low-resource contexts (Hindi + regional languages).
- Manage embeddings and vector databases for contextual retrieval and knowledge grounding.
- Translate backend intelligence into usable insights for health workers, delivered through dashboards, chatbots, and community feedback loops.
- Collaborate with program teams to ensure AI models reflect real public health needs, ethical standards, and local contexts, and are validated against field realities, including low-connectivity environments.
- Support rapid data visualization for program dashboards and government review systems.
- Establish data security, versioning, and model monitoring best practices.
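
For illustration only, a minimal sketch of the kind of ingestion step described above, assuming the pandas and google-cloud-bigquery packages; the export file, the `record_id` column, and the destination table are hypothetical placeholders, not a prescribed implementation.

```python
import pandas as pd
from google.cloud import bigquery


def load_facility_export(xlsx_path: str, table_id: str) -> None:
    """Load one Sheets/Excel export into a BigQuery table (illustrative sketch)."""
    df = pd.read_excel(xlsx_path)

    # Normalise headers so downstream SQL and dashboards see stable column names.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Basic validation: drop rows missing the (hypothetical) record identifier.
    before = len(df)
    df = df.dropna(subset=["record_id"])
    print(f"Dropped {before - len(df)} rows without record_id")

    # Load into BigQuery; .result() blocks until the load job completes.
    client = bigquery.Client()
    client.load_table_from_dataframe(df, table_id).result()
```

In practice a step like this would sit behind a FastAPI endpoint or a scheduled Cloud Run job, with structured logging and schema validation in place of the print statement.
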
Skillset
- B.Tech/M.Tech in Computer Science, Data Science, or related discipline.
- 3–6 years of experience in backend, data engineering, or AI-driven product development; exposure to health, GovTech, or social impact data preferred.
- Languages & frameworks: Python, with libraries such as Pandas, FastAPI, LangChain, and SQLAlchemy.
- Databases: PostgreSQL, BigQuery, SQLite; vector DBs such as Pinecone, FAISS, or Chroma.
- AI/LLM: Gemini API, LangChain, prompt design and orchestration.
- Speech Tech: Experience with ASR (Whisper, Google Speech) and TTS (Coqui, ElevenLabs); see the combined sketch after this list.
- Experience applying NLP to unstructured text/audio for community feedback and AI-enabled sensemaking.
- Cloud: Google Cloud Platform (Cloud Run, Cloud Functions, BigQuery, Secret Manager); experience with containerized workflows using Docker.
- Data Pipelines: End-to-end ETL development, schema design, data validation, logging, and performance monitoring.
- Visualization: Experience with tools such as Streamlit, Gradio, Looker Studio, and Power BI, plus user journey mapping, for rapid analytics and insight generation.
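
For illustration only, a minimal sketch of the voice-note-to-insight loop implied by the stack above, assuming the openai-whisper, chromadb, and google-generativeai packages; the audio file, collection contents, and model names are placeholders, not a prescribed implementation.

```python
import os

import chromadb
import google.generativeai as genai
import whisper

# 1. ASR: transcribe a health worker's Hindi voice note (file path is a placeholder).
asr_model = whisper.load_model("small")
note_text = asr_model.transcribe("field_note.wav", language="hi")["text"]

# 2. Retrieval: ground the note against previously indexed programme documents.
store = chromadb.Client()
docs = store.get_or_create_collection("programme_notes")
docs.add(
    documents=["ANC visit protocol ...", "Referral guidelines ..."],  # placeholder content
    ids=["doc-1", "doc-2"],
)
context = docs.query(query_texts=[note_text], n_results=2)["documents"][0]

# 3. Generation: ask Gemini for a summary grounded in the retrieved context.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
llm = genai.GenerativeModel("gemini-1.5-flash")
prompt = (
    "Context:\n" + "\n".join(context)
    + "\n\nVoice note:\n" + note_text
    + "\n\nSummarise the key issue and suggest one next step for the health worker."
)
print(llm.generate_content(prompt).text)
```

A TTS step (for example Coqui) would close the loop by reading the response back in the worker's language; that part is omitted here.
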