Data Engineer

Full-time

On Site

Deadline

January 24, 2025

About the organization

ARTPARK @ IISc

Organization type

Social Impact Organization

In A Nutshell

Location

On Site Bengaluru, Karnataka, India

Job Type

Full-time

Experience Level

Entry-level

Deadline to apply

January 24, 2025

Opportunity to engage with leading experts in disease modelling, climate-health systems, engineering, and public health, both nationally and internationally, in a dynamic and highly motivated environment.

Responsibilities

  • Integrate and structure data from diverse sources into a coherent, harmonised format ready for use by advanced computational models.
  • Develop and automate a robust, scalable data and ETL pipeline using cutting-edge technologies to ensure smooth data flow, reliability, and real-time processing.
  • Work with data analysts and computational epidemiologists to design and deploy simple, accessible, and scalable data access mechanisms and policies while ensuring strict data governance that complies with relevant laws and policies.
  • Engage in exhaustive data cataloguing and documentation for all data acquired from various sources and maintain a repository of the standards and processes used on the data.
  • Streamline the data flow so that computational and simulation modellers can easily access and use the data in their models without manual intervention.
  • Manage different types of data, including complex spatiotemporal datasets, semi-structured and unstructured data, climate data, and image datasets.
  • Apply state-of-the-art data standardisation techniques, leveraging AI and machine learning, including large language models (LLMs), to convert unstructured and semi-structured data into clean, usable formats for production-grade models.

Skillset

  • Bachelor’s degree in computer science, engineering, mathematics, or a related quantitative scientific discipline. A master’s degree is preferred.
  • 3-5 years of experience in similar roles.
  • Demonstrable experience in developing and implementing ETL pipelines.
  • Expertise in Data Engineering and Automation: Proven experience designing and implementing robust data pipelines using tools like AWS cloud services and Python. Prior experience working on and maintaining open-source stacks is highly desirable.
  • Expertise in Database Management and Data Modelling: Deep knowledge of database management, schema design, and data modelling. Working closely with the computational epidemiology team, you will design databases and structures that align with their requirements, ensuring the data is well-organised and ready for analysis.
  • Prior experience with AI and Machine Learning Integration is desirable but not required.
