In A Nutshell
Take the lead in building our data pipelines and core tables for OpenAI.
Responsibilities
- Design, build, and manage our data pipelines, ensuring all user event data is seamlessly integrated into our data warehouse.
- Develop canonical datasets to track key product metrics including user growth, engagement, and revenue.
- Collaborate with teams across the company, including Infrastructure, Data Science, Product, Marketing, Finance, and Research, to understand their data needs and provide solutions.
- Implement robust and fault-tolerant systems for data ingestion and processing.
- Participate in data architecture and engineering decisions, bringing your experience and expertise to bear.
- Ensure the security, integrity, and compliance of data according to industry and company standards.
Skillset
- 3+ years of experience as a data engineer and 8+ years of software engineering experience overall (including data engineering).
- Proficiency in at least one programming language commonly used in data engineering, such as Python, Scala, or Java.
- Experience with distributed processing frameworks such as Hadoop or Flink, and with distributed storage systems (e.g., HDFS, S3).
- Expertise with an ETL scheduler such as Airflow, Dagster, Prefect, or a similar framework (see the Airflow sketch after this list).
- Solid understanding of Spark and the ability to write, debug, and optimize Spark code (see the PySpark sketch after this list).
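For illustration only, a minimal sketch of the kind of scheduled ingestion job an orchestrator like Airflow manages. The DAG name, task, and `load_user_events` helper are hypothetical placeholders, not part of the role description, and the sketch assumes Airflow 2.4+.

```python
# Hypothetical daily ingestion DAG; names are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_user_events(**context):
    # Placeholder: pull one day of user event data and stage it in the
    # warehouse. A real implementation would ingest data for the run
    # date, which Airflow exposes as context["ds"].
    print(f"Loading user events for {context['ds']}")


with DAG(
    dag_id="user_events_daily",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",           # one run per day of event data (Airflow 2.4+)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="load_user_events",
        python_callable=load_user_events,
    )
```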
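Likewise, a minimal PySpark sketch of the canonical-metric work described above: aggregating a raw event table into a daily active users dataset. The table and column names (`events`, `user_id`, `event_date`, `metrics.daily_active_users`) are hypothetical.

```python
# Hypothetical aggregation of raw events into a daily engagement metric.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("daily_active_users").getOrCreate()

events = spark.table("events")  # hypothetical raw user event table

# Count distinct users per day to produce a daily active users (DAU) metric.
daily_active_users = (
    events
    .groupBy("event_date")
    .agg(F.countDistinct("user_id").alias("dau"))
    .orderBy("event_date")
)

# Persist as a canonical dataset for downstream dashboards and analysis.
daily_active_users.write.mode("overwrite").saveAsTable("metrics.daily_active_users")
```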