Data Engineer

About the organization

Wikimedia Foundation

Organization type

Philanthropy

In A Nutshell

Location

Remote: anywhere in Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Kingdom, United States of America, and Uruguay.

Salary

$101,102-$156,045

Job Type

Full-time

Experience Level

Entry-level

Deadline to apply

August 28, 2025

Contribute to the Data Platform Engineering team’s effort to unify data systems across the Wikimedia Foundation and deliver scalable solutions.

Responsibilities

  • Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka.
  • Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly.
  • Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines.
  • Evolving the Shared Data Platform: Collaborate with peers to improve the shared data platform, enabling use cases such as product analytics, bot detection, and image classification.
  • Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance.

Skillset

  • 3+ years of data engineering experience, with exposure to on-premises systems (e.g., Spark, Hadoop, HDFS).
  • Understanding of engineering best practices with a strong emphasis on writing maintainable and reliable code.
  • Hands-on experience in troubleshooting systems and pipelines for performance and scaling.
  • Desirable: Exposure to architectural/system design or technical ownership.
  • Desirable: Experience in data governance, data lineage, and data quality initiatives.
  • Working experience with data pipeline tools such as Airflow, Kafka, Spark, and Hive.
  • Proficiency in Python or Java/Scala, with working knowledge of the relevant development tools and ecosystem.
  • Knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, CQL, Spark SQL, Presto).
  • Working knowledge of CI/CD processes and software containerization.
