Data Engineer II

Job Description

About the team:
The ML Data Engineering team is at the heart of metadata extraction and enrichment for all of our brands, managing and processing hundreds of millions of documents and billions of images while serving millions of users. We operate at an unparalleled scale, handling diverse datasets including user-generated content (UGC) documents, ebooks, audiobooks, and more. Our goal is to build robust systems that drive content discovery, trust, and structured metadata across our platforms.

Role Overview:
We are seeking a Data Engineer II with a strong background in data engineering, software development, and scalable systems. As part of the ML Data Engineering team, you will design, build, and optimize systems that extract, enrich, and process metadata at scale. You’ll collaborate closely with machine learning teams, product managers, and other engineers to ensure the smooth integration and processing of vast amounts of structured metadata.

Tech Stack:
We work with a range of technologies; those we use on a regular basis include Python, Scala, Ruby on Rails, Airflow, Databricks, Spark, HTTP APIs, AWS (Lambda, ECS, SQS, ElastiCache, SageMaker, CloudWatch), Datadog, and Terraform.


Responsibilities
  • Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types.
  • Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions.
  • Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images.
  • Optimize and refactor existing systems for performance, scalability, and reliability.
  • Ensure data accuracy, integrity, and quality through automated validation and monitoring.
  • Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase.
  • Manage and maintain data pipelines, security, and infrastructure.

Requirements
  • 4+ years of experience as a professional software engineer.
  • Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar.
  • Hands-on experience with data processing frameworks like Apache Spark, Databricks, or similar tools for large-scale data processing.
  • Experience working with systems at scale.
  • Experience working with a public cloud provider (AWS, Azure, or Google Cloud).
  • Hands-on experience building, deploying, and optimizing solutions using ECS, EKS, or AWS Lambda.
  • Proven ability to test and optimize systems for performance and scalability.
  • Bachelor’s degree in Computer Science or equivalent professional experience.
  • Bonus points if you have experience working with Machine Learning systems.