Job Description

Strategy Creation: Collaborate with cross-functional teams to define the data engineering strategy aligned to business objectives, including data modeling that unifies data assets across a range of source systems used to manage the operations of our partnering hospitals.
Pipeline Development: Define and execute the processes needed to develop, test, deploy, and maintain high-quality data pipelines. Oversee the end-to-end development of data pipelines, from source data extraction through to production-grade analytical dataset delivery, ensuring data quality and security at every stage.
Performance Optimization: Continuously monitor and optimize data processing performance and efficiency. Identify and address bottlenecks, optimize query performance, and improve overall system stability.
Data Governance: Establish and enforce data quality management policies, data access controls, and data privacy standards.
Technical Leadership: Stay abreast of the latest developments in engineering tools and best practices. Provide guidance to the team on technical challenges.
Documentation: Maintain clear and comprehensive documentation of data pipelines, architecture, and processes to ensure knowledge sharing and team continuity.
Third-party Management: Evaluate and manage relationships with third-party vendors and tools, making informed decisions about when to leverage external solutions.

Requirements

• 3+ years in data engineering roles in a production environment

• Advanced proficiency in Python and SQL for data engineering

• Up-to-date knowledge of Databricks and 1+ years of experience using it for Lakehouse management

• Deep understanding of data modeling, data architecture, and data integration best practices

• Strong hands-on experience with Apache Spark

• Familiarity with data governance, security, and privacy principles

• Comfort using Git or an equivalent to manage the software development life cycle

• Exceptional ability to learn and use new software development techniques and tools

• Ability to manage multiple projects simultaneously

• High-energy, humble team player with a “get it done” attitude who seeks collaboration with colleagues

Preferred Qualifications

• Experience with the Azure cloud ecosystem

• Experience developing production-ready, real-time machine learning model serving pipelines

• Comfort developing in the Apache Spark Structured Streaming paradigm

• Experience working in a private equity-backed services company

• Experience deploying machine learning models with MLflow or equivalent

• Experience developing CI/CD pipelines

Senior Data Engineer

Job Description

USA Only

Data Engineer

2 months ago
