Staff Software Engineer, Batch Compute

Job Description

We are building Rift (talk, blog post) - a new, fully managed compute environment that lets data scientists construct powerful batch and streaming pipelines in Python. Rift leverages popular open-source technologies such as Ray, Arrow, and DuckDB. We also have deep integrations with Spark platforms (Databricks, EMR, Dataproc) and data warehouses (e.g. Snowflake, BigQuery, Redshift), along with performant training data pipelines and a workload orchestration platform.

As a staff-level engineer on the Batch Compute team, you’ll play a critical role in architecting, designing, and scaling the core compute engines and storage architecture used by every Tecton customer. You’ll contribute to the performance of our query optimizer, from query parsing and optimization to plan selection. Think of this team as the “beating heart” of Tecton.

This role is a unique opportunity to combine customer-obsessed product focus with platform and data engineering innovation, helping companies accelerate their path to real-time AI. You will work in one or more of the following areas to build the next generation of Tecton infrastructure:
- Distributed compute and resource management
- Query optimization and distributed execution
- Cross-platform integrations with state-of-the-art data platforms


Responsibilities
  • Own and lead large technical domains, from problem definition and technical requirements through implementation and maintenance
  • Lead multi-engineer projects of strategic importance to Tecton, spanning cross-functional teams including product management and other engineering teams
  • Drive efforts to improve engineering practices, tooling, and processes, and mentor senior engineers
  • Develop a deep understanding of the fundamental problems our customers face in building ML systems
  • Be a generalist as needed. We’re a small but growing engineering team, and each engineer needs to be versatile

Qualifications
  • Experience working in large Python, Java, Kotlin, or Go codebases and running cloud-native Spark systems (e.g. AWS EMR, Databricks, GCP Dataproc)
  • Experience in performance tuning of Spark, Ray, Maestro, or Airflow jobs
  • Knowledge of data formats such as Parquet, Avro, Arrow, Iceberg, or Delta Lake and object storage (e.g. S3, GCS)
  • Expertise in cloud-scale query performance, query optimization, query planning, heuristic query execution techniques, and cost-based optimization
  • Experience with internals of distributed systems, SQL/NoSQL databases, data lakes, or data warehouses
  • Strong communication skills and ability to write detailed technical specifications
  • Excitement about coaching and mentorship of junior engineers
  • BS, MS, or PhD in Computer Science or a related field
  • 8+ years of experience building product software systems
  • 5+ years of experience providing technical leadership to a group of engineers