About the project
The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data — terabytes of storage, trillions of records, continuously growing load.
Infrastructure:
~100 servers (bare metal + VPS)
active use of IaC
Kubernetes clusters in production
focus on stability, observability, and automation
The project is long-term — not a hype startup, but a mature product with real users.
What the work looks like
This is a hands-on role with a clear time allocation:
60% — operations and incidents (including helping teams)
20% — infrastructure automation
20% — prototyping, improvements, technical initiatives
The role includes on-call duty, but after-hours incidents typically happen only 2–3 times a year, not every week.
Responsibilities
Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, support for product teams
Traffic analytics and tooling for bot and attack protection
Responsibility for 24/7 platform stability
Requirements
What’s important
4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM
Nice to have
Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers
Technology stack
VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (Operator SDK), Ruby, Node.js, PHP
Benefits
5,000 – 8,000 € net
Format: office / hybrid / remote
Location: Spain (Barcelona and suburbs) or remote (time zone within CET ±2 hours)
Full-time
Opportunity to genuinely influence architecture and processes
Mature engineering team and reasonable expectations