Build deep context on Arcadia’s platform, production risks, and operational practices. Participate in on-call/incident response and quickly improve signal quality for at least one critical domain (dashboards, alerts, traces, runbooks). Identify a high-leverage reliability initiative and align stakeholders on scope, success metrics, and milestones.
In 6 months
Establish SLOs/error budgets for key customer journeys, drive operational readiness standards for launches, and lead remediation for recurring incidents with measurable reductions in customer impact and MTTR. Deliver major toil-reduction improvements via automation and self-service workflows.
In 12 months
Own and execute a reliability program with cross-org impact (e.g., GitOps delivery guardrails, observability platform evolution, resilience/DR improvements, or secure infrastructure controls). Influence architecture decisions, establish org-wide operational standards, and mentor Staff engineers—raising the reliability and security bar across Arcadia.