Kentro

Service Reliability & Operations Manager (VA ESOM)

Job Description

Thank you for considering IT Concepts dba Kentro, where innovation drives opportunity and collaboration leads to success. Our dynamic community of experts is fully committed to advancing our customers' missions, fostering professional growth, and making a positive impact on our communities.

Our transition to Kentro in 2025 reflects a rich legacy built upon the foundation of IT Concepts. Rather than leaving ITC behind, we confidently embrace a future centered around the Core of More. By joining our supportive community, you will find that Kentro is dedicated to your personal and professional development. Together, we can drive meaningful change, spark innovation, and achieve extraordinary milestones.

Kentro is seeking an experienced Service Reliability & Operations Manager to support our VA ESOM (End Point Support and Operations Monitoring) contract across the United States. The Service Reliability & Operations Manager is responsible for ensuring the stability, performance, and resilience of enterprise IT services. This leader oversees real‑time monitoring, major incident response, application performance management, and sustainment of critical integrations. The role drives operational excellence through proactive detection, rapid response, and continuous improvement, partnering closely with engineering, infrastructure, and service management teams.

Reporting to the Senior Operations Director, this manager plays a pivotal role in maintaining service health for a 1,000+ person organization and ensuring customers experience consistent, reliable, high‑quality services.

Key Responsibilities:

Service Reliability & Monitoring:

  • Lead teams responsible for Application Performance Monitoring (APM), observability, and “eyes on glass” 24/7 monitoring functions.
  • Ensure proactive detection of service degradation and performance anomalies.
  • Drive adoption of modern monitoring tools, dashboards, and alerting frameworks.

Major Incident Management:

  • Oversee the major incident process, ensuring rapid triage, escalation, communication, and resolution.
  • Serve as the escalation point for Critical/High incidents and coordinate cross‑functional response.
  • Conduct post‑incident reviews and ensure corrective actions are implemented.

Integration & Sustainment:

  • Manage sustainment of critical integrations, ensuring reliability, version alignment, and lifecycle management.
  • Partner with engineering teams to ensure smooth handoffs from project delivery to steady‑state operations.
  • Maintain documentation, runbooks, and operational readiness standards.

Operational Excellence:

  • Track and improve KPIs such as MTTR, service availability, alert fidelity, and incident volume trends.
  • Identify systemic issues and drive continuous improvement initiatives across operations.
  • Ensure alignment with ITIL processes, especially incident, problem, and change management.

Leadership & Team Development:

  • Lead, mentor, and develop a team of analysts, engineers, and incident managers.
  • Foster a culture of accountability, collaboration, and operational discipline.
  • Build succession plans, training programs, and career pathways for operational staff.

Cross‑Functional Collaboration:

  • Partner with other ESOM teams to ensure end‑to‑end service reliability.
  • Work closely with the PMO on readiness for new services, innovation pilots, and portfolio changes.
  • Provide clear, concise communication to leadership during incidents and operational reviews.

Location: Telework approved; must be able to travel as needed to regional locations.

Salary Range: $160K-$175K. Factors influencing pay within this range include geography, market demand, skills, education, experience, and other qualifications of the successful candidate.

Requirements

Education:

  • Bachelor's degree in computer science, electronics engineering, or another engineering or technical discipline.

Experience:

  • 10+ years in IT operations, service reliability, or incident management, including 5+ years managing managers and large teams.
  • Experience overseeing large teams while supporting a Federal client.
  • Proven experience leading multi-site IT operations and large-scale teams (400+ employees).
  • Strong background in ITIL practices, incident management, and customer support operations.
  • History of collaboration and flexibility, including developing innovative solutions to challenges facing geographically distributed teams.

Skills:

  • Exceptional leadership, coaching, and interpersonal communication skills.
  • Strong analytical and problem-solving skills with a data-driven mindset.
  • Ability to build and maintain strong client relationships and manage escalations effectively.
  • Experience with APM, observability platforms, enterprise monitoring tools, and KPI reporting.
  • Ability to prioritize work and self-direct with minimal input.
  • Strong messaging capabilities to build team cohesion, focus, and ongoing drive.

Preferred Skills:

  • ITIL Certification
  • Experience with end-user technologies and concepts

Key Competencies:

  • Strategic thinking with a focus on operational excellence.
  • Ability to influence and inspire large teams.
  • Results-oriented with a track record of delivering high customer satisfaction.
  • Adaptability and resilience in a fast-paced, multi-client environment.

Clearance requirement:

  • U.S. citizen or green card holder
  • Willing and able to obtain a Public Trust Suitability clearance
    • Must meet updated ID requirements: 
    • If you do not currently meet the ID requirements outlined, you must be willing and able to update your current forms of ID in a timely manner to complete the suitability process successfully.