Cloud Infrastructure Architecture (GCP):
Design and evolve GCP-based infrastructure architecture for scalability, resilience, and security.
Define standards for:
Project and environment structure
Multi-region deployments
High availability and failover strategies
Lead architectural reviews for high-impact infrastructure changes.
Ensure infrastructure supports high-scale, multi-tenant SaaS workloads.
Kubernetes Platform (GKE):
Architect and optimize Kubernetes (GKE) platforms for production workloads.
Define and enforce:
Cluster architecture and node pool strategies
Workload isolation and scheduling policies
Upgrade and lifecycle management strategies
Improve reliability, scalability, and operational efficiency of Kubernetes environments.
Networking & Edge (Cloudflare):
Design and manage secure and scalable cloud networking:
VPCs, subnets, routing, and firewalls
Load balancing and traffic routing
Own the integration with Cloudflare, including:
CDN configuration
WAF rules and DDoS protection
Edge security and traffic management
Ensure low-latency, resilient, and secure traffic flows.
Identity & Access Management (IAM):
Design and enforce least-privilege IAM architecture across GCP and platform systems.
Define standards for:
Service accounts and roles
Access control policies
Just-in-time access and auditing
Partner with Cyber Security to continuously improve access posture and reduce risk.
Cloud Security & Platform Hardening:
Build and enforce secure-by-default infrastructure patterns.
Partner closely with Cyber Security teams to:
Identify and remediate vulnerabilities
Implement security controls and guardrails
Support threat modeling and risk assessments
Secure Kubernetes workloads, networking layers, and cloud services.
Infrastructure as Code & Automation:
Drive adoption and quality of Infrastructure as Code (IaC) using tools like Terraform.
Build reusable infrastructure modules and automation frameworks.
Ensure infrastructure changes are auditable, repeatable, and safe.
Reduce manual operational work through automation.
Reliability, DR & Operational Readiness:
Design and improve disaster recovery (DR) and failover strategies.
Define and validate recovery time and recovery point objectives (RTO and RPO).
Partner with SRE teams to improve incident response, system resilience, and operational readiness.
Participate in postmortems and drive systemic improvements.
Performance & Cost Optimization:
Identify infrastructure inefficiencies and performance bottlenecks.
Partner with FinOps and Cloud teams to:
Optimize resource utilization
Improve cost visibility and predictability
Balance performance, reliability, and cost in architectural decisions.
Technical Leadership & Mentorship:
Act as a technical leader across Cloud Infrastructure and Security domains.
Mentor SDE2, SDE3, and Lead engineers.
Drive design reviews, architecture discussions, and best practices.
Influence teams across the organization without direct authority.
Cross-Functional Collaboration:
Work closely with:
Platform Engineering (CI/CD, DevEx)
SRE & InfraOps (operations and reliability)
Cyber Security teams (security and compliance)
Communicate complex technical concepts clearly to stakeholders and leadership.
Qualifications:
Bachelor’s degree in Engineering or a related field, or equivalent experience
9+ years of experience in cloud infrastructure, platform engineering, or security
Deep hands-on experience with:
GCP (preferred) or other cloud platforms
Kubernetes (GKE) in production environments
Cloud networking and distributed systems
Strong experience with:
Cloudflare (CDN, WAF, edge security)
IAM and access control systems
Proven experience designing secure, highly available systems at scale
Strong problem-solving and system design skills
Excellent communication and leadership abilities