As a Senior Site Reliability Engineer (SRE), you will be instrumental in helping the organization reach its full potential by ensuring our products scale efficiently, securely, and reliably. In this role, you will collaborate with engineers across the company, acting as an enabler and removing friction to support product growth and innovation. Your expertise will be pivotal in helping us continue to innovate at speed.
Responsibilities
- Platform Modernization: Help lead, define, and execute the technical roadmap for our cloud platform, which spans AWS (EC2), Kubernetes, CI/CD (CircleCI / ArgoCD), observability (Datadog), IaC (Terraform / OpenTofu), and tooling (GitHub Actions / Helm).
- Service Collaboration: Partner with internal service teams to accelerate their efforts, automate parts of their work, and drive adoption of tooling and platform releases. You will proactively contribute changes across codebases and work with colleagues to achieve reliability goals across our products.
- Infrastructure Optimization: Implement improvements that increase efficiency and reduce costs across every layer of our AWS infrastructure, including networking, storage, compute, and orchestration.
- Security: Collaborate with our security and IT teams to ensure that vulnerabilities are promptly identified and addressed, that access management is automated and seamless, and that we maintain our high security standards.
- AI: Explore effective uses of AI in SRE and engineering, and build tooling and workflows that scale those uses to organization-wide adoption. This may include tools such as Claude Code and AWS DevOps Agent.
- Guidance and Best Practices: Offer expert advice to new projects, ensuring they are architected for maximum reliability and efficiency from the ground up.
- Grow Your Team: Your expertise will help your teammates learn and grow. You will share knowledge widely and mentor more junior engineers as they progress in their careers.
Requirements
- 5+ years of experience in an SRE, DevOps, software engineering, or related role, working with distributed systems at scale.
- Experience with AWS and related cloud technologies (e.g., EKS, EC2, IaC).
- Proficiency in at least one programming language (e.g., Python or Go).
- Experience with modern SaaS development practices (e.g., Git, CI/CD).
- Strong English communication skills, with experience working in a diverse, collaborative, and distributed environment.