Responsibilities
- Lead our global team of engineers to achieve operational excellence. Discover modern best practices and make them available through productivity-increasing tooling
- Empower teams to provide reliable systems. Facilitate the adoption of service level objectives (SLOs), on-call rotations, and blameless post-mortems
- Drive the construction of a multi-region, cloud-native infrastructure that delivers performance to our global user base
- Optimize our processes for reducing infrastructure costs and tuning our fleet of instances for maximum efficiency
Requirements
- Experience administering container orchestration platforms, preferably Kubernetes and demonstrated by the Certified Kubernetes Administrator (CKA) certification
- Experience in observability (monitoring, logging, and tracing) for cloud-based environments, using open-source tools such as Grafana, Prometheus, Thanos, and Jaeger, or SaaS tools such as Datadog
- Expertise in maintaining self-service CI/CD platforms and supporting techniques like trunk-based development, GitOps-based deployments, or automated canary releases
- Production experience in at least one programming language (e.g. Go, Java, Kotlin, Python, Ruby, or JavaScript)
- Business-level communication skills in English
Nice-to-haves
- Strong knowledge of fundamental AWS services, with an AWS certification at the Associate level or above
- Proficiency in operating public cloud services using infrastructure-as-code tools like Terraform or CloudFormation
- Passion for staying up to date with the latest industry topics in Site Reliability Engineering (SRE), the Cloud Native Computing Foundation (CNCF), DevOps, and the AWS Well-Architected Framework
- Knowledge of common application architecture patterns: distributed systems, microservices, asynchronous processing, event-driven systems, and others