TableCheck, Japan’s leading restaurant reservation management platform, is seeking a Site Reliability Engineer. As a member of our SRE team, you will own the technology stack and help support our scaling demands.
We run a robust, fault-tolerant infrastructure built on Amazon Web Services (AWS) with Terraform, Kubernetes, Helm, and an array of tools for CI/CD, logging, and monitoring. We emphasize DevOps best practices such as Agile, Scrum, automation, and customer-centric improvements.
TableCheck has embraced remote work, so communication and documentation are in our blood. We look for and write about signals in the noise, which enables us to keep learning and adapting, and we expect team members to follow up regularly with questions and updates to keep everyone in the loop.
Responsibilities include:
- Build and maintain a 24/7 production environment running on Kubernetes
- Implement DevOps methodologies to improve the IT team's quality of life
- Proactively monitor and configure systems
- Respond to incidents
- Mentor and empower team members
Mandatory Skills
- Progressive experience in both software engineering and infrastructure / DevOps, with at least 1 year as a technical lead
- Current proficiency in at least one of the following languages: Ruby, Elixir, Scala, Go
- Amazon Web Services (AWS)
- Kubernetes
- Configuration management (YAML / Bash)
- Experience running production systems at large scale, and an understanding of the kinds of problems that can occur along with likely solutions
Optional Skills
- Previous startup experience highly desired, with at least 1 year in a technical lead role
- Security, PCI-DSS, GDPR, forensics, etc.
- HashiCorp stack (Terraform, Consul, Vault)
- ArgoCD
- Prometheus
- Grafana
- PostgreSQL
- MongoDB
- Kafka
Language Skills
English is required. Japanese is a plus but not required.