As a Site Reliability Engineer at Sales Marker, your role will be critical in ensuring the reliability, performance, security, and efficiency of our Sales platform.
Responsibilities
- Managing, fixing and improving the Kubernetes cluster.
- Designing and managing AWS infrastructure (SQS, Lambda, RDS, OpenSearch, etc.) with a focus on improving the developer experience.
- Collaborating with engineering teams to optimize system performance, automate tasks, and enforce security measures.
- Setting up monitoring systems for different services and pipelines, and proactively improving them for better developer awareness.
- Implementing infrastructure as code (using Terraform and AWS CDK) and organizing the infrastructure resources.
- Supporting developers in developing and maintaining data pipelines.
- Handling incident response, including security incidents, with a focus on maintaining high system availability and rapid issue resolution.
Requirements
- Experience with running and managing a Kubernetes cluster.
- Experience in site reliability or platform engineering, with a focus on cloud infrastructure.
- Experience setting up metrics to improve system monitoring and to further automate monitoring processes.
- Proficiency in managing databases such as RDS (MySQL), Elasticsearch, etc.
- Knowledge of data pipeline management to transfer large amounts of data between different databases.
- Experience in setting up infrastructure as code (Terraform, AWS CDK) and organizing the infrastructure resources.
- Capability to handle incident responses, including security breaches.
- A proactive approach to identifying and mitigating security risks in infrastructure.
Nice-to-haves
These are not required, but tell us if you have any of the following.
- History of handling major incidents in large tech companies.
- Background in fast-growing companies at stages comparable to Sales Marker.
- Leadership experience as a Tech Lead.
- Fluency in Japanese.
Compensation
4 to 12 million JPY annually.