We are looking for a full-time SRE / DevOps engineer focused on operations to help us run and scale our growing infrastructure on AWS and GCP.
The majority of our infrastructure is hosted on AWS using EC2 and Docker. We also use hosted services such as RDS, Lambda, and SQS. For new development, we also use GCP and GKE. We manage our infrastructure using Terraform.
- Become the lead SRE / DevOps engineer for Netsmile
- Make sure the system runs smoothly and is performant for our customers with minimal downtime
- Some evening, early-morning, and weekend work will be required when system issues occur. Time worked outside core hours will be compensated with time off during normal working hours
- Run the weekly release and maintenance for the system, including risk assessment and analysis for upcoming releases
- Take the lead when system incidents happen and recover quickly. Communicate clearly when there are problems
- Improve the current system to provide faster recovery times and improve redundancy
- Improve our build process
- Help us introduce auto-scaling and clustering
- Improve documentation and training on how to run the system so that on-call duties can be shared across the team
- Flexible work style. Willingness to jump in and help troubleshoot when there is a system issue
- Hands-on experience with AWS, GCP and Terraform in production
- Experience with monitoring and alerting systems
- Experience with the build and release process so that we can keep the system up-to-date and roll back quickly when there is a failure
- Good knowledge of Linux, Ubuntu in particular
Nice to haves
These aren’t required, but be sure to mention them in your application if you have them.
- Experience setting up and using Kubernetes or another clustering system
- Experience with optimizing infrastructure costs
- Experience running systems on-premises, including setting up and managing PostgreSQL
- Experience with Consul, nginx, or similar systems
Salary: 7 to 10 million JPY annually.