We are seeking an engineer who can build scalable, high-performance cloud infrastructure that strengthens our products end to end, using both foundational components and cloud-native services, while upholding the core DevOps principles of observability, deployment excellence, and security.
In this role, you will work hands-on with cutting-edge cloud technologies to support specific applications. Rather than operating as part of a centralized platform team, you will be embedded within a product team, where you can directly influence feature delivery, performance, and user experience.
You will drive automation and infrastructure best practices, contributing directly to the operational efficiency and scalability of our products and services. Your work will help ensure that our systems remain resilient, maintainable, and ready to grow.
Cloud technologies evolve rapidly, so this position offers continuous opportunities to learn, experiment, and apply new methods. We stay flexible and adopt better options as they emerge, so you will regularly have the chance to explore new tools, architectures, and approaches.
Tech Stack
- Infra: AWS
- Infrastructure-as-Code: Terraform
- Hosting/Compute: ECS/Fargate, Lambda
- Observability: Datadog
- SCM: GitHub
- CI/CD: GitHub Actions / Workflows
- Other services: VPC, IAM, CloudFront, ALB, ECS, Lambda, ECR, RDS, Secrets Manager, EventBridge, SQS, SNS, KMS, GuardDuty, Inspector, S3, API Gateway, Cost Explorer, Route 53, ACM, ElastiCache, QuickSuite
Responsibilities
- Create and maintain Infrastructure as Code (IaC) for both product and foundational infrastructure—covering everything from core components like VPCs to application-oriented services such as EventBridge, SQS, and SNS for event-driven architectures
- Handle day-to-day operations of the product’s AWS accounts (including, but not limited to, backups, monitoring, and cost optimization)
- Collaborate with application developers to build CI/CD pipelines and other automation that improve deployment efficiency and quality
- Use observability tools such as Datadog to design alarms and incident-detection plans aligned with actual application usage
- Build observability dashboards that allow anyone to instantly understand the current health of both the application and its infrastructure
- Continuously monitor the performance of cloud infrastructure and various services, making adjustments as needed to optimize efficiency and scalability
Requirements
- Business-level Japanese proficiency
- 3+ years of professional experience as an engineer
- 1+ year of hands-on experience in a DevOps or SRE-equivalent role
- Practical experience designing, building, and operating AWS resources, including VPC/subnets/routing, compute services such as EC2, and IAM
- Experience operating container-based infrastructure (Docker and either EKS or ECS)
- Hands-on experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, or CDK) and conducting code reviews
- Experience designing and operating CI/CD pipelines (e.g., GitHub Actions, GitLab CI, CodePipeline, Argo CD)
- Practical experience with monitoring and observability fundamentals (logs, metrics, or traces; e.g., CloudWatch, Prometheus/Grafana)
- Working knowledge of foundational security practices (least-privilege design, secret management with tools such as Secrets Manager or Parameter Store, and basic vulnerability handling)
- Experience participating in incident response (on-call rotation and/or involvement in incident management processes)
- Experience working in an agile environment (task definition, estimation, participation in sprint planning/retrospectives)
Nice-to-haves
None of the following are required, but please tell us if any apply to you.
- Experience in application development
- Experience in architectural design, including making decisions related to availability, scalability, security, and cost
- Experience with performance tuning and capacity planning (bottleneck analysis, designing/executing load tests)
- Experience designing SLI/SLOs and operating error budgets (goal setting, integration with observability, influencing decision-making)
- Knowledge of machine learning or LLMs
- Broad technical knowledge (even at a high level) across areas such as DNS, database operations, API management, messaging, and caching
Compensation
¥7,080,000 to ¥10,080,000 annually.