This position is closed and is no longer accepting applications.

Senior Site Reliability Engineer

Treasure Data Minato-ku, Tokyo
  • 💴 No salary range given
  • 🏡 Partially remote
  • 🧪 5+ years experience required
  • 💬 No Japanese required
  • 🗾 Japan residents only
  • 🧳 No relocation support

About Treasure Data


Treasure Data is the only enterprise Customer Data Platform that harmonizes an organization’s data, insights, and engagement technology stacks to drive relevant, real-time customer experiences throughout the entire customer journey.

Key benefits

  • Highly Technical Founders
  • Globally distributed company
  • Open Source is in our DNA

About the position

As a Senior Site Reliability Engineer at Treasure Data, you will be at the forefront of shaping the technical direction of our Kubernetes platform. This critical role involves working collaboratively with software developers across the company and maintaining a close connection with our internal customers. Your expertise will be pivotal in ensuring the reliability, scalability, and innovation of our services.

Responsibilities

  • Help lead and define the technical roadmap for our Kubernetes platform to improve developer experience, reduce delivery time, and optimize cost.
  • Collaborate with cross-functional teams to integrate Kubernetes into our broader service offerings.
  • Partner with service teams to optimize performance, cost, and operability to address internal user needs.
  • Drive stability and performance improvements with a data-driven approach.
  • Troubleshoot and resolve complex technical issues in a multi-cluster Kubernetes environment.
  • Be a technical leader within the SRE group by assisting and mentoring other engineers.
  • Work with other technical and people leaders to define projects, track progress, identify risks, and highlight opportunities.
  • Explore new technologies and enhance existing services to support our users.
  • Diagnose complex networking, orchestration, and integration issues at the infrastructure and software level.
  • Design and implement new services to automate everyday tasks and unlock new capabilities.

Requirements

  • At least five years of working experience in a Software Engineering or Systems Engineering role with distributed systems at scale.
  • Knowledge of
    • Site Reliability Engineering and distributed systems
    • Architecting multi-cluster Kubernetes
    • Cloud computing providers like AWS, GCP, or Azure
    • Cloud networking concepts
    • One modern web development language (JavaScript/Node, Go, Python, Ruby, etc.)
    • Modern SaaS software development practices (CI/CD, GitOps, software testing, release workflow automation)
  • Excellent English communication skills, with an ability to articulate technical concepts to non-SREs.
  • Strong collaboration skills and experience in working with diverse and distributed teams.

Nice to haves

While not specifically required, tell us if you have any of the following.

  • A history of involvement in the open-source community.
  • Knowledge of high-scale data platforms (e.g. Hadoop) or relational databases (e.g. PostgreSQL).
  • Experience speaking and/or writing in Japanese.
  • Understanding of agile development practices (e.g. Scrum, Kanban).

Meet Treasure Data's Developers

Tyler is a software engineer at Treasure Data working on their Data Clean Room product. He talks about how Treasure Data supports their team’s learning and growth, and how they invest in the quality and performance of their services.

