As a Senior Software Engineer on the Worker Team, you will be an owner of the highly available, distributed platform that executes millions of jobs daily for our customers.
Responsibilities
- Designing, building, scaling, and maintaining the core services that power the Worker Platform, prioritizing reliability, high availability, resilience, and performance, while optimizing for cloud costs.
- Leading the modernization and simplification of complex legacy systems to improve maintainability and developer velocity.
- Leading and participating in system design discussions to help the team make the right tradeoffs for a large-scale, multi-tenant, distributed system.
- Collaborating with Tech Leads, Product Managers, and engineers from other teams (like Plazma, Integrations, and SRE) to break down complex projects into deliverable milestones.
- Mentoring and coaching other engineers through pairing, constructive code reviews, and technical discussions.
- Proactively identifying and solving platform challenges, contributing to the team’s operational excellence and long-term roadmap.
- Improving our engineering standards, tooling, and CI/CD processes, ensuring we can deliver value safely and quickly.
Requirements
- A minimum of 5 years of professional experience building and operating large-scale, distributed systems in production.
- Strong software engineering fundamentals and proficiency in JVM-based languages (our primary language is Kotlin).
- Practical experience with concurrent programming, including a solid grasp of JVM-specific synchronization, thread-safety, and resource locking.
- Experience with cloud infrastructure (AWS preferred) and container orchestration platforms (e.g., Kubernetes, ECS), specifically regarding resource management and autoscaling.
- Strong background in observability (e.g., Datadog, CloudWatch) to diagnose bottlenecks and inform data-driven decisions.
- Excellent communication and collaboration skills, with the ability to work effectively across time zones and language barriers.
- A proven ability to work both independently and collaboratively as part of a high-performing team.
Nice-to-haves
While not specifically required, tell us if you have any of the following.
- Experience with non-blocking I/O and modern JVM concurrency models, such as Kotlin Coroutines or Java Virtual Threads (Project Loom).
- Experience working in highly distributed teams, across large time zone differences.
- A deep understanding of the common failure modes in complex, distributed systems and experience conducting Root Cause Analysis (RCA).
- A “FinOps” mindset: a proven track record of reducing infrastructure costs by optimizing system throughput and resource utilization via efficient concurrency models.
- A strong interest in complex systems theory and in how to build resilient and adaptive systems.
- An interest in or experience with applying GenAI/LLMs to improve developer productivity.
- An appreciation for books like “Designing Data-Intensive Applications”, “The Staff Engineer’s Path”, “Nonviolent Communication”, “High Output Management”, or “Systems Performance: Enterprise and the Cloud”.