At Starley, we develop and operate “Cotomo,” one of Japan’s largest voice-based AI conversation applications. Under the concept “Whether you want to talk or remain silent,” we are continuously exploring innovative ways to enhance human-AI interaction. This position requires not only backend expertise but also a strong commitment to overall product improvement, including UX/UI enhancements and feature planning.
Tech Stack
Python, Rust, TypeScript, WebSocket, WebRTC, Elasticsearch, PostgreSQL, GCP, Azure, AWS, Unity, Weights & Biases, NVIDIA Triton, vLLM, PyTorch, Transformers, DeepSpeed, Dataform, BigQuery, Sentry, Slack, GitHub
Responsibilities
- Design and implement efficient, highly available infrastructure to support high traffic
- Build backend systems that integrate various AI models, including speech recognition, natural language processing, and speech synthesis
- Develop streaming systems to deliver high-quality, real-time voice communication
- Construct and optimize scalable frameworks for large-scale data processing
- Collaborate with product managers and designers on product improvement and feature planning
Requirements
- At least 3 years of experience in designing, implementing, and maintaining backend systems
- Proficiency with relational databases (e.g., PostgreSQL, MySQL) and NoSQL databases
- Basic understanding of real-time communication technologies (e.g., WebRTC, WebSocket)
- Experience working with cloud platforms (e.g., AWS, GCP, Azure)
- Experience in building or improving CI/CD pipelines (personal project experience is acceptable)
- Practical experience in integrating new tools and technologies (e.g., RAG, Cursor, Devin) or equivalent hands-on experience
- Fluency in Japanese for daily communication
Nice-to-haves
None of these are required, but please tell us if you have any of the following.
- Experience working in an early-stage startup environment
- Technical communication skills in English
- Exposure to machine learning model operations
- Experience with home server setup and management
- Basic familiarity with deep learning models (e.g., LLMs) and fine-tuning techniques
- Familiarity with speech recognition or natural language processing
Compensation
Starting from 7.5 million JPY annually, with performance-based stock options.