Member of Technical Staff - Post Training

AI& Yokohama, Kanagawa May 17 2026
  • 💴 No salary range given
  • 🏡 Partially remote
  • 🌏 Apply from abroad; relocate to Japan
  • 💬 No Japanese required; business English
  • 🧪 Intermediate level; unspecified years of experience

About AI&

AI& is a vertically integrated AI platform from Japan for the global market. We recently launched with $50M in seed funding and more than $2B in committed infrastructure capital.

About the position

This is both a research and an engineering role. You will own post-training end to end for ai&’s internal custom models and for enterprise customers who need models adapted to their domains. That means you are running experiments, building the pipelines those experiments depend on, and delivering results that ship.

You will work directly with enterprise customers to translate their requirements into post-training workflows, own the delivery, and feed every applied learning back into ai&’s core stack. At the same time you are an active contributor to our RL practice — designing training environments, integrating techniques from the research frontier, and building the continual learning infrastructure that keeps our models improving over time.

We want someone who thinks end to end across data, training, alignment, and evaluation as a single system, and who is pragmatic enough to optimize for model quality and real outcomes above all else.

Responsibilities

  • Reinforcement Learning - Research and Execution: Profile, optimize, and scale RL training runs to reduce iteration time. Integrate new optimization techniques as they emerge from the research community. Design and implement training environments that test the boundaries of model capability, and turn proof-of-concept ideas into robust, production-ready pipelines.
  • Post-Training Pipeline Engineering: Build and maintain the full post-training infrastructure, including SFT, preference alignment, reward model pipelines, experiment tracking, and evaluation infrastructure. Own this stack for both internal model development and enterprise engagements.
  • Enterprise Post-Training Ownership: Act as the technical owner for enterprise customer post-training engagements. Translate customer requirements into concrete post-training specifications, run the workflows, design task-specific evaluations, and feed learnings back into core pipelines.
  • Data Generation & Quality: Design and build synthetic data pipelines that support post-training and RL at scale. Own generation, filtering, and quality assessment workflows. Strong intuition for data quality is non-negotiable.
  • Continual Learning: Develop the methodologies and infrastructure that allow ai& models to keep improving over time without catastrophic forgetting. Design training regimes and evaluation protocols that support ongoing model development as new data and feedback accumulate.
  • Evaluation & Benchmarking: Design task-specific evaluations that go beyond standard benchmarks. Interpret results honestly, catch regressions before they reach production, and use findings to drive concrete improvements.
  • Research Contribution: Stay at the cutting edge of post-training and RL research. Contribute to ai&’s research output and share findings with the broader community.

Requirements

  • Reinforcement Learning in Practice: You have actually run RL on language models. You have implemented reward models, dealt with reward hacking, tuned KL penalties, and shipped models that are meaningfully better as a result. You understand the theory and you have applied it.
  • Post-Training Engineering Depth: Hands-on experience with data generation and evaluation for LLM post-training. You have run SFT, preference alignment, and RL workflows on real models and you know where these pipelines break.
  • Framework Proficiency: Strong Python and PyTorch proficiency with hands-on experience optimizing training pipelines. Experience with DeepSpeed, FSDP, vLLM, or similar frameworks for efficient model training and inference.
  • Data Quality Instinct: Strong intuition for what good training data looks like. Experience designing and executing data generation, filtering, curation, and quality assessment processes at scale.
  • End-to-End Thinking: You reason across data generation, training, alignment, and evaluation as a single system. You do not optimize one stage in isolation from the others.
  • Customer and Communication Fluency: Comfortable working directly with enterprise customers. You can translate between customer needs and internal technical teams, push back when needed, and be trusted as the technical owner of a delivery.
  • Continual Learning Familiarity: Familiar with the challenges of continual and lifelong learning in neural networks. You have thought seriously about catastrophic forgetting and how to build models that stay current without degrading.
  • Great Team Spirit: A mission-driven approach to engineering, valuing clear communication, hands-on execution, and collective success over individual silos.