We are looking for an experienced data engineer with strong Python skills and good knowledge of big-data analytics tools (Spark, Hadoop, etc.) to help us scale our growing products. You will be responsible for efficiently monitoring, sourcing, transforming, and loading various types of data for our training and inference pipelines at scale, and you will design and develop efficient, scalable data pipelines that support multiple ML modules in our core products, which serve many clients concurrently. Production experience with data pipelines (deployment, resilience, scalability, performance monitoring and improvement) is required.
Requirements
- Experience with data pipelines and big-data analytics (Hadoop, Spark, Hive, etc.) serving production systems
- Skilled with the major database types and technologies (PostgreSQL, MongoDB, Cassandra, etc.)
- Skilled Python coder; familiarity with testing and debugging Python code and with code-quality tools (CI, linting, etc.)
- Knowledge of inter-service data exchange technologies (REST, message queues, RPC)
- Hands-on experience designing complex system interactions
- Ability to work in a team; familiarity with collaboration tools (Git, Bitbucket, etc.)
Nice to haves
While not required, tell us if you have any of the following.
- Experience with data orchestration tools (Airflow, Dagster)
- Exposure to machine learning and to deploying trained models into production
- Practical experience in MLOps
- Ability to communicate in Japanese at a professional level
Compensation
7 to 10 million JPY annually.