We are seeking a talented NLP Data Engineer to join our team. As an NLP Data Engineer, you will be responsible for designing, developing, and maintaining our data infrastructure. You will play a vital role in building crawlers that collect text data from the internet, and in filtering, processing, and overseeing the quality of that data to support our NLP initiatives.
Responsibilities
- Design, develop, and maintain efficient and scalable data pipelines for collecting text data from various sources, including databases and the internet.
- Implement data cleaning and preprocessing techniques to enhance the quality and consistency of text data.
- Collaborate with NLP engineers and researchers to understand their data requirements and ensure the availability and accessibility of high-quality text data.
- Monitor and optimize data processing workflows to ensure efficient and reliable data delivery.
- Identify and resolve data quality issues, implementing measures to maintain data accuracy and integrity.
- Stay up to date with the latest advancements in data engineering technologies, identifying opportunities to enhance our data infrastructure and workflows.
Requirements
- Bachelor’s degree in a relevant field, or at least 10 years of work experience (required for visa purposes).
- Solid understanding of data processing and data pipeline architectures.
- Proficiency in Python, including experience with relevant NLP tools and libraries such as Moses, SentencePiece, and spaCy.
- Strong problem-solving and analytical skills, with attention to detail and data quality.
- Intermediate Japanese reading ability, as you’ll be working with Japanese data. You won’t need to write or speak Japanese in this role.
Nice-to-haves
These are not required, but let us know if you have any of the following.
- Experience with web scraping techniques and tools.
- Knowledge of distributed computing frameworks like Apache Spark.
- Knowledge of database systems and SQL.
- Familiarity with text data cleaning and preprocessing techniques.
- Experience with data governance and compliance in handling sensitive or personal text data.
Salary
4.2 to 5.04 million JPY annually.