You will primarily be responsible for building data collection pipelines and infrastructure. The role involves collecting, cleansing, and modeling unstructured data from the internet so that it can be used in our products. All of our data collection systems are built as frameworks. Rather than creating each one from scratch, we have developed a system in which data cleansing and modeling run automatically once the inputs and outputs are matched.
Recently, as data volumes have grown, our data engineering team has also started building data platforms and introducing distributed systems for analysis. If you are interested, there are opportunities to get involved in data analytics and data science as well. We are particularly focused on LLMs and run projects that explore how to use them to process complex unstructured data.
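To give candidates a feel for the "matching inputs and outputs" idea, here is a minimal sketch in Python. It is an illustration only and not our actual framework: the `Step` class, the `plan` and `execute` functions, and the labels `raw_html`, `clean_text`, and `record` are all hypothetical names chosen for this example. Each step declares what it consumes and what it produces, and a small runner chains steps wherever an output label matches an input label.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One pipeline stage (hypothetical shape, for illustration only)."""
    name: str
    consumes: str  # label of the data this step accepts
    produces: str  # label of the data this step emits
    run: Callable[[object], object]


def plan(steps, source, target):
    """Order steps so each output label matches the next input label."""
    by_input = {s.consumes: s for s in steps}
    chain, current = [], source
    while current != target:
        step = by_input.get(current)
        if step is None:
            raise ValueError(f"no step consumes {current!r}")
        if step in chain:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(step)
        current = step.produces
    return chain


def execute(steps, source, target, payload):
    """Run the planned chain on a payload and return the final result."""
    for step in plan(steps, source, target):
        payload = step.run(payload)
    return payload


def strip_tags(html):
    # Cleansing: drop HTML tags and collapse whitespace.
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def to_record(text):
    # Modeling: turn cleaned text into a structured record.
    return {"title": text, "length": len(text)}


# Registration order does not matter; the labels decide the order.
STEPS = [
    Step("model", "clean_text", "record", to_record),
    Step("cleanse", "raw_html", "clean_text", strip_tags),
]

result = execute(STEPS, "raw_html", "record", "<h1>Hello   <b>world</b></h1>")
print(result)
```

In this sketch, adding a new source means registering a step with the right labels, and the runner works out the order on its own.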
Tech Stack
- Front End
  - Languages and frameworks: TypeScript, React
  - Libraries: Storybook, Jest
  - Hosting: Amplify
- Server Side
  - Infrastructure: AWS, Elastic Beanstalk
  - DB: Aurora, Elasticsearch
  - Languages: Node.js, Python
  - Framework: Express
  - Observability: Datadog
  - Other: AWS Lambda, AWS Batch, Amazon API Gateway, AWS Glue
- Data Analysis
  - OpenAI
  - Amazon Bedrock
  - OpenSearch
  - SageMaker
  - Athena
  - Glue
Requirements
- Experience with web scraping
- Experience in data cleansing
- Experience in data modeling
- Experience with ETL processes
- Experience using databases (e.g., MySQL, MongoDB)
- Experience extracting data from unstructured sources
- Business-level English and reading/writing proficiency in Japanese
Nice to haves
None of these are required, but let us know if you have any of the following.
- Experience with Natural Language Processing (NLP) or Machine Learning
- Experience with stream processing
- Experience building and operating data platforms
- Development experience outside of work (e.g., OSS contributions)
- Native-level Japanese
- Experience using data visualization tools (e.g., Tableau, Power BI, D3.js)
- Experience using LLMs in products (e.g., OpenAI)
Compensation
¥6,000,000 to ¥12,000,000 annually.