As a data engineer, you will design, develop, and operate the next-generation data infrastructure and data pipelines needed to make data available for ML and BI.
Details
- Development and operation of streaming data pipelines built on Google Cloud Platform (GCP) services such as Cloud Pub/Sub, Dataflow, GCS, BigQuery, and Stackdriver
- Development and operation of batch data pipelines built on GCP services such as Cloud Dataflow, GCS, BigQuery, and Cloud Composer
Requirements
- At least one of the following:
- Experience developing high-volume stream-processing systems using distributed processing frameworks (Dataflow, Spark, Storm, Kafka, Flink, etc.)
- Experience developing high-throughput batch systems using workflow engines (Airflow, Luigi, Digdag, Azkaban, etc.)
- A degree in computer science or a related field, plus more than five years of software development experience
- Experience developing software in Scala or Java, as well as in Python