Job Description
The developer will play an integral role in the PTEIT Machine Learning Data Engineering team: design, develop, and support data pipelines in a hybrid cloud environment to enable advanced analytics, and design, develop, and support CI/CD for data pipelines and services.
- 5+ years of experience with Python or an equivalent language, applying OOP, data structures, and algorithms
- Develop new services in AWS using serverless and container-based services.
- 3+ years of hands-on experience with the AWS suite of services (EC2, IAM, S3, CDK, Glue, Athena, Lambda, Redshift, Snowflake, RDS)
- 3+ years of expertise in scheduling data flows using Apache Airflow
- 3+ years of strong data modeling (functional, logical, and physical) and data architecture experience in data lakes and/or data warehouses
- 3+ years of experience with SQL databases
- 3+ years of experience with CI/CD and DevOps using Jenkins
- 3+ years of experience with event-driven architecture, especially Change Data Capture (CDC)
- 3+ years of experience with Apache Spark, SQL, Databricks, and Redshift, BigQuery, or Snowflake
- Deep understanding of building efficient data pipelines with data observability, data quality, schema-drift handling, alerting, and monitoring
- Good understanding of data catalogs, data governance, compliance, security, and data sharing
- Experience building reusable services across data processing systems
- Ability to work and contribute beyond defined responsibilities
Minimum Qualifications
- Bachelor's degree and IT-related work experience, OR 5+ years of IT-related work experience without a Bachelor's degree.
- 2+ years of any combination of academic or work experience with programming (e.g., Java, Python).
- 1+ year of any combination of academic or work experience with SQL or NoSQL databases.
Preferred Qualifications
- 5 years of industry experience, including a minimum of 3 years of data engineering development experience at reputable organizations
- Proficiency in Python and AWS
- Excellent problem-solving skills
- Deep understanding of data structures and algorithms
- Proven experience building cloud-native software, preferably with the AWS suite of services
Desirable
- Exposure to or experience with other cloud platforms (Azure, GCP)
- Experience working on the internals of large-scale distributed systems and databases such as Hadoop and Spark
- Working experience with data lakehouse platforms (Onehouse, Databricks Lakehouse)