We are defining the next generation of trusted enterprise computing in the cloud. We're a fast-paced, agile, and innovative team that is highly collaborative and works across all areas of our technology stack. We enable critical services for the business, qualify complex compute changes, power big data analytics, and trailblaze new engineering solutions for the cloud.
Responsibilities
Design, architect, and implement large-scale big data infrastructure by working with multiple teams.
Lead the team by example: use engineering best practices, deliver high-quality features, help refine operational procedures and workflows, and influence architectural decisions.
Demonstrate effective service ownership by monitoring your own services and taking proactive action when incidents happen.
Deliver cloud infrastructure automation tools, frameworks, workflows, and validation platforms on public cloud infrastructure such as AWS, GCP, Azure, or Alibaba Cloud.
Design, develop, debug, and operate resilient distributed systems that run across thousands of compute nodes in multiple data centers.
Use and contribute to open source technologies (Spark, Trino, Airflow, Superset, Kubernetes Spark Operator, etc.).
Resolve complex technical issues and drive innovations that improve system availability, observability, resilience, and performance.
Own your services end to end, balancing live-site management, feature delivery, and retirement of technical debt.
Participate in the team’s on-call rotation to address complex problems in real-time and keep services operational and highly available.
Work with engineering and product leadership to evolve the feature roadmap by participating and contributing to release planning.
Embrace accountability for various technical deliveries through all phases of the development life cycle—from code to test to production.
Partner with cross-org teams in various areas (e.g., Data Lake Engines, Security, Cloud Infra) and build integrations with systems such as data stores, analytics platforms, SaaS applications, etc.
Lead the team with active participation in all parts of the Agile process, including planning, execution, and retrospectives.
Serve as the team's leading technical representative on plans and problems that span multiple teams.
Automate and manage the deployment of business-critical services and infrastructure to maintain security compliance (vulnerability remediation, patching, etc.).
Investigate, debug, and provide support for big data services such as Spark, Trino, Superset, and Airflow.
Engage with and influence the communities of open source software (OSS) projects like Spark, Trino, and Airflow.
Present your work at tech talks, cloud demos, and engineering blogs to showcase it and iterate using feedback.
Qualifications
10+ years of experience in design, development, and technical leadership on large-scale data platforms, designing solutions with modern data systems to support exponential data growth.
Excellent analytical and problem-solving skills.
Excellent communication skills when working with customers, senior leadership, and all levels of Engineering.
Strong technical influence and experience collaborating with architects, engineers, and cross-org teams.
Passionate and self-driven in leading complex projects and bringing clarity to the team.
Expert in Kubernetes and experience with container technologies such as Docker/CRI-O.
Experience with Scala, Go, Python, or Java in a Linux/UNIX data center environment.
Experience managing infrastructure as code using Terraform, Helm, Spinnaker, etc.
Experience working with Public Cloud platforms like AWS, GCP or Azure.
Good understanding of and experience with security concepts such as OAuth, mTLS, and OPA (Open Policy Agent).
Experience with and deep understanding of the following AWS technologies: VPC, IAM roles/policies/RBAC, security, EC2, ELB, Auto Scaling, S3, CloudWatch, DNS, etc.
Deep understanding of and experience with big data processing engines such as Apache Spark and Trino.
Experience investigating issues with distributed systems (Spark, Trino, Airflow, Kubernetes, etc.), e.g., connecting the dots across logs from different systems, performing root-cause analysis (RCA), fixing issues, and providing workarounds.
Experience working on Agile development teams.
Experience working on teams that practice service ownership and a DevOps model.
Experience working on a global team across time zones (PST and IST).
Hands-on knowledge of Salesforce products and functionality is a plus.
Experience contributing to open source technologies is a plus.