Job Description
Responsibilities
- Lead end-to-end (E2E), at-scale, system-level debug activities in the cluster to meet fleet KPIs
- Collaborate with hardware, firmware and software teams to enable comprehensive root cause analysis
- Be accountable for meeting debug/triage SLAs
- Prioritize issues based on a technical and business understanding of both their complexity and impact
- Identify and drive execution and E2E verification plans to proactively address open issues
- Develop best-in-class debug methodologies, test strategies, and platform validation plans for at-scale clusters
- Solve problems relating to mission-critical services and build automation to improve debug efficiency
- Collaborate with internal and external partners to ensure systems meet significant quality, reliability, and service level requirements for a cloud environment
- Use data to communicate plans and progress on initiatives effectively to partners and stakeholders
Qualifications
Required Qualifications:
- 7+ years of technical leadership experience as a platform architect, validation architect, or lead debug engineer, or equivalent industry experience
- Deep understanding of modern server architectures: memory, CPU, storage, or system-level firmware
- Capable of technical deep dives into platform design, verification flows, operating systems, networking, and storage
Preferred Qualifications:
- Platform debug and validation experience
- Data analysis skills: knowing how to use data-based and analytical tools
- Excellent communication skills using various forms of media
- Able to plan work and work to a plan, adapting as necessary in a rapidly evolving environment
- Individual effectiveness skills such as discipline, time management, decision-making, planning and organizing work, and summarizing results in technical reports
- Self-driven and self-motivated; able to work independently as well as collaboratively, both in a team environment and across teams of engineers