Job Description
Responsibilities
- Participate in concept design discussions, gather node/rack system-level requirements, clarify interfaces, and provide feedback on future design requirements to help develop robust, high-performance cloud hardware solutions.
- Lead cross-boundary issue triage, debug, and resolution.
- Gather all the system engineering requirements for the infrastructure components by working with engineering groups and other stakeholders.
- Architect cloud test scenarios and guide software engineers to automate test flows.
- Work with ODMs, other system engineers, internal design teams and customers to develop validation execution plans for new technologies and MSFT IP features.
- Develop quality criteria for the different phases of programs.
- Work with ODMs, Silicon vendors, component suppliers and internal design teams on cross-boundary triaging, debugging, and resolving issues.
- Review system engineering and validation coverage throughout the lifecycle of the program and publish plan vs. progress reports.
- Develop node/rack system-level test cases and test requirements against system-level requirements, and verify functionality at the node/rack level.
- Collaborate with internal and external partners to ensure systems meet significant quality, reliability, and service level requirements for a cloud environment.
Qualifications
Required Qualifications:
- BS/MS in Electrical/Computer Engineering or related degree
- 8+ years of relevant experience in server systems/platforms design and development for enterprise or cloud market segments.
- Hands-on experience in server hardware architecture, design, and development with solid understanding of hardware, firmware, and OS interfaces.
- 5+ years of experience leading technology development in the HW/FW area.
- Strong technical communication skills (verbal and written) to interface with cross-functional technical leads within and/or outside of the organization.
- Advanced troubleshooting and debugging skills.
- Familiarity with networking, power, rack device management, and remote access environments.
- Experience with performance benchmarking tools such as SPEC workloads and Linpack.
- Experience with GPUs and various networking standards, including InfiniBand.
- Experience with Windows and Linux operating systems, test automation, and hyperscale testing covering hundreds of systems under test.
- Good understanding of and experience with device drivers, including debugging issues related to their interactions with HW subsystems.
Preferred Qualifications:
- Experience in evaluating off-the-shelf OEM hardware designs, HW/FW/OS interactions, platform configuration trade-offs, performance tuning, and optimizations.
- Understanding how standard server interfaces, such as PCIe, SATA, and memory, work with their respective software stacks.
- Functional knowledge of secure boot, attestation, FW update & recovery on server platform architectures.
- Experience in platform HW and FW security capabilities (RoT) and implementations.
- Familiarity with NIST 800 standards as they pertain to FW support for secure update and secure recovery.
- Experience in server platform HW design or server platform validation, with knowledge of system-level firmware, is an added advantage.
- Experience in platform-level test architecture and use of debug tools (ITP, Arium, ARM JTAG tools, or equivalent).
- Volume hardware test and debug expertise.
- Experience debugging complex system-level issues, with the ability to root-cause and identify potential fixes down to board hardware, signal integrity, CPLD, thermal, firmware, and OS components.