Collaborate with data scientists and engineers to design, build, and deploy machine learning models at scale.
Develop and maintain MLOps/AIOps pipelines to automate the end-to-end lifecycle of machine learning models, from development through deployment, monitoring, and retraining.
Integrate models into production systems while ensuring scalability, security, and performance.
Model Operationalization:
Implement CI/CD pipelines for ML models, ensuring smooth deployments with minimal downtime.
Design and deploy robust monitoring and alerting systems for ML models in production to detect issues such as model drift or data skew.
Implement model governance, version control, and logging systems to ensure compliance with internal standards and external regulations.
Optimization & Scalability:
Optimize machine learning models and pipelines for performance and cost efficiency (compute, storage).
Manage infrastructure for ML workloads using cloud-native tools (Azure, Docker, Kubernetes or other container orchestration platforms).
Collaboration & Communication:
Partner with cross-functional teams, including Data Engineering, Product Management, and other Engineering teams, to build cohesive solutions.
Provide technical guidance to junior engineers and drive best practices for MLOps/AIOps within the team.
Security & Compliance:
Secure models, data pipelines, and infrastructure in compliance with Microsoft's security standards.
Ensure that the entire ML lifecycle adheres to privacy and compliance requirements (e.g., GDPR, CCPA).
Qualifications
Required Skills:
4+ years of experience in machine learning, MLOps/AIOps, or software engineering roles.
Proven track record of deploying large-scale machine learning systems in production.
Strong experience with cloud platforms (Azure preferred) and infrastructure as code (e.g., Terraform, ARM templates).
Advanced knowledge of MLOps/AIOps practices, including pipeline automation, monitoring, and orchestration.
Experience optimizing ML models for performance and scalability in production environments.
Demonstrated ability to lead initiatives, mentor junior team members, and influence cross-functional teams.
Solid understanding of security and compliance frameworks relevant to ML operations.
Hands-on experience in building and deploying ML models in a cloud environment (preferably Azure).
Proficiency in Python and experience with ML frameworks (e.g., TensorFlow, PyTorch).
Experience with containerization and orchestration (Docker, Kubernetes) and microservices architecture.
Strong knowledge of CI/CD tools and workflows (Azure DevOps, GitHub Actions).
Basic understanding of model monitoring, retraining, and model governance practices.
Desired Skills:
Experience with Azure Machine Learning, Microsoft Fabric, Azure Synapse Analytics, or similar platforms.
Strong understanding of data versioning, governance, and reproducibility in ML workflows.
Knowledge of responsible AI practices, including fairness, transparency, and bias mitigation.
Strong communication skills and the ability to work in a fast-paced, collaborative environment.