Lead AI Infrastructure Engineer (Python/ML)
Type: Long-term contract
Location: Remote (overlap with PST)
At Sphere, we partner with a global logistics company leveraging AI, Machine Learning, and Data Engineering to optimize warehouse operations, predictive maintenance, and route planning.
Role: Build and maintain scalable AI infrastructure, enabling teams to run ML experiments, deploy machine learning models, and implement MLOps pipelines for production-grade AI.
Responsibilities:
- Design distributed training pipelines for large-scale machine learning and deep learning models.
- Optimize compute and storage resources for cloud-based AI/ML workloads on AWS, GCP, or Azure.
- Collaborate with data scientists and ML engineers to deploy models to production efficiently.
- Implement monitoring, logging, and alerting for model performance and AI workflows.
- Keep the AI infrastructure scalable, maintainable, and reliable to support real-time and batch ML applications.
Requirements:
- 5+ years of experience with Python and ML infrastructure.
- Experience with cloud AI platforms (AWS SageMaker, GCP AI Platform, Azure ML).
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD for ML.
- Experience with distributed systems, data pipelines, and high-performance computing for AI.
- Hands-on experience with deep learning frameworks like TensorFlow or PyTorch.
