Lead AI Infrastructure Engineer (Python/ML)

USA, Remote

Type: Long-term contract
Location: Remote (overlap with PST)

At Sphere, we partner with a global logistics company that leverages AI, Machine Learning, and Data Engineering to optimize warehouse operations, predictive maintenance, and route planning.

Role: Build and maintain scalable AI infrastructure, enabling teams to run ML experiments, deploy machine learning models, and implement MLOps pipelines for production-grade AI.

Responsibilities:

Design distributed training pipelines for large-scale machine learning and deep learning models.
Optimize compute and storage resources for cloud-based AI/ML workloads on AWS, GCP, or Azure.
Collaborate with data scientists and ML engineers to deploy models to production efficiently.
Implement monitoring, logging, and alerting for model performance and AI workflows.
Ensure scalable, maintainable, and reliable AI infrastructure to support real-time and batch ML applications.

Requirements:

5+ years of experience with Python and ML infrastructure.
Experience with cloud AI platforms (AWS SageMaker, GCP AI Platform, Azure ML).
Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD for ML.
Experience with distributed systems, data pipelines, and high-performance computing for AI.
Hands-on experience with deep learning frameworks such as TensorFlow or PyTorch.