
Senior MLOps / AI Infrastructure Engineer - iGaming - Remote
- Taipei City / Luqa, Malta
- Permanent
- Full-time
We're a fast-growing Bay Area AI company building cutting-edge products powered by large-scale machine learning models. We're looking for a Senior MLOps / AI Infrastructure Engineer to lead the development of robust, scalable infrastructure that supports everything from model training to real-time inference.

You will play a critical role in designing systems that power our AI research and production environments, working closely with ML researchers, software engineers, and product teams to ensure models move quickly from prototype to production - reliably and securely.

Job Responsibilities
- Build and scale core ML infrastructure for distributed training, hyperparameter tuning, and experiment tracking.
- Design and maintain containerized model-serving infrastructure for LLMs and multimodal models with low-latency requirements.
- Develop CI/CD pipelines tailored to machine learning workflows using tools like MLflow, Airflow, or Kubeflow.
- Optimize compute usage and resource allocation on cloud platforms (GCP or AWS) and Kubernetes clusters.
- Implement observability and alerting systems for model performance, drift, and uptime in production.
- Collaborate with cross-functional teams to productionize novel research models.
Job Requirements
- 4+ years of software engineering experience, including at least 2 years in ML infrastructure or DevOps for AI/ML.
- Proficient in Python and at least one systems language (Go, Rust, or similar).
- Strong expertise in Kubernetes, Docker, and cloud infrastructure (GCP preferred).
- Familiar with ML tooling such as PyTorch/TensorFlow, MLflow, Ray, or similar.
- Experience deploying ML models in production at scale, preferably in containerized environments.
- Strong understanding of distributed systems, resource orchestration, and observability.
- Proficiency in English; Mandarin is a plus.
Nice to Have
- Experience serving large foundation models (LLMs, vision-language, etc.).
- Exposure to RAG architectures, vector search, or fine-tuning of open-source models.
- Familiarity with infrastructure-as-code tools (Terraform, Helm).
- Contributions to open-source MLOps tools or whitepapers.