Machine Learning Engineer - Agentic AI & AIOps
Platform9 is the leader in simplifying enterprise Private Clouds. Founded by a team of VMware cloud pioneers, we are dedicated to transforming IT operations. Our flagship product, Private Cloud Director, turns your existing hardware into a full-featured, future-ready private cloud. We innovate across what we build and how we deliver it, staying focused on a next-generation, open private cloud while holding ourselves to one standard: exceptional customer outcomes.
Enterprises are selecting Platform9 to replace legacy virtualization because it eliminates operational risk and complexity. Private Cloud Director is designed for the experienced infrastructure team, offering a familiar GUI experience for managing VMs and containers, seamless integration with your existing hardware and third-party storage, and critical enterprise features (HA/DR, scale, reliability) built-in. This enables IT teams to gain robust API control and a user experience they trust—rooted in customer obsession and an owner’s mindset. We share context quickly and candidly to keep decisions moving.
With over 30,000 nodes in production at some of the world’s largest enterprises, including Cloudera, EBSCO, Juniper Networks, and Rackspace, Platform9 is the proven path to achieving true vendor independence and operational consistency. We are an inclusive, globally distributed company backed by prominent investors, supported by a partner ecosystem of resellers, systems integrators, MSPs, and technology vendors committed to driving private cloud innovation and efficiency. Our values—innovation, customer obsession, ownership, radical candor, and excellence—guide how we build and support every deployment.
Roles and Responsibilities:
- Develop Intelligent AI Solutions: Leverage state-of-the-art AI technologies to build pioneering NLP and Generative AI solutions, such as Retrieval-Augmented Generation (RAG) pipelines and agentic workflows, that solve real-world infrastructure problems.
- Own Key AI Features: Drive the end-to-end development of LLM-powered applications, chatbots, and optimization engines that improve operational efficiency and resilience.
- Build and deploy agentic AI platforms and applications that enable autonomous execution and orchestration. Develop AI-powered observability and autoscaling frameworks for large-scale distributed systems.
- Integrate AI/ML solutions into CI/CD pipelines, monitoring, and platform APIs.
- Collaborate with cross-functional engineering teams to deliver a high-impact operational experience.
- Serve as a subject matter expert on a wide range of ML techniques and optimizations.
- Mentor & Share Best Practices: Guide junior engineers and peers on ML design patterns, code quality, and experiment methodology.
Qualifications and Skills:
- A minimum of a bachelor's degree in Data Science or a related field.
- 5+ years of experience, with at least 2 years working on generative AI technologies.
- Solid understanding of transformers and modern NLP / LLM techniques; experience with fine-tuning or prompting large language models.
- Strong proficiency in Python; exposure to Golang is a plus.
- Working knowledge of deployment using Kubernetes on cloud infrastructure is desirable.
- Proven ability to build scalable, reliable production services.
- Ability to see tasks and projects through to completion with limited supervision.
Distinguish yourself with:
- Agentic AI Mastery: Practical experience with frameworks such as LangChain or LangGraph and a deep understanding of multi-step reasoning and planning.
- LLM Inference Optimization: Expertise in accelerating LLM inference (e.g., KV caching, quantization) to achieve low latency at scale.
- End-to-End ML Systems Ownership: A portfolio showing full lifecycle ownership, from data ingestion to monitoring and continuous improvement.