Machine Learning Engineer - Agentic AI & AIOps
India
Full Time
Engineering
Experienced
About Platform9:
Platform9 is a leader in simplifying enterprise private clouds. Our flagship product, Private Cloud Director, turns existing infrastructure into a full-featured private cloud. Enterprise IT teams can manage VMs and containers with familiar GUI tools and automated APIs in a private, secure environment.
Enterprises are selecting Platform9's Private Cloud Director to migrate away from legacy virtualization platforms because it meets all of the following enterprise requirements:
- Familiar VM management experience
- Critical enterprise virtualization features: HA, DRR, networking, scale, reliability
- Compatibility with all existing hardware environments, including 3rd-party storage
- Automated migration tooling that lowers the cost barrier by 10x
Platform9 was founded by a team of VMware cloud pioneers and has over 30,000 nodes in production at some of the world’s largest enterprises, including Cloudera, EBSCO, Juniper Networks, and Rackspace. Platform9 is an inclusive, globally distributed company backed by prominent investors, committed to driving private cloud innovation and efficiency.
Roles and Responsibilities:
- Develop Intelligent AI Solutions: Leverage state-of-the-art AI technologies to build pioneering NLP and Generative AI solutions, such as Retrieval-Augmented Generation (RAG) pipelines and agentic workflows, that solve real-world infrastructure problems.
- Own Key AI Features: Drive the end-to-end development of LLM-powered applications, chatbots, and optimization engines that improve operational efficiency and resilience.
- Build and deploy agentic AI platforms and applications that enable autonomous execution and orchestration; develop AI-powered observability and autoscaling frameworks for large-scale distributed systems.
- Integrate AI/ML solutions into CI/CD pipelines, monitoring, and platform APIs.
- Collaborate with cross-functional engineers to deliver high-impact operational experiences.
- Serve as a subject matter expert on a wide range of ML techniques and optimizations.
- Mentor & Share Best Practices—Guide junior engineers and peers on ML design patterns, code quality, and experiment methodology.
Qualifications and Skills:
- Minimum of a bachelor's degree in Data Science or a related field.
- 5+ years of experience, including at least 2 years working on generative AI technologies.
- Solid understanding of transformers and modern NLP / LLM techniques; experience with fine-tuning or prompting large language models.
- Strong proficiency in Python; exposure to Golang is a plus.
- Working knowledge of deploying with Kubernetes on cloud infrastructure is desirable.
- Proven ability to build scalable, reliable production services.
- Ability to work on tasks and projects through to completion with limited supervision.
Distinguish yourself with:
- Agentic AI Mastery—Practical experience with frameworks such as LangChain or LangGraph and a deep understanding of multi-step reasoning and planning.
- LLM Inference Optimization—Expertise in accelerating LLM inference (e.g., KV caching, quantization) to achieve low latency at scale.
- End-to-End ML Systems Ownership—A portfolio showing full lifecycle ownership, from data ingestion to monitoring and continuous improvement.