GPU/AI/ML FAQs

What does Rafay do or provide around AI/ML or cloud-native adoption?

Rafay provides infrastructure orchestration to help enterprises, cloud service providers and sovereign AI clouds deploy a Platform-as-a-Service (PaaS) solution that enables self-service compute environments for developers and data scientists. It helps platform engineering teams deliver a user-friendly PaaS experience quickly, typically in weeks instead of years. Rafay’s platform enables faster development and deployment of new capabilities while maintaining necessary controls and guardrails. By simplifying the process of implementing complex platforms, Rafay reduces the need for large teams of experts. In essence, Rafay streamlines cloud-native and AI/ML adoption by offering a ready-to-use platform that balances speed, efficiency, and security for businesses.

Does Rafay offer a GPU PaaS?

Yes, the Rafay Platform provides infrastructure orchestration that enables enterprises and cloud providers to deploy a Platform-as-a-Service (PaaS) in support of CPU and GPU-accelerated compute environments. Platform teams can quickly set up and deliver customized self-service experiences for developers and data scientists, typically within days or weeks. This flexible platform allows end-users to easily access the computational resources they need, whether it’s standard CPU processing or more powerful GPU capabilities. Rafay’s solution streamlines the deployment and management of diverse computing environments, making it easier for organizations to support a wide range of applications, from standard software to complex AI/ML projects.

What does Rafay offer for ML workbenches?

Rafay provides curated ML workbenches that offer developers and data scientists an experience similar to Amazon SageMaker or Google Vertex AI, but at a more competitive price point. The platform includes out-of-the-box services such as Notebooks-as-a-Service, with pre-compiled environments featuring TensorFlow, PyTorch, and other popular libraries for immediate productivity. For those preferring a job-based model, Rafay offers Ray-as-a-Service, allowing data scientists to focus on their work without dealing with infrastructure complexities. Advanced teams can opt for a Kubeflow-based ML workbench, which manages pipelines, experiment tracking, and model repositories. These solutions enable data science teams to work efficiently with their preferred tools while Rafay handles the underlying infrastructure management.
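
For illustration, below is a minimal Ray job of the kind a data scientist might submit through Ray-as-a-Service; the cluster address, GPU count, and training logic are placeholders, not Rafay specifics.

    # Minimal Ray job of the kind a data scientist might submit through
    # Ray-as-a-Service. Cluster address, GPU count, and training logic are
    # placeholders; requires the open-source "ray" package.
    import ray

    # Connect to the provisioned cluster, e.g.
    # ray.init(address="ray://<cluster-endpoint>:10001"); ray.init() runs locally.
    ray.init()

    @ray.remote(num_gpus=1)  # each task reserves one GPU (needs GPU nodes)
    def fine_tune(shard_id: int) -> str:
        # Placeholder for real training logic (e.g., a PyTorch fine-tuning step).
        return f"shard {shard_id} done"

    # Fan the work out across the cluster and gather the results.
    print(ray.get([fine_tune.remote(i) for i in range(4)]))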

What does Rafay offer for GenAI playgrounds?

Rafay provides a controlled, cost-effective Generative AI playground for organizations new to GenAI. This environment allows data scientists to train, tune, and serve GenAI models, enabling efficient experimentation and development without significant investment or infrastructure complexity. It’s ideal for businesses looking to explore GenAI capabilities while managing costs and maintaining control over their AI initiatives.

Who uses Rafay's platform for AI/ML initiatives?

The Rafay Platform is used by enterprises and cloud service providers worldwide, including businesses in the financial services, healthcare/medical, telecommunications, government, energy, retail, and manufacturing sectors. We’re also collaborating with major GPU vendors for specialized use cases. A notable public example of a company using our AI/GPU stack is MoneyGram, a global leader in cross-border P2P payments and money transfers.

How does Rafay’s platform accelerate time-to-value for AI/ML projects?

  • Without Rafay, platform teams build complex platforms in-house, typically over multiple years and with large teams of experts.
  • With Rafay, platform teams can deliver a finely tuned PaaS experience to internal users in weeks.

How does Rafay ensure compliance and governance for enterprise AI initiatives?

Rafay applies its proven governance and control features, originally developed for cloud-native projects, to AI/GPU initiatives. These capabilities include blueprinting, access management, chargebacks, and auditing/logging. This approach ensures that enterprises can maintain compliance and control over their AI projects, just as they do with other cloud-native initiatives. By leveraging these established features, Rafay helps organizations accelerate AI adoption while maintaining the necessary governance standards, ultimately leading to increased revenues and lower total cost of ownership for both cloud-native and AI/ML projects.

How does Rafay's platform streamline AI/ML infrastructure management for enterprise adoption?

Rafay enables enterprise platform teams to deliver a PaaS experience for GPU resources, both on-premises and in the cloud. The platform offers a cost-effective alternative to services like Amazon SageMaker or Google Vertex AI, providing ML workbenches with similar functionality. Rafay’s self-service model and hierarchical experience sharing allow platform teams to selectively offer compute and ML workbench experiences to different teams, optimizing access to expensive GPU resources. Additionally, the platform includes chargeback capabilities to ensure fair cost allocation among internal teams. This comprehensive approach simplifies AI/ML infrastructure management, accelerating enterprise adoption while maintaining cost control and resource efficiency.

Does Rafay provide AI/ML workbenches and other tooling?

Yes, Rafay offers a comprehensive suite of AI/ML tools. The platform provides out-of-the-box workbenches based on Kubeflow and KubeRay, delivered as fully managed services. This allows users to access sophisticated AI/ML platforms without dealing with infrastructure complexities. Additionally, Rafay includes a low-code/no-code framework that enables partners to rapidly develop and deploy specialized AI solutions such as verticalized agents, co-pilots, and document translation services. This combination of ready-to-use workbenches and a flexible development framework streamlines the adoption and customization of AI/ML tools for various enterprise needs, accelerating time-to-market for new AI capabilities.

Is GPU Virtualization supported?

Yes, Rafay supports GPU virtualization. The platform enables GPU and Sovereign Cloud providers to offer fractional GPU resources to end users through a self-service interface. Rafay’s system manages key aspects of virtualization, including:

  1. Security measures
  2. Compute isolation
  3. Chargeback data collection
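
As an illustrative sketch of what a fractional GPU request looks like to an end user on Kubernetes: with NVIDIA MIG enabled, a workload asks for a MIG slice instead of a whole GPU. This shows the generic Kubernetes mechanism rather than Rafay-specific APIs; the pod name, image, and MIG profile are examples.

    # Illustrative sketch: requesting an NVIDIA MIG slice (a fractional GPU)
    # on Kubernetes via the official Python client. Pod name, image, and MIG
    # profile are examples; requires the NVIDIA device plugin with MIG enabled.
    from kubernetes import client, config

    config.load_kube_config()  # kubeconfig issued for your workspace

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="fractional-gpu-demo"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="cuda-check",
                    image="nvidia/cuda:12.4.1-base-ubuntu22.04",
                    command=["nvidia-smi", "-L"],  # list the visible GPU slice
                    resources=client.V1ResourceRequirements(
                        # One 1g.5gb MIG slice instead of a whole GPU.
                        limits={"nvidia.com/mig-1g.5gb": "1"}
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)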

How does Rafay handle chargebacks and billing?

Rafay offers a comprehensive solution for chargebacks and billing. The platform collects granular chargeback information on resource usage, which can be easily exported to customers’ existing billing systems for further processing and distribution. Rafay allows for customizable chargeback group definitions to align with organizational structures or projects. Both group definition and data collection can be carried out programmatically, enabling efficient and accurate billing processes.
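
A hypothetical sketch of what programmatic export might look like; the endpoint path, parameters, and field names below are invented for illustration and are not Rafay's documented API:

    # Hypothetical sketch: exporting chargeback data for a billing system.
    # The endpoint path, parameters, and field names are invented for
    # illustration and are not Rafay's documented API.
    import csv

    import requests

    BASE = "https://console.example.com/api"  # illustrative controller URL
    HEADERS = {"X-API-KEY": "<api-key>"}      # credentials from the platform team

    # Pull usage records for one chargeback group and billing period.
    resp = requests.get(
        f"{BASE}/chargeback/usage",
        headers=HEADERS,
        params={"group": "data-science", "period": "2025-06"},
    )
    resp.raise_for_status()

    # Hand the records to an existing billing system as CSV.
    with open("chargeback-2025-06.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["project", "resource", "hours", "cost"])
        writer.writeheader()
        writer.writerows(resp.json()["records"])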

How is Rafay different from Run:AI?

Run:AI focuses on providing fractional/virtualized GPU consumption and a proprietary scheduler optimized for AI/GenAI workloads, replacing the default Kubernetes scheduler. Rafay, however, provides a more comprehensive platform that manages the full lifecycle of underlying Kubernetes clusters and environments. Rafay offers an out-of-the-box experience to deploy and consume Run:AI on the Rafay Platform, while also providing its own GPU virtualization and AI-friendly Kubernetes scheduler for customers preferring a single-vendor solution. Essentially, Rafay can either complement Run:AI’s offerings or provide a standalone solution that covers similar functionalities along with broader infrastructure management capabilities, giving customers flexibility in their AI infrastructure choices.

Does Rafay support NVIDIA NIMs/NIM?

Yes, Rafay supports NVIDIA NIM (NVIDIA Inference Microservices). NIM is NVIDIA’s proprietary solution for delivering packaged inferencing capabilities. It comes pre-configured with NVIDIA’s in-house models and has been optimized for use with a wide range of open-source models, including Meta’s Llama variants. While NIM is often viewed as an alternative to the open-source KServe package, Rafay’s platform supports both NIM and KServe. This flexibility allows customers to choose their preferred inference endpoint and deploy it effortlessly on GPU instances using the Rafay Platform. By supporting multiple inferencing solutions, Rafay enables organizations to leverage the most suitable tools for their specific AI/ML needs while maintaining a consistent and manageable infrastructure.
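
For illustration, a deployed NIM endpoint serves an OpenAI-compatible HTTP API, so a client call might look like this (the endpoint URL and model name are placeholders):

    # Querying a deployed inference endpoint. NIM serves an OpenAI-compatible
    # HTTP API; the endpoint URL and model name below are placeholders.
    import requests

    resp = requests.post(
        "http://<nim-endpoint>:8000/v1/chat/completions",
        json={
            "model": "meta/llama-3.1-8b-instruct",
            "messages": [{"role": "user", "content": "Summarize MIG in one sentence."}],
            "max_tokens": 128,
        },
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])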

Why consider Rafay's solution over AWS SageMaker or Google Vertex AI?

While AWS SageMaker and Google Vertex AI offer fully managed services, Rafay’s Kubernetes and Kubeflow-based MLOps solution provides distinct advantages. It offers vendor agnosticism, allowing deployment across various cloud providers or on-premises, thus avoiding vendor lock-in. Rafay’s approach enables greater customizability, giving users more control over their infrastructure and workloads. It can also be more cost-efficient, as managing your own Kubernetes clusters allows for optimized resource utilization. This combination of flexibility, control, and potential cost savings makes Rafay’s solution appealing for organizations seeking a tailored and adaptable MLOps environment that can evolve with their specific needs and infrastructure preferences.

How does Rafay's solution fit into existing AWS/Google Cloud workflows?

Rafay’s MLOps platform is designed to seamlessly integrate with existing cloud ecosystems, including AWS and Google Cloud. The solution supports integration with various cloud services, allowing organizations to leverage their current investments and workflows. Rafay’s platform excels in hybrid and multi-cloud environments, providing a unified interface to manage MLOps workflows consistently across different infrastructures. This approach enables businesses to maintain their existing cloud relationships while gaining the added benefits of Rafay’s flexible, vendor-agnostic platform. By bridging the gap between different cloud environments, Rafay allows organizations to optimize their MLOps processes without disrupting established workflows, offering a smooth transition and enhanced capabilities for AI/ML initiatives.

Will managing Kubernetes and Kubeflow add complexity compared to fully managed services?

While Kubernetes and Kubeflow management can be complex, Rafay’s platform is specifically designed to simplify these processes. The solution addresses potential complexity in three key ways:

  1. User-Friendly Interface: Rafay provides an intuitive UI and automation tools that significantly reduce the complexity typically associated with Kubernetes.
  2. Managed Kubernetes Service: The platform offers managed Kubernetes services that handle cluster provisioning, scaling, and maintenance, allowing teams to focus on developing models rather than managing infrastructure.
  3. Expert Support: Rafay provides comprehensive support and documentation to help teams navigate any challenges, effectively reducing the learning curve.

This approach enables organizations to harness the power and flexibility of Kubernetes and Kubeflow without the added complexity.

What about the cost? Are there hidden expenses in managing our own infrastructure?

Rafay aims to provide transparent and potentially cost-saving solutions for managing AI/ML infrastructure. The platform addresses cost concerns in three key areas:

  1. Transparent Pricing: Rafay offers clear pricing models without the hidden fees that can be associated with fully managed services.
  2. Cost Control: By managing your own infrastructure through Rafay, you can optimize resource usage and avoid over-provisioning, potentially leading to significant cost savings.
  3. Avoiding Vendor Premiums: Fully managed services often come with a premium for convenience. Rafay enables you to balance convenience and cost effectively.

This approach allows organizations to have greater control over their infrastructure costs while still benefiting from the ease of use provided by Rafay’s platform.

What's Rafay's stance on support and reliability compared to established providers?

Rafay is committed to providing enterprise-grade support and reliability, comparable to established providers like AWS and Google. The platform offers dedicated support teams to assist with any issues, ensuring minimal downtime and quick resolutions. Rafay’s technology stack is built on mature, widely adopted open-source technologies like Kubernetes and Kubeflow, which are trusted across the industry. This foundation provides a robust and reliable infrastructure for AI/ML workloads. Additionally, Rafay’s focus on MLOps allows for specialized support that may not be available with more generalized cloud providers. By combining proven technologies with dedicated, specialized support, Rafay aims to deliver a reliable and well-supported platform that meets the high standards expected in enterprise environments.

How does the Rafay Platform and MLOps offerings benefit an AWS sales team?

Rafay’s offerings complement AWS services in two key ways, benefiting both customers and AWS sales teams. For customers using SageMaker and Bedrock, Rafay enhances AWS’s ecosystem with additional cloud-native and Kubernetes management capabilities.

For customers hesitant to use SageMaker or Bedrock, Rafay provides a similar experience that can be fully deployed within AWS accounts, addressing concerns about cost or data exposure.

Crucially, Rafay’s solutions drive direct compute consumption on AWS, contributing to customers’ Enterprise Discount Program (EDP) commitments. This helps AWS sales teams meet their targets and potentially expand future EDPs, making Rafay a valuable partner in the AWS ecosystem that can increase overall AWS usage and revenue.

Does the Rafay Platform support multi-tenancy?

The Rafay Platform enables CSPs to transform their GPU infrastructure into a secure, scalable, self-service GPU cloud offering. Designed for operational efficiency and customer isolation, the platform enforces hard multi-tenancy at the infrastructure level – across servers, network, storage, and DPU – ensuring each tenant’s environment is fully isolated and protected.

On top of this, the Rafay Platform provides soft multi-tenancy through a flexible hierarchy of Organizations, Workspaces, and Users. Each customer receives at least one dedicated Organization, which acts as an isolated boundary for their resources. Within each Organization, multiple Workspaces can be created to represent teams, projects, groups of users, or individual users. Users assigned to a Workspace can launch and manage multiple GPU-powered compute instances (bare metal servers, virtual machines, SLURM clusters, Kubernetes clusters, …); AI/ML applications and services (Jupyter Notebooks, fine-tuning, inferencing, third-party and marketplace AI applications, …); and GenAI applications and services, all within their secure environment.

CSPs can govern usage with fine-grained, policy-based controls, including limits on the number of instances per user, workspace, or organization. This structure allows for secure multi-user collaboration without sacrificing control or resource efficiency.

Can the Rafay Platform be installed on prem for Sovereign AI?

Yes. Sovereign AI Clouds and customers in highly regulated industries prefer Rafay’s air-gapped controller model. Rafay Platform can be deployed in your data center or in your private/public cloud environment. You get exactly the same experience and all the same features available to our SaaS customers.

What is a cloud GPU service, and how does it work?

A cloud GPU service makes powerful graphics processing units (GPUs) available on demand, without requiring organizations to own or manage physical GPU hardware. Through the Rafay Platform, enterprises and cloud service providers can offer GPU resources—alongside CPUs, Kubernetes clusters, and AI apps—in a secure, multi-tenant, self-service environment. Developers and data scientists can provision GPUs instantly for training, fine-tuning, and inference workloads, while platform teams maintain governance, quota enforcement, and cost control.

What is a GPU PaaS?

A GPU Platform-as-a-Service (GPU PaaS) is a managed environment that abstracts the complexity of GPU infrastructure, enabling developers and data scientists to access compute, AI tools, and apps with cloud-like self-service. Rafay delivers infrastructure orchestration that enables organizations to deploy a turnkey GPU PaaS in support of bare metal servers, Kubernetes and virtual clusters, fractional GPUs, ML workbenches, inference endpoints, and pre-packaged apps such as NVIDIA NIM. This empowers both enterprises and sovereign clouds to monetize GPU investments and accelerate innovation.

What are the pricing models for the Rafay Platform?

Pricing typically depends on consumption. With Rafay, cloud providers and enterprises can define SKUs (small, medium, large, etc.) and apply granular chargeback or usage-based billing. Pricing can be per GPU-hour, per instance size, or tied to higher-value services like inference APIs or AI applications. This model allows providers to maximize GPU ROI, while enterprises ensure internal cost transparency.
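
As a back-of-the-envelope illustration of per-GPU-hour pricing (the SKU names and rates are invented for the example, not Rafay price points):

    # Back-of-the-envelope chargeback under per-GPU-hour pricing. The SKU
    # names and rates are invented for the example, not Rafay price points.
    RATE_PER_GPU_HOUR = {"small": 1.10, "medium": 2.50, "large": 4.80}  # USD

    def monthly_charge(sku: str, gpus: int, hours: float) -> float:
        return RATE_PER_GPU_HOUR[sku] * gpus * hours

    # A team that ran 4 "medium" GPUs for 200 hours this month:
    print(f"${monthly_charge('medium', gpus=4, hours=200):,.2f}")  # $2,000.00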

How does cloud GPU pricing compare to owning physical GPU hardware?

Owning GPU hardware requires significant upfront capital expenditure, along with ongoing costs for maintenance, operations, and staffing. A cloud GPU PaaS built on the Rafay Platform reduces that burden by pooling GPUs and delivering them as a service. Providers can monetize idle GPUs, while enterprises pay only for what they consume. Many organizations find this approach more cost-efficient, with lower total cost of ownership and faster time-to-value.

What are the main use cases for a cloud GPU PaaS?

Key use cases include:

  • AI/ML training and fine-tuning of models on dedicated or fractional GPUs.
  • Inference as a service (e.g., delivering APIs powered by NVIDIA NIM).
  • Generative AI applications such as RAG, copilots, and verticalized agents.
  • Developer self-service for Kubernetes clusters and environments.
  • Multi-tenant GPU clouds operated by sovereign or regional providers.

These use cases let organizations turn GPUs into governed, revenue-generating services.

What are the performance benchmarks for different GPU types?

Performance depends on the underlying accelerated computing hardware (e.g., NVIDIA A100, H100, L40S) and the workload (training vs. inference). Rafay itself is hardware-agnostic but ensures efficient utilization through GPU slicing (e.g., NVIDIA MIG) and orchestration policies. Cloud providers can expose SKU performance characteristics, and organizations can choose the right GPU for cost/performance balance.

How do I access and manage cloud GPU instances (console, API, SDK)?

Rafay offers multiple access methods:

  • Self-service console/portal (white-labeled if needed).
  • REST APIs, Terraform, and GitOps for automation and integration with CI/CD pipelines (see the sketch after this list).
  • SDKs and templates for deploying Kubernetes clusters, inference endpoints, and apps.

This flexibility ensures teams can consume GPUs the same way they do with AWS or GCP.

Is my data secure when using cloud GPU services?

Yes. The Rafay Platform enforces enterprise-grade security: role-based access control (RBAC), zero-trust authentication, network isolation, policy enforcement, audit logging, and encryption. Multi-tenancy is enforced at both the infrastructure and application layers, ensuring each customer or team’s workloads and data remain isolated.

Are there any limitations or quotas on GPU usage?

Yes, but these are by design to ensure fair access and governance. Rafay supports fine-grained quota enforcement on GPUs, CPUs, memory, and namespaces. Platform teams or cloud providers can set per-user, per-team, or per-organization limits, controlling how many GPUs or clusters can be consumed. This prevents resource exhaustion, optimizes utilization, and ensures predictable costs.
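
On Kubernetes-based environments, one standard way such limits are enforced is a ResourceQuota; here is a minimal sketch, with the namespace and limits chosen for illustration:

    # Minimal sketch: capping a team's GPU consumption with a standard
    # Kubernetes ResourceQuota. Namespace and limits are illustrative.
    from kubernetes import client, config

    config.load_kube_config()

    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="gpu-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.nvidia.com/gpu": "8",  # at most 8 GPUs requested at once
                "requests.cpu": "64",
                "requests.memory": "256Gi",
            }
        ),
    )
    client.CoreV1Api().create_namespaced_resource_quota(namespace="ml-team", body=quota)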