The AI & Cloud-Native Infrastructure Blog

Stay up to date with the latest news and insights on AI and cloud-native infrastructure on Rafay's blog

Powering GPU Cloud Billing: Rafay + Monetize360 Integration

In the fast-evolving world of GPU cloud services and AI infrastructure, accurate, flexible, and real-time billing is no longer optional — it’s mission critical. That’s why Rafay has partnered with Monetize360 to deliver an end-to-end pricing, billing, and revenue management… Read More

GPU/Neocloud Billing using Rafay’s Usage Metering APIs

Cloud providers offering GPU or Neo Cloud services need accurate and automated mechanisms to track resource consumption. Usage data becomes the foundation for billing, showback, or chargeback models that customers expect. The Rafay Platform provides usage metering APIs that can… Read More
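At its core, the showback/chargeback model described above reduces to aggregating per-tenant usage records into priced totals. A minimal sketch of that aggregation step, using a hypothetical record shape and illustrative rates (not Rafay's actual API response format):

```python
from collections import defaultdict

# Hypothetical usage records; Rafay's metering APIs return richer data.
records = [
    {"tenant": "acme", "resource": "gpu-hours", "quantity": 12.0},
    {"tenant": "acme", "resource": "gpu-hours", "quantity": 8.0},
    {"tenant": "globex", "resource": "gpu-hours", "quantity": 5.0},
]

# Illustrative unit prices (USD); real rates come from the billing system.
prices = {"gpu-hours": 2.50}

def showback(records, prices):
    """Aggregate usage per tenant and price it for a showback report."""
    totals = defaultdict(float)
    for r in records:
        totals[r["tenant"]] += r["quantity"] * prices[r["resource"]]
    return dict(totals)

print(showback(records, prices))  # {'acme': 50.0, 'globex': 12.5}
```

A real pipeline would pull these records from the metering API on a schedule and feed the totals into the billing system; the aggregation logic stays the same.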

What is Agentic AI?

Agentic AI is the next evolution of artificial intelligence—autonomous AI systems composed of multiple AI agents that plan, decide, and execute complex tasks with minimal human intervention. Unlike traditional artificial intelligence systems that operate within fixed boundaries and require human… Read More

Deep Dive into nvidia-smi: Monitoring Your NVIDIA GPU with Real Examples

Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance. In this… Read More
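Beyond interactive use, nvidia-smi's query mode (`--query-gpu=... --format=csv,noheader,nounits`) emits comma-separated rows that are easy to script against. A minimal parsing sketch; the query flags are real nvidia-smi options, but the sample line below is illustrative rather than captured from a live GPU:

```python
import csv
import io

# Sample output of:
#   nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
# Values are illustrative, not from a live device.
sample = "NVIDIA A100-SXM4-40GB, 37, 10240, 40960\n"

def parse_gpu_stats(text):
    """Parse nvidia-smi CSV query rows into dicts with typed fields."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        name, util, used, total = (field.strip() for field in row)
        rows.append({
            "name": name,
            "util_pct": int(util),        # utilization.gpu, percent
            "mem_used_mib": int(used),    # memory.used, MiB (nounits)
            "mem_total_mib": int(total),  # memory.total, MiB (nounits)
        })
    return rows

print(parse_gpu_stats(sample)[0]["util_pct"])  # 37
```

In practice you would feed this parser the output of `subprocess.run(["nvidia-smi", ...])`, optionally on a loop with `-l <seconds>` for continuous monitoring.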

Introduction to Dynamic Resource Allocation (DRA) in Kubernetes

In the previous blog, we reviewed the limitations of Kubernetes GPU scheduling. These often result in:

  • Resource fragmentation – large portions of GPU memory remain idle and unusable.
  • Topology blindness – multi-GPU workloads may be scheduled suboptimally.
  • Cost explosion – teams overprovision GPUs to… Read More

Rethinking GPU Allocation in Kubernetes

Kubernetes has cemented its position as the de-facto standard for orchestrating containerized workloads in the enterprise. In recent years, its role has expanded beyond web services and batch processing into one of the most demanding domains of all: AI/ML workloads. Organizations… Read More

Neocloud Providers: Powering the Next Generation of AI Workloads

Artificial intelligence teams face critical challenges today: Limited GPU availability, orchestration complexity, and escalating costs threaten to slow AI innovation. Enterprises deploying large language models (LLMs), computer vision systems, and machine learning inference pipelines at scale urgently need infrastructure built… Read More

Understanding ArgoCD Reconciliation: How It Works, Why It Matters, and Best Practices

ArgoCD is a powerful GitOps controller for Kubernetes, enabling declarative configuration and automated synchronization of workloads. One of its core functions is reconciliation, a continuous process by which ArgoCD ensures that the live state of a Kubernetes cluster matches the desired state… Read More
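The reconciliation idea is simple to state: continuously compare the desired state (manifests in Git) with the live state (the cluster) and compute the actions that close the gap. A toy sketch of that control-loop comparison, with plain dicts standing in for manifests (this illustrates the concept, not ArgoCD's implementation):

```python
def reconcile(desired, live):
    """Return the actions needed to drive live state toward desired state."""
    actions = []
    for name, manifest in desired.items():
        if name not in live:
            actions.append(("create", name))       # in Git, not in cluster
        elif live[name] != manifest:
            actions.append(("update", name))       # drifted from Git
    for name in live:
        if name not in desired:
            actions.append(("delete", name))       # pruned: removed from Git
    return actions

desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
live = {"web": {"replicas": 2}, "cache": {"replicas": 1}}
print(reconcile(desired, live))
# [('update', 'web'), ('create', 'db'), ('delete', 'cache')]
```

ArgoCD runs this comparison on a timer (and on Git webhook events), which is why manual drift in the cluster is detected and, with auto-sync enabled, reverted.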

The Hidden Costs of Running Generative AI Workloads—And How to Optimize Them

Generative AI has revolutionized what’s achievable in modern enterprises—from large language models (LLMs) powering virtual assistants to diffusion models automating complex image generation workflows. However, behind this wave of innovation lies a significant infrastructure challenge: the escalating cost and complexity… Read More

Accelerating AI App Delivery with the Right Infrastructure Orchestration Strategy

Key Takeaways:

  • AI orchestration is foundational for scaling AI workloads and delivering apps faster, enabling modern businesses to unlock the full potential of their AI systems.
  • Without a well-defined AI orchestration strategy, teams risk delays, inefficiencies, and spiraling infrastructure complexity… Read More

Choosing the Right Fractional GPU Strategy for Cloud Providers

As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. While full GPU allocation remains the norm, it often leads to resource waste—especially for lightweight or intermittent… Read More

Demystifying Fractional GPUs in Kubernetes: MIG, Time Slicing, and Custom Schedulers

As GPU acceleration becomes central to modern AI/ML workloads, Kubernetes has emerged as the orchestration platform of choice. However, allocating full GPUs to many real-world workloads is overkill, resulting in underutilization and soaring costs. Enter the need for fractional… Read More
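The underutilization argument can be made concrete: when workloads request fractions of a GPU, packing them onto shared devices needs far fewer cards than giving each workload a whole GPU. A first-fit packing sketch (the policy is chosen here for illustration; MIG profiles and time slicing each impose their own partitioning constraints):

```python
def pack_first_fit(requests, capacity=1.0):
    """Pack fractional GPU requests onto as few whole GPUs as possible (first fit)."""
    gpus = []  # remaining free capacity per GPU
    for req in requests:
        for i, free in enumerate(gpus):
            if free >= req:
                gpus[i] -= req       # place on the first GPU that fits
                break
        else:
            gpus.append(capacity - req)  # no fit: open a new GPU
    return len(gpus)

# Six workloads, each needing only a fraction of a GPU.
requests = [0.25, 0.5, 0.1, 0.3, 0.25, 0.5]
# Whole-GPU allocation would consume 6 GPUs; first-fit sharing uses 3.
print(pack_first_fit(requests))  # 3
```

Real fractional-GPU schedulers must also account for memory isolation and interference between co-located workloads, which is where MIG's hardware partitions and time slicing's software multiplexing differ.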

Custom GPU Resource Classes in Kubernetes

In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier, but managing GPU utilization efficiently requires more than a basic resource assignment. In this blog post, we… Read More