Live across 5 vendors · 90s refresh
Find the GPU
your model
actually fits.
Pick your model, quantization, and budget. StratusPilot calculates exact VRAM requirements including KV cache, filters incompatible hardware, and returns ranked options across every major cloud vendor.
Generated for you after you match
$ vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --quantization awq --max-model-len 4096 \
    --gpu-memory-utilization 0.90  # 35 GB req on A40 48 GB
// free · sign in to start
Llama 3.1 70B Instruct
INT4 AWQ
$1.50/hr max
35 GB
VRAM required
6
Compatible GPUs
$0.89
Lowest match
8s ago
Last updated
A40 48GB
VRAM: 35 GB needed / 48 GB available
CUDA 11.8 · 10 cores · 48 GB RAM · 99% uptime
$0.89/hr
~$650/mo
RTX 4090 24GB ×2
VRAM: 35 GB needed / 48 GB available · Tight
CUDA 12.0 · 8 cores · 32 GB RAM · 88% uptime
$1.10/hr
~$803/mo
5
Vendors compared
90s
Price refresh rate
40+
GPU configurations
$0
To search
0
OOM surprises
Under the hood
VRAM calculated correctly. Not estimated.
Most tools show you GPU VRAM and let you do the math. StratusPilot computes model weights plus KV cache for your exact context length and batch size. The number you see is what vLLM will actually consume.
vram_calc.py · schema.ts
# Model weights
model_vram = params * bytes_per_param * 1.2

# KV cache (keys + values)
kv_cache = (
    2 * layers * heads * head_dim
    * seq_len * batch_size
    * bytes_per_element
)

total_vram = model_vram + kv_cache

# Llama 3.1 70B + INT4 AWQ
# seq_len=4096, batch=8
# => total_vram ≈ 35.4 GB
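The same arithmetic as a reusable function. A minimal sketch: variable names mirror the snippet above, and the layer, head, and head-dim values are whatever your model's config reports, not baked-in constants.

def estimate_total_vram_bytes(params, bytes_per_param, layers, heads,
                              head_dim, seq_len, batch_size, bytes_per_element):
    """Weights (with ~20% runtime overhead) plus KV cache, in bytes."""
    model_vram = params * bytes_per_param * 1.2
    kv_cache = (
        2 * layers * heads * head_dim   # keys + values, per layer and head
        * seq_len * batch_size
        * bytes_per_element
    )
    return model_vram + kv_cache

# Divide by 1e9 for GB. For GQA models, pass the KV-head count as `heads`.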
Features
Everything the other tools skip.
Generic GPU comparison sites show you a table. StratusPilot knows your workload and only shows you what will actually run.
Workload-aware VRAM matching
Calculates model weights plus KV cache overhead for your context length and batch size. Not a rough estimate.
model: Llama 3.1 70B
quant: INT4 AWQ
required: 35.4 GB
status: 6 GPUs match
Live prices across 5 vendors
Prices fetched from Vast.ai, RunPod, Lambda Labs, Verda, and IndieGPU every 90 seconds. Every card shows its timestamp.
vast.ai live · 8s ago
runpod live · 8s ago
lambda live · 8s ago
verda live · 8s ago
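Roughly what the 90-second loop looks like. A sketch, not the production poller: fetch_offers stands in for a hypothetical per-vendor client, and only the vendor list and interval come from the description above.

import time
from datetime import datetime, timezone

VENDORS = ["vast.ai", "runpod", "lambda", "verda", "indiegpu"]

def refresh_prices(fetch_offers, interval_s=90):
    """Poll every vendor on a fixed interval and timestamp each offer."""
    while True:
        now = datetime.now(timezone.utc)
        for vendor in VENDORS:
            for offer in fetch_offers(vendor):   # hypothetical vendor client
                offer["vendor"] = vendor
                offer["fetched_at"] = now        # drives the "8s ago" label
                yield offer
        time.sleep(interval_s)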
Fix suggestions on dead ends
Nothing fits your budget or VRAM? StratusPilot shows you the exact quantization switch that opens up results.
BF16: 0 GPUs in budget
INT8: 3 GPUs match (~70GB)
INT4: 9 GPUs match (~35GB)
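The fallback logic, sketched with weights-only VRAM for brevity; the bytes-per-param table and the gpus/budget inputs are illustrative, not the exact values StratusPilot uses.

QUANT_BYTES = {"BF16": 2.0, "INT8": 1.0, "INT4": 0.5}   # approx. bytes per weight

def suggest_quantization(params, gpus, budget_per_hr):
    """Count, per format, the GPUs with enough VRAM inside the budget."""
    suggestions = []
    for quant, bpp in QUANT_BYTES.items():
        required_gb = params * bpp * 1.2 / 1e9            # same 20% overhead as above
        matches = [g for g in gpus
                   if g["vram_gb"] >= required_gb and g["price_hr"] <= budget_per_hr]
        suggestions.append((quant, round(required_gb), len(matches)))
    return suggestions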
Supported vendors
Every major GPU cloud in one search.
Vast.ai
Marketplace
RunPod
On-demand + Spot
IndieGPU
Independent hosts
Lambda Labs
Reserved
Verda
EU · Renewable
How it works
From model selection to deployed endpoint.
step_01
Configure your workload
Select your model, quantization format, and max hourly budget. StratusPilot computes exact VRAM requirements including KV cache at your context length and batch size.
step_02
See only what fits
Compatible GPUs ranked by value across all vendors. Anything that will OOM is filtered by default. No surprises after you have already paid for the instance.
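In code, this step is little more than a filter and a sort. A sketch, assuming each offer carries vram_gb and price_hr fields; the field names are illustrative.

def rank_offers(offers, required_vram_gb, budget_per_hr):
    """Drop anything that would OOM or blow the budget, then rank by price."""
    fits = [o for o in offers
            if o["vram_gb"] >= required_vram_gb and o["price_hr"] <= budget_per_hr]
    return sorted(fits, key=lambda o: o["price_hr"])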
step_03
Deploy with the exact command
Hit Deploy and go directly to the vendor with your GPU pre-selected. Or copy the generated vllm serve command and run it yourself.
Stop paying for GPUs
that OOM your model.
Free to view the formula. Sign in free to use the matcher. Takes 30 seconds.
Find your GPU