Live across 5 vendors · 90s refresh
Find the GPU
your model
actually fits.
Pick your model, quantization, and budget. StratusPilot calculates exact VRAM requirements including KV cache, filters incompatible hardware, and returns ranked options across every major cloud vendor.
Generated for you after you match
$ vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --quantization awq --max-model-len 4096 \
    --gpu-memory-utilization 0.90  # 35 GB req on A40 48 GB
// free · sign in to start
Llama 3.1 70B Instruct
INT4 AWQ
$1.50/hr max
35 GB
VRAM required
6
Compatible GPUs
$0.89
Lowest match
8s ago
Last updated
A40 48GB
VRAM: 35 GB needed / 48 GB available
CUDA 11.8 · 10 cores · 48 GB RAM · 99% uptime
$0.89/hr
~$650/mo
RTX 4090 24GB ×2
VRAM: 35 GB needed / 48 GB available · Tight
CUDA 12.0 · 8 cores · 32 GB RAM · 88% uptime
$1.10/hr
~$803/mo
5
Vendors compared
90s
Price refresh rate
40+
GPU configurations
$0
To search
0
OOM surprises
Under the hood
VRAM calculated correctly. Not estimated.
Most tools show you GPU VRAM and let you do the math. StratusPilot computes model weights plus KV cache for your exact context length and batch size. The number you see is what vLLM will actually consume.
vram_calc.py · schema.ts
# Model weights
model_vram = params * bytes_per_param * 1.2

# KV cache (keys + values)
kv_cache = (
    2 * layers * heads * head_dim
    * seq_len * batch_size
    * bytes_per_element
)

total_vram = model_vram + kv_cache

# Llama 3.1 70B + INT4 AWQ
# seq_len=4096, batch=8
# => total_vram ≈ 35.4 GB
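The same arithmetic as a reusable function. A minimal sketch: variable names mirror the snippet above, and the layer, head, and head-dim values are whatever your model's config reports, not baked-in constants.

def estimate_total_vram_bytes(params, bytes_per_param, layers, heads,
                              head_dim, seq_len, batch_size, bytes_per_element):
    """Weights (with ~20% runtime overhead) plus KV cache, in bytes."""
    model_vram = params * bytes_per_param * 1.2
    kv_cache = (
        2 * layers * heads * head_dim   # keys + values, per layer and head
        * seq_len * batch_size
        * bytes_per_element
    )
    return model_vram + kv_cache

# Divide by 1e9 for GB. For GQA models, pass the KV-head count as `heads`.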
Features
Everything the other tools skip.
Generic GPU comparison sites show you a table. StratusPilot knows your workload and only shows you what will actually run.
Workload-aware VRAM matching
Calculates model weights plus KV cache overhead for your context length and batch size. Not a rough estimate.
model: Llama 3.1 70B
quant: INT4 AWQ
required: 35.4 GB
status: 6 GPUs match
Live prices across 5 vendors
Prices fetched from Vast.ai, RunPod, Lambda Labs, Verda, and IndieGPU every 90 seconds. Every card shows its timestamp.
vast.ai live · 8s ago
runpod live · 8s ago
lambda live · 8s ago
verda live · 8s ago
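Roughly what the 90-second loop looks like. A sketch, not the production poller: fetch_offers stands in for a hypothetical per-vendor client, and only the vendor list and interval come from the description above.

import time
from datetime import datetime, timezone

VENDORS = ["vast.ai", "runpod", "lambda", "verda", "indiegpu"]

def refresh_prices(fetch_offers, interval_s=90):
    """Poll every vendor on a fixed interval and timestamp each offer."""
    while True:
        now = datetime.now(timezone.utc)
        for vendor in VENDORS:
            for offer in fetch_offers(vendor):   # hypothetical vendor client
                offer["vendor"] = vendor
                offer["fetched_at"] = now        # drives the "8s ago" label
                yield offer
        time.sleep(interval_s)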
Fix suggestions on dead ends
Nothing fits your budget or VRAM? StratusPilot shows you the exact quantization switch that opens up results.
BF16: 0 GPUs in budget
INT8: 3 GPUs match (~70GB)
INT4: 9 GPUs match (~35GB)
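The fallback logic, sketched with weights-only VRAM for brevity; the bytes-per-param table and the gpus/budget inputs are illustrative, not the exact values StratusPilot uses.

QUANT_BYTES = {"BF16": 2.0, "INT8": 1.0, "INT4": 0.5}   # approx. bytes per weight

def suggest_quantization(params, gpus, budget_per_hr):
    """Count, per format, the GPUs with enough VRAM inside the budget."""
    suggestions = []
    for quant, bpp in QUANT_BYTES.items():
        required_gb = params * bpp * 1.2 / 1e9            # same 20% overhead as above
        matches = [g for g in gpus
                   if g["vram_gb"] >= required_gb and g["price_hr"] <= budget_per_hr]
        suggestions.append((quant, round(required_gb), len(matches)))
    return suggestions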
Supported vendors
Every major GPU cloud in one search.
Vast.ai
Marketplace
RunPod
On-demand + Spot
IndieGPU
Independent hosts
Lambda Labs
Reserved
Verda
EU · Renewable
How it works
From model selection to deployed endpoint.
step_01
Configure your workload
Select your model, quantization format, and max hourly budget. StratusPilot computes exact VRAM requirements including KV cache at your context length and batch size.
step_02
See only what fits
Compatible GPUs ranked by value across all vendors. Anything that will OOM is filtered by default. No surprises after you have already paid for the instance.
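In code, this step is little more than a filter and a sort. A sketch, assuming each offer carries vram_gb and price_hr fields; the field names are illustrative.

def rank_offers(offers, required_vram_gb, budget_per_hr):
    """Drop anything that would OOM or blow the budget, then rank by price."""
    fits = [o for o in offers
            if o["vram_gb"] >= required_vram_gb and o["price_hr"] <= budget_per_hr]
    return sorted(fits, key=lambda o: o["price_hr"])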
step_03
Deploy with the exact command
Hit Deploy and go directly to the vendor with your GPU pre-selected. Or copy the generated vllm serve command and run it yourself.
Stop paying for GPUs
that OOM your model.
Free to view the formula. Sign in free to use the matcher. Takes 30 seconds.
Find your GPU