PromQL Cheatsheet
The Prometheus query language at a glance — types, selectors, operators, functions, and the gotchas. Beginner essentials in black, advanced in accent.
- v2.x
- one-pager · A4
- print at 100%
Data types
| name | instant vector — one sample per series, "now" |
|---|---|
| name[5m] | range vector — window of samples |
| 3.14 | scalar — single number |
| "foo" | string — only as fn arg |
Selectors & matchers
| {job="api"} | exact match |
|---|---|
| {status!="200"} | not equal |
| {status=~"5.."} | regex (RE2, anchored) |
| {path!~"/health.*"} | negative regex |
| {env=""} | label not present |
| {env!=""} | label present |
| {__name__=~"node_.*"} | regex on metric name |
| x offset 5m | shift evaluation back 5m |
| x @ 1672531200 | pin to Unix timestamp |
Time & durations
Units (chained, biggest first):
ms · s · m · h · d · w · y| 5m | 5 minutes |
|---|---|
| 1h30m | 90 minutes |
| 1y2w | 1 year 2 weeks |
Subquery adv
[range:resolution] — instant expr over rangemax_over_time(
rate(req[5m])[1h:1m]
) Aggregation
Pattern:
op by (labels) (expr) — or without (...) to drop instead of keep.| sum | add across series |
|---|---|
| avg | mean across series |
| min / max | lowest / highest |
| count | number of series in group |
| count_values | histogram of values |
| topk(k, …) adv | k highest, keeps labels |
| bottomk(k, …) adv | k lowest |
| quantile(φ, …) adv | φ-quantile across series |
| stddev / stdvar adv | spread across series |
| group adv | constant 1 per group |
Counter functions
rate(v[5m]) per-sec average rate; reset-aware. Default for graphs & alerts.
irate(v[1m]) adv last-2-samples rate; reactive but spiky.
increase(v[1h]) total increase in window (extrapolated).
resets(v[1h]) adv counter reset count.
Gauge functions
delta(v[2h]) first→last difference of a gauge.
deriv(v[1h]) per-sec derivative (linear regression).
predict_linear(v[1h], t) adv predicted value t seconds from now.
predict_linear( fs_avail[1h], 4*3600 ) < 0 # disk full in 4h
changes(v[1h]) adv value-change count.
*_over_time
Aggregate a range vector along the time axis, per series.
| avg_over_time | mean of samples |
|---|---|
| min/max_over_time | extremes in window |
| sum_over_time | sum of samples |
| count_over_time | sample count |
| last_over_time | most recent sample |
| quantile_over_time | φ-quantile per series |
| stddev_over_time | std-dev per series |
Histograms adv
histogram_quantile(φ, b) φ-quantile from classic
_bucket series; keep le.histogram_quantile(0.95, sum by (le, job)( rate(req_bucket[5m]) ))
Labels & misc
label_replace(v, dst, repl, src, rx) adv rewrite label via regex.
label_join(v, dst, sep, …) adv concat labels into a new one.
absent(v) · absent_over_time → 1 when no series — for "stopped reporting" alerts.
clamp / clamp_min / clamp_max squeeze values into a range.
vector(s) · scalar(v) bridge scalar ↔ vector.
time() unix sec at evaluation.
sort / sort_desc order results in tables.
Binary ops & matching
| + - * / % ^ | arithmetic, label-matched |
|---|---|
| == != > < >= <= | filter (use bool for 1/0) |
| and / or / unless | set ops on label sets |
| on(l...) | match only on these labels |
| ignoring(l...) | match on all but these |
| group_left / group_right | many-to-one join |
requests / ignoring(code) group_left requests_total
Gotchas
rate inside sum, never outside.
Aggregating counters first hides resets → garbage.
Window ≥ 4× scrape interval.
rate/increase need ≥ 2 samples;
[5m] is the safe default. increase() can be fractional.
extrapolation to window edges; expected.
Match function to metric type.
rate/irate/increase → counter. delta/deriv/predict_linear → gauge.
Keep
le for histograms. sum by (le, …) — never drop it. Watch cardinality.
user IDs, paths, trace IDs as labels = pain.
Subqueries are expensive.
prefer recording rules for hot paths.
Empty result from
a / b? label sets don't line up. Use
on()/ignoring().Quick recipes
Error rate %
sum(rate(req{code=~"5.."}[5m]))
/ sum(rate(req[5m])) * 100 CPU usage %
100 - avg by(instance)(
rate(node_cpu_seconds_total
{mode="idle"}[5m])) * 100 P95 latency
histogram_quantile(0.95,
sum by(le)(
rate(http_dur_bucket[5m]))) "Job is down" alert
up{job="api"} == 0
or absent(up{job="api"})