
PromQL Cheatsheet

The Prometheus query language at a glance: types, selectors, operators, functions, and the gotchas. Advanced items are marked adv.

Data types

name instant vector — one sample per series, "now"
name[5m] range vector — window of samples
3.14 scalar — single number
"foo" string — only as fn arg

Selectors & matchers

{job="api"} exact match
{status!="200"} not equal
{status=~"5.."} regex (RE2, anchored)
{path!~"/health.*"} negative regex
{env=""} label empty or absent
{env!=""} label present and non-empty
{__name__=~"node_.*"} regex on metric name
x offset 5m shift evaluation back 5m
x @ 1672531200 pin to Unix timestamp

Time & durations

Units (chained, longest first):
y · w · d · h · m · s · ms
5m 5 minutes
1h30m 90 minutes
1y2w 1 year 2 weeks
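The chaining rule can be sketched in Python. This is an illustrative parser, not Prometheus code: the names parse_duration, UNIT_SECONDS and DURATION_RE are made up here. It assumes each unit appears at most once, longest first, with y = 365d and w = 7d.

```python
import re

# Seconds per unit; y is treated as 365 days and w as 7 days.
UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                "h": 3600, "m": 60, "s": 1, "ms": 0.001}
UNITS = ("y", "w", "d", "h", "m", "s", "ms")

# Each unit is optional, may appear once, and must follow the order above.
DURATION_RE = re.compile(
    r"^(?:([0-9]+)y)?(?:([0-9]+)w)?(?:([0-9]+)d)?(?:([0-9]+)h)?"
    r"(?:([0-9]+)m)?(?:([0-9]+)s)?(?:([0-9]+)ms)?$"
)

def parse_duration(text: str) -> float:
    """Return the duration in seconds, or raise ValueError if malformed."""
    match = DURATION_RE.match(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return float(sum(int(n) * UNIT_SECONDS[u]
                     for n, u in zip(match.groups(), UNITS) if n is not None))

print(parse_duration("1h30m"))  # 5400.0
```

Under this sketch "30m1h" is rejected because the units are out of order.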
Subquery adv
[range:resolution] evaluate an instant expr at each resolution step across the range
max_over_time(
  rate(req[5m])[1h:1m]
)

Aggregation

Pattern: op by (labels) (expr) — or without (...) to drop instead of keep.
sum add across series
avg mean across series
min / max lowest / highest
count number of series in group
count_values histogram of values
topk(k, …) adv k highest, keeps labels
bottomk(k, …) adv k lowest
quantile(φ, …) adv φ-quantile across series
stddev / stdvar adv spread across series
group adv constant 1 per group

Counter functions

rate(v[5m])
per-sec average rate; reset-aware. Default for graphs & alerts.
irate(v[1m]) adv
last-2-samples rate; reactive but spiky.
increase(v[1h])
total increase in window (extrapolated).
resets(v[1h]) adv
counter reset count.
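What "reset-aware" means can be sketched in Python. This is a simplification, not the Prometheus implementation: the real rate() and increase() also extrapolate to the window edges, while this version only looks at the samples it is given. The function names are hypothetical.

```python
def counter_increase(values):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # After a reset the counter restarts from zero, so the new value
        # itself is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total

def counter_resets(values):
    """Number of times the value dropped between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)

def simple_rate(samples):
    """Per-second rate over (timestamp_seconds, value) samples.

    Divides by the time between first and last sample (no extrapolation).
    """
    if len(samples) < 2:
        return None  # like rate(), at least two samples are needed
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase([v for _, v in samples]) / elapsed

samples = [(0, 10.0), (20, 25.0), (40, 5.0), (60, 15.0)]  # reset at t=40
print(simple_rate(samples))                       # 0.5
print(counter_resets([v for _, v in samples]))    # 1
```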

Gauge functions

delta(v[2h])
first→last difference of a gauge.
deriv(v[1h])
per-sec derivative (linear regression).
predict_linear(v[1h], t) adv
predicted value t seconds from now.
predict_linear(
  fs_avail[1h], 4*3600
) < 0   # disk full in 4h
changes(v[1h]) adv
value-change count.
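The regression behind deriv and predict_linear can be sketched as an ordinary least-squares fit. This is a minimal Python illustration with made-up names and sample data. It assumes at least two samples with distinct timestamps and measures time relative to the evaluation instant.

```python
def linear_fit(samples, now):
    """Least-squares line through (timestamp_seconds, value) samples.

    Returns (slope per second, fitted value at `now`).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t - now for t, _ in samples]  # time relative to evaluation instant
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, mean_y - slope * mean_x  # intercept is the value at x = 0

def predict_linear(samples, now, seconds_ahead):
    """Extrapolate the fitted line `seconds_ahead` past `now`."""
    slope, at_now = linear_fit(samples, now)
    return at_now + slope * seconds_ahead

# Hypothetical free-space gauge dropping 10 units per minute.
free = [(0, 100.0), (60, 90.0), (120, 80.0)]
print(predict_linear(free, now=120, seconds_ahead=600) < 0)  # True
```

The slope alone corresponds to what deriv reports; adding slope × t to the fitted value at the evaluation instant gives the prediction.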

*_over_time

Aggregate a range vector along the time axis, per series.
avg_over_time mean of samples
min/max_over_time extremes in window
sum_over_time sum of samples
count_over_time sample count
last_over_time most recent sample
quantile_over_time φ-quantile per series
stddev_over_time std-dev per series

Histograms adv

histogram_quantile(φ, b)
φ-quantile from classic _bucket series; keep le.
histogram_quantile(0.95,
  sum by (le, job)(
    rate(req_bucket[5m])
  ))
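Why le must be kept: the quantile is found by walking the cumulative buckets and interpolating linearly inside the one that contains the target rank. A simplified Python sketch follows (classic buckets only, positive bounds, 0 < φ ≤ 1). The function is illustrative and leaves out the edge cases the real one covers.

```python
import math

def histogram_quantile(phi, buckets):
    """Estimate the phi-quantile from classic cumulative buckets.

    buckets: list of (le, cumulative_count), sorted by le, ending with +Inf.
    """
    if not 0 < phi <= 1:
        raise ValueError("this sketch only handles 0 < phi <= 1")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # need a +Inf bucket plus at least one finite bucket
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = phi * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        return buckets[-2][0]  # report the highest finite bound
    # The lowest bucket is assumed to start at 0.
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)

buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))  # 1.0
```

In this example the 95th percentile lands in the +Inf bucket, so the estimate is capped at the highest finite bound. That is one reason bucket boundaries should cover the expected latency range.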

Labels & misc

label_replace(v, dst, repl, src, rx) adv
rewrite label via regex.
label_join(v, dst, sep, …) adv
concat labels into a new one.
absent(v) · absent_over_time
→ 1 when no series — for "stopped reporting" alerts.
clamp / clamp_min / clamp_max
squeeze values into a range.
vector(s) · scalar(v)
bridge scalar ↔ vector.
time()
unix sec at evaluation.
sort / sort_desc
order results in tables.

Binary ops & matching

+ - * / % ^ arithmetic, label-matched
== != > < >= <= filter (use bool for 1/0)
and / or / unless set ops on label sets
on(l...) match only on these labels
ignoring(l...) match on all but these
group_left / group_right many-to-one join
requests / ignoring(code)
  group_left requests_total
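One-to-one matching can be pictured as a join on a label signature. The Python sketch below uses hypothetical helper names and sample series. It ignores group_left/group_right and assumes non-zero divisors, since PromQL would yield ±Inf or NaN where Python raises.

```python
def match_key(labels, on=None, ignoring=()):
    """Label signature used to pair series from both sides."""
    if on is not None:
        kept = {k: v for k, v in labels.items() if k in on}
    else:
        dropped = set(ignoring) | {"__name__"}  # metric name never matches
        kept = {k: v for k, v in labels.items() if k not in dropped}
    return frozenset(kept.items())

def index_by_signature(series, on, ignoring):
    """Map signature -> value; duplicates mean one-to-one cannot work."""
    indexed = {}
    for labels, value in series:
        key = match_key(labels, on, ignoring)
        if key in indexed:
            raise ValueError("duplicate signature: needs group_left/group_right")
        indexed[key] = value
    return indexed

def divide(lhs, rhs, on=None, ignoring=()):
    """a / b with one-to-one matching; unmatched series are dropped."""
    left = index_by_signature(lhs, on, ignoring)
    right = index_by_signature(rhs, on, ignoring)
    return {key: left[key] / right[key] for key in left if key in right}

errors = [({"__name__": "errors", "job": "api", "instance": "a"}, 5.0)]
requests = [({"__name__": "requests", "job": "api", "instance": "a",
              "version": "v2"}, 100.0)]

print(len(divide(errors, requests)))                                  # 0
print(list(divide(errors, requests, ignoring=("version",)).values())) # [0.05]
```

The first call returns nothing because the right side carries an extra version label. Ignoring that label, or matching on (job, instance), makes the signatures line up.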

Gotchas

rate inside sum, never outside.
Aggregating counters first hides resets → garbage.
Window ≥ 4× scrape interval.
rate/increase need ≥ 2 samples; [5m] is the safe default.
increase() can be fractional.
extrapolation to window edges; expected.
Match function to metric type.
rate/irate/increase → counter. delta/deriv/predict_linear → gauge.
Keep le for histograms.
sum by (le, …) — never drop it.
Watch cardinality.
user IDs, paths, trace IDs as labels = pain.
Subqueries are expensive.
prefer recording rules for hot paths.
Empty result from a / b?
label sets don't line up. Use on()/ignoring().
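The first gotcha (rate inside sum) can be checked numerically. This Python sketch uses made-up sample values and a simplified, non-extrapolating increase. It compares summing per-series increases with taking the increase of an already-summed series when one instance restarts.

```python
def increase(values):
    """Reset-aware increase: a drop is treated as a restart from zero."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total

instance_a = [100, 110, 120, 130]
instance_b = [500, 510, 5, 15]  # restarted between the 2nd and 3rd sample

# sum(increase(...)): each series handles its own reset.
inside = increase(instance_a) + increase(instance_b)

# increase(sum(...)): the summed series drops from 620 to 125, which looks
# like a reset of the whole sum, so 125 is counted as new increase.
summed = [a + b for a, b in zip(instance_a, instance_b)]
outside = increase(summed)

print(inside)   # 55.0
print(outside)  # 165.0
```

In this example the true increase is 55, but aggregating first reports three times that.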

Quick recipes

Error rate %
sum(rate(req{code=~"5.."}[5m]))
/ sum(rate(req[5m])) * 100
CPU usage %
100 - avg by(instance)(
  rate(node_cpu_seconds_total
    {mode="idle"}[5m])) * 100
P95 latency
histogram_quantile(0.95,
  sum by(le)(
    rate(http_dur_bucket[5m])))
"Job is down" alert
up{job="api"} == 0
  or absent(up{job="api"})