prometheus Exporter Issues
The prometheus exporter is not designed to handle and transmit large volumes of metrics.
Consider a scenario with a Postgres Pro database containing 10,000 tables and 10,000 indexes, using the following extended plugin set:
hostmetrics: cpu (utilization), disk, filesystem, load, memory, network, paging, processes
postgrespro: activity, archiver, bgwriter, bloat_indexes, bloat_tables, cache, databases, functions, indexes, io, locks, replication, replication_slots, tables, tablespaces, version, wal
The expected load in this scenario looks as follows:
collector RAM usage — at least 3 GiB
time to fully load the metrics page — 8-10 seconds
CPU load — 30-50% in conducted tests (1 core)
If the server has less than 3 GiB of RAM available, the collector may be terminated by the OOM killer, which is likely to select it over other processes.
The collector generates over 390,000 metric records in this configuration.
Use the table below to estimate the number of metrics.
| Plugin Name | Number of Metrics Generated per Object |
|---|---|
| tables | 31 per table |
| indexes | 6 per index (+1 if invalid) |
| bloat_tables | 1 per table |
| bloat_indexes | 1 per index |
Thus, for 100,000 tables and 100,000 indexes, the number of metrics would be at least 3,900,000 for the plugins listed above.
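The per-object counts above can be turned into a quick estimate. The sketch below sums only the four plugins from the table (the "+1 if invalid" case for indexes is left out), which is why the scenario's total is reported as "over" this figure:

```python
# Per-object metric counts taken from the table above.
PER_TABLE = 31 + 1  # tables (31) + bloat_tables (1)
PER_INDEX = 6 + 1   # indexes (6) + bloat_indexes (1)

def estimate_metrics(tables: int, indexes: int) -> int:
    """Lower-bound estimate of metric records for the four table/index plugins."""
    return tables * PER_TABLE + indexes * PER_INDEX

print(estimate_metrics(10_000, 10_000))    # 390000 (the scenario above, table/index plugins only)
print(estimate_metrics(100_000, 100_000))  # 3900000
```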
When transmitting hundreds of thousands of metrics through the prometheus exporter using the pull model, you may encounter the following error in pgpro-otel-collector logs:
{
"level": "error",
"ts": "2025-09-05T17:40:25.575+0300",
"msg": "error encoding and sending metric family: write tcp 127.0.0.1:8889->127.0.0.1:44930: write: broken pipe\n",
"resource": {
"service.instance.id": "62cc1e9c-a53f-423e-9c6f-41b1f6a0872a",
"service.name": "pgpro-otel-collector",
"service.version": "v0.4.0"
},
"otelcol.component.id": "prometheus",
"otelcol.component.kind": "exporter",
"otelcol.signal": "metrics"
}
This error indicates that prometheus could not scrape the metrics within its allocated timeout period. Use the following workarounds to fix the problem:
Increase Timeout
In the prometheus configuration, specify a larger timeout than the default value:
global:
  scrape_interval: 15s  # Default = 1m
  scrape_timeout: 15s   # Increase timeout globally or in a specific scrape_config (default = 10s)
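To decide whether a larger timeout will help, it is worth measuring how long a full download of the metrics payload actually takes. The sketch below times such a download against a stand-in local HTTP server (hypothetical payload and port); in practice you would point it at the real collector endpoint, such as http://127.0.0.1:8889/metrics from the log above:

```python
# Sketch: time a full download of a /metrics page to compare against
# prometheus scrape_timeout. The local server below is a stand-in for a
# real pgpro-otel-collector endpoint (hypothetical payload and port).
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def measure_scrape_seconds(url: str) -> float:
    """Download the whole metrics payload and return the elapsed wall time."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()  # the full body must arrive for the scrape to succeed
    return time.monotonic() - start

class FakeMetrics(BaseHTTPRequestHandler):
    """Serves a Prometheus-style text payload in place of a real collector."""
    def do_GET(self):
        body = b"pg_tables_rows_total 42\n" * 1000
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), FakeMetrics)
threading.Thread(target=server.serve_forever, daemon=True).start()

elapsed = measure_scrape_seconds(f"http://127.0.0.1:{server.server_port}/metrics")
print(f"scrape took {elapsed:.3f}s")  # compare against scrape_timeout (default 10s)
server.shutdown()
```

If the measured time regularly approaches the configured scrape_timeout, either raise the timeout as shown or reduce the metrics volume.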
Reduce Metrics Volume
To reduce the overall volume of transmitted metrics, configure collection from specific objects:
receivers:
  postgrespro:
    plugins:
      tables:
        enabled: true
        databases:
          - name: database_name
            schemas:
              - name: schema_name
                tables:
                  - name: table_name
      indexes:
        enabled: true
        databases:
          - name: database_name
            schemas:
              - name: schema_name
                tables:
                  - name: table_name
                    indexes:
                      - name: index_name
      bloat_tables:
        enabled: true
        fetcher:
          batch_size: 10000
          collection_interval: 5m
        databases:
          - name: database_name
            schemas:
              - name: schema_name
                tables:
                  - name: table_name
      bloat_indexes:
        enabled: true
        fetcher:
          batch_size: 10000
          collection_interval: 5m
        databases:
          - name: database_name
            schemas:
              - name: schema_name
                tables:
                  - name: table_name
                    indexes:
                      - name: index_name
Use Denylists
If the previous method requires specifying too many objects, use a denylist to exclude specific objects instead. For implementation examples, refer to Section 6.6.5.
Increase Resources
If all collected metrics are required, allocate more CPU resources to
pgpro-otel-collector — for example, when
the /metrics page loads too slowly due to
insufficient server resources.