Before you can work with alerts, configure them in the
ppem-manager.yml manager configuration file.
You can specify the following parameters:
alerts:
  metrics:
    request_chunk_size: number_of_instance_IDs
    cleanup_grace_period: alert_cleanup_interval_if_no_data_is_received
  scheduler:
    interval: interval_for_checking_new_alerts
    initial_delay: delay_for_starting_alert_scheduler
    timeout: timeout_for_updating_alert_trigger_rules
  delayed_data:
    is_enabled: true or false
    data_delay: default_data_arrival_delay_for_all_sources
    datasource_delays:
      metrics: delay_for_metrics_arrival
      logs: delay_for_log_arrival
    max_delay: maximum_allowed_data_arrival_delay
    is_adaptive_delay: true or false
  notifier:
    num_workers: number_of_concurrent_workers
    worker_batch_size: number_of_alerts_in_one_batch
    worker_interval: interval_for_checking_new_alerts
    backoff_base: exponential_backoff_calculation_duration
    max_retries: maximum_number_of_alert_attempts
    notification_timeout: alert_timeout
    janitor_interval: janitor_worker_polling_interval
    stale_processing_timeout: stale_alert_processing_timeout
  email:
    is_enabled: true or false
    smtp:
      host: SMTP_server_hostname_or_IP
      port: SMTP_server_port
      username: username_for_SMTP_server_authentication
      password: password_for_SMTP_server_authentication
      from: alert_sender_email
      timeout: SMTP_server_connection_timeout
      use_starttls: true or false
      use_ssl: true or false
      tls:
        insecure_skip_verify: true or false
        root_ca_path: path_to_root_CA
Where:
metrics: The parameters for sending requests to the
metrics plugin.
request_chunk_size: The maximum number of
instance IDs within one request.
Default value: 100.
cleanup_grace_period: The interval after
which alerts are cleaned up if no data is received.
Default value: 6h.
scheduler: The parameters of the scheduler that
updates alerts in the manager memory.
interval: The interval for
the scheduler to check for new alerts to process.
Default value: 50s.
initial_delay: The delay before the scheduler starts
for the first time after PPEM starts.
Default value: 10s.
timeout: The scheduler timeout for updating
alert trigger rules.
Default value: 10m.
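As an illustration, the metrics and scheduler groups could be written out with their documented default values as in the fragment below. This is only a sketch: it assumes both groups are nested directly under alerts, and setting a parameter to its default should not change the behavior of PPEM.

```yaml
alerts:
  metrics:
    request_chunk_size: 100     # at most 100 instance IDs per request (default)
    cleanup_grace_period: 6h    # clean up alerts after 6 hours without data (default)
  scheduler:
    interval: 50s               # check for new alerts every 50 seconds (default)
    initial_delay: 10s          # first run 10 seconds after PPEM starts (default)
    timeout: 10m                # timeout for updating alert trigger rules (default)
```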
delayed_data: The parameters for managing
delayed metrics and logs with unknown delay time.
is_enabled: Specifies whether delayed data
management parameters are enabled.
Possible values:
true
false
If true is specified, PPEM
checks for delayed metrics and logs.
Default value: false.
data_delay: The default data delay for all
data sources when specific delays are not configured.
Default value: 180s.
datasource_delays: The data delay for
specific data sources. This parameter allows specifying different
delays for metrics and logs as they may arrive at different
rates.
Nested parameters:
metrics: The delay for metrics arrival, in
seconds. Metrics typically have more consistent
collection intervals but may be delayed due to network or
processing issues.
logs: The delay for log arrival, in
seconds. Logs may arrive more frequently but with higher
variability in timing due to log rotation and processing.
max_delay: The maximum allowed data delay. Data that is
older than this value is ignored to prevent false alerts
from stale data.
Default value: 600s.
is_adaptive_delay: Enables or disables the
adaptive delay learning based on observed data arrival patterns.
Possible values:
true
false
When enabled, PPEM learns the actual delays from data timestamps and adjusts the lookback window dynamically.
Default value: true.
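A possible delayed_data configuration is sketched below. It makes several assumptions: that delayed_data is nested directly under alerts, that the delays under datasource_delays accept the same duration format as data_delay, and that the values chosen for metrics and logs are reasonable. No defaults are documented for these two, so they are purely illustrative.

```yaml
alerts:
  delayed_data:
    is_enabled: true          # the default is false
    data_delay: 180s          # default; used when no source-specific delay is set
    datasource_delays:
      metrics: 120s           # illustrative value, format assumed
      logs: 300s              # illustrative value, format assumed
    max_delay: 600s           # default; older data is ignored
    is_adaptive_delay: true   # default; delays are learned from data timestamps
```

In this sketch both source-specific delays stay below max_delay, so data that arrives with the expected delay is not discarded as stale.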
notifier: The parameters of the notifier that sends
alerts.
num_workers: The number of concurrent
workers that will send alerts.
Default value: 5.
worker_batch_size: The number of alerts
processed by workers in one batch.
Default value: 20.
worker_interval: The polling interval for
workers to check for new alerts in the
repository database.
Default value: 30s.
backoff_base: The base duration for the
exponential backoff calculation when resending a failed alert.
The delay before resending the alert is calculated as:
backoff_base × (2^number_of_retry_attempts).
Default value: 10s.
max_retries: The maximum number of attempts
to resend a failed alert.
Default value: 3.
notification_timeout: The maximum amount of
time for the notifier to wait for an alert to be sent
before considering it failed.
Default value: 20s.
janitor_interval: The polling interval for the
janitor worker that cleans up alerts stuck in the processing state.
Default value: 1m.
stale_processing_timeout: The amount of
time after which alerts stuck in the processing state are considered
stale and are reset by the janitor worker.
Default value: 10m.
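To make the backoff formula concrete, the fragment below writes out the notifier group with its documented defaults and works through the resend delay in comments. Placing notifier directly under alerts is an assumption. The text does not say whether number_of_retry_attempts starts at 0 or at 1, so the comments list the value of the formula for each exponent rather than claiming which delays PPEM actually applies.

```yaml
alerts:
  notifier:
    num_workers: 5
    worker_batch_size: 20
    worker_interval: 30s
    backoff_base: 10s
    # Resend delay = backoff_base × (2^number_of_retry_attempts):
    #   number_of_retry_attempts = 0  ->  10s × 1 = 10s
    #   number_of_retry_attempts = 1  ->  10s × 2 = 20s
    #   number_of_retry_attempts = 2  ->  10s × 4 = 40s
    #   number_of_retry_attempts = 3  ->  10s × 8 = 80s
    max_retries: 3
    notification_timeout: 20s
    janitor_interval: 1m
    stale_processing_timeout: 10m
```

Because the delay doubles with each attempt, raising backoff_base or max_retries lengthens the total time a failed alert can spend in retries considerably.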
email: The parameters for sending alerts via email.
is_enabled: Specifies whether alerts are sent
via email.
Possible values:
true
false
If false is specified, alerts are logged instead of
being sent via email.
Default value: false.
smtp: The parameters of the SMTP server used
for sending alerts.
host: The hostname or IP address of the
SMTP server.
Default value: localhost.
port: The port number of the SMTP server.
Default value: 25.
username: The username for authenticating
with the SMTP server.
Default value: "".
password: The password for authenticating
with the SMTP server.
Default value: "".
from: The email address of the alert sender.
Default value: admin@localdomain.local.
timeout: The SMTP server connection timeout.
Default value: 10s.
use_starttls: Specifies whether the
STARTTLS extension is used for
securing the SMTP server connection.
Possible values:
true
false
Default value: false.
use_ssl: Specifies whether the SSL/TLS
protocol is used for the SMTP server connection.
Possible values:
true
false
Default value: false.
tls: The TLS protocol parameters.
insecure_skip_verify: Specifies whether
the client skips the verification of the certificate chain
and hostname of the SMTP server.
Possible values:
true
false
Default value: false.
Setting this parameter to true is
a security risk. Use it only for testing purposes or on
trusted networks.
root_ca_path: The path to the CA certificate
used for verifying the certificate of the SMTP server.
Default value: "".
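The fragment below sketches one way to enable email alerts over a STARTTLS-protected connection. The host name, credentials, sender address, and CA path are hypothetical placeholders, port 587 is only the port commonly used for mail submission with STARTTLS, and the nesting of tls under smtp is an assumption.

```yaml
alerts:
  email:
    is_enabled: true                  # the default is false (alerts are only logged)
    smtp:
      host: smtp.example.com          # hypothetical server
      port: 587                       # illustrative; the default is 25
      username: ppem-alerts           # hypothetical account
      password: change-me             # hypothetical placeholder
      from: ppem-alerts@example.com   # hypothetical sender address
      timeout: 10s                    # default
      use_starttls: true
      use_ssl: false
      tls:
        insecure_skip_verify: false   # keep certificate verification enabled
        root_ca_path: /etc/ssl/certs/smtp-ca.pem   # hypothetical path
```

Keeping insecure_skip_verify set to false and pointing root_ca_path at the CA that issued the server certificate avoids the security risk described above.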