Monitors and Correlation Groups
Monitors connect your external monitoring tools to Incido so that service health changes can automatically create, escalate, and resolve incidents without operator intervention. Each monitor receives webhook callbacks from a third-party service and evaluates them to determine whether the monitored endpoint is healthy or unhealthy. When enough monitors in a group report problems, Incido creates an incident on your behalf.
This gives you hands-free incident management for well-understood failure modes, while preserving full manual control for situations that require human judgment.
How monitors work
A monitor is a webhook endpoint that Incido generates for you. You configure your external monitoring tool (Pingdom, Grafana, or any service that can send HTTP webhooks) to call this endpoint when alert state changes. Each incoming request is evaluated against matching rules to determine whether the monitor should transition to Healthy or Unhealthy.
Every monitor has a unique webhook URL that includes an auto-generated secret for authentication. You copy this URL into your monitoring tool's webhook configuration, and from that point on, state changes flow into Incido automatically.
Monitors do not poll external services. They are passive receivers — your monitoring tool pushes status changes to Incido, not the other way around. This means Incido works with any monitoring tool that supports outbound webhooks, and you are not adding load to your infrastructure by having Incido check endpoints.
Monitor types
Incido supports three monitor types, each designed for a different integration scenario.
Pingdom
The Pingdom type is a zero-configuration integration for Pingdom alerting. When Pingdom sends a webhook, Incido evaluates the current_state field in the payload: UP means healthy, DOWN means unhealthy. There is nothing to configure beyond pointing Pingdom's webhook at the monitor URL — the matching rules are built in.
Use this type when you already use Pingdom for uptime monitoring and want Pingdom alerts to flow directly into Incido's incident lifecycle.
Grafana
The Grafana type works the same way as Pingdom but matches Grafana's alert payload format. Incido looks at the status field: resolved means healthy, firing means unhealthy. Like Pingdom, there is no additional configuration required.
Use this type when Grafana alert rules are your primary source of health signals and you want firing alerts to automatically create incidents in Incido.
Generic
The Generic type is for everything else. Instead of hardcoded matching rules, you define your own JSONLogic expressions that evaluate the incoming webhook payload. You write one expression for the healthy condition and one for the unhealthy condition. Incido evaluates the healthy expression first; if it matches, the monitor transitions to Healthy. Otherwise, the unhealthy expression is evaluated.
The request body and query parameters are merged into a single data structure before evaluation, so your expressions can match against any part of the incoming request. Query parameters support dot-notation for nested keys.
For example, if your monitoring tool sends a JSON body like {"status": {"key": "healthy"}}, you would configure the healthy matcher as {"==": [{"var": "status.key"}, "healthy"]} and the unhealthy matcher as {"==": [{"var": "status.key"}, "unhealthy"]}.
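The Generic evaluation order can be sketched in a few lines. This is a minimal illustration supporting only the "==" and "var" JSONLogic operators (a real JSONLogic library supports many more), and it omits the dot-notation expansion of query parameter keys; the function names are assumptions, not Incido's API.

```python
# Minimal sketch of Generic monitor evaluation: merge the request data,
# check the healthy expression first, then the unhealthy expression.

def lookup(data: dict, path: str):
    """Resolve a dotted path like 'status.key' against nested dicts."""
    for part in path.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(part)
    return data

def jsonlogic(rule, data: dict):
    """Evaluate a tiny JSONLogic subset: 'var' lookups and '==' comparisons."""
    if isinstance(rule, dict):
        (op, args), = rule.items()
        if op == "var":
            return lookup(data, args)
        if op == "==":
            return jsonlogic(args[0], data) == jsonlogic(args[1], data)
        raise ValueError(f"unsupported operator: {op}")
    return rule  # literals evaluate to themselves

def evaluate_generic(body: dict, query: dict, healthy_rule, unhealthy_rule):
    data = {**query, **body}            # body and query merged before evaluation
    if jsonlogic(healthy_rule, data):   # healthy expression is checked first
        return "healthy"
    if jsonlogic(unhealthy_rule, data):
        return "unhealthy"
    return None                         # neither matched
```

With the matchers from the example above, a body of {"status": {"key": "healthy"}} evaluates to a Healthy transition, and an unmatched payload returns neither state.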
Use the Generic type when integrating with monitoring tools that are not Pingdom or Grafana, or when you need custom matching logic — for instance, matching on a combination of fields rather than a single status value.
Setting up a monitor
When you create a monitor in the Dashboard, you configure several things that determine how it behaves and what happens when it detects a problem.
Type determines which matching rules are used. Choose Pingdom, Grafana, or Generic based on which monitoring tool is sending the webhooks. If you choose Generic, you will also need to provide the JSONLogic expressions for healthy and unhealthy matching.
Affected components define which parts of your service this monitor represents. When the monitor becomes unhealthy and triggers an incident, these components are marked as affected on your public status page.
Component status defines the impact level that affected components receive when the monitor becomes unhealthy. Choose from Under Investigation, Degraded Performance, Partial Outage, or Full Outage based on the expected customer impact of the failure this monitor detects. This status feeds into the aggregate component health calculation on your status page — if multiple monitors or incidents affect the same component, the most severe status wins.
Correlation group determines how this monitor's state changes are aggregated with other monitors to trigger incidents. Every monitor must belong to exactly one correlation group (more on this below).
Force severity level optionally overrides the correlation group's default severity when this specific monitor triggers an incident. Use this when a particular monitor represents a higher-impact failure than the group default — for example, a database primary failure versus a read replica failure.
Force trigger and force activate allow a single monitor to bypass the correlation group's threshold rules. Force trigger creates an incident immediately when this monitor becomes unhealthy, regardless of how many other monitors in the group are also unhealthy. Force activate goes further and also moves the incident directly to Active stage, skipping triage. Use these sparingly — they are designed for monitors where any failure is severe enough to warrant immediate incident creation.
After saving, copy the generated webhook URL (including the secret) and configure it in your external monitoring tool. The monitor starts in Healthy state and will transition when it receives its first webhook callback.
The number of monitors your organization can create is governed by your billing plan.
Correlation groups
A correlation group ties multiple monitors together and defines the rules for when their collective state should trigger, activate, and resolve incidents. Without correlation groups, every single monitor failure would create a separate incident. With them, you can model realistic failure scenarios: "create an incident when at least 3 of our 10 API endpoint monitors report unhealthy" is more useful than creating 10 separate incidents.
Every correlation group has both a human-readable name and a stable group key. The name is for operators reading the dashboard; the key is the technical identifier used by the API and by organization configuration import/export. Keep keys stable over time even if you rename the group for clarity, because imported monitor definitions resolve their target group by key, not by display name.
Threshold configuration
Every correlation group has three thresholds that control its behavior:
Trigger threshold defines how many monitors in the group must be unhealthy before an incident is created. Set this to 1 if any single failure warrants an incident, or higher if you want to tolerate partial degradation before alerting customers. This threshold must be at least 1.
Activation threshold is optional and enables a two-phase approach. When set, the incident is first created in Triage stage (giving your team time to assess), and only moves to Active when the unhealthy count reaches this higher threshold. This is useful when you want early internal awareness without immediately publishing a customer-facing incident. The activation threshold must be equal to or higher than the trigger threshold.
Resolution threshold defines the maximum number of monitors that can remain unhealthy while still allowing automatic resolution. When automatic resolution is enabled and the unhealthy count drops to or below this threshold, the incident is automatically resolved. This must be lower than the trigger threshold; otherwise an incident could auto-resolve while the unhealthy count still satisfied the trigger condition.
Incident template
Each correlation group defines a template for the incidents it creates. This includes a translatable title and public summary, an internal summary, severity level, initial incident status, tags, and metadata. When a monitor triggers incident creation, Incido uses this template to populate the new incident record.
Invest time in writing a good incident template. The title and public summary appear on your status page the moment the incident is created, so they should describe the problem in customer-friendly language. If no title or summary is provided, Incido generates defaults from the correlation group name.
The initial incident status determines which lifecycle stage the incident starts in. If you set it to a Triage-stage status, incidents start in Triage and can later be activated (manually or by reaching the activation threshold). If you set it to an Active-stage status, incidents start in Active immediately and the activation threshold has no effect.
Automatic resolution
When automatic resolution is enabled and enough monitors recover, Incido resolves the incident automatically. Resolution is blocked while any force-trigger monitor is still unhealthy, which prevents premature resolution when a critical signal is still failing even though the aggregate count has dropped.
Automatic resolution transitions the incident toward Post Incident or Closed (depending on available workflow transitions) and enables subscriber notifications so customers know the issue is recovering.
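The resolution check therefore has two gates beyond the count itself. A hedged sketch, assuming a simple list-of-dicts monitor representation (the field names are illustrative):

```python
# Sketch of the automatic-resolution conditions: the feature must be
# enabled, no force-trigger monitor may still be unhealthy, and the
# unhealthy count must be at or below the resolution threshold.

def can_auto_resolve(monitors: list[dict], resolution_threshold: int,
                     auto_resolve_enabled: bool) -> bool:
    if not auto_resolve_enabled:
        return False
    unhealthy = [m for m in monitors if not m["healthy"]]
    if any(m["force_trigger"] for m in unhealthy):
        return False  # a critical signal is still failing
    return len(unhealthy) <= resolution_threshold
```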
How monitors and correlation groups create incidents
When a monitor becomes unhealthy, Incido checks whether an ongoing incident already exists for that monitor's correlation group. If no incident exists and the unhealthy monitor count meets the trigger threshold (or the monitor has force trigger enabled), a new incident is created using the correlation group's template.
If an ongoing incident already exists, the unhealthy monitor is added to the tracking record. New affected components from the monitor are added to the incident, and severity is escalated if the monitor's force severity level is higher priority than the current incident severity.
When a monitor recovers to healthy, Incido updates the tracking record and evaluates whether automatic resolution conditions are met. Each monitor's contribution to the incident is recorded with timestamps, so you can see after the fact exactly which monitors triggered, escalated, and resolved the event.
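The attach-and-escalate step for an ongoing incident can be pictured as follows. This is a sketch under assumed data shapes: the dict fields and the severity ranking are illustrative, not Incido's data model.

```python
# Illustrative handling of an unhealthy monitor joining an ongoing
# incident: record the monitor, union in new affected components, and
# escalate severity only when the forced level outranks the current one.

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}  # assumed levels

def attach_unhealthy_monitor(incident: dict, monitor: dict) -> dict:
    incident["monitors"].append(monitor["id"])
    for component in monitor["affected_components"]:
        if component not in incident["affected_components"]:
            incident["affected_components"].append(component)
    forced = monitor.get("force_severity")
    if forced and SEVERITY_RANK[forced] > SEVERITY_RANK[incident["severity"]]:
        incident["severity"] = forced  # escalate, never downgrade
    return incident
```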
If the incident's affected components overlap with an active maintenance that has incident suppression enabled, the incident is created with the workflow's suppression status instead of the normal initial status. This means monitor-triggered incidents respect maintenance suppression rules automatically — see the incident suppression section in the maintenance guide for details.
What changes on the public frontend
Monitors themselves are never visible on public status pages — customers do not see monitor names, webhook URLs, or health states. What customers see are the incidents that monitors create and the component status changes that result from those incidents.
When a monitor-triggered incident is published to a status page, it looks identical to a manually created incident. The title, severity, affected components, and timeline updates all appear the same way. If you write good incident templates on your correlation groups, the automatic communication will be indistinguishable from manual operator communication.
Component health indicators on the public status page reflect the aggregate impact of all active incidents and maintenances. When monitors drive incident creation, component impact comes from the worst configured monitor impact across unhealthy monitors in the correlation group, combined with any other overlapping incidents or maintenances.
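The "most severe status wins" aggregation amounts to taking a maximum over an ordered list of impact levels. A minimal sketch, using the impact levels named in this guide (the enum-style values and the neutral "operational" baseline are assumptions for the example):

```python
# Sketch of worst-status aggregation for a component's public health
# indicator. Ordered from least to most severe.

SEVERITY_ORDER = [
    "operational",
    "under_investigation",
    "degraded_performance",
    "partial_outage",
    "full_outage",
]

def aggregate_status(statuses: list[str]) -> str:
    """Return the worst status among all active incidents and maintenances."""
    if not statuses:
        return "operational"
    return max(statuses, key=SEVERITY_ORDER.index)

aggregate_status(["degraded_performance", "partial_outage"])  # -> "partial_outage"
```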
Operational effects
Monitor behavior directly affects the speed and quality of automated communication. Threshold tuning controls whether customers hear about single-point failures immediately or only when wider disruption is confirmed. Correlation templates determine the initial public message quality, so weak template content can produce low-value notifications at scale. Keep this page aligned with Incidents, Maintenances, and Status pages so automated behavior matches the communication rules your team expects.
Audit trail
Every monitor status transition (healthy to unhealthy and back) is recorded with timestamps and duration. This gives you a historical trail of how long each monitor spent in each state, which is valuable for reliability reporting and for understanding patterns in failures.
Audit entries are subject to your organization's data retention policy. Records older than the retention period are automatically cleaned up.
Troubleshooting
A monitor is not receiving webhooks. Verify the webhook URL in your external monitoring tool matches the URL shown in the Dashboard, including the secret. Ensure the monitor is enabled — disabled monitors reject incoming webhooks. Check that your monitoring tool is actually sending requests (most tools have webhook delivery logs).
A webhook returns a 422 error. The incoming request did not match either the healthy or unhealthy expression. For Pingdom and Grafana types, verify that the payload format matches what Incido expects (current_state for Pingdom, status for Grafana). For Generic monitors, test your JSONLogic expressions against the actual payload your tool sends.
An incident was not created when a monitor went unhealthy. Check the correlation group's trigger threshold — the unhealthy monitor count may not have reached it yet. Verify the monitor is assigned to a correlation group and that the group is correctly configured. If the monitor has force trigger disabled, a single failure will not create an incident unless the threshold is 1.
A monitor import cannot find its correlation group. This usually means the imported monitor references a correlation group key that does not exist in the target organization. Compare the key in the import payload with the key configured on the destination correlation group. Renaming the group in the dashboard does not update external payloads automatically, so stale payload keys must be updated before retrying the import.
An incident was not automatically resolved. Confirm that automatic resolution is enabled on the correlation group. Check whether any force-trigger monitor is still unhealthy, because this blocks automatic resolution regardless of the count. Verify that the remaining unhealthy count is at or below the resolution threshold.
Component status does not match expectations. Component status is calculated from the worst impact across all unhealthy monitors in the group and any other overlapping incidents or maintenances. If a component shows as more degraded than a single monitor would suggest, another active incident or maintenance is likely contributing.