Incidents

An incident is the operational record for unplanned service disruption. When something is broken, degraded, or under investigation and your customers need to know about it, an incident is how you communicate what is happening, what you are doing about it, and when they can expect resolution.

The core idea is straightforward: acknowledge a problem, keep customers informed as you work through it, and close the record when everything is stable again. Incidents can originate in three ways — an operator creates one manually through the Dashboard, an external system creates one through the API, or the monitoring system creates one automatically when alert conditions are met. Regardless of source, every incident follows the same lifecycle and appears the same way on your public status pages.

When to create an incident

Create an incident when disruption is unplanned or uncertain. Typical triggers include elevated error rates, partial regional outages, degraded API latency, authentication failures, or any customer-visible behavior that requires active investigation.

If work is planned in advance, use a maintenance instead. When planned work causes unexpected additional impact, it is perfectly valid to have both: a maintenance for the scheduled window and an incident for the unexpected failure. Incido tracks the relationship between the two, including whether the incident was automatically suppressed during the maintenance window (see the incident suppression section in the maintenance guide for details).

The incident lifecycle

The incident lifecycle consists of four stages: Triage, Active, Post Incident, and Closed. Understanding them matters because each stage transition can change what appears on your public status page, affect component health indicators, and dispatch notifications to your subscribers.

Triage is the intake stage. The issue is acknowledged, but impact and root cause are still being validated. Triage is primarily for internal coordination: you can begin public communication from here, but entering Triage does not update component health or fire any other stage-level side effects. Use it to gather information, align your team, and decide on severity before committing to a public narrative.

Active means disruption is confirmed and response is underway. When an incident enters Active, the activation timestamp is recorded and each affected component on your status page shifts to its configured impact level (for example, Degraded Performance or Partial Outage), visible to anyone viewing the page. This is the stage where your public communication matters most, because customers are actively experiencing the problem.

Post Incident means customer impact has ended, but your team is still verifying stability or preparing closure communication. Affected components return to Operational status at this point (assuming no other active incidents or maintenances affect them). This is a useful intermediate state: it tells customers that the disruptive phase is over while giving your team breathing room to write a summary, run verification checks, or prepare a post-mortem before declaring everything complete.

Closed is terminal. No further status transitions or updates are possible. The incident remains visible in historical views on the status page, but it no longer affects component health or active incident indicators.

You do not have to move through every stage in order. The Dashboard allows you to skip stages when the situation calls for it — a low-severity issue might go directly from Triage to Closed after quick validation, and a severe event might skip Triage entirely and start in Active if the initial status is configured that way. The Dashboard enforces valid transitions but does not force a linear path.
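The stage rules above can be summarized in a few lines of Python. This is only an illustrative sketch: the `Stage` enum, the `ALLOWED` set, and both helper functions are hypothetical names rather than part of Incido, and the example transition set stands in for whatever your organization has configured.

```python
from enum import Enum


class Stage(Enum):
    TRIAGE = "Triage"
    ACTIVE = "Active"
    POST_INCIDENT = "Post Incident"
    CLOSED = "Closed"


# Hypothetical example configuration of allowed stage-to-stage moves.
# In practice transitions are defined between statuses, not bare stages.
ALLOWED = {
    (Stage.TRIAGE, Stage.ACTIVE),
    (Stage.TRIAGE, Stage.CLOSED),         # low-severity issue closed after validation
    (Stage.ACTIVE, Stage.POST_INCIDENT),
    (Stage.ACTIVE, Stage.CLOSED),
    (Stage.POST_INCIDENT, Stage.CLOSED),
}


def affects_component_health(stage: Stage) -> bool:
    """Only an incident in the Active stage contributes to component health."""
    return stage is Stage.ACTIVE


def can_transition(current: Stage, target: Stage) -> bool:
    """Closed is terminal; every other move must be in the configured set."""
    if current is Stage.CLOSED:
        return False
    return (current, target) in ALLOWED
```

With this example configuration, `can_transition(Stage.TRIAGE, Stage.CLOSED)` is allowed even though it skips two stages, while nothing can leave `Stage.CLOSED`.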

Creating an incident

Open the Incidents section in the Dashboard and create a new entry. Several fields shape both how your team coordinates the response and what your customers see on the public status page.

A title that clearly describes the affected surface and the symptom in plain language. "API authentication failures in EU region" tells customers and operators what is happening. "Auth issue" does not. The title is the first thing customers read, so invest the extra few seconds to make it specific and informative.

A severity level that reflects customer impact, not engineering effort. Your organization defines its own severity levels with names, colors, and icons, and they appear prominently on the public status page. Customers use severity as a signal for how seriously to take the disruption and whether they need to activate their own contingency plans. You can change severity later as the situation evolves — downgrade when impact narrows, escalate when it widens.

A customer-facing summary that explains what customers are experiencing, not what your internal systems are doing. Be specific about which functionality is affected and what workarounds exist, if any. Write this for the people who depend on your service: "Credit card payments are failing for approximately 30% of transactions; PayPal and bank transfer remain available" gives customers something to act on. "Payment system experiencing issues" does not.

An internal summary that captures context your team needs but customers should not see — investigation notes, hypotheses, runbook links, rollback plans, and escalation contacts. This never appears on the public status page.

Affected components that tell customers which parts of your service are involved. Choose these carefully, because component selection drives what customers see on the status page and how aggregate service health is calculated. Each affected component carries an impact level — Under Investigation, Degraded Performance, Partial Outage, or Full Outage. When multiple incidents and maintenances overlap on the same component, the public status page shows the most severe impact across all of them.
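The "most severe impact wins" rule can be pictured as taking a maximum over ordered impact levels. In the sketch below, the `Impact` enum and `public_impact` function are hypothetical, and the numeric ordering is an assumption based on the order in which the levels are listed above; how maintenance impact is ranked against these levels is not covered here.

```python
from enum import IntEnum


class Impact(IntEnum):
    # Assumed ordering, least to most severe.
    OPERATIONAL = 0
    UNDER_INVESTIGATION = 1
    DEGRADED_PERFORMANCE = 2
    PARTIAL_OUTAGE = 3
    FULL_OUTAGE = 4


def public_impact(overlapping: list) -> Impact:
    """Return the most severe impact among all records affecting one component."""
    # With nothing affecting the component, it is shown as Operational.
    return max(overlapping, default=Impact.OPERATIONAL)
```

For example, a component affected by one incident at Degraded Performance and another at Full Outage would be shown as Full Outage until the more severe record stops affecting it.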

While you are choosing components on the raise-incident form, the Dashboard may show a short ongoing-maintenance notice if a selected component is currently covered by a maintenance that is in progress (both the maintenance itself and that component’s entry within it are marked in progress). The notice lists the maintenance title and links straight to the maintenance record so you can confirm scope, timing, and any incident-suppression settings before you publish anything. It exists only to reduce surprises; it does not stop you from saving the incident when you still need a separate record for unexpected impact.

A deduplication key that represents the underlying problem. Every incident requires one — a short lowercase identifier using letters, numbers, and dashes, between 5 and 40 characters. If an active or triaging incident with the same key already exists in your organization, Incido returns the existing incident instead of creating a duplicate. This is especially important during noisy failure periods when monitors or API integrations might fire multiple creation requests for the same root cause. Choose a key that describes the problem, not the symptom — for example, eu-db-primary-down rather than high-error-rate-api.
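The key format and the "return the existing incident" behavior can be sketched as follows. This is a minimal illustration, not Incido's implementation: the regular expression encodes the stated rule (lowercase letters, numbers, and dashes, 5 to 40 characters) and assumes dashes are allowed anywhere in the key, and the in-memory list and `find_or_create` helper are hypothetical.

```python
import re

# Lowercase letters, digits, and dashes; 5 to 40 characters in total.
KEY_PATTERN = re.compile(r"[a-z0-9-]{5,40}")


def is_valid_dedup_key(key: str) -> bool:
    return KEY_PATTERN.fullmatch(key) is not None


def find_or_create(incidents: list, key: str, title: str) -> dict:
    """Return the open incident with this key, or create a new one in Triage."""
    if not is_valid_dedup_key(key):
        raise ValueError(f"invalid deduplication key: {key!r}")
    for incident in incidents:
        # Only incidents still in Triage or Active absorb duplicates.
        if incident["key"] == key and incident["stage"] in ("Triage", "Active"):
            return incident
    created = {"key": key, "title": title, "stage": "Triage"}
    incidents.append(created)
    return created
```

Two creation requests with the key `eu-db-primary-down` return the same record while it is in Triage or Active; once that record has left those stages, a later request with the same key creates a new incident in this sketch.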

After saving, the incident is created in the stage that corresponds to its initial status (typically Triage). From there, you can begin publishing updates and transitioning through the lifecycle.

Working with statuses and transitions

Your organization's statuses sit inside the fixed stage structure. A status like "Investigating" or "Identified" belongs to a specific stage, and your organization defines which transitions are allowed between them. Each status also carries its own public label, public color, and public icon for customer-facing communication. That means your internal workflow wording does not have to match what customers see. You might keep an internal status named Active for operators, while showing Identified publicly because that is the phrase your customers already understand from other status pages.

Moving between statuses within the same stage changes only the workflow label and the public presentation tied to that status — for example, progressing from one Active-stage status to another without triggering a stage change. This kind of transition updates the customer-facing badge and timeline wording but does not trigger component recalculation or stage-level side effects.

Moving to a status that belongs to a different stage triggers the full stage transition with all its consequences: timestamps are recorded, affected components and public visibility are updated, and subscriber notifications may be dispatched. Think of stage transitions as communication milestones. Move to Active when you are ready to tell customers that disruption is confirmed and response is underway. Move to Post Incident when you are confident that customer impact has ended. Move to Closed when follow-up is complete and you are ready to archive the event.
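The difference between the two kinds of transition comes down to whether the stage changes. The sketch below uses a hypothetical `Status` record and example statuses to show the check; the field names are illustrative and not Incido's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    name: str          # internal workflow name shown to operators
    stage: str         # fixed stage this status belongs to
    public_label: str  # wording customers see on the status page


def is_stage_transition(current: Status, target: Status) -> bool:
    """True when the move crosses a stage boundary and fires stage-level side effects."""
    return current.stage != target.stage


# Example statuses; an internal name can differ from its public label.
investigating = Status("Investigating", "Triage", "Investigating")
identified = Status("Active", "Active", "Identified")
mitigating = Status("Mitigating", "Active", "Mitigating")
```

Moving from `investigating` to `identified` crosses from Triage to Active and is a full stage transition, while moving from `identified` to `mitigating` only changes the public badge and timeline wording.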

When you open a transition that moves an incident into Active, the Dashboard may show a warning if every affected component is still marked Under Investigation. This is a prompt to pause and review whether those components should already be set to a concrete impact level such as Degraded Performance or Full Outage before you confirm the transition. The warning does not block the action. It is there to help you avoid activating an incident while leaving the public component state overly vague for customers who need a clear picture of current impact.
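The condition behind this warning is simple enough to state in code. The function below is a hypothetical sketch: it takes a mapping from component name to impact level and assumes no warning is shown when the incident has no affected components at all, which the text above does not specify.

```python
UNDER_INVESTIGATION = "Under Investigation"


def should_warn_on_activation(component_impacts: dict) -> bool:
    """Warn when every affected component is still Under Investigation."""
    impacts = list(component_impacts.values())
    # Assumption: with no affected components there is nothing to warn about.
    return bool(impacts) and all(i == UNDER_INVESTIGATION for i in impacts)
```

A single component set to a concrete level such as Partial Outage is enough to suppress the warning in this sketch.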

The Dashboard shows one action per available transition from the current status, so you always see exactly which moves are valid. If a transition you expect is missing, check your organization's status transition configuration.

Shortcut actions

The Dashboard offers two shortcut actions for the most common transition patterns, so you can move quickly during an active response without scanning through individual transitions.

Activate finds a valid transition to an Active-stage status and applies it. This is the fastest way to escalate an incident from Triage when you have confirmed customer impact. The shortcut automatically enables subscriber notifications on the update it creates.

Resolve prefers a transition to Post Incident and falls back to Closed if no Post Incident transition is available. Like Activate, it enables subscriber notifications automatically. Use this when customer impact has ended and you are ready to signal recovery.

If no valid transition exists for either shortcut, Incido does not fail silently — it records an internal comment explaining why the action could not be performed, so you always have a trail. This typically means your status transition configuration does not include a path from the current status to the target stage.
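The selection logic of both shortcuts can be sketched as a search over the transitions available from the current status. Everything here is hypothetical: the function names, the dictionary shapes, the status names in the usage note, and the wording of the internal comment are illustrative only.

```python
from typing import Optional

# Stages each shortcut looks for, in order of preference.
PREFERRED_STAGES = {
    "activate": ["Active"],
    "resolve": ["Post Incident", "Closed"],  # Closed is the fallback
}


def pick_target(current: str, transitions: dict, stage_of: dict,
                preferred: list) -> Optional[str]:
    """Return the first reachable status in the most preferred stage, if any."""
    candidates = transitions.get(current, [])
    for stage in preferred:
        for status in candidates:
            if stage_of[status] == stage:
                return status
    return None


def run_shortcut(name: str, current: str, transitions: dict, stage_of: dict) -> dict:
    target = pick_target(current, transitions, stage_of, PREFERRED_STAGES[name])
    if target is None:
        # No silent failure: leave a trail explaining why nothing happened.
        comment = f"{name} shortcut: no valid transition from '{current}'"
        return {"transitioned": False, "internal_comment": comment}
    # Both shortcuts enable subscriber notifications on the update they create.
    return {"transitioned": True, "status": target, "notify_subscribers": True}
```

Given a configuration where a hypothetical Identified status can move to either Resolved (Closed stage) or Monitoring (Post Incident stage), `run_shortcut("resolve", ...)` picks Monitoring; if only Resolved is reachable, it falls back to that.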

Publishing updates

Incident updates build the public timeline that customers read as your official narrative of the event. Every published update is timestamped in that timeline, giving customers a chronological record of how the situation evolved. Message-only updates appear as their own Update entries, while an update that also changes the public status is folded into the same milestone row so customers can see both the status change and the explanation together.

When writing an update, answer three questions: what is happening right now, what changed since the last update, and when should customers expect the next communication. Keep language concrete and time-aware. "We mitigated elevated error rates for most requests; elevated latency remains for file uploads; next update in 20 minutes" gives customers something to act on. "We are continuing to investigate" does not.

An update can optionally include a status transition. When you pair a message with a transition, both the narrative update and the stage change appear together in the timeline, giving customers a single coherent entry that explains both what happened and what it means.

Use the subscriber notification toggle intentionally. Send notifications when the update changes customer decisions — new impact, escalation, partial mitigation, full recovery, or an unexpected delay. Skip notifications for minor internal progress notes that do not change what customers should do. Notifications build trust when they carry meaningful information, but erode it when they become noise.

Internal comments are separate from public updates and never appear on status pages or in the public timeline. Use them for team coordination, colleague mentions, and operational detail that should stay internal.

When you mention a teammate in an internal incident comment, Incido sends that person an email notification after the comment is saved. The email includes who wrote the comment, embeds the full sanitized comment content, and provides a direct button back to the incident detail page in the Dashboard. The subject uses the incident title (Incident: {title}), and email thread headers are set so repeated mention emails for the same incident can group together in most mail clients. This makes mentions useful even when responders are away from the Dashboard, while keeping all communication tied to the same incident record.

Monitor-created incidents

When a monitor detects a failure condition, it can create an incident automatically. Monitor-created incidents behave identically to manually created ones, but their affected components, severity, and initial status are derived from the monitor and its correlation group configuration rather than operator input.

The monitoring system uses the correlation group to determine which components to mark as affected, what impact level to assign based on the type of failure, and which severity level to apply. If the correlation group specifies an initial incident status, the incident starts in the stage that status belongs to — which may be Active rather than Triage, meaning the incident can go live on your status pages immediately.

When additional monitors in the same correlation group detect failures, the existing incident is updated rather than duplicated. New affected components are added and severity may be escalated if the new monitor's configuration warrants it. When enough monitors recover, the system can automatically resolve the incident.
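The update-instead-of-duplicate behavior can be pictured as a merge into the existing incident. The sketch below is hypothetical throughout: it represents severity as an integer rank where a higher number is assumed to mean more severe, and it leaves out automatic resolution because the recovery threshold depends on correlation group configuration not described here.

```python
def merge_monitor_failure(incident: dict, failed_components: set,
                          monitor_severity_rank: int) -> dict:
    """Fold a further monitor failure into an existing incident."""
    # Newly affected components are added; existing ones are kept.
    incident["components"] |= failed_components
    # Severity only ever escalates here, never downgrades.
    incident["severity_rank"] = max(incident["severity_rank"], monitor_severity_rank)
    return incident
```

A second failing monitor with a higher severity rank would therefore both widen the affected component set and escalate the incident, while one with a lower rank would only add its components.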

If you see an incident appear that nobody on your team created, check the incident detail page — the source is visible there. Monitor-created incidents follow the same lifecycle and can be transitioned, updated, and closed in the same way as any other incident.

What changes on the public frontend

Stage transitions and published updates change what appears on your public status page. Here is what changes at each stage. Publication behavior also depends on the applicability rules configured for each status page; see Status pages.

When an incident is first published, it appears on the public incident list with its title, severity indicator, current public status badge, and affected components. An incident is published to a status page when the severity level is applicable, the current status is applicable, and at least one affected component belongs to that status page. Your status page configuration controls these applicability rules.
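The three publication conditions can be checked one by one. The sketch below is a hypothetical model, not Incido's configuration schema: `StatusPage` and its fields are illustrative, and the function returns the list of unmet conditions so the same check can double as a diagnostic.

```python
from dataclasses import dataclass, field


@dataclass
class StatusPage:
    name: str
    applicable_severities: set = field(default_factory=set)
    applicable_statuses: set = field(default_factory=set)
    components: set = field(default_factory=set)


def publication_blockers(page: StatusPage, severity: str, status: str,
                         affected_components: set) -> list:
    """Return the unmet conditions; an empty list means the incident is published."""
    blockers = []
    if severity not in page.applicable_severities:
        blockers.append("severity not applicable")
    if status not in page.applicable_statuses:
        blockers.append("status not applicable")
    if not (page.components & set(affected_components)):
        blockers.append("no affected component on this page")
    return blockers
```

An incident is published to the page only when all three conditions hold, so a single entry in the returned list is enough to keep it off that page.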

When an incident is in Triage, customers see whatever public wording and styling the current status defines. Most teams use this for an acknowledgment such as Investigating, with an icon and color that clearly signal active attention without overstating certainty. The incident still does not affect component health calculations at the stage level, so you have space to validate impact before committing to a broader public narrative.

When an incident moves to Active, the status page reflects confirmed disruption. Affected components shift to their configured impact level — for example, showing "Degraded Performance" or "Partial Outage" instead of "Operational". This impact is calculated by combining the incident's component impact with any other active incidents or maintenances affecting the same components, always showing the most severe. The badge and milestone wording now come from the current status itself, so an Active-stage status can present Identified, Mitigating, or another customer-facing phrase that matches your communication style.

When an incident moves to Post Incident, customers see that recovery has happened and the team is verifying stability. Affected components return to Operational status (assuming no other active incidents or maintenances affect them). The public timeline shows the current status's public wording for that recovery phase, and because status presentation is live configuration rather than snapshotted text, changes you make to that public label later also affect how those historical milestones render.

When an incident reaches Closed, it is archived. It remains accessible in the incident history on the status page, but it no longer affects component status or active incident indicators.

Every published update appears on the incident detail page timeline, either as its own timestamped Update entry or attached to the milestone created by the same status transition. Customers treat this timeline as the authoritative narrative, so both the timing and clarity of your updates matter.

Operational effects

Subscriber notifications are sent when you explicitly enable them on an update and the subscriber's own preferences allow delivery. Stage transitions that trigger publication (like moving to Active) can also dispatch notifications. Incido distinguishes between three notification types — incident created, incident updated, and incident closed — so subscribers receive context-appropriate messages at each point in the lifecycle. For subscription scope and channel behavior, see Subscribers.

Notification delivery is intentionally per-update, giving you granular control over communication frequency. When an incident is first published to a status page, subscribers for that page receive a creation notification. Subsequent updates with notifications enabled dispatch update notifications. When the incident moves to Post Incident or Closed, subscribers receive a closure notification.
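One way to read the rules above is as a small decision function. This sketch rests on assumptions the text does not settle: it treats first publication as taking precedence over everything else, treats a move to Post Incident or Closed as always producing a closure notification, and applies the per-update toggle only to ordinary updates. The function and parameter names are hypothetical.

```python
from typing import Optional


def notification_kind(first_publication: bool, target_stage: Optional[str],
                      notify_enabled: bool) -> Optional[str]:
    """Pick which of the three notification types, if any, an event produces."""
    if first_publication:
        return "incident created"
    if target_stage in ("Post Incident", "Closed"):
        return "incident closed"
    if notify_enabled:
        return "incident updated"
    return None  # a quiet update: nothing is sent
```

Subscriber-side preferences still apply after this decision, so a notification chosen here may not be delivered to every subscriber.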

After publishing an update or transitioning a stage, there may be a brief delay before changes appear on the public status page. This refresh window is typically a matter of seconds. If an update does not appear after a reasonable wait, verify that it was published to the correct status page from the incident detail view.

Troubleshooting

An incident is not appearing on the status page. Check three things: the incident's severity level must be applicable to the status page, the current incident status must be applicable, and at least one affected component must belong to that status page. If any one of these conditions is not met, the incident will not be published there. Review your status page configuration to verify the applicability rules.

Customers do not see a new update. Confirm the update was published to the intended status page — an incident can be linked to multiple pages, and you may have published to the wrong one. Allow a few seconds for public page refresh. If the update still does not appear, check the publication state on the incident detail page in the Dashboard.

Subscribers did not receive a notification. Verify that the notification toggle was enabled when you published that specific update. Then check subscriber scope: subscribers must have active subscriptions for the relevant status page and components. Channel preferences and delivery settings on the subscriber side can also filter out notifications even when you enabled them on your side.

An incident appears duplicated. Check the deduplication keys on both records. Incidents with the same key should resolve to a single record as long as the earlier one is still in Triage or Active. If the duplicates have different keys, align the keys at the source, whether that is monitor configuration, API integration, or manual entry.

Component status looks wrong after closing an incident. Review affected component mappings and impact levels across all currently active incidents and maintenances. Public component health shows the most severe active overlap, not a single incident in isolation. If a component still shows as degraded after your incident closed, another active incident or maintenance is likely still affecting it.

A shortcut action (Activate or Resolve) did not work. The shortcut requires a valid transition from the current status to a status in the target stage. If no such transition exists in your organization's configuration, the action cannot proceed. Incido records an internal comment explaining why. Check your incident status transition settings and add the missing path if needed.

A monitor-created incident has incorrect severity or components. The monitor and its correlation group configuration determine these values at creation time. Adjust the monitor or correlation group settings to change future behavior, and manually correct the current incident if needed from the incident detail page.
