Bidirectional Alerts

When YipYap delivers an alert to your sink, your sink can reply with a CloudEvent in the HTTP response body and have it act on the alert directly. No polling loop, no callback into a second API: the same request that delivered the alert carries the response that mutates it.

Practical examples:

  • Your incident-response automation matches the alert against an in-progress remediation and tells YipYap “this is a duplicate of alert X; suppress for 15 minutes.”
  • Your runbook bot accepts the alert on behalf of the on-call engineer and stamps a “claimed by” entry on the timeline.
  • Your auto-remediation pipeline kicks off a fix and tells YipYap “remediation in progress, expected 5 minutes” so the timeline reflects the work without paging the next escalation step.
  • A pre-existing change ticket auto-resolves the alert once the underlying monitor recovers.

YipYap delivers a CloudEvent to your sink:

POST /alert-handler HTTP/1.1
Content-Type: application/cloudevents+json

{
  "specversion": "1.0",
  "type": "run.yipyap.alert.fired.v1",
  "id": "01HXO...",
  "source": "https://console.yipyap.run/orgs/my-org",
  "time": "2026-04-24T14:22:08Z",
  "data": { "alert_id": "01HXA...", "severity": "critical", "reason": "..." }
}

Your sink returns a CloudEvent:

HTTP/1.1 200 OK
Content-Type: application/cloudevents+json

{
  "specversion": "1.0",
  "type": "run.yipyap.reply.alert.suppressed.v1",
  "id": "01HXR...",
  "source": "https://my-automation/handler",
  "ce_alertref": "01HXO...",
  "data": {
    "alert_id": "01HXA...",
    "duration_seconds": 900,
    "reason": "known deploy-window noise; suppressing for 15 min"
  }
}
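
On the sink side, the exchange above could be produced by something like the following. This is a minimal, framework-neutral sketch: the function names, the use of a UUID for the reply id, and the empty 204 response for events the sink does not want to answer are all assumptions, not part of YipYap's documented contract. The one detail taken directly from the example is that ce_alertref carries the id of the delivered event, not the alert_id.

```python
import json
import uuid

FIRED_TYPE = "run.yipyap.alert.fired.v1"
SUPPRESSED_TYPE = "run.yipyap.reply.alert.suppressed.v1"


def build_suppressed_reply(incoming: dict, duration_seconds: int, reason: str,
                           source: str = "https://my-automation/handler") -> dict:
    """Build a suppressed reply for a delivered alert CloudEvent."""
    return {
        "specversion": "1.0",
        "type": SUPPRESSED_TYPE,
        # Assumption: any unique string is acceptable as the reply id.
        "id": str(uuid.uuid4()),
        "source": source,
        # Correlate with the delivered event's id, as in the example above.
        "ce_alertref": incoming["id"],
        "data": {
            "alert_id": incoming["data"]["alert_id"],
            "duration_seconds": duration_seconds,
            "reason": reason,
        },
    }


def handle_delivery(body: bytes) -> tuple[int, dict, bytes]:
    """Return (status, headers, response body) for one delivery."""
    incoming = json.loads(body)
    if incoming.get("type") != FIRED_TYPE:
        # Assumption: an empty response means "no reply"; the docs do not say.
        return 204, {}, b""
    reply = build_suppressed_reply(
        incoming, 900, "known deploy-window noise; suppressing for 15 min")
    headers = {"Content-Type": "application/cloudevents+json"}
    return 200, headers, json.dumps(reply).encode()
```

Wire handle_delivery into whatever HTTP framework your sink already uses; the returned tuple maps directly onto the response shown above.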

YipYap validates ce_alertref against a real outbound event, checks OIDC, confirms that the reply type is opted in on your channel, clamps the duration to your channel’s cap, and creates a suppression record. Future alerts matching your filter are silenced automatically until the window expires. The suppression is fully auditable in the admin dashboard.
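
To make the order of those checks concrete, here is one way to picture them. This is an illustrative sketch only, not YipYap's implementation: ChannelConfig, check_suppressed_reply, and the specific rejection reason strings are hypothetical, and OIDC verification is reduced to a boolean that a real system would derive from token validation.

```python
from dataclasses import dataclass


@dataclass
class ChannelConfig:
    accepted_types: set[str]     # reply types opted in on this channel
    max_suppress_seconds: int    # per-channel cap on suppression duration


def check_suppressed_reply(reply: dict, known_outbound_ids: set[str],
                           channel: ChannelConfig, oidc_verified: bool):
    """Return (outcome, effective_duration_seconds or None)."""
    # 1. The reply must reference an event that was really sent.
    if reply.get("ce_alertref") not in known_outbound_ids:
        return "rejected:unknown_alertref", None
    # 2. The caller's identity must have been verified.
    if not oidc_verified:
        return "rejected:oidc_failed", None
    # 3. The channel must have opted in to this reply type.
    if reply.get("type") not in channel.accepted_types:
        return "rejected:type_not_accepted", None
    # 4. Clamp the requested duration to the channel's cap.
    requested = int(reply["data"]["duration_seconds"])
    effective = max(0, min(requested, channel.max_suppress_seconds))
    return "accepted", effective
```

With a channel cap of 600 seconds, the 900-second request from the example would be accepted with an effective duration of 600.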

Read-only (annotate the timeline):

  • claimed, “claimed by alice”
  • enriched, attach Grafana links, runbook URLs, log snippets
  • linked, attach Jira/PagerDuty/Slack-thread references

State-mutating (standard):

  • suppressed, silence duplicate/noisy alerts for a capped duration
  • deduplicated, mark as duplicate of a primary alert
  • ack_with_context, rich ack carrying a reason code
  • ownership, transfer ownership to a team or user
  • remediation_started, flag remediation in progress; suppresses re-notify until result
  • remediation_result, record outcome; auto-resolve on success if monitor recovered
  • status_page, record that an external status page was updated

State-mutating (elevated, requires Trust Elevated flag):

  • escalated, override severity or escalation step
  • route, create a scoped, time-bounded routing rule

High-blast-radius (requires Trust Elevated + Allow Deregister):

  • monitor.deregister, disable the monitor entirely

See the event catalog for every type’s data schema.

Automatic deduplication. Your correlation engine sees two alerts from the same monitor within 30 seconds, decides they’re the same underlying incident, and replies to the second with deduplicated. YipYap redirects further notifications on the duplicate to the primary alert, and the admin sees a single incident in the UI.
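
The correlation rule in that scenario (same monitor, within 30 seconds) can be sketched on the sink side as follows. DuplicateCorrelator is a hypothetical helper, and monitor_id is assumed to be available to your automation; it does not appear in the sample payload above. The window is measured from the primary alert, so a long burst does not extend it indefinitely.

```python
from datetime import datetime, timedelta


class DuplicateCorrelator:
    """Track the primary alert per monitor and spot duplicates in a window."""

    def __init__(self, window: timedelta = timedelta(seconds=30)):
        self.window = window
        self._primary = {}  # monitor_id -> (alert_id, fired_at)

    def primary_for(self, monitor_id: str, alert_id: str, fired_at: datetime):
        """Return the primary alert id if this alert is a duplicate, else None."""
        previous = self._primary.get(monitor_id)
        if previous is not None:
            primary_id, primary_time = previous
            if primary_id == alert_id:
                return None  # redelivery of the primary itself
            if fired_at - primary_time <= self.window:
                return primary_id
        # No primary yet, or the window has elapsed: this alert becomes primary.
        self._primary[monitor_id] = (alert_id, fired_at)
        return None
```

When primary_for returns an id, the sink would reply with deduplicated and point at that primary alert; the exact data fields for that reply are defined in the event catalog.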

Auto-remediation with feedback loop. Your self-healing controller receives alert.fired for a flaky pod and replies remediation_started with a 120-second expected duration. YipYap suppresses re-notify. Your controller restarts the pod, waits for the Ready check, and replies remediation_result{outcome: success}. If the monitor has recovered, YipYap auto-resolves the alert with no human in the loop.

Deploy-window suppression. Your CI/CD system replies suppressed{duration_seconds: 600, reason: "deploy 2ff18b8 rolling out"} to every incoming alert during a deploy window. No paging spam, and every suppression is audited with the reason and the deploy SHA in the reference_id.

Cross-team handoff. An on-call SRE receives an alert that’s actually owned by the platform team. Their automation replies ownership{owner_kind: team, owner_id: platform, reason: "misrouted, belongs to platform for auth-service"}. YipYap transfers ownership and the platform team’s rotation gets notified on the next escalation tick.

In Settings → Notification Channels → (your CloudEvents channel) → Accepted Replies you control exactly which reply types your channel honors. Defaults are strict:

  • Read-only types (claimed, enriched, linked) are enabled by default once you turn on accepted replies for the channel.
  • Standard state-mutating types default to off. Check the ones you trust.
  • Elevated types require the Trust Elevated flag (off by default).
  • monitor.deregister requires a second Allow Deregister flag; the UI shows a prominent warning banner explaining the blast radius.
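
The tiers above amount to a small decision function. The following sketch models it with the short type names used in this page; ChannelTrust and is_honored are hypothetical names, and the real settings are managed in the UI, not through this code.

```python
from dataclasses import dataclass, field

READ_ONLY = {"claimed", "enriched", "linked"}
STANDARD = {"suppressed", "deduplicated", "ack_with_context", "ownership",
            "remediation_started", "remediation_result", "status_page"}
ELEVATED = {"escalated", "route"}
HIGH_BLAST = {"monitor.deregister"}


@dataclass
class ChannelTrust:
    enabled_types: set = field(default_factory=set)  # types checked in the UI
    trust_elevated: bool = False                     # off by default
    allow_deregister: bool = False                   # second, separate flag


def is_honored(reply_kind: str, channel: ChannelTrust) -> bool:
    """Decide whether a reply type is honored under the channel's settings."""
    if reply_kind not in channel.enabled_types:
        return False
    if reply_kind in READ_ONLY or reply_kind in STANDARD:
        return True
    if reply_kind in ELEVATED:
        return channel.trust_elevated
    if reply_kind in HIGH_BLAST:
        return channel.trust_elevated and channel.allow_deregister
    return False  # unknown types are never honored
```

Note that checking monitor.deregister in the list is not enough on its own: both flags must also be set.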

Per-type caps (suppress duration, remediation window, route rule TTL) are per-channel configurable with sensible defaults.

Rate limits default to your plan:

Plan                  Reply rate cap per (org, type)
SaaS Free             No Knative Eventing access
SaaS Pro              30 replies / minute
SaaS Enterprise       60 replies / minute
FOSS (self-hosted)    60 replies / minute (operator-tunable)
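
A cap keyed on (org, type) means each reply type has its own budget within an organization. The sketch below shows that keying with a simple fixed one-minute window. The windowing algorithm is an assumption chosen for brevity (YipYap's actual limiter is not described here), and ReplyRateLimiter is a hypothetical name.

```python
class ReplyRateLimiter:
    """Fixed-window limiter with an independent budget per (org, reply type)."""

    def __init__(self, limit_per_minute: int):
        self.limit = limit_per_minute
        # (org, reply_type) -> (minute bucket, replies counted in that bucket)
        self._windows: dict[tuple[str, str], tuple[int, int]] = {}

    def allow(self, org: str, reply_type: str, now_seconds: float) -> bool:
        bucket = int(now_seconds // 60)
        key = (org, reply_type)
        start, count = self._windows.get(key, (bucket, 0))
        if start != bucket:
            start, count = bucket, 0  # a new minute resets the budget
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True
```

Under a 30-per-minute cap, a burst of suppressed replies that exhausts its budget does not block claimed replies from the same org in the same minute.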

Every reply (accepted, rejected, cross-org blocked, rate-limited, or handler-errored) is logged with:

  • Reply CloudEvent id, type, source, verified OIDC sub.
  • Alert / monitor / org / channel IDs.
  • Outcome class (accepted / rejected:<reason> / handler_error:<msg>).
  • Structured JSON diff of before / after state.
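
As an illustration of what such a log entry might contain, the sketch below assembles the fields listed above and computes a shallow before/after diff. The record layout, the field names, and the diff shape are assumptions for illustration; the real schema is whatever Admin → Reply Activity and the export endpoint return.

```python
def state_diff(before: dict, after: dict) -> dict:
    """Shallow diff: changed key -> {"before": ..., "after": ...}.

    A key missing on one side is reported as None on that side.
    """
    diff = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            diff[key] = {"before": old, "after": new}
    return diff


def audit_record(reply: dict, oidc_sub: str, ids: dict, outcome: str,
                 before: dict, after: dict) -> dict:
    """Assemble one log entry from the fields listed above."""
    return {
        "reply": {"id": reply["id"], "type": reply["type"],
                  "source": reply["source"], "oidc_sub": oidc_sub},
        "ids": ids,          # alert / monitor / org / channel IDs
        "outcome": outcome,  # accepted, rejected:<reason>, handler_error:<msg>
        "diff": state_diff(before, after),
    }
```

For an ownership reply that moves an alert from "sre" to "platform", the diff would contain only the owner key, with the unchanged status omitted.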

Admins browse this at Admin → Reply Activity with filters by channel, alert, type, and time range.

Retention by plan:

  • SaaS Pro: 30 days
  • SaaS Enterprise: 90 days
  • FOSS: 14 days (operator-tunable)

Compliance-heavy deployments can export the stream via GET /api/v1/admin/cloudevents/replies before pruning.
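
A scheduled export job could start from a request like the one below. Only the path comes from this page; the base URL, the bearer-token authentication, and the function name are assumptions, and the response format and any paging or filter parameters are not shown because they are not documented here.

```python
import urllib.request

EXPORT_PATH = "/api/v1/admin/cloudevents/replies"


def build_export_request(base_url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the reply audit stream."""
    url = base_url.rstrip("/") + EXPORT_PATH
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: the admin API accepts a bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the result to urllib.request.urlopen sends the request; run the job more often than your plan's retention window so no entries are pruned before export.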

Bidirectional reply contracts depend on five guarantees, and YipYap implements all of them:

  1. CloudEvents-native delivery. Response bodies are typed events, not ad-hoc JSON webhooks.
  2. Cryptographic correlation. ce_alertref ties replies to the originating outbound event, preventing forgery and cross-alert hijack.
  3. Per-channel trust gating. Enabling monitor.deregister is a deliberate, two-confirmation action.
  4. Audit-first design. Every reply gets a before/after JSON diff in a durable log.
  5. Rate-cap and loop-safety proofs. A property test asserts YipYap’s own outbound events replayed as replies never trigger handlers.
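
The loop-safety property in item 5 can be pictured with a toy dispatcher. This is a sketch of the idea, not YipYap's test suite: it assumes handlers are keyed on types under a run.yipyap.reply. prefix (as in the suppressed example) and that outbound events live under run.yipyap.alert., then replays randomly generated outbound-style events and checks that no handler fires.

```python
import random

REPLY_PREFIX = "run.yipyap.reply."


def dispatch(event: dict, handlers: dict) -> str:
    """Route an incoming event to a reply handler, or reject it."""
    event_type = event.get("type", "")
    if not event_type.startswith(REPLY_PREFIX):
        return "rejected:not_a_reply_type"
    handler = handlers.get(event_type)
    if handler is None:
        return "rejected:type_not_accepted"
    handler(event)
    return "accepted"


def replayed_outbound_triggers_nothing(trials: int = 1000, seed: int = 0) -> bool:
    """Property check: outbound-style events never reach a handler."""
    rng = random.Random(seed)
    calls = []
    handlers = {"run.yipyap.reply.alert.suppressed.v1": calls.append}
    alphabet = "abcdefghijklmnopqrstuvwxyz._"
    for _ in range(trials):
        suffix = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 20)))
        outbound = {"type": f"run.yipyap.alert.{suffix}.v1", "id": "replayed"}
        if dispatch(outbound, handlers) == "accepted":
            return False
    return not calls
```

A property-testing library would explore the input space more thoroughly; the seeded generator here just keeps the example dependency-free and repeatable.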