How We Detected a Reputation Crisis 6 Hours Before the Client Noticed

Tuesday morning, 8:37 AM. A retail client's email program is running normally. Two campaigns are going out: a promotional flash sale and the standard weekly newsletter. Early metrics look healthy.

By 8:52 AM, we had detected a reputation crisis in progress.

By 9:30 AM, it was fixed.

Without autonomous detection, this same issue would have surfaced around 2 PM—when the marketing team noticed the flash sale had unusually low clicks. The investigation would have taken until end of day. The damage would have affected three more campaigns.

Here's exactly what happened, and how AI investigation changed the outcome.


The Detection

At 8:47 AM, Engagor's AI flagged an anomaly: Gmail deferral rates for promotional sends were trending upward. Not dramatically—from 0.4% to 1.1%—but outside the expected variance for this client, this time of day, this campaign type.

Most monitoring systems wouldn't alert on this. A 1.1% deferral rate is within normal bounds for many senders. But our AI doesn't use static thresholds. It uses dynamic baselines that account for this specific sender's patterns.

For this client, promotional sends to Gmail typically show 0.3-0.5% deferral rates in the morning. At 1.1%, the rate was more than double the top of that range. Statistically significant. Worth investigating.
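To make the dynamic-baseline idea concrete, here is a minimal sketch in Python. It assumes you already have the deferral rate as a per-segment series (same sender, same mailbox provider, same campaign type, same time of day); the function name, the window, and the 2x / 3-sigma rule are illustrative choices, not Engagor's actual model.

```python
from statistics import mean, stdev

def is_anomalous(history, current, min_ratio=2.0, min_sigma=3.0):
    """Flag `current` if it deviates from this segment's own history.

    history : recent deferral rates for the same slice (e.g. the last
              14 mornings of Gmail promotional sends for this sender)
    current : this morning's observed deferral rate
    """
    baseline = mean(history)
    spread = stdev(history) if len(history) > 1 else 0.0

    ratio_hit = baseline > 0 and current / baseline >= min_ratio
    sigma_hit = spread > 0 and (current - baseline) / spread >= min_sigma
    return ratio_hit or sigma_hit

# The segment from the story: mornings normally sit around 0.3-0.5%.
history = [0.004, 0.005, 0.003, 0.004, 0.005, 0.004, 0.003]
print(is_anomalous(history, 0.011))  # 1.1% -> True
```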

The AI didn't just flag the anomaly. It started investigating immediately.


The Investigation (Autonomous)

8:47 AM — Scope identification

AI determined the deferral increase was:

  • Isolated to Gmail (Outlook and Yahoo showing normal patterns)
  • Specific to promotional sends (transactional unaffected)
  • Affecting a single sending domain (client's promotional subdomain)
  • Consistent across both campaigns sent that morning

This scoping took approximately 4 seconds.
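Scoping like this is essentially the same metric grouped along each dimension until the deviation is isolated. A rough sketch, assuming per-message outcomes are available; the field names and sample data are hypothetical.

```python
from collections import defaultdict

def deferral_rate_by(events, dimension):
    """Deferral rate per value of one dimension (provider, stream, domain, ...)."""
    sent = defaultdict(int)
    deferred = defaultdict(int)
    for event in events:
        key = event[dimension]
        sent[key] += 1
        deferred[key] += event["deferred"]
    return {key: deferred[key] / sent[key] for key in sent}

# Hypothetical per-message outcomes from the morning's sends.
events = [
    {"provider": "gmail",   "stream": "promo",         "domain": "promo.example.com", "deferred": 1},
    {"provider": "gmail",   "stream": "promo",         "domain": "promo.example.com", "deferred": 0},
    {"provider": "gmail",   "stream": "transactional", "domain": "mail.example.com",  "deferred": 0},
    {"provider": "outlook", "stream": "promo",         "domain": "promo.example.com", "deferred": 0},
]

for dimension in ("provider", "stream", "domain"):
    print(dimension, deferral_rate_by(events, dimension))
```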

8:48 AM — Hypothesis formation

Based on the pattern, the AI generated three hypotheses:

  1. Authentication failure on the promotional subdomain
  2. Reputation degradation for the promotional sending identity
  3. Infrastructure issue at Gmail (affecting all senders)

It began testing each hypothesis.

8:48 AM — Authentication check

AI queried authentication results for the past 24 hours. DKIM signatures were passing. SPF was aligned. DMARC reports showed no failures.

Hypothesis 1 eliminated.
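A check like this reduces to tallying authentication outcomes over the lookback window. A minimal sketch, assuming per-message results are available (in practice they come from sources such as SMTP logs and DMARC aggregate reports); the data shape is an assumption.

```python
from collections import Counter

def auth_pass_rates(results):
    """Share of messages passing each mechanism over the lookback window."""
    counts = {"dkim": Counter(), "spf": Counter(), "dmarc": Counter()}
    for result in results:
        for mechanism in counts:
            counts[mechanism][result[mechanism]] += 1
    total = len(results)
    return {mechanism: counts[mechanism]["pass"] / total for mechanism in counts}

# Hypothetical last-24h sample: everything passing.
results = [{"dkim": "pass", "spf": "pass", "dmarc": "pass"}] * 1000
print(auth_pass_rates(results))  # {'dkim': 1.0, 'spf': 1.0, 'dmarc': 1.0}
```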

8:49 AM — Reputation correlation

AI compared current deferral patterns against the previous 7 days. Found: deferral rates had been creeping up since Friday—0.4% → 0.6% → 0.8% → 1.1%.

This wasn't sudden. It was gradual deterioration that crossed a threshold this morning.

Hypothesis 2 strengthened.
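A creep from 0.4% to 1.1% is exactly what a simple trend test catches. A sketch, assuming daily per-segment rates are at hand; the "strictly rising and at least doubled" rule is a deliberately crude stand-in for a real trend statistic.

```python
def is_deteriorating(daily_rates, min_growth=2.0):
    """True if the series never falls and has at least doubled end to end."""
    rising = all(later >= earlier for earlier, later in zip(daily_rates, daily_rates[1:]))
    doubled = daily_rates[0] > 0 and daily_rates[-1] / daily_rates[0] >= min_growth
    return rising and doubled

# The four daily values from the story, Friday through Tuesday morning.
print(is_deteriorating([0.004, 0.006, 0.008, 0.011]))  # True
```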

8:49 AM — Cross-sender check

AI looked at deferral patterns across other senders in our platform. No spike. Gmail wasn't having a bad morning—this was specific to the client.

Hypothesis 3 eliminated.
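Ruling out a provider-side incident is the same comparison turned sideways: where does this sender sit in the platform-wide distribution of Gmail deferral rates? A sketch with invented numbers.

```python
def fleet_percentile(fleet_rates, client_rate):
    """Fraction of the fleet with a lower Gmail deferral rate than this client."""
    below = sum(1 for rate in fleet_rates if rate < client_rate)
    return below / len(fleet_rates)

# Hypothetical platform snapshot: other senders sitting at their usual levels.
fleet_rates = [0.003, 0.004, 0.005, 0.004, 0.006, 0.005, 0.003, 0.004]
print(fleet_percentile(fleet_rates, 0.011))  # 1.0 -> the outlier is this client, not Gmail
```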

8:50 AM — Root cause analysis

If reputation is degrading, something caused it. AI looked for changes in the past week:

  • No new sending IPs
  • No authentication changes
  • No list source changes
  • One configuration change on Friday: new ESP webhook endpoint for bounce processing

The AI flagged this as the most likely culprit.
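Once gradual degradation is confirmed, the next question is what changed around the time it started. A sketch of that lookup, assuming a timestamped change log exists; the entries and dates below are hypothetical.

```python
from datetime import datetime

def changes_since(change_log, onset):
    """Configuration changes at or after the estimated onset of the degradation."""
    return [change for change in change_log if change["at"] >= onset]

# Hypothetical change log; the deferral creep in the story began on a Friday.
change_log = [
    {"at": datetime(2025, 6, 2, 11, 0),  "what": "template copy update"},
    {"at": datetime(2025, 6, 6, 15, 47), "what": "new bounce webhook endpoint"},
]
onset = datetime(2025, 6, 6, 0, 0)

for change in changes_since(change_log, onset):
    print(change["what"])  # -> new bounce webhook endpoint
```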

8:50 AM — Correlation verification

AI checked bounce processing since Friday. Found: the new webhook endpoint had an SSL certificate error. Bounce notifications were being rejected. Zero bounces had been processed in 4 days.

Without bounce processing, invalid addresses weren't being suppressed. The client had been sending to dead addresses for 4 days, and Gmail noticed.
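The failure itself is easy to reproduce with a plain TLS handshake against the webhook endpoint. A standard-library sketch; the hostname is a placeholder, and an expired certificate surfaces as an SSL error during the handshake rather than in the returned expiry date.

```python
import socket
import ssl
from datetime import datetime, timezone

def cert_expiry(host, port=443, timeout=5.0):
    """Return the endpoint certificate's notAfter timestamp.

    Raises ssl.SSLError (e.g. certificate verify failed) for an
    expired or otherwise invalid certificate.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return not_after.replace(tzinfo=timezone.utc)

try:
    print(cert_expiry("bounce-webhook.example.com"))  # placeholder hostname
except ssl.SSLError as err:
    print("TLS failure:", err)        # an expired certificate lands here
except OSError as err:
    print("connection failure:", err)
```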


The Insight

At 8:52 AM, 15 minutes after the morning's campaigns went out and five minutes after the first anomaly flag, this insight surfaced:

Gmail Reputation Degradation — Root Cause Identified

Severity: High | Trend: Escalating since Friday

Gmail deferrals for promotional sends have increased from 0.4% to 1.1% over 4 days. Root cause identified: bounce webhook processing failure since Friday 3:47 PM.

Impact: Approximately 12,000 sends to invalid addresses over 4 days. Gmail interpreting as list quality issue.

Evidence:

  • Bounce processing: 0 events since Friday 3:47 PM
  • Webhook endpoint returning SSL errors
  • Hard bounce addresses continuing to receive sends

Recommended Actions:

  1. IMMEDIATE: Fix webhook SSL certificate
  2. IMMEDIATE: Process 4 days of queued bounces
  3. TODAY: Suppress affected addresses from pending campaigns
  4. THIS WEEK: Audit bounce processing across all endpoints

Recovery Estimate: Reputation should normalize within 3-5 days after fix.


The Resolution

9:00 AM — Engineering team reviewed the insight. Confirmed webhook SSL certificate had expired Friday afternoon.

9:15 AM — Certificate renewed. Webhook restored.

9:20 AM — Queued bounces replayed through the system (sketched below). 847 hard bounces suppressed from active lists.

9:30 AM — Fix verified. AI confirmed bounce processing resumed normally.

Total time from detection to resolution: 43 minutes.

Human investigation time: Approximately 15 minutes (reviewing AI findings, implementing fix).
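The 9:20 AM replay is conceptually simple: push the queued bounce notifications back through classification and add every hard bounce to the suppression list before the next campaign goes out. A rough sketch; the event shape and the suppression handling are assumptions, not any particular ESP's API.

```python
def replay_bounces(queued_events, suppression_list):
    """Classify queued bounce events and suppress hard bounces.

    Returns the number of newly suppressed addresses.
    """
    newly_suppressed = 0
    for event in queued_events:
        if event["type"] == "hard":  # invalid / nonexistent mailbox
            if event["address"] not in suppression_list:
                suppression_list.add(event["address"])
                newly_suppressed += 1
        # soft bounces (mailbox full, temporary deferrals) are left to retry logic
    return newly_suppressed

# Hypothetical backlog accumulated while the webhook was down.
queued = [
    {"address": "gone@example.com", "type": "hard"},
    {"address": "busy@example.com", "type": "soft"},
    {"address": "gone@example.com", "type": "hard"},  # duplicates collapse
]
suppressed = set()
print(replay_bounces(queued, suppressed))  # 1
```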


The Counterfactual

What would have happened without autonomous detection?

Tuesday, 8:37 AM — Campaigns send normally. No alerts fire. Deferral rate within "acceptable" range.

Tuesday, 2:00 PM — Marketing team reviews flash sale performance. Click rate is 40% below forecast. "That's weird."

Tuesday, 3:00 PM — Marketing escalates to deliverability team. "Something's wrong with Gmail."

Tuesday, 4:00 PM — Deliverability starts investigating. Checks blacklists (clean), authentication (fine), Google Postmaster Tools (shows domain reputation declining).

Tuesday, 5:30 PM — Team identifies bounce processing issue. By now, two more campaigns have sent to dead addresses.

Wednesday, 10:00 AM — Fix deployed after overnight engineering work.

Total time from start of issue (Friday afternoon) to resolution: nearly 5 days.

Additional damage during that window:

  • 3 more campaigns affected
  • Estimated 45,000 additional sends to invalid addresses
  • Gmail reputation further degraded
  • Recovery timeline extended to 7-10 days

The ROI

Let's do the math.

Scenario A (With Engagor):

  • Issue detected in 15 minutes
  • Resolution in 43 minutes
  • Affected campaigns: 2
  • Revenue impact: Minimal (flash sale recovered with re-send to engaged segment)

Scenario B (Without Engagor):

  • Issue detected after 6+ hours
  • Resolution in 5.5 days
  • Affected campaigns: 5
  • Revenue impact: Flash sale performance -40%, newsletter engagement -25% for two weeks during recovery

For this client, the flash sale alone generated €120,000 in revenue. A 40% reduction represents €48,000 in immediate losses. Ongoing engagement reduction across subsequent campaigns adds another estimated €25,000.

Total protected revenue: ~€73,000 from a single incident.


The Pattern

This story isn't unique. We see variations of it regularly:

  • A DNS change that breaks DKIM alignment
  • A list import that introduces spam traps
  • An ESP migration that misconfigures authentication
  • A template change that triggers spam filters

The specifics differ. The pattern is consistent: problems that look fine on dashboards, reveal themselves in subtle signal shifts, and compound over hours or days until they become visible crises.

The question isn't whether you'll have incidents like this. You will. The question is how long they'll run before you catch them.


Engagor's AI monitors every signal, investigates anomalies autonomously, and surfaces insights with root cause—so you can fix problems before they become crises.

See how it works →

About the author

Bram Van Daele

Founder & CEO

Bram has been working in email deliverability since 1998. He founded Teneo in 2007, which has become Europe's leading email deliverability consultancy. Engagor represents 27 years of hands-on expertise encoded into software.

Connect on LinkedIn →