How We Detected a Reputation Crisis 6 Hours Before the Client Noticed

Tuesday morning, 8:37 AM. A retail client's email program is running normally. Two campaigns are going out: a promotional flash sale and the standard weekly newsletter. Early metrics look healthy.

By 8:52 AM, we had detected a reputation crisis in progress.

By 9:30 AM, it was fixed.

Without autonomous detection, this same issue would have surfaced around 2 PM—when the marketing team noticed the flash sale had unusually low clicks. The investigation would have taken until end of day. The damage would have affected three more campaigns.

Here's exactly what happened, and how AI investigation changed the outcome.


The Detection

At 8:47 AM, Engagor's AI flagged an anomaly: Gmail deferral rates for promotional sends were trending upward. Not dramatically—from 0.4% to 1.1%—but outside the expected variance for this client, this time of day, this campaign type.

Most monitoring systems wouldn't alert on this. A 1.1% deferral rate is within normal bounds for many senders. But our AI doesn't use static thresholds. It uses dynamic baselines that account for this specific sender's patterns.

For this client, promotional sends to Gmail typically show 0.3-0.5% deferral rates in the morning. At 1.1%, the rate was more than double the top of that range. Statistically significant. Worth investigating.
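To make the dynamic-baseline idea concrete, here is a minimal sketch in Python. It assumes you already have the deferral rate as a per-segment series (same sender, same mailbox provider, same campaign type, same time of day); the function name, the window, and the 2x / 3-sigma rule are illustrative choices, not Engagor's actual model.

```python
from statistics import mean, stdev

def is_anomalous(history, current, min_ratio=2.0, min_sigma=3.0):
    """Flag `current` if it deviates from this segment's own history.

    history : recent deferral rates for the same slice (e.g. the last
              14 mornings of Gmail promotional sends for this sender)
    current : this morning's observed deferral rate
    """
    baseline = mean(history)
    spread = stdev(history) if len(history) > 1 else 0.0

    ratio_hit = baseline > 0 and current / baseline >= min_ratio
    sigma_hit = spread > 0 and (current - baseline) / spread >= min_sigma
    return ratio_hit or sigma_hit

# The segment from the story: mornings normally sit around 0.3-0.5%.
history = [0.004, 0.005, 0.003, 0.004, 0.005, 0.004, 0.003]
print(is_anomalous(history, 0.011))  # 1.1% -> True
```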

The AI didn't just flag the anomaly. It started investigating immediately.


The Investigation (Autonomous)

8:47 AM — Scope identification

AI determined the deferral increase was:

  • Isolated to Gmail (Outlook and Yahoo showing normal patterns)
  • Specific to promotional sends (transactional unaffected)
  • Affecting a single sending domain (client's promotional subdomain)
  • Consistent across both campaigns sent that morning

This scoping took approximately 4 seconds.
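Scoping like this is essentially the same metric grouped along each dimension until the deviation is isolated. A rough sketch, assuming per-message outcomes are available; the field names and sample data are hypothetical.

```python
from collections import defaultdict

def deferral_rate_by(events, dimension):
    """Deferral rate per value of one dimension (provider, stream, domain, ...)."""
    sent = defaultdict(int)
    deferred = defaultdict(int)
    for event in events:
        key = event[dimension]
        sent[key] += 1
        deferred[key] += event["deferred"]
    return {key: deferred[key] / sent[key] for key in sent}

# Hypothetical per-message outcomes from the morning's sends.
events = [
    {"provider": "gmail",   "stream": "promo",         "domain": "promo.example.com", "deferred": 1},
    {"provider": "gmail",   "stream": "promo",         "domain": "promo.example.com", "deferred": 0},
    {"provider": "gmail",   "stream": "transactional", "domain": "mail.example.com",  "deferred": 0},
    {"provider": "outlook", "stream": "promo",         "domain": "promo.example.com", "deferred": 0},
]

for dimension in ("provider", "stream", "domain"):
    print(dimension, deferral_rate_by(events, dimension))
```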

8:48 AM — Hypothesis formation

Based on the pattern, the AI generated three hypotheses:

  1. Authentication failure on the promotional subdomain
  2. Reputation degradation for the promotional sending identity
  3. Infrastructure issue at Gmail (affecting all senders)

It began testing each hypothesis.

8:48 AM — Authentication check

AI queried authentication results for the past 24 hours. DKIM signatures were passing. SPF was aligned. DMARC reports showed no failures.

Hypothesis 1 eliminated.
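A check like this reduces to tallying authentication outcomes over the lookback window. A minimal sketch, assuming per-message results are available (in practice they come from sources such as SMTP logs and DMARC aggregate reports); the data shape is an assumption.

```python
from collections import Counter

def auth_pass_rates(results):
    """Share of messages passing each mechanism over the lookback window."""
    counts = {"dkim": Counter(), "spf": Counter(), "dmarc": Counter()}
    for result in results:
        for mechanism in counts:
            counts[mechanism][result[mechanism]] += 1
    total = len(results)
    return {mechanism: counts[mechanism]["pass"] / total for mechanism in counts}

# Hypothetical last-24h sample: everything passing.
results = [{"dkim": "pass", "spf": "pass", "dmarc": "pass"}] * 1000
print(auth_pass_rates(results))  # {'dkim': 1.0, 'spf': 1.0, 'dmarc': 1.0}
```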

8:49 AM — Reputation correlation

AI compared current deferral patterns against the previous 7 days. Found: deferral rates had been creeping up since Friday—0.4% → 0.6% → 0.8% → 1.1%.

This wasn't sudden. It was gradual deterioration that crossed a threshold this morning.

Hypothesis 2 strengthened.
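A creep from 0.4% to 1.1% is exactly what a simple trend test catches. A sketch, assuming daily per-segment rates are at hand; the "strictly rising and at least doubled" rule is a deliberately crude stand-in for a real trend statistic.

```python
def is_deteriorating(daily_rates, min_growth=2.0):
    """True if the series never falls and has at least doubled end to end."""
    rising = all(later >= earlier for earlier, later in zip(daily_rates, daily_rates[1:]))
    doubled = daily_rates[0] > 0 and daily_rates[-1] / daily_rates[0] >= min_growth
    return rising and doubled

# The four daily values from the story, Friday through Tuesday morning.
print(is_deteriorating([0.004, 0.006, 0.008, 0.011]))  # True
```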

8:49 AM — Cross-sender check

AI looked at deferral patterns across other senders in our platform. No spike. Gmail wasn't having a bad morning—this was specific to the client.

Hypothesis 3 eliminated.
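Ruling out a provider-side incident is the same comparison turned sideways: where does this sender sit in the platform-wide distribution of Gmail deferral rates? A sketch with invented numbers.

```python
def fleet_percentile(fleet_rates, client_rate):
    """Fraction of the fleet with a lower Gmail deferral rate than this client."""
    below = sum(1 for rate in fleet_rates if rate < client_rate)
    return below / len(fleet_rates)

# Hypothetical platform snapshot: other senders sitting at their usual levels.
fleet_rates = [0.003, 0.004, 0.005, 0.004, 0.006, 0.005, 0.003, 0.004]
print(fleet_percentile(fleet_rates, 0.011))  # 1.0 -> the outlier is this client, not Gmail
```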

8:50 AM — Root cause analysis

If reputation is degrading, something caused it. AI looked for changes in the past week:

  • No new sending IPs
  • No authentication changes
  • No list source changes
  • One configuration change on Friday: new ESP webhook endpoint for bounce processing

The AI flagged this as the most likely culprit.
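Once gradual degradation is confirmed, the next question is what changed around the time it started. A sketch of that lookup, assuming a timestamped change log exists; the entries and dates below are hypothetical.

```python
from datetime import datetime

def changes_since(change_log, onset):
    """Configuration changes at or after the estimated onset of the degradation."""
    return [change for change in change_log if change["at"] >= onset]

# Hypothetical change log; the deferral creep in the story began on a Friday.
change_log = [
    {"at": datetime(2025, 6, 2, 11, 0),  "what": "template copy update"},
    {"at": datetime(2025, 6, 6, 15, 47), "what": "new bounce webhook endpoint"},
]
onset = datetime(2025, 6, 6, 0, 0)

for change in changes_since(change_log, onset):
    print(change["what"])  # -> new bounce webhook endpoint
```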

8:50 AM — Correlation verification

AI checked bounce processing since Friday. Found: the new webhook endpoint had an SSL certificate error. Bounce notifications were being rejected. Zero bounces had been processed in 4 days.

Without bounce processing, invalid addresses weren't being suppressed. The client had been sending to dead addresses for 4 days, and Gmail noticed.
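The failure itself is easy to reproduce with a plain TLS handshake against the webhook endpoint. A standard-library sketch; the hostname is a placeholder, and an expired certificate surfaces as an SSL error during the handshake rather than in the returned expiry date.

```python
import socket
import ssl
from datetime import datetime, timezone

def cert_expiry(host, port=443, timeout=5.0):
    """Return the endpoint certificate's notAfter timestamp.

    Raises ssl.SSLError (e.g. certificate verify failed) for an
    expired or otherwise invalid certificate.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return not_after.replace(tzinfo=timezone.utc)

try:
    print(cert_expiry("bounce-webhook.example.com"))  # placeholder hostname
except ssl.SSLError as err:
    print("TLS failure:", err)        # an expired certificate lands here
except OSError as err:
    print("connection failure:", err)
```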


The Insight

At 8:52 AM, 15 minutes after the morning's campaigns went out and five minutes after the first anomaly flag, this insight surfaced:

Gmail Reputation Degradation — Root Cause Identified

Severity: High | Trend: Escalating since Friday

Gmail deferrals for promotional sends have increased from 0.4% to 1.1% over 4 days. Root cause identified: bounce webhook processing failure since Friday 3:47 PM.

Impact: Approximately 12,000 sends to invalid addresses over 4 days. Gmail interpreting as list quality issue.

Evidence:

  • Bounce processing: 0 events since Friday 3:47 PM
  • Webhook endpoint returning SSL errors
  • Hard bounce addresses continuing to receive sends

Recommended Actions:

  1. IMMEDIATE: Fix webhook SSL certificate
  2. IMMEDIATE: Process 4 days of queued bounces
  3. TODAY: Suppress affected addresses from pending campaigns
  4. THIS WEEK: Audit bounce processing across all endpoints

Recovery Estimate: Reputation should normalize within 3-5 days after fix.


The Resolution

9:00 AM — Engineering team reviewed the insight. Confirmed webhook SSL certificate had expired Friday afternoon.

9:15 AM — Certificate renewed. Webhook restored.

9:20 AM — Queued bounces replayed through the system (sketched below). 847 hard bounces suppressed from active lists.

9:30 AM — Fix verified. AI confirmed bounce processing resumed normally.

Total time from detection to resolution: 43 minutes.

Human investigation time: Approximately 15 minutes (reviewing AI findings, implementing fix).
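The 9:20 AM replay is conceptually simple: push the queued bounce notifications back through classification and add every hard bounce to the suppression list before the next campaign goes out. A rough sketch; the event shape and the suppression handling are assumptions, not any particular ESP's API.

```python
def replay_bounces(queued_events, suppression_list):
    """Classify queued bounce events and suppress hard bounces.

    Returns the number of newly suppressed addresses.
    """
    newly_suppressed = 0
    for event in queued_events:
        if event["type"] == "hard":  # invalid / nonexistent mailbox
            if event["address"] not in suppression_list:
                suppression_list.add(event["address"])
                newly_suppressed += 1
        # soft bounces (mailbox full, temporary deferrals) are left to retry logic
    return newly_suppressed

# Hypothetical backlog accumulated while the webhook was down.
queued = [
    {"address": "gone@example.com", "type": "hard"},
    {"address": "busy@example.com", "type": "soft"},
    {"address": "gone@example.com", "type": "hard"},  # duplicates collapse
]
suppressed = set()
print(replay_bounces(queued, suppressed))  # 1
```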


The Counterfactual

What would have happened without autonomous detection?

Tuesday, 8:37 AM — Campaigns send normally. No alerts fire. Deferral rate within "acceptable" range.

Tuesday, 2:00 PM — Marketing team reviews flash sale performance. Click rate is 40% below forecast. "That's weird."

Tuesday, 3:00 PM — Marketing escalates to deliverability team. "Something's wrong with Gmail."

Tuesday, 4:00 PM — Deliverability starts investigating. Checks blacklists (clean), authentication (fine), Google Postmaster Tools (shows domain reputation declining).

Tuesday, 5:30 PM — Team identifies bounce processing issue. By now, two more campaigns have sent to dead addresses.

Wednesday, 10:00 AM — Fix deployed after overnight engineering work.

Total time from start of issue (Friday afternoon) to resolution: nearly 5 days.

Additional damage during that window:

  • 3 more campaigns affected
  • Estimated 45,000 additional sends to invalid addresses
  • Gmail reputation further degraded
  • Recovery timeline extended to 7-10 days

The ROI

Let's do the math.

Scenario A (With Engagor):

  • Issue detected in 15 minutes
  • Resolution in 43 minutes
  • Affected campaigns: 2
  • Revenue impact: Minimal (flash sale recovered with re-send to engaged segment)

Scenario B (Without Engagor):

  • Issue detected after 6+ hours
  • Resolution in 5.5 days
  • Affected campaigns: 5
  • Revenue impact: Flash sale performance -40%, newsletter engagement -25% for two weeks during recovery

For this client, the flash sale alone generated €120,000 in revenue. A 40% reduction represents €48,000 in immediate losses. Ongoing engagement reduction across subsequent campaigns adds another estimated €25,000.

Total protected revenue: ~€73,000 from a single incident.


The Pattern

This story isn't unique. We see variations of it regularly:

  • A DNS change that breaks DKIM alignment
  • A list import that introduces spam traps
  • An ESP migration that misconfigures authentication
  • A template change that triggers spam filters

The specifics differ. The pattern is consistent: problems that look fine on dashboards, reveal themselves in subtle signal shifts, and compound over hours or days until they become visible crises.

The question isn't whether you'll have incidents like this. You will. The question is how long they'll run before you catch them.


Engagor's AI monitors every signal, investigates anomalies autonomously, and surfaces insights with root cause—so you can fix problems before they become crises.

See how it works →

About the author

Bram Van Daele

Founder & CEO

Bram has been working in email deliverability since 1998. He founded Teneo in 2007, which has become Europe's leading email deliverability consultancy. Engagor represents 27 years of hands-on expertise encoded into software.

Connect on LinkedIn →