Building Churn.Guard — Engineering Notes

Most churn-prediction tools train an XGBoost model on a year of historical churn data. Then they take 21 days to flag a customer because their probability score has to cross a threshold. By the time the score crosses, your customer has already mentally churned — they're just waiting for the next renewal date to confirm it.

Our churn.guard agent detects the same risk in approximately 3 days using deterministic rules on product + communication signals. No XGBoost. No probability scores. No model training. These are the full engineering notes on how it works.

// note: The point isn't ML vs. no-ML. The point is that the fastest detection beats the most accurate detection 9 times out of 10. Ship simple. Tune in production. Real customers don't care about your model's AUC.

The Problem With ML-Based Churn Prediction

Probability-based churn scores have a structural latency problem. The model needs to see a meaningful deviation from a customer's normal pattern before it can update the probability significantly. For most B2B SaaS, 'meaningful deviation' takes 14–28 days to accumulate.

By then, the customer has already had 4–6 weeks of degraded experience. They've made the mental decision. The window to intervene meaningfully has closed. The save rate on accounts flagged at this latency is below 25% in every cohort we've measured.

Deterministic rules, by contrast, can fire on the first day a customer crosses a behavioral threshold. The threshold will sometimes be wrong (more false positives), but a false positive costs far less than a false negative. A wrongly flagged customer gets a check-in call, which is itself a retention activity; a missed at-risk customer simply churns.

Signal 1 — Usage Decay

If a customer's 7-day usage drops below 40% of their trailing 90-day average, they're flagged as at-risk. Simple ratio. No model.
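The usage-decay rule is small enough to state as a function. Below is a minimal Python sketch with a few assumptions:

- Usage arrives as one value per day, oldest first.
- The trailing 90-day window includes the most recent 7 days. The text does not say either way.
- The function names are hypothetical.

The sketch compares daily averages rather than raw sums so the two windows are on the same scale.

```python
from __future__ import annotations

from statistics import mean

USAGE_DECAY_THRESHOLD = 0.40  # the 40% threshold described above


def usage_decay_ratio(daily_usage: list[float]) -> float | None:
    """7-day average daily usage divided by the trailing 90-day average.

    Assumes one value per day, oldest first. Returns None when there is
    less than 90 days of history or the 90-day baseline is zero.
    """
    if len(daily_usage) < 90:
        return None
    last_7 = mean(daily_usage[-7:])
    last_90 = mean(daily_usage[-90:])  # assumption: window includes the last 7 days
    if last_90 == 0:
        return None
    return last_7 / last_90


def is_usage_decay(daily_usage: list[float]) -> bool:
    """True when the account should be flagged as 'usage_decay'."""
    ratio = usage_decay_ratio(daily_usage)
    return ratio is not None and ratio < USAGE_DECAY_THRESHOLD
```

Example: 83 days at 10 events/day followed by 7 days at 3 gives a ratio of about 0.32, so the account is flagged.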

The 40% threshold was tuned once on 18 months of historical churn data across 6 venture engagements. We haven't needed to retune it in two years. We've checked the threshold against new data twice — it has held up.

Implementation

Inside the agent's nightly job, we run two windowed aggregations in Postgres: 7-day rolling usage and 90-day rolling usage per customer. The ratio (7-day average daily usage divided by 90-day average daily usage) is computed in SQL. Any customer with a ratio below 0.40 is inserted into a `flagged_accounts` table with the flag type 'usage_decay' and a timestamp.

▸Runs at 04:00 UTC daily — pre-business-hours so the day's outreach can incorporate the flag
▸Excludes accounts that signed up in the last 30 days (insufficient baseline)
▸Excludes accounts with seasonal usage patterns we've tagged as such (rare, but matters for retail / event-driven verticals)
▸Cost to run: approximately 200ms of Postgres query time per night, regardless of customer count under 10K
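The production job does this in SQL. The hypothetical in-memory Python version below illustrates the same nightly pass, including the two exclusions above. Treat these as assumptions, not the actual `flagged_accounts` schema:

- the `Account` shape and its field names;
- the shape of the output dict.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean


@dataclass
class Account:
    # Hypothetical shape; the real job reads these columns from Postgres.
    account_id: str
    signed_up_at: datetime
    daily_usage: list[float]  # one value per day, oldest first
    seasonal: bool = False    # manually tagged seasonal usage pattern


def nightly_usage_decay_flags(accounts, now: datetime, threshold: float = 0.40):
    """Return one flag row per account whose 7-day/90-day usage ratio is below threshold."""
    flags = []
    for acct in accounts:
        # Exclusion 1: signed up in the last 30 days, so no reliable baseline yet.
        if now - acct.signed_up_at < timedelta(days=30):
            continue
        # Exclusion 2: tagged as having a seasonal usage pattern.
        if acct.seasonal:
            continue
        if len(acct.daily_usage) < 7:
            continue
        last_7 = mean(acct.daily_usage[-7:])
        last_90 = mean(acct.daily_usage[-90:])  # uses up to 90 days of history
        if last_90 > 0 and last_7 / last_90 < threshold:
            flags.append({
                "account_id": acct.account_id,
                "flag_type": "usage_decay",
                "flagged_at": now.isoformat(),
            })
    return flags
```

In the real job the loop body is a windowed query. The Python form only makes the ordering of the exclusions explicit.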

False positive rate: approximately 22%. False negative rate: approximately 12%. We tolerate the false positives because, again, a false positive is a check-in call. The agent doesn't pull the trigger on retention offers automatically — it flags the account for the human operator to review.

Signal 2 — Conversation Sentiment

Run every support ticket, Slack Connect message, and meeting transcript through Claude with a 3-class classifier: positive / neutral / negative. Cache each result for 24 hours to avoid re-classifying the same content. If 2 or more of a customer's last 3 conversations are classified negative, escalate to a person, not the agent.
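Below is a sketch of the caching and escalation logic. It assumes results are cached by a hash of the message text. `classify` is a hypothetical callable standing in for the Claude request; no real SDK call is shown. The clock is injectable so the 24-hour expiry can be tested.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour cache, per the text
LABELS = {"positive", "neutral", "negative"}


class SentimentCache:
    """Wraps a classifier so identical text is classified at most once per TTL."""

    def __init__(
        self,
        classify: Callable[[str], str],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._classify = classify
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # hash -> (stored_at, label)

    def label(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: reuse the cached label
        result = self._classify(text)
        if result not in LABELS:
            raise ValueError(f"unexpected label: {result!r}")
        self._entries[key] = (now, result)
        return result


def should_escalate(labels_newest_first: list[str]) -> bool:
    """Escalate to a person when 2 or more of the last 3 conversations are negative."""
    return labels_newest_first[:3].count("negative") >= 2
```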

We tried this with cheaper models (gpt-4.1-mini, Claude Haiku) for cost reasons. Both produced higher false-negative rates than Claude Sonnet on the messy edge cases: customers who politely express frustration through indirect language. The cost delta is meaningful at scale (~$0.30 per 1K classifications with Sonnet vs ~$0.04 with the cheaper models), but the accuracy delta justifies it for churn-prevention work.

The Classifier Prompt

The full classifier prompt is 4 lines. It explicitly tells the model to weight implicit frustration heavily (questions like 'is there a way to...' that indicate the customer is working around a missing feature). It explicitly tells the model to discount false signals from formal corporate communication style.

// note: The most important thing about the classifier prompt is what it doesn't include. We don't give it the customer's MRR, plan tier, or tenure; including them would bias the sentiment read. The classifier sees only the conversation text.
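The production prompt isn't published. The snippet below is an illustrative 4-line instruction written from the description above, plus a strict parser for the reply. Treat the wording, the `<conversation>` delimiter, and the function names as assumptions. `build_prompt` accepts only the conversation text, mirroring the rule that the classifier never sees MRR, plan tier, or tenure.

```python
from __future__ import annotations

# Illustrative 4-line instruction written from the description; NOT the production prompt.
CLASSIFIER_INSTRUCTIONS = (
    "Classify the customer's sentiment in the conversation below as exactly one of: positive, neutral, negative.\n"
    "Treat implicit frustration as negative, e.g. questions like 'is there a way to...' that suggest a workaround for a missing feature.\n"
    "Do not read formal or terse corporate tone as negative on its own.\n"
    "Reply with the single label and nothing else."
)

VALID_LABELS = ("positive", "neutral", "negative")


def build_prompt(conversation_text: str) -> str:
    # Deliberately takes ONLY the conversation text: no MRR, plan tier, or tenure.
    return f"{CLASSIFIER_INSTRUCTIONS}\n\n<conversation>\n{conversation_text}\n</conversation>"


def parse_label(model_reply: str) -> str:
    """Normalize the reply and reject anything outside the three labels."""
    label = model_reply.strip().lower().rstrip(".")
    if label not in VALID_LABELS:
        raise ValueError(f"classifier returned an unexpected label: {model_reply!r}")
    return label
```

The parser rejects anything other than the three labels rather than guessing. A malformed reply therefore surfaces as an error instead of a silent "neutral".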

Signal 3 — Stakeholder Departure

When the champion in a customer's organization changes job titles on LinkedIn, that customer is approximately 4× more likely to churn within 90 days. This is the highest-precision signal we have. It's also the hardest to detect at scale.

We poll LinkedIn (within their terms of service — using their official API where possible, and respecting rate limits) once per quarter per account. Not real-time, but it's a leading signal — most stakeholder departures don't trigger immediate churn, they trigger churn at the next renewal date.

How We Identify the Champion

We don't ask the customer who their champion is — that's unreliable. We compute it. The champion is the person at the customer organization who has the highest engagement score across three metrics: product usage frequency, communication initiation, and meeting attendance. The score is updated monthly.

▸Communication initiation: who replies first to our messages, who sends the most unsolicited messages
▸Product usage frequency: who is the daily-active user vs the monthly checker
▸Meeting attendance: who shows up to quarterly reviews
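The champion computation above can be sketched as follows. The text names the three metrics but not how they are combined. This sketch assumes each metric is normalized against the highest value in the organization and the three are averaged with equal weights. `Contact` and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical per-person counts for the current month.
    name: str
    active_days: int         # product usage frequency
    initiated_messages: int  # first replies + unsolicited messages to us
    meetings_attended: int   # quarterly reviews attended


def _normalized(values: list[int]) -> list[float]:
    """Scale each value by the org-wide maximum so metrics are comparable."""
    top = max(values)
    return [v / top if top else 0.0 for v in values]


def compute_champion(contacts: list[Contact]) -> Contact | None:
    if not contacts:
        return None
    usage = _normalized([c.active_days for c in contacts])
    comms = _normalized([c.initiated_messages for c in contacts])
    meets = _normalized([c.meetings_attended for c in contacts])
    # Equal weights are an assumption, not a documented value.
    scores = [(u + m + a) / 3 for u, m, a in zip(usage, comms, meets)]
    best = max(range(len(contacts)), key=lambda i: scores[i])
    return contacts[best]
```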

When the computed champion changes job titles on LinkedIn (especially: leaves the company, gets promoted to a role with different priorities, or gets demoted), the account is auto-flagged for a relationship review within 7 days.
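Below is a sketch of the change check. It assumes each poll yields a company and a title for the computed champion. Telling a promotion from a demotion needs title semantics the text doesn't describe, so this version only distinguishes "left the company" from "title changed". The `ProfileSnapshot` type and the flag fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProfileSnapshot:
    # Hypothetical shape of one polled profile record.
    company: str
    title: str


REVIEW_WINDOW = timedelta(days=7)  # relationship review within 7 days


def champion_change(previous: ProfileSnapshot, current: ProfileSnapshot, today: date):
    """Return a review flag if the champion's company or title changed, else None."""
    if previous == current:
        return None
    reason = "left_company" if previous.company != current.company else "title_change"
    return {
        "flag_type": "stakeholder_departure",  # hypothetical flag name
        "reason": reason,
        "review_due": today + REVIEW_WINDOW,
    }
```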

Putting the Three Signals Together

Any single signal flags an account as 'review_required.' Two signals flag it as 'at_risk.' Three signals flag it as 'urgent.'

Each tier has a different intervention. 'review_required' goes to the customer success operator for an async check-in. 'at_risk' goes to the principal operator for a live call. 'urgent' goes to the founder or CEO of the customer-side organization, depending on the relationship structure.
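The tiering and routing rules map directly to a lookup. This minimal sketch assumes hypothetical signal names and assumes a signal that fires twice still counts once. It only decides who gets notified. Nothing in it triggers an intervention.

```python
from __future__ import annotations

# Hypothetical identifiers for the three signals.
SIGNALS = {"usage_decay", "sentiment", "stakeholder_departure"}

TIER_BY_COUNT = {1: "review_required", 2: "at_risk", 3: "urgent"}

ROUTING = {
    "review_required": "customer success operator (async check-in)",
    "at_risk": "principal operator (live call)",
    "urgent": "founder or CEO (depends on relationship structure)",
}


def tier_for(active_signals) -> str | None:
    """Count distinct known signals and map the count to a tier."""
    distinct = set(active_signals) & SIGNALS
    return TIER_BY_COUNT.get(len(distinct))


def route(active_signals) -> tuple[str, str] | None:
    """Return (tier, who to notify), or None if nothing fired."""
    tier = tier_for(active_signals)
    return None if tier is None else (tier, ROUTING[tier])
```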

We do not automate the intervention. The agent does not send retention emails. The agent does not auto-discount. The agent does not adjust the contract. All of those actions create more churn than they prevent — they signal panic, and customers can smell panic.

Architecture

Churn.Guard runs as a nightly batch job on a small dedicated worker (a single Railway-hosted Node process), reading from the main Postgres and writing flags back to a `flagged_accounts` table.

▸Nightly run at 04:00 UTC — three signal aggregations, all in SQL where possible
▸Sentiment classifier batch — pulls last 24h of conversations, runs through Claude in parallel batches of 25
▸Stakeholder departure check — runs weekly, not nightly, due to LinkedIn rate limits
▸Output: a JSON object per flagged account, written to Postgres + posted to the operator pod's Slack channel
▸Total compute cost: approximately $30/month at 500 active customers across all engagements
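The "parallel batches of 25" step can be sketched with `asyncio.gather`. Requests inside a batch run concurrently, and batches run one after another, so at most 25 classification calls are in flight at once. The async `classify` callable is a hypothetical stand-in for the Claude request. The real worker is a Node process; Python is used here only to keep the examples in one language.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

BATCH_SIZE = 25


def chunks(items: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def classify_in_batches(
    texts: list[str], classify: Callable[[str], Awaitable[str]]
) -> list[str]:
    """Classify texts batch by batch; labels come back in input order."""
    labels: list[str] = []
    for batch in chunks(texts):
        # gather runs the batch concurrently and preserves argument order.
        labels.extend(await asyncio.gather(*(classify(t) for t in batch)))
    return labels
```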

Results From the Last 18 Months

▸Average detection latency: 2.8 days (vs ~21 days for ML-based competitor tools)
▸Save rate on flagged accounts: 73% (vs ~25% for accounts flagged late by ML tools)
▸False positive rate: ~22% (acceptable, because false positives become check-in calls)
▸Total infrastructure cost: $30/mo at 500 active customers across all engagements

What We'd Build Next

Three improvements are on the roadmap but not yet shipped. Each requires enough operational work that we haven't been willing to deprioritize current engagements to build it.

01 Real-time stakeholder monitoring: move from quarterly LinkedIn polls to a webhook-based system once stakeholder data services support it
02 Cross-customer signal correlation: when 3+ customers in the same vertical churn within 30 days, flag the entire vertical for a structural review
03 Pre-trial signal: apply the same rules to free-trial users and predict, from 7 days of usage data, who will not convert
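Cross-customer correlation isn't shipped, but the rule is concrete enough to sketch. The sketch assumes churn events arrive as `(vertical, churn_date)` pairs. It also assumes "within 30 days" means the first and last churn are at most 30 days apart. A sliding window per vertical then finds the verticals to review. The function name and constants are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=30)
MIN_CHURNS = 3


def verticals_to_review(churn_events) -> set[str]:
    """Return verticals with MIN_CHURNS or more churns inside any 30-day span."""
    by_vertical: dict[str, list[date]] = defaultdict(list)
    for vertical, churned_on in churn_events:
        by_vertical[vertical].append(churned_on)

    flagged: set[str] = set()
    for vertical, dates in by_vertical.items():
        dates.sort()
        start = 0
        for end in range(len(dates)):
            # Shrink the window from the left until it spans at most 30 days.
            while dates[end] - dates[start] > WINDOW:
                start += 1
            if end - start + 1 >= MIN_CHURNS:
                flagged.add(vertical)
                break
    return flagged
```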

// note: We don't open-source any part of churn.guard because the rules are calibrated to specific engagements. The general approach (3 deterministic signals, no ML, nightly batch) generalizes. The specific thresholds don't. Every venture that runs this needs to tune its own thresholds on its own historical data.

Run Churn.Guard on Your Venture?

Book a 30-min call. We'll review your current churn data, propose threshold values for your specific business, and tell you what we'd ship in week 1 of a deployment.