Why we don’t tell you “this is a bot.”

Most fingerprinting services tell you what you want to hear: “this is a bot” or “this is human.” Binary. Clean. Wrong. Underneath the verdict is a black box — dozens of measurements folded into a model whose decisions you’ll never see explained. You either trust the answer, or you don’t.

Fingerprinting isn’t a classifier. It’s a measurement. Real users and real attackers leak overlapping signals. The interesting question isn’t “is this person lying about who they are,” it’s “how does this session compare to the population, weighted by which signals are easy to fake and which aren’t.”

The receipt model

We don’t tell you “this is a bot.” We tell you the risk level, how confident the score is, and the reasons behind it. You decide.

Every detection category is documented. Every threshold has a rationale. When a real customer's session is flagged, your team can defend the decision to legal, to product, and to the customer themselves. No black box. No mystery model. Just calibrated measurements you can audit and tune.

Confidence is a real measure with a real source. So is the risk level. So is the list of reasons. When your server verifies a session, you don’t get a verdict — you get the receipt. The decision is yours, because only your code knows what’s at stake.
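As a rough illustration of what a receipt might carry, here is a minimal sketch. The `Receipt` class, its field names, the 0-to-1 confidence scale, and the signal name `webdriver_flag` are all assumptions made for this example, not the actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical shape of a verification result (not the real schema)."""
    risk: str                      # risk level, e.g. "medium"
    confidence: float              # assumed to lie in [0.0, 1.0]
    reasons: tuple[str, ...] = ()  # named signals that fired


def summarize(receipt: Receipt) -> str:
    """Render the receipt as one log line so the decision can be audited later."""
    reasons = ", ".join(receipt.reasons) or "none"
    return (
        f"risk={receipt.risk} "
        f"confidence={receipt.confidence:.2f} "
        f"reasons={reasons}"
    )


receipt = Receipt(risk="medium", confidence=0.4, reasons=("webdriver_flag",))
print(summarize(receipt))
```

Logging all three parts together is the point: the action your code takes can always be traced back to the evidence it was given.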

Binary verdicts hide the cost of being wrong

A binary verdict has two failure modes:

False positive: a real customer is flagged as a bot. The cost is churn, support tickets, and the slow erosion of trust in your product. You’ll never see it in your dashboard, because the customer just leaves.
False negative: a bot is waved through as a real customer. The cost is a chargeback, a fraud loss, or a fake signup. You’ll see that one loud and clear in your dashboard.

When false negatives are the only visible cost, tuning drifts toward over-blocking. That is the silent failure mode of binary classifiers in the fraud space.

Calibrated risk levels and a confidence measure counter that asymmetry. Your team chooses where to draw the line, and the line can differ per surface. A signup form can challenge sooner; a payment form can hold out for stronger evidence; a read-only public API can allow nearly everything and still log it for later review. The model supplies the inputs; your policy makes the call.
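A per-surface policy could be sketched like this, using the five tiers described in the next section. The surface names, the `SurfacePolicy` type, and the specific thresholds are illustrative assumptions, not shipped defaults.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RiskTier(IntEnum):
    """The five tiers, ordered so they can be compared (ordering is assumed)."""
    MINIMAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class SurfacePolicy:
    """Lowest tier at which each action applies; None means never."""
    challenge_at: Optional[RiskTier]
    block_at: Optional[RiskTier]


# Hypothetical thresholds: signup challenges sooner than payment,
# and the public API allows nearly everything.
POLICIES = {
    "signup": SurfacePolicy(challenge_at=RiskTier.MEDIUM, block_at=RiskTier.HIGH),
    "payment": SurfacePolicy(challenge_at=RiskTier.HIGH, block_at=RiskTier.CRITICAL),
    "public_api": SurfacePolicy(challenge_at=None, block_at=RiskTier.CRITICAL),
}


def decide(surface: str, tier: RiskTier) -> str:
    """Return 'allow', 'challenge', or 'block' for a tier on a given surface."""
    policy = POLICIES[surface]
    if policy.block_at is not None and tier >= policy.block_at:
        return "block"
    if policy.challenge_at is not None and tier >= policy.challenge_at:
        return "challenge"
    return "allow"


print(decide("signup", RiskTier.MEDIUM))   # challenge
print(decide("payment", RiskTier.MEDIUM))  # allow
```

The same tier produces different actions on different surfaces, which is the flexibility a single binary verdict cannot offer.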

The five-tier risk model

We expose five risk tiers, each with a defined meaning:

Tier	What it means	Default action
Minimal	The boring middle of the population — most real human traffic.	allow
Low	A slight anomaly, usually privacy browsers or unusual hardware.	allow and observe
Medium	Suggestive of automation or fraud, but not conclusive.	challenge (second factor, step-up)
High	Strong evidence of automation, tampering, or infrastructure abuse.	block or strong challenge
Critical	Multiple categories agree — almost certainly malicious.	block, log for review
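
The default actions in the table can be expressed as a simple lookup. This is a sketch only: the `Tier` enum and the action strings are names invented for this example, chosen to mirror the table rather than any real configuration format.

```python
from enum import Enum


class Tier(Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Mirrors the "Default action" column above; the string labels are hypothetical.
DEFAULT_ACTION = {
    Tier.MINIMAL: "allow",
    Tier.LOW: "allow_and_observe",
    Tier.MEDIUM: "challenge",
    Tier.HIGH: "block_or_strong_challenge",
    Tier.CRITICAL: "block_and_log",
}


def default_action(tier_name: str) -> str:
    """Look up the default action for a tier name, case-insensitively."""
    return DEFAULT_ACTION[Tier(tier_name.lower())]


print(default_action("Medium"))  # challenge
```

Because these are only defaults, a lookup like this is a natural place for your own policy to override individual tiers.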

The meanings are stable across releases. We don’t quietly tighten the model and suddenly start blocking customers who were “low” yesterday. Any change to calibration is treated as a breaking change: it is announced in advance, and the old thresholds remain available through a configuration flag.

Confidence is a separate axis

The risk level says how likely a session is to be malicious. The confidence measure says how much evidence backs that estimate.

A session that spent a full minute on the page, with mouse, keyboard, and scroll activity, gives us a lot to work with. A session that asked for a score after a couple of seconds and immediately submitted gives us very little — we don’t know enough to be sure of anything.

When confidence is low, the right move is usually to defer, not block. A high-risk but low-confidence session is a good candidate for a soft challenge: once the user solves a CAPTCHA, there is enough evidence to commit to a decision.
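The defer-before-block rule might look like the sketch below. The 0-to-1 confidence scale, the `min_confidence` cutoff of 0.6, and the action labels are assumptions chosen for illustration.

```python
HIGH_RISK = {"high", "critical"}


def next_step(risk: str, confidence: float, min_confidence: float = 0.6) -> str:
    """Pick an action from risk and confidence, treated as separate axes.

    Assumes confidence is in [0.0, 1.0]; the 0.6 cutoff is arbitrary.
    """
    if risk in HIGH_RISK and confidence < min_confidence:
        # Not enough evidence to commit: defer with a soft challenge,
        # then score again once more signal has been collected.
        return "soft_challenge"
    if risk in HIGH_RISK:
        return "block"
    if risk == "medium":
        return "challenge"
    return "allow"


print(next_step("high", 0.2))  # soft_challenge
print(next_step("high", 0.9))  # block
```

The same risk level leads to different actions depending on how much is known, which is why the two values are reported separately.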

The reasons are the audit trail

Every risk score comes with the reasons behind it — the named signals that fired during collection. That list is the audit trail. When a support ticket lands on your engineer’s desk saying “I was blocked from signing up,” the engineer can pull up the verification and see exactly which signals fired.

If a signal fired that shouldn’t have, your engineer can find the matching category in the docs, read its calibration rationale, and either tune the threshold or allowlist the customer. The whole loop closes in minutes, not days.
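
For a support workflow, the reasons list can be grouped by category so the engineer knows which section of the docs to open. In this sketch the signal names and the signal-to-category mapping are made up; the real names come from the category documentation.

```python
from collections import defaultdict

# Hypothetical signal -> category lookup, for illustration only.
SIGNAL_CATEGORY = {
    "webdriver_flag": "automation",
    "canvas_mismatch": "tampering",
    "datacenter_ip": "infrastructure",
}


def group_reasons(reasons: list[str]) -> dict[str, list[str]]:
    """Group fired signals by category; unknown signals stay visible."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for signal in reasons:
        category = SIGNAL_CATEGORY.get(signal, "uncategorized")
        grouped[category].append(signal)
    return dict(grouped)


fired = ["webdriver_flag", "datacenter_ip", "mystery_signal"]
print(group_reasons(fired))
```

Keeping unknown signals under an explicit "uncategorized" key, rather than dropping them, preserves the full audit trail.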

Calibration over verdicts

The short version:

Verdicts are a marketing surface; measurements are an engineering surface.
Your fraud team makes the final call — we give them the evidence.
False positives are not acceptable losses; the defaults are conservative for a reason.
The reasons are the audit trail; every score traces back to the signals that produced it.

That’s it. That’s the whole philosophy.

Related reading

Threat categories: the six categories that contribute to the score.
Engineering principles: operating constraints that flow from this philosophy.
Use cases: what this looks like in production.
