The Graph Behind the Fraud Ring: How Cross-Entity Data Surfaces Connections That Rules Never Find
A mid-sized Indian NBFC came to us in early 2024 with a problem most lenders in the country will recognise. Their portfolio looked healthy. KYC pass rates were strong. Individual application reviews showed nothing unusual. Their fraud-ops team was sharp, well-resourced, and increasingly frustrated.
But first-payment-default rates were rising in a way the volume couldn't explain. Roll rates on certain cohorts were diverging from the rest of the book. Collection teams were reporting odd patterns: borrowers who paid one EMI cleanly and then disappeared, sometimes with the same handful of mobile numbers showing up in entirely unrelated cases.
Their data science team had a hypothesis. Their fraud-ops team had another. Both teams were right. Neither could prove it from inside their existing data stack.
What they were looking at, without yet being able to see it, was 503 fraud rings hiding inside 87,000 loan applications. A little over 6% of their inbound applicant pool was linked to organised fraud.
Not a single one of those applications had failed individual KYC.
This article is about how we found the rings. More specifically, it is about the data architecture that made finding them possible — because the interesting thing about fraud rings is not the fraud itself. The interesting thing is that the fraud is invisible inside any single application, and obvious the moment you map the relationships between applications.
The fraudulent transaction is almost never the signal. The signal is the connection.
What "Graph Intelligence" Actually Means (a 60-Second Primer)
Before we go further, let's settle the vocabulary, because half the fraud-tech market uses the word "graph" loosely. You already know what a graph is, even if you don't think of it that way. Facebook's social graph maps people connected by friendship. Google Maps' road graph maps locations connected by routes. LinkedIn's professional graph maps people connected by work history.
A fraud graph maps entities connected by suspicious shared signals.
Two things make it a graph and not a table:
Nodes are the entities — an applicant, a device, a phone number, an email, an IP block, a geographic cell, a SIM card, a KYC document.
Edges are the relationships — this applicant used this device, this device shared a session with this IP, this phone number registered from this geo-grid.
In a traditional fraud database, each applicant is a row with fields. The applicant's device is one field among many. In a graph, the applicant and the device are both entities, connected by a relationship. Which means you can ask a different question: not "what does this applicant look like?" but "what else is connected to this device?"
That second question is the one that finds rings.
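The difference between the two questions is easiest to see in code. A minimal sketch, using a hypothetical mini-dataset with illustrative field names (`applicant`, `device`, `phone` are assumptions, not the schema of any real system): store the same application data as an adjacency structure instead of rows, and the "what else is connected to this device?" question becomes a one-line lookup.

```python
from collections import defaultdict

# Hypothetical mini-dataset: one dict per application, with the entities
# observed during the session. Field names are illustrative only.
applications = [
    {"applicant": "A-1001", "device": "dev-7f3a", "phone": "98xxxx1111"},
    {"applicant": "A-1002", "device": "dev-7f3a", "phone": "98xxxx2222"},
    {"applicant": "A-1003", "device": "dev-9c21", "phone": "98xxxx3333"},
]

# Build an undirected bipartite graph: applicant nodes on one side,
# shared-signal nodes (device, phone, ...) on the other.
edges = defaultdict(set)
for app in applications:
    for entity_type in ("device", "phone"):
        entity = f"{entity_type}:{app[entity_type]}"
        edges[app["applicant"]].add(entity)
        edges[entity].add(app["applicant"])

# The table question asks about one row. The graph question pivots on the
# shared entity -- and immediately surfaces that two "unrelated" applicants
# sit behind one handset.
print(sorted(edges["device:dev-7f3a"]))
```

Same three applications, same fields; only the shape of the data changed, and the shared device went from an invisible column value to a queryable hub.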
Why Rules-Based Fraud Systems Cannot See Rings (By Architecture, Not By Tuning)
Most Indian NBFCs and banks run rules-based fraud stacks. PAN match. Aadhaar verification. Bureau pull. Income document analysis. Address validation. Phone-number penny-drop. Each check produces a binary outcome on the applicant in front of you.
This works for the kind of fraud that consists of one person trying to scam one institution. It fails completely for the kind of fraud that consists of one operator trying to scam one institution forty-seven times.
A ring's job, mechanically, is to look like 47 different applicants. Each individual application has clean documents because the documents are clean, sourced from real PANs and real Aadhaars, often legitimately purchased from people in financial distress for ₹2,000–₹5,000 per identity. Income proofs are doctored to plausibility, not implausibility. The address points to a real building. The phone number rings.
Each application is a verifiable, KYC-clean, individually unremarkable customer.
The ring is the pattern of those 47 applications sharing one device, one IP block, one geo-grid, one onboarding session timing window, one specific manipulation artifact in their Aadhaar JPEG files, or some combination of the above.
A rules-based system that evaluates one applicant at a time cannot, by architecture, see those relationships. It has no place to put them. This is not a tuning problem. It is a data-model problem.
You don't need better rules. You need a different data structure.
A Picture Worth Holding in Your Head
Picture 47 loan applications submitted to the same NBFC over a nine-day window.
Each application has a different PAN. A different Aadhaar. A different phone number. A different email address. A different applicant name. Different employer details, plausible income proofs, separate bureau scores in acceptable ranges.
From the applicant table, they are 47 distinct rows. From the rules engine, they are 47 individually-approvable customers.
Now draw a line connecting all of them to one single point.
That point is a device fingerprint. One physical handset. The same OnePlus or Redmi unit submitted all 47 applications, switching SIMs and identity files between sessions, located in a Tier-3 town in eastern Maharashtra. That is what a ring looks like inside a graph. 47 outer nodes. One central hub. The hub is invisible to any system that only looks at the outer nodes.
This is the picture you should be holding in your head as the rest of this article unfolds. Every ring we describe below is some variant of this shape: a central hidden entity, a fan of clean-looking applicants, and an edge structure that exposes the fan to anyone willing to map it.
What an Edge Actually Means (and Why Most In-House Graph Projects Get This Wrong)
Here is the part where most data science teams underweight the problem when they first attempt graph-based fraud detection.
An edge is not a binary fact. An edge is a signal with a half-life, a confidence weight, and a context-dependent meaning.
Two applications sharing a geo-grid within 48 hours is a fundamentally different risk signal than two applications sharing a device fingerprint over 30 days. Both are edges. The risk interpretation is not even close.
Look at the spread:
- Shared device over a long window — two applicants applying six months apart from the same handset. In most cases, benign. A shared family phone. An employee on a colleague's device. A recycled handset bought second-hand. Risk weight: small unless other signals corroborate.
- Shared device within a short window — two applicants applying within the same hour from the same handset. Rare in legitimate behaviour. Very common in ring operations where the operator is rapidly cycling through identity files. Risk weight: high.
- Shared geo-grid over a long window — two applicants from the same 100m × 100m geographic cell, applying months apart. In any urban Indian context, roughly meaningless. Millions of people share geo-grids in any large city.
- Shared geo-grid within 48 hours — the same two applicants applying within 48 hours from the same geo-grid, especially in a Tier-3 town where applicant density is lower. Meaningfully suspicious. Combined with any other edge — shared IP, shared SIM carrier, shared document template — it becomes definitive evidence of a ring.
- Shared IP address — almost meaningless in isolation in India because of NAT'd networks, mobile carrier rotation, and shared public WiFi. Useful only when stable over time and combined with other edges.
- Shared behavioural cadence — two applicants whose typing patterns, scroll velocity, and tap timing fall within a tight similarity threshold. Almost never legitimate. Even identical twins type differently. One of the highest-weighted edges in any production graph, captured by behavioural biometrics.
- Shared document manipulation template — two Aadhaar photos sharing the same JPEG quantisation pattern at the same splice boundaries. The smoking gun. The probability of two unrelated applicants producing the same manipulation artefact is, for practical purposes, zero. This is what image intelligence catches.
Edge weighting is where graph-based fraud detection goes from "interesting concept" to "actually catches rings." A graph that treats every edge equally is a graph that produces noise. A graph that weights edges by temporal context, information density, and joint distribution with other signals is a graph that produces ring detections. This is also why in-house graph projects at Indian banks have, frankly, been hit-or-miss. The schema is easy. The weighting model is hard. Most teams underestimate the second part until their first production deployment buries them in false positives.
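The "half-life" framing above can be made concrete. A minimal sketch, not Sign3's actual weighting model: the base weights below are hand-set for illustration (a production model would fit them against labelled ring data), and the exponential decay is just one plausible way to encode the idea that the same edge type means very different things at different time gaps.

```python
import math

# Illustrative base weights per edge type -- assumptions for this sketch,
# ordered to match the spread described above (device and behavioural
# edges heavy, bare IP almost weightless).
BASE_WEIGHT = {
    "shared_device": 0.9,
    "shared_geo_grid": 0.1,
    "shared_ip": 0.05,
    "behavioural_cadence": 0.95,
    "doc_manipulation_template": 0.99,
}

def edge_weight(edge_type: str, hours_apart: float, half_life_hours: float = 72.0) -> float:
    """Weight an edge by type and temporal proximity.

    The signal decays toward irrelevance as the gap between the two
    applications grows -- the edge's 'half-life'. At hours_apart ==
    half_life_hours the weight is exactly half the base weight.
    """
    decay = math.exp(-math.log(2) * hours_apart / half_life_hours)
    return BASE_WEIGHT[edge_type] * decay

# Same edge type, very different risk once time enters the picture:
print(edge_weight("shared_device", hours_apart=1))         # near full weight
print(edge_weight("shared_device", hours_apart=24 * 180))  # six months apart: effectively zero
```

A graph that scores clusters by summing raw edge counts treats both of those prints as identical evidence; one that sums weights like these is already most of the way to the temporal-context behaviour described above.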
The 80% Finding: One Pattern Catches Most of the Rings
Here is the finding from the NBFC engagement that should sit with every fraud analytics head reading this. Of the 503 rings the graph detected, 80% were structurally organised around a single device fingerprint connected to multiple phone–email pairs.
Not multiple devices. Not distributed networks across geographies. Not sophisticated rotating-identity schemes. One device. Multiple phone numbers. Multiple email addresses. Multiple Aadhaar identities. All applying for loans from the same handset over a compressed time window, with each application individually KYC-verified and bureau-cleared.
The simplicity of the pattern is the point.
A rules-based fraud engine looking at any one of those applications saw a verified PAN, a matched Aadhaar, a valid phone number, a working email address, and a bureau score within acceptable range. Each application sailed through individual review. The reason the NBFC's manual reviewers hadn't caught them in months was not inattention. The data model put in front of the reviewers had no field for "this device has been used by 23 other applicants in the last 72 hours."
The graph had that field. The graph had it as the primary edge.
The remaining 20% of rings were more sophisticated: multi-device operations, geographic distribution, behavioural-cadence overlap rather than direct device sharing. These required heavier signal fusion to detect. But the 80% majority of organised fraud against this NBFC was, fundamentally, a device-fingerprint problem disguised as a hundred different identity problems.
This finding aligns with what the RBI Innovation Hub's MuleHunter.AI deployment data has surfaced across Indian public sector banks — the dominant organised-fraud pattern in Indian lending is a device-and-relationship problem, not an identity-document problem. Yet most Indian NBFCs and banks are running defence layers built primarily around document verification.
What 72 Hours of Cross-Entity Graphing Actually Looks Like
The detection timeline matters as much as the detection volume.
Day 1 of integration. Ingestion begins. The NBFC's existing applicant data flows into the graph along with Sign3's signal enrichment — device fingerprints, behavioural signatures, footprint scoring, image intelligence on KYC documents, location and IP enrichment.
Hour 48. First ring clusters surface. The graph identifies 47 applicants sharing a single device fingerprint, applying within a 9-day window, from a Tier-3 town in Maharashtra. Manual review confirms the ring within an additional 4 hours.
Hour 72. Second-order graph traversal identifies 23 separate clusters with the same structural signature. The fraud-ops team begins targeted holds across the affected applicant pool.
Week 2. 503 rings confirmed across the historical applicant pool. Approximately 6.1% of the 87,000-applicant base. Loan disbursement losses prevented: an estimated ₹50–75 crore that would otherwise have entered the book and almost certainly defaulted at the next two collection cycles.
Three things about this timeline matter operationally.
The rings were not detected through transaction monitoring. They were detected through application-time data. The actual default behaviour — the lossable event — was still months away for many rings, since most operations time their default to the second or third EMI to maximise extraction. The graph caught the rings before the lossable event, not after.
The rings had been in the data for months. They had been invisible because the data model treated each application as an independent record. The graph reorganised the same data — the same fields, the same inputs — into a relational structure that exposed what had always been there.
And 0% of the 503 rings had failed individual KYC. Not one. Every applicant had passed the existing defence layer cleanly. The defence layer wasn't broken. It was wrong-shaped.
How Sign3 Builds the Graph: Six Modalities, One Relational Structure
The graph is the output. The inputs are six modalities of intelligence, each producing the signals that become nodes and edges.
- Device Intelligence produces the most heavily weighted node — a persistent device fingerprint accurate to 99.9%, resilient against reinstalls, VPN switches, factory resets, and emulator masking. In the NBFC case, this single modality contributed the dominant edge for 80% of detected rings.
- Behavioral Biometrics generates the behavioural signature node — typing cadence, scroll velocity, tap pressure, the micro-tilt of how the device is held during a session. Behavioural fingerprints are nearly impossible to spoof and almost never overlap between legitimate users, which makes these edges some of the highest-weight in the graph.
- Network & Graph Intelligence is the engine that runs the graph itself — ingesting nodes and edges from every other modality, applying the weighting model, executing cluster detection, and producing real-time risk scores at the moment of decision.
- Image Intelligence detects manipulation artefacts in KYC documents at the pixel level. The output is a manipulation-fingerprint node that links documents sharing a template, generation model, or splice pattern. The rarest but highest-weight edge in any production graph.
- Location Intelligence generates geo-grid nodes and the temporal context that makes geo-grid edges meaningful — correlating IP geolocation, WiFi SSID history, and cell-tower triangulation into a richer location signal than GPS alone.
- Digital Footprint Signals populates the phone, email, and recovery-linkage nodes with 100+ enrichment signals — age of the phone number, social-platform presence, breach exposure of the email, vintage of the recovery chain. Often the entry points into the graph, particularly for detecting newly-onboarded synthetic identities.
Six modalities. One graph. Real-time scoring. Each modality on its own catches fragments. Fused into a single relational structure, they catch the rings.
What a Data Science Team Should Do Monday Morning
If you run data science or fraud analytics at an Indian NBFC or bank, here are three concrete steps you can take without involving procurement, without buying anything, and without a vendor conversation:
Run a device-fingerprint clustering query against your last 90 days of applicant data. Group applications by device hash. Plot the distribution of applications-per-device. Look at the long tail. The 80%-single-device finding suggests that even a rudimentary version of this query will surface clusters that your current rules engine has been missing. You will find rings before lunch.
Add a temporal-window dimension to every existing fraud rule. Most rules currently fire on absolute thresholds. Add a "within last 7 days" and a "within last 24 hours" version of each. Compare the false-positive rates. You will find that time-bounded rules are significantly more precise than time-agnostic ones, and you will discover that some of your existing fraud is concentrated in tight temporal clusters that aggregate-window rules miss entirely.
Build a basic node-and-edge representation of your last quarter's applicant data, even in a spreadsheet. Applicants as one column, devices as another, phones as another, emails as another. Sort by device. Sort by IP. Sort by geo-grid. The point is not to build production graph infrastructure — it's to give your team a concrete visual encounter with what relational fraud detection looks like. Most teams who do this exercise are sold on the architecture before any vendor enters the room.
These three steps cost a senior data scientist roughly a week of work. They will produce findings that change how your team thinks about the problem.
After that, the question is whether to build the production graph layer in-house or to integrate one. That's a conversation worth having directly.
The Quiet Argument for Graph-First Architecture
Indian retail lending is scaling faster than its fraud defence infrastructure. The Ministry of Finance reported ₹4,245 crore in digital financial fraud in the first ten months of FY24–25 alone — and that's the disclosed domestic-only number. Organised fraud rings — synthetic identities, device farms, multi-application schemes — are increasingly the dominant pattern at NBFCs serving thin-file and Tier-2/3 segments. The rules-based defence layer that worked through 2018–2022 is structurally insufficient for the fraud being run in 2025 and 2026.
The graph is the data architecture that closes the gap. Not because graph databases are a 2024 buzzword. Because organised fraud rings exist as relationships between entities, and a data model that does not represent relationships cannot detect them.
The 503 rings in the NBFC pool had always been there. They were not new. They were not sophisticated. They were not undetectable. They were unmappable, until the data model changed.
That is what graph intelligence is, at the architectural level. A different data model. One that takes the same applicant data your institution already has, reorganises it relationally, enriches it with multi-modal signal inputs, and surfaces what has always been hiding in plain sight.
The 503 rings were always in the data. The data model just couldn't see them. That's the finding worth forwarding to whoever sits across from you in fraud ops.
Frequently Asked Questions
What is graph-based fraud detection, in one sentence?
A data architecture that represents applicants, devices, phone numbers, emails, IPs, and other entities as connected nodes — so fraud rings, which exist as patterns of shared connections rather than individual bad actors, become detectable.
Why can't rules-based fraud systems catch organised rings?
Because they evaluate one applicant at a time. A ring's job is to look like 47 individually-clean applicants who share one underlying operator. Rules-based systems have no data field for "47 applications submitted from the same device in 9 days." Graph systems do.
What's the difference between a fraud database and a fraud graph?
A database stores entities as rows with fields. A graph stores entities as nodes with relationships. Same data, different model. The model is what determines what kinds of fraud you can detect.
Why is edge weighting harder than schema design?
Because every edge is a signal with a half-life, a confidence weight, and a context-dependent meaning. "Shared device over 30 days" and "shared device within 1 hour" are different risk signals. Most in-house graph projects design the schema correctly and underestimate the weighting model — which is what causes false-positive blowouts in production.
Can Indian NBFCs build graph fraud detection in-house?
Yes, partially. Schema design is straightforward. Edge weighting and signal enrichment (device fingerprinting at 99.9% persistence, behavioural biometrics, image manipulation detection) are the parts where most in-house builds run into trouble. Many institutions build the graph layer themselves and partner for the signal-enrichment modalities that feed it.
Sources and Further Reading
This article draws on Sign3 deployment data and the following publicly available sources. All linked references are recommended reading for any data science or fraud analytics team working in Indian financial services.
Indian Financial Fraud Statistics
Reserve Bank of India — Annual Report FY2024–25 — official banking fraud disclosures, including the ₹520 crore digital payment fraud figure and the 13,516 case count.
Ministry of Finance Parliamentary Disclosures, FY24–25 — the ₹4,245 crore digital financial fraud figure and 24 lakh incident count referenced in this article.
National Crime Records Bureau (NCRB) — Cybercrime Statistics — the 67.8% share of cybercrime classified as online financial fraud.
Indian Fraud Detection Infrastructure
Reserve Bank Innovation Hub — MuleHunter.AI Launch and Deployment Updates — official documentation on the 19 behavioural pattern model and the 23-bank deployment as of December 2025.
National Payments Corporation of India (NPCI) — Fraud Management Updates — beneficiary-name display rule (June 2025) and Central Payment Fraud Information Registry references.
International Context
Bank for International Settlements (BIS) — Project Nexus Phase Four Documentation — official documentation on multilateral payment rail architecture and the five-country governance framework.
Monetary Authority of Singapore — Supervisory Action on OCBC Bank, May 2022 — S$330 million capital requirement, the regulatory consequences of reactive defence, and the MAS-ABS Shared Responsibility Framework.
World Bank Migration and Development Brief, January 2025 — global remittance flow data and India's $135 billion inbound benchmark.
Industry Research
FXC Intelligence and Money20/20 — The New Era of Asia's Cross-Border Payments, 2026 — the foundational industry document on Asian cross-border payments and the policy context for the next decade of fraud architecture.
The NBFC case study referenced in this article is composite, drawn from anonymised patterns across multiple Sign3 deployments at Indian lending institutions in 2024–2025. Specific applicant pool sizes, ring counts, and lossable amounts have been rounded to nearest representative values.
About Sign3
Sign3 builds the foundational intelligence layer for modern financial systems — an AI-native customer intelligence platform combining device intelligence, behavioural biometrics, network and graph intelligence, image intelligence, location intelligence, and digital footprint signals into one unified layer.
Twenty-plus Indian banks, fintechs, and marketplaces use Sign3 to detect organised fraud, prevent account takeover, and verify identity in real time across the customer lifecycle.
Talk to us: sign3.ai
About The Author

Amit Chahal is the co-founder and Head of Data Science at Sign3. He brings over a decade of experience in machine learning and financial fraud solutions, transforming how businesses safeguard against risk.
