Case Study

Large and Complex Mobile Phone Investigation: 88% Review Volume Reduction

September 15, 2025

When a government fraud investigation required analysis of communications across 12 mobile devices, the legal team faced a problem that most eDiscovery platforms are not equipped to handle: massive, overlapping mobile data from multiple custodians, from which only the most relevant evidence needed to be surfaced.

The matter involved complex financial fraud allegations. Communications were spread across iMessage, SMS, and third-party messaging applications on devices belonging to key individuals in the investigation. The sheer volume, nearly five million messages, made traditional review economically untenable and practically unworkable without intelligent preprocessing.

The Scale

The investigation produced data from 12 mobile devices belonging to custodians identified as central to the matter. The raw dataset included:

  • 4.9 million messages across iMessage, SMS, and third-party messaging apps
  • 500,000 distinct conversations
  • Communications spanning multiple years
  • Overlapping conversations across custodians with significant duplication

Standard mobile eDiscovery practice would have converted the full dataset to RSMF (Relativity Short Message Format) and presented 741,000 review records to outside counsel. At standard per-document review rates, that volume represented a six-figure review cost before any substantive legal analysis began.

The Approach

StreemView processed the mobile extraction data through a multi-stage pipeline designed to identify relevant communications before RSMF conversion.

Comprehensive Data Consolidation. All 12 device extractions were ingested and normalized into a unified dataset. StreemView's pipeline handled the format variation across extraction sources, standardizing message metadata, conversation threading, and contact attribution across all devices.

Intelligent De-Duplication. The same conversation thread appearing across multiple devices — a common occurrence when custodians communicate with each other — was deduplicated to a single review record with full custodian attribution. This alone eliminated a substantial portion of the raw volume without losing any evidentiary record.
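One way to picture this step is message-level matching that carries custodian attribution along with each surviving record. The sketch below is illustrative only and is not StreemView's implementation: the `Message` fields, `message_key`, and `deduplicate` are hypothetical names, and it assumes sender identifiers and timestamps were already standardized during consolidation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str   # assumed: already-normalized sender identifier
    sent_at: str  # assumed: UTC timestamp in one agreed text format
    body: str


def message_key(msg: Message) -> str:
    """Fingerprint a message on sender, time, and whitespace-insensitive text."""
    text = " ".join(msg.body.split()).casefold()
    raw = f"{msg.sender}|{msg.sent_at}|{text}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def deduplicate(extractions: dict[str, list[Message]]) -> dict[str, dict]:
    """Collapse copies of a message to one record, keeping every custodian who held it."""
    merged: dict[str, dict] = {}
    for custodian, messages in extractions.items():
        for msg in messages:
            key = message_key(msg)
            if key not in merged:
                merged[key] = {"message": msg, "custodians": set()}
            merged[key]["custodians"].add(custodian)
    return merged


# Hypothetical data: the same text appears on two devices with different spacing.
extractions = {
    "custodian_a": [
        Message("5550100199", "2024-03-05T14:07:09Z", "Wire goes out Friday."),
        Message("5550100142", "2024-03-05T14:09:30Z", "Understood."),
    ],
    "custodian_b": [
        Message("5550100199", "2024-03-05T14:07:09Z", "Wire goes out  Friday."),
    ],
}
merged = deduplicate(extractions)
for record in merged.values():
    print(record["message"].body, sorted(record["custodians"]))
```

Three input messages collapse to two records here, and the shared message lists both custodians, so attribution survives the removal of the duplicate copy.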

Profile Normalization. StreemView identified 23 key individuals in the investigation and normalized their profiles. Mobile data presents a particular challenge here: the same person may appear under different phone numbers, contact names, or identifiers across different devices. Profile normalization ensured that communications involving the same individual were correctly attributed regardless of how they appeared in each custodian's contact data.
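A minimal sketch of the idea, under stated assumptions: identifiers are either email addresses or phone numbers, phone numbers can be compared on their last 10 digits (a North American simplification), and the mapping from identifiers to people is supplied by the investigation team. The names `normalize_identifier`, `build_profile_index`, and `resolve`, and the sample profiles, are hypothetical.

```python
import re


def normalize_identifier(identifier: str) -> str:
    """Lowercase email addresses; reduce phone numbers to their last 10 digits."""
    identifier = identifier.strip()
    if "@" in identifier:
        return identifier.casefold()
    digits = re.sub(r"\D", "", identifier)
    # Assumption: dropping a leading country code is safe for this dataset.
    return digits[-10:] if len(digits) >= 10 else digits


def build_profile_index(profiles: dict[str, list[str]]) -> dict[str, str]:
    """Map every known identifier variant to a single canonical person."""
    index = {}
    for person, identifiers in profiles.items():
        for ident in identifiers:
            index[normalize_identifier(ident)] = person
    return index


def resolve(identifier: str, index: dict[str, str]) -> str:
    """Return the canonical person for an identifier, or flag it for manual review."""
    return index.get(normalize_identifier(identifier), "UNRESOLVED")


# Hypothetical profiles; real matters would maintain this list with counsel.
profiles = {
    "Person A": ["+1 (555) 010-0199", "person.a@example.com"],
    "Person B": ["555-010-0142"],
}
index = build_profile_index(profiles)

print(resolve("555.010.0199", index))          # same number, different punctuation
print(resolve("Person.A@Example.com", index))  # same email, different casing
print(resolve("555-010-0000", index))          # unknown identifier
```

Returning an explicit "UNRESOLVED" marker rather than guessing keeps ambiguous contacts visible for human review, which matters when attribution may later be challenged.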

Keyword Filtering. Investigation-specific search terms and keyword sets were applied across the normalized, deduplicated dataset. Only conversations containing responsive terms — or providing essential context to those conversations — were flagged for RSMF conversion.
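The source does not define how "essential context" was determined. As one possible interpretation, the sketch below keeps each message that hits a search term plus a fixed window of neighboring messages in the same conversation. The function names, the `window` parameter, and the sample terms are assumptions made for illustration.

```python
import re


def compile_terms(terms: list[str]) -> re.Pattern:
    """Build one case-insensitive, whole-word pattern from plain search terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Word boundaries suit terms that start and end with letters or digits.
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)


def select_with_context(messages: list[str], term_re: re.Pattern, window: int = 2) -> list[int]:
    """Return indexes of messages that hit a term, plus `window` neighbors on each side."""
    keep = set()
    for i, text in enumerate(messages):
        if term_re.search(text):
            start = max(0, i - window)
            stop = min(len(messages), i + window + 1)
            keep.update(range(start, stop))
    return sorted(keep)


# Hypothetical conversation, in chronological order.
conversation = [
    "lunch?",
    "sure",
    "did the wire transfer clear",
    "yes",
    "ok",
    "see you",
    "bye",
]
term_re = compile_terms(["wire transfer", "invoice"])
selected = select_with_context(conversation, term_re, window=1)
print(selected)  # [1, 2, 3]
```

A fixed window is the simplest context rule; a time-based window or a whole-conversation rule would be equally plausible readings of the text.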

Pre-RSMF Reduction. The filtering and reduction process operated on the full conversation graph before any RSMF documents were created. This meant the expensive conversion step was applied only to the subset of communications that had already been identified as potentially responsive.

The Results

After consolidation, deduplication, profile normalization, and keyword filtering, the 741,000 potential RSMF records were reduced to 86,000 — an 88% reduction in review volume.
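The percentage follows directly from the two record counts reported above. A short check (variable names are illustrative):

```python
baseline_records = 741_000  # records under full RSMF conversion
reviewed_records = 86_000   # records after pre-RSMF reduction

removed = baseline_records - reviewed_records
reduction_pct = 100 * removed / baseline_records

print(f"{removed:,} records removed ({reduction_pct:.1f}% reduction)")
# 655,000 records removed (88.4% reduction)
```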

Direct cost savings: approximately $100,000. Reducing the review set from 741,000 to 86,000 documents translated directly into avoided review cost: at standard outside counsel review rates, document-level fees alone fell by roughly $100,000. Once processing, hosting, and QC costs are included, the total impact was substantially higher.
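The case study does not state the per-document rate. As a rough back-of-the-envelope sketch, the rate implied by the figures above can be derived by assuming the full $100,000 is attributable to the 655,000 records that no longer needed review. The result is an inference from the reported numbers, not a quoted price.

```python
savings_usd = 100_000                 # approximate figure from the case study
records_avoided = 741_000 - 86_000    # 655,000 records removed from review

# Assumption: savings scale linearly with the number of records avoided.
implied_rate = savings_usd / records_avoided
baseline_cost = implied_rate * 741_000

print(f"Implied rate: ${implied_rate:.2f} per record")
print(f"Implied cost of reviewing all 741,000 records: ${baseline_cost:,.0f}")
```

At roughly 15 cents per record, reviewing the full set would have cost a little over $113,000, which is consistent with the six-figure estimate given earlier.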

Investigation efficiency. Beyond cost, the reduction in volume meant the legal team could move faster. Working from 86,000 focused documents instead of 741,000 raw message records, reviewers spent their time on substantive analysis rather than filtering noise.

Defensible workflow. The pre-RSMF reduction process produced a documented, reproducible methodology — every filtering decision was logged and auditable. For a government investigation where the collection and processing workflow itself may be scrutinized, that defensibility was not a secondary concern.

Why Mobile Data Is Different

Mobile eDiscovery presents challenges that do not arise with enterprise messaging platforms. Enterprise Slack or Teams data is collected from a central server; mobile data is extracted from individual physical devices using forensic tools. Each extraction reflects a different device, a different OS version, and a different application state.

The result is format inconsistency at scale: the same conversation may look structurally different in two custodians' device extractions. Without a purpose-built pipeline, contact normalization is a manual task. Deduplication across devices requires conversation-level matching rather than simple hash comparison.
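To illustrate why a simple hash comparison falls short, the sketch below takes two hypothetical export records of the same message. The field names, timestamp formats, and the `canonical` helper are assumptions chosen for the example, not the output of any particular forensic tool.

```python
import hashlib
import json
from datetime import datetime, timezone

# The same message as it might appear in two differently structured exports.
record_a = {"from": "5550100199", "ts": "2024-03-05T14:07:09Z", "text": "Call me when you land."}
record_b = {"sender": "5550100199", "timestamp": 1709647629, "body": "Call me when you land. "}


def raw_hash(record: dict) -> str:
    """Hash the record exactly as exported."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def canonical(sender: str, epoch_seconds: float, text: str) -> tuple:
    """Reduce a message to comparable values: sender, whole-second UTC time, tidy text."""
    return (sender, int(epoch_seconds), " ".join(text.split()).casefold())


# Convert the text timestamp in record_a to epoch seconds in UTC.
ts_a = datetime.strptime(record_a["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

canon_a = canonical(record_a["from"], ts_a.timestamp(), record_a["text"])
canon_b = canonical(record_b["sender"], record_b["timestamp"], record_b["body"])

print(raw_hash(record_a) == raw_hash(record_b))  # False
print(canon_a == canon_b)                        # True
```

The raw hashes differ because field names, timestamp encoding, and trailing whitespace differ, even though both records describe one message. Matching has to happen on normalized content.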

StreemView's mobile pipeline was designed for this environment — ingesting extraction formats from Cellebrite and other standard forensic tools, normalizing across device and OS variation, and applying the same pre-RSMF reduction methodology that has delivered 90%+ volume reduction across enterprise chat matters.

The 88% reduction in this investigation was not an outlier. It was the result of applying a consistent, conversation-first approach to a data type that had previously resisted it.
