Microsoft Teams Discovery: Why Native Processing Is the Only Approach That Works

September 10, 2025

Microsoft Teams is now the dominant channel for internal corporate communications in most large organizations. By extension, it has become the largest and most complex source of modern communications data in eDiscovery.

Yet most eDiscovery platforms weren't built for Teams. They were built for email — and Teams support was bolted on afterward. The patchwork shows at every stage of the workflow: in the manual effort required to process Purview exports, in the broken threading that produces orphaned messages, in the attachment handling that leaves SharePoint files disconnected from the messages that referenced them.

This guide walks through the core challenges that make Teams data difficult to process — and what a purpose-built pipeline actually looks like.

Why Teams Data Breaks Legacy Workflows

The Purview Export Problem

Microsoft Purview is the standard collection mechanism for Teams data. A Purview export is delivered in MSG or PST format — the same file formats used for Outlook email. This creates an immediate structural mismatch: the data container was designed for email, but it contains Teams chat conversations.

The result: Teams chats arrive bundled with email and SharePoint documents in a single undifferentiated export. Separating them requires manual intervention — or a pipeline built to handle it automatically.

A purpose-built Teams ingestion pipeline takes a Purview export and automatically identifies and separates Teams chat content from email and SharePoint documents, applying format-specific processing to each.
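As a rough illustration of the separation step, the sketch below routes exported items into Teams, email, and SharePoint buckets. It assumes each item has already been parsed into a dictionary with hypothetical `item_class` and `source_url` fields. The `IPM.SkypeTeams` prefix is the message class commonly seen on Teams chat items, but treat it as an assumption and verify it against your own export.

```python
def classify_item(item: dict) -> str:
    """Return 'teams', 'email', 'sharepoint', or 'unknown' for one exported item."""
    # 'item_class' and 'source_url' are hypothetical field names for this sketch.
    item_class = (item.get("item_class") or "").lower()
    source_url = (item.get("source_url") or "").lower()
    if item_class.startswith("ipm.skypeteams"):  # assumed Teams chat class prefix
        return "teams"
    if item_class.startswith("ipm.note"):        # standard Outlook mail class
        return "email"
    if "sharepoint.com" in source_url:           # file collected from a SharePoint site
        return "sharepoint"
    return "unknown"


def split_export(items: list[dict]) -> dict[str, list[dict]]:
    """Group items by route so each group can get format-specific processing."""
    routed = {"teams": [], "email": [], "sharepoint": [], "unknown": []}
    for item in items:
        routed[classify_item(item)].append(item)
    return routed
```

Keeping an explicit "unknown" bucket means unrecognized items are surfaced for inspection instead of being silently processed as email.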

The Threading Problem

Teams conversations are threaded. A message can have replies. Replies belong to the parent message, not to a document boundary. When these conversations are processed through an email-centric pipeline, threading breaks down in predictable ways:

Channel messages and their replies are treated as separate items
Thread replies are "orphaned" — disconnected from the parent message they're responding to
Reply chains lose their sequential context, making the conversation unintelligible in review

Proper threading reconstruction means every reply is processed with its parent message and the full thread is preserved as a unified conversation throughout ingestion, search, and production.
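A minimal sketch of thread reconstruction follows. It assumes each message is a dictionary with hypothetical `id`, `parent_id`, and `sent` fields, that replies point directly at the root message of their thread, and that `sent` is an ISO 8601 string in a single time zone so that string order matches time order.

```python
from collections import defaultdict


def build_threads(messages: list[dict]) -> tuple[dict[str, list[dict]], list[dict]]:
    """Group each reply under its root message; return (threads, orphans)."""
    by_id = {m["id"]: m for m in messages}
    threads = defaultdict(list)
    orphans = []
    for m in messages:
        parent_id = m.get("parent_id")
        if parent_id is None:
            threads[m["id"]].append(m)    # root message starts its own thread
        elif parent_id in by_id:
            threads[parent_id].append(m)  # reply joins its parent's thread
        else:
            orphans.append(m)             # parent was not collected; flag it
    for thread in threads.values():
        # Root first, then replies in the order they were sent.
        thread.sort(key=lambda m: (m.get("parent_id") is not None, m["sent"]))
    return dict(threads), orphans
```

Returning the orphans separately makes missing parents visible at ingestion, rather than leaving reviewers to discover them one message at a time.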

The SharePoint Attachment Problem

When a Teams user shares a file in a channel, the file isn't attached in the traditional sense. It's stored in a SharePoint document library and referenced via a hyperlink in the Teams message. The Purview export captures the message and the SharePoint file as separate items — with no automatic link between them.

For reviewers, this means the document a message is discussing is disconnected from the message that discussed it. Correlating them manually across a large dataset is impractical.

A purpose-built pipeline automatically links each SharePoint attachment back to the Teams message that referenced it, preserving the contextual relationship through ingestion, review, and production.
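One way to picture the linking step is to match the hyperlinks found in each message body against the URLs of the collected SharePoint files. The field names (`body`, `source_url`, `file_id`) are hypothetical, and real exports may carry the reference in structured metadata instead of the body text, so this is a sketch under those assumptions and not a description of any particular product.

```python
import re
from urllib.parse import unquote, urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def normalize_url(url: str) -> str:
    """Reduce a URL to lowercase host + decoded path, ignoring query and fragment."""
    parts = urlsplit(url)
    return (parts.netloc + unquote(parts.path)).lower().rstrip("/")


def link_attachments(messages: list[dict], files: list[dict]) -> dict[str, list[str]]:
    """Map each message id to the ids of collected files its body links to."""
    files_by_url = {normalize_url(f["source_url"]): f["file_id"] for f in files}
    links = {}
    for m in messages:
        matched = []
        for url in URL_RE.findall(m["body"]):
            # Drop trailing sentence punctuation picked up by the regex.
            file_id = files_by_url.get(normalize_url(url.rstrip(".,;)")))
            if file_id is not None and file_id not in matched:
                matched.append(file_id)
        links[m["id"]] = matched
    return links
```

Normalizing both sides matters because the same file is often referenced with different percent-encoding or with sharing parameters appended to the link.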

The Deduplication Problem

In a multi-custodian Teams collection, the same message can be collected from every custodian who was a participant in that conversation. A channel message in a 50-person team can result in 50 identical copies of that message — one per custodian export.

Traditional deduplication operates at the document level. Chat data doesn't work that way. A 24-hour RSMF "document" built from Custodian A's export will have a different document-level hash than the one built from Custodian B's export for the same day, even if the underlying messages are identical.

Message-level deduplication solves this: each unique message is identified and deduplicated across all custodians simultaneously. The result is one copy per message — with complete custodian attribution retained in metadata — regardless of how many custodians participated in the conversation.

A 50-custodian collection of a 100,000-message Teams channel becomes 100,000 unique messages in the review set rather than 5,000,000.
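To make the idea concrete, here is a minimal sketch of message-level deduplication. It fingerprints each message from content fields that should be identical across custodian exports, and merges custodian names into the surviving record. The field names (`conversation_id`, `sender`, `sent`, `body`, `custodian`) and the choice of SHA-256 over those fields are assumptions for illustration. A production pipeline would likely prefer a platform-assigned message ID where one is available.

```python
import hashlib


def message_key(msg: dict) -> str:
    """Fingerprint a message from fields that do not depend on the custodian."""
    raw = "\x1f".join([msg["conversation_id"], msg["sender"], msg["sent"], msg["body"]])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def dedupe(messages: list[dict]) -> list[dict]:
    """Keep one record per unique message, with every custodian attributed."""
    unique = {}
    for msg in messages:
        key = message_key(msg)
        if key not in unique:
            record = {k: v for k, v in msg.items() if k != "custodian"}
            record["custodians"] = [msg["custodian"]]
            unique[key] = record
        elif msg["custodian"] not in unique[key]["custodians"]:
            unique[key]["custodians"].append(msg["custodian"])
    return list(unique.values())
```

Because the key is computed per message and not per 24-hour document, the same message collected from 50 custodians collapses to one record with 50 names in its `custodians` list.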

What Pre-RSMF Search Changes

The standard workflow for Teams eDiscovery looks like this:

1. Collect from Purview
2. Process into RSMF files
3. Load into review platform
4. Apply search terms
5. Review the hit set

The problem is that step 4, search, happens after you've already committed to paying for all the RSMF documents created in step 2. If your custodians generated 500,000 Teams messages and your search identifies 25,000 relevant messages, you've already created (and paid for) 500,000 messages' worth of RSMFs.

Pre-RSMF search inverts this sequence:

1. Collect from Purview
2. Normalize and deduplicate
3. Apply search and filters to the full conversation dataset
4. Create RSMFs only from the relevant messages
5. Load the targeted RSMF set into review

The result: only 25,000 messages become RSMFs. The other 475,000 stay preserved, fully defensible and available for reactivation if the scope changes, but are never converted into production documents.
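The inverted sequence can be sketched as a simple partition: search runs over the normalized messages first, and only the hits move on to RSMF creation. The function name, the `body` field, and the plain substring match are all simplifying assumptions. Real search would typically involve an index, date and participant filters, and surrounding-context rules.

```python
def select_for_rsmf(messages: list[dict], terms: list[str]) -> tuple[list[dict], list[dict]]:
    """Split messages into (hits to convert to RSMF, rest to keep preserved)."""
    lowered_terms = [t.lower() for t in terms]
    hits, preserved = [], []
    for m in messages:
        text = m["body"].lower()
        if any(term in text for term in lowered_terms):
            hits.append(m)       # goes on to RSMF creation and review
        else:
            preserved.append(m)  # stays in the preserved set, untouched
    return hits, preserved
```

Since the preserved list is kept and not discarded, a later change in scope only means running the selection again with new terms, not going back to collection.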

The Microsoft 365 Copilot Factor

Copilot interactions are a new and growing category of Teams data that requires specific handling. When a user interacts with Microsoft 365 Copilot within Teams, those interactions are captured in the same Purview export as regular Teams chat content.

Copilot data is different in several important ways:

Prompts represent user intent in a distinct form from normal chat messages
Responses from Copilot are AI-generated content that may reference other company data
Chain-of-thought content from Copilot interactions has evidentiary significance distinct from the prompt-response pair alone

A Teams processing pipeline that doesn't handle Copilot interactions explicitly will either miss this content or produce it in a format that obscures its nature. Purpose-built handling separates Copilot interactions from regular Teams chat and preserves the full interaction structure.

Attachment Economics: Preserve Everything, Activate What Matters

Teams matters often involve large numbers of SharePoint attachments. In a preservation context, you need to capture all of them — you can't know in advance which files will be relevant to the matter.

But paying to process 40,000 attachments when only 3,800 are linked to responsive messages makes no sense. StreemView's Preservation Only model addresses this directly:

Every attachment is captured and preserved at collection
Only attachments linked to responsive messages are "activated" for full processing
Unactivated attachments remain preserved at a significantly reduced rate
Scope changes don't require re-collection — unactivated attachments can be activated on demand

The practical result: complete preservation without paying for content that will never enter review.
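The activation logic described above amounts to a set calculation, sketched below under stated assumptions. The `links` mapping (message id to linked file ids) and the function name are hypothetical, and this is not a description of how StreemView implements its model.

```python
def plan_activation(
    all_file_ids: list[str],
    links: dict[str, list[str]],
    responsive_message_ids: list[str],
) -> tuple[set[str], set[str]]:
    """Return (files to activate for full processing, files kept preservation-only)."""
    preserved = set(all_file_ids)  # everything collected is preserved
    activate = set()
    for message_id in responsive_message_ids:
        activate.update(links.get(message_id, []))
    activate &= preserved          # ignore links to files that were never collected
    return activate, preserved - activate
```

If the scope changes, calling the function again with the new set of responsive messages yields the additional files to activate, with no re-collection involved.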

What to Look for in a Teams Processing Platform

If you're evaluating platforms for Teams eDiscovery, the questions to ask are:

How is Purview data ingested? Does it require manual pre-processing, or is it direct?
How is Teams data separated from email and SharePoint in mixed exports? Is this automatic?
How is threading handled? Are replies processed with parent messages through the entire workflow?
How are SharePoint attachments linked to messages? Is this automatic, or manual correlation?
How does deduplication work? Is it document-level or message-level?
When is search applied? Before or after RSMF creation?
How are Copilot interactions handled? Are they separated from regular chat content?

A platform that can answer all seven of these questions with "automatically, natively, at the message level" is one that was actually built for Teams. Everything else is a retrofit.
