Managing Volumes Through Near-Dups

Several factors drive the explosion of electronically stored information (ESI) that forms the nucleus of any litigation.

One is the simple fact that ESI is easy and fairly inexpensive to store. Most companies' IT organizations make routine backup copies of all users' data, and with offline devices such as portable computers, local storage is essentially automatic.

Another is that email is the language of commerce, and "replying to" an email typically adds many copies of the same email to the trail.

A third is that developing a comprehensive records plan is difficult, so companies end up implementing a "save everything" strategy.

Emails that are replies to, or copies of, other emails don't necessarily show up as "duplicate" documents, because their metadata and other information have changed. But they are "near-dups" of an original document. The problem is compounded when attachments are sent to multiple individuals who make comments or edits. The result is a small population of documents that are very nearly identical.
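To illustrate why replies escape exact-duplicate detection, here is a minimal sketch. It assumes exact duplicates are found by hashing the full text, which is a common approach but not something this article attributes to any particular product. The sample messages and the `fingerprint` helper are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Exact-duplicate fingerprint: SHA-256 of the raw text (illustrative assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical messages: the reply quotes the original in full.
original = "Please review the attached contract draft before Friday."
reply = "Agreed.\n\n> Please review the attached contract draft before Friday."

# The hashes differ, so an exact-match check treats these as unrelated documents...
same_hash = fingerprint(original) == fingerprint(reply)

# ...even though the reply contains the original text verbatim.
contains_original = original in reply

print("same hash:", same_hash)
print("reply contains original:", contains_original)
```

Any change at all, even a single added line, produces a different hash, which is why near-dups call for a similarity measure rather than an equality test.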

It's expensive and inefficient to review every one of these documents separately, and different reviewers have been known to code nearly duplicate documents differently. The sheer volume of ESI makes it almost impossible to devote individual time and effort to every document, but the dangers of spoliation and privilege mandate that review be careful, methodical, and inclusive.

CAAT has several capabilities to handle this. It provides a high-volume capability to identify nearly duplicate documents based on textual similarity, such as documents that are edits or forwards of one another. It groups nearly duplicate documents together and identifies what has changed, so reviewers can look at one central document and then need only look at the differences in its near-dups.
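The general idea of grouping by textual similarity and then showing only the differences can be sketched as follows. This is not CAAT's method, which the article does not describe. It is a generic illustration that assumes word shingles compared with Jaccard similarity, a greedy grouping pass, and Python's standard `difflib` for the differences. The function names, the 0.5 threshold, and the sample documents are all hypothetical.

```python
import difflib

def shingles(text, k=3):
    """Set of overlapping k-word sequences (assumed similarity feature)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Shared shingles divided by all shingles in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_near_dups(docs, threshold=0.5):
    """Greedy grouping: each document joins the first group whose central
    (pivot) document it resembles, otherwise it starts a new group."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups = []  # list of (pivot_id, [member_ids])
    for doc_id in docs:
        for pivot, members in groups:
            if jaccard(sets[pivot], sets[doc_id]) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((doc_id, [doc_id]))
    return groups

def changes(pivot_text, other_text):
    """Words removed from ('- ') or added to ('+ ') the pivot document."""
    diff = difflib.ndiff(pivot_text.split(), other_text.split())
    return [d for d in diff if d.startswith(("+ ", "- "))]

# Hypothetical documents for illustration only.
docs = {
    "a": "The supplier will deliver the goods by the end of March",
    "b": "The supplier will deliver the goods by the end of April",
    "c": "Lunch menu for the quarterly offsite is attached",
}
groups = group_near_dups(docs)
print(groups)
print(changes(docs["a"], docs["b"]))
```

With these samples, documents "a" and "b" land in one group with "a" as the central document, and the reviewer of "b" only needs to see that "March" became "April". Comparing every document against every pivot does not scale to large collections; a production system would need an indexing scheme, which this sketch omits.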

CAAT takes this further with conceptual near-duplicate document identification. Such documents may be edits of a single, original document in which reordered sentences or paragraphs, or newly added material, make them textually different. Or there may be different authors' notes on the same issues or discussions. These documents are highly similar on a conceptual level, and CAAT will identify them as such. This makes near-duplicate document review even more efficient by grouping all "very-like" documents into a common review set.
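As a rough illustration of similarity that ignores ordering, the sketch below compares documents by word counts using cosine similarity. This is only a crude stand-in and is not how CAAT identifies concepts, which the article does not detail. In particular, a word-count comparison would not recognize two authors describing the same issue in different vocabulary. The helper names and sample texts are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Word counts with periods stripped; word order is discarded."""
    return Counter(text.lower().replace(".", " ").split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical notes: the second reorders the sentences of the first.
memo = "Pricing was discussed first. Delivery dates were reviewed next."
reordered = "Delivery dates were reviewed next. Pricing was discussed first."
unrelated = "The cafeteria will be closed on Monday for cleaning."

sim_reordered = cosine(term_vector(memo), term_vector(reordered))
sim_unrelated = cosine(term_vector(memo), term_vector(unrelated))

print("reordered:", round(sim_reordered, 3))
print("unrelated:", round(sim_unrelated, 3))
```

The reordered note scores as essentially identical to the original because the comparison looks at which words appear, not where, while the unrelated note shares no words and scores zero.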

Near-duplicate document detection dramatically increases review speeds and greatly reduces, and possibly eliminates, the inconsistencies that arise when different reviewers code nearly duplicate documents.

 


Copyright 2012 Content Analyst Company, LLC All rights reserved.
