NEAR-DUPLICATE DOCUMENT IDENTIFICATION

Near-duplicate document detection is another service that CAAT provides, and it takes two forms. The first is a standard text-based version: it compares documents word by word, calculates the percentage of text that two documents share, and groups them accordingly. The second is a conceptual version that is unique to CAAT: it compares documents by concept, using a concept index, and employs clustering algorithms to gather like documents together.
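As an illustration of the text-based form only, the following is a minimal sketch in Python and not CAAT's implementation. It assumes that "percentage of shared text" can be approximated by the word-level similarity ratio from the standard library's `difflib.SequenceMatcher`, and that grouping means assigning each document to the first group whose lead document it matches above a threshold. The function names `shared_text_percent` and `group_near_duplicates`, and the 80% threshold, are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def shared_text_percent(text_a: str, text_b: str) -> float:
    """Approximate the percentage of shared text between two documents.

    Assumption: word-level SequenceMatcher ratio stands in for the
    product's own (undocumented here) shared-text measure.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    if not words_a and not words_b:
        return 100.0  # two empty documents are treated as identical
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return 100.0 * matcher.ratio()


def group_near_duplicates(docs: dict, threshold: float = 80.0) -> list:
    """Greedily group document ids whose text is close to a group's lead document.

    Hypothetical grouping rule: the first document placed in a group acts
    as its reference; later documents join the first group they match.
    """
    groups = []
    for doc_id, text in docs.items():
        for group in groups:
            lead_text = docs[group[0]]
            if shared_text_percent(lead_text, text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])  # no close match: start a new group
    return groups


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy cat",
        "c": "completely unrelated text about contracts",
    }
    print(group_near_duplicates(sample))  # [['a', 'b'], ['c']]
```

The greedy rule keeps the sketch short but makes the result depend on document order. The conceptual form described above would need a concept index and a clustering step, neither of which is sketched here.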

In both cases, the groupings help ensure consistent document review by presenting similar documents to the same reviewer at the same time. The groupings are usually used as a supplement to conceptual clusters and email threads. Text difference highlighting can also be used to reveal the differences between documents in the same group, particularly for textual near-duplicate detection.
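To make the idea of difference highlighting concrete, here is a minimal sketch in Python and not the product's own feature. It assumes a word-level comparison using the standard library's `difflib.SequenceMatcher`. The function name `highlight_differences` and the `[-removed-]` / `{+added+}` markers are hypothetical conventions chosen for this example.

```python
from difflib import SequenceMatcher


def highlight_differences(base: str, other: str) -> str:
    """Return `other` relative to `base`, with word-level changes marked.

    Words only in `base` are wrapped as [-...-]; words only in `other`
    are wrapped as {+...+}. Shared words are passed through unchanged.
    """
    base_words = base.split()
    other_words = other.split()
    matcher = SequenceMatcher(None, base_words, other_words, autojunk=False)

    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(base_words[i1:i2])
            continue
        if i2 > i1:  # words dropped from the base document
            pieces.append("[-" + " ".join(base_words[i1:i2]) + "-]")
        if j2 > j1:  # words introduced in the other document
            pieces.append("{+" + " ".join(other_words[j1:j2]) + "+}")
    return " ".join(pieces)


if __name__ == "__main__":
    print(highlight_differences("the lazy dog", "the lazy cat"))
    # the lazy [-dog-] {+cat+}
```

A reviewer looking at two documents in the same near-duplicate group could scan the marked spans instead of rereading both texts in full.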

 



Copyright 2012 Content Analyst Company, LLC All rights reserved.
