CAAT is concept analytics technology used by a number of e-discovery software providers to search and manage electronic information based on what that information is about.
CAAT derives conceptual meaning from documents and text. It takes one concept as an example, expressed in a few sentences, a few paragraphs, or even an entire document, and uses it to find all the similar concepts in a collection of documents.
Using advanced mathematics and hundreds of dimensions, CAAT looks at individual words, how they are used, and how frequently they appear together to identify patterns across tens or hundreds of thousands of documents. It then matches the values it derives against each other or against a given concept, and assigns a relevance ranking to documents based on what it finds. It ranks the individual words in documents in the same way, so CAAT's index contains values that ultimately show how everything is related to everything else. This enables conceptual search and, for example, the creation of clusters of conceptually related documents.
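The idea of matching derived values against a given concept and assigning a relevance ranking can be illustrated with a deliberately simplified sketch. This is not CAAT's algorithm: it assumes plain word-count vectors and cosine similarity as a stand-in for the far higher-dimensional mathematics described above, and the names term_vector, cosine, and rank_by_concept are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map a text to a bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_concept(concept_text, documents):
    """Return (score, doc_id) pairs sorted from most to least similar."""
    query = term_vector(concept_text)
    scored = [(cosine(query, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, reverse=True)


# Illustrative data only; the texts are made up for this sketch.
documents = {
    "doc1": "The merger agreement was signed by both boards",
    "doc2": "Lunch menu for the company picnic",
    "doc3": "Boards of both companies approved the merger",
}
ranking = rank_by_concept("board approval of the merger", documents)
for score, doc_id in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Note that this sketch only matches literal words, so "board" and "boards" count as unrelated. The technology described above goes further by learning from how words appear together, which is what lets it relate documents that share a concept without sharing vocabulary.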
CAAT has a full range of analytics capabilities, but for e-discovery most of them fall into one of two categories: Organization, and Findability or Search.
Organization: Organization can be approached in two ways: unattended and user-assisted. The unattended method looks at what's in a given collection (on a particular custodian's hard drive, for example) and sorts the documents into logical, concept-based groups. The user-assisted method takes groups of examples provided by the user in the form of categories and locates everything that is similar to those examples, setting aside everything that falls below a similarity threshold. Both methods generally create folder-based collections of documents based on conceptual relevance.
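The user-assisted method can be sketched in the same simplified terms. The example below is an illustration under stated assumptions, not CAAT's implementation: it builds one word-count profile per category from the user's examples, files each document under its best-matching category, and sets aside anything scoring below a threshold. The function names and the threshold value of 0.2 are hypothetical choices for this sketch.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map a text to a bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(examples_by_category, documents, threshold=0.2):
    """File each document under its best-matching category.

    Documents whose best score is below the threshold are set aside
    under the key None. Assumes at least one category is supplied.
    """
    # Combine each category's example texts into one profile vector.
    profiles = {}
    for category, examples in examples_by_category.items():
        profile = Counter()
        for text in examples:
            profile.update(term_vector(text))
        profiles[category] = profile

    folders = {category: [] for category in profiles}
    folders[None] = []  # documents that fall below the threshold
    for doc_id, text in documents.items():
        vector = term_vector(text)
        best = max(profiles, key=lambda c: cosine(vector, profiles[c]))
        score = cosine(vector, profiles[best])
        folders[best if score >= threshold else None].append(doc_id)
    return folders


# Illustrative data only; categories and texts are made up.
examples = {
    "contracts": ["The merger agreement was signed",
                  "Contract terms for the merger"],
    "social": ["Company picnic lunch menu"],
}
documents = {
    "a": "Signed copy of the merger contract",
    "b": "Picnic menu and lunch schedule",
    "c": "Quarterly server maintenance window",
}
folders = categorize(examples, documents)
print(folders)
```

Raising the threshold sets more documents aside and keeps the folders tighter; lowering it files more documents at the cost of weaker matches.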
Findability or Search: Using conceptual search is as simple as directing CAAT to find all the other documents that are similar to the document in question. This might be a concept search, where a document or a part of a document is entered as a query, or a find-similar search. Find-similar is very powerful for e-discovery: with one or two clicks, a reviewer can automatically retrieve all the documents that are conceptually similar to one being reviewed.

Copyright 2012 Content Analyst Company, LLC All rights reserved.