

OUR TECHNOLOGY

Content Analyst's CAAT transforms large volumes of unstructured data into organized, relevant information, and exposes insights hidden in the data. The CAAT platform is a dynamic suite of technologies known as Text Analytics. It provides organization tools for classification and email analysis; concept search; and other text analytics capabilities that automate most of the human activity traditionally associated with using unstructured data.

Typically, Text Analytics offerings are collections of disparate software components from different vendors. For companies developing solutions using Text Analytics, these platforms are inefficient and difficult to incorporate: the components are not optimized to work together, do not share a common index structure, and require dealing with multiple vendors.

In contrast, CAAT was designed from the ground up for integration and efficiency. It performs a variety of functions on the same basic index structure, and makes it easy to continue adding new capabilities to your offering as and when you see the need.

The Multiple Capabilities of CAAT

Dynamic Clustering – CAAT takes an entire collection of information and automatically sorts it into folders and sub-folders by conceptual topics, even creating titles for each folder.

Benefit – quickly organizes information in a logical fashion based on what it is about, not the words in it. Researchers and reviewers can narrow in on only the information that is relevant to them, and extraneous information can be discarded before it consumes valuable time, space, and resources.

Concept-based Categorization – groups documents based on content, whether or not the same words are used to describe the same topics or concepts.

Benefit – quickly locates relevant information without flooding users with "keyword-responsive" documents that aren't on-topic. This speeds legal review, streamlines enterprise content management, and helps sort through social media content.

Conceptual Search – CAAT searches the way people think: by topics or concepts rather than keywords. CAAT can use a phrase, an entire sentence, or even a whole document to find other information that is conceptually similar.

Benefit – two-thirds of keyword searches fail because they are overly inclusive or miss the right information, whereas concept searches rank results by their conceptual relevance to the query. Because queries are written in natural language, and can even be cut and pasted from actual documents, searching is faster and accuracy increases severalfold.
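To make the idea of matching by concept rather than keyword concrete, here is a minimal sketch of Latent Semantic Indexing on a toy collection. It is an illustration only, not CAAT's implementation or API: the function names (`top_eigenpairs`, `lsi_scores`), the three sample documents, the raw word counts, and the choice of two latent dimensions are all assumptions made for this example. Production LSI usually applies a truncated SVD to a weighted term-document matrix; the sketch obtains the same latent space from the eigenvectors of the document-by-document overlap matrix so that it needs only the Python standard library.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def top_eigenpairs(matrix, k, iterations=200):
    """Top-k eigenpairs of a small symmetric matrix (power iteration + deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs repeatable
        for _step in range(iterations):
            nxt = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(nxt, nxt))
            vec = [x / norm for x in nxt]
        value = dot(vec, [dot(row, vec) for row in work])
        pairs.append((value, vec))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsi_scores(docs, query, k=2):
    """Cosine similarity between the query and each document in a k-dim latent space."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})

    def vectorize(text):
        words = text.lower().split()
        return [words.count(term) for term in vocab]

    doc_vecs = [vectorize(doc) for doc in docs]
    # Document-by-document overlap matrix; its eigenvectors are the right
    # singular vectors of the term-document matrix.
    gram = [[float(dot(a, b)) for b in doc_vecs] for a in doc_vecs]
    pairs = top_eigenpairs(gram, k)
    # Fold the query into the latent space using its word overlap with each document.
    overlap = [dot(vectorize(query), vec) for vec in doc_vecs]
    query_coords = [dot(vec, overlap) / math.sqrt(val) for val, vec in pairs]
    scores = []
    for j in range(len(docs)):
        doc_coords = [math.sqrt(val) * vec[j] for val, vec in pairs]
        scores.append(cosine(query_coords, doc_coords))
    return scores


docs = ["car engine", "automobile engine", "banana fruit"]
scores = lsi_scores(docs, "car")
```

In this toy collection the query "car" shares no word with "automobile engine", so a keyword match would miss that document. In the two-dimensional latent space, however, both engine documents land on the same axis, so the second document scores close to 1 while "banana fruit" scores close to 0.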

Summarization – CAAT uses its own notion of concepts to evaluate an entire document sentence by sentence, finds the sentences most relevant to the document as a whole, and presents them in summary form.

Benefit – identifies what a document is about from its content rather than from titles and author-provided summaries, which are often misleading. Researchers and reviewers can quickly determine if a document is relevant to their queries, and if so, which parts are most relevant.
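The sentence-by-sentence approach can be sketched as a simple extractive summarizer. This is a hypothetical illustration, not CAAT's method: where CAAT is described as scoring sentences with its own notion of concepts, the sketch substitutes plain word counts, and the names `tokenize`, `similarity`, and `summarize` are invented for the example. Each sentence is scored by the cosine similarity between its word counts and those of the whole document, and the best-scoring sentences are returned in their original order.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def similarity(a, b):
    """Cosine similarity between two word-count Counters."""
    shared = set(a) & set(b)
    numerator = sum(a[t] * b[t] for t in shared)
    denominator = (math.sqrt(sum(v * v for v in a.values()))
                   * math.sqrt(sum(v * v for v in b.values())))
    return numerator / denominator if denominator else 0.0


def summarize(text, max_sentences=2):
    """Return the sentences most similar to the document as a whole."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_counts = Counter(tokenize(text))
    scored = [(similarity(Counter(tokenize(s)), doc_counts), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Present the chosen sentences in document order.
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]


report = ("The contract covers pricing and delivery. "
          "Pricing under the contract is fixed for delivery in March. "
          "Lunch was good.")
summary = summarize(report, max_sentences=2)
```

For the sample text, the two contract sentences are selected and the off-topic sentence about lunch is left out, because it shares the least vocabulary with the document overall.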

Near-Duplicate Detection – CAAT's analytics include statistical capabilities for detecting several duplicate and near-duplicate conditions in documents and text. These include exact duplicates, duplicates that vary only in composition (the traditional "near-duplicate") and, most significantly, documents that are conceptually near-duplicates.

Benefit – identifying information that is nearly duplicate can be more significant than finding exact matches; near-duplicates can clog information sources, distort search results, and waste reviewers' valuable time. By grouping documents that are very closely matched, even when they differ slightly, users can identify all closely related information earlier in their workflows.
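The first two conditions, exact duplicates and traditional near-duplicates, can be illustrated with word shingles and Jaccard similarity, a widely used textbook technique. The sketch below is an assumption-laden example rather than a description of CAAT's statistics: the function names, the three-word shingle size, and the 0.5 threshold are all hypothetical choices, and the sketch does not attempt the conceptual near-duplicate case.

```python
def shingles(text, size=3):
    """Set of overlapping word n-grams ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles across both sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pair(doc_a, doc_b, threshold=0.5):
    """Label a pair of documents; the threshold is an illustrative choice."""
    if doc_a == doc_b:
        return "exact duplicate"
    score = jaccard(shingles(doc_a), shingles(doc_b))
    return "near-duplicate" if score >= threshold else "distinct"


original = "the quarterly sales report for the northeast region is due on friday at noon"
revised = "the quarterly sales report for the northeast region is due on monday at noon"
unrelated = "lunch menu for the office party next week"

labels = [
    classify_pair(original, original),
    classify_pair(original, revised),
    classify_pair(original, unrelated),
]
```

Changing one word in the 14-word sample alters three of its twelve shingles, so the pair shares 9 of 15 distinct shingles (a similarity of 0.6) and is labeled a near-duplicate, while the unrelated text shares none.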

Language Analytics – CAAT is language-agnostic: it can perform analytics on almost all languages that can be represented in Unicode. CAAT can determine the actual languages in documents, and can operate in a cross-lingual manner, allowing users to query or organize information in one language and locate relevant information in different languages without translation.

Benefit – information comes in many languages, not just English, and users can't afford to rely on inexact machine translations or expensive (and lengthy) human translations only to decide the information wasn't worthwhile anyway. CAAT's language analytics give users a view of the information that's relevant, regardless of language, so they can make informed decisions about multilingual information.

Email Analytics – CAAT's analytics include a number of email features: thread identification, metadata tracking, segment analysis, and tracking statistics. CAAT can even identify where emails are missing from a thread or collection.

Benefit – email is the language of commerce, and CAAT can identify not only who is communicating but also what they are communicating about, and whether similar email threads exist elsewhere in the communications. This allows reviewers to quickly narrow in on only the most relevant conversations among only the most appropriate recipients. By grouping similar threads and topics, reviewers can find relevant information in a fraction of the time they might spend going through chronological email trails.
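Thread identification and the detection of missing emails can be sketched with reply links alone. The example below is a hypothetical illustration, not CAAT's algorithm: it assumes each message is a dictionary with an `id` and an `in_reply_to` field (comparable to the Message-ID and In-Reply-To headers of standard email), and the function name `build_threads` is invented. A message is reported as missing when another message replies to it but it does not appear in the collection.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads and list referenced-but-absent message ids.

    Each message is a dict with 'id' and 'in_reply_to' (None for a thread root).
    """
    known = {m["id"] for m in messages}
    parent = {m["id"]: m["in_reply_to"] for m in messages}
    missing = sorted({p for p in parent.values() if p is not None and p not in known})

    def root_of(message_id):
        seen = set()  # guards against malformed reply loops
        while (message_id in parent and parent[message_id] is not None
               and message_id not in seen):
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads), missing


collection = [
    {"id": "a", "in_reply_to": None},
    {"id": "b", "in_reply_to": "a"},
    {"id": "d", "in_reply_to": "c"},  # "c" is not in the collection
    {"id": "e", "in_reply_to": None},
]
threads, missing = build_threads(collection)
```

Here messages "a" and "b" form one thread, "e" stands alone, and "d" is filed under the absent message "c", which is also reported in `missing` as a gap in the collection.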

Content Analyst Company is the original patent-holder for Latent Semantic Indexing, or LSI, and today holds numerous patents around this technology and its applications. The CAAT platform, which is based on patented LSI technology, provides a number of advanced text analytics features in a highly scalable, proven platform designed to cope with massive amounts of unstructured data.

CAAT is delivered to partners as a robust set of APIs along with a sample user interface to speed integration. Our Software Developers Kit (SDK) includes extensive documentation, and our ContentCare support and delivery programs are designed to help our partners quickly become proficient with CAAT and its broad text analytics capabilities.

 


Copyright 2012 Content Analyst Company, LLC All rights reserved.
