LANGUAGE ANALYTICS

CAAT provides language analytics on several levels. As a fully Unicode®-compliant platform, CAAT is language-agnostic: it can work with any language that can be represented in the Unicode encoding system. Searches can therefore be performed in any such language on text written in that same language, a capability that is increasingly important in a global business world.

CAAT also provides a Language Identification module that identifies the primary language of a document. Optionally, it can also detect every language present in a document and note the segments associated with each. This greatly speeds workflows in which documents must be sorted by language before further processing, such as machine translation.
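The two outputs described above (a primary language plus per-segment language labels) can be illustrated with a minimal sketch. This is not CAAT's method, which is not described here. It assumes a naive stopword-counting heuristic, and the names `STOPWORDS`, `identify_segment`, and `analyze` are hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword lists (an assumption for this sketch);
# a production identifier would use far richer statistical models.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "this"},
    "fr": {"le", "la", "et", "est", "les", "des", "un", "une", "ce"},
    "de": {"der", "die", "und", "ist", "das", "ein", "nicht"},
}

def identify_segment(segment):
    """Label one segment with the language whose stopwords occur most often."""
    words = [w.strip(",;:!?\"'()") for w in segment.lower().split()]
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def analyze(document):
    """Return (primary_language, [(language, segment), ...]) for a document."""
    segments = [s.strip() for s in document.split(".") if s.strip()]
    labeled = [(identify_segment(s), s) for s in segments]
    # Weight each language by the number of words in its segments.
    weight = Counter()
    for lang, seg in labeled:
        weight[lang] += len(seg.split())
    primary = weight.most_common(1)[0][0] if weight else "unknown"
    return primary, labeled

document = (
    "This is the summary of the contract. "
    "Le contrat est signé et la date est fixée. "
    "The parties agree to the terms in this document."
)
primary, labeled = analyze(document)
```

Here `primary` is the language covering the most words, and `labeled` pairs each sentence with its detected language, so mixed-language documents can be routed segment by segment.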

CAAT also works across languages, offering a unique and compelling cross-lingual language analytics capability. The same mechanism that allows CAAT to train itself to identify concepts across an index of documents is employed to train CAAT to correlate the languages themselves. Through the use of parallel corpora (sets of documents that have already been translated into different languages), CAAT learns how a given concept is expressed across languages. This training is a simple indexing function. Once trained, CAAT can search for concepts across languages: it can use a concept expressed in English to find similar concepts in French, Japanese, and/or Arabic, for example, with no translation required. Other CAAT functions, such as Concept-Based Categorization and Dynamic Clustering, also operate in this cross-lingual mode.
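The idea of training on parallel corpora can be sketched as follows. This assumes a latent-semantic-indexing-style approach in which each training document merges both language versions of a text. CAAT's actual implementation is not described here. The toy corpus and all function names are hypothetical, and the truncated SVD is approximated with a simple subspace iteration to keep the example dependency-free.

```python
import math
import random
from collections import Counter

# Toy parallel corpus (invented for illustration): the same text in English and French.
PARALLEL = [
    ("contract law court", "contrat loi tribunal"),
    ("contract court judge", "contrat tribunal juge"),
    ("soccer game goal", "foot partie but"),
    ("soccer game team", "foot partie equipe"),
]

def build_matrix(pairs):
    """Term-by-document counts; each training document merges both languages."""
    docs = [Counter(f"{en} {fr}".split()) for en, fr in pairs]
    vocab = sorted(set().union(*docs))
    rows = [[float(d[t]) for d in docs] for t in vocab]
    return vocab, rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of a list of vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            proj = dot(v, b)
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        basis.append([x / norm for x in v])
    return basis

def concept_basis(rows, k, iterations=100, seed=0):
    """Approximate the top-k left singular vectors by subspace iteration."""
    rng = random.Random(seed)
    basis = orthonormalize([[rng.random() for _ in rows] for _ in range(k)])
    columns = list(zip(*rows))  # one column per training document
    for _ in range(iterations):
        updated = []
        for u in basis:
            doc_weights = [dot(col, u) for col in columns]           # A^T u
            updated.append([dot(row, doc_weights) for row in rows])  # A (A^T u)
        basis = orthonormalize(updated)
    return basis

def project(text, vocab, basis):
    """Fold a monolingual text into the shared concept space."""
    counts = Counter(text.split())
    vec = [float(counts[t]) for t in vocab]
    return [dot(u, vec) for u in basis]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def search(query, documents, vocab, basis):
    """Rank documents by concept-space similarity to the query."""
    q = project(query, vocab, basis)
    scored = [(cosine(q, project(d, vocab, basis)), d) for d in documents]
    return sorted(scored, reverse=True)

vocab, rows = build_matrix(PARALLEL)
basis = concept_basis(rows, k=2)
french_docs = ["tribunal juge", "foot equipe"]
results = search("court judge", french_docs, vocab, basis)
```

The English query "court judge" shares no words with either French document, yet `results` ranks "tribunal juge" first, because the merged training documents place English and French terms for the same concept in the same region of the reduced space.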

For CAAT to handle specific technologies and issues in this cross-lingual mode, sets of translated documents covering those topics can easily be added to the parallel corpora. CAAT can then be "trained" on these specific concepts and topics, so that users can search in one language and find relevant documents written in other languages without prior translation.

 



Copyright 2012 Content Analyst Company, LLC. All rights reserved.
