GOVERNMENT AND INTELLIGENCE

Content Analyst got its genesis in the Government and Intelligence community: our software has been quietly at work in agencies and departments across the Federal government for over a decade. Sadly, the very nature of that work prevents us from sharing the more exciting projects and solutions that Content Analyst powers.

What we can talk about are the challenges those Federal entities face, and how they look to Content Analyst to address them.

Our Government and Intelligence customers look to Content Analyst for:

Automatic Categorization
Conceptual Search (Search on Steroids)
Cross-Lingual Search
Automatic Summarization
Instant Context

Automatic Categorization - How it Works

Content Analyst was designed from the ground up as a learning system. Much as any department employee quickly becomes an expert in his or her area of responsibility, Content Analyst can be easily trained on the same subject matter expertise.

“Training” Content Analyst is as simple as providing a sample set of documents – exemplars – that contain either concepts or phrases that you want to correlate to new documents, or general concepts and precepts about that Government department overall. Content Analyst also needs to know into which categories you want information sorted.

Armed with this basic information, Content Analyst is then ready to receive documents, mail, correspondence, RSS feeds, etc. It will read each one, compare the concepts and word contexts against the exemplar set, and assign a category. Our categorization on average achieves 90% accuracy versus controlled human categorization – without the human error that is inherent in a manual sorting or coding process.
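
To make the idea concrete, here is a minimal sketch in Python of exemplar-based categorization. It is an illustration only, not Content Analyst's actual engine: it assumes a simple bag-of-words representation and cosine similarity, and the category names, exemplar text, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(exemplars):
    """Merge each category's exemplar documents into one profile vector."""
    profiles = {}
    for category, docs in exemplars.items():
        profile = Counter()
        for doc in docs:
            profile.update(vectorize(doc))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return (category, score) for the profile most similar to the text."""
    vec = vectorize(text)
    scored = ((category, cosine(vec, profile)) for category, profile in profiles.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical exemplar set, for illustration only.
EXEMPLARS = {
    "procurement": ["vendor contract bid award purchase order", "supplier invoice contract renewal"],
    "personnel": ["employee hiring leave benefits payroll", "staff training performance review"],
}
PROFILES = train(EXEMPLARS)
```

Calling `categorize("new vendor submitted a bid for the contract", PROFILES)` picks the "procurement" category, because only that profile shares any terms with the incoming text. A production system would compare concepts rather than raw words, but the train-then-compare flow is the same.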

Automatic Categorization - How it delivers Value

Automatic categorization cuts the time to sort through an initial collection of documents by 70%. It reduces the time to sort and categorize incoming documents to zero – categorization is now part of the automated process, not a manual check-point. We have found that the same 10% that machine categorization can’t identify are those very documents that the human coders wrestle with, given they don’t easily fit into any category.

“Search on Steroids” - How it Works

Content Analyst’s powerful search software goes beyond the simple keyword and Boolean logic searches that power most search solutions. Instead, our software is already trained to search out and identify concepts and relationships as it performs its indexing function. The more documents that Content Analyst reads, the more it actually learns - obscure relationships and concepts become “fleshed out” as the software reads more and more relevant documents.

Content Analyst turns these concepts and relationships into mathematical expressions – after all, language is mathematical in nature. By using the power of mathematics, Content Analyst can quickly search for related concepts, ideas, and context based on a sentence, a phrase, or even a whole paragraph.
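
As a rough illustration of what "turning concepts into mathematics" can look like, the Python sketch below represents each term by the terms it shares documents with, then ranks documents against a query by cosine similarity. This is a deliberately simplified stand-in, not Content Analyst's actual method; the corpus and every function name are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical mini-corpus, for illustration only.
CORPUS = [
    "car engine repair garage",
    "automobile engine repair mechanic",
    "car tires garage mechanic",
    "bank loan interest account",
    "bank account deposit interest",
]


def build_term_vectors(corpus):
    """Map each term to counts of the other terms it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        for a, b in permutations(set(doc.lower().split()), 2):
            vectors[a][b] += 1
    return vectors


def embed(text, term_vectors):
    """Represent text as the sum of the vectors of its known terms."""
    total = Counter()
    for word in text.lower().split():
        if word in term_vectors:
            total.update(term_vectors[word])
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, corpus, term_vectors):
    """Return (score, document) pairs, best match first."""
    q = embed(query, term_vectors)
    scored = [(cosine(q, embed(doc, term_vectors)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


TERM_VECTORS = build_term_vectors(CORPUS)
```

In this toy corpus, a query for "automobile" gives a nonzero score to "car tires garage mechanic", a document that shares no keyword with the query, while the banking documents score zero. The query can just as easily be a sentence or a paragraph, since any text is embedded the same way.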

“Search on Steroids” – How it delivers Value

If any organization has to deal with and search through mountains of paper, it is the Federal government. Traditional search programs simply miss too much information – instead of one search, researchers run two, three, four or more keyword sets to find a single item. Other common search problems that can’t be easily addressed with other techniques are synonymy and polysemy. Everyone is familiar with synonyms, and some systems rely on large word lists for cross-references; often, a word is missed.

Polysemy is never solved out of context – polysemous words are words like “bank” that may mean the side of a river, a financial institution, or a particular pool shot. Finally, there are unintentional errors: spelling mistakes and, more often, scanning errors.

With Content Analyst, your initial search addresses all these problems. Content Analyst can reduce search times by 65% or more, and find appropriate documents that other solutions would have missed. It is highly error-resistant, which means that extraneous characters in scanned, OCR, and machine-recognized documents, as well as unintentional typographical errors and misspellings, have virtually no effect on the search results. We find the right documents anyway.

Cross-Lingual Search - How it Works

Thanks in part to its roots in the Intelligence community, the Content Analyst engine has already been trained in most major world languages (including Middle Eastern and Asian languages). The way this works has no parallel in the search world: Content Analyst actually knows how the same concept would be expressed in different languages. Again, this is because it reads and understands context and relationships, allowing Content Analyst to easily determine whether a foreign language document is worth translating or not.

Content Analyst’s basic cross-lingual skills are easily enhanced to address specific markets or professions, by simply “training” the engine with duplicate sets of documents in various languages. Content Analyst will correlate the relationships and context both within the language sets and across the language sets, allowing Content Analyst to think multi-lingually.
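
The training step described above can be sketched in a few lines of Python. The sketch assumes that each "duplicate set" is a pair of parallel documents and that a term is represented by the training pairs it appears in, so that an English term and its Spanish counterpart end up with matching vectors. This is a simplified illustration, not Content Analyst's actual engine, and the training pairs and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical parallel training pairs (English, Spanish), for illustration only.
TRAINING_PAIRS = [
    ("border security patrol", "frontera seguridad patrulla"),
    ("election vote president", "elección voto presidente"),
    ("trade tariff export", "comercio arancel exportación"),
]


def build_term_vectors(pairs):
    """Represent each term by the training pairs in which it occurs."""
    vectors = {}
    for index, pair in enumerate(pairs):
        for text in pair:
            for word in text.lower().split():
                vectors.setdefault(word, Counter())[index] += 1
    return vectors


def embed(text, term_vectors):
    """Sum the vectors of the known terms in the text, whatever its language."""
    total = Counter()
    for word in text.lower().split():
        if word in term_vectors:
            total.update(term_vectors[word])
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(query, document, term_vectors):
    """Score a document against a query that may be in a different language."""
    return cosine(embed(query, term_vectors), embed(document, term_vectors))


TERM_VECTORS = build_term_vectors(TRAINING_PAIRS)
```

With these pairs, the English query "border security" scores the Spanish document "patrulla frontera" as highly relevant and "voto presidente" as irrelevant, without translating either document. That score is what would let a reviewer decide which foreign language documents are worth sending to a translator.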

Cross-Lingual Search – How it delivers Value

The value of cross-lingual search is obvious: instead of choosing between translating everything and translating an “educated guess” cross-section, you translate only those documents that are actually relevant, which cuts translation costs.

What’s not as obvious – but every bit as valuable – is the time that cross-lingual search saves. Even with on-site translators and machine-assisted translation, it is difficult to translate more than 3-4 pages per person per hour. Therefore, a large translation effort, prior to any real relevant searching, can add weeks or more to the overall search effort. With Content Analyst, that challenge is eliminated – only the actual relevant documents need be translated.

Automatic Summarization - How it Works

Another powerful feature built into Content Analyst is its ability to extract the most relevant subject matter or topics from a document and present only those paragraphs or sentences that most closely match the subject matter in the document.

Through easily-tuned settings, multiple-section documents can be summarized by section, and summaries can be made briefer or more detailed with a few mouse-clicks.
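
For readers who want a feel for the mechanics, here is a minimal Python sketch of extractive summarization: each sentence is scored against the term profile of the whole document, and only the best-matching sentences are kept. It is an assumption-laden illustration rather than Content Analyst's actual algorithm; the stopword list, the scoring, and the `max_sentences` setting are all hypothetical.

```python
import math
import re
from collections import Counter

# Small hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "were", "on", "for", "at"}


def tokens(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, max_sentences=2):
    """Keep the sentences closest to the document's overall subject matter."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_vec = Counter(tokens(text))
    scored = [(cosine(Counter(tokens(s)), doc_vec), i) for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:max_sentences]
    keep = sorted(i for _, i in top)  # restore original sentence order
    return [sentences[i] for i in keep]


# Hypothetical document, for illustration only.
TEXT = (
    "The audit reviewed travel spending. "
    "Travel spending rose because of audit delays. "
    "Lunch was served at noon. "
    "The audit recommends limits on travel spending."
)
```

Raising or lowering `max_sentences` plays the role of the "briefer or more detailed" setting: `summarize(TEXT, 3)` keeps the three sentences about the audit and drops the off-topic sentence about lunch. Summarizing by section would amount to calling the same function once per section.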

A summary is very different from an abstract. With an abstract, the author is telling you what he or she wants you to believe they are discussing. With intelligent, context-based summarization, however, the result is what the document is actually saying. Often, they are different – sometimes, significantly so.

Automatic Summarization – How it delivers Value

By creating summaries of relevant documents automatically, “on-the-fly,” Content Analyst eliminates the arduous task of manually reading through the documents and writing out a human-generated abstract, a task that can take days, or weeks in some cases. Accurate summaries of relevant documents speed the review of complex or large documents, and help find overlap, inconsistencies, patterns, and other important elements contained within those documents.

“Instant Context” - How it Works

Because Content Analyst creates an extensive, cross-referenced index that is based on subject matter, context, and concepts when it prepares documents for searching, a byproduct of that index is the ability to instantly have Content Analyst correlate a word or phrase to the words or phrases that most closely match it in meaning. It simply displays a pop-up window, and the corresponding terms are arranged in order of relevance, from closest to least similar.
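
The ranking behind such a pop-up can be sketched in Python as follows. The sketch assumes that two terms are close in meaning when they tend to appear alongside the same other terms; it is an illustration only, not Content Analyst's actual index, and the corpus and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical mini-corpus, for illustration only.
CORPUS = [
    "physician treated patient clinic",
    "doctor treated patient hospital",
    "doctor prescribed medicine clinic",
    "physician prescribed medicine hospital",
    "pilot landed aircraft runway",
    "pilot aircraft runway tower",
]


def build_term_vectors(corpus):
    """Map each term to counts of the other terms it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        for a, b in permutations(set(doc.lower().split()), 2):
            vectors[a][b] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_terms(term, term_vectors, top_n=3):
    """Return (term, score) pairs ordered from closest to least similar."""
    target = term_vectors.get(term, Counter())
    scored = [(other, cosine(target, vec)) for other, vec in term_vectors.items() if other != term]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


TERM_VECTORS = build_term_vectors(CORPUS)
```

Here `related_terms("physician", TERM_VECTORS)` lists "doctor" first, even though the two words never appear in the same document, because they keep the same company. The aviation terms fall to the bottom of the ranking.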

“Instant Context” – How it delivers Value

With Instant Context, it’s as if you’re having Content Analyst determine the meaning of a word, phrase, or term by seeing it used in context as well as seeing any words that are even slightly synonymous.

For the researcher, this virtually eliminates the time-consuming effort of having to look up unfamiliar words or phrases – and this can take days off of a typical research effort.

In addition, Instant Context completely eliminates the frustrating task of trying to determine the intended word when there is a typographical, spelling, or scanning error – again, Content Analyst already knows that the misrepresented word is actually a term with which you’re already familiar.

 
