DOCUMENT MANAGEMENT
Document management, or content management, is no stranger to electronic search, or to the difficulty of finding unstructured information in electronic archives that are growing exponentially. Regulations such as Sarbanes-Oxley and HIPAA are forcing corporations to save virtually everything. Eliminating duplicates and trivial emails takes a backseat because, after all, this is a cost-driven exercise.
Companies focusing on document management have offered numerous “text analytics” and “conceptual search” products, but these have all been based on outmoded technologies that require extensive backroom preparation or rely on lengthy and often outdated term lists. Once companies select a conceptual search tool, they often find their support and maintenance issues are larger than the problem they were trying to solve.
Content Analyst approaches this market differently. Our product was designed from the ground up to understand and handle unstructured text; we suffer from no database-centric legacy constraints.
Key features our clients find useful for document management are:
Automatic Categorization
Conceptual Search (Search on Steroids)
Cross-Lingual Search
Automatic Summarization
Automatic Categorization - How it Works
Content Analyst was designed from the ground up as a learning system. Much as a file clerk puts knowledge of the company to work in deciding how to file and manage unstructured information, Content Analyst can be easily trained on the key facts and factors and then put that knowledge to work in sorting or categorizing information.
“Training” Content Analyst is as simple as providing a sample set of documents, called exemplars, that contain the concepts or phrases you want to correlate against, both for information already saved and for new information as it enters the document management system. Content Analyst also needs to know the categories into which you want the information sorted.
Armed with this basic information, Content Analyst is then ready to receive documents. It will read each one, compare the concepts and word contexts against the exemplar set, and categorize accordingly. Our categorization on average achieves 90% accuracy versus controlled human categorization, and easily passes the rigorous challenges of the legal community (whose use of categorization is very similar function-wise to the document management world).
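The exemplar-and-category workflow described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it uses plain word counts and cosine similarity as a stand-in, not Content Analyst's actual engine, and the function names (`train`, `categorize`) and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(exemplars):
    """Build one combined term profile per category from its exemplar documents."""
    profiles = {}
    for category, docs in exemplars.items():
        profile = Counter()
        for doc in docs:
            profile.update(vectorize(doc))
        profiles[category] = profile
    return profiles

def categorize(text, profiles):
    """Return the category whose exemplar profile is most similar to the text."""
    vec = vectorize(text)
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))

# Hypothetical exemplars: a few sample documents per target category.
exemplars = {
    "legal": [
        "The contract includes a liability clause and an indemnity clause.",
        "Counsel reviewed the lawsuit and the settlement agreement.",
    ],
    "finance": [
        "The quarterly invoice lists payment terms and the account balance.",
        "Budget forecast shows revenue and expense totals.",
    ],
}
profiles = train(exemplars)
label = categorize("Please review the settlement agreement and liability terms.", profiles)
print(label)
```

A production system would compare concepts and word contexts rather than raw word counts, but the shape of the process is the same: supply exemplars, name the categories, then route each incoming document to its closest match.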
Automatic Categorization - How it delivers Value
Automatic categorization cuts the time to sort through an initial collection of information by 70%. As an in-place sorting tool, it can ensure that new documents are accurately routed and assigned to the appropriate response groups or organizations. As a “front-end” built into document management capture solutions (i.e., scanning and input) it eliminates the difficult and usually manual task of assigning the document to the appropriate group. Errors are eliminated, and an entire process is reduced to a single step (that can be entirely automated, as some of our partners have done).
“Search on Steroids” - How it Works
Content Analyst’s powerful search software goes beyond the simple keyword and Boolean logic searches that power most search solutions. Our software is already trained to search out and identify concepts and relationships as it performs its indexing function. The more documents that Content Analyst reads, the more it actually learns: obscure relationships and concepts become “fleshed out” as the software reads more and more relevant documents.
Content Analyst turns these concepts and relationships into mathematical expressions. After all, language is mathematical in nature. By using the power of mathematics, Content Analyst can quickly search for related concepts, ideas, and context based on a sentence, a phrase, or even a whole paragraph.
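To make the idea of concepts as mathematical expressions concrete, here is a minimal sketch that represents each term by the terms it shares documents with, then ranks documents by vector similarity to a query. It is an assumption-laden toy (simple co-occurrence counts, hypothetical function names and documents), not a description of Content Analyst's internal mathematics.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(docs):
    """Map each term to counts of the terms it shares a document with."""
    cooc = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())
        for term in terms:
            cooc[term].update(terms)
    return cooc

def concept_vector(text, cooc):
    """Represent text as the sum of its terms' co-occurrence vectors."""
    vec = Counter()
    for term in text.lower().split():
        vec.update(cooc.get(term, {}))
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mini-archive. Only the first document contains the word "car".
docs = [
    "car engine repair manual",
    "automobile engine repair guide",
    "bread baking recipe guide",
]
cooc = cooccurrence(docs)
query = concept_vector("car", cooc)
ranking = sorted(
    range(len(docs)),
    key=lambda i: cosine(query, concept_vector(docs[i], cooc)),
    reverse=True,
)
print([docs[i] for i in ranking])
```

In this sketch the "automobile" document outranks the baking document even though it never mentions "car", because the two share the context of engines and repair. A keyword search would have missed it entirely.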
“Search on Steroids” - How it delivers Value
Simple keyword search is fine for the lay person – companies like Google have become industry giants, and technical people often turn to free-market search engines for data.
Companies, however, have vast stores of information that are not in the public domain – and to search them effectively, they need to mirror a Google-like enterprise search capability behind their own firewalls. Generally speaking, the times that companies need to search archives of information are times of pressure: legal challenges, M&A activities, product development, etc.
The employees doing the searching often want very specific information, and one or two keywords may not suffice. They are also often looking for correlating information, meaning the concept or notion they are researching may span different documents, departments, even divisions. Since Content Analyst understands concepts instead of mere keywords, these associations are readily made and delivered to clients at the click of a mouse button. Then there is polysemy: a word like “bank” may mean the side of a river, a financial institution, or a particular pool shot, and the ambiguity can never be resolved out of context. Finally, there are unintentional errors: spelling mistakes and, more often, scanning errors.
With Content Analyst, your initial search addresses all these problems. Content Analyst can reduce search times by 65% or more, and find appropriate documents that all other solutions would miss.
Cross-Lingual Search - How it Works
Thanks in part to its roots in the Intelligence community, the Content Analyst engine has already been trained in most major world languages (including Middle Eastern and Asian languages). The way this works has no parallel in the search world: Content Analyst actually knows how the same concept would be expressed in different languages. Again, this is because it reads and understands context and relationships, allowing Content Analyst to easily determine whether a foreign language document is worth translating or not.
Content Analyst’s basic cross-lingual skills are easily enhanced to address specific markets or professions, by simply “training” the engine with duplicate sets of documents in various languages. Content Analyst will correlate the relationships and context both within the language sets and across the language sets, allowing Content Analyst to think multi-lingually.
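Training on duplicate document sets can be pictured with a small sketch. Each pair of parallel documents acts as a shared "concept", and every term is linked to the pairs it appears in, so text in either language lands in the same concept space. This is a simplified assumption about how such training might work, with hypothetical names and sample data, not Content Analyst's actual method.

```python
import math
from collections import Counter, defaultdict

def train_parallel(pairs):
    """Link each term to the IDs of the parallel document pairs it appears in."""
    term_concepts = defaultdict(Counter)
    for pair_id, versions in enumerate(pairs):
        for text in versions:
            for term in set(text.lower().split()):
                term_concepts[term][pair_id] += 1
    return term_concepts

def to_concepts(text, term_concepts):
    """Project text in any trained language into the shared concept space."""
    vec = Counter()
    for term in text.lower().split():
        vec.update(term_concepts.get(term, {}))  # unknown terms are ignored
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical training set: the same documents in English and Spanish.
pairs = [
    ("contract payment terms", "contrato pago condiciones"),
    ("employee safety training", "empleado seguridad entrenamiento"),
]
model = train_parallel(pairs)

# New Spanish documents, searched with an English query and no translation step.
spanish_docs = ["pago contrato pendiente", "seguridad empleado informe"]
query = to_concepts("contract payment", model)
best = max(spanish_docs, key=lambda d: cosine(query, to_concepts(d, model)))
print(best)
```

The English query finds the Spanish document about contracts and payment, which is the document worth sending to a translator.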
The corporate world is a global one: few companies deal only within the US, and only with US-based concerns. The minute foreign parties are involved, language becomes an issue, and nowhere is this more apparent than when trying to search for information across those boundaries.
Cross-Lingual Search - How it delivers Value
The value of cross-lingual search is obvious. Without it, you can translate everything first, pray that information from overseas entities has been expressed somewhere in an English document, or translate an “educated guess” cross-section. None of these solutions is really satisfactory.
What’s not as obvious, but every bit as valuable, is the time that cross-lingual search saves. Even with on-site translators and machine-assisted translation, it is difficult to translate more than 3-4 pages per person per hour. Therefore, a large translation effort, prior to any real relevant searching, can take weeks or more. If international collaboration is the norm, this could mean staffs of permanent translators. With Content Analyst, the challenge is eliminated: the search happens first, and only after appropriate information is found do you need to worry about translating it. And if the information exists in multiple languages, Content Analyst will find that also, and point it out to you.
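The arithmetic behind "weeks or more" is easy to check. The sketch below uses the 3-4 pages per person per hour figure from above; the collection size, team size, and eight-hour working day are hypothetical values chosen purely for illustration.

```python
def translation_days(pages, translators, pages_per_hour, hours_per_day=8):
    """Working days needed to translate a collection before it can be searched."""
    return pages / (translators * pages_per_hour * hours_per_day)

# Hypothetical project: 10,000 pages and a team of five translators.
fast = translation_days(10_000, translators=5, pages_per_hour=4)
slow = translation_days(10_000, translators=5, pages_per_hour=3)
print(f"{fast:.1f} to {slow:.1f} working days")
```

Under these assumptions the team would spend roughly 63 to 83 working days translating before the first search could begin.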
Automatic Summarization - How it Works
Another powerful feature built into Content Analyst is its ability to extract the most relevant subject matter or topics from a document and present only those paragraphs or sentences that most closely match the subject matter in the document.
Through easily-tuned settings, multiple-section documents can be summarized by section, and summaries can be made briefer or more detailed with a few mouse-clicks.
A summary is very different from an abstract. With an abstract, the author is telling you what he or she wants you to believe they are discussing. With intelligent, context-based summarization, however, the result is what the document is actually saying, drawn from sentences in the original document. Often, this kind of summary is different from the abstract, sometimes significantly so.
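A summary built from the document's own sentences can be sketched as follows. The example scores each sentence by how frequent its words are across the whole document and keeps the top scorers in their original order. The frequency-based scoring, the stopword list, and the `max_sentences` setting are assumptions made for illustration; they are not Content Analyst's algorithm.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"the", "was", "on", "at", "a", "an", "of", "and", "to", "in"}

def content_words(sentence):
    """Lowercased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Return the highest-scoring original sentences, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(top[:max_sentences])]

text = (
    "The merger agreement was signed on Friday. "
    "Lunch was served at noon. "
    "The merger agreement requires board approval. "
    "Lawyers reviewed the merger terms."
)
summary = summarize(text, max_sentences=2)
print(summary)
```

Raising or lowering `max_sentences` makes the summary more detailed or briefer, and every line of the result is a sentence taken verbatim from the source document.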
Automatic Summarization - How it delivers Value
Content Analyst creates summaries automatically, “on-the-fly,” for the documents the searcher has identified as interesting. This eliminates the arduous task of manually reading through those documents only to find they weren’t really what you were looking for anyway, saving days’ worth of time and effort, and weeks in some cases. Time is always money for companies, and Content Analyst not only saves time, it also saves the personnel costs that can be incurred when such summaries need to be generated by a deadline.