CATEGORIZATION

CAAT eliminates the time-consuming burden of content classification or categorization, and enables knowledge workers to focus on their primary objective: understanding relevant information and turning that information into action. Unlike traditional search engines, Content Analyst technology can automatically populate categories with documents based on concepts found within those documents.

Concept-based categorization is a simple two-step process that mimics how people think about and classify documents. The first step is to identify the categories into which users want documents sorted or classified. Because CAAT is entirely mathematical, it doesn’t matter what these categories are, and category naming is entirely in the user’s control. There is no limit to the number of categories into which documents can be classified. The classification hierarchy can even be an existing taxonomy. Better still, the user doesn’t need to know, or worry about, which specific concepts are contained within any category; CAAT handles that automatically behind the scenes.

The second part of the process is to identify example documents for each category. An example document is often just a part of a document – only that part which is representative of what the user would expect to find in that category. The conceptual analytics power of CAAT means that categories only need a handful of example documents in each category to perform properly (a dozen or so is generally sufficient, depending on the application). There is even a categorization “self-test” provided with CAAT that will compare all the example documents in a given category and point out any documents that aren’t a good “fit” within that category.
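The idea behind such a self-test can be sketched in a few lines of Python. This is only an illustration, not CAAT's actual method: it assumes a plain bag-of-words cosine similarity as a stand-in for CAAT's conceptual analysis, and the names `vectorize`, `cosine`, `self_test` and the `min_fit` cutoff are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-count vector (stand-in for concept analysis)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def self_test(example_docs, min_fit=0.1):
    """Score each example against the other examples in the same category.

    Returns (index, score, is_poor_fit) tuples; min_fit is an assumed cutoff.
    """
    vectors = [vectorize(doc) for doc in example_docs]
    report = []
    for i, vec in enumerate(vectors):
        rest = Counter()
        for j, other in enumerate(vectors):
            if j != i:
                rest.update(other)  # combine all other examples into one profile
        score = cosine(vec, rest)
        report.append((i, round(score, 3), score < min_fit))
    return report

# Hypothetical "sports" category with one misplaced example (index 2).
sports_examples = [
    "the team won the match with a late goal",
    "the player scored a goal in the final match",
    "bank interest rates and bond yields fell",
]
poor_fits = [i for i, _, bad in self_test(sports_examples) if bad]
```

Here `poor_fits` picks out the finance-flavored example, since it shares no terms with the other two. A real system would compare concepts rather than literal words, so it could catch mismatches that share vocabulary.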

After these two steps have been completed, concept-based categorization is a fully automatic function. The steps can occur in parallel, or even as part of a larger workflow, since CAAT is designed for deep-level integration. User settings control how general or specific the “fit” into a category should be, and whether a document should fall into multiple categories if it contains information relevant to more than one. Users can also set thresholds that determine whether CAAT places a document into a pre-identified category or drops it into an “uncategorized” folder for later review. As with all CAAT functions, a numeric score tells users how well each document fits its resulting categories.
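The flow described above (example documents per category, a fit threshold, optional multi-category assignment, an “uncategorized” fallback, and a numeric score) can be sketched as follows. This is a minimal illustration under stated assumptions, not CAAT's implementation: it uses bag-of-words cosine similarity in place of conceptual analysis, and `build_profiles`, `categorize`, `threshold` and `allow_multiple` are hypothetical names.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-count vector (stand-in for concept analysis)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_profiles(examples):
    """Sum each category's example-document vectors into one category profile."""
    profiles = {}
    for category, docs in examples.items():
        profile = Counter()
        for doc in docs:
            profile.update(vectorize(doc))
        profiles[category] = profile
    return profiles

def categorize(text, profiles, threshold=0.2, allow_multiple=True):
    """Return [(category, score), ...] for fits at or above threshold, best first.

    Documents that fit no category go to "uncategorized" with score 0.0.
    """
    vec = vectorize(text)
    scored = sorted(((cosine(vec, p), c) for c, p in profiles.items()), reverse=True)
    hits = [(c, round(s, 3)) for s, c in scored if s >= threshold]
    if not hits:
        return [("uncategorized", 0.0)]
    return hits if allow_multiple else hits[:1]

# Hypothetical categories, each with a couple of example documents.
examples = {
    "finance": ["stock market shares rose as investors bought bonds",
                "bank interest rates and bond yields fell"],
    "sports": ["the team won the match with a late goal",
               "the player scored a goal in the final match"],
}
profiles = build_profiles(examples)
finance_result = categorize("investors sold shares as interest rates rose", profiles)
unknown_result = categorize("quantum entanglement experiment", profiles)
```

Raising `threshold` makes the fit more specific and sends more documents to “uncategorized”; setting `allow_multiple=False` keeps only the best-scoring category.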


Categorization Performance

CAAT has been optimized for performance. Running on a typical 4-core server, CAAT can classify 1 million documents per hour. Subjected to an industry-standard test for automatic document categorization, the Reuters-21578 test set, CAAT achieved the highest rating ever reported for categorizing large amounts of material.

 

 
