Seven Ways Concept-Based Auto Categorization Tames Big Data

Part 1 – Mergers, Acquisitions and Divestitures – Oh My!

By Steven Toole
VP, Marketing – Content Analyst Company


There’s a lot of buzz about big data, and to help shed some light on practical ways to curb its risks and costs while extracting the benefits it has to offer, I’ll address seven approaches companies can take today using concept-based auto categorization.

Mergers and acquisitions present a unique challenge to the records manager at any company. Images come to mind of building the perfect sandcastle only to have that one rogue wave stretch much farther up the beach than we ever imagined, sparking that “now what?” feeling. Okay, that may be an overdramatization, but suffice it to say that the records manager is seldom consulted about the timing or fit of a potential merger or acquisition. It lands in your lap, and now you have to figure out how to integrate the other company’s content into your taxonomy, on everyone’s timeline but yours.

The deal goes through, and you’re now one company. The company you acquired had its own records management taxonomy, if it had one at all. In a merger of equals, you may need to integrate a volume of unstructured documents equal to or larger than your own. You’ll need to categorize potentially billions of emails, millions of documents, thousands of forms, and hundreds of retention schedules.

Just as the records manager needs to integrate records and document taxonomies and retention schedules during a merger or acquisition, a divestiture brings its own work: breaking up is hard to do, as Neil Sedaka’s #1 hit single in 1962 so adequately conveyed. Like the rogue wave that rocked our world when a merger landed in our lap earlier, a divestiture forces you to go find anything and everything related to the business unit being jettisoned, carve it out, box it up and ship it off with the exiting unit, or retain it for a period of time post-divestiture according to its own retention schedule. So all the integration and synergy that once made the businesses a logical marriage now turns into an exercise of “what’s mine, what’s yours and what’s ours” (as is the case with commingled records).

Example-based auto categorization has established itself very well as a proven standard in two specific use cases: legal eDiscovery and US intelligence. Both of these demonstrate the fast, easy, and repeatable way to pinpoint only the most important documents and emails among libraries spanning millions of files and messages. Its use in eDiscovery also proves that concept-based auto categorization is defensible, and its use in US intelligence proves that it’s highly scalable, making it a very practical approach in records management amidst mergers, acquisitions and divestitures.

By synthetically transferring human taxonomy knowledge across all of the acquired company’s electronic documents and emails, example-based categorization is faster, easier and far more accurate than lexicon-based taxonomy alternatives in categorizing documents obtained via a merger or acquisition, and in identifying documents that should be extracted, disposed of, or retained for a divestiture.
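To make the idea of categorizing by example a little more concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the category names, the sample texts and the function names are all hypothetical, and a plain bag-of-words cosine similarity stands in for whatever concept model a real product uses. Word overlap is not concept matching, but the workflow has the same shape: a person supplies a few examples per category, and every new document goes to the category whose examples it most resembles.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(examples):
    """Sum the vectors of each category's example documents into one profile."""
    profiles = {}
    for category, texts in examples.items():
        profile = Counter()
        for text in texts:
            profile.update(vectorize(text))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return (best_category, score) for the profile most similar to text."""
    doc = vectorize(text)
    scores = {cat: cosine(doc, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical examples a records manager might pick from the existing taxonomy.
EXAMPLES = {
    "contracts": [
        "This supply agreement is entered into between the vendor and the buyer.",
        "The parties agree to the terms and termination clauses of this contract.",
    ],
    "hr": [
        "Employee performance review and annual salary adjustment.",
        "New hire onboarding checklist and benefits enrollment for employees.",
    ],
}

profiles = build_profiles(EXAMPLES)

# A made-up document from the acquired company, filed under the closest category.
category, score = categorize(
    "Amendment to the vendor agreement: revised termination terms for the buyer.",
    profiles,
)
print(category, round(score, 2))
```

The design point is that nobody writes keyword rules or maintains a dictionary here. The taxonomy knowledge lives entirely in the examples, so extending it to the acquired company’s content means adding or swapping examples rather than rewriting rules.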

Some other ways example-based auto categorization helps the content manager in times of mergers, acquisitions and divestitures include:

  1. Disposal of redundant, outdated and trivial (ROT) documents and emails. You can only hope that your counterpart at the company you acquired was as diligent as you are about identifying and removing the junk. But just to be sure, sample documents of spam, old email newsletters, outdated marketing documents, etc. can be used as examples to find similar documents that can be considered for disposal, potentially reducing the clutter on the acquired company’s servers without having to manually inspect each individual document and email.
  2. Maintaining archiving regulatory compliance. The company you acquired may have different regulatory requirements than yours does. Example-based conceptual categorization can be used to determine more precisely which documents and messages from the acquired company need to be archived – and for how long – according to your company’s retention policy and the regulatory requirements you must meet.
  3. Improved cross-functional, divisional, and external content sharing and collaboration. It’s no secret that lack of integration is one of the top reasons mergers fail. There’s likely a lot of good intellectual property you just obtained with the merger. This is part of what your company paid to obtain, but if it’s not properly categorized and integrated, your company won’t get the biggest bang for its buck in terms of collaboration and cross-pollination. The merger went through because of synergies – where one and one makes three. Concept-based auto categorization makes those valuable documents from the acquired company much easier to find, dramatically improving collaboration, integration, sharing and syndication of your valuable content. With internal research assets and intellectual property that can be leveraged elsewhere in the enterprise, or content generated for external consumption, auto categorization dramatically improves the ability of users to consume and properly apply these information assets.
  4. Improved content lifecycle management amidst disparate terms and categories. Chances are, the company you acquired had its own set of terms and expressions regarding its business. Even if the company was in the same core business as yours, or a direct competitor, it may have used its own categories. Concept-based auto categorization keeps your company’s document libraries current with those of the acquired company, and can even apply the right categorization decisions to documents that contain the newer terms, without having to define or update your own company’s dictionaries, thesauri, keywords or metatags.
  5. Security, privacy and risk mitigation. How do you identify potentially risky materials from the company you acquired, now falling under your responsibility? Content that either no longer has value for the surviving organization or is not marked for retention through compliance could be an unnecessary liability and increase the cost burden of culling through it in any future litigation. Sensitive customer data such as medical records, social security numbers, credit card numbers or, worse yet, illicit materials are a virtual time bomb. Example-based auto categorization can reduce risks by enabling you to identify these materials from the company you acquired, dispose of them in a highly defensible way, and demonstrate that your company’s information governance policies are enforceable and consistent.
  6. Auto categorization in any language. Not all mergers and acquisitions occur with parties speaking the same language. On the contrary, mergers and acquisitions have been a very common global expansion strategy for hundreds of years. It’s often easier to enter a new country by simply buying an established entity than to start from scratch. Breaking down language barriers with language-agnostic document classification means that all of the benefits of taming big data, as well as the mitigation of big data’s negative impact, can be applied in global organizations without requiring native language speakers for every language in which the enterprise generates content.
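The first item in the list above, using known junk as examples to surface similar documents for disposal, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample texts, the document IDs, the `flag_rot_candidates` name and the 0.3 threshold are all hypothetical, and simple Jaccard word overlap stands in for a true concept-based comparison. Note that the sketch only flags candidates for review; it does not delete anything.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words two token sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_rot_candidates(documents, rot_samples, threshold=0.3):
    """Return (doc_id, score) pairs whose best match to any ROT sample meets the threshold."""
    sample_sets = [tokens(s) for s in rot_samples]
    flagged = []
    for doc_id, text in documents.items():
        doc_set = tokens(text)
        best = max((jaccard(doc_set, s) for s in sample_sets), default=0.0)
        if best >= threshold:
            flagged.append((doc_id, best))
    # Highest-scoring candidates first, so reviewers see the likeliest junk early.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical junk samples picked by the records manager.
ROT_SAMPLES = [
    "Unsubscribe from this weekly newsletter at any time",
    "Limited time offer: click here to claim your free prize",
]

# Hypothetical documents found on the acquired company's servers.
DOCUMENTS = {
    "msg-001": "Click here to claim your free prize before this offer ends",
    "msg-002": "Signed purchase agreement for the Denver warehouse lease",
}

candidates = flag_rot_candidates(DOCUMENTS, ROT_SAMPLES)
for doc_id, score in candidates:
    print(doc_id, round(score, 2))
```

Raising the threshold trades recall for precision. In practice you would likely tune it against a hand-reviewed sample before considering anything for disposal.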

Whether you are dealing with a merger, acquisition, or divestiture, example-based auto categorization can be a highly effective way to rapidly organize unstructured content and take appropriate action. If your company is more likely to have a target painted on its back, concept-based auto categorization can help it present less burden and greater value to a potential acquirer, and you get all the credit for having your company’s big data house in order (possibly even better than the company that’s acquiring you, making you the natural candidate as records manager for the surviving entity).

