Industry is built upon information. Corporate repositories grow by the second with reports, proposals, memos, comments, forms, contacts, contracts, and correspondence in various forms. These documents form the collective memory of the company.

Ideally, information is governed by rules, often built into software that guides and enforces document organization and retention policies. Do you really need draft cover letters from a project that ended a year ago? Do you even need the project documents, or can they be archived? If your information is organized, pruned, and preserved according to rules, you can retrieve a three-year-old draft agreement if required, and reliably report that no electronic fax cover sheets exist anymore for that time period. All is well, until a complaint is filed against your company alleging material breach of that agreement and requesting damages in the millions.

Such an event occurs frequently enough that electronic discovery qualifies as a core business process. The 2015 Norton Rose Fulbright Litigation Trends Annual Survey reports among its findings that 55 percent of US-based respondents faced five or more lawsuits in the previous 12 months; 51 percent of companies with revenues at or above $1 billion had to address one or more regulatory proceedings. Given the ever-increasing amount of custodial data subject to discovery, it is clear that appropriately scaled protocols and tools must be in place before documents are demanded.

The demand itself is an annoyance, but an addressable one, given the wherewithal to issue and enforce a legal hold and to preserve and collect any relevant data for outside counsel to review. But what do you know about your data overall?

In the process of litigation discovery, electronic work product provides touchstones in the telling of a story. Not just one story: each side, and each party, will have its own interpretation of what appear to be simple documents. The three principal characters in Akira Kurosawa’s Rashomon, all witnesses or participants or victims in the same crime, tell three completely different stories using the same elements. In much the same way, the same documents, tendered unconsidered to adverse counsel, may serve very different narratives.

Those of us charged with managing a company’s knowledge must address content in order to understand our company’s data story, and prepare for other interpretations.

The situation calls for a tool that can analyze the profusion of text, regardless of format, for risk-inflected terms, and for what Donald Rumsfeld termed “unknown unknowns”: things you are not even aware that you do not know.

This class of software sits on a device inside the firewall and “crawls” the network in a manner defined by the administrator. The system looks at volumes, shares, and personal systems; it collects file system metadata in its primary pass for general reporting purposes, and may index as well for deeper analysis. The first level can report file types, counts and size for a range of dates to assist in budgeting. At the next level, you get a window into content.
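
To make the first-level report concrete, here is a minimal sketch in Python of a metadata-only pass that tallies file counts and total size by file type for a range of modification dates. It is not how any particular product works; the function name `profile_tree` and the choice of modification time as the date field are assumptions made for illustration.

```python
import os
from collections import defaultdict
from datetime import datetime


def profile_tree(root, start, end):
    """Summarize files under root whose modification time falls in [start, end].

    start and end are datetime objects. Returns a dict of the form
    {extension: {"count": int, "bytes": int}}. Only file system metadata
    is read; file contents are never opened.
    """
    lo, hi = start.timestamp(), end.timestamp()
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; a real tool would log it
            if lo <= info.st_mtime <= hi:
                ext = os.path.splitext(name)[1].lower() or "(none)"
                summary[ext]["count"] += 1
                summary[ext]["bytes"] += info.st_size
    return dict(summary)
```

Calling `profile_tree` on a share with the start and end of a calendar year would give a rough count and volume per file type for that year, which is the kind of figure a budgeting estimate starts from.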

Such tools do more than profile and retrieve documents; they afford a means to:

• Filter by date or range.

• Filter by user or group.

• Filter by file types.

• Filter by file location.

• Deduplicate by file content hash calculation.

• Identify non-searchable files for OCRing or direct examination.

• Use or generate regular expressions to find personally sensitive information (PSI), such as Social Security, Tax ID or credit card numbers, that may have seeped into data stores.

• Display email threads, or identify the conversation’s most complete message.

• Document all collection procedures.

• Search using keywords across live file and archive containers. (My preference and policy is to search after collection, given the very real possibility of missing synonyms, idiomatic phrases, typos, root expansions and the like, to prevent having to go back to original source locations to re-collect.)
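
Two of the capabilities listed above, deduplication by content hash and regular-expression searches for PSI, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a vetted rule set: SHA-256 is one reasonable hash choice, the two patterns are deliberately simple, and the helper names (`group_duplicates`, `find_sensitive`, `luhn_valid`) are hypothetical. The Luhn checksum is used only to discard digit runs that cannot be payment card numbers.

```python
import hashlib
import re
from collections import defaultdict


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Map each digest shared by two or more files to the list of those files."""
    groups = defaultdict(list)
    for path in paths:
        groups[content_hash(path)].append(path)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}


# Illustrative patterns only (assumptions): a hyphenated SSN-like string,
# and 13 to 19 digits optionally separated by spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", m.group()))
    return hits
```

Patterns like these will produce both false positives and misses, so hits are best treated as candidates for human review rather than as findings.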

Many current “early case assessment” applications generate categories of information based upon the subject matter or tone of a set of documents or messages. Such programs may employ tools such as latent semantic analysis or statistical weighting to rank groups of words with similar meaning, or documents having the same clusters of words regardless of original language.
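
As a rough illustration of what “statistical weighting” can mean, the sketch below uses TF-IDF, one common weighting scheme, together with cosine similarity to compare documents. That choice is an assumption made for the example; it is not latent semantic analysis and not necessarily what any given product does. The function names and the particular smoothed IDF formula are likewise illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return {name: {term: weight}} for docs given as {name: text}.

    weight = term frequency * log((1 + N) / (1 + document frequency)),
    so a term that appears in every document gets weight zero.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1
        vectors[name] = {
            term: (freq / total) * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, freq in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_terms(vector, n=5):
    """Return the n highest-weighted terms of one document vector."""
    return [t for t, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
```

Documents whose vectors have a high cosine similarity share distinctive vocabulary, which is one simple basis for grouping them into a category; the top-weighted terms give a first hint of what that category is about.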

Near-duplicate documents may be identified in much the same fashion. An examination of several drafts of the agreement may reveal a Track Changes comment in MS Word on Draft 9(a)(2) containing possibly incendiary comments by one reviewer, while Draft 10(c) may contain exculpatory language that was removed at the behest of the other side.
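
For readers who want a feel for near-duplicate detection, here is a minimal sketch that uses a simpler technique than the semantic methods mentioned above: overlapping word windows (“shingles”) compared by Jaccard similarity. The technique is swapped in for clarity; the threshold, the shingle length, and the function names are assumptions, and the input is assumed to be text already extracted from each draft.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return (name_a, name_b, score) for pairs scoring at or above threshold.

    docs maps a document name to its extracted text. Every pair is compared,
    which is fine for a handful of drafts but not for a large collection.
    """
    sets = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Two drafts that differ by a single changed term will share most of their shingles and surface as a pair, at which point a reviewer can compare them side by side to see exactly what was added or removed.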

These extra layers of information about your data make the company better prepared. Even where litigation is not imminent, analysis of email data – at least in the United States – can help to preemptively identify behavior that may incur risk before a problem can occur: a potential HR matter, for example. If a matter is imminent, this type of analysis can help the general counsel’s office to quickly decide whether the cost of possible litigation is worth the risk, based upon real information. And if proceeding, the data are already collected for preservation, and culled for review and production, rendering this phase an essentially fixed cost.

One does not proceed into a new software license blithely. It requires a commitment, not only of money and time, but of executive culture and company process. People must be trained, not only in the software, but in its purpose; this involves the vendor, IT management, and the general counsel. Top management will need to be shown the strategic value of preparedness and its savings over time in risk containment and litigation expenditure. Operational and administrative departments should be briefed to make clear that the result of early case assessment protocols will be less disruption and a smaller possibility of sensitive information being inadvertently disclosed.

In the wild path of Rashomon, any story might or might not be true. Adding early case assessment to information governance illuminates the dark corners of your data forest, helping you to assert an objectively verifiable claim.