The Metadata Battle: Providers vs. Consumers

Written by Andy Chau

Having worked with SharePoint for the majority of my career, I’ve gained plenty of experience implementing Enterprise Content Management (ECM) systems. One of the first topics covered during these engagements is usually a challenging one: metadata, or rather, “What should be included in metadata?”

The answer I usually receive from clients when I ask this question is “Everything”. This is completely understandable, as the customer providing the requirements is usually a consumer of the ECM (e.g. records managers and document controllers). However, that’s only half the equation, and it must be balanced against the needs of providers (e.g. content authors).

Searchability vs. Usability

If you have too little metadata, the effort to find content increases. With too much metadata, on the other hand, the effort to store content increases. As the effort to store content increases, adoption of the system decreases. Finding the right balance is as much an art as it is a science.

I like to think of content as chocolate bars and the metadata as the plastic wrappers around them. A chocolate bar conceals its actual contents, such as nuts, wafers and caramel. Only a chocolatey surface layer is visible to the consumer, akin to filenames and document previews.

The chocolate makers wrap the bars in a wrapper that describes what is actually inside the bar, much as metadata describes content. The labels must be simple, concise and to the point, as there is limited space available to describe the contents. There can only be a single layer of wrapper; you shouldn’t have to dig deep to find out the details. Most importantly, the wrapper must be easy to produce, cost efficient for the producers and eye catching to the customer.

Considering the above, “implied” data becomes very important. For example, if I see a chili on a chocolate bar label, I will assume it is spicy.  If there is a way to provide the data required for consumers without the providers having to manually input it, then that is the way to go.

In line with this, I recommend organizations consider a different approach to a couple of common metadata fields, which I’ll outline below:

Implied Security Classification

Security classification often comes up when discussing metadata: public, private or confidential. Typically, the author is expected to decide on the security level required for their documentation. In these cases, you usually get a number of users who always consider their documents confidential because of some piece of data that is not intended for general consumption. Other users will always classify their documents as public to make the content easy to access. This builds inconsistency and unreliability into document security across the organization.

Security classification can be “implied” based on a couple of things, such as the document type and/or where the document is added to the ECM. Operational documents such as manuals and guides are generally considered public information and should be readily accessible to end users, while financial contracts are confidential.

Thus, I strongly recommend that clients consider deriving, or at least defaulting, their security classification. One option is to base it on another common metadata field: document type. For example, if a document is classified as an “employee file”, then set it to confidential. Alternatively, derive the security classification from where a file was added to the system. If the document was added to an executive discussion board or payroll library, then set it to confidential.
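
As a rough illustration, the defaulting rules described above could be sketched in Python. This is only a sketch under stated assumptions: the lookup tables, the function name derive_classification and the "private" fallback are hypothetical names chosen for illustration, not part of SharePoint or any particular ECM product.

```python
from typing import Optional

# Hypothetical lookup tables; the entries mirror the examples in the article.
CLASSIFICATION_BY_TYPE = {
    "employee file": "confidential",
    "financial contract": "confidential",
    "manual": "public",
    "guide": "public",
}
CONFIDENTIAL_LOCATIONS = {"executive discussion board", "payroll library"}

# Assumed fallback when neither rule applies; pick whatever suits your policy.
DEFAULT_CLASSIFICATION = "private"


def derive_classification(doc_type: Optional[str], location: Optional[str]) -> str:
    """Return an implied security classification for a newly added document."""
    # Location takes precedence: anything added to a sensitive area is confidential.
    if location and location.strip().lower() in CONFIDENTIAL_LOCATIONS:
        return "confidential"
    # Otherwise fall back to the document type, then to the default.
    if doc_type:
        return CLASSIFICATION_BY_TYPE.get(doc_type.strip().lower(), DEFAULT_CLASSIFICATION)
    return DEFAULT_CLASSIFICATION
```

With rules like these, the author never has to pick a classification by hand. A records manager can still override the derived value in the rare cases where the default is wrong.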

Implied Ownership

Most organizations with an ECM system want to be able to identify the owner of a particular document. Once again, the responsibility of assigning document ownership usually falls upon the author, who selects the owner from a list of system users. This data soon becomes stale and irrelevant as employees quit, get promoted, or change duties. Most commonly, the owner is either the author themselves or one of the approvers.

In this case, I recommend deriving the document owner via workflow. Initially, while a document is in draft state, the owner can either be blank or set to the creator. When a document is published, the workflow can automatically update the owner to either the last editor or the approver. This way the data stays up to date and relevant without unnecessary manual intervention.
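
The workflow steps above might look something like the following Python sketch. The Document class and the on_create, record_edit and on_publish handlers are hypothetical names used for illustration; a real ECM would hook equivalent logic into its own workflow events. Preferring the approver over the last editor is also an assumption, since either choice fits the recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Minimal stand-in for an ECM document record (hypothetical)."""
    creator: str
    state: str = "draft"
    last_editor: Optional[str] = None
    approver: Optional[str] = None
    owner: Optional[str] = None


def on_create(doc: Document) -> None:
    # While the document is a draft, the owner defaults to the creator.
    doc.owner = doc.creator


def record_edit(doc: Document, editor: str) -> None:
    # Track who touched the document last; ownership is not changed yet.
    doc.last_editor = editor


def on_publish(doc: Document, approver: Optional[str] = None) -> None:
    # On publish, the owner becomes the approver if there is one,
    # otherwise the last editor, otherwise the original creator.
    doc.state = "published"
    doc.approver = approver
    doc.owner = approver or doc.last_editor or doc.creator
```

Because ownership is recomputed each time a document is published, it follows the people actually working on the content rather than whoever was picked from a list when the file was first uploaded.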

Conclusion

When planning for an ECM system, minimize the metadata fields that providers must complete to add content. Aim to keep the number of required fields below five. To give consumers the searchability they need, derive as much of the implied metadata as possible. Just remember, keeping the barriers to entry low will help to yield higher adoption.
