Metadata for Legal Information


Image credit: Stuart Caie

Note: I wrote this post to help explain the concept of metadata and how it can be used to improve free primary law sources. This post focuses on statutes; next week I will discuss applying these principles to opinions.

The simplest way to explain metadata is “data about the data.” Metadata can describe, among other things, the purpose, date of creation, location, means of creation, and standards used in the data. For example, when you take a picture with a digital camera, the image also contains information about the camera used to take the picture, the time and date, image resolution, etc. When you upload that picture to, say, Picasa, you’ll see this information. Picasa knows what type of camera you used because that data is embedded in the image itself.

The Dewey Decimal System is another example of metadata. Along the spine of a library book is a number. That number is associated with a classification system – placing the book within a catalog of records that contains “data about the data” (subject, author, title, number of pages, publisher, ISBN, etc.), which in turn helps patrons find what they’re looking for.
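To make the catalog analogy concrete, here is a minimal Python sketch of a catalog in which each record stores “data about the data” next to a call number. The record fields, the sample books, and the `find_by_subject` helper are hypothetical illustrations, not a real library catalog schema. The point is only that a patron searches the metadata, not the books themselves, to locate an item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """Hypothetical catalog record: metadata describing a book, not the book itself."""
    call_number: str  # the classification number printed on the spine
    title: str
    author: str
    subject: str
    pages: int


# Illustrative records only; titles, authors, and numbers are made up.
CATALOG = [
    CatalogRecord("342.73", "A Primer on Constitutional Law", "A. Author",
                  "Constitutional law", 310),
    CatalogRecord("346.73", "Contracts in Practice", "B. Writer",
                  "Contract law", 275),
    CatalogRecord("342.73", "Federal Courts Today", "C. Scholar",
                  "Constitutional law", 198),
]


def find_by_subject(catalog: list[CatalogRecord], subject: str) -> list[tuple[str, str]]:
    """Return (call number, title) for every record tagged with the given subject."""
    wanted = subject.casefold()
    return [(r.call_number, r.title) for r in catalog if r.subject.casefold() == wanted]


print(find_by_subject(CATALOG, "constitutional law"))
```

The search never touches the text of a book; it works entirely on the descriptive fields, which is what makes the catalog useful.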

As primary legal information moves toward almost purely digital publishing, the cataloging and indexing of legal materials offer new opportunities to help people find what they are looking for. To seize these opportunities, the legal and tech communities must look closely at applying metadata to legal information.

West’s Key Number System is an example of metadata applied to primary legal information. The key numbers are essentially a reference system that allows the publisher to organize material by subject matter and helps researchers find what they are looking for. This system was developed when print research was the standard.

Another example of metadata in print format is the Parallel Table of Authorities (PTOA) for the U.S. Code. That table links authorizing U.S. Code sections to their corresponding rules in the Code of Federal Regulations.

Now that laws are increasingly born digital, we have an opportunity to apply metadata at the time of creation, so that the data travels with the primary information as it goes to publishers, out for bulk download, or onto a state’s website. Much of the added value of a system like Westlaw or LexisNexis lies in its classification system. If the building blocks of such a system were built into digital materials when they are released to the public, those materials would be more usable.
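The idea of applying metadata at creation so that it travels with the text can be sketched with Python’s standard-library `xml.etree.ElementTree`. The element names (`section`, `meta`, `enacted`, `topic`), the citation, and the date are hypothetical placeholders, not an official legislative schema. The sketch only shows that metadata embedded by the producer can be recovered by any downstream consumer from the file alone.

```python
import xml.etree.ElementTree as ET


def publish_section(citation: str, heading: str, text: str,
                    topics: list[str], enacted: str) -> str:
    """Producer side: embed metadata in the section at the moment it is created.

    Element and attribute names are hypothetical, not an official schema.
    """
    section = ET.Element("section", {"citation": citation})
    meta = ET.SubElement(section, "meta")
    ET.SubElement(meta, "enacted").text = enacted
    for topic in topics:
        ET.SubElement(meta, "topic").text = topic  # subject matter tags
    ET.SubElement(section, "heading").text = heading
    ET.SubElement(section, "text").text = text
    return ET.tostring(section, encoding="unicode")


def read_metadata(xml_text: str) -> dict:
    """Consumer side: a publisher or website recovers the metadata from the file itself."""
    section = ET.fromstring(xml_text)
    return {
        "citation": section.get("citation"),
        "enacted": section.findtext("meta/enacted"),
        "topics": [t.text for t in section.findall("meta/topic")],
    }


# Hypothetical section; the citation, text, and date are placeholders.
doc = publish_section("Example Code § 1-101", "Definitions",
                      "Placeholder statutory text.",
                      ["definitions", "consumer protection"], "2011-06-01")
print(read_metadata(doc))
```

Because the tags live inside the document rather than in a separate publisher database, whoever receives the file (a commercial publisher, a bulk-download user, a state website) gets the same classification data.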

In their recent paper, “Examples of Specialized Legal Metadata Adapted to the Digital Environment, from the U.S. Code of Federal Regulations,” Tom Bruce and Robert Richards examine the PTOA and map out what is required to make it “optimally usable by digital systems.” In other words, they ask what metadata can be added to the PTOA and the U.S. Code to produce a more usable and robust reference tool.

Following are their recommendations, with my interpretations in brackets:

“Respecting metadata, the PTOA should be marked up in XML, and the semantic value of PTOA citations, the relationships between them, and other associated metadata, all should be encoded in RDF/OWL. In particular, topical metadata from a controlled legal vocabulary should be added to each row. [Think of these as West’s key numbers or subject matter tags.]

Respecting semantics, each occurrence of each type of relationship should be described in a separate entry, each specifying a relationship type, as illustrated in the sample XML below.

Respecting granularity, relationships between legal authorities and their corresponding regulations should be described at the level of specificity—often the section or subsection level—that accurately reflects the meaning of the relationship. [The data should be linked at a much more specific level. Right now the PTOA cites only to the part level. Deeper granularity would expose more relationships.]

Respecting directionality, the PTOA should be designed so that queries may be made from rules to authorities as well as from authorities to rules. [This means that the datasets would reference each other in both directions, not just from authorizing statute to regulation.]”
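The four recommendations can be sketched together in a small Python example. This is a minimal illustration under stated assumptions, not the markup Bruce and Richards actually propose: the element names, the relationship types (`rulemaking-authority`, `implements`), the topic terms, and the citations are all hypothetical (Title 99 exists in neither the U.S. Code nor the CFR), and the RDF/OWL encoding is not shown. Each link is its own `<relationship>` entry with a type and a topic (semantics and topical metadata), citations go down to the section or subsection (granularity), and the parsed entries are indexed both ways so a query can start from a rule or from an authority (directionality).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical PTOA-style markup. Each link is a separate <relationship> entry
# with an explicit type and a controlled-vocabulary topic, and citations are at
# the section or subsection level. Title 99 exists in neither the U.S. Code nor
# the CFR: these rows are invented for illustration.
PTOA_XML = """
<ptoa>
  <relationship type="rulemaking-authority" topic="food labeling">
    <authority>99 U.S.C. 1001(a)</authority>
    <rule>99 CFR 10.5</rule>
  </relationship>
  <relationship type="rulemaking-authority" topic="food labeling">
    <authority>99 U.S.C. 1001(a)</authority>
    <rule>99 CFR 10.7(b)</rule>
  </relationship>
  <relationship type="implements" topic="recordkeeping">
    <authority>99 U.S.C. 1002</authority>
    <rule>99 CFR 10.7(b)</rule>
  </relationship>
</ptoa>
"""


def build_indexes(xml_text: str):
    """Index every relationship in both directions (authority -> rule, rule -> authority)."""
    by_authority = defaultdict(list)
    by_rule = defaultdict(list)
    for rel in ET.fromstring(xml_text.strip()).iter("relationship"):
        authority = rel.findtext("authority")
        rule = rel.findtext("rule")
        rel_type, topic = rel.get("type"), rel.get("topic")
        by_authority[authority].append((rule, rel_type, topic))
        by_rule[rule].append((authority, rel_type, topic))
    return dict(by_authority), dict(by_rule)


def part_level(rule_citation: str) -> str:
    """Collapse a section-level CFR citation to the coarser part level."""
    title, _cfr, section = rule_citation.split(" ", 2)
    return f"{title} CFR Part {section.split('.')[0]}"


by_authority, by_rule = build_indexes(PTOA_XML)

# Query from an authority to its rules...
print(by_authority["99 U.S.C. 1001(a)"])
# ...and from a rule back to every authority behind it.
print(by_rule["99 CFR 10.7(b)"])
# At part level, the two distinct sections collapse into a single entry.
print({part_level(rule) for rule, _, _ in by_authority["99 U.S.C. 1001(a)"]})
```

The last line illustrates the granularity point: at the part level, the two distinct rules tied to the same authority become indistinguishable, and the fact that 10.7(b) also draws on a second authority would be lost.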

Adding metadata to primary legal information requires a hybrid of legal, technical, and library skills. It’s an exciting area of legal technology that will help to produce better information for lawyers and the public.

On that note, the Administrative Conference of the U.S. is now seeking applications “for a study of the use of the Internet to provide access to administrative legal materials.” This looks like an amazing opportunity to help improve the quality of legal information and expand access to it in the U.S.