Getting a handle on data
By Julian Perkin
It is hard to imagine life without the worldwide web, which unleashed the power of the internet on an unsuspecting world. The web is far from perfect, but as individual users we tend to take the rough with the smooth, accepting the broken links and out-of-date information as irritants in exchange for the treasure trove of information, entertainment and cheap goods and services such as low-cost flights.
But things are more serious for governments, which risk failing in their duty to provide citizens with proper information and services, and for businesses which can lose customers and money simply because data have a habit of changing form and moving to new locations on the web.
Fortunately, new solutions are emerging. Most academic journals published online are now able to cross-reference each other reliably - if you click on a citation it is pretty well guaranteed to lead to a real published paper, even if it was published some years ago and is now hosted on a new owner's website. No broken links. No early draft versions. These cross-references rely on Digital Object Identifiers or DOIs.
Proponents argue that DOIs, and a related technical innovation called the Handle System, will provide an equivalent quantum leap to that of the worldwide web, with important implications for governments as well as publishers and other media companies.
To be able to open up their systems and share information, governments and departments need to agree on standards enabling information to be correctly and uniquely identified on different systems and for these systems to inter-operate reliably - providing the necessary information flow while respecting the policies on classified material and other statutory exemptions.
Demand, already high, is set to ramp up. More than 500,000 requests have been made by citizens, journalists and companies to the US government under its Freedom of Information Act since its enactment in 1966.
Two forceful waves of change are inducing government departments to change the way they manage their data. The first is the tide of events that demand a response through greater co-operation - from counter-terrorism imperatives following September 11, including the homeland security initiative in the US, and fall-out from the wars in Afghanistan and Iraq, through to failures of social security and police departments adequately to share information to protect the public.
This was made manifest in the UK by the Bichard report published in June this year which showed how failures to pass on information from one police force to another meant standard checks were not made on Ian Huntley, the convicted Soham child murderer, who would otherwise have been barred from a job in a school.
The second wave is the trend in policy-making towards open government and greater transparency. More than 50 countries from Mexico to Indonesia and including the US and Canada, the European Union and central and eastern Europe have passed, or are in the process of passing, freedom of information (FoI) laws. The pace of change is increasing - the number of countries with FoI legislation has more than doubled in the past decade.
In the UK, for example, the Freedom of Information Act (2000) goes fully into effect from January 2005. Despite accusations that the bill has been watered down, the demands on government departments and their systems to meet new public rights to access government information will still be considerable.
This is where the Handle System and DOIs, two standards that are rising to prominence as digital identifiers, can play a key role. The Handle System is a comprehensive system for assigning, managing and resolving persistent identifiers, known as "handles," for digital information on the internet. It provides a global, standard method for uniquely and permanently identifying digital content on the internet.
Content means anything from newspaper articles, official reports, photographs and illustrations, and tables of statistics through to music tracks and video libraries.
DOIs are a standard method, based on the Handle System, for identifying published digital content. They are primarily concerned with publication and are endorsed by the publishing industry as an international standard. DOIs lend themselves to providing reliable linking and discovery of documents on the web.
The critical difference between DOIs and traditional links on the worldwide web is that DOIs identify the actual content, while the web references its location. Identifying something by its location, as we all know from experience of the web, has its drawbacks. Things frequently move and links then fail. You can never be quite sure whether you have the latest version, or the definitive copy. And, since duplicates are easily and frequently made, searches give you multiple instances - or, worse, different versions - of the same material.
Handle-based identifiers, including DOIs, are unique on a global basis, and persistent - that is, they will stand the course of time, unlike many web addresses. As a result, they are guaranteed to resolve to a real document and can be used, through a system of access via trusted intermediaries, to ensure that everyone gets the same, definitive version of a document such as a government report.
Identical copies of documents will have the same DOI, so the version found can be trusted, while different releases will have different DOI references that can be correlated through similar systems of access. Of course, there may be value in providing access to different versions of reports, or to supporting information and sources related to documents. DOIs can be used to link together such correlated sets of documents.
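
The difference between naming content and naming its location can be made concrete with a small sketch. What follows is a toy illustration only, not the Handle System's actual protocol or any real resolver's interface: the `ToyResolver` class, its method names, the example identifier and the example addresses are all invented for the purpose. The one detail taken from real practice is the shape of a DOI, a prefix beginning "10." followed by a slash and a suffix chosen by the publisher.

```python
from dataclasses import dataclass, field


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI-style identifier into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI-style identifier: {doi!r}")
    return prefix, suffix


@dataclass
class ToyResolver:
    """Hypothetical in-memory registry: identifier -> current location."""

    records: dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, location: str) -> None:
        # Identifiers are unique: one may be assigned only once.
        split_doi(identifier)  # reject malformed identifiers
        if identifier in self.records:
            raise ValueError(f"{identifier} is already assigned")
        self.records[identifier] = location

    def move(self, identifier: str, new_location: str) -> None:
        # The content has moved; the identifier itself never changes.
        if identifier not in self.records:
            raise KeyError(identifier)
        self.records[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        # Look up wherever the content lives today.
        return self.records[identifier]


resolver = ToyResolver()
report = "10.5555/annual-report-2004"  # made-up identifier for illustration
resolver.register(report, "https://old-dept.example/reports/2004.pdf")

# The report is later rehosted by a different department.
resolver.move(report, "https://new-dept.example/archive/2004.pdf")

# Anyone who cited the identifier still reaches the document.
print(resolver.resolve(report))
```

A citation that stored the old web address would now be broken. A citation that stored the identifier is unaffected, because only the registry record had to change when the report moved.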
For images, audio and video clips, DOIs can be sewn into the fabric of the content, so references are carried with the object, even if it is cut and pasted into another document or application. This opens up interesting possibilities - for publishers and commercial organisations including those in the music, graphic arts and video production industries, as well as for governments - concerning copyright control. Embedded DOIs can be used either to police access to, or to track copies made of, copyright-protected material.
Add to this the potential to identify elements of content rather than whole documents - a chart in an official report, say, or a chapter of a book, or a track on a music CD - and there is clearly a range of new commercial opportunities for publishers and other information providers, not to mention some threats to existing business models.
This ability to identify and access separately the component parts of documents - known in this field as "disaggregation" - can serve the needs of public information provision and disclosure by linking together partial sources of information from different reports "federated" across the systems of multiple government departments: for example, to respond fully to requests under the Freedom of Information Act. Sensitive information within reports can also be classified so that such reports can essentially be made public without compromising national security, endangering the innocent, or giving away state secrets - with blanked-out names, locations and so on.
A key strength is that DOIs will facilitate a degree of convergence between printed reports and online data. The DOI code will also be printed below the tables and charts in hard copy reports and books. This code can be typed into a web browser to access the same services - latest figures, more exhaustive statistics, access to source data etc. DOIs will appear just like links that we have become used to.
But they will be more reliable, greatly enhancing the online experience by blending in seamlessly behind the scenes. They will be like a turbo-charger under the bonnet of the web engine, rather than an alternative or competitor to the web.
In a second article, to be published in the next FT-IT on December 1, we look at some of the users of the new system.
Questions of identity
■ Who is behind DOIs and Handles?
■ Who is using the system?
■ What happens when content changes location or ownership?
■ How can we be sure the system will work, and what safeguards are there?
■ What about existing identification schemes?
■ What about copyright and access rights?