The collection, digital imaging and digital conservation of local historical documents

Michael Slater
North Craven Heritage Trust

The accompanying note in this Journal about the tiny Raspberry Pi computer being used to host the NCHT website has led to further thinking about the archiving of historical information in general. As technology changes, questions of obsolescence arise about computer equipment, software and data storage devices. In addition it has been a challenge in recent years to locate, copy, translate and transcribe as many of the earliest documents relating to North Craven as can be found and to make them accessible to read on-line. The style of handwriting and spelling has changed over the centuries and this poses the main difficulty. The use of non-classical Latin in early documents usually needs expert translators. This work also raises the issue of conservation of old documents. The work occasions excitement, disappointment, delight, and frustration but eventual satisfaction.

The NCHT has its own website, backed up securely. One of its purposes is to make it easy to access the Journal, published since 1992, and NCHT archive material concerning its development over the past 50 years. The websites for the Dales Community Archives and the Ingleborough Archaeology Group are also making historical records more easily available for study. An archive centre in our locality would be very welcome, but funding it would be difficult; technology may provide the answer in the form of a ‘virtual’ archive hub.

What has been done so far?

Many volunteers in Local History Groups have been involved in bringing together copies, translations and transcriptions of documents of local historical value to limit the inconvenience and cost of accessing such material in the national and regional Record Offices. Visits to The National Archives in London, the Borthwick Institute for Archives at York University, the North Yorkshire County Record Office in Northallerton, the West Yorkshire Archive Service in Morley, Bradford and Wakefield, the Lancashire Record Office in Preston, Leeds University Brotherton Library and even the Northumberland County Record Office in Woodhorn near Newcastle upon Tyne have had to be made to inspect and copy original or filmed copies of documents.

Wills and inventories, property deeds, manorial court records, tax and other records of local interest have been the subject of local projects. They all throw light on our local history and behaviour in times past. It is important to understand the history of a nation and, one hopes, to learn from it. Wills and inventories have been collected and analysed for the ancient parishes of Clapham, Giggleswick, Horton in Ribblesdale and Ingleton: a few of these wills date from the early 1400s and some collections extend to 1750. Original wills and copies are held in the Lancashire Record Office and in the Borthwick Institute for Archives. Many property transactions since 1704 held in the Wakefield Registry of Deeds have been inspected and extracts made; there are also deeds in private hands or collections which have been transcribed. Early manorial documents are held in the Yorkshire Archaeological and Historical Society Special Collection at the Brotherton Library in Leeds University and in Chatsworth House in Derbyshire (because of the Duke of Devonshire’s connection with Settle). There is a continuing project to translate and transcribe the proceedings of the manorial courts in Austwick, Giggleswick, Horton, Ingleton, Lawkland and Settle, to the early 1700s, the earliest being for 1420. The Elizabethan manorial records are of value and interest with their lists of tenants (of particular interest for genealogists) and sometimes colourful descriptions of their misdemeanours. The national tax records for our local villages, written on huge rolls of vellum held by The National Archives and spanning hundreds of years from the 1300s, are remarkable, and much remains to be done to read and copy them.

It is remarkable that so many documents up to 700 years old have survived, thanks to chance and the work of specialist conservators. Those made of vellum - fine-grained lambskin, kidskin, or calfskin first used in the 14th century - are very tough and could last several hundred years more if kept in a controlled environment. However, the ink can fade or separate from the vellum unless the documents are handled carefully. Those written on paper in the 16th and 17th centuries are gradually decaying, as was discovered when early wills of Clapham residents were scanned. The less we consult and handle them, the better.

Technical equipment

The advent of digital cameras has revolutionized matters by allowing images of records to be studied easily at home. Consideration has to be given to the longevity of digital images made as part of the conservation process. It is now common practice at Record Offices to allow visitors to take digital photographs away for private study (copyright remains with the Record Office). Making information more freely available, for example on-line, is nevertheless a worthwhile pursuit. Translation (from Latin) and transcription may allow copyright to be vested in the person involved if sufficient intellectual effort is required. Such work helps conservation since handling of delicate documents can then be almost eliminated. In earlier years Record Offices made photographs available in the form of microfiches (rectangular plastic ‘cards’) needing special readers and magnifiers, or microfilms with images on a long roll, again needing special reading consoles. Reading these is difficult and time-consuming, and there is evidence that they are suffering wear and tear and will not last much longer.

There are insufficient funds for Record Offices to carry out the monumental task of digitizing records. The National Archives, for example, hold over 11 million records. Machines are now being used in various places to scan documents so that staff or visitors can transfer the digital images directly by email. Images from microfilms can now also be digitized and transferred to a USB flash drive. All these developments cost money but are most welcome. Regrettably, however, the Record Offices are not keeping copies of such digital images, nor do they have any system for benefitting from visitors by saving their images, translations or transcriptions on a Record Office computer.

The storage of digital information

Ideally, transcriptions of text documents and images of documents could be stored for ever in digital format, making physical storage of original material under suitable conditions of temperature and low humidity easier since access can then be severely limited. Much expert consideration is being given to this problem - to the electronic formats of text and images and to hardware requirements. Relying on the continuing existence of commercial equipment, software and formats is risky - non-commercial corporate bodies and non-governmental institutions with long life-expectancy, such as independent national libraries, museums and universities, might offer the best prospects for preservation of digital information. Ideally digitized text and images need to be kept securely in more than one place, free from alteration and degradation, with embedded information about the original document to ensure that we always know what the digitized version represents.

For computer-searching of documents for given words we need to store text, which needs much less storage space than images. Currently searching cannot be done for words within images but it seems possible that in future images could be searched for individual words in a manner similar to optical character recognition systems in use today.
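As a minimal sketch of why stored text is so convenient to search, the Python fragment below looks for a given word in a small set of transcriptions. The document names and sample sentences are invented for illustration and are not taken from any real will or court roll.

```python
import re

# Hypothetical transcriptions, keyed by an invented document name.
TRANSCRIPTIONS = {
    "will_example_1": "I give to my son John my best cow and two oxen.",
    "court_example_2": "The tenants of the manor were fined for not repairing the hedge.",
}

def find_word(word, documents):
    """Return the names of documents containing `word` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [name for name, text in documents.items() if pattern.search(text)]

# Example: which documents mention a cow?
matches = find_word("cow", TRANSCRIPTIONS)
```

Because early spelling varies so much, a practical search would also need to allow for variant spellings of the same word; this sketch matches one exact spelling only.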

The digital format in which information is stored is important; the jpeg or jpg file format for images has worldwide support. Although the tiff format (high quality but large file size) might be the preferred format for preservation of historical documents, the jpg format is acceptable: simply copying a file does not degrade the image, but re-saving it after editing or conversion, for example for display, can do so. The independent Joint Photographic Experts Group created this jpeg standard. For text the html file format (Hyper-Text Mark-up Language) is a formal Recommendation by the World Wide Web Consortium and is generally adhered to by the major browsers. This format allows text to be easily readable on all types of screen, whether computer, tablet or smartphone. It is difficult to believe that these non-commercial formats will not be used for the foreseeable future since change to a new system would be a formidable task.

It should be the responsibility of all archivists to make sure that definitive copies are made whenever technological change occurs or obsolescence looms. Copies can be altered on purpose or by accident so care is needed in looking after digital information, preferably kept safe by more than one archivist.
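One common way of checking that a copy has not been altered, offered here as an illustration and not as a description of current NCHT practice, is to record a checksum of each file and compare it after every copy. The sketch below uses the SHA-256 function in Python's standard hashlib module; the function names are hypothetical.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of some file contents as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()

def file_checksum(path: str) -> str:
    """Checksum a file on disc, reading it in chunks so that large images need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unaltered(original: bytes, copy: bytes) -> bool:
    """True if the copy has the same checksum as the original."""
    return checksum(original) == checksum(copy)
```

If two archivists each hold a copy together with the recorded checksum, a change to either copy, whether deliberate or accidental, shows up as a mismatch the next time the checksums are compared.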

Computers and digital storage devices use transistors which depend on the controlled movement of electrons within them. The future appears to depend on electrons - fundamental, negatively charged, indivisible entities which obey the laws of quantum physics, whereas our ancestors relied sensibly on oak-gall ink, animal skins and feathers to make quill pens with which to write.


It appears that digital images in jpeg format and text preservation in html format are appropriate for copying historical documents. Storage devices may be reasonably durable, but we cannot rely on devices able to read the content of current storage systems remaining available. The capacity for storage of large amounts of data is not an issue. Repeated updating using the latest devices, computers and software is essential.


The essential help of Frank Woodhams and David Holdsworth is gratefully acknowledged in preparing this article.




Bits, bytes and pixels

Computers work using transistors which are so small that millions can sit on a pinhead. These can be set to one of two states - at zero or positive voltage brought about by movement of electrons in a transistor. This state can be used to represent a number in the binary system using 0s and 1s - i.e. a binary digit called a bit (symbol b). A 0 is represented by zero volts and 1 is represented by a positive voltage.

All information stored in computers is in the form of files, whether a piece of text or a picture. A file is a sequence of these 0s and 1s. These can be considered in sets of 8 bits, each set being called 1 byte (symbol B). The largest number you can represent in the binary system with 8 bits is 11111111, or 255 in decimal notation. Since 00000000 is the smallest, a byte can represent 256 different things, such as a letter or symbol in text, or the intensity of a colour. Bits can be copied faithfully from one form of storage to another. This means that a copy of a file, whether text or image, is identical to the original without loss of quality.
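The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the use of the ASCII code to turn letters into numbers is an assumption made for the example, since the article does not name a particular text encoding.

```python
# The largest 8-bit binary number, written out as in the text.
largest = int("11111111", 2)       # 255 in decimal
values_per_byte = 2 ** 8           # 256 different values, 0 to 255

# A letter is stored as a number; in the ASCII code (assumed here) "A" is 65.
letter_code = ord("A")
letter_bits = format(letter_code, "08b")   # the same number as 8 binary digits

# Copying bytes gives an identical sequence: nothing is lost.
original = "Settle".encode("ascii")        # six letters, six bytes
copy = bytes(original)
identical = copy == original
```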

Storage of text is the least demanding in terms of bytes - one byte per letter or typographic symbol. Images are more demanding; they comprise very large sets of tiny colour points called pixels. A pixel is usually represented by a 24-bit binary number. There are 8 bits each for red, green and blue, allowing 256x256x256 = 16,777,216 numbers, each defining a specific colour and intensity. 1 pixel therefore requires 3 bytes. The website explains the process and shows the range of colours that can be displayed from combinations of red, green and blue. The retina of the human eye is most sensitive to red, green and blue, so mixtures of these three additive primary colours give the largest range of colours visible to humans.
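To make the 24-bit pixel concrete, the following Python sketch packs three 8-bit colour values into a single number and unpacks them again. The function names are hypothetical, and the order chosen (red in the highest 8 bits, blue in the lowest) is one common convention assumed for the example.

```python
def pack_rgb(red, green, blue):
    """Combine three 8-bit values (0 to 255) into one 24-bit number."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each colour value must be between 0 and 255")
    return (red << 16) | (green << 8) | blue

def unpack_rgb(pixel):
    """Split a 24-bit number back into its red, green and blue values."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

# Number of distinct colours available with 8 bits per colour.
COLOURS = 256 * 256 * 256
```

Packing and then unpacking returns the original three values, which is why a pixel can be stored in exactly 3 bytes.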

An image made up of one million pixels therefore requires three million bytes (3 megabytes, 3 MB), but the number of megabytes can be reduced if required, by a complex process of compression achieved by saving in the jpg format, without much loss of image quality. See for a full explanation.
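The size calculation can be written as a short Python sketch. The function names are hypothetical, a megabyte is taken as one million bytes as in the text, and the compression ratio must be supplied by the user: the saving actually achieved by the jpg format depends on the image and on the quality setting chosen, so no typical figure is assumed here.

```python
BYTES_PER_PIXEL = 3  # 8 bits each for red, green and blue

def uncompressed_size_mb(width, height):
    """Size in megabytes of an uncompressed image of width x height pixels."""
    return width * height * BYTES_PER_PIXEL / 1_000_000

def compressed_size_mb(width, height, ratio):
    """Estimated size after compression by a user-supplied ratio (e.g. 10 means ten to one)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return uncompressed_size_mb(width, height) / ratio

# One million pixels, arranged here as 1000 by 1000.
one_megapixel = uncompressed_size_mb(1000, 1000)
```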

Data storage devices

Storage of digital information has made use of magnetic iron oxide films, in earlier days held on flexible plastic tapes and then on floppy plastic discs. Commercial CDs and DVDs are pressed using a metal layer deposited on more rigid polycarbonate discs and top-coated with a plastic label - they may shortly become obsolete. Storage devices suitable for holding massive amounts of digital information are now cheaply available and are perhaps much less prone to damage and decay than formerly. We are now familiar with external disc drives, memory cards as used in cameras, and USB flash memory drives. A typical USB flash memory drive can now store many gigabytes (1 gigabyte, GB, = 1 thousand megabytes; for comparison the Bible requires about 4 megabytes, 4 MB) and external drives can store thousands of gigabytes. The NCHT website holds about 10 gigabytes (10 GB) of information in the memory of the tiny Raspberry Pi computer described in this Journal. The Raspberry Pi requires a memory card, of the kind used in digital cameras, for storage. A hard disc - the high-speed rotating platter where data are stored - is made out of a mix of elements including ruthenium and platinum, two of the world’s rarest and most expensive metals. The move to solid-state devices with no moving parts is welcome; they are probably less prone to failure, but their longevity remains questionable. The way they work depends on transistors made of n-p-n type silicon wafers (silicon with added traces of arsenic, boron and phosphorus) which might be subject to external interference or internal diffusional degradation. The working life of flash memory drives may be limited to around 10 years. Nevertheless, by occasional transfer of data to newer devices, maintaining security over hundreds of years seems feasible. Low-temperature storage of such devices might also be helpful.
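The figures quoted above can be put in proportion with a small Python calculation. It is a rough sketch only: the function name is hypothetical, 1 GB is taken as 1 thousand MB as in the text, and the 3 MB image is the uncompressed one-million-pixel image described earlier.

```python
MB_PER_GB = 1000  # 1 gigabyte = 1 thousand megabytes, as in the text

def items_that_fit(capacity_gb, item_mb):
    """Whole number of items of item_mb megabytes that fit in capacity_gb gigabytes."""
    return int(capacity_gb * MB_PER_GB // item_mb)

# In 10 GB, the amount of information quoted for the NCHT website:
images_in_10_gb = items_that_fit(10, 3)   # uncompressed one-million-pixel images
bibles_in_10_gb = items_that_fit(10, 4)   # texts the length of the Bible
```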