Digitization Services

We can digitize your collections at our facility in Joensuu

Modern mass digitization is based on photographing every sample together with its label data. This makes it possible to automate each step of the process by employing the best available experts, methods, tools, and artificial intelligence in a distributed fashion.

We currently offer digitization for pinned insects and herbarium sheets.  This covers over 80% of all objects in scientific collections.

Our service includes all the steps of 1) retrieving the collection, 2) tagging with unique identifiers, 3) imaging, and 4) essential metadata handling. Optionally, 5) full data transcription and 6) georeferencing are available.  After 7) rigorous quality checks and quarantine, the collection will be 8) returned and data will be delivered to the Customer.  See below for a more detailed description of the process.

This full-service solution is particularly suited to large collections that need to catch up with a large digitization backlog. Our capacity is up to half a million samples per year.

The price per specimen is typically 0.7€ for imaging and 1€ for label transcription, but it depends on a number of factors.
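As a rough illustration of how these figures combine, the following Python sketch estimates an indicative cost and the minimum time implied by our maximum yearly capacity. It assumes the typical prices quoted above apply unchanged; the function name `estimate` and the example collection size are hypothetical, and an actual quote depends on the factors mentioned.

```python
from decimal import Decimal

# Typical per-specimen prices quoted above, in euros. Actual prices vary.
IMAGING_PRICE = Decimal("0.70")
TRANSCRIPTION_PRICE = Decimal("1.00")
# Stated upper bound on yearly capacity.
MAX_SPECIMENS_PER_YEAR = 500_000


def estimate(specimens: int, with_transcription: bool = False):
    """Return (indicative cost in euros, minimum duration in years)."""
    if specimens < 0:
        raise ValueError("specimens must be non-negative")
    per_specimen = IMAGING_PRICE
    if with_transcription:
        per_specimen += TRANSCRIPTION_PRICE
    cost = per_specimen * specimens
    years = specimens / MAX_SPECIMENS_PER_YEAR
    return cost, years


# Hypothetical collection of 200,000 specimens, imaged and transcribed.
cost, years = estimate(200_000, with_transcription=True)
print(f"{cost} EUR, at least {years:.1f} years")
```

Decimal arithmetic is used so that the euro amounts are not affected by binary floating-point rounding.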


The Process

1. Transport

Packaging and lorrying

The collection is shipped to our digitization factory. Our business partner, an international removal company with experience in handling delicate museum objects, will take care of this.

2. Tagging

Digitization without unique identifiers means nothing

Every sample must be tagged with a unique identifier that is machine readable, such as a QR code. The Customer can provide us with the tags, or we can print them following the namespace given by the Customer.
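To make the idea of a namespace concrete, here is a minimal Python sketch that generates a run of identifiers under a given prefix and checks a batch for duplicates. The prefix `http://example.org/specimen/`, the zero-padded running number, and both function names are assumptions made for illustration; the real identifier scheme is agreed with the Customer, and encoding the text as a QR code is left out.

```python
from typing import Iterable, Iterator, Set


def make_identifiers(namespace: str, start: int, count: int,
                     width: int = 7) -> Iterator[str]:
    """Yield identifiers of the form <namespace><zero-padded number>."""
    for number in range(start, start + count):
        yield f"{namespace}{number:0{width}d}"


def find_duplicates(identifiers: Iterable[str]) -> Set[str]:
    """Return every identifier that occurs more than once."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for identifier in identifiers:
        if identifier in seen:
            duplicates.add(identifier)
        seen.add(identifier)
    return duplicates


# Hypothetical namespace; a real one is provided by the Customer.
tags = list(make_identifiers("http://example.org/specimen/", 1, 3))
for tag in tags:
    print(tag)
```

Checking for duplicates before printing matters because an identifier that appears on two samples can no longer tell them apart.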

3. Imaging

Fully automated

Running entire collections through imaging lines requires serious logistical and physical effort. We are working to streamline that as much as possible.

On the herbarium line, one high-resolution image will be made of each sheet. On the insect line, medium-resolution images will be made of the specimen and its labels, including a diagonal view and possibly a frontal view.

Quality control of the sharpness, color, and skew of the images will be done in near-real time. Warnings will sound if the imaging result is not within limits.
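One common way to check sharpness automatically is the variance of the Laplacian: a blurred image has weak edges, so the filter response varies little. The Python sketch below illustrates that general technique on a tiny grayscale grid. It is not a description of the checks used on our imaging lines, and the threshold value is a hypothetical placeholder that would have to be calibrated for each camera setup.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray is a list of rows of pixel intensities, at least 3x3 in size.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_sharp(gray, threshold=100.0):
    """Return True if the sharpness score reaches the (assumed) threshold."""
    return laplacian_variance(gray) >= threshold


# A featureless image scores zero; a high-contrast pattern scores high.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
print(is_sharp(flat), is_sharp(checker))
```

In a production setting the same score would be computed on the camera image with an image-processing library rather than on nested lists.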

4. Post-processing

Meanwhile, on the back-end server...

Using image analysis techniques, the tags will be recognized from the images.  All the images and metadata will be combined as Digital Objects.

Technical methods such as OCR will be used to produce a raw automatic interpretation of the labels. These data can be matched against already transcribed specimens from Open Data. This way the material can be classified and directed to the best-suited transcription service later on.
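The matching step can be pictured as a fuzzy text comparison between the raw OCR output and labels that have already been transcribed. The following Python sketch uses the standard library's `difflib` for this. The record identifiers, the label texts, and the similarity cutoff are all made up for illustration, and the matching used in practice may work differently.

```python
from difflib import SequenceMatcher


def best_match(ocr_text, transcribed, min_ratio=0.6):
    """Return (record_id, similarity) for the closest label, or None.

    transcribed maps a record identifier to its transcribed label text.
    """
    best = None
    for record_id, label_text in transcribed.items():
        ratio = SequenceMatcher(
            None, ocr_text.lower(), label_text.lower()
        ).ratio()
        if best is None or ratio > best[1]:
            best = (record_id, ratio)
    if best is not None and best[1] >= min_ratio:
        return best
    return None


# Invented example records and a typical OCR misreading (1 -> l, m -> rn).
known = {
    "rec-1": "Fennia, Joensuu 12.6.1952 leg. A. Example",
    "rec-2": "Suecia, Uppsala 3.7.1931",
}
print(best_match("Fennia, Joensuu l2.6.l952 leg. A. Exarnple", known))
```

A label that matches nothing above the cutoff returns `None`, which is the kind of case that would be routed to human transcription.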

5. Transcription

Human and artificial intelligence work together

Up until now in this process, only data from the boxes or drawers about the taxon and major geographic area have been entered for a batch of specimens, and the unique identifier has been generated.

Optionally, the full data visible on the digitized labels of each specimen, such as taxonomic identification, location, time, and collector name, will be transcribed in detail using the DigiWeb application developed by Digitarium.

We can mobilize an extensive network of professional biologists who have been trained in digitization and have already transcribed the metadata of hundreds of thousands of specimens.

6. Georeferencing

Location, location, location, ...

Many older specimens do not have labels indicating the geographic coordinates where the specimen was collected.

Optionally, the geographic coordinates of the specimens will be determined using available gazetteers and machine learning methods.
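In its simplest form, a gazetteer lookup maps a locality name on the label to coordinates, while tolerating small spelling differences. The Python sketch below shows only that basic idea, using fuzzy string matching from the standard library instead of machine learning. The three-entry gazetteer with approximate coordinates, the cutoff, and the function name are assumptions for illustration.

```python
from difflib import get_close_matches

# Tiny illustrative gazetteer: name -> (latitude, longitude), approximate.
GAZETTEER = {
    "joensuu": (62.60, 29.76),
    "helsinki": (60.17, 24.94),
    "oulu": (65.01, 25.47),
}


def georeference(locality, gazetteer=GAZETTEER, cutoff=0.8):
    """Return (matched_name, latitude, longitude), or None if no match."""
    key = locality.strip().lower()
    matches = get_close_matches(key, list(gazetteer), n=1, cutoff=cutoff)
    if not matches:
        return None
    name = matches[0]
    latitude, longitude = gazetteer[name]
    return name, latitude, longitude


print(georeference("Joensu"))    # misspelled label still resolves
print(georeference("Atlantis"))  # unknown place gives None
```

Real georeferencing also has to handle historical names, ambiguous localities, and coordinate uncertainty, none of which this sketch attempts.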

7. Quality control

You'll get what you ordered

The results of imaging, transcription, and georeferencing go through rigorous quality checks based on the ISO 2859 standard. We use the DigiWeb groupware of Digitarium for collaborative transcription and quality control.
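ISO 2859 describes acceptance sampling: a random sample is drawn from each lot, and the lot is accepted only if the number of defective items in the sample does not exceed an acceptance number. The Python sketch below shows that decision rule in general terms. The sample size and acceptance number used here are hypothetical placeholders, not values taken from the standard's tables, and the function is not part of DigiWeb.

```python
import random


def inspect_lot(lot, is_defective, sample_size, acceptance_number, seed=None):
    """Return (accepted, defects_found) for one lot.

    lot is a list of records, and is_defective is a function that checks
    one record. In real use, sample_size and acceptance_number come from
    the sampling plan chosen for the lot size and quality level.
    """
    rng = random.Random(seed)
    sample = rng.sample(lot, min(sample_size, len(lot)))
    defects = sum(1 for record in sample if is_defective(record))
    return defects <= acceptance_number, defects


# Hypothetical lot of 1,000 transcribed records with none marked as bad.
lot = [{"id": i, "bad": False} for i in range(1000)]
accepted, defects = inspect_lot(
    lot, lambda record: record["bad"], sample_size=50, acceptance_number=2
)
print(accepted, defects)
```

A rejected lot would typically be rechecked in full and corrected before delivery.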

All the data generated during the digitization process will be backed up to Google Cloud Platform (GCP) on a daily basis.

8. Delivery

...on time!

The collection will be returned in original order and condition.

The digitized data will be made available online while the digitization is still in progress. If so desired, we can organize a cloud service (GCP) for permanent access to the data and images.

Integrating the digitized data into the Customer's legacy systems and publishing it in European and global research infrastructures is the responsibility of the Customer, but through our research services we can offer consultation for these steps.