<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>NLN Digital preservation – Blog</title><link>https://digitalpreservation.no/blog/</link><description>Recent content in Blog on NLN Digital preservation</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="https://digitalpreservation.no/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>PREMIS without a single right answer – our approach to preservation metadata and events</title><link>https://digitalpreservation.no/blog/2026-03-24-premis-without-a-single-right-answer-our-approach-to-preservation-metadata-and-events/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2026-03-24-premis-without-a-single-right-answer-our-approach-to-preservation-metadata-and-events/</guid><description>
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2026-03-24-premis-without-a-single-right-answer-our-approach-to-preservation-metadata-and-events/chatGPTimage.png" alt="AI-generated illustration preservation metadata" loading="lazy" /&gt;&lt;/p&gt;
&lt;p&gt;🗂️ Preservation metadata describes the origin of digital material, where it comes from and how it was created. It documents the actions and events (in this blogpost referred to as &amp;ldquo;events&amp;rdquo;) that have affected a digital object, such as creation, migration, validation, and transfer. This type of metadata is particularly important for digital materials, as it ensures traceability and provides a record of what has been done to the object throughout its lifecycle.&lt;/p&gt;
&lt;p&gt;🕰️ Historically, there has been limited focus on systematic work with preservation metadata for digital content at the National Library. Following the establishment of the Digital Preservation Team, dedicated attention has been given to how both data and metadata will be better managed over time. The PREMIS standard&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;, which is designed to support the structured description of preservation metadata, has been discussed on several occasions. When the decision was made to adopt the &lt;a href="https://dilcis.eu/"target="_blank" rel="noopener"&gt;E-ARK standard&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for our preservation packages, the choice became clear, as E-ARK also recommends and refers to the use of PREMIS for this type of metadata. In recent years, we have documented certain preservation events in a separate database. As part of the ongoing work to upgrade our DPS solution&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;, we also decided to explore the possibility of implementing PREMIS events.&lt;/p&gt;
&lt;p&gt;🧐 As we had little prior knowledge of PREMIS, we had to begin by familiarizing ourselves with what the standard is, what it is intended for, and how it can be applied. We quickly discovered that there is limited guidance on how it should be implemented in practice, and that the standard functions more as a flexible framework that can be interpreted and realized in many different ways.&lt;/p&gt;
&lt;p&gt;📊 Next, we carried out a systematic review of the &lt;a href="https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf"target="_blank" rel="noopener"&gt;PREMIS Data Dictionary&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;. We examined all entities and their semantic units to map what we already document, what would be relevant to document in the short term (to get started), what we would like to implement over time, and what we consider less relevant for our collection. Throughout, we have had to balance the need for sufficient documentation against the risk of storing and managing unnecessary amounts of metadata. It is also important to keep in mind the scale of our holdings, which is larger than most: about 19 petabytes (19,000 terabytes) of unique data by the end of 2025. Even by documenting only a limited set of events in the first DPS solution, we have already accumulated more than 53 million events at the package level and over 76 million events at the file level.&lt;/p&gt;
&lt;p&gt;🔄 We then identified which events were relevant for the team to document internally within the DPS workflow, during ingest, within the preservation process, and at the point of dissemination. As an extension of this work, we saw that it could be possible for depositors to add PREMIS events via our API at the same time as they submit new information packages (SIPs). In this way, we can also enable depositors to include relevant preservation metadata that has been documented before the digital material is received in the DPS. Events submitted via the API are preserved in the same way as those created within the DPS.&lt;/p&gt;
&lt;p&gt;📋 To determine which events we consider relevant to document, we based our work on the Library of Congress list of &lt;a href="https://id.loc.gov/vocabulary/preservation/eventType.html"target="_blank" rel="noopener"&gt;eventTypes&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; and refined it to better fit our material and preservation environment. At present, we allow 11 event types to be submitted via our API, while some additional types are used internally within the DPS. The set of event types currently accepted through the API was initially developed for the National Library’s internal production workflows, but will be adapted for external depositors as needed.&lt;/p&gt;
&lt;p&gt;💾 So far, we have adopted a simple modeling approach for PREMIS events, documenting three core components: object, event, and agent. For the object, we record two levels: the intellectual entity (essentially the information package) and the file level. We have decided to continue using our own event database, rather than writing events to files within the information package. This approach makes it easier to collect information and add new events without having to update the information package itself. Events are stored in the database in JSON format, not in PREMIS.xml. We have planned for the possibility of writing events from the database into information packages (for example, DIPs at dissemination) in PREMIS.xml format. This also applies to events submitted by depositors via the API. For clarity, our databases are maintained according to the same preservation standards as our bit-level storage.&lt;/p&gt;
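&lt;p&gt;As an illustration only, the sketch below shows what an event record with the three components (object, event, agent) might look like as JSON, and how such a record could later be written out as a PREMIS XML event. The JSON keys, identifiers, and agent name are hypothetical and do not reflect the actual DPS schema; only the PREMIS 3 namespace and element names are taken from the standard.&lt;/p&gt;

```python
import json
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"  # PREMIS 3 XML namespace
ET.register_namespace("premis", PREMIS_NS)

# Hypothetical event record; the keys are illustrative, not the DPS schema.
event = {
    "eventIdentifier": {"type": "UUID", "value": "00000000-0000-0000-0000-000000000001"},
    "eventType": "fixity check",
    "eventDateTime": "2026-03-24T12:00:00Z",
    "eventOutcome": "success",
    "object": {"level": "file", "identifier": "example-package/data/image-0001.tif"},
    "agent": {"type": "software", "name": "example-fixity-checker"},
}


def sub(parent, tag, text=None):
    """Append a namespaced PREMIS child element and return it."""
    child = ET.SubElement(parent, "{%s}%s" % (PREMIS_NS, tag))
    if text is not None:
        child.text = text
    return child


def event_to_premis(record):
    """Build a premis:event element from the hypothetical JSON record."""
    ev = ET.Element("{%s}event" % PREMIS_NS)
    ident = sub(ev, "eventIdentifier")
    sub(ident, "eventIdentifierType", record["eventIdentifier"]["type"])
    sub(ident, "eventIdentifierValue", record["eventIdentifier"]["value"])
    sub(ev, "eventType", record["eventType"])
    sub(ev, "eventDateTime", record["eventDateTime"])
    outcome = sub(ev, "eventOutcomeInformation")
    sub(outcome, "eventOutcome", record["eventOutcome"])
    agent = sub(ev, "linkingAgentIdentifier")
    sub(agent, "linkingAgentIdentifierType", record["agent"]["type"])
    sub(agent, "linkingAgentIdentifierValue", record["agent"]["name"])
    obj = sub(ev, "linkingObjectIdentifier")
    sub(obj, "linkingObjectIdentifierType", record["object"]["level"])
    sub(obj, "linkingObjectIdentifierValue", record["object"]["identifier"])
    return ev


stored = json.dumps(event)  # the form a database row could hold
element = event_to_premis(json.loads(stored))
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

&lt;p&gt;Keeping the JSON record as the primary form and generating XML only on export mirrors the idea described above: new events can be added without touching the information package itself.&lt;/p&gt;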
&lt;p&gt;In addition to recommending the submission of preservation metadata as events via the API, it is also possible to include this type of metadata as part of the information package (SIP). More information can be found here: &lt;a href="https://digitalpreservation.no/docs/dps/sip/1.0/"&gt;SIP 1.0 (E-ARK)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;🚀 We view the work we have done so far as a starting point. We have chosen to document what we consider important for now, to get up and running, with a deliberate approach that allows us to adjust course along the way. For us, it is more important to establish a practice that can be further developed than to aim for a perfect solution from the start, where we risk spending time on endless discussions - and few practical solutions.&lt;/p&gt;
&lt;p&gt;Documentation on the use of event elements and event types is published here: &lt;a href="https://digitalpreservation.no/docs/dps/api/submission/events/"&gt;Events/preservation metadata&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;More technical information is available here: &lt;a href="https://digitalpreservation.no/swagger/"target="_blank" rel="noopener"&gt;Swagger DPS Submission Service API&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please feel free to reach out if you have any comments or questions 😊&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Preservation Metadata Implementation Strategies (PREMIS) is a metadata standard for recording information required for preservation of digital objects. The standard&amp;rsquo;s documentation and metadata schema are hosted by the Library of Congress.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Digital Preservation Services (DPS) is an umbrella term for services and software used to manage digital preservation at the National Library. The system ingests data for preservation, ensures data integrity, and manages access to preserved data.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Backend Developer – Preserving the Nation’s Memory</title><link>https://digitalpreservation.no/blog/2026-01-16-backend-developer-job-listing/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2026-01-16-backend-developer-job-listing/</guid><description>
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2026-01-16-backend-developer-job-listing/NBrana.jpg" alt="The National Library of Norway in Mo i Rana" loading="lazy" /&gt;&lt;/p&gt;
&lt;h2&gt;Backend Developer – Preserving the Nation’s Memory&lt;span class="hx:absolute hx:-mt-20" id="backend-developer--preserving-the-nations-memory"&gt;&lt;/span&gt;
&lt;a href="#backend-developer--preserving-the-nations-memory" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;We have received funding to expand our team with a new IT position.&lt;/p&gt;
&lt;p&gt;We are currently a multidisciplinary product team of eight people, half of whom are IT developers. The team is responsible for the development and management of the National Library’s core systems for digital long-term preservation.&lt;/p&gt;
&lt;p&gt;By the end of 2025, the National Library’s digital collection amounts to approximately 19 petabytes, consisting of more than 2 billion files. We receive more than 6 terabytes of new data daily, distributed across around 5,000 archival objects.&lt;/p&gt;
&lt;p&gt;Please feel free to contact us if you are interested, or if you know someone who might be.&lt;/p&gt;
&lt;p&gt;You can read more about the position &lt;a href="https://www.finn.no/job/ad/446375166"target="_blank" rel="noopener"&gt;here&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Mission accomplished!</title><link>https://digitalpreservation.no/blog/2025-12-17-the-pilot-has-landed/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2025-12-17-the-pilot-has-landed/</guid><description>
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2025-12-17-the-pilot-has-landed/pilot_eng.png" alt="Draft for the preservation of museums` digital cultural heritage" loading="lazy" /&gt;&lt;/p&gt;
&lt;p&gt;The National Library of Norway, the Norwegian Directorate for Cultural Heritage, and KulturIT were commissioned by the Norwegian Ministry of Culture and Equality to carry out a pilot project for the long-term preservation of digital museum objects using the National Library’s digital preservation solution (DPS). The assignment was given in December 2024.&lt;/p&gt;
&lt;p&gt;Over the past year, the National Library and KulturIT have worked closely together to develop and test a solution for the ingest and delivery of digital museum objects, initially focusing on images. For the Digital Preservation team at the National Library, this work required the establishment of several new functions within the DPS.&lt;/p&gt;
&lt;p&gt;An API has been developed to enable communication between various data systems and the DPS. The API handles authentication and authorization, and all data exchange is governed by agreements between the National Library and the institutions that choose to use the service. More information about the API is available &lt;a href="https://digitalpreservation.no/docs/dps/api/"target="_blank" rel="noopener"&gt;here&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For data ingest and delivery, a data exchange platform based on the Amazon S3 protocol was tested. This solution allows users to transfer data without needing to understand how the data is stored within the platform.&lt;/p&gt;
&lt;p&gt;Requirements have been established for data delivered to the National Library to comply with the &lt;a href="https://dilcis.eu/specifications/sip"target="_blank" rel="noopener"&gt;E-ARK&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; standard for the structure and content of information packages. More information about these requirements is available &lt;a href="https://digitalpreservation.no/docs/dps/sip/1.0/"target="_blank" rel="noopener"&gt;here&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In addition, metadata requirements have been defined for information packages submitted for preservation. These metadata are considered essential for the management and preservation of the material within the preservation environment and are submitted via the API. More information about the metadata requirements is available &lt;a href="https://digitalpreservation.no/docs/dps/api/submission/metadata/"target="_blank" rel="noopener"&gt;here&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://kulturit.org/"target="_blank" rel="noopener"&gt;KulturIT&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; has, in parallel, developed functionality for selecting museum objects for preservation, as well as solutions for ingesting and retrieving preserved museum objects from the National Library’s digital preservation system (DPS) based on our requirements.&lt;/p&gt;
&lt;p&gt;The solution has been tested by three of KulturIT’s owner museums, and the pilot project is considered successful.&lt;/p&gt;
&lt;p&gt;A final report has been submitted to the Ministry of Culture and Equality. The report describes how the assignment was carried out and recommends establishing a permanent service for the long-term digital preservation of images for museums. This service should build on the results of the pilot and include an initial phase involving five museums.&lt;/p&gt;
&lt;p&gt;In the longer term, the service may be expanded to include additional museums and extended to cover other media types, such as text, audio, and moving images.&lt;/p&gt;
&lt;p&gt;The National Library of Norway, the Norwegian Directorate for Cultural Heritage, and KulturIT have collaborated closely throughout the pilot and support the report’s recommendations for further work.&lt;/p&gt;</description></item><item><title>Workshop with KBNL</title><link>https://digitalpreservation.no/blog/2025-11-27-workshop-with-kbnl/</link><pubDate>Thu, 27 Nov 2025 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2025-11-27-workshop-with-kbnl/</guid><description>
&lt;p&gt;On 18-19 November, three colleagues from the KB, National Library of the Netherlands (KBNL), visited the Digital Preservation Team at the National Library of Norway (NLN).&lt;/p&gt;
&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2025-11-27-workshop-with-kbnl/KBNL.jpg" title="Guests from KBNL: Inge Hofsink, Richard Ligtenberg og Lonneke Smit" alt="Picture of the guests" loading="lazy" /&gt;
&lt;figcaption&gt;Guests from KBNL: Inge Hofsink, Richard Ligtenberg og Lonneke Smit&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2&gt;Workshop with KBNL&lt;span class="hx:absolute hx:-mt-20" id="workshop-with-kbnl"&gt;&lt;/span&gt;
&lt;a href="#workshop-with-kbnl" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Over the course of two days, we had a tightly packed schedule, starting with a guided tour of the National Library’s facilities.&lt;/p&gt;
&lt;p&gt;While NLN holds all media types and carries out extensive in-house digitisation, KBNL primarily handles text-based material, and all digitisation is carried out through tender processes in which they receive the digitised result.&lt;/p&gt;
&lt;p&gt;After the tour, we organized a joint workshop. The focus here was to share each institution’s experiences working with digital preservation.&lt;/p&gt;
&lt;p&gt;NLN presented the architecture of our Digital Preservation System (DPS) and its roadmap. KBNL presented their preservation solution, which is built on an internally developed system called DAPPR and uses ExLibris Rosetta and S3 for object storage (they also use ExLibris Alma). There was also a focus on experiences with metadata standards such as Dublin Core, MODS, and PREMIS.&lt;/p&gt;
&lt;p&gt;The final part of the workshop was devoted to discussing organisational challenges related to digital preservation work. How can we ensure that institutions keep a continuous focus on digital preservation? How can predictable framework conditions be established to sustain this work? Can we avoid leaving behind a period of digital dark age? Many interesting and valuable reflections emerged. We will return to this topic in a later blog post on &lt;a href="https://digitalpreservation.no/"target="_blank" rel="noopener"&gt;Digital preservation at the National Library of Norway.&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2025-11-27-workshop-with-kbnl/workshop.jpg" title="The NLN–KBNL workshop" alt="Workshop day 2" loading="lazy" /&gt;
&lt;figcaption&gt;The NLN–KBNL workshop&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;</description></item><item><title>World Digital Preservation Day</title><link>https://digitalpreservation.no/blog/2025-11-06-why-preserve/</link><pubDate>Thu, 06 Nov 2025 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2025-11-06-why-preserve/</guid><description>
&lt;p&gt;&lt;a href="https://www.dpconline.org/events/world-digital-preservation-day"target="_blank" rel="noopener"&gt;World Digital Preservation Day&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; is an annual event held on the first Thursday of November. The event is organized by the &lt;a href="https://www.dpconline.org/"target="_blank" rel="noopener"&gt;Digital Preservation Coalition&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; (DPC), and the day features events, campaigns, and the sharing of experiences worldwide. Its goal is to highlight both the value and the challenges of long-term preservation of digital data. This year’s theme is &lt;em&gt;Why preserve?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2025-11-06-why-preserve/WDPD_2025.png" alt="This year’s logo for World digital preservation day" loading="lazy" /&gt;&lt;/p&gt;
&lt;h2&gt;Why digital preservation at the National Library of Norway?&lt;span class="hx:absolute hx:-mt-20" id="why-digital-preservation-at-the-national-library-of-norway"&gt;&lt;/span&gt;
&lt;a href="#why-digital-preservation-at-the-national-library-of-norway" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The National Library of Norway’s mandate is to collect, preserve, and make accessible all published content across all types of media. This means that everything published in Norway, or published abroad about Norway, must be preserved and made available both now and in the future - whether it is physical or born-digital content. “All types of media” include, among other things, books, journals, e-books, audiobooks, posters, cards, advertisements, theatre programs, all Norwegian websites, radio broadcasts, and television programs - the list is extensive ! For this reason, the National Library is considered the most important source of knowledge about Norway and Norwegian society.&lt;/p&gt;
&lt;p&gt;The library’s societal mission is to give the people of Norway access to our shared knowledge and cultural history. This is achieved by preserving, digitizing, and disseminating everything published for the public. Access to our cultural heritage is essential for democracy and public enlightenment, and provides a foundation for identity and historical understanding. &lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The efforts in digital preservation at the National Library are closely related to:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;📚 &lt;strong&gt;Ensuring that today’s society, culture, and knowledge remain accessible for future generations -&lt;/strong&gt; Properly managed digital content lasts longer. Active preservation protects material from technological obsolescence, and with sufficient metadata, it can remain understandable and playable far into the future.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🌍 &lt;strong&gt;Bridging generations and geographies, making knowledge and culture accessible to everyone -&lt;/strong&gt; Digital content can be accessed anytime, anywhere, making sources for understanding the past and present more readily available to all.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;⚖️ &lt;strong&gt;Preserving digital collections to ensure compliance, accountability, and transparency across sectors -&lt;/strong&gt; When public institutions preserve documents, data, and communication digitally in a secure and systematic way, it enables transparency and reuse of information, makes it possible to trace decisions, actions, and resource use, and ensures that laws, regulations, and requirements are followed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🛡️ &lt;strong&gt;Protecting the integrity of our sources, research, and history -&lt;/strong&gt; In an era of misinformation and manipulation, preserving original documents, images, and audio recordings digitally with secure metadata standards ensures authenticity and reliability. Digital preservation allows research data, methodologies, and results to be stored in ways that can be verified and reused. Storing content in secure environments and multiple independent copies makes deliberate manipulation more difficult.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🌱 &lt;strong&gt;Enabling sustainability, adaptability, and long-term thinking - not just storage! -&lt;/strong&gt; Digital preservation reduces unnecessary duplication and physical wear on archival materials while using resources efficiently over time. Digital collections can be easily integrated into new platforms, used in research, teaching, or public outreach, and adapted to future needs. In short, digital preservation is not just about storing files - it’s about building a system that keeps knowledge accessible and relevant over time, regardless of technological or organizational changes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Happy World Digital Preservation Day!&lt;/strong&gt; 🎉&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;National library of Norway strategy and societal mission, &lt;a href="https://www.nb.no/om-nb/mandat-og-strategi/"target="_blank" rel="noopener"&gt;https://www.nb.no/om-nb/mandat-og-strategi/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Specifications for SIP Submission and Metadata Requirements</title><link>https://digitalpreservation.no/blog/2025-07-08-specifications-for-sip-submission-and-metadata-requirements/</link><pubDate>Tue, 08 Jul 2025 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2025-07-08-specifications-for-sip-submission-and-metadata-requirements/</guid><description>
&lt;p&gt;As part of the &lt;a href="https://digitalpreservation.no/blog/2025-01-28-lam-longterm-preservation-pilot/"&gt;pilot project&lt;/a&gt; on long-term preservation of museums’ digital cultural heritage, the team has been working on defining a standardized approach for the transfer of digital objects to the National Library of Norway. The work is based on the &lt;a href="https://dilcis.eu/"target="_blank" rel="noopener"&gt;E-ARK specification&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;, from which we have derived more specific requirements for the delivery of information packages.&lt;/p&gt;
&lt;p&gt;In addition, the team has focused on how to ensure that a minimum set of descriptive metadata is submitted alongside the digital objects. The primary purpose of this is to enable basic search and identification within the preservation environment, and it is not intended as a substitute for richer metadata delivered within the package itself. The approach is based on the &lt;a href="https://www.dublincore.org/specifications/dublin-core/dcmi-terms/"target="_blank" rel="noopener"&gt;Dublin Core&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; metadata standard, which is a basic standard, features open fields (no controlled vocabularies), and can be used for a variety of media types.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2025-07-08-specifications-for-sip-submission-and-metadata-requirements/Skjermbilde2_blogg_engelsk.JPG" alt="Metadata in digital preservation" loading="lazy" /&gt;
&lt;br&gt;&lt;/p&gt;
&lt;p&gt;Documentation for SIP submission is available here: &lt;a href="https://digitalpreservation.no/docs/dps/sip/1.0/"&gt;SIP 1.0&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Documentation on metadata submission requirements is available here: &lt;a href="https://digitalpreservation.no/docs/dps/api/submission/metadata/"&gt;Metadata Requirements&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Initially, the published documentation is intended for use within the scope of the pilot project. It primarily focuses on the submission of still image material. However, the intention is to use the same specifications for internal deposit workflows at the National Library as well. Further updates and modifications to the documentation will be introduced as we proceed beyond the pilot.&lt;/p&gt;
&lt;p&gt;Do not hesitate to contact us should you have any comments or questions! 👩🏻‍💻&lt;/p&gt;</description></item><item><title>Preserving 1.6 million hours of television</title><link>https://digitalpreservation.no/blog/2025-06-12-preservation-of-16-million-hours-of-television/</link><pubDate>Thu, 12 Jun 2025 08:00:00 +0100</pubDate><guid>https://digitalpreservation.no/blog/2025-06-12-preservation-of-16-million-hours-of-television/</guid><description>
&lt;p&gt;The National Library of Norway has recently completed one of its largest digital preservation projects ever: transferring the entire archive of historical digital television broadcasts to a new preservation system. Over 1.6 million hours of television, spread across as many files, were analyzed, quality assured, and repackaged—amounting to a total of 1,800 terabytes of data.&lt;/p&gt;
&lt;p&gt;This is the story of how we did it.&lt;/p&gt;
&lt;h2&gt;Background: One Hour, One File&lt;span class="hx:absolute hx:-mt-20" id="background-one-hour-one-file"&gt;&lt;/span&gt;
&lt;a href="#background-one-hour-one-file" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Since 2007, Norwegian broadcasters have delivered television broadcasts digitally to the National Library of Norway — one MP4 file per calendar hour of TV transmission. Each file contains everything aired during one hour, regardless of where programs start or end. In practice, this means a single TV program may be spread across multiple files.&lt;/p&gt;
&lt;p&gt;The files were stored in an older bit repository (Oracle HSM) in three copies. The entire digital collection was now to be rearchived into a modern preservation system, a process that imposed new requirements for quality, metadata, and packaging.&lt;/p&gt;
&lt;h2&gt;Challenges and Decisions&lt;span class="hx:absolute hx:-mt-20" id="challenges-and-decisions"&gt;&lt;/span&gt;
&lt;a href="#challenges-and-decisions" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;✅ &lt;strong&gt;Checksums for All Files&lt;/strong&gt; &lt;br&gt; Some broadcasters, like TV2 and TVNorge, delivered files with checksums&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; — a digital signature confirming that the file hasn’t changed. But files from other broadcasters lacked these, so we had to generate our own checksums.&lt;/p&gt;
&lt;p&gt;🔠 &lt;strong&gt;Standardizing File Names&lt;/strong&gt; &lt;br&gt; The file names contain important information (broadcaster, channel, date, and time), but many did not follow the standard format. Some lacked information about the broadcast time; others had the channel name misplaced. Before automated processing could begin, thousands of file names had to be corrected.&lt;/p&gt;
&lt;p&gt;🔍 &lt;strong&gt;Validating Technical Properties&lt;/strong&gt; &lt;br&gt; Each file was analyzed using the tools MediaInfo&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; and MediaConch&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt; to validate the format. We checked that the files contained both audio and video, had the correct duration, and weren’t truncated or empty.&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;Metadata and MODS&lt;/strong&gt; &lt;br&gt; Since none of the files had catalog data, we generated MODS&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt; metadata for each one. Information from the file names was extracted and combined with technical metadata from the analysis tools.&lt;/p&gt;
&lt;p&gt;✏️ &lt;strong&gt;Expanding Channel Names to Full Form&lt;/strong&gt; &lt;br&gt; In the original file names, abbreviated channel names were used, such as “BLI” for “TV2 BLISS.” These were converted into full channel names in the metadata to ensure future clarity and understanding.&lt;/p&gt;
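&lt;p&gt;The expansion of channel names can be pictured as a simple lookup. The Python sketch below is illustrative only: &amp;ldquo;BLI&amp;rdquo; is the one mapping taken from this post, while the other table entries and the function name &lt;code&gt;expand_channel&lt;/code&gt; are made up for the example.&lt;/p&gt;

```python
# Abbreviation table. Only "BLI" comes from the text above; the other
# entries are made-up placeholders that show the shape of the table.
CHANNEL_NAMES = {
    "BLI": "TV2 BLISS",
    "XX1": "Example Channel One",
    "XX2": "Example Channel Two",
}


def expand_channel(abbreviation):
    """Return the full channel name, or raise if the abbreviation is unknown."""
    key = abbreviation.strip().upper()
    try:
        return CHANNEL_NAMES[key]
    except KeyError:
        # Failing loudly routes unknown codes to manual review instead of
        # silently writing an abbreviation into the metadata.
        raise ValueError(f"Unknown channel abbreviation: {abbreviation!r}") from None
```

&lt;p&gt;Raising an error for unknown codes, rather than passing them through, is one way to make sure no abbreviation reaches the metadata unnoticed.&lt;/p&gt;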
&lt;p&gt;📦 &lt;strong&gt;Packaging Files in E-ARK Format&lt;/strong&gt; &lt;br&gt; All files were packaged according to the E-ARK standard&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;, a European standard for long-term preservation. We used the open-source tool Commons-ip for both packaging and validation.&lt;/p&gt;
&lt;p&gt;📝 &lt;strong&gt;Documenting Preservation Activities with PREMIS&lt;/strong&gt; &lt;br&gt; Changes, anomalies, and technical conditions were documented using PREMIS&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt; metadata, ensuring the entire rearchiving process is traceable and verifiable in the future.&lt;/p&gt;
&lt;h2&gt;Technical Solution and Progress&lt;span class="hx:absolute hx:-mt-20" id="technical-solution-and-progress"&gt;&lt;/span&gt;
&lt;a href="#technical-solution-and-progress" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Work began in autumn 2024 with mapping and preparation. The actual rearchiving began in November 2024 and was completed on February 4, 2025. The process was automated using Apache NiFi&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt;, which managed data processing with high control over flow and capacity.&lt;/p&gt;
&lt;p&gt;By combining automation with targeted manual reviews, we significantly increased efficiency. Daily processing capacity rose from 25 to over 40 terabytes.&lt;/p&gt;
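&lt;p&gt;To put the two daily rates in perspective, the back-of-the-envelope Python calculation below shows how long 1,800 terabytes would take at each rate. It assumes a constant rate every day, which is a simplification and not a description of the actual schedule.&lt;/p&gt;

```python
import math

TOTAL_TB = 1800              # total volume stated in this post
RATES_TB_PER_DAY = (25, 40)  # daily capacity before and after the improvements


def days_needed(total_tb, rate_tb_per_day):
    """Whole days needed to move total_tb at a constant daily rate."""
    return math.ceil(total_tb / rate_tb_per_day)


for rate in RATES_TB_PER_DAY:
    print(f"{rate} TB/day: {days_needed(TOTAL_TB, rate)} days")
```

&lt;p&gt;Under that assumption, the estimate is 72 days at 25 terabytes per day and 45 days at 40 terabytes per day.&lt;/p&gt;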
&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2025-06-12-preservation-of-16-million-hours-of-television/datavolume.jpg" title="Overview of data volume rearchived per day. 1 TB = 1 GB" alt="Rearchiving per day in TeraBytes" loading="lazy" /&gt;
&lt;figcaption&gt;Overview of data volume rearchived per day. 1 TB = 1 GB&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2&gt;Findings and Anomalies&lt;span class="hx:absolute hx:-mt-20" id="findings-and-anomalies"&gt;&lt;/span&gt;
&lt;a href="#findings-and-anomalies" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;With a dataset this large, some anomalies were unavoidable. Here are some examples of what we encountered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Duplicates:&lt;/strong&gt; Some were caused by seasonal time changes (switching to and from daylight saving time); others were identical broadcasts from different NRK districts. The files were assessed, and the best versions were kept.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Incorrect Checksums:&lt;/strong&gt; Several files arrived with invalid checksums. After closer inspection, we found that all copies were in fact identical, indicating that the error occurred prior to delivery. We updated the checksums, and the issue was documented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Empty Files:&lt;/strong&gt; Around 30 files contained no data. In cases where it was possible, we retrieved the original files. Otherwise, the files were documented as empty with appropriate metadata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unknown File Types:&lt;/strong&gt; Some MP4 files had incorrect signatures and were unrecognized by file identification tools. Some turned out to be QuickTime files in the wrong container. These were also documented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Truncated Files:&lt;/strong&gt; Some files were technically cut short, containing only partial content. These were preserved with documented notes on the issue.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What We Learned&lt;span class="hx:absolute hx:-mt-20" id="what-we-learned"&gt;&lt;/span&gt;
&lt;a href="#what-we-learned" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;This project showed that when preserving large volumes of data over the long term, it is critical to have robust tools, standards, and reliable control routines. It also demonstrated the importance of combining automation with professional judgment.&lt;/p&gt;
&lt;p&gt;By using Apache NiFi for data flow and the E-ARK standard for archival packaging, we developed a scalable, reusable solution for future preservation efforts.&lt;/p&gt;
&lt;h2&gt;In Conclusion&lt;span class="hx:absolute hx:-mt-20" id="in-conclusion"&gt;&lt;/span&gt;
&lt;a href="#in-conclusion" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Television is a central part of our shared memory. By securing and structuring this material for the future, we enable tomorrow’s researchers, journalists, and the public to understand how Norway has evolved—hour by hour.&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Checksum"target="_blank" rel="noopener"&gt;https://en.wikipedia.org/wiki/Checksum&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://mediaarea.net/en/MediaInfo"target="_blank" rel="noopener"&gt;https://mediaarea.net/en/MediaInfo&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://mediaarea.net/MediaConch"target="_blank" rel="noopener"&gt;https://mediaarea.net/MediaConch&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://www.loc.gov/standards/mods/"target="_blank" rel="noopener"&gt;https://www.loc.gov/standards/mods/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://dilcis.eu/"target="_blank" rel="noopener"&gt;https://dilcis.eu/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://www.loc.gov/standards/premis/"target="_blank" rel="noopener"&gt;https://www.loc.gov/standards/premis/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;&lt;a href="https://nifi.apache.org/"target="_blank" rel="noopener"&gt;https://nifi.apache.org/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Pilot Project for Long-term Digital Preservation of Museum Collections</title><link>https://digitalpreservation.no/blog/2025-01-28-lam-longterm-preservation-pilot/</link><pubDate>Tue, 28 Jan 2025 08:00:00 +0100</pubDate><guid>https://digitalpreservation.no/blog/2025-01-28-lam-longterm-preservation-pilot/</guid><description>
&lt;p&gt;The National Library of Norway, in collaboration with &lt;a href="https://kulturit.org/en"target="_blank" rel="noopener"&gt;KulturIT&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; and commissioned by the Ministry of Culture, has initiated a pilot project focused on ensuring the long-term preservation of museums&amp;rsquo; digital cultural heritage.
This pilot represents a strategic initiative in preserving Norway&amp;rsquo;s digital cultural heritage for future generations.&lt;/p&gt;
&lt;p&gt;The primary objective is to evaluate whether the National Library&amp;rsquo;s digital preservation solution can be integrated with KulturIT&amp;rsquo;s collection management systems currently used by museums.
The pilot emphasizes security protocols, user experience optimization, and process automation, with subsequent evaluation based on cost-benefit analysis.&lt;/p&gt;
&lt;h2&gt;Key Objectives and Technical Components&lt;span class="hx:absolute hx:-mt-20" id="key-objectives-and-technical-components"&gt;&lt;/span&gt;
&lt;a href="#key-objectives-and-technical-components" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The pilot project encompasses several critical technical and operational elements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Technical Integration:&lt;/strong&gt; Implementation of machine-to-machine communication between KulturIT&amp;rsquo;s systems and the National Library&amp;rsquo;s preservation infrastructure, using REST API technology. This integration facilitates efficient ingestion and retrieval of archival files.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Intermediate Storage:&lt;/strong&gt; Development of an S3-protocol-based intermediate storage platform, ensuring secure and efficient file transfer operations between systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standardized Package Formats:&lt;/strong&gt; Implementation of E-ARK standards to maintain interoperability and ensure consistency in long-term data preservation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authentication and Authorization:&lt;/strong&gt; Development of robust security mechanisms to ensure that only authorized users and systems can access preserved files and associated functionality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Museum User Interface:&lt;/strong&gt; Implementation of an intuitive interface enabling museums to select preservation-worthy data and efficiently retrieve preserved data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pilot will be conducted with a minimum of two participating museums, initially focusing on the long-term preservation of digital images.
Beyond technical infrastructure testing, the solution will validate that all archival packages received from museums meet specified preservation requirements.&lt;/p&gt;
&lt;h2&gt;Architectural Overview&lt;span class="hx:absolute hx:-mt-20" id="architectural-overview"&gt;&lt;/span&gt;
&lt;a href="#architectural-overview" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The following architectural diagram illustrates the proposed solution framework:&lt;/p&gt;
&lt;figure&gt;&lt;img src="https://digitalpreservation.no/blog/2025-01-28-lam-longterm-preservation-pilot/dps-skisse-v2.svg"
alt="A diagram showing proposed data flow in the DPS software"&gt;&lt;figcaption&gt;
&lt;p&gt;Proposed DPS Architecture&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Clients:&lt;/strong&gt; Entities responsible for submitting or retrieving materials preserved by the National Library.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Intermediate Storage:&lt;/strong&gt; Platform services implementing shared intermediate storage solutions based on S3 protocol.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Archival Storage:&lt;/strong&gt; Bit-repository services where preservation data is archived following the 3+2+1 principle: three copies, on two storage technologies (one copy on disk and two on tape), with one copy kept at a geographically separate location.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DPS - Digital Preservation Services:&lt;/strong&gt; The National Library&amp;rsquo;s digital preservation solution, based on the OAIS standard.
This encompasses intermediate storage solutions, bit-repository management, API interfaces, and verification mechanisms for materials received and distributed within the National Library&amp;rsquo;s preservation environment.&lt;/li&gt;
&lt;/ul&gt;
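&lt;p&gt;The 3+2+1 principle mentioned under Archival Storage can be expressed as a small check. The Python sketch below is a hypothetical illustration and not part of DPS; the &lt;code&gt;Copy&lt;/code&gt; class, the site labels and the function name are assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    technology: str  # for example "disk" or "tape"
    site: str        # hypothetical site label


def satisfies_3_2_1(copies):
    """True if there are at least 3 copies, on 2 technologies, at 2 sites."""
    technologies = {c.technology for c in copies}
    sites = {c.site for c in copies}
    return len(copies) >= 3 and len(technologies) >= 2 and len(sites) >= 2


copies = [
    Copy("disk", "site-a"),
    Copy("tape", "site-a"),
    Copy("tape", "site-b"),
]
print(satisfies_3_2_1(copies))
```

&lt;p&gt;Here &amp;ldquo;one geographically distributed copy&amp;rdquo; is interpreted as the copies being spread over at least two sites.&lt;/p&gt;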
&lt;h2&gt;Pilot Scope Limitations&lt;span class="hx:absolute hx:-mt-20" id="pilot-scope-limitations"&gt;&lt;/span&gt;
&lt;a href="#pilot-scope-limitations" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The pilot explicitly excludes functionality for access copies or metadata updates.
However, the potential for future expansion to include these capabilities will be evaluated during the assessment phase.&lt;/p&gt;
&lt;h2&gt;Project Management and Reporting Structure&lt;span class="hx:absolute hx:-mt-20" id="project-management-and-reporting-structure"&gt;&lt;/span&gt;
&lt;a href="#project-management-and-reporting-structure" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The pilot is managed by the National Library of Norway in partnership with KulturIT and the Directorate of Culture.
The project team will maintain regular reporting cycles to the Ministry of Culture.
A comprehensive final report, including findings, cost estimates, and recommendations for future development, will be submitted by December 1, 2025.&lt;/p&gt;
&lt;p&gt;The National Library of Norway is looking forward to productive collaboration with KulturIT in developing solutions that will enhance museums&amp;rsquo; capabilities for preserving and accessing digital cultural heritage materials.
This pilot represents a significant step toward ensuring the long-term preservation of Norway&amp;rsquo;s digital cultural heritage.&lt;/p&gt;</description></item><item><title>Moving 4.3 million digital newspapers</title><link>https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/</link><pubDate>Wed, 06 Nov 2024 12:46:00 +0100</pubDate><guid>https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/</guid><description>
&lt;p&gt;The National Library of Norway is in the process of replacing its bit-repository purchased in 2007 with a more modern preservation solution for digital material.
This solution is based on in-house developed software called DPS (Digital Preservation Services) and uses IBM-HPSS as the underlying system for data storage.&lt;/p&gt;
&lt;h2&gt;Transition to the new preservation solution&lt;span class="hx:absolute hx:-mt-20" id="transition-to-the-new-preservation-solution"&gt;&lt;/span&gt;
&lt;a href="#transition-to-the-new-preservation-solution" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;In 2023, all archiving of data at the National Library was moved to the new DPS preservation solution.
By this time, the old bit-repository contained over 14 Petabytes&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; of digitized and deposited historical material, which must be re-archived to DPS.
A key part of this process is analysing and repackaging the historical data to meet the requirements of the new DPS.&lt;/p&gt;
&lt;p&gt;It was decided that the newspaper collection would be the first material type to be re-archived.
This process was carried out as a collaborative project between the Text team and the Digital Preservation team at the National Library.
The Text team is responsible for digitizing all text-based material, and the Digital Preservation team is responsible for preserving the digital collection.&lt;/p&gt;
&lt;h2&gt;Brief overview of the National Library’s digital newspaper collection&lt;span class="hx:absolute hx:-mt-20" id="brief-overview-of-the-national-librarys-digital-newspaper-collection"&gt;&lt;/span&gt;
&lt;a href="#brief-overview-of-the-national-librarys-digital-newspaper-collection" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;The collection consists of born-digital and digitized newspapers from 1763 up until today.&lt;/li&gt;
&lt;li&gt;The total newspaper collection consists of approximately 4.6 million newspapers across 1,800 newspaper titles. Of these, around 4.3 million newspapers were to be re-archived to DPS.&lt;/li&gt;
&lt;li&gt;In total, over 16 million packaged files had to be moved from the old bit-repository.&lt;/li&gt;
&lt;li&gt;This amounted to a total of approximately 2.5 Petabytes of data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The National Library’s digital newspaper collection comes from three sources:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Legally deposited PDF newspapers.&lt;/strong&gt;&lt;br&gt;
These are the print files of the newspapers, which are downloaded in PDF format. The newspapers are downloaded by the National Library from the newspaper publishers daily, then processed before being prepared for dissemination and digital preservation. As of 2024, a total of 220 published newspapers are received in this way on a daily/regular basis. This material makes up about 6% of the newspaper collection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Newspapers scanned from microfilm.&lt;/strong&gt;&lt;br&gt;
These are paper newspapers that were first photographed on microfilm, after which the microfilm was digitized. The digitization was carried out by commercial companies. Most of these newspapers were published before the year 2000. This material makes up about 41% of the newspaper collection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Newspapers scanned from paper.&lt;/strong&gt;&lt;br&gt;
Original paper newspapers that have been scanned and processed internally by the National Library. This material makes up about 53% of the newspaper collection.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Newspapers can vary greatly in size and number of pages, depending on the era of publication and the publication format.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/avisdigitalisering.jpg" alt="A photo of newspaper pages being turned as part of digitization" loading="lazy" /&gt;&lt;/p&gt;
&lt;p&gt;For almost 20 years, the National Library has systematically digitized and received digital newspapers on a large scale. The methods for doing this have evolved over time. This means that newspapers digitized or received 20 years ago were packaged and archived differently than the newspapers being digitized today.&lt;/p&gt;
&lt;p&gt;Here is an overview of what is done with digital/digitized newspapers that are archived today:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each individual page is OCR&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; processed.&lt;/li&gt;
&lt;li&gt;Each individual page goes through a structure recognition process where content is analysed to recognize images, headlines, publisher information, publication date, etc. This information is stored in separate OCR/ALTO&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt; files.&lt;/li&gt;
&lt;li&gt;Separate JP2K (.jpx) files are created for dissemination on nb.no. This also applies to PDF-deposited newspapers.&lt;/li&gt;
&lt;li&gt;Separate JP2K (.jp2) files are created for long term preservation.&lt;/li&gt;
&lt;li&gt;A METS&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt; structure is created for all newspapers. This links the content of each newspaper so that it is possible to virtually browse through the newspaper, search its content in free text, and get highlighted search results.&lt;/li&gt;
&lt;li&gt;As an example: A 20-page newspaper issue can consist of about 160 individual files. These include both metadata files and content files. Typical file types are JP2K-HQ, JP2K-LQ, OCR, ALTO, METS, JHOVE, and other metadata about the newspaper object.&lt;/li&gt;
&lt;li&gt;All similar file types are packed together in container files of the tar&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt; type. A newspaper issue consists of a folder with .tar files.&lt;/li&gt;
&lt;/ul&gt;
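&lt;p&gt;The last step in the list, packing similar files into .tar containers, could look roughly like the Python sketch below. It is not the production workflow: grouping by file extension is a simplification (ALTO and METS files are both XML, for instance, and would need to be told apart by role), and the function name and archive naming scheme are assumptions.&lt;/p&gt;

```python
import tarfile
from collections import defaultdict
from pathlib import Path


def pack_by_type(issue_dir, output_dir):
    """Write one .tar per file extension found directly in issue_dir."""
    issue_dir, output_dir = Path(issue_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    groups = defaultdict(list)
    for path in sorted(issue_dir.iterdir()):
        if path.is_file():
            # Group on the lowercased extension, for example "jp2" or "xml".
            groups[path.suffix.lstrip(".").lower() or "other"].append(path)

    archives = []
    for file_type, paths in sorted(groups.items()):
        archive = output_dir / f"{issue_dir.name}_{file_type}.tar"
        with tarfile.open(archive, "w") as tar:  # plain tar, no compression
            for path in paths:
                tar.add(path, arcname=path.name)
        archives.append(archive)
    return archives
```

&lt;p&gt;Calling &lt;code&gt;pack_by_type&lt;/code&gt; on a folder for one newspaper issue returns the list of .tar files written, one per file type.&lt;/p&gt;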
&lt;p&gt;This means that the oldest digitized newspaper material in the collection may not necessarily have been processed in the same way as today’s newspaper material.&lt;/p&gt;
&lt;h2&gt;Improvements/decisions that impacted the re-archiving process&lt;span class="hx:absolute hx:-mt-20" id="improvementsdecisions-that-impacted-the-re-archiving-process"&gt;&lt;/span&gt;
&lt;a href="#improvementsdecisions-that-impacted-the-re-archiving-process" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Before starting the re-archiving of newspapers from the old bit-repository to the new DPS, the content was analysed. Based on the analyses, several decisions were made that influenced the design of the new archival packages in DPS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All files that are re-archived must have accompanying checksums&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt;. Very few of the original newspaper files in the old bit-repository (SAM-FS) had any checksum information.&lt;/li&gt;
&lt;li&gt;Internal structure cleanup of existing archival packages was to be performed. Files that were not deemed preservation-worthy would be removed from the archival packages. Examples of such files include temporary status files from the production process.&lt;/li&gt;
&lt;li&gt;File identification would be performed on all files. The tool DROID&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt; was used for this. DROID uses the PRONOM&lt;sup id="fnref:8"&gt;&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref"&gt;8&lt;/a&gt;&lt;/sup&gt; register as authority.&lt;/li&gt;
&lt;li&gt;Documentation of what was done with each file during the re-archiving process was required. This was to be done by creating specific &amp;ldquo;events&amp;rdquo; in the DPS database, based on the PREMIS&lt;sup id="fnref:9"&gt;&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref"&gt;9&lt;/a&gt;&lt;/sup&gt; standard.&lt;/li&gt;
&lt;li&gt;Data sent to DPS would be deleted in the old bit-repository as soon as DPS confirmed that the data was archived there.&lt;/li&gt;
&lt;/ul&gt;
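&lt;p&gt;To give an idea of what documenting each action as an &amp;ldquo;event&amp;rdquo; can involve, here is a minimal Python sketch. It does not show the DPS database schema: the class, its field names and the in-memory log are assumptions, loosely modelled on PREMIS event semantic units such as eventType, eventDateTime and eventOutcome.&lt;/p&gt;

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    event_type: str  # for example "format identification"
    object_id: str   # identifier of the file the event concerns
    outcome: str     # for example "success" or "failure"
    detail: str = ""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_datetime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_event(log, event_type, object_id, outcome, detail=""):
    """Append an event to an in-memory log and return it as a plain dict."""
    event = PreservationEvent(event_type, object_id, outcome, detail)
    log.append(asdict(event))
    return log[-1]
```

&lt;p&gt;Each call yields one self-contained record with its own identifier and UTC timestamp, so the history of a file can later be reconstructed from the events alone.&lt;/p&gt;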
&lt;h2&gt;Timeline for the re-archiving process&lt;span class="hx:absolute hx:-mt-20" id="timeline-for-the-re-archiving-process"&gt;&lt;/span&gt;
&lt;a href="#timeline-for-the-re-archiving-process" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;div class="hextra-steps hx:ml-4 hx:mb-12 hx:ltr:border-l hx:rtl:border-r hx:border-gray-200 hx:ltr:pl-6 hx:rtl:pr-6 hx:dark:border-neutral-800 [counter-reset:step]"&gt;
&lt;h3&gt;March-April 2023&lt;span class="hx:absolute hx:-mt-20" id="march-april-2023"&gt;&lt;/span&gt;
&lt;a href="#march-april-2023" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;The two teams began with preliminary investigations and mapping of the newspaper material in the old bit-repository in the spring of 2023.&lt;/p&gt;
&lt;h3&gt;May 2023 (migration start)&lt;span class="hx:absolute hx:-mt-20" id="may-2023-migration-start"&gt;&lt;/span&gt;
&lt;a href="#may-2023-migration-start" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;A test migration to DPS was conducted in May 2023. This revealed the need to clean up the source material in the bit-repository. A production workflow was created specifically for re-archiving newspapers to DPS.&lt;/p&gt;
&lt;h3&gt;June-July 2023&lt;span class="hx:absolute hx:-mt-20" id="june-july-2023"&gt;&lt;/span&gt;
&lt;a href="#june-july-2023" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;During the summer months of 2023, the process of generating trustworthy checksums for the newspaper files started. This is described in more detail in a separate &lt;a href="https://digitalpreservation.no/checksum-generation"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;October 2023&lt;span class="hx:absolute hx:-mt-20" id="october-2023"&gt;&lt;/span&gt;
&lt;a href="#october-2023" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Large scale re-archiving of deposited newspapers and newspapers scanned from microfilm began in October 2023. In total, approximately 1PB of data was re-archived. The process was then halted due to delays in the delivery of new data storage resources to DPS for purchase and tendering reasons. We simply ran out of storage.&lt;/p&gt;
&lt;h3&gt;January 2024&lt;span class="hx:absolute hx:-mt-20" id="january-2024"&gt;&lt;/span&gt;
&lt;a href="#january-2024" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;In January 2024, re-archiving of the newspapers scanned by the National Library began. This amounted to approximately 1.5 PB of data.&lt;/p&gt;
&lt;h3&gt;June 2024 (migration completed)&lt;span class="hx:absolute hx:-mt-20" id="june-2024-migration-completed"&gt;&lt;/span&gt;
&lt;a href="#june-2024-migration-completed" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;!-- Re-archiving of all newspapers to DPS was completed on June 1, 2024. --&gt;
&lt;/div&gt;
&lt;h4&gt;Data volume processed over time&lt;span class="hx:absolute hx:-mt-20" id="data-volume-processed-over-time"&gt;&lt;/span&gt;
&lt;a href="#data-volume-processed-over-time" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/image2.png" title="The highest volume of data processed and re-archived in one day was: 42.15 TB." alt="" loading="lazy" /&gt;
&lt;figcaption&gt;The highest volume of data processed and re-archived in one day was: 42.15 TB.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h4&gt;Newspaper volume processed over time&lt;span class="hx:absolute hx:-mt-20" id="newspaper-volume-processed-over-time"&gt;&lt;/span&gt;
&lt;a href="#newspaper-volume-processed-over-time" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/image3.png" title="The highest number of individual newspapers migrated in one day was: 97,528." alt="" loading="lazy" /&gt;
&lt;figcaption&gt;The highest number of individual newspapers migrated in one day was: 97,528.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2&gt;Results and lessons learned&lt;span class="hx:absolute hx:-mt-20" id="results-and-lessons-learned"&gt;&lt;/span&gt;
&lt;a href="#results-and-lessons-learned" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;The archiving speed varied considerably over time, with rates reaching 42 TB per day for simpler newspaper materials. This demonstrated HPSS&amp;rsquo;s considerable ingestion capacity and helped us establish realistic timelines for future migrations. However, complex materials (like those requiring OCR reprocessing) had significantly lower throughput rates, due to additional pre-processing time.&lt;/li&gt;
&lt;li&gt;We experienced an unplanned pause in migration due to lack of storage media. This highlighted that large-scale migrations require comprehensive planning beyond technical considerations. Critical administrative prerequisites include active procurement agreements, completed tender processes, and streamlined purchasing procedures - all of which need significant lead time and budgetary planning.&lt;/li&gt;
&lt;li&gt;Old data can be messy, and migration projects present valuable opportunities for cleanup.
&lt;ul&gt;
&lt;li&gt;We discovered and removed 4.5 million 0-byte files - remnants of temporary status tracking from historical digitization processes. This cleanup removes unnecessary materials and aligns our preservation packages &lt;a href="https://digitalpreservation.no/docs/principles/nln-digipres-principles-en/#ensure-that-digital-preservation-is-done-in-a-sustainable-way"&gt;with our principles&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;We cleaned up material that was incorrectly packaged or had incomplete information, ensuring that the newspaper collection is uniformly archived in DPS.&lt;/li&gt;
&lt;li&gt;Poor OCR quality was identified for two years of newspaper volumes (53,678 issues, 107 TB) from 2010-2011 - the first two years of OCR processing. The files were reprocessed using the current newspaper production workflow, to produce new and improved OCR.&lt;/li&gt;
&lt;li&gt;Some newspapers digitized from microfilm were of very poor quality. These were replaced with digital copies from scans of the original paper newspapers. In total, this amounted to 2,051 newspaper issues. This illustrates how original source material often produces better results than secondary generations of copies.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
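&lt;p&gt;The 0-byte cleanup described above can be illustrated with a minimal sketch. It assumes the files sit in an ordinary directory tree; the function name and the demo file names are hypothetical and are not taken from the actual migration tooling.&lt;/p&gt;

```python
import os
import tempfile


def find_zero_byte_files(root):
    """Return sorted paths of regular files under root whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that only real files are reported.
            if os.path.islink(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


# Demonstration on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "page_0001.jp2"), "wb") as f:
        f.write(b"image bytes")
    # An empty marker file, similar in spirit to a leftover status file.
    open(os.path.join(demo_root, "status.done"), "wb").close()
    demo_hits = [os.path.basename(p) for p in find_zero_byte_files(demo_root)]
    print(demo_hits)
```

&lt;p&gt;Listing candidates first, and deleting only after review, keeps the cleanup auditable.&lt;/p&gt;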
&lt;p&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2024-11-04-rearchiving-newspapers/image1.png" title="Svein Erik Molberg and Mona Løkås from the Text Team, Vigdis Sørensen from Digital Preservation." alt="A photo of Svein Erik Molberg and Mona Løkås from the Text Team, Vigdis Sørensen from Digital Preservation." loading="lazy" /&gt;
&lt;figcaption&gt;Svein Erik Molberg and Mona Løkås from the Text Team, Vigdis Sørensen from Digital Preservation.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2&gt;Some numbers&lt;span class="hx:absolute hx:-mt-20" id="some-numbers"&gt;&lt;/span&gt;
&lt;a href="#some-numbers" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;h3&gt;Overview of re-archived newspapers into DPS&lt;span class="hx:absolute hx:-mt-20" id="overview-of-re-archived-newspapers-into-dps"&gt;&lt;/span&gt;
&lt;a href="#overview-of-re-archived-newspapers-into-dps" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Newspaper editions&lt;/th&gt;
&lt;th&gt;Newspaper editions in %&lt;/th&gt;
&lt;th&gt;Data volume in TB&lt;/th&gt;
&lt;th&gt;Data volume in %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Born digital&lt;/td&gt;
&lt;td&gt;247 385&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microfilm&lt;/td&gt;
&lt;td&gt;1 779 043&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;td&gt;784&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paper base&lt;/td&gt;
&lt;td&gt;2 250 038&lt;/td&gt;
&lt;td&gt;53%&lt;/td&gt;
&lt;td&gt;1 580&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Re-archived to DPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 276 466&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2 439&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Overview of all newspapers in DPS (re-archived and new acquisitions since 2023)&lt;span class="hx:absolute hx:-mt-20" id="overview-of-all-newspapers-in-dps-re-archived-and-new-acquisitions-since-2023"&gt;&lt;/span&gt;
&lt;a href="#overview-of-all-newspapers-in-dps-re-archived-and-new-acquisitions-since-2023" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Newspaper editions&lt;/th&gt;
&lt;th&gt;Newspaper editions in %&lt;/th&gt;
&lt;th&gt;Data volume in TB&lt;/th&gt;
&lt;th&gt;Data volume in %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Born digital&lt;/td&gt;
&lt;td&gt;302 372&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;td&gt;87&lt;/td&gt;
&lt;td&gt;3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microfilm&lt;/td&gt;
&lt;td&gt;1 899 481&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;td&gt;820&lt;/td&gt;
&lt;td&gt;31%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paper base&lt;/td&gt;
&lt;td&gt;2 397 660&lt;/td&gt;
&lt;td&gt;52%&lt;/td&gt;
&lt;td&gt;1 703&lt;/td&gt;
&lt;td&gt;66%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total in DPS today&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 599 513&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2 610&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;1 Petabyte = 1,000 Terabytes&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Optical_character_recognition"target="_blank" rel="noopener"&gt;https://en.wikipedia.org/wiki/Optical_character_recognition&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://www.loc.gov/standards/alto/techcenter/elementSet/index.html"target="_blank" rel="noopener"&gt;https://www.loc.gov/standards/alto/techcenter/elementSet/index.html&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://www.loc.gov/standards/mets/"target="_blank" rel="noopener"&gt;https://www.loc.gov/standards/mets/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Tar_(computing)"target="_blank" rel="noopener"&gt;https://en.wikipedia.org/wiki/Tar_(computing)&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums"target="_blank" rel="noopener"&gt;https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;&lt;a href="https://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/file-profiling-tool-droid"target="_blank" rel="noopener"&gt;https://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/file-profiling-tool-droid&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:8"&gt;
&lt;p&gt;&lt;a href="https://www.nationalarchives.gov.uk/pronom/"target="_blank" rel="noopener"&gt;https://www.nationalarchives.gov.uk/pronom/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:9"&gt;
&lt;p&gt;&lt;a href="https://www.loc.gov/standards/premis/"target="_blank" rel="noopener"&gt;https://www.loc.gov/standards/premis/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Better Late Than Never: Adding Checksums to 16 Million Legacy Files</title><link>https://digitalpreservation.no/blog/2024-11-04-checksums/</link><pubDate>Wed, 06 Nov 2024 12:45:00 +0100</pubDate><guid>https://digitalpreservation.no/blog/2024-11-04-checksums/</guid><description>
&lt;p&gt;The National Library of Norway has used SAM-FS&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; as a system for long-term storage and archiving of large amounts of data since 2007.
SAM-FS contains 14 Petabytes of data and will soon reach &amp;ldquo;end of life&amp;rdquo; status as a product.&lt;/p&gt;
&lt;p&gt;In 2022, the National Library decided to replace SAM-FS with a more modern preservation solution for digital material.
This new solution is based on in-house developed software called DPS (Digital Preservation Services) and uses IBM-HPSS as the underlying system for data storage.&lt;/p&gt;
&lt;p&gt;Over the last 10 years, the National Library has used checksums&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; as a verification technique for preserved data.
In this context, a checksum is a calculated hash string used to verify that a data file has not been subject to any changes.
Common checksum calculation algorithms include MD5, SHA-1, SHA-256, or SHA-512.
The National Library uses MD5&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;Lack of checksums&lt;span class="hx:absolute hx:-mt-20" id="lack-of-checksums"&gt;&lt;/span&gt;
&lt;a href="#lack-of-checksums" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Many of the oldest files in SAM-FS &lt;strong&gt;lacked&lt;/strong&gt; checksums when they were stored.
As all files in SAM-FS are stored in three copies, you could say that without an accompanying checksum the three copies exist independently of each other.
If a discrepancy were to arise between the three copies, we would have no original checksum to use for verification.&lt;/p&gt;
&lt;figure&gt;&lt;img src="https://digitalpreservation.no/blog/2024-11-04-checksums/checksum1.svg"
alt="Diagram showing the data flow in and out of SAM-FS"&gt;&lt;figcaption&gt;
&lt;p&gt;Data stored in SAM-FS in 3 instances&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As part of the process of migrating data from SAM-FS to the new DPS, it was decided that checksums should be calculated and stored for all files that did not already have one.&lt;/p&gt;
&lt;h2&gt;Challenge&lt;span class="hx:absolute hx:-mt-20" id="challenge"&gt;&lt;/span&gt;
&lt;a href="#challenge" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;How could we ensure that files being transferred from SAM-FS to DPS were the same as those originally archived when there were no checksums to verify this?
The oldest files were over 20 years old and had been subjected to up to five hardware/platform migrations over time (see Figure 4).&lt;/p&gt;
&lt;p&gt;Which of the three file instances stored in SAM-FS should we choose as the starting point for migration?
How could we know that this was the &amp;ldquo;correct file&amp;rdquo; without having to read and compare all three instances?
Reading and comparing all three instances was considered to be difficult, as this involved reading and processing many Petabytes&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt; of data on the same infrastructure that was also used for daily operations.&lt;/p&gt;
&lt;h2&gt;Solution&lt;span class="hx:absolute hx:-mt-20" id="solution"&gt;&lt;/span&gt;
&lt;a href="#solution" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The fact that we had the files preserved in multiple instances helped us, as we were about to generate checksums for &amp;ldquo;old&amp;rdquo; files for the first time.
One copy is stored on disk (disk copy), and two copies are on tape (tape copy 1 and tape copy 2).&lt;/p&gt;
&lt;figure&gt;&lt;img src="https://digitalpreservation.no/blog/2024-11-04-checksums/checksum2.svg"
alt="Diagram showing data flow in checksum generation"&gt;&lt;figcaption&gt;
&lt;p&gt;Checksum calculation&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;First (step 2), we created a script that extracted all files in SAM-FS without checksums from one of the tape copies.
A checksum was then calculated for each individual file and stored in a database.&lt;/p&gt;
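&lt;p&gt;As an illustration of this step, the following minimal sketch calculates an MD5 checksum by reading a file in chunks and stores the result in a database. SQLite, the table layout and the function names are assumptions made for the example; the post does not say which database or tooling was actually used.&lt;/p&gt;

```python
import hashlib
import os
import sqlite3
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(conn, path):
    """Calculate the checksum for path and store it, keyed by the path."""
    checksum = md5_of_file(path)
    conn.execute(
        "INSERT OR REPLACE INTO checksums (path, md5) VALUES (?, ?)",
        (path, checksum),
    )
    conn.commit()
    return checksum


# Hypothetical schema: one row per file path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checksums (path TEXT PRIMARY KEY, md5 TEXT NOT NULL)")

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"hello world")
    stored = record_checksum(conn, sample)
    row = conn.execute(
        "SELECT md5 FROM checksums WHERE path = ?", (sample,)
    ).fetchone()
```

&lt;p&gt;Storing the checksum separately from the file is what later allows a second copy to be verified against it.&lt;/p&gt;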
&lt;p&gt;This process took just over 2 calendar months to complete for a dataset of 2.5 Petabytes.&lt;/p&gt;
&lt;figure&gt;&lt;img src="https://digitalpreservation.no/blog/2024-11-04-checksums/checksum3.svg"
alt="Diagram showing data flow in checksum verification and data transfer to the DPS"&gt;&lt;figcaption&gt;
&lt;p&gt;Checksum verification and transfer to DPS&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The actual task of transferring files from SAM-FS to DPS began after all the files on tape had been read, and checksums had been calculated and stored in a database.&lt;/p&gt;
&lt;p&gt;The migration then started by retrieving a file from the disk instance in SAM-FS (step 3).
A checksum for this file was then calculated and compared with the checksum stored for the corresponding tape copy file.
If these matched, it meant that at least two of the three file copies were identical.&lt;/p&gt;
&lt;p&gt;The file was then considered to be OK and was transferred to DPS with the accompanying new checksum (step 4).
If the checksum did not match, it meant that either the disk or tape copy of the file was incorrect.
This was then reported as an error and had to be followed up manually by checking the third copy to see if it matched one of the two that had already been checked.&lt;/p&gt;
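&lt;p&gt;The decision logic in steps 3 and 4 can be summarised in a short sketch. The function name, the status strings and the return shape are hypothetical; in the real process, the check of the third copy was a manual follow-up, which the optional third argument only models.&lt;/p&gt;

```python
def verify_before_transfer(disk_md5, tape1_md5, tape2_md5=None):
    """Decide what to do with a file based on the checksums of its copies.

    Returns a (status, checksum) tuple:
      "transfer": the disk copy matches the stored tape checksum, so the
          file goes to DPS together with that checksum.
      "manual-follow-up": the two disagree. The checksum is the value that
          the third copy confirms, or None when the third copy is unknown
          or matches neither of the other two.
    """
    if disk_md5 == tape1_md5:
        return ("transfer", disk_md5)
    if tape2_md5 is not None and tape2_md5 in (disk_md5, tape1_md5):
        return ("manual-follow-up", tape2_md5)
    return ("manual-follow-up", None)


# Example values are placeholders, not real checksums.
print(verify_before_transfer("aa", "aa"))
print(verify_before_transfer("aa", "bb", "bb"))
```

&lt;p&gt;Comparing two copies first means the third copy only has to be read in the rare case of a mismatch, which keeps the load on the shared infrastructure down.&lt;/p&gt;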
&lt;h2&gt;Outcome and lessons learned&lt;span class="hx:absolute hx:-mt-20" id="outcome-and-lessons-learned"&gt;&lt;/span&gt;
&lt;a href="#outcome-and-lessons-learned" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;None of the 16 million files in the dataset of 2.5 Petabytes had checksum discrepancies.
This method of ensuring that the files in DPS are authentic after migration was both time- and resource-consuming, but it proved to work well for us.&lt;/p&gt;
&lt;p&gt;Another lesson was that, in our experience, the storage systems could be trusted to preserve the bit pattern over time.
We have checksum-verified 16 million files that have undergone up to five technological shifts over 20 years, without finding any trace of changes in the bit pattern of any of the files.&lt;/p&gt;
&lt;h3&gt;Platform/Technology generations in SAM-FS&lt;span class="hx:absolute hx:-mt-20" id="platformtechnology-generations-in-sam-fs"&gt;&lt;/span&gt;
&lt;a href="#platformtechnology-generations-in-sam-fs" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Overview of technology shifts in the National Library&amp;rsquo;s SAM-FS long-term storage system.
The &lt;em&gt;TB&lt;/em&gt; value here is native storage capacity per storage unit, disk/tape:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Time period&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Disk copy&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Tape copy 1&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Tape copy 2&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2007-2009&lt;/td&gt;
&lt;td&gt;SUN 6140 (1TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kA (500GB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kA (500GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009-2011&lt;/td&gt;
&lt;td&gt;SUN 6180 (2TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kB (1TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kB (1TB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2012-2016&lt;/td&gt;
&lt;td&gt;Nexsan (3TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kC (5TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – T10kC (5TB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2016-2019&lt;/td&gt;
&lt;td&gt;Nexsan (8TB)&lt;/td&gt;
&lt;td&gt;&lt;em&gt;as above&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;em&gt;as above&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020-2022&lt;/td&gt;
&lt;td&gt;Fujitsu (16TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – LTO8 (12TB)&lt;/td&gt;
&lt;td&gt;SUN SL8500 – LTO8 (12TB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Hierarchical Storage Management system. SAM-FS is also known as Oracle HSM.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums"target="_blank" rel="noopener"&gt;https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/MD5"target="_blank" rel="noopener"&gt;https://en.wikipedia.org/wiki/MD5&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;1 Petabyte = 1,000 Terabytes&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Digital preservation poster</title><link>https://digitalpreservation.no/blog/2024-09-30-poster/</link><pubDate>Mon, 30 Sep 2024 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2024-09-30-poster/</guid><description>
&lt;p&gt;In early 2023 the digital preservation team had a poster made to spread awareness of digital preservation in the wider organization.
The poster was drawn by our former colleague &lt;a href="https://stovis.no"target="_blank" rel="noopener"&gt;Vegard Orheim Stoveland&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; in collaboration with the digital preservation team.
The poster was based on the first revision of the newly defined &lt;a href="https://digitalpreservation.no/docs/principles/nln-digipres-principles-en/"&gt;principles&lt;/a&gt; for digital preservation in the National Library.
The poster aimed to illustrate why digital preservation is important, and how we intend to do it.&lt;/p&gt;
&lt;p&gt;We hoped to share a revised version of the poster at iPRES 2024, but sadly it did not get accepted&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;.
Of course, that means we don&amp;rsquo;t have to save it for iPRES any longer, and can share it freely here.&lt;/p&gt;
&lt;h2&gt;English version&lt;span class="hx:absolute hx:-mt-20" id="english-version"&gt;&lt;/span&gt;
&lt;a href="#english-version" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;&lt;a href="2023-10-04-digital-preservation-vector.jpg"&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2024-09-30-poster/2023-10-04-digital-preservation-vector.jpg" title="click the image for full resolution" alt="English language digital preservation poster" loading="lazy" /&gt;
&lt;figcaption&gt;click the image for full resolution&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Full resolution PDF (16 MB) can be found &lt;a href="2023-10-04-digital-preservation-vector.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Norwegian version&lt;span class="hx:absolute hx:-mt-20" id="norwegian-version"&gt;&lt;/span&gt;
&lt;a href="#norwegian-version" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;&lt;a href="2023-03-05-digital-bevaring-horisontal.jpg"&gt;&lt;figure&gt;
&lt;img src="https://digitalpreservation.no/blog/2024-09-30-poster/2023-03-05-digital-bevaring-horisontal.jpg" title="click the image for full resolution PDF (16 MB)" alt="Norwegian language digital preservation poster" loading="lazy" /&gt;
&lt;figcaption&gt;click the image for full resolution PDF (16 MB)&lt;/figcaption&gt;
&lt;/figure&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Full resolution PDF (16 MB) can be found &lt;a href="2023-03-05-digital-bevaring-horisontal.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;However, our poster showing our Grafana monitoring dashboards was accepted!&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Rearchiving 2 million hours of digital radio, a comprehensive process</title><link>https://digitalpreservation.no/blog/2024-08-28-rearchiving-2-million-hours-of-digital-radio/</link><pubDate>Wed, 28 Aug 2024 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2024-08-28-rearchiving-2-million-hours-of-digital-radio/</guid><description>
&lt;p&gt;The National Library is in the process of a major overhaul of its 2007 bit-repository, replacing it with a contemporary digital preservation system. This new solution is based on an in-house developed system called DPS (Digital Preservation Services), which uses IBM-HPSS as the underlying bit repository for data storage. This transition, which is expected to span over a couple of years, is necessary to ensure the long-term preservation and accessibility of the National Library&amp;rsquo;s digital collection.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2024-08-28-rearchiving-2-million-hours-of-digital-radio/radio.jpg" alt="Old radio" loading="lazy" /&gt;&lt;/p&gt;
&lt;h2&gt;Transition to a New Preservation Solution&lt;span class="hx:absolute hx:-mt-20" id="transition-to-a-new-preservation-solution"&gt;&lt;/span&gt;
&lt;a href="#transition-to-a-new-preservation-solution" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;In 2023, all new data archiving was transferred to the new DPS preservation solution. At this time, the old bit repository contained over 14 Petabytes of digitized and legally deposited historical material, which needs to be re-archived into DPS. A key part of this process involves analyzing and repackaging the historical data to meet the new DPS requirements.&lt;/p&gt;
&lt;h2&gt;Historical Legally Deposited Radio&lt;span class="hx:absolute hx:-mt-20" id="historical-legally-deposited-radio"&gt;&lt;/span&gt;
&lt;a href="#historical-legally-deposited-radio" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Among the materials to be re-archived are 2.2 million hours of digital radio, equivalent to 2.5 million files and a total of 1 Petabyte of data. This includes both born-digital and digitized radio programs from the period 1993-2022.&lt;/p&gt;
&lt;p&gt;In 1993, there were four radio channels delivering 16,500 hours of radio. By 2022, the number of radio channels had increased to 30, collectively delivering 150,000 hours of radio. With the phasing out of the old bit repository, it became necessary to move this data to the new preservation solution.&lt;/p&gt;
&lt;h2&gt;DSM to DPS: A Thorough Process&lt;span class="hx:absolute hx:-mt-20" id="dsm-to-dps-a-thorough-process"&gt;&lt;/span&gt;
&lt;a href="#dsm-to-dps-a-thorough-process" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;DSM (Digital Longterm Storage) has been the National Library&amp;rsquo;s internal management system for legally deposited radio for the past 20 years. The data has been stored in an Oracle HSM bit-repository in three instances (disk, tape, tape), and the radio material was fetched daily from various broadcasters. Some radio broadcasts were stored as mp3 and wav files, with accompanying checksum files. Other broadcasts were only stored as mp3.&lt;/p&gt;
&lt;p&gt;Before the re-archiving process began, it was decided to generate new MP4 playback files from the wav files to replace the old mp3 files, which varied in quality. The new MP4 files are not archived in DPS, as they are secured on the Wowza viewing platform. The playback format chosen was 160 kbit/s AAC in an M4A (audio) container, encoded with Fraunhofer FDK AAC (libfdk_aac).&lt;/p&gt;
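&lt;p&gt;For illustration, a conversion of this kind could be driven by a command such as the one built below. The use of ffmpeg is an assumption (the post names only the codec, libfdk_aac, which requires an ffmpeg build with that library enabled), and the function name and file names are hypothetical.&lt;/p&gt;

```python
def build_aac_command(wav_path, m4a_path, bitrate="160k"):
    """Build an ffmpeg argument list that encodes a wav file to AAC in M4A."""
    return [
        "ffmpeg",
        "-i", wav_path,          # input wav file
        "-vn",                   # audio only, no video stream
        "-c:a", "libfdk_aac",    # Fraunhofer FDK AAC encoder
        "-b:a", bitrate,         # target audio bitrate
        m4a_path,                # container is chosen from the .m4a extension
    ]


# Hypothetical file names; the list could be passed to subprocess.run().
command = build_aac_command("broadcast.wav", "broadcast.m4a")
print(" ".join(command))
```

&lt;p&gt;Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.&lt;/p&gt;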
&lt;h2&gt;Technical Improvements and Metadata&lt;span class="hx:absolute hx:-mt-20" id="technical-improvements-and-metadata"&gt;&lt;/span&gt;
&lt;a href="#technical-improvements-and-metadata" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;All objects were packaged according to the &lt;a href="https://earkaip.dilcis.eu/"target="_blank" rel="noopener"&gt;E-ARK standard&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;, and metadata was generated in MODS format. File identification was performed using the tool Siegfried, and technical metadata was extracted using &lt;a href="https://mediaarea.net/en/MediaInfo"target="_blank" rel="noopener"&gt;Mediainfo&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;. Each file was carefully processed and documented, and &amp;ldquo;events&amp;rdquo; were created to record what had been done. These were stored in the DPS database.&lt;/p&gt;
&lt;h2&gt;Timeline and Experience&lt;span class="hx:absolute hx:-mt-20" id="timeline-and-experience"&gt;&lt;/span&gt;
&lt;a href="#timeline-and-experience" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Preliminary studies and mapping began in the fall of 2023, and the development of the re-archiving workflow, based on Apache NiFi, started in December 2023. The first version was put into operation on February 27, 2024, and by June 24, 2024, the re-archiving of 2.2 million radio programs was completed.&lt;/p&gt;
&lt;p&gt;The generation of new MP4 files was time-consuming, but an infrastructure running 35 parallel threads made it possible to achieve a re-archiving rate of 9 Terabytes per day.&lt;/p&gt;
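&lt;p&gt;As a rough sanity check, the dates and volume given in this post can be turned into an average daily rate. The sketch assumes that the full 1 Petabyte (taken as 1,000 Terabytes) was processed between the first production run and completion, which the post does not state explicitly.&lt;/p&gt;

```python
from datetime import date

start = date(2024, 2, 27)   # first version put into operation
end = date(2024, 6, 24)     # re-archiving completed
elapsed_days = (end - start).days

total_tb = 1000             # assumption: 1 Petabyte = 1,000 Terabytes
average_tb_per_day = total_tb / elapsed_days

print(elapsed_days, round(average_tb_per_day, 1))
```

&lt;p&gt;Under these assumptions the average comes out slightly below 9 Terabytes per day, which is consistent with that figure being a rate achieved on active processing days rather than a calendar average.&lt;/p&gt;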
&lt;h2&gt;Findings and Lessons Learned&lt;span class="hx:absolute hx:-mt-20" id="findings-and-lessons-learned"&gt;&lt;/span&gt;
&lt;a href="#findings-and-lessons-learned" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;During the process, discrepancies were found in 577 out of 2.2 million objects. None of the errors occurred while the data was stored in the Oracle HSM bit repository. Most errors are related to inadequate control routines when the material was first received and archived.&lt;/p&gt;
&lt;h3&gt;Checksum mismatch (4 cases)&lt;span class="hx:absolute hx:-mt-20" id="checksum-mismatch-4-cases"&gt;&lt;/span&gt;
&lt;a href="#checksum-mismatch-4-cases" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Objects failed because the recorded checksum differed from the checksum calculated during re-archiving.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; For each of these objects, all three copies in Oracle HSM (disk, tape, tape) were checked and found to be identical. This means the discrepancy between the actual checksum and the recorded checksum must have occurred before the object was first archived. For these four objects, we chose to include the MP3 file in the preserved object, as it had the correct checksum. Additionally, an &amp;ldquo;event&amp;rdquo; was recorded describing why we also preserved the MP3 file.&lt;/li&gt;
&lt;/ul&gt;
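&lt;p&gt;The kind of fixity check described above can be sketched as follows. This is a minimal illustration, not the NiFi workflow itself: the checksum algorithm (MD5 here), the function names, and the report fields are all assumptions, since the post does not specify them.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str, algorithm: str = "md5") -> dict:
    """Compare a recorded checksum with one calculated from the file as it is now."""
    calculated = file_checksum(path, algorithm)
    return {
        "file": str(path),
        "size_bytes": Path(path).stat().st_size,
        "recorded": recorded.lower(),
        "calculated": calculated,
        "match": calculated == recorded.lower(),
    }
```

&lt;p&gt;Running the same check against each stored copy, and finding that all copies agree with each other but not with the recorded value, is what points to an error that predates archiving.&lt;/p&gt;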
&lt;h3&gt;Null-byte file (1 case)&lt;span class="hx:absolute hx:-mt-20" id="null-byte-file-1-case"&gt;&lt;/span&gt;
&lt;a href="#null-byte-file-1-case" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; An object whose WAV file was a null-byte file, causing a checksum mismatch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; We checked all three copies in Oracle HSM and found they were identical. The discrepancy between the actual checksum and the recorded checksum must have occurred before the object was stored in Oracle HSM. For this object, we chose to include the MP3 file in the preserved object, as it had content and the correct checksum. An &amp;ldquo;event&amp;rdquo; was also recorded to explain why the MP3 file was preserved.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Corrupt file (1 case)&lt;span class="hx:absolute hx:-mt-20" id="corrupt-file-1-case"&gt;&lt;/span&gt;
&lt;a href="#corrupt-file-1-case" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; A file with the correct checksum that failed because we couldn&amp;rsquo;t extract &amp;ldquo;duration&amp;rdquo; with MediaInfo, revealing that the file was corrupt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; We chose to archive the corrupt file in the absence of alternatives and recorded an &amp;ldquo;event&amp;rdquo; describing our findings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;URN validation failed (507 cases)&lt;span class="hx:absolute hx:-mt-20" id="urn-validation-failed-507-cases"&gt;&lt;/span&gt;
&lt;a href="#urn-validation-failed-507-cases" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Objects with a URN that was not in the expected format. The URN validator extracts data from the URN and uses it to build MODS metadata for the object to be archived. In these cases, the validator indicated that the URN content did not match what it expected. The discrepancy appears to stem from file delivery errors from one broadcaster during a short period in October 2011.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; URNs were changed to the correct format.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Duplicate files (48 cases)&lt;span class="hx:absolute hx:-mt-20" id="duplicate-files-48-cases"&gt;&lt;/span&gt;
&lt;a href="#duplicate-files-48-cases" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; During the re-archiving process, it was discovered that 48 objects from DSM were already archived in DPS. This is related to the transition of radio production workflows on March 21, 2023. At that time, daily legal deposit of radio was moved from DSM to the new DPS. In this process, the first datasets were manually loaded into the new DPS to ensure the transition was correct. By mistake, they were also captured by the old production workflow and archived in DSM. The data from the transition day was therefore stored in both the new DPS and the old DSM. When we started re-archiving the radio, we encountered duplicate errors for the programs from that day since they were already archived in DPS.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; No action was needed as the data is correctly archived in DPS.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Duplicate files (16 cases)&lt;span class="hx:absolute hx:-mt-20" id="duplicate-files-16-cases"&gt;&lt;/span&gt;
&lt;a href="#duplicate-files-16-cases" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cause:&lt;/strong&gt; After re-archiving was completed, there were 16 objects that, for various reasons, remained in the queue marked as not archived but were actually archived in DPS. These are mostly related to a cleanup after a backup-related incident on March 1, 2024.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;/strong&gt; No action was needed as the data is correctly archived in DPS.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Statistics and Results&lt;span class="hx:absolute hx:-mt-20" id="statistics-and-results"&gt;&lt;/span&gt;
&lt;a href="#statistics-and-results" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The re-archiving process resulted in 2.1 million new MP4 files, totalling 143 TB of new playback files on nb.no, replacing 40 TB of the old MP3 files. In total, 2,183,478 archive packages were re-archived in the new DPS preservation environment.&lt;/p&gt;
&lt;p&gt;This work represents a significant improvement in the National Library&amp;rsquo;s ability to preserve and make available digital radio material for future generations.&lt;/p&gt;
&lt;h1&gt;Some numbers&lt;/h1&gt;&lt;p&gt;&lt;strong&gt;Statistics from DSM (old preservation environment)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MimeType&lt;/th&gt;
&lt;th style="text-align: right"&gt;Quantity&lt;/th&gt;
&lt;th style="text-align: right"&gt;Bytes&lt;/th&gt;
&lt;th style="text-align: right"&gt;TiB&lt;/th&gt;
&lt;th style="text-align: right"&gt;TB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;audio/mpeg&lt;/td&gt;
&lt;td style="text-align: right"&gt;24 997&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 436 936 706 369&lt;/td&gt;
&lt;td style="text-align: right"&gt;1&lt;/td&gt;
&lt;td style="text-align: right"&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;audio/x-wav&lt;/td&gt;
&lt;td style="text-align: right"&gt;2 158 485&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 053 241 022 424 650&lt;/td&gt;
&lt;td style="text-align: right"&gt;958&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 053&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text/plain&lt;/td&gt;
&lt;td style="text-align: right"&gt;358 462&lt;/td&gt;
&lt;td style="text-align: right"&gt;12 177 323&lt;/td&gt;
&lt;td style="text-align: right"&gt;0&lt;/td&gt;
&lt;td style="text-align: right"&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Statistics from DPS (new preservation environment)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th style="text-align: right"&gt;Num AIPs&lt;/th&gt;
&lt;th style="text-align: right"&gt;Bytes&lt;/th&gt;
&lt;th style="text-align: right"&gt;TiB&lt;/th&gt;
&lt;th style="text-align: right"&gt;TB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;radio-DK&lt;/td&gt;
&lt;td style="text-align: right"&gt;350 929&lt;/td&gt;
&lt;td style="text-align: right"&gt;47 733 509 049 163&lt;/td&gt;
&lt;td style="text-align: right"&gt;43&lt;/td&gt;
&lt;td style="text-align: right"&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;radio&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 832 549&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 007 674 770 422 690&lt;/td&gt;
&lt;td style="text-align: right"&gt;916&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 007&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td style="text-align: right"&gt;2 183 478&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 055 408 279 471 850&lt;/td&gt;
&lt;td style="text-align: right"&gt;959&lt;/td&gt;
&lt;td style="text-align: right"&gt;1 055&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;TiB (tebibyte) is bytes/1024&lt;sup&gt;4&lt;/sup&gt;, the binary unit of measurement for data volume.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;TB (terabyte) is bytes/1000&lt;sup&gt;4&lt;/sup&gt;, the decimal unit most commonly used for approximate data volume.&lt;/em&gt;&lt;/p&gt;
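&lt;p&gt;The TiB and TB columns in the tables above can be reproduced from the byte counts with a short sketch like the one below. It takes a tebibyte as 1024 to the fourth power bytes and a terabyte as 1000 to the fourth power bytes; the function names are illustrative, and rounding to the nearest whole unit is an assumption about how the tables were produced.&lt;/p&gt;

```python
TIB = 1024 ** 4  # bytes per tebibyte (binary unit)
TB = 1000 ** 4   # bytes per terabyte (decimal unit)


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to tebibytes."""
    return num_bytes / TIB


def to_tb(num_bytes: int) -> float:
    """Convert a byte count to terabytes."""
    return num_bytes / TB


# Byte count of the audio/x-wav row in the DSM table.
wav_bytes = 1_053_241_022_424_650
print(round(to_tib(wav_bytes)), round(to_tb(wav_bytes)))  # 958 1053
```

&lt;p&gt;The same volume is about 10 percent larger when expressed in TB than in TiB, which is why both columns are shown.&lt;/p&gt;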
&lt;p&gt;2.1 million new MP4 (M4A) files were produced, amounting to 130 TiB / 143 TB of new access files on nb.no. These replaced approx. 40 TB of old MP3 access files.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://digitalpreservation.no/blog/2024-08-28-rearchiving-2-million-hours-of-digital-radio/radiohours.webp" alt="Radio hours per year" loading="lazy" /&gt;&lt;/p&gt;</description></item><item><title>Preferred Digital Formats at the National Library</title><link>https://digitalpreservation.no/blog/2024-07-05-preferred-digital-formats-at-the-national-library/</link><pubDate>Fri, 05 Jul 2024 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2024-07-05-preferred-digital-formats-at-the-national-library/</guid><description>
&lt;p&gt;This is an overview of the digital formats preferred for digital preservation at the National Library.&lt;/p&gt;
&lt;p&gt;The Digital Preservation Team has compiled a list of file formats that are preferred by the National Library for digital preservation. The format list is based on recommendations from the respective media departments within the National Library.&lt;/p&gt;
&lt;p&gt;The list includes both formats produced by the National Library itself and formats received from others.&lt;/p&gt;
&lt;p&gt;The preferred formats are those we ideally want and strive for. Acceptable formats will also be accepted and preserved without conversion or normalization.&lt;/p&gt;
&lt;p&gt;All other formats should be discussed with the relevant media departments within the National Library before they are accepted and included in the management system.&lt;/p&gt;
&lt;p&gt;The list is published in both Norwegian and English. The Preferred file formats list is available &lt;a href="https://digitalpreservation.no/docs/formats/preferred-formats-en/"title="Link to our Preferred file formats list"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Digital Preservation Team plans to conduct annual reviews of the list.&lt;/p&gt;</description></item><item><title>Presentation from the Digital libraries: storage for now and forever conference</title><link>https://digitalpreservation.no/blog/2024-05-29-digital-storage-now-and-forever/</link><pubDate>Wed, 29 May 2024 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2024-05-29-digital-storage-now-and-forever/</guid><description>
&lt;p&gt;We were invited to talk at the &lt;a href="https://www.bn.org.pl/aktualnosci/5307-digital-libraries:-storage-for-now-and-forever.-konferencja-na-temat-przechowywania-zbiorow-w-bibliotekach-cyfrowych..html"target="_blank" rel="noopener"&gt;Digital libraries: storage for now and forever&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; conference hosted by the IFLA Preservation and Conservation Center at the National Library of Poland. The presentation is a crash course through digital storage and digital preservation practices at the National Library of Norway over the past 20 years. It concludes with our current status and our future plans.&lt;/p&gt;
&lt;p&gt;The recording of the event can be found below. Our presentation starts at 4h01m44s. The slides are available as a PDF &lt;a href="2024-05-29-IFLA-PAC-DIGIPRES.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;"&gt;
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/OIw_g36bCjw?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=14502" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"&gt;&lt;/iframe&gt;
&lt;/div&gt;</description></item><item><title>Ambitions, Goals, and Strategy for Digital Preservation at the National Library</title><link>https://digitalpreservation.no/blog/2024-02-20-strategy/</link><pubDate>Tue, 20 Feb 2024 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2024-02-20-strategy/</guid><description>
&lt;p&gt;The Digital Preservation Team at the National Library (NLN) has developed its first strategy for digital preservation.
This strategy aims to steer, structure, and sharpen the focus of our digital preservation efforts over the coming two years.&lt;/p&gt;
&lt;h2&gt;Strategy Design&lt;span class="hx:absolute hx:-mt-20" id="strategy-design"&gt;&lt;/span&gt;
&lt;a href="#strategy-design" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;In developing the strategy, we aimed for writing a short and precise document.
We wanted to avoid overwhelming the reader with a wall of text, or producing a verbose document that would be of little practical use.
The purpose of the strategy is to serve as a guideline for making key decisions in our digital preservation work.&lt;/p&gt;
&lt;p&gt;Our design process was iterative, based on extensive review cycles.
Over a span of roughly two months, a dedicated working group of selected team and departmental members met weekly.
Each meeting began with a fresh review of the entire document’s text and structure, intentionally allowing time between sessions for individual reflection and idea maturation.&lt;/p&gt;
&lt;p&gt;The initial meetings saw significant changes to the document’s structure, but we gradually achieved a stable framework, shifting our focus to refining the wording and content.
It was also vital that the strategy remained flexible, avoiding constraints or lock-in with respect to external partners or financial considerations.
The resulting concise, one-page document provides valuable guidance and a clear direction for NLN&amp;rsquo;s digital preservation efforts.&lt;/p&gt;
&lt;p&gt;This strategy has been reviewed and ratified by the organization, initially by the team&amp;rsquo;s board of owners and subsequently by the NLN board of directors.&lt;/p&gt;
&lt;h2&gt;Structure of the Strategy Document&lt;span class="hx:absolute hx:-mt-20" id="structure-of-the-strategy-document"&gt;&lt;/span&gt;
&lt;a href="#structure-of-the-strategy-document" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The document is divided into five main sections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We begin with the rationale for digital preservation at the NLN, grounded in the mandates of Norwegian law.&lt;/li&gt;
&lt;li&gt;Next, we outline the NLN’s vision for digital preservation.
We use the concept of &lt;em&gt;national digital cultural heritage&lt;/em&gt; here to define our scope.
This concept highlights that our scope and responsibility extend beyond legally deposited materials, and also include materials from other cultural institutions.&lt;/li&gt;
&lt;li&gt;We then outline four specific goals.
The goals aim at effective collection, secure storage, and active management of the digital collection, as well as ensuring current and future access to these materials.
This acknowledges that immediate accessibility is a precondition for long-term preservation.&lt;/li&gt;
&lt;li&gt;The document also sheds light on some major challenges facing the NLN when it comes to digital preservation, acknowledging the evolving nature of these obstacles.&lt;/li&gt;
&lt;li&gt;Finally, we specify three strategic priorities: enhancing knowledge of digital preservation in the organization, adhering to established standards, and employing tools and technology responsibly and sustainably.
These are underpinned by the &lt;a href="https://digitalpreservation.no/docs/principles/"title="Link to our principles page"&gt;current principles&lt;/a&gt; of digital preservation at the NLN.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Utilization of the Strategy in the Team&lt;span class="hx:absolute hx:-mt-20" id="utilization-of-the-strategy-in-the-team"&gt;&lt;/span&gt;
&lt;a href="#utilization-of-the-strategy-in-the-team" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;To make practical use of this strategy, the team has devised a roadmap outlining key digital preservation initiatives in the coming two years.
All activities in the roadmap are tied back to the three strategic priorities.&lt;/p&gt;
&lt;p&gt;Activities are prioritized and detailed at the monthly meeting with the board of owners, subsequently organized into &amp;ldquo;epics&amp;rdquo;&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; within Atlassian Jira.&lt;/p&gt;
&lt;p&gt;For each epic, we outline specific Jira tasks, clarifying objectives and approaches.
For activities with unclear scope, the first task will be a &amp;ldquo;spike&amp;rdquo;&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; where we try to break the activity into more manageable tasks, with clear specifications and &amp;ldquo;definition of done&amp;rdquo;&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;.
This creates a cohesive thread from the overarching strategy down to the specific tasks to be carried out by the digital preservation team.&lt;/p&gt;
&lt;figure&gt;&lt;img src="https://digitalpreservation.no/blog/2024-02-20-strategy/figure.webp"
alt="Figure showing excerpts from our strategy document, 2 year roadmap, monthly delivery plan, epics, and tasks on a kanban board."&gt;&lt;figcaption&gt;
&lt;p&gt;Figure of our strategy, roadmap, delivery plan, epics, and tasks&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This structured approach has proven effective.
It has set a clear direction for NLN’s digital preservation work, constantly reminding the team of our strategic goals through daily Jira tasks and activities.&lt;/p&gt;
&lt;p&gt;The strategy is available &lt;a href="https://digitalpreservation.no/docs/strategy/"title="Link to our strategy page"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&amp;ldquo;Epic.&amp;rdquo; The Agile Dictionary, &lt;a href="https://www.agiledictionary.org/309/epic/"target="_blank" rel="noopener"&gt;agiledictionary.org/309/epic/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;. Accessed 19 Feb. 2024.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&amp;ldquo;Spike.&amp;rdquo; The Agile Dictionary, &lt;a href="https://agiledictionary.com/209/spike/"target="_blank" rel="noopener"&gt;agiledictionary.com/209/spike/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;. Accessed 19 Feb. 2024.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&amp;ldquo;Definition of done.&amp;rdquo; The Agile Dictionary, &lt;a href="https://www.agiledictionary.org/8/definition-of-done/"target="_blank" rel="noopener"&gt;agiledictionary.org/8/definition-of-done/&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;. Accessed 19 Feb. 2024.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Team Digital Preservation</title><link>https://digitalpreservation.no/blog/2023-10-26-presentation/</link><pubDate>Thu, 26 Oct 2023 00:00:00 +0000</pubDate><guid>https://digitalpreservation.no/blog/2023-10-26-presentation/</guid><description>
&lt;p&gt;In June 2022, a dedicated team was established at &lt;a href="https://nb.no/"title="National Library of Norway homepage"target="_blank" rel="noopener"&gt;The National Library of Norway&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; to manage the preservation of its digital collection. The team is responsible for handling all types of digital material, whether digitized from analog sources or born-digital. This includes media formats such as websites, text documents, images, audio, and moving images.&lt;/p&gt;
&lt;p&gt;Areas of responsibility include managing long-term digital preservation solutions and working across the entire process: ingest, quality control, storage, preservation, and access. Data included for long-term preservation typically consists of large, high-quality files, as opposed to compressed access copies.&lt;/p&gt;
&lt;p&gt;The Digital Preservation Team collaborates closely with several other specialized media teams within the institution. In addition to receiving digital material covered by the &lt;a href="https://lovdata.no/dokument/NL/lov/1989-06-09-32"target="_blank" rel="noopener"&gt;Legal Deposit Act&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;, the National Library of Norway also produces large volumes of data through digitization efforts. This includes both material from its own collections and from institutions across the archive, library, and museum (ALM) sector.&lt;/p&gt;
&lt;p&gt;The team is a member of the &lt;a href="https://www.dpconline.org/"title="Digital Preservation Coalition homepage"target="_blank" rel="noopener"&gt;Digital Preservation Coalition&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Organisation&lt;span class="hx:absolute hx:-mt-20" id="organisation"&gt;&lt;/span&gt;
&lt;a href="#organisation" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The &lt;a href="https://www.nb.no/en/digital-preservation"title="Short page about Digital Preservation at NLN"target="_blank" rel="noopener"&gt;Digital Preservation&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; team consists of 8 members:&lt;/p&gt;
&lt;div class="hextra-cards hx:mt-4 hx:gap-4 hx:grid not-prose" style="--hextra-cards-grid-cols: 6;"&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/trond-teigen-191954ab"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Trond Teigen"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/trond_hu_1cfda489f5c1efe1.webp"
width="250"
height="250"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Trond Teigen&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Team lead&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/torbj%c3%b8rn-pedersen-57617b227b"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Torbjørn Bakken Pedersen"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/torbjorn_hu_f74a58634c43f9f2.webp"
width="250"
height="250"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Torbjørn Bakken Pedersen&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Product lead&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/thomasedvardsen"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Thomas Edvardsen"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/thomas_hu_c2bc21d1cc3052f6.webp"
width="250"
height="250"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Thomas Edvardsen&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Tech lead&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/vigdis-s%c3%b8rensen-8a3618a6"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Vigdis Marie Sørensen"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/vigdis_hu_a37868f757f30862.webp"
width="250"
height="250"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Vigdis Marie Sørensen&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Senior platform developer&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/siarhei-kulakou-0702ba245"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Siarhei Kulakou"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/siarhei_hu_c1564435eca66959.webp"
width="250"
height="250"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Siarhei Kulakou&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Application developer&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.linkedin.com/in/johannes-karlsen-476197267"
target="_blank" rel="noreferrer"&gt;&lt;img
alt="Johannes Karlsen"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/johannes2.0_hu_661101a6a5098a4e.webp"
width="250"
height="247"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Johannes Karlsen&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Application developer&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"&gt;&lt;img
alt="Lise-Lotte Melkild"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/LiseLotte_hu_3323e358a71fb12d.webp"
width="250"
height="251"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Lise-Lotte Melkild&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Metadata specialist&lt;/div&gt;&lt;/a&gt;
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"&gt;&lt;img
alt="Sandra Kråkstad"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://digitalpreservation.no/images/team/sandra_hu_4d36e6025830c859.webp"
width="250"
height="255"
/&gt;&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50"&gt;Sandra Kråkstad&lt;/span&gt;&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2"&gt;Metadata specialist&lt;/div&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;This team reports to a committee of National Library leaders responsible for this area. The committee members are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;IT Director (Product owner)&lt;/li&gt;
&lt;li&gt;Director of Digitalizing Cultural Heritage&lt;/li&gt;
&lt;li&gt;Head of Metadata Standards Development Section&lt;/li&gt;
&lt;li&gt;Head of IT Platform Section&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The National Library’s digital collection in numbers&lt;span class="hx:absolute hx:-mt-20" id="the-national-librarys-digital-collection-in-numbers"&gt;&lt;/span&gt;
&lt;a href="#the-national-librarys-digital-collection-in-numbers" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Over 2 billion files&lt;/li&gt;
&lt;li&gt;More than 100 different file formats&lt;/li&gt;
&lt;li&gt;18 petabytes of data (that’s 18,000 terabytes!) stored in 3 copies&lt;/li&gt;
&lt;li&gt;The largest single file is 2.5 terabytes&lt;/li&gt;
&lt;li&gt;Daily ingest of new material averages over 6 terabytes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Data volume by type&lt;span class="hx:absolute hx:-mt-20" id="data-volume-by-type"&gt;&lt;/span&gt;
&lt;a href="#data-volume-by-type" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Video and television: 22%&lt;/li&gt;
&lt;li&gt;Film: 21%&lt;/li&gt;
&lt;li&gt;Newspapers: 19%&lt;/li&gt;
&lt;li&gt;Web Archive: 16%&lt;/li&gt;
&lt;li&gt;Radio and audio: 12%&lt;/li&gt;
&lt;li&gt;Books: 8%&lt;/li&gt;
&lt;li&gt;Photos: 2%&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Technology choices used when working with digital preservation&lt;span class="hx:absolute hx:-mt-20" id="technology-choices-used-when-working-with-digital-preservation"&gt;&lt;/span&gt;
&lt;a href="#technology-choices-used-when-working-with-digital-preservation" class="subheading-anchor" aria-label="Permalink for this section"&gt;&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kafka.apache.org"title="Apache Kafka&amp;#39;s homepage"target="_blank" rel="noopener"&gt;Apache Kafka&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for sending messages between systems&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nifi.apache.org"title="Apache NiFi&amp;#39;s homepage"target="_blank" rel="noopener"&gt;Apache NiFi&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for running the data flows that validate, move, and package data&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mariadb.org"title="MariaDB&amp;#39;s homepage"target="_blank" rel="noopener"&gt;MariaDB&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; as the database engine&lt;/li&gt;
&lt;li&gt;&lt;a href="https://digital-preservation.github.io/droid"title="DROID&amp;#39;s homepage"target="_blank" rel="noopener"&gt;DROID&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; and &lt;a href="https://github.com/richardlehane/siegfried"target="_blank" rel="noopener"&gt;Siegfried&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for file format identification&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com"title="Grafana&amp;#39;s homepage"target="_blank" rel="noopener"&gt;Grafana&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for statistics and reporting&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.hpss-collaboration.org"title="HPSS&amp;#39;s homepage"target="_blank" rel="noopener"&gt;IBM High Performance Storage System (HPSS)&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; as the bit repository&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.centos.org"title="CentOS&amp;#39;s homepage"target="_blank" rel="noopener"&gt;CentOS&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; Linux as the server platform&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/keeps/commons-ip"target="_blank" rel="noopener"&gt;CommonsIP&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt; for packaging and validating archival packages using the &lt;a href="https://dilcis.eu/"target="_blank" rel="noopener"&gt;E-ARK standard&lt;svg class="hx:inline hx:rtl:rotate-270 hx:align-baseline" height="1em" fill="none" stroke="currentColor" stroke-width="2" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"&gt;
&lt;path d="m9.1716 7.7574h7.0711m0 0v7.0711m0-7.0711-8.4853 8.4853" stroke-linecap="round" stroke-linejoin="round"/&gt;
&lt;/svg&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>