Predicting the Original Appearance of Damaged Historical Documents

Abstract

Historical documents encompass a wealth of cultural treasures but suffer severe damage over time, including missing characters, paper damage, and ink erosion. However, existing document processing methods focus primarily on tasks such as binarization and enhancement, neglecting the repair of this damage. To this end, we present a new task, termed Historical Document Repair (HDR), which aims to predict the original appearance of damaged historical documents. To fill the gap in this field, we propose a large-scale dataset, HDR28K, and a diffusion-based network, DiffHDR, for historical document repair. Specifically, HDR28K contains 28,552 damaged-repaired image pairs with character-level annotations and multi-style degradations. Moreover, DiffHDR augments the vanilla diffusion framework with semantic and spatial information and a meticulously designed character perceptual loss for contextual and visual coherence. Experimental results demonstrate that DiffHDR trained on HDR28K significantly surpasses existing approaches and performs remarkably well on real damaged documents. Notably, DiffHDR can also be extended to document editing and text block generation, showcasing its high flexibility and generalization capacity. We believe this study could pioneer a new direction in document processing and contribute to the inheritance of invaluable cultures and civilizations. The dataset and code are available at https://github.com/yeungchenwa/HDR.
