(2017/5/29)Notice: The submission deadline has been extended by one week; it is now June 7, 12:00 AM, UTC+8.

(2017/5/1)Notice: The testing dataset has been released and can be downloaded here. Evaluation details have been updated; see Evaluation. Some confusing annotations in the training dataset have been corrected; see Dataset.

(2017/4/19)Notice: The training dataset has been released and can be downloaded here.

(2017/3/20)Notice: The starting date of the competition has been postponed to April 20. Samples of the dataset can be downloaded here.


Page Object Detection (POD) is the task of detecting specific page objects (e.g., tables, formulas, and figures, including charts) in document images. The dataset contains more than 2,000 images with various kinds of page objects. Participants are required to detect these page objects in the provided dataset.


Document Image Understanding (DIU) is an interesting research area with a large variety of challenging problems, and it has been receiving increasing attention not only from the document analysis and recognition community, but also from the database and information extraction (IE) communities. Researchers have worked on this topic for decades, as witnessed by the scientific literature. Document image understanding is the task of deriving a high-level representation of the contents of a document image. It involves several phases, mainly page segmentation (or block segmentation), block classification (or block labeling), and several operations for processing text, tables, graphics, figures, formulas, etc. Page segmentation, also known as "layout analysis" or "page decomposition", is the process by which a document page image is decomposed into its structural and logical units, such as images, tables, paragraphs, and line-art regions. This process is critical for a variety of document image analysis applications. Block classification aims at producing a description of the geometric structure of the document, identifying the different logical roles of the detected regions (paragraphs, tables, mathematical equations, figures, etc.) and the spatial relationships among them. This competition focuses on the first two phases of document image understanding: locating the logical objects in document pages. The targeted page objects of this competition include formulas, tables, and images or graphs (including charts). The objective of this competition is to compare the relative advantages of different types of approaches and to identify the state-of-the-art methods.


The POD competition consists of four tasks:
  1. Detection of formulas
  2. Detection of tables
  3. Detection of figures
  4. Detection of page objects

The first three tasks target individual kinds of page objects; the fourth targets all three kinds at once.


The competition dataset consists of 2000 English document page images selected from 1500 scientific papers on CiteSeer. The dataset shows good variety in both page layout styles and object styles. For more information, see Dataset.


The Intersection over Union (IoU) measure is used to decide whether an object detected by a participant is correctly located, and the integrated results are judged by mean Average Precision (mAP), which is widely used in natural-scene object detection competitions. In addition, we also take the F1 metric into consideration; a ranking of results by F1 will also be reported. For more information, see Evaluation.
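As a minimal sketch of the IoU criterion described above (the coordinate convention and any acceptance threshold are assumptions here, not taken from the official evaluation tools; see Evaluation for the authoritative protocol):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed convention, not the competition's official format).
    """
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is then typically counted as correct when its IoU with a ground-truth box exceeds a fixed threshold (e.g., 0.5 or 0.8, depending on the evaluation setting), and precision, recall, F1, and mAP are computed from those match decisions.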

Important dates

  • Competition announcement: February 1, 2017
  • Release competition samples with ground truth: March 20, 2017
  • Release the Competition Training Dataset: April 20, 2017 (originally April 1, 2017)
  • Release the Competition Testing Dataset: May 1, 2017 (originally April 10, 2017)
  • Deadline for results submission: June 7, 2017 (originally May 10, then May 30, 2017)
  • Release the Annotations of the Testing Dataset and Evaluation Tools: June 30, 2017

Contact

    Xiaohan Yi (chlxyd@pku.edu.cn)

    Content Protection and Document Processing (CPDP), Institute of Computer Science and Technology, Peking University