International Journal of Advanced Engineering, Management and Science

A New Structural Similarity Measure: Clustering of Multi-Structured Documents

(Vol-3, Issue-6, June 2017)

Author(s): Ali Idarrou

Page No: 681-689
DOI: 10.24001/ijaems.3.6.11


Keywords: multimedia document, structural clustering, structural similarity measure.


This paper continues our work on the structural clustering of multi-structured multimedia documents. A central problem in this work is how to compare two multi-structured documents, that is, how to compare their structures in order to identify the resemblances between them and the rules for transforming one structure into another (which amounts to evaluating a processing cost). We have defined a new structural similarity measure that identifies common substructures in two multimedia documents while taking into account the constraints of such documents (relations between components, order of components, etc.). In our previous work, we studied the impact of the "filtering" sub-process of our clustering process on the quality of the generated classes. In this work, we describe the sub-processes that transform one structure into another, and we propose a measure for evaluating the cost of a structural transformation. We evaluate our approach on a corpus of documents extracted randomly from the INEX 2007 corpus and on a corpus composed of book notices (in XML format) from the library of Toulouse 1 Capitole University.

