Compare two XML documents extracted from MS Word

nk4um User
Posts: 5
March 16, 2012 08:59
Ignore some node attributes from comparison

Hi

I am comparing two OOXML files using DeltaXML Core. One issue I am facing: when two text nodes have the same content but only one of them carries the attribute xml:space="preserve", the text content is still reported as a difference in the delta output. Is there any way to tell DeltaXML to ignore certain node attributes during comparison?

Thanks Partha
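
One possible workaround for the attribute issue described above is to pre-process both inputs and strip xml:space before handing them to the comparison. This is an assumption: nothing in this thread confirms it, and it is not a DeltaXML feature. The sketch below is minimal and uses only the Python standard library. The function name strip_xml_space and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace by the XML specification,
# so ElementTree exposes xml:space under this expanded attribute name.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_xml_space(xml_text: str) -> str:
    """Return xml_text re-serialized with every xml:space attribute removed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # pop() with a default is a no-op on elements that lack the attribute
        element.attrib.pop(XML_SPACE, None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input: the same run text, with and without the attribute.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WITH_ATTR = f'<w:t xmlns:w="{W_NS}" xml:space="preserve">Hello </w:t>'
WITHOUT_ATTR = f'<w:t xmlns:w="{W_NS}">Hello </w:t>'

normalized_a = strip_xml_space(WITH_ATTR)
normalized_b = strip_xml_space(WITHOUT_ATTR)
print(normalized_a == normalized_b)
```

Because xml:space="preserve" matters to Word when it reads the file back, a step like this is best applied to copies used only for comparison, not to the documents themselves.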

nk4um Administrator
Posts: 42
March 1, 2012 16:13

Posted by roboasimo:
While comparing two XML documents extracted from MS Word, the trace is not accurate. The comparison is taking place on a "best fit algorithm" basis, so I formatted the tree structure. In the output I am losing my Word-related formatting tags, and all table structure is lost. My question is: can we maintain all MS Word-related tags in document.xml by using a filter or some XSLT?

Word XML has a complex structure, and comparing it using DeltaXML will show the changes in the XML precisely, though it may not always align the two documents in the way a human would expect. I am not quite sure what you mean when you say it is "not accurate": we test DeltaXML by extracting each of the original files from the delta, so we are sure that the delta is correct.

Some further processing is needed to convert the delta back into XML that preserves the table structure and can be read back into Word. All the MS Word-related tags should be present in the delta and could be processed with XSLT to generate new, valid Word XML, though it would not be trivial to do so. We have specific solutions for DITA, DocBook and ODF (OpenDocument Format), all of which handle tables, but we have not had demand for Word XML because the comparison features of Word can be used for that.
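
The round-trip check and the post-processing step described above can be illustrated with a toy example. The delta vocabulary here is hypothetical and much simpler than the real DeltaXML delta format: a "version" attribute marks an element as belonging only to document A or only to document B, and unmarked elements are shared. The function name extract and the sample data are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker attribute, NOT the real DeltaXML delta vocabulary.
# Values: "A" or "B"; an element without the attribute belongs to both inputs.
MARK = "version"


def extract(delta: ET.Element, side: str) -> ET.Element:
    """Rebuild one input document ("A" or "B") from the toy delta tree."""
    attributes = {k: v for k, v in delta.attrib.items() if k != MARK}
    clone = ET.Element(delta.tag, attributes)
    clone.text = delta.text
    clone.tail = delta.tail
    for child in delta:
        # Unmarked children default to the requested side, so they are kept.
        # Tail text of a dropped child is discarded, which is acceptable only
        # because this toy input has no mixed content.
        if child.get(MARK, side) == side:
            clone.append(extract(child, side))
    return clone


# Toy delta of a one-column table whose second row changed.
DELTA = (
    "<tbl>"
    "<tr><tc>Name</tc></tr>"
    '<tr version="A"><tc>old row</tc></tr>'
    '<tr version="B"><tc>new row</tc></tr>'
    "</tbl>"
)

root = ET.fromstring(DELTA)
doc_a = ET.tostring(extract(root, "A"), encoding="unicode")
doc_b = ET.tostring(extract(root, "B"), encoding="unicode")
print(doc_a)
print(doc_b)
```

If both extracted documents match the original inputs, the delta has lost nothing, table structure included. A real Word XML delta would need considerably more handling than this.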

I would be interested to know what you are trying to achieve by comparing the XML extracted from Word, and why you are finding that Word's "compare documents" feature is not sufficient.

Robin

nk4um User
Posts: 5
February 29, 2012 14:29
Compare two XML documents extracted from MS Word

While comparing two XML documents extracted from MS Word, the trace is not accurate. The comparison is taking place on a "best fit algorithm" basis, so I formatted the tree structure. In the output I am losing my Word-related formatting tags, and all table structure is lost. My question is: can we maintain all MS Word-related tags in document.xml by using a filter or some XSLT?
