How to ignore certain attributes of node from getting compared by deltaxml

nk4um Administrator
Posts: 42
March 20, 2012 16:36

Posted by roboasimo:
I am using the command below to generate the DeltaXML output:

deltaxml compare wbw base.xml revised.xml delta-1-2-3.xml word-by-word=true convert-to-html=false

Any idea how I can add the -ignore switch to the same command without removing the wbw switch?

I am assuming from this that you are using the word-by-word.dxp in the samples/WordByWord directory.

Therefore you need to add a parameter to the DXP file (this is documented here [ http://www.deltaxml.com/products/core/docs/guide-to-dxp.html ]), by editing it as follows.

First add

<booleanParameter name="processPreserveSpaceTextValue" defaultValue="true" />

inside <pipelineParameters>, so now it looks like this:

<pipelineParameters>
  <booleanParameter name="punctuation" defaultValue="false" />
  <booleanParameter name="word-by-word" defaultValue="false" />
  <booleanParameter name="space-fixup" defaultValue="false" />
  <booleanParameter name="orphaned-words" defaultValue="false" />
  <booleanParameter name="convert-to-html" defaultValue="true" />
  <booleanParameter name="processPreserveSpaceTextValue" defaultValue="true" />
  <stringParameter name="orphan-length" defaultValue="2" />
  <stringParameter name="orphan-threshold" defaultValue="20" />
</pipelineParameters>

Then change the word-by-word input filter lines in the DXP file to this:

<filter if="word-by-word">
  <class name="com.deltaxml.pipe.filters.dx2.wbw.WordInfilter" />
  <parameter name="processPreserveSpaceText" parameterRef="processPreserveSpaceTextValue" />
</filter>

Then from the command line you can provide a value for the parameter if you do not want the default value of true, for example:

deltaxml compare wbw base.xml revised.xml delta-1-2-3.xml word-by-word=true convert-to-html=false processPreserveSpaceTextValue=false

I hope this answers your question.

Robin

nk4um User
Posts: 5
March 19, 2012 11:30

I am using the command below to generate the DeltaXML output:

deltaxml compare wbw base.xml revised.xml delta-1-2-3.xml word-by-word=true convert-to-html=false

Any idea how I can add the -ignore switch to the same command without removing the wbw switch?

nk4um Administrator
Posts: 42
March 16, 2012 16:36

Posted by roboasimo:
I understand what you meant about the splitting of elements, but in this case there is no splitting in either of the inputs. Below are the sample and the expected output, so that we get a clear picture.

The text in the A document is being split into words inside the pipeline of filters (by WordInfilter), and the markup that splits the words is then removed (by WordOutFilter). This is why the output does not show the split words. WordInfilter does not split the text into words if it detects the attribute xml:space="preserve", unless you set a parameter telling it to do so (setProcessPreserveSpaceText). You can also do this in the DXP file like this:

<filter if="word-by-word">
  <class name="com.deltaxml.pipe.filters.dx2.wbw.WordInfilter" />
  <parameter name="processPreserveSpaceText" literalValue="true" />
</filter>
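
To make the mismatch concrete, here is a toy model in Python. It is only an illustration of the behaviour described above, not DeltaXML code: the `tokenize` function and its two flags are hypothetical stand-ins for the word splitting and for the processPreserveSpaceText parameter.

```python
import re

def tokenize(text, preserve_space=False, process_preserve=False):
    """Toy stand-in for word splitting (hypothetical, not the DeltaXML API).

    Text marked xml:space="preserve" is kept as one chunk unless
    process_preserve is set; otherwise it is split into words and
    whitespace runs.
    """
    if preserve_space and not process_preserve:
        return [text]
    return re.findall(r"\S+|\s+", text)

# A document: no xml:space attribute, so the text is split into words.
base = tokenize("My test file Federated")

# B document: xml:space="preserve", so the text stays in one piece.
revised = tokenize("My test file Federated", preserve_space=True)
print(base == revised)  # False

# With the parameter set, both sides are split the same way.
revised_split = tokenize("My test file Federated",
                         preserve_space=True, process_preserve=True)
print(base == revised_split)  # True
```

Under these assumptions the two inputs only line up once both are split the same way, which is what setting the parameter achieves. The attribute itself would still differ between the two documents.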

The EXPECTED OUTPUT that you show is not what we would expect because the attribute xml:space="preserve" is only in the B document, and this is shown in the actual output. To get the expected output that you have shown you need to use the methods suggested by Nigel to ignore changes (his last link).

I hope this will help you to get what you need.

Robin

nk4um User
Posts: 5
March 16, 2012 14:04

I understand what you meant about the splitting of elements, but in this case there is no splitting in either of the inputs. Below are the sample and the expected output, so that we get a clear picture.

Base Content :

    <w:p w:rsidR="001E64B3" w:rsidP="001E64B3">  
     <w:r>
       <w:t>My test file Federated</w:t>
     </w:r>
   </w:p>

Revised Content:

    <w:p w:rsidR="001E64B3" w:rsidP="001E64B3">    
     <w:r>
       <w:t xml:space="preserve">My test file Federated</w:t>
     </w:r>
   </w:p>

EXPECTED OUTPUT:

<w:p deltaxml:deltaV2="A=B" w:rsidR="001E64B3" w:rsidP="001E64B3">    
     <w:r>
       <w:t xml:space="preserve">My test file Federated</w:t>
     </w:r>
 </w:p>

ACTUAL DeltaXML Output:

     <w:p deltaxml:deltaV2="A!=B" w:rsidR="001E64B3" w:rsidP="001E64B3"><w:r deltaxml:deltaV2="A!=B">
       <w:t deltaxml:deltaV2="A!=B">
         <deltaxml:attributes deltaxml:deltaV2="B">
           <dxx:space deltaxml:deltaV2="B">
             <deltaxml:attributeValue deltaxml:deltaV2="B">preserve</deltaxml:attributeValue>
           </dxx:space>
         </deltaxml:attributes>
         <deltaxml:textGroup deltaxml:deltaV2="A">
           <deltaxml:text deltaxml:deltaV2="A">My</deltaxml:text>
         </deltaxml:textGroup>
         <deltaxml:textGroup deltaxml:deltaV2="B">
           <deltaxml:text deltaxml:deltaV2="B">My test file Federated</deltaxml:text>
         </deltaxml:textGroup>
         <deltaxml:textGroup deltaxml:deltaV2="A">
           <deltaxml:text deltaxml:deltaV2="A"> test file Federated</deltaxml:text>
         </deltaxml:textGroup>
       </w:t>
     </w:r>
   </w:p>

Thanks,
Partha

nk4um Administrator
Posts: 138
March 16, 2012 13:17
Word filtering and xml:space interaction

Hi Partha,

I am guessing the pipeline you are using for OOXML includes WordInfilter. If so, what is happening is that the xml:space="preserve" attribute is causing the text to be broken up into a sequence of word elements in one input, and this sequence is being compared to the complete text in the other. In this case there will be a mismatch.

There are a few possible solutions:

If you use the WordInfilter method in the first bullet above you will still have a change because there is an attribute deletion. If you need to then ignore that change, then please take a look at:

http://www.deltaxml.com/products/core/docs/ignore-changes.html
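
Separately from the mechanism documented at that link, a blunt workaround is to strip the attribute from copies of both inputs before running the comparison. The sketch below uses only the Python standard library and is not a DeltaXML feature; the helper name `strip_attribute` is hypothetical, and the sample is a cut-down version of the revised content posted in this thread.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def strip_attribute(root, qname):
    """Remove the attribute `qname` (in {namespace}local form) from every element."""
    for elem in root.iter():
        elem.attrib.pop(qname, None)
    return root

# Cut-down copy of the revised content from this thread.
revised = ET.fromstring(
    '<w:p xmlns:w="%s"><w:r>'
    '<w:t xml:space="preserve">My test file Federated</w:t>'
    '</w:r></w:p>' % W_NS
)

# ElementTree exposes xml:space under the XML namespace URI.
strip_attribute(revised, "{%s}space" % XML_NS)

t = revised.find(".//{%s}t" % W_NS)
print(t.attrib)  # {}
print(t.text)    # My test file Federated
```

Because xml:space changes how whitespace is meant to be treated, this is best applied to throwaway copies used only for the comparison, not to the documents themselves.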

Hope this helps,

Nigel

nk4um User
Posts: 5

Hi

I am comparing two OOXML files using DeltaXML Core. One issue I am facing is that if there are two text nodes and one has the attribute xml:space="preserve", the text content is compared and shown as a difference in the delta XML output, even though the content is the same. Is there any way to have certain node attributes ignored in the DeltaXML comparison?

Thanks,
Partha
