How to compare Chinese content

nk4um User
Posts: 5
June 20, 2012 03:17

Thank you for the information.

I hope to get the DITA version as soon as possible.

Posted by nigelw:
Hello again,

A quick status update: Core 6.3 has been released with the improved WordInfilter. The DITA and DocBook releases will be next.

Thanks,

Nigel

nk4um Administrator
Posts: 158
June 19, 2012 15:19
Status update - now in Core

Hello again,

A quick status update: Core 6.3 has been released with the improved WordInfilter. The DITA and DocBook releases will be next.

Thanks,

Nigel

nk4um Administrator
Posts: 158
April 12, 2012 15:13
We are working on Unicode text segmentation

Hello,

Thanks for the feedback; we are aware of this issue. We are currently implementing the algorithm specified in Unicode Standard Annex #29 (Unicode Text Segmentation):

http://unicode.org/reports/tr29/

This will replace or supplement our current word-splitting approach, which is based on spaces and simple regular expressions. As you point out, that approach isn't appropriate for some languages/locales.
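To illustrate the difference, here is a rough Python sketch. It is not our WordInfilter code and it is not a full implementation of the segmentation algorithm; it only imitates one effect of the default word-boundary rules, namely that each Han ideograph becomes its own token. The function names and the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are assumptions made for the example.

```python
import difflib
import re

# Assumption for this sketch: only the basic CJK Unified Ideographs block.
HAN = r"\u4e00-\u9fff"

# One token per Han ideograph, or one token per run of other non-space text.
TOKEN = re.compile(rf"[{HAN}]|[^\s{HAN}]+")


def split_on_spaces(text):
    """Space-based splitting: a Chinese sentence stays one token."""
    return text.split()


def split_per_ideograph(text):
    """Simplified stand-in for rule-based segmentation of Han text."""
    return TOKEN.findall(text)


def changed_tokens(old_tokens, new_tokens):
    """Return (old, new) token runs that differ between two token lists."""
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (old_tokens[i1:i2], new_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    old, new = "我喜欢苹果", "我喜欢香蕉"
    # Space-based tokens: the whole sentence is reported as changed.
    print(changed_tokens(split_on_spaces(old), split_on_spaces(new)))
    # Per-ideograph tokens: only the two differing characters are reported.
    print(changed_tokens(split_per_ideograph(old), split_per_ideograph(new)))
```

With space-based tokens the comparison sees a single changed "word" covering the whole sentence; with per-ideograph tokens it narrows the change to the characters that actually differ. A real segmenter handles many more cases (punctuation, numbers, Katakana runs, other scripts) than this sketch does.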

Initial testing has given us good results (the results for Western text are also improved using this algorithm). We are planning to make a new/additional WordInfilter available in the next release of our Core toolkit product (6.3, due fairly soon). We then plan to add the capability to our format-specific DITA and DocBook products. Unfortunately, it has not been possible to add it to our soon-to-be-released DITA 4.0 product, but it should be available in the next minor update release.

We have discussed possible workarounds, but they are not easy to deploy with the DITA product. We may introduce a beta release and would welcome testing and feedback from users of Chinese and other non-Western languages.

Thanks,

Nigel

nk4um User
Posts: 5
April 11, 2012 04:08
How to compare Chinese content

Hi, I tested DeltaXML for DITA. It works well for English content. However, Chinese text has no spaces between words, so when I compare DITA files with Chinese content, the tool treats a whole sentence as a single word, and the result is not satisfactory for us. Do you have any suggestions?
