Ticket UUID: b6579b1ec167413cb59a0d8b7888f275478fb0c5
Title: Update with a merge fails on unicode files
Status: Open Type: Code_Defect
Severity: Important Priority:
Subsystem: Resolution: Open
Last Modified: 2010-12-13 21:35:16
Version Found In: cf178577ec
Description & Comments:
Updating a file with a merge fails if the file is a UTF-16 file.
This is what I get when I try to update and merge a UTF-16 file that was changed in two different checkouts:

MERGE RightToLeft/RightToLeft/Localization/HebrewConverter.cs
***** Cannot merge binary file RightToLeft/RightToLeft/Localization/HebrewConverter.cs

I guess it is the same for UTF-8, but I didn't test this.
For developers outside of the USA it is very important to have full Unicode support for all operations concerning the versioned content of the repository.


drh added on 2010-11-24 20:13:01:
UTF-8 should work fine. It is only UTF-16 that is not supported. Fossil is seeing the 0x00 bytes in some of the characters and thinks it is dealing with a binary file rather than a text file.
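To make the failure mode concrete, here is a minimal sketch of a "contains a NUL byte, so it is binary" heuristic of the kind described above. The name `looks_like_binary` is hypothetical and this is not Fossil's actual code. It also shows why UTF-8 is unaffected: UTF-8 only ever produces a 0x00 byte for U+0000 itself, while UTF-16 produces one for every ASCII character.

```c
#include <stddef.h>

/* Hypothetical helper, not Fossil's real implementation: treat a buffer
** as binary if any byte in it is 0x00. */
static int looks_like_binary(const unsigned char *z, size_t n){
  size_t i;
  for(i=0; i<n; i++){
    if( z[i]==0 ) return 1;   /* embedded NUL: classify as binary */
  }
  return 0;                   /* no NUL bytes: treat as text */
}
```

With this check, the UTF-16LE bytes for "us" (`75 00 73 00`) are flagged as binary, whereas the UTF-8 encoding of Hebrew letters such as U+05E9 (`D7 A9`) passes as text.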

I suppose the "diff" logic (which is essential to doing a merge) could be enhanced to deal with UTF-16. Would the poster care to suggest a patch?
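One possible direction, offered only as an assumption and not as the approach Fossil took, is to transcode UTF-16 input to UTF-8 before it reaches the diff, so the existing byte-oriented code never sees the 0x00 bytes. The sketch below handles UTF-16LE with an optional `FF FE` byte-order mark and surrogate pairs; the function name `utf16le_to_utf8` is hypothetical, and a real patch would also need UTF-16BE detection and a way to write the merged result back as UTF-16.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: convert UTF-16LE (optional BOM FF FE) to a
** malloc'd, NUL-terminated UTF-8 buffer. Returns NULL on odd length,
** unpaired surrogates, or allocation failure. Caller frees. */
static char *utf16le_to_utf8(const unsigned char *z, size_t n, size_t *pOutLen){
  size_t i = 0, j = 0;
  unsigned char *out;
  if( n%2 ) return NULL;                           /* not whole code units */
  if( n>=2 && z[0]==0xFF && z[1]==0xFE ) i = 2;    /* skip the BOM */
  out = malloc((n/2)*3 + 1);  /* each code unit yields at most 3 bytes */
  if( out==NULL ) return NULL;
  while( i<n ){
    unsigned long c = (unsigned long)z[i] | ((unsigned long)z[i+1]<<8);
    i += 2;
    if( c>=0xD800 && c<=0xDBFF ){                  /* high surrogate */
      unsigned long c2;
      if( i>=n ) goto bad;
      c2 = (unsigned long)z[i] | ((unsigned long)z[i+1]<<8);
      if( c2<0xDC00 || c2>0xDFFF ) goto bad;       /* must be followed by low */
      i += 2;
      c = 0x10000 + ((c-0xD800)<<10) + (c2-0xDC00);
    }else if( c>=0xDC00 && c<=0xDFFF ){
      goto bad;                                    /* lone low surrogate */
    }
    if( c<0x80 ){                                  /* 1-byte UTF-8 */
      out[j++] = (unsigned char)c;
    }else if( c<0x800 ){                           /* 2-byte UTF-8 */
      out[j++] = (unsigned char)(0xC0 | (c>>6));
      out[j++] = (unsigned char)(0x80 | (c&0x3F));
    }else if( c<0x10000 ){                         /* 3-byte UTF-8 */
      out[j++] = (unsigned char)(0xE0 | (c>>12));
      out[j++] = (unsigned char)(0x80 | ((c>>6)&0x3F));
      out[j++] = (unsigned char)(0x80 | (c&0x3F));
    }else{                                         /* 4-byte UTF-8 */
      out[j++] = (unsigned char)(0xF0 | (c>>18));
      out[j++] = (unsigned char)(0x80 | ((c>>12)&0x3F));
      out[j++] = (unsigned char)(0x80 | ((c>>6)&0x3F));
      out[j++] = (unsigned char)(0x80 | (c&0x3F));
    }
  }
  out[j] = 0;
  if( pOutLen ) *pOutLen = j;
  return (char*)out;
bad:
  free(out);
  return NULL;
}
```

For example, the bytes `FF FE 41 00 E9 05` ("A" followed by the Hebrew letter shin) become the 3-byte UTF-8 string `41 D7 A9`, which contains no 0x00 and can go through a byte-oriented diff.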


anonymous added on 2010-12-13 21:35:16:
If it were a small, straightforward patch, I'd love to submit one, but I think this would require a major rewrite, because every string operation that relies on a single zero byte to terminate a string will fail on UTF-16 text. I guess that's why break_into_lines returns 0 as soon as it encounters any 0 byte.
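The behaviour described above can be illustrated with a small line counter that refuses to process a buffer containing a NUL byte. This is a hypothetical sketch written in the spirit of that description (`count_lines_or_zero` is an invented name), not Fossil's actual `break_into_lines`.

```c
#include <stddef.h>

/* Hypothetical sketch, not Fossil's real break_into_lines: count the
** lines in a buffer, but give up and return 0 as soon as a NUL byte is
** seen, since NUL-terminated string routines would truncate there. */
static int count_lines_or_zero(const char *z, size_t n){
  size_t i;
  int nLine = 0;
  for(i=0; i<n; i++){
    if( z[i]==0 ) return 0;          /* embedded NUL: not usable as text */
    if( z[i]=='\n' ) nLine++;
  }
  if( n>0 && z[n-1]!='\n' ) nLine++; /* last line has no trailing newline */
  return nLine;
}
```

For the UTF-16LE encoding of "a\n" (`61 00 0A 00`), this returns 0, and `strlen()` on the same bytes would report a length of 1, which is exactly the kind of silent truncation that makes a piecemeal UTF-16 port of the diff code so laborious.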

Obviously, diffing two files involves a lot of string operations, and finding and changing all of them without breaking the standard behaviour looks like a task for a lot of long, long winter evenings. ;-)

I'll convert my sources that require Unicode support to UTF-8 where possible and live with the risk of not being able to merge sources where UTF-8 is not an option.

Nevertheless, thanks for this great piece of software.