Multi-part UCS-2 message with surrogate pairs not working

Posted: Mon Feb 27, 2017 8:02 am
by striemer
Hi All,
While testing our SMPP functionality, I sent a text message containing all the emoticon emojis (code points U+1F600 to U+1F64F) using UCS-2 encoding.
As emojis are outside the 16-bit range, they are encoded as surrogate pairs (i.e. two 16-bit chars per emoji).
The first surrogate char is 0xD83D and the second one is in the range 0xDE00-0xDE4F.
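
To make the surrogate values above concrete, here is a minimal sketch using only the .NET base class library. The class and method names (SurrogateDemo, ToUtf16) are made up for illustration and are not part of any SMPP library.

```csharp
using System;

public static class SurrogateDemo
{
    // Returns the UTF-16 code units that represent a single Unicode code point.
    // Code points above U+FFFF yield two chars: a high and a low surrogate.
    public static char[] ToUtf16(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).ToCharArray();
    }

    public static void Main()
    {
        // First, second and last code point of the emoticon block.
        foreach (int cp in new[] { 0x1F600, 0x1F601, 0x1F64F })
        {
            char[] units = ToUtf16(cp);
            Console.WriteLine("U+{0:X}: 0x{1:X4} 0x{2:X4}", cp, (int)units[0], (int)units[1]);
        }
    }
}
```

Every code point in this block shares the high surrogate 0xD83D, so only the low surrogate changes from one emoji to the next.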

When looking at the resulting SMS, one of the emojis is displayed as an unknown symbol. This seems to happen because the surrogate pair is split in the middle, leaving two invalid Unicode characters: the last char of the first message part and the first char of the second message part are each only half of a pair, so they are replaced with the replacement char '\uFFFD' (65533 in decimal).
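
The effect can be reproduced outside of SMPP. The sketch below is only an assumption about how the replacement chars might arise (a string cut between the two halves of a pair, then encoded as UTF-16); it does not show what the library does internally. LoneSurrogateDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Text;

public static class LoneSurrogateDemo
{
    // Encodes the text as big-endian UTF-16 and decodes it again.
    // With the default settings of Encoding.BigEndianUnicode, an unpaired
    // surrogate is replaced with U+FFFD instead of raising an exception.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.BigEndianUnicode.GetBytes(text);
        return Encoding.BigEndianUnicode.GetString(bytes);
    }

    public static void Main()
    {
        // U+1F621 is the pair 0xD83D 0xDE21 (55357, 56865 in decimal).
        string emoji = char.ConvertFromUtf32(0x1F621);

        string firstHalf = emoji.Substring(0, 1);   // lone high surrogate
        string secondHalf = emoji.Substring(1, 1);  // lone low surrogate

        Console.WriteLine((int)RoundTrip(firstHalf)[0]);
        Console.WriteLine((int)RoundTrip(secondHalf)[0]);
        Console.WriteLine(RoundTrip(emoji) == emoji); // an intact pair survives
    }
}
```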

Code:

ISubmitSmBuilder builder = SMS.ForSubmit()
    .From(sourceAddress, sourceTon, sourceNpi)
    .To(destinationAddress, destinationTon, destinationNpi)
    .Coding(dataCoding)
    .Text(text)
    .ExpireIn(GatewayConfig.SmsExpiry);
Where dataCoding is DataCodings.UCS2 and text is a UTF-16 string.

First message part as a char array:
[0] 55357 '☐' char
[1] 56832 '☐' char
[2] 55357 '☐' char
[3] 56833 '☐' char
[4] 55357 '☐' char
......

[61] 56862 '☐' char
[62] 55357 '☐' char
[63] 56863 '☐' char
[64] 55357 '☐' char
[65] 56864 '☐' char
[66] 65533 '�' char <= Was first half of surrogate pair, is now replacement char.

Second message part:
[0] 65533 '�' char <= Was second half of surrogate pair, is now replacement char.
[1] 55357 '☐' char
[2] 56866 '☐' char
[3] 55357 '☐' char
[4] 56867 '☐' char

Original message:

[60] 55357 '☐' char
[61] 56862 '☐' char
[62] 55357 '☐' char
[63] 56863 '☐' char
[64] 55357 '☐' char
[65] 56864 '☐' char
[66] 55357 '☐' char <= This pair gets split
[67] 56865 '☐' char <=
[68] 55357 '☐' char
[69] 56866 '☐' char
[70] 55357 '☐' char

This is obviously an edge case, but it would be great if the splitting took surrogate pairs into account, especially as most emoji characters are encoded with surrogates.
Multi-part UCS-2 messages which don't use surrogate pairs work fine, by the way.
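
For illustration, a surrogate-aware split could look roughly like the sketch below. It is not the library's implementation. The names SurrogateSafeSplitter and Split are hypothetical, and the limit of 67 chars per part is an assumption taken from the dump above, where the first part holds 67 chars.

```csharp
using System;
using System.Collections.Generic;

public static class SurrogateSafeSplitter
{
    // Splits text into parts of at most maxUnits UTF-16 chars without
    // separating a high surrogate from the low surrogate that follows it.
    public static List<string> Split(string text, int maxUnits)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        // At least 2 chars are needed so that a whole pair fits into one part.
        if (maxUnits < 2) throw new ArgumentOutOfRangeException(nameof(maxUnits));

        var parts = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int length = Math.Min(maxUnits, text.Length - start);
            int end = start + length;

            // If the cut would fall between the two halves of a pair,
            // end this part one char earlier and carry the pair over.
            if (end < text.Length
                && char.IsHighSurrogate(text[end - 1])
                && char.IsLowSurrogate(text[end]))
            {
                length--;
            }

            parts.Add(text.Substring(start, length));
            start += length;
        }
        return parts;
    }
}
```

Calling Split(text, 67) on the emoji text above would end the first part after 66 chars, so the pair at positions 66 and 67 moves to the second part as a whole. The trade-off is that a part can be one char shorter than the limit.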

Does anyone have an idea how this can be fixed?

Thanks and best regards,
Stefan

Re: Multi-part UCS-2 message with surrogate pairs not working

Posted: Tue Feb 28, 2017 9:30 pm
by alt
Thank you for reporting this bug to me.
I'll try to find a solution for this issue.

Re: Multi-part UCS-2 message with surrogate pairs not working

Posted: Tue Feb 28, 2017 11:40 pm
by striemer
Thanks very much for the quick response, much appreciated.

Best regards,
Stefan

Re: Multi-part UCS-2 message with surrogate pairs not working

Posted: Thu Mar 09, 2017 11:40 pm
by striemer
Hi, do you have a rough idea when a fix for this issue will be available? We are getting close to releasing our SMS messaging functionality and would need the fix before the release date.

Thanks and best regards,
Stefan

Re: Multi-part UCS-2 message with surrogate pairs not working

Posted: Tue Mar 14, 2017 9:10 pm
by alt
Hi Stefan,

I have fixed this issue in version 1.1.29.1.

Re: Multi-part UCS-2 message with surrogate pairs not working

Posted: Wed Mar 15, 2017 5:58 am
by striemer
Awesome, thanks very much for the quick turnaround.
I have tested it and can confirm it works as expected.

Best regards,
Stefan