Multi-part UCS-2 message with surrogate pairs not working

.NET library for SMPP protocol
striemer
Posts: 6
Joined: Mon Feb 27, 2017 7:03 am

Post by striemer » Mon Feb 27, 2017 8:02 am

Hi All,
While testing our SMPP functionality I sent a text message containing all emojis (code points 1F601-1F64F) using UCS-2 encoding.
As emojis are outside the 16-bit range, they are encoded as surrogate pairs (i.e. two 16-bit chars per emoji).
The first surrogate char is 0xD83D and the second one is in the range 0xDE00-0xDE4F.
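
For reference, the pair for a given code point can be derived with the standard UTF-16 arithmetic. This is only a minimal sketch using plain .NET; `SurrogateDemo` and `ToSurrogates` are names made up for illustration and are not part of the SMPP library.

```csharp
using System;

static class SurrogateDemo
{
    // Hypothetical helper: splits a code point above U+FFFF into its
    // UTF-16 high and low surrogate.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));    // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return (high, low);
    }

    static void Main()
    {
        var (high, low) = ToSurrogates(0x1F601);
        Console.WriteLine($"{(int)high:X4} {(int)low:X4}"); // D83D DE01

        // The built-in conversion yields the same two chars.
        string s = char.ConvertFromUtf32(0x1F601);
        Console.WriteLine(s.Length);                        // 2
    }
}
```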

When looking at the resulting SMS, one of the emojis is displayed as an unknown symbol. This seems to happen because the surrogate pair is split in the middle, leaving two invalid Unicode characters: the last char of the first message part and the first char of the second part are each half of a pair, so they are replaced with the replacement char '\uFFFD' (65533 in decimal).

Code:

ISubmitSmBuilder builder = SMS.ForSubmit()
    .From(sourceAddress, sourceTon, sourceNpi)
    .To(destinationAddress, destinationTon, destinationNpi)
    .Coding(dataCoding)
    .Text(text)
    .ExpireIn(GatewayConfig.SmsExpiry);

Where dataCoding is DataCodings.UCS2 and text is a UTF-16 string.

First message part as a char array:
[0] 55357 '☐' char
[1] 56832 '☐' char
[2] 55357 '☐' char
[3] 56833 '☐' char
[4] 55357 '☐' char
......

[61] 56862 '☐' char
[62] 55357 '☐' char
[63] 56863 '☐' char
[64] 55357 '☐' char
[65] 56864 '☐' char
[66] 65533 '�' char <= Was first half of surrogate pair, is now replacement char.

Second message part:
[0] 65533 '�' char <= Was second half of surrogate pair, is now replacement char.
[1] 55357 '☐' char
[2] 56866 '☐' char
[3] 55357 '☐' char
[4] 56867 '☐' char

Original message:

[60] 55357 '☐' char
[61] 56862 '☐' char
[62] 55357 '☐' char
[63] 56863 '☐' char
[64] 55357 '☐' char
[65] 56864 '☐' char
[66] 55357 '☐' char <= This pair gets split
[67] 56865 '☐' char <=
[68] 55357 '☐' char
[69] 56866 '☐' char
[70] 55357 '☐' char

This is obviously an edge case, but it would be great if the splitting took surrogate pairs into account, especially as all emoji characters use surrogates.
Multi-part UCS-2 messages which don't use surrogate pairs work fine btw.
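
To illustrate the kind of splitting meant here, the following is a rough sketch that assumes 67 UTF-16 chars per part (the part length seen in the dump above). `SurrogateSafeSplitter` and `Split` are hypothetical names; this is not the library's own splitting code, only an example of moving the cut back by one char when it would land inside a pair.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SurrogateSafeSplitter
{
    // Hypothetical helper: splits text into chunks of at most maxChars
    // UTF-16 code units and never cuts between a high and a low surrogate.
    public static List<string> Split(string text, int maxChars = 67)
    {
        if (maxChars < 2) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var parts = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int length = Math.Min(maxChars, text.Length - start);
            int end = start + length;

            // If the cut would separate a pair, end this part one char earlier.
            if (end < text.Length
                && char.IsHighSurrogate(text[end - 1])
                && char.IsLowSurrogate(text[end]))
            {
                length--;
            }

            parts.Add(text.Substring(start, length));
            start += length;
        }
        return parts;
    }

    static void Main()
    {
        // 80 emojis, i.e. 160 UTF-16 chars.
        var sb = new StringBuilder();
        for (int cp = 0x1F600; cp <= 0x1F64F; cp++)
            sb.Append(char.ConvertFromUtf32(cp));

        foreach (string part in Split(sb.ToString()))
            Console.WriteLine(part.Length); // 66, 66, 28
    }
}
```

With this approach a part that would end on a high surrogate carries 66 chars instead of 67, so a long emoji-only text can need one more part than a plain 67-char split.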

Does anyone have an idea how this can be fixed?

Thanks and best regards,
Stefan
alt
Site Admin
Posts: 985
Joined: Tue Apr 25, 2006 9:45 am

Re: Multi-part UCS-2 message with surrogate pairs not workin

Post by alt » Tue Feb 28, 2017 9:30 pm

Thank you for reporting this bug to me.
I'll try to find a solution for this issue.

Re: Multi-part UCS-2 message with surrogate pairs not workin

Post by striemer » Tue Feb 28, 2017 11:40 pm

Thanks very much for the quick response, much appreciated.

Best regards,
Stefan

Re: Multi-part UCS-2 message with surrogate pairs not workin

Post by striemer » Thu Mar 09, 2017 11:40 pm

Hi, do you have a rough idea when a fix for this issue will be available? We are getting close to releasing our SMS messaging functionality and would need a fix for this before the release date.

Thanks and best regards,
Stefan

Re: Multi-part UCS-2 message with surrogate pairs not workin

Post by alt » Tue Mar 14, 2017 9:10 pm

Hi Stefan

I have fixed this issue in version 1.1.29.1.

Re: Multi-part UCS-2 message with surrogate pairs not workin

Post by striemer » Wed Mar 15, 2017 5:58 am

Awesome, thanks very much for the quick turnaround.
I have tested it and can confirm it works as expected.

Best regards,
Stefan