Unicode non-breaking space is not considered white space?
Can anyone confirm that the Unicode non-breaking space (\u00A0) is not considered "whitespace" by Apex and is not detected by trim(), deleteWhitespace(), or a regex? The regex result surprises me, since I thought \s was supposed to include non-breaking spaces. Of the methods below, only replaceAll with the character code works.
    // Build a string that starts with a non-breaking space (U+00A0)
    String x = '\\u00A0' + 'Test';
    String y = x.unescapeUnicode();
    // Each line tries a different way of removing the leading character
    system.debug('### y trim length: ' + y.trim().length());
    system.debug('### y deleteWhitespace length: ' + y.deleteWhitespace().length());
    system.debug('### y replaceall regex length: ' + y.replaceAll('\\s', '').length());
    system.debug('### y replaceall unicode length: ' + y.replaceAll('\\u00A0', '').length());
The non-breaking space is not whitespace, according to Java. Apex Code follows the same rules as the Java Pattern class, so \u00A0 is handled the same way. Pattern specifies \s as follows:
    \s    A whitespace character: [ \t\n\x0B\f\r]

Where " " is 0x20, \t is 0x09, \n is 0x0A, \x0B is 0x0B, \f is 0x0C, and \r is 0x0D. No other characters are defined as whitespace, despite Unicode having a number of them.
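
For what it's worth, the same behaviour can be reproduced in plain Java, which is where the rule comes from. The sketch below is only an illustration: the class name NbspDemo and its two helper methods are made up for this example and are not part of any library. It shows that the default \s class leaves U+00A0 alone, and that naming the character explicitly in a character class removes it along with ordinary whitespace.

```java
// Hypothetical demo class; the names here are assumptions for illustration only.
final class NbspDemo {
    // U+00A0 NO-BREAK SPACE, written as an escape so it is visible in source.
    static final String NBSP = "\u00A0";

    /** Removes only what the default \s class matches: [ \t\n\x0B\f\r]. */
    static String stripDefaultWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    /** Removes default whitespace plus U+00A0 by listing it explicitly. */
    static String stripWhitespaceAndNbsp(String s) {
        return s.replaceAll("[\\s\\u00A0]", "");
    }

    public static void main(String[] args) {
        String y = NBSP + "Test";
        System.out.println(y.trim().length());                    // 5: trim() keeps U+00A0
        System.out.println(Character.isWhitespace('\u00A0'));     // false
        System.out.println(stripDefaultWhitespace(y).length());   // 5: \s does not match it
        System.out.println(stripWhitespaceAndNbsp(y).length());   // 4: explicit class does
    }
}
```

In Apex, the closest equivalent would presumably be passing the same pattern to replaceAll, e.g. y.replaceAll('[\\s\\u00A0]', ''). The question already shows that matching \u00A0 by its character code works there, but the combined character class is an assumption on my part rather than something tested in Apex.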