* Added a new function ast_utf8_replace_invalid_chars() to
utf8.c that copies a string replacing any invalid UTF-8
sequences with the Unicode specified U+FFFD replacement
character. For example: "abc\xffdef" becomes "abc\uFFFDdef".
Any UTF-8 compliant implementation will show that character
as a � character.
* Updated res_pjsip:set_id_from_hdr() to use
ast_utf8_replace_invalid_chars and print a warning if any
invalid sequences were found during the copy.
* Updated stasis_channels:ast_channel_publish_varset to use
ast_utf8_replace_invalid_chars and print a warning if any
invalid sequences were found during the copy.
ASTERISK-27830
Change-Id: I4ffbdb19c80bf0efc675d40078a3ca4f85c567d8
There are various places in Asterisk - specifically in regards to
database integration - where having some kind of UTF-8 validation would
be beneficial. This patch adds:
* Functions to validate that a given string contains only valid UTF-8
sequences.
* A function to copy a string (similar to ast_copy_string) stopping when
an invalid UTF-8 sequence is encountered.
* A UTF-8 validator that allows for progressive validation.
All of this is based on the excellent UTF-8 decoder by Björn Höhrmann.
More information is available here:
https://bjoern.hoehrmann.de/utf-8/decoder/dfa/
The API was written in such a way that should allow us to replace the
implementation later should we determine that we need something more
comprehensive.
Change-Id: I3555d787a79e7c780a7800cd26e0b5056368abf9