Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?
This has to do with how the String type works in Swift, and how the contains(_:) method works.
The "👩‍👩‍👧‍👦" is what's known as an emoji sequence (more precisely, an emoji ZWJ sequence), which is rendered as one visible character in a string. The sequence can be viewed as a series of Character values, and at the same time as a series of UnicodeScalar values.
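If you check the character count of the string in Swift 3, you'll see that it is made up of four characters, while the Unicode scalar count gives a different result (newer Swift versions use updated grapheme-breaking rules and report a single character, but the scalar count is unchanged):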
print("👩‍👩‍👧‍👦".characters.count) // 4
print("👩‍👩‍👧‍👦".unicodeScalars.count) // 7
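The scalar count is the more stable of the two numbers, since the Character count depends on the grapheme-breaking rules of the Swift version in use. As a minimal sketch, the string can be spelled out with Unicode escapes so the seven scalars are visible; the names `family`, `zwj`, `joinerCount`, and `emojiCount` are chosen here for illustration and do not come from the original example.

```swift
// The family emoji written scalar by scalar (illustrative names).
let zwj: UnicodeScalar = "\u{200D}"   // zero-width joiner
let family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Collect the scalars and separate joiners from the emoji themselves.
let scalars = Array(family.unicodeScalars)
let joinerCount = scalars.filter { $0 == zwj }.count
let emojiCount = scalars.count - joinerCount

print(scalars.count)  // 7
print(joinerCount)    // 3
print(emojiCount)     // 4

// Hex code points, in the same format used elsewhere in this article.
let hex = scalars.map { String($0.value, radix: 16) }
print(hex)  // ["1f469", "200d", "1f469", "200d", "1f467", "200d", "1f466"]
```

Four emoji plus the three joiners between them account for the seven scalars.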
Now, if you iterate over the characters and print them, you'll see what look like normal characters, but in fact the first three characters each contain both an emoji and a zero-width joiner in their UnicodeScalarView:
for char in "👩‍👩‍👧‍👦".characters {
    print(char)
    let scalars = String(char).unicodeScalars.map({ String($0.value, radix: 16) })
    print(scalars)
}
// 👩‍
// ["1f469", "200d"]
// 👩‍
// ["1f469", "200d"]
// 👧‍
// ["1f467", "200d"]
// 👦
// ["1f466"]
As you can see, only the last character does not contain a zero-width joiner, so contains(_:) works as you'd expect only for that character (👦). A plain 👩 or 👧 carries no joiner, so it is not equal to any of the first three characters, and the method won't find a match for any but the last one.
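If the goal is to ask whether a particular emoji appears anywhere inside the sequence, one option is to compare at the Unicode scalar level, where the joiners are separate elements and cannot get attached to the emoji. The following is a sketch under that assumption rather than a recommended general-purpose solution; `familyEmoji`, `woman`, `girl`, and `man` are illustrative names.

```swift
// Same family sequence, written with escapes so the joiners are explicit.
let familyEmoji = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Single scalars to look for (illustrative choices).
let woman: UnicodeScalar = "\u{1F469}"
let girl: UnicodeScalar = "\u{1F467}"
let man: UnicodeScalar = "\u{1F468}"   // not part of this sequence

// contains(_:) on the scalar view compares one scalar at a time,
// so a zero-width joiner next to an emoji does not affect the match.
let hasWoman = familyEmoji.unicodeScalars.contains(woman)
let hasGirl = familyEmoji.unicodeScalars.contains(girl)
let hasMan = familyEmoji.unicodeScalars.contains(man)

print(hasWoman, hasGirl, hasMan)  // true true false
```

This only works for emoji that consist of a single scalar; an emoji that is itself built from several scalars would need a subsequence search instead.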