Recently, we published an article about emoji, and shortly afterwards went down a rabbit hole on the subject in response to links and information sent to us by members of our community.
So, equipped with even more fascinating facts and useful information about emoji, we decided to revisit the subject, this time focusing on actionable tips for working with emoji in your development workflow.
1. Emoji Are More Than a Single Character
At first glance, emoji (that’s right, emoji not emojis) might seem like simple characters, but they often consist of multiple code points. For instance, the family emoji 👨👩👦 is actually a sequence of several characters combined using Zero Width Joiners (ZWJs).
JavaScript strings use UTF-16 encoding, and .length counts UTF-16 code units, so you can’t rely on it to count emoji correctly.
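const emoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}"; // family: man + ZWJ + woman + ZWJ + boy
console.log(emoji.length); // 8 UTF-16 code units, not 1!
console.log([...emoji].length); // 5 code points (3 emoji + 2 ZWJs), still not 1
// Correct way to count emoji: segment the string into grapheme clusters
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...segmenter.segment(emoji)].length); // 1
Using the spread operator ([...emoji]) or Array.from(emoji) iterates by code point, so surrogate pairs stay intact, but a ZWJ sequence still counts as several items. Intl.Segmenter groups code points into grapheme clusters, which is what a reader perceives as a single emoji.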
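One practical consequence is truncation: cutting a string by code unit can slice an emoji in half. The following is a minimal sketch, assuming the runtime provides Intl.Segmenter; truncateGraphemes is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: keep at most `max` user-perceived characters.
function truncateGraphemes(text, max, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let result = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count === max) break;
    result += segment;
    count += 1;
  }
  return result;
}

// Family emoji written with escapes so the ZWJs are explicit.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}";
const greeting = `Hi ${family}!`;

// slice() works on code units and leaves a lone high surrogate behind.
console.log(greeting.slice(0, 4) === "Hi \uD83D"); // true
// The grapheme-aware helper keeps the whole family sequence.
console.log(truncateGraphemes(greeting, 4) === `Hi ${family}`); // true
```

Building the segmenter once and reusing it would be cheaper in a hot path; it is created inside the function here only to keep the sketch self-contained.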
2. Sorting Strings with Emoji Can Be Tricky
Sorting strings with emoji might lead to unexpected results because the default sort() compares strings by their UTF-16 code unit values, not by any visual or semantic grouping.
For example:
const emoji = ["🌍", "🌟", "❤️"];
emoji.sort();
console.log(emoji); // ["❤️", "🌍", "🌟"]
The sorting doesn’t feel intuitive because it follows code unit values, not how we visually perceive emoji. To handle this, you can use the localeCompare method, which compares strings using the runtime’s locale-aware collation rules:
emoji.sort((a, b) => a.localeCompare(b));
console.log(emoji); // Order now follows the collation data shipped with the runtime
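To see why the default sort puts the heart first, it helps to look at the UTF-16 code units that sort() actually compares. This is a small sketch; codeUnits is a hypothetical helper, and the emoji are written as escapes so the exact code points are unambiguous.

```javascript
// Hypothetical helper: list a string's UTF-16 code units as hex.
function codeUnits(str) {
  return Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16)
  );
}

// 🌍 (U+1F30D), 🌟 (U+1F31F), ❤️ (U+2764 followed by variation selector U+FE0F)
const samples = ["\u{1F30D}", "\u{1F31F}", "\u2764\uFE0F"];

for (const s of samples) {
  console.log(s, codeUnits(s).join(" "));
}
// 🌍 d83c df0d
// 🌟 d83c df1f
// ❤️ 2764 fe0f
```

The heart starts with 0x2764, which is lower than the 0xD83C high surrogate that opens the other two, so it sorts first; the globe and the star are then separated by their second code unit.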
3. Detecting Emoji in Text
If you’re building an emoji search or filtering feature, you need to identify emoji in a string. Thankfully, Unicode defines character properties for emoji, and JavaScript regular expressions can match on them.
Here’s a regex pattern to match most emoji:
const emojiRegex = /\p{Emoji}/gu;
const text = "Hello 😊, world 🌍!";
const emoji = text.match(emojiRegex);
console.log(emoji); // ["😊", "🌍"]
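The \p{Emoji} syntax requires the u (Unicode) flag and matches any code point that has the Unicode Emoji property. Be aware that this property also covers the digits 0 to 9, # and *, and that each match is a single code point, so multi-code-point emoji come back in pieces.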
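If digits and symbols such as # turn up in your matches, one alternative worth trying is the Extended_Pictographic property. The sketch below compares the two on a sample string invented for illustration; Extended_Pictographic is still an approximation, since it also covers some symbols that are not usually thought of as emoji.

```javascript
const message = "Room 42: \u{1F60A} #1"; // "Room 42: 😊 #1"

// \p{Emoji} also matches ASCII digits and "#".
const byEmoji = message.match(/\p{Emoji}/gu);
console.log(byEmoji.length); // 5 matches: "4", "2", "😊", "#", "1"

// \p{Extended_Pictographic} skips the digits and "#".
const byPictographic = message.match(/\p{Extended_Pictographic}/gu);
console.log(byPictographic.length); // 1 match: "😊"
```

Both patterns still match one code point at a time, so a ZWJ sequence or a skin-toned emoji will not be returned as a single match.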
4. Emoji Have Variations
Some emoji come with multiple variations. For instance, the “thumbs up” emoji (👍) has skin tone modifiers, such as 👍🏿 and other skin tone variations.
To handle this, you can reduce emoji to their base form, or detect their variations, by matching the five skin tone modifier code points (U+1F3FB to U+1F3FF), which the Emoji_Modifier Unicode property covers.
Example:
const tonedEmoji = "👍🏽";
// No normalize() call is needed: the skin tone modifier is already a separate code point
const normalised = tonedEmoji.replace(/\p{Emoji_Modifier}/gu, "");
console.log(normalised); // 👍
This approach strips skin tone modifiers and returns the base emoji.
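Detection works the same way as stripping. Here is a minimal sketch that reports which skin tones appear in a string; skinTonesIn and the SKIN_TONE_NAMES map are hypothetical names introduced for illustration, with labels taken from the common short names for the five modifiers.

```javascript
// Labels for the five skin tone modifiers, U+1F3FB to U+1F3FF.
const SKIN_TONE_NAMES = new Map([
  [0x1f3fb, "light"],
  [0x1f3fc, "medium-light"],
  [0x1f3fd, "medium"],
  [0x1f3fe, "medium-dark"],
  [0x1f3ff, "dark"],
]);

// Hypothetical helper: list the skin tones found in a string, in order.
function skinTonesIn(text) {
  return Array.from(text.matchAll(/\p{Emoji_Modifier}/gu), (match) =>
    SKIN_TONE_NAMES.get(match[0].codePointAt(0))
  );
}

console.log(skinTonesIn("\u{1F44D}\u{1F3FD}")); // thumbs up + medium modifier: one entry, "medium"
console.log(skinTonesIn("\u{1F44D}")); // plain thumbs up: empty array
```

Returning the names in order of appearance keeps the helper usable for strings containing several toned emoji.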
5. Emoji in Domain Names and URLs
Did you know that emoji can be used in domain names?
For instance, 🌟.com is a valid domain… in theory.
In reality, there are 11 top-level domains that can include emoji (.cf, .fm, .ga and more), as well as 12 second-level domains (radio.fm, co.il and others).
However, domain names using emoji are converted to Punycode, a special encoding format.
Converting Emoji to Punycode:
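Use Node.js’s built-in punycode module to handle emoji-based domains. Note that Node.js has deprecated this built-in module and recommends the userland punycode package, which exposes the same toASCII and toUnicode functions.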
const punycode = require("punycode");
const domain = "🌟.com";
const punycodeDomain = punycode.toASCII(domain);
console.log(punycodeDomain); // xn--ch8h.com
To decode back:
const decodedDomain = punycode.toUnicode(punycodeDomain);
console.log(decodedDomain); // 🌟.com
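For domain names specifically, Node.js also ships conversion functions in its url module, which avoids a separate Punycode dependency. The following is a sketch assuming a Node.js runtime; the star is written as an escape so the code point (U+1F31F) is unambiguous.

```javascript
const { domainToASCII, domainToUnicode } = require("node:url");

const starDomain = "\u{1F31F}.com"; // "🌟.com"

const asciiDomain = domainToASCII(starDomain);
console.log(asciiDomain); // xn--ch8h.com

const roundTripped = domainToUnicode(asciiDomain);
console.log(roundTripped === starDomain); // true

// The WHATWG URL parser applies the same conversion to hostnames.
const parsedHost = new URL(`https://${starDomain}/`).hostname;
console.log(parsedHost); // xn--ch8h.com
```

The URL constructor is also available in browsers, so the last step is the portable option when the code does not run on Node.js.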
While we don’t recommend actually registering one, it’s a fun fact nonetheless.
Conclusion
Emoji are more complex than they seem, with unique challenges and possibilities for developers.
From handling multi-code-point characters to supporting emoji variations and Punycode domains, there’s a lot more to emoji than you might think.