Homograph Attacks: The Lookalike-Domain Trick

Some characters from non-Latin scripts look identical to ordinary letters. Attackers use them to register domains that read like a trusted brand but point somewhere else entirely. Here is how the trick works, and how to catch it.

In short

Domain names can include letters from many scripts, not just the English alphabet. A few of those letters are visually identical to Latin ones, so a domain like аpple.com (with a Cyrillic first letter) can look exactly like the real thing while encoding to a completely different address. Your browser converts these internationalized names into an ASCII form called punycode, which starts with xn--, and that encoded form is the honest version to check. Plenty of internationalized domains are perfectly legitimate, so a mixed-script name is a reason to look closer, not a verdict on its own.

Domains were not always limited to English letters

The original domain name system only allowed a narrow set of ASCII characters: the letters a through z, the digits 0 through 9, and the hyphen. That left out most of the world's languages. To fix this, the internet adopted internationalized domain names, or IDNs, which let people register names in scripts like Cyrillic, Greek, Arabic, Chinese, and many others. This is genuinely useful: a business in Athens or Moscow can have a web address that reads naturally in its own alphabet.

The catch is that some characters across different scripts are drawn almost identically. The Latin lowercase a and the Cyrillic lowercase а (the character at code point U+0430) look the same in most fonts, but to a computer they are entirely separate characters. The same goes for the Greek omicron and the Latin o, or the Cyrillic е and the Latin e. Characters that look alike but are not the same are called confusables or homoglyphs, and they are the raw material of a homograph attack.
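A quick way to see the difference is to ask for each character's code point and Unicode name. The sketch below uses Python's standard unicodedata module. The helper name describe is made up for this illustration, and the two characters are written as escapes so they cannot be confused in the source itself.

```python
import unicodedata

# Written as escapes so the two look-alike letters stay distinguishable here.
latin_a = "\u0061"     # the ordinary Latin letter
cyrillic_a = "\u0430"  # the Cyrillic letter that looks the same on screen


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


for ch in (latin_a, cyrillic_a):
    print(describe(ch))

# Identical on screen, but not equal to a computer.
print(latin_a == cyrillic_a)
```

Running describe over a whole label prints one pair per character, which makes a single swapped letter easy to pick out.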

How attackers turn lookalikes into fake domains

The idea is simple. An attacker takes the name of a real brand and swaps one or more letters for a confusable from another script, then registers that domain. To a person glancing at the address bar or an email link, the result can be indistinguishable from the genuine site. Click through, and you may land on a convincing copy of a login page that quietly harvests whatever you type.

A classic demonstration replaced every letter of a well-known brand with Cyrillic lookalikes, producing a domain that rendered identically to the original in several browsers of the day. The visible text matched, but the underlying characters, and therefore the actual destination, did not. Because the swap can involve a single character, these names are easy to miss and easy to mass-produce.

It is worth being clear about the limits here. Browsers and registries have added defenses over the years, and many lookalike registrations get caught. But the technique still surfaces in phishing campaigns, so it remains worth understanding rather than dismissing.

Punycode: the honest ASCII form

Underneath, the domain name system still speaks only ASCII. So whenever a name contains non-ASCII characters, it gets translated into a plain-ASCII representation called punycode, using an algorithm defined in RFC 3492. The encoded label always begins with the prefix xn-- and is known as the A-label. The human-readable version you see on screen is the U-label.

For example, a domain that displays as аpple.com with a Cyrillic first letter does not travel the network as those Unicode characters. It is encoded to xn--pple-43d.com before any lookup happens. That xn-- form is the key to spotting trouble: if a domain looks like a familiar brand but its encoded form is an unexpected xn-- string, that is a strong reason to slow down. A genuine all-ASCII brand domain never needs a punycode form at all.
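The encoding step can be reproduced with Python's built-in idna codec, as in the minimal sketch below. The function names to_a_label and has_punycode_label are made up for this illustration. One assumption to keep in mind: the built-in codec follows the older IDNA 2003 rules, so it may differ from what a modern browser does for some edge-case names.

```python
def to_a_label(domain):
    """Encode a domain to its ASCII form; non-ASCII labels become xn-- labels."""
    # Python's built-in "idna" codec (IDNA 2003 rules) handles each label in turn.
    return domain.encode("idna").decode("ascii")


def has_punycode_label(domain):
    """True if any label of the encoded domain starts with the xn-- prefix."""
    return any(label.startswith("xn--") for label in to_a_label(domain).split("."))


spoof = "\u0430pple.com"  # Cyrillic first letter, written as an escape

print(to_a_label(spoof))        # xn--pple-43d.com
print(to_a_label("apple.com"))  # apple.com (all ASCII, so nothing changes)
print(has_punycode_label(spoof))
```

The all-ASCII name passes through unchanged, which is the property the paragraph above relies on.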

This is also why browsers sometimes show the punycode form instead of the Unicode one. The conversion itself is not the attack; every lookup uses the ASCII form regardless. The safeguard is in the display: when a name looks suspicious, for example because it mixes scripts, many browsers show the xn-- form in the address bar, giving you a way to see the real, unambiguous identity of a name that might otherwise be disguised.

How the standards reason about confusables

The Unicode Consortium maintains a technical standard, Unicode Technical Standard #39 (UTS 39, "Unicode Security Mechanisms"), that deals directly with this problem. One of its core ideas is the skeleton: a way of reducing a string to a canonical form by mapping each confusable character to a representative one. If two different strings collapse to the same skeleton, they are confusable with each other. UTS 39 also describes mixed-script detection, the practice of flagging a single label that draws characters from more than one script, which is unusual for a legitimate word in most languages.
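Both ideas can be sketched in a few lines of Python. This is a simplified illustration, not the UTS 39 algorithm itself, and it rests on two loud assumptions. The CONFUSABLES table is a tiny hand-picked sample, whereas the standard publishes a much larger data file. The script of a letter is approximated by the first word of its Unicode name, because Python's standard library does not expose the Unicode script property directly.

```python
import unicodedata

# Hand-picked sample only (assumption); real tools load the UTS 39 data file.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text):
    """Simplified skeleton: normalize to NFD, map confusables, normalize again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def scripts(label):
    """Approximate the scripts in a label from the first word of each letter's name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label):
    """True if the letters of a label appear to come from more than one script."""
    return len(scripts(label)) > 1


spoof = "\u0430pple"  # Cyrillic first letter
print(skeleton(spoof) == skeleton("apple"))  # same skeleton, so confusable
print(sorted(scripts(spoof)))
print(is_mixed_script(spoof), is_mixed_script("apple"))
```

The name-based script guess is crude. Some writing systems, Japanese for example, legitimately combine several scripts in one word, so a production check would need the real script data and the standard's exceptions.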

Security tools, including the lookalike detection built into our own email and domain checks, lean on these ideas. They normalize a candidate domain, compare its skeleton against a list of known brands, and surface anything that resolves to a suspiciously similar shape. The standard gives this a documented, repeatable basis rather than a hunch.
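The general shape of that pipeline might look like the sketch below. It is not a description of any particular product's implementation. The brand list, the small confusable table, and the function names to_unicode and shadowed_brand are all hypothetical and exist only for this illustration.

```python
import unicodedata

KNOWN_BRANDS = {"apple.com", "example.com"}  # hypothetical watch list

# Tiny illustrative mapping (assumption); real tools use the full UTS 39 data.
CONFUSABLE_TABLE = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u03bf": "o",  # Greek omicron
})


def to_unicode(domain):
    """Lowercase a domain and decode any xn-- labels back to their real characters."""
    labels = []
    for label in domain.lower().split("."):
        if label.startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


def shadowed_brand(domain):
    """Return the known brand a domain imitates, or None if there is no match."""
    normalized = unicodedata.normalize("NFKC", to_unicode(domain))
    if normalized in KNOWN_BRANDS:
        return None  # it is the brand itself, not a lookalike
    folded = normalized.translate(CONFUSABLE_TABLE)
    return folded if folded in KNOWN_BRANDS else None


print(shadowed_brand("xn--pple-43d.com"))  # apple.com
print(shadowed_brand("apple.com"))         # None
```

A match here is a prompt to verify, not proof of fraud. A legitimate internationalized name that resembles nothing on the list passes through with no flag at all.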

Warning signs worth a closer look

None of the following proves a domain is malicious, but each is a reason to verify before you trust it:

Practical defenses

You do not need special software to defend against this. A few habits go a long way:

Keep it honest: not every IDN is an attack

This is the part that is easy to get wrong. Internationalized domains exist because the web is global, and the overwhelming majority of them are completely legitimate. A bakery in Berlin, a news site in Seoul, or a shop in Cairo may all have perfectly genuine non-Latin domains. Seeing a non-ASCII character or an xn-- form does not mean fraud. Mixed script and unexpected punycode are flags that say "verify this", not conclusions that say "block this". The skill is matching the script to the context: a non-Latin domain for a local business in that language is ordinary, while the same trick aimed at a global ASCII brand is the one to question.

Inspect a domain for lookalikes

Paste any domain to see its punycode (xn--) form, spot mixed scripts and confusable characters, and check whether it shadows a known brand. Runs entirely in your browser.


This guide is educational and reflects publicly available information about internationalized domain names, the Punycode standard, and Unicode's confusable-character guidance. It is not legal advice or a recommendation about any specific domain, email, person, or decision. Security and access decisions should follow your organization's policies and applicable law.