Punycode & Homograph Inspector

Paste a domain in either form. See its Unicode and xn-- punycode (A-label) versions side by side, and get a homograph risk read: labels that mix scripts, Cyrillic or Greek letters posing as Latin, and invisible zero-width characters. Everything is converted in your browser.

How punycode works, and why lookalike domains fool people

Internationalized domain names

Hostnames in the DNS are conventionally restricted to a small ASCII alphabet: letters, digits, and the hyphen. To let domains use accents and non-Latin scripts, an internationalized domain name (IDN) encodes each label that contains non-ASCII characters into an ASCII form that starts with the prefix xn--; labels that are already plain ASCII are left as they are. Your browser usually shows the readable Unicode version in the address bar but resolves the xn-- version on the wire. The two forms are meant to be exactly equivalent.
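As a rough illustration of the label-by-label mapping described above, here is a minimal Python sketch. It assumes Python's built-in punycode codec, which implements RFC 3492 only. The helper names (to_a_label, to_u_label, domain_to_ascii, domain_to_unicode) are hypothetical, and apart from lowercasing and NFC normalization the sketch skips the mapping and validation rules that full IDNA processing applies.

```python
import unicodedata

ACE_PREFIX = "xn--"

def to_a_label(label: str) -> str:
    """Return the ASCII (A-label) form of a single label."""
    label = unicodedata.normalize("NFC", label.lower())
    if label.isascii():
        return label  # plain ASCII labels are left alone
    # The codec emits the raw Punycode; the xn-- prefix is added here.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_u_label(label: str) -> str:
    """Return the Unicode (U-label) form of a single label."""
    if label.lower().startswith(ACE_PREFIX):
        raw = label[len(ACE_PREFIX):].lower()
        return raw.encode("ascii").decode("punycode")
    return label

def domain_to_ascii(domain: str) -> str:
    # Each dot-separated label is converted independently.
    return ".".join(to_a_label(part) for part in domain.split("."))

def domain_to_unicode(domain: str) -> str:
    return ".".join(to_u_label(part) for part in domain.split("."))

print(domain_to_ascii("m\u00fcnchen.de"))  # xn--mnchen-3ya.de
print(domain_to_unicode("xn--mnchen-3ya.de") == "m\u00fcnchen.de")  # True
```

Only the label containing ü changes; the plain ASCII label de passes through untouched.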

Punycode, the encoding

Punycode (RFC 3492) is the algorithm that turns a Unicode label into that xn-- form and back. It copies the plain ASCII characters first, then appends a compact, reversible encoding of which non-ASCII characters were removed and where they go. So münchen becomes xn--mnchen-3ya: the prefix, the ASCII letters mnchen, a hyphen as separator, then 3ya, which encodes the ü and its position. The encoding itself is lossless in both directions, which is why this tool can decode as well as encode.
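The two-part structure of the münchen example can be seen directly with Python's built-in punycode codec. This is a small sketch, not the tool's own code; punycode_parts is a hypothetical helper that splits the raw output at the last hyphen.

```python
from typing import Tuple

def punycode_parts(label: str) -> Tuple[str, str]:
    """Split the raw Punycode of a label into (basic, encoded) parts."""
    raw = label.encode("punycode").decode("ascii")
    # The last hyphen separates the copied ASCII characters from the
    # encoded part. If the label has no ASCII characters there is no
    # separator, and rpartition leaves the basic part empty.
    basic, _sep, encoded = raw.rpartition("-")
    return basic, encoded

basic, encoded = punycode_parts("m\u00fcnchen")
print(basic)    # mnchen
print(encoded)  # 3ya

# Decoding the raw form restores the original label exactly.
restored = "mnchen-3ya".encode("ascii").decode("punycode")
print(restored == "m\u00fcnchen")  # True
```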

Homograph and lookalike attacks

Many scripts contain letters that look identical, or nearly so, to Latin ones. Cyrillic small a (U+0430) is visually the same as Latin a (U+0061) in most fonts, and Greek small omicron (U+03BF) looks like Latin o (U+006F). An attacker can register a domain where one Latin letter is swapped for a confusable from another script, so pаypal.com (with a Cyrillic a) reads as paypal.com but resolves somewhere else entirely. Invisible zero-width characters, such as the zero-width space (U+200B), can be slipped in too. These are classic phishing and brand-impersonation tricks.
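One way to make such a swap visible is to list every code point in a label with its Unicode name. The sketch below uses Python's standard unicodedata module; inspect_label and the ZERO_WIDTH set are hypothetical names chosen for illustration, and the set is a small hand-picked sample rather than a complete list of invisible characters.

```python
import unicodedata

# A few common zero-width characters (an illustrative, incomplete set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def inspect_label(label: str):
    """Return (code point, Unicode name, is_zero_width) for each character."""
    rows = []
    for ch in label:
        name = unicodedata.name(ch, "UNNAMED")
        rows.append((f"U+{ord(ch):04X}", name, ch in ZERO_WIDTH))
    return rows

# "paypal" with the first a replaced by Cyrillic small a (U+0430).
for row in inspect_label("p\u0430ypal"):
    print(row)
# The second row reads ('U+0430', 'CYRILLIC SMALL LETTER A', False),
# while every other row is a LATIN SMALL LETTER.
```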

Mixed-script is a strong signal, not a verdict

A single label that combines scripts which are commonly confused (Latin plus Cyrillic, for example) is a strong warning sign, and modern browsers will often display the raw xn-- form rather than render such a name. Unicode Technical Standard #39 (UTS #39, Unicode Security Mechanisms) defines the confusable and mixed-script detection this tool is modeled on, including the idea of a "skeleton" that maps each confusable to a single representative, so two lookalike strings collapse to the same skeleton. That said, mixed script is not proof of intent: some languages legitimately mix scripts, and a clean single-script label can still be a typosquat. Treat the flag as a reason to look closer, then verify the domain through DNS, registration data, and the surrounding context.
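The skeleton idea can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: the CONFUSABLES table is a tiny hand-picked subset invented for this example, whereas UTS #39 publishes the full mapping as a data file, and the skeleton function name is likewise an assumption.

```python
import unicodedata

# Hypothetical, hand-picked subset of confusable mappings.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(label: str) -> str:
    """Normalize, map each confusable to its representative, normalize again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def looks_like(a: str, b: str) -> bool:
    """Two labels are treated as confusable when their skeletons are equal."""
    return skeleton(a) == skeleton(b)

print(looks_like("p\u0430ypal", "paypal"))  # True
print(looks_like("paypa1", "paypal"))       # False (digit 1 is not in this tiny table)
```

A skeleton is only meant for comparison; it is not a display form and should not be shown to users as the "real" domain.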

What this tool does and does not do

It converts and inspects the string you paste. It does not query DNS, does not check whether the domain is registered, and does not compare against a brand list. The script classification uses a compact table of code point ranges covering the scripts most often abused, so a character from a rarer script may be reported only as "other". For a live lookup of a suspicious domain, use the DNS tool.
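A range-table classifier of the kind described here might look like the following Python sketch. The table contents, the names SCRIPT_RANGES, script_of, scripts_in, and is_mixed_script, and the block-level boundaries are all assumptions made for illustration; they approximate Unicode script assignments and are not the tool's actual table.

```python
# (first code point, last code point, script name); block-level approximations.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x00D6, "Latin"),
    (0x00D8, 0x00F6, "Latin"),
    (0x00F8, 0x024F, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x052F, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
]

def script_of(ch: str) -> str:
    """Classify one character; anything outside the table is 'Other'."""
    if ch in "0123456789-":
        return "Common"  # digits and hyphen belong to no single script
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Other"

def scripts_in(label: str) -> set:
    """Set of scripts used in a label, ignoring Common characters."""
    return {script_of(ch) for ch in label} - {"Common"}

def is_mixed_script(label: str) -> bool:
    return len(scripts_in(label)) > 1

print(sorted(scripts_in("p\u0430ypal")))  # ['Cyrillic', 'Latin']
print(is_mixed_script("paypal-2"))        # False
print(script_of("\u0d05"))                # Other (a Malayalam letter, not in the table)
```

This sketch flags any label with more than one script. A fuller check would allow combinations that are normal in some writing systems, such as Han with Hiragana and Katakana in Japanese.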