
IDN Spoofing

I don't understand how IDN Spoofing works. I see that the link text goes to http://www.paypа, but why that resolves so unusually is beyond me.

Safari properly renders the а as "a" in the link, but also does so in the URL. Why is Safari so lax when it comes to standard domain names, and where does the URL resolve? Reproducing the code here (Test Now - Left Click On This Link) results in similar (bad) behavior.

9 Responses to "IDN Spoofing"

  1. There's a little more info here.

    The crux of it is:

    The links are directed at "http://www.pа", which the browsers' punycode handlers render as

  2. I wish Apple would fix this; they've had plenty of notification. Safari also blindly accepts SSL transactions, displaying the bogus URI against an unmatched certificate, because it doesn't do any exception handling for this sort of thing.

    Running an openssl s_client on it clearly identifies the certificate as belonging to the (which you can resolve with whois belonging to Secunia).

    Basically the only way to safely conduct e-commerce in Safari is to paste every URL you click through to, picking a font that appropriately handles UTF-8 -- I use Lucida Sans Typewriter-9 (from Office 2k4) for and a paste of that URL produces ( with a in a different font -- note I could not paste this back from into this window due to the same improper handling)

    At least the Mozilla department has been checking code that supposedly handles this into their nightlies, while Apple sits on vulnerabilities until there are enough to warrant a Security Update. :-/

  3. D'oh! Try again:

    The basics:

    For a while now, domains have been able to contain non-ASCII letters (through some mangling called "punycode" that's rather unimportant). So these guys bought p[Cyrillic "a"] Thus, they can spoof, since the Cyrillic "a" looks just like a Latin "a".

    This is coming up now because most browsers finally started supporting this feature (called IDN).

    That's this thing in a nutshell. People are concerned because there are a lot of foreign letters that look just like English letters, and those could be used to create domains that look like the ones people expect.

  4. Avi: "People are concerned because there are a lot of foreign letters that look just like English letters"

    That one really made me laugh - it's not English letters versus foreign letters - and there are also not too many of them. As an example, take a moment to look closely at a German A-Umlaut (ä): you will notice the difference at first sight. In fact there are not too many non-Latin (for you, Avi: non-ASCII) letters that cannot be distinguished from Latin characters. FYI: German (and many other European languages) use Latin alphabets, albeit with extensions (such as French accents or German umlauts). For us, ASCII is just a subset, but our A is not different from yours.

    Regarding the Cyrillic alphabet: look here and you will find that the Cyrillic o is another candidate. A possible workaround would be to limit each domain to just one alphabet (i.e. if the domain is in Cyrillic letters, there is no reason to mix it with Latin equivalents). While that might not be technically feasible with the current solution, it would at least represent a logical approach.
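A minimal sketch of the "one alphabet per domain" idea, in Python. It is only a heuristic: the standard library has no direct script lookup, so the script is guessed from the first word of each character's Unicode name. The function names and the `example.com` hostnames are hypothetical, chosen for illustration.

```python
import unicodedata


def scripts_in_label(label):
    """Guess the scripts used in one label from Unicode character names.

    Heuristic: the first word of a letter's Unicode name ("LATIN",
    "CYRILLIC", ...) is treated as its script. Non-letters are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(hostname):
    """True if any dot-separated label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))


genuine = "example.com"
spoof = "ex\u0430mple.com"  # third letter is CYRILLIC SMALL LETTER A (U+0430)

print(is_mixed_script(genuine))  # False
print(is_mixed_script(spoof))    # True
```

The check is applied per label rather than per hostname, so an all-Cyrillic name under a Latin top-level domain is not flagged.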

  5. The problem can be seen by taking a look at the character encoded as &#1072; (decimal 1072, which is hexadecimal 430, i.e. U+0430) in the Unicode table.

    You can see in the image here that more than one character looks similar enough to the regular set of characters that are normally used. Though there aren't a lot, there are enough. The letters a, e, o, p, c, y and j (and, to a lesser extent, the letter i) can easily be confused with their Latin counterparts, and most website names contain at least one of these characters. (Perhaps it's not wise to have an image like this available, but anyone determined enough to do this will probably find another way.)
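To make the code points concrete, here is a short Python sketch that lists the Cyrillic letters named above next to the Latin letters they resemble. The `LOOKALIKES` table is an illustrative assumption built from that list, not a complete confusables table.

```python
import unicodedata

# Cyrillic letters from the comment above, paired with the Latin letters
# they resemble in many fonts. Illustrative only, not exhaustive.
LOOKALIKES = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0443": "y",
    "\u0458": "j",
    "\u0456": "i",
}

for cyrillic, latin in LOOKALIKES.items():
    print(
        f"U+{ord(cyrillic):04X} {unicodedata.name(cyrillic):<45}"
        f" resembles U+{ord(latin):04X} {latin!r}"
    )

# &#1072; and U+0430 are the same code point written in decimal and hex.
print(ord("\u0430") == 1072 == 0x430)  # True
```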

  6. It's OK that the glyphs for some codepoints look alike. There are ways of dealing with that, namely, nameprep. What is not OK is that the Cyrillic codepoints with glyphs that look like Latin glyphs in many fonts don't nameprep down to their Latin equivalents. If they did, this wouldn't be a problem. See my blog for more hex codes, if you're into that.
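The point about nameprep can be illustrated with a rough Python approximation. Nameprep is built on case folding plus NFKC normalization; the hypothetical `approx_nameprep` helper below applies only those two steps and skips the rest of the profile (prohibited characters, bidi checks), so treat it as a sketch rather than a conforming implementation.

```python
import unicodedata


def approx_nameprep(label):
    """Rough stand-in for nameprep: NFKC normalization plus case folding.

    Assumption: the prohibited-character and bidi checks of the real
    profile are omitted because they do not matter for this comparison.
    """
    return unicodedata.normalize("NFKC", label).casefold()


fullwidth_a = "\uff41"  # FULLWIDTH LATIN SMALL LETTER A
cyrillic_a = "\u0430"   # CYRILLIC SMALL LETTER A

print(approx_nameprep("A") == "a")          # True: case is folded
print(approx_nameprep(fullwidth_a) == "a")  # True: compatibility form is folded
print(approx_nameprep(cyrillic_a) == "a")   # False: a distinct letter survives
```

Normalization folds compatibility variants of the same letter, but the Cyrillic letter is a different character, not a variant, so it passes through unchanged.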

  7. укенхваросмт

    All of the above is Cyrillic...

    So skypе.com looks valid too... Etc etc

    A Unicode-compliant browser will display these characters correctly: the weakness comes from a strength... Oh well, I guess they will have to turn Unicode into punycode in the Address Bar...
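A sketch of what showing punycode in the address bar could look like, using Python's built-in `idna` codec. The `address_bar_form` name is hypothetical, and the spoofed hostname assumes the look-alike letter is Cyrillic U+0435.

```python
def address_bar_form(hostname):
    """Return the ASCII (punycode) form of a hostname via the "idna" codec."""
    return hostname.encode("idna").decode("ascii")


genuine = "skype.com"
spoof = "skyp\u0435.com"  # last letter of the first label is Cyrillic U+0435

print(address_bar_form(genuine))  # plain ASCII names pass through unchanged
print(address_bar_form(spoof))    # the first label becomes an "xn--" label
```

Once the two names are shown in their ASCII form, they no longer look alike, which is the effect the comment above is after.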

  8. I coded up a defense for this a few days ago:

    The latest SaftLite also has one, but from the sounds of it, they're not searching the host field specifically so it can get some false positives.
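The false-positive concern can be sketched as follows: parse the URL first and inspect only the host field, so that non-ASCII text in the path or query is not flagged. This is an assumption about how such a check might be written, not the code of either defense mentioned above; the function name and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def host_is_idn(url):
    """True if the URL's host (and only the host) is an internationalized name."""
    host = urlsplit(url).hostname or ""
    has_punycode_label = any(label.startswith("xn--") for label in host.split("."))
    return has_punycode_label or not host.isascii()


# Non-ASCII letter in the host: flagged.
print(host_is_idn("http://www.ex\u0430mple.com/login"))   # True
# Non-ASCII letter only in the path: not flagged.
print(host_is_idn("http://www.example.com/caf\u00e9"))    # False
```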

  9. FWIW, Apple's now fixed this.