# Typosquatting

Typosquatting is the malicious practice of registering domain names that closely resemble those of popular brands and businesses. Attackers do this in the hope of deceiving users. A user might mistype a web address and end up on a malicious site. The site may merely show ads. More seriously, it might imitate the genuine site, so that the user performs transactions on it and thereby discloses sensitive information.

Typosquatting can also be done on software packages. Developers might unknowingly download and install malicious packages. This compromises the security of their applications.

Typosquatting could be combined with other cyberattacks such as phishing. Domain owners, developers and users have to take precautions to protect themselves against typosquatting.

## Discussion

• What aspects of software can be attacked by typosquatting?

Typosquatting can occur on domain names. This is sometimes called DNS Typosquatting or URL hijacking. Suppose a user wishes to visit live.com but mistypes this as livve.com, live.cm or liv.ecom. This is harmless if such domains don't exist. However, attackers can register these domains and thereby redirect requests meant for live.com to their own servers. This compromises the intended client-server interaction.

The software supply chain concerns how software packages are distributed by developers and downloaded by other developers for use in their applications. A developer wishing to use a package may mistype the name and thereby install a different package that an attacker has published under a similar name. Thus, instead of installing atlas_client, the developer may install atlas-client, which contains malicious code.

Broadly, then, we have domain typosquatting and package typosquatting. In either case, typosquatting makes sense for highly popular domains or packages. Even if only a small proportion of users make a typo, the attackers stand to benefit a great deal.

• Which are some variations or techniques in typosquatting?

Let's assume that the correct domain name is "cisecurity.org". Here are some typosquatting variations, some of which can be easily mistyped by users:

• Omission: "csecurity.org" where the first "i" is omitted.
• Substitution: "cisecurlty.org" where "l" stands in for an "i". Another example is "cisecurity.com", where the top-level domain is substituted.
• Transposition: "csiecurity.org" where "i" and "s" positions are swapped.
• Hyphenation: "ci-security.org" where there's an extra hyphen.

These variations can be described differently. Assuming that the domain is "google.com", we can have misspelling (goggle.com), extra period (goo.gle.com), confusion between numbers and letters (g00gle.com), phrasal change (googles.com), or additional words (googlesearch.com).
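
The first four variations (omission, substitution, transposition and hyphenation) can be generated mechanically. The following is a minimal Python sketch, not taken from any particular tool. The function names are hypothetical, and the domain is assumed to have a single label before the TLD, as in "cisecurity.org". Substitution is applied only within the label, so TLD swaps such as "cisecurity.com" aren't covered.

```python
# Illustrative sketch: all names here are hypothetical, and the domain is
# assumed to have the form "<label>.<tld>" with a single label.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def omissions(label):
    """Drop one character at a time."""
    return {label[:i] + label[i + 1:] for i in range(len(label))}


def substitutions(label, alphabet=ALPHABET):
    """Replace one character with a different one from the alphabet."""
    return {
        label[:i] + c + label[i + 1:]
        for i in range(len(label))
        for c in alphabet
        if c != label[i]
    }


def transpositions(label):
    """Swap each pair of neighbouring characters."""
    return {
        label[:i] + label[i + 1] + label[i] + label[i + 2:]
        for i in range(len(label) - 1)
    }


def hyphenations(label):
    """Insert one hyphen between two existing characters."""
    return {label[:i] + "-" + label[i:] for i in range(1, len(label))}


def variants(domain):
    """Map each technique to the set of typo domains it produces."""
    label, _, tld = domain.partition(".")
    generators = {
        "omission": omissions,
        "substitution": substitutions,
        "transposition": transpositions,
        "hyphenation": hyphenations,
    }
    return {
        technique: {
            f"{typo}.{tld}" for typo in generate(label) if typo and typo != label
        }
        for technique, generate in generators.items()
    }
```

Calling `variants("cisecurity.org")` yields sets that include the examples listed above, such as "csecurity.org" under omission and "ci-security.org" under hyphenation.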

• Which are some advanced typosquatting techniques?

Here are some advanced typosquatting techniques:

• Homograph Attacks: This relies on visual similarity. For example, uppercase "i" in a sans-serif font looks similar to lowercase "l". Another example is "у", which is actually the Cyrillic letter U but looks like the Latin "y". However, this sort of attack is not that common.
• Bitsquatting: This relies on computer hardware to make errors at bit level. For example, "microsoft.com" could be typosquatted as "mic2osoft.com", with one bit in error (0x32 for "2" and 0x72 for "r").
• Soundsquatting: When two words sound similar but are spelled differently, users can end up picking up the incorrect one. For example, "ate" may be mistaken for "eight".
• Typosquatting Cross-Site Scripting (TXSS): This relates to software developers who use the wrong domain name in their code. For example, they might mistype the URL of a JavaScript library that they wish to use in an HTML page. This is a serious problem since every user visiting the page gets exposed to the typosquatting domain.
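
Two of these techniques lend themselves to a short illustration. The Python sketch below enumerates single-bit flips of an ASCII label (bitsquatting) and flags labels that mix alphabets (a rough homograph signal). The helper names are hypothetical. The script check simply reads the first word of each letter's Unicode character name, which is a simplification and not a complete confusable-character detector.

```python
import unicodedata

# Characters permitted in a hostname label (lowercase, since DNS is
# case-insensitive).
HOSTNAME_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def bitsquats(label):
    """Return valid labels that differ from an ASCII `label` by one flipped bit."""
    results = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in HOSTNAME_CHARS:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # A label may not start or end with a hyphen.
            if not candidate.startswith("-") and not candidate.endswith("-"):
                results.add(candidate)
    return results


def scripts_used(label):
    """Approximate the scripts in `label` from Unicode character names.

    Assumption: the first word of the name (LATIN, CYRILLIC, GREEK, ...)
    identifies the script, which holds for common alphabets only.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label):
    """True if the letters of `label` come from more than one script."""
    return len(scripts_used(label)) > 1
```

For instance, `bitsquats("microsoft")` contains "mic2osoft", because "r" (0x72) and "2" (0x32) differ only in one bit.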

• What harm can domain typosquatting cause to systems and businesses?

Popular brands lose traffic as their users get redirected to typosquatting domains. In fact, traffic can get redirected to competitor sites.

Data theft can erode brand value. In 2011, variations of YouTube.com appeared that redirected to a survey form, which collected mobile numbers from unsuspecting users. The "survey" lured users into an SMS subscription that was charged to their phone bills. In another example, omitting a dot in an email address resulted in the loss of 120,000 emails from Fortune 500 companies over six months. These emails contained trade secrets and invoices.

Typosquatters can abuse affiliate programs and collect referral fees on behalf of genuine site owners.

Users can be tricked into downloading malware. When typosquatting sites look exactly like their genuine counterparts, users may be tricked into entering login credentials and disclosing sensitive information. Some typosquatting sites are less harmful. For example, a company offering vacation packages may typosquat on the domains of major airlines and show ads for vacation packages.

It's been found that financial services, retail and critical infrastructure are industries most affected by typosquatting.

• Which are some best practices to protect against domain typosquatting?

If you have a domain, one defensive technique is to register common variations and typos of that domain.

Implement two-factor authentication (2FA) for the safety of your customers. Always use SSL/TLS certificates so that customers can verify site ownership. Educate your users so that they're aware of common attacks. Having a trademark gives you the option to file a case against typosquatters. Improve SEO on your site so that search engines rank it higher than the typosquatted ones.

As a user, enable 2FA for better protection of your accounts. When sensitive information or money transfer is involved, it's better to verify by phone before performing the online transaction. Pay attention to how domains are spelled, in URLs and email addresses. Check that the site is served over HTTPS. It's equally important to check the domain name that the browser displays in the address bar.

Avoid typing out URLs. Instead, bookmark the sites you often visit and use these bookmarks. In addition, you can install suitable browser plugins that warn of potential typosquatting domains when a URL is mistyped.

• What tools or algorithms are available to counter domain typosquatting?

DNS Twist (dnstwist) can "detect typosquatters, phishing attacks, fraud, and brand impersonation." By running dnstwist on your domain, you can find out whether it's being typosquatted.

Microsoft's Strider URL Tracer, first released in 2006, can be used to investigate third-party domains. It also generates typosquatted variations and scans them.
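
The general idea of checking which candidate names are live can be sketched in a few lines of Python. This is not how dnstwist is implemented; `resolves` and `find_registered` are hypothetical helpers, and a successful DNS lookup is used only as a rough proxy for a registered domain (a registered domain may have no address records).

```python
import socket


def resolves(hostname):
    """Return True if `hostname` currently resolves to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: label not encodable.
        return False


def find_registered(candidates, resolver=resolves):
    """Return the candidate names for which `resolver` reports success.

    `resolver` is injectable so that the function can be exercised
    without network access.
    """
    return sorted(name for name in candidates if resolver(name))
```

Passing a stub resolver, as in `find_registered(names, resolver=lambda n: n in known)`, lets the logic be tested offline before pointing it at real DNS.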

Some typos are more likely than others. A probabilistic analysis helps in ranking them. Typing test tools and human-based computation games can help estimate the probabilities. For example, adjacent characters on the keyboard are more likely to be swapped.
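
A simple way to picture such a ranking is to weight substitution typos by keyboard adjacency. In the Python sketch below, the QWERTY layout is reduced to left and right neighbours within a row, and the weights 0.8 and 0.2 are placeholders rather than measured probabilities. All names are hypothetical.

```python
# Assumption: only same-row neighbours on a QWERTY keyboard count as adjacent.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def build_neighbours(rows=ROWS):
    """Map each key to the set of keys immediately left and right of it."""
    neighbours = {}
    for row in rows:
        for i, key in enumerate(row):
            neighbours[key] = set(row[max(i - 1, 0):i] + row[i + 1:i + 2])
    return neighbours


NEIGHBOURS = build_neighbours()


def substitution_weight(intended, typed, near=0.8, far=0.2):
    """Placeholder weight: adjacent keys score `near`, all others `far`."""
    return near if typed in NEIGHBOURS.get(intended, ()) else far


def rank_substitutions(label, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return (weight, typo) pairs, most likely first, ties in alphabetical order."""
    scored = []
    for i, intended in enumerate(label):
        for typed in alphabet:
            if typed != intended:
                typo = label[:i] + typed + label[i + 1:]
                scored.append((substitution_weight(intended, typed), typo))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

In a real study, the weights would come from typing data rather than fixed constants.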

Feature engineering is an essential task when building your own detection algorithm. Features could be based on a domain's average TTL value, domain-IP address association, common ASNs shared by IP addresses, hit count of the domain, n-gram distribution of characters in the domain name, and many more. A lot of information can be obtained from DNS records, WHOIS lookups, and even by crawling the site.
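
As a small illustration, the lexical features (those derived from the name alone) can be computed without any network data. The Python sketch below is an assumption about what such a feature extractor might look like. The feature names are hypothetical, and features such as TTL, ASN or hit count would need DNS and traffic data that isn't modelled here.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count the overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def shannon_entropy(text):
    """Shannon entropy, in bits per character, of the characters in `text`."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain):
    """Extract simple name-based features from the first label of `domain`."""
    label = domain.split(".")[0]
    return {
        "length": len(label),
        "digits": sum(ch.isdigit() for ch in label),
        "hyphens": label.count("-"),
        "entropy": shannon_entropy(label),
        "bigrams": char_ngrams(label, 2),
    }
```

The resulting dictionary can be flattened into a numeric vector before being passed to a classifier.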

Algorithms could be knowledge-based where experts identify and apply features. With sophisticated attacks, machine learning-based algorithms are more effective. They're data-driven. ML for domain typosquatting is possible due to easily available data that's collected by ISPs and network administrators.

• What are some AI/ML approaches to detecting typosquatting domains?

Among the ML approaches are:

• Supervised: Relies on labelled datasets. DomainProfiler is an example that used 55 features along with random forest algorithm. The limitation is that labelled datasets are expensive to create.
• Semi-Supervised: Uses both labelled and unlabelled data. Graph-based inference methods such as belief propagation are used in which it's assumed that malicious hosts are most likely to communicate with malicious domains. Another approach is clustering based on co-occurrence patterns such as domain names often seen together in DNS resolver logs.
• Unsupervised: Clustering is done to identify at least two clusters of malicious or benign domains. This is then extended to identify sub-clusters based on the type of malicious behaviour.

In practice, hybrid approaches are adopted to achieve better results. For example, one approach used three different feature selection methods, reduced them to a subset of features, and trained a bagging ensemble model. Two ensemble models were explored: decision tree and k-nearest neighbour.

For evaluation of these algorithms, accuracy, precision, recall, F1-score and AUC are commonly reported.
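
For reference, most of these metrics follow directly from the confusion counts. The Python sketch below assumes binary labels where 1 means "typosquatting" and 0 means "benign"; the function names are illustrative. AUC is left out because it needs ranked scores rather than hard predictions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1-score."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because benign domains usually far outnumber malicious ones, precision and recall tend to be more informative than accuracy alone.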

• What harm can package typosquatting cause to applications and their users?

In software package management, typosquatting can be used to steal cryptocurrency. Malicious packages can attempt to redirect cryptocurrency payments to a wallet address of their choice. This has been seen in Ruby (via RubyGems) and JavaScript (via npm).

When a typosquatting package is installed, it gets access to environment variables. Such packages then attempt to steal user credentials and other secrets that are usually part of the environment. A well-known example of this (from August 2017) is crossenv typosquatting on cross-env on npm. On PyPI, two typosquatting packages attempted to steal GPG and SSH keys from developers.

The typical exploit is to run arbitrary code during installation. Worse still is when such an execution is done with administrative privileges. Some packages may even attempt to inject malicious code into the application's source code.

A less harmful attack is to replicate the original functionality and thereby divert attribution from the original package. On npm, the typosquatting package asimplemde does this for simplemde.

• What can I do to mitigate package typosquatting?

Developers must be more vigilant when copying and pasting package names. Use safe options when installing dependencies. For example, use the --ignore-scripts option when installing using npm.

Developers should use digital signatures (using gpg) and checksums (using sha256sum) to deliver their software packages in a secure way. This prevents hackers from modifying the content to include typosquatted names.
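
On the consumer side, checking a download against a published SHA-256 digest can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes that the digest was obtained from a trusted channel, and it doesn't cover signature verification with gpg.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Compare the digest of `data` with the digest the publisher announced.

    Assumption: `published_hex` came from a trusted source. Surrounding
    whitespace and letter case are normalised before comparing.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(data), expected)
```

The same digest is what `sha256sum` prints for a file with identical contents, so the two can be used interchangeably.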

Package repositories for their part can have checks and rules that minimize the attack surface. For example, npm requires that new packages can't match existing package names with the punctuation removed. Given react-native, names react_native and re.a_ct-native are not allowed.
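
The rule can be pictured as comparing names after punctuation is stripped. The Python sketch below illustrates the idea only and is not npm's actual implementation. The set of punctuation characters (".", "_" and "-") and the function names are assumptions.

```python
import re


def normalise(name):
    """Lowercase `name` and strip the punctuation assumed to be ignored."""
    return re.sub(r"[._-]", "", name.lower())


def conflicts_with_existing(new_name, existing_names):
    """True if `new_name` collides with a different existing name once normalised."""
    target = normalise(new_name)
    return any(
        existing != new_name and normalise(existing) == target
        for existing in existing_names
    )
```

With `existing_names = {"react-native"}`, both "react_native" and "re.a_ct-native" are reported as conflicts, whereas an unrelated name such as "preact" is not.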

Repositories can also maintain a list of possible names with small Levenshtein distances. Administrators can be notified if there's an attempt to install one of these. In fact, server logs would contain 404 errors for non-existing packages that users typed by mistake. These can be added to the list to proactively prevent typosquatting.
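
Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal Python sketch of the distance, and of a lookup for near-miss names, is given below. `near_misses` and its default threshold of two are illustrative choices, not something a specific repository is known to use.

```python
def levenshtein(a, b):
    """Levenshtein distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def near_misses(requested, known_names, max_distance=2):
    """Return known names within `max_distance` of `requested`, excluding exact matches."""
    return sorted(
        name
        for name in known_names
        if 0 < levenshtein(requested, name) <= max_distance
    )
```

A request for a non-existent name that has near misses could then be logged or flagged for review.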

SpellBound is a tool that can be used to detect typosquatting names. It adds a runtime overhead of less than 2.5%.

## Milestones

1998

R.C. Cumbow in The New York Law Journal uses the term typosquatter, probably the first published use of the term.

1999

Seeing the growing threat of domain typosquatting, the Internet Corporation for Assigned Names and Numbers (ICANN) introduces the Uniform Domain Name Dispute Resolution Policy (UDRP). In the U.S., there's also the Anti-cybersquatting Consumer Protection Act (ACPA).

Jan 2003

In what is possibly one of the first large-scale studies, Benjamin Edelman identifies 8,800 typosquatted domains. Many of these have invalid WHOIS data, provide explicit sexual content and even prevent users from exiting the site. Many of these sites are registered to John Zuccarini, who is eventually arrested in September.

Mar 2003

As part of RFC 3490, the IETF defines Internationalized Domain Name (IDN). This allows domain names to contain alphabet-specific characters. This unfortunately opens the doors to homograph attacks. English domain names could be typosquatted by substituting some letters with non-Latin letters that have identical glyphs (same visual appearance).

2006

An analysis of the top 10,000 domains shows that 30% of all missing-dot typos come from just six domain parking services. These six account for 2995 typosquatted domains. In all, 5094 typosquatted domains are found to be active.

2008

Banerjee et al. observe that about 99% of all typosquatted domains use one-character modification. More formally, such a domain differs from the real domain by a Damerau-Levenshtein distance of one. They also propose ways to automate typosquatted domain name generation: 1-mod-inplace, 1-mod-deflate, 1-mod-inflate.
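
Damerau-Levenshtein distance extends Levenshtein distance by counting a swap of two adjacent characters as a single edit. The Python sketch below implements the restricted variant, often called optimal string alignment, which agrees with the full definition whenever the distance is one. It's an illustration and not the authors' code.

```python
def osa_distance(a, b):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_one_character_typo(real, candidate):
    """True if `candidate` is exactly one edit away from `real`."""
    return osa_distance(real, candidate) == 1
```

Under this measure, "csiecurity" is one edit from "cisecurity", whereas plain Levenshtein distance would count the swap as two edits.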

2011

At the BlackHat Technical Security Conference, Artem Dinaburg reveals bitsquatting as a typosquatting technique. Random bit errors in domain names can cause computers to fetch content from bitsquatted domains. With 30 bitsquatted domains, he finds 52,317 requests from 12,949 unique IP addresses over a period of eight months. An example of this is "www.mic2osoft.com". Typical use is to serve ads, abuse affiliate programs or install malware.

2014

Szurdi et al. propose a framework to detect typosquatting domains. They start with passive data sources from which they generate possible typosquatting domain names. To obtain more information, they actively crawl WHOIS database, collect DNS data and obtain web page content. They cluster domains based on various features. Their approach is called Yet Another Typosquatting Tool (YATT).

Feb 2018

One researcher investigates package names on npm in terms of Levenshtein distance. He finds that 46% of names are at a distance of two or less from another name. The longer the name, the less likely it is to have a small distance to another name. A distance of one can indicate typosquatting. However, npm's preference for short names makes this hard to rely on. There are many genuine packages that have small distances: (preact, react), (bocha, mocha), and (class-names, classnames).

Mar 2018

A network of malicious sites ending in ".cm" is discovered. These sites were typosquatting on many popular ".com" sites and attempting to trick users with fake security alerts. Four years of access logs from this network are also discovered and analyzed by security experts. They find that within the first three months of 2018, these sites got about 12 million visits, which extrapolates to almost 50 million visits per year.

Apr 2020

Packages for Ruby (called gems) are distributed via the RubyGems repository. An analysis by ReversingLabs finds more than 700 malicious gems due to typosquatting. They describe the example of atlas-client typosquatting on atlas_client. It contains aaa.png, which is in fact a portable executable. It contains a Base64-encoded VBScript. It's designed to steal cryptocurrency by modifying wallet addresses.


## Cite As

Devopedia. 2020. "Typosquatting." Version 4, August 13. Accessed 2022-09-23. https://devopedia.org/typosquatting

Contributed by 2 authors. Last updated on 2020-08-13 04:50:16.