Typosquatting
Typosquatting is the malicious practice of registering domain names that closely resemble those of popular brands and businesses. Attackers do this in the hope of deceiving users. A user might mistype the web address and end up on a malicious site. The site may show harmless ads. More seriously, it might look like the genuine site. The user may then perform transactions and thereby disclose sensitive information.
Typosquatting can also be done on software packages. Developers might unknowingly download and install malicious packages. This compromises the security of their applications.
Typosquatting could be combined with other cyberattacks such as phishing. Domain owners, developers and users have to take precautions to protect themselves against typosquatting.
Discussion
-
What aspects of software can be attacked by typosquatting? Typosquatting can occur on domain names. This is sometimes called DNS typosquatting or URL hijacking. Suppose a user wishes to visit "live.com" but mistypes this as "livve.com", "live.cm" or "liv.ecom". This is not exactly harmful if such domains don't exist. However, attackers can register these domains and thereby redirect requests meant for "live.com" to their own servers. This compromises the intended client-server interaction.
Software supply chain is about how software packages are distributed by developers and downloaded by other developers for use in their applications. A developer wishing to use a package may mistype the name and thereby install the wrong package. The wrong package has been shared by the attacker. Thus, instead of installing "atlas_client", the developer may install "atlas-client" that contains malicious code.
Thus, broadly we have domain typosquatting or package typosquatting. In either case, typosquatting makes sense for highly popular domains or packages. Even if a small proportion of users make a typo, the attackers stand to benefit a great deal.
-
Which are some variations or techniques in typosquatting? Let's assume that the correct domain name is "cisecurity.org". Here are some typosquatting variations, some of which can be easily mistyped by users:
- Omission: "csecurity.org" where the first "i" is omitted.
- Addition: "cissecurity.org" where an extra "s" is added.
- Substitution: "cisecurlty.org" where "l" fills in for an "i". Another example is "cisecurity.com", where the top-level domain is substituted.
- Transposition: "csiecurity.org" where "i" and "s" positions are swapped.
- Hyphenation: "ci-security.org" where there's an extra hyphen.
These variations can be described differently. Assuming that the domain is "google.com", we can have misspelling (goggle.com), extra period (goo.gle.com), confusion between numbers and letters (g00gle.com), phrasal change (googles.com), or additional words (googlesearch.com).
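The basic variations listed above are mechanical enough to enumerate in code. The following is a minimal sketch, assuming a simple two-part domain such as "cisecurity.org" and only lowercase letters and digits as replacement characters. The function name `typo_variants` is hypothetical and the output is not filtered for registered domains.

```python
import string


def typo_variants(domain: str) -> dict[str, set[str]]:
    """Generate single-edit typo variants of a domain, grouped by technique."""
    # Assumption: the domain has the form "name.tld"; only the name is modified.
    name, _, tld = domain.partition(".")
    chars = string.ascii_lowercase + string.digits
    variants = {
        # Drop one character.
        "omission": {name[:i] + name[i + 1:] for i in range(len(name))},
        # Insert one extra character at any position.
        "addition": {
            name[:i] + c + name[i:]
            for i in range(len(name) + 1)
            for c in chars
        },
        # Replace one character with a different one.
        "substitution": {
            name[:i] + c + name[i + 1:]
            for i in range(len(name))
            for c in chars
            if c != name[i]
        },
        # Swap two neighbouring characters (skipping identical pairs).
        "transposition": {
            name[:i] + name[i + 1] + name[i] + name[i + 2:]
            for i in range(len(name) - 1)
            if name[i] != name[i + 1]
        },
        # Insert a hyphen between two characters.
        "hyphenation": {name[:i] + "-" + name[i:] for i in range(1, len(name))},
    }
    return {kind: {f"{v}.{tld}" for v in names} for kind, names in variants.items()}


variants = typo_variants("cisecurity.org")
print(sorted(variants["hyphenation"])[:3])
```

A domain owner could feed such a list into DNS or WHOIS lookups to see which variants are already registered.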
-
Which are some advanced typosquatting techniques? Here are some advanced typosquatting techniques:
- Homograph Attacks: This relies on visual similarity. For example, uppercase "i" in a sans-serif font looks similar to lowercase "l". Another example is "у", which is actually the Cyrillic U that looks like Latin "y". However, this sort of attack is not that common.
- Bitsquatting: This relies on computer hardware to make errors at bit level. For example, "microsoft.com" could be typosquatted as "mic2osoft.com", with one bit in error (0x32 for "2" and 0x72 for "r").
- Soundsquatting: When two words sound similar but are spelled differently, users can end up picking up the incorrect one. For example, "ate" may be mistaken for "eight".
- Typosquatting Cross-Site Scripting (TXSS): This relates to software developers who use the wrong domain name in their code. For example, they might mistype the URL of a JavaScript library that they wish to use in an HTML page. This is a serious problem since every user visiting the page gets exposed to the typosquatting domain.
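Of the techniques above, bitsquatting is the easiest to illustrate in code, since candidate names follow directly from flipping one bit in one character. The sketch below assumes ASCII domain names of the form "name.tld" and keeps only results that are still valid hostname characters. The function name `bitsquat_variants` is hypothetical.

```python
import string

# Characters allowed in a hostname label (letters, digits, hyphen).
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(domain: str) -> set[str]:
    """Return names that differ from the domain by one flipped bit in one character."""
    # Assumption: the domain has the form "name.tld"; only the name is modified.
    name, _, tld = domain.partition(".")
    results = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            # Flip a single bit, then fold case since DNS is case-insensitive.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in VALID_CHARS:
                continue
            candidate = name[:i] + flipped + name[i + 1:]
            # A label may not start or end with a hyphen.
            if candidate.startswith("-") or candidate.endswith("-"):
                continue
            results.add(f"{candidate}.{tld}")
    return results


print("mic2osoft.com" in bitsquat_variants("microsoft.com"))
```

The "mic2osoft.com" example from the text appears because "r" (0x72) and "2" (0x32) differ only in the 0x40 bit.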
-
What harm can domain typosquatting cause to systems and businesses? Popular brands lose traffic as their users get redirected to typosquatting domains. In fact, traffic can get redirected to competitor sites.
Data theft can erode brand value. In 2011, variations of YouTube.com appeared and these redirected to a survey form that collected mobile numbers from unsuspecting users. The "survey" lured users to an SMS subscription that was charged to their phone bills. In another example, omitting a dot in an email address resulted in the loss of 120,000 emails from Fortune 500 companies over six months. These emails contained trade secrets and invoices.
Typosquatters can abuse affiliate programs by sending mistyped traffic on to the genuine site through affiliate links, thereby collecting referral fees from the genuine site owners.
Users can be tricked into downloading malware. When typosquatting sites look exactly like their genuine counterparts, users may be tricked to enter login credentials and disclose sensitive information. Some typosquatting sites are less harmful. For example, a company offering vacation packages may typosquat on the domains of major airlines and show ads for vacation packages.
It's been found that financial services, retail and critical infrastructure are industries most affected by typosquatting.
-
Which are some best practices to protect against domain typosquatting? If you have a domain, one defensive technique is to register common variations and typo errors of that domain.
Implement two-factor authentication (2FA) for the safety of your customers. Always use SSL certificates so that customers can verify site ownership. Educate your users so that they're aware of common attacks. Having a trademark gives you the option to file a case against typosquatters. Improve SEO on your site so that search engines rank it better than the typosquatted ones.
As a user, enable 2FA for better protection of your accounts. When sensitive information or money transfer is involved, it's better to verify by call before performing the online transaction. Pay attention to how domains are spelled, in URLs and email addresses. Check that the site is served over HTTPS. It's equally important to check the domain name that browsers display before the URL.
Avoid typing out URLs. Instead, bookmark the sites you often visit and use these bookmarks. In addition, you can install suitable browser plugins that warn of potential typosquatting domains on mistyped URLs.
-
What tools or algorithms are available to counter domain typosquatting? DNS Twist (dnstwist) can "detect typosquatters, phishing attacks, fraud, and brand impersonation." By running dnstwist on your domain, you can find out whether it's being typosquatted.
Microsoft provides Strider URL Tracer, first released in 2006, to investigate third-party domains. This also generates typosquatted variations and scans them.
Some typos are more likely than others. A probabilistic analysis helps in ranking them. Typing test tools and human-based computation games can help estimate the probabilities. For example, adjacent characters on the keyboard are more likely to be swapped.
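As a rough illustration of ranking typos by likelihood, the sketch below scores single-character substitutions by whether the wrong key sits next to the intended key on a QWERTY row. The two weights are made-up placeholders, not measured probabilities. In practice they would be estimated from typing tests or similar data, as described above.

```python
# Assumption: a QWERTY layout, considering only horizontal neighbours.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Hypothetical weights; real values would come from typing data.
ADJACENT_WEIGHT = 1.0
DISTANT_WEIGHT = 0.2


def neighbours(ch: str) -> set[str]:
    """Return the keys directly left and right of ch on its keyboard row."""
    for row in QWERTY_ROWS:
        idx = row.find(ch)
        if idx != -1:
            return set(row[max(idx - 1, 0):idx] + row[idx + 1:idx + 2])
    return set()


def substitution_score(original: str, typo: str) -> float:
    """Score a typo that replaces exactly one character; 0.0 for anything else."""
    if len(original) != len(typo):
        return 0.0
    diffs = [(a, b) for a, b in zip(original, typo) if a != b]
    if len(diffs) != 1:
        return 0.0
    intended, typed = diffs[0]
    return ADJACENT_WEIGHT if typed in neighbours(intended) else DISTANT_WEIGHT


def rank_typos(original: str, candidates: list[str]) -> list[str]:
    """Order candidates from most to least likely under this simple model."""
    return sorted(candidates, key=lambda t: substitution_score(original, t), reverse=True)


print(rank_typos("google", ["poogle", "hoogle", "googlr"]))
```

Here "hoogle" and "googlr" rank above "poogle" because "h" and "r" are next to the intended keys, whereas "p" is far from "g".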
Feature engineering is an essential task when building your own detection algorithm. Features could be based on the domain's average TTL value, domain-IP address association, common ASNs shared by IP addresses, hit count of the domain, n-gram distribution of characters in the domain name, and many more. A lot of information can be obtained from DNS records, WHOIS lookups, and even crawling the site.
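One of the lexical features mentioned above, the n-gram distribution of characters, needs nothing beyond the domain string itself. The sketch below computes relative frequencies of character n-grams. It assumes that only the leftmost label of the domain is of interest, and the function name `char_ngram_distribution` is hypothetical.

```python
from collections import Counter


def char_ngram_distribution(domain: str, n: int = 2) -> dict[str, float]:
    """Return the relative frequency of each character n-gram in the domain name."""
    # Assumption: only the leftmost label is analysed ("google" in "google.com").
    name = domain.lower().split(".")[0]
    grams = [name[i:i + n] for i in range(len(name) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


print(char_ngram_distribution("google.com"))
```

Such a dictionary can be turned into a fixed-length feature vector, for instance by keeping the frequencies of a chosen vocabulary of n-grams, and then combined with DNS- and WHOIS-derived features.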
Algorithms could be knowledge-based where experts identify and apply features. With sophisticated attacks, machine learning-based algorithms are more effective. They're data-driven. ML for domain typosquatting is possible due to easily available data that's collected by ISPs and network administrators.
-
What are some AI/ML approaches to detecting typosquatting domains? Approaches can be grouped by how they use labelled data:
- Supervised: Relies on labelled datasets. DomainProfiler is an example that used 55 features along with random forest algorithm. The limitation is that labelled datasets are expensive to create.
- Semi-Supervised: Uses both labelled and unlabelled data. Graph-based inference methods such as belief propagation are used in which it's assumed that malicious hosts are most likely to communicate with malicious domains. Another approach is clustering based on co-occurrence patterns such as domain names often seen together in DNS resolver logs.
- Unsupervised: Clustering is done to identify at least two clusters of malicious or benign domains. This is then extended to identify sub-clusters based on the type of malicious behaviour.
In practice, hybrid approaches are adopted to achieve better results. For example, one approach used three different feature selection methods, reduced them to a subset of features, and trained a bagging ensemble model. Two ensemble models were explored: decision tree and k-nearest neighbour.
For evaluation of these algorithms, accuracy, precision, recall, F1-score and AUC are commonly reported.
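For reference, most of these metrics follow directly from the confusion matrix of a binary classifier (typosquatting versus benign). The sketch below uses made-up counts purely for illustration. AUC is left out because it needs the classifier's scores rather than just counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the domains flagged as typosquatting, how many really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real typosquatting domains, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 100 typosquatting and 900 benign domains.
print(classification_metrics(tp=80, fp=20, tn=880, fn=20))
```

With heavily imbalanced data, as is typical when benign domains far outnumber malicious ones, accuracy alone can look good even when recall is poor, which is why the other metrics are reported alongside it.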
-
What harm can package typosquatting cause to applications and their users? In software package management, typosquatting can be used to steal cryptocurrency. Malicious packages can attempt to redirect cryptocurrency payments to a wallet address of their choice. This has been seen in Ruby (via RubyGems) and JavaScript (via npm).
When a typosquatting package is installed, it gets access to environment variables. Such packages then attempt to steal user credentials and other secrets that are usually part of the environment. A well-known example of this (from August 2017) is "crossenv" typosquatting on "cross-env" at npm. In PyPI, two typosquatting packages attempted to steal GPG and SSH keys from developers.
The typical exploit is to run arbitrary code during installation. Worse still is when such an execution is done with administrative privileges. Some packages may even attempt to inject malicious code into the application's source code.
A less harmful attack is to replicate the original functionality and thereby divert attribution from the original package. On npm, typosquatting package "asimplemde" does this for "simplemde".
-
What can I do to mitigate package typosquatting? Developers must be more vigilant when copying and pasting package names. Use safe options when installing dependencies. For example, use the "--ignore-scripts" option when installing using npm.
Developers should use digital signatures (using "gpg") and checksums (using "sha256sum") to deliver their software packages in a secure way. This prevents hackers from modifying the content to include typosquatted names.
Package repositories for their part can have checks and rules that minimize the attack surface. For example, npm requires that new packages can't match existing package names with the punctuation removed. Given "react-native", names "react_native" and "re.a_ct-native" are not allowed.
Repositories can also maintain a list of possible names with small Levenshtein distances. Administrators can be notified if there's an attempt to install one of these. In fact, server logs would contain 404 errors for non-existing packages that users typed by mistake. These can be added to the list to proactively prevent typosquatting.
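The two repository-side checks described above, comparing names with punctuation removed and flagging names within a small Levenshtein distance, can be sketched as follows. This is an illustration only and not npm's actual implementation. The function names and the choice of hyphen, underscore and dot as the punctuation to strip are assumptions.

```python
import re


def normalise(name: str) -> str:
    """Lowercase a package name and strip punctuation (assumed: '-', '_' and '.')."""
    return re.sub(r"[-_.]", "", name.lower())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def suspicious_matches(candidate: str, existing: list[str], max_distance: int = 1):
    """List existing packages that a new name collides with or closely resembles."""
    hits = []
    for name in existing:
        if candidate == name:
            continue
        if normalise(candidate) == normalise(name):
            hits.append((name, "punctuation"))
        elif levenshtein(candidate, name) <= max_distance:
            hits.append((name, "edit-distance"))
    return hits


print(suspicious_matches("re.a_ct-native", ["react-native", "mocha"]))
print(suspicious_matches("asimplemde", ["simplemde"]))
```

A distance threshold of one will also flag legitimate pairs such as "preact" and "react", so matches are better treated as candidates for review than as proof of typosquatting.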
SpellBound is a tool that can be used to detect typosquatting names. It adds a runtime overhead of less than 2.5%.
Milestones
2003
In what is possibly one of the first large-scale studies, Benjamin Edelman identifies 8,800 typosquatted domains. Many of these have invalid WHOIS data, provide explicit sexual content and even prevent users from exiting the site. Many of these sites are registered to John Zuccarini, who is eventually arrested in September.
2003
As part of RFC 3490, the IETF defines Internationalized Domain Name (IDN). This allows domain names to contain alphabet-specific characters. This unfortunately opens the doors to homograph attacks. English domain names could be typosquatted by substituting some letters with non-Latin letters that have identical glyphs (same visual appearance).
2008
Banerjee et al. observe that about 99% of all typosquatted domains use one-character modification. More formally, such a domain differs from the real domain by a Damerau-Levenshtein distance of one. They also propose ways to automate typosquatted domain name generation: 1-mod-inplace, 1-mod-deflate, 1-mod-inflate.
2011
At the Black Hat Technical Security Conference, Artem Dinaburg reveals bitsquatting as a typosquatting technique. Random bit errors in domain names can cause computers to fetch content from bitsquatted domains. With 30 bitsquatted domains, he finds 52,317 requests from 12,949 unique IP addresses over a period of eight months. An example of this is "www.mic2osoft.com". Typical use is to serve ads, abuse affiliate programs or install malware.
2014
Szurdi et al. propose a framework to detect typosquatting domains. They start with passive data sources from which they generate possible typosquatting domain names. To obtain more information, they actively crawl the WHOIS database, collect DNS data and obtain web page content. They cluster domains based on various features. Their approach is called Yet Another Typosquatting Tool (YATT).
2018
One researcher investigates package names on npm in terms of Levenshtein distance. He finds that 46% of names are at a distance of two or less from another name. The longer the name, the less likely it is to have a small distance to another name. A distance of one can indicate typosquatting. However, npm's preference for short names makes this hard to judge. There are many genuine packages that have small distances: (preact, react), (bocha, mocha), and (class-names, classnames).
2018
A network of malicious sites ending in ".cm" is discovered. These sites were typosquatting on many popular ".com" sites and attempting to trick users with fake security alerts. Four years of access logs from this network are also discovered and analyzed by security experts. They find that within the first three months of 2018, these sites got about 12 million visits, which extrapolates to almost 50 million visits per year.
2020
Packages for Ruby (called gems) are distributed via the RubyGems repository. An analysis by ReversingLabs finds more than 700 malicious gems due to typosquatting. They describe the example of "atlas_client" being typosquatted as "atlas-client". The malicious gem contains "aaa.png" that's in fact a portable executable. It contains a Base64-encoded VBScript. It's designed to steal cryptocurrency by modifying wallet addresses.
References
- Adelia Risk. 2020. "What is Typosquatting?" Adelia Risk, Adelia Associates, May 18. Accessed 2020-08-09.
- Ang, Ming Yi. 2019. "Discovering Malicious Packages Published on npm." Blog, Veracode, September 4. Accessed 2020-08-09.
- Black Hat. 2011. "Black Hat Technical Security Conference: USA 2011." Accessed 2020-08-10.
- Bocetta, Sam. 2020. "How to stop typosquatting attacks." Opensource.com, Red Hat, Inc., January 23. Accessed 2020-08-09.
- Burbidge, Chester. 2018. "Hunting typosquatters on npm." Blog, Scott Logic, February 27. Accessed 2020-08-09.
- Cimpanu, Catalin. 2019. "Two malicious Python libraries caught stealing SSH and GPG keys." ZDNet, December 4. Accessed 2020-08-10.
- Constantin, Lucian. 2020. "RubyGems typosquatting attack hits Ruby developers with trojanized packages." CSO India, IDG Communications, April 16. Accessed 2020-08-09.
- Edelman, B. 2003. "Large-Scale Registration of Domains with Typographical Errors." January. Updated 2003-09-03. Accessed 2020-08-10.
- Gabrilovich, Evgeniy and Alex Gontmakher. 2002. "The Homograph Attack." Communications of the ACM, vol. 45, no. 2, February. Accessed 2020-08-10.
- Grimes, Roger A. 2011. "Typosquatting hacks: Finger slips sink ships." CSO India, IDG Communications, September 27. Accessed 2020-08-09.
- Krebs, Brian. 2018. "Dot-cm Typosquatting Sites Visited 12M Times So Far in 2018." KrebsOnSecurity, April 4. Accessed 2020-08-09.
- Lakshmanan, Ravie. 2020. "Over 700 Malicious Typosquatted Libraries Found On RubyGems Repository." The Hacker News, April 16. Accessed 2020-08-09.
- MS-ISAC. 2018. "Typosquatting." MS-ISAC Security Primer, February. Accessed 2020-08-09.
- Microsoft. 2016. "MSR Strider URL Tracer." v1.0.1.0, Download Center, Microsoft, December 5. Accessed 2020-08-10.
- Moubayed, Abdallah, Emad Aqeeli, and Abdallah Shami. 2020. "Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection." arXiv, v1, June 8. Accessed 2020-08-12.
- NASA FCU. 2019. "Typosquatting Is Quickly Rising, How To Protect Yourself." Blog, NASA Federal Credit Union, July 10. Accessed 2020-08-09.
- Prince, Brian. 2011. "Research Project Shows How Typos and Misspelled Domains Lead to Massive Data Loss." Security Week, Wired Business Media, September 9. Accessed 2020-08-09.
- Spaulding, Jeffrey, Shambhu Upadhyaya, and Aziz Mohaisen. 2016. "The Landscape of Domain Name Typosquatting: Techniques and Countermeasures." arXiv, v1, March 9. Accessed 2020-08-09.
- Szurdi, Janos, Balazs Kocso, Gabor Cseh, Jonathan Spring, Mark Felegyhazi, and Chris Kanich. 2014. "The Long “Taile” of Typosquatting Domain Names." SEC'14: Proceedings of the 23rd USENIX conference on Security Symposium, pp. 191-206, August. Accessed 2020-08-12.
- Tal, Liran. 2018. "Fighting npm typosquatting attacks and naming rules for npm modules." Medium, September 18. Accessed 2020-08-09.
- Tal, Liran. 2019. "10 npm Security Best Practices." Blog, Snyk, February 19. Accessed 2020-08-09.
- Taylor, Matthew, Ruturaj K. Vaidya, Drew Davidson, Lorenzo De Carli, and Vaibhav Rastogi. 2020. "SpellBound: Defending Against Package Typosquatting." arXiv, v1, March 6. Accessed 2020-08-12.
- Tschacher, Nikolai. 2016. "Typosquatting programming language package managers." Blog, Incolumitas, June 8. Accessed 2020-08-12.
- Ulikowski, Marcin. 2020. "elceef/dnstwist." GitHub, July 7. Accessed 2020-08-10.
- Verma, Amit. 2009. "Early warning approaches to combat typosquatting." Virus Bulletin, July 1. Accessed 2020-08-12.
- Wang, Yi-Min, Doug Beck, Jeffrey Wang, Chad Verbowski, and Brad Daniels. 2006. "Strider Typo-Patrol: Discovery and Analysis of Systematic Typo-Squatting." SRUTI ’06: 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet, USENIX Association, pp. 31-36. Accessed 2020-08-10.
- Whois XML API. 2020. "Anti-Cybersquatting & Typosquatting Tools and Solutions." Whois XML API. Accessed 2020-08-09.
- Wikipedia. 2020. "Internationalized domain name." Wikipedia, August 1. Accessed 2020-08-10.
- Zhauniarovich, Yury, Issa Khalil, Ting Yu, and Marc Dacier. 2018. "A Survey on Malicious Domains Detection through DNS Data Analysis." arXiv, v1, May 22. Accessed 2020-08-12.
- npmjs. 2017. "crossenv malware on the npm registry." npm blog, August 2. Accessed 2020-08-10.
- x0rz. 2017. "Practical waterholing through DNS typosquatting." 0Day.Rocks, on Medium, June 23. Accessed 2020-08-09.
Further Reading
- Tunggal, Abi Tyas. 2020. "What is Typosquatting?" Blog, UpGuard, May 19. Accessed 2020-08-09.
- Spaulding, Jeffrey, Shambhu Upadhyaya, and Aziz Mohaisen. 2016. "The Landscape of Domain Name Typosquatting: Techniques and Countermeasures." arXiv, v1, March 9. Accessed 2020-08-09.
- Moubayed, Abdallah, Emad Aqeeli, and Abdallah Shami. 2020. "Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection." arXiv, v1, June 8. Accessed 2020-08-09.
- Taylor, Matthew, Ruturaj K. Vaidya, Drew Davidson, Lorenzo De Carli, and Vaibhav Rastogi. 2020. "SpellBound: Defending Against Package Typosquatting." arXiv, v1, March 6. Accessed 2020-08-09.
- Spaulding, Jeffrey, DaeHun Nyang, and Aziz Mohaisen. 2017. "Understanding the Effectiveness of Typosquatting Techniques." ACM HotWeb’17, San Jose, CA, USA. doi: 10.1145/3132465.3132467. Accessed 2020-08-12.
- Piredda, Paolo, Davide Ariu, Battista Biggio, Igino Corona, Luca Piras, Giorgio Giacinto, and Fabio Roli. 2017. "Deepsquatting: Learning-based Typosquatting Detection at Deeper Domain Levels." Conference of the Italian Association for Artificial Intelligence, Springer, pp. 347-358. Accessed 2020-08-12.
See Also
- Domain Name System
- Phishing
- DNS Security
- Package Manager
- Ransomware
- Adware