Cracking Into Password Requirements

Cyber Labs


This insight is part 10 of 20 in this Collection.

April 16, 2024



This blog post discusses new hashcat rule sets designed to crack passwords that must satisfy minimum length and character class requirements, and that improve on the performance of OneRuleToRuleThemAll against such passwords.

Introduction

Some of the most popular hashcat rule sets were created by taking a large pool of hashcat rules (copied from existing sets or randomly generated), applying them with a popular dictionary list to a collection of password hashes from public breach data, and tallying how many passwords each rule was responsible for recovering. The individual rules that led to the greatest number of recovered passwords are the “winners” and make it into a new rule list.

For example, the famous OneRuleToRuleThemAll rule set was created by measuring the performance of hundreds of thousands of different rules against the Lifeboat data dump with the iconic rockyou.txt dictionary. The methodology is described in detail in the original OneRuleToRuleThemAll write-up. The methodology itself seems sound, and the rule list became quite popular and successful. However, a few observations make it clear that we can still do better.

  1. Rockyou.txt is itself a list of passwords recovered from another data breach that occurred all the way back in 2009. This means that using rockyou.txt as the dictionary introduces a bias towards weak passwords created before the minimum length requirements that govern contemporary apps and networks. It also means that the passwords in rockyou.txt that are based on dates (an extremely common pattern in passwords) are likely wasting CPU cycles and failing to recover many of their contemporary counterparts (more on this later).
  2. Most enterprise networks and many web applications now enforce minimum length and complexity requirements for passwords, which did not apply to the passwords in rockyou.txt or the Lifeboat data dump. This raises the question of whether password length and complexity requirements significantly change which individual hashcat rules are successful at cracking those passwords.
  3. Lifeboat and rockyou.txt are both “data breach” password lists, and they are the only data sources used as the target hash set and the dictionary, respectively. While both data sets are rather large, they are not the only ones of their kind. There is room to improve our confidence in this kind of methodology by including a wider variety of data sources.

With these observations in mind, we wanted to use a methodology similar to that of OneRuleToRuleThemAll to make new rule sets that are specially honed for cracking passwords known to comply with common password requirements. This work has three key goals:

  1. Create new rule sets that outperform OneRuleToRuleThemAll in environments where password length and complexity requirements are enforced.
  2. Provide an empirical answer to whether password requirements make any difference as to which rules are most effective.
  3. Reveal patterns in how people construct lengthier passwords that use multiple character classes.

The following sections describe the data, methodology, and results of this effort.

The Data

This section describes the data and defines some of the terms used in the methodology section.

super_dict.txt

This is a custom dictionary list that aims to improve on rockyou.txt without overinflating the size. It is composed of a deduplicated combination of the following lists.

The final size of super_dict.txt is 18,705,085 lines, which is only 30.4% bigger than rockyou.txt.
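A deduplicated merge like the one described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the script used to build super_dict.txt. The function names `dedupe_lines`, `read_lines`, and `merge_wordlist_files` are hypothetical, and the first occurrence of each word is the one kept. The rockyou.txt line count of 14,344,391 used in the growth calculation is an assumption about the commonly distributed copy.

```python
from typing import Iterable, Iterator


def dedupe_lines(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield each distinct, non-empty line once, in first-seen order."""
    seen: set[str] = set()
    for source in sources:
        for line in source:
            # Strip only the line terminator; other whitespace can be part of a password.
            word = line.rstrip("\r\n")
            if word and word not in seen:
                seen.add(word)
                yield word


def read_lines(paths: Iterable[str]) -> Iterator[str]:
    """Stream lines from each wordlist file in turn (hypothetical helper)."""
    for path in paths:
        # surrogateescape round-trips bytes that are not valid UTF-8,
        # which are common in breach-derived wordlists.
        with open(path, encoding="utf-8", errors="surrogateescape") as fh:
            yield from fh


def merge_wordlist_files(paths: list[str], out_path: str) -> int:
    """Merge wordlist files into out_path and return the number of lines written."""
    written = 0
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for word in dedupe_lines([read_lines(paths)]):
            out.write(word + "\n")
            written += 1
    return written


def growth_percent(new_lines: int, old_lines: int) -> float:
    """How much larger new_lines is than old_lines, as a percentage."""
    return (new_lines / old_lines - 1) * 100


# super_dict.txt line count from the text; the rockyou.txt count is an assumption.
print(f"{growth_percent(18_705_085, 14_344_391):.1f}%")
```

The set of seen words is held in memory, which is feasible at this scale (under 20 million entries). It keeps the first file's ordering intact, so frequency-sorted lists such as rockyou.txt keep their most common words near the top.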

super_rules.txt

This is the candidate pool of existing rules from which the new rule sets were created. It is a deduplicated combination of the following rule sets, all of which are freely available on the internet.

  • best64.rule
  • InsidePro-PasswordsPro.rule
  • T0XlC.rule
  • combinator.rule
  • KoreLogicRulesPrependRockYou50000.rule
  • T0XlCv1.rule
  • d3ad0ne.rule
  • leetspeak.rule
  • toggles1.rule
  • d3adhob0.rule
  • _NSAKEY.v2.dive
  • toggles2.rule
  • dive.rule
  • OneRuleToRuleThemAll.rule
  • toggles3.rule
  • generated2.rule
  • oscommerce.rule
  • toggles4.rule
  • generated.rule
  • rockyou-30000.rule
  • toggles5.rule
  • hob064.rule
  • specific.rule
  • toggles_first_passthrough.rule
  • T0XlC-insert_00-99_1950-2050_toprules_0_F.rule
  • unix-ninja-leetspeak.rule
  • Incisive-leetspeak.rule
  • T0XlC-insert_space_and_special_0_F.rule
  • InsidePro-HashManager.rule
  • T0XlC-insert_top_100_passwords_1_G.rule

The total rule count for super_rules.txt is 478,642 (by comparison, OneRuleToRuleThemAll has about 50,000).

Password Hashes

One of the ways in which we aim to improve on OneRuleToRuleThemAll is by testing the performance of each rule against a much larger body of password hashes. Whereas the creators of OneRuleToRuleThemAll used the Lifeboat data dump as the hashes, we used a larger collection of over 60 million password hashes from multiple sources, including Have I Been Pwned, CrackStation, and Hashmob.

The collection of public password data we used can be split into two categories: known plaintexts, and unknown hashes. The password data that was already in plaintext form was used to create rehashed subsets of the total collection based on conformity to the different password requirements we wanted to curate rule sets for. Specifically, the following hash lists were created:

  • minimum 8 characters
  • minimum 10 characters
  • minimum 12 characters
  • minimum 14 characters
  • minimum 8 characters and complex
  • minimum 10 characters and complex
  • minimum 12 characters and complex
  • minimum 14 characters and complex

“Complex” means containing at least 3 unique character classes out of lower-case letters, upper-case letters, numbers, and special characters (i.e., everything else). For simplicity, these are named, in order (with line counts in parentheses):

  • 8-simple (61,779,922)
  • 10-simple (31,595,857)
  • 12-simple (16,137,745)
  • 14-simple (10,664,614)
  • 8-complex (9,966,491)
  • 10-complex (4,604,400)
  • 12-complex (1,831,483)
  • 14-complex (739,210)
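The per-policy split can be sketched as a classifier that counts character classes and lists every Per-Policy Hash List a known plaintext belongs to. Because the requirements are minimums, the lists nest. A password that qualifies for 12-complex also qualifies for 8-simple, 8-complex, 10-simple, 10-complex, and 12-simple, which is consistent with the shrinking line counts above. This sketch rests on several assumptions. Letters and digits are taken to be ASCII, so anything else counts as special. Length is counted in Unicode code points. MD5 is used for rehashing only as an example. The helper names are hypothetical.

```python
import hashlib
import string

LOWER = frozenset(string.ascii_lowercase)
UPPER = frozenset(string.ascii_uppercase)
DIGITS = frozenset(string.digits)
MIN_LENGTHS = (8, 10, 12, 14)


def char_classes(pw: str) -> int:
    """Count the character classes (lower, upper, digit, special) present in pw."""
    chars = set(pw)
    special = chars - LOWER - UPPER - DIGITS  # "everything else"
    groups = (chars & LOWER, chars & UPPER, chars & DIGITS, special)
    return sum(bool(group) for group in groups)


def policy_lists(pw: str) -> list[str]:
    """Names of every Per-Policy Hash List that pw qualifies for."""
    is_complex = char_classes(pw) >= 3  # "complex" = at least 3 of the 4 classes
    names = []
    for min_len in MIN_LENGTHS:
        if len(pw) >= min_len:
            names.append(f"{min_len}-simple")
            if is_complex:
                names.append(f"{min_len}-complex")
    return names


def rehash_md5(pw: str) -> str:
    """Rehash a known plaintext (MD5 here is an assumed choice of algorithm)."""
    return hashlib.md5(pw.encode("utf-8")).hexdigest()


for candidate in ("password", "Password1", "Summer2024!Summer"):
    print(candidate, policy_lists(candidate), rehash_md5(candidate))
```

Under this scheme, each plaintext's hash would be written to every list that `policy_lists` returns. That is why the "simple" lists also contain all of the complex passwords of the same minimum length.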

We will refer to the above hash lists collectively as “Per-Policy Hash Lists”.

The password data collected in the form of hashes, on the other hand, was kept as-is. These hash lists include (with line counts in parentheses):

These will be referred to simply as Lifeboat, LinkedIn, and NVIDIA, respectively.

Methodology 

We ran a series of hashcat sessions where we varied the dictionary, rules, hash list, and the loopback flag. The goal was to collect data that could measure the following:

  • How the individual rules performed for different password requirements
  • Success rate gains with super_dict.txt over rockyou.txt
  • Success rate gains with the loopback flag

The following lists summarize the experiment variables. Every combination of these variables was run, for a total of 88 hashcat sessions. The results from all hashcat sessions using rockyou.txt, OneRuleToRuleThemAll, and no loopback flag form the benchmark against which performance gains are measured.

Dictionary
  • rockyou.txt
  • super_dict.txt
Rules
  • OneRuleToRuleThemAll
  • super_rules.txt
--loopback
  • true
  • false
Hash List
  • 8-simple
  • 10-simple
  • 12-simple
  • 14-simple
  • 8-complex
  • 10-complex
  • 12-complex
  • 14-complex
  • Lifeboat
  • LinkedIn
  • NVIDIA
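The session count follows directly from the four variables: 2 dictionaries × 2 rule sets × 2 loopback settings × 11 hash lists = 88 sessions, and 11 of them form the benchmark. The following sketch enumerates the grid with `itertools.product`. The `Session` type is illustrative, and the strings are simply the labels used above.

```python
from itertools import product
from typing import NamedTuple


class Session(NamedTuple):
    dictionary: str
    rules: str
    loopback: bool
    hash_list: str


DICTIONARIES = ("rockyou.txt", "super_dict.txt")
RULE_SETS = ("OneRuleToRuleThemAll.rule", "super_rules.txt")
LOOPBACK = (True, False)
HASH_LISTS = (
    "8-simple", "10-simple", "12-simple", "14-simple",
    "8-complex", "10-complex", "12-complex", "14-complex",
    "Lifeboat", "LinkedIn", "NVIDIA",
)

# Every combination of the four variables.
SESSIONS = [Session(*combo) for combo in product(DICTIONARIES, RULE_SETS, LOOPBACK, HASH_LISTS)]

# The benchmark: rockyou.txt + OneRuleToRuleThemAll without --loopback.
BENCHMARK = [
    s for s in SESSIONS
    if s.dictionary == "rockyou.txt"
    and s.rules == "OneRuleToRuleThemAll.rule"
    and not s.loopback
]

print(len(SESSIONS), len(BENCHMARK))  # 88 11
```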

In order to maintain consistency, collect relevant data, and keep the time requirements feasible, the following hashcat options were used:

Option | Parameter | Comment
-r | <path to rules> | The file containing the rules used to mangle the dictionary list.
-m | <mode> | 0, 100, and 1000 for MD5, SHA1, and NTLM, respectively.
--status | (none) | Print periodic status updates, which are also written to a file with tee.
-w3 | (none) | Raise the workload profile to 3 (high) to dedicate a greater portion of the computer’s resources to hashcat.
--loopback | (none) | Append recovered passwords to the working dictionary so they can be used to crack other passwords.
--debug-mode | 1 | Every time a new password is recovered, hashcat logs the rule that was used to crack it to the file specified with --debug-file.
--debug-file | <path to logfile> | See --debug-mode.
-o | <path to output file> | Save recovered passwords to a file as <hash>:<cleartext>.
--potfile-disable | (none) | Prevent hashcat from using the potfile. This is critical to making sure the results from each hashcat session are not polluted by the results from a previous session.
-O | (none) | Use optimized kernels. This limits hashcat to passwords under 32 characters in exchange for significant speed gains.
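For reference, one session's command line can be assembled from the options in the table above. In this sketch, the function name `hashcat_argv` and the file names in the example are hypothetical. It relies on hashcat's default straight (dictionary) attack mode. The positional order, hash file first and dictionary second, follows hashcat's usual `hashcat [options] hashfile dictionary` form.

```python
import shlex

# Hash modes named in the options table.
HASH_MODES = {"MD5": 0, "SHA1": 100, "NTLM": 1000}


def hashcat_argv(
    hash_file: str,
    dictionary: str,
    rules: str,
    hash_type: str,
    *,
    loopback: bool,
    debug_file: str,
    out_file: str,
) -> list[str]:
    """Build the argument list for one session using the options from the table."""
    argv = [
        "hashcat",
        "-m", str(HASH_MODES[hash_type]),
        "-r", rules,
        "--status",
        "-w3",
        "--debug-mode=1",            # log the rule that cracked each password...
        "--debug-file", debug_file,  # ...to this file
        "-o", out_file,
        "--potfile-disable",         # keep sessions independent of one another
        "-O",
    ]
    if loopback:
        argv.append("--loopback")
    # Positional arguments: the hash list, then the dictionary.
    return argv + [hash_file, dictionary]


# Hypothetical file names for a single session.
cmd = hashcat_argv(
    "12-complex.txt", "super_dict.txt", "super_rules.txt", "MD5",
    loopback=True, debug_file="12-complex.debug", out_file="12-complex.out",
)
# Status output was also captured with tee when run from a shell.
print(shlex.join(cmd) + " | tee 12-complex.log")
```

Building an argument list rather than a single string avoids quoting problems with rule or dictionary paths. The list can be passed directly to `subprocess.run` if the tee capture is handled in Python instead.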

Using the data from the debug files and the output files, we computed the top 50,000 rules that led to the most recovered passwords for each level of password requirements. These groups of 50,000 rules became the new rule sets. We chose 50,000 as the size for the new rule sets because it is a little smaller than OneRuleToRuleThemAll. However, we also created separate rule sets for the top 10,000, 1,000, and 64 rules in each category. The scripts used to perform these calculations are included in our GitHub repo for reference. The new rule sets are named after the password requirements they are tailored for, along with the number of rules they contain. For example, 12-complex-50k.rule contains the top 50,000 rules for cracking passwords that are at least 12 characters long and contain at least three character classes.
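The tally behind the new rule sets can be sketched with `collections.Counter`. With `--debug-mode=1`, hashcat writes the rule that cracked each recovered password to the debug file, one per line, so counting identical lines gives each rule's crack count. This is a minimal sketch, not the scripts in the GitHub repo. The function names are hypothetical, and the file suffixes other than "50k" are assumed. `most_common(n)` simply returns fewer rules when fewer than n rules cracked anything. That behavior is consistent with a 50k list ending up with only about 21,000 entries, as noted in the caveats.

```python
from collections import Counter
from typing import Iterable

# Rule-set sizes from the text; the suffixes other than "50k" are assumed.
SIZES = {"50k": 50_000, "10k": 10_000, "1k": 1_000, "64": 64}


def tally_rules(debug_files: Iterable[Iterable[str]]) -> Counter:
    """Count cracks per rule across one or more --debug-mode=1 debug files."""
    counts: Counter = Counter()
    for lines in debug_files:
        for line in lines:
            rule = line.rstrip("\r\n")  # keep trailing spaces: "$ " appends a space
            if rule:
                counts[rule] += 1
    return counts


def top_rules(counts: Counter, n: int) -> list[str]:
    """The n most productive rules; ties keep first-seen order."""
    return [rule for rule, _ in counts.most_common(n)]


def write_rule_sets(counts: Counter, policy: str) -> None:
    """Write files such as 12-complex-50k.rule, one rule per line."""
    for suffix, size in SIZES.items():
        path = f"{policy}-{suffix}.rule"
        with open(path, "w", encoding="utf-8", errors="surrogateescape") as fh:
            fh.writelines(rule + "\n" for rule in top_rules(counts, size))
```

Usage would look like `counts = tally_rules([open("12-complex.debug", encoding="utf-8", errors="surrogateescape")])` followed by `write_rule_sets(counts, "12-complex")`. The debug file name is hypothetical.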

The resulting rule sets were then validated in another round of hashcat sessions against each hash list using both rockyou.txt and super_dict.txt. The loopback flag was omitted for the validation runs, and we only ran the validation sessions for the 50k rule sets since they are the ones designed to compete with OneRuleToRuleThemAll.

Results

The new rule sets yielded modest increases in the number of passwords cracked in the test data, but we also tried them against two sets of password hashes from live environments as part of ongoing penetration tests. The gains in recovered passwords in the live environments were surprisingly high.

This section also includes a table showing the increased rates of recovery from using the loopback flag and substituting super_dict.txt for rockyou.txt.

Performance

Tables A and B compare the performance of the new rule lists against OneRuleToRuleThemAll. Each row in Table A uses the new rule list that corresponds to the target hash list in the leftmost column of that row. For the first row, that is 8-simple-50k.rule, and so on. In Table B, every row uses OneRuleToRuleThemAll. Notably, the performance gains of the new rule sets are more pronounced when the password requirements include multiple character classes ("complex"). For example, on 8-complex the new list gains about 4.9 percentage points (54.53% versus 49.58%), while on 8-simple it gains about 0.6 (61.48% versus 60.92%).

Hash List | Number of Hashes | Total Guesses | Recovered Hashes | Percent Recovered (%) | Guessing Efficiency (recovered hashes per guess)
8-simple | 61779922 | 9.33982E+11 | 37981488 | 61.47869206 | 4.06662E-05
8-complex | 9966491 | 9.33982E+11 | 5434359 | 54.52630219 | 5.81848E-06
10-simple | 31595857 | 9.33982E+11 | 14321202 | 45.3262021 | 1.53335E-05
10-complex | 4604400 | 9.33982E+11 | 2076155 | 45.09067414 | 2.22291E-06
12-simple | 16137745 | 9.33982E+11 | 4255194 | 26.36795909 | 4.55597E-06
12-complex | 1831483 | 9.33982E+11 | 634389 | 34.63799555 | 6.7923E-07
14-simple | 10664614 | 9.33982E+11 | 1363405 | 12.78438207 | 1.45978E-06
14-complex | 739210 | 3.92339E+11 | 189210 | 25.59624464 | 4.82262E-07

Table A: Policy-Based Rule List


Hash List | Number of Hashes | Total Guesses | Recovered Hashes | Percent Recovered (%) | Guessing Efficiency (recovered hashes per guess)
8-simple | 61779922 | 9.72608E+11 | 37634232 | 60.91660653 | 3.87E-05
8-complex | 9966491 | 9.72608E+11 | 4941433 | 49.58046919 | 5.08E-06
10-simple | 31595857 | 9.72608E+11 | 13653850 | 43.21405177 | 1.40E-05
10-complex | 4604400 | 9.72608E+11 | 1842763 | 40.02178351 | 1.89E-06
12-simple | 16137745 | 9.72608E+11 | 3952870 | 24.49456228 | 4.06E-06
12-complex | 1831483 | 9.72608E+11 | 560374 | 30.596735 | 5.76E-07
14-simple | 10664614 | 9.72608E+11 | 1246590 | 11.68903066 | 1.28E-06
14-complex | 739210 | 9.72608E+11 | 173845 | 23.51767427 | 1.79E-07

Table B: OneRuleToRuleThemAll
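The derived columns in Tables A and B are consistent with two formulas. Percent Recovered = 100 × Recovered Hashes / Number of Hashes, and Guessing Efficiency = Recovered Hashes / Total Guesses. The following sketch recomputes the 12-complex rows under that reading. The formulas are inferred from the table values, and the function names are illustrative. The output makes the comparison explicit: on 12-complex, the policy-based list recovered about 4 percentage points more of the hashes while making fewer guesses.

```python
def percent_recovered(recovered: int, total_hashes: int) -> float:
    """Share of the hash list that was cracked, in percent."""
    return 100 * recovered / total_hashes


def guessing_efficiency(recovered: int, total_guesses: float) -> float:
    """Recovered hashes per candidate guess."""
    return recovered / total_guesses


# 12-complex rows from Table A (policy-based) and Table B (OneRuleToRuleThemAll):
# (recovered hashes, number of hashes, total guesses)
ROWS = {
    "12-complex-50k": (634_389, 1_831_483, 9.33982e11),
    "OneRuleToRuleThemAll": (560_374, 1_831_483, 9.72608e11),
}

for name, (recovered, hashes, guesses) in ROWS.items():
    print(
        f"{name}: {percent_recovered(recovered, hashes):.2f}% recovered, "
        f"{guessing_efficiency(recovered, guesses):.3e} cracks per guess"
    )
```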

In addition to validating these new rule sets against the test data, we also tried them out on two live engagements. For one of these engagements, we were able to retrieve the password history for the whole domain. Both domains used password policies that required a minimum length of 8 characters and had password complexity enabled. Table C shows how the new 8-complex-50k rules compared to OneRuleToRuleThemAll when using super_dict.txt as the dictionary.

Hash List | Number of Hashes | Total Guesses (8-complex-50k) | Total Guesses (OneRuleToRuleThemAll) | Recovered Hashes (8-complex-50k) | Recovered Hashes (OneRuleToRuleThemAll)
Domain 1 | 2253 | 9.33982E+11 | 9.72608E+11 | 52 | 37
Domain 2 | 910 | 9.33982E+11 | 9.72608E+11 | 159 | 132
Domain 2 with History | 10865 | 9.33982E+11 | 9.72608E+11 | 593 | 2296

Table C: New 8-complex-50k rules compared to OneRuleToRuleThemAll when using super_dict.txt

In the live environments, the new rule set cracked substantially more passwords than OneRuleToRuleThemAll in relative terms (52 versus 37 for Domain 1 and 159 versus 132 for Domain 2), even though the absolute numbers are small. On the other hand, against the password history for the second domain, OneRuleToRuleThemAll performed dramatically better, with almost four times as many cracks (2,296 versus 593). This most likely indicates that a large portion of the password history on this domain predates the current password policy, which highlights an important observation: OneRuleToRuleThemAll appears to generalize better to passwords that are not constrained by length and complexity requirements. Another possible implication is that honing a rule list to target a specific set of password requirements might prevent it from generalizing well to passwords with unknown properties.

For the secondary objectives of measuring the performance gains from super_dict.txt over rockyou.txt and from the loopback flag, refer to Tables D and E. Table D shows the total percentage of hashes cracked from the Lifeboat, LinkedIn, and NVIDIA lists using super_rules.txt.

Hash Set | rockyou.txt | super_dict.txt
Lifeboat | 72.42% | 73.82%
LinkedIn | 63.98% | 65.71%
NVIDIA | 5.80% | 6.26%

Table D: rockyou.txt vs super_dict.txt

Table E shows the increase in recovered passwords when using the loopback flag. These statistics are based on using super_dict.txt and super_rules.txt.

Hash Set | No Loopback | Loopback
Lifeboat | 73.82% | 79.05%
LinkedIn | 65.71% | 67.41%
NVIDIA | 6.26% | 7.16%

Table E: Non-Loopback vs Loopback

The performance gains from the loopback option and the super_dict.txt dictionary are modest, but not negligible. For hash types that are fast enough for rockyou.txt to be practical, the 30.4% larger search space of super_dict.txt might be worthwhile. We’d also recommend that hashcat users enable the loopback flag, because it increases the search space relatively little.

Behavioral Insights

An accidental advantage of this project is that the rules themselves afford insight into the patterns that people gravitate towards when they must create passwords with minimum length and complexity requirements. The rule sets with the top 64 performers make an interesting case study into what patterns rise to the top and how they change depending on the password requirements.

The most obvious pattern that shows up at all password requirement levels is the use of 4-digit years, typically appended to the end of the password. In most cases, the four-digit year is appended directly to the end of the password with rules like “$2 $0 $1 $2”. Appending the year with an exclamation mark ($2 $0 $1 $2 $!) or an “@” symbol ($@ $2 $0 $1 $2) also appears to be common, but more frequently the year is combined with a permutation of the base word, such as truncation or capitalization. The following table shows examples of how specific hashcat rules mangle their input.

Base Word | Rule | Output
password | $@ $2 $0 $1 $2 | password@2012
password | ] | passwor
password | c | Password
password | sa@ | p@ssword
password | ^r ^e ^p ^u ^S | Superpassword
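To make the rule notation concrete, the following sketch interprets only the handful of hashcat rule functions that appear in the table. These are `:` (do nothing), `c` (capitalize the first letter and lower-case the rest), `$X` (append X), `^X` (prepend X), `]` (delete the last character), and `sXY` (replace every X with Y). It is an illustration, not hashcat's rule engine. Hashcat supports many more functions, and this sketch simply skips spaces between functions.

```python
# Number of argument characters each supported rule function takes.
ARITY = {" ": 0, ":": 0, "c": 0, "]": 0, "$": 1, "^": 1, "s": 2}


def apply_rule(word: str, rule: str) -> str:
    """Apply a hashcat-style rule built from a small subset of rule functions."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op not in ARITY:
            raise ValueError(f"unsupported rule function {op!r} in {rule!r}")
        args = rule[i + 1 : i + 1 + ARITY[op]]
        if len(args) < ARITY[op]:
            raise ValueError(f"missing argument for {op!r} in {rule!r}")
        if op == "c":
            word = word.capitalize()  # first letter upper, rest lower
        elif op == "]":
            word = word[:-1]  # drop the last character
        elif op == "$":
            word += args  # append
        elif op == "^":
            word = args + word  # prepend
        elif op == "s":
            word = word.replace(args[0], args[1])  # substitute all occurrences
        # " " and ":" leave the word unchanged.
        i += 1 + ARITY[op]
    return word


for rule in ("$@ $2 $0 $1 $2", "]", "c", "sa@", "^r ^e ^p ^u ^S"):
    print(f"{rule!r:>18} -> {apply_rule('password', rule)}")
```

Prepend functions are applied left to right, so `^r ^e ^p ^u ^S` builds "Super" in reverse. The same sketch also turns the email-style rule discussed later, `c $@ $g $m $a $i $l $. $c $o $m`, into "John@gmail.com" for the base word "john".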

By far the most common years in our hashcat rules were the years in which the data breaches occurred. As a result, our top-64 rule sets ended up littered with years from 2002 to 2012. The underlying cause of those rules being in the top 64 is easy to understand, and leaving them as-is would make the rule sets ineffective in the present day. We therefore took the liberty of replacing these years by hand with contemporary ones in the 2022-2024 range. As a rule of thumb, 2010 and 2011, the most common years, were mapped to 2022 and 2023. Occurrences of other years were mapped to 2024. In some cases, the top-64 sets contained too many different old years to follow this mapping strategy exactly, so some rules were both brought up to date and given one of the common additional mutations, such as the “@” or “!” symbols. While these substitutions were unavoidably the product of guesswork, the guiding principle was to represent the three most recent years in all the common forms in which the old years appeared, without duplicates. A more effective attack on 4-digit year patterns in passwords is an opportunity for future improvement, and a copy of the original top-64 lists is included in our GitHub repo for reference.
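The year substitution was done by hand, but the core of the rule of thumb can be sketched as a regular-expression rewrite of appended four-digit years. This is a hypothetical automation for illustration, not the process used for the published lists. It only handles years appended with `$` functions. It assumes that 2010 maps to 2022 and 2011 to 2023, because the text does not say which maps to which. It sends the other old years (2002 to 2012) to 2024, and it does not reproduce the hand-added "@" or "!" variations. Deduplicating afterwards shows why those manual adjustments were needed: several old years collapse onto the same new rule.

```python
import re

# Exactly four appended digits ($d $d $d $d), not part of a longer appended number.
APPENDED_YEAR = re.compile(r"(?<!\$\d )\$(\d) \$(\d) \$(\d) \$(\d)(?! \$\d)")

# Assumed pairing; the text only says 2010 and 2011 became 2022 and 2023.
YEAR_MAP = {"2010": "2022", "2011": "2023"}
OLD_YEARS = range(2002, 2013)  # the breach-era years seen in the top-64 lists
DEFAULT_YEAR = "2024"


def modernize_rule(rule: str) -> str:
    """Rewrite appended breach-era years in a rule to contemporary ones."""

    def swap(match):
        year = "".join(match.groups())
        if int(year) not in OLD_YEARS:
            return match.group(0)  # leave other numbers (e.g. 1987) alone
        new_year = YEAR_MAP.get(year, DEFAULT_YEAR)
        return " ".join(f"${digit}" for digit in new_year)

    return APPENDED_YEAR.sub(swap, rule)


def modernize_rules(rules: list[str]) -> list[str]:
    """Modernize every rule and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(modernize_rule(r) for r in rules))


print(modernize_rules(["$2 $0 $1 $0", "c $2 $0 $1 $1 $!", "$2 $0 $0 $5", "$2 $0 $0 $8"]))
```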

Another noticeable pattern in the top-64 rules is how the dominant patterns change as the length and complexity requirements increase. When the password requirement is a simple 8-character minimum, most of the rules rotate or truncate a few characters and/or add some simple numbers to the end. The 4-digit dates become very prevalent once the 12-character minimum is reached, and the rule lists for 14-character minimums have several rules that append longer numbers or prefix the password with phrases like "ilove" or "mynameis". One of the more surprising features of the 12- and 14-character complex requirements is the presence of rules like "c $@ $g $m $a $i $l $. $c $o $m", which strongly indicate that many people simply use their email address as their password! Above all, the top-64 results are important to this study because the lists change so much as the password requirements change. This is strong evidence that refining hashcat rule selection based on password requirements is worthwhile. People do not choose passwords the same way when confronted with an 8-character minimum as they do with a 14-character minimum and three mandatory character classes. We can study these behaviors, and we can target them. The gains from this first attempt at targeting password requirements are modest, but the results demonstrate that there is real potential here to advance the art of password cracking in modern environments.

Caveats and Future Work

Although this project was successful at a basic level in producing hashcat rule sets that outperform OneRuleToRuleThemAll when the target passwords adhere to a known set of length and complexity requirements, there are some obvious weaknesses that limit its overall impact. These caveats are worth discussing not only because they place due contextual limits on the rule sets we created, but also because they illuminate opportunities to advance this kind of password cracking even further.

Each of the following items highlights a limitation of this project’s results and a corresponding opportunity for further work.

  1. The public password leak data used to compile the Per-Policy Hash Lists came from several sources and involved tens of millions of passwords. Some basic deduplication was performed, but overlapping data and some duplicates are possible. Additionally, because the data skews towards 2012 or earlier, numbers and terminology tied to temporary cultural trends of that era bias it and weaken its relevance to contemporary passwords. The new rule sets we created, along with most of the popular rule sets in the community, could be greatly improved if they were reconstructed against a large collection of recently leaked passwords.
  2. 14-complex-50k.rule is actually 14-complex-21k.rule because only about 21,000 rules in the entire super_rules.txt collection were responsible for at least one cracked password. Combined with the very low rate of recovery for 14-character complex passwords seen in Table A, this suggests that the existing corpus of hashcat rules is woefully mismatched to how people create these longer passwords. Reshuffling existing rules might be a successful methodology for cracking short and medium length passwords, but it clearly starts to break down when the passwords get longer. The existing body of rules in the community was simply built on shorter passwords. There is a clear opportunity to achieve better success rates against these longer passwords by going back to the drawing board and creating novel rules for them.
  3. This project, like many previous password cracking efforts, is built on data that is mostly in the English language. Current password dictionaries and rule sets based mostly on English are likely to struggle with passwords written in other languages on different keyboards. The security community would greatly benefit from new password cracking resources designed to work with the vocabulary and patterns of other languages.

Minimum length and character class requirements are the de facto solution for forcing people to make more secure passwords. While such passwords are undeniably better than the ones created when there are no requirements, the optimal password requirements are still a subject of debate. Until password cracking (as it is practiced in the field) leaves behind its biases towards simple passwords from decade-old data breaches, we are going to have a hard time properly stress-testing the current theories on password requirements. The rule sets we have created here are a first step towards adapting our password cracking tools to contemporary password environments. There is still a long way to go.

Tools And Rules Release

The new rule sets created during this project, as well as several scripts supporting the workflow used to create them, are available open source in our GitHub repo.

Aon’s Thought Leader
  • Ethan Wilkins
    Sr Consultant, Security Testing, Cyber Solutions

About Cyber Solutions:

Cyber security services are offered by Stroz Friedberg Inc., its subsidiaries and affiliates. Stroz Friedberg is part of Aon’s Cyber Solutions which offers holistic cyber risk management, unsurpassed investigative skills, and proprietary technologies to help clients uncover and quantify cyber risks, protect critical assets, and recover from cyber incidents.

General Disclaimer

This material has been prepared for informational purposes only and should not be relied on for any other purpose. You should consult with your own professional advisors or IT specialists before implementing any recommendation, following any of the steps or guidance provided herein. Although we endeavor to provide accurate and timely information and use sources that we consider reliable, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future.

Terms of Use

The contents herein may not be reproduced, reused, reprinted or redistributed without the expressed written consent of Aon, unless otherwise authorized by Aon. To use information contained herein, please write to our team.