Entry Date:
August 3, 2007

Protecting SSH from known_hosts Address Harvesting

Principal Investigator Hari Balakrishnan

Co-investigator Cynthia D McLain


Address harvesting is the act of searching a compromised host for addresses of other hosts to attack. SSH, the tool of choice for administering and communicating with mission-critical hosts, security-critical hosts, and even some routers, leaves each user's list of previously contacted hosts open to harvest by anyone who compromises the user's account. Attackers have combined address harvesting with a variety of mechanisms for impersonating legitimate users when authenticating over SSH, and they have succeeded in breaching systems at major academic, commercial, and government institutions. In this study, we detail the threat posed should attackers automate this mode of attack to create a self-propagating worm. We then present a countermeasure to defend against address harvesting attacks, with an implementation written for OpenSSH.

If you use SSH, your SSH client stores within your home directory a list that maps the host names and IP addresses of every remote host you have connected to onto each host's public key. This database, known as the known_hosts file, has been used by attackers who compromise user accounts, steal passwords and identity keys, and then use the list of hosts to identify targets on which the same password or key can be used to compromise additional accounts. It is also possible that worms could use known_hosts data to identify new targets.
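To illustrate how little effort harvesting takes, here is a minimal Python sketch that lists the host names and addresses readable from known_hosts text. It assumes the OpenSSH line layout (a comma-separated host field, then key type, then key). The function name harvest_hosts and the sample data are hypothetical and are not taken from the study's tooling. Entries whose host field begins with "|" are stored in hashed form by OpenSSH and are skipped, since their names cannot be read directly.

```python
def harvest_hosts(known_hosts_text):
    """Return the sorted plaintext host names/addresses found in known_hosts text.

    Illustrative sketch only: assumes the OpenSSH layout
    "host1,host2 keytype base64key [comment]".
    """
    hosts = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        # An optional marker such as @cert-authority precedes the host field.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if len(fields) < 3:
            continue  # not a complete "hosts keytype key" entry
        host_field = fields[0]
        if host_field.startswith("|"):
            continue  # hashed entry: the name is not readable
        for name in host_field.split(","):
            # Non-default ports are written as [host]:port.
            if name.startswith("[") and "]" in name:
                name = name[1:name.index("]")]
            hosts.add(name)
    return sorted(hosts)


if __name__ == "__main__":
    sample = (
        "# sample data, not from the study\n"
        "example.org,192.0.2.10 ssh-rsa AAAAB3Nza\n"
        "|1|abc=|def= ssh-rsa AAAAB3Nza\n"
        "[gw.example.net]:2222 ssh-ed25519 AAAAC3\n"
    )
    print(harvest_hosts(sample))
```

Anyone who can read a user's home directory can run the equivalent of this loop, which is why a plaintext known_hosts file is a ready-made target list.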

As of [September 12, 2005], we have collected known_hosts data from 179 hosts, 69 of which ran the script as root and submitted data from all user accounts. In total, we received 37,771 anonymized known_hosts entries from user accounts. These known_hosts entries lead to a total of 12,041 on 107 valid /8 networks (67% of all valid /8 networks).

The data collection script that was run on these hosts also parsed SSH2 identity key files to see what fraction of these key files had the encryption flag set. We were quite surprised to see that only 38.3% of 447 key files were encrypted.
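The study's script is not reproduced here, so the following Python sketch only shows one plausible way to make such a check. It assumes identity keys stored in the traditional PEM layout, where a passphrase-protected key carries a "Proc-Type: 4,ENCRYPTED" header (or, for PKCS#8, a "BEGIN ENCRYPTED PRIVATE KEY" line). The function names and sample keys are hypothetical, and newer OpenSSH key formats are not handled.

```python
def pem_key_is_encrypted(pem_text):
    """Return True if a PEM-encoded private key appears passphrase-protected.

    Illustrative sketch only: covers traditional PEM and PKCS#8 headers.
    """
    lines = [ln.strip() for ln in pem_text.splitlines()]
    # PKCS#8 encrypted keys announce encryption in the BEGIN line itself.
    if any(ln == "-----BEGIN ENCRYPTED PRIVATE KEY-----" for ln in lines):
        return True
    # Traditional PEM keys carry a Proc-Type header when encrypted.
    return any(ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines)


def encrypted_fraction(pem_texts):
    """Return the fraction of the given key texts that appear encrypted."""
    flags = [pem_key_is_encrypted(text) for text in pem_texts]
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Truncated placeholder keys, not real key material.
    encrypted = (
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "Proc-Type: 4,ENCRYPTED\n"
        "DEK-Info: DES-EDE3-CBC,0123456789ABCDEF\n"
        "\n"
        "AAAA\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    plain = (
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "AAAA\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    print(encrypted_fraction([encrypted, plain]))
```

A check of this kind reads only the key file's headers, so it can report whether a key is protected without ever handling the passphrase or the key material itself.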