Entry Date:
August 3, 2007

Real-Time Anomaly Detection

Principal Investigator: Hari Balakrishnan


Attackers routinely scan the IP address space of a target network to seek out vulnerable hosts that they can exploit. One challenge is the difficulty of defining portscan activity. How a portscan is performed (i.e., the scanning rate and the coverage of the IP address space) is entirely up to each scanner; a scanner can therefore evade any detection algorithm that depends on parameters under its control. In principle, however, the access pattern of port scanning can be quite different from that of legitimate activity. Because port scanners have little knowledge of the configuration of a target network (otherwise they would not have to scan it), their access patterns often include non-existent hosts or hosts that are not running the requested service. In contrast, legitimate users have little reason to initiate connection requests to inactive servers. Based on this observation, we formulate a detection problem that provides the basis for an online algorithm with bounds on its false positive and false negative rates; a more detailed treatment of these bounds and an evaluation of the detection algorithm using real network traces appear in our publications.
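
To make the formulation concrete, the following is a minimal Python sketch of an online sequential hypothesis test over connection outcomes, in the spirit of the approach described above. The success probabilities (THETA_0, THETA_1) and the target error rates (ALPHA, BETA) are illustrative assumptions, not values from our deployment; the thresholds follow Wald's classical bounds.

    # Sketch of an online sequential hypothesis test over connection
    # outcomes. The parameter values below are illustrative assumptions.
    import math

    THETA_0 = 0.8   # assumed Pr[connection succeeds | remote host benign]
    THETA_1 = 0.2   # assumed Pr[connection succeeds | remote host scanning]
    ALPHA = 0.01    # target false positive rate (assumed)
    BETA = 0.01     # target false negative rate (assumed)

    # Wald's decision thresholds, in log space.
    ETA_1 = math.log((1 - BETA) / ALPHA)   # cross above -> flag as scanner
    ETA_0 = math.log(BETA / (1 - ALPHA))   # cross below -> declare benign

    def classify(outcomes):
        """Walk over connection outcomes (True = success, False = failure)
        for one remote host; return a verdict once a threshold is crossed."""
        log_lr = 0.0
        for success in outcomes:
            if success:
                log_lr += math.log(THETA_1 / THETA_0)
            else:
                log_lr += math.log((1 - THETA_1) / (1 - THETA_0))
            if log_lr >= ETA_1:
                return "scanner"
            if log_lr <= ETA_0:
                return "benign"
        return "pending"

    # A host whose first contacts mostly fail is flagged after a few tries.
    print(classify([False, False, False, False]))  # -> scanner
    print(classify([True, True, True, True]))      # -> benign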

Worm Detection and Throttling:

A worm is a program containing malicious code that spreads from host to host without human intervention. One instance of such malicious software, a scanning worm, probes a large set of randomly chosen IP addresses to locate vulnerable servers to infect. Analogous to the portscan detection problem above, this random scanning behavior can be used to identify an infected machine engaged in worm propagation. While sequential hypothesis testing shows promise for detecting scanning worms, there is a significant hurdle to overcome. We discuss these problems at length and present two innovations that enable fast detection of scanning worms:

(*) Reverse Sequential Hypothesis Testing -- A detection algorithm should react promptly when a benign host is infected and begins behaving like a worm. To address this, we evaluate the likelihood ratio over the observations in reverse chronological order. We show that this reverse sequential hypothesis test is equivalent to a forward sequential hypothesis test that sets the lower threshold to 1 - ε, resets the likelihood ratio to 1 and repeats the test whenever the ratio crosses the lower threshold, and terminates only when the likelihood ratio becomes greater than or equal to the upper threshold.
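
The following is a minimal Python sketch of the forward-equivalent formulation just described, with the reset-to-1 rule realized by flooring the log-likelihood ratio at log(1) = 0. The probabilities THETA_0 and THETA_1 and the upper threshold ETA_1 are illustrative assumptions, not values from our system.

    import math

    THETA_0 = 0.8   # assumed Pr[first-contact success | host benign]
    THETA_1 = 0.2   # assumed Pr[first-contact success | host infected]
    ETA_1 = math.log(99.0)   # upper (detection) threshold, assumed

    def detect_infection(outcomes):
        """Return the index at which infection is reported, or None.
        Flooring the log-ratio at 0 (= log 1) implements the reset-to-1
        rule, which is what lets the test react promptly even after a
        long benign history."""
        log_lr = 0.0
        for i, success in enumerate(outcomes):
            if success:
                step = math.log(THETA_1 / THETA_0)
            else:
                step = math.log((1 - THETA_1) / (1 - THETA_0))
            log_lr = max(0.0, log_lr + step)
            if log_lr >= ETA_1:
                return i
        return None

    # A long benign history does not delay detection once failures begin.
    history = [True] * 1000 + [False, False, False, False]
    print(detect_infection(history))  # -> 1003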

(*) Credit-Based Connection Rate Limiting -- It is necessary to limit the rate at which first-contact connections can be initiated, to ensure that a worm cannot propagate rapidly between the moment scanning begins and the time at which the scan's first-contact connections time out and are observed to be failures by our reverse sequential hypothesis test. Credit-based connection rate limiting works by granting each local host i a starting balance of ten credits (Ci = 10), which can be used for issuing first-contact connection requests. Whenever a first-contact connection request is observed, a credit is subtracted from the sending host's balance (Ci = Ci - 1). If the successful acknowledgment of a first-contact connection is observed, the host that initiated the request is issued two additional credits (Ci = Ci + 2). No action is taken when a connection fails, as the cost of issuing the first-contact connection has already been deducted from the issuing host's balance. Finally, first-contact connections are blocked when the host has no credit available.
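
The following is a minimal Python sketch of this credit-based scheme. The class and method names are hypothetical; only the credit arithmetic (a starting balance of ten, minus one per first-contact request, plus two per observed success, blocking at zero) follows the description above.

    from collections import defaultdict

    STARTING_CREDITS = 10

    class CreditLimiter:
        """Hypothetical wrapper around the credit arithmetic in the text."""

        def __init__(self):
            # Per-local-host balance Ci, lazily initialized to 10.
            self.credits = defaultdict(lambda: STARTING_CREDITS)

        def allow_first_contact(self, host):
            """Called on each first-contact connection request from host.
            Returns False (block) when the host has no credit available."""
            if self.credits[host] <= 0:
                return False
            self.credits[host] -= 1      # Ci = Ci - 1
            return True

        def on_success(self, host):
            """Called when a first-contact connection is acknowledged."""
            self.credits[host] += 2      # Ci = Ci + 2
            # Failures need no handler: their cost was already deducted.

    limiter = CreditLimiter()
    results = [limiter.allow_first_contact("10.0.0.5") for _ in range(12)]
    print(results.count(False))  # -> 2: requests 11 and 12 are blocked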

Research Agenda:

(1) Understanding anomalous network activities: In many problem domains, we lack good models of anomalous network activity that capture the unique characteristics useful for distinguishing it from benign activity. To address this, we intend to collect traces from many vantage points and look for patterns that can be incorporated into a model.

(2) Algorithms resilient to evasion: When designing detection algorithms for network security, one must account for adversaries who can craft attacks to evade detection once the algorithm is published. The barrier to evasion should be high enough to resist such adversaries.

(3) Evaluation: Detection algorithms must be evaluated both analytically and through trace-driven simulation, and false positive and false negative cases should be analyzed. Estimating the amount of state required to run an algorithm is also important, since real-time detection of network anomalies often requires monitoring high-bandwidth networks.

(4) Extension to distributed detection systems: We also plan to extend this work to distributed real-time detection problems. For instance, Internet-scale worm propagation can be identified more accurately if detection systems are deployed at many vantage points. Coordination among distributed detection systems should be one of the key design components.