Private Information Retrieval

Principal Investigator Shafrira Goldwasser

Project Website http://groups.csail.mit.edu/cis/pir.html

Private Information Retrieval (PIR) schemes allow users to retrieve information from a database while keeping their query private. Motivating examples for this problem include databases with sensitive information, such as stocks, patents or medical databases, in which users are likely to be highly motivated to hide which record they are trying to retrieve. PIR schemes aim at achieving this goal efficiently, where the main cost measure has traditionally been the communication complexity. PIR is a strong primitive, which may also be useful as a building block within other protocols.

Publicly accessible databases are an indispensable resource for retrieving up to date information. But they also pose a significant risk to the privacy of the user, since a curious database operator can follow the users queries and infer what the user is after. Indeed, in cases where the users' intensions are to be kept secret, users are often cautious about accessing the database.

The is a problem of protecting the privacy of the user accessing a public database. The database is represented by a binary vector x of length n. It is assumed that the user wants to learn the value of a certain bit x_i. Note that this problem allows a trivial solution: the user may request the value of every bit of the database. Clearly, in such a protocol the user's privacy is perfectly protected. However, there is a huge increase in the amount of communication needed for such a protocol. The user needs to communicate O(n) bits instead of O(log n) in the usual (non-private) scheme. It is shown that for a user accessing a single database, the trivial approach described above is the best possible. In order to reduce the communication complexity of private information retrieval (PIR) schemes, the authors propose to consider the scenario where the user is able to access k>1 replicated copies of the same database, and privately retrieve information. This means that every individual server (holding a replicated copy of the database) gets no information about the identity of the item retrieved by the user. Surprisingly even considering the case k=2, allows to decrease the communication complexity of PIR protocols from O(n) to O(n^{1/3}).

The initial work has drawn a lot of attention from other researchers in the fields of cryptography, complexity theory and the theory of error-correcting codes.

The main problem of the PIR related research is to reduce the communication complexity of PIR protocols. Following the seminal work of a sequence of improvements had been obtained. We introduce a new unifying approach to PIR. Our approach allows simpler and more intuitive proofs of many of the previously known bounds. We also obtain improvements (by polynomial factors) for various generalized settings of the PIR problem.

In spite of the large amount of work done by many researchers in the field of private information retrieval, the most important questions that have been raised are still far from resolution. The burning problem is to understand whether PIR protocols involving constant number of servers require polynomial (or poly-logarithmic) amount of communication. In our future work we plan to focus on improving the parameters of the PIR protocols. One route that we believe promising is to improve the understanding of a relation between an exceptionally good class of error-correcting codes (known as algebraic-geometry codes) and locally decodable codes. Better understanding of this relation may allow us to use powerful tools of algebra and algebraic geometry in designing the PIR protocols.