Archive for the ‘open problems’ Category

A few thoughts on Obfuscation (1/2)

February 25, 2010 1 comment

During the summer of 2008, I had an excellent opportunity to work with the Cryptography Group at Microsoft Research India, where I worked on obfuscation of computer programs.

Program Obfuscation is one of the long standing open problems in, and a veritable holy grail of cryptography. Although developers have been obfuscating programs using several heuristics for about 20 years now, one of the first formal cryptographic definitions of obfuscation was presented in this seminal paper by Barak et al. Interestingly, this paper, has several eminent authors (Boaz Barak, Oded Goldreich, Russel Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang).

In this paper, the authors defined obfuscation in a clean and neat manner. Informally, an algorithm is said to obfuscate a class of programs, if no “efficient” adversary can learn any more information from the (source code of the) obfuscated program than he would have otherwise learnt by accessing the functionality of the program alone. It is quite trivial to note that any obfuscated program must at least maintain the functionality of the original program (i.e., for all inputs, the outputs of the two programs are identical). A program is said to be obfuscated if the adversary can simply learn no more. This definition is called black box obfuscation.

What does this mean? Consider a video game or an important piece of software written by a big company (say Microsoft) that requires some user key to validate. However, we know that the validation is done by some software (i.e., some program) in the CD. Therefore, one can simply read off the source code from the CD, identify the piece of code that verifies that the key is legitimate, and simply delete the piece of code. Now, we can re-use, and distribute this piece of software (or game) without any need for buying licenses. This is in fact what exactly happens during the process of “cracking” games.

If we, however, obfuscate our program according to this definition, no malicious person (such as a person intent on pirating software) can learn anything about the program by reading the source code any more than he can by simply running the CD. Therefore, trivially, he cannot in any manner locate the piece of code that does the key verification and delete the offending piece of code. Although this definition is pretty strong (and we’ll see now, that this doesn’t come for free), it does ensure that software piracy becomes nearly impossibly.

We’ve seen that our definition of obfuscation is a powerful one, and as one might expect (if one were a pessimist), and indeed their paper shows, that a general black-box obfuscator (i.e., an obfuscator for any program) is impossible. To show this, they give a class of programs (which are highly contrived, but valid, nevertheless) which are inherently unobfuscatable, in the sense that giving access to any implementation of the function leaks a secret which would otherwise, be (nearly) impossible to get, if only given access to the functionality.

As a very interesting aside, for those with some knowledge of computability theory, Rice’s theorem surprisingly seems to imply that we should be able to obfuscate programs much more easily (because, in general, we cannot learn any non-trivial property of a turing machine in any decidable way other than by running the said turing machine and observing the outputs), but however, (black-box secure) obfuscation schemes are proving exceedingly difficult to achieve.

However, one needn’t lose sleep over this negative result, because by our own admission, we have concluded that although this definition of obfuscation is sufficient, it is perhaps unnecessarily too strong. Therefore, cryptographers have looked to obfuscation schemes that tackle smaller classes of programs. Surprisingly however (and quite counter-intuitive to the implications of Rice’s Theorem) obfuscating even special classes of functions (since obfuscating all functions is a lost cause) seems to be very non-trivial.

Till date, there have been only two classes of programs which have been successfully obfuscated (point-functions, multi-bit point functions and re-encryption schemes). The former two refer to functions which output 1 on access to a small number if inputs (you can think of a password-checker) and the latter refers to special cryptographic primitives called re-encryption schemes (schemes which allow you to encrypt the same message under a different key without having to decrypt the message) whose definition is very intimately related to obfuscation.

In the next part, we shall explore alternate definitions of obfuscation, which have been slightly more successful, their implications, and how this leads to a very curious and interesting open problem.

Categories: open problems