## A few thoughts on Obfuscation (1/2)

During the summer of 2008, I had an excellent opportunity to work with the Cryptography Group at Microsoft Research India, where I worked on obfuscation of computer programs.

Program Obfuscation is one of the long standing open problems in, and a veritable holy grail of cryptography. Although developers have been obfuscating programs using several heuristics for about 20 years now, one of the first formal cryptographic definitions of obfuscation was presented in this seminal paper by Barak et al. Interestingly, this paper, has several eminent authors (Boaz Barak, Oded Goldreich, Russel Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang).

In this paper, the authors defined obfuscation in a clean and neat manner. Informally, an algorithm is said to obfuscate a class of programs, if no “efficient” adversary can learn any more information from the (source code of the) obfuscated program than he would have otherwise learnt by accessing the functionality of the program alone. It is quite trivial to note that any obfuscated program must at least maintain the functionality of the original program (i.e., for all inputs, the outputs of the two programs are identical). A program is said to be obfuscated if the adversary can simply learn no more. This definition is called black box obfuscation.

What does this mean? Consider a video game or an important piece of software written by a big company (say Microsoft) that requires some user key to validate. However, we know that the validation is done by some software (i.e., some program) in the CD. Therefore, one can simply read off the source code from the CD, identify the piece of code that verifies that the key is legitimate, and simply delete the piece of code. Now, we can re-use, and distribute this piece of software (or game) without any need for buying licenses. This is in fact what exactly happens during the process of “cracking” games.

If we, however, obfuscate our program according to this definition, no malicious person (such as a person intent on pirating software) can learn anything about the program by reading the source code any more than he can by simply running the CD. Therefore, trivially, he cannot in any manner locate the piece of code that does the key verification and delete the offending piece of code. Although this definition is pretty strong (and we’ll see now, that this doesn’t come for free), it does ensure that software piracy becomes nearly impossibly.

We’ve seen that our definition of obfuscation is a powerful one, and as one might expect (if one were a pessimist), and indeed their paper shows, that a general black-box obfuscator (i.e., an obfuscator for *any* program) is impossible. To show this, they give a class of programs (which are highly contrived, but valid, nevertheless) which are inherently *unobfuscatable*, in the sense that giving access to *any* implementation of the function leaks a secret which would otherwise, be (nearly) impossible to get, if only given access to the functionality.

As a very interesting aside, for those with some knowledge of computability theory, Rice’s theorem surprisingly seems to imply that we should be able to obfuscate programs much more easily (because, in general, we cannot learn any non-trivial property of a turing machine in any decidable way other than by running the said turing machine and observing the outputs), but however, (black-box secure) obfuscation schemes are proving exceedingly difficult to achieve.

However, one needn’t lose sleep over this negative result, because by our own admission, we have concluded that although this definition of obfuscation is sufficient, it is perhaps unnecessarily too strong. Therefore, cryptographers have looked to obfuscation schemes that tackle smaller classes of programs. Surprisingly however (and quite counter-intuitive to the implications of Rice’s Theorem) obfuscating even special classes of functions (since obfuscating *all* functions is a lost cause) seems to be very non-trivial.

Till date, there have been only two classes of programs which have been successfully obfuscated (point-functions, multi-bit point functions and re-encryption schemes). The former two refer to functions which output 1 on access to a small number if inputs (you can think of a password-checker) and the latter refers to special cryptographic primitives called re-encryption schemes (schemes which allow you to encrypt the same message under a different key without having to decrypt the message) whose definition is very intimately related to obfuscation.

*In the next part, we shall explore alternate definitions of obfuscation, which have been slightly more successful, their implications, and how this leads to a very curious and interesting open problem.*

## Why Cryptomania?

The title of this blog comes from a very interesting paper by Russell Impagliazzo titled “My Personal View of Average-Case Complexity”. In this paper, he describes five possible worlds we live in and their implications to computer science. It’s an absolutely fascinating read and a you can find a terse summary here. However, for the lay person, I will try to explain what Cryptomania refers to.

In complexity theory, we are constantly dealing with the computational powers of various machines derived from the seminal Turing Machine. Two of the most important classes among these, central to the field, are the classes P and NP. Many people might have heard of the long standing open problem of whether P=NP?

Intuitively, one can say that, if P=NP, then we’d be in a world gifted with algorithms that run in polynomial time for a very large class of problems. This includes problems in routing, scheduling, graph coloring, and a host of other important problems. (For more details, see the class of NP-Complete problems). This is the first world that Impagliazzo imagines. In this world, aptly titled *Algorithmica*, we have fast algorithms (albeit probabilistic) for a huge host of difficult (as of today), but nevertheless, important problems.

In cryptography, however, we require problems that are fundamentally hard to solve, so that, we can design systems where in no adversarial (or malicious) person can gain access to a secret without solving these hard problems in the first place. Since almost all cryptographic protocols can eventually be broken by an adversary searching the entire space of possible solutions (there are exceptions, for eg. schemes that are information theoretically secure in the sense that there is no information whatsoever for the adversary to “find”), we settle ourselves to creating schemes breaking whom will necessitate an adversary work in the order of tens or hundreds of years.

Therefore, central to cryptography, is the presence of hard problems which fundamentally require anyone to expend exponential time (and/or space) to solve. Unfortunately, till date, there are no such guaranteed hard problems although we have several interesting candidates. One might, initially, imagine an optimistic dichotomy that dictates that we either have fast algorithms for potentially all problems, killing any hopes of cryptography, but making our lives easier in many other ways, or we instead have a world where there are several important problems (such as routing) that are hard, but we can take solace in the fact that the existence of these hard problems makes cryptography possible.

Unfortunately, the notion of hardness has a subtle nuance, which this paper addresses. Even if P does not equal to NP (to the consternation of algorithmists), we are in no way guaranteed that cryptography exists. Because, we require problems that are hard not just for *some* instances, but also, additionally, hard for an *average* instance (i.e., an instance chosen at random) which is referred to, in complexity theory, as “average-case hardness”.

The paper then proceeds to use this idea to hypothesize four more possible worlds we exist in, and their implications in some detail. The last of these worlds is aptly named *Cryptomania* where, we not only have such average-case hard problems, but also special classes of functions that are hard, called trapdoor functions which are immensely useful in cryptography and allows us to construct public-key schemes, i.e., schemes where people can exchange secrets without having ever interacted before, in a secure manner.

Already, we have strong indications that lead us in the direction of P≠NP, but it’s everyone’s hope that we go a few steps further and find ourselves in Cryptomania; sooner rather than later!

*PS: A special thanks to Vimal, my room-mate, for lending a patient ear, giving me an opportunity to elucidate my thoughts, and for encouraging me to start blogging!*