## A few thoughts on Obfuscation (2/2)

*This is the second part in a 2-part post on Obfuscation. You can read the first part here (Part 1).*

So, in the previous post, we looked at the definition of obfuscation, its relevance, and a rather pessimistic outlook for achieving black-box obfuscation. Naturally, this suggests that the definition of obfuscation is too strong. Therefore, to facilitate research and new ideas in this field, people started to weaken the definition of obfuscation. A discerning reader might realize (on reading the papers) that the obfuscation result in Wee's paper itself requires a slight weakening of the security definition (we are no longer protecting arbitrary predicates of an *arbitrary* program, but rather predicates of a specific parameter, which defines the program uniquely within a *class* of programs).

It is an interesting problem to compare the difficulty of obfuscating parametrized programs with that of obfuscating a more general class of programs (where defining the class does not require programs to come from a "standard" template). Note, however, that obfuscation definitions that protect predicates of only a specific parameter (as opposed to the general program description) are still immensely useful in practice, because we could protect the secret keys of encryption schemes and MACs, and even delegate signatures, with this (limited) definition of obfuscation. This is true because each of these schemes has an implementation whose description is publicly available, and the security rests entirely on protecting all predicates of the secret key (which may be as short as 256 bits) alone. For example, such a notion of obfuscation can trivially convert any symmetric encryption scheme into a public-key (or asymmetric) encryption scheme: take any of several fast symmetric-key encryption schemes, instantiate it with a random key, run it through the obfuscator, and publicly publish the obfuscated algorithm. Only we can decrypt (since only we know the key), and if an obfuscator exists, the obfuscated code reveals nothing about the key. Voilà! Public-key schemes.
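The symmetric-to-public-key transformation above can be sketched in a few lines. This is a toy illustration only: `obfuscate` stands in for a hypothetical ideal obfuscator (none is known to exist for this class), and the XOR "cipher" is deliberately simplistic and insecure; the point is just the shape of the construction.

```python
# Toy sketch of the symmetric-to-public-key transformation described above.
# `obfuscate` is a hypothetical ideal obfuscator; no real one exists here.
import os

def make_symmetric_encryptor(key: bytes):
    """A toy, insecure symmetric scheme: XOR the message with the key stream."""
    def encrypt(message: bytes) -> bytes:
        return bytes(m ^ key[i % len(key)] for i, m in enumerate(message))
    return encrypt

def obfuscate(program):
    # Hypothetical: in reality returning the closure leaks the key; an ideal
    # obfuscator would hide everything except input/output behavior.
    return program

# Key generation: instantiate the symmetric scheme with a random 256-bit key.
secret_key = os.urandom(32)
encrypt = make_symmetric_encryptor(secret_key)   # kept private
public_encryptor = obfuscate(encrypt)            # published as the "public key"

# Anyone can encrypt with the published program...
ciphertext = public_encryptor(b"hello")
# ...but only the key holder can invert it (XOR is its own inverse here).
plaintext = make_symmetric_encryptor(secret_key)(ciphertext)
assert plaintext == b"hello"
```

Under an ideal obfuscator, publishing `public_encryptor` would reveal nothing about `secret_key`, which is exactly the public-key property the paragraph above describes.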

Back to the definition of obfuscation. It is quite clear that general obfuscators are very difficult to achieve even for selective classes of programs (if we stick to the black-box definition). People, in recent years, have started exploring alternate definitions of obfuscation. One of the first in this area was the paper by Goldwasser, Kalai, and Rothblum titled On Best Possible Obfuscation, which posited a modified, weaker definition of obfuscation. Rather than requiring that an adversary learn *nothing* from the obfuscated code, the definition requires that the adversary learn nothing more than what would be leaked by *any* implementation of the program. In other words, we consider a scheme to be an obfuscation scheme if it is the implementation that (in some sense) leaks the *least* information among all possible implementations of the primitive.

One can see that, going by this definition, best-possible obfuscation schemes exist for every cryptographic primitive. A couple of pertinent questions arise: how does one show that a particular implementation is the best that can be done? And does this purported best-possible obfuscation leak some non-trivial information? If we are able to somehow show that even the best possible implementation leaks some non-trivial information, then we can be assured that (in a black-box sense) we can do no better with this primitive. Tying this all together, Barak et al.'s original result (which started this study of obfuscation) claims that there are some programs which succumb to the above premise (i.e., the best possible obfuscation itself leaks out the secret, and hence any obfuscation thereof either leads to a contradiction or is trivial).

In a few years' time, people started experimenting with "looser" definitions of obfuscation. A very interesting definition goes as follows. We retain the notion of black-box obfuscation as defined originally, but we allow ourselves some leeway in defining the power of an adversary. In particular, we assume that there exists some "tamper-proof" hardware with some computational power, to which we have access. If we take this idea to the extreme, we realize that we can "cheat away" the obfuscation problem by implementing the entire program within the purported "tamper-proof" hardware. This clearly side-steps the problem; moreover, it implies that if we want obfuscation to be practical, we need a tamper-proof box for each and every obfuscated program we intend to run.

Therefore, the reasonable compromise that makes this definition meaningful and yet not too difficult is to assume that the tamper-proof hardware (complexity theorists refer to this as an oracle) has access only to a small amount of memory and runs in polynomial time (as opposed to how oracles are used in complexity theory). We then allow the obfuscating algorithm, as well as any adversary, to query the hardware a small number of times. But no adversary can gain access to the internal contents of the hardware token.

A practical realization of this idea is to consider tamper-proof USB sticks. If, say, Microsoft or Apple wants to distribute software that it does not wish to be "cracked" or pirated, it simply obfuscates the software using a USB drive and distributes the software along with a USB drive for each customer. Now, as per the definition, running the program requires access to the hardware token, so the customer can simply insert the USB drive into a USB port when the software needs to verify the registration key, and can continue to execute the software. Even if someone digitally copies the software, without access to the USB drive they will not be able to run it (or learn any information about a legal registration key from the implementation).
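The token model above can be mocked up in software to show the interface, if not the security. Everything here is illustrative (the class name, the challenge-response shape): a real token's guarantee is physical, namely that `_secret` can never be read out, only queried.

```python
# Minimal software mock of the hardware-token model described above.
# The token is a small opaque box; the distributed program can only query it.
import hashlib
import hmac
import os

class TamperProofToken:
    """Models a USB-stick token: tiny memory, poly-time, opaque internals."""
    def __init__(self, secret: bytes):
        self._secret = secret  # in real hardware, this never leaves the token

    def query(self, challenge: bytes) -> bytes:
        # The token answers challenges; the secret itself is never output.
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()

def run_protected_software(token_response: bytes, expected: bytes) -> str:
    # The distributed program only checks the token's answer; a pirated copy
    # without the physical token has no way to produce a valid response.
    ok = hmac.compare_digest(token_response, expected)
    return "running" if ok else "refused"

secret = os.urandom(32)
token = TamperProofToken(secret)
challenge = os.urandom(16)
# The expected answer is baked in at "obfuscation" time by the vendor.
expected = hmac.new(secret, challenge, hashlib.sha256).digest()

assert run_protected_software(token.query(challenge), expected) == "running"
assert run_protected_software(b"\x00" * 32, expected) == "refused"
```

A replay-resistant version would use fresh challenges per run; this sketch only shows why the software is useless without the token.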

There are some drawbacks. The only successful implementations so far require a huge number of computations for every step of the original program. Moreover, there has only been success in modeling the programs as circuits or as polynomials of low degree. Although these are very robust, complete, standard, and structured ways of modeling programs, they are far from efficient or practical. But the nice thing about this model is that, if we allow ourselves a security parameter of (say) 80 bits (i.e., every adversary must perform at least 2^80 computations before learning anything useful, something considered infeasible by today's standards), then a very old result by Oded Goldreich and Rafail Ostrovsky shows that we can obfuscate any program represented as a Turing machine with a running-time blowup on the order of 80^3 (approximately half a million). Interested readers can read their paper here. This result is over 13 years old, but only recently was it understood to imply obfuscation.
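A quick back-of-the-envelope check of the figures quoted above:

```python
# Sanity-check the overhead numbers: a security parameter of 80 gives a
# cubic blowup of 80^3 steps per original step in the scheme described above.
security_parameter = 80
blowup = security_parameter ** 3
print(blowup)      # 512000, i.e. roughly half a million
print(2 ** 80)     # the adversary's required work, about 1.2e24 operations
```

So an obfuscated program runs about half a million times slower per step, which is exactly why the paragraph above calls the scheme far from practical.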

There is still some more work required, however, to get from their paper to obfuscation. Although the result described above is incredibly inefficient, its efficiency can be improved by a few orders of magnitude for simple circuits, as opposed to general Turing machines. Unfortunately, that is still far from practical. The original result also requires that the hardware token maintain state (which means we have to ensure there are no frivolous queries to the USB stick, and that the states of the hardware token are initially synchronized with both the obfuscator and the customer, a task that is far from trivial). My colleague Srivatsan (who is now at CMU) and I were able to get rid of the requirement of state, at the cost of bandwidth (to put it loosely), but it is encouraging to note that with a little work on a 13-year-old result, we can show that universal obfuscation schemes exist, at least in a theoretical sense.

More recently (2008), in this paper, Goldwasser et al. showed that the efficiency can be significantly improved (instead of the cubic blowup in the general scheme above, they require only a linear blowup in running time) if we make a different kind of assumption. They proposed a model where, instead of a hardware token (which can be realized as a USB drive), they have access to a one-time memory with a self-destruct property. Briefly, for every execution of the program, they need a secure piece of hardware that holds two keys (k1 and k2) and, upon receiving an input (either 1 or 2), outputs the corresponding key and then self-destructs without revealing any information about the other key.
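The one-time-memory primitive is simple enough to mock up. This is a software model of what is really a hardware guarantee (software cannot truly self-destruct); the class and method names are illustrative, not from the paper.

```python
# Software mock of the "one-time memory" primitive described above: it holds
# two keys, reveals exactly one on request, then self-destructs.
import os

class OneTimeMemory:
    def __init__(self, k1: bytes, k2: bytes):
        self._keys = {1: k1, 2: k2}
        self._used = False

    def read(self, choice: int) -> bytes:
        if self._used:
            raise RuntimeError("self-destructed: memory already read")
        if choice not in (1, 2):
            raise ValueError("choice must be 1 or 2")
        key = self._keys[choice]
        self._keys = None   # destroy both keys; the other is never revealed
        self._used = True
        return key

k1, k2 = os.urandom(16), os.urandom(16)
otm = OneTimeMemory(k1, k2)
assert otm.read(1) == k1
try:
    otm.read(2)             # second read fails: the memory is gone
    raise AssertionError("should have self-destructed")
except RuntimeError:
    pass
```

In hardware, the destruction is physical; here the `None` assignment merely stands in for it.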

It is a rather subjective question as to which of the above two models is more realistic, but what we can say is that the latter assumption makes obfuscation significantly more efficient. Therefore, if, tomorrow, a budding entrepreneur comes up with a cheap way to manufacture one-time keys for every execution instance of important programs, then we can obfuscate said programs in a general way, as in their paper.

**My thoughts**: Since the blog post was titled a few thoughts, and now that the entire history of obfuscation is out of the way, let us move into the speculative realm, where I would like to share interesting open problems in this area. The first open problem relates to the power of various notions of obfuscation. Obfuscation has been called, not surprisingly, the holy grail of cryptography, because it allows us to solve many outstanding problems in the field. However, a large number of these open problems (as outlined in the seminal work) rely on the definition of obfuscation in its strictest sense. What the subsequent weakened versions imply is still not completely understood. Moreover, at least among the models that require access to special hardware, little work has been done relating the various assumptions (so as to allow us to understand the relative strengths of each of these hardware assumptions).

Secondly, and more importantly, the most pressing open problem in my mind regarding obfuscation is: what are the *necessary conditions* for obfuscation? In this blog post, we have seen how a "no-frills" obfuscation model rules out any possibility of universal obfuscation. But we have also seen that assumptions of "tamper-proof" and "one-time" hardware imply universal (albeit, in some cases, inefficient) obfuscation. These are all sufficient conditions for obfuscation. We understand that without any additional support it is impossible, but an outstanding breakthrough would be to identify what assumptions are (without a doubt) required before we can even think of coming up with universal obfuscation schemes. This would significantly improve our understanding of what it means to obfuscate programs, and might well be one of the most important problems in cryptography today.

## A few thoughts on Obfuscation (1/2)

During the summer of 2008, I had an excellent opportunity to work with the Cryptography Group at Microsoft Research India, where I worked on obfuscation of computer programs.

Program obfuscation is one of the long-standing open problems in, and a veritable holy grail of, cryptography. Although developers have been obfuscating programs using various heuristics for about 20 years now, one of the first formal cryptographic definitions of obfuscation was presented in this seminal paper by Barak et al. Interestingly, this paper has several eminent authors (Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang).

In this paper, the authors defined obfuscation in a clean and neat manner. Informally, an algorithm is said to obfuscate a class of programs, if no “efficient” adversary can learn any more information from the (source code of the) obfuscated program than he would have otherwise learnt by accessing the functionality of the program alone. It is quite trivial to note that any obfuscated program must at least maintain the functionality of the original program (i.e., for all inputs, the outputs of the two programs are identical). A program is said to be obfuscated if the adversary can simply learn no more. This definition is called black box obfuscation.

What does this mean? Consider a video game or an important piece of software written by a big company (say Microsoft) that requires some user key for validation. We know that the validation is done by some software (i.e., some program) on the CD. Therefore, one can simply read off the source code from the CD, identify the piece of code that verifies that the key is legitimate, and simply delete it. Now, we can re-use and distribute this piece of software (or game) without any need to buy licenses. This is in fact exactly what happens during the process of "cracking" games.

If, however, we obfuscate our program according to this definition, no malicious person (such as someone intent on pirating software) can learn anything more about the program by reading the source code than they could by simply running the CD. Therefore, trivially, they cannot in any manner locate the piece of code that does the key verification and delete the offending code. Although this definition is pretty strong (and, as we'll see now, this doesn't come for free), it does ensure that software piracy becomes nearly impossible.

We've seen that our definition of obfuscation is a powerful one, and, as one might expect (if one were a pessimist), and as their paper indeed shows, a general black-box obfuscator (i.e., an obfuscator for *any* program) is impossible. To show this, they give a class of programs (highly contrived, but valid nevertheless) which are inherently *unobfuscatable*, in the sense that access to *any* implementation of the function leaks a secret which would otherwise be (nearly) impossible to get given only access to the functionality.

As a very interesting aside, for those with some knowledge of computability theory: Rice's theorem surprisingly seems to imply that we should be able to obfuscate programs much more easily (because, in general, we cannot decide any non-trivial property of a Turing machine other than by running the said Turing machine and observing its outputs), and yet (black-box secure) obfuscation schemes are proving exceedingly difficult to achieve.

However, one needn't lose sleep over this negative result, because by our own admission, this definition of obfuscation, while sufficient, is perhaps unnecessarily strong. Therefore, cryptographers have looked to obfuscation schemes that tackle smaller classes of programs. Surprisingly, however (and quite counter-intuitively, given the implications of Rice's theorem), obfuscating even special classes of functions (since obfuscating *all* functions is a lost cause) seems to be very non-trivial.

To date, only two kinds of programs have been successfully obfuscated: point functions (including multi-bit point functions) and re-encryption schemes. The former refers to functions which output 1 on a small number of inputs (you can think of a password checker), and the latter refers to special cryptographic primitives called re-encryption schemes (schemes which allow you to encrypt the same message under a different key without having to decrypt the message), whose definition is intimately related to obfuscation.
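The password-checker intuition for point functions can be made concrete with the folklore hash-based construction: store only a salted hash of the hidden point, so the code itself reveals essentially nothing beyond oracle access. This is a sketch of the idea (secure, roughly, when the hash is modeled as a random oracle), not any particular paper's scheme.

```python
# Sketch of the folklore point-function obfuscation: a password checker that
# stores only a salted hash of the single accepted input.
import hashlib
import hmac
import os

def obfuscate_point_function(secret_input: bytes):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + secret_input).digest()

    def check(x: bytes) -> int:
        # Outputs 1 only on the hidden point, 0 everywhere else; the closure
        # contains only (salt, digest), not the point itself.
        candidate = hashlib.sha256(salt + x).digest()
        return 1 if hmac.compare_digest(candidate, digest) else 0

    return check

checker = obfuscate_point_function(b"hunter2")
assert checker(b"hunter2") == 1
assert checker(b"password") == 0
```

Reading the "obfuscated" code yields only the salt and the digest, so (heuristically) learning the hidden point requires inverting the hash, which matches the black-box requirement for this narrow class.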

*In the next part, we shall explore alternate definitions of obfuscation, which have been slightly more successful, their implications, and how this leads to a very curious and interesting open problem.*

## Why Cryptomania?

The title of this blog comes from a very interesting paper by Russell Impagliazzo titled "My Personal View of Average-Case Complexity". In this paper, he describes five possible worlds we might live in and their implications for computer science. It's an absolutely fascinating read, and you can find a terse summary here. However, for the lay person, I will try to explain what Cryptomania refers to.

In complexity theory, we are constantly dealing with the computational powers of various machines derived from the seminal Turing machine. Two of the most important classes among these, central to the field, are P and NP. Many people might have heard of the long-standing open problem of whether P = NP.

Intuitively, one can say that, if P=NP, then we’d be in a world gifted with algorithms that run in polynomial time for a very large class of problems. This includes problems in routing, scheduling, graph coloring, and a host of other important problems. (For more details, see the class of NP-Complete problems). This is the first world that Impagliazzo imagines. In this world, aptly titled *Algorithmica*, we have fast algorithms (albeit probabilistic) for a huge host of difficult (as of today), but nevertheless, important problems.

In cryptography, however, we require problems that are fundamentally hard to solve, so that we can design systems wherein no adversarial (or malicious) person can gain access to a secret without first solving these hard problems. Since almost all cryptographic protocols can eventually be broken by an adversary searching the entire space of possible solutions (there are exceptions, e.g., schemes that are information-theoretically secure, in the sense that there is no information whatsoever for the adversary to "find"), we settle for creating schemes that would require an adversary to work on the order of tens or hundreds of years to break them.

Therefore, central to cryptography is the presence of hard problems that fundamentally require anyone to expend exponential time (and/or space) to solve. Unfortunately, to date, there are no such guaranteed hard problems, although we have several interesting candidates. One might, initially, imagine an optimistic dichotomy: either we have fast algorithms for potentially all problems, killing any hopes of cryptography but making our lives easier in many other ways, or we instead have a world where several important problems (such as routing) are hard, but we can take solace in the fact that the existence of these hard problems makes cryptography possible.

Unfortunately, the notion of hardness has a subtle nuance, which this paper addresses. Even if P does not equal NP (to the consternation of algorithmists), we are in no way guaranteed that cryptography exists, because we require problems that are hard not just for *some* instances but also for an *average* instance (i.e., an instance chosen at random), a property referred to in complexity theory as "average-case hardness".

The paper then proceeds to use this idea to hypothesize four more possible worlds we might exist in, and their implications, in some detail. The last of these worlds is aptly named *Cryptomania*, where we not only have such average-case hard problems, but also special classes of hard-to-invert functions, called trapdoor functions, which are immensely useful in cryptography and allow us to construct public-key schemes, i.e., schemes where people who have never interacted before can exchange secrets in a secure manner.
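The trapdoor idea can be illustrated with textbook RSA at toy scale (real deployments use 2048-bit moduli and proper padding; these tiny numbers are for exposition only): the forward direction is easy for everyone, but inverting is easy only with the trapdoor exponent d.

```python
# Toy textbook-RSA illustration of a trapdoor function. Tiny, insecure
# parameters chosen for readability; requires Python 3.8+ for pow(e, -1, phi).
p, q = 61, 53
n = p * q                    # public modulus (3233)
e = 17                       # public exponent
phi = (p - 1) * (q - 1)      # 3120, computable only from the factorization
d = pow(e, -1, phi)          # the trapdoor: private exponent

m = 42
c = pow(m, e, n)             # forward direction: easy for anyone
assert pow(c, d, n) == m     # inversion: easy only with the trapdoor d
```

Without the factorization of n, computing phi (and hence d) is believed hard, which is precisely the asymmetry that makes public-key exchange possible in Cryptomania.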

Already, we have strong indications that lead us in the direction of P≠NP, but it’s everyone’s hope that we go a few steps further and find ourselves in Cryptomania; sooner rather than later!

*PS: A special thanks to Vimal, my room-mate, for lending a patient ear, giving me an opportunity to elucidate my thoughts, and for encouraging me to start blogging!*