## A few thoughts on Obfuscation (2/2)

*This is the second part in a 2-part post on Obfuscation. You can read the first part here (Part 1).*

So, in the previous post, we looked at the definition of obfuscation, its relevance, and a rather pessimistic outlook for achieving black-box obfuscation. Naturally, this points to the fact that the definition of obfuscation is rather too strong. Therefore, to facilitate research, and new ideas in this field, people started to weaken the definition of obfuscation. A discerning viewer might realize (on reading the papers) that the obfuscation result shown in Wee’s paper itself requires a slight weakening of the security definition (where we are no longer protecting arbitrary predicates of an *arbitrary* program, but rather a predicate of a specific parameter, which defines the program uniquely within a *class* of programs).

It is an interesting problem to compare the difficulty of obfuscating parametrized programs to obfuscating a more general class of programs (where defining the class does not require programs to come from a “standard” template). However, note that, obfuscation definitions that protect predicates of only a specific parameter (as opposed to the general program description) are still immensely useful in practice, because we could protect the secret-keys of encryption schemes, MACs, and even delegate signatures with this (limited) definition of obfuscation. This is true because each of these schemes have implementations whose description is publicly available, and the security rests inherently on protecting all predicates of the secret-key (which may be as short as 256 bits) alone. For eg., one can see that, such a notion of obfuscation can trivially convert any symmetric encryption scheme to a public-key (or asymmetric) encryption scheme, because we can take any of several fast symmetric key encryption schemes, and run them through the obfuscator after instantiating it with a random key, and publicly publish the obfuscated algorithm. Only we have access to the decryption (since only we know the key) and if an obfuscator exists, the obfuscated code should reveal nothing about the key. VoilĂ ! Public-key schemes.

Back to the definition of obfuscation. It is quite clear that general obfuscators are very difficult to achieve even for selective classes of programs (if we stick to the black-box definition). People, in recent years, have started exploring alternate definitions of obfuscations. One of the first in this area was the paper by Goldwasser, Kalai, and Rothblum titled On Best Possible Obfuscation, which posited a modified, weaker definition of obfuscation. Rather than requiring that an adversary learn *nothing* from the obfuscated code, the adversary learns nothing more than what would be leaked out by *any* implementation of the program. In other words, we consider a scheme to be an obfuscation scheme, if it is the implementation that (in some sense) leaks the *least* information among all possible implementations of the primitive.

One can see that, going by this definition, best-possible obfuscation schemes exist for every cryptographic primitive. A couple of pertinent questions arise: How does one show that a particular implementation is the best that can be done? And, whether this purported best-possible obfuscation leaks some non-trivial information. If we are able to somehow show that even the best possible implementation leaks some non-trivial information, then we can be assured that (in a black-box) sense, we can do no better with this primitive. Tying this all together, Barak et. al’s original result (that started out this study of Obfuscation) claims that there are some programs which will succumb to the above premise (i.e., the best possible obfuscation itself leaks out the secret, and hence any obfuscation thereof either leads to a contradiction or is trivial).

In a few years time, people started experimenting with “looser” definitions of obfuscation. A very interesting definition goes as follows. We retain the notion of black-box obfuscation as defined originally, but we allow ourselves some leeway in defining the power of an adversary. In particular, we consider that there exists some “tamper-proof” hardware that has some computational power which we have access to. If we take this idea to the extreme, we realize that we can “cheat away” the obfuscation problem by implementing the entire program within the purported “tamper-proof” hardware. This is clearly side-stepping the problem, and moreover, it implies that if we want obfuscation to be practical, we need tamper-proof boxes for each and every possible obfuscated program we intend to run.

Therefore, the reasonable compromise in this definition that makes it meaningful and yet not too difficult is to assume that the tamper-proof hardware (people in complexity theory refer to this as oracles) has only access to a small amount of memory and runs in polynomial time (as opposed to how they are used in complexity theory). We then allow the obfuscating algorithm as well as any adversary to be able to query the hardware a small number of times. But no adversary can gain access to the internal contents of the hardware token.

A practical realization of this idea is to consider tamper-proof USB sticks. If, say Microsoft or Apple require to distribute software that they do not wish to be “cracked” or pirated, they simply obfuscate the software using a USB drive and then distribute the software with a USB drive for each customer. Now, as per the definition, to run the program the user (and hence any adversary) will require access to the hardware token, so the customer running the piece of software can simply insert the USB drive into the USB channel when the software requires to verify the registration key, and can continue to execute the piece of software. Even if we are able to digitally copy the software, without access to the USB drive, we will not be able to run the software (or learn any information about a legal registration key from the implementation).

There are some drawbacks. The only successful implementations so far require us to perform a huge number of computations for every step of the original program. Moreover, there has only been success in modeling the programs as circuits or polynomials of low degree. Although these are very robust, complete, standard, and structured way of modeling programs, they are far from efficient or practical. But the nice thing about this model is that, if we allow ourselves a security parameter of (say) 80 bits (i.e., every adversary must perform at least 2^80 computations before he learns anything useful—something that is considered impossible by today’s standards) then a very old result by Oded Goldreich and Rafail Ostrovsky shows that, we can obfuscate (any) program represented as a Turing Machine with a running time blow up of (up to a few orders of magnitude) 80^3 (approximately half a million). Interested people can read up on their paper here. This result is over 13 yrs old, but only recently was understood to imply obfuscation.

There is still some more work required however, to get from their paper to obfuscation. Although the result described above incredibly inefficient, it’s efficiency can be improved by a few orders of magnitude for simple circuits, as opposed to general turing machines. Unfortunately, that is still far from practical. The original result also requires that the hardware token maintain state (which implies that we have to make sure there are no frivolous queries to the USB stick, and that the states of the hardware token are synchronized initially with the obfuscator as well as the customer—a task that is far from trivial). My colleague Srivatsan (who is now at CMU) and I were able to get rid of the requirement of state, at the cost of bandwidth (to put it loosely) but it is encouraging to note that with a little work on a 13 year old result, we can show that universal obfuscation schemes exist; at least in a theoretical sense.

More recently (2008), in this paper, Goldwasser et. al, showed that the efficiency can be significantly improved (instead of the cubic inefficiency in the above general scheme, they require only a linear blowup in the running time) if we make a different kind of assumption. They proposed a model where instead of a hardware token (which can be realized as a USB drive), they have access to a one-time memory with a self-destruct property. Briefly, for every execution of the program, they need a secure piece of hardware that has 2 keys (k1, and k2), which upon receiving input (either 1 or 2) outputs the corresponding key and then self-destructs without revealing any information about the other key.

It is a rather subjective question as to which of the above two models are more realistic, but what we can say is that the latter assumption makes obfuscation significantly more efficient. Therefore, if tomorrow, a budding entrepreneur is able to come up with a cheap way to manufacture one-time keys for every execution instance of important programs, then we can obfuscate these said programs in a general way, as in their paper.

** My thoughts**: Since the blog post was titled a few thoughts, and now that the entire history of obfuscation is out of our way, let us move into the speculative realm where I would like to share interesting open problems in this area. The first open problem relates to the power of various notions of obfuscation. Obfuscation has been called, not unsurprisingly, the holy grail of cryptography, because it allows us to solve many outstanding problems in the field of cryptography. However, a large number of these open problems (as outlined in the seminal work) rely on the definition of obfuscation in the strictest sense. What do these subsequent weakened versions imply is still not completely understood. Moreover, at least among the models that require access to special hardware, little work has been done relating the various assumptions (so as to allow us to understand the relative strengths of each of these hardware assumptions).

Secondly, and more importantly, the most pressing open problem in my mind regarding obfuscation is: What are the *necessary conditions* for obfuscation. In the blog post we have seen how a “no-frills” obfuscation model rules out any possibility of universal obfuscation. But we have also seen that assumptions of “tamper-proof” and “one-time” hardware imply universal (albeit inefficient, in some cases) obfuscation. These are all sufficient conditions for obfuscation. We understand that without any additional support, it is impossible, but an outstanding breakthrough would be to identify what assumptions are (without a doubt) required before we can even think of coming up with universal obfuscation schemes. This would significantly improve our understanding of what it means to obfuscate programs and might well be one of the most important problems in cryptography today.