Javascript Obfuscation

Every now and then we get a question about our Javascript obfuscation tools. There are some tools out there that claim to be able to obfuscate any Javascript, but we haven't been able to find a tool that could hit the sweet spot between human and machine readability. The results were either human readable (though, admittedly, not by most humans) or not machine readable. So we set off to build our own.

Our first effort was a very crude and naive Javascript-script (err) that used regexps to replace strings with, well, shorter strings. It had a list of protected keywords (Javascript, DOM and our public API). It did not even take string literals into account, but it worked better than anything else out there. Unfortunately it became rather slow as Xopus grew, taking up to a minute for the entire build cycle. Having to wait a minute to test your interpreted application seriously cripples the development process, so we had to come up with something else.
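To make the idea concrete, here is a minimal sketch of such a regexp-based renamer. It is not the original script: the protected list, the name generator and the function names are all assumptions made for illustration. Note how the identifier inside the string literal gets renamed too, which is exactly the weakness described above.

```javascript
// Hypothetical protected list; the real one covered Javascript, the DOM and the public API.
const PROTECTED = new Set(["function", "var", "return", "length"]);

// Turn 0, 1, 2, ... into a, b, c, ..., z, ba, bb, ...
function shortName(index) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz";
  let name = "";
  do {
    name = alphabet[index % alphabet.length] + name;
    index = Math.floor(index / alphabet.length);
  } while (index > 0);
  return name;
}

function naiveObfuscate(source) {
  const renamed = new Map();
  let next = 0;
  // Matches every identifier-like word, with no notion of strings or comments.
  return source.replace(/\b[A-Za-z_]\w*\b/g, (word) => {
    if (PROTECTED.has(word)) return word;
    if (!renamed.has(word)) {
      // Skip generated names that happen to be protected words.
      let candidate;
      do {
        candidate = shortName(next++);
      } while (PROTECTED.has(candidate));
      renamed.set(word, candidate);
    }
    return renamed.get(word);
  });
}

const input =
  'function showTitle(titleNode) { var label = "titleNode"; return titleNode.length; }';
const output = naiveObfuscate(input);
console.log(output);
// → function a(b) { var c = "b"; return b.length; }
```

A real protected list would also have to contain every reserved word, since the generator will eventually produce names such as "do" or "if".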

Our second effort was written in Java, and this obfuscator is still in use today. It is based on a Javascript lexer and has some understanding of the flexible use of identifiers in Javascript. It still uses a list of protected words, but now it can warn you if you use an identifier in a string, which may break the obfuscated code. It also has a mode to generate obfuscated but human readable output, which helps to debug problems caused by the obfuscator. It still isn't foolproof, but if you stick to a few rules, it can be quite robust. Q42 even uses it in some other projects. But the best feature of this obfuscator is its speed: it can obfuscate and gather all Xopus files in 4 seconds, which is faster than a file copy using Windows Explorer! It also uses Unicode characters for output, which drastically reduces both the file size and the readability. Parts of the Xopus code will even be rendered in right-to-left scripts, which looks like this (indented for clarity):

function է()
{
  var خ=this.դ.getElementsByTagName('xml')[0];
  if(!خ) 
    return;
  var د=this.ذ(خ);
  this.ر=د;
  this.ز(د);
  this.س(د);
  this.ش(د);
  for(var M=0;M<this.ե.length;M++)
    ɛ.ص(this.ե[M],this.window);
}

function ض(د) 
{
  Ý.Ť(د.selectNodes( 
      "//@src|//@xml|//@xsd|//@xsl|//@url"),this.ũ);
}

You can see that punctuation now looks confusing due to the partial right-to-left rendering. The third assignment is even inverted, making it very hard to read. However, this is still just replacement of identifiers, so there is room for improvement. Since the obfuscator is not scope aware, it cannot, for instance, reuse identifiers for local variables.
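The lexer-based approach can be sketched in a few lines. This is a Javascript illustration of the general technique, not our Java implementation: the token pattern, the protected list and the `_0`-style names are assumptions, and regular expression literals are not handled. Because strings and comments become tokens of their own, they are left alone, and a string that mentions a renamed identifier produces a warning. Renaming is global, so like the obfuscator described above it is not scope aware.

```javascript
// Hypothetical protected words; a real list covers the language, the DOM and the public API.
const PROTECTED_WORDS = new Set(["var", "return", "this", "getAttribute"]);

// Alternatives in order: comment, string literal, number, identifier, any other character.
const TOKEN =
  /(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')|(\d[\w.]*)|([A-Za-z_$][\w$]*)|([\s\S])/g;

function obfuscate(source) {
  const names = new Map(); // original identifier -> replacement
  const strings = [];      // string literals seen, for the warning pass

  const output = source.replace(TOKEN, (text, comment, string, number, identifier) => {
    if (string !== undefined) {
      strings.push(string);
      return string; // never rewrite inside a string literal
    }
    if (identifier !== undefined && !PROTECTED_WORDS.has(identifier)) {
      if (!names.has(identifier)) {
        names.set(identifier, "_" + names.size.toString(36));
      }
      return names.get(identifier);
    }
    return text; // comments, numbers, protected words and punctuation pass through
  });

  // Warn when a string literal contains a word that was renamed elsewhere.
  const warnings = [];
  for (const string of strings) {
    const words = string.slice(1, -1).match(/[A-Za-z_$][\w$]*/g) || [];
    for (const word of words) {
      if (names.has(word)) {
        warnings.push("string " + string + ' mentions renamed identifier "' + word + '"');
      }
    }
  }
  return { output, warnings };
}

const result = obfuscate('var node = this.root; var key = "node"; return node.getAttribute(key);');
console.log(result.output);
// → var _0 = this._1; var _2 = "node"; return _0.getAttribute(_2);
console.log(result.warnings);
```

Here the warning fires because the string "node" matches an identifier that was renamed; whether that actually breaks the code depends on how the string is used at runtime.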

Our third attempt is currently in the research phase. The focus is less on obfuscation and more on performance and size of the output code. In the code above we could inline the second function and move the length reference outside of the for loop to improve performance. We could also replace local variables with single-byte names (the Arabic and Armenian characters above use 2 bytes each in UTF-8) and use [] notation (var a="getElementsByTagName";this.դ[a]) for long protected properties and methods to reduce the length. Parts of Xopus are written in XSL, so we want to obfuscate that as well.
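Applied by hand, these transformations look roughly like the sketch below. The function names and the stub object are made up for illustration (the stub stands in for a DOM document so the example runs outside a browser), and none of this is output of the tool under research.

```javascript
// Before: items.length is read on every iteration.
function sumBefore(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) total += items[i];
  return total;
}

// After: the length reference is moved outside of the loop condition.
function sumAfter(items) {
  let total = 0;
  for (let i = 0, n = items.length; i < n; i++) total += items[i];
  return total;
}

// [] notation: a protected name cannot be renamed, but it can be stored
// once in a short variable and reused for every access.
const stubDocument = {
  getElementsByTagName: function (tag) {
    return [tag]; // hypothetical stand-in for a real DOM lookup
  },
};
const g = "getElementsByTagName";
const viaDot = stubDocument.getElementsByTagName("xml");
const viaBracket = stubDocument[g]("xml"); // same call, shorter when repeated often

// Size of an identifier in UTF-8 output.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

console.log(sumBefore([1, 2, 3]), sumAfter([1, 2, 3])); // → 6 6
console.log(utf8Length("خ"), utf8Length("a"));          // → 2 1
```

The [] notation only saves bytes when the same long name occurs several times, since the string itself still has to be stored once.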

To achieve these goals we need to parse and compile Javascript. But wait... compile a dynamic language like Javascript? What can be compiled if functions can be added to and removed from objects at will? Not much, so we will have to stick to a subset of Javascript. And this is not a bad thing. Javascript was never meant to be used to build applications like Xopus. Its flexibility helps us to rapidly create prototypes, but it also makes the code very hard to maintain and debug. So while prototyping we want to be flexible, but in production we want something more robust.

So we have developed, like so many others, a Javascript framework that does inheritance (including interfaces, multiple inheritance, polymorphism, decoration and signature checking), resource and package management, access control and event handling (again including signature checking). We're planning to add unit testing as well.

Given that framework, we can parse our code and build a parse tree that remains stable at runtime. Using that parse tree we can do the obfuscation and compiler optimizations described above. We already have a partial proof of these concepts running in Haskell. We hope to soon be able to embed this technology in our daily process to further improve the quality of Xopus.

Modified: August 27th 2007
By: Laurens van den Oever

DeanEdwards
anonymous user
January 26th 2008
(var a="getElementsByTagName";this.դ[a])

I looked at using something like this in packer. Unfortunately it can reduce overall performance by about 20% if you use it everywhere.