If you’ve been watching our GitHub wiki, following us on Twitter, or reading the wikitech-l mailing list, you’ve probably known for a while that Wikipedia has been transitioning to HHVM. This has been a long process involving lots of work from many different people, and as of a few weeks ago, all non-cached API and web traffic is being served by HHVM. This blog post from the Wikimedia Foundation contains some details about the switch, as does their page about HHVM.
I spent four weeks in July and August of 2014 working at the Wikimedia Foundation office in San Francisco to help them out with some final migration issues. While the primary goal was to assist in their switch to HHVM, it was also a great opportunity to experience HHVM as our open source contributors see it. I tried to do most of my work on WMF servers, using HHVM from GitHub rather than our internal repository. In addition to the work I did on HHVM itself, I also gave a talk about what the switch to HHVM means for Wikimedia developers.
Most of my time at Wikimedia was spent working on some fixes for HHVM’s DOMDocument support. MediaWiki, the software powering Wikipedia, allows users to import and export articles as XML files. To do this, it uses a custom PHP stream wrapper, which our implementation of DOMDocument didn’t properly support. We had a number of outstanding pull requests in this area, including a large one from from WMF developer Tim Starling. The main problem blocking its merge was a potential security issue: the default PHP settings allow XML external entity injection attacks, which we didn’t want to allow in HHVM. I combined the pending pull requests, cleaned them up a little, and implemented a protocol whitelist for external entities, keeping HHVM secure by default with the option to open things up if necessary. This is a good example of our general philosophy on PHP compatibility: we try to match PHP behavior exactly, unless doing so causes significant security issues or other problems.
We also spent a good amount of time debugging an issue with MediaWiki’s Lua scripting extension. This extension is compatible with both PHP 5.x and HHVM thanks to our Zend compatibility layer; if you’re thinking of switching to HHVM but have some native extensions you depend on, this is a great example to follow. The Lua library used by their extension is written in C and uses setjmp and longjmp to handle errors. In certain situations this was causing a number of C++ object destructors to be skipped, leading to memory corruption. Luckily, there was an easy fix once we figured out what was happening: when built as C++ code, the Lua library uses C++ exceptions instead of setjmp/longjmp, calling the appropriate destructors as the stack is unwound.
Once we had the correctness issues under control, I spent some time looking at HHVM’s performance on the MediaWiki workload. We’ve put a lot of engineering time into making HHVM fast, but most of that work has been on the PHP that powers Facebook. Like any large codebase, Facebook’s PHP code contains recurring idioms and design patterns, and HHVM is great at optimizing these. Looking at HHVM’s performance in the context of another large codebase, with its own recurring idioms and design patterns, was a great way to identify optimization opportunities we hadn’t yet discovered. I found two easy wins, the first of which was simply that the default Ubuntu PCRE package didn’t have the JIT enabled. We rebuilt PCRE with the JIT option enabled, which HHVM knows how to take advantage of, and saw a 8% improvement in the parsing benchmark.
The other tweak was also quite simple. PHP classes can have destructor functions that are run when instances of that class are destroyed, even if that only happens at the very end of the request because the object was kept alive by a global variable or a cyclic reference. Exactly matching PHP’s behavior here requires a small but measurable amount of extra bookkeeping at runtime, so HHVM has an option to not run destructors for objects that are alive at the end of a request. This is fine for PHP codebases like Facebook’s, where the application was written with this restriction in mind. MediaWiki, however, was developed using the standard PHP interpreter, so for correctness it should be run in the slower but more correct configuration. Since we weren’t using this configuration internally it wasn’t optimized very heavily, and I was able to improve the wikitext parser benchmark by another 4% with this commit. I then followed that up with another commit to make HHVM compatible by default, requiring anyone who needs that last bit of extra performance to opt into it.
While the Wikimedia Foundation blog post linked above has a complete picture of the performance gains they saw, we have a brief summary here as well. One important fact to keep in mind while while interpreting these results is that they were migrating from a fairly old version of PHP: 5.3. PHP 5.6 has some substantial performance improvements over 5.3, so the bump from PHP 5.6 to HHVM would likely be less dramatic. That said, switching all API/web servers over to HHVM is by no means the end of this process. There are a number of other tweaks that can be made to further improve HHVM’s performance, most notably using Repo Authoritative mode.
The first and most important graph shows the effect HHVM had on the experience for Wikipedia’s community of editors. While most page views on Wikipedia are served from a static cache without running any PHP, editing a page executes a lot of PHP and is one of the most expensive actions on the site. We were able to directly compare PHP 5.3 and HHVM during the transition, when some application servers were running PHP and some were running HHVM:
As you can see, the median page save time on the HHVM application servers was about 45% of PHP’s. This means that every time someone previews or saves an edit, they get the result back about twice as fast. And 7 seconds was just the median save time – there are lots of larger pages that used to take tens of seconds to save, even causing timeout errors in some extreme cases. These numbers look great but they don’t tell the whole story. Total page save time includes time spent waiting on the database (or any other IO), so the effects on CPU usage were even more pronounced. While reduced CPU usage doesn’t directly affect the user experience, it does reduce hardware costs by allowing a smaller number of servers to handle the same amount of traffic. Here’s a graph showing CPU usage of a random MediaWiki server, before and after it was converted to HHVM:
There are two clear steps downward: the initial switch to HHVM, and then a configuration tweak. The tweak was enabling the
hhvm.server_stat_cache config option, which caches the results of stat calls and uses inotify to detect when files are changed. It’s hard to make any very precise claims about this graph since load varies with time of day and other factors, but by comparing daily peaks we can get some reasonable numbers. Peak CPU utilization dropped from about 70% to 12%, almost a 6x improvement.
Having free access to all of the knowledge contained in Wikipedia is an incredible resource for hundreds of millions of people around the world. Everyone on the HHVM team is excited to see our hard work helping out such an important project, and we’re optimistic about seeing even more performance gains in the future. The success of this transition is also a great indicator of the growing strength of the open-source community surrounding HHVM; Wikimedia developers Tim Starling, Ori Livneh, Erik Bernhardson, and Chad Horohoe all contributed a lot of work back to HHVM during the migration process.