HHVM 3.15

Posted on September 15, 2016 by

The next LTS release of HHVM has been cut and packages are now available for Ubuntu 14.04, 15.04, 15.10, and 16.04; and Debian 7 and 8. This release replaces the 3.9 LTS and brings with it a host of improvements and new features.

The 3.15 release features native support for the shmop, pg-sql, and scrypt extensions, enhancements to the garbage collector, and critical fixes for TC recycles, which had been broken in recent releases. Also included are various bug and incompatibility patches, enhancements to inlining and method dispatch, and other important performance optimizations.

Many of the extension and user-facing bug fixes included in this release would not have been possible without the support of our community, both through pull requests and bug reports. Special thanks to Simon and the many other contributors to the recently merged pg-sql extension, and Joe, William, and the rest of our friends at LastPass for their work on the scrypt extension.

Instructions for installing HHVM from either packages or source can be found in our documentation. Should you encounter any new issues with this release, please file a bug report on our issue tracker.

Posted in Announcement



Improved User Documentation

Posted on December 9, 2015 by

We are happy to announce our next generation of Hack and HHVM user documentation available at http://docs.hhvm.com.

Back in August, we announced that we were going full force in revamping the user documentation. We sent out a public survey to gauge the standing of the existing documentation at the time and received 160 responses. Those results served as both validation and a guide to our approach with the new documentation.

The key takeaways from the survey results were as follows:

  • The existing documentation is generally good, but could use improvement.
  • The look and feel of the site could use work.
  • Async, Collections and the Type System could use better content.
  • More sample code, a clear API reference and feature tutorials are must-haves.
  • The examples were not runnable or sometimes even wrong.
  • No more docbook.

We believe we have made big improvements to all these areas.

New GitHub Repo

We have a new GitHub repo for the documentation. The name “user-documentation” reflects that this documentation is focused primarily on the users of Hack and HHVM. This repo contains content, the source code to render the site, and detailed README information for running the site locally and contributing content.

Site Structure

These docs have been written from the ground up, in terms of both content and infrastructure. There are three main sections: two user guides and one API reference.

The guides provide “what” and “how-to” information on a variety of topics, including code examples.

The API reference is a raw class and method reference for everything we have added to Hack. There is documentation for each class, each method on a class and for stand-alone functions, many of which will also include an example.

It is worth noting that the old documentation combined a copy of the http://php.net documentation with the specific documentation for Hack and HHVM. This caused two problems. First, the PHP-specific documentation could become stale if updates were made on the php.net side. Second, readers could mistake it for Hack- and HHVM-specific documentation, since search engines linked to our website for that documentation in many cases. To solve this, we are intentionally not including the php.net information in our documentation; instead we will be redirecting requests to those pages to the actual php.net source of truth.

Site Technology

The source code that generates the site is written entirely in Hack, referencing some third-party libraries that are written in Hack, PHP and other languages. This is a real-world use of Hack — the generation of the Hack and HHVM documentation with the Hack programming language.

Documentation

The API documentation is generated from the various HHI, HNI and Systemlib files that exist in the HHVM source code — i.e., code is the source of truth for the API reference.

All of the guides are written in Markdown, which we feel is a much more user-friendly format than the DocBook representation in the previous documentation. It should also make it easier for users of the documentation to file issues and submit pull requests.

Examples

Unless it is an explicit code snippet to support documentation, all of our examples are runnable code. You can copy and paste the code and run it directly with HHVM and the Hack typechecker. We have also made all of our examples compatible with the HHVM test runner. This helps ensure that all our examples produce the expected output, from both an HHVM and Hack perspective.

Site Generation

In a nutshell, the site generation code does two main things. On the raw API side, it uses reflection on the HHVM codebase to produce a Hack/HHVM/PHP-agnostic intermediate YAML format, which in turn generates the Markdown documentation for the class and function signatures, etc. Then all of the Markdown, including that from the guide and API commentary side, is converted to HTML.

Another benefit of the YAML production is that we hope to re-use this framework for other languages/projects in the future.

After dependency installation, the entire build process takes less than a minute, so forking and testing changes to the documentation should be nearly painless.

Feedback

Please have a look around the new site. For a short period of time, you may see a survey banner at the top of the documentation pages. We would really appreciate your feedback on the new site through this survey so that we can be sure we met the quality bar we aimed for with this revamp.

If you see any problems on a particular documentation page, please use the issue link at the bottom of that page to file a GitHub issue. For any general problems, please file an issue directly.

Thank you and enjoy!

Posted in Announcement



PHP 7 Support

Posted on December 4, 2015 by

For those that haven’t been following along, the next version of the PHP language, version 7.0.0, was very recently released. Those of us working on HHVM offer our congratulations to all the contributors to this latest release! We’re all really excited to see this release come out the door, and for what it means for the future of PHP.

The release has implications for HHVM as well. PHP 7 contains many new and exciting language features, such as anonymous classes and generator delegation. The HHVM project is committed to continuing to support the evolving PHP language, and as such we are proud to announce that the current nightly releases have support for all major PHP 7 features, and the upcoming 3.11.0 stable release will be the first release of HHVM with support for the major PHP 7 features.

A small number of the changes in PHP 7 are not backwards compatible, and we don’t want to leave behind our many users relying on our solid PHP 5 support and who may, for good reason, not want to upgrade immediately. Therefore, HHVM will continue to simultaneously support PHP 5 and PHP 7 for the foreseeable future. This is the way the simultaneous support works:

  • Features that pose no compatibility issues are just always available. For example, anonymous classes require no special configuration to use — they work by default.
  • Similarly, features which were removed in PHP 7, such as alternative open tags, will just continue to work in HHVM for the foreseeable future.
  • The INI option hhvm.php7.all = 1 enables the PHP 7 behavior for backwards-incompatible changes. (PHP 5 behavior will remain the default for a while.) Each major compatibility break also has its own option, in case you want to turn them on one by one. See our INI documentation for details.

The language features of PHP 7.0 are of course just the start. As PHP 7.1 and onward develop, we will keep adding the new PHP features into HHVM, to continue having parity with the language.

Please give the nightly builds a try and let us know about any issues so we can fix them before the 3.11.0 release! All the major features are working and pass PHP 7’s own test suites, but there are certain to be edge cases and minor features we’re missing. We want to make sure that 3.11.0 is as solid a release as possible for our new PHP 7 support!

Posted in Announcement



Improving Arrays in Hack

Posted on October 30, 2015 by

“…use collections whenever and wherever possible. However, 100% usage of collections is obviously not realistic given how much arrays are used in various codebases …there just may be legitimate use cases for an array.”

This is the guidance given to Hack developers on using arrays. While we still advise using collections whenever possible, in hindsight there are more “legitimate use cases” for arrays than we first believed. With that in mind, the team is planning on improving arrays so they are better supported in Hack. We’ve opened a number of issues on GitHub (#6451, #6452, #6453, #6454, #6455) with our initial plans. Since this would be a significant change in the language, we are looking for feedback from the community. Before diving into what we are thinking of building, let’s take some time to examine the problem we want to solve.

The Problem with Arrays in Hack

Arrays are the ubiquitous data structure in PHP, used to represent everything from lists, associated lists, sets, and tuples to bags of data. This flexibility itself makes it challenging for Hack to understand how an array will be used. Consider the example below. Should $arr be treated as a map-like array that contains both ints and strings, or as a shape-like array whose id field is an int but whose name field is a string? Currently, whenever the type checker encounters ambiguity like this, it errs towards not reporting errors and trusts that the programmer knows what they are doing.

 $arr = [
   'id' => 4,
   'name' => 'mark',
 ];
 $name = $arr['name'];
 $arr[$name] = ucfirst($name);

If this were the only problem with PHP arrays, then the solution would be “simple”: make the type checker smarter (something we are working on). However, there are a number of other semantic details around arrays that are nearly impossible to analyze statically.

Indexing non-existent keys

In many languages, attempting to index a non-existent key throws an exception and execution is halted. Instead of halting execution, PHP will raise an E_NOTICE and return null. The type checker is faced with a difficult decision: should it treat any index operation as producing a potentially nullable value or ignore this possibility? Hack today chooses to ignore the fact that indexing an array may produce a null value, allowing potential bugs to go undetected.
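
As a rough illustration of the runtime behavior described above, the following plain PHP sketch reads a missing key and then shows one defensive alternative. The $ages table and the lookup_or_default helper are hypothetical names introduced only for this example; they are not part of Hack or HHVM.

```php
<?php
// Hypothetical lookup table used only for this illustration.
$ages = array('alice' => 30);

// Reading a missing key does not halt execution: PHP raises a diagnostic
// (an E_NOTICE in PHP 5 and PHP 7) and the expression evaluates to null.
// The @ operator is used here only to silence that diagnostic.
$missing = @$ages['bob'];
var_dump($missing); // prints NULL

// Hypothetical helper that makes the "key may be absent" case explicit
// instead of relying on the implicit null.
function lookup_or_default(array $map, $key, $default) {
    return array_key_exists($key, $map) ? $map[$key] : $default;
}

var_dump(lookup_or_default($ages, 'bob', 0));   // prints int(0)
var_dump(lookup_or_default($ages, 'alice', 0)); // prints int(30)
```

Because the implicit null is indistinguishable from a stored null, an explicit existence check like the one above is the only way for calling code to tell the two cases apart.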

Key coercion

Arrays only support strings or ints as keys and coerce invalid types to one of these types. For instance, if a float is used as a key, it will be truncated to an int. Hack chooses not to support these key coercions and the type checker will report an error if you try to use a float as a key. But there is one coercion that the type checker cannot detect, which is the conversion of int-like strings to ints. This can lead to unexpected failures at runtime, as demonstrated in the code example below.

 function expects_string(string $s): void {}

 function will_fail_at_runtime(): void {
   // Hack believes $arr is type array<string, int>
   $arr = array('123' => 123);
  
   foreach ($arr as $k => $_) {
     // The string '123' will be changed to int(123), triggering a
     // TypeHintException at runtime
     expects_string($k);
   }
 }
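
The coercion can also be observed directly in plain PHP. The sketch below is only an illustration with made-up keys: it records the runtime type of each key after the array has been built.

```php
<?php
// Three keys written as string literals in the source.
$arr = array('123' => 'a', 'abc' => 'b', '0123' => 'c');

// Record the runtime type of every key.
$key_types = array();
foreach ($arr as $k => $_) {
    $key_types[] = gettype($k);
}

// '123' is a canonical decimal integer, so it is stored as an int key.
// 'abc' and '0123' (leading zero) are not, so they stay strings.
var_dump($key_types);
```

Only the first key is reported as integer; the other two remain strings, which is why the coercion depends on the contents of the string rather than on its static type.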

Arrays containing references

A more subtle semantic detail is how arrays behave when storing references. In particular, in some instances, a reference will be flattened to a value if the array contains the only reference to that value. This behavior is demonstrated below.

 $x = 0;
 $arr1 = array(&$x);

 $arr2 = $arr1;
 $arr2[0]++;
 var_dump($arr1, $arr2);
 /* Changes in $arr2 reflected in $arr1
 array(1) {
   [0]=>
   &int(1)
 }
 array(1) {
   [0]=>
   &int(1)
 }
 */

 unset($x);
 $arr2 = $arr1;
 $arr2[0]++;
 var_dump($arr1, $arr2);
 /* Changes in $arr2 no longer reflected in $arr1
 array(1) {
   [0]=>
   int(1)
 }
 array(1) {
   [0]=>
   int(2)
 }
 */

Here we begin by storing a reference to $x inside an array. We then assign the array to another variable $arr2 and increment the value stored in the array. Since we stored a reference, the change will be reflected in both $arr1 and $arr2. But if we run the same code after unsetting $x we get a different result. Unsetting $x drops the ref count of $arr1[0] to one, so when we reassign $arr1 to $arr2 the reference is flattened to a value. Now changes to $arr2 are no longer reflected in $arr1. This is again behavior the type checker turns a blind eye to when dealing with arrays.

Legitimate Use Cases

These quirks of arrays motivated the creation of collections in Hack. At the time we believed usage of arrays would dwindle over time and collections would be used everywhere. In retrospect, we realized there are legitimate reasons why a programmer may want to choose an array over a collection, primarily because arrays are values while collections are mutable objects. Being values gives arrays certain advantages over collections, and that gap cannot be closed without significantly changing how collections work.

Values are eligible for more optimizations

To understand why this is the case, you have to be familiar with the execution model of HHVM, which was inherited from PHP. PHP has a shared-nothing model where data is not shared between requests. After each request all objects that were allocated are dumped and the next request starts with a clean slate. Consider the following code:

 class MyConfig {
   private static ImmMap<int, string> $collection = ImmMap {
     ... // statically map 100 ints to some string
   };
  
   private static array<int, string> $array = array(
     ... // same mapping as above
   );
 }

In most languages, the static initializer of a variable is run once at the start of the program. However, since HHVM is required to clear the world for each request, static initializers are run for each request. HHVM constructs the ImmMap of MyConfig::$collection for every request, which can be a non-trivial amount of CPU. Since arrays are value-types, HHVM does not have to reconstruct MyConfig::$array for each request. Instead HHVM can initialize the array once at start up and reuse the same array for all requests (assuming the array only contains value types).

For similar reasons, storing and retrieving a collection from APC is more expensive than doing the same thing for an array. Given these performance ramifications, it is reasonable that someone would choose to use an array over a collection in these cases.

Values are less prone to bugs

Collections are mutable by default. There are cases where this behavior is desirable, but in large part, mutability makes code more difficult to reason about. Consider the following code:

 abstract final class FriendFetcher {
  
   /* Memoize the list of friends we fetch for the user */
   <<__Memoize>>
   public static async function forUser(int $id): Awaitable<Set<int>> {
     $ids = await fetch_friend_ids_from_backend($id);
     return new Set($ids);
   }
  
   public static async function mutualFriends(
     int $id1,
     int $id2,
   ): Awaitable<Set<int>> {
     // fetch both sets of friends
     list($friends1, $friends2) = await genva(
       self::forUser($id1),
       self::forUser($id2),
     );
    
     // Union the two friend sets and retain only those
     // that appear in both sets
     return $friends1
       ->addAll($friends2)
       ->retain(
         $friend ==>
           $friends1->contains($friend) &&
           $friends2->contains($friend)
       );
   }
 }

This code computes the set of mutual friends between users. This uses a simple algorithm of fetching both users’ set of friends and intersecting them. However there is a subtle bug in this code. FriendFetcher::forUser is annotated with <<__Memoize>> . This caches the result of the method for a given user ID, saving the cost of fetching from the backend multiple times for the same result. This returns the same Set object for a given user. FriendFetcher::mutualFriends uses Set::addAll to union two sets of friends together, modifying the Set instead of returning a new Set object. This is the same Set object that is cached using the <<__Memoize>> annotation. The next time we call FriendFetcher::forUser for $id1, the Set returned will also contain the friends of $id2.

There are mechanisms to work around this, such as using ConstSet to hide the Set::addAll method, using ImmSet to make it an error to add anything to the set, or making the type checker aware of this potential bug and warning against it. While they would address this particular problem, this illustrates the dangers of defaulting to mutable collections. We considered changing Set to be ImmSet and introducing a MutSet type, but this comes with its own set of complications. When a programmer wants to modify a set, they would need to copy the immutable set to a mutable set, and then remember to make the set immutable again. The copy-on-write behavior of arrays is desirable in these cases since it allows writing code as if you were working with mutable data, while keeping the changes local to the function. When an array is passed to or returned from a function, the programmer has the guarantee that it will not be modified unless explicitly passed by reference.
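
The difference between value and reference semantics can be sketched in plain PHP. This is only an illustration: ArrayObject stands in for a mutable collection object, and the two helper functions are hypothetical names, not part of the FriendFetcher example.

```php
<?php
// Hypothetical helper: appends to an array that was passed by value.
function add_to_array(array $ids, $id) {
    $ids[] = $id;      // copy-on-write: only the local copy changes
    return $ids;
}

// Hypothetical helper: appends to an object-backed container.
function add_to_object(ArrayObject $ids, $id) {
    $ids->append($id); // mutates the very object the caller holds
    return $ids;
}

// Stand-in for a cached array value.
$cached_array = array(1, 2);
$result = add_to_array($cached_array, 3);
var_dump(count($cached_array)); // prints int(2): the caller's array is untouched
var_dump(count($result));       // prints int(3)

// Stand-in for a cached mutable collection.
$cached_object = new ArrayObject(array(1, 2));
add_to_object($cached_object, 3);
var_dump(count($cached_object)); // prints int(3): the cached object was changed
```

The second half mirrors the memoization bug described earlier: any caller that holds the cached object sees the extra element, while the cached array in the first half cannot be affected by the callee.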

Values are easier to reason about

Another consequence of the mutable reference semantics of Collections is that their generics are invariant. What does this mean? Consider the following code:

 function expects_vector_of_mixed(Vector<mixed> $vec): void {
  ...
 }

 function expects_vector_of_string(Vector<string> $vec): void {
   // Hack Error!!!
   expects_vector_of_mixed($vec);
 }

Passing a Vector<string> to a function expecting a Vector<mixed> is not allowed by the type checker. This is because expects_vector_of_mixed could modify $vec by adding an int and now our Vector<string> no longer contains only strings. While we should prevent programmers from making this mistake, it is unintuitive to many programmers.

Because arrays are values, we can avoid this issue. When an array is passed to another function, we know that function cannot modify the array we gave it (assuming you do not use references). Array type parameters are therefore covariant, meaning it is safe to pass an array<string> to a function that expects an array<mixed>.

Planned Improvements

Given the trade-offs outlined with arrays and collections, we are exploring a third approach that combines the virtues of arrays and collections. It will be a value type like arrays, but have the clean semantics of collections. Details around these new kinds of arrays are being discussed and fleshed out in various issues on GitHub. Here is an outline of what we are thinking of building.

  • #6451 – Introduce vec<T> array type to mirror the Vector<T> collection class
  • #6452 – Introduce dict<Tk, Tv> array type to mirror the Map<Tk, Tv> collection class
  • #6453 – Introduce keyset<T> array type to mirror the Set<T> collection class
  • #6454 – Give stricter semantics for indexing these array types, while adding support for safely accessing potentially non-existent keys
  • #6455 – New sugar syntax for de-nesting function calls

If you have thoughts around these ideas, we invite you to join the discussion on GitHub.

Looking Further Out

Having three different container types adds additional overhead to Hack, but we feel if implemented properly Hack arrays and collections will eliminate almost all legitimate use cases of PHP arrays. We are committed to continuing our support for PHP arrays and collections in the language, but may evolve their role in the future. Here are some really high level ideas we are thinking about.

PHP arrays for PHP APIs

Hack arrays will replace PHP arrays as the suggested value-type container in Hack, but PHP arrays will still be useful when integrating with PHP code, for instance when creating an .hhi file for a PHP library that expects PHP arrays.

In typical Hack code, PHP arrays should be avoided because of the surprising semantics they have at runtime. To discourage their usage, we could change the type checker to be more conservative and report more potential errors with arrays. Some ideas include:

  • Make indexing a PHP array produce a nullable type
  • Make storing a string key change the key type of a PHP array to arraykey
  • Make PHP and Hack arrays incompatible with one another

Move collections from the runtime to a library

Hack arrays will free up the Collections API from being an exact replacement for arrays. This would allow us to take greater advantage of the fact that these are objects instead of values. One idea is to move the implementation of Collections from an HHVM built-in to a library written in Hack. This would lower the barrier for adding new features/capabilities to Collections, for example supporting objects as keys for Map or Set, or creating specialized collections like IntMap that are optimized for storing integer keys.

Posted in Hack Language



LLVM Code Generation in HHVM

Posted on October 23, 2015 by

One of the most common questions we get about HHVM is why we don’t use LLVM for code generation. The primary reason has always been that while LLVM is great at optimizing C, C++, Objective-C, and other similar statically-typed languages, PHP is dynamically typed. The kinds of optimizations that provide huge performance benefits for static languages tend to be less useful in dynamic languages, or at least overshadowed by all the dynamic dispatching that’s done based on runtime types. We knew that there was probably something to be gained from using LLVM as a backend, but there were many larger opportunities to go after first.

Warmup speed has also been a major factor in HHVM’s design from the very beginning, since we were replacing a compiled C++ binary with a JIT compiler. We needed to keep HHVM’s JIT pipeline as simple and as fast as possible to ensure that it could start up and serve web requests quickly enough to keep up with Facebook’s aggressive deployment schedule.

Once HHVM was in production and we had data indicating that we could afford to make the compilation process take longer in order to produce better code, we added an intermediate representation called HHIR between HHBC (our bytecode) and machine code. This new stage in the pipeline enabled all kinds of powerful, PHP-specific optimizations that we’ve been adding to for over two years now, nearly quadrupling the performance of HHVM in the process. A year after introducing HHIR we added vasm, a low-level IR positioned between HHIR and the final x86 machine code.

[Figure: HHVM pipeline before the LLVM backend]

But there was still the question of what, if anything, an LLVM backend could buy us. About a year and a half ago, we started exploring this option in earnest. After considering a few options, we decided that we’d lower vasm to LLVM IR and then let LLVM handle final code generation, changing the pipeline to look like this:

[Figure: HHVM pipeline with the LLVM backend]

We briefly considered lowering directly from HHIR to LLVM IR, since HHIR has more information about the source program than vasm does. It quickly became clear that the information lost while lowering HHIR to vasm was mostly PHP-specific metadata that LLVM doesn’t understand, so we decided that lowering from vasm would be fine. It would also be significantly less work, since HHIR is a much more complex IR than vasm.

After about a year of hard work, we got our LLVM backend up and running. The vast majority of the code is in the file vasm-llvm.cpp, but in terms of time and effort required to get the backend functional, that file represents a minority of the work put in. We had to make some fairly significant changes to HHVM, as well as a number of changes and additions to LLVM itself, for both performance and correctness.

Changes in HHVM

The modifications made to HHVM generally fit into two categories: reworking how HHVM’s JIT compiles PHP function calls and cleaning up some x86-specific concepts in vasm.

PHP function calls

The vasm instruction used to implement PHP function calls, bindcall, had some fairly odd semantics, for a variety of historical reasons. The primary consequence was that it wasn’t possible to spill live values to memory across a bindcall. HHVM uses different regions of memory for the PHP and C++ stacks, and keeps the PHP stack in a format that’s compatible with the bytecode interpreter. This means that when the JIT needs to spill temporary values, it uses the C++ stack. Rather than allowing each PHP frame to allocate whatever spill space it needed on the C++ stack, we used to allocate a fixed amount of space that was shared by all active call frames. In order to model bindcall as a normal LLVM invoke instruction, we removed this limitation so LLVM could allocate private spill space for the current PHP frame. This also helped our existing backend, since it could now keep values live across bindcall, rather than being forced to write all live values to their corresponding locations on the PHP stack.

Generalizing vasm

As mentioned above, vasm began life as a thin layer between HHIR and x86 machine code. As a result, vasm instructions were nearly 1:1 with the resulting x86 machine code. This included some x86 instructions with implicit inputs and/or outputs, like the idiv instruction. In order to cleanly handle instructions like this, we simply created new vasm instructions with explicit inputs and outputs. When translating vasm to LLVM IR, we handled these new instructions like any other instruction, but when translating vasm directly to x86 machine code, we first lowered the new instructions to their x86-specific counterpart. This allowed us to take full advantage of these x86 quirks in the default backend while also cleanly targeting LLVM IR. This work also helped pave the way for other, non-x86 backends in HHVM.

LLVM modifications

As of right now, our LLVM backend requires a modified build of LLVM. We added a number of features to support code introspection, code patching, and our custom ABI. We also modified some existing optimizations that were performing poorly on certain code patterns we frequently generate.

Location records

Location records, or locrecs for short, are a way for users of LLVM to get information about the specific machine instructions that result from LLVM instructions. We use this in HHVM for exception handling and certain types of state syncing, both of which have metadata tables keyed on the address of calls to C++ helpers. We briefly considered using stack maps to accomplish this, but according to the documentation, stack maps restrict certain optimizations and are a much larger hammer than we need. All we needed was the location of the code and none of the other features offered by stack maps, so we built the much more lightweight locrecs.

Smashable calls

Modifying code that may be running in another thread is a core part of HHVM’s design. We call this process “smashing”, to distinguish it from patching code that isn’t visible to other threads. The rules for doing this safely vary by platform, but on x86-64 the primary condition is that the entire instruction being smashed must fit on a single 64-byte cache line. This means that when we’re emitting a jmp or call that might be smashed, we ensure it does not cross a 64-byte boundary by inserting one or more nops before it.
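
The padding rule can be sketched as simple arithmetic. The function below is a hypothetical helper written for illustration, not HHVM's actual implementation; it assumes single-byte nops and an instruction no longer than one cache line.

```php
<?php
// Returns how many single-byte nops to emit before an instruction of
// $length bytes that would otherwise start at $address, so that the
// instruction does not straddle a cache-line boundary.
// Hypothetical helper; assumes $length <= $line_size.
function nop_padding($address, $length, $line_size = 64) {
    $offset = $address % $line_size;       // position within the current line
    if ($offset + $length <= $line_size) {
        return 0;                          // already fits on one line
    }
    return $line_size - $offset;           // push it to the start of the next line
}

// A 5-byte call starting at offset 60 would occupy bytes 60..64 and cross
// the boundary, so 4 nops are needed to move it to offset 64.
var_dump(nop_padding(60, 5)); // prints int(4)

// Starting at offset 59 it occupies bytes 59..63 and fits as-is.
var_dump(nop_padding(59, 5)); // prints int(0)
```

Padding to the next line boundary is the simplest choice that satisfies the constraint; it wastes at most $length - 1 bytes per smashable instruction.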

LLVM didn’t support this particular type of alignment, so we added a smashable attribute for the LLVM call and invoke instructions. When this attribute is set, LLVM’s code generation backend will ensure that the call (or jmp, in the case of tail calls) is appropriately aligned for smashing. We then use a location record to get the final address of the smashable instruction, which is stored in our metadata tables.

As with location records, we considered using stack maps and patchpoints to implement smashable calls. Unfortunately, patchpoints don’t support our specific alignment constraints or tail calls, both of which are necessary for correctness in HHVM.

ABI issues

Our compilation units use a custom ABI that allows us to pass and return important values in specific registers. Luckily, it was easy to add a new calling convention to LLVM to support this, as well as another calling convention to support the way we call C++ helper functions.

A number of smaller tweaks to handle our unique stack and code alignment rules were necessary as well. Normal C++ functions on x86-64 expect the stack pointer to be 16-byte aligned before the call instruction, so they’re 8 bytes off of that alignment on entry to the function. The code emitted by HHVM expects the opposite: the stack pointer must be 16-byte aligned on entry. This allows us to call C++ helper functions without having to adjust the stack pointer first. We added the ability to specify a stack alignment skew value, representing how far off of the standard alignment the stack pointer will be coming into a function. As for code alignment, we added the ability to disable alignment of functions that don’t have the OptimizeForSize attribute set. We’ve done a number of experiments and found that, for our workload, the space lost by aligning functions is more of a performance regression than we could gain from aligning jump targets.
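
The alignment argument can be checked with a little arithmetic. The sketch below is only an illustration: the address is made up, and it assumes the JIT-compiled code has not pushed anything else onto the stack since entry.

```php
<?php
// Distance of a stack pointer value from 16-byte alignment.
function misalignment($sp) {
    return $sp % 16;
}

// Made-up stack pointer, 16-byte aligned on entry as the code emitted by
// HHVM expects.
$sp_on_jit_entry = 0x7fff0000;

// An x86-64 call instruction pushes an 8-byte return address, so a C++
// helper called from here sees the stack pointer 8 bytes lower.
$sp_on_helper_entry = $sp_on_jit_entry - 8;

var_dump(misalignment($sp_on_jit_entry));    // prints int(0)
var_dump(misalignment($sp_on_helper_entry)); // prints int(8)
```

The helper therefore starts 8 bytes off of 16-byte alignment, which is what a normal C++ function expects on entry, so no stack adjustment is needed before the call.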

Performance tweaks

Finally, we changed a few of LLVM’s optimizations to better handle the code we throw at it. These were all opportunities we discovered by comparing the output of our custom backend with the LLVM backend. The changes included enabling more uses of memory operands rather than explicit loads/stores (to keep code size down), and adding support for more compact conditional tail calls (performing a tail call directly with a jcc instruction, rather than a jcc to a call instruction).

Specific details about all of these modifications are available in our GitHub repository. We also maintain a patch that you can use to recreate our custom build of LLVM, until we finish upstreaming these changes.

Performance results

As measured by Perflab, our internal performance evaluation tool that replicates the same workload we see in production, the LLVM backend generates code that equals the performance of our custom backend. We spent some time comparing how the two backends did on specific pieces of code and found a number of situations where LLVM generated better code and some where it generated worse code. Overall, the net effect was neutral.

After reaching performance parity, we were able to prove that the LLVM backend was production-ready by deploying it to small groups of test machines for a day at a time. The extra compilation time introduced by inserting LLVM into our pipeline is quite large, but it can be mitigated by only using LLVM for the highest gear of our JIT, which is only used for the functions executed during an initial profiling phase.

We also benchmarked the LLVM backend on some popular open source PHP frameworks using our oss-performance measurement tool. The chart below was obtained by averaging 10 runs each of MediaWiki, Drupal 7, and WordPress, using the runtime option -vEval.JitLLVM=1 to enable the LLVM backend. The y axis is requests completed per second at maximum load, so higher is better. The mean RPS with LLVM enabled is down about 0.5% for all three frameworks, though in all cases the difference between the two is smaller than their standard deviations. The raw data can be found here.

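[Figure: requests per second at maximum load for MediaWiki, Drupal 7, and WordPress, with and without the LLVM backend]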

Ultimately, we decided to not deploy the LLVM backend to production for now. Making such a large change across the whole web fleet always comes with risk, and without a performance improvement to justify the risk it’s not worth it. We aren’t giving up, though—HHVM is still relatively young compared to other similar JITs, with plenty of room to grow in ways that may benefit the LLVM backend. We only recently implemented support for compiling loops within a single compilation unit, and as our HHIR optimizations become more sophisticated, the code we give to LLVM may start to look more and more like C code, which is what LLVM excels at optimizing. And JavaScriptCore has implemented their own LLVM backend with great results, which is very encouraging for the potential of using LLVM in a mature dynamic language JIT.

We’re not actively working on the LLVM backend right now, but we include it in all of our automated performance and correctness tests to ensure it doesn’t unexpectedly regress. And as always, anyone reading this who would like to take a look at what we’re doing and/or contribute can check out our GitHub repository!

Posted in Announcement | Leave a reply



HHVM 3.10.0

Posted on October 21, 2015 by

HHVM 3.10.0 is out! You can get it from the usual places. Major changes in this version include:

Core changes

  • Performance improvements for certain usages of strtr()
  • Fixes to rand() when requesting very large integers
  • XDebug compatibility and resource usage fixes
  • Improved reflection support
  • The usual myriad of bug fixes and general performance improvements

Hack language

  • Dramatically improved type inference for arrays
  • Improved typechecking for some standard library functions
  • Other minor fixes, e.g., improved type enforcement in some edge cases and fixes to using shapes inside namespaced code

As always, let us know how it goes! File an issue on GitHub if you run into any problems. (NB: we already have one known issue in 3.10.0; the strtr performance improvement causes a crash in some cases. Version 3.10.1, which fixes this issue, will be on its way out by the time you read this.)

Posted in Uncategorized | Leave a reply



Experimental Mac OS X Support

Posted on August 31, 2015 by

We’re happy to announce official Mac OS X support in HHVM, with version 3.9! If you use Homebrew and want to get started now, the steps to use our official tap are really simple:

brew tap hhvm/hhvm
brew install hhvm

If you don’t use Homebrew or want to build HHVM yourself, take a look at our wiki page for directions. Building HHVM is a very resource-intensive and slow process, so be prepared to devote a few gigabytes of RAM and a couple of hours to letting it compile. (The Homebrew formula doesn’t have a bottle yet, so the same applies whether or not you use Homebrew.)

Keep in mind that OS X support is still experimental at this point. Most things seem to work: it passes most of our test suite, and it was able to run simple Drupal 7 and WordPress installations when I briefly tested it. However, it hasn’t been extensively battle-tested! For example, there are a couple of failures in our own test suite when running on OS X. It’s good enough to use for local development work, but I don’t recommend a production deployment on anything but Linux quite yet. Information about supported Linux distributions is available on our wiki.

Let us know how it goes! It’s been a long road getting here, and there are inevitably some bumps still ahead (for example, I’m pretty sure that as of this writing master doesn’t build on OS X, and I am working on fixing it). But OS X support is very important to us and to our community; we are committed to supporting it long-term. In particular, this means that we will make sure HHVM continues to improve on OS X, so that later releases can be less and less experimental. Note that for the 3.9 LTS specifically, although we will continue to backport security patches into the branch, users of the Homebrew formula will be upgraded to 3.10 once it is released, and we won’t provide the usual LTS-specific packaging for OS X. We don’t anticipate this being a problem, since no one should be using the 3.9 OS X release in production anyway, and you can always manually build out of the 3.9 branch if you really need to.

Finally, I’d be remiss not to call out a few key community contributors, without whose work this port would never have happened. A special thank you to:

  • Daniel Sloof, who did most of the initial port to OS X.
  • Jingyi Wei, who contributed many useful OS X fixes, including finally getting the JIT itself working.
  • All of the contributors to the various unofficial Homebrew HHVM packages, most especially this one; although for various reasons we ended up building our own formula, the unofficial ones were enormously helpful to peek at.

As always, if you run into issues, please file them on GitHub!

Posted in Announcement | Leave a reply



HHVM 3.9.0

Posted on August 19, 2015 by

We’re happy to announce that HHVM 3.9.0 is now available. You can try one of our prebuilt packages for Ubuntu and Debian, or build from source.

This release contains a variety of enhancements to the type-checker and runtime, and as always many changes designed to improve performance. Key type-checker changes include a new set of library functions to operate on Shapes, a new type to represent Foo::class strings, and improved reflection for type-constants. Runtime improvements include a new facility for pooling curl handles between requests, and a runtime setting to reclaim memory in the translation cache (TC) from dead translations.

The new TC garbage collection (enabled via hhvm.enable_reusable_tc = true) should prove particularly helpful to anyone experiencing frequent out-of-memory crashes caused by TC exhaustion. Modifications to source files that are actively being served are a common source of TC leaks. This pattern is particularly prevalent in development environments, where this setting will likely be helpful. Support for this feature is still experimental; if you experience crashes with it turned on, please open an issue on our GitHub repository.

The 3.9 release comprises the internal “Nash”, “Osborne”, and “Park” releases. This release is our third LTS, and with it support for 3.3 (our first LTS) is ending. Support for 3.6 will continue for six months (until 3.12 is scheduled to be released), and 3.9 will be supported for 12 months.

Posted in Announcement | Leave a reply



HHVM 3.8.0

Posted on July 13, 2015 by

We’re happy to announce that HHVM 3.8.0 is finally available. You can, as always, try one of our prebuilt packages for Ubuntu and Debian, or build from source. (As of this writing, the packages are still building, so if a 3.8.0 package isn’t available for your supported distro, hold tight!)

This release is the first one to contain performance improvements from our HHVM lockdown this half. Along with the changes detailed in that post targeted specifically at open-source frameworks like WordPress, MediaWiki, and Drupal, this release also contains the results of Facebook’s internal performance team’s lockdown. While the internal team focuses on performance of Facebook’s code specifically, many of the changes they made will improve the performance of all PHP and Hack code, including the aforementioned open-source frameworks.

Here’s a chart summarizing the performance results of 3.7 vs. 3.8:

[Chart: performance results of HHVM 3.7 vs. 3.8]

Take a look at our lockdown post for details on methodology. It’s specifically worth noting that the improvements here aren’t as large as in the lockdown post since that post also includes our fixes to the frameworks themselves, whereas the above chart is a like-for-like comparison of 3.7 and 3.8.

The 3.8 release comprises the internal “Irwin”, “Jobs”, “King”, “Lukather”, and “McQueen” releases. Normally, we do an external release every four internal releases, but decided to delay 3.8 due to some stability issues. Apologies for being a couple weeks late! We still plan to release 3.9 on its original schedule.

Give it a try and let us know how it goes!

Update

This release also includes:

  • the proxygen http server (hhvm -m server -v Server.Type=proxygen)
  • stream_socket_enable_crypto() for client sockets
  • Automatic typechecking of Hack code. Previously, the hh_client static analysis tool had to be run separately. Now, results from hh_client are included in the runtime errors too. (Running it manually is still the recommended way to get fast error feedback, though!) The hhvm.hack.lang.auto_typecheck INI option controls this behavior.

Posted in Announcement | Leave a reply



CVE-2015-4663

Posted on July 7, 2015 by

We just released HHVM versions 3.3.7, 3.6.5, and 3.7.3, which fix CVE-2015-4663, a serious issue affecting SSL/TLS certificate validation. Note that the issue affects file_get_contents, the stream API, etc., but does not affect anything using the cURL API directly.

Release packages are available for all supported OSes; debug packages are building and should be available shortly. Please make sure you are running one of those supported versions.

As a reminder, most HHVM releases are supported for 8 weeks, before moving on to the next stable release. For example, version 3.7.x is the current stable release, which will shortly be supplanted by 3.8.x and no longer receive updates. LTS releases are supported for a year; 3.3.x and 3.6.x are the current LTS branches. This means that if you are running 3.4.x or 3.5.x, then you are running an unsupported version of HHVM which is vulnerable to this issue and will not be receiving updates!
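To make the policy above concrete, here is a minimal Python sketch that classifies a version string against the fixed releases named in this post (3.3.7, 3.6.5, and 3.7.3). The function names, the status labels, and the assumption that versions are plain "X.Y.Z" strings are all illustrative; this is not an official tool, and it deliberately says nothing about releases newer than 3.7.x, which this post does not cover.

```python
# First patched release on each supported branch, as listed in this post.
FIXED_PATCH_LEVEL = {(3, 3): 7, (3, 6): 5, (3, 7): 3}

# Newest branch this post talks about; anything later is out of scope here.
NEWEST_COVERED_BRANCH = (3, 7)


def parse_version(text):
    """Split a plain "X.Y.Z" string into three integers (assumed format)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def cve_2015_4663_status(version):
    """Return a hypothetical status label for the given HHVM version."""
    major, minor, patch = parse_version(version)
    branch = (major, minor)
    if branch > NEWEST_COVERED_BRANCH:
        return "unknown"  # released after this post; not covered here
    if branch not in FIXED_PATCH_LEVEL:
        return "unsupported"  # e.g. 3.4.x and 3.5.x receive no updates
    if patch >= FIXED_PATCH_LEVEL[branch]:
        return "patched"
    return "needs-upgrade"


for candidate in ("3.7.3", "3.6.4", "3.5.1"):
    print(candidate, cve_2015_4663_status(candidate))
```

Anything reported as "unsupported" should move to a supported branch rather than wait for a point release.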

Thanks to Anthony Ferrara for reporting this issue.

Posted in FYI | Leave a reply