For several months now, Hack has had a feature available called async, which enables writing code that cooperatively multitasks. This is somewhat similar to threading, in that multiple code paths make progress concurrently; however, it avoids the lock-contention issues common to multithreaded code by only executing one section of code at any given moment.

“What’s the use of that?”, I hear you ask.  You’re still bound to one CPU, so it should take the same amount of time to execute your code, right?  Well, that’s technically true, but script code execution isn’t the only thing causing latency in your application.  The biggest piece of it probably comes from waiting for backend databases to respond to queries.

When you call $zuck = file_get_contents('https://graph.facebook.com/4');, the runtime essentially pauses execution while it waits for network packets to cross the internet, query a database for an answer, and come back.  From my house, that round trip takes about 350ms, more than 200ms of which is spent just negotiating TLS.  350ms is practically an eternity on the scale of a web request.

That’s where cooperative multitasking comes in.  While your https call is busy sitting on its hands waiting for a response, there’s no reason you shouldn’t be able to do other things, maybe even fire off more requests.  The same goes for database queries, which can take just as long, or even filesystem access, which is faster than the network but can still introduce lag times of several milliseconds, and those all add up!

So how do we write code that can cooperate with other code?  Hack’s solution for that, as I said earlier, is async, and its sibling structures await, WaitHandle, and Awaitable.  Let’s take a look at a simplified example: a single task which can run in all the glorious parallelism of lone execution.

<?hh

async function hello(): Awaitable<string> {
  return "Hello World";
}

$a = hello();
var_dump($a);
var_dump($a->getWaitHandle()->join());

// Object(HH\StaticWaitHandle)#1 (0) {
// }
// string(11) "Hello World"

The first thing to notice is that returning from a function (or method) marked as “async” doesn’t actually return that value; it returns an instance of a WaitHandle object (in this case, an HH\StaticWaitHandle, but we’ll see other types).  We don’t see the actual value until, as the caller, we’ve told the underlying async layer “wait until this task has done all its work, then give me the value”.  This is because, like a generator, an async function is designed to be pausable, resuming execution at a later time.
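To see that pause-and-resume composition in context, here’s a minimal sketch (using the same era of Hack syntax as the rest of this post) of one async function awaiting another; the name greet is purely illustrative:

```hack
<?hh

async function hello(): Awaitable<string> {
  return "Hello World";
}

// Inside an async function, await unwraps the Awaitable,
// yielding control to other tasks while the inner one is pending.
async function greet(): Awaitable<string> {
  $message = await hello();
  return $message."!";
}

// At the top level there's nothing to cooperate with,
// so we still hard-block with join().
var_dump(greet()->getWaitHandle()->join());
// string(12) "Hello World!"
```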

Most places in an async code base, that will be done by invoking $result = await $waitHandle; because that leaves the CPU free to run through other async functions.  At the top-most point, however, where there’s nothing else to share processing time with, we use a hard block in the form of $handle->join(), which effectively says “Deal with all your pending handles, then give me the end result.”  Let’s try adding a line to this function:

<?hh

async function hello(): Awaitable<string> {
  await RescheduleWaitHandle::create(
    RescheduleWaitHandle::QUEUE_DEFAULT,
    0,
  );
  return "Hello World";
}

This function produces the same results, but this time we’ve used RescheduleWaitHandle to say “If there’s any other async function trying to run right now, let it have some fun, and come back to me when you’ve got nothing better to do.” Again, because we’re inside an async function, we use await rather than join(); we’re probably sharing the stage with some other async function somewhere else, so we don’t want the hard block that comes from join().  Similarly, we could schedule a sleep to wait an arbitrary amount of time:

<?hh

async function hello(): Awaitable<string> {
  await SleepWaitHandle::create(1000); /* 1000us, aka 1ms */
  return "Hello World";
}

But those uses are all pretty silly, especially considering that we only have one task to perform, which will ultimately have to block on the ->join() call anyway. How can we do more than one thing? Enter AwaitAllWaitHandle, which coordinates execution of an arbitrary number of WaitHandles passed to it as an array, Map, or Vector.

<?hh

async function hello(): Awaitable<string> {
  return "Hello World";
}
async function goodbye(): Awaitable<string> {
  return "Goodbye, everybody!";
}
async function run(
  array<WaitHandle<string>> $handles,
): Awaitable<array<string>> {
  await AwaitAllWaitHandle::fromArray($handles);
  return array_map($handle ==> $handle->result(), $handles);
}
$results = run(array(hello(), goodbye()))->getWaitHandle()->join();
print_r($results);
// Array
// (
//  [0] => Hello World
//  [1] => Goodbye, everybody!
// )

AwaitAllWaitHandle::fromArray() (and its Collection cousins AwaitAllWaitHandle::fromVector() and AwaitAllWaitHandle::fromMap()) don’t actually yield values; instead, they queue execution of their children so that when the AwaitAllWaitHandle yields, all of its handles will be done (as we could test with ->isFinished()).  Subsequent calls to ->result() will return each handle’s value, just as if we had called ->join().
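As a sketch of one of those Collection cousins, the same pattern works with fromMap(), which lets us keep meaningful keys with each result (the keys and the run_map name here are illustrative, not part of any API):

```hack
<?hh

async function hello(): Awaitable<string> {
  return "Hello World";
}
async function goodbye(): Awaitable<string> {
  return "Goodbye, everybody!";
}

async function run_map(
  Map<string, WaitHandle<string>> $handles,
): Awaitable<Map<string, string>> {
  await AwaitAllWaitHandle::fromMap($handles);
  // Every child handle is finished at this point,
  // so result() returns immediately without blocking.
  return $handles->map($handle ==> $handle->result());
}

$results = run_map(Map {
  'greeting' => hello(),
  'farewell' => goodbye(),
})->getWaitHandle()->join();
```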

So that’s great and all, but none of this is blocking code, and we still haven’t seen an advantage over just executing them in series. In fact, they are effectively executed in series because it’s just cooperative multitasking, not multi-threading. So let’s add in the simplest kind of blocking call, an http fetch using cURL. The work-horse of this feature is the following builtin function:

async function curl_multi_await(
  resource $curlMultiHandle,
  float $timeout = 1.0,
): Awaitable<int>;

This allows us to “await” on the only part of a curl request which actually needs to block. It’s a bit unhelpful all by itself, so let’s make a simple wrapper for it:

<?hh
async function curl_exec_await(string $url): Awaitable<string> {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

  $mh = curl_multi_init();
  curl_multi_add_handle($mh, $ch);
  do {
    $active = 1;
    do {
      $status = curl_multi_exec($mh, $active);
    } while ($status == CURLM_CALL_MULTI_PERFORM);
    $select = await curl_multi_await($mh);
    if ($select == -1) break;
  } while ($status === CURLM_OK);
  $content = (string)curl_multi_getcontent($ch);
  curl_multi_remove_handle($mh, $ch);
  curl_multi_close($mh);
  return $content;
}

During lulls in network activity, the await curl_multi_await($mh); allows us to pass control to another WaitHandle (perhaps another curl request, perhaps something else), while the rest of the execution performs non-blocking actions, like calling curl_multi_exec(). Armed with this helper, and our lessons from earlier, we can combine this all into multiple parallelized fetches:

<?hh
include "curl-exec-await.php";

$results = run(array(
  curl_exec_await("https://graph.facebook.com/4"),
  curl_exec_await("http://www.example.com"),
  curl_exec_await("http://www.hhvm.com"),
))->getWaitHandle()->join();

If you’re still paying attention, then you’re probably wondering why we couldn’t just pass multiple easy handles into one multi handle and use the curl-multi interface that PHP has known and loved for years. And you’d be right to wonder. If the only operations we were performing were cURL calls, then we could certainly get by with the existing curl-multi interface. Where async really starts to shine is when you combine other awaitable resources into the picture. Imagine something like this:

<?hh

async function fibonacci_gen(int $num): Awaitable<int> {
  $n1 = 0; $n2 = 1; $n3 = 1;
  for ($i = 2; $i < $num; ++$i) {
    $n3 = $n1 + $n2;
    $n1 = $n2;
    $n2 = $n3;
    if (!($i % 100)) {
      // Give other tasks a chance to do something
      // every 100th iteration
      await RescheduleWaitHandle::create(
        RescheduleWaitHandle::QUEUE_DEFAULT,
        0,
      );
    }
  }
  return $n3;
}
$results = run(array(
  "web"  => curl_exec_await("http://example.com"),
  "sql"  => mysql_async_query($conn, "SELECT * FROM user"),
  "disk" => file_async_contents("/var/log/access.log"),
  "mc"   => memcached_async_get("somekey"),
  "fib"  => fibonacci_gen(2000),
))->getWaitHandle()->join();

Now we’re combining multiple kinds of blocking operations (and one expensive bit of pure script code) in a single coordination framework without having to know, from the callsite, how to block on the underlying file descriptors or when.

For now, there are no streams, mysql, or memcached APIs for async blocking, but streams support should arrive soon (it’s currently up for review), with memcached/webscalesql support hot on its heels.  Watch this space for those.

As you can see from the paragraph above, async is still in active development, so you’ll need HHVM 3.5.0 (which may not exist at the time of this reading) or a nightly to get the curl support.  Check out installing nightlies on Ubuntu or elsewhere in the HHVM wiki for more info on how to do that.

Comments


  • Carl G: This is cool. But, going a little off-topic, I'm really starting to wonder why you've even released Hack to the public. I'm a professional, and using a new language puts a dent in my productivity for a while. Going backwards from an advanced IDE like phpStorm to a text editor and package of CLI tools is another huge productivity hit. Those combined mean I'll never be able to adopt Hack unless you get us some IDE support. I know most other devs are in the same boat. You're basically talking to yourselves until you release FBIDE or a plugin for phpStorm, Netbeans, or some other PHP-loving IDE. These new features are nice, but maybe you could hit the breaks for a (few?) weeks and just get the IDE out.
  • Vincent DM: @Carl G: I can only agree! I don't want to whine, and I sincerely appreciate all the time and effort Facebook and its people are investing into hack, but it's almost tragic how the language is closely watched by many but used by relatively few. Proper IDE support is a vital step to kickstart the community, I think. Until then, I'm sticking with PHP and some nodejs for async stuff...
  • Sara Golemon: AIUI, several editor projects and their communities are working on hack plugins. IMO these should be in the hands of developers who actually use these tools as they're the best ones for that task. As for FBIDE... yeah... I wish the team responsible for that would get on releasing it too...
  • Paul Moss: So when it the memcached/webscale support due, for 3.5 or 3.6? Would this also include memcached which comes with mysql 5.6? Finally, does webscale mean mysql 5.6 client as well? Essentially, I would love to parallelise my requests for memcached and mysql. Can't wait really.
  • Jeroen De Dauw: Another +1 to this. I've been wanting to do some katas in Hack for months now, yet can't bring myself to start without proper IDE support. I want to take a step forewards without taking one or two backwards first.
  • ip512: Guzzle has already asynchonous response management (http://guzzle.readthedocs.org/en/latest/clients.html#asynchronous-response-handling) In which case this asynchronous approach can be preferable over using Guzzle ?
  • Brice: Yes, it would be great to have validation and syntax highlighting working in eclipse with vi + emacs as well. IMO -- Eclipse support is the priority as it's the official base of PDT. The commerical projects like intellij, storm, &c could base their support from there. Totally excited about hack & hvvm in general! Keep it up.
  • Sara Golemon: memcached looks like it'll be relatively soon, though I can't guarantee it'll be in by 3.5. We'll be using mcrouter as the underlying client library (as libmemcached doesn't have a good async interface), but it should work against anything that supports the memcached protocol... "webscale" means the async support will require libwebscalesqlclient on the *client* side. The actual server can be WebscaleSQL, MySQL, MariaDB, or anything which speaks the mysql protocol. I can't wait to round out the DB async support either! Very exciting. :)
  • Sara Golemon: I don't know Guzzle well, but glancing through that link, it looks like it's only for http(s) requests. Hack-Async covers that (via async-curl), but also includes many other types of blocking requests as well. So personally, I would prefer using Hack-Async, but in the end you want to go for whatever makes sense for your application and your developer(s).
  • Paul Moss: Sarah, thanks for the reply and the clarification! You and the guys who work on this totally rock! Yes, its very exciting, I can't wait! :)
  • Fred Emmott: It would be possible to build this async approach into guzzle. As well as async/await giving a single approach to any kind of blocking request (which will become more useful as we release more), we generally feel that the async/await flow is easier to work with than callbacks, which tend to lead to spaghetti code.
  • FractalizeR: I wonder, how this will be transpiled into PHP? (https://code.facebook.com/posts/398235553660954/announcing-the-hack-transpiler/)
  • Jesse Cascio: NOOO!!! you took my blog post haha, I just did this on Dec 1 and it took me so long to figure out how to do it. http://jessesnet.com/development-notes/2014/hacklang-async-processing/ Thanks for sharing, I'm going to go through this in more detail and see what I could have done better.
  • Stefan Parker: I've been converting my project to use async in all the memcache-fetching (as in, combining memcache gets) code paths, and I've noticed anecdotally that the web request actually takes longer now. Because this is all locally my data-fetching is essentially 0ms, but is it true that execution time with an async structure is actually (noticeably) slower than without?
  • Vincent: What was the reasoning behind this style of implementation for async, as opposed to the function callback style?
  • Jan Oravec: async/await makes it possible to structure asynchronous code in the same way as if it was implemented synchronously, resulting in significantly shorter and more readable code.
  • Terry Cullen: Is there a list somewhere of what works asynchronously and what doesn't? Am I correct in thinking that all regular PHP can run async but code that calls extensions (like pdo, redis) needs the extension to be updated to run async? What about the all the different wrappers in http://php.net/manual/en/wrappers.php ? Can things like file_get_contents() be called async?
  • Aftab Naveed: if it still uses one CPU core to process the request, my question is how come it passes control to next task when that core is still busy and waiting for the request to finish? is there any other example which can elaborate this ?
  • Guilherme Cardoso: Mongodb, that would be a great implementation!
  • hhvm-rocks: ...await curl_multi_async($mh); allows us to pass control to another... just a small correction; it should be "await curl_multi_await($mh);"
  • Aftab Naveed: Sorry I wanted to clarify my question here as that sounds really stupid, comparing to Google Golang concurrency which uses the same core but switches control to another core if required, was wondering if HACK async does similar kind of thing?
  • Aftab Naveed: Are there any async implementations planned for Guzzle?
  • Fred Emmott: That's really up to the Guzzle developers; we don't plan on creating a fork. We do provide async curl and stream primitives.
  • Fred Emmott: This blog post is very old; the modern approach is documented here: https://docs.hhvm.com/hack/async/utility-functions