Saturday 21 January 2023

PHP: looking at ways of making HTTP requests

G'day:

I'm reacquainting myself with PHP, and part of this process is chucking some tests together to demonstrate to myself how bits and pieces of it work. This has the added bonus of being able to put the code in front of my team, to help provide learning info for them. This article is pretty much just showing sample code, and it's for the reader to compare and contrast. There's likely not gonna be too much exposition from me once we get to the code. I'm sure I can pad things out by a few hundred words before we get there though. I am me after all.

This time, I've decided to revisit how to make HTTP requests.

I've got four candidate solutions to look at: curl, Guzzle, Symfony's HTTP client, and PHP's streams.

I am aware of PHP's curl extension pretty much always being available, but its API is a bit of a mess (it's been part of PHP since the bad old days).

I've also used Guzzle in the past, with mixed success. It started out being simple and handy, and I liked it. But then across a major version bump (I can't remember which versions this occurred between), the old API was basically dumped in favour of a new, non-backwards-compatible, and largely (and pointlessly IMO) overly complex promise-based approach, to provide asynchronicity in HTTP requests. Which was something I never needed and seemed like an odd addition to an HTTP library. I suspect the author had started to look at Node.js with all its async HTTP shiz, and went "I know… I'll ruin something perfectly useful by adding this crap into it as well". Ugh. However I note Guzzle is still around, so - armed with an open mind - I'll look at that too.

During my googling I have also spied that Symfony has an HTTP client too. It probably always did, but in my last gig we went the Guzzle route, so I had not looked further afield.

Also during my googling (and reading the Guzzle docs), I discovered PHP's own streams extension can be used to make HTTP requests. That sounds interesting, so I have decided to give that a go too.

My approach is to create a test class, and add a test for each of those four platforms, to do each of a GET and a POST. They are not complicated tests, it's just a case of getting the thing to do something simple that I can expect results from.

I will also concede that I used Copilot to do probably 80% of the work here, with my polishing that last 20%.

Installation

  • curl needs ext-curl installed. It ships with the Docker image, so I didn't have to do anything for this.
  • Installing Guzzle is a matter of adding it as a dependency in composer.json: "guzzlehttp/guzzle": "^7.5.0" at time of writing.
  • Similarly with Symfony's HTTP client: "symfony/http-client": "^6.2.2"
  • And PHP's streams lib is native to PHP. No installation necessary.

Curl

/** @testdox it can make a GET request */
public function testGet()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => 'https://api.github.com/users/adamcameron',
        CURLOPT_USERAGENT => $this->getUserAgentForCurl(),
        CURLOPT_RETURNTRANSFER => 1
    ]);
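    $response = curl_exec($ch);
    // Read the status before closing the handle: curl_close() is a no-op from PHP 8, but not before
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    $this->assertEquals(200, $statusCode);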
    $this->assertJson($response);
    $this->assertGitInfoIsCorrect($response);
}

It also uses these two helper methods:

private function getUserAgentForCurl(): string
{
    return sprintf("curl/%s", curl_version()['version']);
}

protected function assertGitInfoIsCorrect(string $response): void
{
    $myGitMetadata = json_decode($response);
    $this->assertEquals('adamcameron', $myGitMetadata->login);
    $this->assertEquals('Adam Cameron', $myGitMetadata->name);
}

(A bunch of the other tests below also use assertGitInfoIsCorrect.)

The GET test in each case will be to get my own GitHub profile and to superficially check it's been fetched properly. BTW I needed that getUserAgentForCurl carry-on because curl by itself does not send a user agent, and GitHub says "nuh-uh" if it doesn't get one. So I've just contrived a user agent that is the one that the underlying curl implementation would use (eg: like if one was doing a curl from bash).

The POST test will post to https://httpbin.org/post. I only discovered httpbin.org when I was doing this exercise, and I wanted a simple (public) way of echoing back a post request. Handy.

/** @testdox it can make a POST request */
public function testPost()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => 'https://httpbin.org/post',
        CURLOPT_USERAGENT => $this->getUserAgentForCurl(),
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_POST => 1,
        CURLOPT_POSTFIELDS => ['foo' => 'bar']
    ]);
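    $response = curl_exec($ch);
    // Read the status before closing the handle: curl_close() is a no-op from PHP 8, but not before
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    $this->assertEquals(200, $statusCode);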
    $this->assertJson($response);
    $httpBinResponse = json_decode($response);

    $this->assertEquals('bar', $httpBinResponse->form->foo);
}

An example of what https://httpbin.org/post returns is:

{
    "args":{
        
    },
    "data":"",
    "files":{
        
    },
    "form":{
        "foo":"bar"
    },
    "headers":{
        "Accept":"*/*",
        "Content-Length":"141",
        "Content-Type":"multipart/form-data; boundary=------------------------b0133bb008e6829b",
        "Host":"httpbin.org",
        "User-Agent":"curl/7.74.0",
        "X-Amzn-Trace-Id":"Root=1-63cc3251-4a3f92c809ad00f75261466a"
    },
    "json":null,
    "origin":"82.8.81.31",
    "url":"https://httpbin.org/post"
}
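
Note the Content-Type in that response: it's multipart/form-data because CURLOPT_POSTFIELDS was given an array. As far as I understand it, giving it a URL-encoded string instead makes curl send application/x-www-form-urlencoded. Here's a minimal sketch of building such a string with http_build_query (the same function the streams test uses further down). The $fields values are made up for illustration:

```php
<?php
// Hypothetical form fields, for illustration only.
$fields = ['foo' => 'bar', 'baz' => 'qux quux'];

// http_build_query() URL-encodes each key and value and joins the pairs.
// The separator is passed explicitly so the result does not depend on ini settings.
$encoded = http_build_query($fields, '', '&');

echo $encoded . PHP_EOL; // foo=bar&baz=qux+quux
```

Passing $encoded as the CURLOPT_POSTFIELDS value (instead of the array) should be all that's needed to switch encodings, but I've not run that variation as part of these tests.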

Guzzle

First up: I'm really pleased how compact and straightforward Guzzle's code is for these exercises. And also that one is not forced to write async code for a non-async situation.

/** @testdox it can make a GET request */
public function testGet()
{
    $client = new Client();
    $response = $client->request('GET', 'https://api.github.com/users/adamcameron');
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getBody());
    $this->assertGitInfoIsCorrect($response->getBody());
}

/** @testdox it can make a POST request */
public function testPost()
{
    $client = new Client();
    $response = $client->request(
        'POST',
        'https://httpbin.org/post',
        ['form_params' => ['foo' => 'bar']]
    );
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getBody());
    $httpBinResponse = json_decode($response->getBody());
    $this->assertEquals('bar', $httpBinResponse->form->foo);
}

I also decided to revisit the async side of things:

/** @testdox it can make an asynchronous GET request */
public function testAsyncGet()
{
    $client = new Client();
    $promise = $client->requestAsync('GET', 'https://api.github.com/users/adamcameron');
    $response = $promise->wait();
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getBody());
    $this->assertGitInfoIsCorrect($response->getBody());
}

Simple. It seems the current implementation is taking the "async-await" approach with these things like JS has these days.

That was not much of a test though. This time I am gonna make a bunch of requests (which are artificially slow) and make sure they do seem to run asynchronously. I've slung this in my web directory:

// html/test-fixtures/slow.php
$timeToWait = $_GET['timeToWait'] ?? 0;
sleep($timeToWait);
echo "waited $timeToWait seconds";

And calling that with varying delays:

/** @testdox it can make multiple asynchronous GET requests */
public function testMultipleAsyncGet()
{
    $client = new Client();
    $requestsToMakeConcurrently = [
        $client->getAsync('http://nginx/test-fixtures/slow.php?timeToWait=1'),
        $client->getAsync('http://nginx/test-fixtures/slow.php?timeToWait=2'),
        $client->getAsync('http://nginx/test-fixtures/slow.php?timeToWait=3')
    ];
    $startTime = microtime(true);
    $responses = Promise\Utils::unwrap($requestsToMakeConcurrently);
    $endTime = microtime(true);

    $totalTime = $endTime - $startTime;
    $this->assertGreaterThan(3, $totalTime);
    $this->assertLessThan(4, $totalTime);

    array_walk($responses, function ($response, $i) {
        $this->assertEquals(200, $response->getStatusCode());
        $this->assertEquals(sprintf("waited %d seconds", $i+1), $response->getBody());
    });
}

The assertions there are a bit woolly. I figured it should def take longer than 3sec cos at least one of the requests will take 3sec. Plus there'll be a wee bit of overhead. That overhead ought not be more than a second, so if the whole lot finishes in less than 4sec, it's a pretty good indicator that all three requests were being made simultaneously. It occurs to me now I could perhaps look @ the Nginx activity logs for when the requests come in. Please hold…

172.31.0.4 - - [21/Jan/2023:18:57:49 +0000] "GET /test-fixtures/slow.php?timeToWait=1 HTTP/1.1" 200 27 "-" "GuzzleHttp/7"
172.31.0.4 - - [21/Jan/2023:18:57:50 +0000] "GET /test-fixtures/slow.php?timeToWait=2 HTTP/1.1" 200 27 "-" "GuzzleHttp/7"
172.31.0.4 - - [21/Jan/2023:18:57:51 +0000] "GET /test-fixtures/slow.php?timeToWait=3 HTTP/1.1" 200 27 "-" "GuzzleHttp/7"

Now Nginx is logging when it responds to the request, not when it receives it, so what we can infer from this is that the requests all arrived at 18:57:48, and the 1sec request finished after 1sec at 18:57:49; the 2sec request finished after 2sec @ 18:57:50, and similarly the third one, 3sec, finished after 3 seconds at 18:57:51.

It's easier to see if I make the requests hang on for different periods of time. Here's an example where they take 1sec, 12sec and 23sec respectively:

172.31.0.4 - - [21/Jan/2023:18:59:07 +0000] "GET /test-fixtures/slow.php?timeToWait=1 HTTP/1.1" 200 27 "-" "GuzzleHttp/7"
172.31.0.4 - - [21/Jan/2023:18:59:18 +0000] "GET /test-fixtures/slow.php?timeToWait=12 HTTP/1.1" 200 28 "-" "GuzzleHttp/7"
172.31.0.4 - - [21/Jan/2023:18:59:29 +0000] "GET /test-fixtures/slow.php?timeToWait=23 HTTP/1.1" 200 28 "-" "GuzzleHttp/7"

We can infer they all arrived at 18:59:06. 1sec later at 18:59:07 the first request completed; 12sec later the second one completed at 18:59:18 (18:59:18 - 18:59:06 is 12sec); and lastly the third request - which will take 23sec to run - indeed finishes at 18:59:29 - 18:59:06 = 23sec later.

Excellent. Working as expected.
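
That inference from the Nginx logs can be checked mechanically too. Below is a rough sketch (not part of the test suite) that takes the completion timestamp from each log line, subtracts the timeToWait value, and reports the implied arrival time. The inferArrivalTime helper is a name I've made up; the log lines are the second sample above:

```php
<?php
// Log lines copied from the second Nginx sample above.
$logLines = [
    '172.31.0.4 - - [21/Jan/2023:18:59:07 +0000] "GET /test-fixtures/slow.php?timeToWait=1 HTTP/1.1" 200 27 "-" "GuzzleHttp/7"',
    '172.31.0.4 - - [21/Jan/2023:18:59:18 +0000] "GET /test-fixtures/slow.php?timeToWait=12 HTTP/1.1" 200 28 "-" "GuzzleHttp/7"',
    '172.31.0.4 - - [21/Jan/2023:18:59:29 +0000] "GET /test-fixtures/slow.php?timeToWait=23 HTTP/1.1" 200 28 "-" "GuzzleHttp/7"',
];

// Hypothetical helper: completion time minus the requested delay = arrival time.
function inferArrivalTime(string $logLine): string
{
    // Capture the bracketed timestamp and the timeToWait query parameter.
    preg_match('/\[([^\]]+)\].*timeToWait=(\d+)/', $logLine, $matches);

    $completedAt = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $matches[1]);

    return $completedAt->modify("-{$matches[2]} seconds")->format('H:i:s');
}

$arrivalTimes = array_map('inferArrivalTime', $logLines);
print_r($arrivalTimes); // all three entries are 18:59:06
```

All three requests come out as arriving in the same second, which is what one would expect if they were dispatched concurrently.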

For completeness I also tested an async POST request:

/** @testdox it can make an asynchronous POST request */
public function testAsyncPost()
{
    $client = new Client();
    $promise = $client->requestAsync(
        'POST',
        'https://httpbin.org/post',
        ['form_params' => ['foo' => 'bar']]
    );
    $response = $promise->wait();
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getBody());
    $httpBinResponse = json_decode($response->getBody());
    $this->assertEquals('bar', $httpBinResponse->form->foo);
}

No surprises.


Symfony

/** @testdox it can make a GET request */
public function testGet()
{
    $client = HttpClient::create();
    $response = $client->request('GET', 'https://api.github.com/users/adamcameron');
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getContent());
    $this->assertGitInfoIsCorrect($response->getContent());
}

This is identical to the Guzzle example except Symfony uses a factory method to create the client object compared to Guzzle just using new; and Guzzle uses getBody instead of Symfony's getContent.

/** @testdox it can make a POST request */
public function testPost()
{
    $client = HttpClient::create();
    $response = $client->request(
        'POST',
        'https://httpbin.org/post',
        ['body' => ['foo' => 'bar']]
    );
    $this->assertEquals(200, $response->getStatusCode());
    $this->assertJson($response->getContent());
    $httpBinResponse = json_decode($response->getContent());
    $this->assertEquals('bar', $httpBinResponse->form->foo);
}

It's just occurred to me that knowing Symfony, it can likely do async request collections too. And after some googling: sure enough I've found a way ("Symfony › HTTP Client › Concurrent Requests"):

/** @testdox it can make multiple asynchronous GET requests */
public function testMultipleAsyncGet()
{
    $client = HttpClient::create();
    $requestsToMakeConcurrently = [
        $client->request('GET', 'http://nginx/test-fixtures/slow.php?timeToWait=1'),
        $client->request('GET', 'http://nginx/test-fixtures/slow.php?timeToWait=2'),
        $client->request('GET', 'http://nginx/test-fixtures/slow.php?timeToWait=3')
    ];
    $stream = $client->stream($requestsToMakeConcurrently);

    $i = 1;
    $startTime = microtime(true);
    foreach ($stream as $response => $chunk) {
        if ($chunk->isLast()) {
            $this->assertEquals(200, $response->getStatusCode());
            $this->assertEquals("waited $i seconds", $response->getContent());
            $i++;
        }
    }
    $endTime = microtime(true);
    $totalTime = $endTime - $startTime;
    $this->assertGreaterThan(3, $totalTime);
    $this->assertLessThan(4, $totalTime);
}

This is analogous to the Guzzle version. Its implementation is not as nice though IMO.
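
The timing assertions in both the Guzzle and Symfony versions lean on the same bit of arithmetic: ignoring overhead, concurrent requests should take about as long as the slowest one, whereas sequential requests would take the sum of all of them. A quick sketch of that reasoning, where expectedDuration is a hypothetical helper and not anything from either library:

```php
<?php
// Hypothetical helper: expected wall-clock seconds for a set of delays, ignoring overhead.
function expectedDuration(array $delays, bool $concurrent): int
{
    // Concurrent: bounded by the slowest request. Sequential: every delay adds up.
    return $concurrent ? max($delays) : array_sum($delays);
}

$delays = [1, 2, 3];

$concurrentTime = expectedDuration($delays, true);
$sequentialTime = expectedDuration($delays, false);

echo "concurrent: {$concurrentTime}s, sequential: {$sequentialTime}s" . PHP_EOL;
// concurrent: 3s, sequential: 6s
```

So a total between 3sec and 4sec is only plausible if the requests overlapped: run one after the other they would need at least 6sec.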


PHP Streams

/** @testdox it can make a GET request */
public function testGet()
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => ['User-Agent: ' . $this->getUserAgentForCurl()]
        ]
    ]);
    $response = file_get_contents('https://api.github.com/users/adamcameron', false, $context);
    $this->assertJson($response);
    $this->assertGitInfoIsCorrect($response);
}

/** @testdox it can make a POST request */
public function testPost()
{
    $context = stream_context_create([
        'http' => [
            'method' => 'POST',
            'header' => [
                'User-Agent: ' . $this->getUserAgentForCurl(),
                'Content-Type: application/x-www-form-urlencoded'
            ],
            'content' => http_build_query(['foo' => 'bar'])
        ]
    ]);
    $response = file_get_contents('https://httpbin.org/post', false, $context);
    $this->assertJson($response);
    $httpBinResponse = json_decode($response);
    $this->assertEquals('bar', $httpBinResponse->form->foo);
}

OK. It's poss just me being pedantic, but I get a bit itchy looking at file_get_contents on a URL. I mean I know an HTTP request is fetching a file - so semantically that's fine - but it still seems odd.
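
One thing the streams tests above don't do is check the status code, because file_get_contents only hands back the body. PHP's HTTP wrapper does put the raw response headers into a local $http_response_header array after the call, so the status can be dug out of that. Here's a sketch of one way to do it; statusCodeFromHeaders is a name I've made up, and the sample headers are invented rather than captured from a real response:

```php
<?php
// Hypothetical helper: find the status code in a list of raw header lines,
// such as the $http_response_header array populated after file_get_contents().
function statusCodeFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $header) {
        // Redirects add more than one status line, so keep the last one seen.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $header, $matches)) {
            $status = (int) $matches[1];
        }
    }
    return $status;
}

// Made-up headers imitating a redirect followed by a successful response.
$sampleHeaders = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: application/json',
];

$statusCode = statusCodeFromHeaders($sampleHeaders);
echo $statusCode . PHP_EOL; // 200
```

In a test this would be something like $this->assertEquals(200, statusCodeFromHeaders($http_response_header)) straight after the file_get_contents call, although I have not added that to the tests shown here.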


Conclusion

For these superficial test cases, I prefer Guzzle. Doubtless there is more one can do with Symfony's HTTP client, because there's always more one can do with Symfony's stuff; but the same will apply with Guzzle too no doubt. I did not know about PHP's streams before, and whilst this might not be a good use of it, there'll likely be other situations to use them.

I'm mostly pleased that Guzzle seems easy to use again, and for both sync and async stuff. Cool.

All the code shown in here is @ /test/integration/http on GitHub.

Righto.

--
Adam