Saturday 1 April 2023

Symfony / Doctrine / DBAL: convincing/configuring it to use a PrimaryReadReplicaConnection connection

G'day:

A while back I documented how to create/configure a PrimaryReadReplicaConnection connection in PHP. PrimaryReadReplicaConnection is the replacement for MasterSlaveConnection, which has been retired due to socially-insensitive nomenclature. This is all in "PHP: PrimaryReadReplicaConnection - configuration / usage example".

Today's exercise is to get one to work in my Symfony project (adamcameron/SymfonyFastTrack).

Symfony's default DB connectivity is done via the DATABASE_URL environment variable which must be set (docs: Configuring Doctrine ORM). This is fine for simple situations, even though I personally think it's a daft way of handling connnection parameters: it's a bit "type couply" to using a URL to connect to a DB, which is not the only way of doing it. But falls flat pretty quickly once not in a simple situation. The problem is that the DATABASE_URL allows only for a single connection. It's gonna be pretty common to be using primary / replicas I think. I guess perhaps the Symfony thinking(?) is that this is better handled on a DB load balancer than in the app. However: I don't have one of those, and i have also seen enterprise-scale PHP-driven operations that also simply use a PrimaryReadReplicaConnection for this. I have a PrimaryReadReplicaConnection driver. I need to make it work. This is not well / clearly / at all documented.

Firstly: there is no escaping it. One needs to have DATABASE_URL set. And it needs to be valid. This is even if you remove references to it in config: the connection won't configure (and the application won't work) unless it exists. The Symfony docs say to use override_url to prevent this behaviour:

When specifying a url parameter, any information extracted from that URL will override explicitly set parameters unless override_url is set to true. An example database URL would be mysql://snoopy:redbaron@localhost/baseball, and any explicitly set driver, user, password and dbname parameter would be overridden by this URL. See the Doctrine DBAL documentation for more information.
Doctrine DBAL Configuration

However that link to the Doctrine docs is obsolete, and that setting has been deprecated since 2.4 (we're on 2.9 now):

UPGRADE FROM 2.3 to 2.4
=======================

Configuration
--------

 * Setting the `host`, `port`, `user`, `password`, `path`, `dbname`, `unix_socket`
   or `memory` configuration options while the `url` one is set has been deprecated.
 * The `override_url` configuration option has been deprecated.
DoctrineBundle/UPGRADE-2.4.md

OK. Great. Thanks for that. I must provide a DATABASE_URL. But any explicitly-set overrides I set in the actual connection config are ignored. And the functionality that used to permit me to actively say "FFS will you just do what yer told" (ie: override_url) has been deprecated. And the URL approach doesn't actually support the options I want to use, so it's completely useless to me. This seems pretty mickey mouse to me, from where I'm standing. But all right then; I'll play yer silly game.

After some trial end error (because there's no documentation that I can find), I came up with this (in config/packages/doctrine.yaml):

doctrine:
    dbal:
        default_connection: default
        connections:
            default:
                wrapper_class: Doctrine\DBAL\Connections\PrimaryReadReplicaConnection
                dbname: '%env(resolve:POSTGRES_PRIMARY_DB)%'
                host: '%env(resolve:POSTGRES_PRIMARY_HOST)%'
                port: '%env(resolve:POSTGRES_PRIMARY_PORT)%'
                user: '%env(resolve:POSTGRES_PRIMARY_USER)%'
                password: '%env(resolve:POSTGRES_PRIMARY_PASSWORD)%'
                driver: pdo_pgsql
                server_version: 15
                charset: utf8
                replicas:
                    replica1:
                        dbname: '%env(resolve:POSTGRES_REPLICA_DB)%'
                        host: '%env(resolve:POSTGRES_REPLICA_HOST)%'
                        port: '%env(resolve:POSTGRES_REPLICA_PORT)%'
                        user: '%env(resolve:POSTGRES_REPLICA_USER)%'
                        password: '%env(resolve:POSTGRES_REPLICA_PASSWORD)%'
                        charset: utf8


I have used the primary values in DATABASE_URL (in docker/php/envVars.public):

# …

DATABASE_URL="postgresql://${POSTGRES_PRIMARY_USER}:${POSTGRES_PRIMARY_PASSWORD}@${POSTGRES_PRIMARY_HOST}:${POSTGRES_PRIMARY_PORT}/${POSTGRES_PRIMARY_DB}?serverVersion=15&charset=utf8"

To test that this actually works, and it doesn't just send everything to the primary, I have set up two completely separate DBs, with different credentials (and host, port, and even database name). These are separate PostgreSQL containers. I'll push the config to Github, but I won't repeat it here as it's really just a duplication of what I already had in this codebase. But here's links to the relevant bits of various files:

It's important to note that in this experimentation the two DBs are completely separate, and there's no replication going on. I'm just testing the connection config is correctly switching between primary and replica, and it's easier to do when the databases have different data in them.

# primary.test
"id"    "value"
101    "Test row 1"
102    "Test row 2"
104    "PRIMARY"
105    "TEST_VALUE"
106    "TEST_VALUE"

(You can see I've already been running some tests there)

# replica.test
101    "Test row 1"
102    "Test row 2"
103    "REPLICA"
/** @testdox Writing to primary definitely does not impact the replica */
public function testWritingToPrimaryDoesNotImpactReplica()
{
    $sqlForCount = "SELECT COUNT(1) AS count FROM test";

    $primaryConnection = $this->getPrimaryConnection();
    $replicaConnection = $this->getReplicaConnection();

    $initialPrimaryCount = $primaryConnection->executeQuery($sqlForCount)->fetchOne();
    $initialReplicaCount = $replicaConnection->executeQuery($sqlForCount)->fetchOne();
    $this->assertNotEquals(
        $initialPrimaryCount,
        $initialReplicaCount,
        "Test aborted as the test requires the DBs to NOT be in sync (and they are)"
    );

    $defaultConnection = $this->getDefaultConnection();
    $this->assertFalse($defaultConnection->isConnectedToPrimary(), "Connection did not start on replica");

    $initialDefaultCount = $defaultConnection->executeQuery($sqlForCount)->fetchOne();
    $this->assertEquals($initialDefaultCount, $initialReplicaCount, "Row count from default should match replica");

    $defaultConnection->executeStatement("INSERT INTO test (value) VALUES (?)", ["TEST_VALUE"]);
    $this->assertTrue(
        $defaultConnection->isConnectedToPrimary(),
        "Connection did not switch to primary after INSERT"
    );

    $countFromDefault = $defaultConnection->executeQuery($sqlForCount)->fetchOne();
    $countFromPrimary = $primaryConnection->executeQuery($sqlForCount)->fetchOne();
    $countFromReplica = $replicaConnection->executeQuery($sqlForCount)->fetchOne();

    $this->assertEquals($countFromDefault, $countFromPrimary, "Row count from default should match primary");
    $this->assertEquals($initialReplicaCount, $countFromReplica, "Row count from replica should not have changed");
}

Here:

  • I'm creating three connections, one each that directly queries the primary and replica DBs respectively, and one that is the app's default connection (which is the PrimaryReadReplicaConnection one).
  • I get row counts from each, making sure they start off differently (otherwise it'll be harder to see the difference, further down).
  • I then check the row count from the default connection matches the replica (it should start using the replica connection.
  • Then I insert a row using the default connection. This has a dual effect:
  • The default connection should now be pointing to the primary database.
  • So a row count from that connection should match the primary now.
  • And the replica count should be unchanged.

And the helper functions:

private function getPrimaryConnection(): Connection
{
    return DriverManager::getConnection([
        "dbname" => getenv("POSTGRES_PRIMARY_DB"),
        "user" => getenv("POSTGRES_PRIMARY_USER"),
        "password" => getenv("POSTGRES_PRIMARY_PASSWORD"),
        "host" => getenv("POSTGRES_PRIMARY_HOST"),
        "port" => getenv("POSTGRES_PRIMARY_PORT"),
        "driver" => "pdo_pgsql"
    ]);
}

private function getReplicaConnection(): Connection
{
    return DriverManager::getConnection([
        "dbname" => getenv("POSTGRES_REPLICA_DB"),
        "user" => getenv("POSTGRES_REPLICA_USER"),
        "password" => getenv("POSTGRES_REPLICA_PASSWORD"),
        "host" => getenv("POSTGRES_REPLICA_HOST"),
        "port" => getenv("POSTGRES_REPLICA_PORT"),
        "driver" => "pdo_pgsql"
    ]);
}

public function getDefaultConnection(): PrimaryReadReplicaConnection
{
    $kernel = new Kernel("test", true);
    $kernel->boot();

    $container = $kernel->getContainer();

    return $container->get("doctrine.dbal.default_connection");
}

And that all worked, so that's good.

I'm gonna tag all this in GitHub as 1.6, but I'm gonna roll-back the two different DBs and just point the replica to the same DB as the primary from now on. I want the PrimaryReadReplicaConnection connection in place in this project, but I don't wanna horse around keeping throw-away DBs actually in sync. That version is tagged as 1.6.1.

And now tomorrow I can get on with what I actually wanted to be writing about today :-|

Righto.

--
Adam