Thursday, 31 July 2025

Symfony forms

G'day:

The next thing on my "things I ought to know about already" todo list is Symfony forms. I'm aware they exist as a thing as I've seen some code using them, but don't know much beyond them being a thing. I actually used the Symfony form lib to create the UI for "Integrating Elasticsearch into a Symfony app", but it was very superficial and I waved-away any discussion on it as I knew this article here was coming up. Plus it was off-topic in that article. I got Github Copilot to write all the code for the forms too, so I am really none-the-wiser as to how it all works.

However this is about to change. I have a baseline Symfony-driven project in Github, and I have opened the docs page for Symfony Forms. I've breezed through about the first half of the first page, but have rewound and decided to write this article as I go.

Installation:

docker exec php composer require symfony/form:7.3.*

Installation feedback ensued, and it looks like it worked.

My first experiment is going to be to create a sign-up form: a new user enters their name (given & family parts), email, login ID and password, and we'll save that lot. Conceits to deal with here:

  • The login ID should default to the lowercased & dot-separated full name, eg: "Adam Cameron" => "adam.cameron". Although that's just a suggestion and they can change it to whatevs. This is just client-side stuff but I want to see how I can integrate the JS event handlers.
  • The email address should be validated for well-formedness (via some mechanism I don't need to provide).
  • The password data-entry control should be the usual "password" and "confirm password" inputs.
  • The password should be validated - both client- and server-side - for strength.

First I'm going to need a backing entity for this. I've used the entity maker for this, for which I first needed to install the Symfony Maker Bundle. Then it's easy:

docker exec php composer require --dev symfony/maker-bundle:^1
[installation stuff]

docker exec -it php bin/console make:entity


 Class name of the entity to create or update (e.g. GentlePizza):
 > User

 created: src/Entity/User.php
 created: src/Repository/UserRepository.php

 Entity generated! Now let's add some fields!
 You can always add more fields later manually or by re-running this command.

 New property name (press <return> to stop adding fields):
 > givenName

 Field type (enter ? to see all types) [string]:
 >

 Field length [255]:
 >

 Can this field be null in the database (nullable) (yes/no) [no]:
 >

 updated: src/Entity/User.php

 Add another property? Enter the property name (or press <return> to stop adding fields):
 
 [you get the idea]

That results in this user entity:

// src/Entity/User.php

namespace App\Entity;

use App\Repository\UserRepository;
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity(repositoryClass: UserRepository::class)]
class User
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column]
    private ?int $id = null;

    #[ORM\Column(length: 255)]
    private ?string $givenName = null;

    #[ORM\Column(length: 255)]
    private ?string $familyName = null;

    #[ORM\Column(length: 255)]
    private ?string $email = null;

    #[ORM\Column(length: 255)]
    private ?string $loginId = null;

    #[ORM\Column(length: 255)]
    private ?string $password = null;

    // [...]
}

Now we can build the form for this entity. There's two options: do it direct in the controller (wrong place for it), or as a class (which is what I'll do). I see there is a bin/console make:registration-form script, which - having asked Copilot to clarify, because the docs don't - is pretty much what I want as I am indeed creating a user registration form. I'll give it a go to at least see what it scaffolds/generates.

docker exec -it php bin/console make:registration-form
[fairly poor experience ensues]

OK, that was less good than it could be. It asked for my entity (required me to implement Symfony\Component\Security\Core\User\UserInterface first), asked a coupla things about sending emails and stuff (over-reach for a form builder), and then triumphantly came up with this form:

Forget the lack of styling - that's on me - but WTF has that got to do with the entity I gave it?

[DEL] [DEL] [DEL] [DEL]

Right, that's that lot gone. Not being one to learn from past experience, there's a second option: docker exec -it php bin/console make:form. I'll try that. I have my [DEL] key primed just in case…

$ docker exec -it php bin/console make:form

 The name of the form class (e.g. FierceChefType):
 > UserType

 The name of Entity or fully qualified model class name that the new form will be bound to (empty for none):
 > App\Entity\User


 [ERROR] Entity "App\Entity\User" doesn't exist; please enter an existing one or create a new one.


 The name of Entity or fully qualified model class name that the new form will be bound to (empty for none):
 > User

 created: src/Form/UserType.php


  Success!


 Next: Add fields to your form and start using it.
 Find the documentation at https://symfony.com/doc/current/forms.html

The only glitch there was me entering the fully-qualified name of the entity, not just its class name. And the results:

class UserType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder
            ->add('givenName')
            ->add('familyName')
            ->add('email')
            ->add('loginId')
            ->add('password')
        ;
    }

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'data_class' => User::class,
        ]);
    }
}

/me stops hovering over the [DEL] key

OK that's more like it: focusing on the job at hand. But obvs it needs some work. Indeed here's the one Copilot made for me before I noticed these build wizards:

// src/Form/Type/UserType.php

class UserType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder
            ->add('givenName', TextType::class)
            ->add('familyName', TextType::class)
            ->add('email', EmailType::class)
            ->add('loginId', TextType::class)
            ->add('password', PasswordType::class)
        ;
    }

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'data_class' => User::class,
        ]);
    }
}

Just a bit more thorough.

I did some back-and-forth with Copilot to tweak some rules and some UI behaviour - and even got it to write some CSS (new-user.css) for me - and we ended-up with this:

public function buildForm(FormBuilderInterface $builder, array $options): void
{
    $builder
        ->add('givenName', TextType::class)
        ->add('familyName', TextType::class)
        ->add('email', EmailType::class)
        ->add('loginId', TextType::class, [
            'label' => 'Login ID',
        ])
        ->add('password', RepeatedType::class, [
            'type' => PasswordType::class,
            'first_options'  => [
                'label' => 'Password',
                'constraints' => [
                    new Assert\Regex([
                        'pattern' => '/^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$/',
                        'message' => 'Password must be at least 8 characters long and include an uppercase letter, a lowercase letter, and a number.',
                    ]),
                ],
            ],
            'second_options' => ['label' => 'Confirm Password'],
            'invalid_message' => 'The password fields must match.',
        ])
    ;
}
  • "Login ID" was rendering as "Login id", without a bit of guidance.
  • There's no canned password strength validation rules, so regex it is.
  • And the RepeatedType with first_options / second_options is how to do the password confirmation logic.
  • (Not seen here) the code for defaulting the login ID to [givenName].[familyName] needed to be done in JS (new-user.js), which I got Copilot to knock together, and I didn't really pay attention to it as it's nothing to do with the Symfony forms stuff. It works though.

The controller for this is thus:

class UserController extends AbstractController
{
    #[Route('/user/new', name: 'user_new')]
    public function new(Request $request, EntityManagerInterface $em): Response
    {
        $user = new User();
        $form = $this->createForm(UserType::class, $user);

        $form->handleRequest($request);

        if ($form->isSubmitted() && $form->isValid()) {
            $em->persist($user);
            $em->flush();

            return $this->redirectToRoute(
                'user_success',
                ['id' => $user->getId()]
            );
        }

        return $this->render('user/new.html.twig', [
            'form' => $form->createView(),
        ]);
    }

    #[Route('/user/success/{id}', name: 'user_success')]
    public function userSuccess(User $user): Response
    {
        return $this->render('user/success.html.twig', [
            'user' => $user,
        ]);
    }
}

Mostly boilerplate. The key form bits are highlighted, and self-explanatory. It's interesting how Symfony is able to infer whether we're dealing with the initial GET or the ensuing POST from the form object. There's some docs which are worth reading: Processing Forms.
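As I understand it (this is a simplified sketch, not the actual Symfony internals), handleRequest only flags the form as submitted when the request method matches the form's configured method (POST by default) and the form's name turns up in the request data:

// Very roughly what handleRequest() does under the hood (simplified sketch)
if ($request->getMethod() === $form->getConfig()->getMethod()
    && $request->request->has($form->getName())
) {
    $form->submit($request->request->all($form->getName()));
}
// otherwise isSubmitted() stays false, and we just render the form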

I do however wish Symfony's default approach was not to roll the GET and POST handling into the same controller method. They're two different requests, with two different jobs. It strikes me as being poor design to implement things this way.

I quizzed Copilot about this, and we(*) were able to separate out the two concerns quite nicely:

// src/Controller/UserController.php

#[Route('/user/new', name: 'user_new', methods: ['GET'])]
public function showNewUserForm(): Response
{
    $form = $this->createForm(UserType::class, new User());
    return $this->render('user/new.html.twig', [
        'form' => $form->createView(),
    ]);
}

#[Route('/user/new', name: 'user_new_post', methods: ['POST'])]
public function processNewUser(Request $request, EntityManagerInterface $em): Response
{
    $user = new User();
    $form = $this->createForm(UserType::class, $user);
    $form->handleRequest($request);

    if ($form->isSubmitted() && $form->isValid()) {
        $em->persist($user);
        $em->flush();
        return $this->redirectToRoute('user_success', ['id' => $user->getId()]);
    }

    return $this->render('user/new.html.twig', [
        'form' => $form->createView(),
    ]);
}

This is way better, and I'm gonna stick with this. Looking at it, all Symfony gains by munging the two methods together is saving two statements from being repeated. That is not worth rolling the logic into one method that now does two things.

This is perhaps a timely reminder that DRY does not mean "don't repeat code", it means "don't implement the same concept more than once". I write about this in "DRY: don't repeat yourself". The Symfony approach misunderstands DRY here, I think.

OK so the controller method is no use without a view, so here's the twig. Well. Let's back up. This is what the twig was initially:

{% extends 'base.html.twig' %}

{% block title %}New User{% endblock %}

{% block body %}
    <h1>Create New User</h1>
    {{ form_start(form) }}
        {{ form_widget(form) }}
        <button class="btn btn-primary">Submit</button>
    {{ form_end(form) }}
{% endblock %}

This was functional, and the happy path even looked OK thanks to the CSS that Copilot wrote:

(My bar for "looks OK" is very low, I admit this).

However validation errors were not rendering well, so I did a bunch of back-and-forth with Copilot to get the mark-up for the form workable with CSS to dolly things up a bit. We ended up having to expand-out all the fields:

{# templates/user/new.html.twig #}

{% extends 'base.html.twig' %}

{% block stylesheets %}
    {{ parent() }}
    <link rel="stylesheet" href="{{ asset('css/new-user.css') }}">
{% endblock %}

{% block javascripts %}
    {{ parent() }}
    <script src="{{ asset('js/new-user.js') }}"></script>
{% endblock %}

{% block title %}New User{% endblock %}

{% block body %}
    <h1>Create New User</h1>
    {{ form_start(form) }}
    <div>
        {{ form_label(form.givenName) }}
        {{ form_widget(form.givenName) }}
        {{ form_errors(form.givenName) }}
    </div>
    <div>
        {{ form_label(form.familyName) }}
        {{ form_widget(form.familyName) }}
        {{ form_errors(form.familyName) }}
    </div>
    <div>
        {{ form_label(form.email) }}
        {{ form_widget(form.email) }}
        {{ form_errors(form.email) }}
    </div>
    <div>
        {{ form_label(form.loginId) }}
        {{ form_widget(form.loginId) }}
        {{ form_errors(form.loginId) }}
    </div>
    <div>
        {{ form_label(form.password.first) }}
        <div class="field-input">
            {{ form_widget(form.password.first) }}
            {{ form_errors(form.password.first) }}
        </div>
    </div>
    <div>
        {{ form_label(form.password.second) }}
        {{ form_widget(form.password.second) }}<br>
        {{ form_errors(form.password.second) }}
        {{ form_errors(form.password) }}
    </div>
    <button class="btn btn-primary">Submit</button>
    {{ form_end(form) }}
{% endblock %}

That's fine: it's simple enough.

Oh I like the way JS and CSS are handled here: this code hoists them up into the head block for me.

And I also need a place to land after a successful submission:

{# templates/user/success.html.twig #}

{% extends 'base.html.twig' %}

{% block body %}
    <h1>Thank you, {{ user.givenName }} {{ user.familyName }}!</h1>
    <p>Your account has been created successfully.</p>
{% endblock %}

And this all works! Hurrah. Well: once I added the DB table it did anyhow:

# docker/mariadb/docker-entrypoint-initdb.d/1.createTables.sql

USE db1;

CREATE TABLE user (
    id INT AUTO_INCREMENT PRIMARY KEY,
    given_name VARCHAR(255) NOT NULL,
    family_name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    login_id VARCHAR(255) NOT NULL,
    password VARCHAR(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I'm not using Doctrine Migrations for this as they can get tae hell (see PHP / Symfony: working through "Symfony: The Fast Track", part 4: not really about Symfony, this one › "Doctrine: know your limits!"(*)). But slinging this in /docker-entrypoint-initdb.d in the MariaDB container file system will ensure it gets recreated whenever I rebuild the DB. Which is fine for dev.

I can create a new user now.

However I need some server-side validation. We can't be having a new user reusing the same Login ID as an existing user, so we need to stop that.

Oh for goodness sake, doing this is just a one line addition to the User entity:

#[ORM\Entity(repositoryClass: UserRepository::class)]
#[UniqueEntity(fields: ['loginId'], message: 'This login ID is already in use.')]
class User
{

This is brilliant, but no bloody use as an example. So let's pretend it's not that easy and we have to create a custom validator for this.

First we need a new constraint class:

// src/Validator/UniqueLoginId.php

namespace App\Validator;

use Symfony\Component\Validator\Constraint;

#[\Attribute]
class UniqueLoginId extends Constraint
{
    public string $message = 'This login ID is already in use.';
}

And a validator thereof:

// src/Validator/UniqueLoginIdValidator.php

namespace App\Validator;

use Symfony\Component\Validator\Constraint;
use Symfony\Component\Validator\ConstraintValidator;
use Doctrine\ORM\EntityManagerInterface;
use App\Entity\User;

class UniqueLoginIdValidator extends ConstraintValidator
{
    private EntityManagerInterface $em;

    public function __construct(EntityManagerInterface $em)
    {
        $this->em = $em;
    }

    public function validate($value, Constraint $constraint)
    {
        if (!$value) {
            return;
        }

        $existing = $this->em->getRepository(User::class)->findOneBy(['loginId' => $value]);
        if ($existing) {
            $this->context->buildViolation($constraint->message)->addViolation();
        }
    }
}

This needs to be configured in services.yaml:

# config/services.yaml

services:
    # [...]

    App\Validator\UniqueLoginIdValidator:
      arguments:
        - '@doctrine.orm.entity_manager'
      tags: [ 'validator.constraint_validator' ]

And then applied to the form field:

// src/Form/Type/UserType.php

class UserType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options): void
    {
        $builder
            ->add('givenName', TextType::class)
            ->add('familyName', TextType::class)
            ->add('email', EmailType::class)
            ->add('loginId', TextType::class, [
                'label' => 'Login ID',
                'constraints' => [
                    new UniqueLoginId(),
                ],
            ])

And that's it: pretty easy (not quite as easy as the one-liner, but still)! There's some docs to read: Constraints At Field Level.
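As an aside: because the constraint class is flagged as #[\Attribute], it should also be applicable directly on the entity property instead of at the form-field level (a sketch only - I've not tested this variant):

// src/Entity/User.php (sketch: the custom constraint as a property attribute)

use App\Validator\UniqueLoginId;

#[ORM\Column(length: 255)]
#[UniqueLoginId]
private ?string $loginId = null;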

I was surprised I had to manually wire the validator into services.yaml: I've been spoilt recently with the AsDoctrineListener and AsMessageHandler directly on the classes/methods that define them, and Symfony's autowiring picks them up automatically.

I'm also a bit bemused as to why the validation system needs two files: a constraint class and a validator class. This is not clearly explained in the docs that I could see. As far as I can gather the constraint class defines the rules that would mean an object is valid; and the validator actually checks them against an object. I've read how this is a separation of concerns, but I am not entirely convinced we have two separate concerns here to be separated. In the example we have here, it's the validator that is defining the rule, and doing the validation:

public function validate($value, Constraint $constraint)
{
    if (!$value) {
        return;
    }

    $existing = $this->em->getRepository(User::class)->findOneBy(['loginId' => $value]);
    if ($existing) {
        $this->context->buildViolation($constraint->message)->addViolation();
    }
}

One thing I did learn - from Copilot - is that the relationship between the class names - [Constraint] and [Constraint]Validator - is just a convention, and there does not need to be that name-mapping going on. It's the default behaviour of Constraint::validatedBy. I guess in theory one could have one validator class for a suite of same-themed constraints.
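Eg, something like this (a hypothetical sketch - SharedUniquenessValidator doesn't exist in my codebase):

// Sketch: pointing a constraint at a non-default validator class by
// overriding Constraint::validatedBy()
#[\Attribute]
class UniqueLoginId extends Constraint
{
    public string $message = 'This login ID is already in use.';

    public function validatedBy(): string
    {
        return SharedUniquenessValidator::class; // hypothetical shared validator
    }
}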

I'm not convinced though.

But hey, it's easy and it works well! This is the main practical thing here.

I've breezed down the rest of the docs, and there's a few other interesting things, but nothing that needs looking at right now. So I'll leave it here.

Righto.

--
Adam

(*) Copilot did it all. I just tested it.

Tuesday, 29 July 2025

Quick look at event-driven programming in Symfony

G'day:

First up: this is not a very comprehensive look at this functionality. I just wanted to dip into it to see how it could work, and contextualise the docs.

I had a look at Symfony's messaging system in Using Symfony's Messaging system to separate the request for work to be done and doing the actual work, which leverages Doctrine's entity events to send messages to other PHP apps via Symfony's messaging and RabbitMQ as the message bus's transport layer.

This time around I want to look at more generic events: not just stuff that Doctrine fires on entity activity, but on ad-hoc events I fire in my own code. This again is in an effort to get application code to focus on the one immediate job at hand, and easily delegate multiple-step processing to other systems / handlers.

Symfony's event handling is done via its EventDispatcher Component.

Installation is easy:

docker exec php-web composer require symfony/event-dispatcher:7.3.*

There is no other config. One just needs the library installed.

Symfony's DI container now has an EventDispatcher ready to be wired in to one's code, eg:

// src/Controller/StudentController.php

class StudentController extends AbstractController
{
    public function __construct(
        private readonly EventDispatcherInterface $eventDispatcher
    ) {
    }

Once one has that, one can dispatch events with it:

// src/Controller/StudentController.php

#[Route('/courses/{id}/students/add', name: 'student_add')]
public function add(/* [...] */): Response {

    // [...]

    if ($form->isSubmitted() && $form->isValid()) {
        // [...]

        $this->eventDispatcher->dispatch(new StudentRequestEvent($request, $student));

        return $this->redirectToRoute('course_view', ['id' => $course->getId()]);
    }
    // [...]

The object one passes to the dispatcher is arbitrary, and used for a) getting some data to the handler; b) type-checking which event a handler is for:

// src/Event/StudentRequestEvent.php

class StudentRequestEvent
{
    public function __construct(
        private readonly Request $request,
        private readonly Student $student
    ) { }

    public function getRequest(): Request
    {
        return $this->request;
    }

    public function getStudent(): Student
    {
        return $this->student;
    }
}

One can see here how listeners get matched to events: Symfony looks for handler methods that type-hint the specific event class being dispatched:

// src/EventListener/StudentProfileChangeListener.php
class StudentProfileChangeListener
{
    public function __construct(
        private readonly LoggerInterface $eventsLogger
    )
    { }

    #[AsEventListener()]
    public function validateProfileChange(StudentRequestEvent $event): void
    {
        $request = $event->getRequest();
        $route = $request->attributes->get('_route');

        if (!in_array($route, ['student_add', 'student_edit'])) {
            $this->eventsLogger->notice("validateProfileChange skipped for route [$route]");
            return;
        }
        $student = $event->getStudent();
        $this->eventsLogger->info(
            sprintf('Validate profile change: %s', $student->getFullName()),
            ['student' => $student]
        );
    }
}

The important bits are the #[AsEventListener()] attribute on the method, and that the method expects a StudentRequestEvent.
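The attribute can also take arguments - eg a priority, if the order multiple listeners run in matters (a hedged sketch; I'm not using this here):

// Sketch: higher priority listeners run earlier (the default is 0)
#[AsEventListener(priority: 10)]
public function validateProfileChange(StudentRequestEvent $event): void
{
    // [...]
}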

Here the "validateProfileChange" handling is just faked: I'm logging some telemetry so I can see what happens when the events fire & get handled.

In this case I have this event being fired in each of the add / edit / delete controller methods, and have the handler above listening for student_add and student_edit events; and another sendStudentWelcomePack handler which only listens for student_add (the code is much the same, so I won't repeat it). student_delete does not have anything handling it. Well: the handlers fire, but they exit early.

If I add / edit / delete a student, we can see the log entries coming through, indicating which handlers fired, etc:

# after I submit the add form
[2025-07-29T15:53:26.607638+01:00] events.INFO: Validate profile change: Jillian Marmoset {"student":{"App\\Entity\\Student":{"email":"jillian.marmoset@example.com","fullName":"Jillian Marmoset","dateOfBirth":"2011-03-24","gender":"female","enrolmentYear":2016,"status":"Active"}}} []
[2025-07-29T15:53:26.607994+01:00] events.INFO: Send Welcome Pack to Jillian Marmoset {"student":{"App\\Entity\\Student":{"email":"jillian.marmoset@example.com","fullName":"Jillian Marmoset","dateOfBirth":"2011-03-24","gender":"female","enrolmentYear":2016,"status":"Active"}}} []



# after I submit the edit form
[2025-07-29T15:53:53.306371+01:00] events.INFO: Validate profile change: Gillian Marmoset {"student":{"App\\Entity\\Student":{"email":"gillian.marmoset@example.com","fullName":"Gillian Marmoset","dateOfBirth":"2011-03-24","gender":"female","enrolmentYear":2016,"status":"Active"}}} []
[2025-07-29T15:53:53.306733+01:00] events.NOTICE: sendStudentWelcomePack skipped for route [student_edit] [] []



# after I confirm the delete
[2025-07-29T15:54:06.827831+01:00] events.NOTICE: validateProfileChange skipped for route [student_delete] [] []
[2025-07-29T15:54:06.828087+01:00] events.NOTICE: sendStudentWelcomePack skipped for route [student_delete] [] []

Those logs are pointless, but in the real world, if an event is dispatched, any number of handlers can listen for it. And the controller method that dispatches the event doesn't need to know about any of them. We can also leverage the async message bus in the event handlers too, to farm processing off to completely different app instances. I can see how this will be very useful in the future…
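To illustrate that last point, a handler could just lob a Message onto the bus and return (a hedged sketch: SendWelcomePackMessage and the injected MessageBusInterface $bus are hypothetical here):

// Sketch: the listener queues the heavy work instead of doing it in-process
#[AsEventListener]
public function sendStudentWelcomePack(StudentRequestEvent $event): void
{
    $this->bus->dispatch(
        new SendWelcomePackMessage($event->getStudent()->getId())
    );
}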

As all this was far easier than I expected, this is a pretty short article. But I know how Symfony events can facilitate more event-driven programming now, and help keep my code more on-point, simple, and tidy.

Righto.

--
Adam

Shared/distributed locks in PHP with Symfony locking

G'day:

This is another one of these issues that got into my mind at some point, and I never had a chance to look into it until now.

Previously I was working on a CFML app, which had code that could not be run at the same time as other code, so handled this with CFML's native <cflock> mechanism. This works fine on a single CFML server, but is no good when there's more than one server running the application. We never had to solve this issue during my time working on that project, but the question stuck with me.

I don't give a rat's arse about solving this with CFML; but I can foresee it being a "good to know" thing in the PHP space. Turns out it's actually very bloody easy.

Symfony provides a locking component: The Lock Component. And the docs are pretty straightforward. It's installed as one might predict:

composer require symfony/lock:7.3.*

It creates config/packages/lock.yaml:

framework:
    lock: '%env(LOCK_DSN)%'

And that had me looking for where LOCK_DSN was set, and what it needed to be. This led me to the Available Stores bit of the docs I linked to above, which listed a bunch of underlying storage mechanisms for the locks. Each had features and limitations, which got me thinking about what I am trying to test here. Basically two things:

  • The locks needed to be respected across different PHP containers.
  • I needed to be able to create blocking locks.

On that second point: the Lock Component's default behaviour is to try to acquire a lock, and respond immediately one way or the other (yep you got the lock; nope you didn't get that lock). This is cool a lot of the time; but sometimes I can see wanting to wait until [whatever] has finished with the lock, and then grab the lock and crack on with some other stuff.
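Ie, the default non-blocking behaviour looks something like this (a minimal sketch, assuming $factory is a LockFactory as set up below):

// Default non-blocking acquire: returns true/false immediately
$lock = $factory->createLock('some-job');

if (!$lock->acquire()) {
    // nope: someone else has it; bail out (or retry later)
    return;
}

try {
    // yep: got it; do the work
} finally {
    $lock->release();
}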

The only option that supported both remote & blocking locks was a PostgreSQL solution, but I'm fucked if I'm gonna install PostgreSQL just to solve a locking challenge. I looked at some other solutions, and the "flock"-based solution would work for me. Despite it not being remote-capable, it stores its locking metadata on the file system; and I could easily use a mounted volume in my docker containers to have multiple PHP containers using the same directory. For my immediate purposes this is fine. If I need multiple app containers spread across multiple host machines, I'll look into other solutions.

So the answer for where LOCK_DSN is set is: in .env:

# stick stuff in here that all envs need

LOCK_DSN=flock
MESSENGER_TRANSPORT_DSN=amqp://guest:guest@host.docker.internal:5672/%2f/messages

And that's all the DSN needs to have as a value when using Flock.

The code for initialising a Flock lock is thus:

$store = new FlockStore('[some directory to use for lock metadata]');
$factory = new LockFactory($store);

So before going any further I need that shared directory set up:

# docker/php/envVars.public

APP_ENV=dev
APP_CACHE_DIR=/var/cache/symfony
APP_LOCK_DIR=/tmp/symfony/lock
APP_LOG_DIR=/var/log/symfony
COMPOSER_CACHE_DIR=/tmp/composer-cache
PHPUNIT_CACHE_RESULT_FILE=0

# docker/php/Dockerfile

# [...]

# need to use 777 as both php-fpm and php-cli will write to these directories
RUN mkdir -p /var/cache/symfony && chown www-data:www-data /var/cache/symfony && chmod 777 /var/cache/symfony
RUN mkdir -p /var/cache/symfony/dev && chown www-data:www-data /var/cache/symfony/dev && chmod 777 /var/cache/symfony/dev
RUN mkdir -p /var/log/symfony && chown www-data:www-data /var/log/symfony && chmod 777 /var/log/symfony
RUN mkdir -p /tmp/symfony/lock && chown www-data:www-data /tmp/symfony/lock && chmod 777 /tmp/symfony/lock

# [...]

# docker/docker-compose.yml

services:
  # [...]

  php-web:
    # [...]

    volumes:
      - ..:/var/www
      - /var/log/symfony:/var/log/symfony
      - /tmp/symfony/lock:/tmp/symfony/lock

    # [...]

  php-worker:
    # [...]

    volumes:
      - ..:/var/www
      - /var/log/symfony:/var/log/symfony
      - /tmp/symfony/lock:/tmp/symfony/lock

    # [...]

Now I can wire-up the services:

# config/services.yaml

parameters:

services:
  # [...]

  Symfony\Component\Lock\Store\FlockStore:
    arguments:
      - '%env(APP_LOCK_DIR)%'

  Symfony\Component\Lock\LockFactory:
    arguments:
      - '@Symfony\Component\Lock\Store\FlockStore'

# [...]

Oh and I need a logger for this too:

# config/packages/monolog.yaml

monolog:
  channels:
    # [...]
    - locking

  handlers:
    # [...]

    locking:
      type: stream
      path: '%kernel.logs_dir%/locking.log'
      level: debug
      channels: ['locking']

Now I'm gonna create a web endpoint that creates a lock around some long-running code, logging as I go:

# src/Controller/LockController.php

namespace App\Controller;

use Psr\Log\LoggerInterface;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Routing\Attribute\Route;

#[Route('/lock', name: 'app_lock')]
class LockController extends AbstractController
{
    public function __construct(
        private readonly LockFactory $lockFactory,
        private readonly LoggerInterface $lockingLogger
    )
    {
    }

    #[Route('/long', name: 'app_lock_long')]
    public function longLock(): Response
    {
        $this->lockingLogger->info('web_lock_long: started');

        $lock = $this->lockFactory->createLock('long_lock', 30);
        $this->lockingLogger->info('web_lock_long: lock created');

        if ($lock->acquire(true)) {
            $this->lockingLogger->info('web_lock_long: lock acquired');
            sleep(20); // Simulate a long-running process
            $this->lockingLogger->info('web_lock_long: processing done, releasing lock');
            $lock->release();
        } else {
            $this->lockingLogger->warning('web_lock_long: could not acquire lock');
        }

        return new Response('Lock operation completed.');
    }
}

Most of that is boilerplate and logging. The short version is:

$lock = $this->lockFactory->createLock('long_lock', 30); // name, TTL

if ($lock->acquire(true)) { // true makes it a blocking lock
    // do stuff
    $lock->release();
} else {
    // didn't get the lock
}
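One thing worth noting on that TTL (a hedged aside, not something I needed here): if the work might outlive the TTL, the lock can be refreshed part-way through rather than just bumping the TTL right up:

$lock = $this->lockFactory->createLock('long_lock', 30);

if ($lock->acquire(true)) {
    // ... first chunk of work ...
    $lock->refresh(); // resets the 30sec TTL
    // ... second chunk ...
    $lock->release();
}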

I'll run this on the php-web container, eg: http://localhost:8080/lock/long. It'll get a lock, and then sit around doing nothing for 20sec. Logging all the way.

I have created an equivalent command for php-worker to run via the CLI. It's analogous to the controller method, logic-wise:

// src/Command/LongLockCommand.php

namespace App\Command;

use Psr\Log\LoggerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Lock\LockFactory;

#[AsCommand(
    name: 'app:long-lock',
    description: 'Acquires a long lock for testing lock contention.'
)]
class LongLockCommand extends Command
{
    public function __construct(
        private readonly LockFactory $lockFactory,
        private readonly LoggerInterface $lockingLogger
    ) {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $this->lockingLogger->info('command_lock_long: started');

        $lock = $this->lockFactory->createLock('long_lock', 30);
        $this->lockingLogger->info('command_lock_long: lock created');

        if ($lock->acquire(true)) {
            $this->lockingLogger->info('command_lock_long: lock acquired');
            sleep(20); // Simulate a long-running process
            $this->lockingLogger->info('command_lock_long: processing done, releasing lock');
            $lock->release();
        } else {
            $this->lockingLogger->warning('command_lock_long: could not acquire lock');
        }

        $output->writeln('Lock operation completed.');
        return Command::SUCCESS;
    }
}

This is run thus:

docker exec php-worker bin/console app:long-lock

And that's everything. If I hit the URL in a browser, and then a few seconds later call the command, I see this sort of thing in the log file:

tail -f /var/log/symfony/locking.log

[2025-07-29T10:57:41.626263+01:00] locking.INFO: web_lock_long: started [] []
[2025-07-29T10:57:41.626556+01:00] locking.INFO: web_lock_long: lock created [] []
[2025-07-29T10:57:41.626671+01:00] locking.INFO: web_lock_long: lock acquired [] []
[2025-07-29T10:57:47.231988+01:00] locking.INFO: command_lock_long: started [] []
[2025-07-29T10:57:47.233814+01:00] locking.INFO: command_lock_long: lock created [] []
[2025-07-29T10:58:02.398162+01:00] locking.INFO: web_lock_long: processing done, releasing lock [] []
[2025-07-29T10:58:02.398440+01:00] locking.INFO: command_lock_long: lock acquired [] []
[2025-07-29T10:58:23.175676+01:00] locking.INFO: command_lock_long: processing done, releasing lock [] []

Success. We can see that the web request creates and acquires the lock, and the command comes along afterwards and cannot acquire its lock until after the web request releases it.

Job done. This was way easier than I expected, actually.

Righto.

--
Adam

Monday, 28 July 2025

Using RabbitMQ as a transport layer for Symfony Messaging

G'day:

First up, I'm building on the codebase I worked through in the previous few articles.

Reading that lot would be good for context, but in short the first two go over setting up a (very) basic CRUD website that allows the editing of some entities, and on create/update/delete also does the relevant reindexing on Elasticsearch. The third article removes the indexing code from inline, and uses the Symfony Messaging system to dispatch messages ("index this"), and message handlers ("OK, I'll index that"). These were all running in-process during the web request.

The overall object of this exercise is to deal with this notion, when it comes to the Elasticsearch indexing:

[That] overhead is not actually needed for the call to action to be provided[…]. The user should not have to wait whilst the app gets its shit together[.]

When serving a web request, the process should be focusing on just what is necessary to respond with the next page. It should not be doing behind-the-scenes housework too.

Using the messaging system is step one of this - it separates the web request code from the Elasticsearch indexing code - but it's all still running as part of that process. We need to offload the indexing work to another process. This is what we're doing today.


Step 0

Step 0 is: confirm RabbitMQ will work for me here. I only know RabbitMQ exists as a thing. I've never used it, and never actually even read any docs on it. I kinda just assumed that "well it's an industry-standard queuing thingey, and Symfony works with a bunch of stuff out of the box, so it'll probably be fine". I read some stuff on their website (RabbitMQ), and RabbitMQ's Wikipedia page too. Seems legit. And I also checked Symfony for mention of integrating with it, and landed on this: Using RabbitMQ as a Message Broker, and that looked promising.


RabbitMQ Docker container

Something I can do without messing around with any app code or Symfony config is getting a RabbitMQ container up and running (and sitting there doing nothing).

There's an official RabbitMQ image, and the example docker run statements look simple enough:

docker run -d --hostname my-rabbit --name some-rabbit rabbitmq:3

As this is gonna be used internally, no need for running securely or with a non-default user etc, but obviously all those options are catered for too.

I also noted there's a variant of the image that contains a management app, so I decided to run with that. Converting the generic docker run statement to something for my docker-compose.yml file was easy enough:

# docker/docker-compose.yml

services:

  # [...]

  rabbitmq:
    container_name: rabbitmq

    hostname: rabbitmq

    image: rabbitmq:4.1.2-management

    ports:
      - "5672:5672"
      - "15672:15672"

    stdin_open: true
    tty: true

Port 5672 is its standard application comms port; 15672 is for the manager. The hostname is needed for RabbitMQ's internals, and doesn't matter what it is for what I'm doing.

Building that worked fine, and the management UI was also up (login: guest, password: guest):

The only thing I'm going to be using the management UI for is to watch for messages going into the queue, should I need to troubleshoot anything.


Splitting the PHP work: web and worker

The main thing I am trying to achieve here is to lighten the load for the web-app, by farming work off to another "worker". This will be running the same codebase as the web app, but instead of running php-fpm to listen to traffic from a web server, it's going to run [whatever] needs to be run to pull messages off the queue and handle them. The web app puts a message on the queue; the worker app pulls them off the queue and does the processing.

So I need a second PHP container.

My initial naive attempt at doing this was to simply duplicate the entry in docker-compose.yml, and remove the port 9000 port mapping from php-worker as it won't be listening for web requests. This built and ran "fine", except that the entrypoints of both php-web and php-worker conflicted with each other, as they were both trying to do a composer install on the same volume-mounted vendor directory (the physical directory being on my host PC). This screwed both containers.

After a lot of trial and error (mostly error), I came up with a process as follows:

  1. Copy composer.json and composer.lock to /tmp/composer in the image file system, and run composer install during the build phase. This means that processing is already done by the time either container is brought up. For the Dockerfile to be able to do this, I needed to shift the build context in docker-compose.yml to be the app root, so it can see composer.json and composer.lock.
  2. Having those files in /tmp/composer/vendor is no help to anyone, so we need to copy the vendor directory to the app root directory once the container is up.
  3. As both php-web and php-worker need these same (exact same: they're looking at the same location in the host file system) vendor files, we're going to get just php-web to do the file copy, and get php-worker to wait until php-web is done before it comes up.

Here are the code changes, first for php-web:

# docker/docker-compose.yml

services:
  # [...]

  php:
    container_name: php
    build:
      context: php
      dockerfile: Dockerfile

  php-web:
    container_name: php-web
    build:
      context: ..
      dockerfile: docker/php/Dockerfile

    # [...]
    
    entrypoint: ["/usr/local/bin/entrypoint-web.sh"]

  php-worker:
    container_name: php-worker
    build:
      context: ..
      dockerfile: docker/php/Dockerfile

    env_file:
      - mariadb/envVars.public
      - elasticsearch/envVars.public
      - php/envVars.public

    stdin_open: true
    tty: true

    volumes:
      - ..:/var/www

    healthcheck:
      test: ["CMD", "pgrep", "-f", "php bin/console messenger:consume rabbitmq"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

    extra_hosts:
      - host.docker.internal:host-gateway

    secrets:
      - app_secrets

    entrypoint: ["/usr/local/bin/entrypoint-worker.sh"]

    depends_on:
      php-web:
        condition: service_healthy

Notes:

  • Just the naming and build context changes for the original PHP container.
  • Oh and it has its own entrypoint script now.
  • The config for php-worker is much the same as for php-web except:
    • Port 9000 doesn't need to be exposed: it's not going to be serving web requests.
    • It overrides the healthcheck in the Dockerfile with its own check: just that messenger:consume is running.
    • It has a different entrypoint than the web container.

Here's the relevant Dockerfile changes:

# docker/php/Dockerfile

FROM php:8.4.10-fpm-bookworm

RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "zip", "unzip", "git", "vim", "procps"]

# [...]
COPY docker/php/usr/local/etc/php/conf.d/error_reporting.ini /usr/local/etc/php/conf.d/error_reporting.ini
COPY docker/php/usr/local/etc/php/conf.d/app.ini /usr/local/etc/php/conf.d/app.ini

# [...]

RUN pecl install xdebug && docker-php-ext-enable xdebug
COPY docker/php/usr/local/etc/php/conf.d/xdebug.ini /usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini

# [...]

WORKDIR /var/www
ENV COMPOSER_ALLOW_SUPERUSER=1

COPY --chmod=755 usr/local/bin/entrypoint.sh /usr/local/bin/
ENTRYPOINT ["entrypoint.sh"]
WORKDIR /tmp/composer
COPY composer.json composer.lock /tmp/composer/
ENV COMPOSER_ALLOW_SUPERUSER=1
RUN composer install --no-interaction --prefer-dist --no-scripts

# [...]

COPY --chmod=755 docker/php/usr/local/bin/entrypoint-web.sh /usr/local/bin/
COPY --chmod=755 docker/php/usr/local/bin/entrypoint-worker.sh /usr/local/bin/


EXPOSE 9000

And the entry point scripts:

# docker/php/usr/local/bin/entrypoint-web.sh

#!/bin/bash

rm -f /var/www/vendor/up.dat
cp -a /tmp/composer/vendor/. /var/www/vendor/
touch /var/www/vendor/up.dat

exec php-fpm

# docker/php/usr/local/bin/entrypoint-worker.sh

#!/bin/bash

exec php bin/console messenger:consume rabbitmq
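(An aside: for a longer-lived deployment I'd probably give the consumer some restart hygiene via the options messenger:consume supports, eg - hedged, I'm not using these here:

exec php bin/console messenger:consume rabbitmq --time-limit=3600 --memory-limit=128M

That way the worker recycles itself periodically rather than running forever.)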

That up.dat is checked in php-web's healthcheck now:

# bin/healthCheck.php

if (file_exists('/var/www/vendor/up.dat')) {
    echo 'pong';
}

This means it won't claim to be up until it's finished copying the vendor files, and php-worker won't come up until php-web is healthy.

I'm pretty sure those're all the PHP changes. I now have two PHP containers running: one handling web, the other handling messages.


Symfony config

The Messenger: Sync & Queued Message Handling docs explained all this pretty clearly.

I needed to install symfony/amqp-messenger, and for that to work I also needed to install the ext-amqp PHP extension. This needed some tweaks in the Dockerfile:

# docker/php/Dockerfile

# [...]

RUN [ \
    "apt-get", "install", "-y",  \
    "libz-dev", \
    "libzip-dev", \
    "libfcgi0ldbl", \
    "librabbitmq-dev" \
]
# [...]

RUN pecl install amqp
RUN docker-php-ext-enable amqp

# [...]

Then I needed to configure a MESSENGER_TRANSPORT_DSN Symfony "environment" variable in .env:

MESSENGER_TRANSPORT_DSN=amqp://guest:guest@host.docker.internal:5672/%2f/messages

(In prod I'd have to be more secure about that password, but it doesn't matter here).

And finally configure the Messaging system to use it:

# config/packages/messenger.yaml

framework:
  messenger:
    transports:
      rabbitmq:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
    routing:
      'App\Message\*': rabbitmq

At this point when I rebuilt the containers, everything was happy until I ran my code…


App changes

In my last article I indicated some derision/bemusement about something in the docs:

The docs say something enigmatic:

There are no specific requirements for a message class, except that it can be serialized

Creating a Message & Handler

I mean that's fine, but it's not like their example implements the Serializable interface like one might expect from that guidance? From their example, I can only assume they mean "stick some getters on it", which is not really the same thing. Oh well.

Using Symfony's Messaging system to separate the request for work to be done and doing the actual work

And now I come to discover what they actually meant. And what they meant is that yes, the Message data should be Serializable. As in: via the interface implementation.

I had been passing around a LifecycleEventArgs implementation - one of PostPersistEventArgs, PostUpdateEventArgs or PreRemoveEventArgs - eg:

# src/EventListener/SearchIndexer.php

class SearchIndexer
{
    public function __construct(
        private readonly MessageBusInterface $bus
    ) {}

    public function postPersist(PostPersistEventArgs $args): void
    {
        $indexMessage = new SearchIndexAddMessage($args->getObject());
        $this->bus->dispatch($indexMessage);
    }

    public function postUpdate(PostUpdateEventArgs  $args): void
    {
        $indexMessage = new SearchIndexUpdateMessage($args->getObject());
        $this->bus->dispatch($indexMessage);
    }

    public function preRemove(PreRemoveEventArgs $args): void
    {
        $indexMessage = new SearchIndexDeleteMessage($args->getObject());
        $this->bus->dispatch($indexMessage);
    }
}

And those don't serialize, so I had to update the code to only pass the entity object that getObject() returned, rather than the whole event args object.
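The reworked message class isn't shown here, but it ends up something like this (a hedged reconstruction: the getEntity() name is my assumption):

// src/Message/AbstractSearchIndexMessage.php (sketch of the reworked version)

abstract class AbstractSearchIndexMessage
{
    public function __construct(
        private readonly object $entity // the entity itself, which serializes fine
    ) { }

    public function getEntity(): object
    {
        return $this->entity;
    }
}

And then likewise src/MessageHandler/SearchIndexMessageHandler.php gets updated: it no longer needs to call getObject() itself, as it's receiving the entity directly, eg: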

#[AsMessageHandler]
public function handleAdd(SearchIndexAddMessage $message): void
{
    // getEntity() here assumes the reworked message sketched above
    $this->searchIndexer->sync($message->getEntity());
}

#[AsMessageHandler]
public function handleUpdate(SearchIndexUpdateMessage $message): void
{
    $this->searchIndexer->sync($message->getEntity());
}

#[AsMessageHandler]
public function handleRemove(SearchIndexDeleteMessage $message): void
{
    $this->searchIndexer->delete($message->getEntity());
}

Once I fixed that one glitch: it worked. When I edited an entity I could see a message going into the RabbitMQ queue (via the manager UI), and could see it being removed, and could see the results of the Elasticsearch update. Cool.

It took ages to get the PHP containers working properly - mostly the composer installation stuff - but the RabbitMQ and Symfony bits were really easy! Nice.

Righto.

--
Adam

Friday, 25 July 2025

Using Symfony's Messaging system to separate the request for work to be done and doing the actual work

G'day:

Context

What am I on about with that title? OK, let me explain.

In a few gigs I had in the past, we've had the issue of a request requiring more work to fulfil than we could reasonably let run inline in a web app. Say there's half a dozen things to do when updating a record, and collectively they take 10sec to run: we can't have the user sitting around for 10sec to get the next call to action. This is made more annoying when most of that overhead is not actually needed for the call to action to be provided; it's (necessary) housekeeping to keep the application working, rather than to keep the user working. The user should not have to wait whilst the app gets its shit together, basically.

Previously I've seen this being solved in a number of ways:

  • Hide the processing in separate threads spun-off by the main request thread. This improves things for the user, but it doesn't lighten the load for the web app at all. And it's also at this point questionable as to whether this is actually the web app's job. It's clearly nothing to do with fulfilling web requests, if it's being done in the background.
  • Set some sort of work data somewhere, that indicates some processing of it needs to be done. Then have a separate process running tasks that check for the existence of work data, and process it if there is any. This is like a hand-cranked queuing system which operates via a task going "got anything for me? No. How about now? No. Now? Ooh yes, got one. What about now? No?". It works, but there's a lot of time having some code asking a question when the answer is "no". For most cases, we know when the work needs to be done: when the code that identifies the work says so. That's when it needs to be done.

So that's what I'm trying to solve. I've never had to solve this sort of thing before, but I have a suspicion that it's gonna involve a message queuing system.


Situation

The example I am going to use is the Elasticsearch integration I put into my test site over the last coupla days.

The first article is long; the second one is very short for me. In summary I have a test app that provides data-entry forms for a number of DB-backed entities. For the purposes of the search functionality, I am using Elasticsearch. And the articles show how to keep Elasticsearch up to date at the same time as the DB updates are made.


Task at hand

Unfortunately for this exercise, the Elasticsearch operations are lightning quick, so it's really no issue to have the code execution "inline" with the web requests. But this processing falls foul of something I touched on above:

[That] overhead is not actually needed for the call to action to be provided[…]. The user should not have to wait whilst the app gets its shit together[.]

So I'm going to ignore the premature optimisation red flag, and am going to farm the Elasticsearch updates off to some other process that the web request doesn't need to worry about.

My chosen path here is to use the Symfony Messenger Component, and ultimately have that working with RabbitMQ under the hood so I can farm off the processing to a completely different app container, so as to reduce load on the web-app container.

I have never done any of this before, and I have to admit that I am currently struggling to even understand the docs. So this should… errr… be "interesting". Or a dumpster fire. One of those.

Spoiler

I do not get to the point of introducing RabbitMQ in this article. It was enough effort to get the messaging stuff working "in-process", so I'm splitting this exercise into 2-3 chunks.


Logging

To start with, I'm just gonna try to get messages and handlers working in-process via Symfony. So no performance gains, no RabbitMQ, and no second app container. And to watch things working (or not), my aim is to simply get a handler to log something if it receives a message. To do this I'm gonna need to install Monolog, which is straightforward.

docker exec php composer require symfony/monolog-bundle:^3.1.0

# config/packages/monolog.yaml

monolog:
  channels:
    - messaging

  handlers:
    messaging:
      type: stream
      path: '%kernel.logs_dir%/messaging.log'
      level: debug
      channels: ['messaging']

And test it:

// tests/Functional/System/MonologTest.php

namespace App\Tests\Functional\System;

use Monolog\Handler\TestHandler;
use Monolog\Level;
use PHPUnit\Framework\Attributes\TestDox;
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;

class MonologTest extends KernelTestCase
{
    #[TestDox('It logs messages to the messaging log file')]
    public function testMessagingLog()
    {
        self::bootKernel();
        $container = self::$kernel->getContainer();

        $logger = $container->get('monolog.logger.messaging');

        $testHandler = new TestHandler();
        $logger->pushHandler($testHandler);

        $uniqueMessage = 'Test message ' . uniqid('', true);
        $logger->info($uniqueMessage);

        $this->assertTrue($testHandler->hasInfoRecords());
        $this->assertTrue($testHandler->hasRecord($uniqueMessage, Level::Info));

        $records = $testHandler->getRecords();
        $infoRecords = array_filter($records, fn($r) => $r['level'] === Level::Info->value);
        $lastInfoRecord = end($infoRecords);
        $this->assertEquals($uniqueMessage, $lastInfoRecord['message']);
    }
}

That's working fine.


Symfony Messenger

Installation is obvious:

docker exec php composer require symfony/messenger:7.3.*

That spews out a bunch of stuff, all simply informational. It's also installed a config/packages/messenger.yaml, but there's no actual settings in it. According to the informational stuff, we're good to go now.

Further up I linked to the docs for the Symfony Messenger Component. Those are a good example of docs written for the author of the docs; or for someone who already knows what the docs are saying. They're largely impenetrable to me, someone who needs the docs to find out how the thing works because I don't already know. So many technical documents are written for the wrong audience like this. However I googled further and found these instead: Messenger: Sync & Queued Message Handling. These docs make sense.

The system works in two parts:

Message
A class that represents something needing to be done, and contains the data needed to do that thing.
MessageHandler
A class that is the handler for that sort of Message. When a Message is dispatched to the message bus, it's the MessageHandler that actually [does the thing].

In my case:

Message
The SearchIndexer's event handlers need some indexing work done by Elasticsearch. They will dispatch a Message with the data needing to be processed (add/update/delete). They will no longer do the actual indexing.
MessageHandler
When a Message is dispatched, Symfony will give the Message to the correct MessageHandler to process. The indexing code will be shifted to here.

Logging via messaging

But before I run, I need to walk. Before breaking my indexing by messing around with it, I'm going to create a Message that is a message to log; and a MessageHandler to log it.

// src/Message/LogMessage.php

namespace App\Message;

use Monolog\Level;

class LogMessage
{
    public function __construct(
        private readonly string $message,
        private readonly Level $level = Level::Info,
        private readonly array $context = []
    )
    {
    }

    public function getMessage(): string
    {
        return $this->message;
    }

    public function getLevel(): Level
    {
        return $this->level;
    }

    public function getContext(): array
    {
        return $this->context;
    }
}

This Message takes the fields that we want to log. The docs say something enigmatic:

There are no specific requirements for a message class, except that it can be serialized

Creating a Message & Handler

I mean that's fine, but it's not like their example implements the Serializable interface like one might expect from that guidance? From their example, I can only assume they mean "stick some getters on it", which is not really the same thing. Oh well.

I note from the docs @ the link above for the Serializable interface: PHP has deprecated it. Instead it is running with "just sling some magic methods in yer class, it'll all be grand". This seems like a backwards step, and a return to the wayward PHP language design of the 1990s-2000s. Shudder.

Anyway, that's why I have those getters in there.
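For the record, the magic-method flavour of "serializable" looks like this (a sketch only - I didn't need these here, the getters were enough; note Monolog's Level is an int-backed enum):

// Sketch: __serialize / __unserialize on LogMessage, controlling the
// serialized shape explicitly
public function __serialize(): array
{
    return [
        'message' => $this->message,
        'level' => $this->level->value,
        'context' => $this->context,
    ];
}

public function __unserialize(array $data): void
{
    // readonly props may be initialised here: the object is created
    // uninitialised during unserialization
    $this->message = $data['message'];
    $this->level = Level::from($data['level']);
    $this->context = $data['context'];
}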

Now the MessageHandler:

// src/MessageHandler/LogMessageHandler.php

namespace App\MessageHandler;

use App\Message\LogMessage;
use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
class LogMessageHandler
{
    public function __construct(
        private readonly LoggerInterface $messagingLogger,
    ) {
    }

    public function __invoke(LogMessage $message): void
    {
        $this->messagingLogger->log(
            $message->getLevel(),
            $message->getMessage(),
            $message->getContext()
        );
    }
}

The trick here is the AsMessageHandler attribute which makes sure the autowiring system knows what to do with this; and then the magic __invoke method does the work.

I'm coming back to this a week or so later, as I have to implement this stuff in another codebase. I realise now I missed a step in this bit of the exercise: as well as creating the Message and MessageHandler, I also need to actually dispatch a message, don't I? Otherwise ain't nuthin gonna get logged.

As this logging test was only temporary, I no longer have it in the codebase, but I'm gonna try to recreate it here. The log entry (below) is being triggered in the postUpdate method of src/EventListener/SearchIndexer.php. We'd need to add the dispatch call like this:

public function postUpdate(PostUpdateEventArgs  $args): void
{
    // [...]

    $logMessage = new LogMessage(
        'SearchIndexer: postUpdate',
        context: ['entity' => $args->getObject()] // second positional arg is the Level, so name the context arg
    );
    $this->bus->dispatch($logMessage);
	
    // [...]


Once we've done that, I can tail the log file and make an edit to one of my entities:

$ docker exec -it php tail -f /var/log/symfony/messaging.log

[2025-07-25T15:37:45.699028+01:00] messaging.INFO: SearchIndexer: postUpdate {"entity":{"App\\Entity\\Institution":{"id":193,"name":"Alenestad WootyWoo Polytechnic","address":"9319 King WootyWoo Suite 361Kerlukeshire, LA 00306","city":"Alenestad WootyWoo","postalCode":"64378-8198","country":"Cocos (WootyWoo) Islands","establishedYear":1998,"type":"polytechnic","website":"https://example.com/accusamus-esse-natus-excepturi-consequuntur-culpa-ut-officia-asperiores"}}} []

And there we have a log entry going in. Cool. That worked. Easy.

I did actually have an issue the first time around. Before running anything from the UI, I had run that test to ensure logging was working. Because that's run from the shell, it runs as root… which means the log file gets created as root too. When I then triggered the message handling via the UI, the code was running as www-data (this is what php-fpm runs as), and it did not have perms to write to the log file, so it went splat. This just required me changing the ownership on the log file to www-data, and we were all good.
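
The fix was something along these lines (same log path as above):

docker exec php chown www-data:www-data /var/log/symfony/messaging.log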


Elasticsearch indexing via Messaging

OK, time to do it for realsies.

Looking at what I need to do, I have three different indexing events to deal with: adding a record on postPersist; updating a record on postUpdate; and deleting a record on preRemove. Each handler method receives different arguments, so in a way I wonder if these are three different Messages? On the other hand, I could use one Message and have an "action" property which is then used in the MessageHandler to decide what indexing task to run. The latter sounds like less code; but it also seems like subpar design, having a magic string on both ends of the messaging system that needs to be matched up. Hrm.
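
Just so it's clear what I'm rejecting, the single-Message version would look something like this (hypothetical code; it's not in the app):

// hypothetical: src/Message/SearchIndexMessage.php

namespace App\Message;

use Doctrine\Persistence\Event\LifecycleEventArgs;

class SearchIndexMessage
{
    public function __construct(
        // 'add' | 'update' | 'delete': the magic string in question
        private readonly string $action,
        private readonly LifecycleEventArgs $args
    ) {
    }

    public function getAction(): string
    {
        return $this->action;
    }

    public function getArgs(): LifecycleEventArgs
    {
        return $this->args;
    }
}

…and the handler would then need a match on getAction() to decide whether to sync or delete. Bleah.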

OK, I can see how I can possibly make this not awful (low bar I'm setting myself there). How's this…

First I have an abstract Message class:

// src/Message/AbstractSearchIndexMessage.php

namespace App\Message;

use Doctrine\Persistence\Event\LifecycleEventArgs;

abstract class AbstractSearchIndexMessage
{
    public function __construct(
        private readonly LifecycleEventArgs $args
    )
    { }

    public function getArgs(): LifecycleEventArgs
    {
        return $this->args;
    }
}

LifecycleEventArgs is the base class of all those event-specific args classes that each of the event handlers receives, eg:

public function postPersist(PostPersistEventArgs $args): void
// [...]

public function postUpdate(PostUpdateEventArgs  $args): void
// [...]

public function preRemove(PreRemoveEventArgs $args): void
// [...]

Then we have three concrete implementations, one for each of the three Message types we need:

// src/Message/SearchIndexAddMessage.php

namespace App\Message;

class SearchIndexAddMessage extends AbstractSearchIndexMessage
{
}


// src/Message/SearchIndexUpdateMessage.php

namespace App\Message;

class SearchIndexUpdateMessage extends AbstractSearchIndexMessage
{
}


// src/Message/SearchIndexDeleteMessage.php

namespace App\Message;

class SearchIndexDeleteMessage extends AbstractSearchIndexMessage
{
}

No code duplication. The sub-classing is just to give us unique Messages to dispatch and then handle.

We can deal with all of those in one MessageHandler class:

// src/MessageHandler/SearchIndexMessageHandler.php

namespace App\MessageHandler;

use App\Message\SearchIndexAddMessage;
use App\Message\SearchIndexDeleteMessage;
use App\Message\SearchIndexUpdateMessage;
use App\Service\ElasticSearchIndexerService;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

class SearchIndexMessageHandler
{
    public function __construct(
        private readonly ElasticSearchIndexerService $searchIndexer,
    ) {
    }

    #[AsMessageHandler]
    public function handleAdd(SearchIndexAddMessage $message): void
    {
        $this->searchIndexer->sync($message->getArgs()->getObject());
    }

    #[AsMessageHandler]
    public function handleUpdate(SearchIndexUpdateMessage $message): void
    {
        $this->searchIndexer->sync($message->getArgs()->getObject());
    }

    #[AsMessageHandler]
    public function handleRemove(SearchIndexDeleteMessage $message): void
    {
        $this->searchIndexer->delete($message->getArgs()->getObject());
    }
}

Unlike the LogMessageHandler we had before, where the entire class was marked as #[AsMessageHandler], here I'm tagging individual methods. And this is why we needed the separate Message classes: Messenger routes each dispatched message to a handler by its type. See how each handler method type-hints its $message parameter with one of those subclasses, as appropriate? That's how Symfony identifies which handler method to call for each Message type.

You possibly noticed that I'm introducing a new service class here: ElasticSearchIndexerService:

// src/Service/ElasticSearchIndexerService.php

namespace App\Service;

use App\Entity\SyncableToElasticsearch;
use Symfony\Component\Routing\Generator\UrlGeneratorInterface;

class ElasticSearchIndexerService
{
    public function __construct(
        private readonly ElasticsearchAdapter $adapter,
        private readonly UrlGeneratorInterface $urlGenerator,
    )
    {
    }

    public function sync(object $object): void
    {
        if (!$object instanceof SyncableToElasticsearch) {
            return;
        }

        $doc = $object->toElasticsearchDocument();
        $doc['body']['_meta'] = [
            'type' => $object->getShortName(),
            'title' => $object->getSearchTitle(),
            'url' => $this->urlGenerator->generate(
                $object->getShortName() . '_view',
                ['id' => $object->getId()]
            ),
        ];

        $this->adapter->indexDocument($doc['id'], $doc['body']);
    }

    public function delete(object $object): void
    {
        if (!$object instanceof SyncableToElasticsearch) {
            return;
        }

        $this->adapter->deleteDocument($object->getElasticsearchId());
    }
}

This code was in SearchIndexer (1.2) before, because that's where it was needed. Now the indexer is much simpler:

// src/EventListener/SearchIndexer.php (1.3)

namespace App\EventListener;

use App\Message\SearchIndexAddMessage;
use App\Message\SearchIndexDeleteMessage;
use App\Message\SearchIndexUpdateMessage;
use Doctrine\Bundle\DoctrineBundle\Attribute\AsDoctrineListener;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Event\PostUpdateEventArgs;
use Doctrine\ORM\Event\PreRemoveEventArgs;
use Doctrine\ORM\Events;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsDoctrineListener(event: Events::postPersist, priority: 500, connection: 'default')]
#[AsDoctrineListener(event: Events::postUpdate, priority: 500, connection: 'default')]
#[AsDoctrineListener(event: Events::preRemove, priority: 500, connection: 'default')]
class SearchIndexer
{
    public function __construct(
        private readonly MessageBusInterface $bus
    ) {}

    public function postPersist(PostPersistEventArgs $args): void
    {
        $indexMessage = new SearchIndexAddMessage($args);
        $this->bus->dispatch($indexMessage);
    }

    public function postUpdate(PostUpdateEventArgs $args): void
    {
        $indexMessage = new SearchIndexUpdateMessage($args);
        $this->bus->dispatch($indexMessage);
    }

    public function preRemove(PreRemoveEventArgs $args): void
    {
        $indexMessage = new SearchIndexDeleteMessage($args);
        $this->bus->dispatch($indexMessage);
    }
}

All it does is listen for the Doctrine events, and dispatch the relevant messages. It doesn't know anything about how to index stuff; it just knows how to ask "someone needs to do this, please".

And the only other thing I needed to do was to update the ReindexSearchCommand to use the ElasticSearchIndexerService rather than SearchIndexer, given I had moved that code.
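
That change was small; something like this (a sketch of just the changed bits):

// src/Command/ReindexSearchCommand.php (the changed bits only)

public function __construct(
    private readonly EntityManagerInterface $em,
    private readonly ElasticSearchIndexerService $indexer // was: SearchIndexer $indexer
) {
    parent::__construct();
}

// indexEntity() still calls $this->indexer->sync($entity); the method just lives on the service now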

Having done that lot: everything works the same as before. So not much forward progress, but our ducks are far better lined up, ready for the next bit of the work.


That was a chunk of work, code and writing. I'm going to pause this exercise here and publish this article as it seems like a reasonable juncture to do so. Plus I want a beer. I have some thinking to do before I introduce RabbitMQ into this, and to use a different PHP container to run the code for the message handling. And I think there will be a code refactoring exercise first. However I am not sure yet, and need to sleep on it.

Righto.

--
Adam

Making sure this app also deletes from Elasticsearch when I delete an entity

G'day:

In yesterday's article - Integrating Elasticsearch into a Symfony app - I long-windedly showed how I integrated Elasticsearch into my test app to handle search indexing. I showed the "CRU" operations of "CRUD". And then admitted I forgot about deletes until I was writing the article, but otherwise left that notion there.

During the evening I decided I was being a lazy fuck, so went and added the deletion support as well. As it turns out it was predictably dead easy.

I have a SearchIndexer class that contains the config and event handlers to be triggered on create and update operations. All I needed to do was add one for when deletions occur, and annotate things appropriately:

#[AsDoctrineListener(event: Events::preRemove, priority: 500, connection: 'default')]
class SearchIndexer
{
    // [...]

    public function preRemove(PreRemoveEventArgs $args): void
    {
        $entity = $args->getObject();
        if (!$entity instanceof SyncableToElasticsearch) {
            return;
        }

        $this->adapter->deleteDocument($entity->getElasticsearchId());
    }

That was it. If I deleted an entity: it was now removed from the Elasticsearch index too.

One observation here is that initially I tried with the postRemove event (note: "post" not "pre", like I have in the code). However for reasons best known to Doctrine, the PostRemoveEventArgs that the event handler receives doesn't include the entity's ID. Copilot reasoned that this was because the row that has that ID had already been removed from the DB, but that's ballocks (or is a bogus rationalisation, anyhow), as the ID belongs to the entity; the DB is just storage. One could also extend that analogy such that "well the Student with the name Jane Wootywoo doesn't exist in the DB any more either, but you seem to have no issue providing that information". Just because the storage tier doesn't need that ID any more, doesn't mean the entity or the application in general hasn't finished with it. But, anyway, the solution is to use the preRemove hook instead.

A second observation is that that getElasticsearchId method had previously been protected, but I saw no problem making it public for this usage.

I also updated the UI for students so I could perform a delete for testing, which necessitated this controller method:

// src/Controller/StudentController.php

#[Route('/students/{id}/delete', name: 'student_delete', requirements: ['id' => '\d+'])]
public function delete(Student $student, EntityManagerInterface $em): Response
{
    foreach ($student->getEnrolments() as $enrolment) {
        $em->remove($enrolment);
    }

    $em->remove($student);
    $em->flush();

    return $this->redirectToRoute('student_list');
}

I'd normally do this in a service as it's not the controller's job to know that Enrolments are related to Students so also need clearing up; but this is not an exercise in building an MVC app so I cut a corner here.
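
For the record, the tidier version would be something like this (a hypothetical StudentRemover service; a sketch only):

// hypothetical: src/Service/StudentRemover.php

namespace App\Service;

use App\Entity\Student;
use Doctrine\ORM\EntityManagerInterface;

class StudentRemover
{
    public function __construct(
        private readonly EntityManagerInterface $em
    ) {
    }

    public function remove(Student $student): void
    {
        // knowing that a Student's Enrolments need clearing up too belongs
        // here, not in the controller
        foreach ($student->getEnrolments() as $enrolment) {
            $this->em->remove($enrolment);
        }

        $this->em->remove($student);
        $this->em->flush();
    }
}

…and the controller method would shrink to a call to $studentRemover->remove($student), plus the redirect.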

That delete method was the target for a "delete" link I put on the UI on the Student view page.

That's all I have to say about this. I'm quite pleased again with how easy it was to implement with Doctrine.

Righto.

--
Adam

Thursday, 24 July 2025

Integrating Elasticsearch into a Symfony app

G'day:

Here I am trying to solve a problem I had had in the past. We had a back-end web app which was pretty much a UI on some DB tables (for the purposes of this summary anyhow): companies, customers, accounts, that sort of thing. I won't disclose the business model too much as it's not relevant here, and I don't want to tie this back to a specific role I've been in. Anyway, you get the idea: a hierarchy of business entities with backing storage.

Part of the app was a global search (you know, top right of the UI, and one can plug anything one likes in there, and search results ensue). There was no specific backing storage for this; the search basically did a bunch of SELECT / UNIONs straight on the main transactional DB:

SELECT
    someColumns
FROM
    tbl1
WHERE
    col1 LIKE '%search_term%'
    OR
    col2 LIKE '%search_term%'
    OR
    -- etc

UNION

SELECT
    someColumns
FROM
    tbl2
WHERE
    colx LIKE '%search_term%'
    OR
    coly LIKE '%search_term%'
    OR
    -- etc

UNION
-- moar tables etc

For a proof of concept, this works OK. For getting something to market: it'll do. For a database that has scaled up: it stops working. The query itself was taking ages, and it was also interfering with other queries running at the same time: if someone did a global search, it could kill the requests of other users. Oopsy.

We denormalised the data a bit into a dedicated search data table, and kept that up to date with event handlers when data in the source tables changed. From there we used the DB's built-in full-text searching to get the results. This was better, but it was all a bit Heath Robinson, still putting too much load on the transactional DB, and the DB's full-text-search capabilities were… erm… not as helpful as they could have been. There's also a chunk of "getting app devs to do the job of a DB dev" in the mix here.

We realised we needed to get the search out of the transactional DB, but we never had the chance to do anything about it whilst I was still on that team.

Time has passed.

This issue has stuck with me, and I've always wanted to have a look at other ways to solve it. I have some time for investigating stuff at the moment, so over the last coupla days I have turned my mind to solving this.

What I'm going to do is to run an Elasticsearch container alongside my app container; and where in the earlier scenario we built our own search index table to query, this time I'm gonna fire the data at Elasticsearch and let it look after it.

First of all, I have started with my default PHP8, Nginx, MariaDB container setup, with a Symfony site running on the PHP container. Same baseline I always use, and I won't go over it as it's all in Github (php-elasticsearch, and the README.md covers it).

Full disclosure: as a further exercise, I got Github Copilot to generate all the code for this. It was all at my guidance and design, but I wanted to see how much of the solution I could get Copilot to write. About 95% of it, in the end: sometimes it was easier for me to tweak something and tell Copilot why I'd done it, rather than explain what I would have wanted it to do. I am completely happy with the code I have ended up with: it's pretty much what I would have keyed in had I done it myself. Pretty much.

First: an Elasticsearch container.

# docker/docker-compose.yml

services:
  # ...

  elasticsearch:
    container_name: elasticsearch

    image: elasticsearch:9.0.3

    environment:
      - discovery.type=single-node
      - network.host=0.0.0.0
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false

    ports:
      - "9200:9200"
      - "9300:9300"

    stdin_open: true
    tty: true

    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data

    healthcheck:
      test: [ "CMD", "curl", "-fs", "http://localhost:9200/_cluster/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s

volumes:
  elasticsearch-data:

# ...

No need for a specific Dockerfile for this: the standard image is pretty much ready to go. I made some config tweaks:

  • I have NFI what discovery.type=single-node does, other than what it sounds like it does (which, it turns out, is about right: it forms a single-node cluster and doesn't go looking for other nodes). But it was on the docker run example on Docker Hub, so I ran with it.
  • network.host=0.0.0.0 is just so it will listen to requests on the host network rather than just the internal Docker network.
  • xpack.security.enabled=false means I don't need to pass credentials with my queries. This is not appropriate for anything other than dev, OK?
  • xpack.security.http.ssl.enabled=false means it'll work over http instead of https-only. This is OK if the client and server are on the same network (as they are with me), but not if they're across the public wire from each other.
  • We're querying on port 9200; but Elasticsearch also uses 9300, which is its transport port for node-to-node communication. Again, it's from the docker run example on Docker Hub.
  • I'm putting its data on an "external" volume so that it persists across container rebuilds. On a serious system this would map to a file system location, but a Docker Volume is fine for my purposes.
  • It's cool that Elasticsearch has a health check endpoint built in. And good on Copilot for knowing this.
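
Once it's up, one can check it from the host, thanks to the port mapping:

curl -s http://localhost:9200/_cluster/health

That returns a blob of JSON including a status field of green, yellow or red.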

I've created an integration test to make sure that works:

// tests/Integration/System/ElasticSearchTest.php

namespace App\Tests\Integration\System;

use App\Tests\Integration\Fixtures\ElasticsearchTestIndexManager;
use Elastic\Elasticsearch\Client;
use PHPUnit\Framework\TestCase;
use Elastic\Elasticsearch\ClientBuilder;

class ElasticSearchTest extends TestCase
{
    private Client $client;
    private string $id = 'test_id';
    private ElasticsearchTestIndexManager $indexManager;

    private const string INDEX = ElasticsearchTestIndexManager::INDEX;

    protected function setUp(): void
    {
        $address = sprintf(
            '%s:%s',
            getenv('ELASTICSEARCH_HOST'),
            getenv('ELASTICSEARCH_PORT')
        );

        $this->client = ClientBuilder::create()
            ->setHosts([$address])
            ->build();

        $this->indexManager = new ElasticsearchTestIndexManager($this->client);
        $this->indexManager->ensureIndexExists();
    }

    protected function tearDown(): void
    {
        $this->indexManager->removeIndexIfExists();
    }

    public function testWriteAndReadDocument()
    {
        $doc = ['foo' => 'bar', 'baz' => 42];

        try {
            $this->client->index([
                'index' => self::INDEX,
                'id'    => $this->id,
                'body'  => $doc
            ]);

            $response = $this->client->get([
                'index' => self::INDEX,
                'id'    => $this->id
            ]);

            $this->assertEquals($doc, $response['_source']);
        } finally {
            $this->client->delete([
                'index' => self::INDEX,
                'id'    => $this->id
            ]);
        }
    }
}

The guts of this is that the test indexes (adds) a document to the Elasticsearch DB, reads it back, and then deletes it. I have factored-out some code from this as it's used in another test (qv):

// tests/Integration/Fixtures/ElasticsearchTestIndexManager.php

namespace App\Tests\Integration\Fixtures;

use Elastic\Elasticsearch\Client;

class ElasticsearchTestIndexManager
{
    public const string INDEX = 'test_index';

    private Client $client;

    public function __construct(Client $client)
    {
        $this->client = $client;
    }

    public function ensureIndexExists(): void
    {
        if (!$this->client->indices()->exists(['index' => self::INDEX])->asBool()) {
            $this->client->indices()->create(['index' => self::INDEX]);
        }
    }

    public function removeIndexIfExists(): void
    {
        if ($this->client->indices()->exists(['index' => self::INDEX])->asBool()) {
            $this->client->indices()->delete(['index' => self::INDEX]);
        }
    }
}

All pretty self-explanatory I reckon.

Oh I also needed to install a library for Elasticsearch support:

docker exec php composer require elasticsearch/elasticsearch:^9.0.0

I've also installed Elasticvue on my PC so I can query the data independent of my code.

I'm not going to call the Elasticsearch Client - using its bespoke syntax - directly in my code. That's bad separation of concerns. I'm going to implement an adapter to hide as much as possible from the app:

// src/Service/ElasticsearchAdapter.php

namespace App\Service;

use Elastic\Elasticsearch\Client;

class ElasticsearchAdapter
{
    private const string INDEX = 'search_index';
    private Client $client;

    public function __construct(Client $client)
    {
        $this->client = $client;
    }

    public function indexDocument(string $id, array $body): void
    {
        $this->client->index([
            'index' => self::INDEX,
            'id'    => $id,
            'body'  => $body,
        ]);
    }

    public function getDocument(string $id): array
    {
        $response = $this->client->get([
            'index' => self::INDEX,
            'id'    => $id,
        ]);
        return $response['_source'] ?? [];
    }

    public function deleteDocument(string $id): void
    {
        $this->client->delete([
            'index' => self::INDEX,
            'id'    => $id,
        ]);
    }

    public function searchByString(string $query): array
    {
        $body = [
            'query' => [
                'query_string' => [
                    'query' => $query,
                ],
            ],
        ];
        $response = $this->client->search([
            'index' => self::INDEX,
            'body'  => $body,
        ]);
        return $response['hits']['hits'] ?? [];
    }
}

This wraps those calls away, so the code wanting to "do stuff" with Elasticsearch doesn't need to know how. It's just "do this for me will you? kthxbye".

And I test this:

// tests/Integration/Service/ElasticsearchAdapterTest.php

namespace App\Tests\Integration\Service;

use App\Service\ElasticsearchAdapter;
use App\Tests\Integration\Fixtures\ElasticsearchTestIndexManager;
use Elastic\Elasticsearch\ClientBuilder;
use Elastic\Elasticsearch\Exception\ClientResponseException;
use PHPUnit\Framework\Attributes\TestDox;
use PHPUnit\Framework\TestCase;

class ElasticsearchAdapterTest extends TestCase
{
    private ElasticsearchAdapter $adapter;
    private string $id = 'test_id';
    private array $body = ['foo' => 'bar'];
    private ElasticsearchTestIndexManager $indexManager;

    protected function setUp(): void
    {
        $address = sprintf(
            '%s:%s',
            getenv('ELASTICSEARCH_HOST'),
            getenv('ELASTICSEARCH_PORT')
        );

        $client = ClientBuilder::create()
            ->setHosts([$address])
            ->build();
        $this->adapter = new ElasticsearchAdapter($client);

        $this->indexManager = new ElasticsearchTestIndexManager($client);
        $this->indexManager->ensureIndexExists();
    }

    protected function tearDown(): void
    {
        $this->indexManager->removeIndexIfExists();
    }

    #[TestDox('Indexes a document successfully')]
    public function testIndexDocument(): void
    {
        $this->adapter->indexDocument($this->id, $this->body);
        $result = $this->adapter->getDocument($this->id);
        $this->assertEquals($this->body, $result);
    }

    #[TestDox('Retrieves a document successfully')]
    public function testGetDocument(): void
    {
        $this->adapter->indexDocument($this->id, $this->body);
        $result = $this->adapter->getDocument($this->id);
        $this->assertEquals($this->body, $result);
    }

    #[TestDox('Deletes a document successfully')]
    public function testDeleteDocument(): void
    {
        $this->adapter->indexDocument($this->id, $this->body);
        $this->adapter->deleteDocument($this->id);

        $this->expectException(ClientResponseException::class);
        $this->adapter->getDocument($this->id);
    }
}

A coupla things to note here:

  • This test is hitting Elasticsearch; it'd probably be better to have this as a functional test and mock-out the Client.
  • Especially given it's covering much the same ground as the previous integration test, which is a true integration test. We don't need both.
  • I'm not testing searchByString in here. I probably should be; I forgot about it, and only noticed just now. There's a sketch of what such a test might look like after this list.
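
Here's that sketch (untested, and note Elasticsearch is near-real-time: a freshly-indexed document isn't searchable until the index refreshes, hence the pause):

#[TestDox('Finds a document by search string')]
public function testSearchByString(): void
{
    $this->adapter->indexDocument($this->id, $this->body);

    sleep(1); // the default refresh interval; calling the indices refresh API would be more deterministic

    $hits = $this->adapter->searchByString('bar');

    $this->assertNotEmpty($hits);
    $this->assertEquals($this->body, $hits[0]['_source']);
}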

Right: we can now talk to the Elasticsearch DB. Cool. Now we have to tell Symfony about it.

From a design perspective, I've decided that whenever Symfony calls Doctrine to write to an entity, I'll intercept that somehow, and also fire off a call to the ElasticsearchAdapter to perform the equivalent action. This is done via Doctrine Lifecycle Listeners. There's not much to them:

// src/EventListener/SearchIndexer.php

namespace App\EventListener;

use App\Entity\SyncableToElasticsearch;
use App\Service\ElasticsearchAdapter;
use Doctrine\Bundle\DoctrineBundle\Attribute\AsDoctrineListener;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Event\PostUpdateEventArgs;
use Doctrine\ORM\Events;
use Symfony\Component\Routing\Generator\UrlGeneratorInterface;

#[AsDoctrineListener(event: Events::postPersist, priority: 500, connection: 'default')]
#[AsDoctrineListener(event: Events::postUpdate, priority: 500, connection: 'default')]
class SearchIndexer
{
    public function __construct(
        private ElasticsearchAdapter $adapter,
        private UrlGeneratorInterface $urlGenerator
    ) {}

    public function postPersist(PostPersistEventArgs $args): void
    {
        $this->sync($args->getObject());
    }

    public function postUpdate(PostUpdateEventArgs  $args): void
    {
        $this->sync($args->getObject());
    }

    public function sync(object $entity): void
    {
        if (!$entity instanceof SyncableToElasticsearch) {
            return;
        }

        $doc = $entity->toElasticsearchDocument();
        $doc['body']['_meta'] = [
            'type' => $entity->getShortName(),
            'title' => $entity->getSearchTitle(),
            'url' => $this->urlGenerator->generate(
                $entity->getShortName() . '_view',
                ['id' => $entity->getId()]
            ),
        ];

        $this->adapter->indexDocument($doc['id'], $doc['body']);
    }
}

The docs are clear on this, but basically one tags a class as AsDoctrineListener, specifies which events to listen to, and which DB connection (this is all in the attributes on the class). Then the specified event handlers are called when the given Doctrine events occur. Really easy!

One shortcoming in the system is that one can either use a Doctrine entity listener, which can be configured for a single entity; or one can use a Doctrine lifecycle listener - as I have here - which applies to all entities. There's no way of, like, sticking an attribute on an Entity and saying "listen to events on this please". This seems like a lost opportunity. I've solved this via the code in sync:

if (!$entity instanceof SyncableToElasticsearch) {
    return;
}

And I implement that interface on all the entities I want to index. This actually seems reasonable from a design perspective, as I ended up needing a couple methods on them to support the indexing operation anyhow.

// src/Entity/SyncableToElasticsearch.php

namespace App\Entity;

interface SyncableToElasticsearch
{
    public function toElasticsearchDocument(): array;
    public function getSearchTitle(): string;
}

// src/Entity/AbstractSyncableToElasticsearch.php

namespace App\Entity;

abstract class AbstractSyncableToElasticsearch implements SyncableToElasticsearch
{
    protected ?int $id;

    abstract public function toElasticsearchArray(): array;

    public function toElasticsearchDocument(): array
    {
        return [
            'id' => $this->getElasticsearchId(),
            'body' => $this->toElasticsearchArray(),
        ];
    }

    protected function getElasticsearchId(): string
    {
        $entityType = $this->getShortName();
        return $entityType . '_' . $this->id;
    }

    public function getShortName(): string
    {
        $parts = explode('\\', static::class);
        $entityType = strtolower(end($parts));

        return $entityType;
    }
}

I have this abstract class which all the SyncableToElasticsearch entities extend as there's a coupla bits of code they all need to run as part of making the document for Elasticsearch. I don't like using inheritance when I can avoid it, but it was either this or a trait, and I dislike traits even more than inheritance.
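
For the record, the trait version would have been much the same code in different packaging; something like this (a sketch):

// hypothetical: src/Entity/SyncableToElasticsearchTrait.php

namespace App\Entity;

trait SyncableToElasticsearchTrait
{
    // relies on the using entity defining toElasticsearchArray() and an $id property

    public function toElasticsearchDocument(): array
    {
        return [
            'id' => $this->getElasticsearchId(),
            'body' => $this->toElasticsearchArray(),
        ];
    }

    protected function getElasticsearchId(): string
    {
        return $this->getShortName() . '_' . $this->id;
    }

    public function getShortName(): string
    {
        $parts = explode('\\', static::class);
        return strtolower(end($parts));
    }
}

Each entity would use the trait and still implement the interface itself. Same outcome; I just like it less.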

And here's the relevant bits of one of the entities:

// src/Entity/Student.php

namespace App\Entity;

// ...

#[ORM\Entity(repositoryClass: StudentRepository::class)]
class Student extends AbstractSyncableToElasticsearch
{
    // ...

    public function toElasticsearchArray(): array
    {
        return [
            'email' => $this->email,
            'fullName' => $this->fullName,
            'dateOfBirth' => $this->dateOfBirth?->format('Y-m-d'),
            'gender' => $this->gender,
            'enrolmentYear' => $this->enrolmentYear,
            'status' => $this->status?->label(),
        ];
    }

    public function getSearchTitle(): string
    {
        return $this->fullName;
    }
}

Each entity knows how to derive what it needs to give to Elasticsearch, so this is the best place for these.

If we come back to SearchIndexer::sync now, we can see how all this is used:

$doc = $entity->toElasticsearchDocument();
$doc['body']['_meta'] = [
    'type' => $entity->getShortName(),
    'title' => $entity->getSearchTitle(),
    'url' => $this->urlGenerator->generate(
        $entity->getShortName() . '_view',
        ['id' => $entity->getId()]
    ),
];

$this->adapter->indexDocument($doc['id'], $doc['body']);

So: the given entity has been updated in the DB for some reason; the postUpdate event has been fired and intercepted; the handler runs; and if it's an entity that belongs in the search index, we create a document for the search index and fire it off to Elasticsearch. Done. That's it. I mean literally, that's the end of the exercise. All the rest of this article is various support stuff I wrote to load data (into the DB and into Elasticsearch), have a UI for it, etc.

I needed a script to get all the data into Elasticsearch in the first place. Pretty easy (there's a lot of code, but it's mostly config / boilerplate):

// src/Command/ReindexSearchCommand.php

namespace App\Command;

use App\Entity\Course;
use App\Entity\Department;
use App\Entity\Instructor;
use App\Entity\Institution;
use App\Entity\Student;
use App\Entity\Assignment;
use App\EventListener\SearchIndexer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Routing\Generator\UrlGeneratorInterface;

#[AsCommand(
    name: 'search:reindex',
    description: 'Reindex all or specific entities into Elasticsearch'
)]
class ReindexSearchCommand extends Command
{
    private const array ENTITY_MAP = [
        'assignment' => Assignment::class,
        'course' => Course::class,
        'department' => Department::class,
        'instructor' => Instructor::class,
        'institution' => Institution::class,
        'student' => Student::class,
    ];

    public function __construct(
        private readonly EntityManagerInterface $em,
        private readonly SearchIndexer $indexer
    ) {
        parent::__construct();
    }

    protected function configure(): void
    {
        $this->addArgument(
            'entity',
            InputArgument::REQUIRED,
            'Entity to reindex (all, student, course, department, instructor, institution, assignment)'
        );
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $entityArg = strtolower($input->getArgument('entity'));

        if ($entityArg === 'all') {
            $entitiesToIndex = self::ENTITY_MAP;
        } elseif (isset(self::ENTITY_MAP[$entityArg])) {
            $entitiesToIndex = [$entityArg => self::ENTITY_MAP[$entityArg]];
        } else {
            $output->writeln('<error>Unknown entity: ' . $entityArg . '</error>');
            return Command::FAILURE;
        }

        foreach ($entitiesToIndex as $name => $class) {
            $output->writeln("Indexing $name...");
            $this->indexEntity($class, $output);
        }

        $output->writeln('<info>Reindexing complete.</info>');

        return Command::SUCCESS;
    }

    private function indexEntity(string $class, OutputInterface $output): void
    {
        $batchSize = 100;
        $repo = $this->em->getRepository($class);
        $qb = $repo->createQueryBuilder('e');
        $count = (int) $qb->select('COUNT(e.id)')->getQuery()->getSingleScalarResult();

        $output->writeln("  Found $count records.");

        for ($i = 0; $i < $count; $i += $batchSize) {
            $qb = $repo
                ->createQueryBuilder('e')
                ->setFirstResult($i)
                ->setMaxResults($batchSize);

            $results = $qb->getQuery()->getResult();

            foreach ($results as $entity) {
                $this->indexer->sync($entity);
            }

            $this->em->clear();
            gc_collect_cycles();
        }
    }
}

The key bit is the indexEntity method. It loops over batches of entities and passes them to the indexer's sync method (that we looked at before). Done.
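
Running it is just:

docker exec php bin/console search:reindex all

(Or pass one of the entity names instead of all, to reindex just that type.)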

What's with the entities in that ENTITY_MAP:

private const array ENTITY_MAP = [
    'assignment' => Assignment::class,
    'course' => Course::class,
    'department' => Department::class,
    'instructor' => Instructor::class,
    'institution' => Institution::class,
    'student' => Student::class,
];

These are the entities in my stub app. Basically I've decided to represent elements of educational institutions:

NB: Enrolments do not have anything to index in the search, so they do not extend AbstractSyncableToElasticsearch. All they do is tie a Student to a Course.

I have built a crude UI to list, view and edit each entity, which one enters via http://localhost:8080/institutions, which will display something like:

Drilling down:

Hey I told you it was crude.

Oh and "wootywoo" is my go-to string to search for. In case you were wondering.

One can add Students via the Courses view page (http://localhost:8080/courses/{id}/view):

There is no capacity to delete entities in this UI. Mostly cos I forgot about it until just now. I can't see there being much more to it than hooking some code up to another event.

I'm not gonna go into the details of the UI, as it's only helper-code, and I'm gonna do some research into Symfony forms and that sort of malarky separately in a week or so. But there are some controllers and some forms and some templates if you want to look.

The last part of this is how I got the test data into the database (not ElasticSearch) in the first place. This is all done with Data Fixtures, and Factories (see the Symfony docs: DoctrineFixturesBundle). Again, I'm not going into this here as it's support code. But go have a look [shrug]. This helped me load a whole bunch of data into the system, based on these constraints:

// tests/Fixtures/FixtureLimits.php

namespace App\Tests\Fixtures;

final class FixtureLimits
{
    public const int INSTITUTIONS_MAX = 100;
    public const int DEPARTMENTS_MIN = 5;
    public const int DEPARTMENTS_MAX = 15;
    public const int INSTRUCTORS_MIN = 5;
    public const int INSTRUCTORS_MAX = 15;
    public const int STUDENTS_MIN = 10;
    public const int STUDENTS_MAX = 50;
    public const int COURSES_MIN = 5;
    public const int COURSES_MAX = 10;
    public const int ASSIGNMENTS_MIN = 1;
    public const int ASSIGNMENTS_MAX = 5;
}
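
Loading that lot in is the standard DoctrineFixturesBundle incantation, something like:

docker exec php bin/console doctrine:fixtures:load --no-interaction

(--no-interaction just skips the "purge the database?" confirmation.)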

And that's about it. I'm quite happy about how easy it was to get the Elasticsearch part of this done. It took way longer to write the code to load the data and build the UI than it did to get the Elasticsearch stuff integrated.

Righto.

--
Adam


PS: I started feeling guilty about the lack of deletion support in this work, so I sorted it out. See Making sure this app also deletes from Elasticsearch when I delete an entity.