G'day:
Right, so a couple of weeks ago we built a complete database-driven scheduled task system for Symfony. We got dynamic configuration through a web interface, timezone handling, working days filtering, execution tracking, and - the real kicker - a worker restart mechanism that actually updates running schedules when users change tasks without redeploying anything.
Then last week we debugged all the spectacular bugs in that implementation - Doctrine gotchas, timezone configuration self-sabotage, entity detachment through message queues, and every other way you can break an ORM if you set your mind to it.
All working perfectly. Execution tracking shows when tasks last ran and what happened. Users can configure schedules through a proper web interface. The whole system updates dynamically without manual intervention. Job done, time to move on to the next thing, right?
Well, not quite. Turns out we'd built a lovely scheduling system that could run tasks reliably and track their execution, but we'd forgotten to implement something rather important: what happens when tasks actually fail?
Our system would dutifully run a task every 30 seconds, log when it completed successfully, update the execution tracking data, and carry on to the next one. But if a task failed? It would log the error, then cheerfully schedule it to run again in another 30 seconds. And again. And again. Forever.
No failure limits, no automatic deactivation, no "maybe we should stop trying this after it's failed a dozen times" logic. Tasks could fail endlessly without consequence, which is not as helpful as it could be in a production scheduling system.
What followed was an afternoon of learning exactly why Doctrine events and database transactions don't play nicely with worker restarts, and discovering that sometimes the obvious solution really is the best one - if you can stop yourself from overthinking it.
The missing piece: what happens when tasks fail?
So what exactly had we forgotten to implement? Failure handling. We'd built all the infrastructure for running tasks and tracking their execution, but we'd never actually defined what should happen when a task fails repeatedly.
Our AbstractTaskHandler was doing comprehensive logging when tasks failed:
} catch (Throwable $e) {
    $this->tasksLogger->error('Task failed', [
        'task_id' => $taskId,
        'task_type' => $this->getTaskTypeFromClassName(),
        'error' => $e->getMessage(),
        'exception' => $e
    ]);

    throw $e;
}
So we knew when tasks were failing. The logs showed every error in detail. But none of that information was being used to make any decisions about what to do next. The task would just get scheduled to run again at its normal interval, fail again, get logged again, and repeat the cycle indefinitely.
In a production system, you need some kind of circuit breaker logic. If a task fails three times in a row, maybe there's something fundamentally wrong and it shouldn't keep trying. Maybe the external API it's calling is down, or there's a configuration issue, or the task itself is buggy. Continuing to hammer away every 30 seconds just wastes resources and fills up your logs with noise.
The obvious solution seemed straightforward: track how many times each task has failed consecutively, and automatically deactivate tasks that hit a failure threshold. Keep a failureCount in the execution tracking data, increment it on failures, reset it on success, and disable the task when it hits 3.
Simple business logic. How hard could it be to implement?
Turns out, quite hard. Because implementing failure handling properly meant diving head-first into the murky waters of Doctrine events, database transactions, and worker restart timing. What should have been a 20-minute addition turned into an afternoon of debugging increasingly creative ways for the system to break itself.
The obvious solution that wasn't so obvious
The implementation plan seemed dead simple. We already had a TaskExecution entity for tracking execution data, so we just needed to add a failureCount field:
#[ORM\Column(nullable: false, options: ['default' => 0])]
private ?int $failureCount = 0;
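For completeness, the schema change behind that field is a one-liner. A minimal Doctrine migration sketch - the table name task_execution and the version class name here are illustrative, not lifted from the real codebase:

use Doctrine\DBAL\Schema\Schema;
use Doctrine\Migrations\AbstractMigration;

final class Version20250101000000 extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        // Default matches the entity mapping, so existing rows start at zero
        $this->addSql('ALTER TABLE task_execution ADD failure_count INT DEFAULT 0 NOT NULL');
    }

    public function down(Schema $schema): void
    {
        $this->addSql('ALTER TABLE task_execution DROP failure_count');
    }
}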
Then update the AbstractTaskHandler to increment the failure count on errors and reset it on success:
try {
    $this->handle($task);

    // Reset failure count on success
    $execution->setFailureCount(0);

    $this->updateTaskExecution($task, $startTime, $executionTime, 'SUCCESS');
} catch (Throwable $e) {
    // Increment failure count
    $currentFailures = $execution->getFailureCount() + 1;
    $execution->setFailureCount($currentFailures);

    // Deactivate task after 3 failures
    if ($currentFailures >= 3) {
        $task->setActive(false);
        $this->entityManager->persist($task);
    }

    $this->updateTaskExecution($task, $startTime, $executionTime, 'ERROR: ' . $e->getMessage());

    throw $e;
}
Dead straightforward. Count failures, reset on success, deactivate after three strikes. The kind of logic you'd expect to find in any robust scheduling system.
We tested it with a task that was guaranteed to fail - one that threw an exception every time it ran. First failure: count goes to 1, task keeps running. Second failure: count goes to 2, still active. Third failure: count goes to 3, task gets deactivated and disappears from the schedule.
Perfect. Except for one small problem: the task didn't actually disappear from the schedule.
The active field got updated in the database correctly. The failure count was tracking properly. But the running scheduler kept trying to execute the task every 30 seconds, completely ignoring the fact that we'd just deactivated it. The worker would dutifully run the failed task again, watch it fail, increment the failure count to 4, try to deactivate the already-deactivated task again, and carry on in an endless loop.
The problem was timing. We were updating the task configuration during task execution, which should have triggered a schedule reload so the worker would pick up the change. But the reload was happening at exactly the wrong moment, creating a race condition that turned our elegant failure handling into an infinite loop.
The Doctrine event problem
The issue was with our existing worker restart mechanism. We'd been using a TaskChangeListener that listened for Doctrine events and triggered schedule reloads whenever task configuration changed:
#[AsDoctrineListener(event: Events::postUpdate)]
#[AsDoctrineListener(event: Events::postPersist)]
#[AsDoctrineListener(event: Events::postRemove)]
class TaskChangeListener
{
    private function handleTaskChange($entity): void
    {
        if (!$entity instanceof DynamicTaskMessage) {
            return;
        }

        $this->tasksLogger->info('Task change detected, triggering worker restart');
        $this->triggerWorkerRestart();
    }
}
This worked perfectly when users updated tasks through the web interface. Change a schedule from "every 5 minutes" to "every 30 seconds", hit save, and within a few seconds the new schedule was live and running.
But when our failure handling logic updated the active field on a DynamicTaskMessage, it triggered the same listener. So the sequence became:
- Task fails for the third time in the handler
- Handler sets active = false and saves the entity
- Doctrine postUpdate event fires
- TaskChangeListener triggers a worker restart
- Worker restart happens while the handler is still running
That last step was the killer. The postUpdate event fires while you're still inside the database transaction that's updating the task. The worker restart spawns a new process that tries to read the updated task configuration, but the transaction hasn't committed yet. So the new worker process sees the task as still active, thinks it's overdue (because it just "failed" but the schedule hasn't been updated), and immediately runs it again.
Meanwhile, the original handler finishes its transaction and commits the active = false change. But it's too late - the new worker is already executing the task again with the old data. That run will fail again, increment a failure count it still sees as 2, try to deactivate the task again, trigger another restart, and round we go.
Transaction isolation nightmare. The event system was designed for "fire and forget" notifications, not "coordinate complex multi-process state changes". We needed the worker restart to happen after the transaction committed, not during it.
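To make that ordering concrete, here's roughly what happens inside a flush - a heavily simplified pseudocode sketch, not Doctrine's actual UnitOfWork internals, and the table name is illustrative:

// Simplified sketch of $entityManager->flush() - pseudocode, not real internals
$connection->beginTransaction();

$connection->executeStatement(
    'UPDATE dynamic_task_message SET active = 0 WHERE id = ?',
    [$taskId]
);

// postUpdate fires HERE, inside the still-open transaction. Our listener
// restarts the worker, and the new worker reads the task table on its own
// connection - where active is still 1.
$eventManager->dispatchEvent(Events::postUpdate, $eventArgs);

$connection->commit(); // only now does active = 0 become visible to other connections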
postFlush: when the cure is worse than the disease
The "obvious" fix was to use postFlush events instead of postUpdate. The postFlush event fires after Doctrine commits all pending changes to the database, so there's no transaction timing issue. Perfect!
Except for one small problem: postFlush events don't tell you which entities were updated. The event just says "something got flushed to the database", but you have no idea what that something was.
So we'd end up with worker restarts triggered by every single database write in the entire application. User updates their profile? Worker restart. Product price gets updated? Worker restart. Log entry gets written? Worker restart. Session data gets saved? Worker restart.
In a typical web application, database writes happen constantly. Every page load, every form submission, every background process touching the database would trigger a schedule reload. We'd have workers restarting dozens of times per minute, which is roughly the opposite of what you want from a stable scheduling system.
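The limitation is right there in the event's signature. A minimal sketch of what a postFlush listener actually receives - the class name is hypothetical:

use Doctrine\Bundle\DoctrineBundle\Attribute\AsDoctrineListener;
use Doctrine\ORM\Event\PostFlushEventArgs;
use Doctrine\ORM\Events;

#[AsDoctrineListener(event: Events::postFlush)]
class FlushListener // hypothetical, for illustration
{
    public function postFlush(PostFlushEventArgs $args): void
    {
        // $args gives us the object manager and... that's it. No entities,
        // no change set - just "a flush happened". Every write in the app
        // lands here, so there's nothing to filter on.
        $entityManager = $args->getObjectManager();
    }
}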
We tried a few approaches to work around this:
- Track entity changes manually - store a list of modified entities during the request, check it in the postFlush handler. Complicated and error-prone.
- Use unit of work change sets - inspect Doctrine's internal change tracking to see what actually changed. Fragile and dependent on internal APIs.
- Custom flush operations - separate the task updates from other database operations. Architectural nightmare.
All of these solutions were more complex than the original problem. We were trying to hack around the fundamental limitation that postFlush events give you the right timing but no entity context, while postUpdate events give you entity context but the wrong timing.
Doctrine events just weren't going to work for this use case. The combination of "need entity-specific filtering" + "need post-transaction timing" + "avoid restart loops from execution updates" was impossible to solve cleanly with the lifecycle events.
Time to try a completely different approach.
The message bus epiphany
After wrestling with Doctrine events for the better part of an afternoon, we stepped back and had one of those "hang on a minute" moments. We were trying to use database lifecycle events to trigger application-level actions - worker restarts, schedule reloads, process coordination. But Doctrine events are designed for database concerns: maintaining referential integrity, updating timestamps, logging changes.
What we were trying to do wasn't really a database concern at all. We wanted to send a message to the scheduling system saying "hey, something changed, you might want to reload your config". That's application logic, not data persistence logic.
And we already had the perfect tool for sending messages between different parts of the application: Symfony's message bus. The same message bus that was handling our task execution was sitting right there, designed exactly for this kind of "do something after the current operation finishes" use case.
So instead of trying to hack around Doctrine event timing, why not just dispatch a ScheduleReloadMessage when we need a worker restart?
// In the failure handling logic
if ($currentFailures >= 3) {
    $task->setActive(false);
    $this->entityManager->persist($task);
    $this->entityManager->flush();

    // Tell the scheduler to reload after this transaction commits
    $this->messageBus->dispatch(new ScheduleReloadMessage());
}
(NB: that's not the actual code being run; it's simplified for the sake of demonstration. The real code is @ src/MessageHandler/AbstractTaskHandler.php.)
The message bus naturally handles the timing. Messages get processed after the current request/transaction completes, so there's no race condition between updating the database and reloading the schedule. The transaction commits first, then the message gets processed, then the worker restart happens with the correct data.
Plus we get proper separation of concerns: the task handler focuses on business logic (tracking failures, deactivating problematic tasks), and the message bus handles infrastructure concerns (coordinating worker restarts).
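One detail worth hedging: that "processed after the transaction completes" behaviour comes from routing the message to an asynchronous transport - a synchronously-handled message would run inline, right back inside the timing problem. A sketch of the routing, assuming a stock messenger setup and an App\Message namespace:

# config/packages/messenger.yaml - a sketch; transport and namespace are assumptions
framework:
    messenger:
        transports:
            async: '%env(MESSENGER_TRANSPORT_DSN)%'
        routing:
            'App\Message\ScheduleReloadMessage': async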
Sometimes the "clever" solution that requires fighting the framework is wrong, and the simple solution that works with the framework is right. We'd been so focused on making Doctrine events do what we wanted that we'd forgotten about the message infrastructure we'd already built.
Hindsight, eh?
The actual implementation is beautifully simple. The ScheduleReloadMessage is just an empty class - no properties, no constructor, just a marker to tell the system "reload the schedule":
class ScheduleReloadMessage
{
    // That's it. Sometimes the simplest solutions are the best ones.
}
And the ScheduleReloadMessageHandler just writes the timestamp to the file that triggers the worker restart:
#[AsMessageHandler]
class ScheduleReloadMessageHandler
{
    // Path to the watched restart file, plus a logger; both injected via DI
    public function __construct(
        private readonly string $restartFilePath,
        private readonly LoggerInterface $logger
    ) {}

    public function __invoke(ScheduleReloadMessage $message): void
    {
        file_put_contents($this->restartFilePath, (string) time());

        $this->logger->info('Schedule reload triggered via message bus');
    }
}
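The only wiring the handler needs is being told where the restart file lives. A minimal sketch of that, assuming the path is passed in as a plain string argument - the parameter binding and file location here are assumptions, not the project's actual config:

# config/services.yaml - a sketch; the path is an assumption
services:
    App\MessageHandler\ScheduleReloadMessageHandler:
        arguments:
            $restartFilePath: '%kernel.project_dir%/var/scheduler_restart'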
Amazing how little code is required to solve what felt like a complex coordination problem.
Implementation walkthrough
With the message bus approach sorted, the actual implementation was straightforward. We ditched the TaskChangeListener entirely - no more Doctrine events, no more transaction timing issues, no more endless restart loops.
The failure tracking logic lives in the AbstractTaskHandler, which now defines the failure threshold as a constant and takes the message bus as a constructor parameter:

private const MAX_FAILURES = 3;

public function __construct(
    private readonly LoggerInterface $tasksLogger,
    private readonly EntityManagerInterface $entityManager,
    private readonly MessageBusInterface $messageBus
) {}
The execution logic tracks failures and handles deactivation cleanly:
try {
    $result = $this->handle($task);

    // Reset failure count on success
    $execution->setFailureCount(0);

    $this->updateTaskExecution($task, $startTime, $executionTime, $result);
} catch (Throwable $e) {
    $execution = $this->getOrCreateExecution($task);
    $currentFailures = $execution->getFailureCount() + 1;
    $execution->setFailureCount($currentFailures);

    $errorMessage = 'ERROR: ' . $e->getMessage();

    if ($currentFailures >= self::MAX_FAILURES) {
        $task->setActive(false);
        $this->entityManager->persist($task);
        $this->entityManager->flush();

        $this->tasksLogger->warning('Task deactivated after repeated failures', [
            'task_id' => $task->getId(),
            'failure_count' => $currentFailures
        ]);

        // Schedule reload after transaction commits
        $this->messageBus->dispatch(new ScheduleReloadMessage());

        $errorMessage .= ' (Task deactivated after ' . $currentFailures . ' failures)';
    }

    $this->updateTaskExecution($task, $startTime, $executionTime, $errorMessage);

    throw $e;
}
The TaskExecution entity got the new failure tracking field:
#[ORM\Column(nullable: false, options: ['default' => 0])]
private ?int $failureCount = 0;
And we needed to update the web interface to dispatch schedule reload messages when users make changes through the UI. The DynamicTaskController now injects the message bus and triggers reloads on create/update/delete operations:
public function create(Request $request): Response
{
    // ... form handling ...

    if ($form->isSubmitted() && $form->isValid()) {
        $this->entityManager->persist($task);
        $this->entityManager->flush();

        $this->messageBus->dispatch(new ScheduleReloadMessage());

        return $this->redirectToRoute('dynamic_task_index');
    }
}
Now both programmatic changes (task failures) and user-driven changes (web interface updates) use the same mechanism for triggering schedule reloads. Consistent, predictable, and no transaction timing issues.
Bonus features that fell out for free
Once we had the message bus approach working for failure handling, a few other features became trivial to implement. The infrastructure was already there - we just needed to wire up a few more use cases.
Delete tasks properly: We'd had a task listing interface but no way to actually delete tasks that were no longer needed. Adding a delete action to the controller was straightforward:
public function delete(DynamicTaskMessage $task): Response
{
    $this->entityManager->remove($task);
    $this->entityManager->flush();

    $this->messageBus->dispatch(new ScheduleReloadMessage());

    return $this->redirectToRoute('dynamic_task_index');
}
Delete the task, flush the change, tell the scheduler to reload. The deleted task disappears from the running schedule within seconds.
Ad-hoc task execution: Sometimes you want to run a task immediately for testing or troubleshooting, rather than waiting for its next scheduled time. Since we already had the message infrastructure, this was just a matter of dispatching a TaskMessage directly:
public function run(DynamicTaskMessage $task): Response
{
    $taskMessage = new TaskMessage(
        $task->getType(),
        $task->getId(),
        $task->getMetadata() ?? []
    );

    $this->messageBus->dispatch($taskMessage);

    $this->addFlash('success', 'Task execution requested');

    return $this->redirectToRoute('dynamic_task_index');
}
Add a "Run Now" button to the task listing, and users can trigger immediate execution without disrupting the normal schedule. Handy for testing new tasks or dealing with one-off requirements.
What wasn't immediately obvious to me (Claudia needed to point it out) is that these scheduled task classes I've got are just Symfony Message / MessageHandler classes. They work just as well like this in a "stand-alone" fashion as they do being wrangled by the scheduler. Really handy.
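As a quick illustration of that stand-alone usage, here's a hypothetical console command that dispatches one directly - the command name and hard-coded arguments are made up for the example, and TaskMessage's constructor follows the controller snippet above:

use App\Message\TaskMessage; // namespace is an assumption
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsCommand(name: 'app:task:run', description: 'Dispatch a task message directly')]
class RunTaskCommand extends Command
{
    public function __construct(private readonly MessageBusInterface $messageBus)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // Exactly the message the scheduler would build - no scheduler involved
        $this->messageBus->dispatch(new TaskMessage('null', 1, []));

        return Command::SUCCESS;
    }
}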
NullTaskHandler for testing: We added a NullTaskHandler that does absolutely nothing except log that it ran:
class NullTaskHandler extends AbstractTaskHandler
{
    protected function handle(DynamicTaskMessage $task): string
    {
        // Deliberately do nothing
        return 'NULL task completed successfully (did nothing)';
    }
}
Perfect for testing the scheduling system without any side effects. Create a "null" task, set it to run every 30 seconds, and watch the logs to verify everything's working properly. You can see tasks being scheduled, executed, and tracked without worrying about the task logic itself.
All of these features required minimal additional code because the core message bus infrastructure was already in place. Sometimes building the right foundation pays dividends in unexpected ways.
Claudia's summary: When the simple solution is staring you in the face
Right, Adam's asked me to reflect on this whole exercise. What started as "just add some failure counting" turned into a proper lesson in when to stop fighting the framework and start working with it.
The most striking thing about this debugging saga was how we got tunnel vision on making Doctrine events work. We spent ages trying to solve the transaction timing problem, then the entity filtering problem, then considering all sorts of hacky workarounds. When the answer was sitting right there in the message bus we'd already built.
It's a perfect example of the sunk cost fallacy in technical problem-solving. We'd invested time in the Doctrine event approach, so we kept trying to make it work rather than stepping back and asking "what would we do if we were designing this from scratch?"
The breakthrough came when we stopped thinking about the technical details (transaction boundaries, event timing, entity lifecycle) and started thinking about what we were actually trying to accomplish: send a message from one part of the system to another saying "something changed, please react accordingly". That's literally what message buses are designed for.
The resulting solution is cleaner than what we started with. No more Doctrine events trying to coordinate cross-process communication. No more transaction timing issues. Just straightforward message dispatch that works with Symfony's natural request/response cycle.
Sometimes the obvious solution really is the best one - if you can stop yourself from overthinking it long enough to see it.
Adam's bit
(also written by Claudia this time… a bit cheeky of her ;-)
Building robust failure handling taught me something important about production systems: the edge cases aren't really edge cases. Tasks will fail. Networks will be unreliable. External APIs will go down at the worst possible moment. Building a scheduling system without failure handling is like building a car without brakes - it might work fine until you actually need to stop.
The message bus approach solved our immediate problem, but it also gave us a better foundation for future features. Need to send notifications when tasks fail? Dispatch a message. Want to collect metrics about task performance? Another message. Need to coordinate with external systems? You get the idea.
Most importantly, we learned when to stop being clever. The Doctrine event approach felt sophisticated - using the framework's lifecycle hooks to automatically coordinate system state. But sophisticated isn't always better. Sometimes the straightforward solution that everyone can understand and debug is worth more than the clever solution that feels elegant.
Our scheduling system now handles failures gracefully, gives users control over task execution, and has a clean architecture that's easy to extend. Not bad for an afternoon's work, once we stopped overthinking it.
Righto.
--
Adam