When I started writing Perfecty Push Notifications for WordPress, a self-hosted push server in PHP built on web-push-php, I decided to target almost every WordPress server on the planet (shared hosts, VPSs, dedicated hosts). It was important to offer a plugin that worked out of the box in almost any installation while still giving a good user experience.
Claiming a 2300% performance improvement sounds extravagant, and you could say "yeah, it was a bad design from the beginning, that's why". Still, I'll explain the reasons behind it and invite you to read the whole post; maybe there will be some takeaways for you, especially if you like to build side projects as I do :)
Everything started with the end-user in mind
Sending thousands of notifications via a web form that blocks for minutes on a never-loading page while it does all the processing is awful, so I decided to use background jobs to send the notifications.
There are sophisticated ways to do background processing in WordPress, like the Action Scheduler used in WooCommerce, which has automatic adjustments:
[...] Action Scheduler will only process actions in a request until:
- 90% of available memory is used
- processing another 3 actions would exceed 30 seconds of total request time, based on the average processing time for the current batch
- in a single concurrent queue
However, it was particularly problematic in my case because it requires the website to be reachable by its own cron system over the internet, which is not always granted by default in some server configurations.
Considering that WordPress has its own cron system, WP-Cron, which is well supported in the vast majority of installations, I decided to start with it instead. It is built in, and there are tons of documentation resources and help available online for future end users. The only concern was that it relies on web traffic to trigger the jobs; however, it can be adjusted so that wp-cron.php is executed by a system cron, so it was not a big problem.
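For reference, that adjustment is the standard one: disable the traffic-triggered behavior in wp-config.php and call wp-cron.php from a system cron instead (the path and interval below are illustrative):

```php
<?php
// In wp-config.php: stop WP-Cron from being triggered by page visits
define( 'DISABLE_WP_CRON', true );

// Then in the system crontab, trigger wp-cron.php on a fixed schedule,
// e.g. every 5 minutes (adjust the path to your WordPress installation):
//
//   */5 * * * * php /var/www/html/wp-cron.php >/dev/null 2>&1
```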
I implemented simple background processing in batches, with low default parameters so that the plugin worked out of the box in almost any installation. The drawback was that it was not fast enough for highly demanding websites from the beginning; however, I planned to solve that in future iterations, because what mattered was having an MVP that showcased its value.
Highlights of the original mechanism:
- The `batch_size` setting defined the total number of notifications sent in one WP-Cron execution. It was adjustable and defaulted to 30 notifications per job execution.
- After sending a `batch_size` number of notifications, the job would auto-schedule itself to send the next batch of notifications.
- It used the default `batchSize = 1000` parameter from web-push-php; however, the `batch_size` setting from Perfecty Push would still limit it.
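To see why this gets slow at scale, here is a minimal sketch (plain PHP, illustrative numbers) of how the cursor advances by `batch_size`, with one WP-Cron execution per batch:

```php
<?php
// Illustrative sketch: one WP-Cron execution per batch of batch_size users.
// With 10,000 users and batch_size = 30, that's over 300 cron executions,
// each paying the scheduling/startup overhead between runs.
$total_users = 10000;
$batch_size  = 30;

$cursor     = 0;
$executions = 0;
while ( $cursor < $total_users ) {
	$sent_in_batch = min( $batch_size, $total_users - $cursor );
	$cursor       += $sent_in_batch; // the persisted last_cursor in the real plugin
	$executions++;                   // one scheduled cron event each, in the real plugin
}

echo $executions . "\n"; // 334 executions for 10,000 users
```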
The code at that time looked like this:
```php
public static function execute_broadcast_batch( $notification_id ) {
	Log::info( 'Executing batch for job id=' . $notification_id );

	$notification = Perfecty_Push_Lib_Db::get_notification( $notification_id );
	if ( ! $notification ) {
		Log::error( "Notification job $notification_id was not found" );
		return false;
	}

	// if it has been taken but not released, that means a wrong state
	if ( $notification->is_taken ) {
		Log::error( 'Halted, notification job already taken, notification_id: ' . $notification_id );
		return false;
	}

	// we check if it's a valid status
	// we only process running or scheduled jobs
	if ( $notification->status !== Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_SCHEDULED &&
		$notification->status !== Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_RUNNING ) {
		Log::error( 'Halted, received a job with an invalid status (' . $notification->status . '), notification_id: ' . $notification_id );
		return false;
	}

	// this is the first time we get here so we mark it as running
	if ( $notification->status == Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_SCHEDULED ) {
		Log::info( 'Marking job id=' . $notification_id . ' as running' );
		Perfecty_Push_Lib_Db::mark_notification_running( $notification_id );
	}
	Perfecty_Push_Lib_Db::take_notification( $notification_id );

	// we get the next batch, starting from $last_cursor we take $batch_size elements
	// we only fetch the active users (only_active)
	$users = Perfecty_Push_Lib_Db::get_users( $notification->last_cursor, $notification->batch_size, 'created_at', 'desc' );
	if ( count( $users ) == 0 ) {
		Log::info( 'Job id=' . $notification_id . ' completed, released' );
		$result = Perfecty_Push_Lib_Db::mark_notification_completed_untake( $notification_id );
		if ( ! $result ) {
			Log::error( "Could not mark the notification job $notification_id as completed" );
			return false;
		}
		return true;
	}

	// we send one batch
	$result = self::send_notification( $notification->payload, $users );
	if ( is_array( $result ) ) {
		$notification = Perfecty_Push_Lib_Db::get_notification( $notification_id );
		$total_batch  = $result[0];
		$succeeded    = $result[1];

		$notification->last_cursor      += $total_batch;
		$notification->succeeded        += $succeeded;
		$notification->is_taken          = 0;
		$notification->last_execution_at = current_time( 'mysql', 1 );
		$result = Perfecty_Push_Lib_Db::update_notification( $notification );
		Log::info( 'Notification batch for id=' . $notification_id . ' sent. Cursor: ' . $notification->last_cursor . ', Total: ' . $total_batch . ', Succeeded: ' . $succeeded );
		if ( ! $result ) {
			Log::error( 'Could not update the notification after sending one batch' );
			return false;
		}
	} else {
		Log::error( 'Error executing one batch for id=' . $notification_id . ', result: ' . $result );
		Perfecty_Push_Lib_Db::mark_notification_failed( $notification_id );
		Perfecty_Push_Lib_Db::untake_notification( $notification_id );
		return false;
	}

	// execute the next batch
	if ( ! wp_next_scheduled( self::BROADCAST_HOOK, array( $notification_id ) ) ) {
		$result = wp_schedule_single_event( time(), self::BROADCAST_HOOK, array( $notification_id ) );
		Log::info( 'Scheduling next batch for id=' . $notification_id . ' . Result: ' . $result );
	} else {
		Log::warning( "Don't schedule next batch, it's already scheduled, id=" . $notification_id );
	}
	return true;
}
```
Good for a first version, but problematic
However, the above code is slow, and during my initial tests there was a problem with how the `batchSize` parameter from web-push-php worked. This parameter defines the size of the batches used during flushing, which is done with asynchronous HTTP requests. You can see those batches as concurrent requests, and they can create high spikes in memory and CPU usage, which can cause weird errors in downstream components like:
```
[14-Dec-2020 19:41:36] WARNING: [pool website.com]
child 4593 said into stderr: "PHP message: ERROR |
Failed to send one notification, error:
cURL error 60: Issuer certificate is invalid.
(see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
for https://fcm.googleapis.com/fcm/send/XXXXXXXXXXXXXXXXXXXXXXX"
```
Although I initially suspected problems with cURL and the certificates, it was actually my server that couldn't handle more than 300 concurrent notifications with those specs. So, instead of tweaking the `batchSize` parameter from the web-push-php lib, I adjusted the `batch_size` value in my plugin and never used a value higher than 250 in my production-like environment.
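To make the distinction concrete: `batchSize` (web-push-php) controls how many asynchronous HTTP requests are flushed at once, while `batch_size` (Perfecty Push) caps how many notifications enter a flush in the first place. A rough, HTTP-free sketch of the flushing waves:

```php
<?php
// Conceptual sketch (no real HTTP): how batchSize groups queued
// notifications into waves of concurrent requests during flushing.
$queued     = range( 1, 230 ); // 230 notifications queued for flushing
$batch_size = 50;              // concurrent requests per wave

$waves = array_chunk( $queued, $batch_size );
foreach ( $waves as $i => $wave ) {
	// each wave is sent as concurrent async requests in web-push-php;
	// bigger waves mean higher memory/CPU spikes on the server
	printf( "wave %d: %d concurrent requests\n", $i + 1, count( $wave ) );
}
// 230 queued with waves of 50 → 5 waves: 50, 50, 50, 50, 30
```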
For a fresh installation of the plugin, the default value of `batch_size = 30` had a decent throughput: it took around 3 minutes to send 1,000 notifications, acceptable if you want to send push notifications for free. So I launched it:
However, as some websites started to have more than 10,000 users, the plugin was taking more than 23 minutes to complete the batch processing, and it was noticeably slow. I needed to do something...
2300% faster
Recently I published the `v1.4.0` version with performance improvements that make the plugin 2300% faster. It can now send more than 13,000 notifications in 56 seconds on a very basic server with 2 GB RAM and 2 vCPUs, a huge gain compared to the 23 minutes it was taking before: 2300% faster.
The server load after the improvements looked like this:
Highlights of the new mechanism:
- The execution of multiple batches is done in a single cron job (before, it used one cron job per batch), which removed the wasted time between cron job executions (~5 s to 10 s). If it can send all notifications at once, it will.
- The execution is split across subsequent cron jobs if it is taking more than 80% of the maximum execution time. This splitting is avoided entirely if the script has no time limit or a very high one.
- The `batch_size` setting now defaults to 1,500; before it was 30, and it caused weird issues with values higher than 250. With this mechanism I've used values like 20,000 and it works smoothly :)
- The parallel flushing (`batchSize` from web-push-php) is now adjustable and defaults to a low value (50) to avoid the weird cURL issues mentioned above. It can be increased on servers with better specs.
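The 80% check can be implemented with plain PHP timing. Here is a minimal sketch; the helper name and threshold mirror the description above, but the plugin's actual implementation may differ:

```php
<?php
// Sketch of the time-limit check described above (the plugin's actual
// implementation may differ). Returns true when the elapsed time exceeds
// 80% of PHP's max_execution_time; with max_execution_time = 0 (no limit,
// the usual CLI default) it never asks to split the execution.
function time_limit_exceeded( $start_time, $threshold = 0.8 ) {
	$max_execution_time = (int) ini_get( 'max_execution_time' );
	if ( 0 === $max_execution_time ) {
		return false; // no limit: send everything in this cron cycle
	}
	$elapsed = microtime( true ) - $start_time;
	return $elapsed >= $threshold * $max_execution_time;
}
```

Inside the batch loop, the caller breaks out and reschedules the remaining work when this returns true.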
If you want to take a look at the code, here it is:
```php
public static function execute_broadcast_batch( $notification_id ) {
	Log::info( 'Executing batch for job id=' . $notification_id );

	$notification = Perfecty_Push_Lib_Db::get_notification( $notification_id );
	if ( ! $notification ) {
		Log::error( "Notification job $notification_id was not found" );
		return false;
	}

	// if it has been taken but not released, that means a wrong state
	if ( $notification->is_taken ) {
		Log::error( 'Halted, notification job already taken, notification_id: ' . $notification_id );
		return false;
	}

	// we check if it's a valid status
	// we only process running or scheduled jobs
	if ( $notification->status !== Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_SCHEDULED &&
		$notification->status !== Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_RUNNING ) {
		Log::error( 'Halted, received a job with an invalid status (' . $notification->status . '), notification_id: ' . $notification_id );
		return false;
	}

	// this is the first time we get here so we mark it as running
	if ( $notification->status == Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_SCHEDULED ) {
		Log::info( 'Marking job id=' . $notification_id . ' as running' );
		Perfecty_Push_Lib_Db::mark_notification_running( $notification_id );
	}
	Perfecty_Push_Lib_Db::take_notification( $notification_id );

	// we get the next batch, starting from $last_cursor we take $batch_size elements
	// we only fetch the active users (only_active)
	$total_succeeded = 0;
	$cursor          = $notification->last_cursor;
	$start_time      = microtime( true );
	while ( true ) {
		$users  = Perfecty_Push_Lib_Db::get_users( $cursor, $notification->batch_size );
		$count  = count( $users );
		$cursor = $cursor + $count;

		if ( $count == 0 ) {
			Log::info( 'Job id=' . $notification_id . ' completed, released' );
			$result = Perfecty_Push_Lib_Db::mark_notification_completed_untake( $notification_id );
			if ( ! $result ) {
				Log::error( "Could not mark the notification job $notification_id as completed" );
				break;
			}
			break;
		}

		$succeeded = self::send_notification( $notification->payload, $users );
		if ( $succeeded !== 0 ) {
			Log::info( "Completed batch, successful: $succeeded, cursor: $cursor" );
			$total_succeeded += $succeeded;
		} else {
			Log::error( 'Error executing one batch for id=' . $notification_id );
			Perfecty_Push_Lib_Db::mark_notification_failed( $notification_id );
			Perfecty_Push_Lib_Db::untake_notification( $notification_id );
			break;
		}

		// check that we don't exceed 80% of max_execution_time
		// in case we do, we split the execution to a next cron cycle to avoid the termination of the script
		// if max_execution_time=0, we never split
		if ( self::time_limit_exceeded( $start_time ) ) {
			Log::warning( 'Time execution is reaching 80% of max_execution_time, moving to next cycle' );
			break;
		}
	}

	if ( $total_succeeded != 0 ) {
		$notification = Perfecty_Push_Lib_Db::get_notification( $notification_id );

		$notification->last_cursor       = $cursor;
		$notification->succeeded        += $total_succeeded;
		$notification->is_taken          = 0;
		$notification->last_execution_at = current_time( 'mysql', 1 );
		$result = Perfecty_Push_Lib_Db::update_notification( $notification );
		Log::info( 'Notification cycle for id=' . $notification_id . ' sent. Cursor: ' . $notification->last_cursor . ', Succeeded: ' . $total_succeeded );
		if ( ! $result ) {
			Log::error( 'Could not update the notification after sending one batch' );
			return false;
		}

		if ( $notification->status === Perfecty_Push_Lib_Db::NOTIFICATIONS_STATUS_RUNNING ) {
			// execute the next batch
			if ( ! wp_next_scheduled( self::BROADCAST_HOOK, array( $notification_id ) ) ) {
				$result = wp_schedule_single_event( time(), self::BROADCAST_HOOK, array( $notification_id ) );
				Log::info( 'Scheduling next batch for id=' . $notification_id . ' . Result: ' . $result );
			} else {
				Log::warning( "Don't schedule next batch, it's already scheduled, id=" . $notification_id );
			}
		}
	}
	return true;
}
```
The process of tuning the plugin's performance on a WordPress site with this new mechanism is well described in the official Perfecty Push documentation: https://docs.perfecty.org/wp/performance-improvements/
Conclusion
Of course, this time can be lowered even further by adjusting the web server limits (memory limit or the FPM params), increasing the server specs, or moving other components off the server if they currently share it (mail server, metrics server, external admin panel, etc.). The point is that with the new approach, it's easier for end users to tune it and achieve a much faster push server.
It also shows that working in iterations helps demonstrate the product's value to end users from the beginning, and that it's better to release a good working version on time than to get stuck forever chasing absolute perfection. Perfection takes time and a couple of iterations.
Photos
Photo by Jean Gerber on Unsplash
Photo by ray sangga kusuma on Unsplash
Photo by Josh Calabrese on Unsplash