Nginx client_header_buffer_size - Debugging Cloudflare-related 400 Errors for Magento/Adobe Commerce HTTP Requests
One of the more interesting things I’ve had to debug this year was why Magento
was apparently returning 400 responses to builder.io for certain webhook payloads.
A few of the team had looked into this but drawn a bit of a blank, so I offered to take a look. It ended up being a fairly puzzling one to debug.
The same payload POSTed directly to a locally running version of the application would give the expected 200 response, but when running through its production stack of Cloudflare and into the excellent Corefinity, we suddenly started getting 400 response codes. That was fairly odd, as the custom controller receiving the payload had no code in it to return a 400 response.
My first thought was that Cloudflare was blocking the requests because of either the body content or an OWASP ruleset. Running the requests through in no-proxy mode discounted this as being the issue.
A fairly important piece of debugging was spotting that the nginx log recorded the 400 error as well, which ruled out Cloudflare as the layer returning it.
I added some ancillary logging to fire whenever the controller on the Magento side ran, to determine whether or not the request was even getting that far, and fairly quickly realised that the requests weren’t even reaching Magento for it to return a 400 response.
My thoughts then shifted to either PHP’s post_max_size or perhaps max_input_vars being set too low, so that the payload couldn’t be parsed. However, the additional logging never fired when the 400 was being returned, so that pointed to nginx or varnish being the culprit.
At this point, I trapped a few of the outbound POST requests again and spotted that once they’d been through Cloudflare there were quite a number of headers. I suspected that the issue we were actually running into was that two of nginx’s configuration values were set too low:
client_header_buffer_size
large_client_header_buffers
This turned out to be the case: once these were increased, the 400 errors soon
disappeared. We did also check whether varnish was involved but fairly quickly
ruled that out as being part of the issue. It was solely nginx.
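For reference, here is a minimal sketch of what raising those two values can look like. The sizes are illustrative assumptions, not the values used on this stack, so choose them based on the headers you actually see arriving. Both directives are valid in the http and server contexts.

```nginx
http {
    # Buffer used to read the request line and headers of a typical request.
    # nginx default: 1k. The 4k here is an assumed example value.
    client_header_buffer_size 4k;

    # Number and size of the larger buffers nginx falls back to when the
    # request line or a header does not fit in the buffer above.
    # nginx default: 4 8k. A single header line must fit in one buffer,
    # otherwise nginx answers with a 400. The 16k here is an assumed example value.
    large_client_header_buffers 4 16k;
}
```

After changing the values, `nginx -t` checks the configuration syntax before you reload.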
If you ever encounter an odd 400 response for a webhook on any host, I can highly recommend that you check those two configuration values first. With a bit of luck, I’ve saved you some debugging.
The best piece of advice I can give you is to isolate the point inside the stack you’re getting to when the 400 is thrown; that will help you narrow down the cause quickly.
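One way to get nginx itself to help with that isolation is sketched below. It is not what was done on this stack. It assumes you can edit the nginx configuration, and the log path is a placeholder. As far as I'm aware, nginx reports header-buffer rejections (messages along the lines of "client sent too long header line") at the info level, which is below the default error level, so they only appear once the log level is lowered.

```nginx
# Assumed log path; adjust to wherever your host keeps nginx logs.
# "info" is more verbose than the default "error" level, so consider
# reverting once the cause of the 400 has been identified.
error_log /var/log/nginx/error.log info;
```

If a matching message appears next to the 400 in the access log, the request was rejected by nginx before it ever reached PHP.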