
Every website owner wants their content delivered fast. Slow-loading pages frustrate visitors and hurt search rankings. Your web server sits right in the middle of this equation. When something breaks there, users notice immediately. That is why NGINX monitoring matters more than most people think. You cannot fix what you cannot see.
NGINX started as a simple HTTP server. Today it powers millions of busy websites. It works as a reverse proxy, handles load balancing, and caches content efficiently. But even the best software needs watching. This guide walks through everything about monitoring NGINX properly. You will learn which metrics actually matter, how to collect them, and which tools make the job easier.
Why Nginx Monitoring Matters
NGINX handles a lot of traffic silently in the background. Most problems do not announce themselves. Your site might slow down gradually. Users might start seeing random errors. Without monitoring, you stay blind until complaints roll in.
Monitoring gives you visibility. You see exactly what NGINX does at any moment. Are connections piling up? Is the server dropping requests? These questions get answered before customers feel pain.
The business impact is real: dropping 10 percent of requests on an e-commerce site means lost revenue, and a news site that slows down during peak hours loses readers. Monitoring helps prevent both scenarios. You detect problems early, not after they escalate.
NGINX acts as a reverse proxy for many companies. It sits in front of application servers, databases, and microservices. When the proxy fails, everything behind it becomes unreachable. Monitoring your proxy layer protects your whole infrastructure.
Load balancing makes NGINX even more useful. Traffic gets distributed across multiple backend servers. But load balancing only works when every part functions correctly. If one backend fails and NGINX keeps sending traffic there, users experience errors. Monitoring upstream servers prevents this problem.
Reliability depends on data. You need to know request rates, error percentages, and response times. These numbers tell you when to scale up, when to fix code, or when to add more servers.
Security also benefits from monitoring. Unusual traffic patterns might indicate an attack. A sudden spike in 5xx errors could mean someone is probing your systems. Monitoring gives you early warning signs.
Key Nginx Metrics to Monitor
Raw numbers tell the real story. NGINX exposes many metrics through its status interface. Some metrics matter more than others. Focusing on the wrong numbers wastes time. Here is what actually deserves attention.
Basic Activity Metrics
These show what NGINX does right now. Think of them as vital signs for your server.
- Accepts metric counts how many connections NGINX has accepted in total since the server started running. This number keeps growing steadily over time under normal conditions. A flat line across this metric means no traffic is arriving, which could indicate a network problem somewhere between your users and the server;
- Handled shows the number of connections NGINX successfully processed. Under normal operating conditions, the Handled value stays very close to the Accepts value. A growing gap between the two means NGINX accepted connections but dropped them before processing, usually because a resource limit such as worker_connections was reached;
- Active connections tells you exactly how many connections are open and being processed right now. High numbers are not automatically a bad sign, as busy servers naturally maintain many active connections. However, sudden spikes without corresponding increases in traffic deserve immediate investigation;
- Waiting connections represent idle keep-alive connections that remain open for future requests. These are completely fine and actually improve efficiency. They show clients keeping connections open for subsequent requests rather than constantly reconnecting, which reduces overhead;
- Reading and Writing represent connections that are actively receiving requests from clients or sending responses back to them. These values should stay low on a healthy server with reasonable response times. Persistent high reading counts might indicate slow clients that cannot receive data quickly or a potential attack in progress;
- Total requests count every single request NGINX has processed since the server started up. Watching this growth rate over hours, days, and weeks helps you predict future traffic patterns and plan for capacity upgrades;
- Idle connections are very similar to waiting connections in what they represent. Too many idle connections might suggest that clients are hanging up improperly or that timeout values are set too high.
Error Metrics
Errors tell you when something breaks. Different errors mean different problems.
- Dropped connections happen when NGINX cannot accept more connections. The operating system queue fills up. Your server simply cannot handle the load. This is a serious warning sign;
- 4xx errors come from client problems. Missing pages, bad authentication, malformed requests. A sudden increase might indicate broken links or someone scanning your site;
- 5xx errors come from server problems. Your application crashed. The database timed out. The reverse proxy could not reach the backend. These require immediate attention;
- Server error rate combines all errors into a percentage. Anything above 1 percent needs investigation. Above 5 percent is an emergency.
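As a concrete illustration, a few lines of awk are enough to turn an access log into an error-rate number. This is a minimal sketch: it assumes the default combined log format, where the status code is the ninth whitespace-separated field, and uses sample log lines in place of the real file.

```shell
# Compute the 5xx error rate from an access log with awk.
# Assumes the default "combined" log format (status = field 9).
# In real use, point awk at /var/log/nginx/access.log instead of the here-doc.
report=$(awk '
  { total++; if ($9 ~ /^5/) errors++ }
  END {
    rate = (total > 0) ? errors * 100 / total : 0
    printf "5xx rate: %.2f%% (%d of %d requests)", rate, errors, total
  }' <<'EOF'
127.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
127.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /api HTTP/1.1" 502 0 "-" "curl/8.0"
127.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
127.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
EOF
)
echo "$report"
```

Run over a rolling window (say, the last five minutes of logs), this single percentage is enough to drive the 1 percent and 5 percent thresholds above.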
Performance Metrics
Speed matters. Slow responses drive users away.
- Requests per second measures throughput. This fluctuates with traffic. Watch for unexpected drops, which might indicate connection limits or resource exhaustion;
- Request processing time measures how long NGINX takes to handle each request. Slow processing points to backend problems, slow disk I/O, or application code issues.
Upstream Server Metrics
When NGINX works as a reverse proxy or handles load balancing, upstream servers become critical.
- Active connections by the upstream server shows load distribution. One server handling significantly more connections than others suggests poor load balancing or a server struggling;
- 5xx codes by upstream server identifies failing backends. If one server produces 90 percent of errors, that server needs attention;
- Available servers per upstream group tracks health. Load balancing only works when multiple servers respond. Watch for servers dropping out of rotation.
Resource Metrics
NGINX itself consumes system resources. Tracking these helps with capacity planning.
- Resource usage includes CPU, memory, disk, and network components. Elevated CPU levels could indicate excessive SSL handshakes or intricate rewrite rules. While uncommon, memory leaks can occur with custom modules;
- Work throughput refers to the volume of data transmitted and received by NGINX. By comparing throughput with request rates, you can determine the average size of responses;
- Work errors capture issues occurring inside NGINX worker processes. Any crashes or failed operations in this area demand prompt troubleshooting.
How to Monitor NGINX
Getting metrics out of NGINX takes configuration. The method depends on which version you run. Open source NGINX differs from NGINX Plus. Both offer monitoring capabilities, just at different levels of detail.
Using the Status Page with Open Source NGINX
Open source NGINX includes a simple status module called ngx_http_stub_status_module. This module is not enabled by default. You need to compile it in or use a prebuilt package that includes it.
Check if your NGINX has it:
```shell
nginx -V 2>&1 | grep http_stub_status_module
```
If you see --with-http_stub_status_module in the output, you are good. Otherwise you need to rebuild NGINX or install a package that includes the module.
Enable the status page in your configuration:
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```
The allow and deny lines restrict access. Exposing this page publicly gives attackers useful information. Restrict it to localhost or your monitoring server IP.
After reloading NGINX, visit http://your-server/nginx_status. You see output like:
```text
Active connections: 42
server accepts handled requests
 1234567 1234567 9876543
Reading: 0 Writing: 3 Waiting: 39
```
This text format is simple to parse. Many monitoring tools read this directly.
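To show just how simple, here is a sketch that extracts the key numbers with awk; the sample text stands in for a live fetch of the status URL.

```shell
# Parse stub_status output into individual metrics.
# In real use, replace the sample with: status=$(curl -s http://127.0.0.1/nginx_status)
status='Active connections: 42
server accepts handled requests
 1234567 1234560 9876543
Reading: 0 Writing: 3 Waiting: 39'

active=$(printf '%s\n' "$status" | awk '/Active connections/ {print $3}')
set -- $(printf '%s\n' "$status" | awk 'NR == 3 {print $1, $2, $3}')
accepts=$1; handled=$2; requests=$3
dropped=$((accepts - handled))   # any gap here means dropped connections
echo "active=$active accepts=$accepts handled=$handled dropped=$dropped"
```

Note the sample deliberately shows a small accepts-handled gap; a script like this makes that gap visible as a single number you can alert on.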
Getting a JSON Interface
Stub status outputs plain text. For JSON, you need extra work: a small script or an exporter can convert the stub output to JSON. NGINX Plus includes the ngx_http_api_module, which serves JSON natively. Enable it with:

```nginx
location /api {
    api write=off;
    allow 127.0.0.1;
    deny all;
}
```
Then access http://your-server/api for structured metrics.
Using Access Logs for Additional Metrics
Access logs contain request details. Each log line includes response time, status code, bytes sent, and request path. Parsing these logs gives you metrics not available from the status page.
Configure custom log formats:
```nginx
log_format detailed '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '$request_time $upstream_response_time';

access_log /var/log/nginx/access.log detailed;
```
The $request_time variable logs total request processing time. $upstream_response_time logs backend response time. Comparing these shows where delays happen.
Log parsing at scale requires tools. GoAccess reads logs and generates real-time reports. mtail extracts metrics from logs for monitoring systems.
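Before reaching for those tools, a short shell pipeline can approximate percentiles from the detailed format above. This sketch pulls $request_time (the second-to-last field in that format) from sample log lines and estimates the 95th percentile:

```shell
# Estimate the 95th-percentile request time from the "detailed" log format.
# $request_time is the second-to-last field; the here-doc holds sample lines
# and stands in for /var/log/nginx/access.log.
p95=$(awk '{print $(NF - 1)}' <<'EOF' | sort -n | awk '{a[NR] = $1} END {i = int(0.95 * NR); if (i < 0.95 * NR) i++; print a[i]}'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 0.012 0.010
10.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /api HTTP/1.1" 200 2048 0.250 0.240
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512 0.034 0.030
10.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "GET /slow HTTP/1.1" 200 9000 1.700 1.690
EOF
)
echo "p95_request_time=${p95}s"
```

Percentiles matter here because the average of these four samples would hide the 1.7-second outlier entirely.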
Monitoring Upstream Servers
When NGINX load balances to upstream servers, you need visibility into each backend. Open source NGINX performs passive health checks through the max_fails and fail_timeout parameters. Active health checks via the health_check directive (ngx_http_upstream_hc_module) are an NGINX Plus feature. Configure passive checks like this:

```nginx
upstream backend {
    server app1.example.com max_fails=3 fail_timeout=30s;
    server app2.example.com max_fails=3 fail_timeout=30s;
}
```

With this configuration, NGINX stops sending traffic to a server for 30 seconds after three failed attempts. On NGINX Plus, the API additionally exposes each upstream server’s status, failure counts, and response times.
NGINX Plus Differences
NGINX Plus includes much more monitoring capability out of the box. You get a built-in JSON interface with dozens of metrics. The status dashboard shows connection states, cache performance, SSL metrics, and upstream health.
NGINX Plus also tracks worker connections per process. This helps debug connection limits and worker process balancing.
Keep-alive metrics show how well connections are reused. Low keep-alive rates increase overhead.
HTTP/2 metrics display stream counts, concurrent streams, and connection efficiency. HTTP/2 multiplexing changes how you think about connection counts.
8 Common Nginx Monitoring Pitfalls
Even experienced teams make mistakes with monitoring. Avoiding these traps saves headaches later.
- Monitoring only averages. Average response time hides problems. Fifty percent of requests could be fast while the other half time out. Look at percentiles. The 99th percentile tells you what slow users actually experience;
- Ignoring the gap between accepted and handled. A small gap means dropped connections. Many teams overlook this. They see active connections and think everything is fine. That gap represents lost traffic;
- Forgetting about upstream health. Your NGINX server might run perfectly while all your backends are failing. Watch upstream metrics just as closely as local metrics;
- Collecting too many metrics. Thousands of data points create noise. You cannot watch everything. Focus on actionable metrics. Add more only when specific problems require them;
- Not setting baselines. A sudden change means nothing without a normal range. Record what your metrics look like during regular operation. Then deviations become obvious;
- Exposing status pages publicly. Unauthenticated status endpoints leak information. Attackers see your traffic volume, connection counts, and backend health. Always restrict access;
- Using default log formats. Default logs miss request times and upstream response times. These fields are essential for debugging performance;
- Testing from inside only. Internal monitoring shows your network’s view. Users see something different. Supplement internal checks with external monitoring from multiple locations.
Nginx Monitoring Best Practices
Good monitoring takes real thought and planning. These practices separate effective setups from noisy ones that generate alerts without providing value.
Watch handled connections, not just errors
Many teams focus exclusively on HTTP error codes when setting up alerts. This approach misses a critical failure mode. When Handled stops keeping pace with Accepts, NGINX is accepting connections but dropping them before they are processed, usually because a resource limit such as worker_connections has been exhausted. This problem is actually more serious than a few 5xx errors because it means your server cannot service the traffic it receives.
Track worker limits
Each worker process in NGINX has a fixed maximum number of connections it can handle simultaneously. Hitting this limit causes request queuing and slow responses across your entire application. You should track worker_connections usage actively to know when you are approaching this boundary.
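A rough headroom check can be scripted against these two values. The numbers below are illustrative stand-ins; in practice you would read them from nginx -T output and the stub_status page:

```shell
# Rough headroom check: active connections vs. worker_processes * worker_connections.
# Values below are illustrative; a live version could read the config with, e.g.:
#   workers=$(nginx -T 2>/dev/null | awk '/^worker_processes/ {gsub(/;/, ""); print $2}')
workers=4                # from: worker_processes 4;
worker_connections=1024  # from: events { worker_connections 1024; }
active=3500              # from the stub_status "Active connections" line

capacity=$((workers * worker_connections))
usage=$((active * 100 / capacity))
echo "using ${usage}% of ${capacity} connection slots"
```

Alerting somewhere around 80 percent of capacity gives you time to raise the limit or add servers before requests start queuing.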
Separate static and dynamic metrics
Static files like images, CSS, and JavaScript should serve in milliseconds under almost any condition. Dynamic requests that hit application code or databases naturally take longer to process. Mixing these two categories together hides problems entirely. A slowdown in static content delivery is a very different problem from a slowdown in dynamic content generation.
Use different alerts for 4xx and 5xx errors
4xx errors come from client mistakes like requesting missing pages or sending bad input. These rarely need immediate action or paging a human operator. 5xx errors come from server-side failures and often do require urgent attention. You should alert on 5xx error spikes within minutes while keeping 4xx alerts configured at much higher thresholds.
Track upstream availability
Tracking upstream server availability as a percentage reveals the real story behind your backend health. Ninety-nine percent available might sound good at first glance, but that number actually means your backend is down for more than seven hours per month. For critical services, you should target 99.9 percent or higher.
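The arithmetic is worth making explicit. This one-liner converts an availability percentage into expected downtime per 30-day month:

```shell
# Convert an availability percentage into expected downtime per 30-day month
# (43,200 minutes). The 99.0 value is illustrative; substitute your own.
avail=99.0
summary=$(awk -v a="$avail" 'BEGIN {
  down = 43200 * (100 - a) / 100
  printf "%.1f%% available -> %.0f minutes (%.1f hours) down per month", a, down, down / 60
}')
echo "$summary"
```

Rerunning it with 99.9 shows the difference a single extra nine makes: about 43 minutes per month instead of seven-plus hours.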
Monitor from multiple locations
Your internal monitoring server might have a clean, fast network path to NGINX while your users route through congested networks full of problems. External checks from different geographical areas catch routing issues, peering problems, and regional outages that internal monitoring would never detect.
Automate your baselines
Manual threshold setting inevitably fails as traffic patterns change over time. What looks like a spike during a quiet period might be completely normal during peak hours. Use monitoring tools that learn normal ranges automatically and adjust their alerting thresholds based on observed behavior.
Look at the whole system
High request processing time could mean NGINX itself is slow and needs investigation. But it could also mean your disk is completely full, preventing log writes. Or it could indicate your network link has become saturated. You must look at the whole picture rather than focusing on NGINX metrics in isolation.
Check TLS handshake times
SSL and TLS handshakes are computationally expensive operations that consume significant CPU resources. Slow handshake times point directly to certificate problems, OCSP lookup issues, or CPU exhaustion on your server. Tracking this separately from other metrics helps you identify the specific bottleneck.
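curl's timing variables give a quick way to isolate the handshake: time_connect marks the end of the TCP connect and time_appconnect the end of the TLS handshake, so their difference is the TLS cost. The sample numbers below stand in for a live measurement:

```shell
# Isolate the TLS handshake cost with curl timing variables.
# Live measurement (requires network; your-domain.com is a placeholder):
#   curl -s -o /dev/null -w '%{time_connect} %{time_appconnect}\n' https://your-domain.com/
timings='0.021 0.093'   # time_connect time_appconnect, in seconds (illustrative)
set -- $timings
tls=$(awk -v c="$1" -v s="$2" 'BEGIN {printf "%.3f", s - c}')
echo "tls_handshake=${tls}s"
```

Graphing this number over time separates certificate and CPU problems from general network slowness.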
Keep one year of data
Seasonal traffic patterns repeat reliably year after year. Last year’s Black Friday data helps you plan capacity for this year’s Black Friday. Without at least twelve months of historical data, you will keep rediscovering the same patterns instead of proactively preparing for them.
Top 5 Nginx Monitoring Tools
Several tools excel at NGINX monitoring. Each takes a different approach. Your choice depends on budget, team size, and existing infrastructure.
1. Monitor Us

Monitor Us provides a straightforward approach to NGINX monitoring. The platform collects metrics directly from your stub status page or API endpoint. You get prebuilt dashboards showing active connections, requests per second, and error rates without writing queries.
Setup takes minutes. Point Monitor Us to your http://server/nginx_status URL. The system starts collecting immediately. Alerts come through email, Slack, or PagerDuty.
Monitor Us excels at external monitoring. It checks your NGINX servers from multiple geographic locations. This catches problems internal monitoring systems miss, like regional outages or routing issues.
The interface is clean and simple. Teams new to monitoring find it accessible. Pricing is transparent and scales with your number of servers.
2. Zabbix

Zabbix handles NGINX monitoring through its HTTP agent. It parses the stub status page and converts metrics into JSON. The template system lets you reuse monitoring configurations across hundreds of servers.
Zabbix excels at correlation. It pulls system metrics alongside NGINX metrics. CPU spikes get compared to request rate increases. Memory usage gets tracked against connection counts.
The open source version costs nothing. Enterprise support is available. Zabbix scales from one server to thousands.
3. Datadog

Datadog offers commercial monitoring with NGINX integration. Their agent collects dozens of NGINX metrics automatically. You get dashboards, alerts, and anomaly detection without configuration headaches.
The pricing model is per host. Large deployments get expensive quickly. For smaller teams, the time savings often justify the cost.
Datadog shines at tracing requests through your entire stack. You see how long NGINX took, how long the application took, and how long the database took. This end to end visibility is hard to replicate with open source tools.
4. NGINX Plus Dashboard

NGINX Plus includes a built-in live activity dashboard. This shows real time metrics without external tools. You see requests per second, connection states, cache hits, and upstream health in one screen.
The dashboard is good for quick checks. For long term trending and alerting, you still need external monitoring. But for day to day operations, the dashboard covers many needs.
5. Sematext

Sematext provides full stack monitoring including NGINX logs and metrics. Their solution parses access logs automatically, extracting request times, status codes, and user agents.
The log shipping agent forwards data to Sematext Cloud. You then search, filter, and alert on log data. Metrics from the stub status page get collected alongside logs.
Sematext works well for teams already using its infrastructure monitoring. The learning curve is gentler than with Prometheus.
Setting Up Your First Monitoring Check
Start simple: enable the stub status page as shown earlier, then set up a script to retrieve and analyze data every minute and write the results to a file.
Next, add an alerting system: send an email when the number of active connections exceeds 1,000 for five minutes, and another when the gap between accepted and handled connections exceeds 1 percent.
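That first check might look something like this sketch. The thresholds match the ones above, the status URL and mail command are placeholders, and the five-minute persistence logic is left out for brevity:

```shell
#!/bin/sh
# Minimal per-minute check (from cron: * * * * * /usr/local/bin/nginx_check.sh).
# STATUS_URL, thresholds, and the alert command are assumptions; adapt them.
STATUS_URL="http://127.0.0.1/nginx_status"
MAX_ACTIVE=1000
MAX_GAP_PCT=1

# status=$(curl -s "$STATUS_URL")   # live fetch; sample output used below
status='Active connections: 1200
server accepts handled requests
 1000000 975000 5000000
Reading: 2 Writing: 40 Waiting: 1158'

active=$(printf '%s\n' "$status" | awk '/Active connections/ {print $3}')
gap_pct=$(printf '%s\n' "$status" | awk 'NR == 3 {print int(($1 - $2) * 100 / $1)}')

alerts=""
[ "$active" -gt "$MAX_ACTIVE" ] && alerts="$alerts active=$active"
[ "$gap_pct" -gt "$MAX_GAP_PCT" ] && alerts="$alerts dropped=${gap_pct}%"

if [ -n "$alerts" ]; then
  # Replace echo with e.g.: mail -s "nginx alert" ops@example.com
  echo "ALERT:$alerts"
fi
```

Swapping the sample text for the curl fetch and adding a small state file to track how long a condition has persisted gets you to the five-minute rule.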
Gradually add more metrics: track requests per second once you understand connection patterns. Add outbound traffic monitoring after setting up load balancing.
Your monitoring system should evolve along with your infrastructure. Start with five metrics, add five more next month, and within a year you’ll have comprehensive coverage.