Jenkins is an integral part of many modern IT infrastructures. It allows organizations to model CI/CD pipelines via code, automate operational workflows, increase efficiency, and reduce the time it takes to test, ship, and deploy applications.
In this article, we share a comprehensive guide to monitoring Jenkins. We discuss why it’s important to monitor Jenkins, cover key performance and health metrics, and explore two of the best monitoring plugins.
Monitoring Jenkins will allow you to predict errors, ensure high availability, optimize your configuration sets, and track the progress of key automation workflows.
Organizations use Jenkins jobs and pipelines to automate various areas of the IT ecosystem. For example, they might have Jenkins pipelines that build and test every commit, run scheduled code analysis, or deploy applications to different environments.
To ensure crucial pipelines and jobs are working optimally, monitoring Jenkins is essential. Pipelines of an unhealthy Jenkins instance might fail or get stuck, leading to delays in testing, analysis, and code deployment. Metrics related to memory, CPU usage, build success rate, and queues can be used to gauge the general health of Jenkins in real time.
Malfunctions can arise due to system errors, issues in any plugins or shared libraries, or bad pipeline code. For example, someone might make a mistake while editing a Jenkinsfile, causing a pipeline to run indefinitely, a bug in a third-party plugin might cause jobs to slow down, or the Jenkins server might be running out of space.
Periodic monitoring helps with timely detection and investigation of such issues and bottlenecks. For example, by setting an alert on your monitoring system, you can get notified as soon as the disk usage on your Jenkins server reaches 80%. This will give you ample time to free up space before it becomes a problem.
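As a minimal sketch of such a check, the Python snippet below measures disk usage on the volume that holds the Jenkins home directory and flags the 80% threshold mentioned above. The path and threshold are assumptions; adjust them for your setup, and replace the print statements with a call to your own alerting tool.

    import shutil

    JENKINS_HOME = "/var/lib/jenkins"  # assumed Jenkins home directory; adjust for your setup
    THRESHOLD = 0.80                   # alert once 80% of the volume is used

    total, used, _free = shutil.disk_usage(JENKINS_HOME)
    usage = used / total

    if usage >= THRESHOLD:
        # Hook this into your notification or monitoring system instead of printing
        print(f"WARNING: disk usage at {usage:.0%} on the volume holding {JENKINS_HOME}")
    else:
        print(f"Disk usage OK: {usage:.0%}")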
A well-monitored system is less likely to go down. Tracking key health metrics equips you with the insights you need to keep Jenkins up and running. Metrics can also be used to contextualize issues for easier debugging. For instance, if a rise in build failures coincides with spikes in Java Virtual Machine (JVM) memory utilization, you can surmise that the Jenkins server is running out of memory.
If you are monitoring the right metrics, you can even identify avenues for improving the performance of the overall IT infrastructure. For example, you may notice that an increase in the number of concurrently executing builds slows down the Jenkins server. To resolve this, you can increase the number of threads from the Jenkins configuration, which could lead to faster execution of builds and a boost in the performance of the larger system.
Jenkins is highly configurable. To extract maximum performance from a Jenkins server, you must choose the right configuration parameters based on your automation workflows and operational requirements.
One way to do so is by tweaking configurations and monitoring changes in Jenkins performance. For example, you can toggle the number of polling threads and monitor how it impacts pipeline execution or change the number of executors and see how it affects overall performance and throughput.
By monitoring the impact of different configuration sets on the performance of a Jenkins instance, you will be able to identify your optimal configuration set.
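One lightweight way to do this is to record a few key metrics on a schedule and line them up against your configuration changes. The Python sketch below is only an illustration: it assumes the Jenkins Metrics plugin is installed with an API access key, uses two of the job metrics discussed later in this article, and treats the URL, key, and file name as placeholders.

    import csv
    import time

    import requests  # third-party HTTP client library

    JENKINS_URL = "http://jenkins-ip:8080"      # placeholder Jenkins URL
    METRICS_API_KEY = "<your-metrics-api-key>"  # placeholder Metrics plugin access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_API_KEY}/metrics", timeout=10).json()

    # Record mean build duration and queue size so they can be compared before and
    # after a configuration change (for example, a different executor count).
    row = [
        int(time.time()),
        data["timers"]["jenkins.job.building.duration"]["mean"],
        data["gauges"]["jenkins.queue.size.value"]["value"],
    ]

    with open("jenkins_tuning_log.csv", "a", newline="") as f:
        csv.writer(f).writerow(row)

Running a script like this from a scheduler while you toggle executors or polling threads gives you a simple before/after comparison without any extra infrastructure.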
Jenkins exposes multiple metrics for real-time monitoring of an instance’s health, performance, and throughput.
The Jenkins API returns a few standard health checks that output PASS/FAIL with an optional message. They are a great way to get a quick overview of an instance’s status.
The disk-space metric reports a failure if a Jenkins space monitor reports that the disk usage has breached the configured threshold. Tracking this metric will allow you to avoid disk-space-related disasters.
The plugins metric reports a failure if any of the Jenkins plugins failed to start. Such a failure is typically resolved by disabling the problematic plugin or by fixing the dependency issues causing it to fail. This is another critical metric to watch, as even one plugin failure can cause a Jenkins instance to behave unexpectedly.
The thread-deadlock metric reports a failure when there is a thread deadlock in the JVM. A thread deadlock is a condition where two or more threads may hang indefinitely, waiting for each other. Thread deadlocks can significantly impact Jenkins’ performance, potentially causing it to crash.
The temporary-space metric reports a failure if a Jenkins temporary space monitor reports that the temporary space is lower than the configured threshold. This is also an important metric, as Jenkins needs temporary space to create temporary files during job and build execution.
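If your instance exposes these checks through the Metrics plugin's HTTP API (an assumption; the plugin and an access key must be configured), you can poll them with a short script. The Python sketch below prints a PASS/FAIL line per check; the URL and key are placeholders.

    import requests  # third-party HTTP client library

    JENKINS_URL = "http://jenkins-ip:8080"      # placeholder Jenkins URL
    METRICS_API_KEY = "<your-metrics-api-key>"  # placeholder Metrics plugin access key

    resp = requests.get(f"{JENKINS_URL}/metrics/{METRICS_API_KEY}/healthcheck", timeout=10)

    # The endpoint may answer with a non-200 status when a check fails, but the JSON
    # body still describes every check (disk-space, plugins, thread-deadlock, and so on).
    for name, result in resp.json().items():
        status = "PASS" if result.get("healthy") else "FAIL"
        message = result.get("message") or ""
        print(f"{name}: {status} {message}".strip())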
As Jenkins runs inside the JVM, it’s crucial to monitor JVM-related metrics when measuring the server’s overall performance. The most important JVM-related metrics are:
system.cpu.load | The overall load on the Jenkins controller, as reported by the JVM's operating system JMX bean. Note that how the load is calculated depends on the operating system. Periodic monitoring of this metric is important, as it tells you how much load the Jenkins server is dealing with at any given time.
vm.uptime.milliseconds | The number of milliseconds since the Jenkins JVM was initialized. On a healthy instance, the value of this metric corresponds to the start-up time of the Jenkins server.
vm.count | The total number of threads in the JVM. This metric reports the sum of the following six metrics: vm.blocked.count, vm.runnable.count, vm.terminated.count, vm.timed_waiting.count, vm.waiting.count, and vm.new.count.
vm.new.count | The total number of JVM threads that have not begun execution yet.
vm.timed_waiting.count | The total number of JVM threads that have suspended execution for a specific period. An exceptional rise in the value of this metric may lead to high memory utilization of the Jenkins instance.
vm.blocked.count | The number of blocked threads that are waiting to acquire a monitor lock. Ideally, the value of this metric shouldn't fluctuate much over time.
vm.deadlocks | The total number of threads that are in a deadlock with at least one other thread. Ideally, this metric should always report a value of 0; a rapid increase in its value is an immediate cause for concern.
vm.memory.heap.init | The amount of heap memory, in bytes, that the JVM initially requested from the operating system.
vm.memory.heap.committed | The amount of heap memory, in bytes, that the operating system has made available to the Jenkins JVM for object allocation. The desirable range for this metric's value depends on your infrastructure and operational needs.
vm.memory.heap.max | The maximum amount of heap memory, in bytes, that the JVM can obtain from the OS. If this value is greater than vm.memory.heap.committed, the OS may not grant the additional memory to the JVM.
vm.memory.heap.usage | The ratio of vm.memory.heap.used to vm.memory.heap.max. It's a great way to track heap usage over time.
vm.memory.non-heap.init | The amount of non-heap memory, in bytes, that the JVM initially requested from the operating system.
vm.memory.non-heap.committed | The amount of non-heap memory, in bytes, that the operating system guarantees to be available to the Jenkins JVM.
vm.memory.non-heap.max | The maximum amount of non-heap memory, in bytes, that the JVM can request from the operating system. This amount of memory is not guaranteed to be available to the JVM if it's greater than the value of vm.memory.non-heap.committed.
vm.memory.total.committed | The total amount of memory (heap and non-heap), in bytes, that the operating system has made available to the Jenkins JVM.
vm.memory.total.max | The maximum amount of heap and non-heap memory, in bytes, that the JVM can obtain from the OS. If this value is greater than vm.memory.total.committed, a memory allocation request may fail.
vm.daemon.count | The total number of JVM threads that have been marked as daemon threads. Daemon threads run in the background indefinitely.
vm.gc.X.count | The total number of times the garbage collector “X” has run.
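As a rough illustration of how these JVM metrics can be consumed, the Python sketch below reads two of them from the Metrics plugin's JSON endpoint and applies example thresholds. The URL, API key, and thresholds are assumptions to adapt to your environment.

    import requests  # third-party HTTP client library

    JENKINS_URL = "http://jenkins-ip:8080"      # placeholder Jenkins URL
    METRICS_API_KEY = "<your-metrics-api-key>"  # placeholder Metrics plugin access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_API_KEY}/metrics", timeout=10).json()
    gauges = data["gauges"]

    heap_usage = gauges["vm.memory.heap.usage"]["value"]  # ratio of used heap to max heap
    cpu_load = gauges["system.cpu.load"]["value"]         # OS-reported load on the controller

    print(f"Heap usage: {heap_usage:.0%}, CPU load: {cpu_load}")

    # Example thresholds only; tune them to your hardware and workload.
    if heap_usage > 0.85:
        print("Heap usage is high - consider increasing the JVM heap or reducing load.")
    if cpu_load > 4:
        print("CPU load is high - check for runaway builds or undersized hardware.")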
A lot of user interactions with the Jenkins server take place through the web UI. Tracking the following metrics will give you a clear idea of how the web UI is performing:
http.requests | The overall rate at which Jenkins is receiving requests and the time taken to process them and generate responses.
http.activeRequests | The number of active requests that the Jenkins server is currently processing. This metric shouldn't grow beyond the server's typical request-processing capacity.
http.responseCodes.created | The response rate of requests with HTTP/201 status codes.
http.responseCodes.ok | The response rate of requests with HTTP/200 status codes.
http.responseCodes.badRequest | The response rate of requests with HTTP/400 status codes. Track this metric to ensure that bad requests remain rare.
http.responseCodes.noContent | The response rate of requests with HTTP/204 status codes.
http.responseCodes.forbidden | The response rate of requests with HTTP/403 status codes. A growing value of this metric may indicate attempts to gain unauthorized access to the server.
http.responseCodes.notModified | The response rate of requests with HTTP/304 status codes.
http.responseCodes.notFound | The response rate of requests with HTTP/404 status codes. It reflects how often users request pages or resources that don't exist.
http.responseCodes.serverError | The response rate of requests with HTTP/500 status codes. It indicates how often the server encounters errors while processing UI requests.
http.responseCodes.serviceUnavailable | The response rate of requests with HTTP/503 status codes. The 503 error code indicates that the server isn't ready to handle the request. A healthy instance would rarely return 503 error codes, if at all.
http.responseCodes.other | The rate at which the UI is responding with status codes not covered above – i.e., anything other than HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, and HTTP/503.
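To turn these web UI metrics into a simple health signal, you can compare the error responses against the overall request volume. The Python sketch below is one way to do that, assuming the Metrics plugin reports http.requests as a timer and the response-code metrics as meters; the URL, key, and 1% threshold are placeholders.

    import requests  # third-party HTTP client library

    JENKINS_URL = "http://jenkins-ip:8080"      # placeholder Jenkins URL
    METRICS_API_KEY = "<your-metrics-api-key>"  # placeholder Metrics plugin access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_API_KEY}/metrics", timeout=10).json()

    total_requests = data["timers"]["http.requests"]["count"]                  # all processed requests
    server_errors = data["meters"]["http.responseCodes.serverError"]["count"]  # HTTP/500 responses
    forbidden = data["meters"]["http.responseCodes.forbidden"]["count"]        # HTTP/403 responses

    print(f"Requests: {total_requests}, 500s: {server_errors}, 403s: {forbidden}")

    # Example threshold: flag when more than 1% of all requests end in a server error.
    if total_requests and server_errors / total_requests > 0.01:
        print("Server error share exceeds 1% - inspect the Jenkins logs.")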
In this section, we will look at Jenkins-specific metrics, which will give us real-time insight into the performance of Jenkins jobs, executors, nodes, plugins, and queues.
jenkins.executor.count.value | The total number of Jenkins executors across all online nodes. |
jenkins.executor.free.value | The total number of Jenkins executors that are available for use. This metric’s desirable range will depend on your infrastructure and operational needs. |
jenkins.executor.in-use.value | The total number of Jenkins executors that are currently in use. |
jenkins.job.blocked.duration | The rate at which jobs are becoming blocked and the time they are spending in the blocked state. Strive to keep this metric’s value to a minimum. |
jenkins.job.building.duration | The rate at which jobs are being built and the time spent in their execution (building). An unreasonable build duration increase typically indicates something is wrong in the Jenkins ecosystem. |
jenkins.job.queuing.duration | The rate at which jobs are being queued and the time they spend waiting in the build queue. Ideally, jobs shouldn’t be queued for too long. |
jenkins.job.buildable.duration | The rate at which jobs from the build queue are assuming the buildable state and the time they spend in that state.
jenkins.job.waiting.duration | The rate at which jobs are entering the quiet period and the time they spend in the quiet period. The quiet period is a configuration parameter that determines how long a Jenkins instance should wait before triggering a job.
jenkins.job.total.duration | The rate at which jobs are entering the queue and the time it takes for them to reach completion.
jenkins.job.count.value | The total number of jobs present in the Jenkins instance. The historical value of this metric can be retrieved from the jenkins.job.count.history metric. |
jenkins.job.scheduled | The rate at which jobs are being scheduled. If a job has already been queued and another request to schedule the job is received, Jenkins will combine both requests. Multiplying this metric's value by that of jenkins.job.building.duration gives an estimated number of executors required to service all build requests (see the sketch after this table).
jenkins.node.count.value | The total number of online and offline build nodes available to the Jenkins server. The historical value of this metric can be analyzed using the jenkins.node.count.history metric. |
jenkins.node.offline.value | The total number of build nodes that are currently offline. To view the historical stats of this metric, use the jenkins.node.offline.history metric. |
jenkins.plugins.active | The number of plugins that started successfully. Ideally, this count should be equal to the total number of plugins.
jenkins.plugins.failed | The number of plugins that didn’t start successfully or malfunctioned. A value other than zero typically indicates that something is wrong with the Jenkins installation. |
jenkins.plugins.withUpdate | The number of plugins for which a newer version is available in the Jenkins update center. You should strive to keep your plugins up to date to avoid potential bugs and vulnerabilities, which means this metric should ideally stay at 0.
jenkins.queue.size.value | The total number of jobs present in the Jenkins build queue. Historical values of this metric can be seen using jenkins.queue.size.history. |
jenkins.queue.stuck.value | The total number of jobs that are stuck in the build queue. View historical statistics of this metric using the jenkins.queue.stuck.history metric. |
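The Python sketch below pulls a few of these queue and executor gauges and also applies the executor estimate described in the jenkins.job.scheduled row above. It assumes the Metrics plugin is installed, that rates are reported per second and durations in seconds (the defaults), and uses placeholder URL and key values; treat the result as a rough, order-of-magnitude figure.

    import requests  # third-party HTTP client library

    JENKINS_URL = "http://jenkins-ip:8080"      # placeholder Jenkins URL
    METRICS_API_KEY = "<your-metrics-api-key>"  # placeholder Metrics plugin access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_API_KEY}/metrics", timeout=10).json()
    gauges = data["gauges"]

    queue_size = gauges["jenkins.queue.size.value"]["value"]
    stuck_jobs = gauges["jenkins.queue.stuck.value"]["value"]
    free_executors = gauges["jenkins.executor.free.value"]["value"]

    print(f"Queue size: {queue_size}, stuck jobs: {stuck_jobs}, free executors: {free_executors}")

    # Executor estimate: scheduling rate multiplied by mean build duration
    # (assuming per-second rates and durations in seconds).
    sched_rate = data["meters"]["jenkins.job.scheduled"]["m15_rate"]
    mean_build = data["timers"]["jenkins.job.building.duration"]["mean"]
    print(f"Estimated executors needed: {sched_rate * mean_build:.1f}")

    if stuck_jobs > 0 or (queue_size > 0 and free_executors == 0):
        print("The build queue may be backing up - check executors and agent availability.")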
Now that we have explored some of the key performance and health metrics of Jenkins, let’s look at two of the best monitoring plugins to measure and track these metrics.
JavaMelody is an open-source tool for monitoring Java applications; for Jenkins, it's packaged as the Monitoring plugin, which is available from the Jenkins update center. To install and enable it, open Manage Jenkins > Plugins (Manage Plugins on older versions), search for "Monitoring" on the Available tab, install the plugin, and restart Jenkins if prompted.
Once installed, you can access the monitoring dashboard by visiting http://jenkins-ip/monitoring. To view the report for all your nodes, visit http://jenkins-ip/monitoring/nodes. Across the different web pages of the monitoring dashboard, you can find various metrics, including threads, current HTTP requests, process list, HTTP sessions, build queue length, build time by period, memory and CPU usage charts, errors, logs, and all metrics exposed by MBeans.
Site24x7’s monitoring plugin offers granular visibility into the health and performance of each Jenkins instance. It has a web-based dashboard that displays various key metrics as graphs and charts. You can start monitoring in two simple steps: download the plugin from GitHub, and configure it as per your needs.
Some of the metrics you can track with the plugin are online nodes, disabled and enabled projects, used and free executors, stuck and blocked queues, active and inactive plugins, job schedule rate, job queuing duration, job waiting duration, JVM-related metrics, and web UI metrics.
Jenkins is the go-to tool for automating testing, analysis, and code deployment. It enables organizations to set up fully automated, multi-step CI/CD pipelines that enhance productivity and reduce time to production. Monitoring a Jenkins instance allows you to increase its performance, predict malfunctions, and avoid downtime.