2019.07.24 - Jenkins UI unreachable
Description
Activity
Former user August 9, 2019 at 3:06 PM
Jenkins is running stably and no OOMs have appeared in the log since increasing the limit. Closing the ticket.
Former user July 26, 2019 at 2:37 PM
For now I’ve created /etc/security/limits.d/22-jenkins-nproc.conf
with the following content:
jenkins soft nproc 16384
Restarting Jenkins will apply the new limit while I work on a patch to Puppet, since we'll need to add an extra module to define this.
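A rough, untested sketch of how the new limit could be verified once Jenkins is back up. The 'jenkins.war' pgrep pattern is an assumption about how the process appears in the process list; adjust the match to the real command line:
# Find the running Jenkins java process and check the limit it actually got
PID=$(pgrep -o -f 'jenkins.war')
grep 'Max processes' "/proc/$PID/limits"
# What PAM hands to a fresh jenkins session, i.e. what 22-jenkins-nproc.conf controls
# (the jenkins account usually has a nologin shell, hence -s /bin/bash)
su -s /bin/bash -c 'ulimit -u' jenkins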
Former user July 26, 2019 at 2:31 PM
The jenkins user can currently run 4096 processes:
ulimit -u
4096
Looking at the Java process, it currently goes past 1500 threads during normal operation:
cat /proc/490/status | grep Threads
Threads: 1720
As for memory usage, it seems to be under control:
|      | total | used  | free | shared | cache | available |
|------|-------|-------|------|--------|-------|-----------|
| mem  | 32010 | 11894 | 328  | 22     | 19787 | 18793     |
| swap | 5119  | 47    | 5072 |        |       |           |
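For reference, a small sketch tying these numbers together: on Linux, every Java thread counts toward the jenkins user's nproc limit, so the thread count should be read against 'Max processes' for the running PID. Assuming PID 490 from the check above:
# Threads count toward RLIMIT_NPROC, so compare the two directly
PID=490                                   # the Jenkins java process observed above
grep Threads "/proc/$PID/status"          # current thread count
grep 'Max processes' "/proc/$PID/limits"  # effective per-user process limit
free -m                                   # the memory table above appears to be free -m output (MiB)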
Former user July 25, 2019 at 8:54 AM
One of the noteworthy error messages in jenkins.log looks like this:
SEVERE: GerritMissedEventsPlaybackEnabledChecker thread failed with error
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at hudson.model.AsyncPeriodicWork.doRun(AsyncPeriodicWork.java:119)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This may mean that we have to bump the ulimit or check memory allocation on the Jenkins host. I recall @Barak Korren hitting similar issues on containerized Jenkins deployments.
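For future triage: "unable to create new native thread" is raised when the OS refuses to create the thread, which typically means either the per-user process limit was hit (threads count toward nproc) or native memory ran out. A rough sketch of the checks, with the pgrep pattern being an assumption rather than something from the log:
# 1. Per-user task limit: how close is the jenkins user to RLIMIT_NPROC?
PID=$(pgrep -o -f 'jenkins.war')          # assumption about the java command line
grep Threads "/proc/$PID/status"
grep 'Max processes' "/proc/$PID/limits"
ps -u jenkins -L | wc -l                  # rough count of all tasks owned by jenkins
# 2. Native memory: thread stacks come from native memory, not the Java heap
grep -E 'MemAvailable|CommitLimit|Committed_AS' /proc/meminfo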
As soon as the UI died, new container workers could no longer spawn as they need to grab http://jenkins.ovirt.org//jnlpJars/remoting.jar in order to connect back to Jenkins.
I received reports about the Jenkins UI being unreachable yesterday evening. The jobs were still running, it seems, so the core was operational. I need to review the logs to confirm what happened and how it can be avoided in the future. A likely cause is the recent patching of the core and plugins. We didn't hit similar issues on Staging; however, that system is under much lower load, so the conditions that triggered the outage may not exist there.