2019.07.24 - Jenkins UI unreachable

Description

I received reports that the Jenkins UI was unreachable yesterday evening. Jobs apparently kept running, so the core was still operational. We need to review the logs to confirm what happened and how it can be avoided in the future. A likely cause is the recent patching of the core and plugins. We didn't hit similar issues on Staging, but that system is under much lower load, so the conditions that triggered the outage may simply not exist there.

Activity


Former user August 9, 2019 at 3:06 PM

Jenkins is running stably and no OOMs are present in the log after increasing the limit. Closing the ticket.
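
A simple way to re-check this later (the log path below is an assumption; jenkins.log may live elsewhere on this host):

# count OOM occurrences since the last rotation; 0 means we're clean
grep -c "OutOfMemoryError" /var/log/jenkins/jenkins.log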

Former user July 26, 2019 at 2:37 PM

For now I’ve created /etc/security/limits.d/22-jenkins-nproc.conf with the following content:

jenkins soft nproc 16384

Restarting Jenkins will apply the new limit. In the meantime I can work on a patch to Puppet, since we'll need to add an extra module to define this.
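
As a sanity check after the restart (a rough sketch; the PID lookup assumes Jenkins is started from jenkins.war, adjust the pgrep pattern as needed):

# effective limit of the running Jenkins JVM
grep "Max processes" /proc/$(pgrep -f jenkins.war | head -1)/limits

# limit a fresh login session for the jenkins user picks up from limits.d
sudo -u jenkins bash -lc 'ulimit -u'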

Former user July 26, 2019 at 2:31 PM

The jenkins user can currently run 4096 processes:

ulimit -u
4096

 

Looking at the java process, it currently runs around 1,700 threads during normal operation:

cat /proc/490/status | grep Threads
Threads: 1720
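
To see how close we get to the limit over time, something like this could log the thread count every minute (a rough sketch; PID 490 is taken from above and will change after a restart, and the log path is arbitrary):

while true; do
    echo "$(date +%FT%T) $(grep Threads /proc/490/status)"
    sleep 60
done >> /tmp/jenkins-thread-count.log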

 

As for memory usage, it seems to be under control:

 

 

        total    used   free   shared   cache   available
mem     32010   11894    328       22   19787       18793
swap     5119      47   5072

 

 

 




Former user July 25, 2019 at 8:54 AM

One of the noteworthy error messages in jenkins.log looks like this:

 

SEVERE: GerritMissedEventsPlaybackEnabledChecker thread failed with error
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at hudson.model.AsyncPeriodicWork.doRun(AsyncPeriodicWork.java:119)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

 

This may mean that we have to bump the ulimit or review memory allocation on the Jenkins host. I recall hitting similar issues on containerized Jenkins deployments.
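
A quick way to tell whether this is the nproc limit rather than heap exhaustion (a sketch, reusing PID 490 from above) is to compare the process limit with the total thread count of the jenkins user:

# per-process limit the JVM sees
grep "Max processes" /proc/490/limits

# total threads currently owned by the jenkins user (these count against nproc)
ps -o nlwp= -u jenkins | awk '{s+=$1} END {print s}'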

 

As soon as the UI died, new container workers could no longer spawn, since they need to download http://jenkins.ovirt.org//jnlpJars/remoting.jar in order to connect back to Jenkins.
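
A cheap availability check for that dependency, which could be added to monitoring (a sketch; the URL is the one quoted above):

curl -sSf -o /dev/null http://jenkins.ovirt.org//jnlpJars/remoting.jar \
  && echo "remoting.jar reachable" \
  || echo "remoting.jar NOT reachable"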

 

Fixed

Details

Created July 25, 2019 at 8:38 AM
Updated August 29, 2019 at 2:12 PM
Resolved August 9, 2019 at 3:06 PM
