Memory leak in Jenkins SSE Gateway plugin
Description
Attachments
- 05 Dec 2018, 09:54 AM
Activity
Anton Marchukov March 4, 2019 at 7:06 AM
If the reason is the SSE-gateway plugin memory leak, then I think having fewer jobs unfortunately just prolongs the time between restarts.
Eyal Edri March 4, 2019 at 6:58 AM
Shouldn't this be much better now with the move of the KubeVirt jobs to the new Jenkins?
Galit Rosenthal March 4, 2019 at 6:55 AM
Restarted Jenkins; it was terribly slow.
Anton Marchukov January 9, 2019 at 1:21 PM
Waiting for the Jenkins SSE plugin fix from upstream.
Sandro Bonazzola December 5, 2018 at 11:57 AM
Yuval Turgeman, can we change the OST job not to read the artifact from Jenkins?
It might overload the server, which is already overloaded. Can you read it from another location maybe? Perhaps resources.ovirt.org?
Same reason as above: oVirt Node builds from Jenkins are not published automatically to resources.ovirt.org, and the ISO repo is completely missing for the 4.2 branch: https://resources.ovirt.org/pub/ovirt-4.2-snapshot/
Same tracker as above, #OVIRT-2355.
Sandro Bonazzola December 5, 2018 at 11:54 AM
@sbonazzo it's possible that bots are downloading ISOs from Jenkins using the link on https://www.ovirt.org/node/#ovirt-node-master.
Can we update the link to point to official releases on resources.ovirt.org rather than the Jenkins master, which is overloaded?
Sadly no, we can't link to resources.ovirt.org because Node images are not published there and the ISO repository is broken. The issue is tracked here: https://ovirt-jira.atlassian.net/browse/OVIRT-2355, opened 5 months ago.
Former user December 5, 2018 at 9:58 AM
I planned to reboot Jenkins tonight, but it was busy running pipelines that aren't easy to cancel. As a result, Jenkins completely locked up in the morning, with the UI unreachable and backend threads timing out in the background. I had to restart it and it's now coming back up.
The monitoring plugin was still partially responsive during the outage and showed the following info:
Java memory used: 15,590 Mb / 16,384 Mb (usage is near the maximum; you may need to optimize or to reconfigure -Xmx)
Number of HTTP sessions: 8
Number of active threads: 33
System load: 2.78
% System CPU: 17.17
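For reference, roughly the same figures can be read directly from the Jenkins script console using only standard JMX beans, which may help if the monitoring UI becomes fully unreachable during an outage. A minimal sketch (no plugin-specific API assumed):

import java.lang.management.ManagementFactory

// Read heap usage, thread count and system load via the standard platform MX beans
def heap = ManagementFactory.memoryMXBean.heapMemoryUsage
def threads = ManagementFactory.threadMXBean
def os = ManagementFactory.operatingSystemMXBean

println "Heap used: ${heap.used.intdiv(1024 * 1024)} Mb / ${heap.max.intdiv(1024 * 1024)} Mb"
println "Active threads: ${threads.threadCount}"
println "System load: ${os.systemLoadAverage}"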
Almost all memory got exhausted, which is likely caused by a memory leak in the SSE-gateway plugin coinciding with a large number of CI jobs appearing in the queue. Adding memory to Java will likely just delay the symptoms, as the memory leak is still there (see JENKINS-51057).
Judging from the yearly memory graph, the leak started around May-June this year and intensified in November.
To confirm the exact root cause we may need some lower-level troubleshooting of the Java process, but I am not familiar with how that's done. @Martin Perina, maybe you can assist with what info can be gathered to identify the root cause?
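As a starting point for that lower-level troubleshooting, a heap dump could be captured from the script console and analysed offline (for example with Eclipse MAT) to see which classes retain the memory. A minimal sketch, assuming the Jenkins user can write to /var/lib/jenkins and that there is enough disk space for a dump of a ~16 GB heap:

import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Obtain the HotSpot diagnostic bean and write a heap dump to disk.
// The output path below is an assumption; adjust it to any writable location.
def diag = ManagementFactory.newPlatformMXBeanProxy(
        ManagementFactory.platformMBeanServer,
        "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean)
// 'true' dumps only live objects (forces a GC first), which keeps the file smaller
diag.dumpHeap("/var/lib/jenkins/jenkins-heap.hprof", true)
println "Heap dump written"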
Eyal Edri December 4, 2018 at 1:46 PM
@Sandro Bonazzola it's possible that bots are downloading ISOs from Jenkins using the link on https://www.ovirt.org/node/#ovirt-node-master.
Can we update the link to point to official releases on resources.ovirt.org rather than the Jenkins master, which is overloaded?
Former user December 4, 2018 at 1:35 PM
With regard to SSE-gateway, I've found an issue on their issue tracker very similar to the one we're having:
https://issues.jenkins-ci.org/browse/JENKINS-51057
People report reboots as a good workaround, and I'm planning to do that in OVIRT-2606, so we should be good there. There's also a Groovy script published to clean up the SSE-gateway heap, but it needs testing on staging.
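When that cleanup script is tested on staging, a simple way to check whether it actually frees anything is to record heap usage before and after running it. A minimal sketch for the script console; the plugin short name "sse-gateway" is an assumption about its artifact id:

import java.lang.management.ManagementFactory
import jenkins.model.Jenkins

// Confirm which version of the SSE Gateway plugin is installed on staging
def plugin = Jenkins.instance.pluginManager.getPlugin("sse-gateway")
println "SSE Gateway plugin: ${plugin?.shortName} ${plugin?.version}"

// Helper reporting current heap usage in Mb
def usedMb = { ManagementFactory.memoryMXBean.heapMemoryUsage.used.intdiv(1024 * 1024) }
println "Heap used before cleanup: ${usedMb()} Mb"
// ... run the published cleanup script from JENKINS-51057 here ...
System.gc()   // only a hint to the JVM; the numbers are indicative, not exact
println "Heap used after cleanup:  ${usedMb()} Mb"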
Former user December 4, 2018 at 1:27 PM
Here's the page linking directly to Jenkins: https://www.ovirt.org/node/#ovirt-node-master
This is not the root cause of the major slowness, so let's split it into a separate ticket and decide where to publish these ISOs and how to ensure they don't pile up. I'm pretty sure that currently most downloads are performed by search engine crawlers, not end users.
Eyal Edri December 4, 2018 at 12:49 PM
@Former user can we change the OST job not to read the artifact from Jenkins?
It might overload the server, which is already overloaded. Can you read it from another location maybe? Perhaps resources.ovirt.org?
Hi,
Jenkins is terribly slow and becoming worse every day.
I tried to gain some speed by adding 4 cores to the VM through engine-phx.
It's a bit better, but the real issue doesn't seem related to CPU power.
Can anybody investigate?
–
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com