Memory leak in Jenkins SSE Gateway plugin

Description

Hi,
Jenkins is terribly slow and becoming worse every day.
I tried to gain some speed by adding 4 cores to the VM through engine-phx.
It's a bit better but the real issue doesn't seem related to CPU power.
Can anybody investigate?

Sandro Bonazzola

Attachments

1 attachment (05 Dec 2018, 09:54 AM)
Activity

Anton Marchukov March 4, 2019 at 7:06 AM

If the reason is the SSE-gateway plugin memory leak, then I think having fewer jobs unfortunately just prolongs the time between restarts.

Eyal Edri March 4, 2019 at 6:58 AM

Shouldn't this be much better now with the move of the KubeVirt jobs to the new Jenkins?

Galit Rosenthal March 4, 2019 at 6:55 AM

Restarted Jenkins. It was terribly slow.

Anton Marchukov January 9, 2019 at 1:21 PM

Waiting for the Jenkins SSE plugin fix from upstream.

Sandro Bonazzola December 5, 2018 at 11:57 AM

Yuval Turgeman, can we change the OST job not to read the artifact from Jenkins? It might overload the server, which is already overloaded. Can you read it from another location, maybe resources.ovirt.org?

Same reason as above: oVirt Node builds from Jenkins are not published automatically to resources.ovirt.org, and the ISO repo is completely missing for the 4.2 branch: https://resources.ovirt.org/pub/ovirt-4.2-snapshot/
Same tracker as above: #OVIRT-2355

Sandro Bonazzola December 5, 2018 at 11:54 AM

@sbonazzo it's possible that bots are downloading ISOs from Jenkins using the link on https://www.ovirt.org/node/#ovirt-node-master. Can we update the link to point to official releases on resources.ovirt.org rather than the Jenkins master, which is overloaded?

Sadly no, we can't link to resources.ovirt.org, because node images are not published there and the ISO repository is broken. The issue is tracked here: https://ovirt-jira.atlassian.net/browse/OVIRT-2355, opened 5 months ago.

Former user December 5, 2018 at 9:58 AM

I planned to reboot Jenkins tonight, but it was busy running pipelines that aren't easy to cancel. As a result, Jenkins completely locked up in the morning, with the UI unreachable and backend threads timing out in the background. I had to restart it, and it's now coming back up.

The monitoring plugin was still partially responsive during the outage and showed the following info:

Java memory used: 15,590 MB / 16,384 MB (*usage is near the maximum, you may need to optimize or reconfigure -Xmx)
Number of HTTP sessions: 8
Number of active threads (current HTTP requests): 33
System load: 2.78
% System CPU: 17.17

Almost all memory was exhausted, which is likely caused by a memory leak in the SSE-gateway plugin coinciding with a large number of CI jobs appearing in the queue. Adding memory to Java will likely just delay the symptoms, as the memory leak is still there (see JENKINS-51057).
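
For reference, here is a minimal Java sketch of how the "used / max" figure reported above can be read from inside the JVM using only the standard java.lang.management API; nothing in it is Jenkins-specific and the class name is just illustrative. The "max" value is what -Xmx controls, so raising it only moves the ceiling the leak eventually hits.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // Heap usage as seen by the JVM itself; "max" corresponds to -Xmx.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024);

        System.out.printf("Java memory used: %,d MB / %,d MB (%.1f%%)%n",
                usedMb, maxMb, 100.0 * heap.getUsed() / heap.getMax());

        // A used/max ratio that keeps climbing across days, as in the yearly
        // graph mentioned in this ticket, points at a leak rather than an
        // undersized -Xmx.
    }
}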

From the yearly memory graph the leak started around May-June this year and intensified in November.

To confirm the exact root cause, we may need some lower-level troubleshooting of the Java process, but I am not familiar with how that's done. Maybe you can advise on what info can be gathered to identify the root cause?
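
One option, sketched below on the assumption that we can run code against the master JVM (for example a small tool on the same host), is to capture a heap dump via the standard HotSpot diagnostic MXBean and analyze it offline to see what is retaining the memory. The file path is only an example.

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDump {
    public static void main(String[] args) throws Exception {
        // Example path only; the dump is roughly as large as the used heap.
        String outputFile = args.length > 0 ? args[0] : "/tmp/jenkins-heap.hprof";

        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // "live = true" forces a full GC first, so only reachable objects
        // (i.e. the ones actually leaking) end up in the dump.
        diag.dumpHeap(outputFile, true);

        System.out.println("Heap dump written to " + outputFile);
    }
}

The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT; if the SSE-gateway leak is the culprit, its objects should dominate the retained heap.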

Eyal Edri December 4, 2018 at 1:46 PM

It's possible that bots are downloading ISOs from Jenkins using the link on https://www.ovirt.org/node/#ovirt-node-master.

Can we update the link to point to official releases on resources.ovirt.org rather than the Jenkins master, which is overloaded?

Former user December 4, 2018 at 1:35 PM

Regarding SSE-gateway, I've found an issue on their issue tracker that is very similar to the one we're having:
https://issues.jenkins-ci.org/browse/JENKINS-51057

People report reboots as a good workaround, and I'm planning to do one in OVIRT-2606, so we should be good there. There's also a Groovy script published to clean up the SSE-gateway heap, but it needs testing on staging.
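
If we end up restarting regularly until the upstream fix lands, the workaround could also be scripted. Below is only a rough sketch, with placeholder URL, user, and API token, that calls Jenkins' built-in /safeRestart endpoint, which waits for running builds to finish before restarting:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SafeRestart {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute the real master URL and an API token.
        String jenkinsUrl = "https://jenkins.example.org";
        String user = "admin";
        String apiToken = "REPLACE_ME";

        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + apiToken).getBytes(StandardCharsets.UTF_8));

        // POST /safeRestart lets running builds finish before restarting,
        // which also releases whatever the SSE-gateway leak is holding on to.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(jenkinsUrl + "/safeRestart"))
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Jenkins replied with HTTP " + response.statusCode());
    }
}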

Former user December 4, 2018 at 1:27 PM

Here's the page linking directly to Jenkins: https://www.ovirt.org/node/#ovirt-node-master

This is not the root cause of the major slowness, so let's split it into a separate ticket and decide where to publish these ISOs and how to ensure they don't pile up. I'm pretty sure that most downloads are currently performed by search engine crawlers, not end users.

Eyal Edri December 4, 2018 at 12:49 PM

Can we change the OST job not to read the artifact from Jenkins? It might overload the server, which is already overloaded. Can you read it from another location, maybe resources.ovirt.org?

Fixed

Details

Created November 23, 2018 at 9:09 AM
Updated August 29, 2019 at 2:12 PM
Resolved August 9, 2019 at 3:08 PM