Gerrit outage 2017.04.06

Description

Gerrit was reported down yesterday evening. This ticket is to confirm what caused this outage.

Activity

Former user July 11, 2018 at 8:36 AM

Haven't seen this issue in a while, closing

Former user April 10, 2017 at 1:55 PM

Since the memory was used by git processes, the outage may have been caused by a spike in clone requests for large repositories. We could try limiting the number of Apache threads that are responsible for anonymous cloning, but that may cause issues if our jobs start cloning near a release, as we have quite a lot of builders nowadays.
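
One way to check which commands hold the most memory during such a spike is to total resident memory per command name. The following is a minimal sketch, not something that was run for this ticket: it assumes a Linux host where `ps -eo rss=,comm=` prints resident set size in KiB followed by the command name, and the sample input is made up purely for illustration.

```python
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum resident memory (KiB) per command name.

    Expects text in the shape printed by `ps -eo rss=,comm=`:
    one process per line, RSS in KiB, then the command name.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss_kb, comm = parts
        totals[comm.strip()] += int(rss_kb)
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample input, NOT data from the incident.
SAMPLE = """\
 972000 java
 410000 git
 395000 git
 120000 git-upload-pack
  30000 httpd
"""

for comm, kb in rss_by_command(SAMPLE):
    print(f"{comm:<16} {kb / 1024:8.1f} MiB")
```

On a live host the input could come from `subprocess.run(["ps", "-eo", "rss=,comm="], capture_output=True, text=True).stdout`. Summing per command name only gives a rough picture, since shared pages are counted once per process.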

Eyal Edri April 9, 2017 at 7:06 AM

Thanks for the quick investigation!
IIRC the VM has 32GB of memory. Should we consider upgrading it, or can we make tweaks on the application side?

Former user April 7, 2017 at 8:32 AM

Looks like the system ran out of memory and the java process running Gerrit was killed:

Apr 6 15:22:46 gerrit kernel: Out of memory: Kill process 17773 (java) score 68 or sacrifice child
Apr 6 15:22:46 gerrit kernel: Killed process 629, UID 500, (python) total-vm:128496kB, anon-rss:2852kB, file-rss:36kB
...
Apr 6 15:22:46 gerrit kernel: Out of memory: Kill process 17773 (java) score 68 or sacrifice child
Apr 6 15:22:46 gerrit kernel: Killed process 17773, UID 500, (java) total-vm:14688224kB, anon-rss:995328kB, file-rss:48kB

Most of the memory was used by numerous git and git-upload-pack processes. From the logs above, the memory usage of the java process (anon-rss) was just below 1GB, so this is not a memory leak in Gerrit.
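
The anon-rss figure can be checked by parsing the "Killed process" lines and converting kB to GiB. The following is a minimal sketch: the regular expression only assumes the line format seen in the excerpt above, and the helper names (`parse_killed`, `kb_to_gib`) are illustrative.

```python
import re

# Matches kernel "Killed process" lines in the format shown in the log excerpt.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_killed(lines):
    """Return one dict per process the OOM killer reported as killed."""
    victims = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            victims.append({
                "pid": int(m["pid"]),
                "comm": m["comm"],
                "total_vm_kb": int(m["total_vm"]),
                "anon_rss_kb": int(m["anon_rss"]),
                "file_rss_kb": int(m["file_rss"]),
            })
    return victims


def kb_to_gib(kb):
    """Convert the kernel's kB (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


# The two "Killed process" lines quoted in this ticket.
LOG = [
    "Apr 6 15:22:46 gerrit kernel: Killed process 629, UID 500, (python) "
    "total-vm:128496kB, anon-rss:2852kB, file-rss:36kB",
    "Apr 6 15:22:46 gerrit kernel: Killed process 17773, UID 500, (java) "
    "total-vm:14688224kB, anon-rss:995328kB, file-rss:48kB",
]

for v in parse_killed(LOG):
    print(f"{v['comm']} (pid {v['pid']}): "
          f"anon-rss {kb_to_gib(v['anon_rss_kb']):.2f} GiB")
```

For the java process this gives roughly 0.95 GiB of anonymous resident memory, consistent with the "just below 1GB" reading. Its much larger total-vm value is reserved virtual address space, not memory actually in use.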

Cannot Reproduce

Details

Created April 7, 2017 at 8:24 AM
Updated September 2, 2018 at 3:50 PM
Resolved July 11, 2018 at 8:36 AM