Stuck job - running 8 days

Description

I found this stuck job in jenkins:
https://jenkins.ovirt.org/job/ovirt-engine_standard-check-patch/5660/

Don't we have a timeout for killing run away jobs?

Activity

Ehud Yonasi
May 13, 2020, 4:21 PM

Thanks for reporting, Nir.
You are right about the timeout, but it applies only while the actual
code is running. In this case the slave had somehow
run out of storage or memory and got stuck while loading our stdci code.

Evgheni Dereveanchin
May 18, 2020, 9:13 AM

Indeed, there are times when jobs get stuck in stages where timeouts don’t apply. Usually this does not reduce capacity, since the job is not really running on any host, so we periodically kill such jobs, or they simply get dropped on Jenkins restarts.

 

Do you know if we can add a global timeout for the whole pipeline? Something like this:

https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit

Or maybe we’re already using this? IMHO none of our jobs run longer than 4 hours, so a 24h timeout, say, should be more than enough in all cases.
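For reference, the timeout step linked above can be applied globally in either pipeline flavor. A minimal sketch (stage and script names below are hypothetical placeholders, not taken from the actual stdci code):

```groovy
// Declarative pipeline: the options directive enforces a 24h limit
// on the entire run, including stages where per-step timeouts
// would otherwise not apply.
pipeline {
    agent any
    options {
        timeout(time: 24, unit: 'HOURS')
    }
    stages {
        stage('check-patch') {
            steps {
                sh './run-checks.sh'  // hypothetical script
            }
        }
    }
}
```

In a scripted pipeline the equivalent is to wrap the whole body in the same step:

```groovy
// Scripted pipeline: everything inside the block is aborted
// once the 24h budget is exhausted.
timeout(time: 24, unit: 'HOURS') {
    node {
        // ... whole pipeline body ...
    }
}
```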

Ehud Yonasi
May 18, 2020, 10:32 AM

It’s not used anywhere in the code; I can implement it.

Evgheni Dereveanchin
May 18, 2020, 11:48 AM

Thanks. I can confirm I see these stuck jobs from time to time and just kill them, since once they have been running for more than a day their result is of no use to anybody. I’ll assign the ticket to you so that you can prioritize it. It’s not urgent, but having global timeouts on our side is a good thing.

Assignee

Ehud Yonasi

Reporter

Nir Soffer

Blocked By

None

Priority

Medium