ovirt-node-ng-image_4.3_build-artifacts-fc28-x86_64 #39 stuck for 2 days

Description

Activity

Eyal Edri
December 30, 2018, 6:41 PM

Barak,
Do we have a timeout for normal STD CI jobs,
or is it just for OST?

Yuval Turgeman
December 30, 2018, 7:24 PM

Looks like livemedia-creator installed the VM correctly, but failed to
build the final image file for some reason (disk issues?). STD CI failed
the job on timeout, but probably can't kill the hanging process. Is it
possible to take a look at the slave somehow?
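
For illustration, the situation described above (a job failed on timeout while a child process keeps hanging) is often handled by running the command in its own process group and killing the whole group when the timeout expires. The following is only a minimal sketch under that assumption; it is not how STD CI is implemented, and `run_with_timeout` is a hypothetical helper name. It assumes a POSIX host.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; on timeout, kill its whole process group.

    Returns the exit code, or None if the command timed out.
    Hypothetical helper, not part of STD CI.
    """
    # start_new_session=True puts the child in its own session and
    # process group, so its descendants can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # The child leads its group, so its pid is the group id.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

Note that even SIGKILL does not take effect while a process is blocked in uninterruptible I/O, which is one way a build can outlive its timeout on a host with disk problems.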

On Sun, Dec 30, 2018, 19:57 Nir Soffer <nsoffer@redhat.com> wrote:

> Started 2 days 11 hr ago
> Build has been executing for 2 days 11 hr on vm0038.workers-phx.ovirt.or
> <https://jenkins.ovirt.org/computer/vm0038.workers-phx.ovirt.org>
>
>
> https://jenkins.ovirt.org/job/ovirt-node-ng-image_4.3_build-artifacts-fc28-x86_64/39/
>
>
> _______________________________________________
> Devel mailing list – devel@ovirt.org
> To unsubscribe send an email to devel-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/FXGWNLV3ZLZSNC4MEXGBAMFP5GBAP7IY/
>

Eyal Edri
December 30, 2018, 7:46 PM

Sure, though you will need someone from the CI team to SSH in if you don't
have access to the infra servers.


Daniel Belenky
December 31, 2018, 8:24 AM

The node the build was running on (vm0038.workers-phx.ovirt.org) ran out of disk space.
That is why we see all these issues.
I'm checking what is filling the disk.

Daniel Belenky
December 31, 2018, 8:27 AM

The machine that job ran on ran out of space.
I'll post more details on the ticket as I have them:
https://ovirt-jira.atlassian.net/browse/OVIRT-2638



Eyal Edri
December 31, 2018, 9:07 AM

Please update what filled the disk; we should add it to the slave cleaner script.
We've seen more and more reports of VM slaves running out of space, so we should find the root cause.

Liora Milbaum
December 31, 2018, 10:42 AM

Do we have a service that tracks the slave disk space and, if it reaches a certain threshold, performs some remediation steps?

Daniel Belenky
December 31, 2018, 12:45 PM

There is no service that sends an email, but those VMs are managed by our oVirt engine PHX instance, so the disk can be monitored from there.

Daniel Belenky
December 31, 2018, 12:46 PM
Edited

I've cleaned the host, and it is now up and running.
The problem was that, because the disk was full, stuck VMs on the host prevented it from being cleaned up.

Liora Milbaum
January 1, 2019, 1:02 PM

Remediation and Monitoring are not the same. Don't you think we should have a remediation service which will prevent such cases in the future?
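
As a rough illustration of the kind of remediation service asked about in this thread, a periodic check could compare disk usage against a threshold and run a cleanup step when the threshold is crossed. This is a sketch only: `used_fraction`, `check_and_remediate`, the threshold value, and the `remediate` callback are all hypothetical, and nothing like this is confirmed to exist in the oVirt CI infrastructure.

```python
import shutil


def used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_and_remediate(path, threshold, remediate):
    """Call remediate(path) if usage is at or above threshold.

    Returns True if remediation was triggered, False otherwise.
    Hypothetical helper; the cleanup itself is left to the caller.
    """
    if used_fraction(path) >= threshold:
        remediate(path)
        return True
    return False
```

For example, a cron job or systemd timer could call `check_and_remediate("/", 0.9, cleanup)`, where `cleanup` is whatever the slave cleaner script already does. Passing the cleanup as a callback keeps the threshold check separate from the cleanup logic.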

Assignee

Daniel Belenky

Reporter

Nir Soffer

Blocked By

None

Priority

Medium