Jobs failing while collecting artifacts

Description

Hi,
I have some merge jobs that fail while collecting artifacts.
For example, https://jenkins.ovirt.org/job/ovirt-engine-sdk-ruby_standard-on-merge/17 is blocked at:

[Pipeline] writeFile
[Pipeline] archiveArtifacts
20:37:00 Archiving artifacts
[Pipeline] }
[Pipeline] // node
[Pipeline] }

The thread dump shows

Thread #108
at DSL.sh(awaiting process completion in /home/jenkins/workspace/ovirt-engine-sdk-ruby_standard-on-merge/ovirt-engine-sdk-ruby@tmp/durable-4c24beab on vm0045.workers-phx.ovirt.org; recurrence period: 15000ms; check task scheduled; cancelled? false done? false)
at Script4.mock_runner(Script4.groovy:450)
at Script4.run_std_ci_in_mock(Script4.groovy:408)
at Script7.withHook(Script7.groovy:20)
at Script4.run_std_ci_in_mock(Script4.groovy:404)
at DSL.dir(Native Method)
at Script4.run_std_ci_in_mock(Script4.groovy:397)
at Script4.run_std_ci_on_node(Script4.groovy:367)
at Script4.mk_mock_std_ci_runner(Script4.groovy:90)
at DSL.node(running on vm0045.workers-phx.ovirt.org)
at Script4.mk_mock_std_ci_runner(Script4.groovy:89)
at DSL.parallel(Native Method)
at Script4.run_std_ci_jobs(Script4.groovy:64)
at Script1.main(Script1.groovy:130)
at DSL.stage(Native Method)
at Script1.main(Script1.groovy:129)
at WorkflowScript.main(WorkflowScript:81)
at DSL.withEnv(Native Method)
at WorkflowScript.main(WorkflowScript:80)
at WorkflowScript.run(WorkflowScript:17)
at DSL.timestamps(Native Method)
at WorkflowScript.run(WorkflowScript:17)
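
As a reading aid for dumps like the one above, here is a minimal Python sketch that pulls out the first DSL frame and the node it names. The regular expressions are assumptions based only on the line format shown in this dump, and the sample path is shortened and hypothetical. In this dump the first DSL frame is sh on vm0045, which may indicate that the build is still waiting on a shell step rather than on the archive step itself.

```python
import re

# Assumed frame format, taken from the dump above: "at DSL.<step>(<details>)"
STEP_RE = re.compile(r"^\s*at DSL\.(\w+)\((.*)\)\s*$")
# Assumed node format inside the details: " on <hostname>;"
NODE_RE = re.compile(r" on ([\w.-]+);")


def innermost_step(dump):
    """Return (step, node) for the first DSL frame in a thread dump, or None."""
    for line in dump.splitlines():
        match = STEP_RE.match(line)
        if match:
            step, details = match.groups()
            node = NODE_RE.search(details)
            return step, (node.group(1) if node else None)
    return None


# Shortened, hypothetical sample in the same shape as the dump above.
sample = """Thread #108
at DSL.sh(awaiting process completion in /home/jenkins/workspace/job@tmp/durable-0000 on vm0045.workers-phx.ovirt.org; recurrence period: 15000ms; check task scheduled; cancelled? false done? false)
at Script4.mock_runner(Script4.groovy:450)
at DSL.dir(Native Method)"""

print(innermost_step(sample))
```

Frames such as DSL.dir(Native Method) carry no node, so the function returns None for the node in that case.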

The same happened to
https://jenkins.ovirt.org/job/ovirt-engine-sdk-ruby_standard-on-merge/16/

But an earlier job finished correctly:
https://jenkins.ovirt.org/job/ovirt-engine-sdk-ruby_standard-on-merge/15/

Activity

Roberto Ciatti
April 23, 2020, 8:37 AM

Hi all,

This morning the same thing happened as yesterday evening.
I relaunched https://jenkins.ovirt.org/job/ovirt-engine-sdk-ruby_standard-on-merge/16 (via ‘ci re-merge please’), which was aborted yesterday, and this time it went fine.

But when I relaunched another job that failed yesterday (again with ‘ci re-merge please’), it hung just like yesterday: https://jenkins.ovirt.org/job/ovirt-engine-sdk-ruby_standard-on-merge/20

Could something remain dirty in the CI environment after the first job execution, until the job receives a SIGKILL (I guess from a timeout check, after 3 hours)?

Thanks for the help
Kind regards

Roberto

Ehud Yonasi
April 23, 2020, 9:37 AM

Hey,
I can see that run #16 was aborted and that the slave was vm0045. Because of that, the cleanup scripts were not run and the slave needs to be fixed.
I will take the slave offline and ask for it to be cleaned up.
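
The failure mode described here (an aborted run that skips its cleanup) can be illustrated with a minimal Python sketch. It is not the actual CI code: run_with_cleanup, the cleanup callback, and the timeout value are all hypothetical names chosen for illustration. The idea is only that cleanup sits in a finally block, so it runs whether the command finishes, fails, or is killed on timeout.

```python
import subprocess
import sys


def run_with_cleanup(cmd, cleanup, timeout_s):
    """Run cmd, kill it after timeout_s seconds, and always call cleanup().

    Returns the exit code, or None if the command had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # the command hung; stop it
        proc.wait()   # reap the process so nothing is left behind
        return None
    finally:
        cleanup()     # runs on success, failure, and timeout alike


# Hypothetical usage: a command that hangs, with a short timeout.
log = []
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
result = run_with_cleanup(hung, lambda: log.append("cleaned"), timeout_s=0.5)
print(result, log)
```

Note that a finally block does not run if the wrapper process itself is killed with SIGKILL, so a node can still be left dirty in that case and would need an external cleanup.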

Thanks for reporting.

Roberto Ciatti
April 23, 2020, 9:52 AM

Hi,
Yes, it was aborted (not by me, because I don’t have the rights), but please note that the job was already blocked before it was aborted. And it was launched after one that worked correctly.

The same happened this morning:

And there is no reason for it to block, because the patch phase works correctly and the merge does the same thing (only pom.xml version changes or small changes to doc files).

Something is blocking a merge job after a succeeding one.

I opened an issue for that but please keep me updated.

thanks and regards

Roberto

 

Assignee

infra

Reporter

Roberto Ciatti

Blocked By

None

Components

Priority

High