multiple Jenkins jobs stuck

Description

http://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/16013/console

The job failed as expected, because the package is not available yet:

15:00:03 Error: Package: vdsm-4.20.2-1.git33dd5fe.el7.centos.x86_64 (/vdsm-4.20.2-1.git33dd5fe.el7.centos.x86_64)
15:00:03 Requires: sanlock >= 3.5.0-1
15:00:03 Installing: sanlock-3.4.0-1.el7.x86_64 (centos-base-el7)
15:00:03 sanlock = 3.4.0-1.el7
15:00:03 Error: Package: vdsm-4.20.2-1.git33dd5fe.el7.centos.x86_64 (/vdsm-4.20.2-1.git33dd5fe.el7.centos.x86_64)
15:00:03 Requires: sanlock >= 3.5.0-1
15:00:03 Available: sanlock-3.4.0-1.el7.x86_64 (centos-base-el7)
15:00:03 sanlock = 3.4.0-1.el7
15:00:03 You could try using --skip-broken to work around the problem
15:00:03 You could try running: rpm -Va --nofiles --nodigest
15:00:03 Took 498 seconds
15:00:03 ===================================
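
For reference, the dependency failure above comes down to a version comparison: the repository offers sanlock 3.4.0-1.el7, while vdsm requires sanlock >= 3.5.0-1. The following is a minimal Python sketch of that check, not the actual yum/rpm algorithm. It ignores epochs and special characters such as `~`, and the helper names (`compare_part`, `satisfies_ge`) are made up for illustration.

```python
import re
from itertools import zip_longest


def compare_part(a, b):
    """Compare two version (or release) strings segment by segment.

    Simplified imitation of RPM-style ordering: returns -1, 0 or 1.
    """
    seg_a = re.findall(r"\d+|[A-Za-z]+", a)
    seg_b = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip_longest(seg_a, seg_b):
        if x is None:
            return -1  # a ran out of segments first, so it is older
        if y is None:
            return 1   # b ran out of segments first, so a is newer
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)
        elif x.isdigit():
            return 1   # numeric segments sort newer than alphabetic ones
        elif y.isdigit():
            return -1
        if x != y:
            return 1 if x > y else -1
    return 0


def satisfies_ge(available, required):
    """Return True if 'version-release' in available is >= required."""
    avail_ver, _, avail_rel = available.partition("-")
    req_ver, _, req_rel = required.partition("-")
    result = compare_part(avail_ver, req_ver)
    if result != 0:
        return result > 0
    # Same version: compare releases only if the requirement names one.
    return compare_part(avail_rel, req_rel) >= 0 if req_rel else True


# Values taken from the log above.
print(satisfies_ge("3.4.0-1.el7", "3.5.0-1"))
```

With the values from the log, the check is negative because the version part 3.4.0 sorts before 3.5.0, which matches the error yum reported.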

But then it got stuck:

15:00:03 Build step 'Execute shell' marked build as failure
15:00:03 $ ssh-agent -k
15:00:03 unset SSH_AUTH_SOCK;
15:00:03 unset SSH_AGENT_PID;
15:00:03 echo Agent pid 28320 killed;
15:00:03 [ssh-agent] Stopped.

And is still running now.

Nir

Activity

Former user August 2, 2017 at 9:07 AM

A CPU soft lockup caused this outage:

[2373383.273250] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [java:904]
[2373383.273303] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter xfs libcrc32c kvm_intel kvm irqbypass crc32_pclmul dm_mod ghash_clmulni_intel aesni_intel lrw ppdev gf128mul glue_helper ablk_helper cryptd sg virtio_balloon pcspkr i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_net virtio_scsi virtio_console virtio_blk ata_piix crct10dif_pclmul crct10dif_common
[2373383.273351] qxl drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm libata drm serio_raw virtio_pci virtio_ring virtio i2c_core floppy
[2373383.273364] CPU: 2 PID: 904 Comm: java Not tainted 3.10.0-514.21.2.el7.x86_64 #1
[2373383.273366] Hardware name: oVirt oVirt Node, BIOS 1.9.1-5.el7_3.2 04/01/2014
[2373383.273368] task: ffff880816730fb0 ti: ffff880027014000 task.ti: ffff880027014000
[2373383.273370] RIP: 0010:[<ffffffff810f9f72>] [<ffffffff810f9f72>] smp_call_function_many+0x202/0x260
[2373383.273376] RSP: 0018:ffff880027017cc8 EFLAGS: 00000202
[2373383.273378] RAX: 0000000000000000 RBX: 00000000000000fc RCX: ffff88081e61ac20
[2373383.273379] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
[2373383.273380] RBP: ffff880027017d00 R08: ffff8801749fbc00 R09: ffffffff81318419
[2373383.273382] R10: ffff88081e699b40 R11: ffffea002054e400 R12: 0000000000000292
[2373383.273383] R13: ffff880027017c78 R14: ffff88017fc56800 R15: 0000000000000001
[2373383.273385] FS: 00007ff4204dd700(0000) GS:ffff88081e680000(0000) knlGS:0000000000000000
[2373383.273387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2373383.273388] CR2: 00007f842e325fa0 CR3: 0000000812b98000 CR4: 00000000000006e0
[2373383.273394] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2373383.273396] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2373383.273397] Stack:
[2373383.273398] 00000001204d8000 ffff88005283af08 ffff88005283abc0 0000000000000000
[2373383.273401] ffffffffffffffff ffff88005283af08 ffff88005283abc0 ffff880027017d50
[2373383.273404] ffffffff8106c5f8 ffff88005283abc0 0000000000000000 ffffffffffffffff
[2373383.273407] Call Trace:
[2373383.273413] [<ffffffff8106c5f8>] native_flush_tlb_others+0xb8/0xc0
[2373383.273417] [<ffffffff8106c6c9>] flush_tlb_mm_range+0x69/0x140
[2373383.273421] [<ffffffff811ac743>] tlb_flush_mmu.part.61+0x33/0xc0
[2373383.273424] [<ffffffff811ada35>] tlb_finish_mmu+0x55/0x60
[2373383.273427] [<ffffffff811afe0a>] zap_page_range+0x13a/0x180
[2373383.273430] [<ffffffff811abe9c>] SyS_madvise+0x38c/0x8d0
[2373383.273435] [<ffffffff8109b501>] ? __set_task_blocked+0x41/0xa0
[2373383.273437] [<ffffffff8109e096>] ? __set_current_blocked+0x36/0x80
[2373383.273442] [<ffffffff81697749>] system_call_fastpath+0x16/0x1b
[2373383.273443] Code: 48 63 35 66 da 9e 00 89 c2 39 f0 0f 8d 86 fe ff ff 48 98 49 8b 0f 48 03 0c c5 c0 99 ad 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 <f6> 41 20 01 75 f8 48 63 35 35 da 9e 00 eb b7 0f b6 4d cc 4c 89

The latest kernel was installed and the system was rebooted. I also evacuated some active VMs from the hypervisor where Jenkins is running, to make sure the lockup is not caused by host overload.
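
To check whether the lockup comes back after the reboot, the watchdog line at the top of the trace above can be searched for in saved kernel logs. The snippet below is a minimal Python sketch under the assumption that the log text has the same format as the line quoted in this comment; the function name `find_soft_lockups` and the returned field names are hypothetical.

```python
import re

# Matches lines such as:
# [2373383.273250] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [java:904]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    events = []
    for match in SOFT_LOCKUP.finditer(log_text):
        events.append({
            "uptime": float(match.group("uptime")),  # seconds since boot
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("seconds")),  # how long the CPU was stuck
            "comm": match.group("comm"),             # process name
            "pid": int(match.group("pid")),
        })
    return events


sample = ("[2373383.273250] NMI watchdog: BUG: soft lockup - "
          "CPU#2 stuck for 22s! [java:904]")
for event in find_soft_lockups(sample):
    print(event["cpu"], event["seconds"], event["comm"], event["pid"])
```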

Former user August 1, 2017 at 3:42 PM

There is a problem with the Jenkins service. It looks like a CPU lockup made the Java process miss the moment when the jobs finished and disconnected, so they are now stuck. I'll reboot the system and get back with updates.
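
The symptom described here (the console already shows the build marked as failed, yet the build is still reported as running) can be expressed as a simple check. This is only a Python sketch: it assumes the caller has already obtained the "still building" flag, the console text, and the time since the last console output by some other means, and the name `looks_stuck` and the 10-minute grace period are arbitrary choices for illustration.

```python
# Text Jenkins printed in the console of the affected jobs.
FAILURE_MARKER = "marked build as failure"


def looks_stuck(building, console_text, seconds_since_last_output,
                grace_seconds=600):
    """Heuristic: a build is suspicious if it is still flagged as running
    although its console already shows the failure marker and nothing has
    been printed for longer than the grace period."""
    if not building:
        return False
    if FAILURE_MARKER not in console_text:
        return False
    return seconds_since_last_output > grace_seconds


console = "15:00:03 Build step 'Execute shell' marked build as failure"
# 39 minutes without output, as reported in this ticket.
print(looks_stuck(True, console, 39 * 60))
```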

Nir Soffer August 1, 2017 at 3:34 PM

Same on the Fedora 25 job:
http://jenkins.ovirt.org/job/vdsm_master_check-patch-fc25-x86_64/10162/console

Nir Soffer August 1, 2017 at 3:33 PM

Still running now, 39 minutes passed...

Fixed

Details

Created August 1, 2017 at 3:16 PM
Updated August 3, 2017 at 3:02 PM
Resolved August 2, 2017 at 9:07 AM