oVirt Node build fails due to CPU stuck
Description
Activity

Former user April 10, 2019 at 8:30 AM
Will continue monitoring this in

Former user March 18, 2019 at 2:44 PM
I've reconfigured vm0034 and vm0035 to use VirtIO-SCSI, so all systems with the "80gb-disk" label now have their disks attached via this bus and should perform much better. Please let me know if you see issues like this again.

Former user March 18, 2019 at 2:34 PM (edited)
I've checked the slave settings and it looks like it's using VirtIO as the storage bus, which may explain why I/O is choking and the bus gets reset:
10:44:21 09:44:14,027 ERR kernel:ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
10:44:21 09:44:14,027 ERR kernel:ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in#012 Get event status notification 4a 01 00 00 10 00 00 00 08 00res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
10:44:21 09:44:14,028 ERR kernel:ata2.00: status: { DRDY }
10:44:21 09:44:14,028 INFO kernel:ata2: soft resetting link
10:44:21 09:44:19,296 WARNING kernel:ata2.00: qc timeout (cmd 0xa1)
10:44:21 09:44:19,305 WARNING kernel:ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
10:44:21 09:44:19,305 ERR kernel:ata2.00: revalidation failed (errno=-5)
10:44:21 09:44:19,305 INFO kernel:ata2: soft resetting link
10:44:21 09:44:21,510 INFO kernel:ata2.00: configured for MWDMA2
10:44:21 09:44:21,558 INFO kernel:ata2: EH complete
Will go through the disks and move everything to VirtIO-SCSI which is what we use on most VMs in PHX as it performs much better under load. Seems that only VMs on ovirt-srv09 still use plain VirtIO.
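To check which bus each guest disk is attached through, here is a minimal sketch. It assumes the guest's libvirt domain XML is available (for example from `virsh dumpxml` on the host). The sample XML and the helper name `disk_buses` are illustrative and not taken from these VMs. In libvirt, plain VirtIO disks show up with `bus='virtio'`, while VirtIO-SCSI disks show up with `bus='scsi'` behind a controller whose model is `virtio-scsi`.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML fragment; NOT taken from vm0034 or vm0035.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'/>
  </devices>
</domain>
"""


def disk_buses(domain_xml):
    """Return (device type, target dev, bus) for each <disk> in the domain XML."""
    root = ET.fromstring(domain_xml)
    result = []
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None:
            # Skip disks without a <target> element.
            continue
        result.append((disk.get("device"), target.get("dev"), target.get("bus")))
    return result


if __name__ == "__main__":
    for device, dev, bus in disk_buses(SAMPLE_DOMAIN_XML):
        print(f"{device:6} {dev:4} bus={bus}")
```

Any disk reported with `bus=virtio` would be a candidate for moving to VirtIO-SCSI.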

Eyal Edri March 18, 2019 at 9:06 AM
If it's a nightly job and it often fails on VMs, I think we should consider running it on BM.
Thoughts?

Former user March 18, 2019 at 8:53 AM
If we have BMs, that would be best...
Details
Assignee
Former user (Deactivated)
Reporter
Sandro Bonazzola
Priority
Medium
The CPU is getting stuck on the VM running on the slave.
Error (from https://jenkins.ovirt.org/job/ovirt-node-ng-image_master_build-artifacts-fc28-x86_64/240/console):
10:44:14 09:44:13,825 WARNING kernel:ata2: lost interrupt (Status 0x58)
10:44:14 09:44:13,834 DEBUG kernel:ata2: drained 65536 bytes to clear DRQ
10:44:14 09:44:13,835 EMERG kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 32s! [scsi_eh_1:85]
10:44:14 09:44:13,835 WARNING kernel:Modules linked in: xfs fcoe libfcoe libfc scsi_transport_fc zram scsi_dh_rdac scsi_dh_emc scsi_dh_alua parport_pc i2c_piix4 parport joydev loop nls_utf8 isofs 8021q garp mrp stp llc virtio_console serio_raw qemu_fw_cfg virtio_pci e1000 bochs_drm drm_kms_helper ttm drm ata_generic pata_acpi sunrpc mcryptd sha256_ssse3 dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs zstd_decompress xxhash cramfs edd virtio_rng virtio_ring virtio
10:44:14 09:44:13,844 WARNING kernel:CPU: 0 PID: 85 Comm: scsi_eh_1 Not tainted 4.16.3-301.fc28.x86_64 #1
10:44:14 09:44:13,844 WARNING kernel:Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
10:44:21 09:44:13,845 WARNING kernel:RIP: 0010:_raw_spin_unlock_irqrestore+0xd/0x20
10:44:21 09:44:13,846 WARNING kernel:RSP: 0018:ffffa31e804cfdf0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff12
10:44:21 09:44:13,855 WARNING kernel:RAX: 0000000000000000 RBX: ffff9654e8c5c000 RCX: 0000000000000000
10:44:21 09:44:13,856 WARNING kernel:RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
10:44:21 09:44:13,856 WARNING kernel:RBP: ffffffffbd60bd20 R08: 0000000000000038 R09: 00000000000002a4
10:44:21 09:44:13,857 WARNING kernel:R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffbd60b050
10:44:21 09:44:13,857 WARNING kernel:R13: ffff9654e8c5c130 R14: 0000000000000202 R15: 0000000000000000
10:44:21 09:44:13,858 WARNING kernel:FS: 0000000000000000(0000) GS:ffff9654fbc00000(0000) knlGS:0000000000000000
10:44:21 09:44:13,865 WARNING kernel:CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
10:44:21 09:44:14,008 WARNING kernel:CR2: 00007fece8177000 CR3: 0000000069c18000 CR4: 00000000000006f0
10:44:21 09:44:14,008 WARNING kernel:Call Trace:
10:44:21 09:44:14,008 WARNING kernel: ata_sff_error_handler+0x83/0xe0
10:44:21 09:44:14,009 WARNING kernel: ata_scsi_port_error_handler+0x354/0x770
10:44:21 09:44:14,009 WARNING kernel: ? scsi_try_target_reset+0x90/0x90
10:44:21 09:44:14,009 WARNING kernel: ? scsi_eh_get_sense+0x220/0x220
10:44:21 09:44:14,010 WARNING kernel: ata_scsi_error+0x91/0xc0
10:44:21 09:44:14,010 WARNING kernel: scsi_error_handler+0xd0/0x5b0
10:44:21 09:44:14,010 WARNING kernel: ? scsi_eh_get_sense+0x220/0x220
10:44:21 09:44:14,010 WARNING kernel: kthread+0x112/0x130
10:44:21 09:44:14,011 WARNING kernel: ? kthread_create_worker_on_cpu+0x70/0x70
10:44:21 09:44:14,026 WARNING kernel: ? kthread_create_worker_on_cpu+0x70/0x70
10:44:21 09:44:14,026 WARNING kernel: ret_from_fork+0x35/0x40
10:44:21 09:44:14,026 WARNING kernel:Code: a8 08 74 0b 65 81 25 6f 2c 76 42 ff ff ff 7f 89 d0 c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
10:44:21 09:44:14,027 ERR kernel:ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
10:44:21 09:44:14,027 ERR kernel:ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in#012 Get event status notification 4a 01 00 00 10 00 00 00 08 00res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
10:44:21 09:44:14,028 ERR kernel:ata2.00: status: { DRDY }
10:44:21 09:44:14,028 INFO kernel:ata2: soft resetting link
10:44:21 09:44:19,296 WARNING kernel:ata2.00: qc timeout (cmd 0xa1)
10:44:21 09:44:19,305 WARNING kernel:ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
10:44:21 09:44:19,305 ERR kernel:ata2.00: revalidation failed (errno=-5)
10:44:21 09:44:19,305 INFO kernel:ata2: soft resetting link
10:44:21 09:44:21,510 INFO kernel:ata2.00: configured for MWDMA2
10:44:21 09:44:21,558 INFO kernel:ata2: EH complete
The slave is vm0034.workers-phx.ovirt.org <https://jenkins.ovirt.org/computer/vm0034.workers-phx.ovirt.org>.
Looking at the slave, it looks like several updates are available, including a kernel update.
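
To spot this failure in future console logs, here is a minimal sketch of a scanner for the kernel messages quoted above. The pattern names and the `scan_console_log` helper are illustrative and not part of any existing CI tooling. The sample lines are copied from this ticket.

```python
import re

# Patterns based on the kernel messages quoted in this ticket.
PATTERNS = {
    "soft_lockup": re.compile(r"soft lockup - CPU#\d+ stuck for \d+s"),
    "ata_reset": re.compile(r"ata\d+: soft resetting link"),
    "ata_qc_timeout": re.compile(r"ata\d+\.\d+: qc timeout"),
}


def scan_console_log(lines):
    """Return (pattern name, stripped line) for every line matching a known pattern."""
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.strip()))
    return hits


# Sample lines copied from the console output in this ticket.
SAMPLE_LINES = [
    "10:44:14 09:44:13,835 EMERG kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 32s! [scsi_eh_1:85]",
    "10:44:21 09:44:14,028 INFO kernel:ata2: soft resetting link",
    "10:44:21 09:44:21,558 INFO kernel:ata2: EH complete",
]

if __name__ == "__main__":
    for name, line in scan_console_log(SAMPLE_LINES):
        print(f"{name}: {line}")
```

A job could run such a check over the saved console output and flag the build as an infrastructure failure rather than a code failure.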