OST jobs fail on "address already in use"

Description

Evgheni,
Was there any change recently to the Lago slaves?

On Fri, Oct 20, 2017 at 11:05 AM, Piotr Kliczewski <
piotr.kliczewski@gmail.com> wrote:

> I attempted to run manual OST twice and both failed with below issue.
> Can someone take a look?
>
> Thanks,
> Piotr
>
> 2017-10-20 07:59:12,485::log_utils.py::__exit__::607::ovirtlago.prefix::DEBUG::
>   File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 636, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/reposetup.py", line 111, in wrapper
>     with utils.repo_server_context(args[0]):
>   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
>     return self.gen.next()
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 100, in repo_server_context
>     root_dir=prefix.paths.internal_repo(),
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 76, in _create_http_server
>     generate_request_handler(root_dir),
>   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>     self.server_bind()
>   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>     SocketServer.TCPServer.server_bind(self)
>   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>     self.socket.bind(self.server_address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
>
> 2017-10-20 07:59:12,485::cmd.py::do_run::365::root::ERROR::Error occured, aborting
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 362, in do_run
>     self.cli_plugins[args.ovirtverb].do_run(args)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
>     self._do_run(**vars(args))
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
>     return func(*args, prefix=prefix, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 166, in do_deploy
>     prefix.deploy()
>   File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 636, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/reposetup.py", line 111, in wrapper
>     with utils.repo_server_context(args[0]):
>   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
>     return self.gen.next()
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 100, in repo_server_context
>     root_dir=prefix.paths.internal_repo(),
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 76, in _create_http_server
>     generate_request_handler(root_dir),
>   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>     self.server_bind()
>   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>     SocketServer.TCPServer.server_bind(self)
>   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>     self.socket.bind(self.server_address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 98] Address already in use

Eyal Edri

MANAGER

RHV DevOps

EMEA VIRTUALIZATION R&D

Red Hat EMEA <https://www.redhat.com/>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
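
For context on the traceback above: ovirtlago starts an internal HTTP repo server before deploying, and server_bind() raises errno 98 (EADDRINUSE) when the port it needs is still held by a stale server from an earlier run. Below is a minimal pre-flight check sketching the failure mode; it assumes the repo server port is 8585 (the port identified in the comments further down) and is not part of ovirtlago itself:

import socket

def port_is_free(port, host='0.0.0.0'):
    # Try to bind the port the repo server will use. A bind failure
    # (errno 98, EADDRINUSE, on Linux) means a stale process holds it.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()

if not port_is_free(8585):
    raise RuntimeError('port 8585 already in use; kill the stale server first')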

Activity

Former user February 28, 2018 at 2:24 PM

Eyal Edri February 28, 2018 at 2:13 PM

Any update?

Former user February 1, 2018 at 7:02 AM

We'll probably add a cleanup function to the global setup to ensure that Lago can run smoothly. I'll talk it over offline to see what needs to be done.
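
A minimal sketch of what such a cleanup could look like, assuming the repo server port is 8585 and that psutil (>= 5.3, for the laddr namedtuple) is available; this is illustrative only, not the actual implementation:

import psutil

def kill_stale_repo_server(port=8585):
    # Find any leftover process still listening on the repo server
    # port and terminate it before a new Lago run starts.
    # Note: seeing other users' sockets on Linux requires root.
    for conn in psutil.net_connections(kind='tcp'):
        if (conn.status == psutil.CONN_LISTEN
                and conn.pid
                and conn.laddr.port == port):
            proc = psutil.Process(conn.pid)
            proc.terminate()
            proc.wait(timeout=5)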

Eyal Edri January 31, 2018 at 2:43 PM

So what is the alternative fix?

Former user January 31, 2018 at 2:20 PM

We've abandoned the fix on the mock_runner side for now.

Eyal Edri January 31, 2018 at 2:16 PM

What was the latest fix for this issue? Did we abandon the fix on mock_runner? Is there a plan to fix it from the Lago side?

Dafna Ron January 31, 2018 at 12:59 PM

We had 2 failures today on "network in use".
In the latest one I can see that the build that ran before it failed on a libvirt issue.

Here is the failed build: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5173/
Here is the build that failed before: http://jenkins.ovirt.org/computer/ovirt-srv17.phx.ovirt.org/builds
This is the host: http://jenkins.ovirt.org/computer/ovirt-srv17.phx.ovirt.org/

Eyal Edri October 31, 2017 at 2:41 PM

Not sure if there is anything else to do here, other than solving it on the Lago side, and we have a ticket there.
Feel free to close if we found the source (the networking suite) and educated the maintainer on how to use lago serve.

Former user October 31, 2017 at 11:38 AM

Not sure about the recommended cleanup (someone else may be able to say more), but I did the following:
1) netstat -nlp | grep 8585
   This should show the python process holding the port, with its PID in the last column.
2) kill <PID>
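
The same two steps can be scripted; a rough equivalent in Python, assuming fuser from psmisc is installed:

import subprocess

# Scripted equivalent of the netstat + kill steps above: fuser -k
# kills (SIGKILL by default) every process using TCP port 8585.
subprocess.call(['fuser', '-k', '8585/tcp'])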

Eyal Edri October 30, 2017 at 9:27 AM

This is happening for me locally now, even after cleaning the networks and running lago destroy. What is the recommended cleanup action needed to resolve this?

Former user October 23, 2017 at 12:24 PM

I've seen the issue on at least the following bare-metal hosts on Friday:

ovirt-srv17
ovirt-srv18
ovirt-srv21
ovirt-srv22
ovirt-srv23

Fixed

Details

Assignee

Reporter

Priority

Created October 20, 2017 at 8:28 AM
Updated February 28, 2018 at 3:33 PM
Resolved February 28, 2018 at 2:25 PM
