OST jobs fail on "address already in use"
Description
duplicates
Activity
Eyal Edri October 30, 2017 at 9:27 AM
This is now happening for me locally, even after cleaning up the networks and running lago destroy. What is the recommended cleanup action to resolve this?
Former user October 23, 2017 at 12:24 PM
@Gal Ben Haim I've seen the issue at least on the following bare metals on Friday:
ovirt-srv17
ovirt-srv18
ovirt-srv21
ovirt-srv22
ovirt-srv23
Gal Ben Haim October 23, 2017 at 8:03 AM
This issue is caused by calling "lago ovirt serve" (which starts the
repo server) as a subprocess without making sure to kill it when it is no
longer needed (or on failure).
In the past, VDSM's check patch was coded like this, but we fixed it. It
could be that the same bug exists in another suite.
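The failure mode described above (a server started as a subprocess that outlives the job) can be avoided by tying the child's lifetime to a context manager. The following is only a sketch of that pattern, not the code used by VDSM or any OST suite: `background_server` is a hypothetical helper, the command passed to it is a stand-in for the real server command line, and it is written for Python 3 (`Popen.wait(timeout=...)`) although the traceback in this ticket comes from Python 2.7.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def background_server(cmd, stop_timeout=5):
    """Run cmd as a child process and always stop it when the block ends."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        # This branch runs on normal exit and when the body raises,
        # so the child cannot be left behind holding its port.
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                # The child ignored SIGTERM; force it down.
                proc.kill()
                proc.wait()


if __name__ == "__main__":
    # Stand-in for the real server command (assumption for illustration).
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    with background_server(stand_in) as server:
        print("server pid:", server.pid)
    print("exit code after cleanup:", server.returncode)
```

A shell-based suite would need the equivalent guarantee by other means, for example a trap that kills the child on exit.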
Evgheni, can you specify a slave that had this issue?
–
GAL BEN HAIM
RHV DEVOPS
Barak Korren October 22, 2017 at 7:49 AM (Edited)
I've created issue #27 on lago-ost-plugin to track improving the 'lago ovirt serve' mechanism.
Barak Korren October 22, 2017 at 6:52 AM
> They also had leftovers of ovirt-master_change-queue-tester in the Jenkins work directory, so this may be the job causing the issue.
No, the last job to run on a slave always leaves its $WORKSPACE behind on that slave, so that if it runs there again, some things are already cached for it.
We need to check the OST cleanup code and the jobs that previously ran on the slaves to see why the 'lago serve' process on the port was not killed. We should probably also modify how 'lago serve' works so that it is less likely to influence other Lago environments trying to run on the same node, and less likely to stay behind.
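The comment above does not say how the serve mechanism should change. Purely as an illustration of one possible direction, a server can bind to port 0 so the operating system picks a free port, and shut itself down when the surrounding block ends. Everything in this sketch is an assumption: `ephemeral_http_server` is a hypothetical name, the loopback bind address is chosen only to keep the example self-contained, and it uses Python 3's `http.server` rather than the Python 2.7 modules seen in the traceback.

```python
import contextlib
import http.server
import threading


@contextlib.contextmanager
def ephemeral_http_server(host="127.0.0.1",
                          handler=http.server.SimpleHTTPRequestHandler):
    """Serve on an OS-assigned port and stop the server on exit."""
    # Port 0 asks the OS for any currently unused port, so two
    # environments on the same node never compete for a fixed one.
    server = http.server.HTTPServer((host, 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Hand the chosen port back so callers can build the repo URL.
        yield server.server_address[1]
    finally:
        server.shutdown()      # stop the serve_forever loop
        server.server_close()  # release the listening socket
        thread.join()


if __name__ == "__main__":
    with ephemeral_http_server() as port:
        print("serving on port", port)
```

The trade-off is that consumers of the repo can no longer rely on a well-known port and must be told the chosen one.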
Former user October 20, 2017 at 3:07 PM (Edited)
I rebooted ovirt-srv21 which was failing manual tests and @Dusan Fodor started a new build on it. It finished successfully and nothing was listening to port 8585 when I logged in to check after the job finished. I went through all of the bare metals and a few of them had port 8585 still occupied. They also had leftovers of ovirt-master_change-queue-tester in the Jenkins work directory, so this may be the job causing the issue.
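The manual check described above (is anything still holding port 8585?) can be approximated by attempting the same bind the server would perform. This is a rough sketch, not part of lago or OST: `port_is_free` is a hypothetical helper, and the default of binding on all interfaces is an assumption, since the address the real server binds to is not shown in this ticket. `SO_REUSEADDR` is set because Python's `HTTPServer` enables address reuse by default, so connections lingering in TIME_WAIT are not counted as "in use".

```python
import errno
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Mirror HTTPServer's default so only a live listener blocks the bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # permission problems etc. are not "port busy"
    finally:
        sock.close()
    return True


if __name__ == "__main__":
    # 8585 is the port mentioned in this ticket.
    print("port 8585 free:", port_is_free(8585))
```

A pre-flight check like this only reports that the port is busy; it does not identify or stop the process holding it.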
Former user October 20, 2017 at 12:46 PM
I agree with Barak. I checked the slave that was failing and there was a
process still listening on port 8585.
The slave was put offline, but attempting to run the job on a
different one caused the exact same error.
As more slaves are affected, this may be a lago bug. No changes were made on
the slaves this week.
–
Regards,
Evgheni Dereveanchin
Barak Korren October 20, 2017 at 8:48 AM
Looks like there might be a lago localrepo process left running on the
slave from a previous run.
–
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
Evgheni,
Was there any change recently to Lago slaves?
On Fri, Oct 20, 2017 at 11:05 AM, Piotr Kliczewski <
piotr.kliczewski@gmail.com> wrote:
> I attempted to run manual OST twice and both runs failed with the issue below.
> Can someone take a look?
>
> Thanks,
> Piotr
>
> 2017-10-20 07:59:12,485::log_utils.py::__exit__::607::ovirtlago.prefix::DEBUG::
>   File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 636, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/reposetup.py", line 111, in wrapper
>     with utils.repo_server_context(args[0]):
>   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
>     return self.gen.next()
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 100, in repo_server_context
>     root_dir=prefix.paths.internal_repo(),
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 76, in _create_http_server
>     generate_request_handler(root_dir),
>   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>     self.server_bind()
>   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>     SocketServer.TCPServer.server_bind(self)
>   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>     self.socket.bind(self.server_address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
>
> 2017-10-20 07:59:12,485::cmd.py::do_run::365::root::ERROR::Error occured, aborting
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 362, in do_run
>     self.cli_plugins[args.ovirtverb].do_run(args)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
>     self._do_run(**vars(args))
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
>     return func(*args, prefix=prefix, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 166, in do_deploy
>     prefix.deploy()
>   File "/usr/lib/python2.7/site-packages/lago/log_utils.py", line 636, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/reposetup.py", line 111, in wrapper
>     with utils.repo_server_context(args[0]):
>   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
>     return self.gen.next()
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 100, in repo_server_context
>     root_dir=prefix.paths.internal_repo(),
>   File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 76, in _create_http_server
>     generate_request_handler(root_dir),
>   File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>     self.server_bind()
>   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>     SocketServer.TCPServer.server_bind(self)
>   File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>     self.socket.bind(self.server_address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 98] Address already in use
> _______________________________________________
> Infra mailing list
> Infra@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>
>
>
–
Eyal edri
MANAGER
RHV DevOps
EMEA VIRTUALIZATION R&D
Red Hat EMEA <https://www.redhat.com/>
<https://red.ht/sig> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)