Removing build jobs makes change-queue fail with an exception

Description

When a change is added to the change queue, the queue records the build jobs that build artifacts for the changed code.

If the build jobs are removed between the time the change is added to the queue and the time a tester job including that change is started, the code that checks for completion of the build jobs fails with an exception like the following:

java.lang.NullPointerException: Cannot invoke method getBuild() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:35)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:52)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at Script1$_all_builds_done_closure4.doCall(Script1.groovy:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)

'Script1' in this stack trace refers to our 'change-queue-tester.groovy' script. The relevant code section is the following:

@NonCPS
def all_builds_done(builds) {
    return !builds.any {
        if(Jenkins.instance.getItem(it.job_name).getBuild(it.build_id).isBuilding()) {
            print("${it.job_name} (${it.build_id}) still building")
            return true
        }
        return false
    }
}

The issue is that the code does not check for a 'null' return from the 'Jenkins.instance.getItem(it.job_name)' function call.

We should probably check for 'null' there and just ignore it. A similar check will probably be needed in other places in the code, for example when composing the 'extra-sources' file from the build job URLs.
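A minimal sketch of what such a check could look like, not the merged patch. The 'lookup_job' parameter is hypothetical and is there only so the sketch runs outside Jenkins; in 'change-queue-tester.groovy' the lookup would be 'Jenkins.instance.getItem(name)', and the function would keep its '@NonCPS' annotation. The sketch also assumes that 'getBuild()' can itself return 'null' when the build record is gone, and treats that case the same way.

```groovy
// Sketch only: null-safe variant of all_builds_done.
// 'lookup_job' is a hypothetical parameter standing in for
// Jenkins.instance.getItem(name), which returns null for a removed job.
def all_builds_done(builds, Closure lookup_job) {
    return !builds.any { build ->
        def job = lookup_job(build.job_name)
        if (job == null) {
            // The job was removed after the change was queued: ignore it
            println("${build.job_name} no longer exists, ignoring")
            return false
        }
        def run = job.getBuild(build.build_id)
        if (run == null) {
            // Assumption: a missing build record is ignored as well
            println("${build.job_name} (${build.build_id}) not found, ignoring")
            return false
        }
        if (run.isBuilding()) {
            println("${build.job_name} (${build.build_id}) still building")
            return true
        }
        return false
    }
}
```

With this shape, a removed job counts as "not building", so the tester job can proceed instead of failing with the exception above. Whether ignoring is the right behavior for every caller is a separate decision.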

Activity

Barak Korren February 5, 2018 at 8:19 AM

Fix patch merged

Barak Korren January 15, 2018 at 6:31 AM

Updated the patch to try to completely remove build records for removed jobs; that should fix some of the failures of the kind you saw.

Unfortunately, even with that fix it's still possible to break the system if a job is removed between the time 'extra-sources' is generated and the time repoman runs.

Barak Korren January 14, 2018 at 5:57 PM

Hmm... I wonder if we could make repoman just ignore that. Doing that from the CQ code would be rather difficult at this point.

Gal Ben Haim January 14, 2018 at 5:18 PM

OST is invoked with an 'extra-sources' entry that doesn't exist:

2018-01-14 17:00:00,739::ERROR::repoman.common.sources.jenkins::URL: http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-fc26-x86_64/580//api/json?depth=3
Status: 404
Reason: Not Found
Headers: {'cache-control': 'must-revalidate,no-cache,no-store', 'connection': 'close', 'content-length': '372', 'content-type': 'text/html;charset=iso-8859-1', 'date': 'Sun, 14 Jan 2018 17:00:00 GMT', 'server': 'Jetty(9.4.z-SNAPSHOT)', 'set-cookie': 'JSESSIONID.06ce888e=node014wgr9ck9318j11f1oa3bmjasl1680476.node0;Path=/;HttpOnly', 'via': '1.1 jenkins.phx.ovirt.org', 'x-content-type-options': 'nosniff'}
Body: <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> <title>Error 404 Not Found</title> </head> <body><h2>HTTP ERROR 404</h2> <p>Problem accessing /job/vdsm_master_build-artifacts-fc26-x86_64/580//api/json. Reason: <pre> Not Found</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/> </body> </html>

Former user January 14, 2018 at 7:24 AM

We had the same issue I think ~1 month ago when someone removed old jobs.

Fixed

Details

Created January 14, 2018 at 6:46 AM
Updated February 28, 2018 at 3:33 PM
Resolved February 5, 2018 at 8:19 AM