Removing build jobs makes change-queue fail with an exception
Activity
Barak Korren February 5, 2018 at 8:19 AM
Fix patch merged
Barak Korren January 15, 2018 at 6:31 AM
@Gal Ben Haim, I updated the patch to try to completely remove build records for removed jobs, which should fix some of the failures of the kind you saw.
Unfortunately, even with that fix it's still possible to break the system if a job is removed between the time the extra-sources file is generated and the time repoman runs.
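The following is a minimal sketch of the idea of removing build records for deleted jobs. It is not the merged patch. It assumes each record is a map with 'job_name' and 'build_id' keys, as used by 'all_builds_done' in the issue description. 'prune_removed_builds' and its 'lookup_job' parameter are hypothetical. 'lookup_job' stands in for 'Jenkins.instance.getItem' so the logic can run outside Jenkins.

```groovy
// Sketch only, not the merged patch. 'lookup_job' is a hypothetical stand-in
// for { name -> Jenkins.instance.getItem(name) } so this runs outside Jenkins.
// Each record is assumed to be a map with 'job_name' and 'build_id' keys.
List<Map> prune_removed_builds(List<Map> builds, Closure lookup_job) {
    return builds.findAll { rec ->
        if (lookup_job(rec.job_name) == null) {
            // The lookup returned null: the job was deleted, so drop its record.
            println "Dropping record for removed job ${rec.job_name} (${rec.build_id})"
            return false
        }
        return true
    }
}
```

Pruning only helps for jobs that are already gone when the records are read. It does not close the race described above, where a job disappears after the extra-sources file has been written.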
Barak Korren January 14, 2018 at 5:57 PM
Hmm... I wonder if we could make repoman just ignore that; doing it from the CQ (change queue) code would be rather difficult at this point.
Gal Ben Haim January 14, 2018 at 5:18 PM (edited)
OST is invoked with 'extra-sources' entries that don't exist:
2018-01-14 17:00:00,739::ERROR::repoman.common.sources.jenkins::URL: http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-fc26-x86_64/580//api/json?depth=3
Status: 404
Reason: Not Found
Headers: {'cache-control': 'must-revalidate,no-cache,no-store',
'connection': 'close',
'content-length': '372',
'content-type': 'text/html;charset=iso-8859-1',
'date': 'Sun, 14 Jan 2018 17:00:00 GMT',
'server': 'Jetty(9.4.z-SNAPSHOT)',
'set-cookie': 'JSESSIONID.06ce888e=node014wgr9ck9318j11f1oa3bmjasl1680476.node0;Path=/;HttpOnly',
'via': '1.1 jenkins.phx.ovirt.org',
'x-content-type-options': 'nosniff'}
Body: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /job/vdsm_master_build-artifacts-fc26-x86_64/580//api/json. Reason:
<pre> Not Found</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>
</body>
</html>
Former user January 14, 2018 at 7:24 AM
We had the same issue, I think, about a month ago when someone removed old jobs.
Fixed
Details
Assignee: Barak Korren (Deactivated)
Reporter: Barak Korren (Deactivated)
Blocked By: Blocking on code review
Created: January 14, 2018 at 6:46 AM
Updated: February 28, 2018 at 3:33 PM
Resolved: February 5, 2018 at 8:19 AM
When changes are added to the change queue, the queue records the build jobs that build artifacts for the changed code.
If those build jobs are removed between the time a change is added to the queue and the time a test job that includes that change is started, the code that checks whether the build jobs have completed fails with an exception like the following:
java.lang.NullPointerException: Cannot invoke method getBuild() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:35)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:52)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at Script1$_all_builds_done_closure4.doCall(Script1.groovy:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
'Script1' in this stack trace refers to our 'change-queue-tester.groovy' script. The relevant code section is the following:
    @NonCPS
    def all_builds_done(builds) {
        return !builds.any {
            if(Jenkins.instance.getItem(it.job_name).getBuild(it.build_id).isBuilding()) {
                print("${it.job_name} (${it.build_id}) still building")
                return true
            }
            return false
        }
    }
The issue is that the code does not check for a 'null' return from the 'Jenkins.instance.getItem(it.job_name)' function call. We should probably check for 'null' there and just ignore it. A similar check will probably be needed in other places in the code, for example when composing the 'extra-sources' file from the build job URLs.
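The following is a minimal sketch of the proposed null check. It covers both the completion test and the composition of 'extra-sources'. It is an illustration under stated assumptions, not the merged fix:

* 'lookup_job' is a hypothetical parameter. It stands in for 'Jenkins.instance.getItem' so the sketch can run outside Jenkins. Inside 'change-queue-tester.groovy', the original '@NonCPS' annotation would stay.
* The guard against 'getBuild()' returning null for a deleted build is an extra assumption beyond this issue.
* 'compose_extra_sources' is hypothetical. It emits one Jenkins build URL per line because the real extra-sources line format is not shown here.

```groovy
// Sketch only. 'lookup_job' is a hypothetical stand-in for
// { name -> Jenkins.instance.getItem(name) }; in the pipeline script the
// function would keep its @NonCPS annotation.
boolean all_builds_done(List<Map> builds, Closure lookup_job) {
    return !builds.any { rec ->
        def job = lookup_job(rec.job_name)
        if (job == null) {
            // Job was removed: there is nothing left to wait for, so ignore it.
            println "${rec.job_name} no longer exists, ignoring"
            return false
        }
        def build = job.getBuild(rec.build_id)
        if (build == null) {
            // Assumption: the job exists but this particular build was deleted.
            println "${rec.job_name} (${rec.build_id}) no longer exists, ignoring"
            return false
        }
        if (build.isBuilding()) {
            println "${rec.job_name} (${rec.build_id}) still building"
            return true
        }
        return false
    }
}

// Hypothetical helper: one build URL per line, skipping removed jobs.
// The real extra-sources line format may differ from this.
String compose_extra_sources(List<Map> builds, Closure lookup_job, String jenkins_url) {
    def existing = builds.findAll { rec -> lookup_job(rec.job_name) != null }
    return existing.collect { rec -> "${jenkins_url}job/${rec.job_name}/${rec.build_id}/" }.join('\n')
}
```

Treating a removed job as finished matches the "just ignore it" proposal. It cannot help if a job disappears after the extra-sources file is written, because repoman then gets a 404 like the one reported in the comments.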