oVirt's Standard-CI is currently implemented using mock, and this has worked well for us so far.
Changing the implementation to use containers will provide several benefits:
Faster start-up times - Most container providers have some form of image layering and caching, which will be faster than bringing up a basic OS image and then installing packages with yum, as mock does.
Broader OS support - mock can only run on the Red Hat family of operating systems, and can only emulate those operating systems. Most container providers can both run on and emulate a broader range of operating systems.
Better isolation and cleanup - Mock only isolates the file system; containers can isolate the file system as well as the networking layer and the process space.
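To illustrate the isolation point, here is a sketch of what a check-patch run in a container might look like (the flags and paths are illustrative, not our actual setup): the container gets a private PID namespace by default, and '--network none' removes host network access too - isolation mock cannot provide.

```shell
# Hypothetical check-patch run in a container; mount the checked-out
# source read-only and run with no network access at all.
docker run --rm \
    --network none \
    -v "$PWD:/workspace:ro" \
    centos:7 \
    /workspace/automation/check-patch.sh
```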
Depending on the container provider, we may gain additional benefits:
Some container providers, like Kubernetes, can manage distributed compute resources across many nodes. This means we can stop managing Jenkins slaves and instead just have the Jenkins master start up containers on the provider.
Some providers like OpenShift have built-in CI processes for creating and testing container images.
Note: At some point, David started an effort going this way: https://gerrit.ovirt.org/#/c/54376/
I think it's the right way going forward, as version 2.0 of the standard CI.
At some point we considered moving to stateless slaves instead of mock, but I think this approach might prove to be better, because it avoids both the maintenance of keeping various VMs with multiple operating systems and the overhead of ensuring the stateless mechanism works.
Worth also looking into fabric8.io, it has integration with Kubernetes & OpenShift already.
more info @ https://fabric8.io/articles/index.html
+1 - I think containers support would be great, especially if we use openshift.
Few issues that come to mind from my little experience with docker:
1. docker has limitations, for example: running systemd is almost impossible, and I doubt you can run qemu inside docker, so this is not relevant for OST.
2. docker wants one process per container (as per the guidelines); I think we have jobs that need to do more than that.
3. docker images are not always one-to-one with the images we want to test on, i.e. can we claim that centos7:latest on Docker Hub is equivalent to the centos7 we "support"? Not sure. Moreover, this means we will need to start maintaining images, which we don't do now (in a way we maintain mock configurations, which might be considered maintaining Dockerfiles).
Either way, every solution we go with would probably need to be a hybrid one (maintaining both flows).
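On the image-maintenance concern in point 3: a mock chroot config largely boils down to a base OS plus a package list, which maps fairly directly onto a short Dockerfile. A sketch (the base image and package set below are illustrative, not our actual configuration):

```dockerfile
# Hypothetical Dockerfile playing the role of an el7 mock config:
# base OS image plus the build packages the chroot would contain.
FROM centos:7
RUN yum install -y \
        autoconf automake gcc make rpm-build \
    && yum clean all
```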
Using OpenShift as the container running platform can provide some benefits:
It provides a distributed execution environment that can replace all the hacks we need to do to maintain Jenkins slaves (essentially OpenShift becomes one big slave)
It includes a built-in container registry that may be faster to use than DockerHub.
It provides the s2i tool that allows one to build efficient images without even writing a 'Dockerfile'.
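For illustration, s2i combines a source repository with a builder image to produce an application image, with no Dockerfile involved (the repository and image names below are placeholders, not real projects):

```shell
# Build an application image from source plus a builder image;
# s2i injects the source and runs the builder's assemble script.
s2i build https://github.com/example/app example/builder-image example/app-image
```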
Here are some pointers for how we could implement Standard-CI with OpenShift:
OpenShift 'build' objects may allow us to automatically run s2i or other processes on Git changes: How Builds Work
The Kubernetes Plugin allows using OpenShift/Kubernetes as an execution environment for Jenkins
The OpenShift Jenkins Pipeline (DSL) Plugin provides access from Jenkins to all the OpenShift features that are not in Kubernetes
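As a sketch of how the Kubernetes Plugin changes the slave model: the master schedules a throwaway pod per build instead of using a static slave. The label, image, and script path below are illustrative assumptions:

```groovy
// Hypothetical pipeline: each build runs in a fresh pod on
// OpenShift/Kubernetes, which is discarded when the build ends.
podTemplate(label: 'ovirt-ci-el7', containers: [
    containerTemplate(name: 'build', image: 'centos:7',
                      ttyEnabled: true, command: 'cat'),
]) {
    node('ovirt-ci-el7') {
        container('build') {
            checkout scm
            sh 'automation/check-patch.sh'
        }
    }
}
```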
With regard to system tests (Lago/OST), in the long term it may be possible to run them using Kubevirt and get both node sharing and multi-node scaling. In the meantime it may actually be possible to launch Lago from inside a container on OpenShift. Here is a Trello ticket from the platform team asking for something like this to be supported on an internal Red Hat OpenShift instance. Note the workarounds they discuss near the end; it looks like this is indeed possible as long as we don't care about sharing the OpenShift instance.
The benefit of running OST on OpenShift should be obvious - it will allow us to have a unified slave platform instead of having to maintain multiple different solutions.
This issue is particularly important for running the vdsm check-patch, which includes network integration tests. These tests modify the host network. Due to a bug in code or in a test, they may (and sometimes really do) leave some dirt behind, which causes all tests running on the same slave to fail.
If we are not moving to true containers, consider running mock in its own netns, in order to give vdsm the isolation it needs.
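A minimal sketch of that interim option, assuming unshare(1) is available on the slave (the mock chroot name and script path are illustrative):

```shell
# Run mock inside a private network namespace so vdsm's network
# tests only see the namespace's interfaces, not the slave's.
sudo unshare --net -- \
    mock -r epel-7-x86_64 --shell 'automation/check-patch.sh'
```

Note that the new namespace starts with only a loopback interface, so anything in the run that needs real network access would have to be done before entering it.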
If we implement OVIRT-2031, we may not really need this in the near future.