
sh commands seem to hang when running in parallel stages

We have a primary build pipeline that uses the workflow-cps parallel step to run about 50 containers at the same time, and it relies heavily on sh steps. As more agents are added to the parallel step, sh commands take longer to complete when they are wrapped in a Groovy script block. This happens on regular static Linux nodes as well as on Kubernetes pods.

I’ve simplified the pipeline so the problem is easier to reproduce:

pipeline {
    agent none
    stages {
        stage('Build and Test') {
            parallel {
                stage('Stage1') {
                    agent {
                        kubernetes {
                            yaml """
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-build
      image: ubuntu
      resources:
        requests:
          memory: 4000Mi
          cpu: 4
        limits:
          memory: 4000Mi
          cpu: 4
      command: ['sleep']
      args: ['6h']
      tty: true
                            """
                        }
                    }
                    steps {
                        container('test-build') {
                            script {
                                for (int i = 0; i < 30; i++) {
                                    sh 'cat /etc/hosts'
                                }
                            }
                        }
                    }
                }
                
                
                stage('Stage2') {
                    agent {
                        kubernetes {
                            yaml """
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-build
      image: ubuntu
      resources:
        requests:
          memory: 4000Mi
          cpu: 4
        limits:
          memory: 4000Mi
          cpu: 4
      command: ['sleep']
      args: ['6h']
      tty: true
                            """
                        }
                    }
                    steps {
                        container('test-build') {
                            script {
                                for (int i = 0; i < 30; i++) {
                                    sh 'cat /etc/hosts'
                                }
                            }
                        }
                    }
                }
                
                // ...
                // in here you can keep pasting duplicates of stages
                
                
            }
        }
    }
}

Apologies for hard-coding each stage; I wasn’t able to come up with a loop that generates them.
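For reference, a loop-based version might look like the following. This is a hypothetical sketch, not tested against this setup: it rewrites the whole Jenkinsfile as a scripted pipeline, builds a map of branch closures in a loop, and hands the map to parallel. It assumes the Kubernetes plugin's podTemplate step and its POD_LABEL variable; podYaml, stageCount and branches are illustrative names.

```groovy
// Hypothetical scripted-pipeline sketch: generates the parallel branches in a
// loop instead of pasting each stage by hand.
// Assumes the Kubernetes plugin's podTemplate step and POD_LABEL variable.
def podYaml = '''
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: test-build
      image: ubuntu
      resources:
        requests:
          memory: 4000Mi
          cpu: 4
        limits:
          memory: 4000Mi
          cpu: 4
      command: ['sleep']
      args: ['6h']
      tty: true
'''

// Number of parallel branches (hypothetical knob): set to 7, 17, 35, ...
def stageCount = 35
def branches = [:]

for (int i = 1; i <= stageCount; i++) {
    // Fresh local per iteration so each closure captures its own stage name.
    // 'Stage' + i yields a plain String, which is a safe map key.
    def stageName = 'Stage' + i
    branches[stageName] = {
        stage(stageName) {
            podTemplate(yaml: podYaml) {
                node(POD_LABEL) {
                    container('test-build') {
                        // Same workload as the hard-coded stages: 30 sh steps.
                        for (int j = 0; j < 30; j++) {
                            sh 'cat /etc/hosts'
                        }
                    }
                }
            }
        }
    }
}

stage('Build and Test') {
    // parallel takes a map of branch name -> closure.
    parallel branches
}
```

Copying the loop counter into a new local inside the loop body matters: a closure created in a C-style for loop would otherwise see the final value of the shared counter. In a declarative Jenkinsfile, the declarative matrix directive is another way to stamp out identical stages.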

My results are as follows:

7 stages: ~50 seconds for each agent to finish

17 stages: ~120 seconds for each agent to finish

35 stages: ~270 seconds for each agent to finish
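A quick check on these numbers helps. Assume each branch runs 30 sh steps and all branches start at roughly the same time. Dividing the total number of sh steps by the per-agent time t then gives the rate r at which the build as a whole completes sh steps, where N is the number of parallel stages:

```latex
r = \frac{N \times 30}{t}:\qquad
\frac{7 \times 30}{50\,\mathrm{s}} = 4.2\ \mathrm{steps/s},\qquad
\frac{17 \times 30}{120\,\mathrm{s}} = 4.25\ \mathrm{steps/s},\qquad
\frac{35 \times 30}{270\,\mathrm{s}} \approx 3.9\ \mathrm{steps/s}
```

The aggregate rate stays near 4 sh steps per second regardless of how many branches run. The per-agent time therefore grows roughly linearly with N. That pattern is consistent with the steps being serialized through something shared by the whole build, rather than with each agent slowing down independently.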

If I leave only one stage in the parallel block and then run 50 copies of the same build at the same time, each build finishes quickly (around 15 seconds). So I don’t think this is a general load issue. Rather, the parallel step may be throttling the sh steps in some way, which makes them appear to hang (maybe through the CpsFlowExecution thread?).
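This hypothesis could be tested with the following diagnostic sketch, which is not a confirmed fix. It collapses the 30 separate sh steps in each branch into a single sh step that loops in the shell. If the per-agent time then stops growing with the number of branches, the cost is likely in the per-step overhead of the pipeline engine rather than in the commands themselves. The steps block below is meant to replace the one in each stage, assuming the same pod template:

```groovy
steps {
    container('test-build') {
        // Diagnostic variant: a single sh step runs the 30 commands in a shell
        // loop, instead of 30 separate sh steps driven from a Groovy loop.
        // Triple single quotes stop Groovy from interpolating $(...).
        sh '''
            for i in $(seq 1 30); do
                cat /etc/hosts
            done
        '''
    }
}
```

No script block is needed here because there is no Groovy loop left. Only the shell does the iterating.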

I would appreciate your advice.
