EAI_AGAIN errors when pushing or running unit tests in a newly created scratch org from Jenkins CI using SFDX

Asked by BenButler in QA Testing, Asked on Jul 9, 2021

We sometimes get this error reported from an sfdx force:source:push of a largish code base:

ERROR: Error: getaddrinfo EAI_AGAIN nosoftware-momentum-7459-dev-ed.cs9.my.salesforce.com:443

It appears about six and a half minutes into the push. The push starts about 20 seconds after the sfdx force:org:create that creates the scratch org has completed. We are running Jenkins Pipeline CI on AWS.

Some Googling suggests that this EAI_AGAIN error (which comes from Node.js, the runtime sfdx runs on) means "temporary failure in name resolution". Has anyone found a workaround for this?

PS: We are using parallel pipelines and running many builds at once. Can this error be generated when the DNS service is overloaded with requests?

PPS: From the AWS docs, this might be relevant: "Each Amazon EC2 instance limits the number of packets that can be sent to the Amazon-provided DNS server to a maximum of 1024 packets per second per network interface. This limit cannot be increased."
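Node.js reports this condition as an error whose code is 'EAI_AGAIN' (with syscall 'getaddrinfo'), which is what the sfdx message above is printing. The usual client-side response to a temporary resolver failure is to retry with backoff, adding random jitter so that many parallel builds do not retry in lockstep against the same resolver. The following is a minimal sketch, assuming Node.js 10.6 or later; lookupWithRetry and its default values are hypothetical. Because sfdx performs its own lookups internally, this only illustrates the pattern (for example as a pre-flight check on the scratch org hostname in your own tooling); it does not change how sfdx itself behaves.

```javascript
const dns = require('dns').promises;

// Sleep helper used between retries.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolve `hostname`, retrying only when the resolver reports a temporary
// failure (err.code === 'EAI_AGAIN'). Any other error is rethrown at once.
// `lookup` is injectable so the logic can be exercised without real DNS.
async function lookupWithRetry(hostname, {
  retries = 5,          // hypothetical default: up to 5 extra attempts
  baseDelayMs = 500,    // hypothetical default: first backoff cap of 0.5 s
  lookup = (h) => dns.lookup(h),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await lookup(hostname);
    } catch (err) {
      if (err.code !== 'EAI_AGAIN' || attempt >= retries) throw err;
      // Exponential backoff with full jitter: wait a random time in
      // [0, baseDelayMs * 2^attempt) before the next attempt.
      const cap = baseDelayMs * 2 ** attempt;
      await sleep(Math.random() * cap);
    }
  }
}
```

ENOTFOUND (the name really does not exist) is deliberately not retried, since only EAI_AGAIN signals a temporary condition.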

More: it is interesting to see some explicit checking for this error in yarn.js:

```javascript
async exec(args = []) {
  ...
  try {
    await this.fork(this.bin, args, options);
    debug('done');
  } catch (err) {
    // TODO: https://github.com/yarnpkg/yarn/issues/2191
    let networkConcurrency = '--network-concurrency=1';
    if (err.message.includes('EAI_AGAIN') && !args.includes(networkConcurrency)) {
      debug('EAI_AGAIN');
      return this.exec(args.concat(networkConcurrency));
    } else throw err;
  }
}
```
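The yarn code quoted in the question retries exactly once, appending --network-concurrency=1, when the failure message mentions EAI_AGAIN. The same "retry once in a degraded mode" idea can be written as a small standalone helper. This is a minimal sketch under stated assumptions: the command is run through an injected async run(args) function (hypothetical; in a real script it might wrap child_process.execFile), and fallbackArg is whatever throttling flag the wrapped tool supports, if it supports one at all.

```javascript
// Run a command via `run(args)`. If it fails with an EAI_AGAIN error and the
// fallback argument has not been tried yet, retry once with it appended,
// mirroring the yarn.js logic quoted above.
async function execWithEaiAgainFallback(run, args, fallbackArg) {
  try {
    return await run(args);
  } catch (err) {
    const transient = err.code === 'EAI_AGAIN' ||
      String(err.message).includes('EAI_AGAIN');
    if (transient && !args.includes(fallbackArg)) {
      // The recursive call already has fallbackArg in args, so a second
      // EAI_AGAIN is rethrown instead of looping forever.
      return execWithEaiAgainFallback(run, args.concat(fallbackArg), fallbackArg);
    }
    throw err;
  }
}
```

Using args.concat rather than push leaves the caller's array untouched, which is also what the yarn code does.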
Answered by Bernadette Bond

I tried adding a 3-minute, 6-minute, 9-minute, etc. delay to each parallel build so that the builds would not be at the same phase at the same time. That resulted in one clean build, but it looks like that was just random good luck, not a solution. So I changed to polling, thanks to the answer to the question "Any way to use sfdx force:apex:test:report to poll?". That seems to work around the problem but requires ugly code. This is from the Jenkins pipeline for the unit-testing part, which most frequently has the errors (though the push has them too):

```groovy
def experiencingEaiAgainErrors = true
if (experiencingEaiAgainErrors) {
    // Use polling to work around EAI_AGAIN errors
    def r1 = shWithResult "sfdx force:apex:test:run --testlevel RunLocalTests --targetusername ${org.username} --json"
    def testRunId = r1.testRunId
    def totalSleeps = 0
    def status = ''
    while (status != 'Completed' && totalSleeps < 180) {
        query = "select Status, MethodsEnqueued, MethodsCompleted, MethodsFailed from ApexTestRunResult where AsyncApexJobId = '${testRunId}'"
        ...
    }
}
```
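The workaround replaces one long wait on the test run with a bounded loop of short status checks: stop when the status is Completed or after 180 sleeps. The pipeline code is cut off after the query, so its loop body, including the sleep interval, is not known. Below is a minimal sketch of the same bounded-polling shape in JavaScript, with these assumptions: queryStatus is a hypothetical async function that would run the ApexTestRunResult query shown above and resolve to its Status value; the 10-second interval is an arbitrary placeholder; and a poll that fails with EAI_AGAIN is treated as non-fatal and simply retried on the next iteration, which may or may not match what the original pipeline did.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// True when an error looks like Node's temporary DNS failure.
const isEaiAgain = (err) =>
  err.code === 'EAI_AGAIN' || String(err.message).includes('EAI_AGAIN');

// Poll queryStatus() until it returns one of doneStatuses or maxSleeps
// sleeps have elapsed. queryStatus is hypothetical: it stands in for the
// ApexTestRunResult query and should resolve to the run's Status string.
async function pollTestRun(queryStatus, {
  maxSleeps = 180,              // same bound as the original pipeline
  intervalMs = 10000,           // placeholder: the original interval is unknown
  doneStatuses = ['Completed'], // the original loop only checks 'Completed'
} = {}) {
  let totalSleeps = 0;
  let status = '';
  while (!doneStatuses.includes(status) && totalSleeps < maxSleeps) {
    try {
      status = await queryStatus();
    } catch (err) {
      // A transient DNS failure only costs this one poll; rethrow the rest.
      if (!isEaiAgain(err)) throw err;
    }
    if (!doneStatuses.includes(status)) {
      await sleep(intervalMs);
      totalSleeps++;
    }
  }
  return { status, totalSleeps };
}
```

The caller should check the returned status: if it is not in doneStatuses, the loop gave up after maxSleeps rather than seeing the run finish. Passing additional terminal statuses in doneStatuses would stop the loop earlier when a run ends some other way.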
