How can I work around EAI_AGAIN errors when pushing or running unit tests in a newly created scratch org from Jenkins CI using SFDX?

Asked by AnilJha in Salesforce on Jul 12, 2021

We sometimes get this error from an sfdx force:source:push of a largish code base:

ERROR: Error: getaddrinfo EAI_AGAIN nosoftware-momentum-7459-dev-ed.cs9.my.salesforce.com:443

some six and a half minutes into the push. The push starts about 20 seconds after the sfdx force:org:create that creates the scratch org has completed. We are running Jenkins Pipeline CI on AWS.

Some Googling suggests this EAI_AGAIN error (coming from Node.js that sfdx runs on) means:

Temporary failure in name resolution

Has anyone found a workaround for this? Note we are running on AWS.

PS

We are using parallel pipelines and running many builds at once: does this error get generated when the DNS service is overloaded with requests?

PPS

From the AWS docs this might be relevant:

Each Amazon EC2 instance limits the number of packets that can be sent to the Amazon-provided DNS server to a maximum of 1024 packets per second per network interface. This limit cannot be increased.
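If the per-interface DNS packet limit quoted above is the cause, one general mitigation is to send fewer DNS queries by caching lookups. The following is only a minimal sketch of an in-process cache for a Node.js script. The names `createCachedLookup` and `ttlMs` are hypothetical, the 60 second TTL is an arbitrary assumption, and a cache like this only helps the process that uses it; it does not change what the sfdx CLI does internally.

```javascript
const dns = require('dns');

// Hypothetical helper (not part of sfdx or Node.js): wraps a lookup function
// so that repeated lookups of one hostname reuse the first result for ttlMs.
function createCachedLookup(
  lookup = (hostname) => dns.promises.lookup(hostname),
  ttlMs = 60000, // assumed TTL, pick a value that suits your environment
  now = Date.now
) {
  const cache = new Map(); // hostname -> { expires, promise }

  return function cachedLookup(hostname) {
    const hit = cache.get(hostname);
    if (hit && hit.expires > now()) {
      return hit.promise; // served from cache, no DNS packet is sent
    }
    const promise = Promise.resolve()
      .then(() => lookup(hostname))
      .catch((err) => {
        // Do not cache failures such as EAI_AGAIN; the next call retries.
        const current = cache.get(hostname);
        if (current && current.promise === promise) cache.delete(hostname);
        throw err;
      });
    cache.set(hostname, { expires: now() + ttlMs, promise });
    return promise;
  };
}
```

Calling `createCachedLookup()` with no arguments uses `dns.promises.lookup`; passing a fake lookup function makes the cache easy to exercise without touching the network.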

More...

Interesting to see some explicit checking for this error in yarn.js:

    async exec(args = []) {
      ...
      try {
        await this.fork(this.bin, args, options);
        debug('done');
      } catch (err) {
        // TODO: https://github.com/yarnpkg/yarn/issues/2191
        let networkConcurrency = '--network-concurrency=1';
        if (err.message.includes('EAI_AGAIN') && !args.includes(networkConcurrency)) {
          debug('EAI_AGAIN');
          return this.exec(args.concat(networkConcurrency));
        } else throw err;
      }
    }
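The yarn.js snippet retries once when the error message contains EAI_AGAIN. The same idea can be sketched as a generic retry wrapper with a growing delay between attempts. This is an illustration rather than anything sfdx provides: `retryOnEaiAgain`, `isEaiAgain`, `attempts` and `baseDelayMs` are hypothetical names, and the default values are assumptions.

```javascript
// Hypothetical helper: true when an error looks like a temporary DNS failure.
// Node.js sets err.code to 'EAI_AGAIN' for getaddrinfo failures; the message
// check covers errors that only carry the text (as in the yarn.js snippet).
function isEaiAgain(err) {
  return Boolean(err) &&
    (err.code === 'EAI_AGAIN' || String(err.message).includes('EAI_AGAIN'));
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `operation` (an async function) and retries it only on EAI_AGAIN.
// attempts and baseDelayMs are illustrative defaults, not tuned values.
async function retryOnEaiAgain(operation, { attempts = 5, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Rethrow any other error, or EAI_AGAIN once the attempts are used up.
      if (!isEaiAgain(err) || attempt >= attempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Here `operation` could be any function that runs a network-dependent step and rejects with the step's error. Whether retrying a whole push is acceptable depends on how long each attempt takes.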


Answered by Aashna Saito

I was facing the getaddrinfo EAI_AGAIN error too. I first tried adding a staggered delay (3 minutes, 6 minutes, 9 minutes and so on) to each parallel build so that the builds would not be in the same phase at the same time. That produced one clean build, but it looks like that was just random good luck; it did not solve the problem.

So I changed to polling, using sfdx force:apex:test:report to poll. That seems to work around the problem but requires ugly code. This is from a Jenkins pipeline for the unit testing part, which has the errors most frequently, though the push has them too:

    def experiencingEaiAgainErrors = true
    if (experiencingEaiAgainErrors) {
        // Use polling to workaround EAI_AGAIN errors
        def r1 = shWithResult "sfdx force:apex:test:run --testlevel RunLocalTests --targetusername ${org.username} --json"
        def testRunId = r1.testRunId
        def totalSleeps = 0
        def status = ''
        while (status != 'Completed' && totalSleeps < 180) {
            query = "select Status, MethodsEnqueued, MethodsCompleted, MethodsFailed from ApexTestRunResult where AsyncApexJobId = '${testRunId}'"
            ...
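The pipeline code above is cut off, so the rest of the loop is not shown. As a rough illustration of the bounded polling pattern it starts, here is a sketch in JavaScript. It reuses the names `status` and `totalSleeps` and the cap of 180 from the snippet; `pollUntilCompleted`, `fetchStatus`, `maxSleeps` and `sleepMs` are hypothetical, and tolerating EAI_AGAIN on a single poll is an assumption about why polling helps, not something the original code is known to do.

```javascript
// Hypothetical helper: polls until the status is 'Completed' or the sleep
// budget runs out. fetchStatus is supplied by the caller and should resolve
// to the current Status value (for example from an ApexTestRunResult query).
async function pollUntilCompleted(fetchStatus, { maxSleeps = 180, sleepMs = 10000 } = {}) {
  let status = '';
  let totalSleeps = 0;
  while (status !== 'Completed' && totalSleeps < maxSleeps) {
    await new Promise((resolve) => setTimeout(resolve, sleepMs));
    totalSleeps++;
    try {
      status = await fetchStatus();
    } catch (err) {
      // Assumption: a temporary DNS failure on one poll is not fatal,
      // because the next iteration simply asks again.
      if (!String(err && err.message).includes('EAI_AGAIN')) throw err;
    }
  }
  return { status, totalSleeps };
}
```

The caller still has to check the returned `status`, since the loop also ends when the budget of `maxSleeps` is exhausted without the run completing.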

Hope this helps!!


