How can I use the regex pattern with the exclude function for filtering out unwanted data from a dataset?

Asked by DorineHankey in Salesforce, asked on Apr 18, 2024

I am working on a text-processing project in which I need to extract specific information from large datasets using regular expressions. How can I use a regex pattern with an exclude function to filter out unwanted data and capture only the desired information?

Answered by Deepa bhawana

In the context of Salesforce, here is one approach:

Consider a scenario where you have a text dataset containing email addresses, and you want to extract only the addresses that do not end with ".gov". You can achieve this with a negative lookahead assertion in the regex pattern. Here is an example in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "Emails: john.doe@example.com, jane_smith@gmail.com, admin@example.gov";
        // Define the regex pattern to match email addresses not ending with ".gov".
        // Backslashes are doubled because the pattern is written as a Java string literal.
        String regexPattern = "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.(?!gov\\b)[A-Za-z]{2,}\\b";
        // Compile the pattern and create a matcher object
        Pattern pattern = Pattern.compile(regexPattern);
        Matcher matcher = pattern.matcher(text);
        // Find and print matching email addresses
        while (matcher.find()) {
            System.out.println("Match: " + matcher.group());
        }
    }
}
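
To see what the negative lookahead `(?!gov\b)` contributes, the following Python sketch runs the pattern with and without it on the sample text used in the Java example. The variable names `all_emails` and `non_gov_emails` are illustrative only and are not part of the original answer.

```python
import re

text = "Emails: john.doe@example.com, jane_smith@gmail.com, admin@example.gov"

# General email pattern with no exclusion.
all_emails = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Same pattern plus a negative lookahead: immediately after the dot matched
# by \. the next characters must not be "gov" followed by a word boundary.
non_gov_emails = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b"
)

print(all_emails.findall(text))      # all three addresses
print(non_gov_emails.findall(text))  # the ".gov" address is skipped
```

The lookahead consumes no characters; it only rejects a position, so the rest of the pattern is unchanged.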

Here is the same example in Python:

import re


# Function to extract email addresses not ending with ".gov" from a given text
def extract_emails(input_text):
    # Define the regex pattern to match email addresses not ending with ".gov".
    # The dot before the top-level domain is escaped so it matches a literal ".".
    regex_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b'
    # Find all matching email addresses
    matches = re.findall(regex_pattern, input_text)
    return matches


# Read text from a file (e.g., emails.txt)
def read_text_from_file(file_path):
    with open(file_path, 'r') as file:
        text = file.read()
    return text


# Example usage
if __name__ == "__main__":
    # Read text from a file (you can replace 'emails.txt' with your file path)
    input_text = read_text_from_file('emails.txt')
    # Extract email addresses not ending with ".gov"
    extracted_emails = extract_emails(input_text)
    # Print the extracted email addresses
    print("Email addresses not ending with '.gov':")
    for email in extracted_emails:
        print(email)
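
A single pattern with a lookahead can behave unexpectedly on some inputs: for an address such as `user@gov.example.gov`, the regex engine can backtrack and return the partial match `user@gov.example`. One alternative, sketched here under the assumption that a two-step approach is acceptable for your data, is to match every email address first and then exclude the unwanted ones with an ordinary string check. The function name `extract_emails_excluding` and its `excluded_suffix` parameter are hypothetical, chosen only for this illustration.

```python
import re

# General email pattern with no exclusion logic built in.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def extract_emails_excluding(text, excluded_suffix=".gov"):
    """Return all email addresses in text that do not end with excluded_suffix."""
    # Step 1: capture every candidate address.
    candidates = EMAIL_PATTERN.findall(text)
    # Step 2: drop the addresses that end with the excluded suffix (case-insensitive).
    suffix = excluded_suffix.lower()
    return [email for email in candidates if not email.lower().endswith(suffix)]


sample = (
    "Emails: john.doe@example.com, jane_smith@gmail.com, "
    "admin@example.gov, user@gov.example.gov"
)
print(extract_emails_excluding(sample))
```

Keeping the exclusion as a separate step also makes it easier to exclude several suffixes at once, since `str.endswith` accepts a tuple of strings.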

Here is the same example as an HTML page with JavaScript:




<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Email Extraction</title>
<script>
    function extractEmails() {
        var inputText = document.getElementById('inputText').value;
        // Match email addresses not ending with ".gov"
        var regexPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b/g;
        var extractedEmails = inputText.match(regexPattern);
        // Display extracted email addresses
        var outputDiv = document.getElementById('outputDiv');
        outputDiv.innerHTML = "Email addresses not ending with '.gov':";
        if (extractedEmails && extractedEmails.length > 0) {
            for (var i = 0; i < extractedEmails.length; i++) {
                outputDiv.innerHTML += "<br>" + extractedEmails[i];
            }
        }
    }
</script>
</head>
<body>
    <h1>Email Extraction</h1>
    <label for="inputText">Enter text containing email addresses:</label>
    <textarea id="inputText"></textarea>
    <button onclick="extractEmails()">Extract</button>
    <div id="outputDiv"></div>
</body>
</html>




