How can I use a regex pattern with an exclude function to filter unwanted data out of a dataset?
I am currently working on a text-processing project in which I need to extract specific information from large datasets using regular expressions. How can I use a regex pattern with an exclude function to filter out unwanted data and capture only the desired information?
In the context of Salesforce, here is one approach:
Consider a scenario where you have a text dataset that contains email addresses and you want to extract only the addresses that do not end with ".gov". You can achieve this with a negative lookahead assertion in the regex pattern. Here is an example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "Emails: john.doe@example.com, jane_smith@gmail.com, admin@example.gov";
        // Define the regex pattern to match email addresses not ending with ".gov"
        // (backslashes are doubled because this is a Java string literal)
        String regexPattern = "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.(?!gov\\b)[A-Za-z]{2,}\\b";
        // Compile the pattern and create a matcher object
        Pattern pattern = Pattern.compile(regexPattern);
        Matcher matcher = pattern.matcher(text);
        // Find and print matching email addresses
        while (matcher.find()) {
            System.out.println("Match: " + matcher.group());
        }
    }
}
Here is the same example in Python:
import re

# Function to extract email addresses not ending with ".gov" from a given text
def extract_emails(input_text):
    # Define the regex pattern to match email addresses not ending with ".gov".
    # The dot before the top-level domain is escaped, and \b keeps the
    # lookahead from being bypassed by backtracking into "gov".
    regex_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b'
    # Find all matching email addresses
    matches = re.findall(regex_pattern, input_text)
    return matches

# Read text from a file (e.g., emails.txt)
def read_text_from_file(file_path):
    with open(file_path, 'r') as file:
        text = file.read()
    return text

# Example usage
if __name__ == "__main__":
    # Read text from a file (you can replace 'emails.txt' with your file path)
    input_text = read_text_from_file('emails.txt')
    # Extract email addresses not ending with ".gov"
    extracted_emails = extract_emails(input_text)
    # Print the extracted email addresses
    print("Email addresses not ending with '.gov':")
    for email in extracted_emails:
        print(email)
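To see what the negative lookahead does without needing a file, the pattern can be run against an in-memory string. This is only a minimal sketch: the sample text reuses the addresses from the Java example, and the names `SAMPLE` and `EMAIL_NOT_GOV` are made up for illustration.

```python
import re

# Illustrative sample text (same addresses as the Java example)
SAMPLE = "Emails: john.doe@example.com, jane_smith@gmail.com, admin@example.gov"

# \. matches a literal dot; (?!gov\b) rejects the match when the text
# immediately after that dot is the whole word "gov"
EMAIL_NOT_GOV = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b"
)

matches = EMAIL_NOT_GOV.findall(SAMPLE)
print(matches)  # ['john.doe@example.com', 'jane_smith@gmail.com']
```

The lookahead consumes no characters. It only checks what follows the dot, so the rest of the pattern still matches the top-level domain normally when the check passes.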
Here is the same example as an HTML page with JavaScript:
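<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script>
function extractEmails() {
    var inputText = document.getElementById('inputText').value;
    var regexPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b/g;
    var extractedEmails = inputText.match(regexPattern);
    // Display extracted email addresses
    var outputDiv = document.getElementById('outputDiv');
    outputDiv.innerHTML = "Email addresses not ending with '.gov':";
    if (extractedEmails && extractedEmails.length > 0) {
        for (var i = 0; i < extractedEmails.length; i++) {
            outputDiv.innerHTML += "<br>" + extractedEmails[i]; // assumed output format
        }
    }
}
</script>
</head>
<body>
<!-- Minimal markup assumed here; the script only requires the inputText and outputDiv ids -->
<h1>Email Extraction</h1>
<label for="inputText">Enter text containing email addresses:</label>
<textarea id="inputText"></textarea>
<button onclick="extractEmails()">Extract</button>
<div id="outputDiv"></div>
</body>
</html>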
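An alternative to building the exclusion into the regex is to match every address first and then filter the results, which is closer to an explicit "exclude" step. The sketch below is only an illustration of that idea: `exclude_emails`, `EMAIL`, and the sample string are hypothetical names and data, and the email pattern is a simplification rather than a full address validator. It also shows two pitfalls of the lookahead pattern. Backtracking to an earlier dot can return a truncated address for a subdomain, and the lookahead is case-sensitive unless a flag such as `re.IGNORECASE` is used.

```python
import re

# Plain email pattern with no exclusion logic (a simplification, not a full
# RFC 5322 matcher)
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Lookahead pattern from the Java example, kept here for comparison
EMAIL_NOT_GOV = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?!gov\b)[A-Za-z]{2,}\b"
)

def exclude_emails(text, excluded_suffix=".gov"):
    """Return addresses found in text that do not end with excluded_suffix."""
    return [
        email
        for email in EMAIL.findall(text)
        if not email.lower().endswith(excluded_suffix)
    ]

sample = "a@example.com, b@mail.example.gov, c@example.GOV"

lookahead_result = EMAIL_NOT_GOV.findall(sample)
filtered_result = exclude_emails(sample)

print(lookahead_result)  # ['a@example.com', 'b@mail.example', 'c@example.GOV']
print(filtered_result)   # ['a@example.com']
```

Filtering after matching keeps the regex simple and makes the exclusion rule easy to change, at the cost of a second pass over the matches.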