C# – Extraction of Emails from a Text – Video

In this article I explain how to perform a simple task that is often given to administrators: extracting e-mail addresses from a text. About 4 or 5 years ago I was given such a task, and I spent a day or two manually extracting about 5000 e-mail addresses from a PDF file. It was not much fun, which is why I would like to show everyone how to do it several times faster, through programming!

The code I present goes through a text, collects the valid e-mail addresses and prints them to the console. The validation is done with a regular expression, which I found with a quick Google search. There are two interesting points in the code: the use of regular expressions and a good use of arrays (the text is split on spaces into an array of tokens, and each token is checked on its own). Roughly, that is all.

Here is the code:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string text = @"X-sender: < sender@sendersdomain.com > X-receiver: < somerecipient@recipientdomain.com > From: Senders Name < sender@sendersdomain.com > To: Recipient vvv@bgb.bg Name < somerecipient@recipientdomain.com >  Message-ID: < 5bec11c119194c14999e592feb46e3cf@sendersdomain.com > Date: Sat, 24 Sep 2005 15:06:49 -0400 Subject: Sample Multi-Part";
        Console.WriteLine("This is our text from which we extract all the valid e-mails:\n\n{0}", text);
        Console.WriteLine("\n\nThese are the emails found:\n\n");

        string[] splitText = text.Split(' ');
        List<string> mailList = new List<string>();
        MailCollector(splitText, mailList);

        foreach (var email in mailList)
        {
            Console.WriteLine(email);
        }
        Console.WriteLine("\n\n");
        Console.ReadKey();
    }
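    // Adds to the list every token that is, as a whole, a valid e-mail address.
    private static void MailCollector(string[] split, List<string> emails)
    {
        foreach (string token in split)
        {
            // \A and \Z anchor the pattern to the whole token; IgnoreCase also accepts upper-case letters.
            if (Regex.IsMatch(token, @"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z", RegexOptions.IgnoreCase))
            {
                emails.Add(token);
            }
        }
    }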
}
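
The program above relies on every address being surrounded by spaces, which is true for the sample text. If an address is glued to punctuation, for example `<vvv@bgb.bg>` or `vvv@bgb.bg;`, the token no longer matches as a whole and is skipped. One possible variation, given here only as a sketch, is to drop the `\A` and `\Z` anchors and let `Regex.Matches` search the whole text directly. The names `EmailScanner` and `FindEmails` are made up for this illustration, and the removal of duplicates is an extra that the original program does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmailScanner
{
    // The same pattern as in the program above, only without the \A and \Z anchors,
    // so that it can match an address anywhere inside a longer text.
    private const string EmailPattern =
        @"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

    // Returns the addresses found in the text, in order of first appearance,
    // skipping repeats (compared without regard to letter case).
    public static List<string> FindEmails(string text)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (Match match in Regex.Matches(text, EmailPattern, RegexOptions.IgnoreCase))
        {
            if (seen.Add(match.Value))
            {
                result.Add(match.Value);
            }
        }
        return result;
    }
}

class Demo
{
    static void Main()
    {
        string text = "To: <somerecipient@recipientdomain.com>, cc: vvv@bgb.bg; again vvv@bgb.bg";

        foreach (string email in EmailScanner.FindEmails(text))
        {
            Console.WriteLine(email);
        }
    }
}
```

The trade-off is that, without the anchors, the pattern will also pick an address out of the middle of a longer string, so this version is better suited to free text than to checking whether a single input is exactly one address.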

The code was written as a homework task for the Telerik Academy, and you may find similar code among my homework solutions under the C# tab on this site.

If you want to understand how the code was built, please take a look at my video:

C# - Emails Extraction from Text

If you just want to take a look at the program, you may download it from here.

The program is given as an exercise in the free Bulgarian C# book.

Please, feel free to use the code as you wish! Enjoy it!