English has well over 100,000 words, and the same is true of many other languages. The point is that we DO NOT use all of them in everyday life. We probably use only about 1,000 to 5,000 different words per day, depending on our job and activities.
So, to learn a language efficiently, it is a great idea to focus on the most used words and simply ignore the words used by people who have studied English philology and want to show you how little you speak “their” language.
With the code below, I will show you how to find the words you should focus on. Let us take an example from the book for C# programming. On page 241 we have some text. We copy the text into a file named “input.txt”, located in the directory “TreeMapExampls\bin\Debug”. Then we simply run the program and get the following message:
The document is generated. Please check the file "output.txt".
Is that all? Actually, yes! The program has generated an output file in the same place where the input file is located. Beautiful! What is the content of the file? It looks like this, with one word,count pair per line:
What is this? It simply tells you that the word “first” appears 3 times in the text and the word “element” appears 6 times.
OK, we have a seemingly useless list, what then? Well, you may go to Excel, paste the list into one column, select it, then go to Data > Text to Columns and split it by commas.
Then simply sort by the count column in descending order. The 20 most used words are shown here:
Beautiful, eh? These are the most used words in the paragraph we have used. If you do not understand their meaning, you will probably have difficulties with the whole text. Furthermore, this paragraph has 127 different words, which are used 347 times in total. The top 20 words are used 195 times, so the other 107 words are used only 152 times. In other words, about 16% of the words account for about 56% of the text. So, if you are learning a foreign language and would like to concentrate on the “top” words and not the ones Shakespeare used, you had better learn the most used words first.
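If you would rather skip the spreadsheet, the sorting and the arithmetic above can be sketched in C# as well. This is only a rough sketch under a few assumptions: the class name WordStats and its methods Parse, Top and Coverage are invented here for illustration, and the three sample lines in Main are made up to show the word,count format, they are not taken from the real output file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordStats
{
    // Parses "word,count" lines as written to output.txt; malformed lines are skipped.
    public static List<KeyValuePair<string, int>> Parse(IEnumerable<string> lines)
    {
        var entries = new List<KeyValuePair<string, int>>();
        foreach (string line in lines)
        {
            // The count is always the last field, so split on the last comma.
            int comma = line.LastIndexOf(',');
            if (comma <= 0) continue;

            int count;
            if (int.TryParse(line.Substring(comma + 1), out count))
            {
                entries.Add(new KeyValuePair<string, int>(line.Substring(0, comma), count));
            }
        }
        return entries;
    }

    // Returns the n entries with the highest counts, most frequent first.
    public static List<KeyValuePair<string, int>> Top(IEnumerable<KeyValuePair<string, int>> entries, int n)
    {
        return entries.OrderByDescending(e => e.Value).Take(n).ToList();
    }

    // Share of all occurrences covered by the n most frequent words (0.0 to 1.0).
    public static double Coverage(IEnumerable<KeyValuePair<string, int>> entries, int n)
    {
        List<KeyValuePair<string, int>> list = entries.ToList();
        int total = list.Sum(e => e.Value);
        if (total == 0) return 0.0;
        return (double)Top(list, n).Sum(e => e.Value) / total;
    }

    static void Main()
    {
        // Made-up sample lines; for the real file use File.ReadLines("output.txt").
        string[] lines = { "element,6", "first,3", "the,11" };
        List<KeyValuePair<string, int>> entries = Parse(lines);

        foreach (KeyValuePair<string, int> entry in Top(entries, 2))
        {
            Console.WriteLine("{0},{1}", entry.Key, entry.Value);
        }
        Console.WriteLine("Top 2 share of all occurrences: {0:F2}", Coverage(entries, 2));
    }
}
```

To run it on the real data, replace the sample array with File.ReadLines("output.txt") (this needs using System.IO) and ask for the top 20 instead of the top 2.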
Enjoy the program!
And the code:
    using System;
    using System.Collections.Generic;
    using System.IO;

    class TreeMapDemo
    {
        static void Main()
        {
            // Read the input here rather than in a static field, so that a missing
            // file is reported as a plain FileNotFoundException.
            string text = File.ReadAllText("input.txt");

            IDictionary<string, int> wordOccurrenceMap = GetWordOccurrenceMap(text);
            PrintWordOccurrenceCount(wordOccurrenceMap);
        }

        private static IDictionary<string, int> GetWordOccurrenceMap(string text)
        {
            // Tabs and line breaks are separators too; otherwise a word at the end
            // of a line would be counted together with its line break.
            string[] tokens = text.Split(' ', '\t', '\r', '\n', '.', ',', '-', '?', '!',
                '<', '>', '&', '[', ']', '(', ')');

            // The sorted dictionary keeps the words in alphabetical order, ignoring case.
            IDictionary<string, int> words =
                new SortedDictionary<string, int>(new CaseInsensitiveComparer());

            foreach (string word in tokens)
            {
                // Two separators in a row produce an empty token; skip it.
                if (string.IsNullOrWhiteSpace(word))
                {
                    continue;
                }

                int count;
                if (!words.TryGetValue(word, out count))
                {
                    count = 0;
                }
                words[word] = count + 1;
            }
            return words;
        }

        private static void PrintWordOccurrenceCount(IDictionary<string, int> wordOccurrenceMap)
        {
            // Write one "word,count" line per entry; the using block closes the file.
            using (StreamWriter writer = new StreamWriter("output.txt"))
            {
                foreach (KeyValuePair<string, int> wordEntry in wordOccurrenceMap)
                {
                    writer.WriteLine("{0},{1}", wordEntry.Key, wordEntry.Value);
                }
            }

            Console.WriteLine(@"The document is generated. Please check the file ""output.txt"".");
            Console.ReadKey();
        }
    }

    class CaseInsensitiveComparer : IComparer<string>
    {
        public int Compare(string s1, string s2)
        {
            // The third argument (true) tells string.Compare to ignore case.
            return string.Compare(s1, s2, true);
        }
    }
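To see what the counting does without creating any files, here is a minimal sketch that applies the same idea to a short, made-up sentence. It is a simplified variant and not the program above: the class name CountingDemo, the sample sentence and the shorter separator list are invented for illustration, and it uses the built-in StringComparer.OrdinalIgnoreCase in place of a custom comparer class.

```csharp
using System;
using System.Collections.Generic;

static class CountingDemo
{
    // Counts words case-insensitively and keeps them in alphabetical order.
    public static SortedDictionary<string, int> Count(string text)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ', '.', ',', '!', '?' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int count;
            counts.TryGetValue(word, out count); // count stays 0 for a new word
            counts[word] = count + 1;
        }
        return counts;
    }

    static void Main()
    {
        // Hypothetical sample text, not from the book.
        foreach (KeyValuePair<string, int> entry in Count("The cat saw the dog. The dog ran!"))
        {
            Console.WriteLine("{0},{1}", entry.Key, entry.Value);
        }
    }
}
```

For this sentence the lines come out as cat,1 then dog,2, ran,1, saw,1 and The,3. The words are listed alphabetically, and “The” and “the” are counted together under the spelling that was seen first.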