Finding all HTML tags on a web page and recording them in a dictionary is a remarkably easy task with Python (if you compare it with #VBA) when Beautiful Soup 4 is used. soup.find_all(True) returns every tag in the document, and a simple if-else inside a loop over them is quite a standard recipe for filling the dictionary.
However, once the dictionary is printed, it looks a bit “ugly”, because the key-value pairs appear in the order in which each tag was first encountered rather than by count. Thus, it makes sense to sort the dictionary items into a list, based on how often each tag occurs:
```python
import requests
from bs4 import BeautifulSoup
from operator import itemgetter


def main():
    url = 'https://www.vitoshacademy.com'
    reqs = requests.get(url)
    soup = BeautifulSoup(reqs.text, features="html.parser")

    dictionary = {}
    for tag in soup.find_all(True):
        if tag.name in dictionary:
            dictionary[tag.name] += 1
        else:
            dictionary[tag.name] = 1

    print(dictionary)

    dictionary_sorted = sorted(dictionary.items(), key=itemgetter(1))
    dictionary_sorted.reverse()

    for k, v in dictionary_sorted:
        print("{} -> {}".format(k, v))


if __name__ == "__main__":
    main()
```
The code is about 20 lines long, and it works flawlessly!
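If you prefer an even shorter recipe, the counting and the sorting can probably be handed over to collections.Counter from the standard library. What follows is only a sketch: the helper names count_tag_names and count_tags_in_html are made up for this illustration, and the small sample list stands in for the tag names of a real page.

```python
from collections import Counter


def count_tag_names(tag_names):
    """Return (name, count) pairs, most frequent first."""
    # Counter does the if-else bookkeeping; most_common() does the sorting.
    return Counter(tag_names).most_common()


def count_tags_in_html(html):
    """Hypothetical helper: count the tags in an HTML string."""
    # Imported here so count_tag_names stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, features="html.parser")
    return count_tag_names(tag.name for tag in soup.find_all(True))


if __name__ == "__main__":
    # Hand-written sample instead of a downloaded page (an assumption).
    sample = ["html", "body", "p", "a", "p", "a", "a"]
    for name, count in count_tag_names(sample):
        print("{} -> {}".format(name, count))
```

With a downloaded page, count_tags_in_html(reqs.text) should give the same kind of list as dictionary_sorted above, although tags with equal counts may come out in a different order.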
Enjoy it! If you like Beautiful Soup, you may consider taking a look at my walk-through tutorial from the official documentation here: