Find all HTML tags in a web page and print them from a sorted dictionary

Finding all HTML tags in a web page and recording them in a dictionary is a magically easy task in Python (especially if you compare it with VBA) when Beautiful Soup 4 is used. soup.find_all(True) returns every tag in the document, so looping over its result with a simple if-else, a fairly standard recipe, is enough to fill the dictionary.
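The if-else recipe can be tried without any web page at all. The minimal sketch below assumes a hard-coded list of tag names as a stand-in for the `tag.name` values that `soup.find_all(True)` would yield. It runs the same if-else pattern and compares it with two shorter standard-library equivalents, `dict.get` and `collections.Counter`. The list contents are made up for illustration.

```python
from collections import Counter

# Hypothetical tag names, standing in for tag.name values from soup.find_all(True)
tag_names = ["html", "head", "title", "body", "div", "p", "a", "p", "div", "a", "a"]

# 1) The if-else recipe: add a new key with 1, or increment an existing one
counts_if_else = {}
for name in tag_names:
    if name in counts_if_else:
        counts_if_else[name] += 1
    else:
        counts_if_else[name] = 1

# 2) Same result with dict.get, using 0 as the default for unseen tags
counts_get = {}
for name in tag_names:
    counts_get[name] = counts_get.get(name, 0) + 1

# 3) Same result with collections.Counter (a dict subclass built for counting)
counts_counter = Counter(tag_names)

print(counts_if_else)
# {'html': 1, 'head': 1, 'title': 1, 'body': 1, 'div': 2, 'p': 2, 'a': 3}
```

All three produce the same counts; the keys come out in the order each tag was first seen, not by frequency.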

However, once the dictionary is printed, it looks a bit “ugly”, because the key-value pairs are not ordered by frequency (since Python 3.7, a dictionary simply keeps the order in which each tag was first seen). Thus, it makes sense to sort the dictionary items into a list, based on how often each tag occurs:
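Sorting by count is a single call to `sorted` with `key=itemgetter(1)`. The sketch below uses made-up counts (assumed for illustration) and also shows one subtle point: `sorted(..., reverse=True)` keeps tags with equal counts in their original order, because Python's sort is stable, whereas sorting ascending and then calling `.reverse()` flips the order of those ties.

```python
from operator import itemgetter

# Hypothetical tag counts, in first-seen order (as a dict keeps them)
counts = {"html": 1, "div": 2, "p": 2, "a": 3}

# Descending by count in one step; tied items keep their insertion order
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)

# Ascending sort followed by reverse(): same counts, but tied items swap places
flipped = sorted(counts.items(), key=itemgetter(1))
flipped.reverse()

for name, count in by_count:
    print("{} -> {}".format(name, count))
```

With these counts, the loop prints `a -> 3`, `div -> 2`, `p -> 2`, `html -> 1`, while `flipped` lists `p` before `div`.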

import requests
from bs4 import BeautifulSoup
from operator import itemgetter

def main():
    url = 'https://www.vitoshacademy.com'
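    # Download the page; give up after 10 seconds and stop on a 4xx/5xx response
    reqs = requests.get(url, timeout=10)
    reqs.raise_for_status()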
    soup = BeautifulSoup(reqs.text, features="html.parser")
    dictionary = {}

    # find_all(True) returns every tag in the document; count each tag name
    for tag in soup.find_all(True):
        if tag.name in dictionary:
            dictionary[tag.name] += 1
        else:
            dictionary[tag.name] = 1
    
    print(dictionary)
    
    # Sort the (tag, count) pairs by count, highest first
    dictionary_sorted = sorted(dictionary.items(), key=itemgetter(1), reverse=True)
    for k, v in dictionary_sorted:
        print("{} -> {}".format(k, v))

if __name__ == "__main__":
    main()
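To try the counting without a network request, the same idea can run on an inline HTML string. The markup below is made up for illustration. This sketch also swaps the manual dictionary and `itemgetter` sort for `collections.Counter`, whose `most_common()` method returns `(tag, count)` pairs already sorted from most to least frequent. It is an alternative, not the approach used in the script above.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical sample page, used instead of requests.get(url).text
html = """
<html>
  <head><title>Demo</title></head>
  <body>
    <div><p>First <a href="#1">one</a></p></div>
    <div><p>Second <a href="#2">two</a> and <a href="#3">three</a></p></div>
  </body>
</html>
"""

soup = BeautifulSoup(html, features="html.parser")

# find_all(True) matches every tag (text nodes are skipped); count each name
tag_counts = Counter(tag.name for tag in soup.find_all(True))

# most_common() sorts by count, highest first; ties keep first-seen order
for name, count in tag_counts.most_common():
    print("{} -> {}".format(name, count))
```

For this sample the loop starts with `a -> 3`, then `div -> 2` and `p -> 2`, followed by the four tags that occur once.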

The code is about 20 lines long, and it works flawlessly!

Enjoy it! If you like Beautiful Soup, consider taking a look at my walk-through tutorial of the official documentation here:

Beautiful Soup - Python - Tutorial - (Part 1)