Find all HTML tags in a web page and print them from a sorted dictionary
Finding all HTML tags in a web page and recording them in a dictionary is a remarkably easy task in Python (especially if you compare it with #VBA) when Beautiful Soup 4 is used. soup.find_all(True) returns every tag in the document, and a simple if-else inside a loop over them is a standard recipe for filling the dictionary.
However, once the dictionary is printed, it looks a bit “ugly”, because the key-value pairs appear in no meaningful order. Thus, it makes sense to sort the dictionary items into a list, based on how often each tag occurs:
import requests
from bs4 import BeautifulSoup
from operator import itemgetter


def main():
    url = 'https://www.vitoshacademy.com'
    reqs = requests.get(url)
    soup = BeautifulSoup(reqs.text, features="html.parser")

    # Count how many times each tag name occurs in the page.
    dictionary = {}
    for tag in soup.find_all(True):
        if tag.name in dictionary:
            dictionary[tag.name] += 1
        else:
            dictionary[tag.name] = 1
    print(dictionary)

    # Sort the (tag, count) pairs by count, then reverse for descending order.
    dictionary_sorted = sorted(dictionary.items(), key=itemgetter(1))
    dictionary_sorted.reverse()
    for k, v in dictionary_sorted:
        print("{} -> {}".format(k, v))


if __name__ == "__main__":
    main()
The code is about 20 lines, and it works well!
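The counting and sorting steps can be written more compactly with collections.Counter from the standard library. The following is only a minimal sketch under a few assumptions: count_tags is a hypothetical helper name, and the sample list stands in for the tag names that (tag.name for tag in soup.find_all(True)) would produce. Unlike the code above, it breaks ties alphabetically so that the output is deterministic.

```python
from collections import Counter


def count_tags(tag_names):
    """Return (tag, count) pairs, most frequent first.

    Hypothetical helper: tags with equal counts are ordered alphabetically.
    """
    counts = Counter(tag_names)
    # Sort by descending count, then by tag name to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Assumed sample input; with Beautiful Soup this could be
# (tag.name for tag in soup.find_all(True)).
sample = ["html", "body", "div", "a", "div", "a", "div", "p"]

for name, count in count_tags(sample):
    print("{} -> {}".format(name, count))
```

Because the helper accepts any iterable of names, it can be tried out without a network request and then fed the real tag names from the soup object.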
Enjoy it! If you like Beautiful Soup, you may consider taking a look at my walk-through tutorial based on the official documentation here: