My idea was to collect all the elements tagged “li” from vitoshacademy.com/all, using BeautifulSoup4 and Python.
Initially, I thought about running a simple soup.findAll("li"), but it also returned the text from the menus, whose entries are list items as well. So I needed a way to filter out the unwanted items. After checking the printed tags, I noticed that they are rather descriptive. E.g., with “Inspect Element” in Chrome the class was not immediately visible, but the “soup” printed it nicely:
<li class="subpost"><a href="https://www.vitoshacademy.com/c-implement-crud-functionality-asp-net-mvc-with-ef-core-video/">C# - Implement CRUD Functionality - ASP.NET MVC with EF Core - Video</a><span class="righttext">[Vitosh Doynov]</span></li>
C# - Get started with EF Core in an ASP.NET MVC Web App - Video[Vitosh Doynov]
<li class="subpost"><a href="https://www.vitoshacademy.com/c-get-started-with-ef-core-in-an-asp-net-mvc-web-app-video/">C# - Get started with EF Core in an ASP.NET MVC Web App - Video</a><span class="righttext">[Vitosh Doynov]</span></li>
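To illustrate the problem, here is a minimal sketch on a made-up page fragment. The markup in SAMPLE_HTML and the "menu-item" class are assumptions for the example only, not the real site's menu markup; only the "subpost" class comes from the output above. It shows that an unfiltered search returns menu entries too, and that printing each tag's class makes the difference visible:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one menu entry and one post entry.
# Only the "subpost" class is taken from the real page.
SAMPLE_HTML = """
<ul id="menu"><li class="menu-item"><a href="/about/">About</a></li></ul>
<ul>
  <li class="subpost"><a href="/post-1/">Post 1</a><span class="righttext">[Author]</span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Without a filter, every <li> on the page is returned, menus included.
all_items = soup.find_all("li")

for tag in all_items:
    # tag.get("class") returns the class attribute as a list, or None if absent.
    print(tag.get("class"), tag.text)
```

Here find_all is the current name of the method; findAll is an older alias for it.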
After some research, the way to print the list items of a given class turned out to be the following:
for tag in soup.findAll("li", attrs={'class': 'class_name'}):
    print(tag.text)
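As far as I can tell, the same filter can be written in a few equivalent ways in BeautifulSoup4. The sketch below runs them on a made-up fragment (SAMPLE_HTML and the "menu-item" class are assumptions) and also shows one possible way to split an entry into title, link and author instead of printing the joined text. The helper split_post is a hypothetical name of my own, not part of the library:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the printed tags above.
SAMPLE_HTML = """
<ul id="menu"><li class="menu-item"><a href="/about/">About</a></li></ul>
<ul>
  <li class="subpost"><a href="/post-1/">Post 1</a><span class="righttext">[Author]</span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Three ways to select <li> elements with class "subpost".
by_attrs = soup.find_all("li", attrs={"class": "subpost"})
by_keyword = soup.find_all("li", class_="subpost")  # class_ avoids the reserved word
by_selector = soup.select("li.subpost")             # CSS selector syntax


def split_post(tag):
    """Return (title, url, author) for one list item (hypothetical helper)."""
    link = tag.find("a")
    author = tag.find("span", class_="righttext")
    # The author text is wrapped in square brackets on the page; strip them.
    return link.get_text(), link["href"], author.get_text().strip("[]")


for tag in by_attrs:
    print(split_post(tag))
```

The helper assumes every matching item contains both a link and a "righttext" span, as in the tags printed earlier; an item missing either one would need an extra check.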
In my case, the complete script looks like this:
import requests
from bs4 import BeautifulSoup


def main():
    url = 'https://www.vitoshacademy.com/all/'
    reqs = requests.get(url)
    soup = BeautifulSoup(reqs.text, features="html.parser")

    for tag in soup.findAll("li", attrs={'class': 'subpost'}):
        print(tag.text)
        # print(tag)


if __name__ == "__main__":
    main()
Producing the following “report”: