Showing all links from a website with Python and BeautifulSoup takes just 6 lines of code: 2 for the imports and 4 to do the actual job:
import urllib.request
from bs4 import BeautifulSoup

resp = urllib.request.urlopen("https://www.vitoshacademy.com")
soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))

for link in soup.find_all('a', href=True):
    print(link['href'])
In general, BeautifulSoup comes out of the box with beautiful documentation, which can teach you a lot. And the results look OK:
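One thing you may notice in the output is that some hrefs are relative (e.g. "/some-page/") and some appear more than once. Below is a small sketch of the same idea, not part of the original snippet, that resolves links against the base URL with urljoin() and skips duplicates; the "html.parser" argument and the base_url name are my own choices here, not something the original code uses:

import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://www.vitoshacademy.com"
resp = urllib.request.urlopen(base_url)
soup = BeautifulSoup(resp, "html.parser")

seen = set()
for link in soup.find_all('a', href=True):
    absolute = urljoin(base_url, link['href'])  # resolve relative hrefs against the base URL
    if absolute not in seen:                    # print every link only once
        seen.add(absolute)
        print(absolute)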
These 6 lines inspired me so much that I even recorded a 15-minute video, available here:
The code from the video is a bit different, as it writes the results to a file as well:
import urllib.request
from bs4 import BeautifulSoup

resp = urllib.request.urlopen("https://www.vitoshacademy.com")
soup = BeautifulSoup(resp)

with open('extract.txt', 'w+', encoding='utf-8') as file:
    for link in soup.find_all('a', href=True):
        lines = f"{link.getText()}\n{link['href']}\n"
        print(lines)
        file.write(lines)
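If you prefer a structured file over plain text, the same loop can also feed the csv module. This is just a sketch along the same lines, not code from the video; the extract.csv filename and the "html.parser" argument are my own assumptions:

import csv
import urllib.request

from bs4 import BeautifulSoup

resp = urllib.request.urlopen("https://www.vitoshacademy.com")
soup = BeautifulSoup(resp, "html.parser")

with open('extract.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(["text", "href"])  # header row
    for link in soup.find_all('a', href=True):
        # one row per link: visible text and the href attribute
        writer.writerow([link.get_text(strip=True), link['href']])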
Enjoy it!