Searching through 399 documents or 40 web pages (the numbers are not random) can be a tough task if you rely on standard tools like Ctrl+F. When the data grows, you need a dedicated search engine that understands relevance, handles typos, and delivers results instantly.
In this article, we build a simple search engine with Python and the power of Elasticsearch. Elasticsearch is an industry-standard, distributed search and analytics engine designed for horizontal scalability and real-time search.
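To see why a dedicated engine beats Ctrl+F, here is a toy inverted index in plain Python. This is only an illustration of the core idea (word → documents containing it), not how Elasticsearch is actually implemented — the real engine adds analysis, relevance scoring and distribution on top:

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    1: "Python search engine",
    2: "Elasticsearch is a search engine",
    3: "Flask web app",
}
index = build_index(docs)
print(sorted(index["search"]))                    # → [1, 2]
print(sorted(index["engine"] & index["python"]))  # AND query → [1]
```

Looking a word up is now a dictionary access instead of a scan through every document — that is the trick that keeps search instant as the data grows.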
The YouTube video (below) presents the code in 3 parts:
- We write a crawler to go around vitoshacademy.com and index the first 40 pages in real time. (A long time ago, I built something similar.)
- Then we build a local indexer to process 399 static text files directly from the hard drive, simulating a document archive.
- The last part of the video is the most interesting one – we step away from the terminal and wrap everything in a web app with Flask. A simple one, with just a field, a button and divs to display the results.

This is what the Flask app with Elasticsearch looks like.
The code of the text searcher is here – first, the indexer:
```python
import os

from elasticsearch import Elasticsearch, helpers

DATA_DIR = os.path.join("target_4_December_release", "EN", "raw-documents")
INDEX_NAME = "vitosh_data_txt"

es = Elasticsearch("http://127.0.0.1:9200")


def run_indexer():
    # Recreate the index from scratch on every run
    if es.indices.exists(index=INDEX_NAME):
        es.indices.delete(index=INDEX_NAME)
    es.indices.create(index=INDEX_NAME)

    documents = []
    if not os.path.exists(DATA_DIR):
        print("Error: Folder not found!")
        return

    print(f"Reading files from {DATA_DIR}...")
    for filename in os.listdir(DATA_DIR):
        if filename.endswith(".txt"):
            path = os.path.join(DATA_DIR, filename)
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()
            documents.append({
                "_index": INDEX_NAME,
                "_source": {"title": filename, "content": text}
            })

    # Send all documents in one bulk request instead of one call per file
    helpers.bulk(es, documents)
    print(f"Indexed {len(documents)} text files!")


if __name__ == "__main__":
    run_indexer()
```
And the search itself:
```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://127.0.0.1:9200")


def search(query):
    resp = es.search(
        index="vitosh_data_txt",
        body={
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["content", "title"],
                    "fuzziness": "AUTO"
                }
            },
            "highlight": {
                "fields": {"content": {}},
                "pre_tags": ["<b>"],
                "post_tags": ["</b>"]
            }
        }
    )
    return resp['hits']['hits']


if __name__ == "__main__":
    while True:
        q = input("\nSearch (or 'exit'): ")
        if q == "exit":
            break
        hits = search(q)
        print(f"Found {len(hits)} results:")
        for hit in hits[:3]:
            title = hit['_source']['title']
            # Prefer the highlighted fragment; fall back to the first 100 chars
            snippet = hit['highlight']['content'][0] if 'highlight' in hit else hit['_source']['content'][:100]
            print(f"{title}\n ...{snippet}...\n")
```
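The `"fuzziness": "AUTO"` option is what makes the search typo-tolerant: Elasticsearch matches terms within a small edit (Levenshtein) distance of the query term — with `AUTO`, roughly 0 edits for terms of 1–2 characters, 1 edit for 3–5, and 2 edits above that (and a transposition counts as a single edit by default). A stdlib sketch of the classic edit distance, which counts a transposition as two plain edits, shows the idea:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

print(levenshtein("search", "serch"))   # one deletion → 1
print(levenshtein("python", "pyhton"))  # two substitutions here → 2
```

So a query like "serch" is within distance 1 of "search" and would still find our documents under `AUTO`.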
The Docker command (for Kibana) is this one:
```shell
docker run --name kib01 --net elastic -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01:9200" docker.elastic.co/kibana/kibana:8.11.1
```
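Note that this only starts Kibana, and it assumes an Elasticsearch container named `es01` is already running on the `elastic` Docker network. If it is not, a typical version-matched way to start one first (a sketch along the lines of the official Elastic Docker instructions; adjust memory and security settings to your setup) would be:

```shell
# Create the shared network, then start a single-node Elasticsearch 8.11.1
docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -it -m 1GB \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.1
```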
