Elasticsearch – Simple Introduction and Web App

Searching through 399 documents or 40 web pages (the numbers are not random) can be a tough task if you rely on standard tools like Ctrl+F. As the data grows, you need a dedicated search engine that understands relevance, handles typos, and returns results quickly.

In this article, we will build a simple search engine with Python and Elasticsearch. Elasticsearch is an industry-standard, distributed search and analytics engine designed for horizontal scalability and near real-time search.

The YouTube video (below) presents the code in 3 parts:

  1. We write a crawler that goes around vitoshacademy.com and indexes the first 40 pages in real time. (A long time ago, I built something similar.)
  2. Then we build a local indexer that processes 399 static text files directly from the hard drive, simulating a document archive.
  3. The last part of the video is a bit better: we step away from the terminal and wrap everything in a web app with Flask. It is a simple one, with just an input field, a button, and divs to display the results.
This is what the Flask app with Elasticsearch looks like.
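
The crawler from part 1 is not listed in this article, so here is only a minimal sketch of the idea, not the code from the video. It assumes a breadth-first walk that stays on the start host and stops after 40 pages. The names LinkParser, fetch and crawl are made up for illustration, and only the standard library is used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=40, fetch_page=fetch):
    """Breadth-first crawl that never leaves the host of start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}  # url -> html, in the order the pages were visited
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Each entry of the returned dictionary could then be turned into a bulk action with the URL as the title and the page text as the content, in the same way the file indexer further down does it for text files.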

Here is the code of the text searcher, starting with the indexer:

import os
from elasticsearch import Elasticsearch, helpers

DATA_DIR = os.path.join("target_4_December_release", "EN", "raw-documents") 
INDEX_NAME = "vitosh_data_txt"
es = Elasticsearch("http://127.0.0.1:9200")

def run_indexer():
    # Check the source folder first, so a wrong path does not wipe the existing index.
    if not os.path.exists(DATA_DIR):
        print("❌ Error: Folder not found!")
        return

    # Start from a clean index on every run.
    if es.indices.exists(index=INDEX_NAME):
        es.indices.delete(index=INDEX_NAME)
    es.indices.create(index=INDEX_NAME)

    documents = []

    print(f"📂 Reading files from {DATA_DIR}...")
    for filename in os.listdir(DATA_DIR):
        if filename.endswith(".txt"):
            path = os.path.join(DATA_DIR, filename)
            with open(path, "r", encoding="utf-8") as f:
                text = f.read()

            # One bulk action per file: the file name is the title, the text is the content.
            documents.append({
                "_index": INDEX_NAME,
                "_source": {"title": filename, "content": text}
            })

    # Send the documents through the bulk API instead of one request per file.
    helpers.bulk(es, documents)
    print(f"✅ Indexed {len(documents)} text files!")

if __name__ == "__main__":
    run_indexer()
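
The indexer above keeps every file in a list before sending it. For 399 files that is fine. For a larger archive, one possible variation is to build the actions lazily with a generator. The sketch below is an assumption on my side, not part of the original script: iter_actions is a hypothetical name, and using the file name as the document _id assumes the file names are unique.

```python
import os


def iter_actions(data_dir, index_name):
    """Yield one bulk action per .txt file, reading the files one at a time."""
    for filename in sorted(os.listdir(data_dir)):
        if not filename.endswith(".txt"):
            continue
        path = os.path.join(data_dir, filename)
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        yield {
            "_index": index_name,
            # Assumption: file names are unique, so a re-run overwrites
            # the same document instead of adding a duplicate.
            "_id": filename,
            "_source": {"title": filename, "content": text},
        }
```

Since helpers.bulk accepts any iterable of actions, the call would become helpers.bulk(es, iter_actions(DATA_DIR, INDEX_NAME)). It returns a tuple whose first element is the number of successfully executed actions, which can replace len(documents) in the final message.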

And the search itself:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://127.0.0.1:9200")

def search(query):
    resp = es.search(
        index="vitosh_data_txt",
        body={
            # Search both fields; "AUTO" fuzziness tolerates small typos in the query.
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["content", "title"],
                    "fuzziness": "AUTO"
                }
            },
            # Wrap the matched terms in <b> tags so they can be shown in bold.
            "highlight": {
                "fields": {"content": {}},
                "pre_tags": ["<b>"],
                "post_tags": ["</b>"]
            }
        }
    )
    return resp['hits']['hits']

if __name__ == "__main__":
    while True:
        q = input("\n🔍 Search (or 'exit'): ")
        if q == "exit":
            break

        # es.search returns at most 10 hits by default; only the top 3 are printed.
        hits = search(q)
        print(f"Found {len(hits)} results:")
        for hit in hits[:3]:
            title = hit['_source']['title']
            # Use the highlighted fragment when there is one, else the first 100 characters.
            snippet = hit['highlight']['content'][0] if 'highlight' in hit else hit['_source']['content'][:100]
            print(f"📄 {title} \n   ...{snippet}...\n")
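
For the web app, the same hits have to end up in divs instead of the terminal. The Flask code from the video is not reproduced here. The sketch below only shows the rendering step, under my own assumptions: render_hit and render_hits are hypothetical names, and the class name "result" is made up. It expects hits shaped like the ones returned by search() above.

```python
from html import escape


def render_hit(hit, max_chars=100):
    """Turn one Elasticsearch hit into an HTML div."""
    title = escape(hit["_source"]["title"])
    fragments = hit.get("highlight", {}).get("content")
    if fragments:
        # The fragment already carries the <b>...</b> tags, so it is not escaped here.
        snippet = fragments[0]
    else:
        # No highlight (for example a match in the title only): fall back to plain text.
        snippet = escape(hit["_source"]["content"][:max_chars])
    return f'<div class="result"><h3>{title}</h3><p>...{snippet}...</p></div>'


def render_hits(hits, limit=3):
    """Render the first `limit` hits, one div per line."""
    return "\n".join(render_hit(hit) for hit in hits[:limit])
```

One design note: the highlighted fragment is inserted as it comes back. If the indexed text can contain HTML of its own, adding "encoder": "html" to the highlight request asks Elasticsearch to escape the text around the tags.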

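The Docker command below starts Kibana 8.11.1 on port 5601. It expects an Elasticsearch container named es01 on the same Docker network, called elastic: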

docker run --name kib01 --net elastic -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01:9200" docker.elastic.co/kibana/kibana:8.11.1
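
Before running the indexer, it can help to check that Elasticsearch itself answers on the address used in the scripts. This is a small sketch with the standard library only. cluster_info is a hypothetical helper, and it assumes the cluster is reachable over plain HTTP without authentication, as in the scripts above.

```python
import json
from urllib.request import urlopen


def cluster_info(base_url="http://127.0.0.1:9200", timeout=3):
    """Return the JSON served at the cluster root, or None if it cannot be read."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP error codes;
        # ValueError covers malformed URLs and non-JSON responses.
        return None


if __name__ == "__main__":
    info = cluster_info()
    if info is None:
        print("Elasticsearch is not reachable.")
    else:
        print("Elasticsearch version:", info["version"]["number"])
```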

The rest is on GitHub!

Elasticsearch with Python

🙂