The Fundamentals of Elasticsearch -1

Ecesu Olgun
6 min read · Feb 7, 2022

In this post, I will cover the following main topics in order:
1) What is Elasticsearch?
2) Elasticsearch components and fundamentals
3) The features of Elasticsearch
4) What to expect from the Elasticsearch framework?

1) What is Elasticsearch?

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for almost any application that requires full-text search, especially cross-platform applications. Elasticsearch uses Lucene as the underlying infrastructure of its search and analytics engine.

Elasticsearch is used especially for tasks such as searching and analyzing text in large volumes of data. The underlying reason is that Elasticsearch returns results very quickly by searching an index instead of scanning the text directly. In addition, it can perform statistical analyses and score how well each result matches a query.
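To make the idea of "searching an index instead of the text" more concrete, here is a minimal sketch of an inverted index in plain Python. It is only an illustration of the principle: the function names and sample documents are made up for this example, and the real Lucene data structures are far more sophisticated.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Made-up sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is built on Apache Lucene",
    2: "Lucene is a text search library written in Java",
    3: "Kibana visualizes data stored in Elasticsearch",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene")))                # [1, 2]
print(sorted(search(index, "elasticsearch lucene")))  # [1]
```

A query only has to look up its terms in the index and intersect the resulting id sets, so the cost depends on the number of query terms rather than on the total amount of text.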

2) Elasticsearch components and fundamentals

Elasticsearch has several key components. Some of them correspond to structures we already know from classical relational databases. The comparison below lists relational database terms next to their Elasticsearch counterparts.

Elasticsearch and relational databases name comparison
  • Index (plural: indices): Where a relational database system has databases, Elasticsearch has indices. An Elasticsearch cluster can contain multiple indices (databases).
  • Type: Older versions of Elasticsearch used a "Type" as the counterpart of a table in a relational database, and an index could contain more than one type (table). Since version 6.0 an index can contain only one type, and types were removed in version 8.0, so in current versions the index itself is the closest counterpart of a table.
  • Document: In Elasticsearch, the rows of a relational database are represented as documents. An index can hold many documents.
  • Field: The columns of a classic database correspond to fields in Elasticsearch. A document can have many fields.
  • Inverted index: Every record added to Elasticsearch is stored as a JSON document. When a document is indexed, Elasticsearch builds an inverted index: for each word (term) in your documents, it records which document or documents contain that word.
  • Mapping: While indexing data, we need to tell Elasticsearch what type of data each field holds. Mapping is the process of defining the data type (for example text, keyword, or boolean) of each field. For more information on data types, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-data-types.html
An example of mapping
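As a rough illustration of what a mapping can look like, the sketch below builds the JSON body of a create-index request in Python. The index name "articles" and all field names are hypothetical and chosen only for this example. The field types (text, keyword, boolean, date) are standard Elasticsearch types.

```python
import json

# Hypothetical index "articles"; the field names are illustrative only.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # analyzed for full-text search
            "tag": {"type": "keyword"},         # stored as an exact value
            "published": {"type": "boolean"},
            "created_at": {"type": "date"},
        }
    }
}

# This JSON could be sent as the body of: PUT /articles
print(json.dumps(mapping_body, indent=2))
```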

Mapping tuning and more technical topics, such as the difference between text and keyword, will be covered in my next Medium post.

  • Cluster: A cluster is a collection of one or more nodes that together hold all your data and provide indexing and search capabilities across all nodes. Elasticsearch runs on each node of the cluster.
  • Node: Any time you start an instance of Elasticsearch, you are starting a node. A node is a single server, that is, one of the machines on which the data is stored. The indexing and search capabilities of a cluster are provided by its nodes.
  • Shard: A shard is a smaller unit that holds a portion of an index's documents, and shards live on nodes. If we index a very large amount of data on a single node, we may run out of disk capacity or see very slow responses. Shards help prevent this: an index is divided into shards, and the shards are distributed across the nodes. The two main purposes of sharding are: it allows you to split and scale your data volume horizontally, and it allows you to distribute and parallelize operations across multiple nodes, which increases performance.
  • Replicas: Elasticsearch can keep one or more copies of each index shard, called replica shards, in case a shard or node fails. Replicas provide redundant copies of your data to protect against hardware failure and increase the capacity to serve read requests such as searching or retrieving a document. A replica is never allocated on the same node as the primary shard it copies. When a node crashes, the copies of its shards on other nodes prevent data loss.

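In Elasticsearch, you do not have to set the number of shards and replicas yourself, because default values are used if you omit them. You can optionally set both when you create an index. The number of replicas can be changed at any time afterwards, whereas the number of primary shards is fixed once the index has been created.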
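As a rough sketch of how these two values are set, the Python snippet below builds the settings part of a create-index request body. `number_of_shards` and `number_of_replicas` are the real setting names, while the helper functions and the chosen numbers are only illustrative.

```python
import json

def index_settings(primary_shards, replicas_per_primary):
    """Build the settings part of a create-index request body."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas_per_primary,
            }
        }
    }

def total_shard_copies(primary_shards, replicas_per_primary):
    """Count every primary shard plus all of its replica copies."""
    return primary_shards * (1 + replicas_per_primary)

body = index_settings(primary_shards=3, replicas_per_primary=1)
print(json.dumps(body))          # could be sent as the body of: PUT /<index>
print(total_shard_copies(3, 1))  # 6
```

With 3 primary shards and 1 replica each, the cluster has to place 6 shard copies in total.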

Example of the replica-shard structure across servers (nodes)
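To picture such a layout in text form, here is a toy placement routine in Python. It is not the allocation algorithm Elasticsearch actually uses. It only demonstrates the rule that a replica never shares a node with its primary. The node names and the round-robin strategy are assumptions made for this sketch.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place primaries round-robin, then put each replica on the
    following nodes so no copy shares a node with its primary."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        start = shard % len(nodes)
        layout[nodes[start]].append(f"P{shard}")  # primary shard
        for r in range(1, num_replicas + 1):
            target = nodes[(start + r) % len(nodes)]
            layout[target].append(f"R{shard}")    # replica of that shard
    return layout

layout = allocate(num_primaries=3, num_replicas=1,
                  nodes=["node-1", "node-2", "node-3"])
for node, shards in layout.items():
    print(node, shards)
# node-1 ['P0', 'R2']
# node-2 ['R0', 'P1']
# node-3 ['R1', 'P2']
```

If any single node in this layout fails, every shard still has one copy on another node.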

3) The features of Elasticsearch

1) It is written in Java and is open source.
2) Its infrastructure is Apache Lucene.
3) The data storage format is document-oriented, not relational.
4) It can work in a distributed and scalable structure.
5) It allows you to analyze data in near real time.
6) It can be used from virtually any programming language because it is served through a RESTful API. (REST is an architecture for client-server communication. RESTful services exchange data between client and server over HTTP in formats such as JSON, XML, or CSV. A RESTful API is an API that follows this architecture.)
7) It can create mappings automatically according to the data type (but it may still be better to create or edit your own mappings!).
8) It has a cluster structure, and the cluster structure is quite simple.
9) It can be used with Kibana, which visualizes and monitors Elasticsearch data, and with Logstash, which collects and ships logs.
10) It offers built-in high availability.
11) It indexes documents as JSON.
12) It promises fast installation and easy configuration.
13) It also makes it possible to import data into Elasticsearch from NoSQL databases such as HBase, Cassandra, and MongoDB.
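Because everything goes through HTTP and JSON, a client needs nothing more than an HTTP library. The sketch below prepares a request that would index one JSON document, using only the Python standard library. The node address `http://localhost:9200` and the index name "articles" are assumptions for this example, and the request is only built, not sent.

```python
import json
import urllib.request

def build_index_request(base_url, index_name, document):
    """Prepare (but do not send) an HTTP request that indexes one document."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_doc",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed local node address and a hypothetical "articles" index.
request = build_index_request(
    "http://localhost:9200",
    "articles",
    {"title": "The Fundamentals of Elasticsearch", "published": True},
)
print(request.get_method(), request.full_url)
# POST http://localhost:9200/articles/_doc

# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```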

4) What to expect from the Elasticsearch framework?

1) Speed: Elasticsearch can hold the same kinds of data as a relational database system, and search queries over that data are usually much faster than in a relational database. To provide this speed, it uses specialized index structures for each kind of data: inverted indices for text, and BKD trees for numerical values, dates, and geographical values.

2) Scalability: Cluster formation is largely automatic: nodes discover each other and join the cluster without much manual setup. Elasticsearch also decides for you how to distribute the data and indices across the nodes.

3) Easy to Use: You can use it from many programming languages, such as Java, C#, Python, JavaScript, PHP, and Ruby, through language-specific client libraries.

4) Relevance: You can rank your search results based on a variety of factors, from term frequency or recency to popularity and beyond. You can mix and match these factors with functions to fine-tune how results appear to your users. Elasticsearch is also equipped to handle human errors such as typos.
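As one possible illustration, the sketch below builds a search body that combines a typo-tolerant `match` query (`"fuzziness": "AUTO"`) with a popularity boost through a `function_score` query. The query types and parameters are part of the Elasticsearch query DSL. The field names `title` and `popularity` and the helper function are hypothetical.

```python
import json

def build_search_body(text, field="title", popularity_field="popularity"):
    """Combine a typo-tolerant match query with a popularity boost."""
    return {
        "query": {
            "function_score": {
                # Fuzzy matching tolerates small typos in the search text.
                "query": {
                    "match": {field: {"query": text, "fuzziness": "AUTO"}}
                },
                # Add log(1 + popularity) to the text relevance score.
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                    "missing": 0,
                },
                "boost_mode": "sum",
            }
        }
    }

# The misspelled word is deliberate; the body could be sent to: GET /<index>/_search
body = build_search_body("elasticsaerch")
print(json.dumps(body, indent=2))
```

Here `boost_mode` is set to `sum` so that a document without a popularity value keeps its text relevance score instead of being multiplied by zero.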

5) Durability: Elasticsearch detects failures to keep your cluster (and data) safe and available. With cross-cluster replication, a second cluster can act as a hot backup, that is, a copy that is kept up to date without suspending or shutting down the primary system. Some relational database systems support hot backups natively, while others need additional tooling.

6) Data Storage: With Elasticsearch, you can balance performance and cost. You can store your data locally for fast queries, or keep very large volumes of data in lower-cost object storage such as S3 (AWS Simple Storage Service).

I have come to the end of my first blog post about Elasticsearch. Thank you for reading. In my next post about Elasticsearch, I will explain how to download and run both Elasticsearch and Kibana.
