Sunday 16 October 2016

ElasticSearch Snapshot and Backup onto GCE

Hello, everyone. As the title suggests, this post describes how to back up an ElasticSearch snapshot to GCE (Google Compute Engine).

Why GCE and not Amazon S3?
Because our entire stack is hosted on Google Compute. :)

Anyway, before I begin:

SPOILER ALERT: Backup onto GCE is only compatible with ElasticSearch version 5.0+. So if you are using a lower version of ElasticSearch, this post will probably not help you much. Perhaps the easiest solution in that case would be to back up the snapshot locally and then move it over to Google Compute Engine using the gcloud command line utility.
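For a pre-5.0 cluster, the "back up locally, then move it over" fallback could look roughly like the sketch below. It is only an illustration under a few assumptions: a shared filesystem repository has already written its snapshots to a local directory (the /mnt/es_backups path is made up), the target bucket already exists, and the copy is done with gsutil, which ships with the Google Cloud SDK alongside gcloud.

```shell
#!/bin/sh
# Sketch: mirror a local snapshot directory into a Cloud Storage bucket.
# Both arguments are placeholders; adjust them to your own setup.
upload_snapshots() {
    local_dir="$1"   # e.g. /mnt/es_backups (hypothetical path)
    bucket="$2"      # e.g. elkp
    # -m runs transfers in parallel; rsync -r mirrors the directory recursively.
    gsutil -m rsync -r "$local_dir" "gs://$bucket/es_backups"
}
```

You would call it as `upload_snapshots /mnt/es_backups elkp` once the local snapshot has finished.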

Ok, here we go.

First a little background.

We have been using ELK to monitor our application logs. The application logging is so heavy that by the end of the month we usually run low on disk space. The only thing we can do when this happens is delete the old indices to free up some space without affecting our operations. But deleting indices without a backup is not a good solution, since we could never recover the old state if we ever wanted to.

And btw, I have to admit that I procrastinated on this task for quite a while (mainly due to other commitments). That lasted until the day our ELK stack went down due to low disk space and fixing it became the order of the day.

OK, for those who don't know, ElasticSearch provides out-of-the-box snapshot support (which is quite amazing), plus it also provides a way to back it up with versioning.

First,

- Download ElasticSearch: You need ElasticSearch 5.0+ (download it from here).

- Download GCE Plugin: The next step is to install the Google Cloud Storage repository plugin, i.e. repository-gcs. Just run the command below.

./bin/elasticsearch-plugin install repository-gcs

- Creating a Bucket: Assuming you already have a Google account set up, the next step is to create a bucket (where the ElasticSearch snapshot will be backed up).

  1. Connect to the Google Cloud Platform Console.
  2. Select your project.
  3. Go to the Storage Browser.
  4. Click the "Create Bucket" button.
  5. Enter the bucket name.
  6. Select a storage class.
  7. Select a location.
  8. Click the "Create" button.
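If you prefer the command line over the console, the same bucket can probably be created with gsutil. This is just a sketch: the bucket name elkp is the one used later in this post, while the storage class and location values are assumptions you should replace with your own.

```shell
#!/bin/sh
# Sketch: create the snapshot bucket from the command line.
create_bucket() {
    bucket="$1"          # bucket name, e.g. elkp
    storage_class="$2"   # e.g. nearline (assumption, pick your own)
    location="$3"        # e.g. us-central1 (assumption, pick your own)
    # mb = "make bucket"; -c sets the storage class, -l the location.
    gsutil mb -c "$storage_class" -l "$location" "gs://$bucket"
}
```

For example: `create_bucket elkp nearline us-central1`.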

The plugin supports a couple of authentication modes:

Compute Engine authentication: This mode is recommended if your Elasticsearch node is running on a Compute Engine virtual machine.

Service Account: The manual authentication mode.

For the sake of this post, we will cover the Service Account mode. But if you are interested in Compute Engine authentication, you can read more about it here.

To work with a Service Account, we first need to create one in Google Compute.

You can create the Service Account under the IAM & Admin section -> Service Accounts.

After creating the Service Account, download the generated JSON key file and move it into the ElasticSearch config directory (I named the file service-acc.json).
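The console steps for the Service Account can also be scripted with gcloud. Treat the following as a rough sketch rather than the one true way: the account name es-snapshots and the project ID are hypothetical, and the key is written straight to the file name used in this post.

```shell
#!/bin/sh
# Sketch: create a service account and download its JSON key.
create_service_account_key() {
    name="$1"       # hypothetical account name, e.g. es-snapshots
    project="$2"    # your Google Cloud project ID
    key_file="$3"   # where to write the key, e.g. config/service-acc.json
    email="${name}@${project}.iam.gserviceaccount.com"
    # Create the account first; only fetch a key if that succeeded.
    gcloud iam service-accounts create "$name" --project "$project" &&
    gcloud iam service-accounts keys create "$key_file" --iam-account "$email"
}
```

For example: `create_service_account_key es-snapshots my-project config/service-acc.json`. Note that this sketch does not grant the account any permissions; it still needs read and write access to the bucket.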

- Repository: Before we can start the backup (to GCE), we need to create a snapshot repository.

curl -XPUT 'localhost:9200/_snapshot/GceRepository?pretty' -d '
{
    "type": "gcs",
    "settings": {
      "bucket": "elkp",
      "service_account": "service-acc.json"   
    }
}'
{
  "acknowledged" : true
}

Confirming the same.

curl -XGET 'localhost:9200/_snapshot/_all?pretty' 
{
  "GceRepository" : {
    "type" : "gcs",
    "settings" : {
      "bucket" : "elkp",
      "service_account" : "service-acc.json"
    }
  }
}
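Listing the repository only shows its settings. To check that the nodes can actually reach the bucket, the snapshot API also has a _verify endpoint. A small sketch, assuming the node listens on localhost:9200 as in the commands above (ES_URL is just a convenience variable I made up):

```shell
#!/bin/sh
# Sketch: ask the cluster to confirm its nodes can access the repository.
ES_URL="${ES_URL:-localhost:9200}"

verify_repository() {
    repo="$1"   # repository name, e.g. GceRepository
    curl -s -XPOST "$ES_URL/_snapshot/$repo/_verify?pretty"
}
```

Calling `verify_repository GceRepository` should return the list of nodes on which the repository was verified, or an error if the bucket or credentials are wrong.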

- Snapshotting & Backup: With all that done, we are now ready to back up the snapshot onto GCE.

curl -XPUT 'localhost:9200/_snapshot/GceRepository/snapshot_1?wait_for_completion=true'

A note on wait_for_completion, extracted from here:

“The wait_for_completion parameter specifies whether or not the request should return immediately after snapshot initialization (default) or wait for snapshot completion. During snapshot initialization, information about all previous snapshots is loaded into the memory, which means that in large repositories it may take several seconds (or even minutes) for this command to return even if the wait_for_completion parameter is set to false.” - Straight from ElasticSearch.
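If you would rather not block on wait_for_completion=true, one option is to start the snapshot and poll its state yourself. The sketch below is my own illustration, not something from the ElasticSearch guide: the function names and the ES_URL variable are made up, and the state is pulled out of the JSON with grep and sed instead of a proper JSON parser.

```shell
#!/bin/sh
ES_URL="${ES_URL:-localhost:9200}"

# Print the state of one snapshot (IN_PROGRESS, SUCCESS, FAILED, ...).
snapshot_state() {
    repo="$1"; snap="$2"
    curl -s -XGET "$ES_URL/_snapshot/$repo/$snap" |
        grep -o '"state" *: *"[A-Z_]*"' | head -n 1 |
        sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Start a snapshot without blocking, then poll until it leaves IN_PROGRESS.
snapshot_and_wait() {
    repo="$1"; snap="$2"; interval="${3:-10}"   # seconds between polls
    curl -s -XPUT "$ES_URL/_snapshot/$repo/$snap" > /dev/null
    while [ "$(snapshot_state "$repo" "$snap")" = "IN_PROGRESS" ]; do
        sleep "$interval"
    done
    # Print the final state so the caller can decide what to do next.
    snapshot_state "$repo" "$snap"
}
```

For example, `snapshot_and_wait GceRepository snapshot_1 30` checks every 30 seconds and prints the final state once the snapshot is no longer in progress.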

- Restoring: Finally, a note on restoring the snapshot. Well, that is quite easy as well.

curl -XPOST 'http://localhost:9200/_snapshot/GceRepository/snapshot_1/_restore'
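The command above restores everything in the snapshot, and an index that already exists in the cluster can only be restored over if it is closed. To bring back just a few indices under a new name, the restore request accepts a body. A hedged sketch: the index pattern logstash-2016.09.* and the restored_ prefix are only examples, and ES_URL is a made-up convenience variable.

```shell
#!/bin/sh
ES_URL="${ES_URL:-localhost:9200}"

# Restore only the matching indices, renaming them so they do not
# collide with indices that already exist in the cluster.
restore_indices() {
    repo="$1"; snap="$2"
    pattern="$3"   # e.g. "logstash-2016.09.*" (hypothetical)
    curl -s -XPOST "$ES_URL/_snapshot/$repo/$snap/_restore?pretty" \
        -H 'Content-Type: application/json' \
        -d "{
  \"indices\": \"$pattern\",
  \"rename_pattern\": \"(.+)\",
  \"rename_replacement\": \"restored_\$1\"
}"
}
```

For example: `restore_indices GceRepository snapshot_1 'logstash-2016.09.*'` would restore logstash-2016.09.01 as restored_logstash-2016.09.01, and so on.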

Note: As mentioned in the Elasticsearch guide:
- A snapshot of an index created in 2.x can be restored to 5.x.

But I think the reverse is not true, at least when I tested it (correct me if I'm wrong).

- Other Useful Commands: There are a few other commands that are good to know.

## status for a currently running snapshot
GET /_snapshot/_status

## status for a given repository
GET /_snapshot/GceRepository/_status

## status for a given snapshot id
GET /_snapshot/GceRepository/snapshot_1/_status

## deleting a snapshot
DELETE /_snapshot/GceRepository/snapshot_1

I encourage you to go through the ElasticSearch guide on Repository and Backup for more information.

And btw, if I haven't mentioned this yet: ElasticSearch has seriously amazing documentation. You must check it out, it's spot on.

Hope that helped. See you later.

Thanks.

