Intro to Elasticsearch

Intro

Download the Elasticsearch tar archive from https://www.elastic.co/start (the examples use version 5.4.1)
Extract the archive into a directory
Start Elasticsearch

./bin/elasticsearch
# or
./bin/elasticsearch -Ecluster.name=the_cluster -Enode.name=the_node

cURL

 curl -XGET 'localhost:9200/_cat/indices?v&pretty'
 health status index uuid pri rep docs.count docs.deleted store.size pri.store.size


 curl -XGET 'localhost:9200/_cat/nodes?v&pretty'
 ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1            6          99  43    1.72    1.69     2.46 mdi       *      the_node


 curl -XGET 'localhost:9200/_cat/health?v&pretty'
# prints one row with the cluster name, status (green, yellow or red), node count and shard counts

Create a new index

curl -XPUT 'localhost:9200/products?pretty'
# response
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

Check new index

# list indices
curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   products nKzxKcEVSoC2OoKtL_HLaQ   5   1          0            0       650b           650b

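PUT creates or replaces a resource at a known URL, such as an index or a document with an explicit ID. POST is used when Elasticsearch should generate the ID, and for endpoints such as _update and _bulk.

In Elasticsearch 5.x every index gets 5 primary shards and 1 replica by default. On a single node the replica shards cannot be allocated, which is why the health column shows yellow.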
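
The defaults can be overridden when the index is created. A minimal sketch, assuming a single-node development cluster where one primary shard and no replicas are enough. The helper names index_settings and create_index, the ES_URL variable and the reviews index mentioned below are hypothetical and not part of Elasticsearch:

```shell
# ES_URL is an assumption for this sketch; adjust it to your cluster.
ES_URL="${ES_URL:-localhost:9200}"

# Print a settings body with the given number of primary shards and replicas.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# create_index NAME SHARDS REPLICAS: send the settings body with PUT.
create_index() {
  index_settings "$2" "$3" |
    curl -s -XPUT "$ES_URL/$1?pretty" -H 'Content-Type: application/json' --data-binary @-
}

# Show the body that would be sent for 1 primary shard and 0 replicas.
index_settings 1 0
```

Running create_index reviews 1 0 against a started node should create a reviews index. With no replicas waiting for a second node, it would be expected to show up as green instead of yellow.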

Create more new indices

curl -XPUT 'localhost:9200/customers?pretty'
curl -XPUT 'localhost:9200/orders?pretty'

Check indices

curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   products  nKzxKcEVSoC2OoKtL_HLaQ   5   1          0            0       650b           650b
yellow open   customers D8OHORC3T-GHpyJDmCgEnA   5   1          0            0       650b           650b
yellow open   orders    JOxpIn6DRuur-MHDO4h6ZQ   5   1          0            0       650b           650b

Add documents to index

-d passes the data that is sent as the request body

curl -XPUT 'localhost:9200/products/mobiles/1?pretty' -d'
{
"name" : "iPhone",
"camera" : "12MP",
"reviews" : ["Incredibly happy to use it", "I like it a lot"]
}'

# response
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

And create another one

curl -XPUT 'localhost:9200/products/mobiles/2?pretty' -d'
{
"name" : "Samsung Galaxy",
"camera" : "8MP",
"reviews" : ["A bit too big for me", "I will sell it soon"]
}'

# response
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

Create a laptops type within products
POST without an ID lets Elasticsearch autogenerate it

curl -XPOST 'localhost:9200/products/laptops?pretty' -d'
{
"name" : "Macbook Pro",
"storage" : "500GB",
"reviews" : ["Incredibly happy to use it", "I like it a lot"]
}'

# response
{
  "_index" : "products",
  "_type" : "laptops",
  "_id" : "AWDDl4oPUqJGY70jJJ21",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

List indices


curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   products  nKzxKcEVSoC2OoKtL_HLaQ   5   1          4            0     19.2kb         19.2kb
yellow open   customers D8OHORC3T-GHpyJDmCgEnA   5   1          0            0       650b           650b
yellow open   orders    JOxpIn6DRuur-MHDO4h6ZQ   5   1          0            0       650b           650b

Retrieve whole or partial documents

One document

Specify the index, document type and ID

curl -XGET 'localhost:9200/products/mobiles/1?pretty'


{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "iPhone",
    "camera" : "12MP",
    "reviews" : [
      "Incredibly happy to use it",
      "I like it a lot"
    ]
  }
}

Don’t include source in the response

curl -XGET 'localhost:9200/products/mobiles/1?pretty&_source=false'
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "1",
  "_version" : 1,
  "found" : true
}

Retrieve only certain fields

curl -XGET 'localhost:9200/products/mobiles/1?pretty&_source=name,camera'
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "iPhone",
    "camera" : "12MP"
  }
}

Updating whole document

Before

curl -XGET 'localhost:9200/products/mobiles/2?pretty'
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "Samsung Galaxy",
    "camera" : "8MP",
    "reviews" : [
      "A bit too big for me",
      "I will sell it soon"
    ]
  }
}

Change document

curl -XPUT 'localhost:9200/products/mobiles/2?pretty' -d'
{
    "name" : "Samsung Big One",
    "camera" : "16MP",
    "reviews" : [
      "Love it",
      "I will sell it soon"
    ]
  }
'


# response
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}

After

curl -XGET 'localhost:9200/products/mobiles/2?pretty'
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "Samsung Big One",
    "camera" : "16MP",
    "reviews" : [
      "Love it",
      "I will sell it soon"
    ]
  }
}

Updating partial document

The _update API uses POST with a “doc” field that holds the fields to add or change

curl -XPOST 'localhost:9200/products/mobiles/2/_update?pretty' -d'
{
"doc" : {"color": "black"}
}
'

Check update

curl -XGET 'localhost:9200/products/mobiles/2?pretty'
{
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "name" : "Samsung Big One",
    "camera" : "16MP",
    "reviews" : [
      "Love it",
      "I will sell it soon"
    ],
    "color" : "black"
  }
}

Update a number in the doc with a script (this assumes a budget document with ID 1 and a numeric money field)

curl -XPOST 'localhost:9200/products/budget/1/_update?pretty' -d'
{
"script" : "ctx._source.money += 2"
}
'

Deleting Document

curl -XDELETE 'localhost:9200/products/mobiles/2?pretty'

# response
{
  "found" : true,
  "_index" : "products",
  "_type" : "mobiles",
  "_id" : "2",
  "_version" : 4,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}

Check if document exists – check the HTTP status in the server response (200 if found, 404 if not)

curl -I 'localhost:9200/products/mobiles/2?pretty'
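
In a script it is usually easier to read only the status code than the full headers. A small sketch of that idea; the function names doc_status and describe_status and the ES_URL variable are hypothetical helpers made up for this example:

```shell
# ES_URL is an assumption for this sketch; adjust it to your cluster.
ES_URL="${ES_URL:-localhost:9200}"

# doc_status INDEX TYPE ID: send a HEAD request and print only the HTTP status code.
doc_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "$ES_URL/$1/$2/$3"
}

# describe_status CODE: translate the status code into a readable answer.
describe_status() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "missing" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Show the mapping without contacting a cluster.
describe_status 200
describe_status 404
```

With a node running, describe_status "$(doc_status products mobiles 2)" combines the two helpers into a single check.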

Delete Index

curl -XDELETE 'localhost:9200/orders?pretty'

# response
{
  "acknowledged" : true
}

Bulk operations – retrieve multiple records

curl -XGET 'localhost:9200/_mget?pretty' -d'
{
"docs" : [
{ "_index" : "products",
"_type" : "mobiles",
"_id" : "1"
},
{ "_index" : "products",
"_type" : "mobiles",
"_id" : "2"
}
]
}
'

# response
{
  "docs" : [
    {
      "_index" : "products",
      "_type" : "mobiles",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "name" : "iPhone",
        "camera" : "12MP",
        "reviews" : [
          "Incredibly happy to use it",
          "I like it a lot"
        ]
      }
    },
    {
      "_index" : "products",
      "_type" : "mobiles",
      "_id" : "2",
      "found" : false
    }
  ]
}

Shorter version

curl -XGET 'localhost:9200/products/mobiles/_mget?pretty' -d'
{
"docs" : [
{ 
"_id" : "1"
},
{ 
"_id" : "2"
}
]
}
'

# response
{
  "docs" : [
    {
      "_index" : "products",
      "_type" : "mobiles",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "name" : "iPhone",
        "camera" : "12MP",
        "reviews" : [
          "Incredibly happy to use it",
          "I like it a lot"
        ]
      }
    },
    {
      "_index" : "products",
      "_type" : "mobiles",
      "_id" : "2",
      "found" : false
    }
  ]
}

Import documents from JSON

# customers.json
{"index": {}}
{"name": "XQ", "age" : "44"}

curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/customers/personal/_bulk?pretty&refresh' --data-binary @"customers.json"
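
The bulk file has to be newline-delimited JSON: one action line followed by one source line per document, with a newline after the last line. A small sketch that writes such a file with a here-document. The file name customers-sample.json and the second customer are made up for illustration:

```shell
# Write a two-document bulk file. The quoted 'EOF' keeps the shell from
# expanding anything inside the JSON lines.
cat > customers-sample.json <<'EOF'
{"index": {}}
{"name": "XQ", "age": "44"}
{"index": {"_id": "2"}}
{"name": "Example Customer", "age": "31"}
EOF

# Every document needs an action line, so the line count should be even.
wc -l < customers-sample.json
```

The first action line lets Elasticsearch generate the ID, the second one sets it explicitly. The file can be sent with the same _bulk command by pointing --data-binary at customers-sample.json.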

How search works

Know that the document exists – Web Crawler
Index the document for lookup – Inverted index
Know how relevant the document is – Scoring
Retrieve results ranked by relevance – Search
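
To make the inverted index step concrete, here is a toy sketch in shell and awk. It only illustrates the idea of mapping each term to the documents that contain it and is not how Elasticsearch or Lucene implement it. The function name build_inverted_index and the tab-separated input format are assumptions of this example:

```shell
# build_inverted_index reads lines of "<doc id><TAB><text>" on stdin and
# prints "<term><TAB><comma-separated doc ids>", sorted by term.
build_inverted_index() {
  awk -F'\t' '
  {
    n = split(tolower($2), words, " ")
    for (i = 1; i <= n; i++) {
      term = words[i]
      if ((term, $1) in seen) continue      # record each term once per document
      seen[term, $1] = 1
      if (term in postings) postings[term] = postings[term] "," $1
      else postings[term] = $1
    }
  }
  END {
    for (term in postings) print term "\t" postings[term]
  }' | LC_ALL=C sort
}

# Two review sentences from the products examples, used as documents 1 and 2.
printf '1\tI like it a lot\n2\tI will sell it soon\n' | build_inverted_index
```

In the output the term it is listed with documents 1 and 2, while sell is listed only with document 2. A search for a term becomes a lookup in this table instead of a scan over every document.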

The Query DSL

https://www.elastic.co/guide/en/elasticsearch/guide/current/query-dsl-intro.html

The Query DSL is the search language that Elasticsearch uses. Use it to write your queries in production.

2 contexts
* How well does this document match the query? QUERY CONTEXT
* Does this document match this query clause? FILTER CONTEXT

Query context

Is the document included in the result or not
A relevance score is calculated for every search term the document matches

The higher the score, the more relevant the document

Filter context

Is the document included in the result or not
No scoring of individual documents
Used on structured data for exact matches
Queries are faster because there is no scoring to compute
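
Both contexts can appear in one request through a bool query. A hedged sketch: the must clause runs in query context and contributes to the score, the filter clause only includes or excludes documents. It assumes the customers index has a text field name and a numeric field age. The function names and the ES_URL variable are hypothetical:

```shell
# ES_URL is an assumption for this sketch; adjust it to your cluster.
ES_URL="${ES_URL:-localhost:9200}"

# Print a bool query: "must" is scored, "filter" is a yes/no check.
filtered_query_body() {
  cat <<'EOF'
{
  "query": {
    "bool": {
      "must":   { "match": { "name": "gates" } },
      "filter": { "range": { "age": { "gte": 20, "lte": 30 } } }
    }
  }
}
EOF
}

# search_customers: send the body to the customers index.
search_customers() {
  filtered_query_body |
    curl -s -XGET "$ES_URL/customers/_search?pretty" -H 'Content-Type: application/json' --data-binary @-
}

# Show the request body without contacting a cluster.
filtered_query_body
```

Calling search_customers against a running node should return only customers named gates whose age is between 20 and 30, ranked by how well the name matches.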

Generate fake JSON data

1000 customers
https://www.json-generator.com/

[
  {
    'repeat(100, 100)': {

      name: '{{firstName()}} {{surname()}}',
      age: '{{integer(18,55)}}',
      phone: '+1 {{phone()}}',
      city: '{{city()}}'

    }
  }
]

Prepare JSON file for import

Remove the opening [ and the closing ]
Regex replace },{ with }\n{
Add an index action line before every item:

Find:
{"name"
Replace with:
{"index":{}}\n{"name"
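
The manual find and replace steps above can be sketched as one shell pipeline. This assumes the generator output is a compact, single-line JSON array of flat objects, and that no string value contains the sequence },{ itself. The function name to_bulk and the sample data are made up for illustration:

```shell
# to_bulk reads a compact, single-line JSON array of flat objects on stdin
# and prints bulk NDJSON: an {"index":{}} action line before every object.
to_bulk() {
  sed -e 's/^\[//' -e 's/]$//' |                 # remove the opening [ and closing ]
    awk '{ gsub(/[}],[{]/, "}\n{"); print }' |   # put every object on its own line
    awk '{ print "{\"index\":{}}"; print }'      # add the action line before each object
}

# Hypothetical two-customer sample in the assumed generator format.
printf '%s\n' '[{"name":"A B","age":30},{"name":"C D","age":41}]' | to_bulk
```

For a real export, to_bulk < generated.json > customers.json would write the import file, where generated.json stands for whatever name the downloaded data was saved under.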

Import data

Delete customers index

curl -XDELETE 'localhost:9200/customers?pretty'

Import data

curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/customers/personal/_bulk?pretty&refresh' --data-binary @"customers.json"

Check


curl -XGET 'localhost:9200/_cat/indices?v&pretty'

Search using request body – get all documents

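match_all matches every document and gives each one the same score (1.0), so there is no relevance ranking
Searches are stateless: every request carries the full query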

curl -XGET 'localhost:9200/products/_search?pretty' -d'
{
"query" : {"match_all" : {} }
}
'

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {"match_all" : {} }
}
'

# response 
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 999,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "personal",
        "_id" : "AWDD7B-vUqJGY70jJJ3C",
        "_score" : 1.0,
        "_source" : {
          "name" : "Imelda Mcdaniel",
          "age" : 68,
          "gender" : "female",
          "email" : "imeldamcdaniel@talkalot.com",
          "phone" : "+1 (856) 480-3825",

# response truncated

Change the number of results

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {"match_all" : {} },
"size" : 3
}
'
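
size is often combined with from, the offset of the first hit to return, to page through results. A minimal sketch that computes from for a 1-based page number. The helper name page_body is hypothetical:

```shell
# page_body PAGE SIZE: print a match_all search body for a 1-based page number.
page_body() {
  page=$1
  size=$2
  from=$(( (page - 1) * size ))   # offset of the first hit on this page
  printf '{"query":{"match_all":{}},"from":%d,"size":%d}\n' "$from" "$size"
}

# Body for the third page with 10 hits per page.
page_body 3 10
```

The printed body can be piped into the search command above by replacing the inline body with -d @-, which makes curl read the body from standard input.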

Sort order

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {"match_all" : {} },
"sort" : {"age" : {"order": "desc"}},
"size" : 5
}
'

Source filtering

- the term query looks up the exact term gates in the index, and a relevance score is calculated for each match

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {"term" : { "name" : "gates" } }
}
'

# response
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.429491,
    "hits" : [
...

Filter results but don’t show sources

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"_source" : false,
"query" : {"term" : { "name" : "gates" } }
}
'

# response 
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.429491,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "personal",
        "_id" : "AWDD7B-xUqJGY70jJJ_7",
        "_score" : 4.429491
      },
      {
        "_index" : "customers",
        "_type" : "personal",
        "_id" : "AWDD7B-wUqJGY70jJJ5z",
        "_score" : 4.3718715
      }
    ]
  }
}

Retrieve only defined fields

_source accepts a wildcard pattern or a list of patterns such as [ "st*", "*n*" ]

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"_source" : "st*",
"query" : {"term" : { "name" : "gates" } }
}
'


# response 
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.429491,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "personal",
        "_id" : "AWDD7B-xUqJGY70jJJ_7",
        "_score" : 4.429491,
        "_source" : {
          "street" : "934 Strong Place",
          "state" : "Nevada, 7571"
        }
      },
      {
        "_index" : "customers",
        "_type" : "personal",
        "_id" : "AWDD7B-wUqJGY70jJJ5z",
        "_score" : 4.3718715,
        "_source" : {
          "street" : "194 Columbia Place",
          "state" : "Ohio, 3096"
        }
      }
    ]
  }
}

Includes / Excludes

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"_source" : {
            "includes" : ["st*", "*n*"],
            "excludes" : ["*der"]
            },
"query" : {"term" : { "name" : "gates" } }
}
'

Full text fields

Queries using:
* match
* match_phrase
* match_phrase_prefix

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {
            "match" : 
            { "name" : "webster" } }
}
'

curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
"query" : {
            "match" : {
                    "name" : {
                                "query" : "frank norris",
                                "operator" : "or"
                    }
            }
}
}
'
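
The list above also names match_phrase and match_phrase_prefix, which take the same field and text but require the words to appear in order. A small sketch that prints both bodies. The helper name phrase_query_body is hypothetical, and it does no JSON escaping, so it is only suitable for plain text without quotes or backslashes:

```shell
# phrase_query_body KIND FIELD TEXT: print a search body where
# KIND is match_phrase or match_phrase_prefix.
phrase_query_body() {
  printf '{"query":{"%s":{"%s":"%s"}}}\n' "$1" "$2" "$3"
}

# Exact phrase, then a phrase whose last word is treated as a prefix.
phrase_query_body match_phrase name 'frank norris'
phrase_query_body match_phrase_prefix name 'frank nor'
```

Either body can be sent to localhost:9200/customers/_search?pretty in the same way as the match queries above.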

SOURCE:
Pluralsight – Searching and Analyzing Data with Elasticsearch: Getting Started
