Elastic Stack Guide Part – 2 (Heartbeat)

In our previous blog we covered Filebeat and Metricbeat and explored the system module of Metricbeat. In this blog we will see the usage of Heartbeat and how to monitor services with it.

Heartbeat should not be installed on each server you monitor; instead, it should be installed on a few separate servers from which you can monitor all URLs/services. For example, if we have a service deployed at x.x.x.x:8000 on some server in the AWS north region, we can install Heartbeat on four servers, one in each AWS region (north, south, east, west), and monitor this service from all of them to check whether it is UP from all over India.

From these four servers we can monitor all the service URLs.

For setting up Heartbeat, following is the link:

https://www.elastic.co/guide/en/beats/heartbeat/current/heartbeat-installation-configuration.html

For setting up the dashboards, download them from:

https://github.com/elastic/uptime-contrib

And upload them to Kibana under Saved Objects (the import option is available at the following Kibana URL):

http://127.0.0.1:5601/app/management/kibana/objects

After that, on the Dashboard tab you can see the Heartbeat monitoring dashboards.

You can also use the Uptime app in Kibana to check the status, TLS expiry time, and the history of all downtimes:

Following are some screenshots:

Configuration in heartbeat.yml for setting the name of the machine from which the URLs are pinged:

processors:
  - add_observer_metadata:
      # Optional, but recommended geo settings for the location Heartbeat is running in
      geo:
        # Token describing this location
        name: sahil-machine
        # Lat, Lon
        #location: "37.926868, -78.024902"

Configuration in heartbeat.yml for setting up the monitored URLs:

heartbeat.config.monitors:
  # Directory + glob pattern to search for configuration files
  path: ${path.config}/monitors.d/*.yml
  # If enabled, heartbeat will periodically check the config.monitors path for changes
  reload.enabled: false
  # How often to check for changes
  reload.period: 5s

# Configure monitors inline
heartbeat.monitors:
- type: http
  # Set enabled to true (or delete the following line) to enable this example monitor
  enabled: false
  # ID used to uniquely identify this monitor in elasticsearch even if the config changes
  id: my-monitor
  # Human readable display name for this service in Uptime UI and elsewhere
  name: My Monitor
  # List of URLs to query
  urls: ["http://localhost:9200"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
  # Name of corresponding APM service, if Elastic APM is in use for the monitored service.
  #service.name: my-apm-service-name

- type: http
  # Set enabled to true (or delete the following line) to enable this example monitor
  enabled: true
  # ID used to uniquely identify this monitor in elasticsearch even if the config changes
  id: emerge-gurgaon
  # Human readable display name for this service in Uptime UI and elsewhere
  name: emerge-gurgaon
  # List of URLs to query
  urls: ["https://app.ameyoemerge.in:8887/"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
  # Name of corresponding APM service, if Elastic APM is in use for the monitored service.
  #service.name: my-apm-service-name

- type: http
  # Set enabled to true (or delete the following line) to enable this example monitor
  enabled: true
  # ID used to uniquely identify this monitor in elasticsearch even if the config changes
  id: emerge-banglore-app24
  # Human readable display name for this service in Uptime UI and elsewhere
  name: emerge-banglore-app24
  # List of URLs to query
  urls: ["https://app24.ameyoemerge.in:8887/"]
  # Configure task schedule
  schedule: '@every 10s'
  # Total test connection and data exchange timeout
  #timeout: 16s
  # Name of corresponding APM service, if Elastic APM is in use for the monitored service.
  #service.name: my-apm-service-name
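
If reload.enabled is set to true in heartbeat.config.monitors above, monitors can also be kept in separate files under the monitors.d/ directory instead of inline. A minimal sketch of such a file (the id, name and URL below are just placeholders):

# ${path.config}/monitors.d/example.yml
- type: http
  # Unique ID for this monitor in Elasticsearch
  id: example-service
  # Display name shown in the Uptime UI
  name: example-service
  # URL(s) to check
  urls: ["https://example.com:8443/"]
  # Check every 10 seconds
  schedule: '@every 10s'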

In the next blog we will explore Logstash with Filebeat. Happy debugging!

Elastic Stack Guide Part – 1

As nowadays a lot of our servers are deployed in the cloud and many applications run on them, it is impossible to monitor and analyze logs by logging in to each server. A central logging and monitoring solution is a must these days.

In this blog series, we will learn about the usage of the Elastic Stack, aka ELK.

Overview:

Elastic Stack is a group of open source products from Elastic designed to help users take data from any type of source, in any format, and search, analyze, and visualize that data in real time. The product group was formerly known as the ELK Stack, in which the letters in the name stood for the products in the group: Elasticsearch, Logstash and Kibana. A fourth product, Beats, was subsequently added to the stack, rendering the potential acronym unpronounceable. The Elastic Stack can be deployed on premises or made available as Software as a Service.

Architecture:

For a small-sized development environment, the classic architecture will look  as follows :

There are many different types of Beats; you can read about them at https://www.elastic.co/beats/ . Each Beat has a different set of use cases.

In this blog we will learn about two Beats: Metricbeat and Filebeat.

Note – Logstash is an optional part of the architecture and is not needed in most cases. Read more about Logstash at https://www.elastic.co/logstash/

Using the Elastic Stack:

I am running these experiments on a CentOS 7 machine and using rpm packages to set up the Elastic Stack.

Elasticsearch Installation:

Commands to install Elasticsearch:

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.14.0-x86_64.rpm
sudo rpm -i elasticsearch-7.14.0-x86_64.rpm
sudo service elasticsearch start

How to check if Elasticsearch is running:

[root@localhost elk]# curl http://127.0.0.1:9200
{
  "name" : "localhost.localdomain",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "MxKYDoJAQRG9D6krdFThsQ",
  "version" : {
    "number" : "7.14.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1",
    "build_date" : "2021-07-29T20:49:32.864135063Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

If you are getting output like the above, it means Elasticsearch is installed successfully.

Note: To change the listen address and port, edit the following file: /etc/elasticsearch/elasticsearch.yml
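
For example, the relevant settings look like this (values are illustrative; by default Elasticsearch listens only on localhost):

# /etc/elasticsearch/elasticsearch.yml
network.host: 0.0.0.0   # listen on all interfaces instead of only localhost
http.port: 9200         # default HTTP port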

Kibana:

Kibana is the front-end tool which communicates with Elasticsearch, where anyone can monitor and analyze logs.

Commands to install Kibana:

curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-7.14.0-linux-x86_64.tar.gz
tar xzvf kibana-7.14.0-linux-x86_64.tar.gz
cd kibana-7.14.0-linux-x86_64/
./bin/kibana

Access Kibana from the URL:

http://127.0.0.1:5601/app/home#/

Note: edit config/kibana.yml to change the listen IP address and port settings.
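
A minimal sketch of those settings (illustrative values):

# config/kibana.yml
server.host: "0.0.0.0"                            # listen address
server.port: 5601                                 # listen port
elasticsearch.hosts: ["http://127.0.0.1:9200"]    # where Elasticsearch is running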

Beats  

Beats will be installed on all servers from which we want to collect information. They are like agents which send data to Elasticsearch.

Enabling Metricbeat:

Every Beat supports different modules, and it is up to the user to decide which modules to enable in each Beat. If we talk about Metricbeat, it has many modules like System, PostgreSQL, Nginx and so on. In this blog we will see the usage of the System module of Metricbeat.

Commands to install Metricbeat:
curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.14.0-x86_64.rpm
sudo rpm -vi metricbeat-7.14.0-x86_64.rpm
Enabling the System module of Metricbeat:
sudo metricbeat modules enable system
sudo metricbeat setup -e
sudo service metricbeat start

Here we are only enabling the system module of Metricbeat; there are many modules for basic monitoring of applications like PostgreSQL, Nginx, Tomcat, etc.

For the list of modules available in Metricbeat, the command is:

metricbeat modules list  

Yippee! Now we can monitor system data in Kibana as follows.

Open the [Metricbeat System] Host overview ECS dashboard under Dashboards in the Kibana UI. There you can apply a filter on the host whose data you want to see.

System Module Metricbeat Uses: what analysis can be done by the System module of Metricbeat:

Traditionally, after logging in to Linux servers, we gather system information by using many different commands and tools, which also takes time, especially when there is a live issue on production.

Following is the list of information : 

  1. Size information of all partitions
  2. Read/write performance of the hard disk
  3. Inbound/outbound traffic analysis per Ethernet port
  4. Load average analysis of the system
  5. Top processes consuming high CPU and RAM

All this information can now be seen in seconds for a particular host using the Kibana UI.

Following are some screenshots  : 

Enabling FileBeat

Whether you’re collecting from security devices, cloud, containers, hosts, or OT, Filebeat helps you keep the simple things simple by offering a lightweight way to forward and centralize logs and files.

Commands to install Filebeat : 
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.14.0-x86_64.rpm
rpm -ivh filebeat-7.14.0-x86_64.rpm

Note: To configure where Filebeat sends data (Elasticsearch or Logstash), edit /etc/filebeat/filebeat.yml. Currently, as I have only one machine, there is no need to change anything; the default configuration works for me. You can check the following link: https://www.elastic.co/guide/en/beats/filebeat/7.14/configuring-howto-filebeat.html
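
For reference, the relevant part of filebeat.yml looks roughly like this (illustrative; the defaults already point to localhost):

# /etc/filebeat/filebeat.yml
output.elasticsearch:
  hosts: ["127.0.0.1:9200"]     # Elasticsearch instance(s) to send data to
setup.kibana:
  host: "127.0.0.1:5601"        # Kibana endpoint used when loading dashboards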

Enabling the system logs module in Filebeat:
filebeat modules enable system
(For system logs, if we want to set custom paths, edit the file /etc/filebeat/modules.d/system.yml; see the sketch after these commands.) -- Generally there is no need to change this config.

filebeat setup -e
sudo service filebeat start
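
As mentioned above, a sketch of /etc/filebeat/modules.d/system.yml with custom paths could look like this (the paths are illustrative; normally the defaults are fine):

- module: system
  syslog:
    enabled: true
    var.paths: ["/var/log/messages*"]   # custom syslog path(s)
  auth:
    enabled: true
    var.paths: ["/var/log/secure*"]     # custom auth log path(s)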

Like Metricbeat, Filebeat also has a list of modules like postgresql, nginx, etc., and it also supports the logging of popular frameworks like Spring; it can collect the logs of these applications and provides ways to analyze them easily.

To check the list of modules available for Filebeat, use the following command:

[root@localhost elk]# filebeat modules list | less

System Module Filebeat Uses :   

Now you can use the Kibana UI to analyze system logs like /var/log/messages, etc.

Open the [Filebeat System] Syslog dashboard ECS in the Dashboard tab in Kibana.

Following are some screenshots which one can see:

 

Configure Filebeat for custom log files:

Now we may have a situation where none of the modules or framework logging integrations in Filebeat work for our custom application logs. In that case you can configure an input manually, setting the path of the logs to read, and then analyze them in the Logs / Stream section of the Kibana UI.

Follow this link to configure your log path: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html
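
A minimal sketch of such a manual input in /etc/filebeat/filebeat.yml (the application log path is hypothetical):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/myapp/*.log   # path(s) of your custom application logs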

You can watch the logs at: http://127.0.0.1:5601/app/logs/stream

Here you can search the logs by hostname or file path, and can also search within the whole fetched message.

By default only the message column is shown. One can configure the columns in the Settings tab of the Logs app in Kibana.

Following are some screenshots:

By default log lines appear as a single column; if, for advanced debugging, we want to break a log line into columns, then we need to use Logstash with the Grok filter.

In the next blog we will see the usage of Logstash to break custom logs into columns for better understanding.

Thank you all!

Useful Postgresql Commands/Queries Part-1

Introduction

There are many types of queries and of course all are important, but when it comes to monitoring and debugging, we are generally in a hurry and do not remember what to query to check the problem/issue. At that moment we wonder what we can look at and what the query for it would be; we then search the internet, somehow put together some queries, and after the issue is over we forget them. Here in this blog I will describe some queries and their uses, especially for monitoring and debugging PostgreSQL related issues.

In case of Monitoring and Debugging , We generally have following category :

  • Size Related Issues
  • Performance Related issues

Sometimes these issues are interlinked.

Size Related Issues

Following are basic Postgres commands/queries which will be very helpful in monitoring or debugging size related issues:

  • How to check the size of databases: many times we need to check which database is the culprit, i.e. which database is consuming major space. Following is the query to list databases in decreasing order of the size consumed:
SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database ORDER BY pg_database_size(pg_database.datname) desc;
  • After that we may need to find which table/index is consuming the most space; for this we have the following query:
SELECT
  nspname || '.' || relname AS "relation", pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
  • Many times there are not many records in the table, but the size of the table/index may still grow due to dead tuples. To check which table has the highest number of dead tuples, the following is the query:
  • First connect to the database, then:
SELECT schemaname,relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC limit 10;

Sample Output:

 schemaname |               relname               | n_live_tup | n_dead_tup |           last_vacuum            |         last_autovacuum
------------+-------------------------------------+------------+------------+----------------------------------+----------------------------------
 public     | campaign_session_denormalized_data  |    1123219 |  114268349 | 2021-01-10 18:27:34.050087+05:30 | 2021-01-19 14:08:58.062574+05:30

Here, from the above output, we can also determine whether autovacuum is running properly or not, i.e. when autovacuum last ran on a particular table whose dead-tuple count is high.

Over time, due to MVCC, your tables will grow in size (this is called table bloat); this is why regular VACUUM is needed. The following query will show you the list of tables and indexes with the most bloat.

The value represents the number of “wasted bytes”, or the difference between what is actually used by the table and index, and what we compute that it should be.

The way it works is that it estimates the optimized size of the table/index from each row's size times the total number of rows, and compares that against the actual table size. Do note that this is an estimate, not an actual figure.

with foo as (
  SELECT
    schemaname, tablename, hdr, ma, bs,
    SUM((1-null_frac)*avg_width) AS datawidth,
    MAX(null_frac) AS maxfracsum,
    hdr+(
      SELECT 1+COUNT(*)/8
      FROM pg_stats s2
      WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
    ) AS nullhdr
  FROM pg_stats s, (
    SELECT
      (SELECT current_setting('block_size')::NUMERIC) AS bs,
      CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
      CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
    FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5  
), rs as (
  SELECT
    ma,bs,schemaname,tablename,
    (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
    (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
  FROM foo  
), sml as (
  SELECT
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
    COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
    COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
  FROM rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
)
SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
  ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM sml
ORDER BY wastedbytes DESC

Query extracted (and slightly rewritten) from check_postgres.

Sample Output:

 current_database | schemaname | tablename | tbloat | wastedbytes |                   iname                    | ibloat | wastedibytes
------------------+------------+-----------+--------+-------------+--------------------------------------------+--------+--------------
 dashboard        | public     | job_logs  |    1.1 |  4139507712 | job_logs_pkey                              |    0.2 |            0
 dashboard        | public     | job_logs  |    1.1 |  4139507712 | index_job_logs_on_job_id_and_created_at    |    0.4 |            0
 dashboard        | public     | events    |    1.1 |  3571736576 | events_pkey                                |    0.1 |            0
 dashboard        | public     | events    |    1.1 |  3571736576 | index_events_on_tenant_id                  |    0.1 |            0
 dashboard        | public     | events    |    1.1 |  3571736576 | index_events_on_event_type                 |    0.2 |            0
 dashboard        | public     | jobs      |    1.1 |  2013282304 | index_jobs_on_status                       |    0.0 |            0
 dashboard        | public     | jobs      |    1.1 |  2013282304 | index_jobs_on_tag                          |    0.3 |            0
 dashboard        | public     | jobs      |    1.1 |  2013282304 | index_jobs_on_tenant_id                    |    0.2 |            0
 dashboard        | public     | jobs      |    1.1 |  2013282304 | index_jobs_on_created_at                   |    0.2 |            0
 dashboard        | public     | jobs      |    1.1 |  2013282304 | index_jobs_on_created_at_queued_or_running |    0.0 |     21086208

You should be looking at:

  • tbloat: table bloat, the ratio between what it currently is and what it can be optimized to.
  • wastedbytes: number of bytes wasted
  • ibloat & wastedibytes: same as above, but for indexes.

When you see a table with high bloat, consider running VACUUM ANALYZE on it.
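
For example, on the bloated table from the sample output above (the table name is just illustrative):

-- Reclaims dead-tuple space for reuse and refreshes planner statistics
VACUUM (VERBOSE, ANALYZE) public.job_logs;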

Performance Related issues

For performance related monitoring, we need to run the following queries quite often in day-to-day scenarios.

  • Get Running Queries (And Lock statuses) in PostgreSQL
SELECT S.pid, age(clock_timestamp(), query_start),usename,query,L.mode,L.locktype,L.granted,s.datname FROM pg_stat_activity S inner join pg_locks L on S.pid = L.pid order by L.granted, L.pid DESC;

The above query may help if some particular query has been running for hours and has taken a lock due to which other queries are stuck waiting on that lock. It also shows the database name on which each query is running. We can also spot connections that are open with no query running, which may indicate a session leak in your database (more about session leaks in a separate article).
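
As a quick check for such leaks, a simple sketch query that groups connections by state can show idle sessions piling up:

-- A large and growing count of 'idle' connections can indicate a session leak
SELECT datname, state, count(*)
FROM pg_stat_activity
GROUP BY datname, state
ORDER BY count(*) DESC;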

  • To cancel a particular query which is causing an issue, the following is the query (using the pid from the output above):
SELECT pg_cancel_backend(pid);

This will send a SIGINT to the backend process running that query.
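
If cancelling the query is not enough (for example, the session holding the lock is idle in transaction), pg_terminate_backend can be used to close the whole backend connection; use it with care. The pid below is just a placeholder taken from pg_stat_activity:

-- Cancel only the currently running query (SIGINT)
SELECT pg_cancel_backend(12345);

-- Terminate the whole backend/connection (SIGTERM)
SELECT pg_terminate_backend(12345);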

Hope this helps at least some of you sometime in your debugging. Please comment and follow me.