AE-2334 : Update build process to have the docker and other dependency images ready with image
Review Request #1313 — Created Jan. 14, 2026 and submitted — Latest diff uploaded
| Information | |
|---|---|
| pmurugaiyan | |
| AMP | |
| amp_4_0 | |
| AE-2334 | |
| Reviewers | |
| apoorva.sn, pradeep, shuinvy | |
AMP High Availability (HA) Cluster Deployment Guide
This guide describes how to deploy the Array Management Platform (AMP) on a 3-node (or larger) Docker Swarm cluster using the manage_amp.sh automation script.
1. Prerequisites
Hardware
- 3 Nodes (Physical or Virtual Machines)
- OS: Rocky Linux 9 / RHEL 9 (Recommended)
- Resources: Minimum 8GB RAM, 4 vCPUs per node.
Network
- All nodes must be on the same LAN/VLAN.
- Static IPs are recommended for stability.
Firewall Configuration (On ALL Nodes)
You must open the following ports for Docker Swarm and AMP services:
# Docker Swarm Ports
firewall-cmd --add-port=2377/tcp --permanent
firewall-cmd --add-port=7946/tcp --permanent
firewall-cmd --add-port=7946/udp --permanent
firewall-cmd --add-port=4789/udp --permanent
# AMP Service Ports
firewall-cmd --add-port=80/tcp --permanent # HTTP
firewall-cmd --add-port=443/tcp --permanent # HTTPS
firewall-cmd --add-port=5000/tcp --permanent # Local Registry
firewall-cmd --add-port=5432/tcp --permanent # Database (HAProxy/PGBouncer)
firewall-cmd --add-port=5433/tcp --permanent # Database (Direct Patroni)
firewall-cmd --add-port=2379-2380/tcp --permanent # Etcd
firewall-cmd --add-port=8008/tcp --permanent # Patroni API
firewall-cmd --add-port=9200/tcp --permanent # OpenSearch REST (Inter-node & Dashboards)
firewall-cmd --add-port=9300/tcp --permanent # OpenSearch Transport (Cluster)
firewall-cmd --add-port=5601/tcp --permanent # OpenSearch Dashboards
# IMPORTANT: Traffic between nodes on Overlay (UDP 4789) and OpenSearch (9200) MUST be allowed!
firewall-cmd --add-port=3000/tcp --permanent # Grafana
firewall-cmd --add-port=514/tcp --permanent # Logstash Syslog TCP
firewall-cmd --add-port=514/udp --permanent # Logstash Syslog UDP
# Reload
firewall-cmd --reload
⚠️ WARNING: Do NOT manually add Docker interfaces (docker0, docker_gwbridge) to firewalld zones. Docker manages these interfaces itself. Manual zone assignments cause ZONE_CONFLICT errors that prevent Docker from starting.
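After the reload, it can help to confirm that nothing from the list above was missed. The following is a minimal sketch, not part of manage_amp.sh: the helper name missing_ports is hypothetical, and it assumes the ports live in the default zone, so that `firewall-cmd --list-ports` reports them as a space-separated list.

```shell
#!/bin/sh
# Ports copied from the firewall section of this guide.
REQUIRED_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp 80/tcp 443/tcp 5000/tcp 5432/tcp 5433/tcp 2379-2380/tcp 8008/tcp 9200/tcp 9300/tcp 5601/tcp 3000/tcp 514/tcp 514/udp"

# missing_ports OPEN_LIST (hypothetical helper):
# print every required port that is absent from OPEN_LIST, a
# space-separated list such as the output of `firewall-cmd --list-ports`.
missing_ports() (
    open_list=" $1 "
    for port in $REQUIRED_PORTS; do
        case "$open_list" in
            *" $port "*) ;;          # port is already open
            *) echo "$port" ;;       # port still needs to be added
        esac
    done
)

# Usage on a node (commented out so the sketch has no side effects):
# missing_ports "$(firewall-cmd --list-ports)"
```

An empty result means every port listed in this guide is open on that node. Run it on each node, since the rules are per-host.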
System Tuning (On ALL Nodes)
OpenSearch requires increased virtual memory. Run on each node:
# Using manage_amp.sh (recommended)
./manage_amp.sh system_tune
# Or manually:
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
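The manual commands above append to /etc/sysctl.conf every time they are run. As a hedged alternative, the sketch below checks the current value and persists the setting only once. The function names are hypothetical and this is not what manage_amp.sh system_tune is known to do.

```shell
#!/bin/sh
REQUIRED_MAX_MAP_COUNT=262144

# map_count_ok VALUE (hypothetical helper):
# succeed when VALUE meets the minimum used in this guide.
map_count_ok() {
    [ "$1" -ge "$REQUIRED_MAX_MAP_COUNT" ]
}

# persist_map_count FILE (hypothetical helper):
# append the setting to FILE only if the exact line is not already there.
persist_map_count() (
    file="$1"
    line="vm.max_map_count=$REQUIRED_MAX_MAP_COUNT"
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
)

# Usage as root (commented out so the sketch has no side effects):
# map_count_ok "$(sysctl -n vm.max_map_count)" || sysctl -w vm.max_map_count="$REQUIRED_MAX_MAP_COUNT"
# persist_map_count /etc/sysctl.conf
```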
2. Docker Swarm Setup
1. Initialize Swarm on Manager Node (Node 1):

       docker swarm init --advertise-addr <NODE_1_IP>

   Copy the "docker swarm join" command output.

2. Join Worker Nodes (Node 2, Node 3): Run the command copied from step 1 on the other nodes:

       docker swarm join --token <TOKEN> <NODE_1_IP>:2377

3. Rename Nodes (Optional but Recommended): Assign readable hostnames if not already set (e.g., amp-node-1, amp-node-2). The script uses Docker hostnames. To rename a node (run on the respective node):

       hostnamectl set-hostname <new_hostname>

4. Promote Workers to Managers: Promote the newly joined worker nodes to managers, as all nodes function as managers in this setup. On Node 1 (Manager):

       docker node ls
       docker node promote <NODE_ID>
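Once the workers are promoted, a quick check confirms that every node is a manager and shows how many manager failures the cluster can absorb. This is a sketch with hypothetical helper names. It relies on the `ManagerStatus` column of `docker node ls`, which reads Leader or Reachable for healthy managers, and on the Raft majority rule that N managers tolerate (N - 1) / 2 failures, rounded down.

```shell
#!/bin/sh
# count_managers (hypothetical helper):
# read `docker node ls --format '{{.ManagerStatus}}'` output on stdin and
# print how many nodes are healthy managers (Leader or Reachable).
count_managers() {
    grep -c -E '^(Leader|Reachable)$'
}

# fault_tolerance N (hypothetical helper):
# print how many manager failures a Raft quorum of N managers survives.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Usage on a manager node (commented out; needs a running Swarm):
# n="$(docker node ls --format '{{.ManagerStatus}}' | count_managers)"
# echo "managers: $n, tolerated failures: $(fault_tolerance "$n")"
```

With the three managers described in this guide, the cluster keeps its quorum after losing one node but not two.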
3. Offline Deployment (Air-Gapped Environments)
If you've created an offline bundle using ./manage_amp.sh bundle on a build machine with internet access, follow these steps:
Prerequisites
- Complete Section 1: Ensure all prerequisites (hardware, network, firewall rules) are met on all nodes.
- Complete Section 2: Set up Docker Swarm cluster (init, join workers, promote to managers).
On the Offline Target Machine (Manager Node)
- Transfer Files: Copy these files to the offline machine:
  - amp_offline_bundle.tar.gz
  - tar-bootstrap.rpm (for minimal Rocky Linux installations)
- Install tar (if not present):

      rpm -ivh tar-bootstrap.rpm

- Extract Bundle:

      tar -xf amp_offline_bundle.tar.gz
      cd amp_offline_bundle

- Load Offline Bundle:

      ./manage_amp.sh load_offline

  This installs all dependencies (Docker, rsync, keepalived, Java, Python) and loads Docker images into the local registry.
- Continue with Standard Deployment: After load_offline completes successfully, jump to Section 4 below (starting from Step 0: Configure VIP).
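Before extracting anything on the air-gapped machine, it may be worth confirming that the transfer is complete. The sketch below uses a hypothetical helper, check_bundle_files, that only looks for the two file names mentioned in the steps above. It does not verify checksums or the bundle contents.

```shell
#!/bin/sh
# check_bundle_files DIR (hypothetical helper):
# print each expected file that is missing from DIR and return non-zero
# if at least one is missing. tar-bootstrap.rpm is only needed when tar
# is not already installed on the target.
check_bundle_files() (
    dir="$1"
    status=0
    for f in amp_offline_bundle.tar.gz tar-bootstrap.rpm; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    exit "$status"
)

# Usage (commented out so the sketch has no side effects):
# check_bundle_files "$PWD" && echo "bundle files present"
```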
4. Standard Online Deployment
All deployment actions are handled by the manage_amp.sh script on Node 1 (Manager).
Step 0: Configure Virtual IP (VIP) for HA
To ensure High Availability, configure a floating VIP that automatically fails over between nodes.
On Node 1 (Master):
./manage_amp.sh vip --vip <VIP_ADDRESS> --priority 101
On Node 2 (Backup):
./manage_amp.sh vip --vip <VIP_ADDRESS> --priority 100
This command configures Keepalived and updates your configuration to use this VIP.
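To see which node currently owns the VIP, you can inspect the addresses bound on each node. This is a minimal sketch with a hypothetical helper name. It assumes Keepalived adds the VIP as a regular IPv4 address, so that it appears in `ip -o -4 addr show`, and the address in the usage line is a placeholder.

```shell
#!/bin/sh
# holds_vip VIP (hypothetical helper):
# succeed if VIP is among the addresses in `ip -o -4 addr show` output
# read from stdin. The trailing "/" keeps 192.0.2.1 from matching 192.0.2.10.
holds_vip() {
    grep -qF "inet $1/"
}

# Usage on any node (commented out so the sketch has no side effects):
# ip -o -4 addr show | holds_vip 192.0.2.100 && echo "this node holds the VIP"
```

With the priorities shown above, Node 1 would be expected to hold the VIP while it is healthy, and Node 2 to take it over when Node 1 goes down.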
Step 1: Prepare Environment
Navigate to the container directory:
cd container/
Check the .env file (optional). The defaults are usually sufficient; you typically need to edit it only to set non-default passwords.
vi .env
Step 2: Build & Push Images (First Time or Updates)
This step pulls the required images from the internet (or loads them) and pushes them to the local registry so all Swarm nodes can access them.
./manage_amp.sh build
This may take a while depending on your internet connection.
Step 3: Auto-Configure & Deploy
Run the deploy command with the --auto flag. This will:
- Detect all Swarm nodes.
- Auto-populate IPs in .env.
- Generate the stack.yml dynamically (adding Etcd/DB services for each node).
- Deploy the stack.
./manage_amp.sh deploy --auto
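To give a feel for the node detection step, the sketch below shows one way hostnames and addresses could be collected from the Swarm and turned into .env-style entries. It is illustrative only and is not the implementation inside manage_amp.sh: the helper name and the NODE_<n>_HOST / NODE_<n>_IP variable names are hypothetical. Check the addresses it prints before relying on them.

```shell
#!/bin/sh
# render_node_env (hypothetical helper):
# read "hostname ip" pairs on stdin, one node per line, and print
# numbered NODE_<n>_HOST / NODE_<n>_IP lines (made-up variable names).
render_node_env() {
    awk '{ printf "NODE_%d_HOST=%s\nNODE_%d_IP=%s\n", NR, $1, NR, $2 }'
}

# Usage on a manager node (commented out; needs a running Swarm):
# for id in $(docker node ls -q); do
#     docker node inspect "$id" --format '{{.Description.Hostname}} {{.Status.Addr}}'
# done | render_node_env
```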
Step 4: Setup Certificates (Automated)
The deployment script (deploy --auto) automatically checks for certificates and generates them if they are missing.
No manual action required.
Step 5: Initialize Security (First Time Only)
Initialize the OpenSearch security index.
./manage_amp.sh security_init
Step 6: Initialize Grafana DB (First Time Only)
Create the Grafana database user and schema in the HA Postgres cluster.
./manage_amp.sh create_grafana_db
Step 7: Configure OpenSearch Dashboards (First Time Only)
Import Dashboards, Index Patterns, and Index Templates (ISM Policies).
./manage_amp.sh configurator
5. Verification
Check Services
Ensure all services are up and running (expected: 3/3 replicas for global services, 1/1 for others).
docker service ls
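Scanning the REPLICAS column by eye gets tedious with many services. The following sketch, built around a hypothetical helper, filters the output down to services whose running count differs from the desired count. It uses the `Name` and `Replicas` format fields of `docker service ls`.

```shell
#!/bin/sh
# unhealthy_services (hypothetical helper):
# read "name running/desired" lines on stdin and print the services whose
# running replica count differs from the desired count. Anything after
# the replica field (e.g. "(max 1 per node)") is ignored.
unhealthy_services() {
    while read -r name replicas rest; do
        running="${replicas%%/*}"
        desired="${replicas##*/}"
        [ "$running" = "$desired" ] || echo "$name $replicas"
    done
}

# Usage on a manager node (commented out; needs a running Swarm):
# docker service ls --format '{{.Name}} {{.Replicas}}' | unhealthy_services
```

No output means every service has reached its desired replica count. Services may need some time to converge right after a deploy.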
Verify HA / Failover
- Web Access: Open https://<Any_Node_IP>/ or https://<VIP>/. You should see the AMP login.
- Database: Connect to port 5432 on any node. It routes to the current Primary.

      psql -h 127.0.0.1 -p 5432 -U amp_ts_user amp_ts

- Failover Test: Reboot a node inside the cluster.
  - Result: The cluster should remain operational.
  - Services will reschedule to remaining nodes.
  - Database leadership will fail over automatically via Patroni/Etcd.
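During a failover test it is useful to see which node Patroni currently reports as the primary. The sketch below is hedged on several assumptions: that the Patroni REST API is reachable over plain HTTP on port 8008 without authentication, and that the NODE_*_IP variables hold your node addresses. It relies on Patroni's documented behavior of answering `GET /` with HTTP 200 only on the primary. The helper name is hypothetical.

```shell
#!/bin/sh
# patroni_role HTTP_CODE (hypothetical helper):
# map the status code of Patroni's GET / health check to a role.
# Any other code, including curl's 000 for "no connection", is treated
# as "not-primary".
patroni_role() {
    case "$1" in
        200) echo primary ;;
        *)   echo not-primary ;;
    esac
}

# Usage (commented out; NODE_1_IP etc. are placeholders you must set):
# for node in "$NODE_1_IP" "$NODE_2_IP" "$NODE_3_IP"; do
#     code="$(curl -s -o /dev/null -w '%{http_code}' "http://$node:8008/")"
#     echo "$node $(patroni_role "$code")"
# done
```

Running the loop before and after rebooting the primary should show the role moving to a different node.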
6. Troubleshooting
- Logs: docker service logs -f amp_<service_name>
- Manual Config Update: If you add a new node to the swarm, re-run:

      ./manage_amp.sh deploy --auto
Common Issues
| Symptom | Cause | Solution |
|---|---|---|
| 502 Bad Gateway on some requests | Dashboards not running on all nodes | Ensure mode: global in stack.yml |
| ZONE_CONFLICT Docker crash | Manual firewalld zone assignment | Remove Docker interfaces from manual zones |
| invalid mount config | Missing log directory on node | Create /var/log/amp/opensearch on all nodes |
| ECONNREFUSED to OpenSearch | Firewall blocking port 9200 | Open port 9200 on all nodes |
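For the invalid mount config row, the fix amounts to creating the directory on every node. A minimal sketch follows. The helper name is hypothetical, and the required ownership and permissions are not stated in this guide, so adjust them to whatever the OpenSearch container expects.

```shell
#!/bin/sh
# ensure_log_dir [DIR] (hypothetical helper):
# create the log directory, and any missing parents, if it does not exist.
# Defaults to the path named in the Common Issues table.
ensure_log_dir() {
    mkdir -p "${1:-/var/log/amp/opensearch}"
}

# Usage as root on each node (commented out so the sketch has no side effects):
# ensure_log_dir
```

Because `mkdir -p` succeeds when the directory already exists, the call is safe to repeat.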
Appendix A: Docker Images
The following Docker images are bundled/used by AMP:
| Service | Image | Description |
|---|---|---|
| opensearch | opensearchproject/opensearch | Search and analytics engine |
| opensearch-dashboards | opensearchproject/opensearch-dashboards | Visualization UI |
| timescaledb | Custom build (Patroni) | Time-series database with HA |
| pgbouncer | edoburu/pgbouncer | Connection pooling |
| grafana | grafana/grafana | Monitoring dashboards |
| telegraf | telegraf | Metrics collection agent |
| logstash | opensearchproject/logstash-oss-with-opensearch-output-plugin | Log ingestion |
| nginx | nginx | Reverse proxy |
| etcd | quay.io/coreos/etcd | Distributed key-value store |
| haproxy | haproxy | Database load balancer |
| registry | registry | Local Docker registry |
| busybox | busybox | Utility container |
| rocky | rockylinux | Base OS image |
Appendix B: RPM Packages (Offline Bundle)
The offline bundle includes these packages and their dependencies:
| Package | Purpose |
|---|---|
| docker-ce, docker-ce-cli, containerd.io | Docker runtime |
| docker-buildx-plugin, docker-compose-plugin | Docker plugins |
| keepalived | VIP failover (VRRP) |
| rsync | File synchronization |
| python3 | Scripting |
| java-17-openjdk | OpenSearch security tools |
| tar | Archive extraction |
| openssl, httpd-tools | Certificate generation |
| curl, jq | API calls and JSON parsing |
| bind-utils | DNS tools (dig, nslookup) |
| iputils | Network tools (ping) |
| net-tools | Network debugging (netstat, ifconfig) |
Appendix C: Service Deployment Modes
| Service | Mode | Port | Rationale |
|---|---|---|---|
| nginx | global | 80, 443 | Web access on every node |
| opensearch-dashboards | global | 5601 | Local proxy access |
| grafana | global | 3000 | Local proxy access |
| logstash | global | 514 | Syslog on all nodes |
| telegraf | global | - | Docker monitoring per node |
| haproxy | global | 5432 | Database LB per node |
| pgbouncer | global | - | Connection pooling |
| opensearch | replicated (3) | 9200, 9300 | Stateful cluster |
| timescaledb | replicated (3) | 5433 | Patroni HA cluster |
| etcd | replicated (3) | 2379-2380 | Raft consensus |
Appendix D: Default Ports
| Port | Protocol | Service | Notes |
|---|---|---|---|
| 80 | TCP | Nginx (HTTP) | Redirects to HTTPS |
| 443 | TCP | Nginx (HTTPS) | Web UI entry point |
| 514 | TCP/UDP | Logstash | Syslog ingestion |
| 2377 | TCP | Docker Swarm | Cluster management |
| 2379-2380 | TCP | Etcd | Cluster coordination |
| 3000 | TCP | Grafana | Monitoring UI |
| 4789 | UDP | Docker Overlay | Container networking |
| 5000 | TCP | Registry | Local image storage |
| 5432 | TCP | HAProxy | Database (via LB) |
| 5433 | TCP | TimescaleDB | Direct Patroni access |
| 5601 | TCP | Dashboards | OpenSearch UI |
| 7946 | TCP/UDP | Docker Swarm | Node communication |
| 8008 | TCP | Patroni | Health API |
| 9200 | TCP | OpenSearch | REST API |
| 9300 | TCP | OpenSearch | Cluster transport |
The changes have been tested locally.
