Index: /branches/amp_4_0/platform/tools/container/ARCHITECTURE.md
===================================================================
--- /branches/amp_4_0/platform/tools/container/ARCHITECTURE.md	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/ARCHITECTURE.md	(working copy)
@@ -2,39 +2,67 @@
 
 ## 1. Executive Summary
 
-This document details the architecture of the AMP Platform deployed on a 3-Node Docker Swarm cluster. The current design utilizes a **Hybrid High Availability (HA)** model: it provides full redundancy for network ingress and stateless processing, while pinning stateful storage to a designated primary node to ensure data consistency without external shared storage dependencies (NFS/SAN).
+This document details the architecture of the AMP Platform deployed on a Multi-Node Docker Swarm cluster (minimum 3 nodes). The design utilizes a **Full High Availability (HA)** model with **Dynamic Autoscaling**:
 
+* **Data Layer**: Fully replicated across all nodes (OpenSearch Replicas + Patroni/Postgres Replication). No Single Point of Failure (SPOF).
+* **Compute Layer**: Stateless services run in `global` mode on every node.
+* **Access Layer**: Uses a Floating VIP (Keepalived) and Host Networking for performance and reliability.
+* **Deployment**: Fully automated via `manage_amp.sh`, which dynamically detects nodes, **configures VIPs**, and generates configuration.
+
 ## 2. Deployment Topology
 
-The cluster consists of three nodes participating in a Docker Swarm.
+The cluster scales horizontally. A typical 3-node setup is described below.
 
 ### Node Roles
 
 | Node | Hostname | Swarm Role | Labels | Primary Responsibility |
 | :--- | :--- | :--- | :--- | :--- |
-| **Node 1** | `amp-node-1` | **Manager (Leader)** | `type=storage` | **Storage + Compute**. Hosts the database files (OpenSearch, TimescaleDB, Registry) on local high-speed disk. |
-| **Node 2** | `amp-node-2` | **Manager/Worker** | `type=compute` | **Compute Only**. Runs stateless services (Logstash, Nginx, Dashboards). |
-| **Node 3** | `amp-node-3` | **Manager/Worker** | `type=compute` | **Compute Only**. Runs stateless services (Logstash, Nginx, Dashboards). |
+| **Node 1** | `amp-node-1` | **Manager (Leader)** | `type=storage` | **Full Stack**. Runs localized Datastores (OpenSearch/Timescale/Etcd) + Stateless Services. |
+| **Node 2** | `amp-node-2` | **Manager/Worker** | `type=storage` | **Full Stack Replica**. Replicates all data. Can take over as Leader within seconds. |
+| **Node 3** | `amp-node-3` | **Manager/Worker** | `type=storage` | **Full Stack Replica**. Replicates all data. Ensures Quorum (Split-Brain protection). |
 
 ### Component Diagram
 
 ```mermaid
-graph TD
+flowchart TD
     Client["Client / Log Sources"] --> VIP["Virtual IP (Keepalived)"]
     
+    subgraph cluster_physical ["Physical Network Layer"]
+        VIP --> Node1_IP["Node 1 IP"]
+        VIP --> Node2_IP["Node 2 IP"]
+        VIP --> Node3_IP["Node 3 IP"]
+    end
+
     subgraph cluster_swarm ["Docker Swarm Cluster"]
-        VIP --> Nginx_Service["Nginx (Web Port 443)"]
-        VIP --> Logstash["Logstash (Syslog Port 514)"]
-        
-        subgraph cluster_stateless ["Stateless Layer (Any Node)"]
-            Nginx_Service --> Dashboards
-            Nginx_Service --> Grafana
+        direction TB
+
+        subgraph Access_Layer ["Global Access Layer (Host Mode)"]
+            Nginx["Nginx (Global) - :443"]
+            HAProxy["HAProxy (Global) - :5432"]
+            PgBouncer["PgBouncer (Global)"]
         end
+
+        subgraph Compute_Layer ["Stateless Compute"]
+            Logstash["Logstash"]
+            Dashboards["OpenSearch Dashboards"]
+            Grafana["Grafana"]
+            Telegraf["Telegraf (Global Monitor)"]
+        end
         
-        subgraph cluster_stateful ["Stateful Layer (Node 1 Only)"]
-            Logstash --> OpenSearch[("OpenSearch Data")]
-            Grafana --> Timescale[("TimescaleDB Data")]
+        subgraph Storage_Layer ["Stateful Storage (Clustered)"]
+            OS["OpenSearch Cluster (3 Primary + 3 Replica)"]
+            DB["TimescaleDB Cluster (Patroni + Etcd)"]
         end
+
+        Node1_IP --> Nginx
+        Nginx --> Dashboards
+        Nginx --> Grafana
+        
+        Node1_IP --> HAProxy
+        HAProxy --> DB
+
+        Logstash --> OS
+        Grafana --> HAProxy
     end
 ```
 
@@ -42,48 +70,59 @@
 
 ### 3.1 Network Layer (Ingress)
 
-* **Technology**: `Keepalived` (VRRP).
-* **Mechanism**: A floating Virtual IP (VIP) is assigned to the active Leader (Node 1). If Node 1 fails, the VIP automatically migrates to Node 2 or Node 3 within seconds.
-* **Benefit**: External systems (Syslog senders, User Browsers) never need to reconfigure IPs. The "Front Door" is always open.
+* **Keepalived (VRRP)**: Manages a floating Virtual IP (VIP). If a node fails, the VIP floats to a healthy node in <2 seconds.
+* **Host Networking**: Nginx and HAProxy bind directly to the host IP (`mode: host`). This bypasses the Docker Swarm ingress routing mesh to avoid timeout issues and improve throughput.
 
-### 3.2 Compute Layer (Stateless Services)
+### 3.2 Access Layer (Load Balancing)
 
-* **Services**: `nginx`, `logstash`, `opensearch-dashboards`, `grafana`.
-* **Mechanism**: Docker Swarm Orchestration.
-* **Behavior**: These services are not pinned. If a node fails, Swarm reschedules replicas to remaining healthy nodes.
-* **Benefit**: Continuous request processing. Dashboards and Ingestion endpoints remain accessible.
+* **HAProxy (Global)**: Runs on *every* node. Listens on `0.0.0.0:5432`.
+  * It health-checks all Patroni instances via API (`:8008`).
+  * It *always* routes traffic to the current **Cluster Leader**, regardless of which node the client connected to.
+* **Nginx (Global)**: Runs on *every* node. Ensures the Web UI is accessible via the VIP no matter where it lands.
 
-### 3.3 Storage Layer (Stateful Services)
+### 3.3 Storage Layer (Data Persistence)
 
-* **Services**: `opensearch`, `timescaledb`.
-* **Mechanism**: Pinned placement (`constraints: - node.labels.type == storage`) using Local Docker Volumes (`driver: local`).
-* **Behavior**: These services MUST run on Node 1 because their data files exist physically on Node 1's disk. They cannot start on other nodes.
+* **OpenSearch**: Configured with `number_of_replicas: 1` (or more). Shards are distributed. Loss of 1 node results in ZERO data loss and ZERO downtime.
+* **TimescaleDB (Patroni)**: Uses PostgreSQL streaming replication (synchronous or asynchronous, depending on the Patroni configuration).
+  * **Etcd**: A Raft cluster with one member per node (3 in the reference setup) holds the cluster state.
+  * **Failover**: If the Leader fails, Patroni elects a new Leader within seconds. App connections (via HAProxy) are reset and immediately reconnect to the new Leader.
 
-## 4. Architectural Analysis
+## 4. Automation & Scaling
 
-### 4.1 Advantages
+The platform uses a **Dynamic Configuration** engine (`manage_amp.sh`).
 
-1. **High Performance (I/O)**: By using local disks (NVMe/SSD) on Node 1 instead of NFS, database write speeds are maximized. This is critical for high-volume log ingestion.
-2. **Stateless Scalability**: Heavy processing tasks (Log Parsing via Logstash, SSL Termination via Nginx) are distributed across 3 nodes. Node 1 is not overwhelmed by CPU tasks, leaving it free to handle Database I/O.
-3. **Simplicity**: No requirement for external NAS, SAN, or complex Ceph/GlusterFS configurations. Reduced maintenance overhead.
-4. **Partial Failover**: In the event of a Node 1 failure, the UI remains accessible (with connection errors) and VIP remains pingable, preventing "Connection Refused" errors on client side.
+### Auto-Configuration Workflow
 
-### 4.2 Limitations & Risks
+1. **Discovery**: The script queries the Docker Swarm node list to find all active nodes.
+2. **Generation**:
+    * Generates a dynamic `.env` file with detected IPs.
+    * Generates a `stack.yml` from a template, appending an `etcd` service for every detected node.
+    * Generates `haproxy.cfg` with backend servers for every detected node.
+3. **Deployment**: Updates the stack in real-time.
 
-1. **Single Point of Failure (Data)**: If Node 1 fails, **OpenSearch and TimescaleDB go offline**. No new logs can be written, and no historical data can be queried.
-2. **Data Loss Risk (Buffer Overflow)**: While Logstash (on surviving nodes) will buffer incoming logs in memory, it will eventually fill up and start dropping data if Node 1 does not recover quickly.
-3. **Manual Recovery**: If Node 1 has a hardware failure, recovering the data requires restoring from backups, as the data does not exist on Nodes 2/3.
+### Adding a Node
 
-## 5. Solution: Path to Full HA
+To scale from 3 to 4+ nodes:
 
-To mitigate the storage limitation and achieve **Zero Downtime**, the cluster can be upgraded to use **Shared Storage**.
+1. Run `docker swarm join` on the new node.
+2. On the Manager, run:
 
-### Implementation: NFS Migration
+    ```bash
+    ./manage_amp.sh deploy --auto
+    ```
 
-* **Dependency**: An external NFS Server accessible by all 3 nodes.
-* **Configuration**:
-    1. Update `stack.yml` volumes to use `driver_opts: type: nfs`.
-    2. Remove `placement.constraints` from database services.
-* **Result**: Database containers effectively become stateless. If Node 1 fails, Swarm simply restarts OpenSearch on Node 2, which mounts the same NFS path and resumes operations immediately.
+3. The system automatically detects the new node, scales OpenSearch/Database/Etcd, and reconfigures Load Balancers.
 
-This upgrade can be performed without reinstalling the cluster, by modifying the `stack.yml` as detailed in `README.md`.
+## 5. Architectural Analysis
+
+### Advantages
+
+1. **Zero Single Point of Failure**: Every component (Storage, Compute, Network) is redundant.
+2. **Seamless Failover**: The VIP and database leadership move to a healthy node within seconds; database connections are reset and re-established through HAProxy.
+3. **Operational Simplicity**: No external NAS/SAN required. Uses local disk performance (NVMe recommended).
+4. **Auto-Scaling**: "One-Command" expansion capability.
+
+### Requirements
+
+* **Latency**: Low latency (<10 ms) between nodes is required for synchronous replication.
+* **Quorum**: A minimum of 3 nodes is required to maintain quorum for Etcd and OpenSearch master election. (2-node clusters are possible but cannot fail over automatically.)
Index: /branches/amp_4_0/platform/tools/container/DEPLOYMENT.md
===================================================================
--- /branches/amp_4_0/platform/tools/container/DEPLOYMENT.md	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/DEPLOYMENT.md	(working copy)
@@ -0,0 +1,212 @@
+# AMP High Availability (HA) Cluster Deployment Guide
+
+This guide describes how to deploy the Array Management Platform (AMP) on a 3-node (or larger) Docker Swarm cluster using the `manage_amp.sh` automation script.
+
+## 1. Prerequisites
+
+### Hardware
+
+* **3 or more nodes** (physical or virtual machines)
+* **OS**: Rocky Linux 9 / RHEL 9 (Recommended)
+* **Resources**: Minimum 8GB RAM, 4 vCPUs per node.
+
+### Network
+
+* All nodes must be on the same LAN/VLAN.
+* Static IPs are recommended for stability.
+
+### Firewall Configuration (On ALL Nodes)
+
+You must open the following ports for Docker Swarm and AMP services:
+
+```bash
+# Docker Swarm Ports
+firewall-cmd --add-port=2377/tcp --permanent
+firewall-cmd --add-port=7946/tcp --permanent
+firewall-cmd --add-port=7946/udp --permanent
+firewall-cmd --add-port=4789/udp --permanent
+
+# AMP Service Ports
+firewall-cmd --add-port=80/tcp --permanent   # HTTP
+firewall-cmd --add-port=443/tcp --permanent  # HTTPS
+firewall-cmd --add-port=5000/tcp --permanent # Local Registry
+firewall-cmd --add-port=5432/tcp --permanent # Database (HAProxy/PGBouncer)
+firewall-cmd --add-port=5433/tcp --permanent # Database (Direct Patroni)
+firewall-cmd --add-port=2379-2380/tcp --permanent # Etcd
+firewall-cmd --add-port=8008/tcp --permanent # Patroni API
+firewall-cmd --add-port=9200/tcp --permanent # OpenSearch REST
+firewall-cmd --add-port=9300/tcp --permanent # OpenSearch Transport (Cluster)
+firewall-cmd --add-port=5601/tcp --permanent # OpenSearch Dashboards
+firewall-cmd --add-port=3000/tcp --permanent # Grafana
+firewall-cmd --add-port=514/tcp --permanent  # Logstash Syslog TCP
+firewall-cmd --add-port=514/udp --permanent  # Logstash Syslog UDP
+
+# Reload
+firewall-cmd --reload
+```
+
+---
+
+## 2. Docker Swarm Setup
+
+1. **Initialize Swarm on Manager Node (Node 1)**:
+
+    ```bash
+    docker swarm init --advertise-addr <NODE_1_IP>
+    ```
+
+    *Copy the "docker swarm join" command output.*
+
+2. **Join Worker Nodes (Node 2, Node 3)**:
+    Run the command copied from step 1 on the other nodes:
+
+    ```bash
+    docker swarm join --token <TOKEN> <NODE_1_IP>:2377
+    ```
+
+3. **Rename Nodes (Optional but Recommended)**:
+    Assign readable hostnames if not already set (e.g., `amp-node-1`, `amp-node-2`). The script uses Docker Hostnames.
+
+    To rename a node (run on the respective node):
+
+    ```bash
+    hostnamectl set-hostname <new_hostname>
+    ```
+
+4. **Promote Workers to Managers**:
+    Promote the newly joined worker nodes to managers, as all nodes function as managers in this setup.
+
+    On **Node 1 (Manager)**:
+
+    ```bash
+    docker node ls
+    docker node promote <NODE_ID>
+    ```
+
+---
+
+## 3. Deployment
+
+All deployment actions are handled by the `manage_amp.sh` script on **Node 1 (Manager)**.
+
+### Step 0: Configure Virtual IP (VIP) for HA
+
+To ensure High Availability, configure a floating VIP that automatically fails over between nodes.
+
+On **Node 1 (Master)**:
+
+```bash
+./manage_amp.sh vip --vip <VIP_ADDRESS> --priority 101
+```
+
+On **Node 2 (Backup)**:
+
+```bash
+./manage_amp.sh vip --vip <VIP_ADDRESS> --priority 100
+```
+
+*This command configures Keepalived and updates your configuration to use this VIP. Repeat it on each additional node with a lower priority (e.g., `99` on Node 3).*
+
+### Step 1: Prepare Environment
+
+Navigate to the container directory:
+
+```bash
+cd container/
+```
+
+Review the `.env` file (optional). The defaults are usually sufficient; you only need to edit it to set non-default passwords.
+
+```bash
+vi .env
+```
+
+### Step 2: Build & Push Images (First Time or Updates)
+
+This step pulls the required images from the internet (or loads them) and pushes them to the local registry so all Swarm nodes can access them.
+
+```bash
+./manage_amp.sh build
+```
+
+*This may take a while depending on your internet connection.*
+
+### Step 3: Auto-Configure & Deploy
+
+Run the deploy command with the `--auto` flag. This will:
+
+1. Detect all Swarm nodes.
+2. Auto-populate IPs in `.env`.
+3. Generate the `stack.yml` dynamically (adding Etcd/DB services for each node).
+4. Deploy the stack.
+
+```bash
+./manage_amp.sh deploy --auto
+```
+
+### Step 4: Setup Certificates (Automated)
+
+The deployment script (`deploy --auto`) checks for certificates and generates them automatically if they are missing.
+
+*No manual action required.*
+
+### Step 5: Initialize Security (First Time Only)
+
+Initialize the OpenSearch security index.
+
+```bash
+./manage_amp.sh security_init
+```
+
+### Step 6: Initialize Grafana DB (First Time Only)
+
+Create the Grafana database user and schema in the HA Postgres cluster.
+
+```bash
+./manage_amp.sh create_grafana_db
+```
+
+### Step 7: Configure OpenSearch Dashboards (First Time Only)
+
+Import Dashboards, Index Patterns, and Index Templates (ISM Policies).
+
+```bash
+./manage_amp.sh configurator
+```
+
+---
+
+## 4. Verification
+
+### Check Services
+
+Ensure all services are up and running (expected: `3/3` replicas for global services, `1/1` for others).
+
+```bash
+docker service ls
+```
+
+### Verify HA / Failover
+
+1. **Web Access**: Open `https://<Any_Node_IP>/` or `https://<VIP>/`. You should see the AMP login.
+2. **Database**: Connect to Port `5432` on any node. It routes to the current Primary.
+
+    ```bash
+    psql -h 127.0.0.1 -p 5432 -U amp_ts_user amp_ts
+    ```
+
+3. **Failover Test**: Reboot a node inside the cluster.
+    * **Result**: The cluster should remain operational.
+    * Services will reschedule to remaining nodes.
+    * Database leadership will fail over automatically via Patroni/Etcd.
+
+---
+
+## 5. Troubleshooting
+
+* **Logs**: `docker service logs -f amp_<service_name>`
+* **Manual Config Update**: If you add a new node to the swarm, re-run:
+
+    ```bash
+    ./manage_amp.sh deploy --auto
+    ```
Index: /branches/amp_4_0/platform/tools/container/README.md
===================================================================
--- /branches/amp_4_0/platform/tools/container/README.md	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/README.md	(working copy)
@@ -99,6 +99,17 @@
 ./install_prerequisites.sh
 ```
 
+### 1a. Kernel Configuration (CRITICAL)
+
+On **ALL NODES** (Manager and Workers), you must increase the memory map limit for OpenSearch:
+
+```bash
+sysctl -w vm.max_map_count=262144
+echo "vm.max_map_count=262144" >> /etc/sysctl.conf
+```
+
+*Failure to do this will cause OpenSearch containers to crash with `Exit 1`.*
+
 ### 2. Configure Hostnames (Critical for Swarm)
 
 Docker Swarm uses hostnames to identify nodes. If all nodes are named `localhost`, it will be very confusing.
@@ -136,6 +147,19 @@
 docker swarm join --token <TOKEN> <MANAGER_IP>:2377
 ```
 
+### 3a. Configure Registry on Workers (CRITICAL)
+
+Worker nodes need to trust the insecure registry running on the Manager node (HTTP).
+
+On **each Worker node**, run:
+
+```bash
+# Replace <MANAGER_IP> with the actual IP of your Manager Node
+./manage_amp.sh config_registry <MANAGER_IP>
+```
+
+*This adds the Manager IP to `/etc/docker/daemon.json` and restarts Docker.*
+
 ### 4. Configure Storage Node
 
 Choose **one** node (usually the manager or a specific high-disk node) to host the databases. Find its Node ID and label it:
@@ -167,7 +191,7 @@
 On the **Manager** node, specify the **Manager IP** for the registry so workers can reach it:
 
 ```bash
-export REGISTRY=<MANAGER_IP>:5000
+# The script will auto-detect the host IP and configure the registry URL
 ./manage_amp.sh build
 ./manage_amp.sh deploy
 ```
@@ -186,26 +210,35 @@
 
 ## 5. Multi-Node Deployment with Manager and Worker (Small Clusters)
 
-This guide is for small clusters (1-5 nodes) where nodes must act as both **Managers** and **Workers** to maximize resource utilization, often requiring high availability (HA) for the control plane.
+This guide is for small clusters (e.g., 3 nodes) where nodes must act as both **Managers** and **Workers** to provide High Availability (HA) for both the control plane and workloads.
 
-### 1. Prerequisite Setup (Small Cluster)
+In this setup:
 
-On **every node** (Managers and future Workers), run the installation script:
+- **OpenSearch**: Runs on all 3 nodes (Active-Active).
+- **TimescaleDB**: Runs on all 3 nodes (Active-Passive with Patroni & HAProxy).
+- **Grafana**: Pinned to one node (labeled `type=storage`).
 
-```bash
-# Copy install_prerequisites.sh to the node
-./install_prerequisites.sh
-```
+### 1. Prerequisites (All Nodes)
 
-# Copy install_prerequisites.sh to the node
+On **every node** (Node 1, 2, 3), perform these steps:
 
-./install_prerequisites.sh
+1. **Install Docker & Dependencies**:
 
-```
+    ```bash
+    ./install_prerequisites.sh
+    ```
 
+2. **Configure Kernel Limits (CRITICAL)**:
+    Required for OpenSearch to start.
+
+    ```bash
+    sysctl -w vm.max_map_count=262144
+    echo "vm.max_map_count=262144" >> /etc/sysctl.conf
+    ```
+
 ### 2. Configure Hostnames
 
-Since all nodes act as both Managers and Workers, give them generic numbered names:
+Give each node a unique name.
 
 ```bash
 # Node 1
@@ -218,138 +251,133 @@
 hostnamectl set-hostname amp-node-3
 ```
 
-*Re-login to the shell for the change to take effect.*
+*Re-login to ensure the hostname updates.*
 
-### 3. Initialize First Manager
+### 3. Initialize Swarm Cluster
 
-On the **first node** (Node 1):
+1. **Initialize Primary Manager** (On Node 1):
 
-```bash
-./manage_amp.sh init
-```
-
-*This initializes the Swarm and generates the join token.*
+    ```bash
+    ./manage_amp.sh init
+    ```
 
-### 3. Join & Promote Other Nodes
+    **For multi-homed hosts** (servers with multiple network interfaces), specify the advertise address:
 
-1. **Join Node 2 & 3**: Run the join command (output from Step 2) on the other nodes.
+    ```bash
+    # Replace <NODE_1_IP> with the IP address you want to use for cluster communication
+    docker swarm init --advertise-addr <NODE_1_IP>
+    ```
 
+    *Copy the "docker swarm join" command output.*
+
+2. **Join Other Nodes**:
+    Run the join command on **Node 2** and **Node 3**:
+
     ```bash
     docker swarm join --token <TOKEN> <NODE_1_IP>:2377
     ```
 
-2. **Promote to Manager** (On Node 1):
-    To ensure HA for the Swarm control plane, promote the new nodes to Managers.
+3. **Promote to Managers** (On Node 1):
+    For a 3-node HA cluster, all nodes should be managers.
 
     ```bash
-    docker node promote <NODE_2_HOSTNAME>
-    docker node promote <NODE_3_HOSTNAME>
+    docker node promote amp-node-2
+    docker node promote amp-node-3
     ```
 
-### 4. Configure Mixed Roles (Active)
+4. **Verify VIP Endpoint Mode** (Optional):
+    Docker Swarm uses VIP (Virtual IP) mode by default for service discovery, which allows services like OpenSearch to discover each other via the service name (e.g., `opensearch`). This is already configured in `stack.yml` and requires no additional setup.
 
-By default, we want these managers to also run workloads (containers). Ensure they are set to `Active` availability (not `Drain`).
+    To verify:
 
-On a Manager node:
+    ```bash
+    docker service inspect amp_opensearch --format '{{.Endpoint.Spec.Mode}}'
+    # Should output: vip
+    ```
 
+### 4. Configure Floating VIP with Keepalived (Optional but Recommended)
+
+For high availability external access, configure a **floating Virtual IP (VIP)** using the provided script:
+
 ```bash
-# Repeat for all node IDs
-docker node update --availability active <NODE_ID>
+# On Node 1 (highest priority = master)
+./manage_amp.sh vip --vip 192.168.162.100 --priority 101
+
+# On Node 2
+./manage_amp.sh vip --vip 192.168.162.100 --priority 100
+
+# On Node 3
+./manage_amp.sh vip --vip 192.168.162.100 --priority 99
 ```
 
-### 5. Configure Storage Node
+The VIP (e.g., `192.168.162.100`) will automatically fail over between nodes. Access services via:
 
-Choose **ONE** specific node to host the database data (pinning).
+- Nginx: `http://192.168.162.100`
+- Grafana: `http://192.168.162.100:3000`
+- OpenSearch Dashboards: `http://192.168.162.100:5601`
 
+### 5. Configure Registry Trust (CRITICAL)
+
+All nodes need to trust the insecure registry running on Node 1 (where we run the build).
+
+On **Node 2** AND **Node 3**:
+
 ```bash
-docker node ls
-docker node update --label-add type=storage <NODE_ID>
+# Replace <NODE_1_IP> with the actual IP of Node 1
+./manage_amp.sh config_registry <NODE_1_IP>
 ```
 
-### 6. High Availability: Traffic Routing (VIP)
+### 6. Deployment Workflow
 
-Since any node might go down, do not point users to a specific Node IP. Instead, configure a Floating VIP using the provided script.
+Perform all these steps on **Node 1**:
 
-1. **Run `configure_vip.sh`** on **EACH** node with a unique priority:
+1. **Generate Certificates**:
 
     ```bash
-    # Node 1 (Primary)
-    ./configure_vip.sh --vip 192.168.200.100 --priority 102
+    ./manage_amp.sh setup
+    ```
 
-    # Node 2 (Backup)
-    ./configure_vip.sh --vip 192.168.200.100 --priority 101
+2. **Distribute Certificates**:
+    *Since volumes are local, certificates generated on Node 1 must be copied to Node 2 and Node 3.*
 
-    # Node 3 (Backup)
-    ./configure_vip.sh --vip 192.168.200.100 --priority 100
+    ```bash
+    # Run this on Node 1
+    SOURCE_PATH=$(docker volume inspect certs-vol --format '{{.Mountpoint}}')
+    
+    for NODE in amp-node-2 amp-node-3; do
+        echo "--- Syncing to $NODE ---"
+        ssh "root@$NODE" "docker volume create certs-vol"
+        DEST_PATH=$(ssh "root@$NODE" "docker volume inspect certs-vol --format '{{.Mountpoint}}'")
+        scp -r "$SOURCE_PATH"/* "root@$NODE:$DEST_PATH/"
+        echo "✅ Done for $NODE"
+    done
     ```
 
-    *Replace `192.168.200.100` with your desired VIP.*
+3. **Build & Deploy**:
+    The script automatically detects the host IP for the registry.
 
-2. **DNS**: Point your domain to this VIP.
-
-### 7. Deploy Stack
-
-On **Node 1** (or any Manager):
-
-1. **Generate Certificates**:
-
     ```bash
-    ./manage_amp.sh setup
+    ./manage_amp.sh build
+    ./manage_amp.sh deploy
     ```
 
-> [!IMPORTANT]
-    > **Manual Certificate Distribution**: Since `certs-vol` is a local volume, other nodes cannot access certificates generated on Node 1. You must copy them manually.
+4. **Initialize Grafana Database (Postgres)**:
+    *Required once to create the database/user for Grafana in the HA cluster.*
 
-    **Steps to distribute certificates:**
+    Wait for `docker service ls` to show `amp_timescaledb` running, then:
 
-    1.  **Locate the Volume Path** (On Node 1):
-        ```bash
-        SOURCE_PATH=$(docker volume inspect certs-vol --format '{{.Mountpoint}}')
-        echo "Certs are at: $SOURCE_PATH"
-        ```
-
-    2.  **Copy to Other Nodes** (Run on Node 1):
-        *Replace `amp-node-2`, `amp-node-3` with your other node IPs/Hostnames.*
-
-        ```bash
-        # Loop through other nodes
-        for NODE in amp-node-2 amp-node-3; do
-            echo "--- Processing $NODE ---"
-            
-            # 1. Create volume on destination
-            ssh root@$NODE "docker volume create certs-vol"
-
-            # 2. Get destination path
-            DEST_PATH=$(ssh root@$NODE "docker volume inspect certs-vol --format '{{.Mountpoint}}'")
-            
-            # 3. Copy files
-            scp -r $SOURCE_PATH/* root@$NODE:$DEST_PATH/
-            
-            echo "✅ Copied to $NODE"
-        done
-        ```
-
-2. **Deploy**:
-
-    You cannot use `127.0.0.1` because other nodes will try to pull from *their own* localhost and fail.
-    
-    Use the **Physical IP** of Node 1 (where the registry is running).
-
     ```bash
-    # Replace with Node 1's actual IP (e.g., 192.168.162.139)
-    export REGISTRY=192.168.162.139:5000 
+    ./manage_amp.sh create_grafana_db
     
-    ./manage_amp.sh build
-    ./manage_amp.sh deploy
-    ./manage_amp.sh security_init
+    # Force restart Grafana to connect to the new DB
+    docker service update --force amp_grafana
     ```
 
-3. **Configure OpenSearch**:
+5. **Configure OpenSearch**:
+    Wait for OpenSearch to be ready, then:
 
-    The OpenSearch configuration (including admin certificates) is managed in `services/opensearch/opensearch.yml`.
-    If you change environment variables or certificate names, update this file and redeploy.
-
     ```bash
+    ./manage_amp.sh security_init
     ./manage_amp.sh configurator
     ```
 
Index: /branches/amp_4_0/platform/tools/container/configure_vip.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/configure_vip.sh	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/configure_vip.sh	(nonexistent)
@@ -1,115 +0,0 @@
-#!/bin/bash
-
-# configure_vip.sh
-# Usage: ./configure_vip.sh --vip <VIP> --priority <PRIORITY> [--interface <IFACE>]
-# Example (Node 1): ./configure_vip.sh --vip 192.168.1.100 --priority 101
-# Example (Node 2): ./configure_vip.sh --vip 192.168.1.100 --priority 100
-
-set -e
-
-VIP=""
-PRIORITY=""
-INTERFACE=""
-ROUTER_ID=51
-AUTH_PASS="array_amp"
-
-usage() {
-    echo "Usage: $0 --vip <IP_ADDRESS> --priority <INT> [--interface <IFACE>]"
-    echo "  --vip       : The Floating Virtual IP (e.g., 192.168.1.100)"
-    echo "  --priority  : Higher number = Higher preference (Master). Use 101, 100, 99..."
-    echo "  --interface : (Optional) Network interface. Auto-detected if omitted."
-    exit 1
-}
-
-# Parse Arguments
-while [[ "$#" -gt 0 ]]; do
-    case $1 in
-        --vip) VIP="$2"; shift ;;
-        --priority) PRIORITY="$2"; shift ;;
-        --interface) INTERFACE="$2"; shift ;;
-        *) echo "Unknown parameter: $1"; usage ;;
-    esac
-    shift
-done
-
-if [ -z "$VIP" ] || [ -z "$PRIORITY" ]; then
-    usage
-fi
-
-# Auto-detect interface if not specified
-if [ -z "$INTERFACE" ]; then
-    # Get the interface used for the default route
-    INTERFACE=$(ip route get 8.8.8.8 | awk '{print $5; exit}')
-    
-    if [ -z "$INTERFACE" ]; then
-        echo "❌ Could not auto-detect network interface. Please specify with --interface."
-        exit 1
-    fi
-    echo "ℹ️  Auto-detected interface: $INTERFACE"
-fi
-
-CONFIG_FILE="/etc/keepalived/keepalived.conf"
-BACKUP_FILE="/etc/keepalived/keepalived.conf.bak.$(date +%s)"
-
-echo "--- Configuring Keepalived ---"
-echo "VIP: $VIP"
-echo "Interface: $INTERFACE"
-echo "Priority: $PRIORITY"
-
-# Backup existing config
-if [ -f "$CONFIG_FILE" ]; then
-    echo "Backing up existing config to $BACKUP_FILE"
-    cp "$CONFIG_FILE" "$BACKUP_FILE"
-fi
-
-# Generate Config
-# We use 'state BACKUP' for all nodes. 
-# The one with the highest priority will be elected Master.
-# This avoids preemption flaps if we used MASTER/BACKUP explicitly without nopreempt.
-
-cat > "$CONFIG_FILE" <<EOF
-! Configuration File for AMP Swarm VIP
-! Generated by configure_vip.sh
-
-global_defs {
-   router_id AMP_NODE_$(hostname)
-}
-
-vrrp_instance VI_1 {
-    state BACKUP
-    interface $INTERFACE
-    virtual_router_id $ROUTER_ID
-    priority $PRIORITY
-    advert_int 1
-    
-    # Preempt: If a higher priority node comes online, it takes over.
-    # Set to 'nopreempt' if you want to minimize failovers.
-    preempt
-    
-    authentication {
-        auth_type PASS
-        auth_pass $AUTH_PASS
-    }
-    
-    virtual_ipaddress {
-        $VIP
-    }
-}
-EOF
-
-echo "✅ Configuration written to $CONFIG_FILE"
-
-# Restart Keepalived
-if systemctl is-active --quiet keepalived; then
-    echo "Restarting Keepalived..."
-    systemctl restart keepalived
-    echo "✅ Keepalived restarted."
-else
-    echo "⚠️  Keepalived is not running. Starting it..."
-    systemctl enable --now keepalived
-    echo "✅ Keepalived started."
-fi
-
-# Status check
-echo "--- Current IP Status ---"
-ip addr show $INTERFACE | grep "$VIP" || echo "Note: VIP '$VIP' is not currently active on this node (it may be on another node)."

Property changes on: platform/tools/container/configure_vip.sh
___________________________________________________________________
Deleted: svn:executable
## -1 +0,0 ##
-*
\ No newline at end of property
Index: /branches/amp_4_0/platform/tools/container/manage_amp.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/manage_amp.sh	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/manage_amp.sh	(working copy)
@@ -5,7 +5,6 @@
 
 ACTION=${1:-status}
 STACK_NAME="amp"
-REGISTRY="localhost:5000"
 
 # REPO_ROOT Resolution
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
@@ -27,7 +26,448 @@
 fi
 
 
-# Function: Calculate Dynamic Heap Size (50% of Host RAM)
+# Function: Generate dynamic stack.yml from template and nodes list
+generate_stack_yaml() {
+    local nodes_list="$1" # Format: "hostname:ip hostname:ip ..."
+    local template_file="$SCRIPT_DIR/stack.yml.template"
+    local output_file="$SCRIPT_DIR/stack.yml"
+
+    echo "--- Generating stack.yml from template ---"
+
+    if [ ! -f "$template_file" ]; then
+        echo "❌ Template $template_file not found!" >&2
+        exit 1
+    fi
+
+    # 1. Copy template part 1 (everything before ETCD placeholder)
+    # awk stops at the placeholder line without printing it
+    awk '/ETCD Services will be/{exit} 1' "$template_file" > "$output_file"
+
+    # 2. Append Dynamic Etcd Services
+    # We need to construct the ETCD_INITIAL_CLUSTER string first
+    local etcd_cluster=""
+    local idx=1
+    for node_pair in $nodes_list; do
+        local node_host="${node_pair%%:*}"
+        local node_ip="${node_pair##*:}"
+        if [ -n "$etcd_cluster" ]; then etcd_cluster+=","; fi
+        etcd_cluster+="etcd${idx}=http://${node_ip}:2380"
+        idx=$((idx + 1))
+    done
+
+    idx=1
+    for node_pair in $nodes_list; do
+        local node_host="${node_pair%%:*}"
+        local node_ip="${node_pair##*:}"
+
+        cat <<EOF >> "$output_file"
+  etcd${idx}:
+    image: \${REGISTRY:-127.0.0.1:5000}/amp/etcd:latest
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.hostname == ${node_host}
+    environment:
+      - ETCD_NAME=etcd${idx}
+      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
+      - ETCD_ADVERTISE_CLIENT_URLS=http://${node_ip}:2379
+      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
+      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${node_ip}:2380
+      - ETCD_INITIAL_CLUSTER=${etcd_cluster}
+      - ETCD_INITIAL_CLUSTER_STATE=new
+      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
+      - ETCD_DATA_DIR=/data
+    volumes:
+      - etcd-data:/data
+    networks:
+      - hostnet
+
+EOF
+        idx=$((idx + 1))
+    done
+
+    # 3. Append template part 2 (everything after placeholder)
+    awk '/ETCD Services will be/{flag=1;next} flag' "$template_file" >> "$output_file"
+
+    # 4. Config Rotation: Calculate Hash and Substitute
+    local ep_script="$SERVICES_DIR/opensearch/entrypoint_wrapper.sh"
+    local ep_hash="default"
+    if [ -f "$ep_script" ]; then
+        if [[ "$OSTYPE" == "darwin"* ]]; then
+            ep_hash=$(md5 -q "$ep_script" | cut -c1-8)
+        else
+            ep_hash=$(md5sum "$ep_script" | awk '{print $1}' | cut -c1-8)
+        fi
+    fi
+    local config_name="opensearch_entrypoint_${ep_hash}"
+    echo "ℹ️  Entrypoint Config Name: $config_name"
+
+    # Replace ${ENTRYPOINT_CONFIG} in the generated stack.yml
+    if [[ "$OSTYPE" == "darwin"* ]]; then
+        sed -i "" "s|\${ENTRYPOINT_CONFIG}|$config_name|g" "$output_file"
+    else
+        sed -i "s|\${ENTRYPOINT_CONFIG}|$config_name|g" "$output_file"
+    fi
+
+    
+
+
+    echo "✅ Generated $output_file"
+}
+
+# Function: Auto-Configure Cluster
+auto_configure() {
+    echo "--- Auto-Configuring Cluster Nodes ---"
+    
+    check_swarm
+
+    # Detect Nodes (Hostname and IP)
+    # Status.Addr is the node's Swarm-advertised address (usually the host IP)
+    NODES=$(docker node inspect $(docker node ls -q) --format '{{.Description.Hostname}}:{{.Status.Addr}}' | sort)
+    
+    if [ -z "$NODES" ]; then
+        echo "❌ Could not detect Swarm nodes."
+        exit 1
+    fi
+
+    echo "Detected Nodes:"
+    echo "$NODES"
+
+    # Generate .env content
+    local seed_hosts=""
+    local master_nodes=""
+    local etcd_hosts=""
+    # Newline-separated "export DB_HOST_<n>=<ip>" lines for the .env block
+    local db_hosts_env=""
+    local db_hosts_csv=""
+    local os_hosts_json=""
+    local os_url_list=""
+    local db_jdbc_hosts=""
+    
+    local idx=1
+    for node_pair in $NODES; do
+        local node_host="${node_pair%%:*}"
+        local node_ip="${node_pair##*:}"
+        
+        # OpenSearch Seeds
+        if [ -n "$seed_hosts" ]; then seed_hosts+=","; fi
+        seed_hosts+="$node_ip"
+        
+        # Master Nodes (Using internal node names to match stack.yml NODE_NAME=node-{{.Task.Slot}})
+        if [ -n "$master_nodes" ]; then master_nodes+=","; fi
+        master_nodes+="node-$idx"
+        
+        # Etcd / Patroni Hosts
+        if [ -n "$etcd_hosts" ]; then etcd_hosts+=","; fi
+        etcd_hosts+="${node_ip}:2379"
+
+        # DB Hosts (for .env export and templates)
+        db_hosts_env+="export DB_HOST_${idx}=${node_ip}"$'\n'
+        
+        # Grafana connection string (HAProxy ports)
+        if [ -n "$db_hosts_csv" ]; then db_hosts_csv+=","; fi
+        db_hosts_csv+="${node_ip}:5432"
+
+        # OpenSearch Dashboards (JSON Array)
+        if [ -n "$os_hosts_json" ]; then os_hosts_json+=", "; fi
+        os_hosts_json+="\"https://${node_ip}:9200\""
+
+        # Logstash Output (List injection)
+        if [ -n "$os_url_list" ]; then os_url_list+=", "; fi
+        os_url_list+="\"https://${node_ip}:9200\""
+        
+        idx=$((idx + 1))
+    done
+
+    # Wrap JSON array
+    os_hosts_json="[${os_hosts_json}]"
+
+    # Update .env with Block Replacement Strategy
+    local START_MARKER="# >>> AUTO-GENERATED CONFIGURATION >>>"
+    local END_MARKER="# <<< AUTO-GENERATED CONFIGURATION <<<"
+    
+    if [ -f "$ENV_FILE" ]; then
+        # 1. Cleaner approach: Remove the existing block if markers exist
+        sed -i.bak "/$START_MARKER/,/$END_MARKER/d" "$ENV_FILE"
+        
+        # 2. Legacy cleanup: remove old-style duplicate blocks.
+        # Rather than matching the old header plus the lines that follow it,
+        # delete each known legacy header and separator line individually.
+        sed -i.bak '/# AUTO-GENERATED CONFIGURATION (Do Not Edit Manually)/d' "$ENV_FILE"
+        sed -i.bak '/# Generated at/d' "$ENV_FILE"
+        sed -i.bak '/# ---------------------------------------------------------/d' "$ENV_FILE"
+        sed -i.bak '/# ---------------------------------------/d' "$ENV_FILE"
+        
+        # Remove old dynamic vars (just in case they are outside blocks)
+        sed -i.bak '/^\(export \)*OPENSEARCH_SEED_HOSTS=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*OPENSEARCH_INITIAL_MASTER_NODES=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*DB_HOST_/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*PATRONI_ETCD3_HOSTS=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*AMP_DB_HOSTS_CSV=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*AMP_OS_HOSTS_JSON=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*AMP_OS_URL_LIST=/d' "$ENV_FILE"
+        sed -i.bak '/^\(export \)*AMP_DB_JDBC_HOSTS=/d' "$ENV_FILE"
+        sed -i.bak '/^AMP_REPLICAS=/d' "$ENV_FILE"
+        
+        rm -f "$ENV_FILE.bak"
+    else
+        touch "$ENV_FILE"
+    fi
+    
+    # Calculate Replica Count (Equal to Number of Nodes, max 3 recommended but we follow node count)
+    NODE_COUNT=$(echo "$NODES" | wc -l | tr -d ' ')
+    [[ "$NODE_COUNT" -lt 1 ]] && NODE_COUNT=1
+    
+    echo "   + Detected $NODE_COUNT nodes. Setting AMP_REPLICAS=$NODE_COUNT"
+
+    cat <<EF >> "$ENV_FILE"
+$START_MARKER
+# ---------------------------------------------------------
+# Generated at $(date)
+# ---------------------------------------------------------
+AMP_REPLICAS=$NODE_COUNT
+OPENSEARCH_SEED_HOSTS=$seed_hosts
+OPENSEARCH_INITIAL_MASTER_NODES=$master_nodes
+PATRONI_ETCD3_HOSTS=$etcd_hosts
+$db_hosts_env
+AMP_DB_HOSTS_CSV=$db_hosts_csv
+AMP_OS_HOSTS_JSON='$os_hosts_json'
+AMP_OS_URL_LIST='$os_url_list'
+AMP_DB_JDBC_HOSTS=$db_hosts_csv
+$END_MARKER
+EF
+
+    echo "✅ Updated .env"
+
+    # Reload env
+    set -a
+    source "$ENV_FILE"
+    set +a
+
+    # Generate stack.yml dynamically
+    generate_stack_yaml "$NODES"
+}
+
+# Helper: Configure Insecure Registry and Restart Docker
+configure_insecure_registry() {
+    local REGISTRY_HOST="$1"
+    local DAEMON_FILE="/etc/docker/daemon.json"
+    
+    echo "--- Configuring Docker Insecure Registry ($REGISTRY_HOST) ---"
+    
+    if [ ! -f "$DAEMON_FILE" ]; then
+        echo "{}" > "$DAEMON_FILE"
+    fi
+    
+    # Use Python to safely update JSON (idempotent)
+    if command -v python3 &>/dev/null; then
+        RESULT=$(python3 -c "
+import json
+import sys
+
+registry = '$REGISTRY_HOST'
+file_path = '$DAEMON_FILE'
+
+try:
+    with open(file_path, 'r') as f:
+        content = f.read().strip()
+        if not content:
+            data = {}
+        else:
+            data = json.loads(content)
+except Exception:
+    data = {}
+
+insecure = data.get('insecure-registries', [])
+if registry not in insecure:
+    insecure.append(registry)
+    data['insecure-registries'] = insecure
+    
+    with open(file_path, 'w') as f:
+        json.dump(data, f, indent=2)
+    print('UPDATED')
+else:
+    print('EXISTS')
+")
+    else
+        echo "⚠️  Python3 not found. Skipping auto-configuration of daemon.json."
+        echo "Please manually add $REGISTRY_HOST to 'insecure-registries' in $DAEMON_FILE"
+        return
+    fi
+
+    if [ "$RESULT" == "UPDATED" ]; then
+        echo "✅ Updated $DAEMON_FILE."
+        echo "Restarting Docker to apply changes..."
+        systemctl restart docker
+        echo "✅ Docker restarted."
+    else
+        echo "✅ Registry configuration already exists."
+    fi
+}
+
+# Function: Setup VIP (Integrated from configure_vip.sh)
+setup_vip() {
+    local VIP=""
+    local PRIORITY=""
+    local INTERFACE=""
+    local ROUTER_ID=51
+    local AUTH_PASS="array_amp"
+    local CONFIG_FILE="/etc/keepalived/keepalived.conf"
+    
+    # Simple arg parsing for this sub-function
+    while [[ "$#" -gt 0 ]]; do
+        case $1 in
+            --vip) VIP="$2"; shift ;;
+            --priority) PRIORITY="$2"; shift ;;
+            --interface) INTERFACE="$2"; shift ;;
+            *) echo "Unknown parameter for vip: $1"; exit 1 ;;
+        esac
+        shift
+    done
+
+    if [ -z "$VIP" ] || [ -z "$PRIORITY" ]; then
+        echo "Usage: ./manage_amp.sh vip --vip <IP> --priority <INT> [--interface <IFACE>]"
+        echo "  --vip       : Floating VIP Address"
+        echo "  --priority  : 101 (Master), 100 (Backup), etc."
+        exit 1
+    fi
+
+    echo "--- Configuring Keepalived VIP ---"
+
+    # Auto-detect interface
+    if [ -z "$INTERFACE" ]; then
+        if [[ "$OSTYPE" == "darwin"* ]]; then
+             echo "❌ VIP setup not supported on macOS directly (Keepalived is Linux-only)."
+             exit 1
+        fi
+        INTERFACE=$(ip route get 8.8.8.8 | awk '{for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }}')
+        if [ -z "$INTERFACE" ]; then
+            echo "❌ Could not auto-detect interface. Use --interface."
+            exit 1
+        fi
+        echo "ℹ️  Auto-detected interface: $INTERFACE"
+    fi
+
+    # Backup
+    if [ -f "$CONFIG_FILE" ]; then
+        cp "$CONFIG_FILE" "$CONFIG_FILE.bak.$(date +%s)"
+    fi
+
+    # Generate Config
+    cat > "$CONFIG_FILE" <<EOF
+! Configuration File for AMP Swarm VIP
+! Generated by manage_amp.sh (setup-vip)
+
+global_defs {
+   router_id AMP_NODE_$(hostname)
+}
+
+vrrp_instance VI_1 {
+    state BACKUP
+    interface $INTERFACE
+    virtual_router_id $ROUTER_ID
+    priority $PRIORITY
+    advert_int 1
+    preempt
+    
+    authentication {
+        auth_type PASS
+        auth_pass $AUTH_PASS
+    }
+    
+    virtual_ipaddress {
+        $VIP
+    }
+}
+EOF
+    echo "✅ Keepalived configured at $CONFIG_FILE"
+
+    # Restart Service
+    if systemctl is-active --quiet keepalived; then
+        systemctl restart keepalived
+        echo "✅ Keepalived restarted."
+    else
+        systemctl enable --now keepalived
+        echo "✅ Keepalived started."
+    fi
+
+    # Update .env with AMP_DOMAIN_OR_IP
+    if [ -f "$ENV_FILE" ]; then
+        # Remove existing if any
+        sed -i.bak '/^AMP_DOMAIN_OR_IP=/d' "$ENV_FILE"
+        rm -f "$ENV_FILE.bak"
+    else
+        touch "$ENV_FILE"
+    fi
+     
+    echo "AMP_DOMAIN_OR_IP=$VIP" >> "$ENV_FILE"
+    echo "✅ Updated .env with AMP_DOMAIN_OR_IP=$VIP"
+    
+    # Reload env
+    export AMP_DOMAIN_OR_IP="$VIP"
+
+    # Configure Docker Daemon for Insecure Registry (VIP:5000)
+    configure_insecure_registry "${VIP}:5000"
+}
+
+# Function: Detect Host IP
+detect_host_ip() {
+    if [ -n "$AMP_DOMAIN_OR_IP" ]; then
+        return
+    fi
+
+    echo "--- Detecting Host IP ---"
+    if [[ "$OSTYPE" == "darwin"* ]]; then
+        # macOS
+        HOST_IP=$(ipconfig getifaddr en0)
+        if [ -z "$HOST_IP" ]; then HOST_IP=$(ipconfig getifaddr en1); fi
+    else
+        # Linux (hostname -I | first ip)
+        HOST_IP=$(hostname -I | awk '{print $1}')
+    fi
+
+    if [ -z "$HOST_IP" ]; then
+        echo "⚠️  Could not auto-detect IP. Defaulting to localhost."
+        HOST_IP="localhost"
+    fi
+
+    export AMP_DOMAIN_OR_IP="$HOST_IP"
+    echo "✅ Auto-Detected Host IP: $AMP_DOMAIN_OR_IP"
+
+    # Auto-Detect Interface for OpenSearch Publish Host (Host Networking)
+    echo "--- Detecting Network Interface ---"
+    if [[ "$OSTYPE" == "darwin"* ]]; then
+        HOST_IFACE=$(route get 8.8.8.8 | grep interface | awk '{print $2}')
+    else
+        # Linux: Find interface associated with the default route or the Host IP
+        HOST_IFACE=$(ip route get 8.8.8.8 | grep -oP 'dev \K\S+')
+        if [ -z "$HOST_IFACE" ]; then
+             HOST_IFACE=$(ip addr | grep "$HOST_IP" | awk '{print $NF}')
+        fi
+    fi
+    
+    if [ -n "$HOST_IFACE" ]; then
+        export OPENSEARCH_PUBLISH_HOST="_${HOST_IFACE}_"
+        echo "✅ Auto-Detected Interface: $HOST_IFACE (Publish Host: $OPENSEARCH_PUBLISH_HOST)"
+    else
+        echo "⚠️  Could not detect Interface. Defaulting to _site_"
+        export OPENSEARCH_PUBLISH_HOST="_site_"
+    fi
+}
+
+# Run detection early
+detect_host_ip
+
+# Default REGISTRY to Host IP for Swarm worker access
+if [ -z "$REGISTRY" ]; then
+    if [ "$AMP_DOMAIN_OR_IP" == "localhost" ]; then
+         BASE_REG="127.0.0.1"
+    else
+         BASE_REG="$AMP_DOMAIN_OR_IP"
+    fi
+    export REGISTRY="${BASE_REG}:5000"
+fi
 calculate_heap_size() {
     # If explicitly set in .env or shell, skip auto-calc
     if [ -n "$OPENSEARCH_JAVA_OPTS" ]; then
@@ -49,53 +489,50 @@
             TOTAL_MEM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
             TOTAL_MEM_MB=$((TOTAL_MEM_KB / 1024))
         else
-            echo "⚠️  Cannot detect memory (neither macOS nor Linux /proc/meminfo). Defaulting to 1GB."
-            TOTAL_MEM_MB=2048 # Fallback assumes at least 2GB machine -> 1GB heap
+            echo "⚠️  Cannot detect memory. Defaulting to 2GB."
+            TOTAL_MEM_MB=2048 
         fi
     fi
 
     echo "Host Total Memory: ${TOTAL_MEM_MB}MB"
 
-    # 50% Rule
-    HEAP_SIZE_MB=$((TOTAL_MEM_MB / 2))
+    # Conservative Allocation logic
+    if [ "$TOTAL_MEM_MB" -lt 7168 ]; then
+        echo "⚠️  Low RAM detected (<7GB). Using conservative heap settings."
+        HEAP_SIZE_MB=512
+        LS_HEAP_SIZE_MB=512
+    else
+        # OpenSearch heap: 50% of host RAM
+        HEAP_SIZE_MB=$((TOTAL_MEM_MB / 2))
+        # Logstash heap: 25% of host RAM
+        LS_HEAP_SIZE_MB=$((TOTAL_MEM_MB / 4))
+    fi
 
     # Min/Max Clamping
-    # Min: 1GB (1024MB)
-    # Max: 32GB (32768MB) - JVM compressed oops limit
     MIN_HEAP=1024
     MAX_HEAP=32768
+    # NOTE: the 512MB low-RAM OpenSearch heap set above is raised to MIN_HEAP (1GB) by this clamp.
+    if [ "$HEAP_SIZE_MB" -lt "$MIN_HEAP" ]; then HEAP_SIZE_MB=$MIN_HEAP; fi
+    if [ "$HEAP_SIZE_MB" -gt "$MAX_HEAP" ]; then HEAP_SIZE_MB=$MAX_HEAP; fi
 
-    if [ "$HEAP_SIZE_MB" -lt "$MIN_HEAP" ]; then
-        echo "⚠️  Calculated Heap (${HEAP_SIZE_MB}MB) is below minimum. Setting to 1GB."
-        HEAP_SIZE_MB=$MIN_HEAP
-    elif [ "$HEAP_SIZE_MB" -gt "$MAX_HEAP" ]; then
-        echo "⚠️  Calculated Heap (${HEAP_SIZE_MB}MB) exceeds 32GB. Clamping to 32GB."
-        HEAP_SIZE_MB=$MAX_HEAP
-    fi
-
     export OPENSEARCH_JAVA_OPTS="-Xms${HEAP_SIZE_MB}m -Xmx${HEAP_SIZE_MB}m"
-    echo "✅ Auto-Allocated OpenSearch Heap: $OPENSEARCH_JAVA_OPTS"
+    echo "✅ Allocated OpenSearch Heap: $OPENSEARCH_JAVA_OPTS"
 
-    # --- Logstash Memory (25% of Host RAM) ---
+    # --- Logstash Memory ---
     if [ -n "$LOGSTASH_JAVA_OPTS" ]; then
         echo "Using Logstash Memory Override from .env: $LOGSTASH_JAVA_OPTS"
     else
-        LS_HEAP_SIZE_MB=$((TOTAL_MEM_MB / 4))
-        # Clamp Logstash: Min 1GB, Max 8GB
-        MIN_LS=1024
+        # Clamp Logstash
+        MIN_LS=512
         MAX_LS=8192
-
-        if [ "$LS_HEAP_SIZE_MB" -lt "$MIN_LS" ]; then
-             LS_HEAP_SIZE_MB=$MIN_LS
-        elif [ "$LS_HEAP_SIZE_MB" -gt "$MAX_LS" ]; then
-             LS_HEAP_SIZE_MB=$MAX_LS
-        fi
+        if [ "$LS_HEAP_SIZE_MB" -lt "$MIN_LS" ]; then LS_HEAP_SIZE_MB=$MIN_LS; fi
+        if [ "$LS_HEAP_SIZE_MB" -gt "$MAX_LS" ]; then LS_HEAP_SIZE_MB=$MAX_LS; fi
+        
         export LOGSTASH_JAVA_OPTS="-Xms${LS_HEAP_SIZE_MB}m -Xmx${LS_HEAP_SIZE_MB}m"
-        echo "✅ Auto-Allocated Logstash Heap:  $LOGSTASH_JAVA_OPTS"
+        echo "✅ Allocated Logstash Heap:  $LOGSTASH_JAVA_OPTS"
     fi
 }
 
-
 # Function: Generate Dynamic Configuration Files
 generate_configs() {
     echo "--- Generating Dynamic Configurations ---"
@@ -154,6 +591,42 @@
     echo "\"$AMP_DB_USER\" \"md5$MD5_APP\"" >> "$USERLIST"
     
     echo "✅ Generated $USERLIST"
+
+    # Generate HAProxy Config
+    HAPROXY_DIR="$SERVICES_DIR/haproxy"
+    HP_TEMPLATE="$HAPROXY_DIR/haproxy.cfg.template"
+    HP_OUTPUT="$HAPROXY_DIR/haproxy.cfg"
+
+    if [ -f "$HP_TEMPLATE" ]; then
+        echo "Generating haproxy.cfg from template..."
+        
+        # 1. Copy the template without its hardcoded backend lines.
+        # Any line containing "server s" (e.g. "server s1 ...") is dropped,
+        # so the dynamic list appended below is the only set of backends.
+        grep -v "server s" "$HP_TEMPLATE" > "$HP_OUTPUT"
+        
+        # 2. Append Dynamic Server List
+        # We try to get nodes from Swarm, if active. If not, use DB_HOST envs as fallback.
+        
+        echo "    # Dynamic Backends" >> "$HP_OUTPUT"
+        
+        if docker info 2>/dev/null | grep -q "Swarm: active"; then
+            # Get IP addresses of all nodes
+            NODES_IPS=$(docker node inspect $(docker node ls -q) --format '{{.Status.Addr}}' | sort)
+            idx=1
+            for ip in $NODES_IPS; do
+                echo "    server s${idx} ${ip}:5433" >> "$HP_OUTPUT"
+                idx=$((idx + 1))
+            done
+        else
+            # Fallback for non-swarm or pre-init (uses .env vars if available)
+            if [ -n "$DB_HOST_1" ]; then echo "    server s1 ${DB_HOST_1}:5433" >> "$HP_OUTPUT"; fi
+            if [ -n "$DB_HOST_2" ]; then echo "    server s2 ${DB_HOST_2}:5433" >> "$HP_OUTPUT"; fi
+            if [ -n "$DB_HOST_3" ]; then echo "    server s3 ${DB_HOST_3}:5433" >> "$HP_OUTPUT"; fi
+        fi
+
+        echo "✅ Generated $HP_OUTPUT"
+    fi
 }
 
 cleanup() {
@@ -173,11 +646,18 @@
 }
 
 init_swarm() {
+    ADVERTISE_ADDR="$1"  # Optional: IP address to advertise
+    
     if docker info | grep -q "Swarm: active"; then
         echo "✅ Swarm is already active."
     else
         echo "Initializing Docker Swarm..."
-        docker swarm init
+        if [ -n "$ADVERTISE_ADDR" ]; then
+            echo "Using advertise address: $ADVERTISE_ADDR"
+            docker swarm init --advertise-addr "$ADVERTISE_ADDR"
+        else
+            docker swarm init
+        fi
     fi
 
     # Label this node as storage node for pinning
@@ -203,13 +683,15 @@
     # Format: "service_name:upstream_image"
     IMAGES=(
         "opensearch:opensearchproject/opensearch:2.11.0"
-        "timescaledb:timescale/timescaledb:latest-pg16"
+        "timescaledb:custom_build" # Marker for custom build
         "pgbouncer:edoburu/pgbouncer:latest"
         "grafana:grafana/grafana:latest"
         "telegraf:telegraf:latest"
         "logstash:opensearchproject/logstash-oss-with-opensearch-output-plugin:latest"
         "nginx:nginx:alpine"
         "opensearch-dashboards:opensearchproject/opensearch-dashboards:2.11.0"
+        "etcd:quay.io/coreos/etcd:v3.5.11"
+        "haproxy:haproxy:alpine"
     )
 
     for INFO in "${IMAGES[@]}"; do
@@ -218,11 +700,16 @@
         LOCAL_TAG="$REGISTRY/amp/$SERVICE_NAME:latest"
 
         echo "Processing $SERVICE_NAME..."
-        echo "  - Pulling $UPSTREAM"
-        docker pull -q $UPSTREAM
         
-        echo "  - Tagging as $LOCAL_TAG"
-        docker tag $UPSTREAM $LOCAL_TAG
+        if [ "$SERVICE_NAME" == "timescaledb" ]; then
+            echo "  - Building Custom Image from services/timescaledb/Dockerfile.patroni..."
+            docker build -t $LOCAL_TAG -f "$SERVICES_DIR/timescaledb/Dockerfile.patroni" "$SERVICES_DIR/timescaledb"
+        else
+            echo "  - Pulling $UPSTREAM"
+            docker pull -q $UPSTREAM
+            echo "  - Tagging as $LOCAL_TAG"
+            docker tag $UPSTREAM $LOCAL_TAG
+        fi
         
         echo "  - Pushing to Registry..."
         docker push -q $LOCAL_TAG
@@ -245,67 +732,97 @@
         fi
     }
 
-    # Passwords from .env
     create_secret_if_missing "opensearch_initial_admin_password" "${OPENSEARCH_INITIAL_ADMIN_PASSWORD:-admin}"
     create_secret_if_missing "pg_password" "${POSTGRES_PASSWORD:-postgres}"
     create_secret_if_missing "opensearch_jwt_secret" "${OPENSEARCH_JWT_SECRET:-supersecretjwtkey}"
 }
 
-deploy_stack() {
-    # Detect Host IP for Grafana Domain if not set
-    detect_host_ip() {
-        if [ -n "$AMP_DOMAIN_OR_IP" ]; then
-            echo "Using configured AMP_DOMAIN_OR_IP: $AMP_DOMAIN_OR_IP"
-            return
-        fi
-
-        echo "--- Detecting Host IP ---"
-        if [[ "$OSTYPE" == "darwin"* ]]; then
-             # macOS
-             HOST_IP=$(ipconfig getifaddr en0)
-             if [ -z "$HOST_IP" ]; then HOST_IP=$(ipconfig getifaddr en1); fi
+create_cert_configs() {
+    echo "--- Creating Certificate Configs from Volume ---"
+    
+    # Extract certs from volume to temp location
+    TEMP_CERTS="/tmp/amp_certs_$$"
+    mkdir -p "$TEMP_CERTS"
+    
+    # Copy certs from volume using a temporary container
+    docker run --rm -v certs-vol:/certs -v "$TEMP_CERTS:/output" busybox sh -c "cp -r /certs/* /output/"
+    
+    # Create Docker configs for each certificate file
+    for cert_file in root-ca.pem node.pem node-key.pem admin.pem admin-key.pem; do
+        CONFIG_NAME="cert_${cert_file//./_}"  # Static name: cert_root-ca_pem
+        
+        if [ -f "$TEMP_CERTS/$cert_file" ]; then
+             # Check if config exists (Exact match using inspect)
+             if docker config inspect "$CONFIG_NAME" >/dev/null 2>&1; then
+                 echo "ℹ️  Config $CONFIG_NAME already exists (skipping)."
+             else
+                 # Create new
+                 docker config create "$CONFIG_NAME" "$TEMP_CERTS/$cert_file" >/dev/null
+                 echo "✅ Created config: $CONFIG_NAME"
+             fi
         else
-             # Linux (hostname -I | first ip)
-             HOST_IP=$(hostname -I | awk '{print $1}')
+            echo "⚠️  Warning: $cert_file not found in certs volume"
         fi
+    done
+    
+    # Cleanup
+    rm -rf "$TEMP_CERTS"
+    echo "✅ Certificate configs updated."
+}
 
-        if [ -z "$HOST_IP" ]; then
-             echo "⚠️  Could not auto-detect IP. Defaulting to localhost."
-             HOST_IP="localhost"
-        fi
 
-        export AMP_DOMAIN_OR_IP="$HOST_IP"
-        echo "✅ Auto-Detected Host IP: $AMP_DOMAIN_OR_IP"
-    }
+# Function: Check and Start Local Registry
+check_and_start_registry() {
+    echo "--- Checking Local Registry ---"
+    if [ ! "$(docker ps -q -f name=registry)" ]; then
+        if [ "$(docker ps -aq -f name=registry)" ]; then
+            echo "🔄 Registry container found but stopped. Starting..."
+            docker start registry
+        else
+            echo "🚀 Starting new local registry container..."
+            # Try to use the local registry image if available, else pull
+            if [[ "$(docker images -q registry:2 2> /dev/null)" == "" ]]; then
+                 echo "⚠️  Image registry:2 not found locally. Attempting pull..."
+            fi
+            docker run -d -p 5000:5000 --restart=always --name registry registry:2
+        fi
+    else
+        echo "✅ Local registry is running."
+    fi
+}
 
+deploy_stack() {
     check_swarm
-    create_secrets
-    detect_host_ip
-    create_secrets
-    calculate_heap_size
-    generate_configs
     
-    # Allow user to override REGISTRY for multi-node (e.g. export REGISTRY=192.168.1.5:5000)
-    export REGISTRY=${REGISTRY:-127.0.0.1:5000}
-    echo "--- Deploying Stack: $STACK_NAME (Registry: $REGISTRY) ---"
+    # Ensure Registry is Up
+    check_and_start_registry
     
-    # Create external volumes if missing
-    CERTS_VOL_CREATED=false
+    echo "--- Preparing Deployment ---"
+
+    # 1. Ensure Volumes Exist
     for vol in certs-vol security-config-vol; do
         if ! docker volume ls | grep -q "$vol"; then
-            docker volume create $vol
-            echo "Created volume: $vol"
-            if [ "$vol" == "certs-vol" ]; then
-                CERTS_VOL_CREATED=true
-            fi
+            docker volume create $vol >/dev/null
+            echo "✅ Created volume: $vol"
         fi
     done
-    
-    if $CERTS_VOL_CREATED; then
-        echo "⚠️  Volume 'certs-vol' was just created and is empty."
-        echo "⚠️  You MUST run './manage_swarm.sh setup' to generate certificates, otherwise services will fail."
+
+    # 2. Check if Certificates Exist
+    echo "Checking for existing certificates..."
+    if docker run --rm -v certs-vol:/certs busybox sh -c '[ -z "$(ls -A /certs)" ]'; then
+        echo "⚠️  Certificates not found in 'certs-vol'. Auto-running setup..."
+        run_setup
+    else
+        echo "✅ Certificates found."
     fi
 
+    # 3. Proceed with Configuration
+    create_secrets
+    create_cert_configs
+    calculate_heap_size
+    generate_configs
+    
+    echo "--- Deploying Stack: $STACK_NAME (Registry: $REGISTRY) ---"
     docker stack deploy -c "$STACK_FILE" $STACK_NAME
 }
 
@@ -344,12 +861,24 @@
 
 security_init() {
     echo "--- Initializing OpenSearch Security Index ---"
-    # Find running opensearch container (exclude dashboards)
-    # We use --format to check names but output ID
-    CONTAINER_ID=$(docker ps --format '{{.ID}} {{.Names}}' | grep "amp_opensearch" | grep -v "dashboards" | awk '{print $1}' | head -n 1)
+    echo "Waiting for OpenSearch container to start..."
+    local retries=30
+    local count=0
+    local CONTAINER_ID=""
     
+    until [ -n "$CONTAINER_ID" ] || [ $count -ge $retries ]; do
+         CONTAINER_ID=$(docker ps --format '{{.ID}} {{.Names}}' | grep "amp_opensearch" | grep -v "dashboards" | awk '{print $1}' | head -n 1)
+         if [ -n "$CONTAINER_ID" ]; then
+             break
+         fi
+         echo "  [$count/$retries] Waiting for container..."
+         sleep 5
+         count=$((count + 1))
+    done
+
     if [ -z "$CONTAINER_ID" ]; then
-        echo "❌ OpenSearch container not found. Please run 'deploy' and wait for it to start." 
+        echo "❌ OpenSearch container not found after waiting." 
+        echo "Please check 'docker service ls' to ensure 'amp_opensearch' is running."
         exit 1
     fi
 
@@ -366,11 +895,11 @@
       chmod +x /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh && \
       /usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh \
       -cd /tmp/sec-config \
-      -icl \
       -nhnv \
       --accept-red-cluster \
       -h localhost \
       -p 9200 \
+      -icl \
       -cacert /usr/share/opensearch/config/certs/root-ca.pem \
       -cert /usr/share/opensearch/config/certs/admin.pem \
       -key /usr/share/opensearch/config/certs/admin-key.pem
@@ -405,7 +934,7 @@
     docker run --rm --name amp-configurator-ephemeral \
         --network=${STACK_NAME}_amp-overlay \
         -e OPENSEARCH_INITIAL_ADMIN_PASSWORD="${OPENSEARCH_INITIAL_ADMIN_PASSWORD:-admin}" \
-        -e OPENSEARCH_URL="https://opensearch:9200" \
+        -e OPENSEARCH_URL="https://${AMP_DOMAIN_OR_IP}:9200" \
         -e OPENSEARCH_DASHBOARDS_URL="https://opensearch-dashboards:5601" \
         -v "certs-vol:/usr/share/opensearch/config/certs:ro" \
         -v "$SERVICES_DIR/setup:/setup:ro" \
@@ -423,11 +952,102 @@
     echo "✅ Configurator finished."
 }
 
+create_grafana_db() {
+    echo "--- Initializing Grafana Database in HA Postgres ---"
+    
+    # Find any running TimescaleDB container to use as a psql client
+    TS_CONTAINER=$(docker ps -q -f name=amp_timescaledb | head -n 1)
+    
+    if [ -z "$TS_CONTAINER" ]; then
+        echo "❌ No running TimescaleDB container found. Is the stack deployed?"
+        exit 1
+    fi
+    
+    echo "Using container $TS_CONTAINER as psql client..."
+    
+    # Get Password
+    if [ -f "services/timescaledb/patroni.yml" ]; then
+         # TODO: read the password from patroni.yml; for now both branches use the same default
+         PG_PASS="Arr@y2050"
+    else
+         PG_PASS="Arr@y2050"
+    fi
+    
+    # Create User and DB on the Postgres Leader.
+    # Port 5433 is the per-node Postgres port used as the HAProxy backend (HAProxy itself binds 5432),
+    # so this connects to the node at ${AMP_DOMAIN_OR_IP} directly and requires that node to be the Leader.
+    docker exec -e PGPASSWORD="$PG_PASS" "$TS_CONTAINER" psql -U postgres -h "${AMP_DOMAIN_OR_IP}" -p 5433 -c "CREATE USER grafana WITH PASSWORD '$PG_PASS';" || echo "User likely exists."
+    docker exec -e PGPASSWORD="$PG_PASS" "$TS_CONTAINER" psql -U postgres -h "${AMP_DOMAIN_OR_IP}" -p 5433 -c "CREATE DATABASE grafana OWNER grafana;" || echo "DB likely exists."
+    
+    echo "✅ Grafana Database Initialized."
+}
+
+configure_registry() {
+    # If an IP is provided as an argument, use it. Otherwise, auto-detect.
+    if [ -n "$1" ]; then
+        export AMP_DOMAIN_OR_IP="$1"
+        echo "Using Provided Registry IP: $AMP_DOMAIN_OR_IP"
+    else
+        detect_host_ip
+    fi
+
+    local REGISTRY_URL="${AMP_DOMAIN_OR_IP}:5000"
+    local DAEMON_JSON="/etc/docker/daemon.json"
+    
+    echo "--- Configuring Insecure Registry: $REGISTRY_URL ---"
+    
+    # Use Python to safely update JSON (Avoids dependency on jq)
+    python3 -c "
+import json
+import os
+import sys
+
+registry_url = '$REGISTRY_URL'
+daemon_file = '$DAEMON_JSON'
+
+data = {}
+if os.path.exists(daemon_file):
+    try:
+        with open(daemon_file, 'r') as f:
+            content = f.read()
+            if content.strip():
+                data = json.loads(content)
+    except json.JSONDecodeError:
+        print(f'⚠️  Warning: {daemon_file} contains invalid JSON. Overwriting.')
+        data = {}
+
+insecure_registries = data.get('insecure-registries', [])
+if registry_url not in insecure_registries:
+    insecure_registries.append(registry_url)
+    data['insecure-registries'] = insecure_registries
+    with open(daemon_file, 'w') as f:
+        json.dump(data, f, indent=2)
+    print(f'✅ Added {registry_url} to {daemon_file}')
+else:
+    print(f'ℹ️  {registry_url} already exists in {daemon_file}')
+"
+
+    if [ $? -eq 0 ]; then
+        echo "Restarting Docker Daemon..."
+        systemctl restart docker
+        echo "✅ Docker Restarted."
+    else
+        echo "❌ Failed to update registry config."
+        exit 1
+    fi
+}
+
 case $ACTION in
     init)
-        init_swarm
+        init_swarm "$2"  # Pass optional advertise-addr as second argument
         ensure_registry
         ;;
+    config_registry)
+        configure_registry "$2"
+        ;;
+    create_grafana_db)
+        create_grafana_db
+        ;;
     build)
         build_and_push
         ;;
@@ -444,8 +1064,19 @@
         security_init
         ;;
     deploy)
+        if [ "$2" == "--auto" ]; then
+             auto_configure
+        fi
         deploy_stack
         ;;
+    auto-config)
+        auto_configure
+        ;;
+    vip)
+        # Pass all remaining arguments to setup_vip
+        shift
+        setup_vip "$@"
+        ;;
     rm|remove)
         rm_stack
         ;;
@@ -453,6 +1084,7 @@
         docker stack services $STACK_NAME
         ;;
     *)
-        echo "Usage: $0 {init|build|setup|deploy|security_init|status|configurator|rm}"
+        echo "Usage: $0 {init|config_registry|create_grafana_db|build|setup|deploy [--auto]|auto-config|security_init|status|configurator|rm|vip}"
+        echo "  vip commands: ./manage_amp.sh vip --vip <IP> --priority <INT> [--interface <IFACE>]"
         exit 1
 esac
Index: /branches/amp_4_0/platform/tools/container/services/grafana/provisioning/datasources/datasources.yaml
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/grafana/provisioning/datasources/datasources.yaml	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/services/grafana/provisioning/datasources/datasources.yaml	(working copy)
@@ -3,7 +3,7 @@
 datasources:
   - name: PostgreSQL
     type: postgres
-    url: timescaledb:5432
+    url: ${GF_DATABASE_HOST} # Using shared env var for host:port (via HAProxy at 5432)
     user: ${DS_POSTGRES_USER}
     secureJsonData:
       password: ${DS_POSTGRES_PASSWORD}
@@ -16,7 +16,8 @@
 
   - name: TimescaleDB
     type: postgres
-    url: timescaledb:5432
+    url: ${GF_DATABASE_HOST}
+
     user: ${DS_AMP_DB_USER}
     secureJsonData:
       password: ${DS_AMP_DB_PASSWORD}
@@ -29,7 +30,7 @@
   - name: OpenSearch
     type: grafana-opensearch-datasource
     access: proxy
-    url: https://opensearch:9200
+    url: ${DS_OS_URL}
     basicAuth: true
     basicAuthUser: ${DS_OS_USER}
     basicAuthPassword: ${DS_OS_PASSWORD}
Index: /branches/amp_4_0/platform/tools/container/services/haproxy/haproxy.cfg.template
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/haproxy/haproxy.cfg.template	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/haproxy/haproxy.cfg.template	(working copy)
@@ -0,0 +1,20 @@
+global
+    maxconn 100
+
+defaults
+    log global
+    mode tcp
+    retries 2
+    timeout client 30m
+    timeout connect 4s
+    timeout server 30m
+    timeout check 5s
+
+# resolvers docker REMOVED (Host Networking)
+
+listen postgres
+    bind *:5432
+    option httpchk GET /master
+    http-check expect status 200
+    default-server inter 2s fall 3 rise 1 on-marked-down shutdown-sessions check port 8008
+
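
Note on the template: it ends at `default-server` and has no `server` lines, so the backends have to be appended when `haproxy.cfg` is generated. How `manage_amp.sh` does this is not shown in this part of the diff. The sketch below is one possible rendering, assuming the hosts come from `DB_HOST_1..3` and that PostgreSQL listens on 5433 as in `patroni.yml`. The function name and the `patroniN` server names are hypothetical.

```python
def render_server_lines(hosts, pg_port=5433):
    """Return HAProxy 'server' lines, one per Patroni host.

    Health-check settings (check port 8008, inter, fall, rise) are inherited
    from the template's default-server line, so only name and address are needed.
    """
    return "\n".join(
        f"    server patroni{i} {host}:{pg_port}"
        for i, host in enumerate(hosts, start=1)
    )


if __name__ == "__main__":
    print(render_server_lines(["192.168.1.11", "192.168.1.12", "192.168.1.13"]))
```

Because the check is `GET /master` on port 8008, only the node that Patroni reports as primary passes the health check, so connections to port 5432 go to the current primary.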
Index: /branches/amp_4_0/platform/tools/container/services/logstash/pipeline/syslog.conf
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/logstash/pipeline/syslog.conf	(revision 2855)
+++ /branches/amp_4_0/platform/tools/container/services/logstash/pipeline/syslog.conf	(working copy)
@@ -270,7 +270,7 @@
         jdbc_streaming {
           jdbc_driver_library => "/usr/share/logstash/drivers/postgresql.jar"
           jdbc_driver_class => "org.postgresql.Driver"
-          jdbc_connection_string => "jdbc:postgresql://timescaledb:5432/amp_ts"
+          jdbc_connection_string => "jdbc:postgresql://${DB_HOST}/amp_ts"
           jdbc_user => "amp_ts_user"
           jdbc_password => "Array@123$"
           statement => "SELECT name, type, device_group FROM device WHERE ip_address = :device_ip"
@@ -308,7 +308,7 @@
 
 output {
   opensearch {
-    hosts => ["https://opensearch:9200"]
+    hosts => ["${OPENSEARCH_URL}"]
     index => "acm-%{+YYYY.MM.dd}"
     user => "admin"
     password => "${OPENSEARCH_INITIAL_ADMIN_PASSWORD}"
Index: /branches/amp_4_0/platform/tools/container/services/nginx/conf.d/app.conf
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/nginx/conf.d/app.conf	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/services/nginx/conf.d/app.conf	(working copy)
@@ -101,8 +101,8 @@
 
     # Grafana at /monitoring/
     location /monitoring/ {
-        set $grafana "http://grafana:3000";
-        proxy_pass $grafana;  # No trailing slash!
+        # set $grafana "http://host.docker.internal:3000";
+        proxy_pass http://host.docker.internal:3000; # Static pass allows system resolver to read /etc/hosts
         proxy_http_version 1.1;
         proxy_set_header Host $host;
         proxy_set_header X-Real-IP $remote_addr;
Index: /branches/amp_4_0/platform/tools/container/services/opensearch/entrypoint_wrapper.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/opensearch/entrypoint_wrapper.sh	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/opensearch/entrypoint_wrapper.sh	(working copy)
@@ -0,0 +1,110 @@
+#!/bin/bash
+set -e
+set -x # Enable Debug Mode to see exactly what fails
+
+echo "--- Starting OpenSearch Entrypoint Wrapper (Debug v2 - No XARGS) ---"
+echo "Hostname: $(hostname)"
+echo "Path: $PATH"
+
+# Accumulate all possible IPs found
+CANDIDATE_IPS=""
+
+# Method 1: 'hostname -I' (Capital I)
+if command -v hostname >/dev/null 2>&1; then
+    IPS=$(hostname -I 2>/dev/null || true)
+    CANDIDATE_IPS="$CANDIDATE_IPS $IPS"
+fi
+
+# Method 2: 'hostname -i' (Lowercase i)
+if command -v hostname >/dev/null 2>&1; then
+    IPS=$(hostname -i 2>/dev/null || true)
+    CANDIDATE_IPS="$CANDIDATE_IPS $IPS"
+fi
+
+# Method 3: 'getent hosts'
+if command -v getent >/dev/null 2>&1; then
+    IPS=$(getent hosts "$(hostname)" 2>/dev/null | awk '{print $1}' || true)
+    CANDIDATE_IPS="$CANDIDATE_IPS $IPS"
+fi
+
+# Method 4: Parse /proc/net/fib_trie (No tools needed, works on most kernels)
+if [ -f /proc/net/fib_trie ]; then
+    # Look for lines like "|-- 192.168.x.x" followed by "/32 host LOCAL"
+    IPS=$(grep -B1 "host LOCAL" /proc/net/fib_trie | grep -oP '(?<= \|-- )\d+(\.\d+){3}' || true)
+    CANDIDATE_IPS="$CANDIDATE_IPS $IPS"
+fi
+
+echo "Debug: Candidate IPs found: $CANDIDATE_IPS"
+
+# Select Best IP
+SELECTED_IP=""
+
+# Convert comma-separated string to space-separated for easier matching
+if [ -n "$DISCOVERY_SEED_HOSTS" ]; then
+    SEED_LIST=$(echo "$DISCOVERY_SEED_HOSTS" | tr ',' ' ')
+else
+    SEED_LIST=""
+fi
+
+echo "Debug: Discovery Seeds: $SEED_LIST"
+
+# Pass 1: Look for an IP that is strictly in the Seed List (Best Match)
+for ip in $CANDIDATE_IPS; do
+    if [ -z "$ip" ]; then continue; fi
+    # Skip loopback
+    if [[ "$ip" == 127.* ]]; then continue; fi
+
+    for seed in $SEED_LIST; do
+        if [[ "$ip" == "$seed" ]]; then
+            echo "✅ Found IP in Seed List: $ip"
+            SELECTED_IP="$ip"
+            break 2
+        fi
+    done
+done
+
+# Pass 2: If no seed match, use heuristic (192.168.x.x)
+if [ -z "$SELECTED_IP" ]; then
+    echo "⚠️  No IP matched Seed List. Using Heuristic."
+    for ip in $CANDIDATE_IPS; do
+        if [ -z "$ip" ]; then continue; fi
+        if [[ "$ip" == 127.* ]]; then continue; fi
+        
+        # Prioritize 192.168.
+        if [[ "$ip" == 192.168.* ]]; then
+            # No deny-list of known-bad addresses exists yet,
+            # so take the first 192.168.x.x candidate.
+            SELECTED_IP="$ip"
+            break
+        fi
+        
+        # Otherwise remember the first address outside 172.17.x/172.18.x as a fallback and keep looking
+        if [ -z "$SELECTED_IP" ]; then
+             if [[ "$ip" != 172.17.* ]] && [[ "$ip" != 172.18.* ]]; then
+                 SELECTED_IP="$ip"
+             fi
+        fi
+    done
+fi
+
+# Last Resort Fallback (if only docker ip was found)
+if [ -z "$SELECTED_IP" ]; then
+    for ip in $CANDIDATE_IPS; do
+        if [[ "$ip" != 127.* ]]; then
+             SELECTED_IP="$ip"
+             break
+        fi
+    done
+fi
+
+if [ -n "$SELECTED_IP" ]; then
+    export OPENSEARCH_PUBLISH_HOST="$SELECTED_IP"
+    echo "✅ Detected Publish Host: $OPENSEARCH_PUBLISH_HOST"
+else
+    echo "❌ Failed to detect Host IP. Defaulting to 127.0.0.1"
+    export OPENSEARCH_PUBLISH_HOST="127.0.0.1"
+fi
+
+# Hand over to original entrypoint
+set +x
+exec /usr/share/opensearch/opensearch-docker-entrypoint.sh opensearch

Property changes on: platform/tools/container/services/opensearch/entrypoint_wrapper.sh
___________________________________________________________________
Added: svn:executable
## -0,0 +1 ##
+*
\ No newline at end of property
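
Note on `entrypoint_wrapper.sh`: the publish-host selection runs in three passes (seed-list match, 192.168.x.x heuristic, last resort). The sketch below restates that logic in Python so it can be checked in isolation. It is an illustration only, and `select_publish_host` is a hypothetical name, not something the wrapper calls.

```python
def select_publish_host(candidates, seeds):
    """Pick the address OpenSearch should publish, mirroring the wrapper's passes."""
    # Empty entries and loopback addresses are never selected.
    usable = [ip for ip in candidates if ip and not ip.startswith("127.")]

    # Pass 1: an address that appears verbatim in the discovery seed list.
    seed_set = set(seeds)
    for ip in usable:
        if ip in seed_set:
            return ip

    # Pass 2: prefer the first 192.168.x.x address; otherwise fall back to the
    # first address outside 172.17.x / 172.18.x.
    fallback = None
    for ip in usable:
        if ip.startswith("192.168."):
            return ip
        if fallback is None and not ip.startswith(("172.17.", "172.18.")):
            fallback = ip
    if fallback is not None:
        return fallback

    # Last resort: any non-loopback address, else 127.0.0.1.
    return usable[0] if usable else "127.0.0.1"
```

Matching against the seed list first means that on a multi-homed host the node publishes the address its peers are configured to reach.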
Index: /branches/amp_4_0/platform/tools/container/services/opensearch/opensearch.yml
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/opensearch/opensearch.yml	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/services/opensearch/opensearch.yml	(working copy)
@@ -1,15 +1,13 @@
+# Cluster configuration
 cluster.name: opensearch-cluster
-network.host: 0.0.0.0
+network.host: "0.0.0.0"
+network.publish_host: ${OPENSEARCH_PUBLISH_HOST}
 
-# Bind to all interfaces
-transport.host: 0.0.0.0
+discovery.seed_hosts: ${DISCOVERY_SEED_HOSTS}
+cluster.initial_master_nodes: ${CLUSTER_INITIAL_MASTER_NODES}
 
-# Disable memory lock (swarm usually doesn't allow memlock unless configured)
-bootstrap.memory_lock: false
+node.name: ${NODE_NAME}
 
-# Single node discovery for now (as per stack.yml)
-discovery.type: single-node
-
 # --- Security Configuration ---
 plugins.security.ssl.http.enabled: true
 plugins.security.ssl.http.pemkey_filepath: /usr/share/opensearch/config/certs/node-key.pem
@@ -20,10 +18,20 @@
 plugins.security.ssl.transport.pemkey_filepath: /usr/share/opensearch/config/certs/node-key.pem
 plugins.security.ssl.transport.pemcert_filepath: /usr/share/opensearch/config/certs/node.pem
 plugins.security.ssl.transport.pemtrustedcas_filepath: /usr/share/opensearch/config/certs/root-ca.pem
+plugins.security.ssl.transport.enforce_hostname_verification: false
 
 # Allowed Admin DNs (Fixing the parsing issue)
 plugins.security.authcz.admin_dn:
   - "CN=admin,O=OpenSearchAdmin,L=Bengaluru,ST=Karnataka,C=IN"
 
+plugins.security.nodes_dn:
+  - "CN=node-1,O=OpenSearchNode,L=Bengaluru,ST=Karnataka,C=IN"
+
 plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
 plugins.security.allow_default_init_securityindex: true
+
+plugins.security.disabled: false
+
+# Robust Discovery Timeouts
+discovery.request_peers_timeout: 15s
+transport.connect_timeout: 15s
Index: /branches/amp_4_0/platform/tools/container/services/setup/setup.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/setup/setup.sh	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/services/setup/setup.sh	(working copy)
@@ -44,14 +44,14 @@
     chmod 600 "$OUTPUT_DIR"/*.pem
     chmod 600 "$OUTPUT_DIR"/*-key.pem
     
-    # Critical: Change ownership to 1000:1000 so OpenSearch container (non-root) can access
-    # Since we are likely running as root or a different user in setup, we force the ownership.
-    chown -R 1000:1000 "$OUTPUT_DIR"
-    chown -R 1000:1000 "$CONFIG_DIR"
 else
     log "Certificates exist."
 fi
 
+# Critical: Always ensure ownership is 1000:1000 so OpenSearch container (non-root) can access
+chown -R 1000:1000 "$OUTPUT_DIR"
+chown -R 1000:1000 "$CONFIG_DIR"
+
 # --- 2. Config Generation (config.yml) ---
 CONFIG_FILE="$CONFIG_DIR/config.yml"
 log "Generating config.yml with JWT secret..."
Index: /branches/amp_4_0/platform/tools/container/services/telegraf/telegraf.conf
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/telegraf/telegraf.conf	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/services/telegraf/telegraf.conf	(working copy)
@@ -15,7 +15,7 @@
 
 # === Output: TimescaleDB/PostgreSQL ===
 [[outputs.postgresql]]
-  connection = "host=timescaledb port=5432 user=amp_ts_user password=Array@123$$ dbname=amp_ts sslmode=disable"
+  connection = "host=127.0.0.1 port=5432 user=amp_ts_user password=Array@123$ dbname=amp_ts sslmode=disable"
   schema     = "public"
   # Route metrics to Timescale, configured to drop logs
   namedrop = ["docker_log"]
Index: /branches/amp_4_0/platform/tools/container/services/timescaledb/Dockerfile.patroni
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/timescaledb/Dockerfile.patroni	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/timescaledb/Dockerfile.patroni	(working copy)
@@ -0,0 +1,29 @@
+FROM timescale/timescaledb:latest-pg14
+
+# Install Python and Patroni.
+# Package installation requires root.
+USER root
+# The base image is Alpine-based, so packages are installed with apk.
+# The build dependencies (gcc, musl-dev, python3-dev, postgresql-dev) are needed to compile psycopg2.
+RUN apk add --no-cache python3 py3-pip git gcc musl-dev python3-dev postgresql-dev su-exec
+
+# Create a virtual environment for Patroni
+RUN python3 -m venv /opt/patroni-venv
+ENV PATH="/opt/patroni-venv/bin:$PATH"
+
+# Install Patroni with Etcd3 support
+# psycopg2 is built from source instead of using psycopg2-binary to ensure compatibility with Alpine (musl)
+RUN pip install --no-cache-dir "patroni[etcd3]" psycopg2
+
+# Create config directory
+RUN mkdir -p /etc/patroni
+RUN chown postgres:postgres /etc/patroni
+
+COPY patroni.yml /etc/patroni/patroni.yml
+COPY entrypoint.sh /usr/local/bin/entrypoint.sh
+USER root
+RUN chmod +x /usr/local/bin/entrypoint.sh
+USER postgres
+
+# We override the entrypoint to run Patroni
+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
Index: /branches/amp_4_0/platform/tools/container/services/timescaledb/entrypoint.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/timescaledb/entrypoint.sh	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/timescaledb/entrypoint.sh	(working copy)
@@ -0,0 +1,7 @@
+#!/bin/bash
+export POD_NAME=$(hostname)
+export POD_IP=$(hostname -i)
+
+echo "Starting Patroni on $POD_NAME ($POD_IP)..."
+
+exec /opt/patroni-venv/bin/patroni /etc/patroni/patroni.yml
Index: /branches/amp_4_0/platform/tools/container/services/timescaledb/patroni.yml
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/timescaledb/patroni.yml	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/timescaledb/patroni.yml	(working copy)
@@ -0,0 +1,61 @@
+scope: timescaledb-cluster-stable
+namespace: /db/
+
+name: amp_timescaledb
+
+restapi:
+  listen: 0.0.0.0:8008
+  connect_address: '__HOST_IP__:8008'
+
+# etcd3 hosts are configured via PATRONI_ETCD3_HOSTS env var
+etcd3: {}
+
+bootstrap:
+  post_bootstrap: /etc/patroni/post_bootstrap.sh
+  dcs:
+    ttl: 30
+    loop_wait: 10
+    retry_timeout: 10
+    maximum_lag_on_failover: 1048576
+    postgresql:
+      use_pg_rewind: true
+      parameters:
+        max_connections: 100
+        max_worker_processes: 8
+        wal_level: replica
+        hot_standby: "on"
+        wal_keep_size: 128MB  # PostgreSQL 13+ name for wal_keep_segments (8 segments x 16MB)
+        max_wal_senders: 25
+        max_replication_slots: 25
+        wal_log_hints: "on"
+        # archive_mode: "on"
+        # archive_command: "mkdir -p /var/lib/postgresql/data/archive && test ! -f /var/lib/postgresql/data/archive/%f && cp %p /var/lib/postgresql/data/archive/%f"
+
+  initdb:
+  - encoding: UTF8
+  - data-checksums
+  pg_hba:
+  - host all all 0.0.0.0/0 md5
+  - host replication replica 0.0.0.0/0 md5
+
+postgresql:
+  listen: 0.0.0.0:5433 # Listen on 5433 for Host Networking
+  connect_address: '__HOST_IP__:5433'
+  data_dir: /var/lib/postgresql/data/db
+  pgpass: /tmp/pgpass
+  authentication:
+    replication:
+      username: replica
+      password: '${PATRONI_REPLICATION_PASSWORD}'
+    superuser:
+      username: postgres
+      password: '${PATRONI_SUPERUSER_PASSWORD}'
+      
+  parameters:
+    unix_socket_directories: '.'
+
+tags:
+  nofailover: false
+  noloadbalance: false
+  clonefrom: false
+  nosync: false
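
Note on `__HOST_IP__`: the placeholder in `patroni.yml` is filled in at container start by the `stack.yml` entrypoint, which reads the source address from `ip route get 1.1.1.1` and substitutes it with `sed`. The sketch below shows the same two steps in Python under those assumptions. Both function names are hypothetical.

```python
from typing import Optional


def parse_src_ip(route_output: str) -> Optional[str]:
    """Return the address following the 'src' keyword in 'ip route get' output."""
    fields = route_output.split()
    # Search for 'src' rather than relying on a fixed field position,
    # since the position shifts when the route has no 'via' gateway.
    for i, field in enumerate(fields[:-1]):
        if field == "src":
            return fields[i + 1]
    return None


def render_patroni_config(template: str, host_ip: str) -> str:
    """Replace every __HOST_IP__ placeholder with the detected address."""
    return template.replace("__HOST_IP__", host_ip)


if __name__ == "__main__":
    sample = "1.1.1.1 via 192.168.1.1 dev eth0 src 192.168.1.5 uid 0"
    ip = parse_src_ip(sample)
    print(render_patroni_config("connect_address: '__HOST_IP__:8008'", ip))
```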
Index: /branches/amp_4_0/platform/tools/container/services/timescaledb/post_bootstrap.sh
===================================================================
--- /branches/amp_4_0/platform/tools/container/services/timescaledb/post_bootstrap.sh	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/services/timescaledb/post_bootstrap.sh	(working copy)
@@ -0,0 +1,100 @@
+#!/bin/bash
+set -e
+
+# Patroni runs this script on the bootstrapping node after the cluster is initialized.
+# psql reads its connection settings from the standard libpq variables exported below.
+export PGPORT=5433
+export PGHOST=127.0.0.1
+# Authenticate with the superuser password that Patroni was configured with.
+export PGPASSWORD="${PATRONI_SUPERUSER_PASSWORD}"
+
+echo "--- Post-Bootstrap: User Creation Started ---"
+
+# 1. Create Grafana User and Database
+GRAFANA_USER="grafana"
+# Use provided password or default
+GRAFANA_PASSWORD="${GRAFANA_PASSWORD:-GArr@y2050}"
+GRAFANA_DB="grafana"
+
+# Check if user exists (idempotency mainly for manual reruns, normally bootstrap runs once)
+if psql -U postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='$GRAFANA_USER'" | grep -q 1; then
+    echo "User $GRAFANA_USER already exists."
+else
+    echo "Creating user $GRAFANA_USER..."
+    psql -U postgres -c "CREATE USER $GRAFANA_USER WITH PASSWORD '$GRAFANA_PASSWORD';"
+fi
+
+if psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='$GRAFANA_DB'" | grep -q 1; then
+    echo "Database $GRAFANA_DB already exists."
+else
+    echo "Creating database $GRAFANA_DB..."
+    psql -U postgres -c "CREATE DATABASE $GRAFANA_DB OWNER $GRAFANA_USER;"
+fi
+
+
+# 2. Create AMP Application User and Database
+# These variables should be passed into the container
+AMP_USER="${AMP_DB_USER:-amp_ts_user}"
+AMP_PASS="${AMP_DB_PASSWORD:-Array@123\$}"
+AMP_DB="${AMP_DB_NAME:-amp_ts}"
+
+if psql -U postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='$AMP_USER'" | grep -q 1; then
+    echo "User $AMP_USER already exists."
+else
+    echo "Creating user $AMP_USER..."
+    psql -U postgres -c "CREATE USER $AMP_USER WITH PASSWORD '$AMP_PASS';"
+fi
+
+if psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='$AMP_DB'" | grep -q 1; then
+    echo "Database $AMP_DB already exists."
+else
+    echo "Creating database $AMP_DB..."
+    psql -U postgres -c "CREATE DATABASE $AMP_DB OWNER $AMP_USER;"
+fi
+
+# 3. Create Admin User and CM Database
+# Corresponds to pgbouncer configuration
+ADMIN_USER="amp_admin"
+ADMIN_PASS="${PATRONI_SUPERUSER_PASSWORD}" # Same as postgres superuser for simplicity in this setup
+CM_DB="cm"
+
+if psql -U postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='$ADMIN_USER'" | grep -q 1; then
+    echo "User $ADMIN_USER already exists."
+else
+    echo "Creating user $ADMIN_USER..."
+    psql -U postgres -c "CREATE USER $ADMIN_USER WITH PASSWORD '$ADMIN_PASS';"
+fi
+
+if psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='$CM_DB'" | grep -q 1; then
+    echo "Database $CM_DB already exists."
+else
+    echo "Creating database $CM_DB..."
+    psql -U postgres -c "CREATE DATABASE $CM_DB OWNER $ADMIN_USER;"
+    
+    # Run init_db.sql against cm
+    if [ -f /docker-entrypoint-initdb.d/init_db.sql ]; then
+        echo "Running init_db.sql against $CM_DB..."
+        # init_db.sql contains '\c cm', which requires connect privileges, so run it as postgres to be safe.
+        # The database is already owned by $ADMIN_USER via the OWNER clause above; grant object ownership later if needed.
+        psql -U postgres -d "$CM_DB" -f /docker-entrypoint-initdb.d/init_db.sql
+    fi
+fi
+
+# 4. Initialize Tables for AMP_TS (Hypertables)
+echo "Running initialization scripts for $AMP_DB..."
+if [ -f /docker-entrypoint-initdb.d/02_telegraf_snmp.sql ]; then
+    echo "Executing 02_telegraf_snmp.sql..."
+    # The SQL scripts start with 'BEGIN; \c amp_ts', so they switch databases themselves.
+    # psql honors '\c' in files run with -f, so connecting to the default database would also work.
+    # Connecting directly to $AMP_DB with -d is the safer choice:
+    # the session starts in the target database, and the script's '\c amp_ts'
+    # then only reconnects to amp_ts (the default value of $AMP_DB).
+    psql -U postgres -d "$AMP_DB" -f /docker-entrypoint-initdb.d/02_telegraf_snmp.sql
+fi
+
+if [ -f /docker-entrypoint-initdb.d/03_docker_metrics.sql ]; then
+    echo "Executing 03_docker_metrics.sql..."
+    psql -U postgres -d "$AMP_DB" -f /docker-entrypoint-initdb.d/03_docker_metrics.sql
+fi
+
+echo "--- Post-Bootstrap: User Creation and Schema Initialization Completed ---"
Index: /branches/amp_4_0/platform/tools/container/stack.yml
===================================================================
--- /branches/amp_4_0/platform/tools/container/stack.yml	(revision 2857)
+++ /branches/amp_4_0/platform/tools/container/stack.yml	(working copy)
@@ -2,6 +2,9 @@
   amp-overlay:
     driver: overlay
     attachable: true
+  hostnet:
+    external: true
+    name: host
 
 configs:
   nginx_app_conf:
@@ -26,6 +29,29 @@
     file: services/grafana/provisioning/datasources/datasources.yaml
   opensearch_config:
     file: services/opensearch/opensearch.yml
+  haproxy_cfg:
+    file: services/haproxy/haproxy.cfg
+  patroni_yml:
+    file: services/timescaledb/patroni.yml
+  timescaledb_post_bootstrap:
+    file: services/timescaledb/post_bootstrap.sh
+  init_db_sql:
+    file: services/postgres/initdb.d/init_db.sql
+  telegraf_snmp_sql:
+    file: services/postgres/initdb.d/02_telegraf_snmp.sql
+  docker_metrics_sql:
+    file: services/postgres/initdb.d/03_docker_metrics.sql
+  # Certificate configs (created by manage_amp.sh from certs-vol)
+  cert_root-ca_pem:
+    external: true
+  cert_node_pem:
+    external: true
+  cert_node-key_pem:
+    external: true
+  cert_admin_pem:
+    external: true
+  cert_admin-key_pem:
+    external: true
 
 secrets:
   opensearch_initial_admin_password:
@@ -41,51 +67,87 @@
     image: ${REGISTRY:-127.0.0.1:5000}/amp/opensearch:latest
     deploy:
       mode: replicated
-      replicas: 1
+      replicas: 3
+      # endpoint_mode: dnsrr  <-- Not supported/needed with host networking
       placement:
-        constraints:
-          - node.labels.type == storage
+        max_replicas_per_node: 1  # Vital for Host Networking to avoid port conflicts
       restart_policy:
         condition: on-failure
     environment:
-      - bootstrap.memory_lock=false
+      - "bootstrap.memory_lock=false"
       - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
-      - OPENSEARCH_INITIAL_ADMIN_PASSWORD_FILE=/run/secrets/opensearch_initial_admin_password
+      - "OPENSEARCH_INITIAL_ADMIN_PASSWORD_FILE=/run/secrets/opensearch_initial_admin_password"
+      # Host Networking Discovery
+      - "DISCOVERY_SEED_HOSTS=${OPENSEARCH_SEED_HOSTS}"
+      - "CLUSTER_INITIAL_MASTER_NODES=${OPENSEARCH_INITIAL_MASTER_NODES}"
+      - "OPENSEARCH_PUBLISH_HOST=${OPENSEARCH_PUBLISH_HOST}"
+      - "NODE_NAME=node-{{.Task.Slot}}"
+      - "OPENSEARCH_PREFIX=https://${AMP_DOMAIN_OR_IP:-localhost}:9200" # Reference URL that other tools might use
     configs:
       - source: opensearch_config
         target: /usr/share/opensearch/config/opensearch.yml
-    healthcheck:
-      test: ["CMD-SHELL", "curl -f -k -u admin:$$(cat /run/secrets/opensearch_initial_admin_password) https://localhost:9200/_cluster/health || exit 1"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 60s
+      - source: cert_root-ca_pem
+        target: /usr/share/opensearch/config/certs/root-ca.pem
+      - source: cert_node_pem
+        target: /usr/share/opensearch/config/certs/node.pem
+      - source: cert_node-key_pem
+        target: /usr/share/opensearch/config/certs/node-key.pem
+      - source: cert_admin_pem
+        target: /usr/share/opensearch/config/certs/admin.pem
+      - source: cert_admin-key_pem
+        target: /usr/share/opensearch/config/certs/admin-key.pem
     command: /usr/share/opensearch/opensearch-docker-entrypoint.sh opensearch
     secrets:
       - opensearch_initial_admin_password
     volumes:
       - opensearch-data:/usr/share/opensearch/data
-      - certs-vol:/usr/share/opensearch/config/certs:ro
       - security-config-vol:/usr/share/opensearch/config/opensearch-security-mount:ro
       - opensearch-logs:/usr/share/opensearch/logs
     networks:
-      - amp-overlay
+      - hostnet
 
   timescaledb:
     image: ${REGISTRY:-127.0.0.1:5000}/amp/timescaledb:latest
     deploy:
       mode: replicated
-      replicas: 1
+      replicas: 3
       placement:
         constraints:
-          - node.labels.type == storage
+          - node.labels.type != witness # Optional
+        max_replicas_per_node: 1 # CRITICAL for Host Networking
+    user: root # Run as root to fix permissions, then drop to postgres
     healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres}"]
+      test: ["CMD", "patronictl", "list"]
       interval: 10s
       timeout: 5s
       retries: 5
-    ports:
-      - "${AMP_TIMESCALEDB_PORT:-5432}:5432"
+    # Host Networking - Dynamic IP Detection
+    # sh -c computes the host IP at runtime, then execs Patroni.
+    # 'ip route get 1.1.1.1' prints the route to the internet, usually via the main interface.
+    # awk prints the field after 'src' (the source IP); its position depends on the route, so it is looked up by name.
+    entrypoint:
+      - /bin/sh
+      - -c
+      - |
+        umask 0077
+        export MY_IP=$$(ip route get 1.1.1.1 | awk '{for (i = 1; i < NF; i++) if ($$i == "src") { print $$(i + 1); exit }}')
+        echo "✅ Detected Host IP: $$MY_IP"
+        # Fix permissions for the data directory (since volume is root-owned)
+        mkdir -p /var/lib/postgresql/data/db 
+        chown -R postgres:postgres /var/lib/postgresql/data/db
+        chmod 0700 /var/lib/postgresql/data/db
+        
+        cp /etc/patroni_template.yml /tmp/patroni.yml
+        sed -i "s/__HOST_IP__/$$MY_IP/g" /tmp/patroni.yml
+        export PATRONI_POSTGRESQL_CONNECT_ADDRESS="$$MY_IP:5433"
+        export PATRONI_RESTAPI_CONNECT_ADDRESS="$$MY_IP:8008"
+        export POD_IP="$$MY_IP"
+        
+        # Ensure config is readable by postgres user
+        chown postgres:postgres /tmp/patroni.yml
+        
+        # Drop privileges to postgres user and run Patroni
+        exec su-exec postgres patroni /tmp/patroni.yml
     environment:
       - POSTGRES_PASSWORD_FILE=/run/secrets/pg_password
       - POSTGRES_USER=${POSTGRES_USER}
@@ -93,19 +155,130 @@
       - AMP_DB_USER=${AMP_DB_USER}
       - AMP_DB_PASSWORD=${AMP_DB_PASSWORD}
       - AMP_DB_NAME=${AMP_DB_NAME}
+      - GRAFANA_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-GArr@y2050}
+      # Patroni Config
+      - PATRONI_REPLICATION_PASSWORD=${POSTGRES_PASSWORD}
+      - PATRONI_SUPERUSER_PASSWORD=${POSTGRES_PASSWORD}
+      - PATRONI_NAME={{.Task.Name}}
+      - POD_NAME={{.Task.Name}}
+      # POD_IP is not set here; the entrypoint computes it at runtime
+      - PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5433
+      - PATRONI_RESTAPI_LISTEN=0.0.0.0:8008
+      - PATRONI_ETCD3_HOSTS=${PATRONI_ETCD3_HOSTS}
+    configs:
+      - source: patroni_yml
+        target: /etc/patroni_template.yml
+      - source: timescaledb_post_bootstrap
+        target: /etc/patroni/post_bootstrap.sh
+        mode: 0555
+      - source: init_db_sql
+        target: /docker-entrypoint-initdb.d/init_db.sql
+      - source: telegraf_snmp_sql
+        target: /docker-entrypoint-initdb.d/02_telegraf_snmp.sql
+      - source: docker_metrics_sql
+        target: /docker-entrypoint-initdb.d/03_docker_metrics.sql
     secrets:
       - pg_password
     volumes:
+      # Local volume for each instance
       - timescaledb-data:/var/lib/postgresql/data
-      - ./services/postgres/initdb.d:/docker-entrypoint-initdb.d:ro
     networks:
-      - amp-overlay
+      - hostnet
+
+  etcd1:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/etcd:latest
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.hostname == amp-node-1
+    environment:
+      - ETCD_NAME=etcd1
+      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
+      - ETCD_ADVERTISE_CLIENT_URLS=http://${DB_HOST_1}:2379
+      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
+      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${DB_HOST_1}:2380
+      - ETCD_INITIAL_CLUSTER=etcd1=http://${DB_HOST_1}:2380,etcd2=http://${DB_HOST_2}:2380,etcd3=http://${DB_HOST_3}:2380
+      - ETCD_INITIAL_CLUSTER_STATE=new
+      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
+      - ETCD_DATA_DIR=/data
+    volumes:
+      - etcd-data:/data
+    networks:
+      - hostnet
 
+  etcd2:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/etcd:latest
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.hostname == amp-node-2
+    environment:
+      - ETCD_NAME=etcd2
+      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
+      - ETCD_ADVERTISE_CLIENT_URLS=http://${DB_HOST_2}:2379
+      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
+      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${DB_HOST_2}:2380
+      - ETCD_INITIAL_CLUSTER=etcd1=http://${DB_HOST_1}:2380,etcd2=http://${DB_HOST_2}:2380,etcd3=http://${DB_HOST_3}:2380
+      - ETCD_INITIAL_CLUSTER_STATE=new
+      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
+      - ETCD_DATA_DIR=/data
+    volumes:
+      - etcd-data:/data
+    networks:
+      - hostnet
+
+  etcd3:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/etcd:latest
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.hostname == amp-node-3
+    environment:
+      - ETCD_NAME=etcd3
+      - ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
+      - ETCD_ADVERTISE_CLIENT_URLS=http://${DB_HOST_3}:2379
+      - ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
+      - ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${DB_HOST_3}:2380
+      - ETCD_INITIAL_CLUSTER=etcd1=http://${DB_HOST_1}:2380,etcd2=http://${DB_HOST_2}:2380,etcd3=http://${DB_HOST_3}:2380
+      - ETCD_INITIAL_CLUSTER_STATE=new
+      - ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
+      - ETCD_DATA_DIR=/data
+    volumes:
+      - etcd-data:/data
+    networks:
+      - hostnet
+
+  haproxy:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/haproxy:latest
+    deploy:
+      mode: global
+      # replicas: 1
+      # placement:
+        # constraints:
+          # - node.hostname == amp-node-1
+    # ports: REMOVED (Host Networking binds 5432 directly)
+    environment:
+      - DB_HOST_1=${DB_HOST_1}
+      - DB_HOST_2=${DB_HOST_2}
+      - DB_HOST_3=${DB_HOST_3}
+    configs:
+      - source: haproxy_cfg
+        target: /usr/local/etc/haproxy/haproxy.cfg
+    networks:
+      - hostnet
+
   # --- Middleware ---
   pgbouncer:
     image: ${REGISTRY:-127.0.0.1:5000}/amp/pgbouncer:latest
     deploy:
-      replicas: 1
+      mode: global
+      # replicas: 1
+      # placement:
+      #   constraints:
+      #     - node.hostname == amp-node-1 # Must be on same node as HAProxy (which binds 5432 host port)
     environment:
       - AGENT_HOSTNAME=amp-docker-agent
       - HOST_PROC=/host/proc
@@ -114,7 +287,7 @@
       - AMP_DB_NAME=${AMP_DB_NAME:-amp_ts}
       - AMP_DB_USER=${AMP_DB_USER:-amp_ts_user}
       - AMP_DB_PASSWORD=${AMP_DB_PASSWORD:-Array@123$}
-      - DB_HOST=timescaledb
+      - DB_HOST=127.0.0.1
       - DB_NAME=cm
     configs:
       - source: pgbouncer_ini
@@ -122,24 +295,31 @@
       - source: pgbouncer_userlist
         target: /etc/pgbouncer/userlist.txt
     networks:
-      - amp-overlay
+      - hostnet
 
   # --- Frontend / Observability ---
   nginx:
     image: ${REGISTRY:-127.0.0.1:5000}/amp/nginx:latest
     deploy:
-      replicas: 1
-      placement:
-        constraints:
-          - node.role == manager # Often helps to run ingress on manager or explicit edge nodes
+      mode: global
+      # replicas: 1
+      # placement:
+      #   constraints:
+      #     - node.role == manager # Often helps to run ingress on manager or explicit edge nodes
     healthcheck:
       test: ["CMD-SHELL", "wget --spider http://localhost/health || wget --spider http://localhost/ || exit 1"]
       interval: 30s
       timeout: 5s
       retries: 3
     ports:
-      - "${AMP_NGINX_HTTPS_PORT:-443}:443"
-      - "${AMP_NGINX_HTTP_PORT:-80}:80"
+      - target: 443
+        published: ${AMP_NGINX_HTTPS_PORT:-443}
+        protocol: tcp
+        mode: host
+      - target: 80
+        published: ${AMP_NGINX_HTTP_PORT:-80}
+        protocol: tcp
+        mode: host
     configs:
       - source: nginx_app_conf
         target: /etc/nginx/conf.d/app.conf
@@ -156,7 +336,7 @@
     deploy:
       replicas: 1
     environment:
-      OPENSEARCH_HOSTS: '["https://opensearch:9200"]'
+      OPENSEARCH_HOSTS: '["https://${AMP_DOMAIN_OR_IP}:9200"]' # Host IP, not the overlay service name (OpenSearch runs on host networking)
       OPENSEARCH_SSL_VERIFICATIONMODE: certificate
       OPENSEARCH_USERNAME: admin
       OPENSEARCH_SSL_CERTIFICATEAUTHORITIES: '["/usr/share/opensearch-dashboards/config/certs/root-ca.pem"]'
@@ -180,15 +360,20 @@
     image: ${REGISTRY:-127.0.0.1:5000}/amp/grafana:latest
     deploy:
       replicas: 1
-      placement:
-        constraints:
-          - node.labels.type == storage # Pinned for sqlite db
+      restart_policy:
+        condition: on-failure
     healthcheck:
       test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
       interval: 30s
       timeout: 10s
       retries: 3
     environment:
+      - GF_DATABASE_TYPE=postgres
+      - GF_DATABASE_HOST=${AMP_DOMAIN_OR_IP}:5432 # Connects to HAProxy (Host Networking)
+      - GF_DATABASE_NAME=grafana
+      - GF_DATABASE_USER=grafana
+      - GF_DATABASE_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-GArr@y2050}
+      - GF_DATABASE_SSL_MODE=disable
       - GF_LOG_LEVEL=debug
       - GF_SERVER_ROOT_URL=https://${AMP_DOMAIN_OR_IP:-localhost}/monitoring/
       - GF_SERVER_SERVE_FROM_SUB_PATH=true
@@ -210,11 +395,14 @@
       - DS_AMP_DB_PASSWORD=${AMP_DB_PASSWORD:-Array@123$}
       - DS_OS_USER=admin
       - DS_OS_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD:-Arr@y2050}
+      - DS_OS_URL=https://${AMP_DOMAIN_OR_IP}:9200
     configs:
       - source: grafana_datasources
         target: /etc/grafana/provisioning/datasources/datasources.yaml
     volumes:
-      - grafana-data:/var/lib/grafana
+      # Removed local sqlite volume, data is now in Postgres
+      # - grafana-data:/var/lib/grafana
+      - /dev/null:/dev/null # Placeholder to keep yaml valid if empty
     networks:
       - amp-overlay
     ports:
@@ -248,7 +436,7 @@
       - /sys:/rootfs/sys:ro
       - /etc:/rootfs/etc:ro
     networks:
-      - amp-overlay
+      - hostnet
 
   logstash:
     image: ${REGISTRY:-127.0.0.1:5000}/amp/logstash:latest
@@ -258,10 +446,11 @@
       - "514:5514/tcp"
       - "514:5514/udp"
     environment:
-      OPENSEARCH_URL: https://opensearch:9200
+      OPENSEARCH_URL: https://${AMP_DOMAIN_OR_IP:-localhost}:9200 # Clean URL for Logstash
       POSTGRES_PASSWORD_FILE: /run/secrets/pg_password
       POSTGRES_USER: postgres
       POSTGRES_DB: cm
+      DB_HOST: ${AMP_DOMAIN_OR_IP} # For JDBC Filter
     command: >
       bash -c "export OPENSEARCH_INITIAL_ADMIN_PASSWORD=\$$(cat /run/secrets/opensearch_initial_admin_password) && /usr/share/logstash/bin/logstash"
 
@@ -282,6 +471,7 @@
 volumes:
   opensearch-data:
   timescaledb-data:
+  etcd-data:
   grafana-data:
   certs-vol:
     external: true
Index: /branches/amp_4_0/platform/tools/container/stack.yml.template
===================================================================
--- /branches/amp_4_0/platform/tools/container/stack.yml.template	(nonexistent)
+++ /branches/amp_4_0/platform/tools/container/stack.yml.template	(working copy)
@@ -0,0 +1,433 @@
+networks:
+  amp-overlay:
+    driver: overlay
+    attachable: true
+  hostnet:
+    external: true
+    name: host
+
+configs:
+  nginx_app_conf:
+    file: services/nginx/conf.d/app.conf
+  telegraf_main_conf:
+    file: services/telegraf/telegraf.conf
+  telegraf_ag_conf:
+    file: services/telegraf/telegraf.d/ag.toml
+  telegraf_apv_conf:
+    file: services/telegraf/telegraf.d/apv.toml
+  telegraf_asf_conf:
+    file: services/telegraf/telegraf.d/asf.toml
+  logstash_pipeline_conf:
+    file: services/logstash/pipeline/syslog.conf
+  logstash_config_yml:
+    file: services/logstash/config/logstash.yml
+  pgbouncer_ini:
+    file: services/pgbouncer/pgbouncer.ini
+  pgbouncer_userlist:
+    file: services/pgbouncer/userlist.txt
+  grafana_datasources:
+    file: services/grafana/provisioning/datasources/datasources.yaml
+  opensearch_config:
+    file: services/opensearch/opensearch.yml
+  haproxy_cfg:
+    file: services/haproxy/haproxy.cfg
+  patroni_yml:
+    file: services/timescaledb/patroni.yml
+  timescaledb_post_bootstrap:
+    file: services/timescaledb/post_bootstrap.sh
+  init_db_sql:
+    file: services/postgres/initdb.d/init_db.sql
+  telegraf_snmp_sql:
+    file: services/postgres/initdb.d/02_telegraf_snmp.sql
+  docker_metrics_sql:
+    file: services/postgres/initdb.d/03_docker_metrics.sql
+  ${ENTRYPOINT_CONFIG}:
+    file: services/opensearch/entrypoint_wrapper.sh
+  # Certificate configs (created by manage_amp.sh from certs-vol)
+  cert_root-ca_pem:
+    external: true
+  cert_node_pem:
+    external: true
+  cert_node-key_pem:
+    external: true
+  cert_admin_pem:
+    external: true
+  cert_admin-key_pem:
+    external: true
+
+secrets:
+  opensearch_initial_admin_password:
+    external: true
+  pg_password:
+    external: true
+  opensearch_jwt_secret:
+    external: true
+
+services:
+  # --- Core Storage (replicated, at most one instance per node) ---
+  opensearch:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/opensearch:latest
+    deploy:
+      mode: replicated
+      replicas: ${AMP_REPLICAS:-3}
+
+      placement:
+        max_replicas_per_node: 1  # Vital for Host Networking to avoid port conflicts
+      restart_policy:
+        condition: on-failure
+    environment:
+      - "bootstrap.memory_lock=false"
+      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
+      - "OPENSEARCH_INITIAL_ADMIN_PASSWORD_FILE=/run/secrets/opensearch_initial_admin_password"
+      # Host Networking Discovery
+      - "DISCOVERY_SEED_HOSTS=${OPENSEARCH_SEED_HOSTS}"
+      - "CLUSTER_INITIAL_MASTER_NODES=${OPENSEARCH_INITIAL_MASTER_NODES}"
+      - "OPENSEARCH_PUBLISH_HOST=${OPENSEARCH_PUBLISH_HOST}"
+      - "NODE_NAME=node-{{.Task.Slot}}"
+      - "OPENSEARCH_PREFIX=https://${AMP_DOMAIN_OR_IP:-localhost}:9200" # For reference/other tools might use?
+    configs:
+      - source: opensearch_config
+        target: /usr/share/opensearch/config/opensearch.yml
+      - source: cert_root-ca_pem
+        target: /usr/share/opensearch/config/certs/root-ca.pem
+      - source: cert_node_pem
+        target: /usr/share/opensearch/config/certs/node.pem
+      - source: cert_node-key_pem
+        target: /usr/share/opensearch/config/certs/node-key.pem
+      - source: cert_admin_pem
+        target: /usr/share/opensearch/config/certs/admin.pem
+      - source: cert_admin-key_pem
+        target: /usr/share/opensearch/config/certs/admin-key.pem
+      - source: ${ENTRYPOINT_CONFIG}
+        target: /usr/local/bin/entrypoint_wrapper.sh
+        mode: 0755
+    command: ["/usr/local/bin/entrypoint_wrapper.sh"]
+
+    secrets:
+      - opensearch_initial_admin_password
+    volumes:
+      - opensearch-data:/usr/share/opensearch/data
+      - security-config-vol:/usr/share/opensearch/config/opensearch-security-mount:ro
+      - opensearch-logs:/usr/share/opensearch/logs
+    networks:
+      - hostnet
+
+  timescaledb:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/timescaledb:latest
+    deploy:
+      mode: replicated
+      replicas: ${AMP_REPLICAS:-3}
+      placement:
+        constraints:
+          - node.labels.type != witness # Optional: keeps database replicas off nodes labeled type=witness
+        max_replicas_per_node: 1 # CRITICAL for Host Networking
+    user: root # Run as root to fix permissions, then drop to postgres
+    healthcheck:
+      test: ["CMD", "patronictl", "list"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    # Host Networking - Dynamic IP Detection
+    # We use sh -c to determine the host IP at runtime, then exec Patroni.
+    # `ip route get 1.1.1.1` prints the route to the internet, usually via the main interface.
+    # sed extracts the address after "src"; a fixed column such as awk's $7 shifts when the route has no "via" hop.
+    entrypoint:
+      - /bin/sh
+      - -c
+      - |
+        umask 0077
+        export MY_IP=$$(ip route get 1.1.1.1 | sed -n 's/.* src \([0-9.]*\).*/\1/p')
+        echo "✅ Detected Host IP: $$MY_IP"
+        # Fix permissions for the data directory (since volume is root-owned)
+        mkdir -p /var/lib/postgresql/data/db 
+        chown -R postgres:postgres /var/lib/postgresql/data/db
+        chmod 0700 /var/lib/postgresql/data/db
+        
+        cp /etc/patroni_template.yml /tmp/patroni.yml
+        sed -i "s/__HOST_IP__/$$MY_IP/g" /tmp/patroni.yml
+        export PATRONI_POSTGRESQL_CONNECT_ADDRESS="$$MY_IP:5433"
+        export PATRONI_RESTAPI_CONNECT_ADDRESS="$$MY_IP:8008"
+        export POD_IP="$$MY_IP"
+        
+        # Ensure config is readable by postgres user
+        chown postgres:postgres /tmp/patroni.yml
+        
+        # Drop privileges to postgres user and run Patroni
+        exec su-exec postgres patroni /tmp/patroni.yml
+    environment:
+      - POSTGRES_PASSWORD_FILE=/run/secrets/pg_password
+      - POSTGRES_USER=${POSTGRES_USER}
+      - POSTGRES_DB=${POSTGRES_DB}
+      - AMP_DB_USER=${AMP_DB_USER}
+      - AMP_DB_PASSWORD=${AMP_DB_PASSWORD}
+      - AMP_DB_NAME=${AMP_DB_NAME}
+      - GRAFANA_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-GArr@y2050}
+      # Patroni Config
+      - PATRONI_REPLICATION_PASSWORD=${POSTGRES_PASSWORD}
+      - PATRONI_SUPERUSER_PASSWORD=${POSTGRES_PASSWORD}
+      - PATRONI_NAME={{.Task.Name}}
+      - POD_NAME={{.Task.Name}}
+      # POD_IP is not set here; the entrypoint computes and exports it at runtime
+      - PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5433
+      - PATRONI_RESTAPI_LISTEN=0.0.0.0:8008
+      - PATRONI_ETCD3_HOSTS=${PATRONI_ETCD3_HOSTS}
+    configs:
+      - source: patroni_yml
+        target: /etc/patroni_template.yml
+      - source: timescaledb_post_bootstrap
+        target: /etc/patroni/post_bootstrap.sh
+        mode: 0555
+      - source: init_db_sql
+        target: /docker-entrypoint-initdb.d/init_db.sql
+      - source: telegraf_snmp_sql
+        target: /docker-entrypoint-initdb.d/02_telegraf_snmp.sql
+      - source: docker_metrics_sql
+        target: /docker-entrypoint-initdb.d/03_docker_metrics.sql
+    secrets:
+      - pg_password
+    volumes:
+      # Local volume for each instance
+      - timescaledb-data:/var/lib/postgresql/data
+    networks:
+      - hostnet
+
+  # ETCD Services will be dynamically appended here by manage_amp.sh
+
+
+  haproxy:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/haproxy:latest
+    deploy:
+      mode: global
+
+    configs:
+      - source: haproxy_cfg
+        target: /usr/local/etc/haproxy/haproxy.cfg
+    networks:
+      - hostnet
+
+  # --- Middleware ---
+  pgbouncer:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/pgbouncer:latest
+    deploy:
+      mode: global
+
+    environment:
+      - AGENT_HOSTNAME=amp-docker-agent
+      - HOST_PROC=/host/proc
+      - HOST_SYS=/host/sys
+      - HOST_ETC=/host/etc
+      - AMP_DB_NAME=${AMP_DB_NAME:-amp_ts}
+      - AMP_DB_USER=${AMP_DB_USER:-amp_ts_user}
+      - AMP_DB_PASSWORD=${AMP_DB_PASSWORD:-Array@123$}
+      - DB_HOST=127.0.0.1
+      - DB_NAME=cm
+    configs:
+      - source: pgbouncer_ini
+        target: /etc/pgbouncer/pgbouncer.ini
+      - source: pgbouncer_userlist
+        target: /etc/pgbouncer/userlist.txt
+    networks:
+      - hostnet
+
+  # --- Frontend / Observability ---
+  nginx:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/nginx:latest
+    deploy:
+      mode: global
+
+    healthcheck:
+      test: ["CMD-SHELL", "wget --spider http://localhost/health || wget --spider http://localhost/ || exit 1"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+    ports:
+      - target: 443
+        published: ${AMP_NGINX_HTTPS_PORT:-443}
+        protocol: tcp
+        mode: host
+      - target: 80
+        published: ${AMP_NGINX_HTTP_PORT:-80}
+        protocol: tcp
+        mode: host
+    configs:
+      - source: nginx_app_conf
+        target: /etc/nginx/conf.d/app.conf
+      - source: cert_node_pem
+        target: /etc/nginx/certs/node.pem
+      - source: cert_node-key_pem
+        target: /etc/nginx/certs/node-key.pem
+      - source: cert_root-ca_pem
+        target: /etc/nginx/certs/root-ca.pem
+    volumes:
+      - type: bind
+        source: /dev/null
+        target: /etc/nginx/conf.d/default.conf
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    networks:
+      - amp-overlay
+
+  opensearch-dashboards:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/opensearch-dashboards:latest
+    deploy:
+      replicas: ${AMP_REPLICAS:-2}
+    environment:
+      OPENSEARCH_HOSTS: '${AMP_OS_HOSTS_JSON}' # JSON array of OpenSearch URLs using the nodes' physical IPs
+      OPENSEARCH_SSL_VERIFICATIONMODE: certificate
+      OPENSEARCH_USERNAME: admin
+      OPENSEARCH_SSL_CERTIFICATEAUTHORITIES: '["/usr/share/opensearch-dashboards/config/certs/root-ca.pem"]'
+      SERVER_SSL_ENABLED: "true"
+      SERVER_SSL_KEY: /usr/share/opensearch-dashboards/config/certs/node-key.pem
+      SERVER_SSL_CERTIFICATE: /usr/share/opensearch-dashboards/config/certs/node.pem
+      SERVER_BASEPATH: /visualization
+      SERVER_REWRITEBASEPATH: "true"
+    configs:
+      - source: cert_root-ca_pem
+        target: /usr/share/opensearch-dashboards/config/certs/root-ca.pem
+      - source: cert_node_pem
+        target: /usr/share/opensearch-dashboards/config/certs/node.pem
+      - source: cert_node-key_pem
+        target: /usr/share/opensearch-dashboards/config/certs/node-key.pem
+    command: >
+      bash -c "export OPENSEARCH_PASSWORD=\$$(cat /run/secrets/opensearch_initial_admin_password) && /usr/share/opensearch-dashboards/opensearch-dashboards-docker-entrypoint.sh opensearch-dashboards"
+    secrets:
+      - opensearch_initial_admin_password
+
+    networks:
+      - amp-overlay
+    ports:
+      - "5601:5601"
+
+  grafana:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/grafana:latest
+    deploy:
+      mode: global
+      restart_policy:
+        condition: on-failure
+        delay: 10s
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    environment:
+      - GF_DATABASE_TYPE=postgres
+      - GF_DATABASE_HOST=host.docker.internal:5432 # Connects to Local Host's HAProxy via host-gateway
+      - GF_DATABASE_NAME=grafana
+      - GF_DATABASE_USER=grafana
+      - GF_DATABASE_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-GArr@y2050}
+      - GF_DATABASE_SSL_MODE=disable
+      - GF_LOG_LEVEL=debug
+      - GF_SERVER_ROOT_URL=https://${AMP_DOMAIN_OR_IP:-localhost}/monitoring/
+      - GF_SERVER_SERVE_FROM_SUB_PATH=true
+      - GF_SECURITY_COOKIE_SECURE=true
+      - GF_SECURITY_COOKIE_SAMESITE=lax
+      - GF_SERVER_DOMAIN=${AMP_DOMAIN_OR_IP:-localhost}
+      - GF_SERVER_ENFORCE_DOMAIN=false
+      - GF_SECURITY_COOKIE_PATH=/
+      - GF_AUTH_TOKEN_ROTATION_INTERVAL_MINUTES=1440
+      - GF_AUTH_LOGIN_MAXIMUM_INACTIVE_LIFETIME_DAYS=30
+      - GF_AUTH_LOGIN_MAXIMUM_LIFETIME_DAYS=30
+      - GF_SECURITY_ADMIN_USER=admin
+      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-GArr@y2050}
+      - GF_INSTALL_PLUGINS=grafana-opensearch-datasource
+      - DS_POSTGRES_USER=${POSTGRES_USER:-postgres}
+      - DS_POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-Arr@y2050}
+      - DS_AMP_DB_NAME=${AMP_DB_NAME:-amp_ts}
+      - DS_AMP_DB_USER=${AMP_DB_USER:-amp_ts_user}
+      - DS_AMP_DB_PASSWORD=${AMP_DB_PASSWORD:-Array@123$}
+      - DS_OS_USER=admin
+      - DS_OS_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD:-Arr@y2050}
+      - DS_OS_URL=https://${AMP_DOMAIN_OR_IP}:9200
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    configs:
+      - source: grafana_datasources
+        target: /etc/grafana/provisioning/datasources/datasources.yaml
+    volumes:
+      - /dev/null:/dev/null # Placeholder to keep yaml valid if empty
+    networks:
+      - amp-overlay
+    ports:
+      - target: 3000
+        published: ${AMP_GRAFANA_PORT:-3000}
+        mode: host
+        protocol: tcp
+
+  telegraf:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/telegraf:latest
+    user: root
+    security_opt:
+      - label=disable
+    deploy:
+      mode: global # Run on EVERY node to monitor docker
+    entrypoint: ["telegraf"]
+    environment:
+      PG_PASSWORD_FILE: /run/secrets/pg_password
+    secrets:
+      - pg_password
+    configs:
+      - source: telegraf_main_conf
+        target: /etc/telegraf/telegraf.conf
+      - source: telegraf_ag_conf
+        target: /etc/telegraf/telegraf.d/ag.toml
+      - source: telegraf_apv_conf
+        target: /etc/telegraf/telegraf.d/apv.toml
+      - source: telegraf_asf_conf
+        target: /etc/telegraf/telegraf.d/asf.toml
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - /dev:/dev:ro
+      - /proc:/rootfs/proc:ro
+      - /sys:/rootfs/sys:ro
+      - /etc:/rootfs/etc:ro
+    networks:
+      - hostnet
+
+  logstash:
+    image: ${REGISTRY:-127.0.0.1:5000}/amp/logstash:latest
+    deploy:
+      replicas: ${AMP_REPLICAS:-2}
+    ports:
+      - "514:5514/tcp"
+      - "514:5514/udp"
+    environment:
+      OPENSEARCH_URL: https://${AMP_DOMAIN_OR_IP:-localhost}:9200 # Clean URL for Logstash
+      POSTGRES_PASSWORD_FILE: /run/secrets/pg_password
+      POSTGRES_USER: postgres
+      POSTGRES_DB: cm
+      DB_HOST: ${AMP_DB_JDBC_HOSTS} # For JDBC Filter (Multi-host string)
+    command: >
+      bash -c "export OPENSEARCH_INITIAL_ADMIN_PASSWORD=\$$(cat /run/secrets/opensearch_initial_admin_password) && /usr/share/logstash/bin/logstash"
+
+    secrets:
+      - opensearch_initial_admin_password
+      - pg_password
+    configs:
+      - source: logstash_pipeline_conf
+        target: /usr/share/logstash/pipeline/syslog.conf
+      - source: logstash_config_yml
+        target: /usr/share/logstash/config/logstash.yml
+      - source: cert_root-ca_pem
+        target: /usr/share/logstash/config/certs/root-ca.pem
+      - source: cert_node_pem
+        target: /usr/share/logstash/config/certs/node.pem
+      - source: cert_node-key_pem
+        target: /usr/share/logstash/config/certs/node-key.pem
+      - source: cert_admin_pem
+        target: /usr/share/logstash/config/certs/admin.pem
+    volumes:
+      - ./services/logstash/drivers:/usr/share/logstash/drivers:ro
+    networks:
+      - amp-overlay
+
+volumes:
+  opensearch-data:
+  timescaledb-data:
+  etcd-data:
+  grafana-data:
+  certs-vol:
+    external: true
+  security-config-vol:
+    external: true
+  opensearch-logs:
