Bug 968 - Data Disk is getting full ||AS-14689||SBI CRM

Review Request #397 — Created Aug. 7, 2024 and submitted

mmiriam
APV10
rel_apv_10_7
968
kdutta, pradeep, prajesh

python.log getting filled with "epoll wait error: No such file or directory" prints

[root@AN test]# ps -ef | grep dnad
root 8366 837 0 03:51 ? 00:00:00 /ca/bin/dnad
root 12973 10895 0 04:02 pts/0 00:00:00 grep --color=auto dnad
[root@AN test]# ls -l /proc/8366/fd
total 0
lr-x------. 1 root root 64 Aug 7 03:57 0 -> /dev/null
lrwx------. 1 root root 64 Aug 7 03:57 1 -> /tmp/ca_pipe_dnad
lrwx------. 1 root root 64 Aug 7 03:57 2 -> /tmp/ca_pipe_dnad
lrwx------. 1 root root 64 Aug 7 03:57 3 -> anon_inode:[eventpoll]
lrwx------. 1 root root 64 Aug 7 03:57 6 -> /tmp/ca_pipe_dnad

Close the epoll fd (fd 3) of the dnad process from gdb:
[root@AN test]# gdb -p 8366
(gdb) call (int)close(3)
$1 = 0
(gdb) quit
A debugging session is active.

    Inferior 1 [process 8366] will be detached.

Quit anyway? (y or n) y
Detaching from program: /ca/bin/dnad, process 8366
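
The gdb step above can also be imitated in a standalone program. This is a minimal sketch, not dnad code: the function name `errno_after_closed_epoll_fd` is hypothetical, and it only shows that `epoll_wait()` on a descriptor that has been closed fails with EBADF ("Bad file descriptor"), which is the message seen in the buggy build.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: creates an epoll instance, closes it, then waits on
 * the stale descriptor. Returns the errno reported by epoll_wait(), 0 if the
 * wait unexpectedly succeeded, or -1 if the epoll instance could not be
 * created. */
int errno_after_closed_epoll_fd(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Same effect as `call (int)close(3)` in the gdb session. */
    close(epfd);

    struct epoll_event ev;
    errno = 0;
    int n = epoll_wait(epfd, &ev, 1, 0);  /* timeout 0: return immediately */
    return (n < 0) ? errno : 0;
}
```

Because the descriptor never becomes valid again, every later `epoll_wait()` call in a retry loop fails the same way, which is why the log keeps growing.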

=================================================================
Observation in Buggy build:
Continuous log prints keep consuming space until the dnad process is killed manually

[root@AN test]# tail -f /var/crash/ca_log/dnad.log
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:13:epoll wait error: Bad file descriptor
GMT Wed 07 Aug 2024 01:23:24:epoll wait error: Bad file descriptor

=======================================
Observation in fixed build:
dnad is restarted as a new process (PID 14643):
[root@AN test]# ps -ef | grep dnad
root 14542 13838 0 04:09 pts/2 00:00:00 tail -f /var/crash/ca_log/dnad.log
root 14643 837 0 04:09 ? 00:00:00 /ca/bin/dnad
root 14663 10895 0 04:09 pts/0 00:00:00 grep --color=auto dnad

The error is logged in /var/crash/ca_log/dnad.log only once, and the dnad process is then restarted

[root@AN test]# tail -f /var/crash/ca_log/dnad.log
GMT Wed 07 Aug 2024 03:29:35:Begin to write log......
CST Wed 07 Aug 2024 11:51:21:Begin to write log......
GMT Wed 07 Aug 2024 04:09:14:epoll wait error: Interrupted system call
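
The fixed-build behaviour (one log line, then the process leaves the loop and is restarted) could look roughly like the sketch below. This is an assumption about the shape of the loop, not the actual patch: `run_loop_until_error`, `MAX_EVENTS`, the 10 ms timeout and the `max_iterations` bound (added only so the sketch terminates) are all hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16  /* hypothetical batch size */

/* Hypothetical event loop: waits on epfd up to max_iterations times.
 * On the first epoll_wait() failure it writes one line to `log` and leaves
 * the loop instead of retrying. Returns the number of error lines written
 * (0 or 1). */
int run_loop_until_error(int epfd, FILE *log, int max_iterations)
{
    struct epoll_event events[MAX_EVENTS];
    int logged = 0;

    for (int i = 0; i < max_iterations; i++) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, 10);
        if (n < 0) {
            /* Log exactly once, then break so the caller can exit and a
             * supervisor can restart the process. */
            fprintf(log, "epoll wait error: %s\n", strerror(errno));
            logged++;
            break;
        }
        /* The n ready events would be dispatched here. */
    }
    return logged;
}
```

The sketch treats every error as fatal, including EINTR, because the log above shows a single "Interrupted system call" line followed by a restart. A loop that retries on EINTR and breaks only on other errors would be a common alternative.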

prajesh
  Ship It!
kdutta
  Can you please attach a proper UT here?
mmiriam
pradeep
  If we "break", the process exits, right? Does the watchdog reschedule it?
  If the process starts again, might it exit again repeatedly?
  Could simply not printing "epoll wait error" be the fix?

    1. Yes, after the break the process will restart, and yes, there is a chance that the fd gets closed repeatedly for any reason, so I am removing the printf statement.
mmiriam
kdutta
  We should not remove this:

    clock_t start_time, current_time;
    double elapsed_time;
    double interval = 1.0;

    start_time = clock();

    current_time = clock();
    elapsed_time = (double)(current_time - start_time) / CLOCKS_PER_SEC;

    if (elapsed_time >= interval) {
        printf("epoll wait error: %s\n", strerror(errno));
        start_time = clock();
        break; // If needed not sure though
    }
    
    1. Can we add something like this?

    2. As per offline discussion, we will add the printf back.

  3. 
      
mmiriam
pradeep
  Ship It!
mmiriam
Review request changed

Status: Closed (submitted)
