Bug 928: Fix for NIC flap issue on APV

Review Request #387 — Created Aug. 5, 2024 and updated

satyendra
APV10
pradeep, prajesh, roland, tanya, wli

RCA: The NIC flap mostly occurs when a burst of traffic arrives. The NIC reset is initiated by DPDK, which issues the command as the VF. There is no mechanism for the PF to signal to the VF that the NIC reset was successful. The VF waits for a certain time and then starts sending commands to the PF; if a command is not executed, it stays in the pending state and the result is a deadlock.

Fix: Added a VF reset status check so that commands are sent only after the NIC reset is done, which avoids the race condition.
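The check described in the fix can be pictured as a bounded poll: ask whether the reset has finished, pause, and give up after a fixed number of attempts. The following is a minimal sketch of that idea, not the code under review. `struct fake_vf_hw`, `fake_vf_reset_complete()`, `poll_delay()` and `wait_for_vf_reset_done()` are hypothetical names, and the value given to `MAX_RESET_WAIT_CNT` is an assumption since the real value is not shown in this review.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed value for illustration; the real MAX_RESET_WAIT_CNT is not shown here. */
#define MAX_RESET_WAIT_CNT 20

/* Hypothetical stand-in for the VF hardware handle. */
struct fake_vf_hw {
    int polls_until_done; /* polls remaining before the simulated reset completes */
};

/* Hypothetical status read: true once the simulated reset has completed. */
static bool fake_vf_reset_complete(struct fake_vf_hw *hw)
{
    if (hw->polls_until_done > 0) {
        hw->polls_until_done--;
        return false;
    }
    return true;
}

/* Placeholder for the pause between polls. A driver would sleep here,
 * for example with DPDK's rte_delay_ms(); this sketch does not wait. */
static void poll_delay(void)
{
}

/* Returns 0 once the reset is confirmed done, -1 if it is still pending
 * after MAX_RESET_WAIT_CNT polls. */
static int wait_for_vf_reset_done(struct fake_vf_hw *hw)
{
    for (int i = 0; i < MAX_RESET_WAIT_CNT; i++) {
        if (fake_vf_reset_complete(hw))
            return 0;
        poll_delay();
    }
    /* Function name in the message, as requested in the review comments. */
    fprintf(stderr, "%s(): VF reset still in progress.\n", __func__);
    return -1;
}
```

A caller would send commands to the PF only when `wait_for_vf_reset_done()` returns 0 and treat -1 as "reset not confirmed", rather than sending commands blindly after a fixed wait.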

Unit testing is done

Description From Last Updated

Do we need the sleep?

prajesh

this function logic is same as i40evf_reset_vf(). We are just waiting for some more time in this function. Instead of …

prajesh

VF reset still in progress.

prajesh

change the log message to "VF reset still in progress." Also additionally, add function name in the log to correlate …

prajesh

This is already checking reset here. Should we call i40evf_check_vf_reset_done() after this call?

prajesh
prajesh
  1. 
      
    1. For more clarity: based on my understanding, DPDK runs multiple threads, and a reset can be initiated by any of them.

    1. Yes, because the admin queue sees a pending command from the VF side and therefore cannot send another command. This leads to a race condition.

prajesh
  1. 
      
  2. This function's logic is the same as i40evf_reset_vf(). We are just waiting for some more time in this function. Instead of adding a new function, can we change MAX_RESET_WAIT_CNT to a bigger value? However, I still don't understand how this is going to solve the problem.

    1. Before initiating the admin queue, I added a check to verify whether the reset has completed. The admin queue is responsible for passing commands to the PF after the reset.
      If the reset is still in progress and the VF sends a command, it deadlocks. This fix avoids another reset as well as the deadlock on the command.

      It's a good idea to increase MAX_RESET_WAIT_CNT. I will raise the review again with a bigger value.
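To illustrate the ordering argument in this reply, here is a small simulation, offered only as a sketch under stated assumptions. It models an admin queue with a single outstanding-command slot and shows how a command sent during a reset stays pending, while a guarded initialization refuses to start until the reset is done. `struct fake_vf`, `adminq_send()` and `adminq_init_guarded()` are hypothetical names and do not correspond to driver functions.

```c
#include <stdbool.h>

enum vf_reset_state { VF_RESET_IN_PROGRESS, VF_RESET_DONE };

/* Hypothetical model of the VF side. */
struct fake_vf {
    enum vf_reset_state reset_state;
    bool adminq_ready; /* admin queue has been initialized */
    bool cmd_pending;  /* one outstanding command that the PF has not answered */
};

/* Send one command to the PF. Returns 0 on success, -1 otherwise.
 * A command issued while the reset is in progress is never answered in
 * this model, so it occupies the pending slot and blocks later commands. */
static int adminq_send(struct fake_vf *vf)
{
    if (!vf->adminq_ready || vf->cmd_pending)
        return -1;
    if (vf->reset_state != VF_RESET_DONE) {
        vf->cmd_pending = true;
        return -1;
    }
    return 0;
}

/* Guarded initialization: refuse to bring up the admin queue until the
 * reset has been confirmed, so no command can be sent into a resetting NIC. */
static int adminq_init_guarded(struct fake_vf *vf)
{
    if (vf->reset_state != VF_RESET_DONE)
        return -1;
    vf->adminq_ready = true;
    vf->cmd_pending = false;
    return 0;
}
```

In this model the unguarded path gets stuck even after the reset later completes, because the pending slot is never released. The guarded path simply retries initialization once the reset state reads as done.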

prajesh
  1. Do we need the sleep?
      
    1. The sleep is needed to cover the lag while the NIC resets, so that the queue is properly cleared before being used.

  2. Change the log message to "VF reset still in progress."

    Additionally, add the function name to the log to correlate the logs better.

    1. Will do it and raise the review again.

  3. This is already checking reset here. Should we call i40evf_check_vf_reset_done() after this call?

    1. If you are referring to i40evf_reset_vf(hw): that function signals the PF to reset the hardware. To check from the VF that the hardware reset is done, we need another call from the VF, which is made by i40evf_check_vf_reset_done(hw).
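The logging request in this review (include the function name in the message) can be sketched with C's predefined `__func__` identifier. This is only an illustration: `VF_LOG_FMT` and `format_reset_log()` are hypothetical names, and the driver's own logging macros may already prefix the function name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<function>(): <message>" into buf. */
#define VF_LOG_FMT(buf, size, msg) \
    snprintf((buf), (size), "%s(): %s", __func__, (msg))

/* Example caller; __func__ expands to "format_reset_log" inside this function. */
static int format_reset_log(char *buf, size_t size)
{
    return VF_LOG_FMT(buf, size, "VF reset still in progress.");
}
```

Because `__func__` is resolved where the macro is expanded, each call site reports its own function name, which makes it easier to tell apart messages from the reset request path and the reset-done check.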

prajesh
  1. Fix the log message and then ship it.
