Replacing a failed NetApp drive

I know this isn’t a study guide post.  I promise to continue those as I get time, but this is a short and sweet one for my own future reference and, hopefully, for anyone else who needs it.

Each morning, before I get started with my day, I tend to check my email on my phone.  Typically it’s filled with event messages about jobs that ran overnight or the occasional disk space warning.  This time, however, I was greeted with a nice AutoSupport message to the tune of:


That sucks.  This filer resides at my other datacenter.  A quick login to the console and this is what I find:

FILER1> vol status -f
Broken disks
RAID Disk       Device   HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------   ------------- ---- ---- ---- ----- --------------    --------------
not responding  0a.10.7  0a    10  7   SA:A   0   SAS 10000 560000/1146880000 572325/1172123568

Looks like a failed drive.  Next, double-check that a spare has taken over and is reconstructing:

FILER1> sysconfig -r
data      0a.10.3 0a    10  3   SA:A   0   SAS 10000 560000/1146880000 572325/1172123568 (reconstruction 8% completed)
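
If you want to keep an eye on the rebuild without rereading the whole `sysconfig -r` dump, the progress note can be pulled out of captured output.  This is only a rough Python sketch: the function name `reconstruction_progress` is made up for illustration, and the pattern is based solely on the one line shown above, so other Data ONTAP releases may word it differently.

```python
import re

# Matches the note appended to a rebuilding disk in the sample above,
# e.g. "(reconstruction 8% completed)". Assumption: wording is as shown in this post.
RECON_RE = re.compile(r"\(reconstruction\s+(\d+)%\s+completed\)")


def reconstruction_progress(sysconfig_output):
    """Return {device: percent} for every line that reports reconstruction."""
    progress = {}
    for line in sysconfig_output.splitlines():
        match = RECON_RE.search(line)
        if not match:
            continue
        fields = line.split()
        # In the sample line the device ID is the second field: "data 0a.10.3 ..."
        if len(fields) >= 2:
            progress[fields[1]] = int(match.group(1))
    return progress


sample = (
    "data      0a.10.3 0a    10  3   SA:A   0   SAS 10000 "
    "560000/1146880000 572325/1172123568 (reconstruction 8% completed)"
)
print(reconstruction_progress(sample))  # {'0a.10.3': 8}
```

How the output gets captured (SSH, a saved console log, and so on) is left out on purpose, since that depends on how your filer is set up.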

Looks like the filer is doing everything it’s supposed to do.  Since there’s nothing more I can do remotely, I notify our sysadmin in that office, go ahead with my morning routine, and head into the office.  The beautiful thing about AutoSupport is that it creates a ticket with NetApp support on its own, so I just wait for the phone call from the technician about my 4-hour response replacement.

When I arrive at the office, our sysadmin tells me that he can’t locate the broken drive because it’s not blinking in the chassis.  This seems strange.

This is easily fixable.  From the CLI there’s an option to blink the LED on any drive in the array.  Since we know that 0a.10.7 is the failed drive, I go ahead and set the drive LED to blink for our sysadmin so he’s completely sure he’s replacing the correct drive.

FILER1> priv set advanced
Warning: These advanced commands are potentially dangerous; use
them only when directed to do so by NetApp

FILER1*> blink_on 0a.10.7
<drive is now blinking and is then replaced by sysadmin>
FILER1*> Mon Apr  8 12:03:00 EDT [FILER1:monitor.globalStatus.ok:info]: The system's global status is normal.
FILER1*> blink_off 0a.10.7
FILER1*> priv set
FILER1> disk show -n

DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.10.7      Not Owned                  NONE   PPWSDPRD

FILER1> disk assign 0a.10.7
Mon Apr  8 12:04:57 EDT [FILER1:diskown.changingOwner:info]: changing ownership for disk 0a.10.7 (S/N PPWSDPRD) from unowned (ID 4294967295) to FILER1 (ID XXXXXXXXXX)
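
For a shelf where several disks have been swapped, the same check can be sketched in Python: read captured `disk show -n` output, pick out the rows marked "Not Owned", and print the matching `disk assign` lines for review.  The helper name `unowned_disks` is hypothetical, and the column layout is assumed to match the output above.

```python
def unowned_disks(disk_show_output):
    """Return (device, serial) pairs for rows whose owner reads 'Not Owned'."""
    disks = []
    for line in disk_show_output.splitlines():
        if "Not Owned" not in line:
            continue
        fields = line.split()
        # Sample row splits into: ['0a.10.7', 'Not', 'Owned', 'NONE', 'PPWSDPRD']
        if len(fields) >= 5:
            disks.append((fields[0], fields[4]))
    return disks


sample = """\
DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.10.7      Not Owned                  NONE   PPWSDPRD
"""

for device, serial in unowned_disks(sample):
    # Only print the command so a human can review it before typing it on the filer.
    print(f"disk assign {device}   (S/N {serial})")
```

Printing the commands instead of running them keeps a person in the loop, which seems like the safer choice when disk ownership is involved.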

And that takes care of that.  It’s a pretty easy fix, and the blinking LED is especially handy when you’re not on-site and have to direct someone remotely to the right drive.