NetApp cluster-mode
cDOT 101
-
With OnTap 9.x in cluster mode, the view is really divided between HW and VM; think much more like VMs and hosts in VMware.
OnTap 9.x cluster mode now separates
data (file) access (done by a Storage Virtual Machine - SVM)
from the physical setup (nodes)
-
maybe cDOT was backported to 8.3? There is support for Storage VMs in there.
or is it that in 9.x, 7-mode has been dropped and cDOT must be used?
-
cDOT can run on a single-head system.
It is just that the adoption of Storage VMs wraps things in an extra layer.
It is POSSIBLE to reuse storage shelves with 7-mode data and upgrade to cDOT mode; 7MTT will update ONLY the metadata. Data and WAFL structure are preserved, so snapshots, storage savings, etc are preserved.
-
Not clear if 7-mode to cDOT can do an in-place upgrade on the same head; anecdotally yes.
document TR-4025.pdf only talks about attaching to a new head. maybe they just want to make a sale!
-
snapvault has a 7-mode flavor (qtree based) and a cluster-mode flavor (volume based), and they are not compatible.
Thus, one may be forced to upgrade to cDOT to use a snapvault that is compatible with the target system.
-
qtree is still supported, but mostly for legacy migration; there is little need for it when everything is contained inside an SVM.
- cDOT OnTap 9.x: exports are done via policy, but apparently each export can only have 1 policy. It seems to create an extra layer without getting much in return; the upside is that the same client list can be placed in a single policy and then used in multiple exports, since each qtree has its own export.
- export policy at the SVM is hierarchical: the top-level / root export needs to be open wide, then the vol-level export a bit narrower, then the qtree-level export with the exact specifics. ie each lower level's client list needs to be unioned into the higher-level export policy. PITA. Furthermore, policies are per SVM, though there is a copy command. (see the sketch below)
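A minimal sketch of per-level policies (the SVM/volume/qtree/policy names here are hypothetical; syntax as used in the command sections further below):
export-policy create -vserver svm1 -policyname root_wide
export-policy create -vserver svm1 -policyname qtree_exact
export-policy rule create -vserver svm1 -policyname root_wide -clientmatch 10.0.0.0/8 -rorule sys -rwrule never -ruleindex 10 -protocol nfs
export-policy rule create -vserver svm1 -policyname qtree_exact -clientmatch 10.0.1.0/24 -rorule sys -rwrule sys -ruleindex 10 -protocol nfs
volume modify -vserver svm1 -volume svm1_root -policy root_wide # wide but read-only: allows path traversal
volume qtree modify -vserver svm1 -volume vol1 -qtree qt1 -export-policy qtree_exact # exact clients get read-write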
cDOT New Commands
version # show ontap version eg 9.3P8
system node show -node crfs4-prodcl-01 # get serial number, usable to open support case
system chassis show # some chassis serial that can *NOT* be used to open support case
show-serial-number # some other serial that can *NOT* be used to open support case
show-summary # simple stats
cDOT cluster-wide commands
cluster statistics show # CPU, NFS usage
cluster peer show # DR peer info
cluster peer ping # see if peer is pingable
event log show
# protocol options apply to the whole cluster? Don't seem to be restricted to a specific SVM...
nfs option nfs.mount_rootonly on # on is default, reject mount req from non-reserved ports
nfs option nfs.nfs_rootonly on # off is default, which allows NFS requests from non-reserved ports (nfs mounts tunneled by sshuttle would come in from a non-privileged port, thus need to turn this on)
security login show # list users who can log in to the cluster, their role, and login "application" (ssh,console,http)
security login delete -user-or-group-name red8 # should delete user
security login delete -user-or-group-name red8 -application telnet # remove ability to login via telnet
cluster image show # boot image, uploaded image for OS upgrade
cluster image package show-repository # application packages... netapp app store??
cDOT commands on physical node
Commands on a hw node, which may host multiple SVMs
system node run -node local sysconfig -a # sysconfig -a (across multiple SVM)
node run -node NodeName -command sysstat -c 100 -x 5 # sysstat 100 times, at 5 sec intervals
system node reboot -node NodeName -reason Msg # HA policy may bring it up elsewhere?
system node rename -node OldName -newname NewNodeName
storage failover show
storage failover takeover -bynode NodeName
storage failover giveback -bynode NodeName
# aggregates
aggr show -node NodeName -fields ha-policy
aggr show -space
aggr relocation show
cDOT commands creating volume/qtree
# 2023.0215 Tin
volume create -volume gretina_bk -aggregate fs4_01_aggr1_4tb -size 200G -state online -policy nfsclients -unix-permissions ---rwxr-x--- -type RW
volume modify -volume gretina_bk -policy nfsclients -security-style unix
vol mount -volume gretina_bk -junction-path /gretina_bk
qtree create -volume gretina_bk -qtree gretina -security-style unix -oplock-mode enable -unix-permissions ---rwxr-x---
qtree create -volume gretina_bk -qtree wwulf -security-style unix -oplock-mode enable -unix-permissions ---rwxr-x---
cDOT commands for managing exports
Note that qtrees can have their own export-policy.
An export policy has a number of rules; order matters, and setindex can renumber them. Overall, the whole thing is a mess from a cli perspective.
export-policy used to be within vserver, but the command can now be run without the vserver prefix.
vserver show
vserver show -vserver als-svm
vserver setup
vserver create -vserver SVMname -rootvolume VolName # create new SVM
vserver nfs show # which protocol is enabled (3,4,4.1, TCP,UDP)
vserver nfs show -vserver nas1 -fields v4-numeric-ids # see whether numeric id is used instead of string for UID/GID
# make netapp respond with numeric UIDs for clients that cannot understand strings,
# this would generally make things easier (eg for a transition period) [TR-4072 page 46]
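To actually flip it (assuming the same SVM as the show command above; the field takes enabled/disabled):
vserver nfs modify -vserver nas1 -v4-numeric-ids enabled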
volume show -volume devel -instance # see all details of a volume, including its export policy
volume show -volume prod -fields
volume show -fields policy,junction-path -vserver svm1 # see which volume has what export policy, filtered by specified SVM
volume show -fields policy,junction-path,junction-parent
qtree show -fields qtree-path,export-policy # qtree and exports
qtree show -fields qtree-path,export-policy -vserver svm1 # same as above, filter by specific svm, but vserver field will still be shown.
# some fields such as vserver and volume will always be shown even when not requested.
# this cli is so clunky; it is API and GUI oriented, not sysadmin oriented :(
vserver export-policy show
export-policy show # same as above, don't need the vserver declaration
export-policy rule show -vserver als-filercl -policyname export_perf_test
export-policy rule modify -policyname export_perf_test -ruleindex ...
export-policy rule create -policyname vm_backup-export -clientmatch 131.243.73.0/24 -rorule sys -rwrule sys -vserver als-filercl -ruleindex 14 -protocol nfs
export-policy rule create -policyname transition_exportpolicy -clientmatch 10.0.1.169 -rorule sys -rwrule sys -allow-suid true -allow-dev true -superuser sys -vserver fs3-filer -ruleindex 10 -protocol nfs
### rule indexes are applied sequentially; the first match gets its access applied.
### thus if a later rule grants more access (eg root sys access), it may not get applied.
### gaps in rule numbers are ok, so one can just add a ruleindex with an arbitrarily large index number :P
### so the default catch-all access should be assigned a large rule index,
### leaving room to fine-tune with rules at lower (earlier) index numbers.
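#### eg a catch-all rule parked at a high index (vserver/policy names reused from the example above; clientmatch and index are illustrative):
export-policy rule create -vserver fs3-filer -policyname transition_exportpolicy -clientmatch 0.0.0.0/0 -rorule sys -rwrule never -superuser none -ruleindex 999 -protocol nfs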
#### changing rule index number (in GUI would be re-arranging order of rules)
export-policy rule setindex -vserver fs3-filer -policyname transition_exportpolicy -ruleindex 10 -newruleindex 20
## check detail of specific rule index
export-policy rule show -vserver fs3-filer -policyname transition_exportpolicy_0 -ruleindex 10
## verify a specific client have access to a volume:
vserver export-policy check-access -vserver svmA -client-ip 10.3.33.89 -volume data -authentication-method sys -protocol nfs3 -access-type read-write
## export-policy is per svm, can duplicate policy via copy:
export-policy copy -vserver svm0 -policyname 10_8_x_x -newvserver svm1 -newpolicyname 10_8_x_x
snapmirror show # snapmirror status - approx from 8.x
#### qtrees add their own twist to the export-policy
#### if NOT inherited, qtrees have their own policy, which needs to be adjusted independently
volume qtree show
volume qtree show -fields export-policy
qtree show -fields export-policy,security-style,unix-permissions,qtree-path,oplock-mode,is-export-policy-inherited
vserver nfs show -vserver svm1 -fields showmount
vserver nfs modify -vserver svm1 -showmount disabled # in OnTap 9.x, showmount -e reports exports as open to everyone; this disables responding to showmount -e
# show if use of privileged ports (below 1024) is required
vserver nfs show -vserver axiom88 -fields nfs-rootonly # def disabled. the sshuttle mitigation needs to enable this.
vserver nfs show -vserver axiom88 -fields mount-rootonly # def enabled
# to actually make changes:
vserver nfs modify -vserver axiom88 -nfs-rootonly enabled
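# client side, for reference: the Linux kernel NFS client already binds a reserved source port, so nothing changes there;
# macOS does not, and would need the resvport mount option (server name from above; junction path and mountpoint are illustrative):
mount -t nfs axiom88:/data /mnt/data # linux, reserved port by default
sudo mount -t nfs -o resvport,vers=3 axiom88:/data /mnt/data # macOS, explicitly request a reserved port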
cDOT commands for managing snapshots
snapshot policy show
There are several snapshot policies:
* default-1weekly # smaller snapshot than default, only 1 weekly
* default # all volume use this automatically, keep 2 weekly
* none # apply this policy to volume where snapshot should be disabled
volume show -vserver axiom88 -fields snapshot-policy # list what policy exist for all volumes of a given vserver
volume modify -volume scratch -snapshot-policy none
delete snapshot:
volume snapshot delete -vserver svm_name -volume vol_name -snapshot snapshot_name
ref: https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/snapshots-ontap.html#snapshot-policies (no paywall!)
deleting a snapshot is kinda dangerous; changing its expiration time may be sufficient...
snapshot modify -vserver svm1 -volume scratch -snapshot daily.2023-02-17_0010 -expiry-time 02/18/2023 12:00:00
~~~
# change snapshot expiration time:
snapshot modify -vserver fs4svm1 -expiry-time 08/20/2024 13:00:00 -volume data2b -snapshot weekly.2024-08-18_0015
# check expiration time:
snapshot show -vserver fs4svm1 -volume data2a -snapshot weekly.2024-08-18_0015 -expiry-time *
snapshot show -fields expiry-time
snapshot show -fields expiry-time -vserver fs4svm1
snapshot show -fields expiry-time -vserver fs4svm1 -volume data2a
snapshot show -fields expiry-time,size -vserver fs4svm1 -volume data2a
snapshot show -fields expiry-time,size,state -vserver fs4svm1 -volume data2a
# -fields doesn't show up in tab completion, so annoying!
# expired snapshot may not get deleted. it is mostly a lock to prevent accidental deletion??
snapshot list
setting new custom policy for use with volume
blog to setup custom schedule
https://www.storagefreak.net/2017/07/netapp-cdot-how-to-create-a-snapshot-schedule
# first, check out existing policy in use
volume show -vserver fs4svm1 -fields snapshot-policy
# show build-in snapshot schedule time.
job schedule cron show
# create a 4-hourly to take place 3 min after the marked hours
job schedule cron create -name 4-hourly -minute 3 -hour 0,4,8,12,16,20
# eg create a custom schedule that is 4 hours apart, keep 6 of these "hourly" snapshots,
# and 4 daily
# NOTE: no weekly! (or monthly)
# policy name is every4hr_policy
# apply to data2a volume
snapshot policy create -vserver fs4svm1 -policy every4hr_policy \
-schedule1 daily -count1 4 \
-schedule2 4-hourly -count2 6 -enabled true
# check policy
snapshot policy show -vserver fs4svm1 -policy every4hr_policy
# change existing volume to use this new policy
volume modify -volume data2a -snapshot-policy every4hr_policy
# in addition to the 6 snapshots 4-hours apart at fixed time,
# also have hourly backup, kept for the rotating last 4 hours
snapshot policy add-schedule -vserver fs4svm1 -policy every4hr_policy -schedule hourly -count 4
# also harmonize that the hourly and 4hour are taken at 5 min after the hour mark:
job schedule cron modify -name 4-hourly -minute 5 -hour 0,4,8,12,16,20
cDOT commands on VM
Commands contained inside an SVM
snapmirror show # snapmirror status - approx from 8.x
system node run -node NODENAME -command sysstat -x # cpu and disk usage
# cDOT, need to start stat collection before being able to see any counters:
statistics start -node NODENAME -object OBJNAME
statistics show -object OBJ -instance INST -counter [TAB]
[TAB] is your friend to see the list
NTP
ntp server create -server 10.108.17.18 # use internal server
ntp server create -server 2.pool.ntp.org # use public NTP pool
ntp server show
ntp server show -server 2.pool.ntp.org # should see list of connected ntp server in use
# check these if ntp problems:
route show -vserver SVM_NAME
firewall policy show -vserver SVM_NAME -policy mgmt -service ntp
Network
LIF are assigned to SVM.
net int show # show LIF for all nodes/SVM
vlan show # vlan and tagging info
ifgrp show # LACP/LAG config on each node, mac address
net ipspace show # brief view of IP networks used in cluster
net port broadcast-domain show # MTU, subnet size
system service-processor show # show IP assigned for service processor (RLM port)
firewall policy show # data/mgmt/intercluster policies: dns/ndmp/http/ntp/snmp/ssh access
ping -vserver ... -lif ... # even ping is a convoluted command
DR snapmirror traffic config on a dedicated private vlan.
network interface create -vserver crfs4-prodcl -lif crfs4-prodcl01-icl1 -role intercluster -home-node crfs4-prodcl-01 -home-port a0b -address 172.27.84.60 -netmask-length 24 -status-admin up -failover-policy disabled
network ping -lif crfs4-prodcl-01_icl1 -vserver crfs4-prodcl -destination 172.27.84.59
set adv # advanced privilege mode
cluster ring show
set admin # back to admin privilege mode
Arista switch config for LACP/LAG
channel-group NUMBER mode on # static ("hard coded") mode; switch-to-switch links tend to use this.
channel-group NUMBER mode active # LACP auto-negotiation via protocol daemon, cisco "EtherChannel"; netapp and old sgi need this mode
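A minimal sketch of the switch side for a 2-port LACP LAG toward one netapp node (Arista EOS syntax; port numbers and VLANs are hypothetical):
interface Ethernet1
   channel-group 10 mode active
interface Ethernet2
   channel-group 10 mode active
interface Port-Channel10
   switchport mode trunk
   switchport trunk allowed vlan 100,200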
cDOT clustered-mode backup config
Ref: https://www.sysadmintutorials.com/tutorials/netapp/netapp-clustered-ontap/netapp-clustered-ontap-cli/
system configuration backup create -backup-name node1-backup -node node1 (Create a cluster backup from node1)
system configuration backup create -backup-name node1-backup -node node1 -backup-type node (Create a node backup of node1)
system configuration backup settings modify -destination ftp://192.168.1.10/ -username backups (Sets scheduled backups to go to this destination URL)
system configuration backup settings set-password # (Set the backup password for the destination URL above)
Shutdown / Startup
cDOT stop snapmirror + shutdown
snapmirror quiesce -destination-path als-filercl:*
# then can stop nodes here like:
halt -node node01 -inhibit-takeover true -skip-lif-migration-before-shutdown true -reason test
snapmirror resume ...
Cluster mode shutdown
# If on a 2-node cluster, run the following:
cluster ha modify -configured false
storage failover modify -node * -enabled false
# If on a 4+-node cluster, run the following:
storage failover modify -node * -enabled false
# Log in to each node, one at a time (preferably using the serial console or RLM/SP) and run:
halt local -inhibit-takeover true
# Ensure all nodes are down before halting the last node:
system node show
# Power off from Head:
system power off
Cluster mode startup
LOADER> boot_ontap # (will boot the node)
# the above is needed if the head stops at the LOADER prompt (eg standby) and needs a command to actually boot; it should be automatic if power is restored?
# Re-enable HA cluster:
cluster ha modify -configured true
storage failover modify -node * -enabled true
HW health check for cDOT
system health alerts show -instance
system health status show
system health subsystem show
system chassis fru show
system controller environment show
system controller environment show -node Nodename -fru-name "PSU4 FRU" -instance # iterate for PSU1 .. PSU4
system node run -node * -command storage show fault -v
storage shelf show -instance
DR Setup with SVM (OnTap 9.3)
# create SVM in the DR cluster to hold the SVM fail over:
dr:: vserver create -vserver svm1-dr -subtype dp-destination
# check and sync all cron schedules so that in the event of a DR activation, all snapmirror schedules are consistent:
dr:: job schedule cron show
dr:: job schedule cron create -name transition_snapshot_schedule_8 -hour 8,12,16,20 -minute 0
dr:: job schedule cron create -name transition_snapshot_schedule_9 -hour 0 -minute 0
# create a peering request to the production SVM:
dr:: vserver peer create -vserver svm1-dr -peer-vserver svm1 -applications snapmirror -peer-cluster prodcl
dr:: vserver show
dr:: vserver peer show
# accept the request, establishing a peering relationship with the DR SVM:
prod:: vserver peer show
prod:: vserver peer accept -vserver svm1 -peer-vserver svm1-dr
# mirror the SVM config
dr:: snapmirror create -source-path svm1: -destination-path svm1-dr: -type DP -throttle unlimited -policy DPDefault -schedule hourly -identity-preserve true
dr:: snapmirror initialize -destination-path svm1-dr:
dr:: snapmirror show -destination-path svm1-dr:
Deployment checklist
- sshuttle mitigation: restrict to reserved port (nfs-rootonly)
- security login show
- firewall policy show
- export-policy show
Cluster-Mode Ref
- OnTap 7 to 9 command map
- Gist on SVM root volume data protection
- Pocket Guide Clustered Mode from sysadmin tutorials
- TR-3580: NFSv4 Best Practices Guide
New NetApp cDOT planning questionnaire
Each Netapp is an HA pair of 2 nodes.
Each node has 4-6 network interfaces,
for which LAG(s) (NetApp Interface Group) are typically created for data and management traffic to flow.
There is also a service port that would best be connected to IPMI network.
Each node should be treated like a VM host, with as many VMs carved out of it as desired.
- do we segregate backup traffic vs nfs data traffic to different vlan?
- backup traffic may need to leave the cluster to external site
- do we segregate management traffic vs nfs data traffic
(eg CLI/GUI to manage node and VM could be on separate VLAN,
not actually recommended by netapp, but see next point)
- netapp has a "call home" feature, where it sends weekly reports and autosupport via http.
This goes out via the management interface and would need to reach the internet if we want to use this feature.
So, the management interface may need to reside in the "internet services VLAN"?
Which VLAN should the nodes use to perform NFS mounts against the netapp(s)?
- warewulf provision network would be on RJ45 1Gbps NIC.
- the alternate high speed NIC that could be 10/100 Gbps?
- DTN traffic in a separate vlan? or use the Internet Services vlan?
Netapp supports VLAN tagging. The number of LAGs and the number of ports they use probably depends on how many VLANs we need the system to connect to?
- MTU size ?
New NetApp cDOT bare metal setup
e0M is an rj45 port with a switch behind it.
- 1 IP would be for the node, for management (which eventually will get a VIP to provide HA across)
- 1 IP for the service processor (RLM), which provides serial over LAN.
Interfaces needing IP addresses:
- cluster_mgnt # VIP that fail over between nodes
- head1_mgnt # for use with ssh or https://node1_mgnt ; typically a switched port behind e0M
- head2_mgnt # for use with ssh or https://node2_mgnt
- head1_sp # service processor, but share cable with head1_mgnt, so on same IP subnet unless vlan tag used
- head2_sp
cable connectivity
no external shelf, no cluster switch:
head1 head2
sas-a --- sas-b # crossed over for sas loop
sas-b --- sas-a
e0a --- e0a # matched for ethernet
e0b --- e0b # think of this as 2 separate ethernet islands
if there are additional shelves:
sas will daisy chain up to the last shelf, then loop back, crossed over, to the other head.
(complete basic node setup first, like assigning an IP to the server: mgmt on the main interface, e0M)
network interface show
cluster show
cluster setup # prompted interactive Q+A
run cluster setup again on the 2nd node, enter the IP of e0a on the primary node to join
(could have run `cluster join` ?)
Change node name, IP
# netapp defaults to clusterName-NN, starting with 01.
# not really a good idea to rename nodes to start with 0,
# as many default cluster interconnect names also use 01 and 02
# and changing those is not trivial, albeit you probably never need to mess with them
system node rename -node sysbase-01 -newname sysbase0 # change hostname
system node show
net interface rename -vserver sysbase -lif sysbase-01_mgmt1 -newname sysbase0_mgnt # change name of LIF used by node
# change the node's IP (ssh/http address, behind the switched nic e0M) ## probably best done over the serial console
net interface modify -vserver sysbase -lif sysbase0_mgnt -address 10.3.122.140
net interface show
# change RLM (Service processor) IP address:
system service-processor network modify -node sysbase0 -address-family IPv4 -enable true -dhcp none -ip-address 10.3.122.137 -netmask 255.255.255.224 -gateway 10.3.122.129
system service-processor show # to check result
clean up broadcast stuff
the Default broadcast domain includes every port by default, which is too chatty
broadcast-domain show
broadcast-domain remove-ports -broadcast-domain Default -ports sysbase0:e0c
repeat for e0d e0e e0f and for sysbase1:e0c...
net port show # see which port has connected cable
net route show
# how to change default route?
# need to delete and create? was that for vserver/svm ?
# config/change syslog, ntp, dns?
# LAG need port group config on netapp
# LIF assigned to SVM
net interface create -vserver axiom -lif axiom-01_IC01 -role intercluster -home-node axiom-01 -home-port a1a -address 10.3.32.46 -netmask-length 22
net interface create -vserver axiom -lif axiom-02_IC01 -role intercluster -home-node axiom-02 -home-port a1a -address 10.3.32.47 -netmask-length 22
run -node sysbase0 sysconfig
++TBD: OneNote.
firewall
firewall policy modify -vserver sysbase -policy mgmt -service http -allow-list 10.3.7.0/24,131.243.220.0/22,131.243.130.0/24,10.3.10.10/32
firewall policy modify -vserver sysbase -policy mgmt-nfs -service http -allow-list 10.3.7.0/24,131.243.220.0/22,131.243.130.0/24,10.3.10.10/32
firewall policy modify -vserver sysbase -policy mgmt -service dns -allow-list 131.243.5.1/32,131.243.5.2/32,131.243.121.21/32,131.243.64.2/32,8.8.8.8/32,8.8.4.4/32
firewall policy modify -vserver sysbase -policy mgmt-nfs -service dns -allow-list 131.243.5.1/32,131.243.5.2/32,131.243.121.21/32,131.243.64.2/32,8.8.8.8/32,8.8.4.4/32
firewall policy modify -vserver sysbase -policy mgmt -service ntp -allow-list 131.243.64.12/24
firewall policy modify -vserver sysbase -policy mgmt -service ndmp -allow-list 10.3.7.87/32
firewall policy modify -vserver sysbase -policy mgmt -service snmp -allow-list 10.3.7.87/32
firewall policy modify -vserver sysbase -policy mgmt-nfs -service snmp -allow-list 10.3.7.87/32
# ditto ndmps
firewall policy modify -vserver sysbase -policy intercluster -service ndmp -allow-list 10.3.7.87/32
# ditto ndmps
# ssh? https?
firewall policy modify -vserver sysbase -policy data -service dns -allow-list 131.243.5.1/32,131.243.5.2/32,131.243.121.21/32,131.243.64.2/32,8.8.8.8/32,8.8.4.4/32
firewall policy modify -vserver sysbase -policy data -service ndmp -allow-list 10.3.7.87/32
firewall policy modify -vserver sysbase -policy data -service portmap -allow-list 10.3.7.87/32,10.3.32.58/32,10.3.32.56/32,10.3.10.10/32 # pending
# pending -policy intercluster -service https ndmp ndmps
# -vserver svm00 -policy mgmt-nfs -service ssh ... or delete them all ?
aggr initial config
The gui calls an aggregate a "tier".
By default, 2 aggregates are created, each with 50% of the disks (after overhead), so that each head owns an aggregate with 50% of the disks on it.
But for a small config, all disk spindles in a single aggregate would provide better performance.
The head will not be the bottleneck, so an active/standby setup would suffice.
aggr0 is the OnTap system aggregate; don't mess with it, and don't use the name aggr0.
start with aggr1 for user aggregates.
rows 0 # no pause at end of screen
If the GUI-created aggregate is not to your liking, destroy it:
aggr offline -aggregate sysbase_01_NL_SAS_1
aggr delete -aggregate sysbase_01_NL_SAS_1
can't move disks between aggrs; must remove ownership and re-add
disk option modify -node * -autoassign off
disk removeowner -data true -disk 1.0.0
disk assign -owner sysbase0 -data true -disk 1.0.0
# repeat for 1.0.2 .. 1.0.10
disk option modify -node * -autoassign on
# create 11 disk aggregate, with future max expansion to 23 disks
# raid_dp uses 2 parity disks and actually allows a disk count range of 3..20 for fake sas disks
# raid_tec uses 3 parity disks
# -simulate is a dry run to see result
aggr create -aggregate aggr1 -node sysbase0 -raidtype raid_dp -maxraidsize 23 -diskcount 11 -simulate
aggr create -aggregate aggr1 -node sysbase0 -raidtype raid_tec -maxraidsize 23 -diskcount 11 -simulate
"options" in dDot
may of the options.* in 7-mode is now spread all over the places.
here are a few
vserver services nis-domain show
vserver services dns show
system node autosupport show
system node autosupport invoke -node fs3-02 -type test -message "manual trigger fs3 netapp autosupport test"
system node autosupport history show
ntp server show
snmp community
smtp?
system node autosupport modify -node nodename -transport {http|https|smtp}
system node autosupport modify -node nodename -to
mtu?
system node run -node fs3-01 -command sysconfig -r # disk info
system node run -node fs3-01 -command sysconfig -a
Example autosupport with smtp server specification:
system node autosupport modify -node fs3-01 -state enable -mail-hosts 35.8.33.79 -from bofh@bofh.com -to autosupport@netapp.com,bofh@bofh.com -support enable -transport smtp -proxy-url "" -hostname-subj autosupport-email-test -nht true -perf false -retry-interval 4m -retry-count 15 -reminder true -max-http-size 50MB -max-smtp-size 5MB -remove-private-data false
NetApp 7-mode
NetApp 101
https://netapp.myco.com/na_admin # web gui URL. Most features avail there, including a console.
ssh -o PubkeyAuthentication=no -oKexAlgorithms=+diffie-hellman-group1-sha1 netapp.myco.com
get a root mount of /vol/vol0/etc on a unix machine to do direct config on the files.
NOW = NetApp Support Site
NetApp man pages ("mirror" by uwaterloo)
RAID-DP
IMHO Admin Notes
Notes about NetApp export, NFS and Windows CIFS ACL permission issues.
Best practice is for most (if not all) export points of an NFS server to
implement root_squash: root on
the nfs client is translated to user 'nobody' and effectively has
the lowest access permission. This is done to reduce accidents of a user
wiping out the whole NFS server content from their desktop.
Sometimes NetApp NFS exports are actually on top of a filesystem using windows NT ACLs;
the file permissions may show up as 777, but when it comes to accessing
the file, it will require authentication from the Windows server (PDC/BDC
or AD). Any user login name that does not have a match in the
windows user DB will have permission-denied problems.
Most unix clients with automount can access the nfs server thru /net.
However, admins should discourage heavy reliance on /net. It is good
for occasional use.
/home/SHARE_NAME or other mount points should be
provided, such as /corp-eng and /corp-it. This is because the mount path will
be more controllable, and it also avoids an older AIX bug with /net: when
accessing NFS mounted volumes, access them as a user instead of root, which
gets most privileges squashed away.
If the FS is accessible by both Windows and Unix, it is best to make share names
simple and keep them consistent. Some admins like to create
matching
\\net-app-svr1\share1 /net-app-svr1/share1
\\net-app-svr2\share2 /net-app-svr2/share2
I would recommend that on the unix side, /net-app-svr1 etc be unified into a
single automount map called something like /project . This would mean
all share names need to be unique across all servers, but it helps keep a
transparency that allows for server migration w/o affecting users' work
behaviour.
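A sketch of such a unified autofs setup on the Linux side (map name, share names, and mount options are hypothetical):
# /etc/auto.master
/project  /etc/auto.project
# /etc/auto.project
share1  -rw,hard,intr  net-app-svr1:/vol/share1
share2  -rw,hard,intr  net-app-svr2:/vol/share2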
Old Filer to New Filer Migration problems:
If copying files from a Unix FS to a Windows-style FS, there are likely going to
be pitfalls. NDMP would copy the files, and permissions and dates would be
preserved, but ownership of the files may not be preserved. XCOPY from
DOS (or robocopy) may work a tad better in the sense that the files will
go thru the normal windows checks of access and ownership at
creation. ClearCase needed to run chown on the files that correspond to
the view, and not having the ownership preserved became a big problem.
Ultimately, the user that runs the CC script for ownership changes was made part of
the NetApp Local Admin Group. A more refined ACL would be safer.
Filer data migration:
NDMP is the quickest. One can even turn off NFS and CIFS access to ensure
no one is writing to the server anymore. NDMP is a different protocol
with its own access mechanism.
Mixed NFS and CIFS security mode:
Mixed mode security (NT and Unix) is typically a real pain in the rear.
Migrating from NT and/or Unix to mixed mode would mean the filer has to fabricate
permissions, which may have unintended side effects.
Switching from mixed mode to either NT or Unix just drops the extra permission
info, thus some consultants say this is the safer step.
ClearCase and NetApp each point to the other as recommending Mixed Mode
security. It may be a nightmare if really used. Unix mode worked flawlessly for
3+ years.
Different NetApp support/consultants say different things about mixed mode,
but my own experience matches this description:
Mixed mode means the filer stores either a Unix or an NTFS acl on a file-by-file basis.
If a given file (or dir) ACL was last set from unix, it will have only a Unix ACL on it.
If last set from NTFS, then it will have a Windows ACL.
The dual mode option does not store both; only one of the two is stored, and the rest is
resolved in real time by the filer.
This has a nasty side effect: when flipping the security style from mixed mode to, say, NTFS,
some file permissions are left alone and even a windows admin can't change/erase the files,
because they are not seen as root.
In short, avoid mixed mode like the plague!!
LVM
Layers:
Qtree, and/or subdirectories, export-able
|
Volume (TradVol, FlexVol), export-able, snapshot configured at this level.
|
aggregate (OnTap 7.0 and up)
|
plex (relevant mostly in mirror conf)
|
raid group
|
disk
Disks - Physical hardware device :)
Spares are global, auto replace failed disk in any raid group.
Sys will pick correct size spare.
If no hot spare is avail, the filer runs in degraded mode if a disk fails, and
by def shuts down after 24 hours! (options raid.timeout, in hours)
sysconfig -d # display all disk and some sort of id
sysconfig -r # contain info about usable and physical disk size
# as well as which raid group the disk belongs to
disk zero spare # zero all spare disk so they can be added quickly to a volume.
vol status -s # check whether spare disks are zeroed
web gui: Filer, Status
= display number of spares avail on system
web gui: Storage, Disk, Manage
= list of all disks, size, parity/data/spare/partner info,
which vol the disk is being used for.
(raid group info is omitted)
Disk Naming:
format: <adapter><channel>.<scsi_id>
2a.17 SCSI adaptor 2, disk scsi id 17
3b.97 SCSI adaptor 3, disk scsi id 97
a = the main channel, typically for filer normal use
b = secondary channel, typically hooked to partner's disk for takeover use only.
Raid group - a grouping of disks.
Should really have a hot spare, or else degraded mode if a disk fails, and shut
down in 24 hours by def (so can't tolerate a weekend failure).
max raid group size:
raid4 raid-dp (def/max)
FC 8/14 16/28
SATA, R200 7/7 14/16
Some models are slightly diff than above.
Raid-DP?
2 parity disks per raid group instead of 1 in raid4.
If you are going to have a large volume/aggregate that spans 2 raid groups (in
a single plex), then may as well use raid-dp.
A larger raid group size saves storage by using fewer parity disks,
at the expense of slightly less data safety in case of multi-disk failure.
Plex
- mirrored volume/aggregate have two plexes, one for each complete copy of the
data.
- raid4/raid_dp has only one plex, raid groups are "serialized".
aggregate - OnTap 7.0 addition, layer b/w volume and disk. With this, NA
recommends creating a huge aggregate that spans all disks with the
same RAID level, then carving out as many volumes as desired.
Volume - traditional mgnt unit, called an "independent file system".
aka Traditional Volume, starting in OnTap 7.0
Made up of one or more raid groups.
- disk(s) can be added to volume, default add to existing raid group
in the vol, but if it is maxed out, then it will create a new raid
group.
- vol size can be expanded , but no shrink, concat or split. (new flexvol can shrink)
- vol can be exported to another filer (foreign vol).
- small vol implies small raid group, therefore wastes more space.
- max size = 250 GB recommended max vol size in 6.0. TB's by 7.0
vol status -v [vol0] # display status of all [or specific] volume,
# -v gives all details on volume options
vol lang vol0 # display [set] character set of a volume
vol status -r # display volume and raid status
sysconfig -r # same as vol status -r
vol create newvol 14 # create new vol w/ 14 disks
vol create newvol2 -t raid4 -r 14 6@136G
# vol size is 6 disks of 136 GB
# use raid4 (alt, use raid_dp)
# use raid group of 14 disks (def in cli),
# each raid group need a parity disk, so
# larger raid group save space (at expense of ??)
# 28 disks usable in raid_dp only?
vol add newvol2 3 # add 3 more disks to a volume called newvol2
vol options vol1 nosnap on # turn off snapshot on a vol
vol offline vol2
vol online vol2
FlexVol - OnTap 7.0 and up, resembles a TradVol, but built on top of an aggregate
- grows and shrinks as needed
vol size myvol +50g # flexvol, enlarge volume called myvol by the size specified
vol size myvol -50g # flexvol, shrink volume called myvol by the size specified
QTree - "Quota Tree", store security style config, oplock, disk space usage and file limits.
Multiple qtrees per volume. QTrees are not req, NA can hae simple/plain
subdir at the the "root level" in a vol, but such dir cannot be converted to qtree.
Any files/dirs not explicitly under any qtree will be placed in a
default/system QTree 0.
qtree create /vol/vol1/qtree1 # create a qtree under vol1
qtree security /vol/vol1/qtree1 unix # set unix security mode for the qtree
# could also be ntfs or mixed
qtree oplocks /vol/vol1/qtree1 enable # enable oplock (windows access can perform caching)
Config Approach
Aggregate:
Create largest aggregate, 1 per filer head is fine, unless need traditional vol.
Can create as many FlexVols as desired, since a FlexVol can grow and shrink as needed.
Max vol per aggregate = 100 ??
TradVol vs QTree?
- use fewer traditional volume when possible, since volume has parity disk overhead
- and space fragmentation problem.
- use QTree as size management unit.
FlexVol vs QTree?
- Use Volume for same "conceptual management unit"
- Use diff vol to separate production data vs test data
- QTree should still be created under the volume instead of simple plain subdirectories
at the "root" of the volume.
This way, quota can be turned on if just to monitor space usage.
- One FlexVol per Project is good. Start the Vol small and expand as needed.
Shrink as it dies off. (see the sketch after this list)
- Use QTree for different pieces of the same project.
- Depending on the backup approach, smaller volume may make backup easier.
Should try to limit volume to 3 TB or less.
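A sketch of that approach in 7-mode (volume/aggregate names and sizes are made up):
vol create proj1 aggr1 200g # one FlexVol per project, start small
qtree create /vol/proj1/src # qtrees for the pieces of the project
qtree create /vol/proj1/data
quota on proj1 # turn on quota just to monitor usage
vol size proj1 +100g # expand later as the project grows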
Quotas
mount root dir of the netapp volume in a unix or windows machine.
vi (/) etc/quotas (in dos, use edit, not notepad!!)
then telnet to netapp server, issue command of quota resize vol1 .
quota on vol1
quota off vol0
quota report
quota resize # update/re-read quotas (per-vol)
# for user quota creation, may need to turn quota off,on for volume
# for changes to be parsed correctly.
Netapp quota supports hard limits, thresholds, and soft limits.
However, only the hard limit returns an error to the FS. The rest are largely useless;
the quota command on linux clients is not functional :(
Best Practices:
Other than user home directory, probably don't want to enforce quota limits.
However, still good to turn on quota so that space utilization can be monitored.
/etc/quotas
## hard limit | thres |soft limit
##Quota Target type disk files| hold |disk file
##------------- ----- ---- ----- ----- ----- -----
* tree@/vol/vol0 - - - - - # monitor usage on all qtree in vol0
* tree@/vol/vol1 - - - - -
* tree@/vol/vol2 - - - - -
/vol/vol2/qtree1 tree 200111000k 75K - - - # enforce qtree quota, use kb is easier to compare on report
/vol/vol2/qtree2 tree - - 1000M - - # enable threshold notification for qtree (useless)
* user@/vol/vol2 - - - - - # provide usage based on file ownership, w/in specified volume
tinh user 50777000k - 5M 7M - # user quota, on ALL fs ?! may want to avoid
tinh user@/vol/vol2 10M - 5M 7M - # enforce user's quota w/in a specified volume
tinh user@/vol/vol2/qtree1 100M - - - - # enforce user's quota w/in a specified qtree
# exceptions for +/- space can be specified for given user/location
# 200111000k = 200 GB
# 50777000k = 50 GB
# they make output of quota report a bit easier to read
# * = default user/group/qtree
# - = placeholder, no limit enforced, just enable stats collection
Snapshot
Snapshots are configured at the volume level.
Thus, if different data need to have different snapshot characteristics, then
they should be in different volume rather than just being in different QTree.
WAFL automatically reserves 20% for snapshot use.
snap list vol1
snap create vol1 snapname # manual snapshots creation.
snap sched # print all snapshot schedules for all volumes
snap sched vol1 2 4 # scheduled snapshots for vol1: keep 2 weekly, 4 daily, 0 hourly snapshots
snap sched vol1 2 4 6 # same as above, but keep 6 hourly snapshots,
snap sched vol1 2 4 6@9,16,20 # same as above, specifying which 3 hourly snapshot to keep + last 3 hours
snap reserve vol1 # display the percentage of space that is reserved for snapshot (def=20%)
snap reserve vol1 30 # set 30% of volume space for snapshot
vol options vol1 nosnap on # turn off snapshot, it is for whole volume!
gotchas, as per netapp:
"There is no way to tell how much space will be freed by deleting a particular snapshot or group of snapshots."
DeDup A/SIS
Advance Single Instance Storage (ie DeDuplication).
DeDuplication finds duplicate data and collapses it into a single unit. NetApp A/SIS works at the block level (4KB), and operates in the background on individual FlexVols (not usable on Traditional Volumes).
Like snapshots that have inodes pointing to the same blocks, SIS uses the same tech to reduce storage needs. "Same" blocks are indexed by hash, and "sameness" is verified via a byte-by-byte comparison before the inode pointers are re-organized to free space.
Performance impact:
A file read just traverses a series of blocks in the i-node map. Random read is the same. Sequential read may no longer be sequential, but a large number of client requests hardly makes read requests really sequential anymore.
Unlike EMC NS-series (as of Celerra v5.6), NetApp's dedup does not bundle together with compression, so there is no
"re-hydration" time when accessing files (due to de-compression).
Write operations seem to take a real-time impact if SIS is turned on.
Once SIS is on (and started), all writes generate a fingerprint on the fly and the info is written to the change log.
This calculation takes cpu power. It won't be impactful on a system with less than 50% load, but a busy system can see degradation from 15% to 35% on FC disk.
Page 6 of TR-3505:
In real time, as additional data is written to the deduplicated volume, a fingerprint is created for each new block and written to a change log file. When deduplication is run subsequently, the change log is sorted and its sorted fingerprints are merged with those in the fingerprint file, and then the deduplication processing occurs.
Note that there are really two change log files, so that as deduplication is running and merging the new blocks from one change log file into the fingerprint file, new data that is being written to the flexible volume is causing fingerprints for these new blocks to be written to the second change log file. The roles of the two files are then reversed the next time that deduplication is run.
Page 15 of TR-3505:
If the load on a system is low—that is, for systems in which the CPU utilization is around 50% or lower—there is a negligible difference in performance when writing data to a deduplicated volume, and there is no noticeable impact on other applications running on the system. On heavily used systems, however, where the system is nearly saturated with the amount of load on it, the impact on write performance can be expected to be around 15% for most NetApp systems. The performance impact is more noticeable on higher-end systems than on lower-end systems. On the FAS6080 system, this performance impact can be as much as 35%. The higher degradation is usually experienced in association with random writes. Note that these numbers are for FC drives; if ATA drives are used in a system, the performance impact would be greater.
The real dedup workload (finding duplicate blocks) can be scheduled to run at night
or run on demand when the sa knows the filer is not busy.
SIS won't operate on blocks held by a snapshot, so savings may be low when sis is first turned on, till old snapshots expire. It is recommended to run sis before taking a snapshot.
sis on /vol/unixhome
sis start -s /vol/unixhome # run scan for the first time (generate fingerprint)
sis status # show status and progress of scan if running
df -s # report on saving by dedup
sis config # see when sis is scheduled to run
sis config -s auto /vol/home # use "auto" for when to rescan (when change amount is high)
# recommend enable on all volume to reduce concurrent scan at mid-nite.
sis off /vol/unixhome # disable dedup. stops fingerprint from being generated and written to change log
# presumably with just this, write perf degradation should stops.
sis undo /vol/unixhome # recreate dedup block, delete fingerprint db when done.
# use "priv set diag" to enter diag mode to run "undo".
On a really busy FS that has slow cycles once in a while, perhaps dedup can
result in no perf degradation yet save space:
- sis on FlexVol
- sis start -s FlexVol
- sis off
- (work)
- sis start ... (when system is idle)
- sis off (once scan is complete and busy working for user req again)
Ref: TR-3505:
NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
NFS
(/) etc/exports
is the file containing what is exported, and who can mount the root fs as root. Unix NFS related only.
/vol/vol0 -access=sco-i:10.215.55.220,root=sco-i:10.215.55.220
/vol/vol0/50gig -access=alaska:siberia,root=alaska
Unlike most Unices, NetApp allows export of both ancestors and descendants.
other options:
-sec=sys # unix security, ie use uid/gid to define access
# other options are kerberos-based.
Besides just having exports for nfs and shares for cifs,
there is another setting for the fs security permission style: unix, ntfs, or mixed.
this controls the characteristics of chmod and file ACLs.
Once edit is done, ssh to netapp and issue cmd:
exportfs -a # re-add all exports as per the new etc/exports file
exportfs -u # unexport everything. Careful!
exportfs -u vol/vol1 # unexport vol1 (everything else remains intact)
exportfs -r # remove all exports that are no longer listed in etc/exports, maintain those that are still listed
# -r is NOT the same as -au!
exportfs -p opts path # -p for permanent export, ie, will add the entry into etc/exports, which helps with creating an export without manually editing the file.
The bug that Solaris and Linux NFS have seems to exist on NetApp also.
Hosts listed in exports sometimes need to be given by IP address, or an
explicit entry in the hosts file needs to be set up. Somehow, sometimes
the hostname does not get resolved thru DNS :(
maybe it is a dns-cache poisoning problem...
options nfs.per_client_stats.enable on
# enable the collection of detailed nfs stats per client
options nfs.v3.enable on
options nfs.tcp.enable on
# enable NFS v3 and TCP for better performance.
Note that starting with OnTap ~9.x a new middle layer called export policy is needed to govern access between the exported volume/qtree and the client list.
vserver export-policy show
vserver export-policy rule show -vserver als-filercl -policyname export_perf_test
vserver export-policy rule modify -policyname export_perf_test -ruleindex ...
volume qtree show -fields export-policy
nfsstat # display nfs statistics, separate v2 and v3
nfsstat -z # zero the nfsstat counter
nfsstat -h # show detailed nfs statistics, several lines per client, since zero
nfsstat -l # show 1 line stat per client, since boot (non resetable stat)
Cuz of sshuttle, mac and linux machines may become part of the allowed network clients :(
The mitigation to prevent NFS access thru sshuttle is to require access from privileged ports only (since sshuttle still runs in user space).
privileged ports: below 1024, ie up to 1023
7-mode
options nfs
nfs.mount_rootonly on
nfs.nfs_rootonly off
cDOT config:
nfs option nfs.mount_rootonly on # on is default, reject mount req from non-reserved ports
nfs option nfs.nfs_rootonly on # off is default, which allows NFS requests from non-reserved ports (nfs mounts tunneled by sshuttle would come in from a non-privileged port, thus need to turn this on)
nfs modify -vserver als-vFiler -mount-rootonly # not changed, (def is on/enable)
nfs modify -vserver als-vFiler -nfs-rootonly enable # this was changed
vserver nfs show -vserver als-vFiler -nfs-rootonly
vserver show -vserver als-filercl -instance
nfs show -vserver als-filercl
# see root mount and root nfs
# root nfs is now enabled...
# so if client still works, then i guess that's what we want
# hmm... i suppose sshuttle makes nfs req as non-root.
# whereas linux nfs client is kernel calls thus root only...
# 7-mode stat counter
stats show nfsv3:nfs:nfsv3_read_ops
stats show nfsv3:nfs:nfsv3_write_ops
stats show nfsv3:nfs:nfsv3_read_latency
stats show nfsv3:nfs:nfsv3_write_latency
stats show nfsv3:nfs:nfsv3_read_size_histo
stats show nfsv3:nfs:nfsv3_write_size_histo
sysstat -x # cpu and disk util
cDOT, need to start stat collection:
statistics start -node NODENAME -object OBJNAME
statistics show -object OBJ -instance INST -counter [TAB]
[TAB] is your friend to see the list
system node run -node NODENAME -command sysstat -x
NIS domain
changing NIS domain. no reboot should be necessary
options nis.enable off
options nis.domain new.nis.dom
options nis.servers 10.0.91.44,10.0.91.82
options nis.enable on
cdot::
vserver services nis-domain show
vserver services dns show
CIFS
cifs disable # turn off CIFS service
cifs enable
cifs setup # configure domainname, wins. only works when cifs is off.
cifs testdc # check registration w/ Windows Domain Controller
cifs shares # display info about all shares
cifs shares -add sharename path -comment desc # create new share and give it some descriptive info
cifs shares -change shrname -forcegroup grpname # specify that all cifs user will use a forced unix group on Unix-style FS.
# this is for both read and write, so the mapping unix user need not be
# defined in this forcegroup in passwd or group map/file.
# the groupname is a string, not gid number, this name need to be resolvable
# from NIS, LDAP, or local group file.
cifs shares -change shrname -umask 002 # define umask to be used.
cifs access -delete wingrow Everyone
# by default, share is accessible to "everyone" (who is connected to the domain)
# above delete this default access
# Note that this is equiv to exports, not file level ACL
cifs access wingrow "authenticated users" "Full Control"
# make share usable by authenticated users only
cifs access it$ AD\Administrator "Full Control"
# make share "hidden" and only give access to admin
# (not sure if can use group "administrators")
cifs sessions ... # list current active cifs connections
options cifs.wins_servers
list which WINS server the machine is using
ifconfig wins # enable WINS on the given interface
ifconfig -wins # disable WINS on the given interface
# WINS registration only happens when "cifs enable" is run.
# re-registration means stopping and starting cifs service.
# enabling or disabling wins on an interface will NOT cause re-registration
etc/lslgroups.cfg # list local groups and membership SIDs
# don't manually edit, use windows tool to update it!
wcc WAFL cache control, often used to check windows-to-unix mapping
-u uid/uname uname may be a UNIX account name or a numeric UID
-s sid/ntname ntname may be an NT account name or a numeric SID
A SID has a long string for the domain; the last 4-5 digits identify the user.
All computers in the same domain will use the domain SID.
-x remove entries from WAFL cache
-a add entries
-d display stats
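eg (usernames/domain are made up):
wcc -u tinh # show how unix user tinh maps to NT
wcc -s MYDOM\tinh # show how NT user MYDOM\tinh maps to unix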
cDOT:
vserver name-mapping show
options wafl.default_nt_user username
# set what nt user will be mapped to unix by def (blank)
options wafl.default_unix_user username
# set what unix username will be used when mapped to NT (def = pcuser)
user mapping b/w nt and unix, where the user names are not the same.
It is stored in the (/) etc/usermap.cfg file.
NT account name unix account name
Optionally, can have <= and => for single direction mapping instead of default both way.
eg:
tileg\Administrator root
tileg\fgutierrez frankg
tileg\vmaddipati venkat
tileg\thand thand2
tileg\thand thand1
tileg\kbhagavath krishnan
*\eric => allen
ad\administrator <= sunbox:root
nt4dom\pcuser <= tinh
This mapping is done so that users gain full permission on their files under both envs.
A lot of the time, they got their nt account first, and thus end up with read-only access to their
home dir in windows, as they are mapped as a non-owner.
usermap.cfg does get read when a windows user writes to a unix-style FS.
Be careful when doing M-1 mapping. While this may allow many unix users to use the same NT account
to gain access to an NT-style FS as part of "everyone", the reverse access would be problematic.
eg:
hybridautoAD\tho sa
hybridautoAD\tho tho
While unix sa and tho map to the same user on windows, when Windows tho logs in and tries to write
to a UNIX-style FS, the permission will assume that of unix user sa, not tho!!
It may be possible to use <= and => to indicate the direction of mapping ??
(??) another map does the reverse of windows mapping back to NFS when fs is NFS and access is from windows.
(or was it the same file?). It was pretty stupid in that it needed all users to be explicitly mapped.
The NetApp Web Interface controls the share access (akin to exports);
the Windows Explorer file manager controls each file's ACL (akin to chmod on files).
Can use Windows Manager to manage the NetApp; general users can connect and browse.
The user list may not work too well.
CIFS Commands
cifs_setup # configure CIFS, require CIFS service to be restarted
# - register computer to windows domain controller
# - define WINS server
options cifs.wins_server # display which WINS server machine is using
# prior to OnTap 7.0.1, this is read only
cifs domaininfo # see DC info
cifs testdc # query DC to see if they are okay
cifs prefdc print # (display) which DC is used preferentially
WINS info from NetApp, login req:
http://now.netapp.com/Knowledgebase/solutionarea.asp?id=3.0.4321463.2683610
# etc/cifsconfig_setup.cfg
# generated by cifs_setup, command is used to start up CIFS at boot
# eg:
cifs setup -w 192.168.20.2 -w 192.168.30.2 -security unix -cp 437
# usermap.cfg
# one way mapping
*\lys => lks
NETAPP\administrator <= unixhost:root
# two way mapping
WINDOM\tinh tin
## these below are usually default, but sometime need to be explicitly set
## by some old NT DC config.
WINDOM\* == * # map all user of a specific domain
# *\* == * # map all user in all domains
Command
commands usable with 7-mode
Commands for NetApp CLI (logged in thru telnet/ssh/rsh)
? = help, cmd list
help cmd
dns info # display DNS domain,
# extracted from WINDOWS if not defined in resolve.conf
options dns.domainname # some /etc/rc script set domain here
sysconfig -v
sysconfig -a # display netapp hw system info, include serial number and product model number
sysconfig -c # check to ensure that there are no hardware misconfig problem, auto chk at boot
sysstat 1 # show stats on the server, refresh every 1 sec.
df -h
similar to unix df, -h for "human readable"
.snapshot should be subset of the actual volume
df -s report sis/dedup saving on a volume
ndmpd status
list active sessions
ndmpd killall
terminate all active ndmpd sessions.
Needed sometimes when the backup software is hung; kill the ndmpd session to free it.
useradmin useradd UID
add new user (to telnet in for admin work)
useradmin userlist
list all users
options # list run time options.
options KEY VALUE # set specific options
#eg, autosupport with email:
options autosupport.mailhost mailhost.myco.com,mailhost2.myco.com
# comma list of up to 5 host (tried till one work?)
options autosupport.support.transport smtp
options autosupport.support.to autosupport@netapp.com
options autosupport.to tin.ho@e-ville.com,bofh@e-ville.com
# Change who receives notification emails.
options autosupport.doit case_number_or_name
# Generate an autosupport email to NetApp (to predefined users).
# autosupport via web (but then local admins don't get email?)
options autosupport.support.transport https
options autosupport.support.proxy na-useh-proxy:2010
cdot::
system node autosupport show
system node autosupport invoke -node fs3-02 -type test -message "manual trigger fs3 netapp autosupport test"
system node autosupport history show
#find out about ntp config:
cat registry| grep timed
options.cf.timed.max_skew=
options.service.cf.timed.enable=off
options.service.timed.enable=on
options.timed.log=off
options.timed.max_skew=30m
options.timed.min_skew=10
options.timed.proto=ntp
options.timed.sched=hourly
options.timed.servers=time-server-name # time server to use
options.timed.window=0s
state.timed.cycles_per_msec=2384372
state.timed.extra_microseconds=-54
state.timed.version=1
rdfile read data file (raw format)
eg rdfile /etc/exports
inside telnet channel, will read the root etc/exports file to std out.
equiv to unix cat
wrfile write stdin to file
not edit, more like cat - > file kind of thing.
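eg, one way to use it from a unix admin host via ssh stdin (host/file names illustrative; wrfile replaces the whole file, so grab a copy first -- this is the classic rsh/ssh pattern, verify it on your OnTap version):
ssh netapp "rdfile /etc/exports" > exports.bak # save a copy first
ssh netapp "wrfile /etc/exports" < exports.new # stdin becomes the new file content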
FilerView
FilerView is the Web GUI. If SSL certificate is broken, then it may load up a blank page.
secureadmin status
secureadmin disable ssl
secureadmin setup -f ssl # follow prompt to setup new ssl cert
FilerView got dropped in OnTap 8.1 :(
Dropping it was ill advised!
It is replaced with OnCommand System Manager.
Yes, it can manage multiple filers.
But what happens when the sysadmin is not on a computer with the software installed?
FilerView provided a perfect quick and easy way to manage the filer.
No More!
And if you are on a mac, well, you are a user, not a sysadmin!
(or are you??!! Explain that to the netapp management that decided to remove FilerView! ^_^ ).
SSH
To allow root login to netapp w/o password, add root's id_rsa.pub to
vol/vol0/etc/sshd/root/.ssh/authorized_keys2
vol/vol0/etc/sshd/USERNAME/.ssh/authorized_keys2 # but not sure what to do with AD user, DOMAIN\\username didn't work
Beware of the security implications!
OnTap 8.1... sshd only accepts the 3des cipher. On the Linux side, can configure the
Ciphers section of ~/.ssh/config to add 3des.
Note that Cipher is the option for ssh (v1), and Ciphers is for ssh2.
3des is secure enough (des is not secure enough anymore), but it is slower than blowfish, etc,
so many linux distros do not enable 3des anymore.
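eg in ~/.ssh/config on the Linux client (host pattern is illustrative; the KexAlgorithms line matches the OnTap 9.x workaround shown next):
Host netapp*
    Ciphers 3des-cbc
    KexAlgorithms +diffie-hellman-group1-sha1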
OnTap 9.x
ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 admin@netapp
ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -o PubkeyAuthentication=no -l naroot netapp2
vFiler
Volumes are created on the physical (host) filer.
It is then added (assigned) to a virtual filer.
qtree are created by the vFiler.
exports of fs is done by the vFiler. Each vFiler has its own etc/exports for this purpose.
in cDOT 8.3, a vfiler is called an svm (storage virtual machine).
it encapsulates the volumes, all protocol access, and lifs; admin access is assigned per svm (so that netapp can do the multi-tenancy thing)
the hardware is really just there to host the SVMs. most everything else is done inside the SVM (using vfiler commands)
vol create vf1_vol0 # good practice to have vfiler name on the volume name
# SVM would have volume to itself, not visible by other SVM admin, so not needed in OnTap 9.x cluster mode (?)
vfiler add vfiler1 /vol/vf1_vol0 # this would be root vol, aka /etc
vfiler add vfiler1 /vol/vf1_vol1 # addional vol for data
vfiler run vfiler1 qtree create /vol/vf1_vol1/qt1 # qtree is created inside the vfiler context
# it is also possible to enable ssh into the vfiler and run this qtree on it without the special syntax
vfiler status # list all vfilers on system
vfiler status -a vfiler1 # get all info on vfiler, such as what vol is used as their root (vol0),
vfiler status -a # see what other vol is associated with which vfiler
vfiler run vfiler1 exportfs -p sec=sys,rw=client1 /vol/vf1_vol1/qt1 # export the qtree permanently
vfiler run vfiler1 exportfs -p sec=sys,rw=adminhost,root=adminhost,anon=0 /etc # export etc
#vfiler run vfiler1 exportfs -p sec=sys,rw=adminhost,root=adminhost,anon=0 /vol/vol0/etc # this is probably not true, or a pseudo path to above
exportfs -v /path/of/fs # permanently remove the entry from etc/exports (but live still have export live?)
# potentially can export to specific client mapping rootsquash to a specific uid number rather than nfsnobody
-anon=uid -clientmatch
don't really ssh to the vfiler, but ssh to the physical filer and run commands under a vfiler context.
ssh netapp
vfiler context vf1 # switch to the context of a given vfiler so subsequent commands run under this virtual filer
# has most commands, but not all. eg no vol create, no rdfile
vfiler context vfiler0 # switch back to the physical (hosting) filer.
CIFS shares are also created inside the vfiler, like NFS exports. the config is saved in the cifsconfig_share.cfg file in the etc folder.
vfiler export-policy show
Config Files
all stored in etc folder.
resolve.conf
nsswitch.conf
# etc/exports
/vol/unix02 -rw=192.168.1.0/24:172.27.1.5:www,root=www
/vol/unix02/dir1 -rw=10.10.10.0/8
# can export subdirs with separate permissions
# issue exportfs -a to reread file
Logs
(/) etc/messages.* unix syslog style logs. can configure to use remote syslog host.
(/) etc/log/auditlog
logs all filer-level commands, not changes done on the FS.
The "root" of vol 0,1,etc in the netapp can be choose as the netapp root and store the /etc directory,
where all the config files are saved. eg.
/mnt/nar_200_vol0/etc
/mnt/na4_vol1/etc
other commands that need to be issued are done via telnet/rsh/ssh to the netapp box.
Howto
Create new vol, qtree, and make access for CIFS
vol create win01 ...
qtree create /vol/win01/wingrow
qtree security /vol/win01/wingrow ntfs
qtree oplocks /vol/win01/wingrow enable
cifs shares -add wingrow /vol/win01/wingrow -comment "Windows share growing"
#-cifs access wingrow ad\tinh "Full Control" # share level control is usually redundant
cifs access -delete wingrow Everyone
cifs access wingrow "authenticated users" "Full Control"
# still need to go to the folder and set file/folder permission,
# added the corresponding department (MMC share permission, type in am\Dept-S
# then alt+k to complete the list (ie, checK names).
# also removed inherit-from-parent, so took out full access to everyone.
Network Interface Config
vif = virtual interface, eg use: create an etherchannel /
link aggregation (netapp typically calls it trunking, cisco calls it EtherChannel).
single mode = HA fail over, only a single link active at a time.
multi mode = Performance, multiple links active at the same time. Req switch support.
Only good when multiple hosts access the filer. The switch does the
traffic direction (per host).
Many filers come with 4 built-in ethernet ports, so one can do:
2 pairs of multi mode (e0a+e0b, e0c+e0d),
then single mode on top of the above pairs to get HA; the filer will always have 2 links
active at the same time. (see the sketch below)
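A hedged sketch of that 4-port layout in 7-mode (vif names and IP are made up; newer 7-mode releases use ifgrp instead of vif, so verify syntax against your OnTap version):
vif create multi vif1 e0a e0b # first multi-mode (etherchannel) pair
vif create multi vif2 e0c e0d # second multi-mode pair
vif create single svif0 vif1 vif2 # single-mode vif on top, for HA between the two pairs
ifconfig svif0 192.168.1.10 netmask 255.255.255.0 up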
pktt start all -d /etc # packets tracing, like tcpdump
pktt stop all
# trace all interfaces, put them in the /etc dir,
# one file per interface.
# files can be read by ethereal/wireshark
Backup and Restore, Disaster Recovery
NetApp supports dump/restore commands, a la Solaris format. Thus, the archive created can even be read by the Solaris ufsrestore command.
NetApp championed NDMP, and it is fast. But it backs up a whole volume as a unit, and restore has to be done as a whole unit. This may not be convenient.
vol copy is fast; it will also copy all the snapshots associated with the volume.
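A hedged sketch of the two approaches (filer/volume names are made up; verify against your OnTap release):
ndmpcopy /vol/vol1 newfiler:/vol/vol1 # NDMP-based copy of a path to the new filer
vol copy start -S vol1 newfiler:vol1 # block-level volume copy, -S includes snapshots; destination vol must exist and be restricted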
Disk sanitization
Before decommissioning a filer, it is best to remove the volumes, and maybe zero or sanitize the disks.
Sanitization is more secure than simply zeroing the disks.
For sanitizing, can choose the number of passes with -c N, max 7, def 3.
Sanitization happens in the background, in parallel for each disk. But a 4 TB SAS drive can still take hours for a single pass.
ref: https://www.pickysysadmin.ca/2012/04/19/how-to-securely-erase-your-data-on-a-netapp/
vol offline ...
vol destroy ...
aggr offline ...
aggr destroy ...
aggregate status -v
disk status
options licensed_feature.disk_sanitization.enable on
#no longer need to get lic from netapp sales. but still get warning about some side effects, etc
# get a list of disk, exclude disk that are still in use
disk sanitize start -p 0x666 -c 1 0a.00.1 0a.00.19 ... DISK_LIST # single pass, choosing my own pattern
# can carry out for a single disk, but prompt warning, so best to do a list
# it need to list each disk individually, it does not take range(s)
disk sanitize start -c 7 0a.00.5 0b.00.22 ... DISK_LIST # 7 pass as req by DoD
disk sanitize status
disk status
disk sanitize release DISK_LIST # restore disk from FAILED to spare
DFM
Data Fabric Manager, now known as ...
typically https://dfm:443/
Meant to manage multiple filers in one place, but seems to just collect stats.
Kinda slow at times.
And to get volume config, still have to use FilerView,
so it's not a one-stop thing. ==> limited use.
Links/Ref
- RAID_DP
- Pocket Guide Clustered Mode from sysadmin tutorials
- Pocket Guide for NetApp 7-mode
- OnTap 9 Doc Center
History
- ca 2000 = Alpha Chip
OnTap 5.3 -
OnTap 6.1 - 2003? Intel based ?
OnTap 6.5 - 2004? RAID_DP Dual Parity introduced here.
OnTap 7.0 - 2005? Aggregate introduced here.
OnTap 7.3.1 - 2008? DeDuplication (a/sis) single instance storage available.
OnTap 8.1.4 - Last version with NFSv2 support
OnTap 8.2 - Dropped support for NFSv2. Sorry Epic VxWorks.
OnTap 9 - 2017? Introduced SVM - Storage VM - segregated Data vs HW access for multi-tenancy support. 7-mode support has been dropped
OnTap 8.3 - ?? backport? cDOT mode avail, is it using Storage VM or vfiler??