Note: File does not exist in v6.13.7.
dm-switch
=========

The device-mapper switch target creates a device that supports an
arbitrary mapping of fixed-size regions of I/O across a fixed set of
paths.  The path used for any specific region can be switched
dynamically by sending the target a message.

It maps I/O to underlying block devices efficiently when there is a large
number of fixed-sized address regions but there is no simple pattern
that would allow for a compact representation of the mapping such as
dm-stripe.

Background
----------

Dell EqualLogic and some other iSCSI storage arrays use a distributed
frameless architecture.  In this architecture, the storage group
consists of a number of distinct storage arrays ("members"), each having
independent controllers, disk storage and network adapters.  When a LUN
is created it is spread across multiple members.  The details of the
spreading are hidden from initiators connected to this storage system.
The storage group exposes a single target discovery portal, no matter
how many members are being used.  When iSCSI sessions are created, each
session is connected to an Ethernet port on a single member.  Data to a
LUN can be sent on any iSCSI session, and if the blocks being accessed
are stored on another member the I/O will be forwarded as required.  This
forwarding is invisible to the initiator.  The storage layout is also
dynamic, and the blocks stored on disk may be moved from member to
member as needed to balance the load.

This architecture simplifies the management and configuration of both
the storage group and initiators.  In a multipathing configuration, it
is possible to set up multiple iSCSI sessions to use multiple network
interfaces on both the host and target to take advantage of the
increased network bandwidth.  An initiator could use a simple round
robin algorithm to send I/O across all paths and let the storage array
members forward it as necessary, but there is a performance advantage to
sending data directly to the correct member.

A device-mapper table already lets you map different regions of a
device onto different targets.  However, in this architecture the LUN is
spread with an address region size on the order of tens of MBs, which
means the resulting table could have more than a million entries and
consume far too much memory.
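
To give a feel for the scale, the following sketch counts how many
regions (and therefore how many table entries) a one-target-per-region
table would need.  The LUN size (15 TiB) and region size (15 MiB) are
hypothetical values picked only to match the "tens of MBs" order of
magnitude; they do not describe any particular array.

```python
def table_entries(lun_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to cover the LUN (the last may be partial)."""
    return -(-lun_bytes // region_bytes)  # ceiling division


MiB = 1024 ** 2
TiB = 1024 ** 4

# Hypothetical sizes, for illustration only.
entries = table_entries(15 * TiB, 15 * MiB)
print(entries)
```

With these assumed sizes the count already exceeds one million, and each
entry would be a full device-mapper table line.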

Using this device-mapper switch target we can now build a two-layer
device hierarchy:

    Upper Tier - Determine which array member the I/O should be sent to.
    Lower Tier - Load balance amongst paths to a particular member.

The lower tier consists of a single dm multipath device for each member.
Each of these multipath devices contains the set of paths directly to
the array member in one priority group, and leverages existing path
selectors to load balance amongst these paths.  We also build a
non-preferred priority group containing paths to other array members for
failover reasons.

The upper tier consists of a single dm-switch device.  This device uses
a bitmap to look up the location of the I/O and choose the appropriate
lower tier device to route the I/O.  By using a bitmap we are able to
use 4 bits for each address range in a 16 member group (which is very
large for us).  This is a much denser representation than the dm table
b-tree can achieve.
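
The density argument can be sketched as follows.  This is a minimal
illustration of a packed table with one fixed-width path number per
region; it is not the kernel's actual data layout.  The class and
function names are hypothetical, and the region count reuses an assumed
figure of 1048576 regions.

```python
def bits_per_region(num_paths: int) -> int:
    """Smallest field width that can hold path numbers 0..num_paths-1."""
    return max(1, (num_paths - 1).bit_length())


class PackedRegionTable:
    """Illustrative packed table: one small path number per region."""

    def __init__(self, num_regions: int, num_paths: int) -> None:
        self.num_regions = num_regions
        self.num_paths = num_paths
        self.bits = bits_per_region(num_paths)
        self._mask = (1 << self.bits) - 1
        self._bitvec = 0  # in this sketch every region starts on path 0

    @property
    def size_bytes(self) -> int:
        """Bytes needed to store all fields back to back."""
        return (self.num_regions * self.bits + 7) // 8

    def set(self, region: int, path_nr: int) -> None:
        if not 0 <= region < self.num_regions:
            raise IndexError("region out of range")
        if not 0 <= path_nr < self.num_paths:
            raise ValueError("path number out of range")
        shift = region * self.bits
        # Clear the old field, then store the new path number.
        self._bitvec = (self._bitvec & ~(self._mask << shift)) | (path_nr << shift)

    def get(self, region: int) -> int:
        if not 0 <= region < self.num_regions:
            raise IndexError("region out of range")
        return (self._bitvec >> (region * self.bits)) & self._mask


# 16 members need 4 bits per region, as stated above.
table = PackedRegionTable(num_regions=1048576, num_paths=16)
print(table.bits, table.size_bytes)
```

Under these assumptions a million regions fit in 512 KiB, which is the
kind of saving the packed representation is meant to deliver.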

Construction Parameters
=======================

    <num_paths> <region_size> <num_optional_args> [<optional_args>...]
    [<dev_path> <offset>]+

<num_paths>
    The number of paths across which to distribute the I/O.

<region_size>
    The number of 512-byte sectors in a region. Each region can be redirected
    to any of the available paths.

<num_optional_args>
    The number of optional arguments. Currently, no optional arguments
    are supported and so this must be zero.

<dev_path>
    The block device that represents a specific path to the device.

<offset>
    The offset of the start of data on the specific <dev_path> (in units
    of 512-byte sectors). This number is added to the sector number when
    forwarding the request to the specific path. Typically it is zero.
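
As a rough sketch of how these parameters fit together, the helpers
below assemble a table line in the order given above and show how a
sector would be routed: the region is the sector number divided by
<region_size>, and <offset> is added when forwarding.  The function
names, the device length and the region-to-path list are hypothetical;
the leading "0 <length> switch" fields follow the usual dmsetup table
layout.

```python
from typing import List, Tuple

Path = Tuple[str, int]  # (dev_path, offset in 512-byte sectors)


def switch_table_line(length_sectors: int, region_size: int,
                      paths: List[Path]) -> str:
    """Build a dmsetup table line for a switch target starting at sector 0."""
    fields = ["0", str(length_sectors), "switch",
              str(len(paths)), str(region_size), "0"]  # no optional args
    for dev_path, offset in paths:
        fields += [dev_path, str(offset)]
    return " ".join(fields)


def route(sector: int, region_size: int, region_to_path: List[int],
          paths: List[Path]) -> Tuple[str, int]:
    """Return (dev_path, forwarded sector) for an I/O at the given sector."""
    region = sector // region_size
    dev_path, offset = paths[region_to_path[region]]
    return dev_path, sector + offset


paths = [("/dev/vg1/switch0", 0), ("/dev/vg1/switch1", 0),
         ("/dev/vg1/switch2", 0)]
# 2097152 sectors is an assumed device length, for illustration only.
print(switch_table_line(2097152, 128, paths))
print(route(300, 128, [0, 1, 2], paths))
```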

Messages
========

set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...

Modify the region table by specifying which regions are redirected to
which paths.

<index>
    The region number (region size was specified in constructor parameters).
    If index is omitted, the next region (previous index + 1) is used.
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

<path_nr>
    The path number in the range 0 ... (<num_paths> - 1).
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

R<n>,<m>
    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
    slots.
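
The argument syntax above can be modelled with a short user-space
sketch that expands a list of arguments into explicit region-to-path
assignments.  This is an illustration of the documented semantics, not
the kernel's parser: the function names are hypothetical, the region
table is a plain dict, and the sketch requires the first mapping to
carry an explicit index.

```python
import string
from typing import Dict, List, Optional


def parse_hex(text: str) -> int:
    """Parse unprefixed hexadecimal, rejecting forms such as '0x10'."""
    if not text or any(c not in string.hexdigits for c in text):
        raise ValueError("expected unprefixed hex, got %r" % text)
    return int(text, 16)


def expand_region_mappings(args: List[str],
                           table: Optional[Dict[int, int]] = None
                           ) -> Dict[int, int]:
    """Apply set_region_mappings arguments to a region -> path dict."""
    result = dict(table or {})
    index: Optional[int] = None
    for arg in args:
        if arg.startswith("R"):
            n_text, m_text = arg[1:].split(",")
            n, m = parse_hex(n_text), parse_hex(m_text)
            if index is None:
                raise ValueError("R<n>,<m> needs earlier mappings to repeat")
            for _ in range(m):
                index += 1
                # Copy the mapping found n slots back (KeyError if unknown).
                result[index] = result[index - n]
            continue
        index_text, path_text = arg.split(":")
        if index_text:
            index = parse_hex(index_text)
        elif index is None:
            raise ValueError("this sketch needs an explicit first index")
        else:
            index += 1  # omitted index: previous index + 1
        result[index] = parse_hex(path_text)
    return result


print(expand_region_mappings(["0:0", ":1", ":2", "R3,3"]))
```

Validation against <num_paths> and the number of regions is left out
here to keep the sketch short.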

Status
======

No status line is reported.

Example
=======

Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
the same size.

Create a switch device with a 64 KiB region size (128 sectors of 512 bytes):
    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"

Set mappings for the first 7 entries to point to devices switch0, switch1,
switch2, switch0, switch1, switch2, switch1:
    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1

Set repetitive mapping. This command:
    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
is equivalent to:
    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
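
For long repetitive layouts it can be convenient to generate the
message arguments instead of typing them.  The sketch below builds the
arguments for a plain round-robin layout starting at region 0, using
R<n>,<m> for everything after the first cycle.  The function name is
hypothetical, and round robin is only an example pattern; a real
layout would come from the storage array.

```python
from typing import List


def round_robin_args(num_regions: int, num_paths: int) -> List[str]:
    """Arguments mapping regions 0..num_regions-1 to paths 0,1,2,... cyclically."""
    if num_regions < 1 or num_paths < 1:
        raise ValueError("need at least one region and one path")
    first = min(num_regions, num_paths)
    # Spell out the first cycle explicitly, in unprefixed hexadecimal.
    args = ["0:0"] + [":%x" % path_nr for path_nr in range(1, first)]
    remaining = num_regions - first
    if remaining > 0:
        # Repeat the last num_paths mappings over the remaining slots.
        args.append("R%x,%x" % (num_paths, remaining))
    return args


print(" ".join(round_robin_args(19, 3)))
```

The printed string can be appended to
"dmsetup message switch 0 set_region_mappings".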