v6.13.7
.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
	1) Overview
	2) Features
	3) Setting mount states
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

Shared subtrees also provide the necessary building blocks for features like
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

	a. shared mount
	b. slave mount
	c. private mount
	d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as needed, and all
the replicas continue to be exactly the same.

	Here is an example:

	Let's say /mnt has a mount that is shared::

	    mount --make-shared /mnt

	Note: the mount(8) command now supports the --make-shared flag,
	so the sample 'smount' program is no longer needed and has been
	removed.

	::

	    # mount --bind /mnt /tmp

	The above command replicates the mount at /mnt to the mountpoint /tmp
	and the contents of both the mounts remain identical.

	::

	    # ls /mnt
	    a b c

	    # ls /tmp
	    a b c

	Now let's say we mount a device at /tmp/a::

	    # mount /dev/sd0  /tmp/a

	    # ls /tmp/a
	    t1 t2 t3

	    # ls /mnt/a
	    t1 t2 t3

	Note that the mount has propagated to the mount at /mnt as well.

	And the same is true even when /dev/sd0 is mounted on /mnt/a. The
	contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
	only propagate towards it.

	All slave mounts have a master mount which is shared.

	Here is an example:

	Let's say /mnt has a mount which is shared::

	    # mount --make-shared /mnt

	Let's bind mount /mnt to /tmp::

	    # mount --bind /mnt /tmp

	The new mount at /tmp becomes a shared mount and it is a replica of
	the mount at /mnt.

	Now let's make the mount at /tmp a slave of /mnt::

	    # mount --make-slave /tmp

	Let's mount /dev/sd0 on /mnt/a::

	    # mount /dev/sd0 /mnt/a

	    # ls /mnt/a
	    t1 t2 t3

	    # ls /tmp/a
	    t1 t2 t3

	Note the mount event has propagated to the mount at /tmp.

	However, let's see what happens if we mount something on the mount at
	/tmp::

	    # mount /dev/sd1 /tmp/b

	    # ls /tmp/b
	    s1 s2 s3

	    # ls /mnt/b

	Note how the mount event has not propagated to the mount at
	/mnt.


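The one-hop propagation rules of 2a and 2b can be captured in a few lines of C. This is a hypothetical user-space model, not kernel code: struct model_mount, its peer_group/master_group fields and event_reaches() are names invented for this sketch, a group id of 0 stands for "none", and multi-level propagation (slaves of slaves) is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one mount's propagation settings.
 * peer_group:   id of the peer group this mount belongs to (0 = not shared)
 * master_group: id of the peer group it is a slave of      (0 = no master) */
struct model_mount {
	const char *path;
	int peer_group;
	int master_group;
};

/* Returns true if a mount/umount event on 'src' reaches 'dst' directly.
 * Only one hop is modelled: peers receive each other's events, and a
 * slave receives events from the peer group it is slaved to, never the
 * other way round. */
bool event_reaches(const struct model_mount *src, const struct model_mount *dst)
{
	if (src == dst)
		return true;
	if (src->peer_group == 0)
		return false;			/* a non-shared mount forwards nothing */
	if (src->peer_group == dst->peer_group)
		return true;			/* shared: peers see each other's events */
	return dst->master_group == src->peer_group;	/* master -> slave only */
}
```

With /mnt in peer group 1 and /tmp made a slave of that group (peer_group 0, master_group 1), event_reaches(&mnt, &tmp) is true while event_reaches(&tmp, &mnt) is false, matching the /mnt/a and /tmp/b results above.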
2c) A private mount does not forward or receive propagation.

	This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is an unbindable private mount.

	Let's say we have a mount at /mnt and we make it unbindable::

	    # mount --make-unbindable /mnt

	Let's try to bind mount this mount somewhere else::

	    # mount --bind /mnt /tmp
	    mount: wrong fs type, bad option, bad superblock on /mnt,
		    or too many mounted file systems

	Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

	The mount command (util-linux package) can be used to set mount
	states::

	    mount --make-shared mountpoint
	    mount --make-slave mountpoint
	    mount --make-private mountpoint
	    mount --make-unbindable mountpoint


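Programmatically, the same state changes correspond to propagation flags of the mount(2) system call: MS_SHARED, MS_SLAVE, MS_PRIVATE and MS_UNBINDABLE, combined with MS_REC for the recursive --make-r* variants. The sketch below assumes glibc's <sys/mount.h>; propagation_flags() and set_propagation() are hypothetical helper names, and actually changing a mount's state requires CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: map a util-linux style option to mount(2) flags.
 * Returns 0 for an unknown option. */
unsigned long propagation_flags(const char *option)
{
	static const struct {
		const char *name;
		unsigned long flags;
	} table[] = {
		{ "--make-shared",      MS_SHARED },
		{ "--make-slave",       MS_SLAVE },
		{ "--make-private",     MS_PRIVATE },
		{ "--make-unbindable",  MS_UNBINDABLE },
		{ "--make-rshared",     MS_SHARED | MS_REC },
		{ "--make-rslave",      MS_SLAVE | MS_REC },
		{ "--make-rprivate",    MS_PRIVATE | MS_REC },
		{ "--make-runbindable", MS_UNBINDABLE | MS_REC },
	};

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(option, table[i].name) == 0)
			return table[i].flags;
	return 0;
}

/* Change the propagation state of 'mountpoint'. For a propagation change,
 * mount(2) ignores the source, filesystem type and data arguments.
 * Returns 0 on success, or -1 with errno set. */
int set_propagation(const char *mountpoint, const char *option)
{
	unsigned long flags = propagation_flags(option);

	if (flags == 0) {
		errno = EINVAL;
		return -1;
	}
	return mount(NULL, mountpoint, NULL, flags, NULL);
}
```

For example, set_propagation("/mnt", "--make-shared") issues the equivalent of mount(NULL, "/mnt", NULL, MS_SHARED, NULL); it only succeeds for a privileged caller and a path that is a mount point.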
4) Use cases
------------

	A) A process wants to clone its own namespace, but still wants to
	   access the CD that got mounted recently.

	   Solution:

		The system administrator can make the mount at /cdrom shared::

		    mount --bind /cdrom /cdrom
		    mount --make-shared /cdrom

		Now any process that clones off a new namespace will have a
		mount at /cdrom which is a replica of the same mount in the
		parent namespace.

		So when a CD is inserted and mounted at /cdrom, that mount gets
		propagated to the other mount at /cdrom in all the other cloned
		namespaces.

	B) A process wants its mounts invisible to any other process, but
	   still wants to be able to see the other system mounts.

	   Solution:

		To begin with, the administrator can mark the entire mount tree
		as shareable::

		    mount --make-rshared /

		A new process can clone off a new namespace and mark some part
		of its namespace as slave::

		    mount --make-rslave /myprivatetree

		Henceforth, any mounts within /myprivatetree done by the
		process will not show up in any other namespace. However, mounts
		done in the parent namespace under /myprivatetree still show
		up in the process's namespace.


	Apart from the above semantics, this feature provides the
	building blocks to solve the following problems:

	C)  Per-user namespace

		The above semantics allow a way to share mounts across
		namespaces.  But namespaces are associated with processes. If
		namespaces are made first-class objects with a user API to
		associate/disassociate a namespace with a userid, then each user
		could have his/her own namespace and tailor it to his/her
		requirements. This needs to be supported in PAM.

	D)  Versioned files

		If the entire mount tree is visible at multiple locations, then
		an underlying versioning file system can return different
		versions of the file depending on the path used to access that
		file.

		An example is::

		    mount --make-shared /
		    mount --rbind / /view/v1
		    mount --rbind / /view/v2
		    mount --rbind / /view/v3
		    mount --rbind / /view/v4

		and if /usr has a versioning filesystem mounted, then that
		mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
		/view/v4/usr too.

		A user can request the v3 version of the file /usr/fs/namespace.c
		by accessing /view/v3/usr/fs/namespace.c. The underlying
		versioning filesystem can then decipher that the v3 version of
		the filesystem is being requested and return the corresponding
		inode.

5) Detailed semantics
---------------------
	The section below explains the detailed semantics of
	bind, rbind, move, mount, umount and clone-namespace operations.

	Note: the words 'vfsmount' and 'mount' are used to mean the same
	thing throughout this document.

5a) Mount states

	A given mount can be in one of the following states:

	1) shared
	2) slave
	3) shared and slave
	4) private
	5) unbindable

	A 'propagation event' is defined as an event generated on a vfsmount
	that leads to mount or unmount actions in other vfsmounts.

	A 'peer group' is defined as a group of vfsmounts that propagate
	events to each other.

	(1) Shared mounts

		A 'shared mount' is defined as a vfsmount that belongs to a
		'peer group'.

		For example::

			mount --make-shared /mnt
			mount --bind /mnt /tmp

		The mount at /mnt and that at /tmp are both shared and belong
		to the same peer group. Anything mounted or unmounted under
		/mnt or /tmp is reflected in all the other mounts of its peer
		group.


	(2) Slave mounts

		A 'slave mount' is defined as a vfsmount that receives
		propagation events and does not forward propagation events.

		A slave mount, as the name implies, has a master mount from
		which mount/unmount events are received. Events do not propagate
		from the slave mount to the master.  Only a shared mount can be
		made a slave by executing the following command::

			mount --make-slave mount

		A shared mount that is made a slave is no longer shared unless
		modified to become shared.

	(3) Shared and Slave

		A vfsmount can be both shared and slave.  This state
		indicates that the mount is a slave of some vfsmount, and
		has its own peer group too.  This vfsmount receives propagation
		events from its master vfsmount, and also forwards propagation
		events to its 'peer group' and to its slave vfsmounts.

		Strictly speaking, the vfsmount is shared, having its own
		peer group, and this peer group is a slave of some other
		peer group.

		Only a slave vfsmount can be made 'shared and slave', either by
		executing the following command::

			mount --make-shared mount

		or by moving the slave vfsmount under a shared vfsmount.

	(4) Private mount

		A 'private mount' is defined as a vfsmount that does not
		receive or forward any propagation events.

	(5) Unbindable mount

		An 'unbindable mount' is defined as a vfsmount that does not
		receive or forward any propagation events and cannot
		be bind mounted.


	State diagram:

	The state diagram below explains the state transition of a mount
	in response to various commands::

	    -------------------------------------------------------------------------------
	    |             | make-shared | make-slave     | make-private | make-unbindable |
	    |-------------|-------------|----------------|--------------|-----------------|
	    | shared      | shared      | *slave/private | private      | unbindable      |
	    |-------------|-------------|----------------|--------------|-----------------|
	    | slave       | shared      | **slave        | private      | unbindable      |
	    |             | and slave   |                |              |                 |
	    |-------------|-------------|----------------|--------------|-----------------|
	    | shared      | shared      | slave          | private      | unbindable      |
	    | and slave   | and slave   |                |              |                 |
	    |-------------|-------------|----------------|--------------|-----------------|
	    | private     | shared      | **private      | private      | unbindable      |
	    |-------------|-------------|----------------|--------------|-----------------|
	    | unbindable  | shared      | **unbindable   | private      | unbindable      |
	    -------------------------------------------------------------------------------

	    * If the shared mount is the only mount in its peer group, making it
	      a slave makes it private automatically, since there is no master
	      to which it can be slaved.

	    ** Slaving a non-shared mount has no effect on the mount.

	Apart from the commands listed above, the 'move' operation also changes
	the state of a mount depending on the type of the destination mount. It
	is explained in section 5d.

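The state diagram can be encoded directly as a transition function. The enum and function names below are made up for illustration; has_peers stands in for the '*' footnote, the only entry that depends on more than the current state and the command.

```c
#include <stdbool.h>

/* Hypothetical encoding of the states and commands in the table above. */
enum mnt_state { ST_SHARED, ST_SLAVE, ST_SHARED_SLAVE, ST_PRIVATE, ST_UNBINDABLE };
enum mnt_cmd { MAKE_SHARED, MAKE_SLAVE, MAKE_PRIVATE, MAKE_UNBINDABLE };

/* Returns the state after applying 'cmd' to a mount in state 's'.
 * 'has_peers' matters only for the '*' entry: a shared mount that is the
 * only member of its peer group has no master to slave to, so it becomes
 * private instead. */
enum mnt_state next_state(enum mnt_state s, enum mnt_cmd cmd, bool has_peers)
{
	switch (cmd) {
	case MAKE_SHARED:
		/* A slave (or shared-and-slave) mount keeps its master. */
		return (s == ST_SLAVE || s == ST_SHARED_SLAVE) ? ST_SHARED_SLAVE
							       : ST_SHARED;
	case MAKE_SLAVE:
		if (s == ST_SHARED)
			return has_peers ? ST_SLAVE : ST_PRIVATE;	/* '*' entry */
		if (s == ST_SHARED_SLAVE)
			return ST_SLAVE;
		return s;	/* '**' entries: no effect on non-shared mounts */
	case MAKE_PRIVATE:
		return ST_PRIVATE;
	case MAKE_UNBINDABLE:
		return ST_UNBINDABLE;
	}
	return s;
}
```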
5b) Bind semantics

	Consider the following command::

	    mount --bind A/a  B/b

	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
	is the destination mount and 'b' is the dentry in the destination mount.

	The outcome depends on the type of mount of 'A' and 'B'. The table
	below contains a quick reference::

	    ----------------------------------------------------------------------------
	    |                           BIND MOUNT OPERATION                           |
	    |**************************************************************************|
	    | source(A)->| shared       | private        | slave          | unbindable |
	    | dest(B)    |              |                |                |            |
	    |   |        |              |                |                |            |
	    |   v        |              |                |                |            |
	    |**************************************************************************|
	    | shared     | shared       | shared         | shared & slave | invalid    |
	    |            |              |                |                |            |
	    | non-shared | shared       | private        | slave          | invalid    |
	    ----------------------------------------------------------------------------

	Details:

	1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
	   which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	   mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2',
	   'C3' ... are created and mounted at the dentry 'b' on all mounts
	   where 'B' propagates to. A new propagation tree containing 'C1', ..,
	   'Cn' is created. This propagation tree is identical to the
	   propagation tree of 'B'. And finally, the peer group of 'C' is
	   merged with the peer group of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
	   which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	   mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2',
	   'C3' ... are created and mounted at the dentry 'b' on all mounts
	   where 'B' propagates to. A new propagation tree is set up containing
	   all the new mounts 'C', 'C1', .., 'Cn' with exactly the same
	   configuration as the propagation tree for 'B'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
	   mount 'C', which is a clone of 'A', is created. Its root dentry is
	   'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also, new mounts
	   'C1', 'C2', 'C3' ... are created and mounted at the dentry 'b' on all
	   mounts where 'B' propagates to. A new propagation tree containing
	   the new mounts 'C', 'C1', .., 'Cn' is created. This propagation tree
	   is identical to the propagation tree for 'B'. And finally, the mount
	   'C' and its peer group are made slaves of mount 'Z'. In other words,
	   mount 'C' is in the state 'slave and shared'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
	   invalid operation.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	   unbindable) mount. A new mount 'C', which is a clone of 'A', is
	   created. Its root dentry is 'a'. 'C' is mounted on mount 'B' at
	   dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C',
	   which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	   mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
	   peer group of 'A'.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
	   new mount 'C', which is a clone of 'A', is created. Its root dentry
	   is 'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also, 'C' is set
	   as a slave mount of 'Z'. In other words, 'A' and 'C' are both slave
	   mounts of 'Z'. All mount/unmount events on 'Z' propagate to 'A' and
	   'C'. But mount/unmount events on 'A' do not propagate anywhere else.
	   Similarly, mount/unmount events on 'C' do not propagate anywhere
	   else.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
	   invalid operation. An unbindable mount cannot be bind mounted.

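The bind quick-reference table reduces to a small function of the source type and whether the destination is shared. The names are hypothetical; the comments map each branch back to the numbered cases above.

```c
#include <stdbool.h>

/* Hypothetical encoding of the bind-mount table in 5b. */
enum mnt_type { T_SHARED, T_PRIVATE, T_SLAVE, T_UNBINDABLE };
enum bind_result { R_SHARED, R_PRIVATE, R_SLAVE, R_SHARED_AND_SLAVE, R_INVALID };

/* State of the new mount 'C' created by "mount --bind A/a B/b", given the
 * type of the source mount 'A' and whether the destination 'B' is shared. */
enum bind_result bind_outcome(enum mnt_type src, bool dest_shared)
{
	switch (src) {
	case T_UNBINDABLE:
		return R_INVALID;				/* cases 4 and 8 */
	case T_SHARED:
		return R_SHARED;				/* cases 1 and 6 */
	case T_PRIVATE:
		return dest_shared ? R_SHARED : R_PRIVATE;	/* cases 2 and 5 */
	case T_SLAVE:
		return dest_shared ? R_SHARED_AND_SLAVE : R_SLAVE; /* cases 3 and 7 */
	}
	return R_INVALID;
}
```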
5c) Rbind semantics

	rbind is the same as bind, except that bind replicates only the
	specified mount, while rbind replicates all the mounts in the tree
	belonging to the specified mount. An rbind mount is a bind mount
	applied to all the mounts in the tree.

	If the source tree of an rbind contains some unbindable mounts,
	then each unbindable mount, together with the subtree under it, is
	pruned in the new location.

	For example:

	  Let's say we have the following mount tree::

		A
	      /   \
	      B   C
	     / \ / \
	     D E F G

	  Let's say all the mounts in the tree except C are of a type other
	  than unbindable, and C is unbindable.

	  If this tree is rbound to, say, Z, we will have the following tree
	  at the new location::

		Z
		|
		A'
	       /
	      B'        Note how the tree under C is pruned
	     / \        in the new location.
	    D' E'



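The pruning rule can be sketched as a recursive copy that skips an unbindable mount together with everything below it. struct tnode (limited to two children, enough for the example tree), rbind_copy() and count_mounts() are invented for this illustration; real mount trees have an arbitrary number of children, and this sketch simply aborts on allocation failure.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mount-tree node; not the kernel's mount structure. */
struct tnode {
	char name;
	bool unbindable;
	struct tnode *child[2];	/* at most two children in this sketch */
};

/* Copy 'src' the way the text describes rbind: an unbindable mount is
 * skipped together with everything below it. Returns NULL for a pruned
 * (or empty) subtree. */
struct tnode *rbind_copy(const struct tnode *src)
{
	if (src == NULL || src->unbindable)
		return NULL;

	struct tnode *copy = malloc(sizeof(*copy));
	if (copy == NULL)
		abort();	/* keep the sketch simple: no error recovery */

	copy->name = src->name;
	copy->unbindable = false;
	for (int i = 0; i < 2; i++)
		copy->child[i] = rbind_copy(src->child[i]);
	return copy;
}

/* Number of mounts in a tree. */
int count_mounts(const struct tnode *t)
{
	if (t == NULL)
		return 0;
	return 1 + count_mounts(t->child[0]) + count_mounts(t->child[1]);
}
```

Building the example tree with C marked unbindable, count_mounts() reports 7 mounts for the original and 4 (A', B', D', E') for the copy.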
5d) Move semantics

	Consider the following command::

	    mount --move A  B/b

	where 'A' is the source mount, 'B' is the destination mount and 'b' is
	the dentry in the destination mount.

	The outcome depends on the type of the mount of 'A' and 'B'. The table
	below is a quick reference::

	    ----------------------------------------------------------------------------
	    |                           MOVE MOUNT OPERATION                           |
	    |**************************************************************************|
	    | source(A)->| shared       | private        | slave          | unbindable |
	    | dest(B)    |              |                |                |            |
	    |   |        |              |                |                |            |
	    |   v        |              |                |                |            |
	    |**************************************************************************|
	    | shared     | shared       | shared         | shared & slave | invalid    |
	    |            |              |                |                |            |
	    | non-shared | shared       | private        | slave          | unbindable |
	    ----------------------------------------------------------------------------

.. Note:: Moving a mount residing under a shared mount is invalid.

	Details follow:

	1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
	   mounted on mount 'B' at dentry 'b'.  Also, new mounts 'A1', 'A2' ...
	   'An' are created and mounted at dentry 'b' on all mounts that receive
	   propagation from mount 'B'. A new propagation tree is created in the
	   exact same configuration as that of 'B'. This new propagation tree
	   contains all the new mounts 'A1', 'A2' ... 'An'.  And this new
	   propagation tree is appended to the already existing propagation tree
	   of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
	   mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1', 'A2' ...
	   'An' are created and mounted at dentry 'b' on all mounts that receive
	   propagation from mount 'B'. The mount 'A' becomes a shared mount and
	   a propagation tree is created which is identical to that of
	   'B'. This new propagation tree contains all the new mounts 'A1',
	   'A2' ... 'An'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
	   mount 'A' is mounted on mount 'B' at dentry 'b'.  Also, new mounts
	   'A1', 'A2' ... 'An' are created and mounted at dentry 'b' on all
	   mounts that receive propagation from mount 'B'. A new propagation
	   tree is created in the exact same configuration as that of 'B'. This
	   new propagation tree contains all the new mounts 'A1', 'A2' ... 'An'.
	   And this new propagation tree is appended to the already existing
	   propagation tree of 'A'.  Mount 'A' continues to be the slave mount
	   of 'Z' but it also becomes 'shared'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
	   is invalid, because mounting anything on the shared mount 'B' can
	   create new mounts that get mounted on the mounts that receive
	   propagation from 'B', and since the mount 'A' is unbindable, cloning
	   it to mount at other mountpoints is not possible.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	   unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
	   is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
	   shared mount.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
	   The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
	   continues to be a slave mount of mount 'Z'.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
	   'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be
	   an unbindable mount.

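Compared with bind, the move table differs in two places: an unbindable source may be moved under a non-shared destination, and the source's current parent matters (see the Note). A hypothetical encoding with invented names, where parent_shared says whether 'A' currently resides under a shared mount:

```c
#include <stdbool.h>

/* Hypothetical encoding of the move-mount table in 5d. */
enum mnt_type { T_SHARED, T_PRIVATE, T_SLAVE, T_UNBINDABLE };
enum move_result {
	R_SHARED, R_PRIVATE, R_SLAVE, R_SHARED_AND_SLAVE, R_UNBINDABLE, R_INVALID
};

/* State of mount 'A' after "mount --move A B/b". */
enum move_result move_outcome(enum mnt_type src, bool dest_shared,
			      bool parent_shared)
{
	if (parent_shared)
		return R_INVALID;	/* moving a mount under a shared mount */

	switch (src) {
	case T_SHARED:
		return R_SHARED;				/* cases 1 and 6 */
	case T_PRIVATE:
		return dest_shared ? R_SHARED : R_PRIVATE;	/* cases 2 and 5 */
	case T_SLAVE:
		return dest_shared ? R_SHARED_AND_SLAVE : R_SLAVE; /* cases 3 and 7 */
	case T_UNBINDABLE:
		return dest_shared ? R_INVALID : R_UNBINDABLE;	/* cases 4 and 8 */
	}
	return R_INVALID;
}
```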
5e) Mount semantics

	Consider the following command::

	    mount device  B/b

	'B' is the destination mount and 'b' is the dentry in the destination
	mount.

	The above operation is the same as a bind operation, with the exception
	that the source mount is always a private mount.


5f) Unmount semantics

	Consider the following command::

	    umount A

	where 'A' is a mount mounted on mount 'B' at dentry 'b'.

	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
	'b' on mounts that receive propagation from mount 'B' and do not have
	sub-mounts within them are unmounted.

	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
	each other.

	Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mounts
	'B1', 'B2' and 'B3' respectively.

	Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
	mounts 'B1', 'B2' and 'B3' respectively.

	If 'C1' is unmounted, all the mounts that are most-recently-mounted on
	'B1' and on the mounts that 'B1' propagates to are unmounted.

	'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
	on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

	So all of 'C1', 'C2' and 'C3' should be unmounted.

	If either 'C2' or 'C3' has some child mounts, then that mount is not
	unmounted, but all other mounts are unmounted. However, if 'C1' is told
	to be unmounted and 'C1' has some sub-mounts, the umount operation
	fails entirely.

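The propagated umount rule can be modelled with one stack of mounts per peer at dentry 'b'. The structure and function names are hypothetical, only peers (not slaves) are modelled, and MAX_STACK is an arbitrary bound chosen for the sketch.

```c
#define MAX_STACK 8

/* Hypothetical model: the mounts stacked at dentry 'b' on one member of a
 * peer group ('B1', 'B2', ...). nchildren[i] is the number of sub-mounts of
 * the i-th stacked mount; index depth-1 is the most recently mounted. */
struct mountpoint_stack {
	int depth;
	int nchildren[MAX_STACK];
};

/* Unmount the top mount on peers[target] and propagate the umount to the
 * other peers. Returns the number of mounts removed, or -1 (removing
 * nothing) if the target has nothing mounted or has sub-mounts itself.
 * A peer whose top mount has sub-mounts is skipped. */
int propagated_umount(struct mountpoint_stack peers[], int npeers, int target)
{
	struct mountpoint_stack *t = &peers[target];

	if (t->depth == 0 || t->nchildren[t->depth - 1] > 0)
		return -1;

	int removed = 0;
	for (int i = 0; i < npeers; i++) {
		struct mountpoint_stack *p = &peers[i];

		if (p->depth > 0 && p->nchildren[p->depth - 1] == 0) {
			p->depth--;	/* pop the most recently mounted mount */
			removed++;
		}
	}
	return removed;
}
```

With C2 having a child mount, unmounting C1 removes C1 and C3 and leaves C2 in place, as described above.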
5g) Clone Namespace

	A cloned namespace contains all the same mounts as the parent
	namespace.

	Let's say 'A' and 'B' are the corresponding mounts in the parent and the
	child namespace.

	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
	each other.

	If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
	'Z'.

	If 'A' is a private mount, then 'B' is a private mount too.

	If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

	A. What is the result of the following command sequence?

		::

		    mount --bind /mnt /mnt
		    mount --make-shared /mnt
		    mount --bind /mnt /tmp
		    mount --move /tmp /mnt/1

		What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
		Should they all be identical, or should only /mnt and /mnt/1
		be identical?


	B. What is the result of the following command sequence?

		::

		    mount --make-rshared /
		    mkdir -p /v/1
		    mount --rbind / /v/1

		What should the content of /v/1/v/1 be?


	C. What is the result of the following command sequence?

		::

		    mount --bind /mnt /mnt
		    mount --make-shared /mnt
		    mkdir -p /mnt/1/2/3 /mnt/1/test
		    mount --bind /mnt/1 /tmp
		    mount --make-slave /mnt
		    mount --make-shared /mnt
		    mount --bind /mnt/1/2 /tmp1
		    mount --make-slave /mnt

		At this point we have the first mount at /tmp, and
		its root dentry is 1. Let's call this mount 'A'.
		Then we have a second mount at /tmp1 with root
		dentry 2. Let's call this mount 'B'.
		Next we have a third mount at /mnt with root dentry
		mnt. Let's call this mount 'C'.

		'B' is the slave of 'A' and 'C' is a slave of 'B'::

		    A -> B -> C

		At this point, if we execute the following command::

		    mount --bind /bin /tmp/test

		the mount is attempted on 'A'.

		Will the mount propagate to 'B' and 'C'?

		What would the contents of
		/mnt/1/test be?

7) FAQ
------

	Q1. Why is bind mount needed? How is it different from symbolic links?

		Symbolic links can get stale if the destination mount gets
		unmounted or moved. Bind mounts continue to exist even if the
		other mount is unmounted or moved.

	Q2. Why can't the shared subtree be implemented using exportfs?

		exportfs is a heavyweight way of accomplishing part of what
		shared subtree can do. I cannot imagine a way to implement the
		semantics of slave mount using exportfs.

	Q3. Why is unbindable mount needed?

		Let's say we want to replicate the mount tree at multiple
		locations within the same subtree.

		If one rbind mounts a tree within the same subtree 'n' times,
		the number of mounts created grows faster than exponentially
		with 'n'. Having unbindable mounts can help prune the unneeded
		bind mounts. Here is an example.

		step 1:
		   Let's say the root tree has just two directories with
		   one vfsmount::

				    root
				   /    \
				  tmp    usr

		    And we want to replicate the tree at multiple
		    mountpoints under /root/tmp.

		step 2:
		      ::


			mount --make-shared /root

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

		      The new tree now looks like this::

				    root
				   /    \
				 tmp    usr
				/
			       m1
			      /  \
			     tmp  usr
			     /
			    m1

			  It has two vfsmounts.

		step 3:
		    ::

			    mkdir -p /tmp/m2
			    mount --rbind /root /tmp/m2

			The new tree now looks like this::

				      root
				     /    \
				   tmp     usr
				  /    \
				m1       m2
			       / \       /  \
			     tmp  usr   tmp  usr
			     / \          /
			    m1  m2      m1
				/ \     /  \
			      tmp usr  tmp   usr
			      /        / \
			     m1       m1  m2
			    /  \
			  tmp   usr
			  /  \
			 m1   m2

		       It has 6 vfsmounts.

		step 4:
		      ::

			  mkdir -p /tmp/m3
			  mount --rbind /root /tmp/m3

			  I won't draw the tree, but it has 24 vfsmounts.


		At step i the number of vfsmounts is V[i] = i*V[i-1], i.e.
		V[i] = i! (with V[1] = 1), which grows even faster than an
		exponential function. And this tree has way more
		mounts than what we really needed in the first place.
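The recurrence can be checked numerically. The sketch below simply iterates V[i] = i * V[i-1] from V[1] = 1 (the single root vfsmount of step 1) and reproduces the 2, 6 and 24 vfsmounts counted in steps 2 to 4; rbind_mount_count() is a name made up for this example.

```c
/* Number of vfsmounts after step 'step' of the FAQ example, following
 * V[i] = i * V[i-1] with V[1] = 1, i.e. V[i] = i!. */
unsigned long long rbind_mount_count(unsigned int step)
{
	unsigned long long v = 1;	/* step 1: a single vfsmount */

	for (unsigned int i = 2; i <= step; i++)
		v *= i;
	return v;
}
```

Already at step 10 this gives 3628800 vfsmounts, which is why pruning with unbindable mounts matters.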

		One could use a series of umount at each step to prune
		out the unneeded mounts. But there is a better solution.
		Unbindable mounts come in handy here.

		step 1:
		   Let's say the root tree has just two directories with
		   one vfsmount::

				    root
				   /    \
				  tmp    usr

		    How do we set up the same tree at multiple locations under
		    /root/tmp?

		step 2:
		      ::


			mount --bind /root/tmp /root/tmp

			mount --make-rshared /root
			mount --make-unbindable /root/tmp

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

		      The new tree now looks like this::

				    root
				   /    \
				 tmp    usr
				/
			       m1
			      /  \
			     tmp  usr

		step 3:
		      ::

			    mkdir -p /tmp/m2
			    mount --rbind /root /tmp/m2

		      The new tree now looks like this::

				    root
				   /    \
				 tmp    usr
				/   \
			       m1     m2
			      /  \     / \
			     tmp  usr tmp usr

		step 4:
		      ::

			    mkdir -p /tmp/m3
			    mount --rbind /root /tmp/m3

		      The new tree now looks like this::

				    	  root
				      /    	  \
				     tmp    	   usr
			         /    \    \
			       m1     m2     m3
			      /  \     / \    /  \
			     tmp  usr tmp usr tmp usr

8) Implementation
-----------------

8A) Data structures

	4 new fields are introduced to struct vfsmount:

	*   ->mnt_share
	*   ->mnt_slave_list
	*   ->mnt_slave
	*   ->mnt_master

	->mnt_share
		links together all the mounts to/from which this vfsmount
		sends/receives propagation events.

	->mnt_slave_list
		links all the mounts to which this vfsmount propagates.

	->mnt_slave
		links together all the slaves that its master vfsmount
		propagates to.

	->mnt_master
		points to the master vfsmount from which this vfsmount
		receives propagation.

	->mnt_flags
		takes two more flags to indicate the propagation status of
		the vfsmount.  MNT_SHARED indicates that the vfsmount is a shared
		vfsmount.  MNT_UNBINDABLE indicates that the vfsmount cannot be
		replicated.

	All the shared vfsmounts in a peer group form a cyclic list through
	->mnt_share.

	All vfsmounts with the same ->mnt_master form a cyclic list anchored
	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

	->mnt_master can point to arbitrary (and possibly different) members
	of the master peer group.  To find all immediate slaves of a peer group
	you need to go through _all_ ->mnt_slave_list of its members.
	Conceptually it's just a single set; distribution among the
	individual lists does not affect propagation or the way the
	propagation tree is modified by operations.

	All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of the slave list.

	An example propagation tree looks as shown in the figure below.
	[ NOTE: Though it looks like a forest, if we consider all the shared
	mounts as a conceptual entity called 'pnode', it becomes a tree]::


		        A <--> B <--> C <---> D
		       /|\	      /|      |\
		      / F G	     J K      H I
		     /
		    E<-->K
			/|\
		       M L N

	In the above figure, A, B, C and D are all shared and propagate to each
	other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
	mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
	'E' is also shared with 'K' and they propagate to each other.  And
	'K' has 3 slaves 'M', 'L' and 'N'.

	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'.

	A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'.

	E's ->mnt_share links with ->mnt_share of K.

	'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'.

	'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'.

	K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'.

	C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'.

	J and K's ->mnt_master points to struct vfsmount of C.

	And finally, D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'.

	'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


	NOTE: The propagation tree is orthogonal to the mount tree.

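To make the peer and slave linkage concrete, here is a user-space model of propagating an event through the figure's peer groups and slaves. It is only a sketch: fixed-size arrays replace the kernel's list links, struct pmount, its visited flag and propagate() are inventions of this example, and it is not the kernel's propagate_mnt(). The slave lists follow the figure (E, F, G under A; J, K under C; H, I under D; M, L, N under K).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLAVES 4

/* Hypothetical stand-in for the ->mnt_share / ->mnt_slave_list links. */
struct pmount {
	char name;
	struct pmount *next_peer;		/* cyclic peer ring; points to
						 * itself for a non-shared mount */
	struct pmount *slaves[MAX_SLAVES];	/* this member's slave list */
	int nslaves;
	bool visited;
};

/* Visit every mount that receives an event generated on 'm': all of its
 * peers, then, for each peer, every slave on that peer's slave list
 * (recursively, together with the slave's own peers). Returns the number
 * of mounts newly visited, including 'm'. */
int propagate(struct pmount *m)
{
	int count = 0;
	struct pmount *p = m;

	do {			/* the whole peer group receives the event */
		if (!p->visited) {
			p->visited = true;
			count++;
		}
		p = p->next_peer;
	} while (p != m);

	do {			/* slaves of any peer receive it too */
		for (int i = 0; i < p->nslaves; i++)
			if (!p->slaves[i]->visited)
				count += propagate(p->slaves[i]);
		p = p->next_peer;
	} while (p != m);

	return count;
}
```

An event on A reaches all 14 mounts in the figure, whereas an event on E reaches only E, K, M, L and N. Note that K is found through E's peer ring, so it is not visited twice even though it also sits on C's slave list.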
8B) Locking

	->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
	by namespace_sem (exclusive for modifications, shared for reading).

	Normally, ->mnt_flags modifications are serialized by vfsmount_lock.
	There are two exceptions: do_add_mount() and clone_mnt().
	The former modifies a vfsmount that has not been visible in any shared
	data structures yet.
	The latter holds namespace_sem and the only references to the vfsmount
	are in lists that can't be traversed without namespace_sem.

9508C Algorithm:
951
952	The crux of the implementation resides in rbind/move operation.
953
954	The overall algorithm breaks the operation into 3 phases: (look at
955	attach_recursive_mnt() and propagate_mnt())
956
957	1. prepare phase.
958	2. commit phases.
959	3. abort phases.
960
961	Prepare phase:
962
963	for each mount in the source tree:
964
965		   a) Create the necessary number of mount trees to
966		   	be attached to each of the mounts that receive
967			propagation from the destination mount.
968		   b) Do not attach any of the trees to its destination.
969		      However note down its ->mnt_parent and ->mnt_mountpoint
970		   c) Link all the new mounts to form a propagation tree that
971		      is identical to the propagation tree of the destination
972		      mount.
973
974		   If this phase is successful, there should be 'n' new
975		   propagation trees; where 'n' is the number of mounts in the
976		   source tree.  Go to the commit phase
977
978		   Also there should be 'm' new mount trees, where 'm' is
979		   the number of mounts to which the destination mount
980		   propagates to.
981
982		   if any memory allocations fail, go to the abort phase.
983
984	Commit phase
985		attach each of the mount trees to their corresponding
986		destination mounts.
987
988	Abort phase
989		delete all the newly created trees.
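
	The three phases follow an all-or-nothing prepare/commit/abort
	pattern. The C sketch below is hypothetical and heavily simplified.
	Each "mount tree" is a single struct pmount, and step c) (linking the
	copies into a propagation tree) is omitted. alloc_mount() and its
	allocs_left counter are an invented hook for simulating an
	allocation failure. This is not the code of attach_recursive_mnt()
	or propagate_mnt(). It only shows how deferring every attachment
	until all allocations have succeeded lets the abort phase simply free
	the new copies.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified mount: just enough to show the phases. */
struct pmount {
	struct pmount *mnt_parent;	/* noted during prepare */
	bool attached;			/* set only during commit */
};

/* Hypothetical allocation hook; -1 means "never fail", otherwise the
 * number of allocations allowed to succeed before one fails. */
static int allocs_left = -1;

static struct pmount *alloc_mount(void)
{
	if (allocs_left == 0)
		return NULL;
	if (allocs_left > 0)
		allocs_left--;
	return calloc(1, sizeof(struct pmount));
}

/* Attach one new copy under each of the 'ndest' destinations.
 * Returns 0 on success, or -1 if everything was rolled back. */
static int attach_all(struct pmount **dest, int ndest, struct pmount **out)
{
	int i;

	/* Prepare phase: allocate every copy and note its future parent,
	 * but attach nothing yet. */
	for (i = 0; i < ndest; i++) {
		out[i] = alloc_mount();
		if (!out[i])
			goto abort_phase;
		out[i]->mnt_parent = dest[i];
	}

	/* Commit phase: nothing here can fail. */
	for (i = 0; i < ndest; i++)
		out[i]->attached = true;
	return 0;

abort_phase:
	/* Abort phase: delete every copy created before the failure. */
	while (i-- > 0) {
		free(out[i]);
		out[i] = NULL;
	}
	return -1;
}
```

	Nothing is attached during the prepare phase, and the commit loop
	cannot fail. A failed allocation therefore leaves the destination
	mounts untouched.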

	.. Note::
	   All the propagation-related functionality resides in the file
	   fs/pnode.c.


------------------------------------------------------------------------

version 0.1  (Created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)
v6.2
  1.. SPDX-License-Identifier: GPL-2.0
  2
  3===============
  4Shared Subtrees
  5===============
  6
  7.. Contents:
  8	1) Overview
  9	2) Features
 10	3) Setting mount states
 11	4) Use-case
 12	5) Detailed semantics
 13	6) Quiz
 14	7) FAQ
 15	8) Implementation
 16
 17
 181) Overview
 19-----------
 20
 21Consider the following situation:
 22
 23A process wants to clone its own namespace, but still wants to access the CD
 24that got mounted recently.  Shared subtree semantics provide the necessary
 25mechanism to accomplish the above.
 26
 27It provides the necessary building blocks for features like per-user-namespace
 28and versioned filesystem.
 29
 302) Features
 31-----------
 32
 33Shared subtree provides four different flavors of mounts; struct vfsmount to be
 34precise
 35
 36	a. shared mount
 37	b. slave mount
 38	c. private mount
 39	d. unbindable mount
 40
 41
 422a) A shared mount can be replicated to as many mountpoints and all the
 43replicas continue to be exactly same.
 44
 45	Here is an example:
 46
 47	Let's say /mnt has a mount that is shared::
 48
 49	    mount --make-shared /mnt
 50
 51	Note: mount(8) command now supports the --make-shared flag,
 52	so the sample 'smount' program is no longer needed and has been
 53	removed.
 54
 55	::
 56
 57	    # mount --bind /mnt /tmp
 58
 59	The above command replicates the mount at /mnt to the mountpoint /tmp
 60	and the contents of both the mounts remain identical.
 61
 62	::
 63
 64	    #ls /mnt
 65	    a b c
 66
 67	    #ls /tmp
 68	    a b c
 69
 70	Now let's say we mount a device at /tmp/a::
 71
 72	    # mount /dev/sd0  /tmp/a
 73
 74	    #ls /tmp/a
 75	    t1 t2 t3
 76
 77	    #ls /mnt/a
 78	    t1 t2 t3
 79
 80	Note that the mount has propagated to the mount at /mnt as well.
 81
 82	And the same is true even when /dev/sd0 is mounted on /mnt/a. The
 83	contents will be visible under /tmp/a too.
 84
 85
 862b) A slave mount is like a shared mount except that mount and umount events
 87	only propagate towards it.
 88
 89	All slave mounts have a master mount which is a shared.
 90
 91	Here is an example:
 92
 93	Let's say /mnt has a mount which is shared.
 94	# mount --make-shared /mnt
 95
 96	Let's bind mount /mnt to /tmp
 97	# mount --bind /mnt /tmp
 98
 99	the new mount at /tmp becomes a shared mount and it is a replica of
100	the mount at /mnt.
101
102	Now let's make the mount at /tmp; a slave of /mnt
103	# mount --make-slave /tmp
104
105	let's mount /dev/sd0 on /mnt/a
106	# mount /dev/sd0 /mnt/a
107
108	#ls /mnt/a
109	t1 t2 t3
110
111	#ls /tmp/a
112	t1 t2 t3
113
114	Note the mount event has propagated to the mount at /tmp
115
116	However let's see what happens if we mount something on the mount at /tmp
117
118	# mount /dev/sd1 /tmp/b
119
120	#ls /tmp/b
121	s1 s2 s3
122
123	#ls /mnt/b
124
125	Note how the mount event has not propagated to the mount at
126	/mnt
127
128
1292c) A private mount does not forward or receive propagation.
130
131	This is the mount we are familiar with. Its the default type.
132
133
1342d) A unbindable mount is a unbindable private mount
135
136	let's say we have a mount at /mnt and we make it unbindable::
137
138	    # mount --make-unbindable /mnt
139
140	 Let's try to bind mount this mount somewhere else::
141
142	    # mount --bind /mnt /tmp
143	    mount: wrong fs type, bad option, bad superblock on /mnt,
144		    or too many mounted file systems
145
146	Binding a unbindable mount is a invalid operation.
147
148
1493) Setting mount states
 
150
151	The mount command (util-linux package) can be used to set mount
152	states::
153
154	    mount --make-shared mountpoint
155	    mount --make-slave mountpoint
156	    mount --make-private mountpoint
157	    mount --make-unbindable mountpoint
158
159
1604) Use cases
161------------
162
163	A) A process wants to clone its own namespace, but still wants to
164	   access the CD that got mounted recently.
165
166	   Solution:
167
168		The system administrator can make the mount at /cdrom shared::
169
170		    mount --bind /cdrom /cdrom
171		    mount --make-shared /cdrom
172
173		Now any process that clones off a new namespace will have a
174		mount at /cdrom which is a replica of the same mount in the
175		parent namespace.
176
177		So when a CD is inserted and mounted at /cdrom that mount gets
178		propagated to the other mount at /cdrom in all the other clone
179		namespaces.
180
181	B) A process wants its mounts invisible to any other process, but
182	still be able to see the other system mounts.
183
184	   Solution:
185
186		To begin with, the administrator can mark the entire mount tree
187		as shareable::
188
189		    mount --make-rshared /
190
191		A new process can clone off a new namespace. And mark some part
192		of its namespace as slave::
193
194		    mount --make-rslave /myprivatetree
195
196		Hence forth any mounts within the /myprivatetree done by the
197		process will not show up in any other namespace. However mounts
198		done in the parent namespace under /myprivatetree still shows
199		up in the process's namespace.
200
201
202	Apart from the above semantics this feature provides the
203	building blocks to solve the following problems:
204
205	C)  Per-user namespace
206
207		The above semantics allows a way to share mounts across
208		namespaces.  But namespaces are associated with processes. If
209		namespaces are made first class objects with user API to
210		associate/disassociate a namespace with userid, then each user
211		could have his/her own namespace and tailor it to his/her
212		requirements. This needs to be supported in PAM.
213
214	D)  Versioned files
215
216		If the entire mount tree is visible at multiple locations, then
217		an underlying versioning file system can return different
218		versions of the file depending on the path used to access that
219		file.
220
221		An example is::
222
223		    mount --make-shared /
224		    mount --rbind / /view/v1
225		    mount --rbind / /view/v2
226		    mount --rbind / /view/v3
227		    mount --rbind / /view/v4
228
229		and if /usr has a versioning filesystem mounted, then that
230		mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
231		/view/v4/usr too
232
233		A user can request v3 version of the file /usr/fs/namespace.c
234		by accessing /view/v3/usr/fs/namespace.c . The underlying
235		versioning filesystem can then decipher that v3 version of the
236		filesystem is being requested and return the corresponding
237		inode.
238
2395) Detailed semantics
240---------------------
241	The section below explains the detailed semantics of
242	bind, rbind, move, mount, umount and clone-namespace operations.
243
244	Note: the word 'vfsmount' and the noun 'mount' have been used
245	to mean the same thing, throughout this document.
246
2475a) Mount states
248
249	A given mount can be in one of the following states
250
251	1) shared
252	2) slave
253	3) shared and slave
254	4) private
255	5) unbindable
256
257	A 'propagation event' is defined as event generated on a vfsmount
258	that leads to mount or unmount actions in other vfsmounts.
259
260	A 'peer group' is defined as a group of vfsmounts that propagate
261	events to each other.
262
263	(1) Shared mounts
264
265		A 'shared mount' is defined as a vfsmount that belongs to a
266		'peer group'.
267
268		For example::
269
270			mount --make-shared /mnt
271			mount --bind /mnt /tmp
272
273		The mount at /mnt and that at /tmp are both shared and belong
274		to the same peer group. Anything mounted or unmounted under
275		/mnt or /tmp reflect in all the other mounts of its peer
276		group.
277
278
279	(2) Slave mounts
280
281		A 'slave mount' is defined as a vfsmount that receives
282		propagation events and does not forward propagation events.
283
284		A slave mount as the name implies has a master mount from which
285		mount/unmount events are received. Events do not propagate from
286		the slave mount to the master.  Only a shared mount can be made
287		a slave by executing the following command::
288
289			mount --make-slave mount
290
291		A shared mount that is made as a slave is no more shared unless
292		modified to become shared.
293
294	(3) Shared and Slave
295
296		A vfsmount can be both shared as well as slave.  This state
297		indicates that the mount is a slave of some vfsmount, and
298		has its own peer group too.  This vfsmount receives propagation
299		events from its master vfsmount, and also forwards propagation
300		events to its 'peer group' and to its slave vfsmounts.
301
302		Strictly speaking, the vfsmount is shared having its own
303		peer group, and this peer-group is a slave of some other
304		peer group.
305
306		Only a slave vfsmount can be made as 'shared and slave' by
307		either executing the following command::
308
309			mount --make-shared mount
310
311		or by moving the slave vfsmount under a shared vfsmount.
312
313	(4) Private mount
314
315		A 'private mount' is defined as vfsmount that does not
316		receive or forward any propagation events.
317
318	(5) Unbindable mount
319
320		A 'unbindable mount' is defined as vfsmount that does not
321		receive or forward any propagation events and cannot
322		be bind mounted.
323
324
325   	State diagram:
326
327   	The state diagram below explains the state transition of a mount,
328	in response to various commands::
329
330	    -----------------------------------------------------------------------
331	    |             |make-shared |  make-slave  | make-private |make-unbindab|
332	    --------------|------------|--------------|--------------|-------------|
333	    |shared	  |shared      |*slave/private|   private    | unbindable  |
334	    |             |            |              |              |             |
335	    |-------------|------------|--------------|--------------|-------------|
336	    |slave	  |shared      | **slave      |    private   | unbindable  |
337	    |             |and slave   |              |              |             |
338	    |-------------|------------|--------------|--------------|-------------|
339	    |shared       |shared      | slave        |    private   | unbindable  |
340	    |and slave    |and slave   |              |              |             |
341	    |-------------|------------|--------------|--------------|-------------|
342	    |private      |shared      |  **private   |    private   | unbindable  |
343	    |-------------|------------|--------------|--------------|-------------|
344	    |unbindable   |shared      |**unbindable  |    private   | unbindable  |
345	    ------------------------------------------------------------------------
346
347	    * if the shared mount is the only mount in its peer group, making it
348	    slave, makes it private automatically. Note that there is no master to
349	    which it can be slaved to.
350
351	    ** slaving a non-shared mount has no effect on the mount.
352
353	Apart from the commands listed below, the 'move' operation also changes
354	the state of a mount depending on type of the destination mount. Its
355	explained in section 5d.
356
3575b) Bind semantics
358
359	Consider the following command::
360
361	    mount --bind A/a  B/b
362
363	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
364	is the destination mount and 'b' is the dentry in the destination mount.
365
366	The outcome depends on the type of mount of 'A' and 'B'. The table
367	below contains quick reference::
368
369	    --------------------------------------------------------------------------
370	    |         BIND MOUNT OPERATION                                           |
371	    |************************************************************************|
372	    |source(A)->| shared      |       private  |       slave    | unbindable |
373	    | dest(B)  |              |                |                |            |
374	    |   |      |              |                |                |            |
375	    |   v      |              |                |                |            |
376	    |************************************************************************|
377	    |  shared  | shared       |     shared     | shared & slave |  invalid   |
378	    |          |              |                |                |            |
379	    |non-shared| shared       |      private   |      slave     |  invalid   |
380	    **************************************************************************
381
382     	Details:
383
384    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
385	which is clone of 'A', is created. Its root dentry is 'a' . 'C' is
386	mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
387	are created and mounted at the dentry 'b' on all mounts where 'B'
388	propagates to. A new propagation tree containing 'C1',..,'Cn' is
389	created. This propagation tree is identical to the propagation tree of
390	'B'.  And finally the peer-group of 'C' is merged with the peer group
391	of 'A'.
392
393    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
394	which is clone of 'A', is created. Its root dentry is 'a'. 'C' is
395	mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
396	are created and mounted at the dentry 'b' on all mounts where 'B'
397	propagates to. A new propagation tree is set containing all new mounts
398	'C', 'C1', .., 'Cn' with exactly the same configuration as the
399	propagation tree for 'B'.
400
401    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
402	mount 'C' which is clone of 'A', is created. Its root dentry is 'a' .
403	'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
404	'C3' ... are created and mounted at the dentry 'b' on all mounts where
405	'B' propagates to. A new propagation tree containing the new mounts
406	'C','C1',..  'Cn' is created. This propagation tree is identical to the
407	propagation tree for 'B'. And finally the mount 'C' and its peer group
408	is made the slave of mount 'Z'.  In other words, mount 'C' is in the
409	state 'slave and shared'.
410
411    4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
412	invalid operation.
413
414    5. 'A' is a private mount and 'B' is a non-shared(private or slave or
415	unbindable) mount. A new mount 'C' which is clone of 'A', is created.
416	Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
417
418    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
419	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
420	mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
421	peer-group of 'A'.
422
423    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
424	new mount 'C' which is a clone of 'A' is created. Its root dentry is
425	'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
426	slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
427	'Z'.  All mount/unmount events on 'Z' propagates to 'A' and 'C'. But
428	mount/unmount on 'A' do not propagate anywhere else. Similarly
429	mount/unmount on 'C' do not propagate anywhere else.
430
431    8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
432	invalid operation. A unbindable mount cannot be bind mounted.
433
4345c) Rbind semantics
435
436	rbind is same as bind. Bind replicates the specified mount.  Rbind
437	replicates all the mounts in the tree belonging to the specified mount.
438	Rbind mount is bind mount applied to all the mounts in the tree.
439
440	If the source tree that is rbind has some unbindable mounts,
441	then the subtree under the unbindable mount is pruned in the new
442	location.
443
444	eg:
445
446	  let's say we have the following mount tree::
447
448		A
449	      /   \
450	      B   C
451	     / \ / \
452	     D E F G
453
454	  Let's say all the mount except the mount C in the tree are
455	  of a type other than unbindable.
456
457	  If this tree is rbound to say Z
458
459	  We will have the following tree at the new location::
460
461		Z
462		|
463		A'
464	       /
465	      B'		Note how the tree under C is pruned
466	     / \ 		in the new location.
467	    D' E'
468
469
470
4715d) Move semantics
472
473	Consider the following command
474
475	mount --move A  B/b
476
477	where 'A' is the source mount, 'B' is the destination mount and 'b' is
478	the dentry in the destination mount.
479
480	The outcome depends on the type of the mount of 'A' and 'B'. The table
481	below is a quick reference::
482
483	    ---------------------------------------------------------------------------
484	    |         		MOVE MOUNT OPERATION                                 |
485	    |**************************************************************************
486	    | source(A)->| shared      |       private  |       slave    | unbindable |
487	    | dest(B)  |               |                |                |            |
488	    |   |      |               |                |                |            |
489	    |   v      |               |                |                |            |
490	    |**************************************************************************
491	    |  shared  | shared        |     shared     |shared and slave|  invalid   |
492	    |          |               |                |                |            |
493	    |non-shared| shared        |      private   |    slave       | unbindable |
494	    ***************************************************************************
495
496	.. Note:: moving a mount residing under a shared mount is invalid.
497
498      Details follow:
499
500    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
501	mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
502	are created and mounted at dentry 'b' on all mounts that receive
503	propagation from mount 'B'. A new propagation tree is created in the
504	exact same configuration as that of 'B'. This new propagation tree
505	contains all the new mounts 'A1', 'A2'...  'An'.  And this new
506	propagation tree is appended to the already existing propagation tree
507	of 'A'.
508
509    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
510	mounted on mount 'B' at dentry 'b'. Also new mount 'A1', 'A2'... 'An'
511	are created and mounted at dentry 'b' on all mounts that receive
512	propagation from mount 'B'. The mount 'A' becomes a shared mount and a
513	propagation tree is created which is identical to that of
514	'B'. This new propagation tree contains all the new mounts 'A1',
515	'A2'...  'An'.
516
517    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
518	mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
519	'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
520	receive propagation from mount 'B'. A new propagation tree is created
521	in the exact same configuration as that of 'B'. This new propagation
522	tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
523	propagation tree is appended to the already existing propagation tree of
524	'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
525	becomes 'shared'.
526
527    4. 'A' is a unbindable mount and 'B' is a shared mount. The operation
528	is invalid. Because mounting anything on the shared mount 'B' can
529	create new mounts that get mounted on the mounts that receive
530	propagation from 'B'.  And since the mount 'A' is unbindable, cloning
531	it to mount at other mountpoints is not possible.
532
533    5. 'A' is a private mount and 'B' is a non-shared(private or slave or
534	unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
535
536    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
537	is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
538	shared mount.
539
540    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
541	The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
542	continues to be a slave mount of mount 'Z'.
543
544    8. 'A' is a unbindable mount and 'B' is a non-shared mount. The mount
545	'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
546	unbindable mount.
547
5485e) Mount semantics
549
550	Consider the following command::
551
552	    mount device  B/b
553
554	'B' is the destination mount and 'b' is the dentry in the destination
555	mount.
556
557	The above operation is the same as bind operation with the exception
558	that the source mount is always a private mount.
559
560
5615f) Unmount semantics
562
563	Consider the following command::
564
565	    umount A
566
567	where 'A' is a mount mounted on mount 'B' at dentry 'b'.
568
569	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
570	'b' on mounts that receive propagation from mount 'B' and does not have
571	sub-mounts within them are unmounted.
572
573	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
574	each other.
575
576	let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
577	'B1', 'B2' and 'B3' respectively.
578
579	let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
580	mount 'B1', 'B2' and 'B3' respectively.
581
582	if 'C1' is unmounted, all the mounts that are most-recently-mounted on
583	'B1' and on the mounts that 'B1' propagates-to are unmounted.
584
585	'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
586	on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.
587
588	So all 'C1', 'C2' and 'C3' should be unmounted.
589
590	If any of 'C2' or 'C3' has some child mounts, then that mount is not
591	unmounted, but all other mounts are unmounted. However if 'C1' is told
592	to be unmounted and 'C1' has some sub-mounts, the umount operation is
593	failed entirely.
594
5955g) Clone Namespace
596
597	A cloned namespace contains all the mounts as that of the parent
598	namespace.
599
600	Let's say 'A' and 'B' are the corresponding mounts in the parent and the
601	child namespace.
602
603	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
604	each other.
605
606	If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
607	'Z'.
608
609	If 'A' is a private mount, then 'B' is a private mount too.
610
611	If 'A' is unbindable mount, then 'B' is a unbindable mount too.
612
613
6146) Quiz
 
615
616	A. What is the result of the following command sequence?
617
618		::
619
620		    mount --bind /mnt /mnt
621		    mount --make-shared /mnt
622		    mount --bind /mnt /tmp
623		    mount --move /tmp /mnt/1
624
625		what should be the contents of /mnt /mnt/1 /mnt/1/1 should be?
626		Should they all be identical? or should /mnt and /mnt/1 be
627		identical only?
628
629
630	B. What is the result of the following command sequence?
631
632		::
633
634		    mount --make-rshared /
635		    mkdir -p /v/1
636		    mount --rbind / /v/1
637
638		what should be the content of /v/1/v/1 be?
639
640
641	C. What is the result of the following command sequence?
642
643		::
644
645		    mount --bind /mnt /mnt
646		    mount --make-shared /mnt
647		    mkdir -p /mnt/1/2/3 /mnt/1/test
648		    mount --bind /mnt/1 /tmp
649		    mount --make-slave /mnt
650		    mount --make-shared /mnt
651		    mount --bind /mnt/1/2 /tmp1
652		    mount --make-slave /mnt
653
654		At this point we have the first mount at /tmp and
655		its root dentry is 1. Let's call this mount 'A'
656		And then we have a second mount at /tmp1 with root
657		dentry 2. Let's call this mount 'B'
658		Next we have a third mount at /mnt with root dentry
659		mnt. Let's call this mount 'C'
660
661		'B' is the slave of 'A' and 'C' is a slave of 'B'
662		A -> B -> C
663
664		at this point if we execute the following command
665
666		mount --bind /bin /tmp/test
667
668		The mount is attempted on 'A'
669
670		will the mount propagate to 'B' and 'C' ?
671
672		what would be the contents of
673		/mnt/1/test be?
674
6757) FAQ
 
676
677	Q1. Why is bind mount needed? How is it different from symbolic links?
678		symbolic links can get stale if the destination mount gets
679		unmounted or moved. Bind mounts continue to exist even if the
680		other mount is unmounted or moved.
681
682	Q2. Why can't the shared subtree be implemented using exportfs?
683
684		exportfs is a heavyweight way of accomplishing part of what
685		shared subtree can do. I cannot imagine a way to implement the
686		semantics of slave mount using exportfs?
687
688	Q3 Why is unbindable mount needed?
689
690		Let's say we want to replicate the mount tree at multiple
691		locations within the same subtree.
692
693		if one rbind mounts a tree within the same subtree 'n' times
694		the number of mounts created is an exponential function of 'n'.
695		Having unbindable mount can help prune the unneeded bind
696		mounts. Here is an example.
697
698		step 1:
699		   let's say the root tree has just two directories with
700		   one vfsmount::
701
702				    root
703				   /    \
704				  tmp    usr
705
706		    And we want to replicate the tree at multiple
707		    mountpoints under /root/tmp
708
709		step 2:
710		      ::
711
712
713			mount --make-shared /root
714
715			mkdir -p /tmp/m1
716
717			mount --rbind /root /tmp/m1
718
719		      the new tree now looks like this::
720
721				    root
722				   /    \
723				 tmp    usr
724				/
725			       m1
726			      /  \
727			     tmp  usr
728			     /
729			    m1
730
731			  it has two vfsmounts
732
733		step 3:
734		    ::
735
736			    mkdir -p /tmp/m2
737			    mount --rbind /root /tmp/m2
738
739			the new tree now looks like this::
740
741				      root
742				     /    \
743				   tmp     usr
744				  /    \
745				m1       m2
746			       / \       /  \
747			     tmp  usr   tmp  usr
748			     / \          /
749			    m1  m2      m1
750				/ \     /  \
751			      tmp usr  tmp   usr
752			      /        / \
753			     m1       m1  m2
754			    /  \
755			  tmp   usr
756			  /  \
757			 m1   m2
758
759		       it has 6 vfsmounts
760
761		step 4:
762		      ::
763			  mkdir -p /tmp/m3
764			  mount --rbind /root /tmp/m3
765
766			  I won't draw the tree..but it has 24 vfsmounts
767
768
769		at step i the number of vfsmounts is V[i] = i*V[i-1].
770		This is an exponential function. And this tree has way more
771		mounts than what we really needed in the first place.
772
773		One could use a series of umount at each step to prune
774		out the unneeded mounts. But there is a better solution.
775		Unclonable mounts come in handy here.
776
777		step 1:
778		   let's say the root tree has just two directories with
779		   one vfsmount::
780
781				    root
782				   /    \
783				  tmp    usr
784
785		    How do we set up the same tree at multiple locations under
786		    /root/tmp
787
788		step 2:
789		      ::
790
791
792			mount --bind /root/tmp /root/tmp
793
794			mount --make-rshared /root
795			mount --make-unbindable /root/tmp
796
797			mkdir -p /tmp/m1
798
799			mount --rbind /root /tmp/m1
800
801		      the new tree now looks like this::
802
803				    root
804				   /    \
805				 tmp    usr
806				/
807			       m1
808			      /  \
809			     tmp  usr
810
811		step 3:
812		      ::
813
814			    mkdir -p /tmp/m2
815			    mount --rbind /root /tmp/m2
816
817		      the new tree now looks like this::
818
819				    root
820				   /    \
821				 tmp    usr
822				/   \
823			       m1     m2
824			      /  \     / \
825			     tmp  usr tmp usr
826
827		step 4:
828		      ::
829
830			    mkdir -p /tmp/m3
831			    mount --rbind /root /tmp/m3
832
833		      the new tree now looks like this::
834
835				    	  root
836				      /    	  \
837				     tmp    	   usr
838			         /    \    \
839			       m1     m2     m3
840			      /  \     / \    /  \
841			     tmp  usr tmp usr tmp usr
842
8438) Implementation
 
844
8458A) Datastructure
846
847	4 new fields are introduced to struct vfsmount:
848
849	*   ->mnt_share
850	*   ->mnt_slave_list
851	*   ->mnt_slave
852	*   ->mnt_master
853
854	->mnt_share
855		links together all the mount to/from which this vfsmount
856		send/receives propagation events.
857
858	->mnt_slave_list
859		links all the mounts to which this vfsmount propagates
860		to.
861
862	->mnt_slave
863		links together all the slaves that its master vfsmount
864		propagates to.
865
866	->mnt_master
867		points to the master vfsmount from which this vfsmount
868		receives propagation.
869
870	->mnt_flags
871		takes two more flags to indicate the propagation status of
872		the vfsmount.  MNT_SHARE indicates that the vfsmount is a shared
873		vfsmount.  MNT_UNCLONABLE indicates that the vfsmount cannot be
874		replicated.
875
876	All the shared vfsmounts in a peer group form a cyclic list through
877	->mnt_share.
878
879	All vfsmounts with the same ->mnt_master form on a cyclic list anchored
880	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
881
882	 ->mnt_master can point to arbitrary (and possibly different) members
883	 of master peer group.  To find all immediate slaves of a peer group
884	 you need to go through _all_ ->mnt_slave_list of its members.
885	 Conceptually it's just a single set - distribution among the
886	 individual lists does not affect propagation or the way propagation
887	 tree is modified by operations.
888
889	All vfsmounts in a peer group have the same ->mnt_master.  If it is
890	non-NULL, they form a contiguous (ordered) segment of slave list.
891
892	A example propagation tree looks as shown in the figure below.
893	[ NOTE: Though it looks like a forest, if we consider all the shared
894	mounts as a conceptual entity called 'pnode', it becomes a tree]::
895
896
897		        A <--> B <--> C <---> D
898		       /|\	      /|      |\
899		      / F G	     J K      H I
900		     /
901		    E<-->K
902			/|\
903		       M L N
904
905	In the above figure  A,B,C and D all are shared and propagate to each
906	other.   'A' has got 3 slave mounts 'E' 'F' and 'G' 'C' has got 2 slave
907	mounts 'J' and 'K'  and  'D' has got two slave mounts 'H' and 'I'.
908	'E' is also shared with 'K' and they propagate to each other.  And
909	'K' has 3 slaves 'M', 'L' and 'N'

	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'.

	A's ->mnt_slave_list links with the ->mnt_slave of 'E', 'K', 'F' and 'G'.

	E's ->mnt_share links with the ->mnt_share of 'K'.

	'E', 'K', 'F' and 'G' have their ->mnt_master pointing to struct vfsmount of 'A'.

	'M', 'L' and 'N' have their ->mnt_master pointing to struct vfsmount of 'K'.

	K's ->mnt_slave_list links with the ->mnt_slave of 'M', 'L' and 'N'.

	C's ->mnt_slave_list links with the ->mnt_slave of 'J' and 'K'.

	'J' and 'K' have their ->mnt_master pointing to struct vfsmount of 'C'.

	Finally, D's ->mnt_slave_list links with the ->mnt_slave of 'H' and 'I'.

	'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


	NOTE: The propagation tree is orthogonal to the mount tree.
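A tiny Python model can show the two kinds of links side by side. Here mnt_parent places a mount in the mount tree, while a hypothetical share_group field stands in for the ->mnt_share ring. The mount names and paths are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Mount:
    name: str
    mnt_parent: Optional["Mount"] = None             # edge in the mount tree
    share_group: list = field(default_factory=list)  # stand-in for the ->mnt_share ring


def path(m):
    """Pathname obtained by following mount-tree edges up to the root."""
    parts = []
    while m.mnt_parent is not None:
        parts.append(m.name)
        m = m.mnt_parent
    return "/" + "/".join(reversed(parts))


def peers(m):
    """Other members of m's peer group; unrelated to where m is mounted."""
    return [p.name for p in m.share_group if p is not m]


root = Mount("")
mnt, tmp = Mount("mnt", root), Mount("tmp", root)
x, y = Mount("x", mnt), Mount("y", tmp)
x.share_group = y.share_group = [x, y]  # peers in different parts of the tree

print(path(x), peers(x))  # /mnt/x ['y']
print(path(y), peers(y))  # /tmp/y ['x']
print(peers(mnt))         # []  (a sibling of tmp in the mount tree, yet not its peer)
```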

8B Locking:

	->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
	by namespace_sem (exclusive for modifications, shared for reading).
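namespace_sem is a reader/writer semaphore. As a rough user-space analogy only, the Python sketch below implements a toy reader/writer lock with threading.Condition and uses it to guard a stand-in for the propagation links. RWSemaphore, add_peer() and list_peers() are hypothetical. This toy lock makes no fairness guarantees and is not how the kernel's semaphore is implemented.

```python
import threading
from contextlib import contextmanager


class RWSemaphore:
    """Toy reader/writer lock: many readers at once, or a single writer.

    It makes no fairness guarantees; a steady stream of readers can
    starve a writer.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:  # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # a writer may be waiting

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


namespace_sem = RWSemaphore()  # hypothetical stand-in, not the kernel object
peer_names = ["A", "B"]        # stand-in for the ->mnt_share links


def add_peer(name):
    with namespace_sem.write():  # modification: exclusive
        peer_names.append(name)


def list_peers():
    with namespace_sem.read():   # reading: shared
        return list(peer_names)


add_peer("C")
print(list_peers())  # ['A', 'B', 'C']
```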

	Normally, modifications of ->mnt_flags are serialized by vfsmount_lock.
	There are two exceptions: do_add_mount() and clone_mnt().
	The former modifies a vfsmount that has not yet been made visible in any
	shared data structure.
	The latter holds namespace_sem, and the only references to the vfsmount
	are in lists that can't be traversed without namespace_sem.

8C Algorithm:

	The crux of the implementation resides in the rbind/move operations.

	The overall algorithm breaks the operation into three phases (see
	attach_recursive_mnt() and propagate_mnt()):

	1. Prepare phase.
	2. Commit phase.
	3. Abort phase.

	Prepare phase:

	For each mount in the source tree:

		   a) Create the necessary number of mount trees to
		      be attached to each of the mounts that receive
		      propagation from the destination mount.
		   b) Do not attach any of the trees to its destination.
		      However, note down its ->mnt_parent and ->mnt_mountpoint.
		   c) Link all the new mounts to form a propagation tree that
		      is identical to the propagation tree of the destination
		      mount.

		   If this phase is successful, there should be 'n' new
		   propagation trees, where 'n' is the number of mounts in the
		   source tree.  Go to the commit phase.

		   Also there should be 'm' new mount trees, where 'm' is
		   the number of mounts to which the destination mount
		   propagates.

		   If any memory allocation fails, go to the abort phase.
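The counts 'n' and 'm' can be checked with a small worked example. The Python sketch below, in which prepare() and the name@receiver labels are hypothetical, ignores tree shape and propagation linkage and only records which copies exist: one copy of each source mount per receiving mount. It views them once grouped by receiver (the 'm' mount trees) and once grouped by source mount (the 'n' sets that step c) links into propagation trees).

```python
def prepare(source_mounts, receivers):
    """Record which copies a prepare phase would create (hypothetical model).

    Returns two views of the same copies:
      mount_trees[r]: a copy of every source mount, destined for receiver r
      copies_by_source[s]: every copy of source mount s, which step c)
                           would link into one propagation tree
    """
    mount_trees = {r: [f"{s}@{r}" for s in source_mounts] for r in receivers}
    copies_by_source = {s: [f"{s}@{r}" for r in receivers] for s in source_mounts}
    return mount_trees, copies_by_source


source = ["src", "src/a", "src/b"]  # n = 3 mounts in the source tree
receivers = ["P", "Q"]              # m = 2 mounts receiving propagation

mount_trees, copies_by_source = prepare(source, receivers)
print(len(mount_trees), len(copies_by_source))  # 2 3
print(mount_trees["P"])                         # ['src@P', 'src/a@P', 'src/b@P']
print(copies_by_source["src/a"])                # ['src/a@P', 'src/a@Q']
```

Either way the copies are counted, the total is n * m = 6 new mounts in this example.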

	Commit phase
		Attach each of the mount trees to its corresponding
		destination mount.

	Abort phase
		Delete all the newly created trees.
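The point of the split is that nothing becomes visible until every allocation has succeeded. The Python sketch below shows that prepare/commit/abort pattern in isolation. attach_all(), AllocationError, limited_alloc() and the alloc/free callbacks are hypothetical stand-ins, not the actual interfaces of attach_recursive_mnt() or propagate_mnt().

```python
class AllocationError(Exception):
    """Raised by an allocator that runs out of memory (simulated)."""


def attach_all(source_mounts, receivers, alloc, free, attached):
    """Prepare every copy first; attach only if all allocations succeed.

    alloc(name) returns a new mount or raises AllocationError, free(mount)
    releases one, and attached maps receiver -> list of attached mounts.
    """
    created = []  # everything allocated during the prepare phase
    plan = []     # (receiver, copies) pairs noted down for the commit phase
    try:
        # Prepare phase: build a copy of the source tree per receiver,
        # noting where it should go without attaching it yet.
        for r in receivers:
            tree = []
            for s in source_mounts:
                mount = alloc(f"{s}@{r}")
                created.append(mount)
                tree.append(mount)
            plan.append((r, tree))
    except AllocationError:
        # Abort phase: delete everything created so far; nothing was attached.
        for mount in reversed(created):
            free(mount)
        return False
    # Commit phase: nothing left to allocate, so in this model it cannot
    # fail with AllocationError.
    for r, tree in plan:
        attached.setdefault(r, []).extend(tree)
    return True


def limited_alloc(budget):
    """Hypothetical allocator that fails after `budget` allocations."""
    def alloc(name):
        nonlocal budget
        if budget == 0:
            raise AllocationError(name)
        budget -= 1
        return name
    return alloc


attached, freed = {}, []
ok = attach_all(["src", "src/a"], ["P", "Q"], limited_alloc(3), freed.append, attached)
print(ok, attached, freed)  # False {} ['src@Q', 'src/a@P', 'src@P']
ok = attach_all(["src", "src/a"], ["P", "Q"], limited_alloc(4), freed.append, attached)
print(ok)                   # True
```

Keeping every failure point inside the prepare phase is what lets the commit phase run without any rollback logic of its own.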

	.. Note::
	   All the propagation-related functionality resides in the file fs/pnode.c.


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)