Note: File does not exist in v6.13.7.
TCP protocol
============

Last updated: 9 February 2008

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:
snd_cwnd		The size of the congestion window
snd_ssthresh		Slow start threshold. We are in slow start if
			snd_cwnd is less than this.
snd_cwnd_cnt		A counter used to slow down the rate of increase
			once we exceed the slow start threshold.
snd_cwnd_clamp		This is the maximum size that snd_cwnd can grow to.
snd_cwnd_stamp		Timestamp of when the congestion window was last
			validated.
snd_cwnd_used		Used as a high-water mark for how much of the
			congestion window is in use. It is used to adjust
			snd_cwnd down when the link is limited by the
			application rather than the network.

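As a rough illustration of how these fields interact, the sketch below
models Reno-style window growth in userspace. It is not the kernel code:
the struct cwnd_model type and the on_ack() helper are names made up for
this example, and the window is assumed to be counted in packets.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the tcp_sock fields listed above (not kernel code). */
struct cwnd_model {
	uint32_t snd_cwnd;       /* congestion window, assumed in packets */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted towards the next increase */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* Slow start applies while the window is below the threshold. */
static int in_slow_start(const struct cwnd_model *m)
{
	return m->snd_cwnd < m->snd_ssthresh;
}

/* Account for one newly acknowledged packet, Reno style. */
static void on_ack(struct cwnd_model *m)
{
	assert(m->snd_cwnd > 0);

	if (in_slow_start(m)) {
		/* Slow start: grow by one packet per ACK. */
		if (m->snd_cwnd < m->snd_cwnd_clamp)
			m->snd_cwnd++;
		return;
	}

	/* Past the threshold: grow by one packet per window of ACKs. */
	if (++m->snd_cwnd_cnt >= m->snd_cwnd) {
		m->snd_cwnd_cnt = 0;
		if (m->snd_cwnd < m->snd_cwnd_clamp)
			m->snd_cwnd++;
	}
}
```

Starting from snd_cwnd = 2 and snd_ssthresh = 4, two calls to on_ack()
bring the window to 4, after which four further calls are needed for each
additional packet until snd_cwnd_clamp is reached.
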
As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, the name, ssthresh, and
cong_avoid fields must be valid.

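The minimum-field rule can be pictured with a small userspace model. This
is only a sketch: struct model_cong_ops, model_register() and the callback
signatures are simplified stand-ins invented for this example, and they do
not match the real tcp_congestion_ops layout or the kernel's registration
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tcp_congestion_ops: only the three fields
 * the text names as mandatory, with simplified signatures. */
struct model_cong_ops {
	const char *name;
	uint32_t (*ssthresh)(uint32_t cwnd);   /* new threshold after loss */
	uint32_t (*cong_avoid)(uint32_t cwnd); /* new window after an ACK */
};

#define MODEL_MAX_CA 8
static const struct model_cong_ops *model_registered[MODEL_MAX_CA];
static size_t model_count;

/* Model of registration. Returns 0 on success, -1 if a mandatory field
 * is missing or the table is full. */
static int model_register(const struct model_cong_ops *ops)
{
	assert(model_count <= MODEL_MAX_CA);

	if (!ops || !ops->name || !ops->ssthresh || !ops->cong_avoid)
		return -1;
	if (model_count == MODEL_MAX_CA)
		return -1;
	model_registered[model_count++] = ops;
	return 0;
}

/* Example callbacks: halve the window on loss (never below 2),
 * grow it by one otherwise. */
static uint32_t halve_ssthresh(uint32_t cwnd)
{
	return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

static uint32_t add_one(uint32_t cwnd)
{
	return cwnd + 1;
}
```

An ops structure with all three fields set is accepted by this model,
while one that leaves cong_avoid as NULL is rejected.
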
Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. The space is preallocated, so it
is important to check that your private data fits in it. Alternatively,
space can be allocated elsewhere and a pointer to it stored here.

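One way to make the size check hard to forget is to do it at compile time.
The sketch below is a userspace model under stated assumptions: struct
model_tp, MODEL_CA_PRIV_WORDS, struct my_ca_state and model_tcp_ca() are
hypothetical names, and the size of the preallocated area is chosen
arbitrarily for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CA_PRIV_WORDS 16	/* assumed size, chosen for this sketch */

/* Stand-in for the part of tcp_sock that holds the preallocated space. */
struct model_tp {
	uint64_t ca_priv[MODEL_CA_PRIV_WORDS];
};

/* Hypothetical private state for an algorithm. */
struct my_ca_state {
	uint32_t min_rtt;
	uint32_t last_cwnd;
	uint64_t epoch_start;
};

/* Fail the build if the private state outgrows the preallocated space. */
_Static_assert(sizeof(struct my_ca_state) <=
	       sizeof(((struct model_tp *)0)->ca_priv),
	       "my_ca_state does not fit in ca_priv");

/* Model of tcp_ca(tp): hand back the preallocated space as private state.
 * The cast mirrors the kernel idiom; a strictly portable program would
 * use a union or memcpy() instead. */
static struct my_ca_state *model_tcp_ca(struct model_tp *tp)
{
	assert(tp);
	return (struct my_ca_state *)tp->ca_priv;
}
```

If a field is later added that pushes struct my_ca_state past the
available space, the build stops with the message above instead of the
algorithm silently overwriting neighbouring socket state.
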
There are three kinds of congestion control algorithms currently: The
simplest ones are derived from TCP reno (highspeed, scalable) and just
provide an alternative congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics. There are also round trip time based algorithms like
Vegas and Westwood+.

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The congestion control mechanism that is used is determined by the
setting of the sysctl net.ipv4.tcp_congestion_control.
The default congestion control will be the last one registered (LIFO);
so if you built everything as modules, the default will be reno. If you
build with the defaults from Kconfig, then CUBIC will be built in (not a
module) and it will end up the default.

If you really want a particular default value then you will need
to set it with the sysctl. If you use the sysctl, the module will be
autoloaded if needed and you will get the expected protocol. If you ask
for an unknown congestion method, then the sysctl attempt will fail.

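From userspace, the sysctl is visible as a file under /proc/sys. The
helpers below are a minimal sketch of reading and setting it from C; the
names read_sysctl() and write_sysctl() are made up for this example, and
writing normally requires root privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* procfs view of the net.ipv4.tcp_congestion_control sysctl. */
#define TCP_CC_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_congestion_control"

/* Read the first line of a sysctl file into buf, without the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_sysctl(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Write a value to a sysctl file. Returns 0 on success, -1 on failure,
 * for example when permission is missing or the value is rejected. */
static int write_sysctl(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* A rejected value may only be reported when the data is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would typically use write_sysctl(TCP_CC_SYSCTL_PATH, "reno")
and then read_sysctl() on the same path to confirm which algorithm is
now the default.
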
If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
deleted, it will always be available.

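The default-selection and removal rules described above can be modelled
with a list that is kept newest first. This is a userspace sketch only:
model_list, model_add(), model_remove() and model_default() are
hypothetical names, and the kernel's own list handling is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 8

/* Registered algorithm names, newest first; index 0 is the default.
 * reno is always present because it cannot be built as a module. */
static const char *model_list[MODEL_MAX] = { "reno" };
static size_t model_len = 1;

/* Register: the newest entry goes to the front and becomes the default. */
static int model_add(const char *name)
{
	if (model_len == MODEL_MAX)
		return -1;
	memmove(&model_list[1], &model_list[0],
		model_len * sizeof(model_list[0]));
	model_list[0] = name;
	model_len++;
	return 0;
}

/* Unregister by name; reno is built in and cannot be removed. */
static int model_remove(const char *name)
{
	size_t i;

	if (strcmp(name, "reno") == 0)
		return -1;
	for (i = 0; i < model_len; i++) {
		if (strcmp(model_list[i], name) == 0) {
			memmove(&model_list[i], &model_list[i + 1],
				(model_len - i - 1) * sizeof(model_list[0]));
			model_len--;
			return 0;
		}
	}
	return -1;
}

/* The default is whatever sits at the front of the list. */
static const char *model_default(void)
{
	assert(model_len > 0);
	return model_list[0];
}
```

Adding "bic" and then "cubic" makes "cubic" the default in this model;
removing "cubic" falls back to "bic", and removing "bic" falls back to
"reno", which can never be removed.
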
How the new TCP output machine [nyi] works
==========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame we throw it on the end. Ack
processing walks down the list from the start.

We keep a set of control flags:

	sk->tcp_pend_event

		TCP_PEND_ACK			Ack needed
		TCP_ACK_NOW			Needed now
		TCP_WINDOW			Window update check
		TCP_WINZERO			Zero probing

	sk->transmit_queue		Where the transmission frames begin
	sk->transmit_new		First new frame pointer
	sk->transmit_end		Where to add frames

	sk->tcp_last_tx_ack		Last ack seen
	sk->tcp_dup_ack			Dup ack count for fast retransmit

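Since this design is not yet implemented, the following is only a guess
at how the three queue pointers could cooperate, written as a userspace
sketch. struct frame, struct out_queue and the helper functions are
hypothetical, and sequence number wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One queued frame; end_seq is the sequence number just past its data. */
struct frame {
	struct frame *next;
	uint32_t end_seq;
};

struct out_queue {
	struct frame *transmit_queue;	/* first frame still held */
	struct frame *transmit_new;	/* first frame not yet sent */
	struct frame *transmit_end;	/* last frame; new ones go after it */
};

/* Add a frame at the end of the single queue. */
static void queue_frame(struct out_queue *q, struct frame *f)
{
	assert(q && f);
	f->next = NULL;
	if (q->transmit_end)
		q->transmit_end->next = f;
	else
		q->transmit_queue = f;
	if (!q->transmit_new)
		q->transmit_new = f;
	q->transmit_end = f;
}

/* Hand out the next unsent frame and advance transmit_new past it.
 * Returns NULL when everything queued has already been sent. */
static struct frame *send_next(struct out_queue *q)
{
	struct frame *f = q->transmit_new;

	if (f)
		q->transmit_new = f->next;
	return f;
}

/* Walk from the start, dropping sent frames that ack fully covers.
 * Returns the number of frames dropped. */
static int ack_frames(struct out_queue *q, uint32_t ack)
{
	int dropped = 0;

	while (q->transmit_queue &&
	       q->transmit_queue != q->transmit_new &&
	       q->transmit_queue->end_seq <= ack) {
		q->transmit_queue = q->transmit_queue->next;
		dropped++;
	}
	if (!q->transmit_queue)
		q->transmit_end = NULL;
	return dropped;
}
```

Frames between transmit_queue and transmit_new are the ones awaiting
acknowledgement, so a retransmit pass would walk exactly that range.
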
Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise we queue them and compute the
body checksum during the copy.

When a write is done we try to clear any pending events and piggyback them.
If the window is full we queue full-sized frames. On the first timeout in
zero window we split this.

On a timer we walk the retransmit list to send any retransmits, update the
backoff timers, etc. A change of route table stamp causes a change of header
and a recompute. We add any new TCP-level headers and refinish the checksum
before sending.
