.. SPDX-License-Identifier: GPL-2.0

=================================
NETWORK FILESYSTEM HELPER LIBRARY
=================================
6
7.. Contents:
8
9 - Overview.
10 - Buffered read helpers.
11 - Read helper functions.
12 - Read helper structures.
13 - Read helper operations.
14 - Read helper procedure.
15 - Read helper cache API.
16
17
18Overview
19========
20
21The network filesystem helper library is a set of functions designed to aid a
22network filesystem in implementing VM/VFS operations. For the moment, that
23just includes turning various VM buffered read operations into requests to read
24from the server. The helper library, however, can also interpose other
25services, such as local caching or local data encryption.
26
27Note that the library module doesn't link against local caching directly, so
28access must be provided by the netfs.
29
30
31Buffered Read Helpers
32=====================
33
34The library provides a set of read helpers that handle the ->readpage(),
35->readahead() and much of the ->write_begin() VM operations and translate them
36into a common call framework.
37
38The following services are provided:
39
40 * Handles transparent huge pages (THPs).
41
42 * Insulates the netfs from VM interface changes.
43
44 * Allows the netfs to arbitrarily split reads up into pieces, even ones that
45 don't match page sizes or page alignments and that may cross pages.
46
47 * Allows the netfs to expand a readahead request in both directions to meet
48 its needs.
49
50 * Allows the netfs to partially fulfil a read, which will then be resubmitted.
51
52 * Handles local caching, allowing cached data and server-read data to be
53 interleaved for a single request.
54
 * Handles clearing of bufferage that isn't on the server.

 * Handles retrying of reads that failed, switching reads from the cache to the
   server as necessary.
59
60 * In the future, this is a place that other services can be performed, such as
61 local encryption of data to be stored remotely or in the cache.
62
63From the network filesystem, the helpers require a table of operations. This
64includes a mandatory method to issue a read operation along with a number of
65optional methods.
66
67
68Read Helper Functions
69---------------------
70
Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl,
			     const struct netfs_read_request_ops *ops,
			     void *netfs_priv);
	int netfs_readpage(struct file *file,
			   struct page *page,
			   const struct netfs_read_request_ops *ops,
			   void *netfs_priv);
	int netfs_write_begin(struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      unsigned int flags,
			      struct page **_page,
			      void **_fsdata,
			      const struct netfs_read_request_ops *ops,
			      void *netfs_priv);
89
90Each corresponds to a VM operation, with the addition of a couple of parameters
91for the use of the read helpers:
92
93 * ``ops``
94
95 A table of operations through which the helpers can talk to the filesystem.
96
97 * ``netfs_priv``
98
99 Filesystem private data (can be NULL).
100
101Both of these values will be stored into the read request structure.
102
103For ->readahead() and ->readpage(), the network filesystem should just jump
104into the corresponding read helper; whereas for ->write_begin(), it may be a
105little more complicated as the network filesystem might want to flush
106conflicting writes or track dirty data and needs to put the acquired page if an
107error occurs after calling the helper.
108
The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Helpers that are meant to be
synchronous will wait as necessary before returning.
112
113If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
114deal with it. If some parts of the request are in progress when an error
115occurs, the request will get partially completed if sufficient data is read.
116
117Additionally, there is::
118
119 * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
120 ssize_t transferred_or_error,
121 bool was_async);
122
123which should be called to complete a read subrequest. This is given the number
124of bytes transferred or a negative error code, plus a flag indicating whether
125the operation was asynchronous (ie. whether the follow-on processing can be
126done in the current context, given this may involve sleeping).
127
128
129Read Helper Structures
130----------------------
131
132The read helpers make use of a couple of structures to maintain the state of
133the read. The first is a structure that manages a read request as a whole::
134
	struct netfs_read_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_read_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};
147
148The above fields are the ones the netfs can use. They are:
149
150 * ``inode``
151 * ``mapping``
152
153 The inode and the address space of the file being read from. The mapping
154 may or may not point to inode->i_data.
155
156 * ``cache_resources``
157
158 Resources for the local cache to use, if present.
159
160 * ``netfs_priv``
161
162 The network filesystem's private data. The value for this can be passed in
163 to the helper functions or set during the request. The ->cleanup() op will
164 be called if this is non-NULL at the end.
165
166 * ``start``
167 * ``len``
168
169 The file position of the start of the read request and the length. These
170 may be altered by the ->expand_readahead() op.
171
172 * ``i_size``
173
174 The size of the file at the start of the request.
175
176 * ``netfs_ops``
177
178 A pointer to the operation table. The value for this is passed into the
179 helper functions.
180
181 * ``debug_id``
182
183 A number allocated to this operation that can be displayed in trace lines
184 for reference.
185
186
187The second structure is used to manage individual slices of the overall read
188request::
189
	struct netfs_read_subrequest {
		struct netfs_read_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};
199
200Each subrequest is expected to access a single source, though the helpers will
201handle falling back from one source type to another. The members are:
202
203 * ``rreq``
204
205 A pointer to the read request.
206
207 * ``start``
208 * ``len``
209
210 The file position of the start of this slice of the read request and the
211 length.
212
213 * ``transferred``
214
215 The amount of data transferred so far of the length of this slice. The
216 network filesystem or cache should start the operation this far into the
217 slice. If a short read occurs, the helpers will call again, having updated
218 this to reflect the amount read so far.
219
220 * ``flags``
221
222 Flags pertaining to the read. There are two of interest to the filesystem
223 or cache:
224
225 * ``NETFS_SREQ_CLEAR_TAIL``
226
227 This can be set to indicate that the remainder of the slice, from
228 transferred to len, should be cleared.
229
230 * ``NETFS_SREQ_SEEK_DATA_READ``
231
232 This is a hint to the cache that it might want to try skipping ahead to
233 the next data (ie. using SEEK_DATA).
234
235 * ``debug_index``
236
237 A number allocated to this slice that can be displayed in trace lines for
238 reference.
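
The way ``transferred`` and ``NETFS_SREQ_CLEAR_TAIL`` interact can be
illustrated with a small userspace sketch. This is not the library's code:
``struct mock_subreq``, ``MOCK_SREQ_CLEAR_TAIL``, ``mock_server_read()`` and
``mock_fill_slice()`` are hypothetical stand-ins, and treating a cleared tail
as fully transferred is a simplification made for the example. It only shows a
source resuming ``transferred`` bytes into the slice after a short read and the
tail being zeroed once the flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MOCK_SREQ_CLEAR_TAIL 0x1UL	/* Hypothetical stand-in for the real flag */

/* Hypothetical cut-down mirror of the subrequest fields described above. */
struct mock_subreq {
	long long	start;		/* File position of the slice */
	size_t		len;		/* Length of the slice */
	size_t		transferred;	/* Bytes obtained so far */
	unsigned long	flags;
};

/* A source reads at start + transferred; returns bytes read or -errno. */
typedef ssize_t (*mock_source_t)(struct mock_subreq *sr, char *buf);

static const char mock_file[] = "0123456789";	/* A 10-byte "server file" */

/* Mock server: at most 4 bytes per call; asks for tail clearing at EOF. */
static ssize_t mock_server_read(struct mock_subreq *sr, char *buf)
{
	size_t file_size = sizeof(mock_file) - 1;
	long long pos = sr->start + (long long)sr->transferred;
	size_t n = sr->len - sr->transferred;

	if (pos >= (long long)file_size) {
		sr->flags |= MOCK_SREQ_CLEAR_TAIL;
		return 0;
	}
	if (n > file_size - (size_t)pos)
		n = file_size - (size_t)pos;
	if (n > 4)
		n = 4;
	memcpy(buf + sr->transferred, mock_file + pos, n);
	return (ssize_t)n;
}

/* Reissue short reads until the slice is full, cleared or stuck. */
static int mock_fill_slice(struct mock_subreq *sr, char *buf, mock_source_t src)
{
	assert(sr != NULL && buf != NULL && src != NULL);

	while (sr->transferred < sr->len) {
		ssize_t n = src(sr, buf);

		if (n < 0)
			return (int)n;		/* Failed reads just fail */
		sr->transferred += (size_t)n;
		if (sr->flags & MOCK_SREQ_CLEAR_TAIL) {
			/* Zero from transferred to len instead of reissuing. */
			memset(buf + sr->transferred, 0,
			       sr->len - sr->transferred);
			sr->transferred = sr->len;
		} else if (n == 0) {
			return -EIO;		/* No progress: stop reissuing */
		}
	}
	return 0;
}
```

With a 12-byte slice starting at offset 4 of the 10-byte mock file, the sketch
performs two short reads (4 bytes, then 2), after which the mock server sets
the flag and the remaining 6 bytes of the buffer are cleared.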
239
240
241Read Helper Operations
242----------------------
243
244The network filesystem must provide the read helpers with a table of operations
245through which it can issue requests and negotiate::
246
	struct netfs_read_request_ops {
		void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
		bool (*is_cache_enabled)(struct inode *inode);
		int (*begin_cache_operation)(struct netfs_read_request *rreq);
		void (*expand_readahead)(struct netfs_read_request *rreq);
		bool (*clamp_length)(struct netfs_read_subrequest *subreq);
		void (*issue_op)(struct netfs_read_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_read_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct page *page, void **_fsdata);
		void (*done)(struct netfs_read_request *rreq);
		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
	};
260
261The operations are as follows:
262
263 * ``init_rreq()``
264
265 [Optional] This is called to initialise the request structure. It is given
266 the file for reference and can modify the ->netfs_priv value.
267
268 * ``is_cache_enabled()``
269
270 [Required] This is called by netfs_write_begin() to ask if the file is being
271 cached. It should return true if it is being cached and false otherwise.
272
273 * ``begin_cache_operation()``
274
   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read. The netfs
   library module cannot access the cache directly, so the netfs should call
   something like fscache_begin_read_operation() to do this.
279
280 The cache gets to store its state in ->cache_resources and must set a table
281 of operations of its own there (though of a different type).
282
283 This should return 0 on success and an error code otherwise. If an error is
284 reported, the operation may proceed anyway, just without local caching (only
285 out of memory and interruption errors cause failure here).
286
287 * ``expand_readahead()``
288
289 [Optional] This is called to allow the filesystem to expand the size of a
290 readahead read request. The filesystem gets to expand the request in both
291 directions, though it's not permitted to reduce it as the numbers may
292 represent an allocation already made. If local caching is enabled, it gets
293 to expand the request first.
294
295 Expansion is communicated by changing ->start and ->len in the request
296 structure. Note that if any change is made, ->len must be increased by at
297 least as much as ->start is reduced.
298
299 * ``clamp_length()``
300
301 [Optional] This is called to allow the filesystem to reduce the size of a
302 subrequest. The filesystem can use this, for example, to chop up a request
303 that has to be split across multiple servers or to put multiple reads in
304 flight.
305
   As the method is declared as returning bool, this should return true on
   success and false on error.
307
308 * ``issue_op()``
309
310 [Required] The helpers use this to dispatch a subrequest to the server for
311 reading. In the subrequest, ->start, ->len and ->transferred indicate what
312 data should be read from the server.
313
314 There is no return value; the netfs_subreq_terminated() function should be
315 called to indicate whether or not the operation succeeded and how much data
316 it transferred. The filesystem also should not deal with setting pages
317 uptodate, unlocking them or dropping their refs - the helpers need to deal
318 with this as they have to coordinate with copying to the local cache.
319
320 Note that the helpers have the pages locked, but not pinned. It is possible
321 to use the ITER_XARRAY iov iterator to refer to the range of the inode that
322 is being operated upon without the need to allocate large bvec tables.
323
324 * ``is_still_valid()``
325
326 [Optional] This is called to find out if the data just read from the local
327 cache is still valid. It should return true if it is still valid and false
328 if not. If it's not still valid, it will be reread from the server.
329
330 * ``check_write_begin()``
331
332 [Optional] This is called from the netfs_write_begin() helper once it has
333 allocated/grabbed the page to be modified to allow the filesystem to flush
334 conflicting state before allowing it to be modified.
335
336 It should return 0 if everything is now fine, -EAGAIN if the page should be
337 regrabbed and any other error code to abort the operation.
338
 * ``done()``

   [Optional] This is called after the pages in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``
345
346 [Optional] This is called as the request is being deallocated so that the
347 filesystem can clean up ->netfs_priv.
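
The constraint on ``expand_readahead()`` (the request may only grow, and
``->len`` must increase by at least as much as ``->start`` is reduced) can be
met by rounding the request outwards to some granule. The following userspace
sketch assumes a hypothetical ``struct mock_rreq`` holding just ``start`` and
``len``, and a granule size chosen by the caller; it is an illustration of the
arithmetic, not the code of any real filesystem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the ->start and ->len fields of a read request. */
struct mock_rreq {
	long long	start;	/* File position of the start of the request */
	size_t		len;	/* Length of the request */
};

/*
 * Round the request outwards to a multiple of granule.  The start can only
 * move down and the end can only move up, so the original range is always
 * still covered and len grows by at least as much as start is reduced.
 */
static void mock_expand_readahead(struct mock_rreq *rreq, size_t granule)
{
	long long g = (long long)granule;
	long long end, new_start, new_end;

	assert(rreq != NULL && granule != 0 && rreq->start >= 0);

	end = rreq->start + (long long)rreq->len;
	new_start = rreq->start - (rreq->start % g);	/* Round down */
	new_end = ((end + g - 1) / g) * g;		/* Round up */

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}
```

For example, a request for 3000 bytes at offset 5000 with a 4096-byte granule
becomes a request for 4096 bytes at offset 4096, while a request that is
already aligned is left unchanged.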
348
349
350
351Read Helper Procedure
352---------------------
353
354The read helpers work by the following general procedure:
355
356 * Set up the request.
357
358 * For readahead, allow the local cache and then the network filesystem to
359 propose expansions to the read request. This is then proposed to the VM.
360 If the VM cannot fully perform the expansion, a partially expanded read will
361 be performed, though this may not get written to the cache in its entirety.
362
363 * Loop around slicing chunks off of the request to form subrequests:
364
365 * If a local cache is present, it gets to do the slicing, otherwise the
366 helpers just try to generate maximal slices.
367
368 * The network filesystem gets to clamp the size of each slice if it is to be
369 the source. This allows rsize and chunking to be implemented.
370
   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.
373
374 * The next slice begins at the end of the last one.
375
376 * As slices finish being read, they terminate.
377
378 * When all the subrequests have terminated, the subrequests are assessed and
379 any that are short or have failed are reissued:
380
381 * Failed cache requests are issued against the server instead.
382
383 * Failed server requests just fail.
384
385 * Short reads against either source will be reissued against that source
386 provided they have transferred some more data:
387
388 * The cache may need to skip holes that it can't do DIO from.
389
390 * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
391 end of the slice instead of reissuing.
392
 * Once the data is read, the pages that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.
400
401 * Any pages that need writing to the cache will then have DIO writes issued.
402
403 * Synchronous operations will wait for reading to be complete.
404
405 * Writes to the cache will proceed asynchronously and the pages will have the
406 PG_fscache mark removed when that completes.
407
408 * The request structures will be cleaned up when everything has completed.
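
The slicing loop in the procedure above can be sketched in userspace as
follows. The names ``struct mock_slice``, ``mock_clamp_length()`` and
``mock_slice_request()`` are hypothetical, and the clamp here is just a fixed
``rsize`` limit; a real filesystem's ``clamp_length()`` may apply other rules,
and the cache's involvement is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one slice cut from the request. */
struct mock_slice {
	long long	start;
	size_t		len;
};

/* Mock clamp: limit a slice to at most rsize bytes. */
static size_t mock_clamp_length(size_t len, size_t rsize)
{
	return len > rsize ? rsize : len;
}

/*
 * Cut [start, start + len) into consecutive slices, each clamped to rsize.
 * Each slice begins where the previous one ended.  Returns the number of
 * slices stored in out, which holds at most max_out entries.
 */
static size_t mock_slice_request(long long start, size_t len, size_t rsize,
				 struct mock_slice *out, size_t max_out)
{
	size_t n = 0;

	assert(out != NULL && rsize != 0);

	while (len > 0 && n < max_out) {
		size_t slice = mock_clamp_length(len, rsize);

		out[n].start = start;
		out[n].len = slice;
		n++;
		start += (long long)slice;
		len -= slice;
	}
	return n;
}
```

A 10000-byte request at offset 0 with an ``rsize`` of 4096 yields three
slices: two of 4096 bytes and a final one of 1808 bytes.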
409
410
411Read Helper Cache API
412---------------------
413
414When implementing a local cache to be used by the read helpers, two things are
415required: some way for the network filesystem to initialise the caching for a
416read request and a table of operations for the helpers to call.
417
The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work. If using fscache, for
example, the netfs would call::

	int fscache_begin_read_operation(struct netfs_read_request *rreq,
					 struct fscache_cookie *cookie);
424
425passing in the request pointer and the cookie corresponding to the file.
426
427The netfs_read_request object contains a place for the cache to hang its
428state::
429
	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};
435
436This contains an operations table pointer and two private pointers. The
437operation table looks like the following::
438
	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
						       loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);
	};
461
462With a termination handler function pointer::
463
	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);
467
468The methods defined in the table are:
469
470 * ``end_operation()``
471
472 [Required] Called to clean up the resources at the end of the read request.
473
474 * ``expand_readahead()``
475
476 [Optional] Called at the beginning of a netfs_readahead() operation to allow
477 the cache to expand a request in either direction. This allows the cache to
478 size the request appropriately for the cache granularity.
479
   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.
498
499 * ``read()``
500
501 [Required] Called to read from the cache. The start file offset is given
502 along with an iterator to read to, which gives the length also. It can be
503 given a hint requesting that it seek forward from that start position for
504 data.
505
506 Also provided is a pointer to a termination handler function and private
507 data to pass to that function. The termination function should be called
508 with the number of bytes transferred or an error code, plus a flag
509 indicating whether the termination is definitely happening in the caller's
510 context.
511
512 * ``write()``
513
514 [Required] Called to write to the cache. The start file offset is given
515 along with an iterator to write from, which gives the length also.
516
517 Also provided is a pointer to a termination handler function and private
518 data to pass to that function. The termination function should be called
519 with the number of bytes transferred or an error code, plus a flag
520 indicating whether the termination is definitely happening in the caller's
521 context.
522
523Note that these methods are passed a pointer to the cache resource structure,
524not the read request structure as they could be used in other situations where
525there isn't a read request structure as well, such as writing dirty data to the
526cache.
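
The termination-handler convention used by ``read()`` and ``write()`` can be
illustrated with a userspace mock. Everything here is hypothetical:
``struct mock_cache`` is a flat memory buffer rather than a real cache, a plain
``char`` buffer stands in for the ``iov_iter``, and the choice to return 0 on
success and the error otherwise, as well as reporting it through the handler,
is an assumption of the sketch. The one thing it is meant to show is the
handler receiving either a byte count or a negative error code, together with
its private pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Same shape as the netfs_io_terminated_t typedef shown above. */
typedef void (*mock_io_terminated_t)(void *priv,
				     ssize_t transferred_or_error,
				     bool was_async);

/* Hypothetical cache: just a block of memory. */
struct mock_cache {
	const char	*data;
	size_t		size;
};

/* Hypothetical private data for the handler below. */
struct mock_result {
	ssize_t	value;	/* Last byte count or negative error */
	int	calls;	/* Number of times the handler ran */
};

/* Termination handler that simply records what it was told. */
static void mock_record_result(void *priv, ssize_t transferred_or_error,
			       bool was_async)
{
	struct mock_result *res = priv;

	(void)was_async;
	res->value = transferred_or_error;
	res->calls++;
}

/* Mock read: copy what is available, then report through term_func. */
static int mock_cache_read(struct mock_cache *cache, long long start_pos,
			   char *buf, size_t len,
			   mock_io_terminated_t term_func, void *term_func_priv)
{
	ssize_t result;

	assert(cache != NULL && buf != NULL);

	if (start_pos < 0 || (size_t)start_pos >= cache->size) {
		result = -EIO;		/* Nothing cached at this position */
	} else {
		size_t n = cache->size - (size_t)start_pos;

		if (n > len)
			n = len;
		memcpy(buf, cache->data + start_pos, n);
		result = (ssize_t)n;
	}

	/* Completed in the caller's context, so was_async is false. */
	if (term_func)
		term_func(term_func_priv, result, false);
	return result < 0 ? (int)result : 0;
}
```

Because the handler only sees an opaque private pointer, the same mock could
be driven without any read request structure, which mirrors the reason given
above for passing the cache resources rather than the request.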
.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================
6
7.. Contents:
8
9 - Overview.
10 - Per-inode context.
11 - Inode context helper functions.
12 - Buffered read helpers.
13 - Read helper functions.
14 - Read helper structures.
15 - Read helper operations.
16 - Read helper procedure.
17 - Read helper cache API.
18
19
20Overview
21========
22
23The network filesystem helper library is a set of functions designed to aid a
24network filesystem in implementing VM/VFS operations. For the moment, that
25just includes turning various VM buffered read operations into requests to read
26from the server. The helper library, however, can also interpose other
27services, such as local caching or local data encryption.
28
29Note that the library module doesn't link against local caching directly, so
30access must be provided by the netfs.
31
32
33Per-Inode Context
34=================
35
36The network filesystem helper library needs a place to store a bit of state for
37its use on each netfs inode it is helping to manage. To this end, a context
38structure is defined::
39
	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};
45
46A network filesystem that wants to use netfs lib must place one of these in its
47inode wrapper struct instead of the VFS ``struct inode``. This can be done in
48a way similar to the following::
49
	struct my_inode {
		struct netfs_inode netfs; /* Netfslib context and vfs inode */
		...
	};
54
55This allows netfslib to find its state by using ``container_of()`` from the
56inode pointer, thereby allowing the netfslib helper functions to be pointed to
57directly by the VFS/VM operation tables.
58
59The structure contains the following fields:
60
61 * ``inode``
62
63 The VFS inode structure.
64
65 * ``ops``
66
67 The set of operations provided by the network filesystem to netfslib.
68
69 * ``cache``
70
71 Local caching cookie, or NULL if no caching is enabled. This field does not
72 exist if fscache is disabled.
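
The embedding described above can be tried out in userspace. In the sketch
below all types are hypothetical mocks (``struct mock_vfs_inode`` stands in for
``struct inode``, ``struct mock_netfs_inode`` for the context structure) and
``mock_container_of()`` is a simplified version of the kernel macro without its
type checking. It only demonstrates that, because the VFS inode is embedded in
the context and the context in the filesystem's wrapper, both can be recovered
from a bare inode pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct mock_vfs_inode {
	unsigned long	i_ino;
};

struct mock_request_ops {
	int	unused;
};

/* Mock of the context: the VFS inode is embedded, not pointed to. */
struct mock_netfs_inode {
	struct mock_vfs_inode		inode;
	const struct mock_request_ops	*ops;
};

/* The filesystem's wrapper embeds the context in turn. */
struct my_inode {
	struct mock_netfs_inode	netfs;	/* Context and VFS inode */
	int			my_flags;
};

/* Simplified container_of(): step back from a member to its container. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cast from the VFS inode to the context that contains it. */
static struct mock_netfs_inode *mock_netfs_inode(struct mock_vfs_inode *inode)
{
	assert(inode != NULL);
	return mock_container_of(inode, struct mock_netfs_inode, inode);
}

/* Go one level further out, to the filesystem's own inode wrapper. */
static struct my_inode *my_inode_from_vfs(struct mock_vfs_inode *inode)
{
	return mock_container_of(mock_netfs_inode(inode), struct my_inode, netfs);
}
```

Since no lookup is involved, only pointer arithmetic, a helper that is handed
nothing but the inode can still reach the operations table.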
73
74
75Inode Context Helper Functions
76------------------------------
77
To help deal with the per-inode context, a number of helper functions are
provided. Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::
81
	void netfs_inode_init(struct netfs_inode *ctx,
			      const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

	struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);
93
94
95Buffered Read Helpers
96=====================
97
98The library provides a set of read helpers that handle the ->read_folio(),
99->readahead() and much of the ->write_begin() VM operations and translate them
100into a common call framework.
101
102The following services are provided:
103
104 * Handle folios that span multiple pages.
105
106 * Insulate the netfs from VM interface changes.
107
108 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
109 don't match folio sizes or folio alignments and that may cross folios.
110
111 * Allow the netfs to expand a readahead request in both directions to meet its
112 needs.
113
114 * Allow the netfs to partially fulfil a read, which will then be resubmitted.
115
116 * Handle local caching, allowing cached data and server-read data to be
117 interleaved for a single request.
118
 * Handle clearing of bufferage that isn't on the server.
120
121 * Handle retrying of reads that failed, switching reads from the cache to the
122 server as necessary.
123
124 * In the future, this is a place that other services can be performed, such as
125 local encryption of data to be stored remotely or in the cache.
126
127From the network filesystem, the helpers require a table of operations. This
128includes a mandatory method to issue a read operation along with a number of
129optional methods.
130
131
132Read Helper Functions
133---------------------
134
135Three read helpers are provided::
136
	void netfs_readahead(struct readahead_control *ractl);
	int netfs_read_folio(struct file *file,
			     struct folio *folio);
	int netfs_write_begin(struct netfs_inode *ctx,
			      struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      struct folio **_folio,
			      void **_fsdata);
147
148Each corresponds to a VM address space operation. These operations use the
149state in the per-inode context.
150
For ->readahead() and ->read_folio(), the network filesystem should just point
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.
156
157The helpers manage the read request, calling back into the network filesystem
158through the supplied table of operations. Waits will be performed as
159necessary before returning for helpers that are meant to be synchronous.
160
If an error occurs, ->free_request() will be called to clean up the
netfs_io_request struct allocated. If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.
165
166Additionally, there is::
167
168 * void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
169 ssize_t transferred_or_error,
170 bool was_async);
171
172which should be called to complete a read subrequest. This is given the number
173of bytes transferred or a negative error code, plus a flag indicating whether
174the operation was asynchronous (ie. whether the follow-on processing can be
175done in the current context, given this may involve sleeping).
176
177
178Read Helper Structures
179----------------------
180
181The read helpers make use of a couple of structures to maintain the state of
182the read. The first is a structure that manages a read request as a whole::
183
	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};
196
197The above fields are the ones the netfs can use. They are:
198
199 * ``inode``
200 * ``mapping``
201
202 The inode and the address space of the file being read from. The mapping
203 may or may not point to inode->i_data.
204
205 * ``cache_resources``
206
207 Resources for the local cache to use, if present.
208
209 * ``netfs_priv``
210
211 The network filesystem's private data. The value for this can be passed in
212 to the helper functions or set during the request.
213
214 * ``start``
215 * ``len``
216
217 The file position of the start of the read request and the length. These
218 may be altered by the ->expand_readahead() op.
219
220 * ``i_size``
221
222 The size of the file at the start of the request.
223
224 * ``netfs_ops``
225
226 A pointer to the operation table. The value for this is passed into the
227 helper functions.
228
229 * ``debug_id``
230
231 A number allocated to this operation that can be displayed in trace lines
232 for reference.
233
234
235The second structure is used to manage individual slices of the overall read
236request::
237
	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};
247
248Each subrequest is expected to access a single source, though the helpers will
249handle falling back from one source type to another. The members are:
250
251 * ``rreq``
252
253 A pointer to the read request.
254
255 * ``start``
256 * ``len``
257
258 The file position of the start of this slice of the read request and the
259 length.
260
261 * ``transferred``
262
263 The amount of data transferred so far of the length of this slice. The
264 network filesystem or cache should start the operation this far into the
265 slice. If a short read occurs, the helpers will call again, having updated
266 this to reflect the amount read so far.
267
268 * ``flags``
269
270 Flags pertaining to the read. There are two of interest to the filesystem
271 or cache:
272
273 * ``NETFS_SREQ_CLEAR_TAIL``
274
275 This can be set to indicate that the remainder of the slice, from
276 transferred to len, should be cleared.
277
278 * ``NETFS_SREQ_SEEK_DATA_READ``
279
280 This is a hint to the cache that it might want to try skipping ahead to
281 the next data (ie. using SEEK_DATA).
282
283 * ``debug_index``
284
285 A number allocated to this slice that can be displayed in trace lines for
286 reference.
287
288
289Read Helper Operations
290----------------------
291
292The network filesystem must provide the read helpers with a table of operations
293through which it can issue requests and negotiate::
294
	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		void (*free_request)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio **foliop, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
	};
306
307The operations are as follows:
308
309 * ``init_request()``
310
311 [Optional] This is called to initialise the request structure. It is given
312 the file for reference.
313
314 * ``free_request()``
315
316 [Optional] This is called as the request is being deallocated so that the
317 filesystem can clean up any state it has attached there.
318
319 * ``expand_readahead()``
320
321 [Optional] This is called to allow the filesystem to expand the size of a
322 readahead read request. The filesystem gets to expand the request in both
323 directions, though it's not permitted to reduce it as the numbers may
324 represent an allocation already made. If local caching is enabled, it gets
325 to expand the request first.
326
327 Expansion is communicated by changing ->start and ->len in the request
328 structure. Note that if any change is made, ->len must be increased by at
329 least as much as ->start is reduced.
330
331 * ``clamp_length()``
332
333 [Optional] This is called to allow the filesystem to reduce the size of a
334 subrequest. The filesystem can use this, for example, to chop up a request
335 that has to be split across multiple servers or to put multiple reads in
336 flight.
337
   As the method is declared as returning bool, this should return true on
   success and false on error.
339
340 * ``issue_read()``
341
342 [Required] The helpers use this to dispatch a subrequest to the server for
343 reading. In the subrequest, ->start, ->len and ->transferred indicate what
344 data should be read from the server.
345
346 There is no return value; the netfs_subreq_terminated() function should be
347 called to indicate whether or not the operation succeeded and how much data
348 it transferred. The filesystem also should not deal with setting folios
349 uptodate, unlocking them or dropping their refs - the helpers need to deal
350 with this as they have to coordinate with copying to the local cache.
351
352 Note that the helpers have the folios locked, but not pinned. It is
353 possible to use the ITER_XARRAY iov iterator to refer to the range of the
354 inode that is being operated upon without the need to allocate large bvec
355 tables.
356
357 * ``is_still_valid()``
358
359 [Optional] This is called to find out if the data just read from the local
360 cache is still valid. It should return true if it is still valid and false
361 if not. If it's not still valid, it will be reread from the server.
362
363 * ``check_write_begin()``
364
365 [Optional] This is called from the netfs_write_begin() helper once it has
366 allocated/grabbed the folio to be modified to allow the filesystem to flush
367 conflicting state before allowing it to be modified.
368
   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL. It should return 0 if everything is now fine (``*foliop``
   left set) or if the op should be retried (``*foliop`` cleared), and an
   error code to abort the operation.
373
 * ``done()``
375
376 [Optional] This is called after the folios in the request have all been
377 unlocked (and marked uptodate if applicable).
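
The three outcomes of ``check_write_begin()`` (proceed, retry, abort) can be
illustrated with a userspace mock. ``struct mock_folio``,
``mock_check_write_begin()`` and ``mock_write_begin()`` are hypothetical, the
retry limit is an addition made so that the example always terminates, and
real locking and folio reference handling are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct mock_folio {
	bool	conflicting;	/* Has state that must be flushed first */
	bool	broken;		/* Cannot be used at all */
};

/* Mock hook following the contract described for ->check_write_begin(). */
static int mock_check_write_begin(struct mock_folio **foliop)
{
	struct mock_folio *folio = *foliop;

	if (folio->broken)
		return -EIO;			/* Abort the operation */
	if (folio->conflicting) {
		folio->conflicting = false;	/* "Flush" the conflict */
		*foliop = NULL;			/* Discarded: ask for a retry */
	}
	return 0;
}

/* Mock caller: regrab and recheck until the hook leaves the pointer set. */
static int mock_write_begin(struct mock_folio *folio, int max_tries,
			    struct mock_folio **result)
{
	int tries;

	assert(folio != NULL && result != NULL);

	for (tries = 0; tries < max_tries; tries++) {
		struct mock_folio *grabbed = folio;	/* "Grab" the folio */
		int ret = mock_check_write_begin(&grabbed);

		if (ret < 0)
			return ret;		/* Error: abort */
		if (grabbed) {
			*result = grabbed;	/* Pointer left set: proceed */
			return 0;
		}
		/* The hook cleared the pointer, so go round and regrab. */
	}
	return -EAGAIN;
}
```

A folio with conflicting state is rejected on the first pass and accepted on
the second, whereas a broken one aborts the mock operation immediately.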
378
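As a rough illustration of how ->start, ->len and ->transferred fit together
in ``issue_read()``, the userspace sketch below works out the range that
remains to be fetched for a subrequest. ``struct sketch_subreq`` and
``sketch_remaining()`` are hypothetical stand-ins rather than netfs API, and
resuming at ``start + transferred`` is an assumption drawn from the
description above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three subrequest fields named above. */
struct sketch_subreq {
	long long start;	/* file offset of the slice */
	size_t len;		/* total length of the slice */
	size_t transferred;	/* bytes already obtained */
};

/* Work out the file position and byte count still to be read. */
void sketch_remaining(const struct sketch_subreq *subreq,
		      long long *pos, size_t *count)
{
	/* A slice can never have received more than its own length. */
	assert(subreq->transferred <= subreq->len);
	*pos = subreq->start + (long long)subreq->transferred;
	*count = subreq->len - subreq->transferred;
}
```

For a slice at offset 4096 of length 8192 with 1024 bytes already transferred,
this gives a position of 5120 and a count of 7168.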
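
The return convention described for ``check_write_begin()`` can be
summarised with a small userspace sketch. The ``sketch_action`` enum and
``sketch_classify()`` are hypothetical names used only for illustration; the
helpers do not expose anything of this form:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome labels; the helpers do not export such an enum. */
enum sketch_action { SKETCH_PROCEED, SKETCH_RETRY, SKETCH_ABORT };

/*
 * Classify the result of a check_write_begin()-style call from its return
 * value and from whether the folio pointer was left set.
 */
enum sketch_action sketch_classify(int ret, const void *folio)
{
	/* Assumption: kernel-style results are zero or a negative errno. */
	assert(ret <= 0);
	if (ret != 0)
		return SKETCH_ABORT;	/* error code: abort the operation */
	if (folio == NULL)
		return SKETCH_RETRY;	/* folio was discarded: retry */
	return SKETCH_PROCEED;		/* folio still held: carry on */
}
```

The point of the sketch is that a zero return alone is not enough to decide
what happens next; the state of the folio pointer distinguishes "proceed" from
"retry".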

Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. The expanded request is then
   proposed to the VM. If the VM cannot fully perform the expansion, a
   partially expanded read will be performed, though this may not get written
   to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of being reissued.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
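
The slicing step of the procedure above can be pictured with the following
userspace sketch, which cuts a request into consecutive slices no larger than
an rsize-like limit. ``struct sketch_slice``, ``sketch_slice_request()`` and
the ``max_len`` parameter are hypothetical; the real helpers also consult the
cache when choosing slice boundaries, which is not modelled here:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice descriptor; not a netfs structure. */
struct sketch_slice {
	long long start;
	size_t len;
};

/*
 * Cut [start, start + len) into slices of at most max_len bytes, as a
 * filesystem might when clamping to an rsize-like limit.  Returns the
 * number of slices written to out[], or -1 if out[] is too small.
 */
int sketch_slice_request(long long start, size_t len, size_t max_len,
			 struct sketch_slice *out, int out_max)
{
	int n = 0;

	assert(max_len > 0);
	while (len > 0) {
		size_t part = len < max_len ? len : max_len;

		if (n == out_max)
			return -1;
		out[n].start = start;
		out[n].len = part;
		n++;
		/* The next slice begins at the end of this one. */
		start += (long long)part;
		len -= part;
	}
	return n;
}
```

A 10000-byte request starting at offset 0 with a 4096-byte limit yields three
slices of 4096, 4096 and 1808 bytes.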
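
The assessment of terminated subrequests described in the procedure can be
sketched as a small decision function. Everything here is hypothetical
(``struct sketch_result``, the ``sketch_verdict`` values and
``sketch_assess()``); in particular, treating a short read that made no
progress as a failure is an assumption, since the text only says that short
reads are reissued provided more data was transferred:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_source { SKETCH_FROM_CACHE, SKETCH_FROM_SERVER };

enum sketch_verdict {
	SKETCH_DONE,		/* nothing more to do */
	SKETCH_REISSUE_SAME,	/* short read: try the same source again */
	SKETCH_REISSUE_SERVER,	/* cache failure: fall back to the server */
	SKETCH_CLEAR_TAIL,	/* zero the rest of the slice */
	SKETCH_FAIL		/* give up on this slice */
};

struct sketch_result {
	enum sketch_source source;
	int error;		/* 0 or a negative error code */
	size_t len;		/* size of the slice */
	size_t transferred;	/* total bytes obtained so far */
	size_t progress;	/* bytes obtained by the latest attempt */
	bool clear_tail;	/* models NETFS_SREQ_CLEAR_TAIL being set */
};

enum sketch_verdict sketch_assess(const struct sketch_result *r)
{
	assert(r->transferred <= r->len);
	if (r->error)
		return r->source == SKETCH_FROM_CACHE ?
			SKETCH_REISSUE_SERVER : SKETCH_FAIL;
	if (r->transferred == r->len)
		return SKETCH_DONE;
	if (r->clear_tail)
		return SKETCH_CLEAR_TAIL;
	/* Assumption: a short read is only retried if it made progress. */
	return r->progress > 0 ? SKETCH_REISSUE_SAME : SKETCH_FAIL;
}
```

A failed cache read is redirected to the server, whereas a failed server read
has nowhere else to go and simply fails.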

Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

To begin a cache operation on an fscache object, the following function is
called::

        int fscache_begin_read_operation(struct netfs_io_request *rreq,
                                         struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file. This
fills in the cache resources mentioned below.

The netfs_io_request object contains a place for the cache to hang its
state::

        struct netfs_cache_resources {
                const struct netfs_cache_ops *ops;
                void *cache_priv;
                void *cache_priv2;
        };

This contains an operations table pointer and two private pointers. The
operation table looks like the following::

        struct netfs_cache_ops {
                void (*end_operation)(struct netfs_cache_resources *cres);

                void (*expand_readahead)(struct netfs_cache_resources *cres,
                                         loff_t *_start, size_t *_len, loff_t i_size);

                enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
                                                     loff_t i_size);

                int (*read)(struct netfs_cache_resources *cres,
                            loff_t start_pos,
                            struct iov_iter *iter,
                            bool seek_data,
                            netfs_io_terminated_t term_func,
                            void *term_func_priv);

                int (*prepare_write)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size,
                                     bool no_space_allocated_yet);

                int (*write)(struct netfs_cache_resources *cres,
                             loff_t start_pos,
                             struct iov_iter *iter,
                             netfs_io_terminated_t term_func,
                             void *term_func_priv);

                int (*query_occupancy)(struct netfs_cache_resources *cres,
                                       loff_t start, size_t len, size_t granularity,
                                       loff_t *_data_start, size_t *_data_len);
        };

The termination handler function pointer type used above is::

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);
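
To illustrate how a termination handler of this shape might unpack its
combined argument, here is a minimal userspace sketch. The typedef is
reproduced so that the example is standalone; ``struct sketch_io_state`` and
``sketch_terminated()`` are hypothetical, and treating negative values as
error codes is an assumption based on the usual kernel convention:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the typedef above, reproduced for a standalone build. */
typedef void (*netfs_io_terminated_t)(void *priv,
				      ssize_t transferred_or_error,
				      bool was_async);

/* Hypothetical private state that a caller might hand to the handler. */
struct sketch_io_state {
	size_t transferred;	/* running total of bytes moved */
	int error;		/* last error seen, or 0 */
	bool done;		/* set once the handler has run */
};

/* Split the combined argument into a byte count or an error code. */
void sketch_terminated(void *priv, ssize_t transferred_or_error,
		       bool was_async)
{
	struct sketch_io_state *st = priv;

	assert(st != NULL);
	(void)was_async;	/* not needed in this sketch */
	if (transferred_or_error < 0)
		st->error = (int)transferred_or_error;
	else
		st->transferred += (size_t)transferred_or_error;
	st->done = true;
}
```

A cache backend would be handed such a function as ``term_func`` together
with the state pointer as ``term_func_priv``.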

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately. There is no return value.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which also gives the length. It can be
   given a hint (``seek_data``) requesting that it seek forward from that
   start position for data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place. This
   involves checking to see whether the cache has sufficient space to honour
   the write. ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary in either
   direction as necessary to align for direct I/O. ``i_size`` holds the size of
   the object and is provided for reference. ``no_space_allocated_yet`` is set
   to true if the caller is certain that no data has been written to that
   region - for example if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache. The start file offset is given
   along with an iterator to write from, which also gives the length.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache. The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned. The function passes back the start and length of the data,
   if any, available within that region. Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.
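
The kind of adjustment that ``expand_readahead()`` is expected to make can be
sketched in userspace as rounding a range outwards to a cache granule.
``sketch_expand()`` and its ``granule`` parameter are hypothetical; a real
cache would choose its own granularity and would also take the file size into
account, which is not modelled here:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round [*_start, *_start + *_len) outwards so that both ends fall on
 * multiples of granule.
 */
void sketch_expand(long long *_start, size_t *_len, size_t granule)
{
	long long g = (long long)granule;
	long long end = *_start + (long long)*_len;

	assert(granule > 0 && *_start >= 0);
	*_start -= *_start % g;		/* round the start down */
	end = (end + g - 1) / g * g;	/* round the end up */
	*_len = (size_t)(end - *_start);
}
```

With a 4096-byte granule, a request for 3000 bytes at offset 5000 becomes a
request for 4096 bytes at offset 4096.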

Note that, apart from ``prepare_read()``, these methods are passed a pointer
to the cache resource structure rather than the read request structure, as
they may also be used in situations where there is no read request structure,
such as writing dirty data to the cache.

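The return values described for ``query_occupancy()`` can be illustrated with
a toy userspace model in which a file has at most one cached byte range.
``struct sketch_cache`` and ``sketch_query_occupancy()`` are hypothetical, and
reporting only whole granules (start rounded up, end rounded down) is an
assumption about what aligning the answer to the granularity means:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache state: at most one populated byte range per file. */
struct sketch_cache {
	int enabled;		/* 0 if there is no caching on this file */
	long long data_start;	/* first cached byte */
	long long data_end;	/* one past the last cached byte */
};

int sketch_query_occupancy(const struct sketch_cache *c,
			   long long start, size_t len, size_t granularity,
			   long long *_data_start, size_t *_data_len)
{
	long long g = (long long)granularity;
	long long end = start + (long long)len;
	long long lo, hi;

	assert(granularity > 0 && start >= 0);
	if (!c->enabled)
		return -ENOBUFS;

	/* Intersect the queried region with the populated range. */
	lo = c->data_start > start ? c->data_start : start;
	hi = c->data_end < end ? c->data_end : end;

	/* Assumption: only whole granules are reported. */
	lo = (lo + g - 1) / g * g;
	hi = hi / g * g;
	if (lo >= hi)
		return -ENODATA;

	*_data_start = lo;
	*_data_len = (size_t)(hi - lo);
	return 0;
}
```

With data cached in bytes 8192 to 19999, querying the first 32768 bytes at a
4096-byte granularity passes back a start of 8192 and a length of 8192, the
leading 8192 bytes being a hole at the front.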

API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c