ubifs-authentication.rst - Documentation/filesystems/ubifs-authentication.rst - Linux diff v5.4

 
 
  1:orphan:
  2
  3.. UBIFS Authentication
  4.. sigma star gmbh
  5.. 2018
  6
  7Introduction
  8============
  9
 10UBIFS utilizes the fscrypt framework to provide confidentiality for file
 11contents and file names. This prevents attacks where an attacker is able to
 12read contents of the filesystem on a single point in time. A classic example
 13is a lost smartphone where the attacker is unable to read personal data stored
 14on the device without the filesystem decryption key.
 15
 16At the current state, UBIFS encryption however does not prevent attacks where
 17the attacker is able to modify the filesystem contents and the user uses the
 18device afterwards. In such a scenario an attacker can modify filesystem
 19contents arbitrarily without the user noticing. One example is to modify a
 20binary to perform a malicious action when executed [DMC-CBC-ATTACK]. Since
 21most of the filesystem metadata of UBIFS is stored in plain, this makes it
 22fairly easy to swap files and replace their contents.
 23
 24Other full disk encryption systems like dm-crypt cover all filesystem metadata,
 25which makes such kinds of attacks more complicated, but not impossible.
 26Especially, if the attacker is given access to the device multiple points in
 27time. For dm-crypt and other filesystems that build upon the Linux block IO
 28layer, the dm-integrity or dm-verity subsystems [DM-INTEGRITY, DM-VERITY]
 29can be used to get full data authentication at the block layer.
 30These can also be combined with dm-crypt [CRYPTSETUP2].
 31
 32This document describes an approach to get file contents _and_ full metadata
 33authentication for UBIFS. Since UBIFS uses fscrypt for file contents and file
 34name encryption, the authentication system could be tied into fscrypt such that
 35existing features like key derivation can be utilized. It should however also
 36be possible to use UBIFS authentication without using encryption.
 37
 38
 39MTD, UBI & UBIFS
 40----------------
 41
 42On Linux, the MTD (Memory Technology Devices) subsystem provides a uniform
 43interface to access raw flash devices. One of the more prominent subsystems that
 44work on top of MTD is UBI (Unsorted Block Images). It provides volume management
 45for flash devices and is thus somewhat similar to LVM for block devices. In
 46addition, it deals with flash-specific wear-leveling and transparent I/O error
 47handling. UBI offers logical erase blocks (LEBs) to the layers on top of it
 48and maps them transparently to physical erase blocks (PEBs) on the flash.
 49
 50UBIFS is a filesystem for raw flash which operates on top of UBI. Thus, wear
 51leveling and some flash specifics are left to UBI, while UBIFS focuses on
 52scalability, performance and recoverability.
 53
 54::
 55
 56	+------------+ +*******+ +-----------+ +-----+
 57	|            | * UBIFS * | UBI-BLOCK | | ... |
 58	| JFFS/JFFS2 | +*******+ +-----------+ +-----+
 59	|            | +-----------------------------+ +-----------+ +-----+
 60	|            | |              UBI            | | MTD-BLOCK | | ... |
 61	+------------+ +-----------------------------+ +-----------+ +-----+
 62	+------------------------------------------------------------------+
 63	|                  MEMORY TECHNOLOGY DEVICES (MTD)                 |
 64	+------------------------------------------------------------------+
 65	+-----------------------------+ +--------------------------+ +-----+
 66	|         NAND DRIVERS        | |        NOR DRIVERS       | | ... |
 67	+-----------------------------+ +--------------------------+ +-----+
 68
 69            Figure 1: Linux kernel subsystems for dealing with raw flash
 70
 71
 72
 73Internally, UBIFS maintains multiple data structures which are persisted on
 74the flash:
 75
 76- *Index*: an on-flash B+ tree where the leaf nodes contain filesystem data
 77- *Journal*: an additional data structure to collect FS changes before updating
 78  the on-flash index and reduce flash wear.
 79- *Tree Node Cache (TNC)*: an in-memory B+ tree that reflects the current FS
 80  state to avoid frequent flash reads. It is basically the in-memory
 81  representation of the index, but contains additional attributes.
 82- *LEB property tree (LPT)*: an on-flash B+ tree for free space accounting per
 83  UBI LEB.
 84
 85In the remainder of this section we will cover the on-flash UBIFS data
 86structures in more detail. The TNC is of less importance here since it is never
 87persisted onto the flash directly. More details on UBIFS can also be found in
 88[UBIFS-WP].
 89
 90
 91UBIFS Index & Tree Node Cache
 92~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 93
 94Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
 95of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file
 96contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes.
 97Almost all types of nodes share a common header (`ubifs_ch`) containing basic
 98information like node type, node length, a sequence number, etc. (see
 99`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT
100and some less important node types like padding nodes which are used to pad
101unusable content at the end of LEBs.
102
103To avoid re-writing the whole B+ tree on every single change, it is implemented
104as *wandering tree*, where only the changed nodes are re-written and previous
105versions of them are obsoleted without erasing them right away. As a result,
106the index is not stored in a single place on the flash, but *wanders* around
107and there are obsolete parts on the flash as long as the LEB containing them is
108not reused by UBIFS. To find the most recent version of the index, UBIFS stores
109a special node called *master node* into UBI LEB 1 which always points to the
110most recent root node of the UBIFS index. For recoverability, the master node
111is additionally duplicated to LEB 2. Mounting UBIFS is thus a simple read of
112LEB 1 and 2 to get the current master node and from there get the location of
113the most recent on-flash index.
114
115The TNC is the in-memory representation of the on-flash index. It contains some
116additional runtime attributes per node which are not persisted. One of these is
117a dirty-flag which marks nodes that have to be persisted the next time the
118index is written onto the flash. The TNC acts as a write-back cache and all
119modifications of the on-flash index are done through the TNC. Like other caches,
120the TNC does not have to mirror the full index into memory, but reads parts of
121it from flash whenever needed. A *commit* is the UBIFS operation of updating the
122on-flash filesystem structures like the index. On every commit, the TNC nodes
123marked as dirty are written to the flash to update the persisted index.
124
125
126Journal
127~~~~~~~
128
129To avoid wearing out the flash, the index is only persisted (*commited*) when
130certain conditions are met (eg. ``fsync(2)``). The journal is used to record
131any changes (in form of inode nodes, data nodes etc.) between commits
132of the index. During mount, the journal is read from the flash and replayed
133onto the TNC (which will be created on-demand from the on-flash index).
134
135UBIFS reserves a bunch of LEBs just for the journal called *log area*. The
136amount of log area LEBs is configured on filesystem creation (using
137``mkfs.ubifs``) and stored in the superblock node. The log area contains only
138two types of nodes: *reference nodes* and *commit start nodes*. A commit start
139node is written whenever an index commit is performed. Reference nodes are
140written on every journal update. Each reference node points to the position of
141other nodes (inode nodes, data nodes etc.) on the flash that are part of this
142journal entry. These nodes are called *buds* and describe the actual filesystem
143changes including their data.
144
145The log area is maintained as a ring. Whenever the journal is almost full,
146a commit is initiated. This also writes a commit start node so that during
147mount, UBIFS will seek for the most recent commit start node and just replay
148every reference node after that. Every reference node before the commit start
149node will be ignored as they are already part of the on-flash index.
150
151When writing a journal entry, UBIFS first ensures that enough space is
152available to write the reference node and buds part of this entry. Then, the
153reference node is written and afterwards the buds describing the file changes.
154On replay, UBIFS will record every reference node and inspect the location of
155the referenced LEBs to discover the buds. If these are corrupt or missing,
156UBIFS will attempt to recover them by re-reading the LEB. This is however only
157done for the last referenced LEB of the journal. Only this can become corrupt
158because of a power cut. If the recovery fails, UBIFS will not mount. An error
159for every other LEB will directly cause UBIFS to fail the mount operation.
160
161::
162
163       | ----    LOG AREA     ---- | ----------    MAIN AREA    ------------ |
164
165        -----+------+-----+--------+----   ------+-----+-----+---------------
166        \    |      |     |        |   /  /      |     |     |               \
167        / CS |  REF | REF |        |   \  \ DENT | INO | INO |               /
168        \    |      |     |        |   /  /      |     |     |               \
169         ----+------+-----+--------+---   -------+-----+-----+----------------
170                 |     |                  ^            ^
171                 |     |                  |            |
172                 +------------------------+            |
173                       |                               |
174                       +-------------------------------+
175
176
177                Figure 2: UBIFS flash layout of log area with commit start nodes
178                          (CS) and reference nodes (REF) pointing to main area
179                          containing their buds
180
181
182LEB Property Tree/Table
183~~~~~~~~~~~~~~~~~~~~~~~
184
185The LEB property tree is used to store per-LEB information. This includes the
186LEB type and amount of free and *dirty* (old, obsolete content) space [1]_ on
187the LEB. The type is important, because UBIFS never mixes index nodes with data
188nodes on a single LEB and thus each LEB has a specific purpose. This again is
189useful for free space calculations. See [UBIFS-WP] for more details.
190
191The LEB property tree again is a B+ tree, but it is much smaller than the
192index. Due to its smaller size it is always written as one chunk on every
193commit. Thus, saving the LPT is an atomic operation.
194
195
196.. [1] Since LEBs can only be appended and never overwritten, there is a
197   difference between free space ie. the remaining space left on the LEB to be
198   written to without erasing it and previously written content that is obsolete
199   but can't be overwritten without erasing the full LEB.
200
201
202UBIFS Authentication
203====================
204
205This chapter introduces UBIFS authentication which enables UBIFS to verify
206the authenticity and integrity of metadata and file contents stored on flash.
207
208
209Threat Model
210------------
211
212UBIFS authentication enables detection of offline data modification. While it
213does not prevent it, it enables (trusted) code to check the integrity and
214authenticity of on-flash file contents and filesystem metadata. This covers
215attacks where file contents are swapped.
216
217UBIFS authentication will not protect against rollback of full flash contents.
218Ie. an attacker can still dump the flash and restore it at a later time without
219detection. It will also not protect against partial rollback of individual
220index commits. That means that an attacker is able to partially undo changes.
221This is possible because UBIFS does not immediately overwrites obsolete
222versions of the index tree or the journal, but instead marks them as obsolete
223and garbage collection erases them at a later time. An attacker can use this by
224erasing parts of the current tree and restoring old versions that are still on
225the flash and have not yet been erased. This is possible, because every commit
226will always write a new version of the index root node and the master node
227without overwriting the previous version. This is further helped by the
228wear-leveling operations of UBI which copies contents from one physical
229eraseblock to another and does not atomically erase the first eraseblock.
230
231UBIFS authentication does not cover attacks where an attacker is able to
232execute code on the device after the authentication key was provided.
233Additional measures like secure boot and trusted boot have to be taken to
234ensure that only trusted code is executed on a device.
235
236
237Authentication
238--------------
239
240To be able to fully trust data read from flash, all UBIFS data structures
241stored on flash are authenticated. That is:
242
243- The index which includes file contents, file metadata like extended
244  attributes, file length etc.
245- The journal which also contains file contents and metadata by recording changes
246  to the filesystem
247- The LPT which stores UBI LEB metadata which UBIFS uses for free space accounting
248
249
250Index Authentication
251~~~~~~~~~~~~~~~~~~~~
252
253Through UBIFS' concept of a wandering tree, it already takes care of only
254updating and persisting changed parts from leaf node up to the root node
255of the full B+ tree. This enables us to augment the index nodes of the tree
256with a hash over each node's child nodes. As a result, the index basically also
257a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
258data, the hashes of their parent index nodes thus cover all the file contents
259and file metadata. When a file changes, the UBIFS index is updated accordingly
260from the leaf nodes up to the root node including the master node. This process
261can be hooked to recompute the hash only for each changed node at the same time.
262Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
263the root node to ensure the node's integrity.
264
265To ensure the authenticity of the whole index, the UBIFS master node stores a
266keyed hash (HMAC) over its own contents and a hash of the root node of the index
267tree. As mentioned above, the master node is always written to the flash whenever
268the index is persisted (ie. on index commit).
269
270Using this approach only UBIFS index nodes and the master node are changed to
271include a hash. All other types of nodes will remain unchanged. This reduces
272the storage overhead which is precious for users of UBIFS (ie. embedded
273devices).
274
275::
276
277                             +---------------+
278                             |  Master Node  |
279                             |    (hash)     |
280                             +---------------+
281                                     |
282                                     v
283                            +-------------------+
284                            |  Index Node #1    |
285                            |                   |
286                            | branch0   branchn |
287                            | (hash)    (hash)  |
288                            +-------------------+
289                               |    ...   |  (fanout: 8)
290                               |          |
291                       +-------+          +------+
292                       |                         |
293                       v                         v
294            +-------------------+       +-------------------+
295            |  Index Node #2    |       |  Index Node #3    |
296            |                   |       |                   |
297            | branch0   branchn |       | branch0   branchn |
298            | (hash)    (hash)  |       | (hash)    (hash)  |
299            +-------------------+       +-------------------+
300                 |   ...                     |   ...   |
301                 v                           v         v
302               +-----------+         +----------+  +-----------+
303               | Data Node |         | INO Node |  | DENT Node |
304               +-----------+         +----------+  +-----------+
305
306
307           Figure 3: Coverage areas of index node hash and master node HMAC
308
309
310
311The most important part for robustness and power-cut safety is to atomically
312persist the hash and file contents. Here the existing UBIFS logic for how
313changed nodes are persisted is already designed for this purpose such that
314UBIFS can safely recover if a power-cut occurs while persisting. Adding
315hashes to index nodes does not change this since each hash will be persisted
316atomically together with its respective node.
317
318
319Journal Authentication
320~~~~~~~~~~~~~~~~~~~~~~
321
322The journal is authenticated too. Since the journal is continuously written
323it is necessary to also add authentication information frequently to the
324journal so that in case of a powercut not too much data can't be authenticated.
325This is done by creating a continuous hash beginning from the commit start node
326over the previous reference nodes, the current reference node, and the bud
327nodes. From time to time whenever it is suitable authentication nodes are added
328between the bud nodes. This new node type contains a HMAC over the current state
329of the hash chain. That way a journal can be authenticated up to the last
330authentication node. The tail of the journal which may not have a authentication
331node cannot be authenticated and is skipped during journal replay.
332
333We get this picture for journal authentication::
334
335    ,,,,,,,,
336    ,......,...........................................
337    ,. CS  ,               hash1.----.           hash2.----.
338    ,.  |  ,                    .    |hmac            .    |hmac
339    ,.  v  ,                    .    v                .    v
340    ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
341    ,..|...,...........................................
342    ,  |   ,
343    ,  |   ,,,,,,,,,,,,,,,
344    .  |            hash3,----.
345    ,  |                 ,    |hmac
346    ,  v                 ,    v
347    , REF#1 -> bud -> bud,-> auth ...
348    ,,,|,,,,,,,,,,,,,,,,,,
349       v
350      REF#2 -> ...
351       |
352       V
353      ...
354
355Since the hash also includes the reference nodes an attacker cannot reorder or
356skip any journal heads for replay. An attacker can only remove bud nodes or
357reference nodes from the end of the journal, effectively rewinding the
358filesystem at maximum back to the last commit.
359
360The location of the log area is stored in the master node. Since the master
361node is authenticated with a HMAC as described above, it is not possible to
362tamper with that without detection. The size of the log area is specified when
363the filesystem is created using `mkfs.ubifs` and stored in the superblock node.
364To avoid tampering with this and other values stored there, a HMAC is added to
365the superblock struct. The superblock node is stored in LEB 0 and is only
366modified on feature flag or similar changes, but never on file changes.
367
368
369LPT Authentication
370~~~~~~~~~~~~~~~~~~
371
372The location of the LPT root node on the flash is stored in the UBIFS master
373node. Since the LPT is written and read atomically on every commit, there is
374no need to authenticate individual nodes of the tree. It suffices to
375protect the integrity of the full LPT by a simple hash stored in the master
376node. Since the master node itself is authenticated, the LPTs authenticity can
377be verified by verifying the authenticity of the master node and comparing the
378LTP hash stored there with the hash computed from the read on-flash LPT.
379
380
381Key Management
382--------------
383
384For simplicity, UBIFS authentication uses a single key to compute the HMACs
385of superblock, master, commit start and reference nodes. This key has to be
386available on creation of the filesystem (`mkfs.ubifs`) to authenticate the
387superblock node. Further, it has to be available on mount of the filesystem
388to verify authenticated nodes and generate new HMACs for changes.
389
390UBIFS authentication is intended to operate side-by-side with UBIFS encryption
391(fscrypt) to provide confidentiality and authenticity. Since UBIFS encryption
392has a different approach of encryption policies per directory, there can be
393multiple fscrypt master keys and there might be folders without encryption.
394UBIFS authentication on the other hand has an all-or-nothing approach in the
395sense that it either authenticates everything of the filesystem or nothing.
396Because of this and because UBIFS authentication should also be usable without
397encryption, it does not share the same master key with fscrypt, but manages
398a dedicated authentication key.
399
400The API for providing the authentication key has yet to be defined, but the
401key can eg. be provided by userspace through a keyring similar to the way it
402is currently done in fscrypt. It should however be noted that the current
403fscrypt approach has shown its flaws and the userspace API will eventually
404change [FSCRYPT-POLICY2].
405
406Nevertheless, it will be possible for a user to provide a single passphrase
407or key in userspace that covers UBIFS authentication and encryption. This can
408be solved by the corresponding userspace tools which derive a second key for
409authentication in addition to the derived fscrypt master key used for
410encryption.
411
412To be able to check if the proper key is available on mount, the UBIFS
413superblock node will additionally store a hash of the authentication key. This
414approach is similar to the approach proposed for fscrypt encryption policy v2
415[FSCRYPT-POLICY2].
416
417
418Future Extensions
419=================
420
421In certain cases where a vendor wants to provide an authenticated filesystem
422image to customers, it should be possible to do so without sharing the secret
423UBIFS authentication key. Instead, in addition the each HMAC a digital
424signature could be stored where the vendor shares the public key alongside the
425filesystem image. In case this filesystem has to be modified afterwards,
426UBIFS can exchange all digital signatures with HMACs on first mount similar
427to the way the IMA/EVM subsystem deals with such situations. The HMAC key
428will then have to be provided beforehand in the normal way.
429
430
431References
432==========
433
434[CRYPTSETUP2]        http://www.saout.de/pipermail/dm-crypt/2017-November/005745.html
435
436[DMC-CBC-ATTACK]     http://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/
437
438[DM-INTEGRITY]       https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst
439
440[DM-VERITY]          https://www.kernel.org/doc/Documentation/device-mapper/verity.rst
441
442[FSCRYPT-POLICY2]    https://www.spinics.net/lists/linux-ext4/msg58710.html
443
444[UBIFS-WP]           http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf

  1.. SPDX-License-Identifier: GPL-2.0
  2
  3:orphan:
  4
  5.. UBIFS Authentication
  6.. sigma star gmbh
  7.. 2018
  8
  9Introduction
 10============
 11
 12UBIFS utilizes the fscrypt framework to provide confidentiality for file
 13contents and file names. This prevents attacks where an attacker is able to
 14read contents of the filesystem on a single point in time. A classic example
 15is a lost smartphone where the attacker is unable to read personal data stored
 16on the device without the filesystem decryption key.
 17
 18At the current state, UBIFS encryption however does not prevent attacks where
 19the attacker is able to modify the filesystem contents and the user uses the
 20device afterwards. In such a scenario an attacker can modify filesystem
 21contents arbitrarily without the user noticing. One example is to modify a
 22binary to perform a malicious action when executed [DMC-CBC-ATTACK]. Since
 23most of the filesystem metadata of UBIFS is stored in plain, this makes it
 24fairly easy to swap files and replace their contents.
 25
 26Other full disk encryption systems like dm-crypt cover all filesystem metadata,
 27which makes such kinds of attacks more complicated, but not impossible.
 28Especially, if the attacker is given access to the device multiple points in
 29time. For dm-crypt and other filesystems that build upon the Linux block IO
 30layer, the dm-integrity or dm-verity subsystems [DM-INTEGRITY, DM-VERITY]
 31can be used to get full data authentication at the block layer.
 32These can also be combined with dm-crypt [CRYPTSETUP2].
 33
 34This document describes an approach to get file contents _and_ full metadata
 35authentication for UBIFS. Since UBIFS uses fscrypt for file contents and file
 36name encryption, the authentication system could be tied into fscrypt such that
 37existing features like key derivation can be utilized. It should however also
 38be possible to use UBIFS authentication without using encryption.
 39
 40
 41MTD, UBI & UBIFS
 42----------------
 43
 44On Linux, the MTD (Memory Technology Devices) subsystem provides a uniform
 45interface to access raw flash devices. One of the more prominent subsystems that
 46work on top of MTD is UBI (Unsorted Block Images). It provides volume management
 47for flash devices and is thus somewhat similar to LVM for block devices. In
 48addition, it deals with flash-specific wear-leveling and transparent I/O error
 49handling. UBI offers logical erase blocks (LEBs) to the layers on top of it
 50and maps them transparently to physical erase blocks (PEBs) on the flash.
 51
 52UBIFS is a filesystem for raw flash which operates on top of UBI. Thus, wear
 53leveling and some flash specifics are left to UBI, while UBIFS focuses on
 54scalability, performance and recoverability.
 55
 56::
 57
 58	+------------+ +*******+ +-----------+ +-----+
 59	|            | * UBIFS * | UBI-BLOCK | | ... |
 60	| JFFS/JFFS2 | +*******+ +-----------+ +-----+
 61	|            | +-----------------------------+ +-----------+ +-----+
 62	|            | |              UBI            | | MTD-BLOCK | | ... |
 63	+------------+ +-----------------------------+ +-----------+ +-----+
 64	+------------------------------------------------------------------+
 65	|                  MEMORY TECHNOLOGY DEVICES (MTD)                 |
 66	+------------------------------------------------------------------+
 67	+-----------------------------+ +--------------------------+ +-----+
 68	|         NAND DRIVERS        | |        NOR DRIVERS       | | ... |
 69	+-----------------------------+ +--------------------------+ +-----+
 70
 71            Figure 1: Linux kernel subsystems for dealing with raw flash
 72
 73
 74
 75Internally, UBIFS maintains multiple data structures which are persisted on
 76the flash:
 77
 78- *Index*: an on-flash B+ tree where the leaf nodes contain filesystem data
 79- *Journal*: an additional data structure to collect FS changes before updating
 80  the on-flash index and reduce flash wear.
 81- *Tree Node Cache (TNC)*: an in-memory B+ tree that reflects the current FS
 82  state to avoid frequent flash reads. It is basically the in-memory
 83  representation of the index, but contains additional attributes.
 84- *LEB property tree (LPT)*: an on-flash B+ tree for free space accounting per
 85  UBI LEB.
 86
 87In the remainder of this section we will cover the on-flash UBIFS data
 88structures in more detail. The TNC is of less importance here since it is never
 89persisted onto the flash directly. More details on UBIFS can also be found in
 90[UBIFS-WP].
 91
 92
 93UBIFS Index & Tree Node Cache
 94~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 95
 96Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
 97of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file
 98contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
 99Almost all types of nodes share a common header (``ubifs_ch``) containing basic
100information like node type, node length, a sequence number, etc. (see
101``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
102and some less important node types like padding nodes which are used to pad
103unusable content at the end of LEBs.
104
105To avoid re-writing the whole B+ tree on every single change, it is implemented
106as *wandering tree*, where only the changed nodes are re-written and previous
107versions of them are obsoleted without erasing them right away. As a result,
108the index is not stored in a single place on the flash, but *wanders* around
109and there are obsolete parts on the flash as long as the LEB containing them is
110not reused by UBIFS. To find the most recent version of the index, UBIFS stores
111a special node called *master node* into UBI LEB 1 which always points to the
112most recent root node of the UBIFS index. For recoverability, the master node
113is additionally duplicated to LEB 2. Mounting UBIFS is thus a simple read of
114LEB 1 and 2 to get the current master node and from there get the location of
115the most recent on-flash index.
116
117The TNC is the in-memory representation of the on-flash index. It contains some
118additional runtime attributes per node which are not persisted. One of these is
119a dirty-flag which marks nodes that have to be persisted the next time the
120index is written onto the flash. The TNC acts as a write-back cache and all
121modifications of the on-flash index are done through the TNC. Like other caches,
122the TNC does not have to mirror the full index into memory, but reads parts of
123it from flash whenever needed. A *commit* is the UBIFS operation of updating the
124on-flash filesystem structures like the index. On every commit, the TNC nodes
125marked as dirty are written to the flash to update the persisted index.
126
127
128Journal
129~~~~~~~
130
131To avoid wearing out the flash, the index is only persisted (*commited*) when
132certain conditions are met (eg. ``fsync(2)``). The journal is used to record
133any changes (in form of inode nodes, data nodes etc.) between commits
134of the index. During mount, the journal is read from the flash and replayed
135onto the TNC (which will be created on-demand from the on-flash index).
136
137UBIFS reserves a bunch of LEBs just for the journal called *log area*. The
138amount of log area LEBs is configured on filesystem creation (using
139``mkfs.ubifs``) and stored in the superblock node. The log area contains only
140two types of nodes: *reference nodes* and *commit start nodes*. A commit start
141node is written whenever an index commit is performed. Reference nodes are
142written on every journal update. Each reference node points to the position of
143other nodes (inode nodes, data nodes etc.) on the flash that are part of this
144journal entry. These nodes are called *buds* and describe the actual filesystem
145changes including their data.
146
147The log area is maintained as a ring. Whenever the journal is almost full,
148a commit is initiated. This also writes a commit start node so that during
149mount, UBIFS will seek for the most recent commit start node and just replay
150every reference node after that. Every reference node before the commit start
151node will be ignored as they are already part of the on-flash index.
152
153When writing a journal entry, UBIFS first ensures that enough space is
154available to write the reference node and buds part of this entry. Then, the
155reference node is written and afterwards the buds describing the file changes.
156On replay, UBIFS will record every reference node and inspect the location of
157the referenced LEBs to discover the buds. If these are corrupt or missing,
158UBIFS will attempt to recover them by re-reading the LEB. This is however only
159done for the last referenced LEB of the journal. Only this can become corrupt
160because of a power cut. If the recovery fails, UBIFS will not mount. An error
161for every other LEB will directly cause UBIFS to fail the mount operation.
162
163::
164
165       | ----    LOG AREA     ---- | ----------    MAIN AREA    ------------ |
166
167        -----+------+-----+--------+----   ------+-----+-----+---------------
168        \    |      |     |        |   /  /      |     |     |               \
169        / CS |  REF | REF |        |   \  \ DENT | INO | INO |               /
170        \    |      |     |        |   /  /      |     |     |               \
171         ----+------+-----+--------+---   -------+-----+-----+----------------
172                 |     |                  ^            ^
173                 |     |                  |            |
174                 +------------------------+            |
175                       |                               |
176                       +-------------------------------+
177
178
179                Figure 2: UBIFS flash layout of log area with commit start nodes
180                          (CS) and reference nodes (REF) pointing to main area
181                          containing their buds
182
183
184LEB Property Tree/Table
185~~~~~~~~~~~~~~~~~~~~~~~
186
187The LEB property tree is used to store per-LEB information. This includes the
188LEB type and amount of free and *dirty* (old, obsolete content) space [1]_ on
189the LEB. The type is important, because UBIFS never mixes index nodes with data
190nodes on a single LEB and thus each LEB has a specific purpose. This again is
191useful for free space calculations. See [UBIFS-WP] for more details.
192
193The LEB property tree again is a B+ tree, but it is much smaller than the
194index. Due to its smaller size it is always written as one chunk on every
195commit. Thus, saving the LPT is an atomic operation.
196
197
198.. [1] Since LEBs can only be appended and never overwritten, there is a
199   difference between free space ie. the remaining space left on the LEB to be
200   written to without erasing it and previously written content that is obsolete
201   but can't be overwritten without erasing the full LEB.
202
203
204UBIFS Authentication
205====================
206
207This chapter introduces UBIFS authentication which enables UBIFS to verify
208the authenticity and integrity of metadata and file contents stored on flash.
209
210
211Threat Model
212------------
213
214UBIFS authentication enables detection of offline data modification. While it
215does not prevent it, it enables (trusted) code to check the integrity and
216authenticity of on-flash file contents and filesystem metadata. This covers
217attacks where file contents are swapped.
218
219UBIFS authentication will not protect against rollback of full flash contents.
220Ie. an attacker can still dump the flash and restore it at a later time without
221detection. It will also not protect against partial rollback of individual
222index commits. That means that an attacker is able to partially undo changes.
223This is possible because UBIFS does not immediately overwrites obsolete
224versions of the index tree or the journal, but instead marks them as obsolete
225and garbage collection erases them at a later time. An attacker can use this by
226erasing parts of the current tree and restoring old versions that are still on
227the flash and have not yet been erased. This is possible, because every commit
228will always write a new version of the index root node and the master node
229without overwriting the previous version. This is further helped by the
230wear-leveling operations of UBI which copies contents from one physical
231eraseblock to another and does not atomically erase the first eraseblock.
232
233UBIFS authentication does not cover attacks where an attacker is able to
234execute code on the device after the authentication key was provided.
235Additional measures like secure boot and trusted boot have to be taken to
236ensure that only trusted code is executed on a device.
237
238
239Authentication
240--------------
241
242To be able to fully trust data read from flash, all UBIFS data structures
243stored on flash are authenticated. That is:
244
245- The index which includes file contents, file metadata like extended
246  attributes, file length etc.
247- The journal which also contains file contents and metadata by recording changes
248  to the filesystem
249- The LPT which stores UBI LEB metadata which UBIFS uses for free space accounting
250
251
252Index Authentication
253~~~~~~~~~~~~~~~~~~~~
254
255Through UBIFS' concept of a wandering tree, it already takes care of only
256updating and persisting changed parts from leaf node up to the root node
257of the full B+ tree. This enables us to augment the index nodes of the tree
258with a hash over each node's child nodes. As a result, the index basically also
259a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
260data, the hashes of their parent index nodes thus cover all the file contents
261and file metadata. When a file changes, the UBIFS index is updated accordingly
262from the leaf nodes up to the root node including the master node. This process
263can be hooked to recompute the hash only for each changed node at the same time.
264Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
265the root node to ensure the node's integrity.
266
267To ensure the authenticity of the whole index, the UBIFS master node stores a
268keyed hash (HMAC) over its own contents and a hash of the root node of the index
269tree. As mentioned above, the master node is always written to the flash whenever
270the index is persisted (ie. on index commit).
271
272Using this approach only UBIFS index nodes and the master node are changed to
273include a hash. All other types of nodes will remain unchanged. This reduces
274the storage overhead which is precious for users of UBIFS (ie. embedded
275devices).
276
277::
278
279                             +---------------+
280                             |  Master Node  |
281                             |    (hash)     |
282                             +---------------+
283                                     |
284                                     v
285                            +-------------------+
286                            |  Index Node #1    |
287                            |                   |
288                            | branch0   branchn |
289                            | (hash)    (hash)  |
290                            +-------------------+
291                               |    ...   |  (fanout: 8)
292                               |          |
293                       +-------+          +------+
294                       |                         |
295                       v                         v
296            +-------------------+       +-------------------+
297            |  Index Node #2    |       |  Index Node #3    |
298            |                   |       |                   |
299            | branch0   branchn |       | branch0   branchn |
300            | (hash)    (hash)  |       | (hash)    (hash)  |
301            +-------------------+       +-------------------+
302                 |   ...                     |   ...   |
303                 v                           v         v
304               +-----------+         +----------+  +-----------+
305               | Data Node |         | INO Node |  | DENT Node |
306               +-----------+         +----------+  +-----------+
307
308
309           Figure 3: Coverage areas of index node hash and master node HMAC
310
311
312
313The most important part for robustness and power-cut safety is to atomically
314persist the hash and file contents. Here the existing UBIFS logic for how
315changed nodes are persisted is already designed for this purpose such that
316UBIFS can safely recover if a power-cut occurs while persisting. Adding
317hashes to index nodes does not change this since each hash will be persisted
318atomically together with its respective node.
319
320
321Journal Authentication
322~~~~~~~~~~~~~~~~~~~~~~
323
324The journal is authenticated too. Since the journal is continuously written
325it is necessary to also add authentication information frequently to the
326journal so that in case of a powercut not too much data can't be authenticated.
327This is done by creating a continuous hash beginning from the commit start node
328over the previous reference nodes, the current reference node, and the bud
329nodes. From time to time whenever it is suitable authentication nodes are added
330between the bud nodes. This new node type contains a HMAC over the current state
331of the hash chain. That way a journal can be authenticated up to the last
332authentication node. The tail of the journal which may not have a authentication
333node cannot be authenticated and is skipped during journal replay.
334
335We get this picture for journal authentication::
336
337    ,,,,,,,,
338    ,......,...........................................
339    ,. CS  ,               hash1.----.           hash2.----.
340    ,.  |  ,                    .    |hmac            .    |hmac
341    ,.  v  ,                    .    v                .    v
342    ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
343    ,..|...,...........................................
344    ,  |   ,
345    ,  |   ,,,,,,,,,,,,,,,
346    .  |            hash3,----.
347    ,  |                 ,    |hmac
348    ,  v                 ,    v
349    , REF#1 -> bud -> bud,-> auth ...
350    ,,,|,,,,,,,,,,,,,,,,,,
351       v
352      REF#2 -> ...
353       |
354       V
355      ...
356
357Since the hash also includes the reference nodes an attacker cannot reorder or
358skip any journal heads for replay. An attacker can only remove bud nodes or
359reference nodes from the end of the journal, effectively rewinding the
360filesystem at maximum back to the last commit.
361
362The location of the log area is stored in the master node. Since the master
363node is authenticated with a HMAC as described above, it is not possible to
364tamper with that without detection. The size of the log area is specified when
365the filesystem is created using `mkfs.ubifs` and stored in the superblock node.
366To avoid tampering with this and other values stored there, a HMAC is added to
367the superblock struct. The superblock node is stored in LEB 0 and is only
368modified on feature flag or similar changes, but never on file changes.
369
370
371LPT Authentication
372~~~~~~~~~~~~~~~~~~
373
374The location of the LPT root node on the flash is stored in the UBIFS master
375node. Since the LPT is written and read atomically on every commit, there is
376no need to authenticate individual nodes of the tree. It suffices to
377protect the integrity of the full LPT by a simple hash stored in the master
378node. Since the master node itself is authenticated, the LPTs authenticity can
379be verified by verifying the authenticity of the master node and comparing the
380LTP hash stored there with the hash computed from the read on-flash LPT.
381
382
383Key Management
384--------------
385
386For simplicity, UBIFS authentication uses a single key to compute the HMACs
387of superblock, master, commit start and reference nodes. This key has to be
388available on creation of the filesystem (`mkfs.ubifs`) to authenticate the
389superblock node. Further, it has to be available on mount of the filesystem
390to verify authenticated nodes and generate new HMACs for changes.
391
392UBIFS authentication is intended to operate side-by-side with UBIFS encryption
393(fscrypt) to provide confidentiality and authenticity. Since UBIFS encryption
394has a different approach of encryption policies per directory, there can be
395multiple fscrypt master keys and there might be folders without encryption.
396UBIFS authentication on the other hand has an all-or-nothing approach in the
397sense that it either authenticates everything of the filesystem or nothing.
398Because of this and because UBIFS authentication should also be usable without
399encryption, it does not share the same master key with fscrypt, but manages
400a dedicated authentication key.
401
402The API for providing the authentication key has yet to be defined, but the
403key can eg. be provided by userspace through a keyring similar to the way it
404is currently done in fscrypt. It should however be noted that the current
405fscrypt approach has shown its flaws and the userspace API will eventually
406change [FSCRYPT-POLICY2].
407
408Nevertheless, it will be possible for a user to provide a single passphrase
409or key in userspace that covers UBIFS authentication and encryption. This can
410be solved by the corresponding userspace tools which derive a second key for
411authentication in addition to the derived fscrypt master key used for
412encryption.
413
414To be able to check if the proper key is available on mount, the UBIFS
415superblock node will additionally store a hash of the authentication key. This
416approach is similar to the approach proposed for fscrypt encryption policy v2
417[FSCRYPT-POLICY2].
418
419
420Future Extensions
421=================
422
423In certain cases where a vendor wants to provide an authenticated filesystem
424image to customers, it should be possible to do so without sharing the secret
425UBIFS authentication key. Instead, in addition the each HMAC a digital
426signature could be stored where the vendor shares the public key alongside the
427filesystem image. In case this filesystem has to be modified afterwards,
428UBIFS can exchange all digital signatures with HMACs on first mount similar
429to the way the IMA/EVM subsystem deals with such situations. The HMAC key
430will then have to be provided beforehand in the normal way.
431
432
433References
434==========
435
436[CRYPTSETUP2]        https://www.saout.de/pipermail/dm-crypt/2017-November/005745.html
437
438[DMC-CBC-ATTACK]     https://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/
439
440[DM-INTEGRITY]       https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst
441
442[DM-VERITY]          https://www.kernel.org/doc/Documentation/device-mapper/verity.rst
443
444[FSCRYPT-POLICY2]    https://www.spinics.net/lists/linux-ext4/msg58710.html
445
446[UBIFS-WP]           http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf