.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:
        * Intel server CPUs, Skylake and later
        * Intel client CPUs, Tiger Lake (11th Gen Core) and later
        * AMD CPUs, Zen 3 and later
        * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)

x86_64
======

Pkeys work by dedicating 4 previously reserved bits in each page table entry to
a "protection key", giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register
(PKRU).  Each CPU's PKRU is a 32-bit register storing two bits (Access Disable
and Write Disable) for each of the 16 keys.

Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing the
register.  The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs.  These permissions are enforced on data
access only and have no effect on instruction fetches.

arm64
=====

Pkeys use 3 bits in each page table entry to encode a "protection key index",
giving 8 possible keys.

Protections for each key are defined with a per-CPU user-writable system
register (POR_EL0).  This is a 64-bit register encoding read, write and execute
overlay permissions for each protection key index.

Being a CPU register, POR_EL0 is inherently thread-local, potentially giving
each thread a different set of protections from every other thread.

Unlike x86_64, the protection key permissions also apply to instruction
fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with pkey_alloc().  To
change the access permissions of memory covered by a key, an application writes
directly to the architecture-specific CPU register.  In this example, that write
is wrapped in a C function called pkey_set()::

	int real_prot = PROT_READ|PROT_WRITE;
	int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	char *ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	int ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

When it frees the memory, it can also free the pkey, since the key is
no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

.. note:: pkey_set() is a wrapper around writing to the CPU register.
          Example implementations can be found in
          tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance, if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or an access the kernel makes on the application's behalf, as with a
read()::

	read(fd, ptr, 1);

A direct access raises SIGSEGV in both cases, but si_code is set to
SEGV_PKUERR when a protection key is violated versus SEGV_ACCERR when
the plain mprotect() permissions are violated.  An access made by the
kernel, such as the read() above, does not raise a signal; the system
call fails with EFAULT in both cases.

Note that kernel accesses from a kthread (such as io_uring) will use a default
value for the protection key register and so will not be consistent with
userspace's value of the register or mprotect().