v6.13.7
.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:
        * Intel server CPUs, Skylake and later
        * Intel client CPUs, Tiger Lake (11th Gen Core) and later
        * Future AMD CPUs
        * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)

x86_64
======
Pkeys work by dedicating 4 previously reserved bits in each page table entry to
a "protection key", giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register
(PKRU).  PKRU is a 32-bit register storing two bits (Access Disable and Write
Disable) for each of the 16 keys.
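
The two-bits-per-key layout can be sketched as plain bit arithmetic.  The
helper names ``pkru_update()`` and ``pkru_rights()`` and the ``SKETCH_*``
macros are illustrative only, not kernel or libc API.  The sketch assumes the
architectural layout in which key *n* owns bit 2n (Access Disable) and bit
2n+1 (Write Disable).  It only computes values and never touches PKRU itself.

```c
#include <stdint.h>

/* Illustrative names; the values mirror PKEY_DISABLE_ACCESS/PKEY_DISABLE_WRITE. */
#define SKETCH_PKRU_BITS_PER_PKEY 2
#define SKETCH_AD 0x1u /* Access Disable */
#define SKETCH_WD 0x2u /* Write Disable */

/* Return 'pkru' with the two bits belonging to 'pkey' replaced by 'rights'. */
static inline uint32_t pkru_update(uint32_t pkru, int pkey, uint32_t rights)
{
	unsigned int shift = (unsigned int)pkey * SKETCH_PKRU_BITS_PER_PKEY;

	pkru &= ~(0x3u << shift);          /* clear the old AD/WD bits */
	pkru |= (rights & 0x3u) << shift;  /* install the new ones */
	return pkru;
}

/* Extract the two rights bits that 'pkru' holds for 'pkey'. */
static inline uint32_t pkru_rights(uint32_t pkru, int pkey)
{
	return (pkru >> ((unsigned int)pkey * SKETCH_PKRU_BITS_PER_PKEY)) & 0x3u;
}
```

For example, ``pkru_update(0, 1, SKETCH_WD)`` sets only bit 3, which makes
memory tagged with key 1 read-only for the thread that loads that value.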

Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
register.  The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs.  These permissions are enforced on data
access only and have no effect on instruction fetches.
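
As a rough sketch of how userspace might reach the register, the wrappers
below emit RDPKRU and WRPKRU as raw opcode bytes so that older assemblers
still accept them.  The function names are hypothetical.  The sketch assumes
GCC or Clang on x86_64 and checks the CPUID OSPKE bit before touching the
register, since the instructions fault when the OS has not enabled pkeys.
On other architectures the wrappers are stubs.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 4 (OSPKE) is set once the OS has enabled pkeys. */
static inline int pkeys_enabled(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 4) & 1;
}

/* RDPKRU requires ECX = 0; it returns PKRU in EAX and zeroes EDX. */
static inline uint32_t read_pkru(void)
{
	uint32_t eax, edx;

	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a"(eax), "=d"(edx)
			 : "c"(0));
	(void)edx;
	return eax;
}

/* WRPKRU requires ECX = 0 and EDX = 0; it loads PKRU from EAX. */
static inline void write_pkru(uint32_t pkru)
{
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : : "a"(pkru), "c"(0), "d"(0) : "memory");
}
#else
/* Stubs so the sketch still builds where PKRU does not exist. */
static inline int pkeys_enabled(void) { return 0; }
static inline uint32_t read_pkru(void) { return 0; }
static inline void write_pkru(uint32_t pkru) { (void)pkru; }
#endif

/* Rights bits of 'pkey' for the calling thread, or -1 if pkeys are unavailable. */
static inline int current_pkey_rights(int pkey)
{
	if (!pkeys_enabled())
		return -1;
	return (int)((read_pkru() >> (2 * pkey)) & 0x3u);
}
```

Because the register is per-thread, ``current_pkey_rights()`` reports the
rights of the calling thread only.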

arm64
=====

Pkeys use 3 bits in each page table entry, to encode a "protection key index",
giving 8 possible keys.

Protections for each key are defined with a per-CPU user-writable system
register (POR_EL0).  This is a 64-bit register encoding read, write and execute
overlay permissions for each protection key index.
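
The per-index encoding can be illustrated with the same kind of bit
arithmetic.  The sketch below rests on assumptions that are not stated in
this document: that each protection key index owns a 4-bit field in POR_EL0,
and that within a field bit 0 grants read, bit 1 execute and bit 2 write.
Check these against the Arm architecture reference and the arm64 selftest
header before relying on them.  All names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define SKETCH_POR_BITS_PER_PKEY 4
#define SKETCH_POE_R    0x1ull /* read allowed */
#define SKETCH_POE_X    0x2ull /* execute allowed */
#define SKETCH_POE_W    0x4ull /* write allowed */
#define SKETCH_POE_MASK 0xfull

/* Return 'por' with the field for 'pkey' replaced by 'perm'. */
static inline uint64_t por_update(uint64_t por, int pkey, uint64_t perm)
{
	unsigned int shift = (unsigned int)pkey * SKETCH_POR_BITS_PER_PKEY;

	por &= ~(SKETCH_POE_MASK << shift);
	por |= (perm & SKETCH_POE_MASK) << shift;
	return por;
}

/* Extract the permission field that 'por' holds for 'pkey'. */
static inline uint64_t por_perm(uint64_t por, int pkey)
{
	return (por >> ((unsigned int)pkey * SKETCH_POR_BITS_PER_PKEY)) &
	       SKETCH_POE_MASK;
}
```

Note the inverted sense compared with PKRU: under this assumed encoding a set
bit grants a permission, whereas a set PKRU bit removes one.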

Being a CPU register, POR_EL0 is inherently thread-local, potentially giving
each thread a different set of protections from every other thread.

Unlike x86_64, the protection key permissions also apply to instruction
fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with pkey_alloc().  An
application writes to the architecture-specific CPU register directly in order
to change access permissions to memory covered with a key.  In this example
this is wrapped by a C function called pkey_set().
::

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

.. note:: pkey_set() is a wrapper around writing to the CPU register.
          Example implementations can be found in
          tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
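
The fragments above omit error handling.  A more complete sketch of the same
allocate, protect, toggle and free sequence might look like the following.
It assumes the glibc wrappers ``pkey_alloc()``, ``pkey_mprotect()``,
``pkey_set()`` and ``pkey_free()`` from ``<sys/mman.h>`` (glibc 2.27 or
later, with ``_GNU_SOURCE`` defined) rather than the selftest helpers, and
the function name ``pkey_demo()`` is made up for this example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if the whole sequence worked, 1 if no pkey could be allocated
 * (no hardware or kernel support, or all keys in use), -1 on other errors.
 */
static int pkey_demo(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int ret = -1;
	int pkey;
	int *ptr;

	/* Start with writes disabled for the new key. */
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	if (pkey < 0) {
		perror("pkey_alloc");
		return 1;
	}

	ptr = mmap(NULL, page, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED)
		goto out_free;

	/* Page tables allow read/write; the key decides what is usable. */
	if (pkey_mprotect(ptr, page, PROT_READ | PROT_WRITE, pkey) != 0)
		goto out_unmap;

	if (pkey_set(pkey, 0) != 0)			/* open the write window */
		goto out_unmap;
	*ptr = 42;
	if (pkey_set(pkey, PKEY_DISABLE_WRITE) != 0)	/* close it again */
		goto out_unmap;

	ret = (*ptr == 42) ? 0 : -1;			/* reads are still allowed */

out_unmap:
	munmap(ptr, page);
out_free:
	pkey_free(pkey);
	return ret;
}
```

Treating an allocation failure as "feature unavailable" lets a caller fall
back to plain mprotect() on systems without protection keys.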

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
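
One way to observe the difference is a SIGSEGV handler installed with
SA_SIGINFO that records si_code.  The sketch below provokes a fault on a
plain PROT_NONE mapping, so it runs on systems without protection keys and
should report SEGV_ACCERR.  The same handler would see SEGV_PKERR for a pkey
violation.  The helper names are hypothetical, and the fallback definition of
SEGV_PKERR assumes the Linux value of 4 for older libc headers.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef SEGV_PKERR
#define SEGV_PKERR 4 /* assumed Linux value, for headers that lack it */
#endif

static sigjmp_buf fault_jmp;
static volatile sig_atomic_t last_si_code;

/* Record why the fault happened, then jump back past the faulting access. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_si_code = info->si_code;
	siglongjmp(fault_jmp, 1);
}

static int fault_is_pkey_violation(int si_code)
{
	return si_code == SEGV_PKERR;
}

/* Write to a PROT_NONE page; return the resulting si_code, or -1 on setup error. */
static int probe_fault_si_code(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile char *p;
	int code = 0;

	p = mmap(NULL, page, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap((void *)p, page);
		return -1;
	}

	if (sigsetjmp(fault_jmp, 1) == 0)
		*p = 1;			/* faults: the page is PROT_NONE */
	else
		code = last_si_code;	/* reached via siglongjmp() */

	sigaction(SIGSEGV, &old, NULL);
	munmap((void *)p, page);
	return code;
}
```

Jumping out of the handler keeps the sketch short.  A real application would
more likely log the fault address from ``si_addr`` and terminate.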

Note that kernel accesses from a kthread (such as io_uring) will use a default
value for the protection key register and so will not be consistent with
userspace's value of the register or mprotect().
v5.4
.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake "Scalable Processor" Server CPUs.
It will be available in future non-server parts.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application executes the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().
::

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.