.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use split lock for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps pte and takes PTE table lock, returns pointer to the taken
	lock;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes the lock, returns pointer
	to taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns pointer to PTE table lock;
 - pmd_lock()
	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock;
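
As a rough illustration of how the lock and unlock helpers pair up, here is a
userspace model, not kernel code. Every model_* name is hypothetical, and the
toy spinlock built on C11 atomic_flag only stands in for spinlock_t. The lock
pointer is handed back through an output argument, mirroring the ptlp argument
of pte_offset_map_lock(), and is later passed to the unlock helper:

.. code-block:: c

    #include <stdatomic.h>
    #include <stddef.h>
    #include <string.h>

    #define MODEL_PTRS_PER_PTE 512  /* hypothetical number of entries per table */

    /* Toy spinlock; it only stands in for the kernel's spinlock_t. */
    typedef struct {
        atomic_flag held;
    } model_spinlock_t;

    static void model_spin_lock(model_spinlock_t *lock)
    {
        /* Spin until the current holder clears the flag. */
        while (atomic_flag_test_and_set_explicit(&lock->held,
                                                 memory_order_acquire))
            ;
    }

    static void model_spin_unlock(model_spinlock_t *lock)
    {
        atomic_flag_clear_explicit(&lock->held, memory_order_release);
    }

    /* One PTE table bundled with its own lock: the "split" lock. */
    struct model_pte_table {
        model_spinlock_t ptl;
        unsigned long entries[MODEL_PTRS_PER_PTE];
    };

    static void model_pte_table_init(struct model_pte_table *table)
    {
        memset(table->entries, 0, sizeof(table->entries));
        atomic_flag_clear(&table->ptl.held);    /* start out unlocked */
    }

    /* Models pte_offset_map_lock(): take the table lock, return the entry. */
    static unsigned long *model_pte_offset_map_lock(struct model_pte_table *table,
                                                    size_t index,
                                                    model_spinlock_t **ptlp)
    {
        model_spin_lock(&table->ptl);
        *ptlp = &table->ptl;
        return &table->entries[index];
    }

    /* Models pte_unmap_unlock(): release the lock taken above. */
    static void model_pte_unmap_unlock(unsigned long *pte, model_spinlock_t *ptl)
    {
        (void)pte;  /* nothing to unmap in this userspace model */
        model_spin_unlock(ptl);
    }

    /* Update a single entry while holding only that table's lock. */
    static void model_set_entry(struct model_pte_table *table, size_t index,
                                unsigned long value)
    {
        model_spinlock_t *ptl;
        unsigned long *pte = model_pte_offset_map_lock(table, index, &ptl);

        *pte = value;
        model_pte_unmap_unlock(pte, ptl);
    }

In this model, callers that update entries in different tables take different
locks and so do not contend with each other, which is the point of splitting
the lock.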

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).
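
The two enabling conditions can be written down as a small sketch. This is a
userspace model with hypothetical function names; in the kernel the decision
is made at compile time from NR_CPUS, CONFIG_SPLIT_PTLOCK_CPUS and
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK rather than by calling functions:

.. code-block:: c

    #include <stdbool.h>

    /* PTE split locks: the CPU count must reach the configured threshold. */
    static bool model_use_split_pte_ptlocks(unsigned int nr_cpus,
                                            unsigned int split_ptlock_cpus)
    {
        return nr_cpus >= split_ptlock_cpus;
    }

    /* PMD split locks: PTE split locks plus support from the architecture. */
    static bool model_use_split_pmd_ptlocks(unsigned int nr_cpus,
                                            unsigned int split_ptlock_cpus,
                                            bool arch_enables_split_pmd)
    {
        return model_use_split_pte_ptlocks(nr_cpus, split_ptlock_cpus) &&
               arch_enables_split_pmd;
    }

With the usual threshold of 4, a build for 2 CPUs would keep everything under
mm->page_table_lock, while a build for 4 or more CPUs would use split PTE
locks.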

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for PMD
level, but not for PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock;
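
A minimal userspace sketch of the lock selection described for
huge_pte_lockptr() follows. All model_* names and the 2 MiB value used for
PMD_SIZE are assumptions made for the example; the real PMD_SIZE depends on
the architecture and configuration:

.. code-block:: c

    #include <stddef.h>

    /* Assumed for this sketch only: 2 MiB, the x86-64 value of PMD_SIZE. */
    #define MODEL_PMD_SIZE (2UL * 1024 * 1024)

    /* Placeholder lock type; stands in for spinlock_t. */
    typedef struct {
        int locked;
    } model_spinlock_t;

    /* Stand-in for mm_struct, holding the single per-mm lock. */
    struct model_mm {
        model_spinlock_t page_table_lock;
    };

    /* Stand-in for a PMD table that carries its own split lock. */
    struct model_pmd_table {
        model_spinlock_t ptl;
    };

    /* Pick the lock that guards a huge page entry of the given size. */
    static model_spinlock_t *model_huge_pte_lockptr(unsigned long huge_page_size,
                                                    struct model_mm *mm,
                                                    struct model_pmd_table *table)
    {
        if (huge_page_size == MODEL_PMD_SIZE)
            return &table->ptl;         /* PMD level: split lock */
        return &mm->page_table_lock;    /* any other size: per-mm lock */
    }

In this model a 2 MiB huge page is guarded by the lock of its PMD table, while
a 1 GiB huge page falls back to the per-mm lock.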

Support of split page table lock by an architecture
===================================================

No special enabling is needed for the PTE split page table lock: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling PMD split lock requires a pgtable_pmd_page_ctor() call on PMD table
allocation and a pgtable_pmd_page_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- this must
be handled properly.
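
The following userspace model sketches one way to handle a constructor failure
on the PMD allocation path: the freshly allocated table is released again and
the caller sees NULL. The model_* functions, the struct layout and the
model_ctor_should_fail hook are hypothetical and exist only to make the example
runnable outside the kernel:

.. code-block:: c

    #include <stdbool.h>
    #include <stdlib.h>

    /* Stand-in for struct page; only the field this sketch needs. */
    struct model_page {
        void *ptl;  /* models a dynamically allocated split lock */
    };

    /* Hypothetical test hook: force the constructor to fail. */
    static bool model_ctor_should_fail;

    /* Models pgtable_pmd_page_ctor(): returns false on failure. */
    static bool model_pgtable_pmd_page_ctor(struct model_page *page)
    {
        if (model_ctor_should_fail)
            return false;
        page->ptl = malloc(sizeof(long));
        return page->ptl != NULL;
    }

    /* Models pgtable_pmd_page_dtor(): releases what the constructor set up. */
    static void model_pgtable_pmd_page_dtor(struct model_page *page)
    {
        free(page->ptl);
        page->ptl = NULL;
    }

    /* Models pmd_alloc_one(): allocate the table page, then run the ctor. */
    static struct model_page *model_pmd_alloc_one(void)
    {
        struct model_page *page = calloc(1, sizeof(*page));

        if (!page)
            return NULL;
        if (!model_pgtable_pmd_page_ctor(page)) {
            /* Constructor failed: give the page back and report failure. */
            free(page);
            return NULL;
        }
        return page;
    }

    /* Models pmd_free(): run the dtor before freeing the page. */
    static void model_pmd_free(struct model_page *page)
    {
        model_pgtable_pmd_page_dtor(page);
        free(page);
    }

The same pattern applies to every allocation path: a table whose constructor
failed must never be handed out, and every table that was constructed must go
through the destructor before it is freed.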

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock itself, so
   we can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of long, we use page->ptl
   as a pointer to spinlock_t and allocate it dynamically. This allows using
   split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but costs
   one more cache line for indirect access.
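
The trick can be modelled in userspace as follows. This is only a sketch: the
model_* types, the MODEL_DEBUG_LOCKS switch and the layout of the lock are
assumptions made for the example, not the kernel's definitions. A small lock is
embedded in the page descriptor; a large "debug" lock is reached through a
pointer and allocated separately:

.. code-block:: c

    #include <stdbool.h>
    #include <stdlib.h>

    #ifndef MODEL_DEBUG_LOCKS
    #define MODEL_DEBUG_LOCKS 0     /* set to 1 to model a debug-sized lock */
    #endif

    #if MODEL_DEBUG_LOCKS
    typedef struct { unsigned int raw; unsigned long debug_info[4]; } model_spinlock_t;
    #define MODEL_ALLOC_SPLIT_PTLOCKS 1
    #else
    typedef struct { unsigned int raw; } model_spinlock_t;
    #define MODEL_ALLOC_SPLIT_PTLOCKS 0
    #endif

    /* The lock is allocated separately exactly when it does not fit in a long. */
    _Static_assert((sizeof(model_spinlock_t) > sizeof(long)) ==
                   MODEL_ALLOC_SPLIT_PTLOCKS, "lock size and mode disagree");

    /* Stand-in for struct page: the lock shares storage with 'private'. */
    struct model_page {
        union {
            unsigned long private;
    #if MODEL_ALLOC_SPLIT_PTLOCKS
            model_spinlock_t *ptl;  /* pointer to a separately allocated lock */
    #else
            model_spinlock_t ptl;   /* lock embedded in the descriptor */
    #endif
        };
    };

    /* Set up the lock; can fail only when it has to be allocated. */
    static bool model_ptlock_init(struct model_page *page)
    {
    #if MODEL_ALLOC_SPLIT_PTLOCKS
        page->ptl = calloc(1, sizeof(*page->ptl));
        return page->ptl != NULL;
    #else
        page->ptl.raw = 0;
        return true;
    #endif
    }

    /* Helper that hides whether the lock is embedded or pointed to. */
    static model_spinlock_t *model_ptlock_ptr(struct model_page *page)
    {
    #if MODEL_ALLOC_SPLIT_PTLOCKS
        return page->ptl;
    #else
        return &page->ptl;
    #endif
    }

    static void model_ptlock_free(struct model_page *page)
    {
    #if MODEL_ALLOC_SPLIT_PTLOCKS
        free(page->ptl);
    #else
        (void)page;     /* nothing was allocated */
    #endif
    }

Because callers in this model only ever go through model_ptlock_ptr(), they do
not need to know which of the two layouts is in use.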

The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.

Please never access page->ptl directly -- use the appropriate helper.