.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability in
multi-threaded applications due to high contention on the lock. To improve
scalability, split page table lock was introduced.

With split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use split lock for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
   maps PTE and takes PTE table lock, returns pointer to PTE with
   pointer to its PTE table lock, or returns NULL if no PTE table;
 - pte_offset_map_ro_nolock()
   maps PTE, returns pointer to PTE with pointer to its PTE table
   lock (not taken), or returns NULL if no PTE table;
 - pte_offset_map_rw_nolock()
   maps PTE, returns pointer to PTE with pointer to its PTE table
   lock (not taken) and the value of its pmd entry, or returns NULL
   if no PTE table;
 - pte_offset_map()
   maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
 - pte_unmap()
   unmaps PTE table;
 - pte_unmap_unlock()
   unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
   allocates PTE table if needed and takes its lock, returns pointer to
   PTE with pointer to its lock, or returns NULL if allocation failed;
 - pmd_lock()
   takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
   returns pointer to PMD table lock;

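For illustration, the usual pattern for scanning one PTE table under its
split lock looks roughly like this (a hedged sketch of the idiom used
throughout mm/; mm, pmd, addr and end are assumed to come from the
caller's page table walk)::

	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return 0;	/* no PTE table here, nothing to scan */
	do {
		pte_t entry = ptep_get(pte);

		if (pte_none(entry))
			continue;
		/* ... inspect or modify the entry under the split lock ... */
	} while (pte++, addr += PAGE_SIZE, addr != end);
	pte_unmap_unlock(pte - 1, ptl);

Note that pte_unmap_unlock() takes the last PTE looked at, not the first
one in the table.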
Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).

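In the source these conditions appear roughly as the following macros
(names as found in include/linux/mm_types_task.h; they may move or be
renamed between kernel versions)::

	#define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
	#define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
			IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))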
Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for the PMD
level, but not for PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
   takes pmd split lock for PMD_SIZE page, mm->page_table_lock
   otherwise;
 - huge_pte_lockptr()
   returns pointer to table lock;

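A hugetlb walker typically lets these helpers pick the right lock rather
than deciding itself (a hedged sketch; vma, mm and ptep are assumed from
the surrounding hugetlb code)::

	struct hstate *h = hstate_vma(vma);
	spinlock_t *ptl;

	ptl = huge_pte_lock(h, mm, ptep);
	/* ... read or modify the hugetlb entry ... */
	spin_unlock(ptl);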
Support of split page table lock by an architecture
===================================================

There is no need to specially enable PTE split page table lock: everything
required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

PMD split lock enabling requires a pagetable_pmd_ctor() call on PMD table
allocation and pagetable_pmd_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs on pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- failure
must be handled properly.

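For example, a generic pmd_alloc_one() that handles constructor failure
might look like this (a hedged sketch modelled on
include/asm-generic/pgalloc.h; architectures differ in the details)::

	static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
	{
		struct ptdesc *ptdesc;
		gfp_t gfp = GFP_PGTABLE_USER;

		if (mm == &init_mm)
			gfp = GFP_PGTABLE_KERNEL;
		ptdesc = pagetable_alloc(gfp, 0);
		if (!ptdesc)
			return NULL;
		if (!pagetable_pmd_ctor(ptdesc)) {
			/* ctor failure: free the half-constructed table */
			pagetable_free(ptdesc);
			return NULL;
		}
		return ptdesc_address(ptdesc);
	}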
page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best
performance, we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock itself,
   so we can avoid indirect access and save a cache line.
 - if size of spinlock_t is bigger than size of long, we use page->ptl as
   a pointer to spinlock_t and allocate it dynamically. This makes it
   possible to use the split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for the indirect access;

The spinlock_t is allocated in pagetable_pte_ctor() for PTE tables and in
pagetable_pmd_ctor() for PMD tables.

Please, never access page->ptl directly -- use the appropriate helper.
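
For instance, to take the lock of a PTE table reached through a pmd entry,
go via the helper instead of touching page->ptl (a hedged sketch)::

	spinlock_t *ptl = pte_lockptr(mm, pmd);

	spin_lock(ptl);
	/* ... */
	spin_unlock(ptl);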