Loading...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 | // SPDX-License-Identifier: GPL-2.0 #include <linux/percpu.h> #include <linux/sched.h> #include <linux/osq_lock.h> /* * An MCS like lock especially tailored for optimistic spinning for sleeping * lock implementations (mutex, rwsem, etc). * * Using a single mcs node per CPU is safe because sleeping locks should not be * called from interrupt context and we have preemption disabled while * spinning. */ struct optimistic_spin_node { struct optimistic_spin_node *next, *prev; int locked; /* 1 if lock acquired */ int cpu; /* encoded CPU # + 1 value */ }; static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node); /* * We use the value 0 to represent "no CPU", thus the encoded value * will be the CPU number incremented by 1. */ static inline int encode_cpu(int cpu_nr) { return cpu_nr + 1; } static inline int node_cpu(struct optimistic_spin_node *node) { return node->cpu - 1; } static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val) { int cpu_nr = encoded_cpu_val - 1; return per_cpu_ptr(&osq_node, cpu_nr); } /* * Get a stable @node->next pointer, either for unlock() or unqueue() purposes. * Can return NULL in case we were the last queued and we updated @lock instead. * * If osq_lock() is being cancelled there must be a previous node * and 'old_cpu' is its CPU #. * For osq_unlock() there is never a previous node and old_cpu is * set to OSQ_UNLOCKED_VAL. */ static inline struct optimistic_spin_node * osq_wait_next(struct optimistic_spin_queue *lock, struct optimistic_spin_node *node, int old_cpu) { int curr = encode_cpu(smp_processor_id()); for (;;) { if (atomic_read(&lock->tail) == curr && atomic_cmpxchg_acquire(&lock->tail, curr, old_cpu) == curr) { /* * We were the last queued, we moved @lock back. @prev * will now observe @lock and will complete its * unlock()/unqueue(). */ return NULL; } /* * We must xchg() the @node->next value, because if we were to * leave it in, a concurrent unlock()/unqueue() from * @node->next might complete Step-A and think its @prev is * still valid. * * If the concurrent unlock()/unqueue() wins the race, we'll * wait for either @lock to point to us, through its Step-B, or * wait for a new @node->next from its Step-C. */ if (node->next) { struct optimistic_spin_node *next; next = xchg(&node->next, NULL); if (next) return next; } cpu_relax(); } } bool osq_lock(struct optimistic_spin_queue *lock) { struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); struct optimistic_spin_node *prev, *next; int curr = encode_cpu(smp_processor_id()); int old; node->locked = 0; node->next = NULL; node->cpu = curr; /* * We need both ACQUIRE (pairs with corresponding RELEASE in * unlock() uncontended, or fastpath) and RELEASE (to publish * the node fields we just initialised) semantics when updating * the lock tail. */ old = atomic_xchg(&lock->tail, curr); if (old == OSQ_UNLOCKED_VAL) return true; prev = decode_cpu(old); node->prev = prev; /* * osq_lock() unqueue * * node->prev = prev osq_wait_next() * WMB MB * prev->next = node next->prev = prev // unqueue-C * * Here 'node->prev' and 'next->prev' are the same variable and we need * to ensure these stores happen in-order to avoid corrupting the list. */ smp_wmb(); WRITE_ONCE(prev->next, node); /* * Normally @prev is untouchable after the above store; because at that * moment unlock can proceed and wipe the node element from stack. * * However, since our nodes are static per-cpu storage, we're * guaranteed their existence -- this allows us to apply * cmpxchg in an attempt to undo our queueing. */ /* * Wait to acquire the lock or cancellation. Note that need_resched() * will come with an IPI, which will wake smp_cond_load_relaxed() if it * is implemented with a monitor-wait. vcpu_is_preempted() relies on * polling, be careful. */ if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() || vcpu_is_preempted(node_cpu(node->prev)))) return true; /* unqueue */ /* * Step - A -- stabilize @prev * * Undo our @prev->next assignment; this will make @prev's * unlock()/unqueue() wait for a next pointer since @lock points to us * (or later). */ for (;;) { /* * cpu_relax() below implies a compiler barrier which would * prevent this comparison being optimized away. */ if (data_race(prev->next) == node && cmpxchg(&prev->next, node, NULL) == node) break; /* * We can only fail the cmpxchg() racing against an unlock(), * in which case we should observe @node->locked becoming * true. */ if (smp_load_acquire(&node->locked)) return true; cpu_relax(); /* * Or we race against a concurrent unqueue()'s step-B, in which * case its step-C will write us a new @node->prev pointer. */ prev = READ_ONCE(node->prev); } /* * Step - B -- stabilize @next * * Similar to unlock(), wait for @node->next or move @lock from @node * back to @prev. */ next = osq_wait_next(lock, node, prev->cpu); if (!next) return false; /* * Step - C -- unlink * * @prev is stable because its still waiting for a new @prev->next * pointer, @next is stable because our @node->next pointer is NULL and * it will wait in Step-A. */ WRITE_ONCE(next->prev, prev); WRITE_ONCE(prev->next, next); return false; } void osq_unlock(struct optimistic_spin_queue *lock) { struct optimistic_spin_node *node, *next; int curr = encode_cpu(smp_processor_id()); /* * Fast path for the uncontended case. */ if (likely(atomic_cmpxchg_release(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr)) return; /* * Second most likely case. */ node = this_cpu_ptr(&osq_node); next = xchg(&node->next, NULL); if (next) { WRITE_ONCE(next->locked, 1); return; } next = osq_wait_next(lock, node, OSQ_UNLOCKED_VAL); if (next) WRITE_ONCE(next->locked, 1); } |