locking —
introduction to kernel synchronization and interrupt
  control
The NetBSD kernel provides several synchronization and
  interrupt control primitives. This man page aims to give an overview of these
  interfaces and their proper application. Also included are basic kernel thread
  control primitives and a rough overview of the NetBSD
  kernel design.
The aims of synchronization, threads and interrupt control in the kernel are:
  - To control concurrent access to shared resources (critical sections).
  - To spawn tasks from an interrupt in the thread context.
  - To mask interrupts from threads.
  - To scale on multiple-CPU systems.
There are three types of contexts in the
    NetBSD kernel:
  - Thread context - running processes (represented by
      struct proc) and light-weight processes
      (represented by struct lwp, also known as kernel
      threads). Code in this context can sleep, block on resources, and own
      an address space.
  - Software interrupt context - limited by thread context.
      Code in this context must complete quickly. These interrupts do not own
      any address space context. Software interrupts are a way of deferring
      work from hardware interrupts so that more expensive processing runs at
      a lower interrupt priority.
  - Hard interrupt context - Code in this context must be
      processed as quickly as possible. Code here must not sleep or wait for
      resources that may take a long time to become available.
The main differences between processes and kernel threads are:
  - A single process can own multiple kernel threads (LWPs).
- A process owns address space context to map userland address space.
  - Processes are designed for userland executables and kernel threads for
      in-kernel tasks. The only process running in kernel space is
      proc0 (called swapper).
The atomic_ops family of functions provides atomic memory
  operations. There are seven classes of atomic memory operations available:
  addition, logical “and”, compare-and-swap, decrement, increment,
  logical “or”, and swap.
See
    atomic_ops(3).
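As a rough illustration of two of these classes (increment and
  compare-and-swap), the sketch below uses the portable C11
  <stdatomic.h> interfaces as a stand-in. It is not the
  atomic_ops(3) API itself; in NetBSD code the corresponding calls would be
  along the lines of atomic_inc_uint() and atomic_cas_uint(). The names
  counter, counter_bump and counter_set_max are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared counter; hypothetical name used only for this sketch. */
static atomic_uint counter;

/* Atomic increment: returns the value held before the increment. */
static unsigned int
counter_bump(void)
{
	return atomic_fetch_add(&counter, 1);
}

/*
 * Compare-and-swap loop: raise the counter to 'candidate' unless it is
 * already at least that large. Returns true if this call stored a value.
 */
static bool
counter_set_max(unsigned int candidate)
{
	unsigned int old = atomic_load(&counter);

	while (old < candidate) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&counter, &old, candidate))
			return true;
	}
	return false;
}
```

The compare-and-swap loop is the usual way to build an update that has no
  dedicated atomic operation: read the old value, compute the new one, and
  retry if another CPU changed the value in between.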
Condition variables (CVs) are used in the kernel to synchronize access to
  resources that are limited (for example, memory) and to wait for pending I/O
  operations to complete.
See condvar(9).
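The wait-for-a-limited-resource pattern can be sketched in userland with
  POSIX threads, used here only as an analogue. In the kernel the same shape
  would be written with a kmutex_t and a kcondvar_t, using cv_wait() and
  cv_signal() from condvar(9). The struct pool type and the pool_* functions
  are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pool of a limited resource (for example, buffers). */
struct pool {
	pthread_mutex_t lock;	/* protects 'avail' */
	pthread_cond_t  cv;	/* signalled when an item is returned */
	unsigned int    avail;	/* number of free items */
};

static void
pool_init(struct pool *p, unsigned int nitems)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->avail = nitems;
}

/* Take one item, sleeping until one is available. */
static void
pool_get(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	/* Re-check the condition in a loop: wakeups may be spurious. */
	while (p->avail == 0)
		pthread_cond_wait(&p->cv, &p->lock);
	p->avail--;
	pthread_mutex_unlock(&p->lock);
}

/* Return one item and wake a single waiter, if any. */
static void
pool_put(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->avail++;
	pthread_cond_signal(&p->cv);
	pthread_mutex_unlock(&p->lock);
}
```

The condition is always tested while holding the mutex, and the wait
  atomically releases the mutex and sleeps, so a wakeup cannot be lost between
  the test and the sleep.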
The membar_ops family of functions provides memory access
  barrier operations necessary for synchronization in multiprocessor execution
  environments that have relaxed load and store order.
See
    membar_ops(3).
The memory barriers can be used to control the order in which memory accesses
  occur, and thus the order in which those accesses become visible to other
  processors. They can be used to implement “lockless” access to
  data structures where the necessary barrier conditions are well understood.
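A minimal sketch of such lockless publication follows, assuming one writer
  that publishes a value and readers that poll a flag. It uses C11 fences from
  <stdatomic.h> as a portable stand-in; kernel code would place
  membar_ops(3) barriers at roughly the same two points. The names payload,
  ready, publish and try_consume are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;		/* plain data being published */
static atomic_bool ready;	/* tells readers the data is valid */

/* Writer: store the data, then publish the flag. */
static void
publish(int value)
{
	payload = value;
	/* Order the payload store before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out only once the data is visible. */
static bool
try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return false;
	/* Order the flag load before the payload load. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return true;
}
```

The barriers come in pairs: the one on the writer side is only useful if the
  reader has a matching barrier between its two loads.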
Thread-based adaptive mutexes. These are lightweight, exclusive locks that use
  threads as the focus of synchronization activity. Adaptive mutexes typically
  behave like spinlocks, but under specific conditions an attempt to acquire an
  already held adaptive mutex may cause the acquiring thread to sleep. Sleep
  activity occurs rarely. Busy-waiting is typically more efficient because mutex
  hold times are most often short. In contrast to pure spinlocks, a thread
  holding an adaptive mutex may be pre-empted in the kernel, which can allow for
  reduced latency where soft real-time applications are in use on the system.
See mutex(9).
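The trade-off between busy-waiting and giving up the CPU can be illustrated
  with a small userland lock that spins for a bounded number of attempts and
  then yields. This is only an illustration of the idea, not how mutex(9) is
  implemented: the yield stands in for sleeping, and the fixed spin limit is
  an assumption of the sketch. Kernel code would simply call mutex_enter() and
  mutex_exit(). The sketch_mutex names and SPIN_LIMIT are hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 100	/* arbitrary bound chosen for this sketch */

/* Hypothetical lock type; initialize with { ATOMIC_FLAG_INIT }. */
struct sketch_mutex {
	atomic_flag held;
};

static void
sketch_mutex_enter(struct sketch_mutex *m)
{
	unsigned int spins = 0;

	/* test_and_set returns the previous state: false means we got it. */
	while (atomic_flag_test_and_set_explicit(&m->held,
	    memory_order_acquire)) {
		if (++spins >= SPIN_LIMIT) {
			/* Stop burning CPU; let another thread run. */
			sched_yield();
			spins = 0;
		}
	}
}

/* Returns 1 if the lock was acquired without waiting, 0 otherwise. */
static int
sketch_mutex_tryenter(struct sketch_mutex *m)
{
	return !atomic_flag_test_and_set_explicit(&m->held,
	    memory_order_acquire);
}

static void
sketch_mutex_exit(struct sketch_mutex *m)
{
	atomic_flag_clear_explicit(&m->held, memory_order_release);
}
```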
Restartable atomic sequences are sequences of user code only, which are
  guaranteed to execute without preemption. This property is assured by
  checking the set of restartable atomic sequences registered for a process
  during cpu_switchto(9). If a
  process is found to have been preempted during a restartable sequence, then
  its execution is rolled back to the start of the sequence by resetting its
  program counter, which is saved in its process control block (PCB).
See ras(9).
Reader / writer locks (RW locks) are used in the kernel to synchronize access to
  an object among LWPs (lightweight processes) and soft interrupt handlers. In
  addition to the capabilities provided by mutexes, RW locks distinguish between
  read (shared) and write (exclusive) access.
See rwlock(9).
These functions raise and lower the interrupt priority level. They are used by
  kernel code to block interrupts in critical sections, in order to protect data
  structures.
See spl(9).
The software interrupt framework is designed to provide a generic software
  interrupt mechanism which can be used any time a low-priority callback is
  required. It allows dynamic registration of software interrupts for loadable
  drivers, protocol stacks, software interrupt prioritization, software
  interrupt fair queuing and allows machine-dependent optimizations to reduce
  cost.
See softint(9).
The splraiseipl function raises the system priority
  level to the level specified by icookie, which should
  be a value returned by
  makeiplcookie(9). In
  general, device drivers should not make use of this interface. To ensure
  correct synchronization, device drivers should use the
  condvar(9),
  mutex(9), and
  rwlock(9) interfaces.
See
    splraiseipl(9).
Passive serialization is a reader / writer synchronization mechanism designed
  for lock-less read operations. The read operations may happen from a
  software interrupt at IPL_SOFTCLOCK.
See
    pserialize(9).
Passive references allow CPUs to cheaply acquire and release passive references
  to a resource, which guarantee the resource will not be destroyed until the
  reference is released. Acquiring and releasing passive references requires no
  interprocessor synchronization, except when the resource is pending
  destruction.
See psref(9).
Localcounts are used in the kernel to implement a medium-weight reference
  counting mechanism. During normal operations, localcounts do not need the
  interprocessor synchronization associated with
  atomic_ops(3) atomic memory
  operations, and (unlike psref(9))
  localcount references can be held across sleeps and can migrate between CPUs.
  Draining a localcount requires more expensive interprocessor synchronization
  than atomic_ops(3) (similar
  to psref(9)). Localcount
  references also require eight bytes of memory per object per CPU,
  significantly more than atomic_ops(3) and
  almost always more than psref(9).
See
    localcount(9).
The workqueue utility routines are provided to defer work that needs to be
  processed in a thread context.
See
  workqueue(9).
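The deferral idea can be sketched as a FIFO of callbacks: the context that
  must not sleep only links an item onto the queue, and a worker running in
  thread context later runs the callbacks. This is a single-threaded sketch
  with no locking, so it shows the data flow only; a real queue needs
  synchronization between the enqueuing context and the worker, which
  workqueue(9) provides. All types and function names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical work item: a callback plus its argument. */
struct work_item {
	struct work_item *next;
	void (*func)(void *);
	void *arg;
};

/* Hypothetical FIFO of deferred work. */
struct work_queue {
	struct work_item *head;
	struct work_item **tailp;	/* where the next item gets linked */
};

static void
wq_init(struct work_queue *wq)
{
	wq->head = NULL;
	wq->tailp = &wq->head;
}

/* Cheap enqueue: what a context that must not sleep would do. */
static void
wq_enqueue(struct work_queue *wq, struct work_item *wk,
    void (*func)(void *), void *arg)
{
	wk->next = NULL;
	wk->func = func;
	wk->arg = arg;
	*wq->tailp = wk;
	wq->tailp = &wk->next;
}

/* Run by the worker: process items in FIFO order, return the count. */
static unsigned int
wq_drain(struct work_queue *wq)
{
	unsigned int n = 0;
	struct work_item *wk;

	while ((wk = wq->head) != NULL) {
		wq->head = wk->next;
		if (wq->head == NULL)
			wq->tailp = &wq->head;
		wk->func(wk->arg);
		n++;
	}
	return n;
}

/* Example callback standing in for the expensive, deferred part. */
static void
bump(void *arg)
{
	(*(int *)arg)++;
}
```

The caller owns the storage for each work item, so enqueuing allocates
  nothing and cannot fail, which is what makes it safe to do from a context
  that cannot sleep.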
The following table describes in which contexts the use of the
  NetBSD kernel interfaces is valid. Synchronization
  primitives which are available in more than one context can be used to protect
  shared resources between the contexts they overlap.
Initial SMP support was introduced in NetBSD 2.0 and was
  designed with a giant kernel lock. Through NetBSD 4.0,
  the kernel used spinlocks and a per-CPU interrupt priority level (the
  spl(9) system). These mechanisms
  did not lend themselves well to a multiprocessor environment supporting kernel
  preemption. The use of thread-based (lock) synchronization was limited and the
  available synchronization primitive (lockmgr) was inefficient and slow to
  execute. In NetBSD 5.0, work by Andrew Doran brought massive performance
  improvements on multicore hardware. This work was sponsored by
  The NetBSD Foundation.
A locking manual first appeared in
    NetBSD 8.0 and was inspired by the corresponding
    locking manuals in FreeBSD
    and DragonFly.