Merge tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes, perhaps most notably:

    - Throttling callback invocation based on the number of callbacks
      that are now ready to invoke instead of on the total number of
      callbacks

    - Several patches that suppress false-positive boot-time
      diagnostics, for example, due to lockdep not yet being initialized

    - Make expedited RCU CPU stall warnings dump stacks of any tasks
      that are blocking the stalled grace period. (Normal RCU CPU stall
      warnings have done this for many years)

    - Lazy-callback fixes to avoid delays during boot, suspend, and
      resume. (Note that lazy callbacks must be explicitly enabled, so
      this should not (yet) affect production use cases)

 - Make kfree_rcu() and friends take advantage of polled grace periods,
   thus reducing memory footprint by almost two orders of magnitude,
   admittedly on a microbenchmark

   This also begins the transition from kfree_rcu(p) to
   kfree_rcu_mightsleep(p). This transition was motivated by bugs where
   kfree_rcu(p), which can block, was typed instead of the intended
   kfree_rcu(p, rh)

 - SRCU updates, perhaps most notably fixing a bug that causes SRCU to
   fail when booted on a system with a non-zero boot CPU. This
   surprising situation actually happens for kdump kernels on the
   powerpc architecture

   This also adds an srcu_down_read() and srcu_up_read(), which act
   like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU
   read-side critical section to be handed off from one task to another

 - Clean up the now-useless SRCU Kconfig option

   There are a few more commits that are not yet acked or pulled into
   maintainer trees, and these will be in a pull request for a later
   merge window

 - RCU-tasks updates, perhaps most notably these fixes:

    - A strange interaction between PID-namespace unshare and the
      RCU-tasks grace period that results in a low-probability but very
      real hang

    - A race between an RCU tasks rude grace period on a single-CPU
      system and CPU-hotplug addition of the second CPU that can result
      in a too-short grace period

    - A race between shrinking RCU tasks down to a single callback list
      and queuing a new callback to some other CPU, but where that
      queuing is delayed for more than an RCU grace period. This can
      result in that callback being stranded on the non-boot CPU

 - Torture-test updates and fixes

 - Torture-test scripting updates and fixes

 - Provide additional RCU CPU stall-warning information in kernels
   built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full
   five-minute timeout limit for expedited RCU CPU stall warnings

* tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
  rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()
  kernel/notifier: Remove CONFIG_SRCU
  init: Remove "select SRCU"
  fs/quota: Remove "select SRCU"
  fs/notify: Remove "select SRCU"
  fs/btrfs: Remove "select SRCU"
  fs: Remove CONFIG_SRCU
  drivers/pci/controller: Remove "select SRCU"
  drivers/net: Remove "select SRCU"
  drivers/md: Remove "select SRCU"
  drivers/hwtracing/stm: Remove "select SRCU"
  drivers/dax: Remove "select SRCU"
  drivers/base: Remove CONFIG_SRCU
  rcu: Disable laziness if lazy-tracking says so
  rcu: Track laziness during boot and suspend
  rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
  rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
  rcu: Align the output of RCU CPU stall warning messages
  rcu: Add RCU stall diagnosis information
  sched: Add helper nr_context_switches_cpu()
  ...
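The kfree_rcu(p) to kfree_rcu_mightsleep(p) transition mentioned above is easiest to see with the two forms side by side. The sketch below is illustrative only; struct demo_obj and demo_free() are invented names, not anything from the series::

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	/* Hypothetical RCU-protected object, for illustration only. */
	struct demo_obj {
		int value;
		struct rcu_head rh;	/* needed by two-argument kfree_rcu() */
	};

	static void demo_free(struct demo_obj *atomic_ctx_obj,
			      struct demo_obj *process_ctx_obj)
	{
		/*
		 * Two-argument form: queues an RCU callback and never blocks,
		 * so it may be used from atomic context.  The second argument
		 * names the rcu_head field within the structure.
		 */
		kfree_rcu(atomic_ctx_obj, rh);

		/*
		 * Single-argument form, now spelled kfree_rcu_mightsleep():
		 * needs no rcu_head, but may block waiting for a grace
		 * period, so it is only legal from sleepable context.
		 */
		kfree_rcu_mightsleep(process_ctx_obj);
	}

The rename makes it much harder to reach for the blocking single-argument form by accident when the two-argument form was intended.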
@@ -8,7 +8,7 @@ Although RCU is usually used to protect read-mostly data structures,
 it is possible to use RCU to provide dynamic non-maskable interrupt
 handlers, as well as dynamic irq handlers. This document describes
 how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
-work in "arch/x86/kernel/traps.c".
+work in an old version of "arch/x86/kernel/traps.c".
 
 The relevant pieces of code are listed below, each followed by a
 brief explanation::
@@ -116,7 +116,7 @@ Answer to Quick Quiz:
 
 This same sad story can happen on other CPUs when using
 a compiler with aggressive pointer-value speculation
-optimizations.
+optimizations. (But please don't!)
 
 More important, the rcu_dereference_sched() makes it
 clear to someone reading the code that the pointer is
@@ -38,7 +38,7 @@ by having call_rcu() directly invoke its arguments only if it was called
 from process context. However, this can fail in a similar manner.
 
 Suppose that an RCU-based algorithm again scans a linked list containing
-elements A, B, and C in process contexts, but that it invokes a function
+elements A, B, and C in process context, but that it invokes a function
 on each element as it is scanned. Suppose further that this function
 deletes element B from the list, then passes it to call_rcu() for deferred
 freeing. This may be a bit unconventional, but it is perfectly legal
@@ -59,7 +59,8 @@ Example 3: Death by Deadlock
 Suppose that call_rcu() is invoked while holding a lock, and that the
 callback function must acquire this same lock. In this case, if
 call_rcu() were to directly invoke the callback, the result would
-be self-deadlock.
+be self-deadlock *even if* this invocation occurred from a later
+call_rcu() invocation a full grace period later.
 
 In some cases, it would possible to restructure to code so that
 the call_rcu() is delayed until after the lock is released. However,
@@ -85,6 +86,14 @@ Quick Quiz #2:
 
 :ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
 
+It is important to note that userspace RCU implementations *do*
+permit call_rcu() to directly invoke callbacks, but only if a full
+grace period has elapsed since those callbacks were queued. This is
+the case because some userspace environments are extremely constrained.
+Nevertheless, people writing userspace RCU implementations are strongly
+encouraged to avoid invoking callbacks from call_rcu(), thus obtaining
+the deadlock-avoidance benefits called out above.
+
 Summary
 -------
 
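To make the self-deadlock scenario in the hunk above concrete, here is a minimal sketch of the usual pattern that avoids it; struct foo, foo_lock, and the function names are invented for illustration::

	#include <linux/rcupdate.h>
	#include <linux/rculist.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		struct rcu_head rh;
	};

	static DEFINE_SPINLOCK(foo_lock);

	/* RCU callback: runs later from softirq context, lock not held. */
	static void foo_reclaim(struct rcu_head *rh)
	{
		struct foo *fp = container_of(rh, struct foo, rh);

		spin_lock(&foo_lock);	/* safe: updater released it long ago */
		/* ... final bookkeeping under the lock ... */
		spin_unlock(&foo_lock);
		kfree(fp);
	}

	static void foo_del(struct foo *fp)
	{
		spin_lock_bh(&foo_lock);	/* _bh: the callback runs in softirq */
		list_del_rcu(&fp->list);
		/*
		 * If call_rcu() invoked foo_reclaim() directly here, the
		 * callback's spin_lock(&foo_lock) would self-deadlock.
		 * Deferring the callback past a grace period avoids that.
		 */
		call_rcu(&fp->rh, foo_reclaim);
		spin_unlock_bh(&foo_lock);
	}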
@@ -69,9 +69,8 @@ checking of rcu_dereference() primitives:
 value of the pointer itself, for example, against NULL.
 
 The rcu_dereference_check() check expression can be any boolean
-expression, but would normally include a lockdep expression. However,
-any boolean expression can be used. For a moderately ornate example,
-consider the following::
+expression, but would normally include a lockdep expression. For a
+moderately ornate example, consider the following::
 
 file = rcu_dereference_check(fdt->fd[fd],
 lockdep_is_held(&files->file_lock) ||
@@ -97,10 +96,10 @@ code, it could instead be written as follows::
 atomic_read(&files->count) == 1);
 
 This would verify cases #2 and #3 above, and furthermore lockdep would
-complain if this was used in an RCU read-side critical section unless one
-of these two cases held. Because rcu_dereference_protected() omits all
-barriers and compiler constraints, it generates better code than do the
-other flavors of rcu_dereference(). On the other hand, it is illegal
+complain even if this was used in an RCU read-side critical section unless
+one of these two cases held. Because rcu_dereference_protected() omits
+all barriers and compiler constraints, it generates better code than do
+the other flavors of rcu_dereference(). On the other hand, it is illegal
 to use rcu_dereference_protected() if either the RCU-protected pointer
 or the RCU-protected data that it points to can change concurrently.
 
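As a concrete (and hypothetical; gp, gp_lock, and struct gizmo are invented names) illustration of the reader/updater split described above::

	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	struct gizmo {
		int val;
	};

	static DEFINE_SPINLOCK(gp_lock);
	static struct gizmo __rcu *gp;

	/* Reader: needs rcu_dereference()'s ordering and compiler constraints. */
	static int gizmo_read_val(void)
	{
		struct gizmo *g;
		int ret = -1;

		rcu_read_lock();
		g = rcu_dereference(gp);
		if (g)
			ret = g->val;
		rcu_read_unlock();
		return ret;
	}

	/*
	 * Updater: gp_lock prevents the pointer from changing, so the cheaper
	 * rcu_dereference_protected() is both legal and generates better code.
	 */
	static void gizmo_bump(void)
	{
		struct gizmo *g;

		spin_lock(&gp_lock);
		g = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
		if (g)
			g->val++;
		spin_unlock(&gp_lock);
	}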
@@ -77,15 +77,17 @@ Frequently Asked Questions
 search for the string "Patent" in Documentation/RCU/RTFP.txt to find them.
 Of these, one was allowed to lapse by the assignee, and the
 others have been contributed to the Linux kernel under GPL.
+Many (but not all) have long since expired.
 There are now also LGPL implementations of user-level RCU
 available (https://liburcu.org/).
 
 - I hear that RCU needs work in order to support realtime kernels?
 
-Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
+Realtime-friendly RCU are enabled via the CONFIG_PREEMPTION
 kernel configuration parameter.
 
 - Where can I find more information on RCU?
 
 See the Documentation/RCU/RTFP.txt file.
-Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
+Or point your browser at (https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit)
+or (https://docs.google.com/document/d/1GCdQC8SDbb54W1shjEXqGZ0Rq8a6kIeYutdSIajfpLA/edit?usp=sharing).
@@ -19,8 +19,9 @@ Follow these rules to keep your RCU code working properly:
 can reload the value, and won't your code have fun with two
 different values for a single pointer! Without rcu_dereference(),
 DEC Alpha can load a pointer, dereference that pointer, and
-return data preceding initialization that preceded the store of
-the pointer.
+return data preceding initialization that preceded the store
+of the pointer. (As noted later, in recent kernels READ_ONCE()
+also prevents DEC Alpha from playing these tricks.)
 
 In addition, the volatile cast in rcu_dereference() prevents the
 compiler from deducing the resulting pointer value. Please see
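For readers who want both sides of this rule in one place, here is a minimal publish/subscribe sketch; struct blob and the function names are illustrative, not from the document::

	#include <linux/errno.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct blob {
		int a;
		int b;
	};

	static struct blob __rcu *global_blob;

	/* Publisher: initialize fully, then make the pointer visible. */
	static int blob_publish(int a, int b)
	{
		struct blob *p = kmalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return -ENOMEM;
		p->a = a;
		p->b = b;
		/* Orders the initialization above before the pointer store. */
		rcu_assign_pointer(global_blob, p);
		return 0;
	}

	/*
	 * Subscriber: rcu_dereference() orders the load of ->a after the load
	 * of the pointer, even on DEC Alpha, and keeps the compiler honest.
	 */
	static int blob_read_a(void)
	{
		struct blob *p;
		int ret = 0;

		rcu_read_lock();
		p = rcu_dereference(global_blob);
		if (p)
			ret = p->a;
		rcu_read_unlock();
		return ret;
	}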
@@ -34,7 +35,7 @@ Follow these rules to keep your RCU code working properly:
 takes on the role of the lockless_dereference() primitive that
 was removed in v4.15.
 
-- You are only permitted to use rcu_dereference on pointer values.
+- You are only permitted to use rcu_dereference() on pointer values.
 The compiler simply knows too much about integral values to
 trust it to carry dependencies through integer operations.
 There are a very few exceptions, namely that you can temporarily
@@ -240,6 +241,7 @@ precautions. To see this, consider the following code fragment::
 struct foo *q;
 int r1, r2;
 
+rcu_read_lock();
 p = rcu_dereference(gp2);
 if (p == NULL)
 return;
@@ -248,7 +250,10 @@ precautions. To see this, consider the following code fragment::
 if (p == q) {
 /* The compiler decides that q->c is same as p->c. */
 r2 = p->c; /* Could get 44 on weakly order system. */
+} else {
+r2 = p->c - r1; /* Unconditional access to p->c. */
 }
+rcu_read_unlock();
 do_something_with(r1, r2);
 }
 
@@ -297,6 +302,7 @@ Then one approach is to use locking, for example, as follows::
 struct foo *q;
 int r1, r2;
 
+rcu_read_lock();
 p = rcu_dereference(gp2);
 if (p == NULL)
 return;
@@ -306,7 +312,12 @@ Then one approach is to use locking, for example, as follows::
 if (p == q) {
 /* The compiler decides that q->c is same as p->c. */
 r2 = p->c; /* Locking guarantees r2 == 144. */
+} else {
+spin_lock(&q->lock);
+r2 = q->c - r1;
+spin_unlock(&q->lock);
 }
+rcu_read_unlock();
 spin_unlock(&p->lock);
 do_something_with(r1, r2);
 }
@@ -364,7 +375,7 @@ the exact value of "p" even in the not-equals case. This allows the
 compiler to make the return values independent of the load from "gp",
 in turn destroying the ordering between this load and the loads of the
 return values. This can result in "p->b" returning pre-initialization
-garbage values.
+garbage values on weakly ordered systems.
 
 In short, rcu_dereference() is *not* optional when you are going to
 dereference the resulting pointer.
@@ -430,7 +441,7 @@ member of the rcu_dereference() to use in various situations:
 SPARSE CHECKING OF RCU-PROTECTED POINTERS
 -----------------------------------------
 
-The sparse static-analysis tool checks for direct access to RCU-protected
+The sparse static-analysis tool checks for non-RCU access to RCU-protected
 pointers, which can result in "interesting" bugs due to compiler
 optimizations involving invented loads and perhaps also load tearing.
 For example, suppose someone mistakenly does something like this::
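A minimal sketch of the __rcu annotation that sparse checks; struct conf and cur_conf are invented names, and the sparse run is the usual "make C=1"::

	#include <linux/rcupdate.h>

	struct conf {
		int level;
	};

	/* The __rcu annotation marks this pointer as RCU-protected for sparse. */
	static struct conf __rcu *cur_conf;

	static int conf_level(void)
	{
		struct conf *c;
		int level = 0;

		rcu_read_lock();
		/*
		 * Correct: rcu_dereference() strips the __rcu address space
		 * and adds the needed compiler/ordering constraints.  A bare
		 * "cur_conf->level" here would instead draw a sparse warning
		 * (typically "dereference of noderef expression").
		 */
		c = rcu_dereference(cur_conf);
		if (c)
			level = c->level;
		rcu_read_unlock();
		return level;
	}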
@@ -5,37 +5,12 @@ RCU and Unloadable Modules
 
 [Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]
 
-RCU (read-copy update) is a synchronization mechanism that can be thought
-of as a replacement for read-writer locking (among other things), but with
-very low-overhead readers that are immune to deadlock, priority inversion,
-and unbounded latency. RCU read-side critical sections are delimited
-by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPTION
-kernels, generate no code whatsoever.
-
-This means that RCU writers are unaware of the presence of concurrent
-readers, so that RCU updates to shared data must be undertaken quite
-carefully, leaving an old version of the data structure in place until all
-pre-existing readers have finished. These old versions are needed because
-such readers might hold a reference to them. RCU updates can therefore be
-rather expensive, and RCU is thus best suited for read-mostly situations.
-
-How can an RCU writer possibly determine when all readers are finished,
-given that readers might well leave absolutely no trace of their
-presence? There is a synchronize_rcu() primitive that blocks until all
-pre-existing readers have completed. An updater wishing to delete an
-element p from a linked list might do the following, while holding an
-appropriate lock, of course::
-
-	list_del_rcu(p);
-	synchronize_rcu();
-	kfree(p);
-
-But the above code cannot be used in IRQ context -- the call_rcu()
-primitive must be used instead. This primitive takes a pointer to an
-rcu_head struct placed within the RCU-protected data structure and
-another pointer to a function that may be invoked later to free that
-structure. Code to delete an element p from the linked list from IRQ
-context might then be as follows::
+RCU updaters sometimes use call_rcu() to initiate an asynchronous wait for
+a grace period to elapse. This primitive takes a pointer to an rcu_head
+struct placed within the RCU-protected data structure and another pointer
+to a function that may be invoked later to free that structure. Code to
+delete an element p from the linked list from IRQ context might then be
+as follows::
 
 	list_del_rcu(p);
 	call_rcu(&p->rcu, p_callback);
@@ -54,7 +29,7 @@ IRQ context. The function p_callback() might be defined as follows::
 Unloading Modules That Use call_rcu()
 -------------------------------------
 
-But what if p_callback is defined in an unloadable module?
+But what if the p_callback() function is defined in an unloadable module?
 
 If we unload the module while some RCU callbacks are pending,
 the CPUs executing these callbacks are going to be severely
@@ -67,20 +42,21 @@ grace period to elapse, it does not wait for the callbacks to complete.
 
 One might be tempted to try several back-to-back synchronize_rcu()
 calls, but this is still not guaranteed to work. If there is a very
-heavy RCU-callback load, then some of the callbacks might be deferred
-in order to allow other processing to proceed. Such deferral is required
-in realtime kernels in order to avoid excessive scheduling latencies.
+heavy RCU-callback load, then some of the callbacks might be deferred in
+order to allow other processing to proceed. For but one example, such
+deferral is required in realtime kernels in order to avoid excessive
+scheduling latencies.
 
 
 rcu_barrier()
 -------------
 
-We instead need the rcu_barrier() primitive. Rather than waiting for
-a grace period to elapse, rcu_barrier() waits for all outstanding RCU
-callbacks to complete. Please note that rcu_barrier() does **not** imply
-synchronize_rcu(), in particular, if there are no RCU callbacks queued
-anywhere, rcu_barrier() is within its rights to return immediately,
-without waiting for a grace period to elapse.
+This situation can be handled by the rcu_barrier() primitive. Rather
+than waiting for a grace period to elapse, rcu_barrier() waits for all
+outstanding RCU callbacks to complete. Please note that rcu_barrier()
+does **not** imply synchronize_rcu(), in particular, if there are no RCU
+callbacks queued anywhere, rcu_barrier() is within its rights to return
+immediately, without waiting for anything, let alone a grace period.
 
 Pseudo-code using rcu_barrier() is as follows:
 
@@ -89,83 +65,86 @@ Pseudo-code using rcu_barrier() is as follows:
 3. Allow the module to be unloaded.
 
 There is also an srcu_barrier() function for SRCU, and you of course
-must match the flavor of rcu_barrier() with that of call_rcu(). If your
-module uses multiple flavors of call_rcu(), then it must also use multiple
-flavors of rcu_barrier() when unloading that module. For example, if
-it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
-srcu_struct_2, then the following three lines of code will be required
-when unloading::
+must match the flavor of srcu_barrier() with that of call_srcu().
+If your module uses multiple srcu_struct structures, then it must also
+use multiple invocations of srcu_barrier() when unloading that module.
+For example, if it uses call_rcu(), call_srcu() on srcu_struct_1, and
+call_srcu() on srcu_struct_2, then the following three lines of code
+will be required when unloading::
 
 1 rcu_barrier();
 2 srcu_barrier(&srcu_struct_1);
 3 srcu_barrier(&srcu_struct_2);
 
-The rcutorture module makes use of rcu_barrier() in its exit function
-as follows::
+If latency is of the essence, workqueues could be used to run these
+three functions concurrently.
+
+An ancient version of the rcutorture module makes use of rcu_barrier()
+in its exit function as follows::
 
 1 static void
 2 rcu_torture_cleanup(void)
 3 {
 4 int i;
 5
 6 fullstop = 1;
 7 if (shuffler_task != NULL) {
 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
 9 kthread_stop(shuffler_task);
 10 }
 11 shuffler_task = NULL;
 12
 13 if (writer_task != NULL) {
 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
 15 kthread_stop(writer_task);
 16 }
 17 writer_task = NULL;
 18
 19 if (reader_tasks != NULL) {
 20 for (i = 0; i < nrealreaders; i++) {
 21 if (reader_tasks[i] != NULL) {
 22 VERBOSE_PRINTK_STRING(
 23 "Stopping rcu_torture_reader task");
 24 kthread_stop(reader_tasks[i]);
 25 }
 26 reader_tasks[i] = NULL;
 27 }
 28 kfree(reader_tasks);
 29 reader_tasks = NULL;
 30 }
 31 rcu_torture_current = NULL;
 32
 33 if (fakewriter_tasks != NULL) {
 34 for (i = 0; i < nfakewriters; i++) {
 35 if (fakewriter_tasks[i] != NULL) {
 36 VERBOSE_PRINTK_STRING(
 37 "Stopping rcu_torture_fakewriter task");
 38 kthread_stop(fakewriter_tasks[i]);
 39 }
 40 fakewriter_tasks[i] = NULL;
 41 }
 42 kfree(fakewriter_tasks);
 43 fakewriter_tasks = NULL;
 44 }
 45
 46 if (stats_task != NULL) {
 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
 48 kthread_stop(stats_task);
 49 }
 50 stats_task = NULL;
 51
 52 /* Wait for all RCU callbacks to fire. */
 53 rcu_barrier();
 54
 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
 56
 57 if (cur_ops->cleanup != NULL)
 58 cur_ops->cleanup();
 59 if (atomic_read(&n_rcu_torture_error))
 60 rcu_torture_print_module_parms("End of test: FAILURE");
 61 else
 62 rcu_torture_print_module_parms("End of test: SUCCESS");
 63 }
 
 Line 6 sets a global variable that prevents any RCU callbacks from
 re-posting themselves. This will not be necessary in most cases, since
@@ -190,16 +169,17 @@ Quick Quiz #1:
 :ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`
 
 Your module might have additional complications. For example, if your
-module invokes call_rcu() from timers, you will need to first cancel all
-the timers, and only then invoke rcu_barrier() to wait for any remaining
+module invokes call_rcu() from timers, you will need to first refrain
+from posting new timers, cancel (or wait for) all the already-posted
+timers, and only then invoke rcu_barrier() to wait for any remaining
 RCU callbacks to complete.
 
-Of course, if you module uses call_rcu(), you will need to invoke
+Of course, if your module uses call_rcu(), you will need to invoke
 rcu_barrier() before unloading. Similarly, if your module uses
 call_srcu(), you will need to invoke srcu_barrier() before unloading,
 and on the same srcu_struct structure. If your module uses call_rcu()
-**and** call_srcu(), then you will need to invoke rcu_barrier() **and**
-srcu_barrier().
+**and** call_srcu(), then (as noted above) you will need to invoke
+rcu_barrier() **and** srcu_barrier().
 
 
 Implementing rcu_barrier()
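Putting the advice above together, a module's exit path might look roughly like the following sketch. All demo_* names are invented, and the timer handler that checks demo_exiting before re-arming is not shown::

	#include <linux/module.h>
	#include <linux/rcupdate.h>
	#include <linux/srcu.h>
	#include <linux/timer.h>

	static struct timer_list demo_timer;	/* set up via timer_setup() at init */
	DEFINE_STATIC_SRCU(demo_srcu);
	static bool demo_exiting;

	static void __exit demo_exit(void)
	{
		/* 1. Stop posting new callbacks: quiesce the callback sources. */
		WRITE_ONCE(demo_exiting, true);	/* handler stops re-arming */
		del_timer_sync(&demo_timer);

		/* 2. Wait for callbacks already queued via call_rcu(). */
		rcu_barrier();

		/* 3. Separately wait for callbacks queued via call_srcu(). */
		srcu_barrier(&demo_srcu);

		/* Only now is it safe for the callback code to be unloaded. */
	}
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");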
@@ -211,27 +191,40 @@ queues. His implementation queues an RCU callback on each of the per-CPU
 callback queues, and then waits until they have all started executing, at
 which point, all earlier RCU callbacks are guaranteed to have completed.
 
-The original code for rcu_barrier() was as follows::
+The original code for rcu_barrier() was roughly as follows::
 
 1 void rcu_barrier(void)
 2 {
 3 BUG_ON(in_interrupt());
 4 /* Take cpucontrol mutex to protect against CPU hotplug */
 5 mutex_lock(&rcu_barrier_mutex);
 6 init_completion(&rcu_barrier_completion);
-7 atomic_set(&rcu_barrier_cpu_count, 0);
+7 atomic_set(&rcu_barrier_cpu_count, 1);
 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
-9 wait_for_completion(&rcu_barrier_completion);
-10 mutex_unlock(&rcu_barrier_mutex);
-11 }
+9 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+10 complete(&rcu_barrier_completion);
+11 wait_for_completion(&rcu_barrier_completion);
+12 mutex_unlock(&rcu_barrier_mutex);
+13 }
 
-Line 3 verifies that the caller is in process context, and lines 5 and 10
+Line 3 verifies that the caller is in process context, and lines 5 and 12
 use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
 global completion and counters at a time, which are initialized on lines
 6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is
 shown below. Note that the final "1" in on_each_cpu()'s argument list
 ensures that all the calls to rcu_barrier_func() will have completed
-before on_each_cpu() returns. Line 9 then waits for the completion.
+before on_each_cpu() returns. Line 9 removes the initial count from
+rcu_barrier_cpu_count, and if this count is now zero, line 10 finalizes
+the completion, which prevents line 11 from blocking. Either way,
+line 11 then waits (if needed) for the completion.
+
+.. _rcubarrier_quiz_2:
+
+Quick Quiz #2:
+Why doesn't line 8 initialize rcu_barrier_cpu_count to zero,
+thereby avoiding the need for lines 9 and 10?
+
+:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
 
 This code was rewritten in 2008 and several times thereafter, but this
 still gives the general idea.
@@ -239,21 +232,21 @@ still gives the general idea.
 The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
 to post an RCU callback, as follows::
 
 1 static void rcu_barrier_func(void *notused)
 2 {
 3 int cpu = smp_processor_id();
 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 5 struct rcu_head *head;
 6
 7 head = &rdp->barrier;
 8 atomic_inc(&rcu_barrier_cpu_count);
 9 call_rcu(head, rcu_barrier_callback);
 10 }
 
 Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
 which contains the struct rcu_head that needed for the later call to
 call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line
-8 increments a global counter. This counter will later be decremented
+8 increments the global counter. This counter will later be decremented
 by the callback. Line 9 then registers the rcu_barrier_callback() on
 the current CPU's queue.
 
@@ -261,33 +254,34 @@ The rcu_barrier_callback() function simply atomically decrements the
 rcu_barrier_cpu_count variable and finalizes the completion when it
 reaches zero, as follows::
 
 1 static void rcu_barrier_callback(struct rcu_head *notused)
 2 {
 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
 4 complete(&rcu_barrier_completion);
 5 }
 
-.. _rcubarrier_quiz_2:
+.. _rcubarrier_quiz_3:
 
-Quick Quiz #2:
+Quick Quiz #3:
 What happens if CPU 0's rcu_barrier_func() executes
 immediately (thus incrementing rcu_barrier_cpu_count to the
 value one), but the other CPU's rcu_barrier_func() invocations
 are delayed for a full grace period? Couldn't this result in
 rcu_barrier() returning prematurely?
 
-:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
+:ref:`Answer to Quick Quiz #3 <answer_rcubarrier_quiz_3>`
 
 The current rcu_barrier() implementation is more complex, due to the need
 to avoid disturbing idle CPUs (especially on battery-powered systems)
 and the need to minimally disturb non-idle CPUs in real-time systems.
-However, the code above illustrates the concepts.
+In addition, a great many optimizations have been applied. However,
+the code above illustrates the concepts.
 
 
 rcu_barrier() Summary
 ---------------------
 
-The rcu_barrier() primitive has seen relatively little use, since most
+The rcu_barrier() primitive is used relatively infrequently, since most
 code using RCU is in the core kernel rather than in modules. However, if
 you are using RCU from an unloadable module, you need to use rcu_barrier()
 so that your module may be safely unloaded.
@@ -302,7 +296,8 @@ Quick Quiz #1:
 Is there any other situation where rcu_barrier() might
 be required?
 
-Answer: Interestingly enough, rcu_barrier() was not originally
+Answer:
+Interestingly enough, rcu_barrier() was not originally
 implemented for module unloading. Nikita Danilov was using
 RCU in a filesystem, which resulted in a similar situation at
 filesystem-unmount time. Dipankar Sarma coded up rcu_barrier()
@@ -318,13 +313,48 @@ Answer: Interestingly enough, rcu_barrier() was not originally
 .. _answer_rcubarrier_quiz_2:
 
 Quick Quiz #2:
+Why doesn't line 8 initialize rcu_barrier_cpu_count to zero,
+thereby avoiding the need for lines 9 and 10?
+
+Answer:
+Suppose that the on_each_cpu() function shown on line 8 was
+delayed, so that CPU 0's rcu_barrier_func() executed and
+the corresponding grace period elapsed, all before CPU 1's
+rcu_barrier_func() started executing. This would result in
+rcu_barrier_cpu_count being decremented to zero, so that line
+11's wait_for_completion() would return immediately, failing to
+wait for CPU 1's callbacks to be invoked.
+
+Note that this was not a problem when the rcu_barrier() code
+was first added back in 2005. This is because on_each_cpu()
+disables preemption, which acted as an RCU read-side critical
+section, thus preventing CPU 0's grace period from completing
+until on_each_cpu() had dealt with all of the CPUs. However,
+with the advent of preemptible RCU, rcu_barrier() no longer
+waited on nonpreemptible regions of code in preemptible kernels,
+that being the job of the new rcu_barrier_sched() function.
+
+However, with the RCU flavor consolidation around v4.20, this
+possibility was once again ruled out, because the consolidated
+RCU once again waits on nonpreemptible regions of code.
+
+Nevertheless, that extra count might still be a good idea.
+Relying on these sort of accidents of implementation can result
+in later surprise bugs when the implementation changes.
+
+:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
+
+.. _answer_rcubarrier_quiz_3:
+
+Quick Quiz #3:
 What happens if CPU 0's rcu_barrier_func() executes
 immediately (thus incrementing rcu_barrier_cpu_count to the
 value one), but the other CPU's rcu_barrier_func() invocations
 are delayed for a full grace period? Couldn't this result in
 rcu_barrier() returning prematurely?
 
-Answer: This cannot happen. The reason is that on_each_cpu() has its last
+Answer:
+This cannot happen. The reason is that on_each_cpu() has its last
 argument, the wait flag, set to "1". This flag is passed through
 to smp_call_function() and further to smp_call_function_on_cpu(),
 causing this latter to spin until the cross-CPU invocation of
@@ -14,19 +14,19 @@ Using 'nulls'
 =============
 
 Using special makers (called 'nulls') is a convenient way
-to solve following problem :
+to solve following problem.
 
-A typical RCU linked list managing objects which are
-allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
-use following algos :
+Without 'nulls', a typical RCU linked list managing objects which are
+allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can use the following
+algorithms:
 
-1) Lookup algo
---------------
+1) Lookup algorithm
+-------------------
 
 ::
 
-rcu_read_lock()
 begin:
+rcu_read_lock()
 obj = lockless_lookup(key);
 if (obj) {
 if (!try_get_ref(obj)) // might fail for free objects
@@ -38,6 +38,7 @@ use following algos :
 */
 if (obj->key != key) { // not the object we expected
 put_ref(obj);
+rcu_read_unlock();
 goto begin;
 }
 }
@@ -52,9 +53,9 @@ but a version with an additional memory barrier (smp_rmb())
 {
 struct hlist_node *node, *next;
 for (pos = rcu_dereference((head)->first);
 pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
 ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
 pos = rcu_dereference(next))
 if (obj->key == key)
 return obj;
 return NULL;
@@ -64,9 +65,9 @@ And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
 
 struct hlist_node *node;
 for (pos = rcu_dereference((head)->first);
 pos && ({ prefetch(pos->next); 1; }) &&
 ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
 pos = rcu_dereference(pos->next))
 if (obj->key == key)
 return obj;
 return NULL;
@@ -82,36 +83,32 @@ Quoting Corey Minyard::
 solved by pre-fetching the "next" field (with proper barriers) before
 checking the key."
 
-2) Insert algo
---------------
+2) Insertion algorithm
+----------------------
 
 We need to make sure a reader cannot read the new 'obj->obj_next' value
-and previous value of 'obj->key'. Or else, an item could be deleted
+and previous value of 'obj->key'. Otherwise, an item could be deleted
 from a chain, and inserted into another chain. If new chain was empty
-before the move, 'next' pointer is NULL, and lockless reader can
-not detect it missed following items in original chain.
+before the move, 'next' pointer is NULL, and lockless reader can not
+detect the fact that it missed following items in original chain.
 
 ::
 
 /*
 * Please note that new inserts are done at the head of list,
 * not in the middle or end.
 */
 obj = kmem_cache_alloc(...);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
-/*
-* we need to make sure obj->key is updated before obj->next
-* or obj->refcnt
-*/
-smp_wmb();
-atomic_set(&obj->refcnt, 1);
+atomic_set_release(&obj->refcnt, 1); // key before refcnt
 hlist_add_head_rcu(&obj->obj_node, list);
 unlock_chain(); // typically a spin_unlock()
 
 
-3) Remove algo
---------------
+3) Removal algorithm
+--------------------
 
 Nothing special here, we can use a standard RCU hlist deletion.
 But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
 very very fast (before the end of RCU grace period)
@@ -133,7 +130,7 @@ Avoiding extra smp_rmb()
 ========================
 
 With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
-and extra smp_wmb() in insert function.
+and extra _release() in insert function.
 
 For example, if we choose to store the slot number as the 'nulls'
 end-of-list marker for each slot of the hash table, we can detect
@@ -142,59 +139,61 @@ to another chain) checking the final 'nulls' value if
 the lookup met the end of chain. If final 'nulls' value
 is not the slot number, then we must restart the lookup at
 the beginning. If the object was moved to the same chain,
-then the reader doesn't care : It might eventually
+then the reader doesn't care: It might occasionally
 scan the list again without harm.
 
 
-1) lookup algo
---------------
+1) lookup algorithm
+-------------------
 
 ::
 
 head = &table[slot];
-rcu_read_lock();
 begin:
+rcu_read_lock();
 hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
 if (obj->key == key) {
-if (!try_get_ref(obj)) // might fail for free objects
-goto begin;
-if (obj->key != key) { // not the object we expected
-put_ref(obj);
+if (!try_get_ref(obj)) { // might fail for free objects
+rcu_read_unlock();
 goto begin;
 }
-goto out;
+if (obj->key != key) { // not the object we expected
+put_ref(obj);
+rcu_read_unlock();
+goto begin;
+}
+goto out;
+}
+}
 
+// If the nulls value we got at the end of this lookup is
+// not the expected one, we must restart lookup.
+// We probably met an item that was moved to another chain.
+if (get_nulls_value(node) != slot) {
+put_ref(obj);
+rcu_read_unlock();
+goto begin;
 }
-/*
-* if the nulls value we got at the end of this lookup is
-* not the expected one, we must restart lookup.
-* We probably met an item that was moved to another chain.
-*/
-if (get_nulls_value(node) != slot)
-goto begin;
 obj = NULL;
 
 out:
 rcu_read_unlock();
 
-2) Insert function
-------------------
+2) Insert algorithm
+-------------------
 
 ::
 
 /*
 * Please note that new inserts are done at the head of list,
 * not in the middle or end.
 */
 obj = kmem_cache_alloc(cachep);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
+atomic_set_release(&obj->refcnt, 1); // key before refcnt
 /*
-* changes to obj->key must be visible before refcnt one
+* insert obj in RCU way (readers might be traversing chain)
 */
-smp_wmb();
-atomic_set(&obj->refcnt, 1);
-/*
-* insert obj in RCU way (readers might be traversing chain)
-*/
 hlist_nulls_add_head_rcu(&obj->obj_node, list);
 unlock_chain(); // typically a spin_unlock()
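The lookup and insert algorithms above leave the hash table, the SLAB_TYPESAFE_BY_RCU cache, and the reference counting implicit. The following sketch fills in one plausible layout; all demo_* names are invented, not part of the document::

	#include <linux/atomic.h>
	#include <linux/errno.h>
	#include <linux/init.h>
	#include <linux/list_nulls.h>
	#include <linux/rculist_nulls.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct demo_entry {
		struct hlist_nulls_node node;
		unsigned long key;
		atomic_t refcnt;
	};

	#define DEMO_SLOTS 16
	static struct hlist_nulls_head demo_table[DEMO_SLOTS];
	static spinlock_t demo_locks[DEMO_SLOTS];
	static struct kmem_cache *demo_cache;

	static int __init demo_setup(void)
	{
		int i;

		for (i = 0; i < DEMO_SLOTS; i++) {
			/* Each chain ends in a 'nulls' marker encoding its slot. */
			INIT_HLIST_NULLS_HEAD(&demo_table[i], i);
			spin_lock_init(&demo_locks[i]);
		}

		/*
		 * SLAB_TYPESAFE_BY_RCU: freed objects may be reused right away,
		 * but the memory stays type-stable across the grace period,
		 * which is why the lookup must recheck ->key after taking a
		 * reference.
		 */
		demo_cache = kmem_cache_create("demo_entry",
					       sizeof(struct demo_entry), 0,
					       SLAB_TYPESAFE_BY_RCU, NULL);
		return demo_cache ? 0 : -ENOMEM;
	}

	/* The try_get_ref() used by the pseudo-code above, spelled out. */
	static bool demo_try_get_ref(struct demo_entry *e)
	{
		return atomic_inc_not_zero(&e->refcnt);
	}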
@@ -25,10 +25,10 @@ warnings:
 
 - A CPU looping with bottom halves disabled.
 
-- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel
-without invoking schedule(). If the looping in the kernel is
-really expected and desirable behavior, you might need to add
-some calls to cond_resched().
+- For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
+kernel without potentially invoking schedule(). If the looping
+in the kernel is really expected and desirable behavior, you
+might need to add some calls to cond_resched().
 
 - Booting Linux using a console connection that is too slow to
 keep up with the boot-time console-message rate. For example,
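As an illustration of the cond_resched() advice in the item above, here is a hypothetical long-running loop (the function and its arguments are invented)::

	#include <linux/bitops.h>
	#include <linux/sched.h>

	/*
	 * A long-running loop in process context.  Without the periodic
	 * cond_resched(), a !CONFIG_PREEMPTION kernel could spin here for
	 * longer than the RCU CPU stall timeout and trigger a warning.
	 */
	static void demo_scrub(unsigned long *bitmap, unsigned long nbits)
	{
		unsigned long i;

		for (i = 0; i < nbits; i++) {
			if (test_and_clear_bit(i, bitmap)) {
				/* ... expensive per-bit work ... */
			}

			/* Give the scheduler (and thus RCU) a chance. */
			if ((i % 1024) == 0)
				cond_resched();
		}
	}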
@@ -108,16 +108,17 @@ warnings:
 
 - A bug in the RCU implementation.
 
-- A hardware failure. This is quite unlikely, but has occurred
-at least once in real life. A CPU failed in a running system,
-becoming unresponsive, but not causing an immediate crash.
-This resulted in a series of RCU CPU stall warnings, eventually
-leading the realization that the CPU had failed.
+- A hardware failure. This is quite unlikely, but is not at all
+uncommon in large datacenter. In one memorable case some decades
+back, a CPU failed in a running system, becoming unresponsive,
+but not causing an immediate crash. This resulted in a series
+of RCU CPU stall warnings, eventually leading the realization
+that the CPU had failed.
 
-The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
-Note that SRCU does *not* have CPU stall warnings. Please note that
-RCU only detects CPU stalls when there is a grace period in progress.
-No grace period, no CPU stall warnings.
+The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
+CPU stall warning. Note that SRCU does *not* have CPU stall warnings.
+Please note that RCU only detects CPU stalls when there is a grace period
+in progress. No grace period, no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
@@ -205,16 +206,21 @@ RCU_STALL_RAT_DELAY
 rcupdate.rcu_task_stall_timeout
 -------------------------------
 
-This boot/sysfs parameter controls the RCU-tasks stall warning
-interval. A value of zero or less suppresses RCU-tasks stall
-warnings. A positive value sets the stall-warning interval
-in seconds. An RCU-tasks stall warning starts with the line:
+This boot/sysfs parameter controls the RCU-tasks and
+RCU-tasks-trace stall warning intervals. A value of zero or less
+suppresses RCU-tasks stall warnings. A positive value sets the
+stall-warning interval in seconds. An RCU-tasks stall warning
+starts with the line:
 
 INFO: rcu_tasks detected stalls on tasks:
 
 And continues with the output of sched_show_task() for each
 task stalling the current RCU-tasks grace period.
 
+An RCU-tasks-trace stall warning starts (and continues) similarly:
+
+INFO: rcu_tasks_trace detected stalls on tasks
+
 
 Interpreting RCU's CPU Stall-Detector "Splats"
 ==============================================
@@ -248,7 +254,8 @@ dynticks counter, which will have an even-numbered value if the CPU
 is in dyntick-idle mode and an odd-numbered value otherwise.  The hex
 number between the two "/"s is the value of the nesting, which will be
 a small non-negative number if in the idle loop (as shown above) and a
-very large positive number otherwise.
+very large positive number otherwise.  The number following the final
+"/" is the NMI nesting, which will be a small non-negative number.

 The "softirq=" portion of the message tracks the number of RCU softirq
 handlers that the stalled CPU has executed.  The number before the "/"
@@ -383,3 +390,95 @@ for example, "P3421".

 It is entirely possible to see stall warnings from normal and from
 expedited grace periods at about the same time during the same run.

+RCU_CPU_STALL_CPUTIME
+=====================
+
+In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
+rcupdate.rcu_cpu_stall_cputime=1, the following additional information
+is supplied with each RCU CPU stall warning::
+
+  rcu:          hardirqs   softirqs   csw/system
+  rcu:  number:      624         45            0
+  rcu: cputime:       69          1         2425   ==> 2500(ms)
+
+These statistics are collected during the sampling period.  The values
+in row "number:" are the number of hard interrupts, number of soft
+interrupts, and number of context switches on the stalled CPU.  The
+first three values in row "cputime:" indicate the CPU time in
+milliseconds consumed by hard interrupts, soft interrupts, and tasks
+on the stalled CPU.  The last number is the measurement interval, again
+in milliseconds.  Because user-mode tasks normally do not cause RCU CPU
+stalls, these tasks are typically kernel tasks, which is why only the
+system CPU time is considered.
+
+The sampling period is shown as follows::
+
+  |<------------first timeout---------->|<-----second timeout----->|
+  |<--half timeout-->|<--half timeout-->|                          |
+  |                  |<--first period-->|                          |
+  |                  |<-----------second sampling period---------->|
+  |                  |                  |                          |
+           snapshot time point      1st-stall                 2nd-stall
+
+The following describes four typical scenarios:
+
+1. A CPU looping with interrupts disabled.
+
+   ::
+
+     rcu:          hardirqs   softirqs   csw/system
+     rcu:  number:        0          0            0
+     rcu: cputime:        0          0            0   ==> 2500(ms)
+
+   Because interrupts have been disabled throughout the measurement
+   interval, there are no interrupts and no context switches.
+   Furthermore, because CPU time consumption was measured using interrupt
+   handlers, the system CPU consumption is misleadingly measured as zero.
+   This scenario will normally also have "(0 ticks this GP)" printed on
+   this CPU's summary line.
+
+2. A CPU looping with bottom halves disabled.
+
+   This is similar to the previous example, but with a non-zero number
+   of, and CPU time consumed by, hard interrupts, along with non-zero
+   CPU time consumed by in-kernel execution::
+
+     rcu:          hardirqs   softirqs   csw/system
+     rcu:  number:      624          0            0
+     rcu: cputime:       49          0         2446   ==> 2500(ms)
+
+   The fact that there are zero softirqs gives a hint that these were
+   disabled, perhaps via local_bh_disable().  It is of course possible
+   that there were no softirqs, perhaps because all events that would
+   result in softirq execution are confined to other CPUs.  In this case,
+   the diagnosis should continue as shown in the next example.
+
+3. A CPU looping with preemption disabled.
+
+   Here, only the number of context switches is zero::
+
+     rcu:          hardirqs   softirqs   csw/system
+     rcu:  number:      624         45            0
+     rcu: cputime:       69          1         2425   ==> 2500(ms)
+
+   This situation hints that the stalled CPU was looping with preemption
+   disabled.
+
+4. No looping, but massive hard and soft interrupts.
+
+   ::
+
+     rcu:          hardirqs   softirqs   csw/system
+     rcu:  number:       xx         xx            0
+     rcu: cputime:       xx         xx            0   ==> 2500(ms)
+
+   Here, the number and CPU time of hard interrupts are all non-zero,
+   but the number of context switches and the in-kernel CPU time consumed
+   are zero.  The number and cputime of soft interrupts will usually be
+   non-zero, but could be zero, for example, if the CPU was spinning
+   within a single hard interrupt handler.
+
+   If this type of RCU CPU stall warning can be reproduced, you can
+   narrow it down by looking at /proc/interrupts or by writing code to
+   trace each interrupt, for example, by referring to show_interrupts().
@@ -206,7 +206,11 @@ values for memory may require disabling the callback-flooding tests
 using the --bootargs parameter discussed below.

 Sometimes additional debugging is useful, and in such cases the --kconfig
-parameter to kvm.sh may be used, for example, ``--kconfig 'CONFIG_KASAN=y'``.
+parameter to kvm.sh may be used, for example, ``--kconfig 'CONFIG_RCU_EQS_DEBUG=y'``.
+In addition, there are the --gdb, --kasan, and --kcsan parameters.
+Note that --gdb limits you to one scenario per kvm.sh run and requires
+that you have another window open from which to run ``gdb`` as instructed
+by the script.

 Kernel boot arguments can also be supplied, for example, to control
 rcutorture's module parameters.  For example, to test a change to RCU's
@@ -219,10 +223,17 @@ require disabling rcutorture's callback-flooding tests::
    --bootargs 'rcutorture.fwd_progress=0'

 Sometimes all that is needed is a full set of kernel builds.  This is
-what the --buildonly argument does.
+what the --buildonly parameter does.

-Finally, the --trust-make argument allows each kernel build to reuse what
-it can from the previous kernel build.
+The --duration parameter can override the default run time of 30 minutes.
+For example, ``--duration 2d`` would run for two days, ``--duration 3h``
+would run for three hours, ``--duration 5m`` would run for five minutes,
+and ``--duration 45s`` would run for 45 seconds.  This last can be useful
+for tracking down rare boot-time failures.
+
+Finally, the --trust-make parameter allows each kernel build to reuse what
+it can from the previous kernel build.  Please note that without the
+--trust-make parameter, your tags files may be demolished.

 There are additional more arcane arguments that are documented in the
 source code of the kvm.sh script.
@@ -291,3 +302,73 @@ the following summary at the end of the run on a 12-CPU system::
    TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
    CPU count limited from 16 to 12
    TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
+
+
+Repeated Runs
+=============
+
+Suppose that you are chasing down a rare boot-time failure.  Although you
+could use kvm.sh, doing so will rebuild the kernel on each run.  If you
+need (say) 1,000 runs to have confidence that you have fixed the bug,
+these pointless rebuilds can become extremely annoying.
+
+This is why kvm-again.sh exists.
+
+Suppose that a previous kvm.sh run left its output in this directory::
+
+   tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
+
+Then this run can be re-run without rebuilding as follows::
+
+   kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28
+
+A few of the original run's kvm.sh parameters may be overridden, perhaps
+most notably --duration and --bootargs.  For example::
+
+   kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 \
+      --duration 45s
+
+would re-run the previous test, but for only 45 seconds, thus facilitating
+tracking down the aforementioned rare boot-time failure.
+
+
+Distributed Runs
+================
+
+Although kvm.sh is quite useful, its testing is confined to a single
+system.  It is not all that hard to use your favorite framework to cause
+(say) 5 instances of kvm.sh to run on your 5 systems, but this will very
+likely unnecessarily rebuild kernels.  In addition, manually distributing
+the desired rcutorture scenarios across the available systems can be
+painstaking and error-prone.
+
+And this is why the kvm-remote.sh script exists.
+
+If the following command works::
+
+   ssh system0 date
+
+and if it also works for system1, system2, system3, system4, and system5,
+and all of these systems have 64 CPUs, you can type::
+
+   kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
+      --cpus 64 --duration 8h --configs "5*CFLIST"
+
+This will build each default scenario's kernel on the local system, then
+spread each of five instances of each scenario over the systems listed,
+running each scenario for eight hours.  At the end of the runs, the
+results will be gathered, recorded, and printed.  Most of the parameters
+that kvm.sh will accept can be passed to kvm-remote.sh, but the list of
+systems must come first.
+
+The kvm.sh ``--dryrun scenarios`` argument is useful for working out
+how many scenarios may be run in one batch across a group of systems.
+
+You can also re-run a previous remote run in a manner similar to kvm.sh::
+
+   kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
+      tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
+      --duration 24h
+
+In this case, most of the kvm-again.sh parameters may be supplied following
+the pathname of the old run-results directory.
@@ -16,18 +16,23 @@ to start learning about RCU:
 | 6.  The RCU API, 2019 Edition          https://lwn.net/Articles/777036/
 |     2019 Big API Table                 https://lwn.net/Articles/777165/

+For those preferring video:
+
+| 1.  Unraveling RCU Mysteries: Fundamentals          https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
+| 2.  Unraveling RCU Mysteries: Additional Use Cases  https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases
+

 What is RCU?

 RCU is a synchronization mechanism that was added to the Linux kernel
 during the 2.5 development effort that is optimized for read-mostly
-situations.  Although RCU is actually quite simple once you understand it,
-getting there can sometimes be a challenge.  Part of the problem is that
-most of the past descriptions of RCU have been written with the mistaken
-assumption that there is "one true way" to describe RCU.  Instead,
-the experience has been that different people must take different paths
-to arrive at an understanding of RCU.  This document provides several
-different paths, as follows:
+situations.  Although RCU is actually quite simple, making effective use
+of it requires you to think differently about your code.  Another part
+of the problem is the mistaken assumption that there is "one true way" to
+describe and to use RCU.  Instead, the experience has been that different
+people must take different paths to arrive at an understanding of RCU,
+depending on their experiences and use cases.  This document provides
+several different paths, as follows:

 :ref:`1. RCU OVERVIEW <1_whatisRCU>`

@@ -157,34 +162,36 @@ rcu_read_lock()
 ^^^^^^^^^^^^^^^
    void rcu_read_lock(void);

-   Used by a reader to inform the reclaimer that the reader is
-   entering an RCU read-side critical section.  It is illegal
-   to block while in an RCU read-side critical section, though
-   kernels built with CONFIG_PREEMPT_RCU can preempt RCU
-   read-side critical sections.  Any RCU-protected data structure
-   accessed during an RCU read-side critical section is guaranteed to
-   remain unreclaimed for the full duration of that critical section.
-   Reference counts may be used in conjunction with RCU to maintain
-   longer-term references to data structures.
+   This temporal primitive is used by a reader to inform the
+   reclaimer that the reader is entering an RCU read-side critical
+   section.  It is illegal to block while in an RCU read-side
+   critical section, though kernels built with CONFIG_PREEMPT_RCU
+   can preempt RCU read-side critical sections.  Any RCU-protected
+   data structure accessed during an RCU read-side critical section
+   is guaranteed to remain unreclaimed for the full duration of that
+   critical section.  Reference counts may be used in conjunction
+   with RCU to maintain longer-term references to data structures.

 rcu_read_unlock()
 ^^^^^^^^^^^^^^^^^
    void rcu_read_unlock(void);

-   Used by a reader to inform the reclaimer that the reader is
-   exiting an RCU read-side critical section.  Note that RCU
-   read-side critical sections may be nested and/or overlapping.
+   This temporal primitive is used by a reader to inform the
+   reclaimer that the reader is exiting an RCU read-side critical
+   section.  Note that RCU read-side critical sections may be nested
+   and/or overlapping.

 synchronize_rcu()
 ^^^^^^^^^^^^^^^^^
    void synchronize_rcu(void);

-   Marks the end of updater code and the beginning of reclaimer
-   code.  It does this by blocking until all pre-existing RCU
-   read-side critical sections on all CPUs have completed.
-   Note that synchronize_rcu() will **not** necessarily wait for
-   any subsequent RCU read-side critical sections to complete.
-   For example, consider the following sequence of events::
+   This temporal primitive marks the end of updater code and the
+   beginning of reclaimer code.  It does this by blocking until
+   all pre-existing RCU read-side critical sections on all CPUs
+   have completed.  Note that synchronize_rcu() will **not**
+   necessarily wait for any subsequent RCU read-side critical
+   sections to complete.  For example, consider the following
+   sequence of events::

           CPU 0                  CPU 1                 CPU 2
      -----------------   -------------------------   ---------------
@@ -211,13 +218,13 @@ synchronize_rcu()
    to be useful in all but the most read-intensive situations,
    synchronize_rcu()'s overhead must also be quite small.

-   The call_rcu() API is a callback form of synchronize_rcu(),
-   and is described in more detail in a later section.  Instead of
-   blocking, it registers a function and argument which are invoked
-   after all ongoing RCU read-side critical sections have completed.
-   This callback variant is particularly useful in situations where
-   it is illegal to block or where update-side performance is
-   critically important.
+   The call_rcu() API is an asynchronous callback form of
+   synchronize_rcu(), and is described in more detail in a later
+   section.  Instead of blocking, it registers a function and
+   argument which are invoked after all ongoing RCU read-side
+   critical sections have completed.  This callback variant is
+   particularly useful in situations where it is illegal to block
+   or where update-side performance is critically important.

    However, the call_rcu() API should not be used lightly, as use
    of the synchronize_rcu() API generally results in simpler code.
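To make the callback form concrete, here is a minimal sketch of call_rcu() usage; the struct foo, foo_release(), and foo_retire() names are illustrative only, not taken from the document::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
        int a;
        struct rcu_head rcu;
    };

    /* Invoked only after all pre-existing RCU readers have finished. */
    static void foo_release(struct rcu_head *rcu)
    {
        struct foo *fp = container_of(rcu, struct foo, rcu);

        kfree(fp);
    }

    static void foo_retire(struct foo *old_fp)
    {
        /* Does not block: registers foo_release() for deferred invocation. */
        call_rcu(&old_fp->rcu, foo_release);
    }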
@@ -236,11 +243,13 @@ rcu_assign_pointer()
    would be cool to be able to declare a function in this manner.
    (Compiler experts will no doubt disagree.)

-   The updater uses this function to assign a new value to an
+   The updater uses this spatial macro to assign a new value to an
    RCU-protected pointer, in order to safely communicate the change
-   in value from the updater to the reader.  This macro does not
-   evaluate to an rvalue, but it does execute any memory-barrier
-   instructions required for a given CPU architecture.
+   in value from the updater to the reader.  This is a spatial (as
+   opposed to temporal) macro.  It does not evaluate to an rvalue,
+   but it does execute any memory-barrier instructions required
+   for a given CPU architecture.  Its ordering properties are that
+   of a store-release operation.

    Perhaps just as important, it serves to document (1) which
    pointers are protected by RCU and (2) the point at which a
@@ -255,14 +264,15 @@ rcu_dereference()
    Like rcu_assign_pointer(), rcu_dereference() must be implemented
    as a macro.

-   The reader uses rcu_dereference() to fetch an RCU-protected
-   pointer, which returns a value that may then be safely
-   dereferenced.  Note that rcu_dereference() does not actually
-   dereference the pointer, instead, it protects the pointer for
-   later dereferencing.  It also executes any needed memory-barrier
-   instructions for a given CPU architecture.  Currently, only Alpha
-   needs memory barriers within rcu_dereference() -- on other CPUs,
-   it compiles to nothing, not even a compiler directive.
+   The reader uses the spatial rcu_dereference() macro to fetch
+   an RCU-protected pointer, which returns a value that may
+   then be safely dereferenced.  Note that rcu_dereference()
+   does not actually dereference the pointer, instead, it
+   protects the pointer for later dereferencing.  It also
+   executes any needed memory-barrier instructions for a given
+   CPU architecture.  Currently, only Alpha needs memory barriers
+   within rcu_dereference() -- on other CPUs, it compiles to a
+   volatile load.

    Common coding practice uses rcu_dereference() to copy an
    RCU-protected pointer to a local variable, then dereferences
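A minimal publish/read pair may help tie the two macros in the hunks above together; this is only a sketch, and gbl_foo, foo_update(), and foo_get_a() are hypothetical names::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct foo {
        int a;
    };

    static struct foo __rcu *gbl_foo;

    /* Updater: publish a new version with store-release semantics. */
    static void foo_update(struct foo *new_fp, spinlock_t *lock)
    {
        struct foo *old_fp;

        spin_lock(lock);
        old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(lock));
        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(lock);
        synchronize_rcu();      /* wait for pre-existing readers */
        kfree(old_fp);
    }

    /* Reader: fetch and dereference under rcu_read_lock(). */
    static int foo_get_a(void)
    {
        int a;

        rcu_read_lock();
        a = rcu_dereference(gbl_foo)->a;
        rcu_read_unlock();
        return a;
    }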
@@ -355,12 +365,15 @@ reader, updater, and reclaimer.
        synchronize_rcu() & call_rcu()


-The RCU infrastructure observes the time sequence of rcu_read_lock(),
+The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
 rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
 order to determine when (1) synchronize_rcu() invocations may return
 to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
 implementations of the RCU infrastructure make heavy use of batching in
 order to amortize their overhead over many uses of the corresponding APIs.
+The rcu_assign_pointer() and rcu_dereference() invocations communicate
+spatial changes via stores to and loads from the RCU-protected pointer in
+question.

 There are at least three flavors of RCU usage in the Linux kernel. The diagram
 above shows the most common one. On the updater side, the rcu_assign_pointer(),
@@ -392,7 +405,9 @@ b. RCU applied to networking data structures that may be subjected
 c. RCU applied to scheduler and interrupt/NMI-handler tasks.

 Again, most uses will be of (a).  The (b) and (c) cases are important
-for specialized uses, but are relatively uncommon.
+for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
+RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
+their assorted primitives.

 .. _3_whatisRCU:

@@ -468,7 +483,7 @@ So, to sum up:
 -  Within an RCU read-side critical section, use rcu_dereference()
    to dereference RCU-protected pointers.

--  Use some solid scheme (such as locks or semaphores) to
+-  Use some solid design (such as locks or semaphores) to
    keep concurrent updates from interfering with each other.

 -  Use rcu_assign_pointer() to update an RCU-protected pointer.
@@ -579,6 +594,14 @@ to avoid having to write your own callback::

    kfree_rcu(old_fp, rcu);

+If the occasional sleep is permitted, the single-argument form may
+be used, omitting the rcu_head structure from struct foo.
+
+   kfree_rcu(old_fp);
+
+This variant of kfree_rcu() almost never blocks, but might do so by
+invoking synchronize_rcu() in response to memory-allocation failure.
+
 Again, see checklist.rst for additional rules governing the use of RCU.

 .. _5_whatisRCU:
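A short sketch of the two forms described above; struct foo and struct bar are hypothetical, and the second form relies on the kfree_rcu_mightsleep() alias introduced elsewhere in this series::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
        int a;
        struct rcu_head rcu;    /* needed only for the two-argument form */
    };

    static void foo_retire(struct foo *old_fp)
    {
        /* Never blocks: queues the kfree() for after a grace period. */
        kfree_rcu(old_fp, rcu);
    }

    struct bar {
        int b;                  /* no rcu_head member required */
    };

    static void bar_retire(struct bar *old_bp)
    {
        /* Single-pointer form: may occasionally sleep, as noted above. */
        kfree_rcu_mightsleep(old_bp);
    }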
@@ -596,7 +619,7 @@ lacking both functionality and performance.  However, they are useful
 in getting a feel for how RCU works.  See kernel/rcu/update.c for a
 production-quality implementation, and see:

-   http://www.rdrop.com/users/paulmck/RCU
+   https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit

 for papers describing the Linux kernel RCU implementation.  The OLS'01
 and OLS'02 papers are a good introduction, and the dissertation provides
@@ -929,6 +952,8 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
 initialized after each and every call to kmem_cache_alloc(), which renders
 reference-free spinlock acquisition completely unsafe.  Therefore, when
 using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+(Those willing to use a kmem_cache constructor may also use locking,
+including cache-friendly sequence locking.)

 With traditional reference counting -- such as that implemented by the
 kref library in Linux -- there is typically code that runs when the last
@@ -1047,6 +1072,30 @@ sched::
    rcu_read_lock_sched_held


+RCU-Tasks::
+
+   Critical sections       Grace period                Barrier
+
+   N/A                     call_rcu_tasks              rcu_barrier_tasks
+                           synchronize_rcu_tasks
+
+
+RCU-Tasks-Rude::
+
+   Critical sections       Grace period                Barrier
+
+   N/A                     call_rcu_tasks_rude         rcu_barrier_tasks_rude
+                           synchronize_rcu_tasks_rude
+
+
+RCU-Tasks-Trace::
+
+   Critical sections       Grace period                Barrier
+
+   rcu_read_lock_trace     call_rcu_tasks_trace        rcu_barrier_tasks_trace
+   rcu_read_unlock_trace   synchronize_rcu_tasks_trace
+
+
 SRCU::

    Critical sections       Grace period                Barrier
@@ -1087,35 +1136,43 @@ list can be helpful:

 a. Will readers need to block?  If so, you need SRCU.

-b. What about the -rt patchset?  If readers would need to block
-   in an non-rt kernel, you need SRCU.  If readers would block
-   in a -rt kernel, but not in a non-rt kernel, SRCU is not
-   necessary.  (The -rt patchset turns spinlocks into sleeplocks,
-   hence this distinction.)
+b. Will readers need to block and are you doing tracing, for
+   example, ftrace or BPF?  If so, you need RCU-tasks,
+   RCU-tasks-rude, and/or RCU-tasks-trace.

-c. Do you need to treat NMI handlers, hardirq handlers,
+c. What about the -rt patchset?  If readers would need to block in
+   a non-rt kernel, you need SRCU.  If readers would block when
+   acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
+   SRCU is not necessary.  (The -rt patchset turns spinlocks into
+   sleeplocks, hence this distinction.)
+
+d. Do you need to treat NMI handlers, hardirq handlers,
    and code segments with preemption disabled (whether
    via preempt_disable(), local_irq_save(), local_bh_disable(),
    or some other mechanism) as if they were explicit RCU readers?
-   If so, RCU-sched is the only choice that will work for you.
+   If so, RCU-sched readers are the only choice that will work
+   for you, but since about v4.20 you can use the vanilla RCU
+   update primitives.

-d. Do you need RCU grace periods to complete even in the face
-   of softirq monopolization of one or more of the CPUs?  For
-   example, is your code subject to network-based denial-of-service
-   attacks?  If so, you should disable softirq across your readers,
-   for example, by using rcu_read_lock_bh().
+e. Do you need RCU grace periods to complete even in the face of
+   softirq monopolization of one or more of the CPUs?  For example,
+   is your code subject to network-based denial-of-service attacks?
+   If so, you should disable softirq across your readers, for
+   example, by using rcu_read_lock_bh().  Since about v4.20 you
+   can use the vanilla RCU update primitives.

-e. Is your workload too update-intensive for normal use of
+f. Is your workload too update-intensive for normal use of
    RCU, but inappropriate for other synchronization mechanisms?
    If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
    named SLAB_DESTROY_BY_RCU).  But please be careful!

-f. Do you need read-side critical sections that are respected
-   even though they are in the middle of the idle loop, during
-   user-mode execution, or on an offlined CPU?  If so, SRCU is the
-   only choice that will work for you.
+g. Do you need read-side critical sections that are respected even
+   on CPUs that are deep in the idle loop, during entry to or exit
+   from user-mode execution, or on an offlined CPU?  If so, SRCU
+   and RCU Tasks Trace are the only choices that will work for you,
+   with SRCU being strongly preferred in almost all cases.

-g. Otherwise, use RCU.
+h. Otherwise, use RCU.

 Of course, this all assumes that you have determined that RCU is in fact
 the right tool for your job.
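For item (e) in the updated list above, a hedged sketch of a softirq-disabling reader; the connection list, its updater locking, and the conn_exists() helper are illustrative assumptions, not part of the document::

    #include <linux/rculist.h>
    #include <linux/rcupdate.h>

    struct conn {
        int id;
        struct list_head node;
    };

    static LIST_HEAD(conn_list);    /* updated under some lock, read under RCU-bh */

    /* Reader that also keeps local softirq (e.g. network RX) processing at bay. */
    static bool conn_exists(int id)
    {
        struct conn *c;
        bool found = false;

        rcu_read_lock_bh();
        list_for_each_entry_rcu(c, &conn_list, node) {
            if (c->id == id) {
                found = true;
                break;
            }
        }
        rcu_read_unlock_bh();
        return found;
    }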
@@ -5113,6 +5113,17 @@
            rcupdate.rcu_cpu_stall_timeout to be used (after
            conversion from seconds to milliseconds).

+   rcupdate.rcu_cpu_stall_cputime= [KNL]
+           Provide statistics on the cputime and count of
+           interrupts and tasks during the sampling period. For
+           multiple continuous RCU stalls, all sampling periods
+           begin at half of the first RCU stall timeout.
+
+   rcupdate.rcu_exp_stall_task_details= [KNL]
+           Print stack dumps of any tasks blocking the
+           current expedited RCU grace period during an
+           expedited RCU CPU stall warning.
+
    rcupdate.rcu_expedited= [KNL]
            Use expedited grace-period primitives, for
            example, synchronize_rcu_expedited() instead
@@ -181,7 +181,6 @@ void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode)
 }
 EXPORT_SYMBOL_GPL(fw_devlink_purge_absent_suppliers);

-#ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
 DEFINE_STATIC_SRCU(device_links_srcu);

@@ -220,47 +219,6 @@ static void device_link_remove_from_lists(struct device_link *link)
    list_del_rcu(&link->s_node);
    list_del_rcu(&link->c_node);
 }
-#else /* !CONFIG_SRCU */
-static DECLARE_RWSEM(device_links_lock);
-
-static inline void device_links_write_lock(void)
-{
-   down_write(&device_links_lock);
-}
-
-static inline void device_links_write_unlock(void)
-{
-   up_write(&device_links_lock);
-}
-
-int device_links_read_lock(void)
-{
-   down_read(&device_links_lock);
-   return 0;
-}
-
-void device_links_read_unlock(int not_used)
-{
-   up_read(&device_links_lock);
-}
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-int device_links_read_lock_held(void)
-{
-   return lockdep_is_held(&device_links_lock);
-}
-#endif
-
-static inline void device_link_synchronize_removal(void)
-{
-}
-
-static void device_link_remove_from_lists(struct device_link *link)
-{
-   list_del(&link->s_node);
-   list_del(&link->c_node);
-}
-#endif /* !CONFIG_SRCU */

 static bool device_is_ancestor(struct device *dev, struct device *target)
 {
@@ -1,7 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 menuconfig DAX
    tristate "DAX: direct access to differentiated memory"
-   select SRCU
    default m if NVDIMM_DAX

 if DAX
@@ -2,7 +2,6 @@
 config STM
    tristate "System Trace Module devices"
    select CONFIGFS_FS
-   select SRCU
    help
      A System Trace Module (STM) is a device exporting data in System
      Trace Protocol (STP) format as defined by MIPI STP standards.
@@ -6,7 +6,6 @@
 menuconfig MD
    bool "Multiple devices driver support (RAID and LVM)"
    depends on BLOCK
-   select SRCU
    help
      Support multiple physical spindles through a single logical device.
      Required for RAID and logical volume management.
@@ -334,7 +334,6 @@ config NETCONSOLE_DYNAMIC

 config NETPOLL
    def_bool NETCONSOLE
-   select SRCU

 config NET_POLL_CONTROLLER
    def_bool NETPOLL
@@ -258,7 +258,7 @@ config PCIE_MEDIATEK_GEN3
      MediaTek SoCs.

 config VMD
-   depends on PCI_MSI && X86_64 && SRCU && !UML
+   depends on PCI_MSI && X86_64 && !UML
    tristate "Intel Volume Management Device Driver"
    help
      Adds support for the Intel Volume Management Device (VMD). VMD is a
@@ -17,7 +17,6 @@ config BTRFS_FS
    select FS_IOMAP
    select RAID6_PQ
    select XOR_BLOCKS
-   select SRCU
    depends on PAGE_SIZE_LESS_THAN_256KB

    help
 fs/locks.c | 25
@@ -1890,7 +1890,6 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
 }
 EXPORT_SYMBOL(generic_setlease);

-#if IS_ENABLED(CONFIG_SRCU)
 /*
  * Kernel subsystems can register to be notified on any attempt to set
  * a new lease with the lease_notifier_chain. This is used by (e.g.) nfsd
@@ -1924,30 +1923,6 @@ void lease_unregister_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(lease_unregister_notifier);

-#else /* !IS_ENABLED(CONFIG_SRCU) */
-static inline void
-lease_notifier_chain_init(void)
-{
-}
-
-static inline void
-setlease_notifier(long arg, struct file_lock *lease)
-{
-}
-
-int lease_register_notifier(struct notifier_block *nb)
-{
-   return 0;
-}
-EXPORT_SYMBOL_GPL(lease_register_notifier);
-
-void lease_unregister_notifier(struct notifier_block *nb)
-{
-}
-EXPORT_SYMBOL_GPL(lease_unregister_notifier);
-
-#endif /* IS_ENABLED(CONFIG_SRCU) */

 /**
  * vfs_setlease - sets a lease on an open file
  * @filp: file pointer
@@ -1,7 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config FSNOTIFY
    def_bool n
-   select SRCU

 source "fs/notify/dnotify/Kconfig"
 source "fs/notify/inotify/Kconfig"
@@ -6,7 +6,6 @@
 config QUOTA
    bool "Quota support"
    select QUOTACTL
-   select SRCU
    help
      If you say Y here, you will be able to set per user limits for disk
      usage (also called disk quotas). Currently, it works for the
@@ -52,6 +52,7 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
 #define kstat_cpu(cpu) per_cpu(kstat, cpu)
 #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu)

+extern unsigned long long nr_context_switches_cpu(int cpu);
 extern unsigned long long nr_context_switches(void);

 extern unsigned int kstat_irqs_cpu(unsigned int irq, int cpu);
@@ -67,6 +68,17 @@ static inline unsigned int kstat_softirqs_cpu(unsigned int irq, int cpu)
    return kstat_cpu(cpu).softirqs[irq];
 }

+static inline unsigned int kstat_cpu_softirqs_sum(int cpu)
+{
+   int i;
+   unsigned int sum = 0;
+
+   for (i = 0; i < NR_SOFTIRQS; i++)
+      sum += kstat_softirqs_cpu(i, cpu);
+
+   return sum;
+}
+
 /*
  * Number of interrupts per specific IRQ source, since bootup
  */
@@ -75,7 +87,7 @@ extern unsigned int kstat_irqs_usr(unsigned int irq);
 /*
  * Number of interrupts per cpu, since bootup
  */
-static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
+static inline unsigned long kstat_cpu_irqs_sum(unsigned int cpu)
 {
    return kstat_cpu(cpu).irqs_sum;
 }
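The helper added above just sums the per-softirq counters for one CPU. A hedged sketch of how a diagnostic might snapshot the counters this series exposes; the report_cpu_activity() wrapper is hypothetical::

    #include <linux/kernel.h>
    #include <linux/kernel_stat.h>

    /* Hypothetical per-CPU activity snapshot using the helpers above. */
    static void report_cpu_activity(int cpu)
    {
        unsigned long irqs = kstat_cpu_irqs_sum(cpu);
        unsigned int softirqs = kstat_cpu_softirqs_sum(cpu);
        unsigned long long csw = nr_context_switches_cpu(cpu);

        pr_info("cpu%d: hardirqs=%lu softirqs=%u csw=%llu\n",
                cpu, irqs, softirqs, csw);
    }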
@@ -139,7 +139,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
    if (last) {
       n->next = last->next;
       n->pprev = &last->next;
-      rcu_assign_pointer(hlist_next_rcu(last), n);
+      rcu_assign_pointer(hlist_nulls_next_rcu(last), n);
    } else {
       hlist_nulls_add_head_rcu(n, h);
    }
@@ -238,6 +238,7 @@ void synchronize_rcu_tasks_rude(void);

 #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
 void exit_tasks_rcu_start(void);
+void exit_tasks_rcu_stop(void);
 void exit_tasks_rcu_finish(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 #define rcu_tasks_classic_qs(t, preempt) do { } while (0)
@@ -246,6 +247,7 @@ void exit_tasks_rcu_finish(void);
 #define call_rcu_tasks call_rcu
 #define synchronize_rcu_tasks synchronize_rcu
 static inline void exit_tasks_rcu_start(void) { }
+static inline void exit_tasks_rcu_stop(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */

@@ -374,11 +376,18 @@ static inline int debug_lockdep_rcu_enabled(void)
  * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
  * @c: condition to check
  * @s: informative message
+ *
+ * This checks debug_lockdep_rcu_enabled() before checking (c) to
+ * prevent early boot splats due to lockdep not yet being initialized,
+ * and rechecks it after checking (c) to prevent false-positive splats
+ * due to races with lockdep being disabled.  See commit 3066820034b5dd
+ * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
  */
 #define RCU_LOCKDEP_WARN(c, s) \
    do { \
       static bool __section(".data.unlikely") __warned; \
-      if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
+      if (debug_lockdep_rcu_enabled() && (c) && \
+          debug_lockdep_rcu_enabled() && !__warned) { \
          __warned = true; \
          lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
       } \
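Typical usage of the macro whose check ordering the hunk above adjusts is unchanged; as a sketch, with the helper name being hypothetical::

    #include <linux/rcupdate.h>

    /* Hypothetical check that the caller is inside an RCU read-side section. */
    static void assert_in_rcu_reader(void)
    {
        RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
                         "expected to be called under rcu_read_lock()");
    }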
@@ -1004,6 +1013,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
    kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)

+#define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
+#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr)
+
 #define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
 #define kvfree_rcu_arg_2(ptr, rhf) \
 do { \
@@ -1011,8 +1023,7 @@ do { \
    \
    if (___p) { \
       BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
-      kvfree_call_rcu(&((___p)->rhf), (rcu_callback_t)(unsigned long) \
-         (offsetof(typeof(*(ptr)), rhf))); \
+      kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
    } \
 } while (0)

@@ -1021,7 +1032,7 @@ do { \
    typeof(ptr) ___p = (ptr); \
    \
    if (___p) \
-      kvfree_call_rcu(NULL, (rcu_callback_t) (___p)); \
+      kvfree_call_rcu(NULL, (void *) (___p)); \
 } while (0)

 /*
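A usage sketch for the new single-pointer aliases defined above, assuming a hypothetical structure with no rcu_head member::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    struct blob {
        size_t len;
        u8 data[];      /* no rcu_head needed for the single-pointer form */
    };

    static void blob_retire(struct blob *b)
    {
        /*
         * Single-pointer form: may sleep, for example by falling back
         * to synchronize_rcu() on memory-allocation failure, hence the
         * explicit _mightsleep name.
         */
        kvfree_rcu_mightsleep(b);
    }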
@@ -98,25 +98,25 @@ static inline void synchronize_rcu_expedited(void)
  */
 extern void kvfree(const void *addr);

-static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
    if (head) {
-      call_rcu(head, func);
+      call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
       return;
    }

    // kvfree_rcu(one_arg) call.
    might_sleep();
    synchronize_rcu();
-   kvfree((void *) func);
+   kvfree(ptr);
 }

 #ifdef CONFIG_KASAN_GENERIC
-void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+void kvfree_call_rcu(struct rcu_head *head, void *ptr);
 #else
-static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+static inline void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
-   __kvfree_call_rcu(head, func);
+   __kvfree_call_rcu(head, ptr);
 }
 #endif

@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(void)
 }

 void synchronize_rcu_expedited(void);
-void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+void kvfree_call_rcu(struct rcu_head *head, void *ptr);

 void rcu_barrier(void);
 bool rcu_eqs_special_set(int cpu);
@@ -214,6 +214,34 @@ srcu_read_lock_notrace(struct srcu_struct *ssp) __acquires(ssp)
    return retval;
 }

+/**
+ * srcu_down_read - register a new reader for an SRCU-protected structure.
+ * @ssp: srcu_struct in which to register the new reader.
+ *
+ * Enter a semaphore-like SRCU read-side critical section.  Note that
+ * SRCU read-side critical sections may be nested.  However, it is
+ * illegal to call anything that waits on an SRCU grace period for the
+ * same srcu_struct, whether directly or indirectly.  Please note that
+ * one way to indirectly wait on an SRCU grace period is to acquire
+ * a mutex that is held elsewhere while calling synchronize_srcu() or
+ * synchronize_srcu_expedited().  But if you want lockdep to help you
+ * keep this stuff straight, you should instead use srcu_read_lock().
+ *
+ * The semaphore-like nature of srcu_down_read() means that the matching
+ * srcu_up_read() can be invoked from some other context, for example,
+ * from some other task or from an irq handler.  However, neither
+ * srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.
+ *
+ * Calls to srcu_down_read() may be nested, similar to the manner in
+ * which calls to down_read() may be nested.
+ */
+static inline int srcu_down_read(struct srcu_struct *ssp) __acquires(ssp)
+{
+   WARN_ON_ONCE(in_nmi());
+   srcu_check_nmi_safety(ssp, false);
+   return __srcu_read_lock(ssp);
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @ssp: srcu_struct in which to unregister the old reader.
@@ -254,6 +282,23 @@ srcu_read_unlock_notrace(struct srcu_struct *ssp, int idx) __releases(ssp)
    __srcu_read_unlock(ssp, idx);
 }

+/**
+ * srcu_up_read - unregister an old reader from an SRCU-protected structure.
+ * @ssp: srcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding srcu_read_lock().
+ *
+ * Exit an SRCU read-side critical section, but not necessarily from
+ * the same context as the matching srcu_down_read().
+ */
+static inline void srcu_up_read(struct srcu_struct *ssp, int idx)
+   __releases(ssp)
+{
+   WARN_ON_ONCE(idx & ~0x1);
+   WARN_ON_ONCE(in_nmi());
+   srcu_check_nmi_safety(ssp, false);
+   __srcu_read_unlock(ssp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *
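A hedged sketch of the hand-off these new primitives permit; the work-queue plumbing, my_srcu, and the handoff structure are illustrative assumptions rather than anything from this series::

    #include <linux/slab.h>
    #include <linux/srcu.h>
    #include <linux/workqueue.h>

    DEFINE_STATIC_SRCU(my_srcu);

    struct handoff {
        struct work_struct work;
        int idx;        /* cookie returned by srcu_down_read() */
    };

    static void finish_work(struct work_struct *work)
    {
        struct handoff *h = container_of(work, struct handoff, work);

        /* ... finish touching the SRCU-protected data ... */
        srcu_up_read(&my_srcu, h->idx); /* may run in a different task */
        kfree(h);
    }

    static void start_read(struct handoff *h)
    {
        h->idx = srcu_down_read(&my_srcu);  /* like srcu_read_lock()... */
        INIT_WORK(&h->work, finish_work);
        schedule_work(&h->work);            /* ...but released elsewhere */
    }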
@@ -49,7 +49,7 @@ struct srcu_data {
 struct srcu_node {
    spinlock_t __private lock;
    unsigned long srcu_have_cbs[4];   /* GP seq for children having CBs, but only */
-                                     /*  if greater than ->srcu_gq_seq. */
+                                     /*  if greater than ->srcu_gp_seq. */
    unsigned long srcu_data_have_cbs[4];   /* Which srcu_data structs have CBs for given GP? */
    unsigned long srcu_gp_seq_needed_exp;  /* Furthest future exp GP. */
    struct srcu_node *srcu_parent;         /* Next up in tree. */
@@ -1873,7 +1873,6 @@ config PERF_EVENTS
    default y if PROFILING
    depends on HAVE_PERF_EVENTS
    select IRQ_WORK
-   select SRCU
    help
      Enable kernel support for various performance events provided
      by software and hardware.
@@ -46,6 +46,9 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
 torture_param(int, stat_interval, 60,
         "Number of seconds between stats printk()s");
 torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
+torture_param(int, rt_boost, 2,
+        "Do periodic rt-boost. 0=Disable, 1=Only for rt_mutex, 2=For all lock types.");
+torture_param(int, rt_boost_factor, 50, "A factor determining how often rt-boost happens.");
 torture_param(int, verbose, 1,
         "Enable verbose debugging printk()s");

@@ -127,15 +130,50 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
    /* BUGGY, do not use in real life!!! */
 }

-static void torture_boost_dummy(struct torture_random_state *trsp)
+static void __torture_rt_boost(struct torture_random_state *trsp)
 {
-   /* Only rtmutexes care about priority */
+   const unsigned int factor = rt_boost_factor;
+
+   if (!rt_task(current)) {
+      /*
+       * Boost priority once every rt_boost_factor operations. When
+       * the task tries to take the lock, the rtmutex it will account
+       * for the new priority, and do any corresponding pi-dance.
+       */
+      if (trsp && !(torture_random(trsp) %
+                    (cxt.nrealwriters_stress * factor))) {
+         sched_set_fifo(current);
+      } else /* common case, do nothing */
+         return;
+   } else {
+      /*
+       * The task will remain boosted for another 10 * rt_boost_factor
+       * operations, then restored back to its original prio, and so
+       * forth.
+       *
+       * When @trsp is nil, we want to force-reset the task for
+       * stopping the kthread.
+       */
+      if (!trsp || !(torture_random(trsp) %
+                     (cxt.nrealwriters_stress * factor * 2))) {
+         sched_set_normal(current, 0);
+      } else /* common case, do nothing */
+         return;
+   }
+}
+
+static void torture_rt_boost(struct torture_random_state *trsp)
+{
+   if (rt_boost != 2)
+      return;
+
+   __torture_rt_boost(trsp);
 }
|
|
||||||
static struct lock_torture_ops lock_busted_ops = {
|
static struct lock_torture_ops lock_busted_ops = {
|
||||||
.writelock = torture_lock_busted_write_lock,
|
.writelock = torture_lock_busted_write_lock,
|
||||||
.write_delay = torture_lock_busted_write_delay,
|
.write_delay = torture_lock_busted_write_delay,
|
||||||
.task_boost = torture_boost_dummy,
|
.task_boost = torture_rt_boost,
|
||||||
.writeunlock = torture_lock_busted_write_unlock,
|
.writeunlock = torture_lock_busted_write_unlock,
|
||||||
.readlock = NULL,
|
.readlock = NULL,
|
||||||
.read_delay = NULL,
|
.read_delay = NULL,
|
||||||
@@ -179,7 +217,7 @@ __releases(torture_spinlock)
 static struct lock_torture_ops spin_lock_ops = {
 	.writelock	= torture_spin_lock_write_lock,
 	.write_delay	= torture_spin_lock_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_spin_lock_write_unlock,
 	.readlock	= NULL,
 	.read_delay	= NULL,
@@ -206,7 +244,7 @@ __releases(torture_spinlock)
 static struct lock_torture_ops spin_lock_irq_ops = {
 	.writelock	= torture_spin_lock_write_lock_irq,
 	.write_delay	= torture_spin_lock_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_lock_spin_write_unlock_irq,
 	.readlock	= NULL,
 	.read_delay	= NULL,
@@ -275,7 +313,7 @@ __releases(torture_rwlock)
 static struct lock_torture_ops rw_lock_ops = {
 	.writelock	= torture_rwlock_write_lock,
 	.write_delay	= torture_rwlock_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_rwlock_write_unlock,
 	.readlock	= torture_rwlock_read_lock,
 	.read_delay	= torture_rwlock_read_delay,
@@ -318,7 +356,7 @@ __releases(torture_rwlock)
 static struct lock_torture_ops rw_lock_irq_ops = {
 	.writelock	= torture_rwlock_write_lock_irq,
 	.write_delay	= torture_rwlock_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_rwlock_write_unlock_irq,
 	.readlock	= torture_rwlock_read_lock_irq,
 	.read_delay	= torture_rwlock_read_delay,
@@ -358,7 +396,7 @@ __releases(torture_mutex)
 static struct lock_torture_ops mutex_lock_ops = {
 	.writelock	= torture_mutex_lock,
 	.write_delay	= torture_mutex_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_mutex_unlock,
 	.readlock	= NULL,
 	.read_delay	= NULL,
@@ -456,7 +494,7 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
 	.exit		= torture_ww_mutex_exit,
 	.writelock	= torture_ww_mutex_lock,
 	.write_delay	= torture_mutex_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_ww_mutex_unlock,
 	.readlock	= NULL,
 	.read_delay	= NULL,
@@ -474,37 +512,6 @@ __acquires(torture_rtmutex)
 	return 0;
 }
 
-static void torture_rtmutex_boost(struct torture_random_state *trsp)
-{
-	const unsigned int factor = 50000;	/* yes, quite arbitrary */
-
-	if (!rt_task(current)) {
-		/*
-		 * Boost priority once every ~50k operations. When the
-		 * task tries to take the lock, the rtmutex it will account
-		 * for the new priority, and do any corresponding pi-dance.
-		 */
-		if (trsp && !(torture_random(trsp) %
-			      (cxt.nrealwriters_stress * factor))) {
-			sched_set_fifo(current);
-		} else /* common case, do nothing */
-			return;
-	} else {
-		/*
-		 * The task will remain boosted for another ~500k operations,
-		 * then restored back to its original prio, and so forth.
-		 *
-		 * When @trsp is nil, we want to force-reset the task for
-		 * stopping the kthread.
-		 */
-		if (!trsp || !(torture_random(trsp) %
-			       (cxt.nrealwriters_stress * factor * 2))) {
-			sched_set_normal(current, 0);
-		} else /* common case, do nothing */
-			return;
-	}
-}
-
 static void torture_rtmutex_delay(struct torture_random_state *trsp)
 {
 	const unsigned long shortdelay_us = 2;
@@ -530,10 +537,18 @@ __releases(torture_rtmutex)
 	rt_mutex_unlock(&torture_rtmutex);
 }
 
+static void torture_rt_boost_rtmutex(struct torture_random_state *trsp)
+{
+	if (!rt_boost)
+		return;
+
+	__torture_rt_boost(trsp);
+}
+
 static struct lock_torture_ops rtmutex_lock_ops = {
 	.writelock	= torture_rtmutex_lock,
 	.write_delay	= torture_rtmutex_delay,
-	.task_boost	= torture_rtmutex_boost,
+	.task_boost	= torture_rt_boost_rtmutex,
 	.writeunlock	= torture_rtmutex_unlock,
 	.readlock	= NULL,
 	.read_delay	= NULL,
@@ -600,7 +615,7 @@ __releases(torture_rwsem)
 static struct lock_torture_ops rwsem_lock_ops = {
 	.writelock	= torture_rwsem_down_write,
 	.write_delay	= torture_rwsem_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_rwsem_up_write,
 	.readlock	= torture_rwsem_down_read,
 	.read_delay	= torture_rwsem_read_delay,
@@ -652,7 +667,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
 	.exit		= torture_percpu_rwsem_exit,
 	.writelock	= torture_percpu_rwsem_down_write,
 	.write_delay	= torture_rwsem_write_delay,
-	.task_boost	= torture_boost_dummy,
+	.task_boost	= torture_rt_boost,
 	.writeunlock	= torture_percpu_rwsem_up_write,
 	.readlock	= torture_percpu_rwsem_down_read,
 	.read_delay	= torture_rwsem_read_delay,
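After these hunks every lock_torture_ops flavor shares one boosting implementation: torture_rt_boost() boosts only when rt_boost=2, torture_rt_boost_rtmutex() boosts whenever rt_boost is nonzero, and rt_boost_factor scales how rarely either happens. A small sketch of the operation counts implied by the modulo tests above; the helper names and parameters below are illustrative, not part of the patch:

/* Sketch: expected number of write-lock operations between priority changes. */
static unsigned long ops_per_boost(unsigned long nwriters, unsigned long factor)
{
	return nwriters * factor;	/* matches the !rt_task() branch above */
}

static unsigned long ops_per_deboost(unsigned long nwriters, unsigned long factor)
{
	return nwriters * factor * 2;	/* matches the rt_task() branch above */
}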
@@ -456,7 +456,6 @@ int raw_notifier_call_chain(struct raw_notifier_head *nh,
 }
 EXPORT_SYMBOL_GPL(raw_notifier_call_chain);
 
-#ifdef CONFIG_SRCU
 /*
  *	SRCU notifier chain routines.    Registration and unregistration
  *	use a mutex, and call_chain is synchronized by SRCU (no locks).
@@ -573,8 +572,6 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
 }
 EXPORT_SYMBOL_GPL(srcu_init_notifier_head);
 
-#endif /* CONFIG_SRCU */
-
 static ATOMIC_NOTIFIER_HEAD(die_chain);
 
 int notrace notify_die(enum die_val val, const char *str,
@@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (pid_ns->pid_allocated == init_pids)
 			break;
+		/*
+		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
+		 *
+		 * 1) TASK A unshare(CLONE_NEWPID)
+		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
+		 *    and TASK C
+		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
+		 * 4) TASK A calls synchronize_rcu_tasks()
+		 *                  -> synchronize_srcu(tasks_rcu_exit_srcu)
+		 * 5) *DEADLOCK*
+		 *
+		 * It is considered safe to release tasks_rcu_exit_srcu here
+		 * because we assume the current task can not be concurrently
+		 * reaped at this point.
+		 */
+		exit_tasks_rcu_stop();
 		schedule();
+		exit_tasks_rcu_start();
 	}
 	__set_current_state(TASK_RUNNING);
 
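The hunk above drops the SRCU read-side critical section that the exit path entered via exit_tasks_rcu_start() across the sleep, because the wakeup here can itself depend on a synchronize_srcu() of the same srcu_struct. A generic sketch of that pattern, using a hypothetical my_srcu and done() rather than anything from the patch:

/*
 * Sketch only: if the event that ends the wait can be blocked behind
 * synchronize_srcu(&my_srcu), the reader must not span the sleep, so
 * drop the read lock across it and re-enter afterwards.
 */
DEFINE_STATIC_SRCU(my_srcu);

static void wait_with_srcu_dropped(bool (*done)(void))
{
	int idx;

	idx = srcu_read_lock(&my_srcu);
	while (!done()) {
		srcu_read_unlock(&my_srcu, idx);
		schedule_timeout_interruptible(1);	/* sleep outside the reader */
		idx = srcu_read_lock(&my_srcu);
	}
	srcu_read_unlock(&my_srcu, idx);
}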
@@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
 config RCU_EXP_CPU_STALL_TIMEOUT
 	int "Expedited RCU CPU stall timeout in milliseconds"
 	depends on RCU_STALL_COMMON
-	range 0 21000
+	range 0 300000
 	default 0
 	help
 	  If a given expedited RCU grace period extends more than the
@@ -92,6 +92,19 @@ config RCU_EXP_CPU_STALL_TIMEOUT
 	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
 	  seconds to milliseconds.
 
+config RCU_CPU_STALL_CPUTIME
+	bool "Provide additional RCU stall debug information"
+	depends on RCU_STALL_COMMON
+	default n
+	help
+	  Collect statistics during the sampling period, such as the number of
+	  (hard interrupts, soft interrupts, task switches) and the cputime of
+	  (hard interrupts, soft interrupts, kernel tasks) are added to the
+	  RCU stall report. For multiple continuous RCU stalls, all sampling
+	  periods begin at half of the first RCU stall timeout.
+	  The boot option rcupdate.rcu_cpu_stall_cputime has the same function
+	  as this one, but will override this if it exists.
+
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL
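The new 0..300000 range corresponds to an upper limit of five minutes for the expedited stall timeout. Both knobs can also be set at boot; the line below is only an illustration, and the rcupdate.rcu_exp_cpu_stall_timeout name is an assumption modeled on the existing stall-timeout parameters rather than something shown in this hunk:

	rcupdate.rcu_cpu_stall_cputime=1 rcupdate.rcu_exp_cpu_stall_timeout=300000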
@@ -224,6 +224,8 @@ extern int rcu_cpu_stall_ftrace_dump;
 extern int rcu_cpu_stall_suppress;
 extern int rcu_cpu_stall_timeout;
 extern int rcu_exp_cpu_stall_timeout;
+extern int rcu_cpu_stall_cputime;
+extern bool rcu_exp_stall_task_details __read_mostly;
 int rcu_jiffies_till_stall_check(void);
 int rcu_exp_jiffies_till_stall_check(void);
 
@@ -447,14 +449,20 @@ do { \
 /* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
 static inline bool rcu_gp_is_normal(void) { return true; }
 static inline bool rcu_gp_is_expedited(void) { return false; }
+static inline bool rcu_async_should_hurry(void) { return false; }
 static inline void rcu_expedite_gp(void) { }
 static inline void rcu_unexpedite_gp(void) { }
+static inline void rcu_async_hurry(void) { }
+static inline void rcu_async_relax(void) { }
 static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 bool rcu_gp_is_normal(void);     /* Internal RCU use. */
 bool rcu_gp_is_expedited(void);  /* Internal RCU use. */
+bool rcu_async_should_hurry(void);  /* Internal RCU use. */
 void rcu_expedite_gp(void);
 void rcu_unexpedite_gp(void);
+void rcu_async_hurry(void);
+void rcu_async_relax(void);
 void rcupdate_announce_bootup_oddness(void);
 #ifdef CONFIG_TASKS_RCU_GENERIC
 void show_rcu_tasks_gp_kthreads(void);
@@ -89,7 +89,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
 }
 
 /* Get the length of a segment of the rcu_segcblist structure. */
-static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
+long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
 {
 	return READ_ONCE(rsclp->seglen[seg]);
 }
@@ -15,6 +15,8 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
 	return READ_ONCE(rclp->len);
 }
 
+long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg);
+
 /* Return number of callbacks in segmented callback list by summing seglen. */
 long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
 
@@ -399,7 +399,7 @@ static int torture_readlock_not_held(void)
 	return rcu_read_lock_bh_held() || rcu_read_lock_sched_held();
 }
 
-static int rcu_torture_read_lock(void) __acquires(RCU)
+static int rcu_torture_read_lock(void)
 {
 	rcu_read_lock();
 	return 0;
@@ -441,7 +441,7 @@ rcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
 	}
 }
 
-static void rcu_torture_read_unlock(int idx) __releases(RCU)
+static void rcu_torture_read_unlock(int idx)
 {
 	rcu_read_unlock();
 }
@@ -625,7 +625,7 @@ static struct srcu_struct srcu_ctld;
 static struct srcu_struct *srcu_ctlp = &srcu_ctl;
 static struct rcu_torture_ops srcud_ops;
 
-static int srcu_torture_read_lock(void) __acquires(srcu_ctlp)
+static int srcu_torture_read_lock(void)
 {
 	if (cur_ops == &srcud_ops)
 		return srcu_read_lock_nmisafe(srcu_ctlp);
@@ -652,7 +652,7 @@ srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
 	}
 }
 
-static void srcu_torture_read_unlock(int idx) __releases(srcu_ctlp)
+static void srcu_torture_read_unlock(int idx)
 {
 	if (cur_ops == &srcud_ops)
 		srcu_read_unlock_nmisafe(srcu_ctlp, idx);
@@ -814,13 +814,13 @@ static void synchronize_rcu_trivial(void)
 	}
 }
 
-static int rcu_torture_read_lock_trivial(void) __acquires(RCU)
+static int rcu_torture_read_lock_trivial(void)
 {
 	preempt_disable();
 	return 0;
 }
 
-static void rcu_torture_read_unlock_trivial(int idx) __releases(RCU)
+static void rcu_torture_read_unlock_trivial(int idx)
 {
 	preempt_enable();
 }
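The __acquires()/__releases() markings stripped above are sparse context annotations; on a normal build they compile away, and sparse uses them to check that acquire and release paths balance. A minimal sketch of the usual pairing, on a hypothetical lock rather than the torture-test ones:

/* Sketch: sparse context annotations on a matched pair of helpers. */
static void my_lock(spinlock_t *lock) __acquires(lock)
{
	spin_lock(lock);
}

static void my_unlock(spinlock_t *lock) __releases(lock)
{
	spin_unlock(lock);
}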
@@ -76,6 +76,8 @@ torture_param(int, verbose_batched, 0, "Batch verbose debugging printk()s");
 // Wait until there are multiple CPUs before starting test.
 torture_param(int, holdoff, IS_BUILTIN(CONFIG_RCU_REF_SCALE_TEST) ? 10 : 0,
	      "Holdoff time before test start (s)");
+// Number of typesafe_lookup structures, that is, the degree of concurrency.
+torture_param(long, lookup_instances, 0, "Number of typesafe_lookup structures.");
 // Number of loops per experiment, all readers execute operations concurrently.
 torture_param(long, loops, 10000, "Number of loops per experiment.");
 // Number of readers, with -1 defaulting to about 75% of the CPUs.
@@ -124,7 +126,7 @@ static int exp_idx;
 
 // Operations vector for selecting different types of tests.
 struct ref_scale_ops {
-	void (*init)(void);
+	bool (*init)(void);
 	void (*cleanup)(void);
 	void (*readsection)(const int nloops);
 	void (*delaysection)(const int nloops, const int udl, const int ndl);
@@ -162,8 +164,9 @@ static void ref_rcu_delay_section(const int nloops, const int udl, const int ndl
 	}
 }
 
-static void rcu_sync_scale_init(void)
+static bool rcu_sync_scale_init(void)
 {
+	return true;
 }
 
 static struct ref_scale_ops rcu_ops = {
@@ -315,9 +318,10 @@ static struct ref_scale_ops refcnt_ops = {
 // Definitions for rwlock
 static rwlock_t test_rwlock;
 
-static void ref_rwlock_init(void)
+static bool ref_rwlock_init(void)
 {
 	rwlock_init(&test_rwlock);
+	return true;
 }
 
 static void ref_rwlock_section(const int nloops)
@@ -351,9 +355,10 @@ static struct ref_scale_ops rwlock_ops = {
 // Definitions for rwsem
 static struct rw_semaphore test_rwsem;
 
-static void ref_rwsem_init(void)
+static bool ref_rwsem_init(void)
 {
 	init_rwsem(&test_rwsem);
+	return true;
 }
 
 static void ref_rwsem_section(const int nloops)
@@ -523,6 +528,237 @@ static struct ref_scale_ops clock_ops = {
 	.name		= "clock"
 };
 
+////////////////////////////////////////////////////////////////////////
+//
+// Methods leveraging SLAB_TYPESAFE_BY_RCU.
+//
+
+// Item to look up in a typesafe manner.  Array of pointers to these.
+struct refscale_typesafe {
+	atomic_t rts_refctr;  // Used by all flavors
+	spinlock_t rts_lock;
+	seqlock_t rts_seqlock;
+	unsigned int a;
+	unsigned int b;
+};
+
+static struct kmem_cache *typesafe_kmem_cachep;
+static struct refscale_typesafe **rtsarray;
+static long rtsarray_size;
+static DEFINE_TORTURE_RANDOM_PERCPU(refscale_rand);
+static bool (*rts_acquire)(struct refscale_typesafe *rtsp, unsigned int *start);
+static bool (*rts_release)(struct refscale_typesafe *rtsp, unsigned int start);
+
+// Conditionally acquire an explicit in-structure reference count.
+static bool typesafe_ref_acquire(struct refscale_typesafe *rtsp, unsigned int *start)
+{
+	return atomic_inc_not_zero(&rtsp->rts_refctr);
+}
+
+// Unconditionally release an explicit in-structure reference count.
+static bool typesafe_ref_release(struct refscale_typesafe *rtsp, unsigned int start)
+{
+	if (!atomic_dec_return(&rtsp->rts_refctr)) {
+		WRITE_ONCE(rtsp->a, rtsp->a + 1);
+		kmem_cache_free(typesafe_kmem_cachep, rtsp);
+	}
+	return true;
+}
+
+// Unconditionally acquire an explicit in-structure spinlock.
+static bool typesafe_lock_acquire(struct refscale_typesafe *rtsp, unsigned int *start)
+{
+	spin_lock(&rtsp->rts_lock);
+	return true;
+}
+
+// Unconditionally release an explicit in-structure spinlock.
+static bool typesafe_lock_release(struct refscale_typesafe *rtsp, unsigned int start)
+{
+	spin_unlock(&rtsp->rts_lock);
+	return true;
+}
+
+// Unconditionally acquire an explicit in-structure sequence lock.
+static bool typesafe_seqlock_acquire(struct refscale_typesafe *rtsp, unsigned int *start)
+{
+	*start = read_seqbegin(&rtsp->rts_seqlock);
+	return true;
+}
+
+// Conditionally release an explicit in-structure sequence lock.  Return
+// true if this release was successful, that is, if no retry is required.
+static bool typesafe_seqlock_release(struct refscale_typesafe *rtsp, unsigned int start)
+{
+	return !read_seqretry(&rtsp->rts_seqlock, start);
+}
+
+// Do a read-side critical section with the specified delay in
+// microseconds and nanoseconds inserted so as to increase probability
+// of failure.
+static void typesafe_delay_section(const int nloops, const int udl, const int ndl)
+{
+	unsigned int a;
+	unsigned int b;
+	int i;
+	long idx;
+	struct refscale_typesafe *rtsp;
+	unsigned int start;
+
+	for (i = nloops; i >= 0; i--) {
+		preempt_disable();
+		idx = torture_random(this_cpu_ptr(&refscale_rand)) % rtsarray_size;
+		preempt_enable();
+retry:
+		rcu_read_lock();
+		rtsp = rcu_dereference(rtsarray[idx]);
+		a = READ_ONCE(rtsp->a);
+		if (!rts_acquire(rtsp, &start)) {
+			rcu_read_unlock();
+			goto retry;
+		}
+		if (a != READ_ONCE(rtsp->a)) {
+			(void)rts_release(rtsp, start);
+			rcu_read_unlock();
+			goto retry;
+		}
+		un_delay(udl, ndl);
+		// Remember, seqlock read-side release can fail.
+		if (!rts_release(rtsp, start)) {
+			rcu_read_unlock();
+			goto retry;
+		}
+		b = READ_ONCE(rtsp->a);
+		WARN_ONCE(a != b, "Re-read of ->a changed from %u to %u.\n", a, b);
+		b = rtsp->b;
+		rcu_read_unlock();
+		WARN_ON_ONCE(a * a != b);
+	}
+}
+
+// Because the acquisition and release methods are expensive, there
+// is no point in optimizing away the un_delay() function's two checks.
+// Thus simply define typesafe_read_section() as a simple wrapper around
+// typesafe_delay_section().
+static void typesafe_read_section(const int nloops)
+{
+	typesafe_delay_section(nloops, 0, 0);
+}
+
+// Allocate and initialize one refscale_typesafe structure.
+static struct refscale_typesafe *typesafe_alloc_one(void)
+{
+	struct refscale_typesafe *rtsp;
+
+	rtsp = kmem_cache_alloc(typesafe_kmem_cachep, GFP_KERNEL);
+	if (!rtsp)
+		return NULL;
+	atomic_set(&rtsp->rts_refctr, 1);
+	WRITE_ONCE(rtsp->a, rtsp->a + 1);
+	WRITE_ONCE(rtsp->b, rtsp->a * rtsp->a);
+	return rtsp;
+}
+
+// Slab-allocator constructor for refscale_typesafe structures created
+// out of a new slab of system memory.
+static void refscale_typesafe_ctor(void *rtsp_in)
+{
+	struct refscale_typesafe *rtsp = rtsp_in;
+
+	spin_lock_init(&rtsp->rts_lock);
+	seqlock_init(&rtsp->rts_seqlock);
+	preempt_disable();
+	rtsp->a = torture_random(this_cpu_ptr(&refscale_rand));
+	preempt_enable();
+}
+
+static struct ref_scale_ops typesafe_ref_ops;
+static struct ref_scale_ops typesafe_lock_ops;
+static struct ref_scale_ops typesafe_seqlock_ops;
+
+// Initialize for a typesafe test.
+static bool typesafe_init(void)
+{
+	long idx;
+	long si = lookup_instances;
+
+	typesafe_kmem_cachep = kmem_cache_create("refscale_typesafe",
+						 sizeof(struct refscale_typesafe), sizeof(void *),
+						 SLAB_TYPESAFE_BY_RCU, refscale_typesafe_ctor);
+	if (!typesafe_kmem_cachep)
+		return false;
+	if (si < 0)
+		si = -si * nr_cpu_ids;
+	else if (si == 0)
+		si = nr_cpu_ids;
+	rtsarray_size = si;
+	rtsarray = kcalloc(si, sizeof(*rtsarray), GFP_KERNEL);
+	if (!rtsarray)
+		return false;
+	for (idx = 0; idx < rtsarray_size; idx++) {
+		rtsarray[idx] = typesafe_alloc_one();
+		if (!rtsarray[idx])
+			return false;
+	}
+	if (cur_ops == &typesafe_ref_ops) {
+		rts_acquire = typesafe_ref_acquire;
+		rts_release = typesafe_ref_release;
+	} else if (cur_ops == &typesafe_lock_ops) {
+		rts_acquire = typesafe_lock_acquire;
+		rts_release = typesafe_lock_release;
+	} else if (cur_ops == &typesafe_seqlock_ops) {
+		rts_acquire = typesafe_seqlock_acquire;
+		rts_release = typesafe_seqlock_release;
+	} else {
+		WARN_ON_ONCE(1);
+		return false;
+	}
+	return true;
+}
+
+// Clean up after a typesafe test.
+static void typesafe_cleanup(void)
+{
+	long idx;
+
+	if (rtsarray) {
+		for (idx = 0; idx < rtsarray_size; idx++)
+			kmem_cache_free(typesafe_kmem_cachep, rtsarray[idx]);
+		kfree(rtsarray);
+		rtsarray = NULL;
+		rtsarray_size = 0;
+	}
+	kmem_cache_destroy(typesafe_kmem_cachep);
+	typesafe_kmem_cachep = NULL;
+	rts_acquire = NULL;
+	rts_release = NULL;
+}
+
+// The typesafe_init() function distinguishes these structures by address.
+static struct ref_scale_ops typesafe_ref_ops = {
+	.init		= typesafe_init,
+	.cleanup	= typesafe_cleanup,
+	.readsection	= typesafe_read_section,
+	.delaysection	= typesafe_delay_section,
+	.name		= "typesafe_ref"
+};
+
+static struct ref_scale_ops typesafe_lock_ops = {
+	.init		= typesafe_init,
+	.cleanup	= typesafe_cleanup,
+	.readsection	= typesafe_read_section,
+	.delaysection	= typesafe_delay_section,
+	.name		= "typesafe_lock"
+};
+
+static struct ref_scale_ops typesafe_seqlock_ops = {
+	.init		= typesafe_init,
+	.cleanup	= typesafe_cleanup,
+	.readsection	= typesafe_read_section,
+	.delaysection	= typesafe_delay_section,
+	.name		= "typesafe_seqlock"
+};
+
 static void rcu_scale_one_reader(void)
 {
 	if (readdelay <= 0)
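The new typesafe_*_ops measure the cost of the SLAB_TYPESAFE_BY_RCU pattern: with that slab flag, an object's memory remains a valid instance of its type for as long as an RCU reader can still reach it, even though the object itself may be freed and reallocated under the reader, so every lookup must revalidate whatever it found. A condensed sketch of that pattern under assumed names (struct foo, foo_cache, and foo_lookup() are illustrations, not anything from the patch):

/*
 * Condensed sketch of the SLAB_TYPESAFE_BY_RCU lookup pattern exercised above.
 */
struct foo {
	atomic_t refcnt;
	int key;
};

static struct kmem_cache *foo_cache;	/* created with SLAB_TYPESAFE_BY_RCU */

static struct foo *foo_lookup(struct foo __rcu **table, int idx, int key)
{
	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(table[idx]);
	if (p && atomic_inc_not_zero(&p->refcnt)) {	/* conditional acquire */
		if (p->key == key) {			/* revalidate identity */
			rcu_read_unlock();
			return p;			/* caller must drop refcnt */
		}
		if (atomic_dec_and_test(&p->refcnt))
			kmem_cache_free(foo_cache, p);
	}
	rcu_read_unlock();
	return NULL;
}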
@@ -812,6 +1048,7 @@ ref_scale_init(void)
 	static struct ref_scale_ops *scale_ops[] = {
 		&rcu_ops, &srcu_ops, RCU_TRACE_OPS RCU_TASKS_OPS &refcnt_ops, &rwlock_ops,
 		&rwsem_ops, &lock_ops, &lock_irq_ops, &acqrel_ops, &clock_ops,
+		&typesafe_ref_ops, &typesafe_lock_ops, &typesafe_seqlock_ops,
 	};
 
 	if (!torture_init_begin(scale_type, verbose))
@@ -833,7 +1070,10 @@ ref_scale_init(void)
 		goto unwind;
 	}
 	if (cur_ops->init)
-		cur_ops->init();
+		if (!cur_ops->init()) {
+			firsterr = -EUCLEAN;
+			goto unwind;
+		}
 
 	ref_scale_print_module_parms(cur_ops, "Start of test");
 
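With the scale_ops[] additions above, the new flavors are selected like any other refscale test through the scale_type module parameter, using the .name strings typesafe_ref, typesafe_lock, and typesafe_seqlock. An illustrative invocation, using only parameters visible in these hunks (lookup_instances=-1 requests one structure per CPU per the sign convention in typesafe_init()):

	modprobe refscale scale_type=typesafe_seqlock loops=10000 lookup_instances=-1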
@@ -154,7 +154,7 @@ static void init_srcu_struct_data(struct srcu_struct *ssp)
  */
 static inline bool srcu_invl_snp_seq(unsigned long s)
 {
-	return rcu_seq_state(s) == SRCU_SNP_INIT_SEQ;
+	return s == SRCU_SNP_INIT_SEQ;
 }
 
 /*
@@ -469,24 +469,59 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx)
 
 	/*
 	 * If the locks are the same as the unlocks, then there must have
-	 * been no readers on this index at some time in between. This does
-	 * not mean that there are no more readers, as one could have read
-	 * the current index but not have incremented the lock counter yet.
+	 * been no readers on this index at some point in this function.
+	 * But there might be more readers, as a task might have read
+	 * the current ->srcu_idx but not yet have incremented its CPU's
+	 * ->srcu_lock_count[idx] counter.  In fact, it is possible
+	 * that most of the tasks have been preempted between fetching
+	 * ->srcu_idx and incrementing ->srcu_lock_count[idx].  And there
+	 * could be almost (ULONG_MAX / sizeof(struct task_struct)) tasks
+	 * in a system whose address space was fully populated with memory.
+	 * Call this quantity Nt.
 	 *
-	 * So suppose that the updater is preempted here for so long
-	 * that more than ULONG_MAX non-nested readers come and go in
-	 * the meantime.  It turns out that this cannot result in overflow
-	 * because if a reader modifies its unlock count after we read it
-	 * above, then that reader's next load of ->srcu_idx is guaranteed
-	 * to get the new value, which will cause it to operate on the
-	 * other bank of counters, where it cannot contribute to the
-	 * overflow of these counters.  This means that there is a maximum
-	 * of 2*NR_CPUS increments, which cannot overflow given current
-	 * systems, especially not on 64-bit systems.
+	 * So suppose that the updater is preempted at this point in the
+	 * code for a long time.  That now-preempted updater has already
+	 * flipped ->srcu_idx (possibly during the preceding grace period),
+	 * done an smp_mb() (again, possibly during the preceding grace
+	 * period), and summed up the ->srcu_unlock_count[idx] counters.
+	 * How many times can a given one of the aforementioned Nt tasks
+	 * increment the old ->srcu_idx value's ->srcu_lock_count[idx]
+	 * counter, in the absence of nesting?
 	 *
-	 * OK, how about nesting?  This does impose a limit on nesting
-	 * of floor(ULONG_MAX/NR_CPUS/2), which should be sufficient,
-	 * especially on 64-bit systems.
+	 * It can clearly do so once, given that it has already fetched
+	 * the old value of ->srcu_idx and is just about to use that value
+	 * to index its increment of ->srcu_lock_count[idx].  But as soon as
+	 * it leaves that SRCU read-side critical section, it will increment
+	 * ->srcu_unlock_count[idx], which must follow the updater's above
+	 * read from that same value.  Thus, as soon the reading task does
+	 * an smp_mb() and a later fetch from ->srcu_idx, that task will be
+	 * guaranteed to get the new index.  Except that the increment of
+	 * ->srcu_unlock_count[idx] in __srcu_read_unlock() is after the
+	 * smp_mb(), and the fetch from ->srcu_idx in __srcu_read_lock()
+	 * is before the smp_mb().  Thus, that task might not see the new
+	 * value of ->srcu_idx until the -second- __srcu_read_lock(),
+	 * which in turn means that this task might well increment
+	 * ->srcu_lock_count[idx] for the old value of ->srcu_idx twice,
+	 * not just once.
+	 *
+	 * However, it is important to note that a given smp_mb() takes
+	 * effect not just for the task executing it, but also for any
+	 * later task running on that same CPU.
+	 *
+	 * That is, there can be almost Nt + Nc further increments of
+	 * ->srcu_lock_count[idx] for the old index, where Nc is the number
+	 * of CPUs.  But this is OK because the size of the task_struct
+	 * structure limits the value of Nt and current systems limit Nc
+	 * to a few thousand.
+	 *
+	 * OK, but what about nesting?  This does impose a limit on
+	 * nesting of half of the size of the task_struct structure
+	 * (measured in bytes), which should be sufficient.  A late 2022
+	 * TREE01 rcutorture run reported this size to be no less than
+	 * 9408 bytes, allowing up to 4704 levels of nesting, which is
+	 * comfortably beyond excessive.  Especially on 64-bit systems,
+	 * which are unlikely to be configured with an address space fully
+	 * populated with memory, at least not anytime soon.
 	 */
 	return srcu_readers_lock_idx(ssp, idx) == unlocks;
 }
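In concrete terms, the bound sketched in the rewritten comment works out to: nesting limit = sizeof(struct task_struct) / 2 = 9408 / 2 = 4704 levels on the reported TREE01 configuration, and at most roughly Nt + Nc stray increments of the old ->srcu_lock_count[idx] in the non-nested case, both comfortably below the point at which an unsigned long counter could wrap.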
@@ -726,7 +761,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
 	int state;
 
 	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
-		sdp = per_cpu_ptr(ssp->sda, 0);
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = this_cpu_ptr(ssp->sda);
 	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
@@ -837,7 +872,8 @@ static void srcu_gp_end(struct srcu_struct *ssp)
 	/* Initiate callback invocation as needed. */
 	ss_state = smp_load_acquire(&ssp->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_BARRIER) {
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, 0), cbdelay);
+		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, get_boot_cpu_id()),
+					cbdelay);
 	} else {
 		idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
 		srcu_for_each_node_breadth_first(ssp, snp) {
@@ -914,7 +950,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
 	if (snp)
 		for (; snp != NULL; snp = snp->srcu_parent) {
 			sgsne = READ_ONCE(snp->srcu_gp_seq_needed_exp);
-			if (rcu_seq_done(&ssp->srcu_gp_seq, s) ||
+			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) ||
 			    (!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)))
 				return;
 			spin_lock_irqsave_rcu_node(snp, flags);
@@ -941,6 +977,9 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
 *
 * Note that this function also does the work of srcu_funnel_exp_start(),
 * in some cases by directly invoking it.
+ *
+ * The srcu read lock should be hold around this function. And s is a seq snap
+ * after holding that lock.
 */
 static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
				  unsigned long s, bool do_norm)
@@ -961,7 +1000,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 	if (snp_leaf)
 		/* Each pass through the loop does one level of the srcu_node tree. */
 		for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
-			if (rcu_seq_done(&ssp->srcu_gp_seq, s) && snp != snp_leaf)
+			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) && snp != snp_leaf)
 				return; /* GP already done and CBs recorded. */
 			spin_lock_irqsave_rcu_node(snp, flags);
 			snp_seq = snp->srcu_have_cbs[idx];
@@ -998,8 +1037,8 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
 	if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
 		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
 
-	/* If grace period not already done and none in progress, start it. */
-	if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
+	/* If grace period not already in progress, start it. */
+	if (!WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) &&
 	    rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
 		WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
 		srcu_gp_start(ssp);
@@ -1059,10 +1098,11 @@ static void srcu_flip(struct srcu_struct *ssp)
 
 	/*
 	 * Ensure that if the updater misses an __srcu_read_unlock()
-	 * increment, that task's next __srcu_read_lock() will see the
-	 * above counter update.  Note that both this memory barrier
-	 * and the one in srcu_readers_active_idx_check() provide the
-	 * guarantee for __srcu_read_lock().
+	 * increment, that task's __srcu_read_lock() following its next
+	 * __srcu_read_lock() or __srcu_read_unlock() will see the above
+	 * counter update.  Note that both this memory barrier and the
+	 * one in srcu_readers_active_idx_check() provide the guarantee
+	 * for __srcu_read_lock().
 	 */
 	smp_mb(); /* D */  /* Pairs with C. */
 }
@@ -1161,7 +1201,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	idx = __srcu_read_lock_nmisafe(ssp);
 	ss_state = smp_load_acquire(&ssp->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_CALL)
-		sdp = per_cpu_ptr(ssp->sda, 0);
+		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = raw_cpu_ptr(ssp->sda);
 	spin_lock_irqsave_sdp_contention(sdp, &flags);
@@ -1497,7 +1537,7 @@ void srcu_barrier(struct srcu_struct *ssp)
 
 	idx = __srcu_read_lock_nmisafe(ssp);
 	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
-		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, 0));
+		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, get_boot_cpu_id()));
 	else
 		for_each_possible_cpu(cpu)
 			srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, cpu));
@@ -384,6 +384,7 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 {
 	int cpu;
 	unsigned long flags;
+	bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq);
 	long n;
 	long ncbs = 0;
 	long ncbsnz = 0;
@@ -425,21 +426,23 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 			WRITE_ONCE(rtp->percpu_enqueue_shift, order_base_2(nr_cpu_ids));
 			smp_store_release(&rtp->percpu_enqueue_lim, 1);
 			rtp->percpu_dequeue_gpseq = get_state_synchronize_rcu();
+			gpdone = false;
 			pr_info("Starting switch %s to CPU-0 callback queuing.\n", rtp->name);
 		}
 		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
 	}
-	if (rcu_task_cb_adjust && !ncbsnz &&
-	    poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq)) {
+	if (rcu_task_cb_adjust && !ncbsnz && gpdone) {
 		raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
 		if (rtp->percpu_enqueue_lim < rtp->percpu_dequeue_lim) {
 			WRITE_ONCE(rtp->percpu_dequeue_lim, 1);
 			pr_info("Completing switch %s to CPU-0 callback queuing.\n", rtp->name);
 		}
-		for (cpu = rtp->percpu_dequeue_lim; cpu < nr_cpu_ids; cpu++) {
-			struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
+		if (rtp->percpu_dequeue_lim == 1) {
+			for (cpu = rtp->percpu_dequeue_lim; cpu < nr_cpu_ids; cpu++) {
+				struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
-			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rtpcp->cblist));
+				WARN_ON_ONCE(rcu_segcblist_n_cbs(&rtpcp->cblist));
+			}
 		}
 		raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
 	}
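The hunk above replaces an inline poll_state_synchronize_rcu() test with a gpdone snapshot taken at function entry and cleared whenever a new cookie is recorded. The underlying polled grace-period API works as sketched below; every name other than the two RCU functions is illustrative:

/* Sketch of the polled grace-period API used above. */
static unsigned long my_cookie;

static void my_record_gp(void)
{
	/* Capture a cookie representing the current RCU grace-period state. */
	my_cookie = get_state_synchronize_rcu();
}

static bool my_gp_elapsed(void)
{
	/* True once a full RCU grace period has elapsed since my_record_gp(). */
	return poll_state_synchronize_rcu(my_cookie);
}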
@@ -560,8 +563,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 {
 	/* Complain if the scheduler has not started.  */
-	WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
-		  "synchronize_rcu_tasks called too soon");
+	if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
+		      "synchronize_%s() called too soon", rtp->name))
+		return;
 
 	// If the grace-period kthread is running, use it.
 	if (READ_ONCE(rtp->kthread_ptr)) {
@@ -827,11 +831,21 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
 static void rcu_tasks_postscan(struct list_head *hop)
 {
 	/*
-	 * Wait for tasks that are in the process of exiting.  This
-	 * does only part of the job, ensuring that all tasks that were
-	 * previously exiting reach the point where they have disabled
-	 * preemption, allowing the later synchronize_rcu() to finish
-	 * the job.
+	 * Exiting tasks may escape the tasklist scan. Those are vulnerable
+	 * until their final schedule() with TASK_DEAD state. To cope with
+	 * this, divide the fragile exit path part in two intersecting
+	 * read side critical sections:
+	 *
+	 * 1) An _SRCU_ read side starting before calling exit_notify(),
+	 *    which may remove the task from the tasklist, and ending after
+	 *    the final preempt_disable() call in do_exit().
+	 *
+	 * 2) An _RCU_ read side starting with the final preempt_disable()
+	 *    call in do_exit() and ending with the final call to schedule()
+	 *    with TASK_DEAD state.
+	 *
+	 * This handles the part 1). And postgp will handle part 2) with a
+	 * call to synchronize_rcu().
 	 */
 	synchronize_srcu(&tasks_rcu_exit_srcu);
 }
@@ -898,7 +912,10 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
 	 *
 	 * In addition, this synchronize_rcu() waits for exiting tasks
 	 * to complete their final preempt_disable() region of execution,
-	 * cleaning up after the synchronize_srcu() above.
+	 * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
+	 * enforcing the whole region before tasklist removal until
+	 * the final schedule() with TASK_DEAD state to be an RCU TASKS
+	 * read side critical section.
 	 */
 	synchronize_rcu();
 }
@@ -988,27 +1005,42 @@ void show_rcu_tasks_classic_gp_kthread(void)
 EXPORT_SYMBOL_GPL(show_rcu_tasks_classic_gp_kthread);
 #endif // !defined(CONFIG_TINY_RCU)
 
-/* Do the srcu_read_lock() for the above synchronize_srcu(). */
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
 void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
 {
-	preempt_disable();
 	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
-	preempt_enable();
 }
 
-/* Do the srcu_read_unlock() for the above synchronize_srcu(). */
-void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
+void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
 {
 	struct task_struct *t = current;
 
-	preempt_disable();
 	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
-	preempt_enable();
-	exit_tasks_rcu_finish_trace(t);
+}
+
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
+void exit_tasks_rcu_finish(void)
+{
+	exit_tasks_rcu_stop();
+	exit_tasks_rcu_finish_trace(current);
 }
 
 #else /* #ifdef CONFIG_TASKS_RCU */
 void exit_tasks_rcu_start(void) { }
+void exit_tasks_rcu_stop(void) { }
 void exit_tasks_rcu_finish(void) { exit_tasks_rcu_finish_trace(current); }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
@@ -1036,9 +1068,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
 // Wait for one rude RCU-tasks grace period.
 static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
 {
-	if (num_online_cpus() <= 1)
-		return;	// Fastpath for only one CPU.
-
 	rtp->n_ipis += cpumask_weight(cpu_online_mask);
 	schedule_on_each_cpu(rcu_tasks_be_rude);
 }
@@ -1815,23 +1844,21 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
 
 static void rcu_tasks_initiate_self_tests(void)
 {
-	unsigned long j = jiffies;
-
 	pr_info("Running RCU-tasks wait API self tests\n");
 #ifdef CONFIG_TASKS_RCU
-	tests[0].runstart = j;
+	tests[0].runstart = jiffies;
 	synchronize_rcu_tasks();
 	call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
 #endif
 
 #ifdef CONFIG_TASKS_RUDE_RCU
-	tests[1].runstart = j;
+	tests[1].runstart = jiffies;
 	synchronize_rcu_tasks_rude();
 	call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
 #endif
 
 #ifdef CONFIG_TASKS_TRACE_RCU
-	tests[2].runstart = j;
+	tests[2].runstart = jiffies;
 	synchronize_rcu_tasks_trace();
 	call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
 #endif
@@ -246,15 +246,12 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
 EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
 
 #ifdef CONFIG_KASAN_GENERIC
-void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
-	if (head) {
-		void *ptr = (void *) head - (unsigned long) func;
-
+	if (head)
 		kasan_record_aux_stack_noalloc(ptr);
-	}
 
-	__kvfree_call_rcu(head, func);
+	__kvfree_call_rcu(head, ptr);
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 #endif
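Judging by the arithmetic on the deleted line, the old rcu_callback_t argument of this KASAN wrapper did not identify a function at all but carried the byte offset of the rcu_head within the object being freed, which is why the object pointer had to be recovered by subtraction; the new prototype passes that pointer in directly, so no decoding is needed. This reading is inferred from the deleted line rather than spelled out in the hunk itself.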
@@ -144,14 +144,16 @@ static int rcu_scheduler_fully_active __read_mostly;
 
 static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
			       unsigned long gps, unsigned long flags);
-static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
-static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void rcu_report_exp_rdp(struct rcu_data *rdp);
 static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
+static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
+static bool rcu_init_invoked(void);
+static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
+static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
 
 /*
  * rcuc/rcub/rcuop kthread realtime priority. The "rcuop"
@@ -214,27 +216,6 @@ EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
  */
 #define PER_RCU_NODE_PERIOD 3	/* Number of grace periods between delays for debugging. */
 
-/*
- * Compute the mask of online CPUs for the specified rcu_node structure.
- * This will not be stable unless the rcu_node structure's ->lock is
- * held, but the bit corresponding to the current CPU will be stable
- * in most contexts.
- */
-static unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
-{
-	return READ_ONCE(rnp->qsmaskinitnext);
-}
-
-/*
- * Is the CPU corresponding to the specified rcu_data structure online
- * from RCU's perspective?  This perspective is given by that structure's
- * ->qsmaskinitnext field rather than by the global cpu_online_mask.
- */
-static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
-{
-	return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
-}
-
 /*
  * Return true if an RCU grace period is in progress.  The READ_ONCE()s
  * permit this function to be invoked without holding the root rcu_node
@@ -734,46 +715,6 @@ void rcu_request_urgent_qs_task(struct task_struct *t)
 	smp_store_release(per_cpu_ptr(&rcu_data.rcu_urgent_qs, cpu), true);
 }
 
-#if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
-
-/*
- * Is the current CPU online as far as RCU is concerned?
- *
- * Disable preemption to avoid false positives that could otherwise
- * happen due to the current CPU number being sampled, this task being
- * preempted, its old CPU being taken offline, resuming on some other CPU,
- * then determining that its old CPU is now offline.
- *
- * Disable checking if in an NMI handler because we cannot safely
- * report errors from NMI handlers anyway.  In addition, it is OK to use
- * RCU on an offline processor during initial boot, hence the check for
- * rcu_scheduler_fully_active.
- */
-bool rcu_lockdep_current_cpu_online(void)
-{
-	struct rcu_data *rdp;
-	bool ret = false;
-
-	if (in_nmi() || !rcu_scheduler_fully_active)
-		return true;
-	preempt_disable_notrace();
-	rdp = this_cpu_ptr(&rcu_data);
-	/*
-	 * Strictly, we care here about the case where the current CPU is
-	 * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
-	 * not being up to date.  So arch_spin_is_locked() might have a
-	 * false positive if it's held by some *other* CPU, but that's
-	 * OK because that just means a false *negative* on the warning.
-	 */
-	if (rcu_rdp_cpu_online(rdp) || arch_spin_is_locked(&rcu_state.ofl_lock))
-		ret = true;
-	preempt_enable_notrace();
-	return ret;
-}
-EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
-
-#endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
-
 /*
|
/*
|
||||||
* When trying to report a quiescent state on behalf of some other CPU,
|
* When trying to report a quiescent state on behalf of some other CPU,
|
||||||
* it is our responsibility to check for and handle potential overflow
|
* it is our responsibility to check for and handle potential overflow
|
||||||
@@ -925,6 +866,24 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
|
|||||||
rdp->rcu_iw_gp_seq = rnp->gp_seq;
|
rdp->rcu_iw_gp_seq = rnp->gp_seq;
|
||||||
irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
|
irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (rcu_cpu_stall_cputime && rdp->snap_record.gp_seq != rdp->gp_seq) {
|
||||||
|
int cpu = rdp->cpu;
|
||||||
|
struct rcu_snap_record *rsrp;
|
||||||
|
struct kernel_cpustat *kcsp;
|
||||||
|
|
||||||
|
kcsp = &kcpustat_cpu(cpu);
|
||||||
|
|
||||||
|
rsrp = &rdp->snap_record;
|
||||||
|
rsrp->cputime_irq = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
|
||||||
|
rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
|
||||||
|
rsrp->cputime_system = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
|
||||||
|
rsrp->nr_hardirqs = kstat_cpu_irqs_sum(rdp->cpu);
|
||||||
|
rsrp->nr_softirqs = kstat_cpu_softirqs_sum(rdp->cpu);
|
||||||
|
rsrp->nr_csw = nr_context_switches_cpu(rdp->cpu);
|
||||||
|
rsrp->jiffies = jiffies;
|
||||||
|
rsrp->gp_seq = rdp->gp_seq;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
@@ -1350,13 +1309,6 @@ static void rcu_strict_gp_boundary(void *unused)
|
|||||||
invoke_rcu_core();
|
invoke_rcu_core();
|
||||||
}
|
}
|
||||||
|
|
||||||
// Has rcu_init() been invoked? This is used (for example) to determine
|
|
||||||
// whether spinlocks may be acquired safely.
|
|
||||||
static bool rcu_init_invoked(void)
|
|
||||||
{
|
|
||||||
return !!rcu_state.n_online_cpus;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Make the polled API aware of the beginning of a grace period.
|
// Make the polled API aware of the beginning of a grace period.
|
||||||
static void rcu_poll_gp_seq_start(unsigned long *snap)
|
static void rcu_poll_gp_seq_start(unsigned long *snap)
|
||||||
{
|
{
|
||||||
@@ -2091,92 +2043,6 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
 rcu_report_qs_rdp(rdp);
 }

-/*
-* Near the end of the offline process. Trace the fact that this CPU
-* is going offline.
-*/
-int rcutree_dying_cpu(unsigned int cpu)
-{
-bool blkd;
-struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-struct rcu_node *rnp = rdp->mynode;
-
-if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-return 0;
-
-blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
-trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
-return 0;
-}
-
-/*
-* All CPUs for the specified rcu_node structure have gone offline,
-* and all tasks that were preempted within an RCU read-side critical
-* section while running on one of those CPUs have since exited their RCU
-* read-side critical section. Some other CPU is reporting this fact with
-* the specified rcu_node structure's ->lock held and interrupts disabled.
-* This function therefore goes up the tree of rcu_node structures,
-* clearing the corresponding bits in the ->qsmaskinit fields. Note that
-* the leaf rcu_node structure's ->qsmaskinit field has already been
-* updated.
-*
-* This function does check that the specified rcu_node structure has
-* all CPUs offline and no blocked tasks, so it is OK to invoke it
-* prematurely. That said, invoking it after the fact will cost you
-* a needless lock acquisition. So once it has done its work, don't
-* invoke it again.
-*/
-static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
-{
-long mask;
-struct rcu_node *rnp = rnp_leaf;
-
-raw_lockdep_assert_held_rcu_node(rnp_leaf);
-if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) ||
-WARN_ON_ONCE(rnp_leaf->qsmaskinit) ||
-WARN_ON_ONCE(rcu_preempt_has_tasks(rnp_leaf)))
-return;
-for (;;) {
-mask = rnp->grpmask;
-rnp = rnp->parent;
-if (!rnp)
-break;
-raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
-rnp->qsmaskinit &= ~mask;
-/* Between grace periods, so better already be zero! */
-WARN_ON_ONCE(rnp->qsmask);
-if (rnp->qsmaskinit) {
-raw_spin_unlock_rcu_node(rnp);
-/* irqs remain disabled. */
-return;
-}
-raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
-}
-}
-
-/*
-* The CPU has been completely removed, and some other CPU is reporting
-* this fact from process context. Do the remainder of the cleanup.
-* There can only be one CPU hotplug operation at a time, so no need for
-* explicit locking.
-*/
-int rcutree_dead_cpu(unsigned int cpu)
-{
-struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
-
-if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-return 0;
-
-WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-/* Adjust any no-longer-needed kthreads. */
-rcu_boost_kthread_setaffinity(rnp, -1);
-// Stop-machine done, so allow nohz_full to disable tick.
-tick_dep_clear(TICK_DEP_BIT_RCU);
-return 0;
-}
-
 /*
 * Invoke any RCU callbacks that have made it to the end of their grace
 * period. Throttle as specified by rdp->blimit.
@@ -2209,7 +2075,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 */
 rcu_nocb_lock_irqsave(rdp, flags);
 WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
-pending = rcu_segcblist_n_cbs(&rdp->cblist);
+pending = rcu_segcblist_get_seglen(&rdp->cblist, RCU_DONE_TAIL);
 div = READ_ONCE(rcu_divisor);
 div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 bl = max(rdp->blimit, pending >> div);
@@ -2727,10 +2593,11 @@ static void check_cb_ovld(struct rcu_data *rdp)
 }

 static void
-__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
+__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 {
 static atomic_t doublefrees;
 unsigned long flags;
+bool lazy;
 struct rcu_data *rdp;
 bool was_alldone;

@@ -2755,6 +2622,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
 kasan_record_aux_stack_noalloc(head);
 local_irq_save(flags);
 rdp = this_cpu_ptr(&rcu_data);
+lazy = lazy_in && !rcu_async_should_hurry();

 /* Add the callback to our list. */
 if (unlikely(!rcu_segcblist_is_enabled(&rdp->cblist))) {
@@ -2876,13 +2744,15 @@ EXPORT_SYMBOL_GPL(call_rcu);

 /**
 * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
+* @list: List node. All blocks are linked between each other
+* @gp_snap: Snapshot of RCU state for objects placed to this bulk
 * @nr_records: Number of active pointers in the array
-* @next: Next bulk object in the block chain
 * @records: Array of the kvfree_rcu() pointers
 */
 struct kvfree_rcu_bulk_data {
+struct list_head list;
+unsigned long gp_snap;
 unsigned long nr_records;
-struct kvfree_rcu_bulk_data *next;
 void *records[];
 };

@@ -2898,26 +2768,28 @@ struct kvfree_rcu_bulk_data {
 * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
 * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
 * @head_free: List of kfree_rcu() objects waiting for a grace period
-* @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
+* @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
 * @krcp: Pointer to @kfree_rcu_cpu structure
 */

 struct kfree_rcu_cpu_work {
 struct rcu_work rcu_work;
 struct rcu_head *head_free;
-struct kvfree_rcu_bulk_data *bkvhead_free[FREE_N_CHANNELS];
+struct list_head bulk_head_free[FREE_N_CHANNELS];
 struct kfree_rcu_cpu *krcp;
 };

 /**
 * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
 * @head: List of kfree_rcu() objects not yet waiting for a grace period
-* @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
+* @head_gp_snap: Snapshot of RCU state for objects placed to "@head"
+* @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
 * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
 * @lock: Synchronize access to this structure
 * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
 * @initialized: The @rcu_work fields have been initialized
-* @count: Number of objects for which GP not started
+* @head_count: Number of objects in rcu_head singular list
+* @bulk_count: Number of objects in bulk-list
 * @bkvcache:
 * A simple cache list that contains objects for reuse purpose.
 * In order to save some per-cpu space the list is singular.
@@ -2935,13 +2807,20 @@ struct kfree_rcu_cpu_work {
 * the interactions with the slab allocators.
 */
 struct kfree_rcu_cpu {
+// Objects queued on a linked list
+// through their rcu_head structures.
 struct rcu_head *head;
-struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS];
+unsigned long head_gp_snap;
+atomic_t head_count;
+
+// Objects queued on a bulk-list.
+struct list_head bulk_head[FREE_N_CHANNELS];
+atomic_t bulk_count[FREE_N_CHANNELS];
+
 struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 raw_spinlock_t lock;
 struct delayed_work monitor_work;
 bool initialized;
-int count;

 struct delayed_work page_cache_work;
 atomic_t backoff_page_cache_fill;
@@ -3029,82 +2908,51 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
 return freed;
 }

-/*
+static void
-* This function is invoked in workqueue context after a grace period.
+kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
-* It frees all the objects queued on ->bkvhead_free or ->head_free.
+struct kvfree_rcu_bulk_data *bnode, int idx)
-*/
-static void kfree_rcu_work(struct work_struct *work)
 {
 unsigned long flags;
-struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS], *bnext;
+int i;
-struct rcu_head *head, *next;
-struct kfree_rcu_cpu *krcp;
-struct kfree_rcu_cpu_work *krwp;
-int i, j;

-krwp = container_of(to_rcu_work(work),
+debug_rcu_bhead_unqueue(bnode);
-struct kfree_rcu_cpu_work, rcu_work);
-krcp = krwp->krcp;

-raw_spin_lock_irqsave(&krcp->lock, flags);
+rcu_lock_acquire(&rcu_callback_map);
-// Channels 1 and 2.
+if (idx == 0) { // kmalloc() / kfree().
-for (i = 0; i < FREE_N_CHANNELS; i++) {
+trace_rcu_invoke_kfree_bulk_callback(
-bkvhead[i] = krwp->bkvhead_free[i];
+rcu_state.name, bnode->nr_records,
-krwp->bkvhead_free[i] = NULL;
+bnode->records);
-}

-// Channel 3.
+kfree_bulk(bnode->nr_records, bnode->records);
-head = krwp->head_free;
+} else { // vmalloc() / vfree().
-krwp->head_free = NULL;
+for (i = 0; i < bnode->nr_records; i++) {
-raw_spin_unlock_irqrestore(&krcp->lock, flags);
+trace_rcu_invoke_kvfree_callback(
+rcu_state.name, bnode->records[i], 0);

-// Handle the first two channels.
+vfree(bnode->records[i]);
-for (i = 0; i < FREE_N_CHANNELS; i++) {
-for (; bkvhead[i]; bkvhead[i] = bnext) {
-bnext = bkvhead[i]->next;
-debug_rcu_bhead_unqueue(bkvhead[i]);
-
-rcu_lock_acquire(&rcu_callback_map);
-if (i == 0) { // kmalloc() / kfree().
-trace_rcu_invoke_kfree_bulk_callback(
-rcu_state.name, bkvhead[i]->nr_records,
-bkvhead[i]->records);
-
-kfree_bulk(bkvhead[i]->nr_records,
-bkvhead[i]->records);
-} else { // vmalloc() / vfree().
-for (j = 0; j < bkvhead[i]->nr_records; j++) {
-trace_rcu_invoke_kvfree_callback(
-rcu_state.name,
-bkvhead[i]->records[j], 0);
-
-vfree(bkvhead[i]->records[j]);
-}
-}
-rcu_lock_release(&rcu_callback_map);
-
-raw_spin_lock_irqsave(&krcp->lock, flags);
-if (put_cached_bnode(krcp, bkvhead[i]))
-bkvhead[i] = NULL;
-raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-if (bkvhead[i])
-free_page((unsigned long) bkvhead[i]);
-
-cond_resched_tasks_rcu_qs();
 }
 }
+rcu_lock_release(&rcu_callback_map);
+
+raw_spin_lock_irqsave(&krcp->lock, flags);
+if (put_cached_bnode(krcp, bnode))
+bnode = NULL;
+raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+if (bnode)
+free_page((unsigned long) bnode);
+
+cond_resched_tasks_rcu_qs();
+}
+
+static void
+kvfree_rcu_list(struct rcu_head *head)
+{
+struct rcu_head *next;
+
-/*
-* This is used when the "bulk" path can not be used for the
-* double-argument of kvfree_rcu(). This happens when the
-* page-cache is empty, which means that objects are instead
-* queued on a linked list through their rcu_head structures.
-* This list is named "Channel 3".
-*/
 for (; head; head = next) {
-unsigned long offset = (unsigned long)head->func;
+void *ptr = (void *) head->func;
-void *ptr = (void *)head - offset;
+unsigned long offset = (void *) head - ptr;

 next = head->next;
 debug_rcu_head_unqueue((struct rcu_head *)ptr);
@@ -3119,16 +2967,72 @@ static void kfree_rcu_work(struct work_struct *work)
 }
 }

+/*
+* This function is invoked in workqueue context after a grace period.
+* It frees all the objects queued on ->bulk_head_free or ->head_free.
+*/
+static void kfree_rcu_work(struct work_struct *work)
+{
+unsigned long flags;
+struct kvfree_rcu_bulk_data *bnode, *n;
+struct list_head bulk_head[FREE_N_CHANNELS];
+struct rcu_head *head;
+struct kfree_rcu_cpu *krcp;
+struct kfree_rcu_cpu_work *krwp;
+int i;
+
+krwp = container_of(to_rcu_work(work),
+struct kfree_rcu_cpu_work, rcu_work);
+krcp = krwp->krcp;
+
+raw_spin_lock_irqsave(&krcp->lock, flags);
+// Channels 1 and 2.
+for (i = 0; i < FREE_N_CHANNELS; i++)
+list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]);
+
+// Channel 3.
+head = krwp->head_free;
+krwp->head_free = NULL;
+raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+// Handle the first two channels.
+for (i = 0; i < FREE_N_CHANNELS; i++) {
+// Start from the tail page, so a GP is likely passed for it.
+list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
+kvfree_rcu_bulk(krcp, bnode, i);
+}
+
+/*
+* This is used when the "bulk" path can not be used for the
+* double-argument of kvfree_rcu(). This happens when the
+* page-cache is empty, which means that objects are instead
+* queued on a linked list through their rcu_head structures.
+* This list is named "Channel 3".
+*/
+kvfree_rcu_list(head);
+}

 static bool
 need_offload_krc(struct kfree_rcu_cpu *krcp)
 {
 int i;

 for (i = 0; i < FREE_N_CHANNELS; i++)
-if (krcp->bkvhead[i])
+if (!list_empty(&krcp->bulk_head[i]))
 return true;

-return !!krcp->head;
+return !!READ_ONCE(krcp->head);
+}
+
+static int krc_count(struct kfree_rcu_cpu *krcp)
+{
+int sum = atomic_read(&krcp->head_count);
+int i;
+
+for (i = 0; i < FREE_N_CHANNELS; i++)
+sum += atomic_read(&krcp->bulk_count[i]);
+
+return sum;
 }

 static void
@@ -3136,7 +3040,7 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 {
 long delay, delay_left;

-delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
+delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
 if (delayed_work_pending(&krcp->monitor_work)) {
 delay_left = krcp->monitor_work.timer.expires - jiffies;
 if (delay < delay_left)
@@ -3146,6 +3050,44 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 queue_delayed_work(system_wq, &krcp->monitor_work, delay);
 }

+static void
+kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
+{
+struct list_head bulk_ready[FREE_N_CHANNELS];
+struct kvfree_rcu_bulk_data *bnode, *n;
+struct rcu_head *head_ready = NULL;
+unsigned long flags;
+int i;
+
+raw_spin_lock_irqsave(&krcp->lock, flags);
+for (i = 0; i < FREE_N_CHANNELS; i++) {
+INIT_LIST_HEAD(&bulk_ready[i]);
+
+list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
+if (!poll_state_synchronize_rcu(bnode->gp_snap))
+break;
+
+atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
+list_move(&bnode->list, &bulk_ready[i]);
+}
+}
+
+if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) {
+head_ready = krcp->head;
+atomic_set(&krcp->head_count, 0);
+WRITE_ONCE(krcp->head, NULL);
+}
+raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+for (i = 0; i < FREE_N_CHANNELS; i++) {
+list_for_each_entry_safe(bnode, n, &bulk_ready[i], list)
+kvfree_rcu_bulk(krcp, bnode, i);
+}
+
+if (head_ready)
+kvfree_rcu_list(head_ready);
+}
+
 /*
 * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
 */
@@ -3156,26 +3098,31 @@ static void kfree_rcu_monitor(struct work_struct *work)
 unsigned long flags;
 int i, j;

+// Drain ready for reclaim.
+kvfree_rcu_drain_ready(krcp);
+
 raw_spin_lock_irqsave(&krcp->lock, flags);

 // Attempt to start a new batch.
 for (i = 0; i < KFREE_N_BATCHES; i++) {
 struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);

-// Try to detach bkvhead or head and attach it over any
+// Try to detach bulk_head or head and attach it over any
 // available corresponding free channel. It can be that
 // a previous RCU batch is in progress, it means that
 // immediately to queue another one is not possible so
 // in that case the monitor work is rearmed.
-if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
+if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) ||
-(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
+(!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) ||
-(krcp->head && !krwp->head_free)) {
+(READ_ONCE(krcp->head) && !krwp->head_free)) {

 // Channel 1 corresponds to the SLAB-pointer bulk path.
 // Channel 2 corresponds to vmalloc-pointer bulk path.
 for (j = 0; j < FREE_N_CHANNELS; j++) {
-if (!krwp->bkvhead_free[j]) {
+if (list_empty(&krwp->bulk_head_free[j])) {
-krwp->bkvhead_free[j] = krcp->bkvhead[j];
+atomic_set(&krcp->bulk_count[j], 0);
-krcp->bkvhead[j] = NULL;
+list_replace_init(&krcp->bulk_head[j],
+&krwp->bulk_head_free[j]);
 }
 }

@@ -3183,11 +3130,10 @@ static void kfree_rcu_monitor(struct work_struct *work)
 // objects queued on the linked list.
 if (!krwp->head_free) {
 krwp->head_free = krcp->head;
-krcp->head = NULL;
+atomic_set(&krcp->head_count, 0);
+WRITE_ONCE(krcp->head, NULL);
 }

-WRITE_ONCE(krcp->count, 0);
-
 // One work is per one batch, so there are three
 // "free channels", the batch can handle. It can
 // be that the work is in the pending state when
@@ -3197,6 +3143,8 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 }

+raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
 // If there is nothing to detach, it means that our job is
 // successfully done here. In case of having at least one
 // of the channels that is still busy we should rearm the
@@ -3204,8 +3152,6 @@ static void kfree_rcu_monitor(struct work_struct *work)
 // still in progress.
 if (need_offload_krc(krcp))
 schedule_delayed_monitor_work(krcp);
-
-raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }

 static enum hrtimer_restart
@@ -3288,10 +3234,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 return false;

 idx = !!is_vmalloc_addr(ptr);
+bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx],
+struct kvfree_rcu_bulk_data, list);

 /* Check if a new block is required. */
-if (!(*krcp)->bkvhead[idx] ||
+if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) {
-(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
 bnode = get_cached_bnode(*krcp);
 if (!bnode && can_alloc) {
 krc_this_cpu_unlock(*krcp, *flags);
@@ -3315,17 +3262,15 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 if (!bnode)
 return false;

-/* Initialize the new block. */
+// Initialize the new block and attach it.
 bnode->nr_records = 0;
-bnode->next = (*krcp)->bkvhead[idx];
+list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
-
-/* Attach it to the head. */
-(*krcp)->bkvhead[idx] = bnode;
 }

-/* Finally insert. */
+// Finally insert and update the GP for this page.
-(*krcp)->bkvhead[idx]->records
+bnode->records[bnode->nr_records++] = ptr;
-[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
+bnode->gp_snap = get_state_synchronize_rcu();
+atomic_inc(&(*krcp)->bulk_count[idx]);

 return true;
 }
@@ -3342,26 +3287,21 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 * be free'd in workqueue context. This allows us to: batch requests together to
 * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
 */
-void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 {
 unsigned long flags;
 struct kfree_rcu_cpu *krcp;
 bool success;
-void *ptr;

-if (head) {
+/*
-ptr = (void *) head - (unsigned long) func;
+* Please note there is a limitation for the head-less
-} else {
+* variant, that is why there is a clear rule for such
-/*
+* objects: it can be used from might_sleep() context
-* Please note there is a limitation for the head-less
+* only. For other places please embed an rcu_head to
-* variant, that is why there is a clear rule for such
+* your data.
-* objects: it can be used from might_sleep() context
+*/
-* only. For other places please embed an rcu_head to
+if (!head)
-* your data.
-*/
 might_sleep();
-ptr = (unsigned long *) func;
-}

 // Queue the object but don't yet schedule the batch.
 if (debug_rcu_head_queue(ptr)) {
@@ -3382,14 +3322,16 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 // Inline if kvfree_rcu(one_arg) call.
 goto unlock_return;

-head->func = func;
+head->func = ptr;
 head->next = krcp->head;
-krcp->head = head;
+WRITE_ONCE(krcp->head, head);
+atomic_inc(&krcp->head_count);
+
+// Take a snapshot for this krcp.
+krcp->head_gp_snap = get_state_synchronize_rcu();
 success = true;
 }

-WRITE_ONCE(krcp->count, krcp->count + 1);
-
 // Set timer to drain after KFREE_DRAIN_JIFFIES.
 if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
 schedule_delayed_monitor_work(krcp);
@@ -3420,7 +3362,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 for_each_possible_cpu(cpu) {
 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

-count += READ_ONCE(krcp->count);
+count += krc_count(krcp);
 count += READ_ONCE(krcp->nr_bkv_objs);
 atomic_set(&krcp->backoff_page_cache_fill, 1);
 }
@@ -3437,7 +3379,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 int count;
 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

-count = krcp->count;
+count = krc_count(krcp);
 count += drain_page_cache(krcp);
 kfree_rcu_monitor(&krcp->monitor_work.work);

@@ -3461,15 +3403,12 @@ static struct shrinker kfree_rcu_shrinker = {
 void __init kfree_rcu_scheduler_running(void)
 {
 int cpu;
-unsigned long flags;

 for_each_possible_cpu(cpu) {
 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

-raw_spin_lock_irqsave(&krcp->lock, flags);
 if (need_offload_krc(krcp))
 schedule_delayed_monitor_work(krcp);
-raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 }

@@ -3485,9 +3424,10 @@ void __init kfree_rcu_scheduler_running(void)
 */
 static int rcu_blocking_is_gp(void)
 {
-if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
+might_sleep();
 return false;
-might_sleep(); /* Check for RCU read-side critical section. */
+}
 return true;
 }

@@ -3711,7 +3651,9 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full);
 * If @false is returned, it is the caller's responsibility to invoke this
 * function later on until it does return @true. Alternatively, the caller
 * can explicitly wait for a grace period, for example, by passing @oldstate
-* to cond_synchronize_rcu() or by directly invoking synchronize_rcu().
+* to either cond_synchronize_rcu() or cond_synchronize_rcu_expedited()
+* on the one hand or by directly invoking either synchronize_rcu() or
+* synchronize_rcu_expedited() on the other.
 *
 * Yes, this function does not take counter wrap into account.
 * But counter wrap is harmless. If the counter wraps, we have waited for
@@ -3722,6 +3664,12 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full);
 * completed. Alternatively, they can use get_completed_synchronize_rcu()
 * to get a guaranteed-completed grace-period state.
 *
+* In addition, because oldstate compresses the grace-period state for
+* both normal and expedited grace periods into a single unsigned long,
+* it can miss a grace period when synchronize_rcu() runs concurrently
+* with synchronize_rcu_expedited(). If this is unacceptable, please
+* instead use the _full() variant of these polling APIs.
+*
 * This function provides the same memory-ordering guarantees that
 * would be provided by a synchronize_rcu() that was invoked at the call
 * to the function that provided @oldstate, and that returned at the end
@@ -4079,6 +4027,155 @@ retry:
 }
 EXPORT_SYMBOL_GPL(rcu_barrier);

+/*
+* Compute the mask of online CPUs for the specified rcu_node structure.
+* This will not be stable unless the rcu_node structure's ->lock is
+* held, but the bit corresponding to the current CPU will be stable
+* in most contexts.
+*/
+static unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
+{
+return READ_ONCE(rnp->qsmaskinitnext);
+}
+
+/*
+* Is the CPU corresponding to the specified rcu_data structure online
+* from RCU's perspective? This perspective is given by that structure's
+* ->qsmaskinitnext field rather than by the global cpu_online_mask.
+*/
+static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
+{
+return !!(rdp->grpmask & rcu_rnp_online_cpus(rdp->mynode));
+}
+
+#if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
+
+/*
+* Is the current CPU online as far as RCU is concerned?
+*
+* Disable preemption to avoid false positives that could otherwise
+* happen due to the current CPU number being sampled, this task being
+* preempted, its old CPU being taken offline, resuming on some other CPU,
+* then determining that its old CPU is now offline.
+*
+* Disable checking if in an NMI handler because we cannot safely
+* report errors from NMI handlers anyway. In addition, it is OK to use
+* RCU on an offline processor during initial boot, hence the check for
+* rcu_scheduler_fully_active.
+*/
+bool rcu_lockdep_current_cpu_online(void)
+{
+struct rcu_data *rdp;
+bool ret = false;
+
+if (in_nmi() || !rcu_scheduler_fully_active)
+return true;
+preempt_disable_notrace();
+rdp = this_cpu_ptr(&rcu_data);
+/*
+* Strictly, we care here about the case where the current CPU is
+* in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+* not being up to date. So arch_spin_is_locked() might have a
+* false positive if it's held by some *other* CPU, but that's
+* OK because that just means a false *negative* on the warning.
+*/
+if (rcu_rdp_cpu_online(rdp) || arch_spin_is_locked(&rcu_state.ofl_lock))
+ret = true;
+preempt_enable_notrace();
+return ret;
+}
+EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
+
+#endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
+
+// Has rcu_init() been invoked? This is used (for example) to determine
+// whether spinlocks may be acquired safely.
+static bool rcu_init_invoked(void)
+{
+return !!rcu_state.n_online_cpus;
+}
+
+/*
+* Near the end of the offline process. Trace the fact that this CPU
+* is going offline.
+*/
+int rcutree_dying_cpu(unsigned int cpu)
+{
+bool blkd;
+struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+struct rcu_node *rnp = rdp->mynode;
+
+if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+return 0;
+
+blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
+trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
+return 0;
+}
+
+/*
+* All CPUs for the specified rcu_node structure have gone offline,
+* and all tasks that were preempted within an RCU read-side critical
+* section while running on one of those CPUs have since exited their RCU
+* read-side critical section. Some other CPU is reporting this fact with
+* the specified rcu_node structure's ->lock held and interrupts disabled.
+* This function therefore goes up the tree of rcu_node structures,
+* clearing the corresponding bits in the ->qsmaskinit fields. Note that
+* the leaf rcu_node structure's ->qsmaskinit field has already been
+* updated.
+*
+* This function does check that the specified rcu_node structure has
+* all CPUs offline and no blocked tasks, so it is OK to invoke it
+* prematurely. That said, invoking it after the fact will cost you
+* a needless lock acquisition. So once it has done its work, don't
+* invoke it again.
+*/
+static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
+{
+long mask;
+struct rcu_node *rnp = rnp_leaf;
+
+raw_lockdep_assert_held_rcu_node(rnp_leaf);
+if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) ||
+WARN_ON_ONCE(rnp_leaf->qsmaskinit) ||
+WARN_ON_ONCE(rcu_preempt_has_tasks(rnp_leaf)))
+return;
+for (;;) {
+mask = rnp->grpmask;
+rnp = rnp->parent;
+if (!rnp)
+break;
+raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
+rnp->qsmaskinit &= ~mask;
+/* Between grace periods, so better already be zero! */
+WARN_ON_ONCE(rnp->qsmask);
+if (rnp->qsmaskinit) {
+raw_spin_unlock_rcu_node(rnp);
+/* irqs remain disabled. */
+return;
+}
+raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
+}
+}
+
+/*
+* The CPU has been completely removed, and some other CPU is reporting
+* this fact from process context. Do the remainder of the cleanup.
+* There can only be one CPU hotplug operation at a time, so no need for
+* explicit locking.
+*/
+int rcutree_dead_cpu(unsigned int cpu)
+{
+if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+return 0;
+
+WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
+// Stop-machine done, so allow nohz_full to disable tick.
+tick_dep_clear(TICK_DEP_BIT_RCU);
+return 0;
+}
+
 /*
 * Propagate ->qsinitmask bits up the rcu_node tree to account for the
 * first CPU in a given leaf rcu_node structure coming online. The caller
@@ -4408,11 +4505,13 @@ static int rcu_pm_notify(struct notifier_block *self,
 switch (action) {
 case PM_HIBERNATION_PREPARE:
 case PM_SUSPEND_PREPARE:
+rcu_async_hurry();
 rcu_expedite_gp();
 break;
 case PM_POST_HIBERNATION:
 case PM_POST_SUSPEND:
 rcu_unexpedite_gp();
+rcu_async_relax();
 break;
 default:
 break;
@@ -4766,7 +4865,7 @@ struct workqueue_struct *rcu_gp_wq;
 static void __init kfree_rcu_batch_init(void)
 {
 int cpu;
-int i;
+int i, j;

 /* Clamp it to [0:100] seconds interval. */
 if (rcu_delay_page_cache_fill_msec < 0 ||
@@ -4786,8 +4885,14 @@ static void __init kfree_rcu_batch_init(void)
 for (i = 0; i < KFREE_N_BATCHES; i++) {
 INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 krcp->krw_arr[i].krcp = krcp;
+
+for (j = 0; j < FREE_N_CHANNELS; j++)
+INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]);
 }

+for (i = 0; i < FREE_N_CHANNELS; i++)
+INIT_LIST_HEAD(&krcp->bulk_head[i]);
+
 INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
 krcp->initialized = true;
@@ -4838,6 +4943,8 @@ void __init rcu_init(void)
|
|||||||
// Kick-start any polled grace periods that started early.
|
// Kick-start any polled grace periods that started early.
|
||||||
if (!(per_cpu_ptr(&rcu_data, cpu)->mynode->exp_seq_poll_rq & 0x1))
|
if (!(per_cpu_ptr(&rcu_data, cpu)->mynode->exp_seq_poll_rq & 0x1))
|
||||||
(void)start_poll_synchronize_rcu_expedited();
|
(void)start_poll_synchronize_rcu_expedited();
|
||||||
|
|
||||||
|
rcu_test_sync_prims();
|
||||||
}
|
}
|
||||||
|
|
||||||
#include "tree_stall.h"
|
#include "tree_stall.h"
|
||||||
|
@@ -158,6 +158,23 @@ union rcu_noqs {
|
|||||||
u16 s; /* Set of bits, aggregate OR here. */
|
u16 s; /* Set of bits, aggregate OR here. */
|
||||||
};
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Record the snapshot of the core stats at half of the first RCU stall timeout.
|
||||||
|
* The member gp_seq is used to ensure that all members are updated only once
|
||||||
|
* during the sampling period. The snapshot is taken only if this gp_seq is not
|
||||||
|
* equal to rdp->gp_seq.
|
||||||
|
*/
|
||||||
|
struct rcu_snap_record {
|
||||||
|
unsigned long gp_seq; /* Track rdp->gp_seq counter */
|
||||||
|
u64 cputime_irq; /* Accumulated cputime of hard irqs */
|
||||||
|
u64 cputime_softirq;/* Accumulated cputime of soft irqs */
|
||||||
|
u64 cputime_system; /* Accumulated cputime of kernel tasks */
|
||||||
|
unsigned long nr_hardirqs; /* Accumulated number of hard irqs */
|
||||||
|
unsigned int nr_softirqs; /* Accumulated number of soft irqs */
|
||||||
|
unsigned long long nr_csw; /* Accumulated number of task switches */
|
||||||
|
unsigned long jiffies; /* Track jiffies value */
|
||||||
|
};
|
||||||
|
|
||||||
/* Per-CPU data for read-copy update. */
|
/* Per-CPU data for read-copy update. */
|
||||||
struct rcu_data {
|
struct rcu_data {
|
||||||
/* 1) quiescent-state and grace-period handling : */
|
/* 1) quiescent-state and grace-period handling : */
|
||||||
@@ -262,6 +279,8 @@ struct rcu_data {
|
|||||||
short rcu_onl_gp_flags; /* ->gp_flags at last online. */
|
short rcu_onl_gp_flags; /* ->gp_flags at last online. */
|
||||||
unsigned long last_fqs_resched; /* Time of last rcu_resched(). */
|
unsigned long last_fqs_resched; /* Time of last rcu_resched(). */
|
||||||
unsigned long last_sched_clock; /* Jiffies of last rcu_sched_clock_irq(). */
|
unsigned long last_sched_clock; /* Jiffies of last rcu_sched_clock_irq(). */
|
||||||
|
struct rcu_snap_record snap_record; /* Snapshot of core stats at half of */
|
||||||
|
/* the first RCU stall timeout */
|
||||||
|
|
||||||
long lazy_len; /* Length of buffered lazy callbacks. */
|
long lazy_len; /* Length of buffered lazy callbacks. */
|
||||||
int cpu;
|
int cpu;
|
||||||
|
@@ -11,6 +11,7 @@
|
|||||||
|
|
||||||
static void rcu_exp_handler(void *unused);
|
static void rcu_exp_handler(void *unused);
|
||||||
static int rcu_print_task_exp_stall(struct rcu_node *rnp);
|
static int rcu_print_task_exp_stall(struct rcu_node *rnp);
|
||||||
|
static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Record the start of an expedited grace period.
|
* Record the start of an expedited grace period.
|
||||||
@@ -667,8 +668,11 @@ static void synchronize_rcu_expedited_wait(void)
|
|||||||
mask = leaf_node_cpu_bit(rnp, cpu);
|
mask = leaf_node_cpu_bit(rnp, cpu);
|
||||||
if (!(READ_ONCE(rnp->expmask) & mask))
|
if (!(READ_ONCE(rnp->expmask) & mask))
|
||||||
continue;
|
continue;
|
||||||
|
preempt_disable(); // For smp_processor_id() in dump_cpu_task().
|
||||||
dump_cpu_task(cpu);
|
dump_cpu_task(cpu);
|
||||||
|
preempt_enable();
|
||||||
}
|
}
|
||||||
|
rcu_exp_print_detail_task_stall_rnp(rnp);
|
||||||
}
|
}
|
||||||
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
|
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
|
||||||
panic_on_rcu_stall();
|
panic_on_rcu_stall();
|
||||||
@@ -811,6 +815,36 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
|
|||||||
return ndetected;
|
return ndetected;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Scan the current list of tasks blocked within RCU read-side critical
|
||||||
|
* sections, dumping the stack of each that is blocking the current
|
||||||
|
* expedited grace period.
|
||||||
|
*/
|
||||||
|
static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp)
|
||||||
|
{
|
||||||
|
unsigned long flags;
|
||||||
|
struct task_struct *t;
|
||||||
|
|
||||||
|
if (!rcu_exp_stall_task_details)
|
||||||
|
return;
|
||||||
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
|
if (!READ_ONCE(rnp->exp_tasks)) {
|
||||||
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
t = list_entry(rnp->exp_tasks->prev,
|
||||||
|
struct task_struct, rcu_node_entry);
|
||||||
|
list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
|
||||||
|
/*
|
||||||
|
* We could be printing a lot while holding a spinlock.
|
||||||
|
* Avoid triggering hard lockup.
|
||||||
|
*/
|
||||||
|
touch_nmi_watchdog();
|
||||||
|
sched_show_task(t);
|
||||||
|
}
|
||||||
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
|
}
|
||||||
|
|
||||||
#else /* #ifdef CONFIG_PREEMPT_RCU */
|
#else /* #ifdef CONFIG_PREEMPT_RCU */
|
||||||
|
|
||||||
/* Request an expedited quiescent state. */
|
/* Request an expedited quiescent state. */
|
||||||
@@ -883,6 +917,15 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
|
|||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Because preemptible RCU does not exist, we never have to print out
|
||||||
|
* tasks blocked within RCU read-side critical sections that are blocking
|
||||||
|
* the current expedited grace period.
|
||||||
|
*/
|
||||||
|
static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp)
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
|
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@@ -39,7 +39,7 @@ int rcu_exp_jiffies_till_stall_check(void)
 	// CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, so check the allowed range.
 	// The minimum clamped value is "2UL", because at least one full
 	// tick has to be guaranteed.
-	till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 21UL * HZ);
+	till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 300UL * HZ);
 
 	if (cpu_stall_timeout && jiffies_to_msecs(till_stall_check) != cpu_stall_timeout)
 		WRITE_ONCE(rcu_exp_cpu_stall_timeout, jiffies_to_msecs(till_stall_check));
@@ -428,6 +428,35 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
 	return j > 2 * HZ;
 }
 
+static void print_cpu_stat_info(int cpu)
+{
+	struct rcu_snap_record rsr, *rsrp;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);
+
+	if (!rcu_cpu_stall_cputime)
+		return;
+
+	rsrp = &rdp->snap_record;
+	if (rsrp->gp_seq != rdp->gp_seq)
+		return;
+
+	rsr.cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
+	rsr.cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
+	rsr.cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
+
+	pr_err("\t         hardirqs   softirqs   csw/system\n");
+	pr_err("\t number: %8ld %10d %12lld\n",
+		kstat_cpu_irqs_sum(cpu) - rsrp->nr_hardirqs,
+		kstat_cpu_softirqs_sum(cpu) - rsrp->nr_softirqs,
+		nr_context_switches_cpu(cpu) - rsrp->nr_csw);
+	pr_err("\tcputime: %8lld %10lld %12lld   ==> %d(ms)\n",
+		div_u64(rsr.cputime_irq - rsrp->cputime_irq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_softirq - rsrp->cputime_softirq, NSEC_PER_MSEC),
+		div_u64(rsr.cputime_system - rsrp->cputime_system, NSEC_PER_MSEC),
+		jiffies_to_msecs(jiffies - rsrp->jiffies));
+}
+
 /*
  * Print out diagnostic information for the specified stalled CPU.
  *
@@ -484,6 +513,8 @@ static void print_cpu_stall_info(int cpu)
 	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
 	       rcuc_starved ? buf : "",
 	       falsepositive ? " (false positive?)" : "");
+
+	print_cpu_stat_info(cpu);
 }
 
 /* Complain about starvation of grace-period kthread. */
@@ -588,7 +619,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
+	pr_err("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
 	       smp_processor_id(), (long)(jiffies - gps),
 	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 	if (ndetected) {
@@ -649,7 +680,7 @@ static void print_cpu_stall(unsigned long gps)
 	raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
 	for_each_possible_cpu(cpu)
 		totqlen += rcu_get_n_cbs_cpu(cpu);
-	pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
+	pr_err("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
 	       jiffies - gps,
 	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);
 
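print_cpu_stat_info() prints only deltas, so the same counters must have been snapshotted into rdp->snap_record earlier in the grace period. The following is a hedged, illustrative sketch of that snapshot step rather than the actual tree code; my_snap_cpu_stats() is a hypothetical name, while the rcu_snap_record fields are exactly the ones whose deltas are printed above.

    /*
     * Illustrative only: capture the per-CPU counters that
     * print_cpu_stat_info() later subtracts from the current values.
     */
    static void my_snap_cpu_stats(struct rcu_data *rdp, int cpu)
    {
            struct rcu_snap_record *rsrp = &rdp->snap_record;
            struct kernel_cpustat *kcsp = &kcpustat_cpu(cpu);

            rsrp->cputime_irq     = kcpustat_field(kcsp, CPUTIME_IRQ, cpu);
            rsrp->cputime_softirq = kcpustat_field(kcsp, CPUTIME_SOFTIRQ, cpu);
            rsrp->cputime_system  = kcpustat_field(kcsp, CPUTIME_SYSTEM, cpu);
            rsrp->nr_hardirqs = kstat_cpu_irqs_sum(cpu);
            rsrp->nr_softirqs = kstat_cpu_softirqs_sum(cpu);
            rsrp->nr_csw = nr_context_switches_cpu(cpu);
            rsrp->jiffies = jiffies;
            rsrp->gp_seq = rdp->gp_seq;     /* lets the gp_seq check above pass */
    }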
@@ -144,8 +144,45 @@ bool rcu_gp_is_normal(void)
 }
 EXPORT_SYMBOL_GPL(rcu_gp_is_normal);
 
-static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1);
+static atomic_t rcu_async_hurry_nesting = ATOMIC_INIT(1);
+/*
+ * Should call_rcu() callbacks be processed with urgency or are
+ * they OK being executed with arbitrary delays?
+ */
+bool rcu_async_should_hurry(void)
+{
+	return !IS_ENABLED(CONFIG_RCU_LAZY) ||
+	       atomic_read(&rcu_async_hurry_nesting);
+}
+EXPORT_SYMBOL_GPL(rcu_async_should_hurry);
+
+/**
+ * rcu_async_hurry - Make future async RCU callbacks not lazy.
+ *
+ * After a call to this function, future calls to call_rcu()
+ * will be processed in a timely fashion.
+ */
+void rcu_async_hurry(void)
+{
+	if (IS_ENABLED(CONFIG_RCU_LAZY))
+		atomic_inc(&rcu_async_hurry_nesting);
+}
+EXPORT_SYMBOL_GPL(rcu_async_hurry);
+
+/**
+ * rcu_async_relax - Make future async RCU callbacks lazy.
+ *
+ * After a call to this function, future calls to call_rcu()
+ * will be processed in a lazy fashion.
+ */
+void rcu_async_relax(void)
+{
+	if (IS_ENABLED(CONFIG_RCU_LAZY))
+		atomic_dec(&rcu_async_hurry_nesting);
+}
+EXPORT_SYMBOL_GPL(rcu_async_relax);
+
+static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1);
 /*
  * Should normal grace-period primitives be expedited? Intended for
  * use within RCU. Note that this function takes the rcu_expedited
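To make the intent of these hooks concrete, here is a minimal sketch of a caller that brackets a latency-sensitive phase so that callbacks it queues are not subject to CONFIG_RCU_LAZY batching. Everything named my_* is hypothetical; only rcu_async_hurry(), rcu_async_relax(), call_rcu(), and kfree() are real kernel interfaces, and the pairing follows the kernel-doc comments above.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct my_obj {
            struct rcu_head rh;
            /* ... payload ... */
    };

    static void my_obj_free_cb(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct my_obj, rh));
    }

    /*
     * Hypothetical resume-time path: per the kernel-doc above, call_rcu()
     * invocations made while "hurried" are processed in a timely fashion
     * rather than being lazily batched under CONFIG_RCU_LAZY=y.
     */
    static void my_resume_cleanup(struct my_obj *p)
    {
            rcu_async_hurry();
            call_rcu(&p->rh, my_obj_free_cb);
            rcu_async_relax();
    }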
@@ -195,6 +232,7 @@ static bool rcu_boot_ended __read_mostly;
 void rcu_end_inkernel_boot(void)
 {
 	rcu_unexpedite_gp();
+	rcu_async_relax();
 	if (rcu_normal_after_boot)
 		WRITE_ONCE(rcu_normal, 1);
 	rcu_boot_ended = true;
@@ -220,6 +258,7 @@ void rcu_test_sync_prims(void)
 {
 	if (!IS_ENABLED(CONFIG_PROVE_RCU))
 		return;
+	pr_info("Running RCU synchronous self tests\n");
 	synchronize_rcu();
 	synchronize_rcu_expedited();
 }
@@ -508,6 +547,10 @@ int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
 module_param(rcu_cpu_stall_timeout, int, 0644);
 int rcu_exp_cpu_stall_timeout __read_mostly = CONFIG_RCU_EXP_CPU_STALL_TIMEOUT;
 module_param(rcu_exp_cpu_stall_timeout, int, 0644);
+int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
+module_param(rcu_cpu_stall_cputime, int, 0644);
+bool rcu_exp_stall_task_details __read_mostly;
+module_param(rcu_exp_stall_task_details, bool, 0644);
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
 
 // Suppress boot-time RCU CPU stall warnings and rcutorture writer stall
@@ -555,9 +598,12 @@ struct early_boot_kfree_rcu {
 static void early_boot_test_call_rcu(void)
 {
 	static struct rcu_head head;
+	int idx;
 	static struct rcu_head shead;
 	struct early_boot_kfree_rcu *rhp;
 
+	idx = srcu_down_read(&early_srcu);
+	srcu_up_read(&early_srcu, idx);
 	call_rcu(&head, test_callback);
 	early_srcu_cookie = start_poll_synchronize_srcu(&early_srcu);
 	call_srcu(&early_srcu, &shead, test_callback);
@@ -586,6 +632,7 @@ static int rcu_verify_early_boot_tests(void)
 		early_boot_test_counter++;
 		srcu_barrier(&early_srcu);
 		WARN_ON_ONCE(!poll_state_synchronize_srcu(&early_srcu, early_srcu_cookie));
+		cleanup_srcu_struct(&early_srcu);
 	}
 	if (rcu_self_test_counter != early_boot_test_counter) {
 		WARN_ON(1);
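The early-boot test above exercises srcu_down_read()/srcu_up_read(), which, unlike srcu_read_lock()/srcu_read_unlock(), allow an SRCU read-side critical section to end in a different task than the one that began it. A minimal hedged sketch of such a handoff via a workqueue follows; my_srcu, struct my_handoff, my_start_handoff(), and my_handoff_work() are all hypothetical names introduced for illustration.

    #include <linux/slab.h>
    #include <linux/srcu.h>
    #include <linux/workqueue.h>

    DEFINE_STATIC_SRCU(my_srcu);

    struct my_handoff {
            struct work_struct work;
            int idx;                        /* SRCU index handed to the worker */
    };

    static void my_handoff_work(struct work_struct *w)
    {
            struct my_handoff *h = container_of(w, struct my_handoff, work);

            /* ... read-side accesses protected by my_srcu ... */
            srcu_up_read(&my_srcu, h->idx); /* reader ends in this task */
            kfree(h);
    }

    static int my_start_handoff(void)
    {
            struct my_handoff *h = kmalloc(sizeof(*h), GFP_KERNEL);

            if (!h)
                    return -ENOMEM;
            h->idx = srcu_down_read(&my_srcu);      /* reader begins in this task */
            INIT_WORK(&h->work, my_handoff_work);
            schedule_work(&h->work);
            return 0;
    }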
@@ -5342,6 +5342,11 @@ bool single_task_running(void)
 }
 EXPORT_SYMBOL(single_task_running);
 
+unsigned long long nr_context_switches_cpu(int cpu)
+{
+	return cpu_rq(cpu)->nr_switches;
+}
+
 unsigned long long nr_context_switches(void)
 {
 	int i;
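nr_context_switches_cpu() simply exposes the per-runqueue nr_switches counter; the stall-warning code above uses it to compute the csw/system delta. A hedged sketch of the same snapshot-and-delta pattern outside RCU, where my_csw_snap, my_snap_csw(), and my_report_csw() are hypothetical, and the header assumed to declare the helper is the one that already declares nr_context_switches().

    #include <linux/kernel_stat.h>  /* assumed home of the nr_context_switches*() declarations */
    #include <linux/percpu.h>
    #include <linux/printk.h>

    static DEFINE_PER_CPU(unsigned long long, my_csw_snap);

    /* Record the current context-switch count for @cpu at the start of an interval. */
    static void my_snap_csw(int cpu)
    {
            per_cpu(my_csw_snap, cpu) = nr_context_switches_cpu(cpu);
    }

    /* Report how many context switches @cpu performed since the snapshot. */
    static void my_report_csw(int cpu)
    {
            pr_info("cpu %d: %llu context switches since snapshot\n",
                    cpu, nr_context_switches_cpu(cpu) - per_cpu(my_csw_snap, cpu));
    }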
@@ -450,7 +450,7 @@ unsigned long
 torture_random(struct torture_random_state *trsp)
 {
 	if (--trsp->trs_count < 0) {
-		trsp->trs_state += (unsigned long)local_clock();
+		trsp->trs_state += (unsigned long)local_clock() + raw_smp_processor_id();
 		trsp->trs_count = TORTURE_RANDOM_REFRESH;
 	}
 	trsp->trs_state = trsp->trs_state * TORTURE_RANDOM_MULT +
@@ -915,7 +915,7 @@ void torture_kthread_stopping(char *title)
 	VERBOSE_TOROUT_STRING(buf);
 	while (!kthread_should_stop()) {
 		torture_shutdown_absorb(title);
-		schedule_timeout_uninterruptible(1);
+		schedule_timeout_uninterruptible(HZ / 20);
 	}
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);
@@ -10,10 +10,9 @@
 T="`mktemp -d ${TMPDIR-/tmp}/configcheck.sh.XXXXXX`"
 trap 'rm -rf $T' 0
 
-cat $1 > $T/.config
+sed -e 's/"//g' < $1 > $T/.config
 
-cat $2 | sed -e 's/\(.*\)=n/# \1 is not set/' -e 's/^#CHECK#//' |
-grep -v '^CONFIG_INITRAMFS_SOURCE' |
+sed -e 's/"//g' -e 's/\(.*\)=n/# \1 is not set/' -e 's/^#CHECK#//' < $2 |
 awk '
 {
 	print "if grep -q \"" $0 "\" < '"$T/.config"'";
@@ -10,7 +10,7 @@
 #
 # Authors: Paul E. McKenney <paulmck@kernel.org>
 
-egrep 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
+grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
 grep -v 'ODEBUG: ' |
 grep -v 'This means that this is a DEBUG kernel and it is' |
 grep -v 'Warning: unable to open an initial console' |
@@ -44,10 +44,10 @@ fi
 ncpus="`getconf _NPROCESSORS_ONLN`"
 make -j$((2 * ncpus)) $TORTURE_KMAKE_ARG > $resdir/Make.out 2>&1
 retval=$?
-if test $retval -ne 0 || grep "rcu[^/]*": < $resdir/Make.out | egrep -q "Stop|Error|error:|warning:" || egrep -q "Stop|Error|error:" < $resdir/Make.out
+if test $retval -ne 0 || grep "rcu[^/]*": < $resdir/Make.out | grep -E -q "Stop|Error|error:|warning:" || grep -E -q "Stop|Error|error:" < $resdir/Make.out
 then
 	echo Kernel build error
-	egrep "Stop|Error|error:|warning:" < $resdir/Make.out
+	grep -E "Stop|Error|error:|warning:" < $resdir/Make.out
 	echo Run aborted.
 	exit 3
 fi
@@ -32,11 +32,11 @@ for i in ${rundir}/*/Make.out
 do
 	scenariodir="`dirname $i`"
 	scenariobasedir="`echo ${scenariodir} | sed -e 's/\.[0-9]*$//'`"
-	if egrep -q "error:|warning:|^ld: .*undefined reference to" < $i
+	if grep -E -q "error:|warning:|^ld: .*undefined reference to" < $i
 	then
-		egrep "error:|warning:|^ld: .*undefined reference to" < $i > $i.diags
+		grep -E "error:|warning:|^ld: .*undefined reference to" < $i > $i.diags
 		files="$files $i.diags $i"
-	elif ! test -f ${scenariobasedir}/vmlinux && ! test -f "${rundir}/re-run"
+	elif ! test -f ${scenariobasedir}/vmlinux && ! test -f ${scenariobasedir}/vmlinux.xz && ! test -f "${rundir}/re-run"
 	then
 		echo No ${scenariobasedir}/vmlinux file > $i.diags
 		files="$files $i.diags $i"
@@ -186,7 +186,7 @@ do
 		fi
 		;;
 	--kconfig|--kconfigs)
-		checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\)*$' '^error$'
+		checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\|"[^"]*"\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\|"[^"]*"\)\)*$' '^error$'
 		TORTURE_KCONFIG_ARG="`echo "$TORTURE_KCONFIG_ARG $2" | sed -e 's/^ *//' -e 's/ *$//'`"
 		shift
 		;;
@@ -585,7 +585,7 @@ awk < $T/cfgcpu.pack \
 echo kvm-end-run-stats.sh "$resdir/$ds" "$starttime" >> $T/script
 
 # Extract the tests and their batches from the script.
-egrep 'Start batch|Starting build\.' $T/script | grep -v ">>" |
+grep -E 'Start batch|Starting build\.' $T/script | grep -v ">>" |
 	sed -e 's/:.*$//' -e 's/^echo //' -e 's/-ovf//' |
 	awk '
 	/^----Start/ {
@@ -622,7 +622,7 @@ then
 elif test "$dryrun" = sched
 then
 	# Extract the test run schedule from the script.
-	egrep 'Start batch|Starting build\.' $T/script | grep -v ">>" |
+	grep -E 'Start batch|Starting build\.' $T/script | grep -v ">>" |
 		sed -e 's/:.*$//' -e 's/^echo //'
 	nbuilds="`grep 'Starting build\.' $T/script |
 		grep -v ">>" | sed -e 's/:.*$//' -e 's/^echo //' |
@@ -65,7 +65,7 @@ then
 fi
 
 grep --binary-files=text 'torture:.*ver:' $file |
-	egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' |
+	grep -E --binary-files=text -v '\(null\)|rtc: 000000000* ' |
 	sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' |
 	sed -e 's/^.*ver: //' |
 	awk '
@@ -128,17 +128,17 @@ then
 	then
 		summary="$summary Badness: $n_badness"
 	fi
-	n_warn=`grep -v 'Warning: unable to open an initial console' $file | grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr for the init process' | egrep -c 'WARNING:|Warn'`
+	n_warn=`grep -v 'Warning: unable to open an initial console' $file | grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr for the init process' | grep -E -c 'WARNING:|Warn'`
 	if test "$n_warn" -ne 0
 	then
 		summary="$summary Warnings: $n_warn"
 	fi
-	n_bugs=`egrep -c '\bBUG|Oops:' $file`
+	n_bugs=`grep -E -c '\bBUG|Oops:' $file`
 	if test "$n_bugs" -ne 0
 	then
 		summary="$summary Bugs: $n_bugs"
 	fi
-	n_kcsan=`egrep -c 'BUG: KCSAN: ' $file`
+	n_kcsan=`grep -E -c 'BUG: KCSAN: ' $file`
 	if test "$n_kcsan" -ne 0
 	then
 		if test "$n_bugs" = "$n_kcsan"
@@ -158,7 +158,7 @@ then
 	then
 		summary="$summary lockdep: $n_badness"
 	fi
-	n_stalls=`egrep -c 'detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state' $file`
+	n_stalls=`grep -E -c 'detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state' $file`
 	if test "$n_stalls" -ne 0
 	then
 		summary="$summary Stalls: $n_stalls"