42210 Commits

Author SHA1 Message Date
CrazyCat
af8890e8f6 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
    sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
    .llseek = no_llseek,
so it's obviously safe.
2024-12-11 18:00:46 +02:00
CrazyCat
b7236f208b move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
    sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
    sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-12-11 17:53:49 +02:00
Linus Torvalds
f71f64210d Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - swiotlb area sizing fixes (Petr Tesarik)

* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: reduce the number of areas to match actual memory pool size
  swiotlb: always set the number of areas before allocating the pool
2023-07-09 10:24:22 -07:00
Linus Torvalds
a9943ad3dd Merge tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq update from Borislav Petkov:

 - Optimize IRQ domain's name assignment

* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdomain: Use return value of strreplace()
2023-07-09 10:16:04 -07:00
Suren Baghdasaryan
fb49c45532 fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().

We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable.  For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.

A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.

Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults.  But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.

This fix can potentially regress some fork-heavy workloads.  Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08 14:08:02 -07:00
Linus Torvalds
8066178f53 Merge tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix bad git merge of #endif in arm64 code

   A merge of the arm64 tree caused #endif to go into the wrong place

 - Fix crash on lseek of write access to tracefs/error_log

   Opening error_log as write only, and then doing an lseek() causes a
   kernel panic, because the lseek() handle expects a "seq_file" to
   exist (which is not done on write only opens). Use tracing_lseek()
   that tests for this instead of calling the default seq lseek handler.

 - Check for negative instead of -E2BIG for error on strscpy() returns

   Instead of testing for -E2BIG from strscpy(), to be more robust,
   check for less than zero, which will make sure it catches any error
   that strscpy() may someday return.

* tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/boot: Test strscpy() against less than zero for error
  arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=n
  tracing: Fix null pointer dereference in tracing_err_log_open()
2023-07-06 19:07:15 -07:00
Linus Torvalds
7b82e90411 Merge tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
 "These are cleanups for architecture specific header files:

   - the comments in include/linux/syscalls.h have gone out of sync and
     are really pointless, so these get removed

   - The asm/bitsperlong.h header no longer needs to be architecture
     specific on modern compilers, so use a generic version for newer
     architectures that use new enough userspace compilers

   - A cleanup for virt_to_pfn/virt_to_bus to have proper type checking,
     forcing the use of pointers"

* tag 'asm-generic-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  syscalls: Remove file path comments from headers
  tools arch: Remove uapi bitsperlong.h of hexagon and microblaze
  asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
  m68k/mm: Make pfn accessors static inlines
  arm64: memory: Make virt_to_pfn() a static inline
  ARM: mm: Make virt_to_pfn() a static inline
  asm-generic/page.h: Make pfn accessors static inlines
  xen/netback: Pass (void *) to virt_to_page()
  netfs: Pass a pointer to virt_to_page()
  cifs: Pass a pointer to virt_to_page() in cifsglob
  cifs: Pass a pointer to virt_to_page()
  riscv: mm: init: Pass a pointer to virt_to_page()
  ARC: init: Pass a pointer to virt_to_pfn() in init
  m68k: Pass a pointer to virt_to_pfn() virt_to_page()
  fs/proc/kcore.c: Pass a pointer to virt_addr_valid()
2023-07-06 10:06:04 -07:00
Linus Torvalds
6843306689 Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf and wireguard.

  Current release - regressions:

   - nvme-tcp: fix comma-related oops after sendpage changes

  Current release - new code bugs:

   - ptp: make max_phase_adjustment sysfs device attribute invisible
     when not supported

  Previous releases - regressions:

   - sctp: fix potential deadlock on &net->sctp.addr_wq_lock

   - mptcp:
      - ensure subflow is unhashed before cleaning the backlog
      - do not rely on implicit state check in mptcp_listen()

  Previous releases - always broken:

   - net: fix net_dev_start_xmit trace event vs skb_transport_offset()

   - Bluetooth:
      - fix use-bdaddr-property quirk
      - L2CAP: fix multiple UaFs
      - ISO: use hci_sync for setting CIG parameters
      - hci_event: fix Set CIG Parameters error status handling
      - hci_event: fix parsing of CIS Established Event
      - MGMT: fix marking SCAN_RSP as not connectable

   - wireguard: queuing: use saner cpu selection wrapping

   - sched: act_ipt: various bug fixes for iptables <> TC interactions

   - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX

   - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging

   - eth: sfc: fix null-deref in devlink port without MAE access

   - eth: ibmvnic: do not reset dql stats on NON_FATAL err

  Misc:

   - xsk: honor SO_BINDTODEVICE on bind"

* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  nfp: clean mc addresses in application firmware when closing port
  selftests: mptcp: pm_nl_ctl: fix 32-bit support
  selftests: mptcp: depend on SYN_COOKIES
  selftests: mptcp: userspace_pm: report errors with 'remove' tests
  selftests: mptcp: userspace_pm: use correct server port
  selftests: mptcp: sockopt: return error if wrong mark
  selftests: mptcp: sockopt: use 'iptables-legacy' if available
  selftests: mptcp: connect: fail if nft supposed to work
  mptcp: do not rely on implicit state check in mptcp_listen()
  mptcp: ensure subflow is unhashed before cleaning the backlog
  s390/qeth: Fix vipa deletion
  octeontx-af: fix hardware timestamp configuration
  net: dsa: sja1105: always enable the send_meta options
  net: dsa: tag_sja1105: fix MAC DA patching from meta frames
  net: Replace strlcpy with strscpy
  pptp: Fix fib lookup calls.
  mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
  net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
  xsk: Honor SO_BINDTODEVICE on bind
  ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
  ...
2023-07-05 15:44:45 -07:00
Steven Rostedt (Google)
fddca7db4a tracing/boot: Test strscpy() against less than zero for error
Instead of checking for -E2BIG, it is better to just check for less than
zero of strscpy() for error. Testing for -E2BIG is not very robust, and
the calling code does not really care about the error code, just that
there was an error.

One of the updates to convert strlcpy() to strscpy() had a v2 version
that changed the test from testing against -E2BIG to less than zero, but I
took the v1 version that still tested for -E2BIG.

Link: https://lore.kernel.org/linux-trace-kernel/20230615180420.400769-1-azeemshaikh38@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230704100807.707d1605@rorschach.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Azeem Shaikh <azeemshaikh38@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-05 10:30:49 -04:00
Mateusz Stachyra
02b0095e2f tracing: Fix null pointer dereference in tracing_err_log_open()
Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges, therefore this bug
has very little security impact.

Tracefs node: /sys/kernel/tracing/error_log

Example Kernel panic:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Call trace:
 mutex_lock+0x30/0x110
 seq_lseek+0x34/0xb8
 __arm64_sys_lseek+0x6c/0xb8
 invoke_syscall+0x58/0x13c
 el0_svc_common+0xc4/0x10c
 do_el0_svc+0x24/0x98
 el0_svc+0x24/0x88
 el0t_64_sync_handler+0x84/0xe4
 el0t_64_sync+0x1b4/0x1b8
Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
---[ end trace 561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception

Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3

Cc: stable@vger.kernel.org
Fixes: 8a062902be ("tracing: Add tracing error log")
Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-05 09:45:45 -04:00
Linus Torvalds
f196220715 module: fix init_module_from_file() error handling
Vegard Nossum pointed out two different problems with the error handling
in init_module_from_file():

 (a) the idempotent loading code didn't clean up properly in some error
     cases, leaving the on-stack 'struct idempotent' element still in
     the hash table

 (b) failure to read the module file would nonsensically update the
     'invalid_kread_bytes' stat counter with the error value

The first error is quite nasty, in that it can then cause subsequent
idempotent loads of that same file to access stale stack contents of the
previous failure.  The case may not happen in any normal situation
(explaining all the "Tested-by's on the original change), and requires
admin privileges, but syzkaller triggers random bad behavior as a
result:

    BUG: soft lockup in sys_finit_module
    BUG: unable to handle kernel paging request in init_module_from_file
    general protection fault in init_module_from_file
    INFO: task hung in init_module_from_file
    KASAN: out-of-bounds Read in init_module_from_file
    KASAN: slab-out-of-bounds Read in init_module_from_file
    ...

The second error is fairly benign and just leads to nonsensical stats
(and has been around since the debug stats were added).

Vegard also provided a patch for the idempotent loading issue, but I'd
rather re-organize the code and make it more legible using another level
of helper functions than add the usual "goto out" error handling.

Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
Fixes: 9b9879fc03 ("modules: catch concurrent module loads, treat them as idempotent")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-04 10:17:11 -07:00
Linus Torvalds
eded37770c Merge tag 'kgdb-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
 "Fairly small changes this cycle:

   - An additional static inline function when kgdb is not enabled to
     reduce boilerplate in arch files

   - kdb will now handle input with linefeeds more like carriage return.
     This will make little difference for interactive use but can make
     it script to use expect-like interaction with kdb

   - A couple of warning fixes"

* tag 'kgdb-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: move kdb_send_sig() declaration to a better header file
  kdb: Handle LF in the command parser
  kdb: include kdb_private.h for function prototypes
  kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
2023-07-03 15:19:56 -07:00
SeongJae Park
3de4d22cc9 bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
__register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko
file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the
function prints an error message and return an error. As a result, such
modules cannot be loaded.

However, the section could be stripped out during a build process. It would
be better to let the modules loaded, because their basic functionalities
have no problem [0], though the BTF functionalities will not be supported.
Make the function to lower the level of the message from error to warn, and
return no error.

  [0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion

Fixes: c446fdacb1 ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF")
Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com>
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/87y228q66f.fsf@oc8242746057.ibm.com
Link: https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion
Link: https://lore.kernel.org/bpf/20230701171447.56464-1-sj@kernel.org
2023-07-03 18:48:09 +02:00
Daniel Thompson
b6464883f4 kdb: move kdb_send_sig() declaration to a better header file
kdb_send_sig() is defined in the signal code and called from kdb,
but the declaration is part of the kdb internal code.
Move the declaration to the shared header to avoid the warning:

kernel/signal.c:4789:6: error: no previous prototype for 'kdb_send_sig' [-Werror=missing-prototypes]

Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/lkml/20230517125423.930967-1-arnd@kernel.org/
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20230630201206.2396930-1-daniel.thompson@linaro.org
2023-07-03 09:27:12 +01:00
Linus Torvalds
ad2885979e Merge tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Remove the deprecated rule to build *.dtbo from *.dts

 - Refactor section mismatch detection in modpost

 - Fix bogus ARM section mismatch detections

 - Fix error of 'make gtags' with O= option

 - Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error
   with the latest LLVM version

 - Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed

 - Ignore more compiler-generated symbols for kallsyms

 - Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles

 - Enable more kernel-doc warnings with W=2

 - Refactor <linux/export.h> by generating KSYMTAB data by modpost

 - Deprecate <asm/export.h> and <asm-generic/export.h>

 - Remove the EXPORT_DATA_SYMBOL macro

 - Move the check for static EXPORT_SYMBOL back to modpost, which makes
   the build faster

 - Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm

 - Warn missing MODULE_DESCRIPTION when building modules with W=1

 - Make 'make clean' robust against too long argument error

 - Exclude more objects from GCOV to fix CFI failures with GCOV

 - Allow 'make modules_install' to install modules.builtin and
   modules.builtin.modinfo even when CONFIG_MODULES is disabled

 - Include modules.builtin and modules.builtin.modinfo in the
   linux-image Debian package even when CONFIG_MODULES is disabled

 - Revive "Entering directory" logging for the latest Make version

* tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits)
  modpost: define more R_ARM_* for old distributions
  kbuild: revive "Entering directory" for Make >= 4.4.1
  kbuild: set correct abs_srctree and abs_objtree for package builds
  scripts/mksysmap: Ignore prefixed KCFI symbols
  kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb
  kbuild: builddeb: always make modules_install, to install modules.builtin*
  modpost: continue even with unknown relocation type
  modpost: factor out Elf_Sym pointer calculation to section_rel()
  modpost: factor out inst location calculation to section_rel()
  kbuild: Disable GCOV for *.mod.o
  kbuild: Fix CFI failures with GCOV
  kbuild: make clean rule robust against too long argument error
  script: modpost: emit a warning when the description is missing
  kbuild: make modules_install copy modules.builtin(.modinfo)
  linux/export.h: rename 'sec' argument to 'license'
  modpost: show offset from symbol for section mismatch warnings
  modpost: merge two similar section mismatch warnings
  kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion
  modpost: use null string instead of NULL pointer for default namespace
  modpost: squash sym_update_namespace() into sym_add_exported()
  ...
2023-07-01 09:24:31 -07:00
Linus Torvalds
d25f002575 Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dan Williams:
 "The highlights in terms of new functionality are support for the
  standard CXL Performance Monitor definition that appeared in CXL 3.0,
  support for device sanitization (wiping all data from a device),
  secure-erase (re-keying encryption of user data), and support for
  firmware update. The firmware update support is notable as it reuses
  the simple sysfs_upload interface to just cat(1) a blob to a sysfs
  file and pipe that to the device.

  Additionally there are a substantial number of cleanups and
  reorganizations to get ready for RCH error handling (RCH == Restricted
  CXL Host == current shipping hardware generation / pre CXL-2.0
  topologies) and type-2 (accelerator / vendor specific) devices.

  For vendor specific devices they implement a subset of what the
  generic type-3 (generic memory expander) driver expects. As a result
  the rework decouples optional infrastructure from the core driver
  context.

  For RCH topologies, where the specification working group did not want
  to confuse pre-CXL-aware operating systems, many of the standard
  registers are hidden which makes support standard bus features like
  AER (PCIe Advanced Error Reporting) difficult. The rework arranges for
  the driver to help the PCI-AER core. Bjorn is on board with this
  direction but a late regression disocvery means the completion of this
  functionality needs to cook a bit longer, so it is code
  reorganizations only for now.

  Summary:

   - Add infrastructure for supporting background commands along with
     support for device sanitization and firmware update

   - Introduce a CXL performance monitoring unit driver based on the
     common definition in the specification.

   - Land some preparatory cleanup and refactoring for the anticipated
     arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
     topology) error handling.

   - Rework CPU cache management with respect to region configuration
     (device hotplug or other dynamic changes to memory interleaving)

   - Fix region reconfiguration vs CXL decoder ordering rules"

* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits)
  cxl: Fix one kernel-doc comment
  cxl/pci: Use correct flag for sanitize polling
  docs: perf: Minimal introduction the the CXL PMU device and driver
  perf: CXL Performance Monitoring Unit driver
  tools/testing/cxl: add firmware update emulation to CXL memdevs
  tools/testing/cxl: Use named effects for the Command Effect Log
  tools/testing/cxl: Fix command effects for inject/clear poison
  cxl: add a firmware update mechanism using the sysfs firmware loader
  cxl/test: Add Secure Erase opcode support
  cxl/mem: Support Secure Erase
  cxl/test: Add Sanitize opcode support
  cxl/mem: Wire up Sanitization support
  cxl/mbox: Add sanitization handling machinery
  cxl/mem: Introduce security state sysfs file
  cxl/mbox: Allow for IRQ_NONE case in the isr
  Revert "cxl/port: Enable the HDM decoder capability for switch ports"
  cxl/memdev: Formalize endpoint port linkage
  cxl/pci: Unconditionally unmask 256B Flit errors
  cxl/region: Manage decoder target_type at decoder-attach time
  cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
  ...
2023-07-01 08:58:41 -07:00
Christian Brauner
dd546618ba pid: use struct_size_t() helper
Before commit d67790ddf0 ("overflow: Add struct_size_t() helper") only
struct_size() existed, which expects a valid pointer instance containing
the flexible array.

However, when we determine the default struct pid allocation size for
the associated kmem cache of a pid namespace we need to take the nesting
depth of the pid namespace into account without an variable instance
necessarily being available.

In commit b69f0aeb06 ("pid: Replace struct pid 1-element array with
flex-array") we used to handle this the old fashioned way and cast NULL
to a struct pid pointer type. However, we do apparently have a dedicated
struct_size_t() helper for exactly this case. So switch to that.

Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-01 08:26:23 -07:00
Linus Torvalds
f4ce392b03 Merge tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching update from Petr Mladek:

 - Make a variable static to fix a sparse warning

* tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch: Make 'klp_stack_entries' static
2023-06-30 21:40:44 -07:00
Linus Torvalds
d2a6fd45c5 Merge tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:

 - fprobe: Pass return address to the fprobe entry/exit callbacks so
   that the callbacks don't need to analyze pt_regs/stack to find the
   function return address.

 - kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
   flags so that those are not set at once.

 - fprobe events:
      - Add a new fprobe events for tracing arbitrary function entry and
        exit as a trace event.
      - Add a new tracepoint events for tracing raw tracepoint as a
        trace event. This allows user to trace non user-exposed
        tracepoints.
      - Move eprobe's event parser code into probe event common file.
      - Introduce BTF (BPF type format) support to kernel probe (kprobe,
        fprobe and tracepoint probe) events so that user can specify
        traced function arguments by name. This also applies the type of
        argument when fetching the argument.
      - Introduce '$arg*' wildcard support if BTF is available. This
        expands the '$arg*' meta argument to all function argument
        automatically.
      - Check the return value types by BTF. If the function returns
        'void', '$retval' is rejected.
      - Add some selftest script for fprobe events, tracepoint events
        and BTF support.
      - Update documentation about the fprobe events.
      - Some fixes for above features, document and selftests.

 - selftests for ftrace (in addition to the new fprobe events):
      - Add a test case for multiple consecutive probes in a function
        which checks if ftrace based kprobe, optimized kprobe and normal
        kprobe can be defined in the same target function.
      - Add a test case for optimized probe, which checks whether kprobe
        can be optimized or not.

* tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix tracepoint event with $arg* to fetch correct argument
  Documentation: Fix typo of reference file name
  tracing/probes: Fix to return NULL and keep using current argc
  selftests/ftrace: Add new test case which checks for optimized probes
  selftests/ftrace: Add new test case which adds multiple consecutive probes in a function
  Documentation: tracing/probes: Add fprobe event tracing document
  selftests/ftrace: Add BTF arguments test cases
  selftests/ftrace: Add tracepoint probe test case
  tracing/probes: Add BTF retval type support
  tracing/probes: Add $arg* meta argument for all function args
  tracing/probes: Support function parameters if BTF is available
  tracing/probes: Move event parameter fetching code to common parser
  tracing/probes: Add tracepoint support on fprobe_events
  selftests/ftrace: Add fprobe related testcases
  tracing/probes: Add fprobe events for tracing function entry and exit.
  tracing/probes: Avoid setting TPARG_FL_FENTRY and TPARG_FL_RETURN
  fprobe: Pass return address to the handlers
2023-06-30 10:44:53 -07:00
Linus Torvalds
cccf0c2ee5 Merge tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Add new feature to have function graph tracer record the return
   value. Adds a new option: funcgraph-retval ; when set, will show the
   return value of a function in the function graph tracer.

 - Also add the option: funcgraph-retval-hex where if it is not set, and
   the return value is an error code, then it will return the decimal of
   the error code, otherwise it still reports the hex value.

 - Add the file /sys/kernel/tracing/osnoise/per_cpu/cpu<cpu>/timerlat_fd
   That when a application opens it, it becomes the task that the timer
   lat tracer traces. The application can also read this file to find
   out how it's being interrupted.

 - Add the file /sys/kernel/tracing/available_filter_functions_addrs
   that works just the same as available_filter_functions but also shows
   the addresses of the functions like kallsyms, except that it gives
   the address of where the fentry/mcount jump/nop is. This is used by
   BPF to make it easier to attach BPF programs to ftrace hooks.

 - Replace strlcpy with strscpy in the tracing boot code.

* tag 'trace-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warnings when building htmldocs for function graph retval
  riscv: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing/boot: Replace strlcpy with strscpy
  tracing/timerlat: Add user-space interface
  tracing/osnoise: Skip running osnoise if all instances are off
  tracing/osnoise: Switch from PF_NO_SETAFFINITY to migrate_disable
  ftrace: Show all functions with addresses in available_filter_functions_addrs
  selftests/ftrace: Add funcgraph-retval test case
  LoongArch: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL
  tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex
  function_graph: Support recording and printing the return value of function
  fgraph: Add declaration of "struct fgraph_ret_regs"
2023-06-30 10:33:17 -07:00
Linus Torvalds
533925cb76 Merge tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - Support for ACPI

 - Various cleanups to the ISA string parsing, including making them
   case-insensitive

 - Support for the vector extension

 - Support for independent irq/softirq stacks

 - Our CPU DT binding now has "unevaluatedProperties: false"

* tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits)
  riscv: hibernate: remove WARN_ON in save_processor_state
  dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
  dt-bindings: riscv: cpus: add a ref the common cpu schema
  riscv: stack: Add config of thread stack size
  riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
  riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
  RISC-V: always report presence of extensions formerly part of the base ISA
  dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
  RISC-V: remove decrement/increment dance in ISA string parser
  RISC-V: rework comments in ISA string parser
  RISC-V: validate riscv,isa at boot, not during ISA string parsing
  RISC-V: split early & late of_node to hartid mapping
  RISC-V: simplify register width check in ISA string parsing
  perf: RISC-V: Limit the number of counters returned from SBI
  riscv: replace deprecated scall with ecall
  riscv: uprobes: Restore thread.bad_cause
  riscv: mm: try VMA lock-based page fault handling first
  riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
  RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
  RISC-V: Track ISA extensions per hart
  ...
2023-06-30 09:37:26 -07:00
Linus Torvalds
d8b0bd57c2 Merge tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - Extend KCSAN support to 32-bit and BookE. Add some KCSAN annotations

 - Make ELFv2 ABI the default for 64-bit big-endian kernel builds, and
   use the -mprofile-kernel option (kernel specific ftrace ABI) for big
   endian ELFv2 kernels

 - Add initial Dynamic Execution Control Register (DEXCR) support, and
   allow the ROP protection instructions to be used on Power 10

 - Various other small features and fixes

Thanks to Aditya Gupta, Aneesh Kumar K.V, Benjamin Gray, Brian King,
Christophe Leroy, Colin Ian King, Dmitry Torokhov, Gaurav Batra, Jean
Delvare, Joel Stanley, Marco Elver, Masahiro Yamada, Nageswara R Sastry,
Nathan Chancellor, Naveen N Rao, Nayna Jain, Nicholas Piggin, Paul
Gortmaker, Randy Dunlap, Rob Herring, Rohan McLure, Russell Currey,
Sachin Sant, Timothy Pearson, Tom Rix, and Uwe Kleine-König.

* tag 'powerpc-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (76 commits)
  powerpc: remove checks for binutils older than 2.25
  powerpc: Fail build if using recordmcount with binutils v2.37
  powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  powerpc/iommu: Only build sPAPR access functions on pSeries
  powerpc: powernv: Annotate data races in opal events
  powerpc: Mark writes registering ipi to host cpu through kvm and polling
  powerpc: Annotate accesses to ipi message flags
  powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
  powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
  powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
  powerpc: qspinlock: Mark accesses to qnode lock checks
  powerpc/powernv/pci: Remove last IODA1 defines
  powerpc/powernv/pci: Remove MVE code
  powerpc/powernv/pci: Remove ioda1 support
  powerpc: 52xx: Make immr_id DT match tables static
  powerpc: mpc512x: Remove open coded "ranges" parsing
  powerpc: fsl_soc: Use of_range_to_resource() for "ranges" parsing
  powerpc: fsl: Use of_property_read_reg() to parse "reg"
  powerpc: fsl_rio: Use of_range_to_resource() for "ranges" parsing
  macintosh: Use of_property_read_reg() to parse "reg"
  ...
2023-06-30 09:20:08 -07:00
Kees Cook
b69f0aeb06 pid: Replace struct pid 1-element array with flex-array
For pid namespaces, struct pid uses a dynamically sized array member,
"numbers".  This was implemented using the ancient 1-element fake
flexible array, which has been deprecated for decades.

Replace it with a C99 flexible array, refactor the array size
calculations to use struct_size(), and address elements via indexes.
Note that the static initializer (which defines a single element) works
as-is, and requires no special handling.

Without this, CONFIG_UBSAN_BOUNDS (and potentially
CONFIG_FORTIFY_SOURCE) will trigger bounds checks:

  https://lore.kernel.org/lkml/20230517-bushaltestelle-super-e223978c1ba6@brauner

Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reported-by: syzbot+ac3b41786a2d0565b6d5@syzkaller.appspotmail.com
[brauner: dropped unrelated changes and remove 0 with NULL cast]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-30 09:04:01 -07:00
Douglas Anderson
1ed0555850 kdb: Handle LF in the command parser
The main kdb command parser only handles CR (ASCII 13 AKA '\r') today,
but not LF (ASCII 10 AKA '\n'). That means that the kdb command parser
can handle terminals that send just CR or that send CR+LF but can't
handle terminals that send just LF.

The fact that kdb didn't handle LF in the command parser tripped up a
tool I tried to use with it. Specifically, I was trying to send a
command to my device to resume it from kdb using a ChromeOS tool like:
  dut-control cpu_uart_cmd:"g"
That tool only terminates lines with LF, not CR+LF.

Arguably the ChromeOS tool should be fixed. After all, officially kdb
seems to be designed such that CR+LF is the official line ending
transmitted over the wire and that internally a line ending is just
'\n' (LF). Some evidence:
* uart_poll_put_char(), which is used by kdb, notices a '\n' and
  converts it to '\r\n'.
* kdb functions specifically use '\r' to get a carriage return without
  a newline. You can see this in the pager where kdb will write a '\r'
  and then write over the pager prompt.

However, all that being said there's no real harm in accepting LF as a
command terminator in the kdb parser and doing so seems like it would
improve compatibility. After this, I'd expect that things would work
OK-ish with a remote terminal that used any of CR, CR+LF, or LF as a
line ending. Someone using CR as a line ending might get some ugliness
where kdb wasn't able to overwrite the last line, but basic commands
would work. Someone using just LF as a line ending would probably also
work OK.

A few other notes:
- It can be noted that "bash" running on an "agetty" handles LF as a
  line termination with no complaints.
- Historically, kdb's "pager" actually handled either CR or LF fine. A
  very quick inspection would make one think that kdb's pager actually
  could have paged down two lines instead of one for anyone using
  CR+LF, but this is generally avoided because of kdb_input_flush().
- Conceivably one could argue that some of this special case logic
  belongs in uart_poll_get_char() since uart_poll_put_char() handles
  the '\n' => '\r\n' conversion. I would argue that perhaps we should
  eventually do the opposite and move the '\n' => '\r\n' out of
  uart_poll_put_char(). Having that conversion at such a low level
  could interfere if we ever want to transfer binary data. In
  addition, if we truly made uart_poll_get_char() the inverse of
  uart_poll_put_char() it would convert back to '\n' and (ironically)
  kdb's parser currently only looks for '\r' to find the end of a
  command.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230628125612.1.I5cc6c3d916195f5bcfdf5b75d823f2037707f5dc@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2023-06-30 16:44:29 +01:00
Andy Shevchenko
67a4e1a3bf irqdomain: Use return value of strreplace()
Since strreplace() returns the pointer to the string itself, use it
directly.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230628150251.17832-1-andriy.shevchenko@linux.intel.com
2023-06-30 11:13:44 +02:00
Linus Torvalds
1e6d5dea34 Merge tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - swiotlb cleanups (Petr Tesarik)

 - use kvmalloc_array (gaoxu)

 - a small step towards removing is_swiotlb_active (Christoph Hellwig)

 - fix a Kconfig typo Sui Jingfeng)

* tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping:
  drm/nouveau: stop using is_swiotlb_active
  swiotlb: use the atomic counter of total used slabs if available
  swiotlb: remove unused field "used" from struct io_tlb_mem
  dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
  dma-mapping: fix a Kconfig typo
2023-06-29 21:12:20 -07:00
Linus Torvalds
82a2a51055 Merge tag 'sysctl-6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl fix from Luis Chamberlain:
 "A missed minor fix which Matthieu Baerts noted I had not picked up"

* tag 'sysctl-6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  sysctl: fix unused proc_cap_handler() function warning
2023-06-29 17:44:56 -07:00
Linus Torvalds
3ad7b12c72 Merge tag 'trace-v6.4-rc7-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
 "Fix user event write on buffer disabled.

  The user events write currently returns the size of what was supposed
  to be written when tracing is disabled and nothing was written.

  Instead, behave like trace_marker and return -EBADF, as that is what
  is returned if a file is opened for read only, and a write is
  performed on it. Writing to the buffer that is disabled is like trying
  to write to a file opened for read only, as the buffer still can be
  read, but just not written to.

  This also includes test cases for this use case"

* tag 'trace-v6.4-rc7-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  selftests/user_events: Add test cases when event is disabled
  selftests/user_events: Enable the event before write_fault test in ftrace self-test
  tracing/user_events: Fix incorrect return value for writing operation when events are disabled
2023-06-29 17:36:00 -07:00
Linus Torvalds
632f54b4d6 Merge tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - SLAB deprecation:

   Following the discussion at LSF/MM 2023 [1] and no objections, the
   SLAB allocator is deprecated by renaming the config option (to make
   its users notice) to CONFIG_SLAB_DEPRECATED with updated help text.
   SLUB should be used instead. Existing defconfigs with CONFIG_SLAB are
   also updated.

 - SLAB_NO_MERGE kmem_cache flag (Jesper Dangaard Brouer):

   There are (very limited) cases where kmem_cache merging is
   undesirable, and existing ways to prevent it are hacky. Introduce a
   new flag to do that cleanly and convert the existing hacky users.
   Btrfs plans to use this for debug kernel builds (that use case is
   always fine), networking for performance reasons (that should be very
   rare).

 - Replace the usage of weak PRNGs (David Keisar Schmidt):

   In addition to using stronger RNGs for the security related features,
   the code is a bit cleaner.

 - Misc code cleanups (SeongJae Parki, Xiongwei Song, Zhen Lei, and
   zhaoxinchao)

Link: https://lwn.net/Articles/932201/ [1]

* tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab_common: use SLAB_NO_MERGE instead of negative refcount
  mm/slab: break up RCU readers on SLAB_TYPESAFE_BY_RCU example code
  mm/slab: add a missing semicolon on SLAB_TYPESAFE_BY_RCU example code
  mm/slab_common: reduce an if statement in create_cache()
  mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
  mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED
  mm/slab: remove HAVE_HARDENED_USERCOPY_ALLOCATOR
  mm/slab_common: Replace invocation of weak PRNG
  mm/slab: Replace invocation of weak PRNG
  slub: Don't read nr_slabs and total_objects directly
  slub: Remove slabs_node() function
  slub: Remove CONFIG_SMP defined check
  slub: Put objects_show() into CONFIG_SLUB_DEBUG enabled block
  slub: Correct the error code when slab_kset is NULL
  mm/slab: correct return values in comment for _kmem_cache_create()
2023-06-29 16:34:12 -07:00
Arnd Bergmann
554588e8e9 sysctl: fix unused proc_cap_handler() function warning
Since usermodehelper_table() is marked static now, we get a
warning about it being unused when SYSCTL is disabled:

kernel/umh.c:497:12: error: 'proc_cap_handler' defined but not used [-Werror=unused-function]

Just move it inside of the same #ifdef.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Fixes: 861dc0b464 ("sysctl: move umh sysctl registration to its own file")
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
[mcgrof: adjust new commit ID for Fixes tag]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-06-29 15:19:43 -07:00
Arnd Bergmann
0914e4d3cd kdb: include kdb_private.h for function prototypes
The kdb_kbd_cleanup_state() is called from another file through
the kdb_private.h file, but that is not included before the
definition, causing a W=1 warning:

kernel/debug/kdb/kdb_keyboard.c:198:6: error: no previous prototype for 'kdb_kbd_cleanup_state' [-Werror=missing-prototypes]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230517124802.929751-1-arnd@kernel.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2023-06-29 15:10:17 +01:00
Petr Tesarik
8ac0406335 swiotlb: reduce the number of areas to match actual memory pool size
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.

For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.

If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.

Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-06-29 07:10:28 +02:00
Petr Tesarik
aabd12609f swiotlb: always set the number of areas before allocating the pool
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.

While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.

Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-06-29 07:10:00 +02:00
Linus Torvalds
3a8a670eee Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
 "WiFi 7 and sendpage changes are the biggest pieces of work for this
  release. The latter will definitely require fixes but I think that we
  got it to a reasonable point.

  Core:

   - Rework the sendpage & splice implementations

     Instead of feeding data into sockets page by page extend sendmsg
     handlers to support taking a reference on the data, controlled by a
     new flag called MSG_SPLICE_PAGES

     Rework the handling of unexpected-end-of-file to invoke an
     additional callback instead of trying to predict what the right
     combination of MORE/NOTLAST flags is

     Remove the MSG_SENDPAGE_NOTLAST flag completely

   - Implement SCM_PIDFD, a new type of CMSG type analogous to
     SCM_CREDENTIALS, but it contains pidfd instead of plain pid

   - Enable socket busy polling with CONFIG_RT

   - Improve reliability and efficiency of reporting for ref_tracker

   - Auto-generate a user space C library for various Netlink families

  Protocols:

   - Allow TCP to shrink the advertised window when necessary, prevent
     sk_rcvbuf auto-tuning from growing the window all the way up to
     tcp_rmem[2]

   - Use per-VMA locking for "page-flipping" TCP receive zerocopy

   - Prepare TCP for device-to-device data transfers, by making sure
     that payloads are always attached to skbs as page frags

   - Make the backoff time for the first N TCP SYN retransmissions
     linear. Exponential backoff is unnecessarily conservative

   - Create a new MPTCP getsockopt to retrieve all info
     (MPTCP_FULL_INFO)

   - Avoid waking up applications using TLS sockets until we have a full
     record

   - Allow using kernel memory for protocol ioctl callbacks, paving the
     way to issuing ioctls over io_uring

   - Add nolocalbypass option to VxLAN, forcing packets to be fully
     encapsulated even if they are destined for a local IP address

   - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
     in-kernel ECMP implementation (e.g. Open vSwitch) select the same
     link for all packets. Support L4 symmetric hashing in Open vSwitch

   - PPPoE: make number of hash bits configurable

   - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
     (ipconfig)

   - Add layer 2 miss indication and filtering, allowing higher layers
     (e.g. ACL filters) to make forwarding decisions based on whether
     packet matched forwarding state in lower devices (bridge)

   - Support matching on Connectivity Fault Management (CFM) packets

   - Hide the "link becomes ready" IPv6 messages by demoting their
     printk level to debug

   - HSR: don't enable promiscuous mode if device offloads the proto

   - Support active scanning in IEEE 802.15.4

   - Continue work on Multi-Link Operation for WiFi 7

  BPF:

   - Add precision propagation for subprogs and callbacks. This allows
     maintaining verification efficiency when subprograms are used, or
     in fact passing the verifier at all for complex programs,
     especially those using open-coded iterators

   - Improve BPF's {g,s}setsockopt() length handling. Previously BPF
     assumed the length is always equal to the amount of written data.
     But some protos allow passing a NULL buffer to discover what the
     output buffer *should* be, without writing anything

   - Accept dynptr memory as memory arguments passed to helpers

   - Add routing table ID to bpf_fib_lookup BPF helper

   - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

   - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
     maps as read-only)

   - Show target_{obj,btf}_id in tracing link fdinfo

   - Addition of several new kfuncs (most of the names are
     self-explanatory):
      - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
        bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
        and bpf_dynptr_clone().
      - bpf_task_under_cgroup()
      - bpf_sock_destroy() - force closing sockets
      - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs

  Netfilter:

   - Relax set/map validation checks in nf_tables. Allow checking
     presence of an entry in a map without using the value

   - Increase ip_vs_conn_tab_bits range for 64BIT builds

   - Allow updating size of a set

   - Improve NAT tuple selection when connection is closing

  Driver API:

   - Integrate netdev with LED subsystem, to allow configuring HW
     "offloaded" blinking of LEDs based on link state and activity
     (i.e. packets coming in and out)

   - Support configuring rate selection pins of SFP modules

   - Factor Clause 73 auto-negotiation code out of the drivers, provide
     common helper routines

   - Add more fool-proof helpers for managing lifetime of MDIO devices
     associated with the PCS layer

   - Allow drivers to report advanced statistics related to Time Aware
     scheduler offload (taprio)

   - Allow opting out of VF statistics in link dump, to allow more VFs
     to fit into the message

   - Split devlink instance and devlink port operations

  New hardware / drivers:

   - Ethernet:
      - Synopsys EMAC4 IP support (stmmac)
      - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
      - Marvell 88E6250 7 port switches
      - Microchip LAN8650/1 Rev.B0 PHYs
      - MediaTek MT7981/MT7988 built-in 1GE PHY driver

   - WiFi:
      - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
      - Realtek RTL8723DS (SDIO variant)
      - Realtek RTL8851BE

   - CAN:
      - Fintek F81604

  Drivers:

   - Ethernet NICs:
      - Intel (100G, ice):
         - support dynamic interrupt allocation
         - use meta data match instead of VF MAC addr on slow-path
      - nVidia/Mellanox:
         - extend link aggregation to handle 4, rather than just 2 ports
         - spawn sub-functions without any features by default
      - OcteonTX2:
         - support HTB (Tx scheduling/QoS) offload
         - make RSS hash generation configurable
         - support selecting Rx queue using TC filters
      - Wangxun (ngbe/txgbe):
         - add basic Tx/Rx packet offloads
         - add phylink support (SFP/PCS control)
      - Freescale/NXP (enetc):
         - report TAPRIO packet statistics
      - Solarflare/AMD:
         - support matching on IP ToS and UDP source port of outer
           header
         - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
         - add devlink dev info support for EF10

   - Virtual NICs:
      - Microsoft vNIC:
         - size the Rx indirection table based on requested
           configuration
         - support VLAN tagging
      - Amazon vNIC:
         - try to reuse Rx buffers if not fully consumed, useful for ARM
           servers running with 16kB pages
      - Google vNIC:
         - support TCP segmentation of >64kB frames

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - enable USXGMII (88E6191X)
      - Microchip:
         - lan966x: add support for Egress Stage 0 ACL engine
         - lan966x: support mapping packet priority to internal switch
           priority (based on PCP or DSCP)

   - Ethernet PHYs:
      - Broadcom PHYs:
         - support for Wake-on-LAN for BCM54210E/B50212E
         - report LPI counter
      - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
      - Micrel PHYs: receive timestamp in the frame (LAN8841)
      - Realtek PHYs: support optional external PHY clock
      - Altera TSE PCS: merge the driver into Lynx PCS which it is a
        variant of

   - CAN: Kvaser PCIEcan:
      - support packet timestamping

   - WiFi:
      - Intel (iwlwifi):
         - major update for new firmware and Multi-Link Operation (MLO)
         - configuration rework to drop test devices and split the
           different families
         - support for segmented PNVM images and power tables
         - new vendor entries for PPAG (platform antenna gain) feature
      - Qualcomm 802.11ax (ath11k):
         - Multiple Basic Service Set Identifier (MBSSID) and Enhanced
           MBSSID Advertisement (EMA) support in AP mode
         - support factory test mode
      - RealTek (rtw89):
         - add RSSI based antenna diversity
         - support U-NII-4 channels on 5 GHz band
      - RealTek (rtl8xxxu):
         - AP mode support for 8188f
         - support USB RX aggregation for the newer chips"

* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
  net: scm: introduce and use scm_recv_unix helper
  af_unix: Skip SCM_PIDFD if scm->pid is NULL.
  net: lan743x: Simplify comparison
  netlink: Add __sock_i_ino() for __netlink_diag_dump().
  net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
  Revert "af_unix: Call scm_recv() only after scm_set_cred()."
  phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
  libceph: Partially revert changes to support MSG_SPLICE_PAGES
  net: phy: mscc: fix packet loss due to RGMII delays
  net: mana: use vmalloc_array and vcalloc
  net: enetc: use vmalloc_array and vcalloc
  ionic: use vmalloc_array and vcalloc
  pds_core: use vmalloc_array and vcalloc
  gve: use vmalloc_array and vcalloc
  octeon_ep: use vmalloc_array and vcalloc
  net: usb: qmi_wwan: add u-blox 0x1312 composition
  perf trace: fix MSG_SPLICE_PAGES build error
  ipvlan: Fix return value of ipvlan_queue_xmit()
  netfilter: nf_tables: fix underflow in chain reference counter
  netfilter: nf_tables: unbind non-anonymous set if rule construction fails
  ...
2023-06-28 16:43:10 -07:00
Linus Torvalds
6a8cbd9253 Merge tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
 "The changes for sysctl are in line with prior efforts to stop usage of
  deprecated routines which incur recursion and also make it hard to
  remove the empty array element in each sysctl array declaration.

  The most difficult user to modify was parport which required a bit of
  re-thinking of how to declare shared sysctls there, Joel Granados has
  stepped up to the plate to do most of this work and eventual removal
  of register_sysctl_table(). That work ended up saving us about 1465
  bytes according to bloat-o-meter. Since we gained a few bloat-o-meter
  karma points I moved two rather small sysctl arrays from
  kernel/sysctl.c leaving us only two more sysctl arrays to move left.

  Most changes have been tested on linux-next for about a month. The
  last straggler patches are a minor parport fix, changes to the sysctl
  kernel selftest so to verify correctness and prevent regressions for
  the future change he made to provide an alternative solution for the
  special sysctl mount point target which was using the now deprecated
  sysctl child element.

  This is all prep work to now finally be able to remove the empty array
  element in all sysctl declarations / registrations which is expected
  to save us a bit of bytes all over the kernel. That work will be
  tested early after v6.5-rc1 is out"

* tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  sysctl: replace child with an enumeration
  sysctl: Remove debugging dump_stack
  test_sysclt: Test for registering a mount point
  test_sysctl: Add an option to prevent test skip
  test_sysctl: Add an unregister sysctl test
  test_sysctl: Group node sysctl test under one func
  test_sysctl: Fix test metadata getters
  parport: plug a sysctl register leak
  sysctl: move security keys sysctl registration to its own file
  sysctl: move umh sysctl registration to its own file
  signal: move show_unhandled_signals sysctl to its own file
  sysctl: remove empty dev table
  sysctl: Remove register_sysctl_table
  sysctl: Refactor base paths registrations
  sysctl: stop exporting register_sysctl_table
  parport: Removed sysctl related defines
  parport: Remove register_sysctl_table from parport_default_proc_register
  parport: Remove register_sysctl_table from parport_device_proc_register
  parport: Remove register_sysctl_table from parport_proc_register
  parport: Move magic number "15" to a define
2023-06-28 16:05:21 -07:00
Linus Torvalds
4e3c09e954 Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
 "The changes queued up for modules are pretty tame, mostly code removal
  of moving of code.

  Only two minor functional changes are made, the only one which stands
  out is Sebastian Andrzej Siewior's simplification of module reference
  counting by removing preempt_disable() and that has been tested on
  linux-next for well over a month without no regressions.

  I'm now, I guess, also a kitchen sink for some kallsyms changes"

[ There was a mis-communication about the concurrent module load changes
  that I had expected to come through Luis despite me authoring the
  patch. So some of the module updates were left hanging in the email
  ether, and I just committed them separately.

  It's my bad - I should have made it more clear that I expected my
  own patches to come through the module tree too. Now they missed
  linux-next, but hopefully that won't cause any issues    - Linus ]

* tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  kallsyms: make kallsyms_show_value() as generic function
  kallsyms: move kallsyms_show_value() out of kallsyms.c
  kallsyms: remove unsed API lookup_symbol_attrs
  kallsyms: remove unused arch_get_kallsym() helper
  module: Remove preempt_disable() from module reference counting.
2023-06-28 15:51:08 -07:00
Linus Torvalds
9b9879fc03 modules: catch concurrent module loads, treat them as idempotent
This is the new-and-improved attempt at avoiding huge memory load spikes
when the user space boot sequence tries to load hundreds (or even
thousands) of redundant duplicate modules in parallel.

See commit 9828ed3f69 ("module: error out early on concurrent load of
the same module file") for background and an earlier failed attempt that
was reverted.

That earlier attempt just said "concurrently loading the same module is
silly, just open the module file exclusively and return -ETXTBSY if
somebody else is already loading it".

While it is true that concurrent module loads of the same module is
silly, the reason that earlier attempt then failed was that the
concurrently loaded module would often be a prerequisite for another
module.

Thus failing to load the prerequisite would then cause cascading
failures of the other modules, rather than just short-circuiting that
one unnecessary module load.

At the same time, we still really don't want to load the contents of the
same module file hundreds of times, only to then wait for an eventually
successful load, and have everybody else return -EEXIST.

As a result, this takes another approach, and treats concurrent module
loads from the same file as "idempotent" in the inode.  So if one module
load is ongoing, we don't start a new one, but instead just wait for the
first one to complete and return the same return value as it did.

So unlike the first attempt, this does not return early: the intent is
not to speed up the boot, but to avoid a thundering herd problem in
allocating memory (both physical and virtual) for a module more than
once.

Also note that this does change behavior: it used to be that when you
had concurrent loads, you'd have one "winner" that would return success,
and everybody else would return -EEXIST.

In contrast, this idempotent logic goes all Oprah on the problem, and
says "You are a winner! And you are a winner! We are ALL winners".  But
since there's no possible actual real semantic difference between "you
loaded the module" and "somebody else already loaded the module", this
is more of a feel-good change than an actual honest-to-goodness semantic
change.

Of course, any true Johnny-come-latelies that don't get caught in the
concurrency filter will still return -EEXIST.  It's no different from
not even getting a seat at an Oprah taping.  That's life.

See the long thread on the kernel mailing list about this all, which
includes some numbers for memory use before and after the patch.

Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/
Reviewed-by: Johan Hovold <johan@kernel.org>
Tested-by: Johan Hovold <johan@kernel.org>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Rudi Heitbaum <rudi@heitbaum..com>
Tested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-28 15:46:44 -07:00
Linus Torvalds
054a73009c module: split up 'finit_module()' into init_module_from_file() helper
This will simplify the next step, where we can then key off the inode to
do one idempotent module load.

Let's do the obvious re-organization in one step, and then the new code
in another.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-28 15:46:43 -07:00
Linus Torvalds
77b1a7f7a0 Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:

 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
   directories

 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector. It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs

 - Zhen Lei has enhanced kexec's ability to support two crash regions

 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries

 - And the usual bunch of singleton patches in various places

* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  kernel/time/posix-stubs.c: remove duplicated include
  ocfs2: remove redundant assignment to variable bit_off
  watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
  powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
  devres: show which resource was invalid in __devm_ioremap_resource()
  watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
  watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
  watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
  watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
  watchdog/hardlockup: make the config checks more straightforward
  watchdog/hardlockup: sort hardlockup detector related config values a logical way
  watchdog/hardlockup: move SMP barriers from common code to buddy code
  watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  ...
2023-06-28 10:59:38 -07:00
Linus Torvalds
6e17c6de3d Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
sunliming
f6d026eea3 tracing/user_events: Fix incorrect return value for writing operation when events are disabled
The writing operation return the count of writes regardless of whether events
are enabled or disabled. Switch it to return -EBADF to indicates that the event
is disabled.

Link: https://lkml.kernel.org/r/20230626111344.19136-2-sunliming@kylinos.cn

Cc: stable@vger.kernel.org
7f5a08c79d ("user_events: Add minimal support for trace_event into ftrace")
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-06-28 11:00:50 -04:00
Linus Torvalds
582c161cf3 Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "There are three areas of note:

  A bunch of strlcpy()->strscpy() conversions ended up living in my tree
  since they were either Acked by maintainers for me to carry, or got
  ignored for multiple weeks (and were trivial changes).

  The compiler option '-fstrict-flex-arrays=3' has been enabled
  globally, and has been in -next for the entire devel cycle. This
  changes compiler diagnostics (though mainly just -Warray-bounds which
  is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
  coverage. In other words, there are no new restrictions, just
  potentially new warnings. Any new FORTIFY warnings we've seen have
  been fixed (usually in their respective subsystem trees). For more
  details, see commit df8fc4e934.

  The under-development compiler attribute __counted_by has been added
  so that we can start annotating flexible array members with their
  associated structure member that tracks the count of flexible array
  elements at run-time. It is possible (likely?) that the exact syntax
  of the attribute will change before it is finalized, but GCC and Clang
  are working together to sort it out. Any changes can be made to the
  macro while we continue to add annotations.

  As an example of that last case, I have a treewide commit waiting with
  such annotations found via Coccinelle:

    https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b

  Also see commit dd06e72e68 for more details.

  Summary:

   - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)

   - Convert strreplace() to return string start (Andy Shevchenko)

   - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)

   - Add missing function prototypes seen with W=1 (Arnd Bergmann)

   - Fix strscpy() kerndoc typo (Arne Welzel)

   - Replace strlcpy() with strscpy() across many subsystems which were
     either Acked by respective maintainers or were trivial changes that
     went ignored for multiple weeks (Azeem Shaikh)

   - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)

   - Add KUnit tests for strcat()-family

   - Enable KUnit tests of FORTIFY wrappers under UML

   - Add more complete FORTIFY protections for strlcat()

   - Add missed disabling of FORTIFY for all arch purgatories.

   - Enable -fstrict-flex-arrays=3 globally

   - Tightening UBSAN_BOUNDS when using GCC

   - Improve checkpatch to check for strcpy, strncpy, and fake flex
     arrays

   - Improve use of const variables in FORTIFY

   - Add requested struct_size_t() helper for types not pointers

   - Add __counted_by macro for annotating flexible array size members"

* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
  netfilter: ipset: Replace strlcpy with strscpy
  uml: Replace strlcpy with strscpy
  um: Use HOST_DIR for mrproper
  kallsyms: Replace all non-returning strlcpy with strscpy
  sh: Replace all non-returning strlcpy with strscpy
  of/flattree: Replace all non-returning strlcpy with strscpy
  sparc64: Replace all non-returning strlcpy with strscpy
  Hexagon: Replace all non-returning strlcpy with strscpy
  kobject: Use return value of strreplace()
  lib/string_helpers: Change returned value of the strreplace()
  jbd2: Avoid printing outside the boundary of the buffer
  checkpatch: Check for 0-length and 1-element arrays
  riscv/purgatory: Do not use fortified string functions
  s390/purgatory: Do not use fortified string functions
  x86/purgatory: Do not use fortified string functions
  acpi: Replace struct acpi_table_slit 1-element array with flex-array
  clocksource: Replace all non-returning strlcpy with strscpy
  string: use __builtin_memcpy() in strlcpy/strlcat
  staging: most: Replace all non-returning strlcpy with strscpy
  drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
  ...
2023-06-27 21:24:18 -07:00
Linus Torvalds
21953eb16c Merge tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:

 - A SafeSetID patch to correct what appears to be a cut-n-paste typo in
   the code causing a UID to be printed where a GID was desired.

   This is coming via the LSM tree because we haven't been able to get a
   response from the SafeSetID maintainer (Micah Morton) in several
   months. Hopefully we are able to get in touch with Micah, but until
   we do I'm going to pick them up in the LSM tree.

 - A small fix to the reiserfs LSM xattr code.

   We're continuing to work through some issues with the reiserfs code
   as we try to fixup the LSM xattr handling, but in the process we're
   uncovering some ugly problems in reiserfs and we may just end up
   removing the LSM xattr support in reiserfs prior to reiserfs'
   removal.

   For better or worse, this shouldn't impact any of the reiserfs users,
   as we discovered that LSM xattrs on reiserfs were completely broken,
   meaning no one is currently using the combo of reiserfs and a file
   labeling LSM.

 - A tweak to how the cap_user_data_t struct/typedef is declared in the
   header file to appease the Sparse gods.

 - In the process of trying to sort out the SafeSetID lost-maintainer
   problem I realized that I needed to update the labeled networking
   entry to "Supported".

 - Minor comment/documentation and spelling fixes.

* tag 'lsm-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  device_cgroup: Fix kernel-doc warnings in device_cgroup
  SafeSetID: fix UID printed instead of GID
  MAINTAINERS: move labeled networking to "supported"
  capability: erase checker warnings about struct __user_cap_data_struct
  lsm: fix a number of misspellings
  reiserfs: Initialize sec->length in reiserfs_security_init().
  capability: fix kernel-doc warnings in capability.c
2023-06-27 17:24:26 -07:00
Linus Torvalds
cae72026b5 Merge tag 'audit-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
 "A single audit patch that resolves two compiler warnings regarding
  missing function prototypes"

* tag 'audit-pr-20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: avoid missing-prototype warnings
2023-06-27 17:15:35 -07:00
Linus Torvalds
6e2332e0ab Merge tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Whenever cpuset needs to rebuild sched_domain, it walked all tasks
   looking for DEADLINE tasks as they need to be accounted on the new
   domain. Walking all tasks can be expensive and there may not be any
   DEADLINE tasks at all. Task iteration is now omitted if there are no
   DEADLINE tasks

 - Fixes DEADLINE bandwidth misaccounting after task migration failures

 - When no controller is enabled, -Wstringop-overflow warning is
   triggered. The fix patch added an early exit which is too eager and
   got reverted for now. Will fix later

 - Everything else is minor cleanups

* tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "cgroup: Avoid -Wstringop-overflow warnings"
  cgroup/misc: Expose misc.current on cgroup v2 root
  cgroup: Avoid -Wstringop-overflow warnings
  cgroup: remove obsolete comment on cgroup_on_dfl()
  cgroup: remove unused task_cgroup_path()
  cgroup/cpuset: remove unneeded header files
  cgroup: make cgroup_is_threaded() and cgroup_is_thread_root() static
  rdmacg: fix kernel-doc warnings in rdmacg
  cgroup: Replace the css_set call with cgroup_get
  cgroup: remove unused macro for_each_e_css()
  cgroup: Update out-of-date comment in cgroup_migrate()
  cgroup: Replace all non-returning strlcpy with strscpy
  cgroup/cpuset: remove unneeded header files
  cgroup/cpuset: Free DL BW in case can_attach() fails
  sched/deadline: Create DL BW alloc, free & check overflow interface
  cgroup/cpuset: Iterate only if DEADLINE tasks are present
  sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
  sched/cpuset: Bring back cpuset_mutex
  cgroup/cpuset: Rename functions dealing with DEADLINE accounting
2023-06-27 16:54:21 -07:00
Linus Torvalds
7ab044a4f4 Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and
   excluded from concurrency management. Reporting on such work items
   can also be enabled through a config option.

 - Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.

 - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
   superseded by commit afa4bb778e in mainline.

* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
  workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  workqueue: fix enum type for gcc-13
  workqueue: Track and monitor per-workqueue CPU time usage
  workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
  workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
  workqueue: Improve locking rule description for worker fields
  workqueue: Move worker_set/clr_flags() upwards
  workqueue: Re-order struct worker fields
  workqueue: Add pwq->stats[] and a monitoring script
  Further upgrade queue_work_on() comment
2023-06-27 16:32:52 -07:00
Linus Torvalds
a193cc7506 Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Rework & fix the event forwarding logic by extending the core
   interface.

   This fixes AMD PMU events that have to be forwarded from the
   core PMU to the IBS PMU.

 - Add self-tests to test AMD IBS invocation via core PMU events

 - Clean up Intel FixCntrCtl MSR encoding & handling

* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Re-instate the linear PMU search
  perf/x86/intel: Define bit macros for FixCntrCtl MSR
  perf test: Add selftest to test IBS invocation via core pmu events
  perf/core: Remove pmu linear searching code
  perf/ibs: Fix interface via core pmu events
  perf/core: Rework forwarding of {task|cpu}-clock events
2023-06-27 14:43:02 -07:00
Linus Torvalds
bc6cb4d5bc Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

   The cmpxchg128() family of functions is basically & functionally the
   same as cmpxchg_double(), but with a saner interface.

   Instead of a 6-parameter horror that forced u128 - u64/u64-halves
   layout details on the interface and exposed users to complexity,
   fragility & bugs, use a natural 3-parameter interface with u128
   types.

 - Restructure the generated atomic headers, and add kerneldoc comments
   for all of the generic atomic{,64,_long}_t operations.

   The generated definitions are much cleaner now, and come with
   documentation.

 - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
   taking multiple locks of the same type.

   This gets rid of one use of lockdep_set_novalidate_class() in the
   bcache code.

 - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
   shadowing generating garbage code on Clang on certain ARM builds.

* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
  percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
  locking/atomic: treewide: delete arch_atomic_*() kerneldoc
  locking/atomic: docs: Add atomic operations to the driver basic API documentation
  locking/atomic: scripts: generate kerneldoc comments
  docs: scripts: kernel-doc: accept bitwise negation like ~@var
  locking/atomic: scripts: simplify raw_atomic*() definitions
  locking/atomic: scripts: simplify raw_atomic_long*() definitions
  locking/atomic: scripts: split pfx/name/sfx/order
  locking/atomic: scripts: restructure fallback ifdeffery
  locking/atomic: scripts: build raw_atomic_long*() directly
  locking/atomic: treewide: use raw_atomic*_<op>()
  locking/atomic: scripts: add trivial raw_atomic*_<op>()
  locking/atomic: scripts: factor out order template generation
  locking/atomic: scripts: remove leftover "${mult}"
  locking/atomic: scripts: remove bogus order parameter
  locking/atomic: xtensa: add preprocessor symbols
  locking/atomic: x86: add preprocessor symbols
  locking/atomic: sparc: add preprocessor symbols
  locking/atomic: sh: add preprocessor symbols
  ...
2023-06-27 14:14:30 -07:00
Linus Torvalds
ed3b7923a8 Merge tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Scheduler SMP load-balancer improvements:

   - Avoid unnecessary migrations within SMT domains on hybrid systems.

     Problem:

        On hybrid CPU systems, (processors with a mixture of
        higher-frequency SMT cores and lower-frequency non-SMT cores),
        under the old code lower-priority CPUs pulled tasks from the
        higher-priority cores if more than one SMT sibling was busy -
        resulting in many unnecessary task migrations.

     Solution:

        The new code improves the load balancer to recognize SMT cores
        with more than one busy sibling and allows lower-priority CPUs
        to pull tasks, which avoids superfluous migrations and lets
        lower-priority cores inspect all SMT siblings for the busiest
        queue.

   - Implement the 'runnable boosting' feature in the EAS balancer:
     consider CPU contention in frequency, EAS max util & load-balance
     busiest CPU selection.

     This improves CPU utilization for certain workloads, while leaves
     other key workloads unchanged.

  Scheduler infrastructure improvements:

   - Rewrite the scheduler topology setup code by consolidating it into
     the build_sched_topology() helper function and building it
     dynamically on the fly.

   - Resolve the local_clock() vs. noinstr complications by rewriting
     the code: provide separate sched_clock_noinstr() and
     local_clock_noinstr() functions to be used in instrumentation code,
     and make sure it is all instrumentation-safe.

  Fixes:

   - Fix a kthread_park() race with wait_woken()

   - Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
       - Fix UP PREEMPT bug by unifying the SMP and UP implementations
       - Fix task_struct::saved_state handling

   - Fix various rq clock update bugs, unearthed by turning on the rq
     clock debugging code.

   - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
     by creating enough cgroups, by removing the warnign and restricting
     window size triggers to PSI file write-permission or
     CAP_SYS_RESOURCE.

   - Propagate SMT flags in the topology when removing degenerate domain

   - Fix grub_reclaim() calculation bug in the deadline scheduler code

   - Avoid resetting the min update period when it is unnecessary, in
     psi_trigger_destroy().

   - Don't balance a task to its current running CPU in load_balance(),
     which was possible on certain NUMA topologies with overlapping
     groups.

   - Fix the sched-debug printing of rq->nr_uninterruptible

  Cleanups:

   - Address various -Wmissing-prototype warnings, as a preparation to
     (maybe) enable this warning in the future.

   - Remove unused code

   - Mark more functions __init

   - Fix shadow-variable warnings"

* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
  sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
  sched/core: Fixed missing rq clock update before calling set_rq_offline()
  sched/deadline: Update GRUB description in the documentation
  sched/deadline: Fix bandwidth reclaim equation in GRUB
  sched/wait: Fix a kthread_park race with wait_woken()
  sched/topology: Mark set_sched_topology() __init
  sched/fair: Rename variable cpu_util eff_util
  arm64/arch_timer: Fix MMIO byteswap
  sched/fair, cpufreq: Introduce 'runnable boosting'
  sched/fair: Refactor CPU utilization functions
  cpuidle: Use local_clock_noinstr()
  sched/clock: Provide local_clock_noinstr()
  x86/tsc: Provide sched_clock_noinstr()
  clocksource: hyper-v: Provide noinstr sched_clock()
  clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  x86/vdso: Fix gettimeofday masking
  math64: Always inline u128 version of mul_u64_u64_shr()
  s390/time: Provide sched_clock_noinstr()
  loongarch: Provide noinstr sched_clock_read()
  ...
2023-06-27 14:03:21 -07:00
Linus Torvalds
af96134dc8 Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
 "Documentation updates

  Miscellaneous fixes, perhaps most notably:

   - Remove RCU_NONIDLE(). The new visibility of most of the idle loop
     to RCU has obsoleted this API.

   - Make the RCU_SOFTIRQ callback-invocation time limit also apply to
     the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.

   - Add a jiffies-based callback-invocation time limit to handle
     long-running callbacks. (The local_clock() function is only invoked
     once per 32 callbacks due to its high overhead.)

   - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
     fixes a bug that can occur on systems with non-contiguous CPU
     numbering.

  kvfree_rcu updates:

   - Eliminate the single-argument variant of k[v]free_rcu() now that
     all uses have been converted to k[v]free_rcu_mightsleep().

   - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
     soon. Yes, this is closing the barn door after the horse has
     escaped, but Murphy says that there will be more horses.

  Callback-offloading updates:

   - Fix a number of bugs involving the shrinker and lazy callbacks.

  Tasks RCU updates

  Torture-test updates"

* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
  torture: Remove duplicated argument -enable-kvm for ppc64
  doc/rcutorture: Add description of rcutorture.stall_cpu_block
  rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
  rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
  rcutorture: Correct name of use_softirq module parameter
  locktorture: Add long_hold to adjust lock-hold delays
  rcu/nocb: Make shrinker iterate only over NOCB CPUs
  rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
  rcu: Make rcu_cpu_starting() rely on interrupts being disabled
  rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
  rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
  rcu: Employ jiffies-based backstop to callback time limit
  rcu: Check callback-invocation time limit for rcuc kthreads
  rcu: Remove RCU_NONIDLE()
  rcu: Add more RCU files to kernel-api.rst
  rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
  rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
  rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
  rcu/nocb: Fix shrinker race against callback enqueuer
  rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
  ...
2023-06-27 10:37:01 -07:00