From 6a25212dc35897e36f83860df3cefcdf5fb5c189 Mon Sep 17 00:00:00 2001 From: Prathu Baronia Date: Tue, 2 May 2023 14:32:40 +0530 Subject: [PATCH 01/72] kthread: fix spelling typo and grammar in comments - `If present` -> `If present,' - `reuturn` -> `return` - `function exit safely` -> `function to exit safely` Link: https://lkml.kernel.org/r/20230502090242.3037194-1-quic_pbaronia@quicinc.com Signed-off-by: Prathu Baronia Cc: Eric W. Biederman Cc: Jason A. Donenfeld Cc: Kees Cook Cc: Petr Mladek Cc: Sami Tolvanen Cc: Tejun Heo Cc: Zqiang Signed-off-by: Andrew Morton --- kernel/kthread.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index 490792b1066e..08455fb48ba6 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -312,10 +312,10 @@ void __noreturn kthread_exit(long result) * @comp: Completion to complete * @code: The integer value to return to kthread_stop(). * - * If present complete @comp and the reuturn code to kthread_stop(). + * If present, complete @comp and then return code to kthread_stop(). * * A kernel thread whose module may be removed after the completion of - * @comp can use this function exit safely. + * @comp can use this function to exit safely. * * Does not return. */ From 35a609a82c17b460fde063f8f4a6514441360cfd Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 27 Apr 2023 11:28:35 +0100 Subject: [PATCH 02/72] scripts/spelling.txt: add more spellings to spelling.txt Some of the more common spelling mistakes and typos that I've found while fixing up spelling mistakes in the kernel over the past couple of releases. Link: https://lkml.kernel.org/r/20230427102835.83482-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King Signed-off-by: Andrew Morton --- scripts/spelling.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/scripts/spelling.txt b/scripts/spelling.txt index f8bd6178d17b..fc7ba95e86a0 100644 --- a/scripts/spelling.txt +++ b/scripts/spelling.txt @@ -155,6 +155,7 @@ aquired||acquired aquisition||acquisition arbitary||arbitrary architechture||architecture +archtecture||architecture arguement||argument arguements||arguments arithmatic||arithmetic @@ -279,6 +280,7 @@ cant'||can't canot||cannot cann't||can't cannnot||cannot +capabiity||capability capabilites||capabilities capabilties||capabilities capabilty||capability @@ -426,6 +428,7 @@ cotrol||control cound||could couter||counter coutner||counter +creationg||creating cryptocraphic||cryptographic cummulative||cumulative cunter||counter @@ -492,6 +495,7 @@ destorys||destroys destroied||destroyed detabase||database deteced||detected +detecion||detection detectt||detect detroyed||destroyed develope||develop @@ -513,6 +517,7 @@ diferent||different differrence||difference diffrent||different differenciate||differentiate +diffreential||differential diffrentiate||differentiate difinition||definition digial||digital @@ -617,6 +622,7 @@ evalute||evaluate evalutes||evaluates evalution||evaluation excecutable||executable +excceed||exceed exceded||exceeded exceds||exceeds exceeed||exceed @@ -632,6 +638,7 @@ existant||existent exixt||exist exsits||exists exlcude||exclude +exlcuding||excluding exlcusive||exclusive exlusive||exclusive exmaple||example @@ -726,6 +733,8 @@ generiously||generously genereate||generate genereted||generated genric||generic +gerenal||general +geting||getting globel||global grabing||grabbing grahical||graphical @@ -899,6 +908,7 @@ iteraions||iterations iternations||iterations itertation||iteration itslef||itself +ivalid||invalid jave||java jeffies||jiffies jumpimng||jumping @@ -977,6 +987,7 @@ microprocesspr||microprocessor migrateable||migratable millenium||millennium milliseonds||milliseconds +minimim||minimum minium||minimum minimam||minimum minimun||minimum @@ -1042,6 +1053,7 @@ notifed||notified notity||notify nubmer||number numebr||number +numer||number numner||number nunber||number obtaion||obtain @@ -1061,6 +1073,7 @@ offet||offset offlaod||offload offloded||offloaded offseting||offsetting +oflload||offload omited||omitted omiting||omitting omitt||omit @@ -1105,6 +1118,7 @@ pakage||package paket||packet pallette||palette paln||plan +palne||plane paramameters||parameters paramaters||parameters paramater||parameter @@ -1181,12 +1195,14 @@ previsously||previously primative||primitive princliple||principle priorty||priority +priting||printing privilaged||privileged privilage||privilege priviledge||privilege priviledges||privileges privleges||privileges probaly||probably +probabalistic||probabilistic procceed||proceed proccesors||processors procesed||processed @@ -1460,6 +1476,7 @@ submited||submitted submition||submission succeded||succeeded suceed||succeed +succesfuly||successfully succesfully||successfully succesful||successful successed||succeeded @@ -1503,6 +1520,7 @@ symetric||symmetric synax||syntax synchonized||synchronized sychronization||synchronization +sychronously||synchronously synchronuously||synchronously syncronize||synchronize syncronized||synchronized @@ -1532,6 +1550,7 @@ threee||three threshhold||threshold thresold||threshold throught||through +tansition||transition trackling||tracking troughput||throughput trys||tries @@ -1611,6 +1630,7 @@ unneccessary||unnecessary unnecesary||unnecessary unneedingly||unnecessarily unnsupported||unsupported +unuspported||unsupported unmached||unmatched unprecise||imprecise unpriviledged||unprivileged @@ -1657,6 +1677,7 @@ verfication||verification veriosn||version verisons||versions verison||version +veritical||vertical verson||version vicefersa||vice-versa virtal||virtual @@ -1677,6 +1698,7 @@ whenver||whenever wheter||whether whe||when wierd||weird +wihout||without wiil||will wirte||write withing||within From 9e627588bf91174d8903f5550dc4cffc5c348247 Mon Sep 17 00:00:00 2001 From: Azeem Shaikh Date: Wed, 10 May 2023 21:24:57 +0000 Subject: [PATCH 03/72] procfs: replace all non-returning strlcpy with strscpy strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Link: https://lkml.kernel.org/r/20230510212457.3491385-1-azeemshaikh38@gmail.com Signed-off-by: Azeem Shaikh Reviewed-by: Kees Cook Reviewed-by: Alexey Dobriyan Cc: Baoquan He Cc: David Hildenbrand Cc: Kefeng Wang Cc: Liu Shixin Cc: Lorenzo Stoakes Signed-off-by: Andrew Morton --- fs/proc/kcore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 25b44b303b35..5d0cf59c4926 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -419,7 +419,7 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) char *notes; size_t i = 0; - strlcpy(prpsinfo.pr_psargs, saved_command_line, + strscpy(prpsinfo.pr_psargs, saved_command_line, sizeof(prpsinfo.pr_psargs)); notes = kzalloc(notes_len, GFP_KERNEL); From 3db55767da7427cebc49e22b25b5f50ff455add6 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Fri, 12 May 2023 13:33:38 +0300 Subject: [PATCH 04/72] add intptr_t Add signed intptr_t given that a) it is standard type and b) uintptr_t is in tree. Link: https://lkml.kernel.org/r/ed66b9e4-1fb7-45be-9bb9-d4bc291c691f@p183 Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton --- arch/mips/include/asm/fw/cfe/cfe_api.h | 3 --- include/linux/types.h | 1 + lib/zstd/common/zstd_deps.h | 18 ------------------ 3 files changed, 1 insertion(+), 21 deletions(-) diff --git a/arch/mips/include/asm/fw/cfe/cfe_api.h b/arch/mips/include/asm/fw/cfe/cfe_api.h index 25df2f4deb31..b52a6a9c26f1 100644 --- a/arch/mips/include/asm/fw/cfe/cfe_api.h +++ b/arch/mips/include/asm/fw/cfe/cfe_api.h @@ -17,9 +17,6 @@ #include #include -typedef long intptr_t; - - /* * Constants */ diff --git a/include/linux/types.h b/include/linux/types.h index 688fb943556a..7e5a1fb7cfa1 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -35,6 +35,7 @@ typedef __kernel_uid16_t uid16_t; typedef __kernel_gid16_t gid16_t; typedef unsigned long uintptr_t; +typedef long intptr_t; #ifdef CONFIG_HAVE_UID16 /* This is defined by include/asm-{arch}/posix_types.h */ diff --git a/lib/zstd/common/zstd_deps.h b/lib/zstd/common/zstd_deps.h index f06df065dec0..2c34e8a33a1c 100644 --- a/lib/zstd/common/zstd_deps.h +++ b/lib/zstd/common/zstd_deps.h @@ -105,21 +105,3 @@ static uint64_t ZSTD_div64(uint64_t dividend, uint32_t divisor) { #endif /* ZSTD_DEPS_IO */ #endif /* ZSTD_DEPS_NEED_IO */ - -/* - * Only requested when MSAN is enabled. - * Need: - * intptr_t - */ -#ifdef ZSTD_DEPS_NEED_STDINT -#ifndef ZSTD_DEPS_STDINT -#define ZSTD_DEPS_STDINT - -/* - * The Linux Kernel doesn't provide intptr_t, only uintptr_t, which - * is an unsigned long. - */ -typedef long intptr_t; - -#endif /* ZSTD_DEPS_STDINT */ -#endif /* ZSTD_DEPS_NEED_STDINT */ From 4e2f6342ccaf9d8beb33b398cc2862769c7d2a3b Mon Sep 17 00:00:00 2001 From: Haifeng Xu Date: Mon, 8 May 2023 06:44:58 +0000 Subject: [PATCH 05/72] fork: optimize memcg_charge_kernel_stack() a bit Since commit f1c1a9ee00e4 ("fork: Move memcg_charge_kernel_stack() into CONFIG_VMAP_STACK"), memcg_charge_kernel_stack() has been moved into CONFIG_VMAP_STACK block, so the CONFIG_VMAP_STACK check can be removed. Furthermore, memcg_charge_kernel_stack() is only invoked by alloc_thread_stack_node() instead of dup_task_struct(). If memcg_kmem_charge_page() fails, the uncharge process is handled in memcg_charge_kernel_stack() itself instead of free_thread_stack(), so remove the incorrect comments. If memcg_charge_kernel_stack() fails to charge pages used by kernel stack, only charged pages need to be uncharged. It's unnecessary to uncharge those pages which memory cgroup pointer is NULL. [akpm@linux-foundation.org: remove assertion that PAGE_SIZE is a multiple of 1k] Link: https://lkml.kernel.org/r/20230508064458.32855-1-haifeng.xu@shopee.com Signed-off-by: Haifeng Xu Cc: Andy Lutomirski Cc: Daniel Bristot de Oliveira Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- kernel/fork.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index ed4e01daccaa..d17995934eb4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -252,23 +252,19 @@ static int memcg_charge_kernel_stack(struct vm_struct *vm) { int i; int ret; + int nr_charged = 0; - BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 != 0); BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE); for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { ret = memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL, 0); if (ret) goto err; + nr_charged++; } return 0; err: - /* - * If memcg_kmem_charge_page() fails, page's memory cgroup pointer is - * NULL, and memcg_kmem_uncharge_page() in free_thread_stack() will - * ignore this page. - */ - for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) + for (i = 0; i < nr_charged; i++) memcg_kmem_uncharge_page(vm->pages[i], 0); return ret; } From 6b81459c9cb0e11e30b07ff0a171c27f3650eb82 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 17 May 2023 09:16:22 +0200 Subject: [PATCH 06/72] squashfs: don't include buffer_head.h Squashfs has stopped using buffers heads in 93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage to BIO"). Link: https://lkml.kernel.org/r/20230517071622.245151-1-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Pankaj Raghav Reviewed-by: Phillip Lougher Signed-off-by: Andrew Morton --- fs/squashfs/block.c | 1 - fs/squashfs/decompressor.c | 1 - fs/squashfs/decompressor_multi_percpu.c | 1 - 3 files changed, 3 deletions(-) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index bed3bb8b27fa..cb4e48baa4ba 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -18,7 +18,6 @@ #include #include #include -#include #include #include "squashfs_fs.h" diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c index 8893cb9b4198..a676084be27e 100644 --- a/fs/squashfs/decompressor.c +++ b/fs/squashfs/decompressor.c @@ -11,7 +11,6 @@ #include #include #include -#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c index 1dfadf76ed9a..8a218e7c2390 100644 --- a/fs/squashfs/decompressor_multi_percpu.c +++ b/fs/squashfs/decompressor_multi_percpu.c @@ -7,7 +7,6 @@ #include #include #include -#include #include #include "squashfs_fs.h" From e994f5b677ee016a2535d9df826315122da1ae65 Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Wed, 17 May 2023 16:18:19 +0200 Subject: [PATCH 07/72] squashfs: cache partial compressed blocks Before commit 93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage to BIO"), compressed blocks read by squashfs were cached in the page cache, but that is not the case after that commit. That has lead to squashfs having to re-read a lot of sectors from disk/flash. For example, the first sectors of every metadata block need to be read twice from the disk. Once partially to read the length, and a second time to read the block itself. Also, in linear reads of large files, the last sectors of one data block are re-read from disk when reading the next data block, since the compressed blocks are of variable sizes and not aligned to device blocks. This extra I/O results in a degrade in read performance of, for example, ~16% in one scenario on my ARM platform using squashfs with dm-verity and NAND. Since the decompressed data is cached in the page cache or squashfs' internal metadata and fragment caches, caching _all_ compressed pages would lead to a lot of double caching and is undesirable. But make the code cache any disk blocks which were only partially requested, since these are the ones likely to include data which is needed by other file system blocks. This restores read performance in my test scenario. The compressed block caching is only applied when the disk block size is equal to the page size, to avoid having to deal with caching sub-page reads. [akpm@linux-foundation.org: fs/squashfs/block.c needs linux/pagemap.h] [vincent.whitchurch@axis.com: fix page update race] Link: https://lkml.kernel.org/r/20230526-squashfs-cache-fixup-v1-1-d54a7fa23e7b@axis.com [vincent.whitchurch@axis.com: fix page indices] Link: https://lkml.kernel.org/r/20230526-squashfs-cache-fixup-v1-2-d54a7fa23e7b@axis.com [akpm@linux-foundation.org: fix layout, per hch] Link: https://lkml.kernel.org/r/20230510-squashfs-cache-v4-1-3bd394e1ee71@axis.com Signed-off-by: Vincent Whitchurch Reviewed-by: Christoph Hellwig Reviewed-by: Phillip Lougher Signed-off-by: Andrew Morton --- fs/squashfs/block.c | 117 +++++++++++++++++++++++++++++++++-- fs/squashfs/squashfs_fs_sb.h | 1 + fs/squashfs/super.c | 17 +++++ 3 files changed, 129 insertions(+), 6 deletions(-) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index cb4e48baa4ba..6aa9c2e1e8eb 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -75,10 +76,101 @@ static int copy_bio_to_actor(struct bio *bio, return copied_bytes; } +static int squashfs_bio_read_cached(struct bio *fullbio, + struct address_space *cache_mapping, u64 index, int length, + u64 read_start, u64 read_end, int page_count) +{ + struct page *head_to_cache = NULL, *tail_to_cache = NULL; + struct block_device *bdev = fullbio->bi_bdev; + int start_idx = 0, end_idx = 0; + struct bvec_iter_all iter_all; + struct bio *bio = NULL; + struct bio_vec *bv; + int idx = 0; + int err = 0; + + bio_for_each_segment_all(bv, fullbio, iter_all) { + struct page *page = bv->bv_page; + + if (page->mapping == cache_mapping) { + idx++; + continue; + } + + /* + * We only use this when the device block size is the same as + * the page size, so read_start and read_end cover full pages. + * + * Compare these to the original required index and length to + * only cache pages which were requested partially, since these + * are the ones which are likely to be needed when reading + * adjacent blocks. + */ + if (idx == 0 && index != read_start) + head_to_cache = page; + else if (idx == page_count - 1 && index + length != read_end) + tail_to_cache = page; + + if (!bio || idx != end_idx) { + struct bio *new = bio_alloc_clone(bdev, fullbio, + GFP_NOIO, &fs_bio_set); + + if (bio) { + bio_trim(bio, start_idx * PAGE_SECTORS, + (end_idx - start_idx) * PAGE_SECTORS); + bio_chain(bio, new); + submit_bio(bio); + } + + bio = new; + start_idx = idx; + } + + idx++; + end_idx = idx; + } + + if (bio) { + bio_trim(bio, start_idx * PAGE_SECTORS, + (end_idx - start_idx) * PAGE_SECTORS); + err = submit_bio_wait(bio); + bio_put(bio); + } + + if (err) + return err; + + if (head_to_cache) { + int ret = add_to_page_cache_lru(head_to_cache, cache_mapping, + read_start >> PAGE_SHIFT, + GFP_NOIO); + + if (!ret) { + SetPageUptodate(head_to_cache); + unlock_page(head_to_cache); + } + + } + + if (tail_to_cache) { + int ret = add_to_page_cache_lru(tail_to_cache, cache_mapping, + (read_end >> PAGE_SHIFT) - 1, + GFP_NOIO); + + if (!ret) { + SetPageUptodate(tail_to_cache); + unlock_page(tail_to_cache); + } + } + + return 0; +} + static int squashfs_bio_read(struct super_block *sb, u64 index, int length, struct bio **biop, int *block_offset) { struct squashfs_sb_info *msblk = sb->s_fs_info; + struct address_space *cache_mapping = msblk->cache_mapping; const u64 read_start = round_down(index, msblk->devblksize); const sector_t block = read_start >> msblk->devblksize_log2; const u64 read_end = round_up(index + length, msblk->devblksize); @@ -98,21 +190,34 @@ static int squashfs_bio_read(struct super_block *sb, u64 index, int length, for (i = 0; i < page_count; ++i) { unsigned int len = min_t(unsigned int, PAGE_SIZE - offset, total_len); - struct page *page = alloc_page(GFP_NOIO); + struct page *page = NULL; + + if (cache_mapping) + page = find_get_page(cache_mapping, + (read_start >> PAGE_SHIFT) + i); + if (!page) + page = alloc_page(GFP_NOIO); if (!page) { error = -ENOMEM; goto out_free_bio; } - if (!bio_add_page(bio, page, len, offset)) { - error = -EIO; - goto out_free_bio; - } + + /* + * Use the __ version to avoid merging since we need each page + * to be separate when we check for and avoid cached pages. + */ + __bio_add_page(bio, page, len, offset); offset = 0; total_len -= len; } - error = submit_bio_wait(bio); + if (cache_mapping) + error = squashfs_bio_read_cached(bio, cache_mapping, index, + length, read_start, read_end, + page_count); + else + error = submit_bio_wait(bio); if (error) goto out_free_bio; diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h index 72f6f4b37863..c01998eec146 100644 --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -47,6 +47,7 @@ struct squashfs_sb_info { struct squashfs_cache *block_cache; struct squashfs_cache *fragment_cache; struct squashfs_cache *read_page; + struct address_space *cache_mapping; int next_meta_index; __le64 *id_table; __le64 *fragment_index; diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index e090fae48e68..22e812808e5c 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -329,6 +329,19 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc) goto failed_mount; } + if (msblk->devblksize == PAGE_SIZE) { + struct inode *cache = new_inode(sb); + + if (cache == NULL) + goto failed_mount; + + set_nlink(cache, 1); + cache->i_size = OFFSET_MAX; + mapping_set_gfp_mask(cache->i_mapping, GFP_NOFS); + + msblk->cache_mapping = cache->i_mapping; + } + msblk->stream = squashfs_decompressor_setup(sb, flags); if (IS_ERR(msblk->stream)) { err = PTR_ERR(msblk->stream); @@ -454,6 +467,8 @@ failed_mount: squashfs_cache_delete(msblk->block_cache); squashfs_cache_delete(msblk->fragment_cache); squashfs_cache_delete(msblk->read_page); + if (msblk->cache_mapping) + iput(msblk->cache_mapping->host); msblk->thread_ops->destroy(msblk); kfree(msblk->inode_lookup_table); kfree(msblk->fragment_index); @@ -572,6 +587,8 @@ static void squashfs_put_super(struct super_block *sb) squashfs_cache_delete(sbi->block_cache); squashfs_cache_delete(sbi->fragment_cache); squashfs_cache_delete(sbi->read_page); + if (sbi->cache_mapping) + iput(sbi->cache_mapping->host); sbi->thread_ops->destroy(sbi); kfree(sbi->id_table); kfree(sbi->fragment_index); From 6ca0f81c0b96a5e29de48cb02062b5130d27dbe3 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:49 +0200 Subject: [PATCH 08/72] mm: percpu: unhide pcpu_embed_first_chunk prototype Patch series "mm/init/kernel: missing-prototypes warnings". These are patches addressing -Wmissing-prototypes warnings in common kernel code and memory management code files that usually get merged through the -mm tree. This patch (of 12): This function is called whenever CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK or CONFIG_HAVE_SETUP_PER_CPU_AREA, but only declared when the former is set: mm/percpu.c:3055:12: error: no previous prototype for 'pcpu_embed_first_chunk' [-Werror=missing-prototypes] There is no real point in hiding declarations, so just remove the #ifdef here. Link: https://lkml.kernel.org/r/20230517131102.934196-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20230517131102.934196-2-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/percpu.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/percpu.h b/include/linux/percpu.h index 1338ea2aa720..42125cf9c506 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -103,12 +103,10 @@ extern void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai); extern void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, void *base_addr); -#ifdef CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size, size_t atom_size, pcpu_fc_cpu_distance_fn_t cpu_distance_fn, pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn); -#endif #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK void __init pcpu_populate_pte(unsigned long addr); From 8f14a96386b2676a1ccdd9d2f1732fbd7248fa98 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:50 +0200 Subject: [PATCH 09/72] mm: page_poison: always declare __kernel_map_pages() function The __kernel_map_pages() function is mainly used for CONFIG_DEBUG_PAGEALLOC, but has a number of architecture specific definitions that may also be used in other configurations, as well as a global fallback definition for architectures that do not support DEBUG_PAGEALLOC. When the option is disabled, any definitions without the prototype cause a warning: mm/page_poison.c:102:6: error: no previous prototype for '__kernel_map_pages' [-Werror=missing-prototypes] The function is a trivial nop here, so just declare it anyway to avoid the warning. Link: https://lkml.kernel.org/r/20230517131102.934196-3-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 27ce77080c79..e95d7c575ea6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3453,13 +3453,12 @@ static inline bool debug_pagealloc_enabled_static(void) return static_branch_unlikely(&_debug_pagealloc_enabled); } -#ifdef CONFIG_DEBUG_PAGEALLOC /* * To support DEBUG_PAGEALLOC architecture must ensure that * __kernel_map_pages() never fails */ extern void __kernel_map_pages(struct page *page, int numpages, int enable); - +#ifdef CONFIG_DEBUG_PAGEALLOC static inline void debug_pagealloc_map_pages(struct page *page, int numpages) { if (debug_pagealloc_enabled_static()) From 52bb85d64361e7861cc2d38cfbc85881bc57c0b6 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:51 +0200 Subject: [PATCH 10/72] mm: sparse: mark populate_section_memmap() static There are two definitions of this function, but the second one lacks the 'static' annotation: mm/sparse.c:704:25: error: no previous prototype for 'populate_section_memmap' [-Werror=missing-prototypes] Link: https://lkml.kernel.org/r/20230517131102.934196-4-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- mm/sparse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/sparse.c b/mm/sparse.c index c2afdb26039e..b8d5d58fe240 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -701,7 +701,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages) return rc; } #else -struct page * __meminit populate_section_memmap(unsigned long pfn, +static struct page * __meminit populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_altmap *altmap, struct dev_pagemap *pgmap) { From 6b76ca2ab917185b63ded3d7f85bbf9998ab939e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:53 +0200 Subject: [PATCH 11/72] lib: devmem_is_allowed: include linux/io.h The devmem_is_allowed() function is defined in a file of the same name, but the declaration is in asm/io.h, which is not included there, causing a W=1 warning: lib/devmem_is_allowed.c:20:5: error: no previous prototype for 'devmem_is_allowed' [-Werror=missing-prototypes] Include the appropriate header to avoid the warning. Link: https://lkml.kernel.org/r/20230517131102.934196-6-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- lib/devmem_is_allowed.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/devmem_is_allowed.c b/lib/devmem_is_allowed.c index 60be9e24bd57..9c060c69f134 100644 --- a/lib/devmem_is_allowed.c +++ b/lib/devmem_is_allowed.c @@ -10,6 +10,7 @@ #include #include +#include /* * devmem_is_allowed() checks to see if /dev/mem access to a certain address From ff7138813ac4a20628ce4b40952c69faba835fbb Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:54 +0200 Subject: [PATCH 12/72] locking: add lockevent_read() prototype lockevent_read() has a __weak definition and the only caller in kernel/locking/lock_events.c, plus a strong definition in qspinlock_stat.h that overrides it, but no other declaration. This causes a W=1 warning: kernel/locking/lock_events.c:61:16: error: no previous prototype for 'lockevent_read' [-Werror=missing-prototypes] Add shared prototype to avoid the warnings. Link: https://lkml.kernel.org/r/20230517131102.934196-7-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/locking/lock_events.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/locking/lock_events.h b/kernel/locking/lock_events.h index 8c7e7d25f09c..a6016b91803d 100644 --- a/kernel/locking/lock_events.h +++ b/kernel/locking/lock_events.h @@ -57,4 +57,8 @@ static inline void __lockevent_add(enum lock_events event, int inc) #define lockevent_cond_inc(ev, c) #endif /* CONFIG_LOCK_EVENT_COUNTS */ + +ssize_t lockevent_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos); + #endif /* __LOCKING_LOCK_EVENTS_H */ From 525bb813a995283bef759659cfc00f3fc2d51e81 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:55 +0200 Subject: [PATCH 13/72] panic: hide unused global functions Building with W=1 shows warnings about two functions that have no declaration or caller in certain configurations: kernel/panic.c:688:6: error: no previous prototype for 'warn_slowpath_fmt' [-Werror=missing-prototypes] kernel/panic.c:710:6: error: no previous prototype for '__warn_printk' [-Werror=missing-prototypes] Enclose the definition in the same #ifdef check as the declaration. Link: https://lkml.kernel.org/r/20230517131102.934196-8-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/panic.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/panic.c b/kernel/panic.c index 886d2ebd0a0d..10effe40a3fa 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -684,6 +684,7 @@ void __warn(const char *file, int line, void *caller, unsigned taint, add_taint(taint, LOCKDEP_STILL_OK); } +#ifdef CONFIG_BUG #ifndef __WARN_FLAGS void warn_slowpath_fmt(const char *file, int line, unsigned taint, const char *fmt, ...) @@ -722,8 +723,6 @@ void __warn_printk(const char *fmt, ...) EXPORT_SYMBOL(__warn_printk); #endif -#ifdef CONFIG_BUG - /* Support resetting WARN*_ONCE state */ static int clear_warn_once_set(void *data, u64 val) From d9cdb43189ef9aa4f8a12b00e86875544942fa6a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:56 +0200 Subject: [PATCH 14/72] panic: make function declarations visible A few panic() related functions have a global definition but not declaration, which causes a warning with W=1: kernel/panic.c:710:6: error: no previous prototype for '__warn_printk' [-Werror=missing-prototypes] kernel/panic.c:756:24: error: no previous prototype for '__stack_chk_fail' [-Werror=missing-prototypes] kernel/exit.c:1917:32: error: no previous prototype for 'abort' [-Werror=missing-prototypes] __warn_printk() is called both as a global function when CONFIG_BUG is enabled, and as a local function in other configs. The other two here are called indirectly from generated or assembler code. Add prototypes for all of these. Link: https://lkml.kernel.org/r/20230517131102.934196-9-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- include/asm-generic/bug.h | 5 +++-- include/linux/panic.h | 3 +++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h index 4050b191e1a9..6e794420bd39 100644 --- a/include/asm-generic/bug.h +++ b/include/asm-generic/bug.h @@ -87,10 +87,12 @@ struct bug_entry { * * Use the versions with printk format strings to provide better diagnostics. */ -#ifndef __WARN_FLAGS extern __printf(4, 5) void warn_slowpath_fmt(const char *file, const int line, unsigned taint, const char *fmt, ...); +extern __printf(1, 2) void __warn_printk(const char *fmt, ...); + +#ifndef __WARN_FLAGS #define __WARN() __WARN_printf(TAINT_WARN, NULL) #define __WARN_printf(taint, arg...) do { \ instrumentation_begin(); \ @@ -98,7 +100,6 @@ void warn_slowpath_fmt(const char *file, const int line, unsigned taint, instrumentation_end(); \ } while (0) #else -extern __printf(1, 2) void __warn_printk(const char *fmt, ...); #define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN)) #define __WARN_printf(taint, arg...) do { \ instrumentation_begin(); \ diff --git a/include/linux/panic.h b/include/linux/panic.h index 979b776e3bcb..6717b15e798c 100644 --- a/include/linux/panic.h +++ b/include/linux/panic.h @@ -32,6 +32,9 @@ extern int sysctl_panic_on_stackoverflow; extern bool crash_kexec_post_notifiers; +extern void __stack_chk_fail(void); +void abort(void); + /* * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It * holds a CPU number which is executing panic() currently. A value of From 23108f6aac4c86f822db59b89d64da0e70bad5fe Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:57 +0200 Subject: [PATCH 15/72] kunit: include debugfs header file An extra #include statement is needed to ensure the prototypes for debugfs interfaces are visible, avoiding this warning: lib/kunit/debugfs.c:28:6: error: no previous prototype for 'kunit_debugfs_cleanup' [-Werror=missing-prototypes] lib/kunit/debugfs.c:33:6: error: no previous prototype for 'kunit_debugfs_init' [-Werror=missing-prototypes] lib/kunit/debugfs.c:102:6: error: no previous prototype for 'kunit_debugfs_create_suite' [-Werror=missing-prototypes] lib/kunit/debugfs.c:118:6: error: no previous prototype for 'kunit_debugfs_destroy_suite' [-Werror=missing-prototypes] Link: https://lkml.kernel.org/r/20230517131102.934196-10-arnd@kernel.org Signed-off-by: Arnd Bergmann Reviewed-by: David Gow Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- lib/kunit/debugfs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c index b08bb1fba106..22c5c496a68f 100644 --- a/lib/kunit/debugfs.c +++ b/lib/kunit/debugfs.c @@ -10,6 +10,7 @@ #include #include "string-stream.h" +#include "debugfs.h" #define KUNIT_DEBUGFS_ROOT "kunit" #define KUNIT_DEBUGFS_RESULTS "results" From ad1a48301f659a02df5bff0a121d4a5c0411d36b Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:10:59 +0200 Subject: [PATCH 16/72] init: consolidate prototypes in linux/init.h The init/main.c file contains some extern declarations for functions defined in architecture code, and it defines some other functions that are called from architecture code with a custom prototype. Both of those result in warnings with 'make W=1': init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes] init/main.c:790:20: error: no previous prototype for 'mem_encrypt_init' [-Werror=missing-prototypes] init/main.c:792:20: error: no previous prototype for 'poking_init' [-Werror=missing-prototypes] arch/arm64/kernel/irq.c:122:13: error: no previous prototype for 'init_IRQ' [-Werror=missing-prototypes] arch/arm64/kernel/time.c:55:13: error: no previous prototype for 'time_init' [-Werror=missing-prototypes] arch/x86/kernel/process.c:935:13: error: no previous prototype for 'arch_post_acpi_subsys_init' [-Werror=missing-prototypes] init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes] kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes] Add prototypes for all of these in include/linux/init.h or another appropriate header, and remove the duplicate declarations from architecture specific code. [sfr@canb.auug.org.au: declare time_init_early()] Link: https://lkml.kernel.org/r/20230519124311.5167221c@canb.auug.org.au Link: https://lkml.kernel.org/r/20230517131102.934196-12-arnd@kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Stephen Rothwell Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/arm/include/asm/irq.h | 1 - arch/microblaze/include/asm/setup.h | 2 -- arch/mips/include/asm/irq.h | 1 - arch/parisc/kernel/smp.c | 1 - arch/powerpc/include/asm/irq.h | 1 - arch/riscv/include/asm/irq.h | 2 -- arch/riscv/include/asm/timex.h | 2 -- arch/s390/kernel/entry.h | 2 -- arch/sh/include/asm/irq.h | 1 - arch/sh/include/asm/rtc.h | 2 -- arch/sparc/include/asm/irq_32.h | 1 - arch/sparc/include/asm/irq_64.h | 1 - arch/sparc/include/asm/timer_64.h | 1 - arch/sparc/kernel/kernel.h | 1 - arch/x86/include/asm/irq.h | 2 -- arch/x86/include/asm/mem_encrypt.h | 3 --- arch/x86/include/asm/time.h | 1 - arch/x86/include/asm/tsc.h | 1 - include/linux/acpi.h | 3 ++- include/linux/delay.h | 1 + include/linux/init.h | 20 ++++++++++++++++++++ init/main.c | 18 ------------------ 22 files changed, 23 insertions(+), 45 deletions(-) diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h index a7c2337b0c7d..18605f1b3580 100644 --- a/arch/arm/include/asm/irq.h +++ b/arch/arm/include/asm/irq.h @@ -27,7 +27,6 @@ struct irqaction; struct pt_regs; void handle_IRQ(unsigned int, struct pt_regs *); -void init_IRQ(void); #ifdef CONFIG_SMP #include diff --git a/arch/microblaze/include/asm/setup.h b/arch/microblaze/include/asm/setup.h index a06cc1f97aa9..3657f5e78a3d 100644 --- a/arch/microblaze/include/asm/setup.h +++ b/arch/microblaze/include/asm/setup.h @@ -16,8 +16,6 @@ extern char *klimit; extern void mmu_reset(void); -void time_init(void); -void init_IRQ(void); void machine_early_init(const char *cmdline, unsigned int ram, unsigned int fdt, unsigned int msr, unsigned int tlb0, unsigned int tlb1); diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h index 44f9824c1d8c..75abfa834ab7 100644 --- a/arch/mips/include/asm/irq.h +++ b/arch/mips/include/asm/irq.h @@ -19,7 +19,6 @@ #define IRQ_STACK_SIZE THREAD_SIZE #define IRQ_STACK_START (IRQ_STACK_SIZE - 16) -extern void __init init_IRQ(void); extern void *irq_stack[NR_CPUS]; /* diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c index b7fc859fa87d..83348125b524 100644 --- a/arch/parisc/kernel/smp.c +++ b/arch/parisc/kernel/smp.c @@ -271,7 +271,6 @@ void arch_send_call_function_single_ipi(int cpu) static void smp_cpu_init(int cpunum) { - extern void init_IRQ(void); /* arch/parisc/kernel/irq.c */ extern void start_cpu_itimer(void); /* arch/parisc/kernel/time.c */ /* Set modes and Enable floating point coprocessor */ diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index deadd2149426..94dffa1dd223 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -50,7 +50,6 @@ extern void *hardirq_ctx[NR_CPUS]; extern void *softirq_ctx[NR_CPUS]; void __do_IRQ(struct pt_regs *regs); -extern void __init init_IRQ(void); int irq_choose_cpu(const struct cpumask *mask); diff --git a/arch/riscv/include/asm/irq.h b/arch/riscv/include/asm/irq.h index 43b9ebfbd943..8e10a94430a2 100644 --- a/arch/riscv/include/asm/irq.h +++ b/arch/riscv/include/asm/irq.h @@ -16,6 +16,4 @@ void riscv_set_intc_hwnode_fn(struct fwnode_handle *(*fn)(void)); struct fwnode_handle *riscv_get_intc_hwnode(void); -extern void __init init_IRQ(void); - #endif /* _ASM_RISCV_IRQ_H */ diff --git a/arch/riscv/include/asm/timex.h b/arch/riscv/include/asm/timex.h index d6a7428f6248..a06697846e69 100644 --- a/arch/riscv/include/asm/timex.h +++ b/arch/riscv/include/asm/timex.h @@ -88,6 +88,4 @@ static inline int read_current_timer(unsigned long *timer_val) return 0; } -extern void time_init(void); - #endif /* _ASM_RISCV_TIMEX_H */ diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h index 34674e38826b..9f41853f36b9 100644 --- a/arch/s390/kernel/entry.h +++ b/arch/s390/kernel/entry.h @@ -34,14 +34,12 @@ void kernel_stack_overflow(struct pt_regs * regs); void handle_signal32(struct ksignal *ksig, sigset_t *oldset, struct pt_regs *regs); -void __init init_IRQ(void); void do_io_irq(struct pt_regs *regs); void do_ext_irq(struct pt_regs *regs); void do_restart(void *arg); void __init startup_init(void); void die(struct pt_regs *regs, const char *str); int setup_profiling_timer(unsigned int multiplier); -void __init time_init(void); unsigned long prepare_ftrace_return(unsigned long parent, unsigned long sp, unsigned long ip); struct s390_mmap_arg_struct; diff --git a/arch/sh/include/asm/irq.h b/arch/sh/include/asm/irq.h index 1c4923502fd4..0f384b1f45ca 100644 --- a/arch/sh/include/asm/irq.h +++ b/arch/sh/include/asm/irq.h @@ -22,7 +22,6 @@ extern unsigned short *irq_mask_register; /* * PINT IRQs */ -void init_IRQ_pint(void); void make_imask_irq(unsigned int irq); static inline int generic_irq_demux(int irq) diff --git a/arch/sh/include/asm/rtc.h b/arch/sh/include/asm/rtc.h index 69dbae2949b0..7fe7002d1d50 100644 --- a/arch/sh/include/asm/rtc.h +++ b/arch/sh/include/asm/rtc.h @@ -2,8 +2,6 @@ #ifndef _ASM_RTC_H #define _ASM_RTC_H -void time_init(void); - #define RTC_CAP_4_DIGIT_YEAR (1 << 0) struct sh_rtc_platform_info { diff --git a/arch/sparc/include/asm/irq_32.h b/arch/sparc/include/asm/irq_32.h index 43ec2609b811..6ee48321cbc2 100644 --- a/arch/sparc/include/asm/irq_32.h +++ b/arch/sparc/include/asm/irq_32.h @@ -17,7 +17,6 @@ #define irq_canonicalize(irq) (irq) -void __init init_IRQ(void); void __init sun4d_init_sbi_irq(void); #define NO_IRQ 0xffffffff diff --git a/arch/sparc/include/asm/irq_64.h b/arch/sparc/include/asm/irq_64.h index 154df2cf19f4..b436029f1ced 100644 --- a/arch/sparc/include/asm/irq_64.h +++ b/arch/sparc/include/asm/irq_64.h @@ -61,7 +61,6 @@ void sun4u_destroy_msi(unsigned int irq); unsigned int irq_alloc(unsigned int dev_handle, unsigned int dev_ino); void irq_free(unsigned int irq); -void __init init_IRQ(void); void fixup_irqs(void); static inline void set_softint(unsigned long bits) diff --git a/arch/sparc/include/asm/timer_64.h b/arch/sparc/include/asm/timer_64.h index dcfad4613e18..ffff52c8b760 100644 --- a/arch/sparc/include/asm/timer_64.h +++ b/arch/sparc/include/asm/timer_64.h @@ -34,7 +34,6 @@ extern struct sparc64_tick_ops *tick_ops; unsigned long sparc64_get_clock_tick(unsigned int cpu); void setup_sparc64_timer(void); -void __init time_init(void); #define TICK_PRIV_BIT BIT(63) #define TICKCMP_IRQ_BIT BIT(63) diff --git a/arch/sparc/kernel/kernel.h b/arch/sparc/kernel/kernel.h index 9cd09a3ef35f..15da3c0597a5 100644 --- a/arch/sparc/kernel/kernel.h +++ b/arch/sparc/kernel/kernel.h @@ -91,7 +91,6 @@ extern int static_irq_count; extern spinlock_t irq_action_lock; void unexpected_irq(int irq, void *dev_id, struct pt_regs * regs); -void init_IRQ(void); /* sun4m_irq.c */ void sun4m_init_IRQ(void); diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h index 768aa234cbb4..29e083b92813 100644 --- a/arch/x86/include/asm/irq.h +++ b/arch/x86/include/asm/irq.h @@ -40,8 +40,6 @@ extern void __handle_irq(struct irq_desc *desc, struct pt_regs *regs); extern void init_ISA_irqs(void); -extern void __init init_IRQ(void); - #ifdef CONFIG_X86_LOCAL_APIC void arch_trigger_cpumask_backtrace(const struct cpumask *mask, bool exclude_self); diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h index b7126701574c..1c22c86d13e6 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -87,9 +87,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { } #endif /* CONFIG_AMD_MEM_ENCRYPT */ -/* Architecture __weak replacement functions */ -void __init mem_encrypt_init(void); - void add_encrypt_protection_map(void); /* diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h index a53961c64a56..f360104ed172 100644 --- a/arch/x86/include/asm/time.h +++ b/arch/x86/include/asm/time.h @@ -6,7 +6,6 @@ #include extern void hpet_time_init(void); -extern void time_init(void); extern bool pit_timer_init(void); extern bool tsc_clocksource_watchdog_disabled(void); diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index fbdc3d951494..1992ef5e41a9 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -32,7 +32,6 @@ extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns); extern void tsc_early_init(void); extern void tsc_init(void); -extern unsigned long calibrate_delay_is_known(void); extern void mark_tsc_unstable(char *reason); extern int unsynchronized_tsc(void); extern int check_tsc_unstable(void); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 7b71dd74baeb..f4c2a87d02c1 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -712,7 +712,6 @@ int acpi_match_platform_list(const struct acpi_platform_list *plat); extern void acpi_early_init(void); extern void acpi_subsystem_init(void); -extern void arch_post_acpi_subsys_init(void); extern int acpi_nvs_register(__u64 start, __u64 size); @@ -1084,6 +1083,8 @@ static inline bool acpi_sleep_state_supported(u8 sleep_state) #endif /* !CONFIG_ACPI */ +extern void arch_post_acpi_subsys_init(void); + #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC int acpi_ioapic_add(acpi_handle root); #else diff --git a/include/linux/delay.h b/include/linux/delay.h index 039e7e0c7378..ff9cda975e30 100644 --- a/include/linux/delay.h +++ b/include/linux/delay.h @@ -56,6 +56,7 @@ static inline void ndelay(unsigned long x) extern unsigned long lpj_fine; void calibrate_delay(void); +unsigned long calibrate_delay_is_known(void); void __attribute__((weak)) calibration_delay_done(void); void msleep(unsigned int msecs); unsigned long msleep_interruptible(unsigned int msecs); diff --git a/include/linux/init.h b/include/linux/init.h index c5fe6d26f5b1..1200fa99e848 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -152,6 +152,24 @@ extern unsigned int reset_devices; void setup_arch(char **); void prepare_namespace(void); void __init init_rootfs(void); + +void init_IRQ(void); +void time_init(void); +void mem_encrypt_init(void); +void poking_init(void); +void pgtable_cache_init(void); + +extern initcall_entry_t __initcall_start[]; +extern initcall_entry_t __initcall0_start[]; +extern initcall_entry_t __initcall1_start[]; +extern initcall_entry_t __initcall2_start[]; +extern initcall_entry_t __initcall3_start[]; +extern initcall_entry_t __initcall4_start[]; +extern initcall_entry_t __initcall5_start[]; +extern initcall_entry_t __initcall6_start[]; +extern initcall_entry_t __initcall7_start[]; +extern initcall_entry_t __initcall_end[]; + extern struct file_system_type rootfs_fs_type; #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX) @@ -309,6 +327,8 @@ struct obs_kernel_param { int early; }; +extern const struct obs_kernel_param __setup_start[], __setup_end[]; + /* * Only for really core code. See moduleparam.h for the normal way. * diff --git a/init/main.c b/init/main.c index af50044deed5..d4400efbef0a 100644 --- a/init/main.c +++ b/init/main.c @@ -115,10 +115,6 @@ static int kernel_init(void *); -extern void init_IRQ(void); -extern void radix_tree_init(void); -extern void maple_tree_init(void); - /* * Debug helper: via this flag we know that we are in 'early bootup code' * where only the boot processor is running with IRQ disabled. This means @@ -137,7 +133,6 @@ EXPORT_SYMBOL(system_state); #define MAX_INIT_ARGS CONFIG_INIT_ENV_ARG_LIMIT #define MAX_INIT_ENVS CONFIG_INIT_ENV_ARG_LIMIT -extern void time_init(void); /* Default late time init is NULL. archs can override this later. */ void (*__initdata late_time_init)(void); @@ -196,8 +191,6 @@ static const char *argv_init[MAX_INIT_ARGS+2] = { "init", NULL, }; const char *envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, }; static const char *panic_later, *panic_param; -extern const struct obs_kernel_param __setup_start[], __setup_end[]; - static bool __init obsolete_checksetup(char *line) { const struct obs_kernel_param *p; @@ -1263,17 +1256,6 @@ int __init_or_module do_one_initcall(initcall_t fn) } -extern initcall_entry_t __initcall_start[]; -extern initcall_entry_t __initcall0_start[]; -extern initcall_entry_t __initcall1_start[]; -extern initcall_entry_t __initcall2_start[]; -extern initcall_entry_t __initcall3_start[]; -extern initcall_entry_t __initcall4_start[]; -extern initcall_entry_t __initcall5_start[]; -extern initcall_entry_t __initcall6_start[]; -extern initcall_entry_t __initcall7_start[]; -extern initcall_entry_t __initcall_end[]; - static initcall_entry_t *initcall_levels[] __initdata = { __initcall0_start, __initcall1_start, From 73648e6fa79a832a6a50e87d4a9e9da416f82aaa Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:11:00 +0200 Subject: [PATCH 17/72] init: move cifs_root_data() prototype into linux/mount.h cifs_root_data() is defined in cifs and called from early init code, but lacks a global prototype: fs/cifs/cifsroot.c:83:12: error: no previous prototype for 'cifs_root_data' Move the declaration from do_mounts.c into an appropriate header. Link: https://lkml.kernel.org/r/20230517131102.934196-13-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mount.h | 2 ++ init/do_mounts.c | 2 -- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/mount.h b/include/linux/mount.h index 1ea326c368f7..f381eb44b24c 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -124,4 +124,6 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *, struct vfsmount *); extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num); +extern int cifs_root_data(char **dev, char **opts); + #endif /* _LINUX_MOUNT_H */ diff --git a/init/do_mounts.c b/init/do_mounts.c index 811e94daf0a8..83447c46ad6d 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -489,8 +489,6 @@ static int __init mount_nfs_root(void) #ifdef CONFIG_CIFS_ROOT -extern int cifs_root_data(char **dev, char **opts); - #define CIFSROOT_TIMEOUT_MIN 5 #define CIFSROOT_TIMEOUT_MAX 30 #define CIFSROOT_RETRY_MAX 5 From af0a76e1269516d940214be48255669b0b5ff40b Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:11:01 +0200 Subject: [PATCH 18/72] thread_info: move function declarations to linux/thread_info.h There are a few __weak functions in kernel/fork.c, which architectures can override. If there is no prototype, the compiler warns about them: kernel/fork.c:164:13: error: no previous prototype for 'arch_release_task_struct' [-Werror=missing-prototypes] kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes] kernel/fork.c:1086:12: error: no previous prototype for 'arch_dup_task_struct' [-Werror=missing-prototypes] There are already prototypes in a number of architecture specific headers that have addressed those warnings before, but it's much better to have these in a single place so the warning no longer shows up anywhere. Link: https://lkml.kernel.org/r/20230517131102.934196-14-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/arm64/include/asm/thread_info.h | 4 ---- arch/s390/include/asm/thread_info.h | 3 --- arch/sh/include/asm/thread_info.h | 3 --- arch/x86/include/asm/thread_info.h | 3 --- include/linux/thread_info.h | 5 +++++ 5 files changed, 5 insertions(+), 13 deletions(-) diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 848739c15de8..553d1bc559c6 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -55,10 +55,6 @@ struct thread_info { void arch_setup_new_exec(void); #define arch_setup_new_exec arch_setup_new_exec -void arch_release_task_struct(struct task_struct *tsk); -int arch_dup_task_struct(struct task_struct *dst, - struct task_struct *src); - #endif #define TIF_SIGPENDING 0 /* signal pending */ diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index c7c97921ed8d..a674c7d25da5 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -52,9 +52,6 @@ struct thread_info { struct task_struct; -void arch_release_task_struct(struct task_struct *tsk); -int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); - void arch_setup_new_exec(void); #define arch_setup_new_exec arch_setup_new_exec diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h index 1400fbb8b423..9f19a682d315 100644 --- a/arch/sh/include/asm/thread_info.h +++ b/arch/sh/include/asm/thread_info.h @@ -84,9 +84,6 @@ static inline struct thread_info *current_thread_info(void) #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT) -extern void arch_task_cache_init(void); -extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); -extern void arch_release_task_struct(struct task_struct *tsk); extern void init_thread_xstate(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index f1cccba52eb9..d63b02940747 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -232,9 +232,6 @@ static inline int arch_within_stack_frames(const void * const stack, current_thread_info()->status & TS_COMPAT) #endif -extern void arch_task_cache_init(void); -extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); -extern void arch_release_task_struct(struct task_struct *tsk); extern void arch_setup_new_exec(void); #define arch_setup_new_exec arch_setup_new_exec #endif /* !__ASSEMBLY__ */ diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index c02646884fa8..9ea0b28068f4 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -256,6 +256,11 @@ check_copy_size(const void *addr, size_t bytes, bool is_source) static inline void arch_setup_new_exec(void) { } #endif +void arch_task_cache_init(void); /* for CONFIG_SH */ +void arch_release_task_struct(struct task_struct *tsk); +int arch_dup_task_struct(struct task_struct *dst, + struct task_struct *src); + #endif /* __KERNEL__ */ #endif /* _LINUX_THREAD_INFO_H */ From 3403bb4ea5983b4e84f404f91322a9a0cc75d700 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:11:02 +0200 Subject: [PATCH 19/72] time_namespace: always provide arch_get_vdso_data() prototype for vdso The arch_get_vdso_data() function is defined separately on each architecture, but only called when CONFIG_TIME_NS is set. If the definition is a global function, this causes a W=1 warning without TIME_NS: arch/x86/entry/vdso/vma.c:35:19: error: no previous prototype for 'arch_get_vdso_data' [-Werror=missing-prototypes] Move the prototype out of the #ifdef block to reliably turn off that warning. Link: https://lkml.kernel.org/r/20230517131102.934196-15-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Boqun Feng Cc: Catalin Marinas Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Paris Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Michael Ellerman Cc: Michal Simek Cc: Palmer Dabbelt Cc: Paul Moore Cc: Pavel Machek Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Russell King Cc: Tejun Heo Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/time_namespace.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h index bb9d3f5542f8..03d9c5ac01d1 100644 --- a/include/linux/time_namespace.h +++ b/include/linux/time_namespace.h @@ -44,7 +44,6 @@ struct time_namespace *copy_time_ns(unsigned long flags, struct time_namespace *old_ns); void free_time_ns(struct time_namespace *ns); void timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk); -struct vdso_data *arch_get_vdso_data(void *vvar_page); struct page *find_timens_vvar_page(struct vm_area_struct *vma); static inline void put_time_ns(struct time_namespace *ns) @@ -163,4 +162,6 @@ static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim) } #endif +struct vdso_data *arch_get_vdso_data(void *vvar_page); + #endif /* _LINUX_TIMENS_H */ From e0ddec73fd4822d2ffe914d5ce3e2718f985276a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 14:49:25 +0200 Subject: [PATCH 20/72] kcov: add prototypes for helper functions A number of internal functions in kcov are only called from generated code and don't technically need a declaration, but 'make W=1' warns about global symbols without a prototype: kernel/kcov.c:199:14: error: no previous prototype for '__sanitizer_cov_trace_pc' [-Werror=missing-prototypes] kernel/kcov.c:264:14: error: no previous prototype for '__sanitizer_cov_trace_cmp1' [-Werror=missing-prototypes] kernel/kcov.c:270:14: error: no previous prototype for '__sanitizer_cov_trace_cmp2' [-Werror=missing-prototypes] kernel/kcov.c:276:14: error: no previous prototype for '__sanitizer_cov_trace_cmp4' [-Werror=missing-prototypes] kernel/kcov.c:282:14: error: no previous prototype for '__sanitizer_cov_trace_cmp8' [-Werror=missing-prototypes] kernel/kcov.c:288:14: error: no previous prototype for '__sanitizer_cov_trace_const_cmp1' [-Werror=missing-prototypes] kernel/kcov.c:295:14: error: no previous prototype for '__sanitizer_cov_trace_const_cmp2' [-Werror=missing-prototypes] kernel/kcov.c:302:14: error: no previous prototype for '__sanitizer_cov_trace_const_cmp4' [-Werror=missing-prototypes] kernel/kcov.c:309:14: error: no previous prototype for '__sanitizer_cov_trace_const_cmp8' [-Werror=missing-prototypes] kernel/kcov.c:316:14: error: no previous prototype for '__sanitizer_cov_trace_switch' [-Werror=missing-prototypes] Adding prototypes for these in a header solves that problem, but now there is a mismatch between the built-in type and the prototype on 64-bit architectures because they expect some functions to take a 64-bit 'unsigned long' argument rather than an 'unsigned long long' u64 type: include/linux/kcov.h:84:6: error: conflicting types for built-in function '__sanitizer_cov_trace_switch'; expected 'void(long long unsigned int, void *)' [-Werror=builtin-declaration-mismatch] 84 | void __sanitizer_cov_trace_switch(u64 val, u64 *cases); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avoid this as well with a custom type definition. Link: https://lkml.kernel.org/r/20230517124944.929997-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Andrey Konovalov Cc: Dmitry Vyukov Cc: Rong Tao Signed-off-by: Andrew Morton --- include/linux/kcov.h | 17 +++++++++++++++++ kernel/kcov.c | 7 ++++--- 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/include/linux/kcov.h b/include/linux/kcov.h index ee04256f28af..b851ba415e03 100644 --- a/include/linux/kcov.h +++ b/include/linux/kcov.h @@ -72,6 +72,23 @@ static inline void kcov_remote_stop_softirq(void) kcov_remote_stop(); } +#ifdef CONFIG_64BIT +typedef unsigned long kcov_u64; +#else +typedef unsigned long long kcov_u64; +#endif + +void __sanitizer_cov_trace_pc(void); +void __sanitizer_cov_trace_cmp1(u8 arg1, u8 arg2); +void __sanitizer_cov_trace_cmp2(u16 arg1, u16 arg2); +void __sanitizer_cov_trace_cmp4(u32 arg1, u32 arg2); +void __sanitizer_cov_trace_cmp8(kcov_u64 arg1, kcov_u64 arg2); +void __sanitizer_cov_trace_const_cmp1(u8 arg1, u8 arg2); +void __sanitizer_cov_trace_const_cmp2(u16 arg1, u16 arg2); +void __sanitizer_cov_trace_const_cmp4(u32 arg1, u32 arg2); +void __sanitizer_cov_trace_const_cmp8(kcov_u64 arg1, kcov_u64 arg2); +void __sanitizer_cov_trace_switch(kcov_u64 val, void *cases); + #else static inline void kcov_task_init(struct task_struct *t) {} diff --git a/kernel/kcov.c b/kernel/kcov.c index 84c717337df0..f9ac2e9e460f 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -279,7 +279,7 @@ void notrace __sanitizer_cov_trace_cmp4(u32 arg1, u32 arg2) } EXPORT_SYMBOL(__sanitizer_cov_trace_cmp4); -void notrace __sanitizer_cov_trace_cmp8(u64 arg1, u64 arg2) +void notrace __sanitizer_cov_trace_cmp8(kcov_u64 arg1, kcov_u64 arg2) { write_comp_data(KCOV_CMP_SIZE(3), arg1, arg2, _RET_IP_); } @@ -306,16 +306,17 @@ void notrace __sanitizer_cov_trace_const_cmp4(u32 arg1, u32 arg2) } EXPORT_SYMBOL(__sanitizer_cov_trace_const_cmp4); -void notrace __sanitizer_cov_trace_const_cmp8(u64 arg1, u64 arg2) +void notrace __sanitizer_cov_trace_const_cmp8(kcov_u64 arg1, kcov_u64 arg2) { write_comp_data(KCOV_CMP_SIZE(3) | KCOV_CMP_CONST, arg1, arg2, _RET_IP_); } EXPORT_SYMBOL(__sanitizer_cov_trace_const_cmp8); -void notrace __sanitizer_cov_trace_switch(u64 val, u64 *cases) +void notrace __sanitizer_cov_trace_switch(kcov_u64 val, void *arg) { u64 i; + u64 *cases = arg; u64 count = cases[0]; u64 size = cases[1]; u64 type = KCOV_CMP_CONST; From 6aee6723f1b3a4453b327fdafcde362e9befcc11 Mon Sep 17 00:00:00 2001 From: Angus Chen Date: Thu, 18 May 2023 11:53:21 +0800 Subject: [PATCH 21/72] init: add bdev fs printk if mount_block_root failed Booting with the QEMU command line: "qemu-system-x86_64 -append root=/dev/vda rootfstype=ext4 ..." will panic if ext4 is not builtin and a request to load the ext4 module fails. [ 1.729006] VFS: Cannot open root device "vda" or unknown-block(253,0): error -19 [ 1.730603] Please append a correct "root=" boot option; here are the available partitions: [ 1.732323] fd00 256000 vda [ 1.732329] driver: virtio_blk [ 1.734194] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(253,0) [ 1.734771] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-rc2+ #53 [ 1.735194] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 1.735772] Call Trace: [ 1.735950] [ 1.736113] dump_stack_lvl+0x32/0x50 [ 1.736367] panic+0x108/0x310 [ 1.736570] mount_block_root+0x161/0x310 [ 1.736849] ? rdinit_setup+0x40/0x40 [ 1.737088] prepare_namespace+0x10c/0x180 [ 1.737393] kernel_init_freeable+0x354/0x450 [ 1.737707] ? rest_init+0xd0/0xd0 [ 1.737945] kernel_init+0x16/0x130 [ 1.738196] ret_from_fork+0x1f/0x30 As a hint, print all the bdev fstypes which are available. [akpm@linux-foundation.org: fix spelling in printk message] Link: https://lkml.kernel.org/r/20230518035321.1672-1-angus.chen@jaguarmicro.com Signed-off-by: Angus Chen Acked-by: Nick Desaulniers Cc: Masahiro Yamada Cc: Mike Rapoport (IBM) Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Vlastimil Babka Cc: Al Viro Signed-off-by: Andrew Morton --- init/do_mounts.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/init/do_mounts.c b/init/do_mounts.c index 83447c46ad6d..9b66667eeea6 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -427,8 +427,19 @@ retry: printk("VFS: Cannot open root device \"%s\" or %s: error %d\n", root_device_name, b, err); printk("Please append a correct \"root=\" boot option; here are the available partitions:\n"); - printk_all_partitions(); + + if (root_fs_names) + num_fs = list_bdev_fs_names(fs_names, PAGE_SIZE); + if (!num_fs) + pr_err("Can't find any bdev filesystem to be used for mount!\n"); + else { + pr_err("List of all bdev filesystems:\n"); + for (i = 0, p = fs_names; i < num_fs; i++, p += strlen(p)+1) + pr_err(" %s", p); + pr_err("\n"); + } + panic("VFS: Unable to mount root fs on %s", b); } if (!(flags & SB_RDONLY)) { From 004444486184a7222a7b4bde0c568f6da3621b25 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 17 May 2023 15:19:31 +0200 Subject: [PATCH 22/72] decompressor: provide missing prototypes The entry points for the decompressor don't always have a prototype included in the .c file: lib/decompress_inflate.c:42:17: error: no previous prototype for '__gunzip' [-Werror=missing-prototypes] lib/decompress_unxz.c:251:17: error: no previous prototype for 'unxz' [-Werror=missing-prototypes] lib/decompress_unzstd.c:331:17: error: no previous prototype for 'unzstd' [-Werror=missing-prototypes] Include the correct headers for unxz and unzstd, and mark the inflate function above as unconditionally 'static' to avoid these warnings. Link: https://lkml.kernel.org/r/20230517131936.936840-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Nick Terrell Signed-off-by: Andrew Morton --- lib/decompress_inflate.c | 2 +- lib/decompress_unxz.c | 2 ++ lib/decompress_unzstd.c | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c index 6130c42b8e59..e19199f4a684 100644 --- a/lib/decompress_inflate.c +++ b/lib/decompress_inflate.c @@ -39,7 +39,7 @@ static long INIT nofill(void *buffer, unsigned long len) } /* Included from initramfs et al code */ -STATIC int INIT __gunzip(unsigned char *buf, long len, +static int INIT __gunzip(unsigned char *buf, long len, long (*fill)(void*, unsigned long), long (*flush)(void*, unsigned long), unsigned char *out_buf, long out_len, diff --git a/lib/decompress_unxz.c b/lib/decompress_unxz.c index 9f4262ee33a5..353268b9f129 100644 --- a/lib/decompress_unxz.c +++ b/lib/decompress_unxz.c @@ -102,6 +102,8 @@ */ #ifdef STATIC # define XZ_PREBOOT +#else +#include #endif #ifdef __KERNEL__ # include diff --git a/lib/decompress_unzstd.c b/lib/decompress_unzstd.c index a512b99ae16a..bba2c0bb10cb 100644 --- a/lib/decompress_unzstd.c +++ b/lib/decompress_unzstd.c @@ -69,6 +69,8 @@ # define UNZSTD_PREBOOT # include "xxhash.c" # include "zstd/decompress_sources.h" +#else +#include #endif #include From 5e008df11c55228a86a1bae692cc2002503572c9 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:25 -0700 Subject: [PATCH 23/72] watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5. This patch series adds the "buddy" hardlockup detector. In brief, the buddy hardlockup detector can detect hardlockups without arch-level support by having CPUs checkup on a "buddy" CPU periodically. Given the new design of this patch series, testing all combinations is fairly difficult. I've attempted to make sure that all combinations of CONFIG_ options are good, but it wouldn't surprise me if I missed something. I apologize in advance and I'll do my best to fix any problems that are found. This patch (of 18): The real watchdog_update_hrtimer_threshold() is defined in kernel/watchdog_hld.c. That file is included if CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that file if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP. The dummy version of the function in "nmi.h" didn't get that quite right. While this doesn't appear to be a huge deal, it's nice to make it consistent. It doesn't break builds because CHECK_TIMESTAMP is only defined by x86 so others don't get a double definition, and x86 uses perf lockup detector, so it gets the out of line version. Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.1.I8cbb2f4fa740528fcfade4f5439b6cdcdd059251@changeid Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Douglas Anderson Reviewed-by: Nicholas Piggin Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Cc: Colin Cross Signed-off-by: Andrew Morton --- include/linux/nmi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 048c0b9aa623..771d77b62bc1 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -197,7 +197,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh); #endif #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \ - defined(CONFIG_HARDLOCKUP_DETECTOR) + defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) void watchdog_update_hrtimer_threshold(u64 period); #else static inline void watchdog_update_hrtimer_threshold(u64 period) { } From 4379e59fe5665cfda737e45b8bf2f05321ef049c Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:26 -0700 Subject: [PATCH 24/72] watchdog/perf: more properly prevent false positives with turbo modes Currently, in the watchdog_overflow_callback() we first check to see if the watchdog had been touched and _then_ we handle the workaround for turbo mode. This order should be reversed. Specifically, "touching" the hardlockup detector's watchdog should avoid lockups being detected for one period that should be roughly the same regardless of whether we're running turbo or not. That means that we should do the extra accounting for turbo _before_ we look at (and clear) the global indicating that we've been touched. NOTE: this fix is made based on code inspection. I am not aware of any reports where the old code would have generated false positives. That being said, this order seems more correct and also makes it easier down the line to share code with the "buddy" hardlockup detector. Link: https://lkml.kernel.org/r/20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/watchdog_hld.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index 247bf0b1582c..1e8a49dc956e 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -114,14 +114,14 @@ static void watchdog_overflow_callback(struct perf_event *event, /* Ensure the watchdog never gets throttled */ event->hw.interrupts = 0; + if (!watchdog_check_timestamp()) + return; + if (__this_cpu_read(watchdog_nmi_touch) == true) { __this_cpu_write(watchdog_nmi_touch, false); return; } - if (!watchdog_check_timestamp()) - return; - /* check for a hardlockup * This is done by making sure our timer interrupt * is incrementing. The timer interrupt should have From 810b560e8985725dbd57bbb3f188c231365eb5ae Mon Sep 17 00:00:00 2001 From: Lecopzer Chen Date: Fri, 19 May 2023 10:18:27 -0700 Subject: [PATCH 25/72] watchdog: remove WATCHDOG_DEFAULT No reference to WATCHDOG_DEFAULT, remove it. Link: https://lkml.kernel.org/r/20230519101840.v5.3.I6a729209a1320e0ad212176e250ff945b8f91b2a@changeid Signed-off-by: Pingfan Liu Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/watchdog.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 8e61f21e7e33..582d572e1379 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -30,10 +30,8 @@ static DEFINE_MUTEX(watchdog_mutex); #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG) -# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED) # define NMI_WATCHDOG_DEFAULT 1 #else -# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED) # define NMI_WATCHDOG_DEFAULT 0 #endif From 730211182ed083898fa5feb4b28459ffac4c9615 Mon Sep 17 00:00:00 2001 From: Lecopzer Chen Date: Fri, 19 May 2023 10:18:28 -0700 Subject: [PATCH 26/72] watchdog/hardlockup: change watchdog_nmi_enable() to void Nobody cares about the return value of watchdog_nmi_enable(), changing its prototype to void. Link: https://lkml.kernel.org/r/20230519101840.v5.4.Ic3a19b592eb1ac4c6f6eade44ffd943e8637b6e5@changeid Signed-off-by: Pingfan Liu Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Acked-by: Nicholas Piggin Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/sparc/kernel/nmi.c | 8 +++----- include/linux/nmi.h | 2 +- kernel/watchdog.c | 3 +-- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c index 060fff95a305..5dcf31f7e81f 100644 --- a/arch/sparc/kernel/nmi.c +++ b/arch/sparc/kernel/nmi.c @@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog); * sparc specific NMI watchdog enable function. * Enables watchdog if it is not enabled already. */ -int watchdog_nmi_enable(unsigned int cpu) +void watchdog_nmi_enable(unsigned int cpu) { if (atomic_read(&nmi_active) == -1) { pr_warn("NMI watchdog cannot be enabled or disabled\n"); - return -1; + return; } /* @@ -295,11 +295,9 @@ int watchdog_nmi_enable(unsigned int cpu) * process first. */ if (!nmi_init_done) - return 0; + return; smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1); - - return 0; } /* * sparc specific NMI watchdog disable function. diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 771d77b62bc1..454fe99c4874 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; } void watchdog_nmi_stop(void); void watchdog_nmi_start(void); int watchdog_nmi_probe(void); -int watchdog_nmi_enable(unsigned int cpu); +void watchdog_nmi_enable(unsigned int cpu); void watchdog_nmi_disable(unsigned int cpu); void lockup_detector_reconfigure(void); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 582d572e1379..c705a18b26bf 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -93,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); * softlockup watchdog start and stop. The arch must select the * SOFTLOCKUP_DETECTOR Kconfig. */ -int __weak watchdog_nmi_enable(unsigned int cpu) +void __weak watchdog_nmi_enable(unsigned int cpu) { hardlockup_detector_perf_enable(); - return 0; } void __weak watchdog_nmi_disable(unsigned int cpu) From 1fafaa7745eeeeffef7155ab5eeb8cc83d04874f Mon Sep 17 00:00:00 2001 From: Pingfan Liu Date: Fri, 19 May 2023 10:18:29 -0700 Subject: [PATCH 27/72] watchdog/perf: ensure CPU-bound context when creating hardlockup detector event hardlockup_detector_event_create() should create perf_event on the current CPU. Preemption could not get disabled because perf_event_create_kernel_counter() allocates memory. Instead, the CPU locality is achieved by processing the code in a per-CPU bound kthread. Add a check to prevent mistakes when calling the code in another code path. Link: https://lkml.kernel.org/r/20230519101840.v5.5.I654063e53782b11d53e736a8ad4897ffd207406a@changeid Signed-off-by: Pingfan Liu Co-developed-by: Lecopzer Chen Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/watchdog_hld.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index 1e8a49dc956e..2125b09e09d7 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -165,10 +165,16 @@ static void watchdog_overflow_callback(struct perf_event *event, static int hardlockup_detector_event_create(void) { - unsigned int cpu = smp_processor_id(); + unsigned int cpu; struct perf_event_attr *wd_attr; struct perf_event *evt; + /* + * Preemption is not disabled because memory will be allocated. + * Ensure CPU-locality by calling this in per-CPU kthread. + */ + WARN_ON(!is_percpu_thread()); + cpu = raw_smp_processor_id(); wd_attr = &wd_hw_attr; wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh); From 8b5c59a92b5b37e5c65ec185810d67e3f30f5a2e Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:30 -0700 Subject: [PATCH 28/72] watchdog/hardlockup: add comments to touch_nmi_watchdog() In preparation for the buddy hardlockup detector, add comments to touch_nmi_watchdog() to make it obvious that it touches the configured hardlockup detector regardless of whether it's backed by an NMI. Also note that arch_touch_nmi_watchdog() may not be architecture-specific. Ideally, we'd like to rename these functions but that is a fairly disruptive change touching a lot of drivers. After discussion [1] the plan is to defer this until a good time. [1] https://lore.kernel.org/r/ZFy0TX1tfhlH8gxj@alley [akpm@linux-foundation.org: comment changes, per Petr] Link: https://lkml.kernel.org/r/ZGyONWPXpE1DcxA5@alley Link: https://lkml.kernel.org/r/20230519101840.v5.6.I4e47cbfa1bb2ebbcdb5ca16817aa2887f15dc82c@changeid Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 454fe99c4874..dd75e361a11f 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -125,15 +125,30 @@ void watchdog_nmi_disable(unsigned int cpu); void lockup_detector_reconfigure(void); /** - * touch_nmi_watchdog - restart NMI watchdog timeout. + * touch_nmi_watchdog - manually reset the hardlockup watchdog timeout. * - * If the architecture supports the NMI watchdog, touch_nmi_watchdog() - * may be used to reset the timeout - for code which intentionally - * disables interrupts for a long time. This call is stateless. + * If we support detecting hardlockups, touch_nmi_watchdog() may be + * used to pet the watchdog (reset the timeout) - for code which + * intentionally disables interrupts for a long time. This call is stateless. + * + * Though this function has "nmi" in the name, the hardlockup watchdog might + * not be backed by NMIs. This function will likely be renamed to + * touch_hardlockup_watchdog() in the future. */ static inline void touch_nmi_watchdog(void) { + /* + * Pass on to the hardlockup detector selected via CONFIG_. Note that + * the hardlockup detector may not be arch-specific nor using NMIs + * and the arch_touch_nmi_watchdog() function will likely be renamed + * in the future. + */ arch_touch_nmi_watchdog(); + + /* + * Touching the hardlock detector implicitly resets the + * softlockup detector too + */ touch_softlockup_watchdog(); } From 6ea0d04211a7715a926f5dca4aa1065614fa64da Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:31 -0700 Subject: [PATCH 29/72] watchdog/perf: rename watchdog_hld.c to watchdog_perf.c The code currently in "watchdog_hld.c" is for detecting hardlockups using perf, as evidenced by the line in the Makefile that only compiles this file if CONFIG_HARDLOCKUP_DETECTOR_PERF is defined. Rename the file to prepare for the buddy hardlockup detector, which doesn't use perf. It could be argued that the new name makes it less obvious that this is a hardlockup detector. While true, it's not hard to remember that the "perf" detector is always a hardlockup detector and it's nice not to have names that are too convoluted. Link: https://lkml.kernel.org/r/20230519101840.v5.7.Ice803cb078d0e15fb2cbf49132f096ee2bd4199d@changeid Signed-off-by: Douglas Anderson Acked-by: Nicholas Piggin Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/Makefile | 2 +- kernel/{watchdog_hld.c => watchdog_perf.c} | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) rename kernel/{watchdog_hld.c => watchdog_perf.c} (99%) diff --git a/kernel/Makefile b/kernel/Makefile index b69c95315480..7eb72033143c 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -91,7 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o obj-$(CONFIG_KGDB) += debug/ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o -obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o obj-$(CONFIG_SECCOMP) += seccomp.o obj-$(CONFIG_RELAY) += relay.o obj-$(CONFIG_SYSCTL) += utsname_sysctl.o diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_perf.c similarity index 99% rename from kernel/watchdog_hld.c rename to kernel/watchdog_perf.c index 2125b09e09d7..8b8015758ea5 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_perf.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Detect hard lockups on a system + * Detect hard lockups on a system using perf * * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc. * From 81972551df9d168a8183b786ff4de06008469c2e Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:32 -0700 Subject: [PATCH 30/72] watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c The perf hardlockup detector works by looking at interrupt counts and seeing if they change from run to run. The interrupt counts are managed by the common watchdog code via its watchdog_timer_fn(). Currently the API between the perf detector and the common code is a function: is_hardlockup(). When the hard lockup detector sees that function return true then it handles printing out debug info and inducing a panic if necessary. Let's change the API a little bit in preparation for the buddy hardlockup detector. The buddy hardlockup detector wants to print nearly the same debug info and have nearly the same panic behavior. That means we want to move all that code to the common file. For now, the code in the common file will only be there if the perf hardlockup detector is enabled, but eventually it will be selected by a common config. Right now, this _just_ moves the code from the perf detector file to the common file and changes the names. It doesn't make the changes that the buddy hardlockup detector will need and doesn't do any style cleanups. A future patch will do cleanup to make it more obvious what changed. With the above, we no longer have any callers of is_hardlockup() outside of the "watchdog.c" file, so we can remove it from the header, make it static, and move it to the same "#ifdef" block as our new watchdog_hardlockup_check(). While doing this, it can be noted that even if no hardlockup detectors were configured the existing code used to still have the code for counting/checking "hrtimer_interrupts" even if the perf hardlockup detector wasn't configured. We didn't need to do that, so move all the "hrtimer_interrupts" counting to only be there if the perf hardlockup detector is configured as well. This change is expected to be a no-op. Link: https://lkml.kernel.org/r/20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 5 ++- kernel/watchdog.c | 93 +++++++++++++++++++++++++++++++++--------- kernel/watchdog_perf.c | 42 +------------------ 3 files changed, 78 insertions(+), 62 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index dd75e361a11f..67921436974d 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -15,7 +15,6 @@ void lockup_detector_init(void); void lockup_detector_soft_poweroff(void); void lockup_detector_cleanup(void); -bool is_hardlockup(void); extern int watchdog_user_enabled; extern int nmi_watchdog_user_enabled; @@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic; static inline void hardlockup_detector_disable(void) {} #endif +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +void watchdog_hardlockup_check(struct pt_regs *regs); +#endif + #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) # define NMI_WATCHDOG_SYSCTL_PERM 0644 #else diff --git a/kernel/watchdog.c b/kernel/watchdog.c index c705a18b26bf..12ce37d76e7d 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); #endif /* CONFIG_HARDLOCKUP_DETECTOR */ +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) + +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); +static DEFINE_PER_CPU(bool, hard_watchdog_warn); +static unsigned long hardlockup_allcpu_dumped; + +static bool is_hardlockup(void) +{ + unsigned long hrint = __this_cpu_read(hrtimer_interrupts); + + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) + return true; + + __this_cpu_write(hrtimer_interrupts_saved, hrint); + return false; +} + +static void watchdog_hardlockup_kick(void) +{ + __this_cpu_inc(hrtimer_interrupts); +} + +void watchdog_hardlockup_check(struct pt_regs *regs) +{ + /* check for a hardlockup + * This is done by making sure our timer interrupt + * is incrementing. The timer interrupt should have + * fired multiple times before we overflow'd. If it hasn't + * then this is a good indication the cpu is stuck + */ + if (is_hardlockup()) { + int this_cpu = smp_processor_id(); + + /* only print hardlockups once */ + if (__this_cpu_read(hard_watchdog_warn) == true) + return; + + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", + this_cpu); + print_modules(); + print_irqtrace_events(current); + if (regs) + show_regs(regs); + else + dump_stack(); + + /* + * Perform all-CPU dump only once to avoid multiple hardlockups + * generating interleaving traces + */ + if (sysctl_hardlockup_all_cpu_backtrace && + !test_and_set_bit(0, &hardlockup_allcpu_dumped)) + trigger_allbutself_cpu_backtrace(); + + if (hardlockup_panic) + nmi_panic(regs, "Hard LOCKUP"); + + __this_cpu_write(hard_watchdog_warn, true); + return; + } + + __this_cpu_write(hard_watchdog_warn, false); + return; +} + +#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ + +static inline void watchdog_hardlockup_kick(void) { } + +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ + /* * These functions can be overridden if an architecture implements its * own hardlockup detector. @@ -176,8 +248,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); static DEFINE_PER_CPU(bool, softlockup_touch_sync); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); static unsigned long soft_lockup_nmi_warn; static int __init nowatchdog_setup(char *str) @@ -312,22 +382,6 @@ static int is_softlockup(unsigned long touch_ts, } /* watchdog detector functions */ -bool is_hardlockup(void) -{ - unsigned long hrint = __this_cpu_read(hrtimer_interrupts); - - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) - return true; - - __this_cpu_write(hrtimer_interrupts_saved, hrint); - return false; -} - -static void watchdog_interrupt_count(void) -{ - __this_cpu_inc(hrtimer_interrupts); -} - static DEFINE_PER_CPU(struct completion, softlockup_completion); static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work); @@ -358,8 +412,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) if (!watchdog_enabled) return HRTIMER_NORESTART; - /* kick the hardlockup detector */ - watchdog_interrupt_count(); + watchdog_hardlockup_kick(); /* kick the softlockup detector */ if (completion_done(this_cpu_ptr(&softlockup_completion))) { diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 8b8015758ea5..04415812d079 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -20,13 +20,11 @@ #include #include -static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); static DEFINE_PER_CPU(struct perf_event *, dead_event); static struct cpumask dead_events_mask; -static unsigned long hardlockup_allcpu_dumped; static atomic_t watchdog_cpus = ATOMIC_INIT(0); notrace void arch_touch_nmi_watchdog(void) @@ -122,45 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event, return; } - /* check for a hardlockup - * This is done by making sure our timer interrupt - * is incrementing. The timer interrupt should have - * fired multiple times before we overflow'd. If it hasn't - * then this is a good indication the cpu is stuck - */ - if (is_hardlockup()) { - int this_cpu = smp_processor_id(); - - /* only print hardlockups once */ - if (__this_cpu_read(hard_watchdog_warn) == true) - return; - - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", - this_cpu); - print_modules(); - print_irqtrace_events(current); - if (regs) - show_regs(regs); - else - dump_stack(); - - /* - * Perform all-CPU dump only once to avoid multiple hardlockups - * generating interleaving traces - */ - if (sysctl_hardlockup_all_cpu_backtrace && - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) - trigger_allbutself_cpu_backtrace(); - - if (hardlockup_panic) - nmi_panic(regs, "Hard LOCKUP"); - - __this_cpu_write(hard_watchdog_warn, true); - return; - } - - __this_cpu_write(hard_watchdog_warn, false); - return; + watchdog_hardlockup_check(regs); } static int hardlockup_detector_event_create(void) From 1610611aadc2241179e7090ba2f3fd0d763d9932 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:33 -0700 Subject: [PATCH 31/72] watchdog/hardlockup: style changes to watchdog_hardlockup_check() / is_hardlockup() These are tiny style changes: - Add a blank line before a "return". - Renames two globals to use the "watchdog_hardlockup" prefix. - Store processor id in "unsigned int" rather than "int". - Minor comment rewording. - Use "else" rather than extra returns since it seemed more symmetric. Link: https://lkml.kernel.org/r/20230519101840.v5.9.I818492c326b632560b09f20d2608455ecf9d3650@changeid Reviewed-by: Petr Mladek Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- kernel/watchdog.c | 32 +++++++++++++++----------------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 12ce37d76e7d..169e5dffbc00 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); -static DEFINE_PER_CPU(bool, hard_watchdog_warn); -static unsigned long hardlockup_allcpu_dumped; +static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned); +static unsigned long watchdog_hardlockup_all_cpu_dumped; static bool is_hardlockup(void) { @@ -100,6 +100,7 @@ static bool is_hardlockup(void) return true; __this_cpu_write(hrtimer_interrupts_saved, hrint); + return false; } @@ -110,21 +111,20 @@ static void watchdog_hardlockup_kick(void) void watchdog_hardlockup_check(struct pt_regs *regs) { - /* check for a hardlockup - * This is done by making sure our timer interrupt - * is incrementing. The timer interrupt should have - * fired multiple times before we overflow'd. If it hasn't + /* + * Check for a hardlockup by making sure the CPU's timer + * interrupt is incrementing. The timer interrupt should have + * fired multiple times before we overflow'd. If it hasn't * then this is a good indication the cpu is stuck */ if (is_hardlockup()) { - int this_cpu = smp_processor_id(); + unsigned int this_cpu = smp_processor_id(); - /* only print hardlockups once */ - if (__this_cpu_read(hard_watchdog_warn) == true) + /* Only print hardlockups once. */ + if (__this_cpu_read(watchdog_hardlockup_warned)) return; - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", - this_cpu); + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu); print_modules(); print_irqtrace_events(current); if (regs) @@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs) * generating interleaving traces */ if (sysctl_hardlockup_all_cpu_backtrace && - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) + !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) trigger_allbutself_cpu_backtrace(); if (hardlockup_panic) nmi_panic(regs, "Hard LOCKUP"); - __this_cpu_write(hard_watchdog_warn, true); - return; + __this_cpu_write(watchdog_hardlockup_warned, true); + } else { + __this_cpu_write(watchdog_hardlockup_warned, false); } - - __this_cpu_write(hard_watchdog_warn, false); - return; } #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ From 77c12fc95980d100fdc49e88a5727c242d0dfedc Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:34 -0700 Subject: [PATCH 32/72] watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check() In preparation for the buddy hardlockup detector where the CPU checking for lockup might not be the currently running CPU, add a "cpu" parameter to watchdog_hardlockup_check(). As part of this change, make hrtimer_interrupts an atomic_t since now the CPU incrementing the value and the CPU reading the value might be different. Technially this could also be done with just READ_ONCE and WRITE_ONCE, but atomic_t feels a little cleaner in this case. While hrtimer_interrupts is made atomic_t, we change hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is needed to match the data type backing atomic_t for hrtimer_interrupts. Even if this changes us from 64-bits to 32-bits (which I don't think is true for most compilers), it doesn't really matter. All we ever do is increment it every few seconds and compare it to an old value so 32-bits is fine (even 16-bits would be). The "signed" vs "unsigned" also doesn't matter for simple equality comparisons. hrtimer_interrupts_saved is _not_ switched to atomic_t nor even accessed with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is always consistently accessed with the same CPU. NOTE: with the upcoming "buddy" detector there is one special case. When a CPU goes offline/online then we can change which CPU is the one to consistently access a given instance of hrtimer_interrupts_saved. We still can't end up with a partially updated hrtimer_interrupts_saved, however, because we end up petting all affected CPUs to make sure the new and old CPU can't end up somehow read/write hrtimer_interrupts_saved at the same time. Link: https://lkml.kernel.org/r/20230519101840.v5.10.I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0@changeid Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 2 +- kernel/watchdog.c | 52 ++++++++++++++++++++++++++---------------- kernel/watchdog_perf.c | 2 +- 3 files changed, 34 insertions(+), 22 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 67921436974d..521bf2ee6eff 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -88,7 +88,7 @@ static inline void hardlockup_detector_disable(void) {} #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) -void watchdog_hardlockup_check(struct pt_regs *regs); +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #endif #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 169e5dffbc00..2552e224f76a 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned); static unsigned long watchdog_hardlockup_all_cpu_dumped; -static bool is_hardlockup(void) +static bool is_hardlockup(unsigned int cpu) { - unsigned long hrint = __this_cpu_read(hrtimer_interrupts); + int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint) + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) return true; - __this_cpu_write(hrtimer_interrupts_saved, hrint); + /* + * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE + * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is + * written/read by a single CPU. + */ + per_cpu(hrtimer_interrupts_saved, cpu) = hrint; return false; } static void watchdog_hardlockup_kick(void) { - __this_cpu_inc(hrtimer_interrupts); + atomic_inc(raw_cpu_ptr(&hrtimer_interrupts)); } -void watchdog_hardlockup_check(struct pt_regs *regs) +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { /* * Check for a hardlockup by making sure the CPU's timer @@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs) * fired multiple times before we overflow'd. If it hasn't * then this is a good indication the cpu is stuck */ - if (is_hardlockup()) { + if (is_hardlockup(cpu)) { unsigned int this_cpu = smp_processor_id(); + struct cpumask backtrace_mask = *cpu_online_mask; /* Only print hardlockups once. */ - if (__this_cpu_read(watchdog_hardlockup_warned)) + if (per_cpu(watchdog_hardlockup_warned, cpu)) return; - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu); + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu); print_modules(); print_irqtrace_events(current); - if (regs) - show_regs(regs); - else - dump_stack(); + if (cpu == this_cpu) { + if (regs) + show_regs(regs); + else + dump_stack(); + cpumask_clear_cpu(cpu, &backtrace_mask); + } else { + if (trigger_single_cpu_backtrace(cpu)) + cpumask_clear_cpu(cpu, &backtrace_mask); + } /* - * Perform all-CPU dump only once to avoid multiple hardlockups - * generating interleaving traces + * Perform multi-CPU dump only once to avoid multiple + * hardlockups generating interleaving traces */ if (sysctl_hardlockup_all_cpu_backtrace && !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) - trigger_allbutself_cpu_backtrace(); + trigger_cpumask_backtrace(&backtrace_mask); if (hardlockup_panic) nmi_panic(regs, "Hard LOCKUP"); - __this_cpu_write(watchdog_hardlockup_warned, true); + per_cpu(watchdog_hardlockup_warned, cpu) = true; } else { - __this_cpu_write(watchdog_hardlockup_warned, false); + per_cpu(watchdog_hardlockup_warned, cpu) = false; } } diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 04415812d079..4e60e8023515 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -120,7 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event, return; } - watchdog_hardlockup_check(regs); + watchdog_hardlockup_check(smp_processor_id(), regs); } static int hardlockup_detector_event_create(void) From ed92e1ef52224c7c9c15fba559448396b059c2ee Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:35 -0700 Subject: [PATCH 33/72] watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c In preparation for the buddy hardlockup detector, which wants the same petting logic as the current perf hardlockup detector, move the code to watchdog.c. While doing this, rename the global variable to match others nearby. As part of this change we have to change the code to account for the fact that the CPU we're running on might be different than the one we're checking. Currently the code in watchdog.c is guarded by CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem silly. However, a future patch will change this. Link: https://lkml.kernel.org/r/20230519101840.v5.11.I00dfd6386ee00da25bf26d140559a41339b53e57@changeid Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 5 +++-- kernel/watchdog.c | 19 +++++++++++++++++++ kernel/watchdog_perf.c | 19 ------------------- 3 files changed, 22 insertions(+), 21 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 521bf2ee6eff..56fdc3de6894 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -88,7 +88,10 @@ static inline void hardlockup_detector_disable(void) {} #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +void arch_touch_nmi_watchdog(void); void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); +#elif !defined(CONFIG_HAVE_NMI_WATCHDOG) +static inline void arch_touch_nmi_watchdog(void) { } #endif #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) @@ -98,7 +101,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) -extern void arch_touch_nmi_watchdog(void); extern void hardlockup_detector_perf_stop(void); extern void hardlockup_detector_perf_restart(void); extern void hardlockup_detector_perf_disable(void); @@ -113,7 +115,6 @@ static inline void hardlockup_detector_perf_enable(void) { } static inline void hardlockup_detector_perf_cleanup(void) { } # if !defined(CONFIG_HAVE_NMI_WATCHDOG) static inline int hardlockup_detector_perf_init(void) { return -ENODEV; } -static inline void arch_touch_nmi_watchdog(void) {} # else static inline int hardlockup_detector_perf_init(void) { return 0; } # endif diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 2552e224f76a..64d7d2a0a7df 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -90,8 +90,22 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned); +static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched); static unsigned long watchdog_hardlockup_all_cpu_dumped; +notrace void arch_touch_nmi_watchdog(void) +{ + /* + * Using __raw here because some code paths have + * preemption enabled. If preemption is enabled + * then interrupts should be enabled too, in which + * case we shouldn't have to worry about the watchdog + * going off. + */ + raw_cpu_write(watchdog_hardlockup_touched, true); +} +EXPORT_SYMBOL(arch_touch_nmi_watchdog); + static bool is_hardlockup(unsigned int cpu) { int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); @@ -116,6 +130,11 @@ static void watchdog_hardlockup_kick(void) void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { + if (per_cpu(watchdog_hardlockup_touched, cpu)) { + per_cpu(watchdog_hardlockup_touched, cpu) = false; + return; + } + /* * Check for a hardlockup by making sure the CPU's timer * interrupt is incrementing. The timer interrupt should have diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 4e60e8023515..547917ebd5d3 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -20,26 +20,12 @@ #include #include -static DEFINE_PER_CPU(bool, watchdog_nmi_touch); static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); static DEFINE_PER_CPU(struct perf_event *, dead_event); static struct cpumask dead_events_mask; static atomic_t watchdog_cpus = ATOMIC_INIT(0); -notrace void arch_touch_nmi_watchdog(void) -{ - /* - * Using __raw here because some code paths have - * preemption enabled. If preemption is enabled - * then interrupts should be enabled too, in which - * case we shouldn't have to worry about the watchdog - * going off. - */ - raw_cpu_write(watchdog_nmi_touch, true); -} -EXPORT_SYMBOL(arch_touch_nmi_watchdog); - #ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP static DEFINE_PER_CPU(ktime_t, last_timestamp); static DEFINE_PER_CPU(unsigned int, nmi_rearmed); @@ -115,11 +101,6 @@ static void watchdog_overflow_callback(struct perf_event *event, if (!watchdog_check_timestamp()) return; - if (__this_cpu_read(watchdog_nmi_touch) == true) { - __this_cpu_write(watchdog_nmi_touch, false); - return; - } - watchdog_hardlockup_check(smp_processor_id(), regs); } From df95d3085caa5b99a60eb033d7ad6c2ff2b43dbf Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:36 -0700 Subject: [PATCH 34/72] watchdog/hardlockup: rename some "NMI watchdog" constants/function Do a search and replace of: - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED - watchdog_nmi_ => watchdog_hardlockup_ - nmi_watchdog_available => watchdog_hardlockup_available - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT Then update a few comments near where names were changed. This is specifically to make it less confusing when we want to introduce the buddy hardlockup detector, which isn't using NMIs. As part of this, we sanitized a few names for consistency. [trix@redhat.com: make variables static] Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid Signed-off-by: Douglas Anderson Signed-off-by: Tom Rix Reviewed-by: Tom Rix Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/powerpc/include/asm/nmi.h | 4 +- arch/powerpc/kernel/watchdog.c | 12 +-- arch/powerpc/platforms/pseries/mobility.c | 4 +- arch/sparc/kernel/nmi.c | 4 +- include/linux/nmi.h | 24 +++-- kernel/watchdog.c | 113 +++++++++++----------- kernel/watchdog_perf.c | 2 +- 7 files changed, 81 insertions(+), 82 deletions(-) diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h index c3c7adef74de..43bfd4de868f 100644 --- a/arch/powerpc/include/asm/nmi.h +++ b/arch/powerpc/include/asm/nmi.h @@ -5,10 +5,10 @@ #ifdef CONFIG_PPC_WATCHDOG extern void arch_touch_nmi_watchdog(void); long soft_nmi_interrupt(struct pt_regs *regs); -void watchdog_nmi_set_timeout_pct(u64 pct); +void watchdog_hardlockup_set_timeout_pct(u64 pct); #else static inline void arch_touch_nmi_watchdog(void) {} -static inline void watchdog_nmi_set_timeout_pct(u64 pct) {} +static inline void watchdog_hardlockup_set_timeout_pct(u64 pct) {} #endif #ifdef CONFIG_NMI_IPI diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index dbcc4a793f0b..edb2dd1f53eb 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -438,7 +438,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) { int cpu = smp_processor_id(); - if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) + if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED)) return HRTIMER_NORESTART; if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) @@ -479,7 +479,7 @@ static void start_watchdog(void *arg) return; } - if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) + if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED)) return; if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) @@ -546,7 +546,7 @@ static void watchdog_calc_timeouts(void) wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5; } -void watchdog_nmi_stop(void) +void watchdog_hardlockup_stop(void) { int cpu; @@ -554,7 +554,7 @@ void watchdog_nmi_stop(void) stop_watchdog_on_cpu(cpu); } -void watchdog_nmi_start(void) +void watchdog_hardlockup_start(void) { int cpu; @@ -566,7 +566,7 @@ void watchdog_nmi_start(void) /* * Invoked from core watchdog init. */ -int __init watchdog_nmi_probe(void) +int __init watchdog_hardlockup_probe(void) { int err; @@ -582,7 +582,7 @@ int __init watchdog_nmi_probe(void) } #ifdef CONFIG_PPC_PSERIES -void watchdog_nmi_set_timeout_pct(u64 pct) +void watchdog_hardlockup_set_timeout_pct(u64 pct) { pr_info("Set the NMI watchdog timeout factor to %llu%%\n", pct); WRITE_ONCE(wd_timeout_pct, pct); diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 6f30113b5468..cd632ba9ebff 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -750,7 +750,7 @@ static int pseries_migrate_partition(u64 handle) goto out; if (factor) - watchdog_nmi_set_timeout_pct(factor); + watchdog_hardlockup_set_timeout_pct(factor); ret = pseries_suspend(handle); if (ret == 0) { @@ -766,7 +766,7 @@ static int pseries_migrate_partition(u64 handle) pseries_cancel_migration(handle, ret); if (factor) - watchdog_nmi_set_timeout_pct(0); + watchdog_hardlockup_set_timeout_pct(0); out: vas_migration_handler(VAS_RESUME); diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c index 5dcf31f7e81f..9d9e29b75c43 100644 --- a/arch/sparc/kernel/nmi.c +++ b/arch/sparc/kernel/nmi.c @@ -282,7 +282,7 @@ __setup("nmi_watchdog=", setup_nmi_watchdog); * sparc specific NMI watchdog enable function. * Enables watchdog if it is not enabled already. */ -void watchdog_nmi_enable(unsigned int cpu) +void watchdog_hardlockup_enable(unsigned int cpu) { if (atomic_read(&nmi_active) == -1) { pr_warn("NMI watchdog cannot be enabled or disabled\n"); @@ -303,7 +303,7 @@ void watchdog_nmi_enable(unsigned int cpu) * sparc specific NMI watchdog disable function. * Disables watchdog if it is not disabled already. */ -void watchdog_nmi_disable(unsigned int cpu) +void watchdog_hardlockup_disable(unsigned int cpu) { if (atomic_read(&nmi_active) == -1) pr_warn_once("NMI watchdog cannot be enabled or disabled\n"); diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 56fdc3de6894..17b2ae7103d9 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -17,8 +17,6 @@ void lockup_detector_soft_poweroff(void); void lockup_detector_cleanup(void); extern int watchdog_user_enabled; -extern int nmi_watchdog_user_enabled; -extern int soft_watchdog_user_enabled; extern int watchdog_thresh; extern unsigned long watchdog_enabled; @@ -68,17 +66,17 @@ static inline void reset_hung_task_detector(void) { } * 'watchdog_enabled' variable. Each lockup detector has its dedicated bit - * bit 0 for the hard lockup detector and bit 1 for the soft lockup detector. * - * 'watchdog_user_enabled', 'nmi_watchdog_user_enabled' and - * 'soft_watchdog_user_enabled' are variables that are only used as an + * 'watchdog_user_enabled', 'watchdog_hardlockup_user_enabled' and + * 'watchdog_softlockup_user_enabled' are variables that are only used as an * 'interface' between the parameters in /proc/sys/kernel and the internal * state bits in 'watchdog_enabled'. The 'watchdog_thresh' variable is * handled differently because its value is not boolean, and the lockup * detectors are 'suspended' while 'watchdog_thresh' is equal zero. */ -#define NMI_WATCHDOG_ENABLED_BIT 0 -#define SOFT_WATCHDOG_ENABLED_BIT 1 -#define NMI_WATCHDOG_ENABLED (1 << NMI_WATCHDOG_ENABLED_BIT) -#define SOFT_WATCHDOG_ENABLED (1 << SOFT_WATCHDOG_ENABLED_BIT) +#define WATCHDOG_HARDLOCKUP_ENABLED_BIT 0 +#define WATCHDOG_SOFTOCKUP_ENABLED_BIT 1 +#define WATCHDOG_HARDLOCKUP_ENABLED (1 << WATCHDOG_HARDLOCKUP_ENABLED_BIT) +#define WATCHDOG_SOFTOCKUP_ENABLED (1 << WATCHDOG_SOFTOCKUP_ENABLED_BIT) #if defined(CONFIG_HARDLOCKUP_DETECTOR) extern void hardlockup_detector_disable(void); @@ -120,11 +118,11 @@ static inline int hardlockup_detector_perf_init(void) { return 0; } # endif #endif -void watchdog_nmi_stop(void); -void watchdog_nmi_start(void); -int watchdog_nmi_probe(void); -void watchdog_nmi_enable(unsigned int cpu); -void watchdog_nmi_disable(unsigned int cpu); +void watchdog_hardlockup_stop(void); +void watchdog_hardlockup_start(void); +int watchdog_hardlockup_probe(void); +void watchdog_hardlockup_enable(unsigned int cpu); +void watchdog_hardlockup_disable(unsigned int cpu); void lockup_detector_reconfigure(void); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 64d7d2a0a7df..c6790c7dc08d 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -30,17 +30,17 @@ static DEFINE_MUTEX(watchdog_mutex); #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG) -# define NMI_WATCHDOG_DEFAULT 1 +# define WATCHDOG_HARDLOCKUP_DEFAULT 1 #else -# define NMI_WATCHDOG_DEFAULT 0 +# define WATCHDOG_HARDLOCKUP_DEFAULT 0 #endif unsigned long __read_mostly watchdog_enabled; int __read_mostly watchdog_user_enabled = 1; -int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT; -int __read_mostly soft_watchdog_user_enabled = 1; +static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT; +static int __read_mostly watchdog_softlockup_user_enabled = 1; int __read_mostly watchdog_thresh = 10; -static int __read_mostly nmi_watchdog_available; +static int __read_mostly watchdog_hardlockup_available; struct cpumask watchdog_cpumask __read_mostly; unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask); @@ -66,7 +66,7 @@ unsigned int __read_mostly hardlockup_panic = */ void __init hardlockup_detector_disable(void) { - nmi_watchdog_user_enabled = 0; + watchdog_hardlockup_user_enabled = 0; } static int __init hardlockup_panic_setup(char *str) @@ -76,9 +76,9 @@ static int __init hardlockup_panic_setup(char *str) else if (!strncmp(str, "nopanic", 7)) hardlockup_panic = 0; else if (!strncmp(str, "0", 1)) - nmi_watchdog_user_enabled = 0; + watchdog_hardlockup_user_enabled = 0; else if (!strncmp(str, "1", 1)) - nmi_watchdog_user_enabled = 1; + watchdog_hardlockup_user_enabled = 1; return 1; } __setup("nmi_watchdog=", hardlockup_panic_setup); @@ -190,40 +190,40 @@ static inline void watchdog_hardlockup_kick(void) { } * These functions can be overridden if an architecture implements its * own hardlockup detector. * - * watchdog_nmi_enable/disable can be implemented to start and stop when + * watchdog_hardlockup_enable/disable can be implemented to start and stop when * softlockup watchdog start and stop. The arch must select the * SOFTLOCKUP_DETECTOR Kconfig. */ -void __weak watchdog_nmi_enable(unsigned int cpu) +void __weak watchdog_hardlockup_enable(unsigned int cpu) { hardlockup_detector_perf_enable(); } -void __weak watchdog_nmi_disable(unsigned int cpu) +void __weak watchdog_hardlockup_disable(unsigned int cpu) { hardlockup_detector_perf_disable(); } -/* Return 0, if a NMI watchdog is available. Error code otherwise */ -int __weak __init watchdog_nmi_probe(void) +/* Return 0, if a hardlockup watchdog is available. Error code otherwise */ +int __weak __init watchdog_hardlockup_probe(void) { return hardlockup_detector_perf_init(); } /** - * watchdog_nmi_stop - Stop the watchdog for reconfiguration + * watchdog_hardlockup_stop - Stop the watchdog for reconfiguration * * The reconfiguration steps are: - * watchdog_nmi_stop(); + * watchdog_hardlockup_stop(); * update_variables(); - * watchdog_nmi_start(); + * watchdog_hardlockup_start(); */ -void __weak watchdog_nmi_stop(void) { } +void __weak watchdog_hardlockup_stop(void) { } /** - * watchdog_nmi_start - Start the watchdog after reconfiguration + * watchdog_hardlockup_start - Start the watchdog after reconfiguration * - * Counterpart to watchdog_nmi_stop(). + * Counterpart to watchdog_hardlockup_stop(). * * The following variables have been updated in update_variables() and * contain the currently valid configuration: @@ -231,23 +231,23 @@ void __weak watchdog_nmi_stop(void) { } * - watchdog_thresh * - watchdog_cpumask */ -void __weak watchdog_nmi_start(void) { } +void __weak watchdog_hardlockup_start(void) { } /** * lockup_detector_update_enable - Update the sysctl enable bit * - * Caller needs to make sure that the NMI/perf watchdogs are off, so this - * can't race with watchdog_nmi_disable(). + * Caller needs to make sure that the hard watchdogs are off, so this + * can't race with watchdog_hardlockup_disable(). */ static void lockup_detector_update_enable(void) { watchdog_enabled = 0; if (!watchdog_user_enabled) return; - if (nmi_watchdog_available && nmi_watchdog_user_enabled) - watchdog_enabled |= NMI_WATCHDOG_ENABLED; - if (soft_watchdog_user_enabled) - watchdog_enabled |= SOFT_WATCHDOG_ENABLED; + if (watchdog_hardlockup_available && watchdog_hardlockup_user_enabled) + watchdog_enabled |= WATCHDOG_HARDLOCKUP_ENABLED; + if (watchdog_softlockup_user_enabled) + watchdog_enabled |= WATCHDOG_SOFTOCKUP_ENABLED; } #ifdef CONFIG_SOFTLOCKUP_DETECTOR @@ -288,7 +288,7 @@ __setup("nowatchdog", nowatchdog_setup); static int __init nosoftlockup_setup(char *str) { - soft_watchdog_user_enabled = 0; + watchdog_softlockup_user_enabled = 0; return 1; } __setup("nosoftlockup", nosoftlockup_setup); @@ -402,7 +402,7 @@ static int is_softlockup(unsigned long touch_ts, unsigned long period_ts, unsigned long now) { - if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){ + if ((watchdog_enabled & WATCHDOG_SOFTOCKUP_ENABLED) && watchdog_thresh) { /* Warn about unreasonable delays. */ if (time_after(now, period_ts + get_softlockup_thresh())) return now - touch_ts; @@ -537,7 +537,7 @@ static void watchdog_enable(unsigned int cpu) complete(done); /* - * Start the timer first to prevent the NMI watchdog triggering + * Start the timer first to prevent the hardlockup watchdog triggering * before the timer has a chance to fire. */ hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); @@ -547,9 +547,9 @@ static void watchdog_enable(unsigned int cpu) /* Initialize timestamp */ update_touch_ts(); - /* Enable the perf event */ - if (watchdog_enabled & NMI_WATCHDOG_ENABLED) - watchdog_nmi_enable(cpu); + /* Enable the hardlockup detector */ + if (watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED) + watchdog_hardlockup_enable(cpu); } static void watchdog_disable(unsigned int cpu) @@ -559,11 +559,11 @@ static void watchdog_disable(unsigned int cpu) WARN_ON_ONCE(cpu != smp_processor_id()); /* - * Disable the perf event first. That prevents that a large delay - * between disabling the timer and disabling the perf event causes - * the perf NMI to detect a false positive. + * Disable the hardlockup detector first. That prevents that a large + * delay between disabling the timer and disabling the hardlockup + * detector causes a false positive. */ - watchdog_nmi_disable(cpu); + watchdog_hardlockup_disable(cpu); hrtimer_cancel(hrtimer); wait_for_completion(this_cpu_ptr(&softlockup_completion)); } @@ -619,7 +619,7 @@ int lockup_detector_offline_cpu(unsigned int cpu) static void __lockup_detector_reconfigure(void) { cpus_read_lock(); - watchdog_nmi_stop(); + watchdog_hardlockup_stop(); softlockup_stop_all(); set_sample_period(); @@ -627,7 +627,7 @@ static void __lockup_detector_reconfigure(void) if (watchdog_enabled && watchdog_thresh) softlockup_start_all(); - watchdog_nmi_start(); + watchdog_hardlockup_start(); cpus_read_unlock(); /* * Must be called outside the cpus locked section to prevent @@ -668,9 +668,9 @@ static __init void lockup_detector_setup(void) static void __lockup_detector_reconfigure(void) { cpus_read_lock(); - watchdog_nmi_stop(); + watchdog_hardlockup_stop(); lockup_detector_update_enable(); - watchdog_nmi_start(); + watchdog_hardlockup_start(); cpus_read_unlock(); } void lockup_detector_reconfigure(void) @@ -725,14 +725,14 @@ static void proc_watchdog_update(void) /* * common function for watchdog, nmi_watchdog and soft_watchdog parameter * - * caller | table->data points to | 'which' - * -------------------|----------------------------|-------------------------- - * proc_watchdog | watchdog_user_enabled | NMI_WATCHDOG_ENABLED | - * | | SOFT_WATCHDOG_ENABLED - * -------------------|----------------------------|-------------------------- - * proc_nmi_watchdog | nmi_watchdog_user_enabled | NMI_WATCHDOG_ENABLED - * -------------------|----------------------------|-------------------------- - * proc_soft_watchdog | soft_watchdog_user_enabled | SOFT_WATCHDOG_ENABLED + * caller | table->data points to | 'which' + * -------------------|----------------------------------|------------------------------- + * proc_watchdog | watchdog_user_enabled | WATCHDOG_HARDLOCKUP_ENABLED | + * | | WATCHDOG_SOFTOCKUP_ENABLED + * -------------------|----------------------------------|------------------------------- + * proc_nmi_watchdog | watchdog_hardlockup_user_enabled | WATCHDOG_HARDLOCKUP_ENABLED + * -------------------|----------------------------------|------------------------------- + * proc_soft_watchdog | watchdog_softlockup_user_enabled | WATCHDOG_SOFTOCKUP_ENABLED */ static int proc_watchdog_common(int which, struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) @@ -764,7 +764,8 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write, int proc_watchdog(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - return proc_watchdog_common(NMI_WATCHDOG_ENABLED|SOFT_WATCHDOG_ENABLED, + return proc_watchdog_common(WATCHDOG_HARDLOCKUP_ENABLED | + WATCHDOG_SOFTOCKUP_ENABLED, table, write, buffer, lenp, ppos); } @@ -774,9 +775,9 @@ int proc_watchdog(struct ctl_table *table, int write, int proc_nmi_watchdog(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - if (!nmi_watchdog_available && write) + if (!watchdog_hardlockup_available && write) return -ENOTSUPP; - return proc_watchdog_common(NMI_WATCHDOG_ENABLED, + return proc_watchdog_common(WATCHDOG_HARDLOCKUP_ENABLED, table, write, buffer, lenp, ppos); } @@ -786,7 +787,7 @@ int proc_nmi_watchdog(struct ctl_table *table, int write, int proc_soft_watchdog(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - return proc_watchdog_common(SOFT_WATCHDOG_ENABLED, + return proc_watchdog_common(WATCHDOG_SOFTOCKUP_ENABLED, table, write, buffer, lenp, ppos); } @@ -854,7 +855,7 @@ static struct ctl_table watchdog_sysctls[] = { }, { .procname = "nmi_watchdog", - .data = &nmi_watchdog_user_enabled, + .data = &watchdog_hardlockup_user_enabled, .maxlen = sizeof(int), .mode = NMI_WATCHDOG_SYSCTL_PERM, .proc_handler = proc_nmi_watchdog, @@ -871,7 +872,7 @@ static struct ctl_table watchdog_sysctls[] = { #ifdef CONFIG_SOFTLOCKUP_DETECTOR { .procname = "soft_watchdog", - .data = &soft_watchdog_user_enabled, + .data = &watchdog_softlockup_user_enabled, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_soft_watchdog, @@ -940,8 +941,8 @@ void __init lockup_detector_init(void) cpumask_copy(&watchdog_cpumask, housekeeping_cpumask(HK_TYPE_TIMER)); - if (!watchdog_nmi_probe()) - nmi_watchdog_available = true; + if (!watchdog_hardlockup_probe()) + watchdog_hardlockup_available = true; lockup_detector_setup(); watchdog_sysctl_init(); } diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 547917ebd5d3..9e6042a892b3 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -215,7 +215,7 @@ void __init hardlockup_detector_perf_restart(void) lockdep_assert_cpus_held(); - if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) + if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED)) return; for_each_online_cpu(cpu) { From d9b3629ade8ebffb0075e311409796a56bac8282 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:37 -0700 Subject: [PATCH 35/72] watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly The fact that there watchdog_hardlockup_enable(), watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are declared __weak means that the configured hardlockup detector can define non-weak versions of those functions if it needs to. Instead of doing this, the perf hardlockup detector hooked itself into the default __weak implementation, which was a bit awkward. Clean this up. From comments, it looks as if the original design was done because the __weak function were expected to implemented by the architecture and not by the configured hardlockup detector. This got awkward when we tried to add the buddy lockup detector which was not arch-specific but wanted to hook into those same functions. This is not expected to have any functional impact. Link: https://lkml.kernel.org/r/20230519101840.v5.13.I847d9ec852449350997ba00401d2462a9cb4302b@changeid Signed-off-by: Douglas Anderson Reviewed-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 10 ---------- kernel/watchdog.c | 30 ++++++++++++++++++------------ kernel/watchdog_perf.c | 20 ++++++++++++++------ 3 files changed, 32 insertions(+), 28 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 17b2ae7103d9..e23b4dc01b4a 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -101,21 +101,11 @@ static inline void arch_touch_nmi_watchdog(void) { } #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) extern void hardlockup_detector_perf_stop(void); extern void hardlockup_detector_perf_restart(void); -extern void hardlockup_detector_perf_disable(void); -extern void hardlockup_detector_perf_enable(void); extern void hardlockup_detector_perf_cleanup(void); -extern int hardlockup_detector_perf_init(void); #else static inline void hardlockup_detector_perf_stop(void) { } static inline void hardlockup_detector_perf_restart(void) { } -static inline void hardlockup_detector_perf_disable(void) { } -static inline void hardlockup_detector_perf_enable(void) { } static inline void hardlockup_detector_perf_cleanup(void) { } -# if !defined(CONFIG_HAVE_NMI_WATCHDOG) -static inline int hardlockup_detector_perf_init(void) { return -ENODEV; } -# else -static inline int hardlockup_detector_perf_init(void) { return 0; } -# endif #endif void watchdog_hardlockup_stop(void); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index c6790c7dc08d..e67125f64719 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -187,27 +187,33 @@ static inline void watchdog_hardlockup_kick(void) { } #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ /* - * These functions can be overridden if an architecture implements its - * own hardlockup detector. + * These functions can be overridden based on the configured hardlockdup detector. * * watchdog_hardlockup_enable/disable can be implemented to start and stop when - * softlockup watchdog start and stop. The arch must select the + * softlockup watchdog start and stop. The detector must select the * SOFTLOCKUP_DETECTOR Kconfig. */ -void __weak watchdog_hardlockup_enable(unsigned int cpu) -{ - hardlockup_detector_perf_enable(); -} +void __weak watchdog_hardlockup_enable(unsigned int cpu) { } -void __weak watchdog_hardlockup_disable(unsigned int cpu) -{ - hardlockup_detector_perf_disable(); -} +void __weak watchdog_hardlockup_disable(unsigned int cpu) { } /* Return 0, if a hardlockup watchdog is available. Error code otherwise */ int __weak __init watchdog_hardlockup_probe(void) { - return hardlockup_detector_perf_init(); + /* + * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture + * is assumed to have the hard watchdog available and we return 0. + */ + if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG)) + return 0; + + /* + * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG + * are required to implement a non-weak version of this probe function + * to tell whether they are available. If they don't override then + * we'll return -ENODEV. + */ + return -ENODEV; } /** diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 9e6042a892b3..349fcd4d2abc 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -132,10 +132,14 @@ static int hardlockup_detector_event_create(void) } /** - * hardlockup_detector_perf_enable - Enable the local event + * watchdog_hardlockup_enable - Enable the local event + * + * @cpu: The CPU to enable hard lockup on. */ -void hardlockup_detector_perf_enable(void) +void watchdog_hardlockup_enable(unsigned int cpu) { + WARN_ON_ONCE(cpu != smp_processor_id()); + if (hardlockup_detector_event_create()) return; @@ -147,12 +151,16 @@ void hardlockup_detector_perf_enable(void) } /** - * hardlockup_detector_perf_disable - Disable the local event + * watchdog_hardlockup_disable - Disable the local event + * + * @cpu: The CPU to enable hard lockup on. */ -void hardlockup_detector_perf_disable(void) +void watchdog_hardlockup_disable(unsigned int cpu) { struct perf_event *event = this_cpu_read(watchdog_ev); + WARN_ON_ONCE(cpu != smp_processor_id()); + if (event) { perf_event_disable(event); this_cpu_write(watchdog_ev, NULL); @@ -227,9 +235,9 @@ void __init hardlockup_detector_perf_restart(void) } /** - * hardlockup_detector_perf_init - Probe whether NMI event is available at all + * watchdog_hardlockup_probe - Probe whether NMI event is available at all */ -int __init hardlockup_detector_perf_init(void) +int __init watchdog_hardlockup_probe(void) { int ret = hardlockup_detector_event_create(); From 1f423c905a6b43b493df1b259e6e6267e5624e62 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:38 -0700 Subject: [PATCH 36/72] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Implement a hardlockup detector that doesn't doesn't need any extra arch-specific support code to detect lockups. Instead of using something arch-specific we will use the buddy system, where each CPU watches out for another one. Specifically, each CPU will use its softlockup hrtimer to check that the next CPU is processing hrtimer interrupts by verifying that a counter is increasing. NOTE: unlike the other hard lockup detectors, the buddy one can't easily show what's happening on the CPU that locked up just by doing a simple backtrace. It relies on some other mechanism in the system to get information about the locked up CPUs. This could be support for NMI backtraces like [1], it could be a mechanism for printing the PC of locked CPUs at panic time like [2] / [3], or it could be something else. Even though that means we still rely on arch-specific code, this arch-specific code seems to often be implemented even on architectures that don't have a hardlockup detector. This style of hardlockup detector originated in some downstream Android trees and has been rebased on / carried in ChromeOS trees for quite a long time for use on arm and arm64 boards. Historically on these boards we've leveraged mechanism [2] / [3] to get information about hung CPUs, but we could move to [1]. Although the original motivation for the buddy system was for use on systems without an arch-specific hardlockup detector, it can still be useful to use even on systems that _do_ have an arch-specific hardlockup detector. On x86, for instance, there is a 24-part patch series [4] in progress switching the arch-specific hard lockup detector from a scarce perf counter to a less-scarce hardware resource. Potentially the buddy system could be a simpler alternative to free up the perf counter but still get hard lockup detection. Overall, pros (+) and cons (-) of the buddy system compared to an arch-specific hardlockup detector (which might be implemented using perf): + The buddy system is usable on systems that don't have an arch-specific hardlockup detector, like arm32 and arm64 (though it's being worked on for arm64 [5]). + The buddy system may free up scarce hardware resources. + If a CPU totally goes out to lunch (can't process NMIs) the buddy system could still detect the problem (though it would be unlikely to be able to get a stack trace). + The buddy system uses the same timer function to pet the hardlockup detector on the running CPU as it uses to detect hardlockups on other CPUs. Compared to other hardlockup detectors, this means it generates fewer interrupts and thus is likely better able to let CPUs stay idle longer. - If all CPUs are hard locked up at the same time the buddy system can't detect it. - If we don't have SMP we can't use the buddy system. - The buddy system needs an arch-specific mechanism (possibly NMI backtrace) to get info about the locked up CPU. [1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org [2] https://issuetracker.google.com/172213129 [3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html [4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/ [5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/ Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid Signed-off-by: Colin Cross Signed-off-by: Matthias Kaehlcke Signed-off-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Daniel Thompson Cc: "David S. Miller" Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 9 +++- kernel/Makefile | 1 + kernel/watchdog.c | 29 +++++++++---- kernel/watchdog_buddy.c | 93 +++++++++++++++++++++++++++++++++++++++++ lib/Kconfig.debug | 52 +++++++++++++++++++++-- 5 files changed, 173 insertions(+), 11 deletions(-) create mode 100644 kernel/watchdog_buddy.c diff --git a/include/linux/nmi.h b/include/linux/nmi.h index e23b4dc01b4a..1cdadc6a6cfd 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -85,8 +85,9 @@ extern unsigned int hardlockup_panic; static inline void hardlockup_detector_disable(void) {} #endif -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) void arch_touch_nmi_watchdog(void); +void watchdog_hardlockup_touch_cpu(unsigned int cpu); void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #elif !defined(CONFIG_HAVE_NMI_WATCHDOG) static inline void arch_touch_nmi_watchdog(void) { } @@ -116,6 +117,12 @@ void watchdog_hardlockup_disable(unsigned int cpu); void lockup_detector_reconfigure(void); +#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY +void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts); +#else +static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {} +#endif + /** * touch_nmi_watchdog - manually reset the hardlockup watchdog timeout. * diff --git a/kernel/Makefile b/kernel/Makefile index 7eb72033143c..f9e3fd9195d9 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -91,6 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o obj-$(CONFIG_KGDB) += debug/ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) += watchdog_buddy.o obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o obj-$(CONFIG_SECCOMP) += seccomp.o obj-$(CONFIG_RELAY) += relay.o diff --git a/kernel/watchdog.c b/kernel/watchdog.c index e67125f64719..10947c835079 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); #endif /* CONFIG_HARDLOCKUP_DETECTOR */ -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) } EXPORT_SYMBOL(arch_touch_nmi_watchdog); +void watchdog_hardlockup_touch_cpu(unsigned int cpu) +{ + per_cpu(watchdog_hardlockup_touched, cpu) = true; + + /* Match with smp_rmb() in watchdog_hardlockup_check() */ + smp_wmb(); +} + static bool is_hardlockup(unsigned int cpu) { int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); @@ -123,13 +131,16 @@ static bool is_hardlockup(unsigned int cpu) return false; } -static void watchdog_hardlockup_kick(void) +static unsigned long watchdog_hardlockup_kick(void) { - atomic_inc(raw_cpu_ptr(&hrtimer_interrupts)); + return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts)); } void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { + /* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */ + smp_rmb(); + if (per_cpu(watchdog_hardlockup_touched, cpu)) { per_cpu(watchdog_hardlockup_touched, cpu) = false; return; @@ -180,11 +191,11 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) } } -#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ +#else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ -static inline void watchdog_hardlockup_kick(void) { } +static inline unsigned long watchdog_hardlockup_kick(void) { return 0; } -#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ /* * These functions can be overridden based on the configured hardlockdup detector. @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) struct pt_regs *regs = get_irq_regs(); int duration; int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; + unsigned long hrtimer_interrupts; if (!watchdog_enabled) return HRTIMER_NORESTART; - watchdog_hardlockup_kick(); + hrtimer_interrupts = watchdog_hardlockup_kick(); + + /* test for hardlockups */ + watchdog_buddy_check_hardlockup(hrtimer_interrupts); /* kick the softlockup detector */ if (completion_done(this_cpu_ptr(&softlockup_completion))) { diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c new file mode 100644 index 000000000000..fee45af2e5bd --- /dev/null +++ b/kernel/watchdog_buddy.c @@ -0,0 +1,93 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include + +static cpumask_t __read_mostly watchdog_cpus; + +static unsigned int watchdog_next_cpu(unsigned int cpu) +{ + cpumask_t cpus = watchdog_cpus; + unsigned int next_cpu; + + next_cpu = cpumask_next(cpu, &cpus); + if (next_cpu >= nr_cpu_ids) + next_cpu = cpumask_first(&cpus); + + if (next_cpu == cpu) + return nr_cpu_ids; + + return next_cpu; +} + +int __init watchdog_hardlockup_probe(void) +{ + return 0; +} + +void watchdog_hardlockup_enable(unsigned int cpu) +{ + unsigned int next_cpu; + + /* + * The new CPU will be marked online before the hrtimer interrupt + * gets a chance to run on it. If another CPU tests for a + * hardlockup on the new CPU before it has run its the hrtimer + * interrupt, it will get a false positive. Touch the watchdog on + * the new CPU to delay the check for at least 3 sampling periods + * to guarantee one hrtimer has run on the new CPU. + */ + watchdog_hardlockup_touch_cpu(cpu); + + /* + * We are going to check the next CPU. Our watchdog_hrtimer + * need not be zero if the CPU has already been online earlier. + * Touch the watchdog on the next CPU to avoid false positive + * if we try to check it in less then 3 interrupts. + */ + next_cpu = watchdog_next_cpu(cpu); + if (next_cpu < nr_cpu_ids) + watchdog_hardlockup_touch_cpu(next_cpu); + + cpumask_set_cpu(cpu, &watchdog_cpus); +} + +void watchdog_hardlockup_disable(unsigned int cpu) +{ + unsigned int next_cpu = watchdog_next_cpu(cpu); + + /* + * Offlining this CPU will cause the CPU before this one to start + * checking the one after this one. If this CPU just finished checking + * the next CPU and updating hrtimer_interrupts_saved, and then the + * previous CPU checks it within one sample period, it will trigger a + * false positive. Touch the watchdog on the next CPU to prevent it. + */ + if (next_cpu < nr_cpu_ids) + watchdog_hardlockup_touch_cpu(next_cpu); + + cpumask_clear_cpu(cpu, &watchdog_cpus); +} + +void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) +{ + unsigned int next_cpu; + + /* + * Test for hardlockups every 3 samples. The sample period is + * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over + * watchdog_thresh (over by 20%). + */ + if (hrtimer_interrupts % 3 != 0) + return; + + /* check for a hardlockup on the next CPU */ + next_cpu = watchdog_next_cpu(smp_processor_id()); + if (next_cpu >= nr_cpu_ids) + return; + + watchdog_hardlockup_check(next_cpu, NULL); +} diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index ce51d4dc6803..abcad0513a32 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC Say N if unsure. -config HARDLOCKUP_DETECTOR_PERF +# Both the "perf" and "buddy" hardlockup detectors count hrtimer +# interrupts. This config enables functions managing this common code. +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER bool select SOFTLOCKUP_DETECTOR +config HARDLOCKUP_DETECTOR_PERF + bool + depends on HAVE_HARDLOCKUP_DETECTOR_PERF + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +config HARDLOCKUP_DETECTOR_BUDDY + bool + depends on SMP + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +# For hardlockup detectors you can have one directly provided by the arch +# or use a "non-arch" one. If you're using a "non-arch" one that is +# further divided the perf hardlockup detector (which, confusingly, needs +# arch-provided perf support) and the buddy hardlockup detector (which just +# needs SMP). In either case, using the "non-arch" code conflicts with +# the NMI watchdog code (which is sometimes used directly and sometimes used +# by the arch-provided hardlockup detector). +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG + default y + +config HARDLOCKUP_DETECTOR_PREFER_BUDDY + bool "Prefer the buddy CPU hardlockup detector" + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP + help + Say Y here to prefer the buddy hardlockup detector over the perf one. + + With the buddy detector, each CPU uses its softlockup hrtimer + to check that the next CPU is processing hrtimer interrupts by + verifying that a counter is increasing. + + This hardlockup detector is useful on systems that don't have + an arch-specific hardlockup detector or if resources needed + for the hardlockup detector are better used for other things. + +# This will select the appropriate non-arch hardlockdup detector +config HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY + # # Enables a timestamp based low pass filter to compensate for perf based # hard lockup detection which runs too fast due to turbo modes. @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP config HARDLOCKUP_DETECTOR bool "Detect Hard Lockups" depends on DEBUG_KERNEL && !S390 - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH select LOCKUP_DETECTOR - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + help Say Y here to enable the kernel to act as a watchdog to detect hard lockups. From b17aa959330e8058452297049a0056ba4b9c72e8 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:39 -0700 Subject: [PATCH 37/72] watchdog/perf: add a weak function for an arch to detect if perf can use NMIs On arm64, NMI support needs to be detected at runtime. Add a weak function to the perf hardlockup detector so that an architecture can implement it to detect whether NMIs are available. Link: https://lkml.kernel.org/r/20230519101840.v5.15.Ic55cb6f90ef5967d8aaa2b503a4e67c753f64d3a@changeid Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Lecopzer Chen Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Pingfan Liu Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 1 + kernel/watchdog_perf.c | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 1cdadc6a6cfd..28e65fd1de13 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -208,6 +208,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu) #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF u64 hw_nmi_get_sample_period(int watchdog_thresh); +bool arch_perf_nmi_is_available(void); #endif #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \ diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 349fcd4d2abc..8ea00c4a24b2 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -234,12 +234,22 @@ void __init hardlockup_detector_perf_restart(void) } } +bool __weak __init arch_perf_nmi_is_available(void) +{ + return true; +} + /** * watchdog_hardlockup_probe - Probe whether NMI event is available at all */ int __init watchdog_hardlockup_probe(void) { - int ret = hardlockup_detector_event_create(); + int ret; + + if (!arch_perf_nmi_is_available()) + return -ENODEV; + + ret = hardlockup_detector_event_create(); if (ret) { pr_info("Perf NMI watchdog permanently disabled\n"); From 930d8f8dbab97cb05dba30e67a2dfa0c6dbf4bc7 Mon Sep 17 00:00:00 2001 From: Lecopzer Chen Date: Fri, 19 May 2023 10:18:40 -0700 Subject: [PATCH 38/72] watchdog/perf: adapt the watchdog_perf interface for async model When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not ready yet. E.g. on arm64, PMU is not ready until device_initcall(armv8_pmu_driver_init). And it is deeply integrated with the driver model and cpuhp. Hence it is hard to push this initialization before smp_init(). But it is easy to take an opposite approach and try to initialize the watchdog once again later. The delayed probe is called using workqueues. It need to allocate memory and must be proceed in a normal context. The delayed probe is able to use if watchdog_hardlockup_probe() returns non-zero which means the return code returned when PMU is not ready yet. Provide an API - lockup_detector_retry_init() for anyone who needs to delayed init lockup detector if they had ever failed at lockup_detector_init(). The original assumption is: nobody should use delayed probe after lockup_detector_check() which has __init attribute. That is, anyone uses this API must call between lockup_detector_init() and lockup_detector_check(), and the caller must have __init attribute Link: https://lkml.kernel.org/r/20230519101840.v5.16.If4ad5dd5d09fb1309cebf8bcead4b6a5a7758ca7@changeid Reviewed-by: Petr Mladek Co-developed-by: Pingfan Liu Signed-off-by: Pingfan Liu Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Sumit Garg Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nmi.h | 2 ++ kernel/watchdog.c | 67 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 28e65fd1de13..83577ae736cc 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -13,6 +13,7 @@ #ifdef CONFIG_LOCKUP_DETECTOR void lockup_detector_init(void); +void lockup_detector_retry_init(void); void lockup_detector_soft_poweroff(void); void lockup_detector_cleanup(void); @@ -32,6 +33,7 @@ extern int sysctl_hardlockup_all_cpu_backtrace; #else /* CONFIG_LOCKUP_DETECTOR */ static inline void lockup_detector_init(void) { } +static inline void lockup_detector_retry_init(void) { } static inline void lockup_detector_soft_poweroff(void) { } static inline void lockup_detector_cleanup(void) { } #endif /* !CONFIG_LOCKUP_DETECTOR */ diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 10947c835079..237990e8d345 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -208,7 +208,13 @@ void __weak watchdog_hardlockup_enable(unsigned int cpu) { } void __weak watchdog_hardlockup_disable(unsigned int cpu) { } -/* Return 0, if a hardlockup watchdog is available. Error code otherwise */ +/* + * Watchdog-detector specific API. + * + * Return 0 when hardlockup watchdog is available, negative value otherwise. + * Note that the negative value means that a delayed probe might + * succeed later. + */ int __weak __init watchdog_hardlockup_probe(void) { /* @@ -954,6 +960,62 @@ static void __init watchdog_sysctl_init(void) #define watchdog_sysctl_init() do { } while (0) #endif /* CONFIG_SYSCTL */ +static void __init lockup_detector_delay_init(struct work_struct *work); +static bool allow_lockup_detector_init_retry __initdata; + +static struct work_struct detector_work __initdata = + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init); + +static void __init lockup_detector_delay_init(struct work_struct *work) +{ + int ret; + + ret = watchdog_hardlockup_probe(); + if (ret) { + pr_info("Delayed init of the lockup detector failed: %d\n", ret); + pr_info("Hard watchdog permanently disabled\n"); + return; + } + + allow_lockup_detector_init_retry = false; + + watchdog_hardlockup_available = true; + lockup_detector_setup(); +} + +/* + * lockup_detector_retry_init - retry init lockup detector if possible. + * + * Retry hardlockup detector init. It is useful when it requires some + * functionality that has to be initialized later on a particular + * platform. + */ +void __init lockup_detector_retry_init(void) +{ + /* Must be called before late init calls */ + if (!allow_lockup_detector_init_retry) + return; + + schedule_work(&detector_work); +} + +/* + * Ensure that optional delayed hardlockup init is proceed before + * the init code and memory is freed. + */ +static int __init lockup_detector_check(void) +{ + /* Prevent any later retry. */ + allow_lockup_detector_init_retry = false; + + /* Make sure no work is pending. */ + flush_work(&detector_work); + + return 0; + +} +late_initcall_sync(lockup_detector_check); + void __init lockup_detector_init(void) { if (tick_nohz_full_enabled()) @@ -964,6 +1026,9 @@ void __init lockup_detector_init(void) if (!watchdog_hardlockup_probe()) watchdog_hardlockup_available = true; + else + allow_lockup_detector_init_retry = true; + lockup_detector_setup(); watchdog_sysctl_init(); } From 94946f9eaac116f2943ec79ec3df1ec2fc92ae07 Mon Sep 17 00:00:00 2001 From: Lecopzer Chen Date: Fri, 19 May 2023 10:18:41 -0700 Subject: [PATCH 39/72] arm64: add hw_nmi_get_sample_period for preparation of lockup detector Set safe maximum CPU frequency to 5 GHz in case a particular platform doesn't implement cpufreq driver. Although, architecture doesn't put any restrictions on maximum frequency but 5 GHz seems to be safe maximum given the available Arm CPUs in the market which are clocked much less than 5 GHz. On the other hand, we can't make it much higher as it would lead to a large hard-lockup detection timeout on parts which are running slower (eg. 1GHz on Developerbox) and doesn't possess a cpufreq driver. Link: https://lkml.kernel.org/r/20230519101840.v5.17.Ia9d02578e89c3f44d3cb12eec8b0176603c8ab2f@changeid Co-developed-by: Sumit Garg Signed-off-by: Sumit Garg Co-developed-by: Pingfan Liu Signed-off-by: Pingfan Liu Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/watchdog_hld.c | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) create mode 100644 arch/arm64/kernel/watchdog_hld.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 7c2bb4e72476..cc22011ab66a 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -45,6 +45,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += ftrace.o entry-ftrace.o obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_CPU_PM) += sleep.o suspend.o obj-$(CONFIG_CPU_IDLE) += cpuidle.o diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c new file mode 100644 index 000000000000..2401eb1b7e55 --- /dev/null +++ b/arch/arm64/kernel/watchdog_hld.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +/* + * Safe maximum CPU frequency in case a particular platform doesn't implement + * cpufreq driver. Although, architecture doesn't put any restrictions on + * maximum frequency but 5 GHz seems to be safe maximum given the available + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other + * hand, we can't make it much higher as it would lead to a large hard-lockup + * detection timeout on parts which are running slower (eg. 1GHz on + * Developerbox) and doesn't possess a cpufreq driver. + */ +#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz +u64 hw_nmi_get_sample_period(int watchdog_thresh) +{ + unsigned int cpu = smp_processor_id(); + unsigned long max_cpu_freq; + + max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL; + if (!max_cpu_freq) + max_cpu_freq = SAFE_MAX_CPU_FREQ; + + return (u64)max_cpu_freq * watchdog_thresh; +} From d7a0fe9ef6d6484fca4ba55c19091932337d4272 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 19 May 2023 10:18:42 -0700 Subject: [PATCH 40/72] arm64: enable perf events based hard lockup detector With the recent feature added to enable perf events to use pseudo NMIs as interrupts on platforms which support GICv3 or later, its now been possible to enable hard lockup detector (or NMI watchdog) on arm64 platforms. So enable corresponding support. One thing to note here is that normally lockup detector is initialized just after the early initcalls but PMU on arm64 comes up much later as device_initcall(). To cope with that, override arch_perf_nmi_is_available() to let the watchdog framework know PMU not ready, and inform the framework to re-initialize lockup detection once PMU has been initialized. [dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled] Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c363caf8c@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid Co-developed-by: Sumit Garg Signed-off-by: Sumit Garg Co-developed-by: Pingfan Liu Signed-off-by: Pingfan Liu Signed-off-by: Lecopzer Chen Signed-off-by: Douglas Anderson Cc: Andi Kleen Cc: Catalin Marinas Cc: Chen-Yu Tsai Cc: Christophe Leroy Cc: Colin Cross Cc: Daniel Thompson Cc: "David S. Miller" Cc: Guenter Roeck Cc: Ian Rogers Cc: Marc Zyngier Cc: Mark Rutland Cc: Masayoshi Mizuma Cc: Matthias Kaehlcke Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Petr Mladek Cc: Randy Dunlap Cc: "Ravi V. Shankar" Cc: Ricardo Neri Cc: Stephane Eranian Cc: Stephen Boyd Cc: Tzung-Bi Shih Cc: Will Deacon Signed-off-by: Andrew Morton --- arch/arm64/Kconfig | 3 +++ arch/arm64/kernel/watchdog_hld.c | 12 ++++++++++++ drivers/perf/arm_pmu.c | 5 +++++ drivers/perf/arm_pmuv3.c | 12 ++++++++++-- include/linux/perf/arm_pmu.h | 2 ++ 5 files changed, 32 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index b1201d25a8a4..ef8776121766 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -203,12 +203,15 @@ config ARM64 select HAVE_FUNCTION_ERROR_INJECTION select HAVE_FUNCTION_GRAPH_TRACER select HAVE_GCC_PLUGINS + select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \ + HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IOREMAP_PROT select HAVE_IRQ_TIME_ACCOUNTING select HAVE_KVM select HAVE_NMI select HAVE_PERF_EVENTS + select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_PREEMPT_DYNAMIC_KEY diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c index 2401eb1b7e55..dcd25322127c 100644 --- a/arch/arm64/kernel/watchdog_hld.c +++ b/arch/arm64/kernel/watchdog_hld.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include +#include /* * Safe maximum CPU frequency in case a particular platform doesn't implement @@ -22,3 +24,13 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh) return (u64)max_cpu_freq * watchdog_thresh; } + +bool __init arch_perf_nmi_is_available(void) +{ + /* + * hardlockup_detector_perf_init() will success even if Pseudo-NMI turns off, + * however, the pmu interrupts will act like a normal interrupt instead of + * NMI and the hardlockup detector would be broken. + */ + return arm_pmu_irq_is_nmi(); +} diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 15bd1e34a88e..7b9caa502d33 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -687,6 +687,11 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu) return per_cpu(hw_events->irq, cpu); } +bool arm_pmu_irq_is_nmi(void) +{ + return has_nmi; +} + /* * PMU hardware loses all context when a CPU goes offline. * When a CPU is hotplugged back in, since some hardware registers are diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index c98e4039386d..7b28d65f3f1c 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -22,6 +22,7 @@ #include #include #include +#include #include @@ -1348,10 +1349,17 @@ static struct platform_driver armv8_pmu_driver = { static int __init armv8_pmu_driver_init(void) { + int ret; + if (acpi_disabled) - return platform_driver_register(&armv8_pmu_driver); + ret = platform_driver_register(&armv8_pmu_driver); else - return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init); + ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init); + + if (!ret) + lockup_detector_retry_init(); + + return ret; } device_initcall(armv8_pmu_driver_init) diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 525b5d64e394..5b00f5cb4cf9 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -171,6 +171,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu); #define kvm_host_pmu_init(x) do { } while(0) #endif +bool arm_pmu_irq_is_nmi(void); + /* Internal functions only for core arm_pmu code */ struct arm_pmu *armpmu_alloc(void); void armpmu_free(struct arm_pmu *pmu); From 048a9883267f9b8f8e05dca9e9e8e6f991eea61e Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Sat, 20 May 2023 21:25:19 +0300 Subject: [PATCH 41/72] include/linux/math.h: fix mult_frac() multiple argument evaluation bug mult_frac() evaluates _all_ arguments multiple times in the body. Clarify comment while I'm at it. Link: https://lkml.kernel.org/r/f9f9fdbb-ec8e-4f5e-a998-2a58627a1a43@p183 Signed-off-by: Alexey Dobriyan Signed-off-by: Andrew Morton --- include/linux/math.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/include/linux/math.h b/include/linux/math.h index 439b8f0b9ebd..2d388650c556 100644 --- a/include/linux/math.h +++ b/include/linux/math.h @@ -118,17 +118,17 @@ __STRUCT_FRACT(s32) __STRUCT_FRACT(u32) #undef __STRUCT_FRACT -/* - * Multiplies an integer by a fraction, while avoiding unnecessary - * overflow or loss of precision. - */ -#define mult_frac(x, numer, denom)( \ -{ \ - typeof(x) quot = (x) / (denom); \ - typeof(x) rem = (x) % (denom); \ - (quot * (numer)) + ((rem * (numer)) / (denom)); \ -} \ -) +/* Calculate "x * n / d" without unnecessary overflow or loss of precision. */ +#define mult_frac(x, n, d) \ +({ \ + typeof(x) x_ = (x); \ + typeof(n) n_ = (n); \ + typeof(d) d_ = (d); \ + \ + typeof(x_) q = x_ / d_; \ + typeof(x_) r = x_ % d_; \ + q * n_ + r * n_ / d_; \ +}) #define sector_div(a, b) do_div(a, b) From 4df3504e2f17cb35d2c12b07e716ae37971d0d52 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Thu, 25 May 2023 16:26:25 +0200 Subject: [PATCH 42/72] kexec: avoid calculating array size twice Avoid calculating array size twice in kexec_purgatory_setup_sechdrs(). Once using array_size(), and once open-coded. Flagged by Coccinelle: .../kexec_file.c:881:8-25: WARNING: array_size is already used (line 877) to compute the same size No functional change intended. Compile tested only. Link: https://lkml.kernel.org/r/20230525-kexec-array_size-v1-1-8b4bf4f7500a@kernel.org Signed-off-by: Simon Horman Acked-by: Baoquan He Cc: Eric W. Biederman Cc: Zhen Lei Signed-off-by: Andrew Morton --- kernel/kexec_file.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index f989f5f1933b..3f5677679744 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -867,6 +867,7 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi, { unsigned long bss_addr; unsigned long offset; + size_t sechdrs_size; Elf_Shdr *sechdrs; int i; @@ -874,11 +875,11 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi, * The section headers in kexec_purgatory are read-only. In order to * have them modifiable make a temporary copy. */ - sechdrs = vzalloc(array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum)); + sechdrs_size = array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum); + sechdrs = vzalloc(sechdrs_size); if (!sechdrs) return -ENOMEM; - memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff, - pi->ehdr->e_shnum * sizeof(Elf_Shdr)); + memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff, sechdrs_size); pi->sechdrs = sechdrs; offset = 0; From d32840ad4a111c6abd651fbf6b5996e6123913da Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Sun, 28 May 2023 21:20:32 +0800 Subject: [PATCH 43/72] ocfs2: correct return value of ocfs2_local_free_info() Now in ocfs2_local_free_info(), it returns 0 even if it actually fails. Though it doesn't cause any real problem since the only caller dquot_disable() ignores the return value, we'd better return correct as it is. Link: https://lkml.kernel.org/r/20230528132033.217664-1-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Joseph Qi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Signed-off-by: Andrew Morton --- fs/ocfs2/quota_local.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c index 5022b3e9bfcd..dfaae1e52412 100644 --- a/fs/ocfs2/quota_local.c +++ b/fs/ocfs2/quota_local.c @@ -811,7 +811,7 @@ static int ocfs2_local_free_info(struct super_block *sb, int type) struct ocfs2_quota_chunk *chunk; struct ocfs2_local_disk_chunk *dchunk; int mark_clean = 1, len; - int status; + int status = 0; iput(oinfo->dqi_gqinode); ocfs2_simple_drop_lockres(OCFS2_SB(sb), &oinfo->dqi_gqlock); @@ -853,17 +853,14 @@ static int ocfs2_local_free_info(struct super_block *sb, int type) oinfo->dqi_libh, olq_update_info, info); - if (status < 0) { + if (status < 0) mlog_errno(status); - goto out; - } - out: ocfs2_inode_unlock(sb_dqopt(sb)->files[type], 1); brelse(oinfo->dqi_libh); brelse(oinfo->dqi_lqi_bh); kfree(oinfo); - return 0; + return status; } static void olq_set_dquot(struct buffer_head *bh, void *private) From 69fe5c430ccd09a4c6e2d0b13d1164812983e2a1 Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Sun, 28 May 2023 21:20:33 +0800 Subject: [PATCH 44/72] ocfs2: cleanup trace events After commit 6dc8bc0fb300 ("ocfs2: switch to iter_file_splice_write()"), ocfs2 has switched from ocfs2_file_splice_write() to iter_file_splice_write(), so cleanup the corresponding trace event as well. Link: https://lkml.kernel.org/r/20230528132033.217664-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi Cc: Al Viro Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Signed-off-by: Andrew Morton --- fs/ocfs2/ocfs2_trace.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h index dc4bce1649c1..26effa583be0 100644 --- a/fs/ocfs2/ocfs2_trace.h +++ b/fs/ocfs2/ocfs2_trace.h @@ -1315,8 +1315,6 @@ DEFINE_OCFS2_FILE_OPS(ocfs2_sync_file); DEFINE_OCFS2_FILE_OPS(ocfs2_file_write_iter); -DEFINE_OCFS2_FILE_OPS(ocfs2_file_splice_write); - DEFINE_OCFS2_FILE_OPS(ocfs2_file_read_iter); DEFINE_OCFS2_ULL_ULL_ULL_EVENT(ocfs2_truncate_file); From 1cba6c4309f03de570202c46f03df3f73a0d4c82 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:34 +0800 Subject: [PATCH 45/72] kexec: fix a memory leak in crash_shrink_memory() Patch series "kexec: enable kexec_crash_size to support two crash kernel regions". When crashkernel=X fails to reserve region under 4G, it will fall back to reserve region above 4G and a region of the default size will also be reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size only supports one crash kernel region now, the user cannot sense the low memory reserved by reading /sys/kernel/kexec_crash_size. Also, low memory cannot be freed by writing this file. For example: resource_size(crashk_res) = 512M resource_size(crashk_low_res) = 256M The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be 768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB, which is incorrect. Since crashk_res manages the memory with high address and crashk_low_res manages the memory with low address, crashk_low_res is shrunken only when all crashk_res is shrunken. And because when there is only one crash kernel region, crashk_res is always used. Therefore, if all crashk_res is shrunken and crashk_low_res still exists, swap them. This patch (of 6): If the value of parameter 'new_size' is in the semi-open and semi-closed interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the calculation result of ram_res is: ram_res->start = crashk_res.end + 1 ram_res->end = crashk_res.end The operation of insert_resource() fails, and ram_res is not added to iomem_resource. As a result, the memory of the control block ram_res is leaked. In fact, on all architectures, the start address and size of crashk_res are already aligned by KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need to round up crashk_res.start again. Instead, we should round up 'new_size' in advance. Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com Fixes: 6480e5a09237 ("kdump: add missing RAM resource in crash_shrink_memory()") Fixes: 06a7f711246b ("kexec: premit reduction of the reserved memory size") Signed-off-by: Zhen Lei Acked-by: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 3d578c6fefee..22acee18195a 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1122,6 +1122,7 @@ int crash_shrink_memory(unsigned long new_size) start = crashk_res.start; end = crashk_res.end; old_size = (end == 0) ? 0 : end - start + 1; + new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN); if (new_size >= old_size) { ret = (new_size == old_size) ? 0 : -EINVAL; goto unlock; @@ -1133,9 +1134,7 @@ int crash_shrink_memory(unsigned long new_size) goto unlock; } - start = roundup(start, KEXEC_CRASH_MEM_ALIGN); - end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN); - + end = start + new_size; crash_free_reserved_phys_range(end, crashk_res.end); if ((start == end) && (crashk_res.parent != NULL)) From 6f22a744f4ee7a22be4704cf93bbe22decc7e79e Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:35 +0800 Subject: [PATCH 46/72] kexec: delete a useless check in crash_shrink_memory() The check '(crashk_res.parent != NULL)' is added by commit e05bd3367bd3 ("kexec: fix Oops in crash_shrink_memory()"), but it's stale now. Because if 'crashk_res' is not reserved, it will be zero in size and will be intercepted by the above 'if (new_size >= old_size)'. Ago: if (new_size >= end - start + 1) Now: old_size = (end == 0) ? 0 : end - start + 1; if (new_size >= old_size) Link: https://lkml.kernel.org/r/20230527123439.772-3-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Cc: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 22acee18195a..d1ab139dd490 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1137,7 +1137,7 @@ int crash_shrink_memory(unsigned long new_size) end = start + new_size; crash_free_reserved_phys_range(end, crashk_res.end); - if ((start == end) && (crashk_res.parent != NULL)) + if (start == end) release_resource(&crashk_res); ram_res->start = end; From f7f567b95b12eda7a6a273b8cb82d152491bc0da Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:36 +0800 Subject: [PATCH 47/72] kexec: clear crashk_res if all its memory has been released If the resource of crashk_res has been released, it is better to clear crashk_res.start and crashk_res.end. Because 'end = start - 1' is not reasonable, and in some places the test is based on crashk_res.end, not resource_size(&crashk_res). Link: https://lkml.kernel.org/r/20230527123439.772-4-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Acked-by: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index d1ab139dd490..bcc86a250ab3 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1137,15 +1137,18 @@ int crash_shrink_memory(unsigned long new_size) end = start + new_size; crash_free_reserved_phys_range(end, crashk_res.end); - if (start == end) - release_resource(&crashk_res); - ram_res->start = end; ram_res->end = crashk_res.end; ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM; ram_res->name = "System RAM"; - crashk_res.end = end - 1; + if (start == end) { + release_resource(&crashk_res); + crashk_res.start = 0; + crashk_res.end = 0; + } else { + crashk_res.end = end - 1; + } insert_resource(&iomem_resource, ram_res); From 8a7db7790a3ff8413bedef886cf135ba301e88d7 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:37 +0800 Subject: [PATCH 48/72] kexec: improve the readability of crash_shrink_memory() The major adjustments are: 1. end = start + new_size. The 'end' here is not an accurate representation, because it is not the new end of crashk_res, but the start of ram_res, difference 1. So eliminate it and replace it with ram_res->start. 2. Use 'ram_res->start' and 'ram_res->end' as arguments to crash_free_reserved_phys_range() to indicate that the memory covered by 'ram_res' is released from the crashk. And keep it close to insert_resource(). 3. Replace 'if (start == end)' with 'if (!new_size)', clear indication that all crashk memory will be shrunken. No functional change. Link: https://lkml.kernel.org/r/20230527123439.772-5-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Acked-by: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index bcc86a250ab3..69fe92141b0b 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1108,7 +1108,6 @@ ssize_t crash_get_memory_size(void) int crash_shrink_memory(unsigned long new_size) { int ret = 0; - unsigned long start, end; unsigned long old_size; struct resource *ram_res; @@ -1119,9 +1118,7 @@ int crash_shrink_memory(unsigned long new_size) ret = -ENOENT; goto unlock; } - start = crashk_res.start; - end = crashk_res.end; - old_size = (end == 0) ? 0 : end - start + 1; + old_size = !crashk_res.end ? 0 : resource_size(&crashk_res); new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN); if (new_size >= old_size) { ret = (new_size == old_size) ? 0 : -EINVAL; @@ -1134,22 +1131,20 @@ int crash_shrink_memory(unsigned long new_size) goto unlock; } - end = start + new_size; - crash_free_reserved_phys_range(end, crashk_res.end); - - ram_res->start = end; + ram_res->start = crashk_res.start + new_size; ram_res->end = crashk_res.end; ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM; ram_res->name = "System RAM"; - if (start == end) { + if (!new_size) { release_resource(&crashk_res); crashk_res.start = 0; crashk_res.end = 0; } else { - crashk_res.end = end - 1; + crashk_res.end = ram_res->start - 1; } + crash_free_reserved_phys_range(ram_res->start, ram_res->end); insert_resource(&iomem_resource, ram_res); unlock: From 5b7bfb32cbaad6ba997ab4295ee50bc9921252f2 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:38 +0800 Subject: [PATCH 49/72] kexec: add helper __crash_shrink_memory() No functional change, in preparation for the next patch so that it is easier to review. [akpm@linux-foundation.org: make __crash_shrink_memory() static] Link: https://lore.kernel.org/oe-kbuild-all/202305280717.Pw06aLkz-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230527123439.772-6-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Acked-by: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 51 ++++++++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 22 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 69fe92141b0b..f7ff7de94a61 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1105,11 +1105,38 @@ ssize_t crash_get_memory_size(void) return size; } +static int __crash_shrink_memory(struct resource *old_res, + unsigned long new_size) +{ + struct resource *ram_res; + + ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL); + if (!ram_res) + return -ENOMEM; + + ram_res->start = old_res->start + new_size; + ram_res->end = old_res->end; + ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM; + ram_res->name = "System RAM"; + + if (!new_size) { + release_resource(old_res); + old_res->start = 0; + old_res->end = 0; + } else { + crashk_res.end = ram_res->start - 1; + } + + crash_free_reserved_phys_range(ram_res->start, ram_res->end); + insert_resource(&iomem_resource, ram_res); + + return 0; +} + int crash_shrink_memory(unsigned long new_size) { int ret = 0; unsigned long old_size; - struct resource *ram_res; if (!kexec_trylock()) return -EBUSY; @@ -1125,27 +1152,7 @@ int crash_shrink_memory(unsigned long new_size) goto unlock; } - ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL); - if (!ram_res) { - ret = -ENOMEM; - goto unlock; - } - - ram_res->start = crashk_res.start + new_size; - ram_res->end = crashk_res.end; - ram_res->flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM; - ram_res->name = "System RAM"; - - if (!new_size) { - release_resource(&crashk_res); - crashk_res.start = 0; - crashk_res.end = 0; - } else { - crashk_res.end = ram_res->start - 1; - } - - crash_free_reserved_phys_range(ram_res->start, ram_res->end); - insert_resource(&iomem_resource, ram_res); + ret = __crash_shrink_memory(&crashk_res, new_size); unlock: kexec_unlock(); From 16c6006af4d4e70ecef93977a5314409d931020b Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Sat, 27 May 2023 20:34:39 +0800 Subject: [PATCH 50/72] kexec: enable kexec_crash_size to support two crash kernel regions The crashk_low_res should be considered by /sys/kernel/kexec_crash_size to support two crash kernel regions shrinking if existing. While doing it, crashk_low_res will only be shrunk when the entire crashk_res is empty; and if the crashk_res is empty and crahk_low_res is not, change crashk_low_res to be crashk_res. [bhe@redhat.com: redo changelog] Link: https://lkml.kernel.org/r/20230527123439.772-7-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Acked-by: Baoquan He Cc: Cong Wang Cc: Eric W. Biederman Cc: Michael Holzheu Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 43 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 5 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index f7ff7de94a61..e2f2574d8b74 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1091,6 +1091,11 @@ __bpf_kfunc void crash_kexec(struct pt_regs *regs) } } +static inline resource_size_t crash_resource_size(const struct resource *res) +{ + return !res->end ? 0 : resource_size(res); +} + ssize_t crash_get_memory_size(void) { ssize_t size = 0; @@ -1098,8 +1103,8 @@ ssize_t crash_get_memory_size(void) if (!kexec_trylock()) return -EBUSY; - if (crashk_res.end != crashk_res.start) - size = resource_size(&crashk_res); + size += crash_resource_size(&crashk_res); + size += crash_resource_size(&crashk_low_res); kexec_unlock(); return size; @@ -1136,7 +1141,7 @@ static int __crash_shrink_memory(struct resource *old_res, int crash_shrink_memory(unsigned long new_size) { int ret = 0; - unsigned long old_size; + unsigned long old_size, low_size; if (!kexec_trylock()) return -EBUSY; @@ -1145,14 +1150,42 @@ int crash_shrink_memory(unsigned long new_size) ret = -ENOENT; goto unlock; } - old_size = !crashk_res.end ? 0 : resource_size(&crashk_res); + + low_size = crash_resource_size(&crashk_low_res); + old_size = crash_resource_size(&crashk_res) + low_size; new_size = roundup(new_size, KEXEC_CRASH_MEM_ALIGN); if (new_size >= old_size) { ret = (new_size == old_size) ? 0 : -EINVAL; goto unlock; } - ret = __crash_shrink_memory(&crashk_res, new_size); + /* + * (low_size > new_size) implies that low_size is greater than zero. + * This also means that if low_size is zero, the else branch is taken. + * + * If low_size is greater than 0, (low_size > new_size) indicates that + * crashk_low_res also needs to be shrunken. Otherwise, only crashk_res + * needs to be shrunken. + */ + if (low_size > new_size) { + ret = __crash_shrink_memory(&crashk_res, 0); + if (ret) + goto unlock; + + ret = __crash_shrink_memory(&crashk_low_res, new_size); + } else { + ret = __crash_shrink_memory(&crashk_res, new_size - low_size); + } + + /* Swap crashk_res and crashk_low_res if needed */ + if (!crashk_res.end && crashk_low_res.end) { + crashk_res.start = crashk_low_res.start; + crashk_res.end = crashk_low_res.end; + release_resource(&crashk_low_res); + crashk_low_res.start = 0; + crashk_low_res.end = 0; + insert_resource(&iomem_resource, &crashk_res); + } unlock: kexec_unlock(); From f26799ffd6c7871aa0a1b8eeff2bd28258368cdb Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 1 Jun 2023 09:07:46 -0700 Subject: [PATCH 51/72] checkpatch: check for 0-length and 1-element arrays Fake flexible arrays have been deprecated since last millennium. Proper C99 flexible arrays must be used throughout the kernel so CONFIG_FORTIFY_SOURCE and CONFIG_UBSAN_BOUNDS can provide proper array bounds checking. [joe@perches.com: various suggestions] Link: https://lkml.kernel.org/r/20230601160746.up.948-kees@kernel.org Link: https://lore.kernel.org/r/20230517204530.never.151-kees@kernel.org Signed-off-by: Kees Cook Acked-by: Gustavo A. R. Silva Acked-by: Joe Perches Cc: Andy Whitcroft Cc: Dwaipayan Ray Cc: Lukas Bulwahn Signed-off-by: Andrew Morton --- scripts/checkpatch.pl | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index b30114d637c4..158479496388 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -7418,6 +7418,16 @@ sub process { } } +# check for array definition/declarations that should use flexible arrays instead + if ($sline =~ /^[\+ ]\s*\}(?:\s*__packed)?\s*;\s*$/ && + $prevline =~ /^\+\s*(?:\}(?:\s*__packed\s*)?|$Type)\s*$Ident\s*\[\s*(0|1)\s*\]\s*;\s*$/) { + if (ERROR("FLEXIBLE_ARRAY", + "Use C99 flexible arrays - see https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays\n" . $hereprev) && + $1 == '0' && $fix) { + $fixed[$fixlinenr - 1] =~ s/\[\s*0\s*\]/[]/; + } + } + # nested likely/unlikely calls if ($line =~ /\b(?:(?:un)?likely)\s*\(\s*!?\s*(IS_ERR(?:_OR_NULL|_VALUE)?|WARN)/) { WARN("LIKELY_MISUSE", From a94181ec064b3ad1b6f573f5953e2011f8c90292 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 7 Jun 2023 16:28:45 +0200 Subject: [PATCH 52/72] syscalls: add sys_ni_posix_timers prototype The sys_ni_posix_timers() definition causes a warning when the declaration is missing, so this needs to be added along with the normal syscalls, outside of the #ifdef. kernel/time/posix-stubs.c:26:17: error: no previous prototype for 'sys_ni_posix_timers' [-Werror=missing-prototypes] Link: https://lkml.kernel.org/r/20230607142925.3126422-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Reviewed-by: Kees Cook Signed-off-by: Andrew Morton --- arch/alpha/kernel/osf_sys.c | 2 -- include/linux/syscalls.h | 1 + kernel/time/posix-stubs.c | 1 + 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index 2a9a877a0508..d98701ee36c6 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -1014,8 +1014,6 @@ SYSCALL_DEFINE2(osf_settimeofday, struct timeval32 __user *, tv, return do_sys_settimeofday64(tv ? &kts : NULL, tz ? &ktz : NULL); } -asmlinkage long sys_ni_posix_timers(void); - SYSCALL_DEFINE2(osf_utimes, const char __user *, filename, struct timeval32 __user *, tvs) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 33a0ee3bcb2e..24871f8ec8bb 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1280,6 +1280,7 @@ asmlinkage long sys_ni_syscall(void); #endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */ +asmlinkage long sys_ni_posix_timers(void); /* * Kernel code should not call syscalls (i.e., sys_xyzyyz()) directly. diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c index 828aeecbd1e8..39769b2d1005 100644 --- a/kernel/time/posix-stubs.c +++ b/kernel/time/posix-stubs.c @@ -16,6 +16,7 @@ #include #include #include +#include #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER /* Architectures may override SYS_NI and COMPAT_SYS_NI */ From 9ec272c586b07d1abf73438524bd12b1df9c5f9b Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:31 -0700 Subject: [PATCH 53/72] watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews". This patch series attempts to finish resolving the feedback received from Petr Mladek on the v5 series I posted. Probably the only thing that wasn't fully as clean as Petr requested was the Kconfig stuff. I couldn't find a better way to express it without a more major overhaul. In the very least, I renamed "NON_ARCH" to "PERF_OR_BUDDY" in the hopes that will make it marginally better. Nothing in this series is terribly critical and even the bugfixes are small. However, it does cleanup a few things that were pointed out in review. This patch (of 10): The permissions for the kernel.nmi_watchdog sysctl have always been set at compile time despite the fact that a watchdog can fail to probe. Let's fix this and set the permissions based on whether the hardlockup detector actually probed. Link: https://lkml.kernel.org/r/20230527014153.2793931-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf") Signed-off-by: Douglas Anderson Reported-by: Petr Mladek Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- include/linux/nmi.h | 6 ------ kernel/watchdog.c | 30 ++++++++++++++++++++---------- 2 files changed, 20 insertions(+), 16 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 83577ae736cc..99b7d748ca21 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -95,12 +95,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); static inline void arch_touch_nmi_watchdog(void) { } #endif -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) -# define NMI_WATCHDOG_SYSCTL_PERM 0644 -#else -# define NMI_WATCHDOG_SYSCTL_PERM 0444 -#endif - #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) extern void hardlockup_detector_perf_stop(void); extern void hardlockup_detector_perf_restart(void); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 237990e8d345..4b9e31edb47f 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -880,15 +880,6 @@ static struct ctl_table watchdog_sysctls[] = { .extra1 = SYSCTL_ZERO, .extra2 = (void *)&sixty, }, - { - .procname = "nmi_watchdog", - .data = &watchdog_hardlockup_user_enabled, - .maxlen = sizeof(int), - .mode = NMI_WATCHDOG_SYSCTL_PERM, - .proc_handler = proc_nmi_watchdog, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, { .procname = "watchdog_cpumask", .data = &watchdog_cpumask_bits, @@ -952,10 +943,28 @@ static struct ctl_table watchdog_sysctls[] = { {} }; +static struct ctl_table watchdog_hardlockup_sysctl[] = { + { + .procname = "nmi_watchdog", + .data = &watchdog_hardlockup_user_enabled, + .maxlen = sizeof(int), + .mode = 0444, + .proc_handler = proc_nmi_watchdog, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + static void __init watchdog_sysctl_init(void) { register_sysctl_init("kernel", watchdog_sysctls); + + if (watchdog_hardlockup_available) + watchdog_hardlockup_sysctl[0].mode = 0644; + register_sysctl_init("kernel", watchdog_hardlockup_sysctl); } + #else #define watchdog_sysctl_init() do { } while (0) #endif /* CONFIG_SYSCTL */ @@ -1011,6 +1020,8 @@ static int __init lockup_detector_check(void) /* Make sure no work is pending. */ flush_work(&detector_work); + watchdog_sysctl_init(); + return 0; } @@ -1030,5 +1041,4 @@ void __init lockup_detector_init(void) allow_lockup_detector_init_retry = true; lockup_detector_setup(); - watchdog_sysctl_init(); } From 6426e8d1f27417834ea37e75a9ead832d1cf7713 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:32 -0700 Subject: [PATCH 54/72] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one architecture, we have some special case code in the watchdog core to handle the fact that watchdog_hardlockup_probe() isn't implemented. Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the special case. As a side effect of doing this, code inspection tells us that we could fix a minor bug where the system won't properly realize that NMI watchdogs are disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then nothing will override the "weak" watchdog_hardlockup_probe() and we'll fallback to looking at CONFIG_HAVE_NMI_WATCHDOG. Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/Kconfig | 3 ++- arch/sparc/kernel/nmi.c | 5 +++++ kernel/watchdog.c | 13 ------------- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 205fd23e0cad..422f0ffa269e 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -405,7 +405,8 @@ config HAVE_NMI_WATCHDOG bool help The arch provides a low level NMI watchdog. It provides - asm/nmi.h, and defines its own arch_touch_nmi_watchdog(). + asm/nmi.h, and defines its own watchdog_hardlockup_probe() and + arch_touch_nmi_watchdog(). config HAVE_HARDLOCKUP_DETECTOR_ARCH bool diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c index 9d9e29b75c43..17cdfdbf1f3b 100644 --- a/arch/sparc/kernel/nmi.c +++ b/arch/sparc/kernel/nmi.c @@ -65,6 +65,11 @@ void arch_touch_nmi_watchdog(void) } EXPORT_SYMBOL(arch_touch_nmi_watchdog); +int __init watchdog_hardlockup_probe(void) +{ + return 0; +} + static void die_nmi(const char *str, struct pt_regs *regs, int do_panic) { int this_cpu = smp_processor_id(); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 4b9e31edb47f..62230f5b8878 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -217,19 +217,6 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { } */ int __weak __init watchdog_hardlockup_probe(void) { - /* - * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture - * is assumed to have the hard watchdog available and we return 0. - */ - if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG)) - return 0; - - /* - * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG - * are required to implement a non-weak version of this probe function - * to tell whether they are available. If they don't override then - * we'll return -ENODEV. - */ return -ENODEV; } From 2711e4adef4fac2eeaee66e3c22a2f75ee86e7b3 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:33 -0700 Subject: [PATCH 55/72] watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). Using this_cpu_ptr() works fine. Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- kernel/watchdog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 62230f5b8878..32dac8028753 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -133,7 +133,7 @@ static bool is_hardlockup(unsigned int cpu) static unsigned long watchdog_hardlockup_kick(void) { - return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts)); + return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts)); } void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) From 7a71d8e650b06833095e7a0d4206585e8585c00f Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:34 -0700 Subject: [PATCH 56/72] watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started using a cpumask to keep track of which CPUs to backtrace. When setting up this cpumask, it's better to use cpumask_copy() than to just copy the structure directly. Fix this. Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- kernel/watchdog.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 32dac8028753..85f4839b6faf 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -154,7 +154,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) */ if (is_hardlockup(cpu)) { unsigned int this_cpu = smp_processor_id(); - struct cpumask backtrace_mask = *cpu_online_mask; + struct cpumask backtrace_mask; + + cpumask_copy(&backtrace_mask, cpu_online_mask); /* Only print hardlockups once. */ if (per_cpu(watchdog_hardlockup_warned, cpu)) From 05e7b558766114aa9c3d5d3af188a5c574809661 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:35 -0700 Subject: [PATCH 57/72] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() In the patch ("watchdog/hardlockup: add comments to touch_nmi_watchdog()") we adjusted some comments for touch_nmi_watchdog(). The comment about the softlockup had a typo and were also felt to be too obvious. Remove it. Link: https://lkml.kernel.org/r/20230526184139.5.Ia593afc9eb12082d55ea6681dc2c5a89677f20a8@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- include/linux/nmi.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 99b7d748ca21..3625d64da6db 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -140,10 +140,6 @@ static inline void touch_nmi_watchdog(void) */ arch_touch_nmi_watchdog(); - /* - * Touching the hardlock detector implicitly resets the - * softlockup detector too - */ touch_softlockup_watchdog(); } From d3b62ace0f097f1d863fb6c41df3c61503e4ec9e Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:36 -0700 Subject: [PATCH 58/72] watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called In the patch ("watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs"), we added a call from the common watchdog.c file into the buddy. That call could be done more cleanly. Specifically: 1. If we move the call into watchdog_hardlockup_kick() then it keeps watchdog_timer_fn() simpler. 2. We don't need to pass an "unsigned long" to the buddy for the timer count. In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") the count was changed to "atomic_t" which is backed by an int, so we should match types. Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- include/linux/nmi.h | 4 ++-- kernel/watchdog.c | 15 +++++++-------- kernel/watchdog_buddy.c | 2 +- 3 files changed, 10 insertions(+), 11 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 3625d64da6db..d35393405b24 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -114,9 +114,9 @@ void watchdog_hardlockup_disable(unsigned int cpu); void lockup_detector_reconfigure(void); #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY -void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts); +void watchdog_buddy_check_hardlockup(int hrtimer_interrupts); #else -static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {} +static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {} #endif /** diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 85f4839b6faf..6cc46b8e3d07 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -131,9 +131,12 @@ static bool is_hardlockup(unsigned int cpu) return false; } -static unsigned long watchdog_hardlockup_kick(void) +static void watchdog_hardlockup_kick(void) { - return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts)); + int new_interrupts; + + new_interrupts = atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts)); + watchdog_buddy_check_hardlockup(new_interrupts); } void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) @@ -195,7 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) #else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ -static inline unsigned long watchdog_hardlockup_kick(void) { return 0; } +static inline void watchdog_hardlockup_kick(void) { } #endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ @@ -449,15 +452,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) struct pt_regs *regs = get_irq_regs(); int duration; int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; - unsigned long hrtimer_interrupts; if (!watchdog_enabled) return HRTIMER_NORESTART; - hrtimer_interrupts = watchdog_hardlockup_kick(); - - /* test for hardlockups */ - watchdog_buddy_check_hardlockup(hrtimer_interrupts); + watchdog_hardlockup_kick(); /* kick the softlockup detector */ if (completion_done(this_cpu_ptr(&softlockup_completion))) { diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index fee45af2e5bd..3ffc5f2ede5a 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -72,7 +72,7 @@ void watchdog_hardlockup_disable(unsigned int cpu) cpumask_clear_cpu(cpu, &watchdog_cpus); } -void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) +void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) { unsigned int next_cpu; From 813efda23934edcad96343fc96727017378c3fe9 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:37 -0700 Subject: [PATCH 59/72] watchdog/buddy: don't copy the cpumask in watchdog_next_cpu() There's no reason to make a copy of the "watchdog_cpus" locally in watchdog_next_cpu(). Making a copy wouldn't make things any more race free and we're just reading the value so there's no need for a copy. Link: https://lkml.kernel.org/r/20230526184139.7.If466f9a2b50884cbf6a1d8ad05525a2c17069407@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- kernel/watchdog_buddy.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index 3ffc5f2ede5a..2ef88722c5e7 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -10,12 +10,11 @@ static cpumask_t __read_mostly watchdog_cpus; static unsigned int watchdog_next_cpu(unsigned int cpu) { - cpumask_t cpus = watchdog_cpus; unsigned int next_cpu; - next_cpu = cpumask_next(cpu, &cpus); + next_cpu = cpumask_next(cpu, &watchdog_cpus); if (next_cpu >= nr_cpu_ids) - next_cpu = cpumask_first(&cpus); + next_cpu = cpumask_first(&watchdog_cpus); if (next_cpu == cpu) return nr_cpu_ids; From 7ece48b7b4a22c1b2d59d7ab8ebcbacbfcaa7872 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:38 -0700 Subject: [PATCH 60/72] watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more complicated than it needed to be. If the "perf" detector is available and we have SMP then we have a choice, so enable the config based on just those two config items. Link: https://lkml.kernel.org/r/20230526184139.8.I49d5b483336b65b8acb1e5066548a05260caf809@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- lib/Kconfig.debug | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index abcad0513a32..ed7b01c4bd41 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1065,7 +1065,7 @@ config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH config HARDLOCKUP_DETECTOR_PREFER_BUDDY bool "Prefer the buddy CPU hardlockup detector" - depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP help Say Y here to prefer the buddy hardlockup detector over the perf one. From 28168eca3297d68faa8a9433ec93cb6acf06d2f4 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 26 May 2023 18:41:39 -0700 Subject: [PATCH 61/72] watchdog/hardlockup: move SMP barriers from common code to buddy code It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hardlockup detector, that the barriers belong in the buddy code. Let's move them and add clearer comments about why they're needed. Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid Signed-off-by: Douglas Anderson Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- kernel/watchdog.c | 6 ------ kernel/watchdog_buddy.c | 21 +++++++++++++++++++++ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 6cc46b8e3d07..a351ab0c35eb 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -109,9 +109,6 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog); void watchdog_hardlockup_touch_cpu(unsigned int cpu) { per_cpu(watchdog_hardlockup_touched, cpu) = true; - - /* Match with smp_rmb() in watchdog_hardlockup_check() */ - smp_wmb(); } static bool is_hardlockup(unsigned int cpu) @@ -141,9 +138,6 @@ static void watchdog_hardlockup_kick(void) void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { - /* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */ - smp_rmb(); - if (per_cpu(watchdog_hardlockup_touched, cpu)) { per_cpu(watchdog_hardlockup_touched, cpu) = false; return; diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index 2ef88722c5e7..34dbfe091f4b 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -51,6 +51,13 @@ void watchdog_hardlockup_enable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on this CPU before + * other CPUs could see it in watchdog_cpus. The counter + * part is in watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_set_cpu(cpu, &watchdog_cpus); } @@ -68,6 +75,13 @@ void watchdog_hardlockup_disable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on the next CPU before + * this CPU disappear in watchdog_cpus. The counter part is in + * watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_clear_cpu(cpu, &watchdog_cpus); } @@ -88,5 +102,12 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) if (next_cpu >= nr_cpu_ids) return; + /* + * Make sure that the watchdog was touched on next CPU when + * watchdog_next_cpu() returned another one because of + * a change in watchdog_hardlockup_enable()/disable(). + */ + smp_rmb(); + watchdog_hardlockup_check(next_cpu, NULL); } From 4917a25f83a8dc95eafd0107be87d4340f48d265 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:13 +0200 Subject: [PATCH 62/72] watchdog/hardlockup: sort hardlockup detector related config values a logical way Patch series "watchdog/hardlockup: Cleanup configuration of hardlockup detectors", v2. Clean up watchdog Kconfig after introducing the buddy detector. This patch (of 6): There are four possible variants of hardlockup detectors: + buddy: available when SMP is set. + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set. + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set. + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set. Only one hardlockup detector can be compiled in. The selection is done using quite complex dependencies between several CONFIG variables. The following patches will try to make it more straightforward. As a first step, reorder the definitions of the various CONFIG variables. The logical order is: 1. HAVE_* variables define available variants. They are typically defined in the arch/ config files. 2. HARDLOCKUP_DETECTOR y/n variable defines whether the hardlockup detector is enabled at all. 3. HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n variable defines whether the buddy detector should be preferred over the perf one. Note that the arch specific variants are always preferred when available. 4. HARDLOCKUP_DETECTOR_PERF/BUDDY variables define whether the given detector is enabled in the end. 5. HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and HARDLOCKUP_DETECTOR_NON_ARCH are temporary variables that are going to be removed in a followup patch. This is a preparation step for further cleanup. It will change the logic without shuffling the definitions. This change temporary breaks the C-like ordering where the variables are declared or defined before they are used. It is not really needed for Kconfig. Also the following patches will rework the logic so that the ordering will be C-like in the end. The patch just shuffles the definitions. It should not change the existing behavior. Link: https://lkml.kernel.org/r/20230616150618.6073-1-pmladek@suse.com Link: https://lkml.kernel.org/r/20230616150618.6073-2-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- lib/Kconfig.debug | 112 +++++++++++++++++++++++----------------------- 1 file changed, 56 insertions(+), 56 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index ed7b01c4bd41..3e91fa33c7a0 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1035,62 +1035,6 @@ config BOOTPARAM_SOFTLOCKUP_PANIC Say N if unsure. -# Both the "perf" and "buddy" hardlockup detectors count hrtimer -# interrupts. This config enables functions managing this common code. -config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER - bool - select SOFTLOCKUP_DETECTOR - -config HARDLOCKUP_DETECTOR_PERF - bool - depends on HAVE_HARDLOCKUP_DETECTOR_PERF - select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER - -config HARDLOCKUP_DETECTOR_BUDDY - bool - depends on SMP - select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER - -# For hardlockup detectors you can have one directly provided by the arch -# or use a "non-arch" one. If you're using a "non-arch" one that is -# further divided the perf hardlockup detector (which, confusingly, needs -# arch-provided perf support) and the buddy hardlockup detector (which just -# needs SMP). In either case, using the "non-arch" code conflicts with -# the NMI watchdog code (which is sometimes used directly and sometimes used -# by the arch-provided hardlockup detector). -config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH - bool - depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG - default y - -config HARDLOCKUP_DETECTOR_PREFER_BUDDY - bool "Prefer the buddy CPU hardlockup detector" - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP - help - Say Y here to prefer the buddy hardlockup detector over the perf one. - - With the buddy detector, each CPU uses its softlockup hrtimer - to check that the next CPU is processing hrtimer interrupts by - verifying that a counter is increasing. - - This hardlockup detector is useful on systems that don't have - an arch-specific hardlockup detector or if resources needed - for the hardlockup detector are better used for other things. - -# This will select the appropriate non-arch hardlockdup detector -config HARDLOCKUP_DETECTOR_NON_ARCH - bool - depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH - select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY - -# -# Enables a timestamp based low pass filter to compensate for perf based -# hard lockup detection which runs too fast due to turbo modes. -# -config HARDLOCKUP_CHECK_TIMESTAMP - bool - # # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard # lockup detector rather than the perf based detector. @@ -1111,6 +1055,62 @@ config HARDLOCKUP_DETECTOR chance to run. The current stack trace is displayed upon detection and the system will stay locked up. +config HARDLOCKUP_DETECTOR_PREFER_BUDDY + bool "Prefer the buddy CPU hardlockup detector" + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP + help + Say Y here to prefer the buddy hardlockup detector over the perf one. + + With the buddy detector, each CPU uses its softlockup hrtimer + to check that the next CPU is processing hrtimer interrupts by + verifying that a counter is increasing. + + This hardlockup detector is useful on systems that don't have + an arch-specific hardlockup detector or if resources needed + for the hardlockup detector are better used for other things. + +config HARDLOCKUP_DETECTOR_PERF + bool + depends on HAVE_HARDLOCKUP_DETECTOR_PERF + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +config HARDLOCKUP_DETECTOR_BUDDY + bool + depends on SMP + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +# Both the "perf" and "buddy" hardlockup detectors count hrtimer +# interrupts. This config enables functions managing this common code. +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + bool + select SOFTLOCKUP_DETECTOR + +# For hardlockup detectors you can have one directly provided by the arch +# or use a "non-arch" one. If you're using a "non-arch" one that is +# further divided the perf hardlockup detector (which, confusingly, needs +# arch-provided perf support) and the buddy hardlockup detector (which just +# needs SMP). In either case, using the "non-arch" code conflicts with +# the NMI watchdog code (which is sometimes used directly and sometimes used +# by the arch-provided hardlockup detector). +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG + default y + +# This will select the appropriate non-arch hardlockdup detector +config HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY + +# +# Enables a timestamp based low pass filter to compensate for perf based +# hard lockup detection which runs too fast due to turbo modes. +# +config HARDLOCKUP_CHECK_TIMESTAMP + bool + config BOOTPARAM_HARDLOCKUP_PANIC bool "Panic (Reboot) On Hard Lockups" depends on HARDLOCKUP_DETECTOR From 1356d0b966e7ed81832af35478b913495cf7792e Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:14 +0200 Subject: [PATCH 63/72] watchdog/hardlockup: make the config checks more straightforward There are four possible variants of hardlockup detectors: + buddy: available when SMP is set. + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set. + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set. + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set. The check for the sparc64 variant is more complicated because HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific and sparc64 specific variant. Therefore it is automatically selected with HAVE_HARDLOCKUP_DETECTOR_ARCH. This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH. It reduces the size of some checks but it makes them harder to follow. Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global HARDLOCKUP_DETECTOR switch is enabled/disabled. Make the logic more straightforward by the following changes: + Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and HAVE_NMI_WATCHDOG in comments. + Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate HAVE_* for all four hardlockup detector variants. Use it in the other conditions instead of SMP. It makes it clear that it is about the buddy detector. + Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand the conditions between the four hardlockup detector variants. + Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY can be enabled. It explains the dependency on the other hardlockup detector variants. Also it allows to remove HARDLOCKUP_DETECTOR_NON_ARCH by using "imply". It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when the global HARDLOCKUP_DETECTOR switch is changed. + Add dependency on HARDLOCKUP_DETECTOR so that the affected variables disappear when the hardlockup detectors are disabled. Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY value is not preserved when the global switch is disabled. The user has to make the decision again when it gets re-enabled. Link: https://lkml.kernel.org/r/20230616150618.6073-3-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/Kconfig | 23 +++++++++++++----- lib/Kconfig.debug | 62 +++++++++++++++++++++++++++-------------------- 2 files changed, 53 insertions(+), 32 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 422f0ffa269e..77e5af5fda3f 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -404,17 +404,28 @@ config HAVE_NMI_WATCHDOG depends on HAVE_NMI bool help - The arch provides a low level NMI watchdog. It provides - asm/nmi.h, and defines its own watchdog_hardlockup_probe() and - arch_touch_nmi_watchdog(). + The arch provides its own hardlockup detector implementation instead + of the generic ones. + + Sparc64 defines this variable without HAVE_HARDLOCKUP_DETECTOR_ARCH. + It is the last arch-specific implementation which was developed before + adding the common infrastructure for handling hardlockup detectors. + It is always built. It does _not_ use the common command line + parameters and sysctl interface, except for + /proc/sys/kernel/nmi_watchdog. config HAVE_HARDLOCKUP_DETECTOR_ARCH bool select HAVE_NMI_WATCHDOG help - The arch chooses to provide its own hardlockup detector, which is - a superset of the HAVE_NMI_WATCHDOG. It also conforms to config - interfaces and parameters provided by hardlockup detector subsystem. + The arch provides its own hardlockup detector implementation instead + of the generic ones. + + It uses the same command line parameters, and sysctl interface, + as the generic hardlockup detectors. + + HAVE_NMI_WATCHDOG is selected to build the code shared with + the sparc64 specific implementation. config HAVE_PERF_REGS bool diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 3e91fa33c7a0..a0b0c4decb89 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1035,16 +1035,33 @@ config BOOTPARAM_SOFTLOCKUP_PANIC Say N if unsure. +config HAVE_HARDLOCKUP_DETECTOR_BUDDY + bool + depends on SMP + default y + # -# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard -# lockup detector rather than the perf based detector. +# Global switch whether to build a hardlockup detector at all. It is available +# only when the architecture supports at least one implementation. There are +# two exceptions. The hardlockup detector is never enabled on: +# +# s390: it reported many false positives there +# +# sparc64: has a custom implementation which is not using the common +# hardlockup command line options and sysctl interface. +# +# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific +# implementaion. It is automatically enabled also for other arch-specific +# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check +# of avaialable and supported variants quite tricky. # config HARDLOCKUP_DETECTOR bool "Detect Hard Lockups" depends on DEBUG_KERNEL && !S390 - depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH + imply HARDLOCKUP_DETECTOR_PERF + imply HARDLOCKUP_DETECTOR_BUDDY select LOCKUP_DETECTOR - select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH help Say Y here to enable the kernel to act as a watchdog to detect @@ -1055,9 +1072,14 @@ config HARDLOCKUP_DETECTOR chance to run. The current stack trace is displayed upon detection and the system will stay locked up. +# +# Note that arch-specific variants are always preferred. +# config HARDLOCKUP_DETECTOR_PREFER_BUDDY bool "Prefer the buddy CPU hardlockup detector" - depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP + depends on HARDLOCKUP_DETECTOR + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY + depends on !HAVE_NMI_WATCHDOG help Say Y here to prefer the buddy hardlockup detector over the perf one. @@ -1071,39 +1093,27 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY config HARDLOCKUP_DETECTOR_PERF bool - depends on HAVE_HARDLOCKUP_DETECTOR_PERF + depends on HARDLOCKUP_DETECTOR + depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY + depends on !HAVE_NMI_WATCHDOG select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER config HARDLOCKUP_DETECTOR_BUDDY bool - depends on SMP + depends on HARDLOCKUP_DETECTOR + depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY + depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY + depends on !HAVE_NMI_WATCHDOG select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER +# # Both the "perf" and "buddy" hardlockup detectors count hrtimer # interrupts. This config enables functions managing this common code. +# config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER bool select SOFTLOCKUP_DETECTOR -# For hardlockup detectors you can have one directly provided by the arch -# or use a "non-arch" one. If you're using a "non-arch" one that is -# further divided the perf hardlockup detector (which, confusingly, needs -# arch-provided perf support) and the buddy hardlockup detector (which just -# needs SMP). In either case, using the "non-arch" code conflicts with -# the NMI watchdog code (which is sometimes used directly and sometimes used -# by the arch-provided hardlockup detector). -config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH - bool - depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG - default y - -# This will select the appropriate non-arch hardlockdup detector -config HARDLOCKUP_DETECTOR_NON_ARCH - bool - depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH - select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY - # # Enables a timestamp based low pass filter to compensate for perf based # hard lockup detection which runs too fast due to turbo modes. From 0c68bda69665307bf835b0c433363e5073608c95 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:15 +0200 Subject: [PATCH 64/72] watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h arch_touch_nmi_watchdog() needs a different implementation for various hardlockup detector implementations. And it does nothing when any hardlockup detector is not built at all. arch_touch_nmi_watchdog() is declared via linux/nmi.h. And it must be defined as an empty function when there is no hardlockup detector. It is done directly in this header file for the perf and buddy detectors. And it is done in the included asm/linux.h for arch specific detectors. The reason probably is that the arch specific variants build the code using another conditions. For example, powerpc64/sparc64 builds the code when CONFIG_PPC_WATCHDOG is enabled. Another reason might be that these architectures define more functions in asm/nmi.h anyway. However the generic code actually knows when the function will be implemented. It happens when some full featured or the sparc64-specific hardlockup detector is built. In particular, CONFIG_HARDLOCKUP_DETECTOR can be enabled only when a generic or arch-specific full featured hardlockup detector is available. The only exception is sparc64 which can be built even when the global HARDLOCKUP_DETECTOR switch is disabled. The information about sparc64 is a bit complicated. The hardlockup detector is built there when CONFIG_HAVE_NMI_WATCHDOG is set and CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH is not set. People might wonder whether this change really makes things easier. The motivation is: + The current logic in linux/nmi.h is far from obvious. For example, arch_touch_nmi_watchdog() is defined as {} when neither CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER nor CONFIG_HAVE_NMI_WATCHDOG is defined. + The change synchronizes the checks in lib/Kconfig.debug and in the generic code. + It is a step that will help cleaning HAVE_NMI_WATCHDOG related checks. The change should not change the existing behavior. Link: https://lkml.kernel.org/r/20230616150618.6073-4-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/powerpc/include/asm/nmi.h | 2 -- arch/sparc/include/asm/nmi.h | 1 - include/linux/nmi.h | 13 ++++++++++--- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h index 43bfd4de868f..ce25318c3902 100644 --- a/arch/powerpc/include/asm/nmi.h +++ b/arch/powerpc/include/asm/nmi.h @@ -3,11 +3,9 @@ #define _ASM_NMI_H #ifdef CONFIG_PPC_WATCHDOG -extern void arch_touch_nmi_watchdog(void); long soft_nmi_interrupt(struct pt_regs *regs); void watchdog_hardlockup_set_timeout_pct(u64 pct); #else -static inline void arch_touch_nmi_watchdog(void) {} static inline void watchdog_hardlockup_set_timeout_pct(u64 pct) {} #endif diff --git a/arch/sparc/include/asm/nmi.h b/arch/sparc/include/asm/nmi.h index 90ee7863d9fe..920dc23f443f 100644 --- a/arch/sparc/include/asm/nmi.h +++ b/arch/sparc/include/asm/nmi.h @@ -8,7 +8,6 @@ void nmi_adjust_hz(unsigned int new_hz); extern atomic_t nmi_active; -void arch_touch_nmi_watchdog(void); void start_nmi_watchdog(void *unused); void stop_nmi_watchdog(void *unused); diff --git a/include/linux/nmi.h b/include/linux/nmi.h index d35393405b24..07bf2b813463 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -7,6 +7,8 @@ #include #include + +/* Arch specific watchdogs might need to share extra watchdog-related APIs. */ #if defined(CONFIG_HAVE_NMI_WATCHDOG) #include #endif @@ -87,12 +89,17 @@ extern unsigned int hardlockup_panic; static inline void hardlockup_detector_disable(void) {} #endif -#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) +/* Sparc64 has special implemetantion that is always enabled. */ +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || \ + (defined(CONFIG_HAVE_NMI_WATCHDOG) && !defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH)) void arch_touch_nmi_watchdog(void); +#else +static inline void arch_touch_nmi_watchdog(void) { } +#endif + +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) void watchdog_hardlockup_touch_cpu(unsigned int cpu); void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); -#elif !defined(CONFIG_HAVE_NMI_WATCHDOG) -static inline void arch_touch_nmi_watchdog(void) { } #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) From a5fcc2367e223c45c78a882438c2b8e13fe0f580 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:16 +0200 Subject: [PATCH 65/72] watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one. CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36. It was a preparation step for introducing the new generic perf hardlockup detector. The existing arch-specific variants did not support the to-be-created generic build configurations, sysctl interface, etc. This distinction was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR") in v2.6.38. CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105 ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used by three architectures, namely blackfin, mn10300, and sparc. The support for blackfin and mn10300 architectures has been completely dropped some time ago. And sparc is the only architecture with the historic NMI watchdog at the moment. And the old sparc implementation is really special. It is always built on sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added in v4.10-rc1. There are only few locations where the sparc64 NMI watchdog interacts with the generic hardlockup detectors code: + implements arch_touch_nmi_watchdog() which is called from the generic touch_nmi_watchdog() + implements watchdog_hardlockup_enable()/disable() to support /proc/sys/kernel/nmi_watchdog + is always preferred over other generic watchdogs, see CONFIG_HARDLOCKUP_DETECTOR + includes asm/nmi.h into linux/nmi.h because some sparc-specific functions are needed in sparc-specific code which includes only linux/nmi.h. The situation became more complicated after the commit 05a4a95279311c3 ("kernel/watchdog: split up config options") and commit 2104180a53698df5 ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1. They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc specific hardlockup detector. It was compatible with the perf one regarding the general boot, sysctl, and programming interfaces. HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of HAVE_NMI_WATCHDOG. It made some sense because all arch-specific detectors had some common requirements, namely: + implemented arch_touch_nmi_watchdog() + included asm/nmi.h into linux/nmi.h + defined the default value for /proc/sys/kernel/nmi_watchdog But it actually has made things pretty complicated when the generic buddy hardlockup detector was added. Before the generic perf detector was newer supported together with an arch-specific one. But the buddy detector could work on any SMP system. It means that an architecture could support both the arch-specific and buddy detector. As a result, there are few tricky dependencies. For example, CONFIG_HARDLOCKUP_DETECTOR depends on: ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH The problem is that the very special sparc implementation is defined as: HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear without reading understanding the history. Make the logic less tricky and more self-explanatory by making HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to HAVE_HARDLOCKUP_DETECTOR_SPARC64. Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF, and HARDLOCKUP_DETECTOR_BUDDY may conflict only with HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR and it is not longer enabled when HAVE_NMI_WATCHDOG is set. Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/Kconfig | 18 ------------------ arch/sparc/Kconfig | 2 +- arch/sparc/Kconfig.debug | 9 +++++++++ include/linux/nmi.h | 5 ++--- kernel/watchdog.c | 2 +- lib/Kconfig.debug | 15 +++++---------- 6 files changed, 18 insertions(+), 33 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 77e5af5fda3f..6517e5477459 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -400,23 +400,8 @@ config HAVE_HARDLOCKUP_DETECTOR_PERF The arch chooses to use the generic perf-NMI-based hardlockup detector. Must define HAVE_PERF_EVENTS_NMI. -config HAVE_NMI_WATCHDOG - depends on HAVE_NMI - bool - help - The arch provides its own hardlockup detector implementation instead - of the generic ones. - - Sparc64 defines this variable without HAVE_HARDLOCKUP_DETECTOR_ARCH. - It is the last arch-specific implementation which was developed before - adding the common infrastructure for handling hardlockup detectors. - It is always built. It does _not_ use the common command line - parameters and sysctl interface, except for - /proc/sys/kernel/nmi_watchdog. - config HAVE_HARDLOCKUP_DETECTOR_ARCH bool - select HAVE_NMI_WATCHDOG help The arch provides its own hardlockup detector implementation instead of the generic ones. @@ -424,9 +409,6 @@ config HAVE_HARDLOCKUP_DETECTOR_ARCH It uses the same command line parameters, and sysctl interface, as the generic hardlockup detectors. - HAVE_NMI_WATCHDOG is selected to build the code shared with - the sparc64 specific implementation. - config HAVE_PERF_REGS bool help diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 8535e19062f6..7297f69635cb 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -33,7 +33,7 @@ config SPARC select ARCH_WANT_IPC_PARSE_VERSION select GENERIC_PCI_IOMAP select HAS_IOPORT - select HAVE_NMI_WATCHDOG if SPARC64 + select HAVE_HARDLOCKUP_DETECTOR_SPARC64 if SPARC64 select HAVE_CBPF_JIT if SPARC32 select HAVE_EBPF_JIT if SPARC64 select HAVE_DEBUG_BUGVERBOSE diff --git a/arch/sparc/Kconfig.debug b/arch/sparc/Kconfig.debug index 6b2bec1888b3..4903b6847e43 100644 --- a/arch/sparc/Kconfig.debug +++ b/arch/sparc/Kconfig.debug @@ -14,3 +14,12 @@ config FRAME_POINTER bool depends on MCOUNT default y + +config HAVE_HARDLOCKUP_DETECTOR_SPARC64 + depends on HAVE_NMI + bool + help + Sparc64 hardlockup detector is the last one developed before adding + the common infrastructure for handling hardlockup detectors. It is + always built. It does _not_ use the common command line parameters + and sysctl interface, except for /proc/sys/kernel/nmi_watchdog. diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 07bf2b813463..63acc6586774 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -9,7 +9,7 @@ #include /* Arch specific watchdogs might need to share extra watchdog-related APIs. */ -#if defined(CONFIG_HAVE_NMI_WATCHDOG) +#if defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) #include #endif @@ -90,8 +90,7 @@ static inline void hardlockup_detector_disable(void) {} #endif /* Sparc64 has special implemetantion that is always enabled. */ -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || \ - (defined(CONFIG_HAVE_NMI_WATCHDOG) && !defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH)) +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) void arch_touch_nmi_watchdog(void); #else static inline void arch_touch_nmi_watchdog(void) { } diff --git a/kernel/watchdog.c b/kernel/watchdog.c index a351ab0c35eb..010fcc3ac141 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -29,7 +29,7 @@ static DEFINE_MUTEX(watchdog_mutex); -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG) +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) # define WATCHDOG_HARDLOCKUP_DEFAULT 1 #else # define WATCHDOG_HARDLOCKUP_DEFAULT 0 diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index a0b0c4decb89..e94664339e28 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1050,15 +1050,10 @@ config HAVE_HARDLOCKUP_DETECTOR_BUDDY # sparc64: has a custom implementation which is not using the common # hardlockup command line options and sysctl interface. # -# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific -# implementaion. It is automatically enabled also for other arch-specific -# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check -# of avaialable and supported variants quite tricky. -# config HARDLOCKUP_DETECTOR bool "Detect Hard Lockups" - depends on DEBUG_KERNEL && !S390 - depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on DEBUG_KERNEL && !S390 && !HAVE_HARDLOCKUP_DETECTOR_SPARC64 + depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH imply HARDLOCKUP_DETECTOR_PERF imply HARDLOCKUP_DETECTOR_BUDDY select LOCKUP_DETECTOR @@ -1079,7 +1074,7 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY bool "Prefer the buddy CPU hardlockup detector" depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY - depends on !HAVE_NMI_WATCHDOG + depends on !HAVE_HARLOCKUP_DETECTOR_ARCH help Say Y here to prefer the buddy hardlockup detector over the perf one. @@ -1095,7 +1090,7 @@ config HARDLOCKUP_DETECTOR_PERF bool depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY - depends on !HAVE_NMI_WATCHDOG + depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER config HARDLOCKUP_DETECTOR_BUDDY @@ -1103,7 +1098,7 @@ config HARDLOCKUP_DETECTOR_BUDDY depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY - depends on !HAVE_NMI_WATCHDOG + depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER # From 47f4cb433923a08d81f1e5c065cb680215109db9 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:17 +0200 Subject: [PATCH 66/72] watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix. It will be set when it should be built. It will make it compatible with the other hardlockup detectors. Before, it is far from obvious that the SPARC64 variant is actually used: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y After, it is more clear: $> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/sparc/Kconfig.debug | 7 ++++++- include/linux/nmi.h | 4 ++-- kernel/watchdog.c | 2 +- lib/Kconfig.debug | 2 +- 4 files changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/sparc/Kconfig.debug b/arch/sparc/Kconfig.debug index 4903b6847e43..37e003665de6 100644 --- a/arch/sparc/Kconfig.debug +++ b/arch/sparc/Kconfig.debug @@ -16,10 +16,15 @@ config FRAME_POINTER default y config HAVE_HARDLOCKUP_DETECTOR_SPARC64 - depends on HAVE_NMI bool + depends on HAVE_NMI + select HARDLOCKUP_DETECTOR_SPARC64 help Sparc64 hardlockup detector is the last one developed before adding the common infrastructure for handling hardlockup detectors. It is always built. It does _not_ use the common command line parameters and sysctl interface, except for /proc/sys/kernel/nmi_watchdog. + +config HARDLOCKUP_DETECTOR_SPARC64 + bool + depends on HAVE_HARDLOCKUP_DETECTOR_SPARC64 diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 63acc6586774..e91a1f803d55 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -9,7 +9,7 @@ #include /* Arch specific watchdogs might need to share extra watchdog-related APIs. */ -#if defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) +#if defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64) #include #endif @@ -90,7 +90,7 @@ static inline void hardlockup_detector_disable(void) {} #endif /* Sparc64 has special implemetantion that is always enabled. */ -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64) void arch_touch_nmi_watchdog(void); #else static inline void arch_touch_nmi_watchdog(void) { } diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 010fcc3ac141..be38276a365f 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -29,7 +29,7 @@ static DEFINE_MUTEX(watchdog_mutex); -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64) +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64) # define WATCHDOG_HARDLOCKUP_DEFAULT 1 #else # define WATCHDOG_HARDLOCKUP_DEFAULT 0 diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index e94664339e28..f285e9cf967a 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1052,7 +1052,7 @@ config HAVE_HARDLOCKUP_DETECTOR_BUDDY # config HARDLOCKUP_DETECTOR bool "Detect Hard Lockups" - depends on DEBUG_KERNEL && !S390 && !HAVE_HARDLOCKUP_DETECTOR_SPARC64 + depends on DEBUG_KERNEL && !S390 && !HARDLOCKUP_DETECTOR_SPARC64 depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH imply HARDLOCKUP_DETECTOR_PERF imply HARDLOCKUP_DETECTOR_BUDDY From 7ca8fe94aa92d9adcd7dcdf64371fc78eb2da3f9 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 16 Jun 2023 17:06:18 +0200 Subject: [PATCH 67/72] watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix. It will be set when it should be built. It will make it compatible with the other hardlockup detectors. The change allows to clean up dependencies of PPC_WATCHDOG and HAVE_HARDLOCKUP_DETECTOR_PERF definitions for powerpc. As a result HAVE_HARDLOCKUP_DETECTOR_PERF has the same dependencies on arm, x86, powerpc architectures. Link: https://lkml.kernel.org/r/20230616150618.6073-7-pmladek@suse.com Signed-off-by: Petr Mladek Reviewed-by: Douglas Anderson Cc: Christophe Leroy Cc: "David S. Miller" Cc: Michael Ellerman Cc: Nicholas Piggin Signed-off-by: Andrew Morton --- arch/powerpc/Kconfig | 5 ++--- include/linux/nmi.h | 2 +- lib/Kconfig.debug | 9 +++++++++ 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index bff5820b7cda..8eb6c5e9e4f8 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -90,8 +90,7 @@ config NMI_IPI config PPC_WATCHDOG bool - depends on HARDLOCKUP_DETECTOR - depends on HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on HARDLOCKUP_DETECTOR_ARCH default y help This is a placeholder when the powerpc hardlockup detector @@ -240,7 +239,7 @@ config PPC select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC select HAVE_GENERIC_VDSO select HAVE_HARDLOCKUP_DETECTOR_ARCH if PPC_BOOK3S_64 && SMP - select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH + select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx) select HAVE_IOREMAP_PROT select HAVE_IRQ_TIME_ACCOUNTING diff --git a/include/linux/nmi.h b/include/linux/nmi.h index e91a1f803d55..e3e6a64b98e0 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -9,7 +9,7 @@ #include /* Arch specific watchdogs might need to share extra watchdog-related APIs. */ -#if defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64) +#if defined(CONFIG_HARDLOCKUP_DETECTOR_ARCH) || defined(CONFIG_HARDLOCKUP_DETECTOR_SPARC64) #include #endif diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index f285e9cf967a..2c4bb72e72ad 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1056,6 +1056,7 @@ config HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH imply HARDLOCKUP_DETECTOR_PERF imply HARDLOCKUP_DETECTOR_BUDDY + imply HARDLOCKUP_DETECTOR_ARCH select LOCKUP_DETECTOR help @@ -1101,6 +1102,14 @@ config HARDLOCKUP_DETECTOR_BUDDY depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER +config HARDLOCKUP_DETECTOR_ARCH + bool + depends on HARDLOCKUP_DETECTOR + depends on HAVE_HARDLOCKUP_DETECTOR_ARCH + help + The arch-specific implementation of the hardlockup detector will + be used. + # # Both the "perf" and "buddy" hardlockup detectors count hrtimer # interrupts. This config enables functions managing this common code. From 875e0c31f84cb264d4d7b1569a1966cedcc4d459 Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Wed, 21 Jun 2023 17:30:50 +0100 Subject: [PATCH 68/72] devres: show which resource was invalid in __devm_ioremap_resource() The other error prints in this call show the resource which wsan't valid, so add this to the first print when it checks for basic validity of the resource. Link: https://lkml.kernel.org/r/20230621163050.477668-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton --- lib/devres.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/devres.c b/lib/devres.c index 6baf43902ead..c44f104b58d5 100644 --- a/lib/devres.c +++ b/lib/devres.c @@ -129,7 +129,7 @@ __devm_ioremap_resource(struct device *dev, const struct resource *res, BUG_ON(!dev); if (!res || resource_type(res) != IORESOURCE_MEM) { - dev_err(dev, "invalid resource\n"); + dev_err(dev, "invalid resource %pR\n", res); return IOMEM_ERR_PTR(-EINVAL); } From df8b78e1630fa6cff82fdc33ce04000e3ed065f7 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Wed, 21 Jun 2023 16:48:19 -0700 Subject: [PATCH 69/72] powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h The powerpc architecture was the only one that defined arch_trigger_cpumask_backtrace() in asm/nmi.h instead of asm/irq.h. Move it to be consistent. This fixes compile time errors introduced by commit 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH"). That commit caused to stop being included if the hardlockup detector wasn't enabled. The specific errors were: error: implicit declaration of function `nmi_cpu_backtrace' error: implicit declaration of function `nmi_trigger_cpumask_backtrace' NOTE: when moving this into irq.h, we also change the guards from just checking if "CONFIG_NMI_IPI" is defined to also checking if "CONFIG_PPC_BOOK3S_64" is defined. This matches the code in arch/powerpc/kernel/stacktrace.c. Previously this worked because was included if "CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH" was defined. For powerpc that's only selected if "CONFIG_PPC_BOOK3S_64" is defined. [dianders@chromium.org: change the guards to include CONFIG_PPC_BOOK3S_64] Link: https://lkml.kernel.org/r/20230622202816.v2.1.Ice67126857506712559078e7de26d32d26e64631@changeid Link: https://lkml.kernel.org/r/20230621164809.1.Ice67126857506712559078e7de26d32d26e64631@changeid Fixes: 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH") Signed-off-by: Douglas Anderson Reported-by: Michael Ellerman Closes: https://lore.kernel.org/r/871qi5otdh.fsf@mail.lhotse Reviewed-by: Petr Mladek Cc: Arnd Bergmann Cc: Christophe Leroy Cc: Douglas Anderson Cc: Laurent Dufour Cc: Nicholas Piggin Cc: Stephen Rothwell Cc: Tom Rix Signed-off-by: Andrew Morton --- arch/powerpc/include/asm/irq.h | 6 ++++++ arch/powerpc/include/asm/nmi.h | 6 ------ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index 94dffa1dd223..f257cacb49a9 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -53,5 +53,11 @@ void __do_IRQ(struct pt_regs *regs); int irq_choose_cpu(const struct cpumask *mask); +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_NMI_IPI) +extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask, + bool exclude_self); +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace +#endif + #endif /* _ASM_IRQ_H */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h index ce25318c3902..49a75340c3e0 100644 --- a/arch/powerpc/include/asm/nmi.h +++ b/arch/powerpc/include/asm/nmi.h @@ -9,12 +9,6 @@ void watchdog_hardlockup_set_timeout_pct(u64 pct); static inline void watchdog_hardlockup_set_timeout_pct(u64 pct) {} #endif -#ifdef CONFIG_NMI_IPI -extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask, - bool exclude_self); -#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace -#endif - extern void hv_nmi_check_nonrecoverable(struct pt_regs *regs); #endif /* _ASM_NMI_H */ From a8992d8ad7775860594d3d981ef93fc423185fa4 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Fri, 23 Jun 2023 06:07:17 +0200 Subject: [PATCH 70/72] watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY Commit a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific") accidentially introduces a typo in one of the config dependencies of HARDLOCKUP_DETECTOR_PREFER_BUDDY. Fix this accidental typo. Link: https://lkml.kernel.org/r/20230623040717.8645-1-lukas.bulwahn@gmail.com Fixes: a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific") Signed-off-by: Lukas Bulwahn Reviewed-by: Petr Mladek Cc: Douglas Anderson Signed-off-by: Andrew Morton --- lib/Kconfig.debug | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 2c4bb72e72ad..90c88d752259 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1075,7 +1075,7 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY bool "Prefer the buddy CPU hardlockup detector" depends on HARDLOCKUP_DETECTOR depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY - depends on !HAVE_HARLOCKUP_DETECTOR_ARCH + depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH help Say Y here to prefer the buddy hardlockup detector over the perf one. From 7982f975600d59ae3a3d98831f703e55243aba7c Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 22 Jun 2023 11:27:36 +0100 Subject: [PATCH 71/72] ocfs2: remove redundant assignment to variable bit_off Variable bit_off is being assigned a value that is never read, it is being re-assigned a new value in the following while loop. Remove the assignment. Cleans up clang scan build warning: fs/ocfs2/localalloc.c:976:18: warning: Although the value stored to 'bit_off' is used in the enclosing expression, the value is never actually read from 'bit_off' [deadcode.DeadStores] Link: https://lkml.kernel.org/r/20230622102736.2831126-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Signed-off-by: Andrew Morton --- fs/ocfs2/localalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index c4426d12a2ad..c803c10dd97e 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -973,7 +973,7 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, la_start_blk = ocfs2_clusters_to_blocks(osb->sb, le32_to_cpu(la->la_bm_off)); bitmap = la->la_bitmap; - start = count = bit_off = 0; + start = count = 0; left = le32_to_cpu(alloc->id1.bitmap1.i_total); while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) From 4afc9a402aa3890885747b396c1adcd45f127665 Mon Sep 17 00:00:00 2001 From: Yang Li Date: Thu, 8 Jun 2023 16:23:12 +0800 Subject: [PATCH 72/72] kernel/time/posix-stubs.c: remove duplicated include ./kernel/time/posix-stubs.c: linux/syscalls.h is included more than once. Link: https://lkml.kernel.org/r/20230608082312.123939-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5463 Fixes: c1956519cd7e ("syscalls: add sys_ni_posix_timers prototype") Signed-off-by: Yang Li Cc: Arnd Bergmann Signed-off-by: Andrew Morton --- kernel/time/posix-stubs.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c index 39769b2d1005..828aeecbd1e8 100644 --- a/kernel/time/posix-stubs.c +++ b/kernel/time/posix-stubs.c @@ -16,7 +16,6 @@ #include #include #include -#include #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER /* Architectures may override SYS_NI and COMPAT_SYS_NI */