mirror of
https://github.com/tbsdtv/linux_media.git
synced 2025-07-23 04:33:26 +02:00
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the performance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also converted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_pages(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page reclaim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...

@@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
 open:           yes
 close:          yes
 fault:          yes             can return with page locked
-map_pages:      yes
+map_pages:      read
 page_mkwrite:   yes             can return with page locked
 pfn_mkwrite:    yes
 access:         yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
 
 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
 not block. If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
 page table entry. Pointer to entry associated with the page is passed in
@@ -996,6 +996,7 @@ Example output. You may not have all of these fields.
 VmallocUsed:       40444 kB
 VmallocChunk:          0 kB
 Percpu:            29312 kB
+EarlyMemtestBad:       0 kB
 HardwareCorrupted:     0 kB
 AnonHugePages:   4149248 kB
 ShmemHugePages:        0 kB
@@ -1146,6 +1147,13 @@ VmallocChunk
 Percpu
               Memory allocated to the percpu allocator used to back percpu
               allocations. This stat excludes the cost of metadata.
+EarlyMemtestBad
+              The amount of RAM/memory in kB, that was identified as corrupted
+              by early memtest. If memtest was not run, this field will not
+              be displayed at all. Size is never rounded down to 0 kB.
+              That means if 0 kB is reported, you can safely assume
+              there was at least one pass of memtest and none of the passes
+              found a single faulty byte of RAM.
 HardwareCorrupted
               The amount of RAM/memory in KB, the kernel identifies as
               corrupted.
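
The EarlyMemtestBad description above distinguishes two cases that are easy to conflate: the field being absent (memtest did not run) and the field reading 0 kB (memtest ran and found nothing). A minimal Python sketch of how a monitoring script might tell them apart follows. The function name and the sample text are illustrative assumptions, not part of the kernel documentation; the only format assumed is the "Name: <number> kB" layout shown in the example output.

```python
from typing import Optional


def early_memtest_bad_kb(meminfo_text: str) -> Optional[int]:
    """Return the EarlyMemtestBad value in kB, or None if the field is absent.

    None means early memtest did not run; 0 means it ran and found no
    corrupted memory (the documentation says the value is never rounded
    down to 0 kB).
    """
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == "EarlyMemtestBad":
            # The value has the form "<number> kB"; keep only the number.
            return int(rest.split()[0])
    return None


# Illustrative sample in the /proc/meminfo layout (hypothetical values).
SAMPLE = """\
Percpu:            29312 kB
EarlyMemtestBad:       0 kB
HardwareCorrupted:     0 kB
"""

if __name__ == "__main__":
    print(early_memtest_bad_kb(SAMPLE))
```

On a real system the same function could be fed the contents of /proc/meminfo; returning None rather than 0 for a missing field keeps "not tested" separate from "tested clean".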
@@ -13,17 +13,29 @@ everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
+unneeded pages out to swap space, if swap was enabled for the tmpfs
+mount. tmpfs also supports THP.
 
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them.
+tmpfs extends ramfs with a few userspace configurable options listed and
+explained further below, some of which can be reconfigured dynamically on the
+fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
+filesystem can be resized but it cannot be resized to a size below its current
+usage. tmpfs also supports POSIX ACLs, and extended attributes for the
+trusted.* and security.* namespaces. ramfs does not use swap and you cannot
+modify any parameter for a ramfs filesystem. The size limit of a ramfs
+filesystem is how much memory you have available, and so care must be taken if
+used so to not run out of memory.
 
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+An alternative to tmpfs and ramfs is to use brd to create RAM disks
+(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
+To write data you would just then need to create an regular filesystem on top
+this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
+configured in size at initialization and you cannot dynamically resize them.
+Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
+block layer at all.
+
+Since tmpfs lives completely in the page cache and optionally on swap,
+all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
 free(1). Notice that these counters also include shared memory
 (shmem, see ipcs(1)). The most reliable way to get the count is
 using df(1) and du(1).
@@ -72,6 +84,8 @@ nr_inodes The maximum number of inodes for this instance. The default
            is half of the number of your physical RAM pages, or (on a
            machine with highmem) the number of lowmem RAM pages,
            whichever is the lower.
+noswap     Disables swap. Remounts must respect the original settings.
+           By default swap is enabled.
 =========  ============================================================
 
 These parameters accept a suffix k, m or g for kilo, mega and giga and
@@ -85,6 +99,36 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+tmpfs also supports Transparent Huge Pages which requires a kernel
+configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
+your system (has_transparent_hugepage(), which is architecture specific).
+The mount options for this are:
+
+======  ============================================================
+huge=0  never: disables huge pages for the mount
+huge=1  always: enables huge pages for the mount
+huge=2  within_size: only allocate huge pages if the page will be
+        fully within i_size, also respect fadvise()/madvise() hints.
+huge=3  advise: only allocate huge pages if requested with
+        fadvise()/madvise()
+======  ============================================================
+
+There is a sysfs file which you can also use to control system wide THP
+configuration for all tmpfs mounts, the file is:
+
+/sys/kernel/mm/transparent_hugepage/shmem_enabled
+
+This sysfs file is placed on top of THP sysfs directory and so is registered
+by THP code. It is however only used to control all tmpfs mounts with one
+single knob. Since it controls all tmpfs mounts it should only be used either
+for emergency or testing purposes. The values you can set for shmem_enabled are:
+
+==  ============================================================
+-1  deny: disables huge on shm_mnt and all mounts, for
+    emergency use
+-2  force: enables huge on shm_mnt and all mounts, w/o needing
+    option, for testing
+==  ============================================================
 
 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
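
The tmpfs changes above introduce two things an administrator may want to inspect from a script: which tmpfs mounts were created with noswap, and which system-wide shmem_enabled policy is active. The Python sketch below shows one way this might be done on captured text. It rests on two assumptions that the documentation above does not state: that the noswap option is echoed back in the options column of /proc/mounts, and that shmem_enabled marks the active choice with square brackets, as other THP sysfs files do. Function names and sample strings are hypothetical.

```python
import re
from typing import List, Optional


def selected_choice(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) choice, e.g. 'never' from
    'always within_size advise [never] deny force'.

    Assumption: the active value is the one wrapped in [ ].
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def tmpfs_mounts_with_noswap(mounts_text: str) -> List[str]:
    """Return mount points of tmpfs filesystems whose options include noswap.

    Assumption: lines follow the /proc/mounts layout
    'device mountpoint fstype options dump pass'.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "tmpfs":
            if "noswap" in fields[3].split(","):
                result.append(fields[1])
    return result


# Illustrative inputs (hypothetical, not captured from a real system).
SHMEM_ENABLED_SAMPLE = "always within_size advise [never] deny force\n"
MOUNTS_SAMPLE = """\
tmpfs /run tmpfs rw,nosuid,nodev,size=1024k 0 0
tmpfs /mnt/scratch tmpfs rw,relatime,size=2097152k,noswap 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    print(selected_choice(SHMEM_ENABLED_SAMPLE))
    print(tmpfs_mounts_with_noswap(MOUNTS_SAMPLE))
```

Both helpers work on plain strings, so they can be exercised without root privileges or a kernel that supports the noswap option.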