As in kvm_ioctl and _kvm_ioctl, add
the respective _vm_ioctl for vm_ioctl.
_vm_ioctl invokes an ioctl using the vm fd,
leaving the caller to test the result.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210318151624.490861-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce new vm_create variants that also takes a number of vcpus,
an amount of per-vcpu pages, and optionally a list of vcpuids. These
variants will create default VMs with enough additional pages to
cover the vcpu stacks, per-vcpu pages, and pagetable pages for all.
The new 'default' variant uses VM_MODE_DEFAULT, whereas the other
new variant accepts the mode as a parameter.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201111122636.73346-6-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the initial dirty ring buffer test.
The current test implements the userspace dirty ring collection, by
only reaping the dirty ring when the ring is full.
So it's still running synchronously like this:
vcpu main thread
1. vcpu dirties pages
2. vcpu gets dirty ring full
(userspace exit)
3. main thread waits until full
(so hardware buffers flushed)
4. main thread collects
5. main thread continues vcpu
6. vcpu continues, goes back to 1
We can't directly collects dirty bits during vcpu execution because
otherwise we can't guarantee the hardware dirty bits were flushed when
we collect and we're very strict on the dirty bits so otherwise we can
fail the future verify procedure. A follow up patch will make this
test to support async just like the existing dirty log test, by adding
a vcpu kick mechanism.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012237.6111-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check for KVM_GET_REG_LIST regressions. The blessed list was
created by running on v4.15 with the --core-reg-fixup option.
The following script was also used in order to annotate system
registers with their names when possible. When new system
registers are added the names can just be added manually using
the same grep.
while read reg; do
if [[ ! $reg =~ ARM64_SYS_REG ]]; then
printf "\t$reg\n"
continue
fi
encoding=$(echo "$reg" | sed "s/ARM64_SYS_REG(//;s/),//")
if ! name=$(grep "$encoding" ../../../../arch/arm64/include/asm/sysreg.h); then
printf "\t$reg\n"
continue
fi
name=$(echo "$name" | sed "s/.*SYS_//;s/[\t ]*sys_reg($encoding)$//")
printf "\t$reg\t/* $name */\n"
done < <(aarch64/get-reg-list --core-reg-fixup --list)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-3-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add variants of GUEST_ASSERT to pass values back to the host, e.g. to
help debug/understand a failure when the the cause of the assert isn't
necessarily binary.
It'd probably be possible to auto-calculate the number of arguments and
just have a single GUEST_ASSERT, but there are a limited number of
variants and silently eating arguments could lead to subtle code bugs.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410231707.7128-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The steal-time test confirms what is reported to the guest as stolen
time is consistent with the run_delay reported for the VCPU thread
on the host. Both x86_64 and AArch64 have the concept of steal/stolen
time so this test is introduced for both architectures.
While adding the test we ensure .gitignore has all tests listed
(it was missing s390x/resets) and that the Makefile has all tests
listed in alphabetical order (not really necessary, but it almost
was already...). We also extend the common API with a new num-guest-
pages call and a new timespec call.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Also correct the comment and prototype for vm_create_default(),
as it takes a number of pages, not a size.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move function documentation comment blocks to the header files in
order to avoid duplicating them for each architecture. While at
it clean up and fix up the comment blocks.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
s390 requires 1M aligned guest sizes. Embedding the rounding in
vm_adjust_num_guest_pages() allows us to remove it from a few
other places.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a KVM selftest to test moving the base gfn of a userspace memory
region. Although the basic concept of moving memory regions is not x86
specific, the assumptions regarding large pages and MMIO shenanigans
used to verify the correctness make this x86_64 only for the time being.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There were a few problems with the way we output "debug" messages.
The first is that we used DEBUG() which is defined when NDEBUG is
not defined, but NDEBUG will never be defined for kselftests
because it relies too much on assert(). The next is that most
of the DEBUG() messages were actually "info" messages, which
users may want to turn off if they just want a silent test that
either completes or asserts. Finally, a debug message output from
a library function, and thus for all tests, was annoying when its
information wasn't interesting for a test.
Rework these messages so debug messages only output when DEBUG
is defined and info messages output unless QUIET is defined.
Also name the functions pr_debug and pr_info and make sure that
when they're disabled we eat all the inputs. The later avoids
unused variable warnings when the variables were only defined
for the purpose of printing.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guests and hosts don't have to have the same page size. This means
calculations are necessary when selecting the number of guest pages
to allocate in order to ensure the number is compatible with the
host. Provide utilities to help with those calculations and apply
them where appropriate.
We also revert commit bffed38d4f ("kvm: selftests: aarch64:
dirty_log_test: fix unaligned memslot size") and then use
vm_adjust_num_guest_pages() there instead.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the duplication code in run_test() of dirty_log_test because
after some reordering of functions now we can directly use the outcome
of vm_create().
Meanwhile, with the new VM_MODE_PXXV48_4K, we can safely revert
b442324b58 too where we stick the x86_64 PA width to 39 bits for
dirty_log_test.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The naming VM_MODE_P52V48_4K is explicit but unclear when used on
x86_64 machines, because x86_64 machines are having various physical
address width rather than some static values. Here's some examples:
- Intel Xeon E3-1220: 36 bits
- Intel Core i7-8650: 39 bits
- AMD EPYC 7251: 48 bits
All of them are using 48 bits linear address width but with totally
different physical address width (and most of the old machines should
be less than 52 bits).
Let's create a new guest mode called VM_MODE_PXXV48_4K for current
x86_64 tests and make it as the default to replace the old naming of
VM_MODE_P52V48_4K because it shows more clearly that the PA width is
not really a constant. Meanwhile we also stop assuming all the x86
machines are having 52 bits PA width but instead we fetch the real
vm->pa_bits from CPUID 0x80000008 during runtime.
We currently make this exclusively used by x86_64 but no other arch.
As a slight touch up, moving DEBUG macro from dirty_log_test.c to
kvm_util.h so lib can use it too.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than passing the vm type from the top level to the end of vm
creation, let's simply keep that as an internal of kvm_vm struct and
decide the type in _vm_create(). Several reasons for doing this:
- The vm type is only decided by physical address width and currently
only used in aarch64, so we've got enough information as long as
we're passing vm_guest_mode into _vm_create(),
- This removes a loop dependency between the vm->type and creation of
vms. That's why now we need to parse vm_guest_mode twice sometimes,
once in run_test() and then again in _vm_create(). The follow up
patches will move on to clean up that as well so we can have a
single place to decide guest machine types and so.
Note that this patch will slightly change the behavior of aarch64
tests in that previously most vm_create() callers will directly pass
in type==0 into _vm_create() but now the type will depend on
vm_guest_mode, however it shouldn't affect any user because all
vm_create() users of aarch64 will be using VM_MODE_DEFAULT guest
mode (which is VM_MODE_P40V48_4K) so at last type will still be zero.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The way we exit from a guest to userspace is very specific to the
architecture: On x86, we use PIO, on aarch64 we are using MMIO and on
s390x we're going to use an instruction instead. The possibility to
select a type via the ucall_type_t enum is currently also completely
unused, so the code in ucall.c currently looks more complex than
required. Let's split this up into architecture specific ucall.c
files instead, so we can get rid of the #ifdefs and the unnecessary
ucall_type_t handling.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20190731151525.17156-2-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM/arm updates for 5.3
- Add support for chained PMU counters in guests
- Improve SError handling
- Handle Neoverse N1 erratum #1349291
- Allow side-channel mitigation status to be migrated
- Standardise most AArch64 system register accesses to msr_s/mrs_s
- Fix host MPIDR corruption on 32bit
Pull still more SPDX updates from Greg KH:
"Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update
for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
that were "easy" to determine by pattern matching. The ones after this
are going to be a bit more difficult and the people on the spdx list
will be discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud,
always nice to see in a diffstat"
* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
...
When running with /sys/module/kvm_intel/parameters/unrestricted_guest=N,
test that a kernel warning does not occur informing us that
vcpu->mmio_needed=1. This can happen when KVM_RUN is called after a
triple fault.
This test was made to detect a bug that was reported by Syzkaller
(https://groups.google.com/forum/#!topic/syzkaller/lHfau8E3SOE) and
fixed with commit bbeac2830f ("KVM: X86: Fix residual mmio emulation
request to userspace").
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The struct kvm_vcpu_events code is only available on certain architectures
(arm, arm64 and x86). To be able to compile kvm_util.c also for other
architectures, we have to fence the code with __KVM_HAVE_VCPU_EVENTS.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190523164309.13345-3-thuth@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
struct kvm_nested_state is only available on x86 so far. To be able
to compile the code on other architectures as well, we need to wrap
the related code with #ifdefs.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add tests for KVM_SET_NESTED_STATE and for various code paths in its implementation in vmx_set_nested_state().
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Documentation/virtual/kvm/api.txt states:
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
KVM_EXIT_EPR the corresponding operations are complete (and guest
state is consistent) only after userspace has re-entered the
kernel with KVM_RUN. The kernel side will first finish incomplete
operations and then check for pending signals. Userspace can
re-enter the guest with an unmasked signal pending to complete
pending operations.
Because guest state may be inconsistent, starting state migration after
an IO exit without first completing IO may result in test failures, e.g.
a proposed change to KVM's handling of %rip in its fast PIO handling[1]
will cause the new VM, i.e. the post-migration VM, to have its %rip set
to the IN instruction that triggered KVM_EXIT_IO, leading to a test
assertion due to a stage mismatch.
For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip
the test if it's not available. The addition of KVM_CAP_IMMEDIATE_EXIT
predates the state selftest by more than a year.
[1] https://patchwork.kernel.org/patch/10848545/
Fixes: fa3899add1 ("kvm: selftests: add basic test for state save and restore")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When KVM has KVM_CAP_ARM_VM_IPA_SIZE we can test with > 40-bit IPAs by
using the 'type' field of KVM_CREATE_VM.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In case we want to test failing ioctls we need an option to not
fail. Following _vcpu_run() precedent implement _vcpu_ioctl().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's add the 40 PA-bit versions of the VM modes, that AArch64
should have been using, so we can extend the dirty log test without
breaking things.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename VM_MODE_FLAT48PG to be more descriptive of its config and add a
new config that has the same parameters, except with 64K pages.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>