mirror of
https://github.com/tbsdtv/linux_media.git
synced 2025-07-23 04:33:26 +02:00
Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář: "ARM: - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups - Port of dirty_log_test selftest PPC: - Nested HV KVM support for radix guests on POWER9. The performance is much better than with PR KVM. Migration and arbitrary level of nesting is supported. - Disable nested HV-KVM on early POWER9 chips that need a particular hardware bug workaround - One VM per core mode to prevent potential data leaks - PCI pass-through optimization - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base s390: - Initial version of AP crypto virtualization via vfio-mdev - Improvement for vfio-ap - Set the host program identifier - Optimize page table locking x86: - Enable nested virtualization by default - Implement Hyper-V IPI hypercalls - Improve #PF and #DB handling - Allow guests to use Enlightened VMCS - Add migration selftests for VMCS and Enlightened VMCS - Allow coalesced PIO accesses - Add an option to perform nested VMCS host state consistency check through hardware - Automatic tuning of lapic_timer_advance_ns - Many fixes, minor improvements, and cleanups" * tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned Revert "kvm: x86: optimize dr6 restore" KVM: PPC: Optimize clearing TCEs for sparse tables x86/kvm/nVMX: tweak shadow fields selftests/kvm: add missing executables to .gitignore KVM: arm64: Safety check PSTATE when entering guest and handle IL KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips arm/arm64: KVM: Enable 32 bits kvm vcpu events support arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() KVM: arm64: Fix caching of host MDCR_EL2 value KVM: VMX: enable nested virtualization by default KVM/x86: Use 32bit xor to clear registers in svm.c kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD kvm: vmx: Defer setting of DR6 until #DB delivery kvm: x86: Defer setting of CR2 until #PF delivery kvm: x86: Add payload operands to kvm_multiple_exception kvm: x86: Add exception payload fields to kvm_vcpu_events kvm: x86: Add has_payload and payload to kvm_queued_exception KVM: Documentation: Fix omission in struct kvm_vcpu_events KVM: selftests: add Enlightened VMCS test ...
This commit is contained in:
837
Documentation/s390/vfio-ap.txt
Normal file
837
Documentation/s390/vfio-ap.txt
Normal file
@@ -0,0 +1,837 @@
|
||||
Introduction:
|
||||
============
|
||||
The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised
|
||||
of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
|
||||
The AP devices provide cryptographic functions to all CPUs assigned to a
|
||||
linux system running in an IBM Z system LPAR.
|
||||
|
||||
The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
|
||||
is to make AP cards available to KVM guests using the VFIO mediated device
|
||||
framework. This implementation relies considerably on the s390 virtualization
|
||||
facilities which do most of the hard work of providing direct access to AP
|
||||
devices.
|
||||
|
||||
AP Architectural Overview:
|
||||
=========================
|
||||
To facilitate the comprehension of the design, let's start with some
|
||||
definitions:
|
||||
|
||||
* AP adapter
|
||||
|
||||
An AP adapter is an IBM Z adapter card that can perform cryptographic
|
||||
functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
|
||||
assigned to the LPAR in which a linux host is running will be available to
|
||||
the linux host. Each adapter is identified by a number from 0 to 255; however,
|
||||
the maximum adapter number is determined by machine model and/or adapter type.
|
||||
When installed, an AP adapter is accessed by AP instructions executed by any
|
||||
CPU.
|
||||
|
||||
The AP adapter cards are assigned to a given LPAR via the system's Activation
|
||||
Profile which can be edited via the HMC. When the linux host system is IPL'd
|
||||
in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and
|
||||
creates a sysfs device for each assigned adapter. For example, if AP adapters
|
||||
4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following
|
||||
sysfs device entries:
|
||||
|
||||
/sys/devices/ap/card04
|
||||
/sys/devices/ap/card0a
|
||||
|
||||
Symbolic links to these devices will also be created in the AP bus devices
|
||||
sub-directory:
|
||||
|
||||
/sys/bus/ap/devices/[card04]
|
||||
/sys/bus/ap/devices/[card04]
|
||||
|
||||
* AP domain
|
||||
|
||||
An adapter is partitioned into domains. An adapter can hold up to 256 domains
|
||||
depending upon the adapter type and hardware configuration. A domain is
|
||||
identified by a number from 0 to 255; however, the maximum domain number is
|
||||
determined by machine model and/or adapter type.. A domain can be thought of
|
||||
as a set of hardware registers and memory used for processing AP commands. A
|
||||
domain can be configured with a secure private key used for clear key
|
||||
encryption. A domain is classified in one of two ways depending upon how it
|
||||
may be accessed:
|
||||
|
||||
* Usage domains are domains that are targeted by an AP instruction to
|
||||
process an AP command.
|
||||
|
||||
* Control domains are domains that are changed by an AP command sent to a
|
||||
usage domain; for example, to set the secure private key for the control
|
||||
domain.
|
||||
|
||||
The AP usage and control domains are assigned to a given LPAR via the system's
|
||||
Activation Profile which can be edited via the HMC. When a linux host system
|
||||
is IPL'd in the LPAR, the AP bus module detects the AP usage and control
|
||||
domains assigned to the LPAR. The domain number of each usage domain and
|
||||
adapter number of each AP adapter are combined to create AP queue devices
|
||||
(see AP Queue section below). The domain number of each control domain will be
|
||||
represented in a bitmask and stored in a sysfs file
|
||||
/sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least
|
||||
significant bit, correspond to domains 0-255.
|
||||
|
||||
* AP Queue
|
||||
|
||||
An AP queue is the means by which an AP command is sent to a usage domain
|
||||
inside a specific adapter. An AP queue is identified by a tuple
|
||||
comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
|
||||
APQI corresponds to a given usage domain number within the adapter. This tuple
|
||||
forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
|
||||
instructions include a field containing the APQN to identify the AP queue to
|
||||
which the AP command is to be sent for processing.
|
||||
|
||||
The AP bus will create a sysfs device for each APQN that can be derived from
|
||||
the cross product of the AP adapter and usage domain numbers detected when the
|
||||
AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
|
||||
domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
|
||||
following sysfs entries:
|
||||
|
||||
/sys/devices/ap/card04/04.0006
|
||||
/sys/devices/ap/card04/04.0047
|
||||
/sys/devices/ap/card0a/0a.0006
|
||||
/sys/devices/ap/card0a/0a.0047
|
||||
|
||||
The following symbolic links to these devices will be created in the AP bus
|
||||
devices subdirectory:
|
||||
|
||||
/sys/bus/ap/devices/[04.0006]
|
||||
/sys/bus/ap/devices/[04.0047]
|
||||
/sys/bus/ap/devices/[0a.0006]
|
||||
/sys/bus/ap/devices/[0a.0047]
|
||||
|
||||
* AP Instructions:
|
||||
|
||||
There are three AP instructions:
|
||||
|
||||
* NQAP: to enqueue an AP command-request message to a queue
|
||||
* DQAP: to dequeue an AP command-reply message from a queue
|
||||
* PQAP: to administer the queues
|
||||
|
||||
AP instructions identify the domain that is targeted to process the AP
|
||||
command; this must be one of the usage domains. An AP command may modify a
|
||||
domain that is not one of the usage domains, but the modified domain
|
||||
must be one of the control domains.
|
||||
|
||||
AP and SIE:
|
||||
==========
|
||||
Let's now take a look at how AP instructions executed on a guest are interpreted
|
||||
by the hardware.
|
||||
|
||||
A satellite control block called the Crypto Control Block (CRYCB) is attached to
|
||||
our main hardware virtualization control block. The CRYCB contains three fields
|
||||
to identify the adapters, usage domains and control domains assigned to the KVM
|
||||
guest:
|
||||
|
||||
* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
|
||||
to the KVM guest. Each bit in the mask, from left to right (i.e. from most
|
||||
significant to least significant bit in big endian order), corresponds to
|
||||
an APID from 0-255. If a bit is set, the corresponding adapter is valid for
|
||||
use by the KVM guest.
|
||||
|
||||
* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
|
||||
assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from
|
||||
most significant to least significant bit in big endian order), corresponds to
|
||||
an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue
|
||||
is valid for use by the KVM guest.
|
||||
|
||||
* The AP Domain Mask field is a bit mask that identifies the AP control domains
|
||||
assigned to the KVM guest. The ADM bit mask controls which domains can be
|
||||
changed by an AP command-request message sent to a usage domain from the
|
||||
guest. Each bit in the mask, from left to right (i.e. from most significant to
|
||||
least significant bit in big endian order), corresponds to a domain from
|
||||
0-255. If a bit is set, the corresponding domain can be modified by an AP
|
||||
command-request message sent to a usage domain.
|
||||
|
||||
If you recall from the description of an AP Queue, AP instructions include
|
||||
an APQN to identify the AP queue to which an AP command-request message is to be
|
||||
sent (NQAP and PQAP instructions), or from which a command-reply message is to
|
||||
be received (DQAP instruction). The validity of an APQN is defined by the matrix
|
||||
calculated from the APM and AQM; it is the cross product of all assigned adapter
|
||||
numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
|
||||
and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
|
||||
(2,5) and (2,6) will be valid for the guest.
|
||||
|
||||
The APQNs can provide secure key functionality - i.e., a private key is stored
|
||||
on the adapter card for each of its domains - so each APQN must be assigned to
|
||||
at most one guest or to the linux host.
|
||||
|
||||
Example 1: Valid configuration:
|
||||
------------------------------
|
||||
Guest1: adapters 1,2 domains 5,6
|
||||
Guest2: adapter 1,2 domain 7
|
||||
|
||||
This is valid because both guests have a unique set of APQNs:
|
||||
Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
|
||||
Guest2 has APQNs (1,7), (2,7)
|
||||
|
||||
Example 2: Valid configuration:
|
||||
------------------------------
|
||||
Guest1: adapters 1,2 domains 5,6
|
||||
Guest2: adapters 3,4 domains 5,6
|
||||
|
||||
This is also valid because both guests have a unique set of APQNs:
|
||||
Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
|
||||
Guest2 has APQNs (3,5), (3,6), (4,5), (4,6)
|
||||
|
||||
Example 3: Invalid configuration:
|
||||
--------------------------------
|
||||
Guest1: adapters 1,2 domains 5,6
|
||||
Guest2: adapter 1 domains 6,7
|
||||
|
||||
This is an invalid configuration because both guests have access to
|
||||
APQN (1,6).
|
||||
|
||||
The Design:
|
||||
===========
|
||||
The design introduces three new objects:
|
||||
|
||||
1. AP matrix device
|
||||
2. VFIO AP device driver (vfio_ap.ko)
|
||||
3. VFIO AP mediated matrix pass-through device
|
||||
|
||||
The VFIO AP device driver
|
||||
-------------------------
|
||||
The VFIO AP (vfio_ap) device driver serves the following purposes:
|
||||
|
||||
1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
|
||||
|
||||
2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
|
||||
device and creates the sysfs interfaces for assigning adapters, usage
|
||||
domains, and control domains comprising the matrix for a KVM guest.
|
||||
|
||||
3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
|
||||
SIE state description to grant the guest access to a matrix of AP devices
|
||||
|
||||
Reserve APQNs for exclusive use of KVM guests
|
||||
---------------------------------------------
|
||||
The following block diagram illustrates the mechanism by which APQNs are
|
||||
reserved:
|
||||
|
||||
+------------------+
|
||||
7 remove | |
|
||||
+--------------------> cex4queue driver |
|
||||
| | |
|
||||
| +------------------+
|
||||
|
|
||||
|
|
||||
| +------------------+ +-----------------+
|
||||
| 5 register driver | | 3 create | |
|
||||
| +----------------> Device core +----------> matrix device |
|
||||
| | | | | |
|
||||
| | +--------^---------+ +-----------------+
|
||||
| | |
|
||||
| | +-------------------+
|
||||
| | +-----------------------------------+ |
|
||||
| | | 4 register AP driver | | 2 register device
|
||||
| | | | |
|
||||
+--------+---+-v---+ +--------+-------+-+
|
||||
| | | |
|
||||
| ap_bus +--------------------- > vfio_ap driver |
|
||||
| | 8 probe | |
|
||||
+--------^---------+ +--^--^------------+
|
||||
6 edit | | |
|
||||
apmask | +-----------------------------+ | 9 mdev create
|
||||
aqmask | | 1 modprobe |
|
||||
+--------+-----+---+ +----------------+-+ +------------------+
|
||||
| | | |8 create | mediated |
|
||||
| admin | | VFIO device core |---------> matrix |
|
||||
| + | | | device |
|
||||
+------+-+---------+ +--------^---------+ +--------^---------+
|
||||
| | | |
|
||||
| | 9 create vfio_ap-passthrough | |
|
||||
| +------------------------------+ |
|
||||
+-------------------------------------------------------------+
|
||||
10 assign adapter/domain/control domain
|
||||
|
||||
The process for reserving an AP queue for use by a KVM guest is:
|
||||
|
||||
1. The administrator loads the vfio_ap device driver
|
||||
2. The vfio-ap driver during its initialization will register a single 'matrix'
|
||||
device with the device core. This will serve as the parent device for
|
||||
all mediated matrix devices used to configure an AP matrix for a guest.
|
||||
3. The /sys/devices/vfio_ap/matrix device is created by the device core
|
||||
4 The vfio_ap device driver will register with the AP bus for AP queue devices
|
||||
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
|
||||
driver's probe and remove callback interfaces. Devices older than CEX4 queues
|
||||
are not supported to simplify the implementation by not needlessly
|
||||
complicating the design by supporting older devices that will go out of
|
||||
service in the relatively near future, and for which there are few older
|
||||
systems around on which to test.
|
||||
5. The AP bus registers the vfio_ap device driver with the device core
|
||||
6. The administrator edits the AP adapter and queue masks to reserve AP queues
|
||||
for use by the vfio_ap device driver.
|
||||
7. The AP bus removes the AP queues reserved for the vfio_ap driver from the
|
||||
default zcrypt cex4queue driver.
|
||||
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
|
||||
it.
|
||||
9. The administrator creates a passthrough type mediated matrix device to be
|
||||
used by a guest
|
||||
10 The administrator assigns the adapters, usage domains and control domains
|
||||
to be exclusively used by a guest.
|
||||
|
||||
Set up the VFIO mediated device interfaces
|
||||
------------------------------------------
|
||||
The VFIO AP device driver utilizes the common interface of the VFIO mediated
|
||||
device core driver to:
|
||||
* Register an AP mediated bus driver to add a mediated matrix device to and
|
||||
remove it from a VFIO group.
|
||||
* Create and destroy a mediated matrix device
|
||||
* Add a mediated matrix device to and remove it from the AP mediated bus driver
|
||||
* Add a mediated matrix device to and remove it from an IOMMU group
|
||||
|
||||
The following high-level block diagram shows the main components and interfaces
|
||||
of the VFIO AP mediated matrix device driver:
|
||||
|
||||
+-------------+
|
||||
| |
|
||||
| +---------+ | mdev_register_driver() +--------------+
|
||||
| | Mdev | +<-----------------------+ |
|
||||
| | bus | | | vfio_mdev.ko |
|
||||
| | driver | +----------------------->+ |<-> VFIO user
|
||||
| +---------+ | probe()/remove() +--------------+ APIs
|
||||
| |
|
||||
| MDEV CORE |
|
||||
| MODULE |
|
||||
| mdev.ko |
|
||||
| +---------+ | mdev_register_device() +--------------+
|
||||
| |Physical | +<-----------------------+ |
|
||||
| | device | | | vfio_ap.ko |<-> matrix
|
||||
| |interface| +----------------------->+ | device
|
||||
| +---------+ | callback +--------------+
|
||||
+-------------+
|
||||
|
||||
During initialization of the vfio_ap module, the matrix device is registered
|
||||
with an 'mdev_parent_ops' structure that provides the sysfs attribute
|
||||
structures, mdev functions and callback interfaces for managing the mediated
|
||||
matrix device.
|
||||
|
||||
* sysfs attribute structures:
|
||||
* supported_type_groups
|
||||
The VFIO mediated device framework supports creation of user-defined
|
||||
mediated device types. These mediated device types are specified
|
||||
via the 'supported_type_groups' structure when a device is registered
|
||||
with the mediated device framework. The registration process creates the
|
||||
sysfs structures for each mediated device type specified in the
|
||||
'mdev_supported_types' sub-directory of the device being registered. Along
|
||||
with the device type, the sysfs attributes of the mediated device type are
|
||||
provided.
|
||||
|
||||
The VFIO AP device driver will register one mediated device type for
|
||||
passthrough devices:
|
||||
/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough
|
||||
Only the read-only attributes required by the VFIO mdev framework will
|
||||
be provided:
|
||||
... name
|
||||
... device_api
|
||||
... available_instances
|
||||
... device_api
|
||||
Where:
|
||||
* name: specifies the name of the mediated device type
|
||||
* device_api: the mediated device type's API
|
||||
* available_instances: the number of mediated matrix passthrough devices
|
||||
that can be created
|
||||
* device_api: specifies the VFIO API
|
||||
* mdev_attr_groups
|
||||
This attribute group identifies the user-defined sysfs attributes of the
|
||||
mediated device. When a device is registered with the VFIO mediated device
|
||||
framework, the sysfs attribute files identified in the 'mdev_attr_groups'
|
||||
structure will be created in the mediated matrix device's directory. The
|
||||
sysfs attributes for a mediated matrix device are:
|
||||
* assign_adapter:
|
||||
* unassign_adapter:
|
||||
Write-only attributes for assigning/unassigning an AP adapter to/from the
|
||||
mediated matrix device. To assign/unassign an adapter, the APID of the
|
||||
adapter is echoed to the respective attribute file.
|
||||
* assign_domain:
|
||||
* unassign_domain:
|
||||
Write-only attributes for assigning/unassigning an AP usage domain to/from
|
||||
the mediated matrix device. To assign/unassign a domain, the domain
|
||||
number of the the usage domain is echoed to the respective attribute
|
||||
file.
|
||||
* matrix:
|
||||
A read-only file for displaying the APQNs derived from the cross product
|
||||
of the adapter and domain numbers assigned to the mediated matrix device.
|
||||
* assign_control_domain:
|
||||
* unassign_control_domain:
|
||||
Write-only attributes for assigning/unassigning an AP control domain
|
||||
to/from the mediated matrix device. To assign/unassign a control domain,
|
||||
the ID of the domain to be assigned/unassigned is echoed to the respective
|
||||
attribute file.
|
||||
* control_domains:
|
||||
A read-only file for displaying the control domain numbers assigned to the
|
||||
mediated matrix device.
|
||||
|
||||
* functions:
|
||||
* create:
|
||||
allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
|
||||
* Store the reference to the KVM structure for the guest using the mdev
|
||||
* Store the AP matrix configuration for the adapters, domains, and control
|
||||
domains assigned via the corresponding sysfs attributes files
|
||||
* remove:
|
||||
deallocates the mediated matrix device's ap_matrix_mdev structure. This will
|
||||
be allowed only if a running guest is not using the mdev.
|
||||
|
||||
* callback interfaces
|
||||
* open:
|
||||
The vfio_ap driver uses this callback to register a
|
||||
VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
|
||||
device. The open is invoked when QEMU connects the VFIO iommu group
|
||||
for the mdev matrix device to the MDEV bus. Access to the KVM structure used
|
||||
to configure the KVM guest is provided via this callback. The KVM structure,
|
||||
is used to configure the guest's access to the AP matrix defined via the
|
||||
mediated matrix device's sysfs attribute files.
|
||||
* release:
|
||||
unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
|
||||
mdev matrix device and deconfigures the guest's AP matrix.
|
||||
|
||||
Configure the APM, AQM and ADM in the CRYCB:
|
||||
-------------------------------------------
|
||||
Configuring the AP matrix for a KVM guest will be performed when the
|
||||
VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
|
||||
function is called when QEMU connects to KVM. The guest's AP matrix is
|
||||
configured via it's CRYCB by:
|
||||
* Setting the bits in the APM corresponding to the APIDs assigned to the
|
||||
mediated matrix device via its 'assign_adapter' interface.
|
||||
* Setting the bits in the AQM corresponding to the domains assigned to the
|
||||
mediated matrix device via its 'assign_domain' interface.
|
||||
* Setting the bits in the ADM corresponding to the domain dIDs assigned to the
|
||||
mediated matrix device via its 'assign_control_domains' interface.
|
||||
|
||||
The CPU model features for AP
|
||||
-----------------------------
|
||||
The AP stack relies on the presence of the AP instructions as well as two
|
||||
facilities: The AP Facilities Test (APFT) facility; and the AP Query
|
||||
Configuration Information (QCI) facility. These features/facilities are made
|
||||
available to a KVM guest via the following CPU model features:
|
||||
|
||||
1. ap: Indicates whether the AP instructions are installed on the guest. This
|
||||
feature will be enabled by KVM only if the AP instructions are installed
|
||||
on the host.
|
||||
|
||||
2. apft: Indicates the APFT facility is available on the guest. This facility
|
||||
can be made available to the guest only if it is available on the host (i.e.,
|
||||
facility bit 15 is set).
|
||||
|
||||
3. apqci: Indicates the AP QCI facility is available on the guest. This facility
|
||||
can be made available to the guest only if it is available on the host (i.e.,
|
||||
facility bit 12 is set).
|
||||
|
||||
Note: If the user chooses to specify a CPU model different than the 'host'
|
||||
model to QEMU, the CPU model features and facilities need to be turned on
|
||||
explicitly; for example:
|
||||
|
||||
/usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on
|
||||
|
||||
A guest can be precluded from using AP features/facilities by turning them off
|
||||
explicitly; for example:
|
||||
|
||||
/usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
|
||||
|
||||
Note: If the APFT facility is turned off (apft=off) for the guest, the guest
|
||||
will not see any AP devices. The zcrypt device drivers that register for type 10
|
||||
and newer AP devices - i.e., the cex4card and cex4queue device drivers - need
|
||||
the APFT facility to ascertain the facilities installed on a given AP device. If
|
||||
the APFT facility is not installed on the guest, then the probe of device
|
||||
drivers will fail since only type 10 and newer devices can be configured for
|
||||
guest use.
|
||||
|
||||
Example:
|
||||
=======
|
||||
Let's now provide an example to illustrate how KVM guests may be given
|
||||
access to AP facilities. For this example, we will show how to configure
|
||||
three guests such that executing the lszcrypt command on the guests would
|
||||
look like this:
|
||||
|
||||
Guest1
|
||||
------
|
||||
CARD.DOMAIN TYPE MODE
|
||||
------------------------------
|
||||
05 CEX5C CCA-Coproc
|
||||
05.0004 CEX5C CCA-Coproc
|
||||
05.00ab CEX5C CCA-Coproc
|
||||
06 CEX5A Accelerator
|
||||
06.0004 CEX5A Accelerator
|
||||
06.00ab CEX5C CCA-Coproc
|
||||
|
||||
Guest2
|
||||
------
|
||||
CARD.DOMAIN TYPE MODE
|
||||
------------------------------
|
||||
05 CEX5A Accelerator
|
||||
05.0047 CEX5A Accelerator
|
||||
05.00ff CEX5A Accelerator
|
||||
|
||||
Guest2
|
||||
------
|
||||
CARD.DOMAIN TYPE MODE
|
||||
------------------------------
|
||||
06 CEX5A Accelerator
|
||||
06.0047 CEX5A Accelerator
|
||||
06.00ff CEX5A Accelerator
|
||||
|
||||
These are the steps:
|
||||
|
||||
1. Install the vfio_ap module on the linux host. The dependency chain for the
|
||||
vfio_ap module is:
|
||||
* iommu
|
||||
* s390
|
||||
* zcrypt
|
||||
* vfio
|
||||
* vfio_mdev
|
||||
* vfio_mdev_device
|
||||
* KVM
|
||||
|
||||
To build the vfio_ap module, the kernel build must be configured with the
|
||||
following Kconfig elements selected:
|
||||
* IOMMU_SUPPORT
|
||||
* S390
|
||||
* ZCRYPT
|
||||
* S390_AP_IOMMU
|
||||
* VFIO
|
||||
* VFIO_MDEV
|
||||
* VFIO_MDEV_DEVICE
|
||||
* KVM
|
||||
|
||||
If using make menuconfig select the following to build the vfio_ap module:
|
||||
-> Device Drivers
|
||||
-> IOMMU Hardware Support
|
||||
select S390 AP IOMMU Support
|
||||
-> VFIO Non-Privileged userspace driver framework
|
||||
-> Mediated device driver frramework
|
||||
-> VFIO driver for Mediated devices
|
||||
-> I/O subsystem
|
||||
-> VFIO support for AP devices
|
||||
|
||||
2. Secure the AP queues to be used by the three guests so that the host can not
|
||||
access them. To secure them, there are two sysfs files that specify
|
||||
bitmasks marking a subset of the APQN range as 'usable by the default AP
|
||||
queue device drivers' or 'not usable by the default device drivers' and thus
|
||||
available for use by the vfio_ap device driver'. The location of the sysfs
|
||||
files containing the masks are:
|
||||
|
||||
/sys/bus/ap/apmask
|
||||
/sys/bus/ap/aqmask
|
||||
|
||||
The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
|
||||
(APID). Each bit in the mask, from left to right (i.e., from most significant
|
||||
to least significant bit in big endian order), corresponds to an APID from
|
||||
0-255. If a bit is set, the APID is marked as usable only by the default AP
|
||||
queue device drivers; otherwise, the APID is usable by the vfio_ap
|
||||
device driver.
|
||||
|
||||
The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
|
||||
(APQI). Each bit in the mask, from left to right (i.e., from most significant
|
||||
to least significant bit in big endian order), corresponds to an APQI from
|
||||
0-255. If a bit is set, the APQI is marked as usable only by the default AP
|
||||
queue device drivers; otherwise, the APQI is usable by the vfio_ap device
|
||||
driver.
|
||||
|
||||
Take, for example, the following mask:
|
||||
|
||||
0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
|
||||
|
||||
It indicates:
|
||||
|
||||
1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
|
||||
belong to the vfio_ap device driver's pool.
|
||||
|
||||
The APQN of each AP queue device assigned to the linux host is checked by the
|
||||
AP bus against the set of APQNs derived from the cross product of APIDs
|
||||
and APQIs marked as usable only by the default AP queue device drivers. If a
|
||||
match is detected, only the default AP queue device drivers will be probed;
|
||||
otherwise, the vfio_ap device driver will be probed.
|
||||
|
||||
By default, the two masks are set to reserve all APQNs for use by the default
|
||||
AP queue device drivers. There are two ways the default masks can be changed:
|
||||
|
||||
1. The sysfs mask files can be edited by echoing a string into the
|
||||
respective sysfs mask file in one of two formats:
|
||||
|
||||
* An absolute hex string starting with 0x - like "0x12345678" - sets
|
||||
the mask. If the given string is shorter than the mask, it is padded
|
||||
with 0s on the right; for example, specifying a mask value of 0x41 is
|
||||
the same as specifying:
|
||||
|
||||
0x4100000000000000000000000000000000000000000000000000000000000000
|
||||
|
||||
Keep in mind that the mask reads from left to right (i.e., most
|
||||
significant to least significant bit in big endian order), so the mask
|
||||
above identifies device numbers 1 and 7 (01000001).
|
||||
|
||||
If the string is longer than the mask, the operation is terminated with
|
||||
an error (EINVAL).
|
||||
|
||||
* Individual bits in the mask can be switched on and off by specifying
|
||||
each bit number to be switched in a comma separated list. Each bit
|
||||
number string must be prepended with a ('+') or minus ('-') to indicate
|
||||
the corresponding bit is to be switched on ('+') or off ('-'). Some
|
||||
valid values are:
|
||||
|
||||
"+0" switches bit 0 on
|
||||
"-13" switches bit 13 off
|
||||
"+0x41" switches bit 65 on
|
||||
"-0xff" switches bit 255 off
|
||||
|
||||
The following example:
|
||||
+0,-6,+0x47,-0xf0
|
||||
|
||||
Switches bits 0 and 71 (0x47) on
|
||||
Switches bits 6 and 240 (0xf0) off
|
||||
|
||||
Note that the bits not specified in the list remain as they were before
|
||||
the operation.
|
||||
|
||||
2. The masks can also be changed at boot time via parameters on the kernel
|
||||
command line like this:
|
||||
|
||||
ap.apmask=0xffff ap.aqmask=0x40
|
||||
|
||||
This would create the following masks:
|
||||
|
||||
apmask:
|
||||
0xffff000000000000000000000000000000000000000000000000000000000000
|
||||
|
||||
aqmask:
|
||||
0x4000000000000000000000000000000000000000000000000000000000000000
|
||||
|
||||
Resulting in these two pools:
|
||||
|
||||
default drivers pool: adapter 0-15, domain 1
|
||||
alternate drivers pool: adapter 16-255, domains 0, 2-255
|
||||
|
||||
Securing the APQNs for our example:
|
||||
----------------------------------
|
||||
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
|
||||
06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
|
||||
APQNs can either be removed from the default masks:
|
||||
|
||||
echo -5,-6 > /sys/bus/ap/apmask
|
||||
|
||||
echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask
|
||||
|
||||
Or the masks can be set as follows:
|
||||
|
||||
echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \
|
||||
> apmask
|
||||
|
||||
echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
|
||||
> aqmask
|
||||
|
||||
This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
|
||||
06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
|
||||
sysfs directory for the vfio_ap device driver will now contain symbolic links
|
||||
to the AP queue devices bound to it:
|
||||
|
||||
/sys/bus/ap
|
||||
... [drivers]
|
||||
...... [vfio_ap]
|
||||
......... [05.0004]
|
||||
......... [05.0047]
|
||||
......... [05.00ab]
|
||||
......... [05.00ff]
|
||||
......... [06.0004]
|
||||
......... [06.0047]
|
||||
......... [06.00ab]
|
||||
......... [06.00ff]
|
||||
|
||||
Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later)
|
||||
can be bound to the vfio_ap device driver. The reason for this is to
|
||||
simplify the implementation by not needlessly complicating the design by
|
||||
supporting older devices that will go out of service in the relatively near
|
||||
future and for which there are few older systems on which to test.
|
||||
|
||||
The administrator, therefore, must take care to secure only AP queues that
|
||||
can be bound to the vfio_ap device driver. The device type for a given AP
|
||||
queue device can be read from the parent card's sysfs directory. For example,
|
||||
to see the hardware type of the queue 05.0004:
|
||||
|
||||
cat /sys/bus/ap/devices/card05/hwtype
|
||||
|
||||
The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
|
||||
vfio_ap device driver.
|
||||
|
||||
3. Create the mediated devices needed to configure the AP matrixes for the
|
||||
three guests and to provide an interface to the vfio_ap driver for
|
||||
use by the guests:
|
||||
|
||||
/sys/devices/vfio_ap/matrix/
|
||||
--- [mdev_supported_types]
|
||||
------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
|
||||
--------- create
|
||||
--------- [devices]
|
||||
|
||||
To create the mediated devices for the three guests:
|
||||
|
||||
uuidgen > create
|
||||
uuidgen > create
|
||||
uuidgen > create
|
||||
|
||||
or
|
||||
|
||||
echo $uuid1 > create
|
||||
echo $uuid2 > create
|
||||
echo $uuid3 > create
|
||||
|
||||
This will create three mediated devices in the [devices] subdirectory named
|
||||
after the UUID written to the create attribute file. We call them $uuid1,
|
||||
$uuid2 and $uuid3 and this is the sysfs directory structure after creation:
|
||||
|
||||
/sys/devices/vfio_ap/matrix/
|
||||
--- [mdev_supported_types]
|
||||
------ [vfio_ap-passthrough]
|
||||
--------- [devices]
|
||||
------------ [$uuid1]
|
||||
--------------- assign_adapter
|
||||
--------------- assign_control_domain
|
||||
--------------- assign_domain
|
||||
--------------- matrix
|
||||
--------------- unassign_adapter
|
||||
--------------- unassign_control_domain
|
||||
--------------- unassign_domain
|
||||
|
||||
------------ [$uuid2]
|
||||
--------------- assign_adapter
|
||||
--------------- assign_control_domain
|
||||
--------------- assign_domain
|
||||
--------------- matrix
|
||||
--------------- unassign_adapter
|
||||
----------------unassign_control_domain
|
||||
----------------unassign_domain
|
||||
|
||||
------------ [$uuid3]
|
||||
--------------- assign_adapter
|
||||
--------------- assign_control_domain
|
||||
--------------- assign_domain
|
||||
--------------- matrix
|
||||
--------------- unassign_adapter
|
||||
----------------unassign_control_domain
|
||||
----------------unassign_domain
|
||||
|
||||
4. The administrator now needs to configure the matrixes for the mediated
|
||||
devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
|
||||
|
||||
This is how the matrix is configured for Guest1:
|
||||
|
||||
echo 5 > assign_adapter
|
||||
echo 6 > assign_adapter
|
||||
echo 4 > assign_domain
|
||||
echo 0xab > assign_domain
|
||||
|
||||
Control domains can similarly be assigned using the assign_control_domain
|
||||
sysfs file.
|
||||
|
||||
If a mistake is made configuring an adapter, domain or control domain,
|
||||
you can use the unassign_xxx files to unassign the adapter, domain or
|
||||
control domain.
|
||||
|
||||
To display the matrix configuration for Guest1:
|
||||
|
||||
cat matrix
|
||||
|
||||
This is how the matrix is configured for Guest2:
|
||||
|
||||
echo 5 > assign_adapter
|
||||
echo 0x47 > assign_domain
|
||||
echo 0xff > assign_domain
|
||||
|
||||
This is how the matrix is configured for Guest3:
|
||||
|
||||
echo 6 > assign_adapter
|
||||
echo 0x47 > assign_domain
|
||||
echo 0xff > assign_domain
|
||||
|
||||
In order to successfully assign an adapter:
|
||||
|
||||
* The adapter number specified must represent a value from 0 up to the
|
||||
maximum adapter number configured for the system. If an adapter number
|
||||
higher than the maximum is specified, the operation will terminate with
|
||||
an error (ENODEV).
|
||||
|
||||
* All APQNs that can be derived from the adapter ID and the IDs of
|
||||
the previously assigned domains must be bound to the vfio_ap device
|
||||
driver. If no domains have yet been assigned, then there must be at least
|
||||
one APQN with the specified APID bound to the vfio_ap driver. If no such
|
||||
APQNs are bound to the driver, the operation will terminate with an
|
||||
error (EADDRNOTAVAIL).
|
||||
|
||||
No APQN that can be derived from the adapter ID and the IDs of the
|
||||
previously assigned domains can be assigned to another mediated matrix
|
||||
device. If an APQN is assigned to another mediated matrix device, the
|
||||
operation will terminate with an error (EADDRINUSE).
|
||||
|
||||
In order to successfully assign a domain:
|
||||
|
||||
* The domain number specified must represent a value from 0 up to the
|
||||
maximum domain number configured for the system. If a domain number
|
||||
higher than the maximum is specified, the operation will terminate with
|
||||
an error (ENODEV).
|
||||
|
||||
* All APQNs that can be derived from the domain ID and the IDs of
|
||||
the previously assigned adapters must be bound to the vfio_ap device
|
||||
driver. If no domains have yet been assigned, then there must be at least
|
||||
one APQN with the specified APQI bound to the vfio_ap driver. If no such
|
||||
APQNs are bound to the driver, the operation will terminate with an
|
||||
error (EADDRNOTAVAIL).
|
||||
|
||||
No APQN that can be derived from the domain ID and the IDs of the
|
||||
previously assigned adapters can be assigned to another mediated matrix
|
||||
device. If an APQN is assigned to another mediated matrix device, the
|
||||
operation will terminate with an error (EADDRINUSE).
|
||||
|
||||
In order to successfully assign a control domain, the domain number
|
||||
specified must represent a value from 0 up to the maximum domain number
|
||||
configured for the system. If a control domain number higher than the maximum
|
||||
is specified, the operation will terminate with an error (ENODEV).
|
||||
|
||||
5. Start Guest1:
|
||||
|
||||
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
|
||||
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
|
||||
|
||||
7. Start Guest2:
|
||||
|
||||
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
|
||||
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
|
||||
|
||||
7. Start Guest3:
|
||||
|
||||
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
|
||||
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
|
||||
|
||||
When the guest is shut down, the mediated matrix devices may be removed.
|
||||
|
||||
Using our example again, to remove the mediated matrix device $uuid1:
|
||||
|
||||
/sys/devices/vfio_ap/matrix/
|
||||
--- [mdev_supported_types]
|
||||
------ [vfio_ap-passthrough]
|
||||
--------- [devices]
|
||||
------------ [$uuid1]
|
||||
--------------- remove
|
||||
|
||||
|
||||
echo 1 > remove
|
||||
|
||||
This will remove all of the mdev matrix device's sysfs structures including
|
||||
the mdev device itself. To recreate and reconfigure the mdev matrix device,
|
||||
all of the steps starting with step 3 will have to be performed again. Note
|
||||
that the remove will fail if a guest using the mdev is still running.
|
||||
|
||||
It is not necessary to remove an mdev matrix device, but one may want to
|
||||
remove it if no guest will use it during the remaining lifetime of the linux
|
||||
host. If the mdev matrix device is removed, one may want to also reconfigure
|
||||
the pool of adapters and queues reserved for use by the default drivers.
|
||||
|
||||
Limitations
|
||||
===========
|
||||
* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
|
||||
to the default drivers pool of a queue that is still assigned to a mediated
|
||||
device in use by a guest. It is incumbent upon the administrator to
|
||||
ensure there is no mediated device in use by a guest to which the APQN is
|
||||
assigned lest the host be given access to the private data of the AP queue
|
||||
device such as a private key configured specifically for the guest.
|
||||
|
||||
* Dynamically modifying the AP matrix for a running guest (which would amount to
|
||||
hot(un)plug of AP devices for the guest) is currently not supported
|
||||
|
||||
* Live guest migration is not supported for guests using AP devices.
|
Reference in New Issue
Block a user