Directvisor enables direct interrupt delivery and PCI device passthrough using Intel VT-x and VT-d, and supports VM live migration. It is based on QEMU/KVM and the Linux Ethernet bonding driver.
- Intel Virtualization Technology Extensions (VT-x).
- Intel Virtualization Technology for Directed I/O (VT-d).
Example: o40
- Linux 4.10.1 for both the host and guest.
- Our modified QEMU 4.1.0.
- VFIO.
| Host Kernel Parameter | Description |
|---|---|
| intel_iommu=on | Enable Intel IOMMU driver |
| intel_pstate=disable | Do not enable intel_pstate as the default scaling driver |
| intremap=no_x2apic_optout | BIOS x2APIC opt-out request will be ignored |
| isolcpus=1-n | Isolate CPUs from the general scheduler |
| lapic=notscdeadline | Use the programmable LAPIC timer instead of TSC-deadline mode |
| nohz=off | Use a periodic timer |
| transparent_hugepage=madvise | Use madvise for transparent hugepages |
| nowatchdog | Disable both soft-lockup and NMI watchdog |
| rcupdate.rcu_cpu_stall_suppress=1 | Suppress RCU CPU stall warning messages |
| Guest Kernel Parameter | Description |
|---|---|
| lapic=notscdeadline | Use the programmable LAPIC timer instead of TSC-deadline mode |
| nohz=off | Use a periodic timer |
| transparent_hugepage=madvise | Use madvise for transparent hugepages |
| nowatchdog | Disable both soft-lockup and NMI watchdog |
| rcupdate.rcu_cpu_stall_suppress=1 | Suppress RCU CPU stall warning messages |
| console=ttyS0,115200n8 | Output to serial console |
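These parameters must end up on the kernel command line; on Debian/Ubuntu that usually means GRUB. A minimal sketch assembling the host parameters into a GRUB_CMDLINE_LINUX line (the isolcpus=1-7 range is an assumption for an 8-core host; adjust it to the cores you reserve for direct-mode VCPUs):

```shell
# Sketch: build the host kernel command line from the table above.
# isolcpus=1-7 is an assumed range, not prescribed by Directvisor.
params="intel_iommu=on intel_pstate=disable intremap=no_x2apic_optout"
params="${params} isolcpus=1-7 lapic=notscdeadline nohz=off"
params="${params} transparent_hugepage=madvise nowatchdog"
params="${params} rcupdate.rcu_cpu_stall_suppress=1"
echo "GRUB_CMDLINE_LINUX=\"${params}\""
# Put the printed line into /etc/default/grub, then run:
#   update-grub && reboot
```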
$ apt install gcc g++ make binutils bc libssl-dev
$ wget https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.10.1.tar.gz
$ tar zxf linux-4.10.1.tar.gz
$ cd linux-4.10.1
$ touch .scmversion
$ make oldconfig
$ make -j $(nproc)
$ make -j $(nproc) INSTALL_MOD_STRIP=1 modules_install
$ make -j $(nproc) INSTALL_MOD_STRIP=1 install
$ reboot
$ apt install git libglib2.0-dev libfdt-dev libpixman-1-dev zlib1g-dev flex bison libnfs-dev libiscsi-dev libcap-dev libattr1-dev
$ git clone git@github.com:osnetsvn/qemu-4.1.0.git
$ cd qemu-4.1.0
$ git checkout origin/mvm -b mvm
$ ./configure --target-list=x86_64-softmmu --enable-virtfs
$ make -j $(nproc)
$ make -j $(nproc) install
$ git clone git@github.com:osnetsvn/directvisor.git
$ cd directvisor
$ git checkout origin/mvm-4.10.1 -b mvm-4.10.1
# Since the KVM module is built out of tree, the following changes are
# required to use the trace points. This restriction is imposed by
# TRACE_INCLUDE_PATH: out-of-tree modules must use a full pathname.
# x86-kvm/trace.h
# #define TRACE_INCLUDE_PATH /path/to/directvisor/x86-kvm
# x86-kvm/mmutrace.h
# #define TRACE_INCLUDE_PATH /path/to/directvisor/x86-kvm
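The two header edits can also be applied with sed. A sketch demonstrated on a scratch copy ("scratch" stands in for your actual directvisor checkout; point DV at the real clone and drop the demo-setup lines):

```shell
# Sketch: patch TRACE_INCLUDE_PATH in both trace headers in place.
DV=scratch                                   # hypothetical checkout path
mkdir -p "${DV}/x86-kvm"                     # demo setup only
printf '#define TRACE_INCLUDE_PATH .\n' | tee "${DV}/x86-kvm/trace.h" > "${DV}/x86-kvm/mmutrace.h"
# Rewrite the define to the absolute path of the x86-kvm directory.
sed -i "s|#define TRACE_INCLUDE_PATH .*|#define TRACE_INCLUDE_PATH $(pwd)/${DV}/x86-kvm|" \
    "${DV}/x86-kvm/trace.h" "${DV}/x86-kvm/mmutrace.h"
cat "${DV}/x86-kvm/trace.h"
```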
$ make
$ make remove
$ make insert
Example: Intel Corporation I350 Gigabit Network Controller
$ nvfs=1
$ interface="enp7s0f1"
$ echo $nvfs > /sys/class/net/$interface/device/sriov_numvfs
$ modprobe vfio-pci
# example: the VF's device driver, BDF, and vendor:device ID are igbvf, 0000:07:10.1, and 8086:1520, respectively.
$ echo "0000:07:10.1" > /sys/bus/pci/drivers/igbvf/unbind
$ echo "8086 1520" > /sys/bus/pci/drivers/vfio-pci/new_id
$ sudo apt install uml-utilities
# assume the software bridge is already configured.
$ br="br0"
$ tap="qtap0"
$ tunctl -t $tap -u `whoami`
$ brctl addif $br $tap
$ ifconfig $tap up
Example: map VCPU(0) to PCPU(1) and VCPU(1) to PCPU(2)
$ last=1
$ for i in `seq 0 $last`; do echo "$i $(( i + 1 ))"; done > map.txt
$ cat map.txt
0 1
1 2
For the second VM, do the same with a different mapping. For example, to pass through PCPU(3) to VCPU(0) and PCPU(4) to VCPU(1), the file contains the following two records:
$ cat map.txt
0 3
1 4
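The loop above generalizes with an offset. A sketch using a hypothetical `base` variable for the first passthrough PCPU (base=3 reproduces the second VM's mapping):

```shell
# Sketch: generate a VCPU->PCPU map with an arbitrary base PCPU.
# "base" is our name, not part of the tooling.
base=3
last=1
for i in $(seq 0 $last); do echo "$i $(( i + base ))"; done > map.txt
cat map.txt
# 0 3
# 1 4
```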
$ vcpu=2
$ memory="2G"
$ vm_image="vm.qcow2"
$ index=0
$ bdf="07:10.1"
$ cpumap="map.txt"
$ BASE=7000
$ tport=$(( BASE + index ))
$ mport=$(( 1000 + BASE + index ))
$ qsock="/tmp/qmp${index}.sock"
$ tap="qtap${index}"
$ octet=$(printf '%.2x\n' $(( ++index )))
$ mac="52:54:00:12:34:${octet}"
$ qemu-system-x86_64 \
-enable-kvm \
-cpu host,host-cache-info=on \
-smp cpus=${vcpu},sockets=1,threads=1,maxcpus=${vcpu} \
-m ${memory} \
-drive file=${vm_image},if=virtio \
-device vfio-pci,host=${bdf},id=pnic \
-osnet_cpumap path=${cpumap} \
-qmp unix:${qsock},server,nowait \
-serial telnet:127.0.0.1:${tport},server,nowait \
-monitor telnet:127.0.0.1:${mport},server,nowait \
-parallel none \
-serial none \
-net none \
-vga none \
-nographic \
-nodefaults
#-netdev tap,ifname=$tap,id=$tap,script=no,downscript=no,vhost=on \
#-device virtio-net,netdev=$tap,mac=${mac},id=$tap \
Example: pin VCPU(0) to PCPU(1) and VCPU(1) to PCPU(2)
# get the vcpu thread ids
$ socket="/tmp/qmp0.sock"
$ echo '{"execute":"qmp_capabilities"}{"execute":"query-cpus"}' | nc -U ${socket} | tail -n 1 | tr ',{}[]' '\n' | awk '/thread_id/ {print $2}'
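To see what that pipeline extracts without a live VM, here is the same tr/awk parsing run against a canned query-cpus reply (the JSON is a hand-written sample; real replies carry more fields per CPU):

```shell
# Canned sample of a query-cpus reply (illustrative thread IDs).
reply='{"return": [{"CPU": 0, "thread_id": 1000}, {"CPU": 1, "thread_id": 1001}]}'
# Split on JSON punctuation, then keep the thread_id values.
echo "$reply" | tr ',{}[]' '\n' | awk '/thread_id/ {print $2}'
# prints:
# 1000
# 1001
```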
# example: vcpu(0) id is 1000 and vcpu(1) id is 1001
# example: pcpu(0) is 1 and pcpu(1) is 2
$ taskset -cp 1 1000
$ taskset -cp 2 1001
$ git clone git@github.com:osnetsvn/directvisor.git
$ cd directvisor
$ git checkout origin/guest -b guest
$ git checkout 81b75d8 -b guest-4.10.1
$ make
$ make insert
$ cd did
$ ./did.sh enable_did
$ cd directvisor/did
$ ./did.sh disable_did
$ cd ..
$ make remove
- Since the guest controls the LAPIC timer, timer interrupts should stop on the host but continue in the guest: the local timer interrupt count in /proc/interrupts stops incrementing on the host while it keeps incrementing in the guest.
$ cat /proc/interrupts
- Run the iperf server on the guest with DID enabled.
$ iperf -s -i 1
- Run the iperf client on one of the clusters.
# assume the VM's IP is 10.128.0.131
$ iperf -c 10.128.0.131 -t 10 -i 1
Cyclictest will measure the timer latency.
- Clone the cyclictest repo
$ git clone https://github.com/osnetsvn/cyclictest.git
- Compile and run the test
$ cd cyclictest
$ make
$ mkdir trace
$ ./kcyclictest.sh 1 20 hi
- View the test result
$ cat /sys/kernel/debug/tracing/trace
- Clean the test result
$ echo > /sys/kernel/debug/tracing/trace
We will measure the network throughput and latency.
- Install iPerf3
$ sudo apt-get install iperf3
- Run iPerf servers on another host
$ ./run_iperf_servers.sh
- Tune network configuration in the guest
$ ./tune_network.sh
- Change the iPerf servers' IP address
$ vim run_iperf_clients.sh
- Run iPerf clients in the guest
$ ./run_iperf_clients.sh

We will observe the following bandwidth performance on a 40Gbps NIC:
s1: - - - - - - - - - - - - - - - - - - - - - - - - -
s1: [ ID] Interval         Transfer     Bandwidth       Retr
s1: [  4] 0.00-15.00 sec   32.6 GBytes  18.7 Gbits/sec  0        sender
s1: [  4] 0.00-15.00 sec   32.6 GBytes  18.7 Gbits/sec           receiver
s1:
s1: iperf Done.
s2: [  4] 14.00-15.00 sec  2.18 GBytes  18.8 Gbits/sec  0   1.17 MBytes
s2: - - - - - - - - - - - - - - - - - - - - - - - - -
s2: [ ID] Interval         Transfer     Bandwidth       Retr
s2: [  4] 0.00-15.00 sec   32.6 GBytes  18.7 Gbits/sec  0        sender
s2: [  4] 0.00-15.00 sec   32.6 GBytes  18.7 Gbits/sec           receiver
- Run ping to measure the network latency
$ ping -c 10 192.168.0.13
PING 192.168.0.13 (192.168.0.13) 56(84) bytes of data.
64 bytes from 192.168.0.13: icmp_seq=1 ttl=64 time=0.117 ms
64 bytes from 192.168.0.13: icmp_seq=2 ttl=64 time=0.114 ms
64 bytes from 192.168.0.13: icmp_seq=3 ttl=64 time=0.108 ms
64 bytes from 192.168.0.13: icmp_seq=4 ttl=64 time=0.146 ms
64 bytes from 192.168.0.13: icmp_seq=5 ttl=64 time=0.142 ms
64 bytes from 192.168.0.13: icmp_seq=6 ttl=64 time=0.106 ms
64 bytes from 192.168.0.13: icmp_seq=7 ttl=64 time=0.107 ms
64 bytes from 192.168.0.13: icmp_seq=8 ttl=64 time=0.119 ms
64 bytes from 192.168.0.13: icmp_seq=9 ttl=64 time=0.139 ms
64 bytes from 192.168.0.13: icmp_seq=10 ttl=64 time=0.138 ms


