[RLC-9] Rebase Custom Changes to rlc-9/5.14.0-611.26.1.el9_7 #848
jira SECO-170
In Rocky9, running ./run_vmtests.sh -t hmm fails and causes an infinite loop on ASSERTs in FIXTURE_TEARDOWN().
This temporary fix is based on the discussion here:
https://patchwork.kernel.org/project/linux-kselftest/patch/26017fe3-5ad7-6946-57db-e5ec48063ceb@suse.cz/#25046055
We will investigate further kselftest updates that will resolve the root causes of this.
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
In essiv_aead_setkey(), use the same logic as crypto_authenc_esn_setkey() to zeroize keys on exit. [Sultan: touched up commit message] Signed-off-by: Jason Rodriguez <jrodriguez@ciq.com>
Using the kernel crypto API, the SHA3-256 algorithm is used as
conditioning element to replace the LFSR in the Jitter RNG. All other
parts of the Jitter RNG are unchanged.
The application and use of the SHA-3 conditioning operation is identical
to the user space Jitter RNG 3.4.0 by applying the following concept:
- the Jitter RNG initializes a SHA-3 state which acts as the "entropy
pool" when the Jitter RNG is allocated.
- When a new time delta is obtained, it is inserted into the "entropy
pool" with a SHA-3 update operation. Note, this operation in most of
the cases is a simple memcpy() onto the SHA-3 stack.
- To cause a true SHA-3 operation for each time delta operation, a
second SHA-3 operation is performed hashing Jitter RNG status
information. The final message digest is also inserted into the
"entropy pool" with a SHA-3 update operation. Yet, this data is not
considered to provide any entropy, but it shall stir the entropy pool.
- To generate a random number, a SHA-3 final operation is performed to
calculate a message digest followed by an immediate SHA-3 init to
re-initialize the "entropy pool". The obtained message digest is one
block of the Jitter RNG that is returned to the caller.
Mathematically speaking, the random number generated by the Jitter RNG
is:
aux_t = SHA-3(Jitter RNG state data)
Jitter RNG block = SHA-3(time_i || aux_i || time_(i-1) || aux_(i-1) ||
... || time_(i-255) || aux_(i-255))
when assuming that the OSR = 1, i.e. the default value.
This operation implies that the Jitter RNG has an output-blocksize of
256 bits instead of the 64 bits of the LFSR-based Jitter RNG that is
replaced with this patch.
The patch also replaces the varying number of invocations of the
conditioning function with one fixed number of invocations. The use
of the conditioning function is consistent with the userspace Jitter RNG
library version 3.4.0.
The code is tested with a system that exhibited the least amount of
entropy generated by the Jitter RNG: the SiFive Unmatched RISC-V
system. The measured entropy rate is well above the heuristically
implied entropy value of 1 bit of entropy per time delta. On all other
tested systems, the measured entropy rate is even higher by orders
of magnitude. The measurement was performed using updated tooling
provided with the user space Jitter RNG library test framework.
The performance of the Jitter RNG with this patch is about on par
with the performance of the Jitter RNG without the patch.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-port of commit bb897c5
Author: Stephan Müller <smueller@chronox.de>
Date: Fri Apr 21 08:08:04 2023 +0200
Signed-off-by: Jeremy Allison <jallison@ciq.com>
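The pool handling described in the commit message above (absorb each time delta, stir with a digest of status data, finalize and immediately re-initialize) can be sketched in userspace C. This is only an illustration of the call pattern: the `toy_*` names are hypothetical, and a 64-bit FNV-1a hash stands in for SHA3-256, so the sketch is not cryptographic and does not reproduce the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SHA3-256: 64-bit FNV-1a. NOT cryptographic; it only
 * mimics the init/update/final shape of a hash API. */
#define TOY_INIT  0xcbf29ce484222325ULL
#define TOY_PRIME 0x100000001b3ULL

struct toy_pool {
	uint64_t state;
};

void toy_init(struct toy_pool *p)
{
	p->state = TOY_INIT;
}

void toy_update(struct toy_pool *p, const void *data, size_t len)
{
	const unsigned char *b = data;
	size_t i;

	for (i = 0; i < len; i++) {
		p->state ^= b[i];
		p->state *= TOY_PRIME;
	}
}

uint64_t toy_final(const struct toy_pool *p)
{
	return p->state;
}

/* Insert one time delta, then stir the pool with a digest of status
 * data (the "aux" value, which is not credited with any entropy). */
void pool_insert_delta(struct toy_pool *pool, uint64_t delta, uint64_t status)
{
	struct toy_pool aux;
	uint64_t aux_digest;

	assert(pool != NULL);
	toy_update(pool, &delta, sizeof(delta));

	toy_init(&aux);
	toy_update(&aux, &status, sizeof(status));
	aux_digest = toy_final(&aux);
	toy_update(pool, &aux_digest, sizeof(aux_digest));
}

/* Produce one output block and immediately re-initialize the pool. */
uint64_t pool_read_block(struct toy_pool *pool)
{
	uint64_t out;

	assert(pool != NULL);
	out = toy_final(pool);
	toy_init(pool);
	return out;
}
```

With a real SHA3-256 in place of the toy hash, `pool_read_block()` would return a 256-bit digest, which matches the output block size stated in the commit message.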
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
cryptographic information should be zeroized once they are no longer
needed. Accomplish this by using kfree_sensitive for buffers that
previously held the private key.
Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Back-ported from commit 23e4099
Author: Hailey Mothershead <hailmo@amazon.com>
Date: Mon Apr 15 22:19:15 2024 +0000
Signed-off-by: Jeremy Allison <jallison@ciq.com>
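As a rough userspace analogue of what kfree_sensitive() does for the private-key buffers, the sketch below wipes a buffer before releasing it. The names `wipe()` and `free_sensitive()` are hypothetical, and unlike the kernel helper this version needs the caller to pass the buffer length explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite len bytes with zeros through a volatile pointer so the
 * compiler cannot discard the stores as dead writes. */
void wipe(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len--)
		*p++ = 0;
}

/* Hypothetical userspace counterpart of kfree_sensitive():
 * zeroize first, then free. A NULL pointer is ignored. */
void free_sensitive(void *buf, size_t len)
{
	if (!buf)
		return;
	wipe(buf, len);
	free(buf);
}
```

A plain memset() right before free() is the pattern to avoid here, since an optimizer is allowed to remove a store to memory that is never read again.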
private_key is overwritten with the key parameter passed in by the caller (if present), or alternatively a newly generated private key. However, it is possible that the caller provides a key (or the newly generated key) which is shorter than the previous key. In that scenario, some key material from the previous key would not be overwritten. The easiest solution is to explicitly zeroize the entire private_key array first. Note that this patch slightly changes the behavior of this function: previously, if the ecc_gen_privkey failed, the old private_key would remain. Now, the private_key is always zeroized. This behavior is consistent with the case where params.key is set and ecc_is_key_valid fails. Signed-off-by: Joachim Vandersmissen <git@jvdsn.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jonathan Maple <jmaple@ciq.com>
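The stale-key problem described above is easy to reproduce in a small userspace sketch. The structure, the 32-byte slot size, and `toy_set_secret()` are hypothetical stand-ins, not the kernel's ECDH code; the point is only that clearing the whole slot first removes leftover bytes when a shorter key replaces a longer one, and leaves the slot zeroed on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_BYTES 32 /* hypothetical fixed-size key slot */

struct toy_ctx {
	unsigned char private_key[KEY_BYTES];
};

/* Store a new key. The entire slot is cleared first, so no bytes of a
 * previous, longer key survive. Returns 0 on success, -1 if the key
 * does not fit (the slot is then left zeroed). */
int toy_set_secret(struct toy_ctx *ctx, const unsigned char *key, size_t len)
{
	assert(ctx != NULL);
	memset(ctx->private_key, 0, sizeof(ctx->private_key));

	if (key == NULL || len > sizeof(ctx->private_key))
		return -1;

	memcpy(ctx->private_key, key, len);
	return 0;
}
```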
[ Upstream commit ba3c557 ] When the mpi_ec_ctx structure is initialized, some fields are not cleared, causing a crash when referencing the field when the structure was released. Initially, this issue was ignored because memory for mpi_ec_ctx is allocated with the __GFP_ZERO flag. For example, this error will be triggered when calculating the Za value for SM2 separately. Fixes: d58bb7e ("lib/mpi: Introduce ec implementation to MPI library") Cc: stable@vger.kernel.org # v6.5 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Jonathan Maple <jmaple@ciq.com>
When FIPS mode is enabled (via fips=1), there is an absolute need for the
DRBG to be available. This is at odds with the fact that the DRBG can be
built as a module when in FIPS mode, leaving critical RNG functionality at
the whims of userspace.
Userspace could simply rmmod the DRBG module, or not provide it at all and
thus a different stdrng algorithm could be used without anyone noticing.
Additionally, when running a FIPS-enabled userspace, modprobe itself may
perform a getrandom() syscall _before_ loading a given module. As a result,
there's a possible deadlock scenario where the RNG core (crypto/rng.c)
initializes _before_ the DRBG, thereby installing its getrandom() override
without having an stdrng algorithm available. Then, when userspace calls
getrandom() which redirects to the override in crypto/rng.c,
crypto_alloc_rng("stdrng") invokes the UMH (modprobe) to load the DRBG
(which is aliased to stdrng). And *then* that modprobe invocation gets
stuck at getrandom() because there's no stdrng algorithm available!
There are too many risks that come with allowing the DRBG and RNG core to
be modular for FIPS mode. Therefore, make CRYPTO_FIPS require the DRBG to
be built-in, which in turn makes the DRBG require the RNG core to be
built-in. That way, it's guaranteed for these drivers to be built-in when
running in FIPS mode.
Also clean up the CRYPTO_FIPS option name and remove the CRYPTO_ANSI_CPRNG
dependency since it's obsolete for FIPS now.
Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
It is technically a risk to permit extrng registration by modules after kernel init completes. Since there is only one user of the extrng interface and it is imperative that it is the _only_ registered extrng for FIPS compliance, restrict the extrng registration interface to only permit registration during kernel init and only from built-in drivers. This also eliminates the risks associated with the extrng interface itself being designed to solely accommodate a single registration, which would therefore permit the registered extrng to be overridden or even removed by an unrelated module. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
There is no reason this refcount should be a signed int. Convert it to an unsigned int, thereby also making it less likely to ever overflow. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
None of the ciphers used by the DRBG have an alignment requirement; thus, they all return 0 from .crypto_init, resulting in inconsistent alignment across all buffers. Align all buffers to at least a cache line to improve performance. This is especially useful when multiple DRBG instances are used, since it prevents false sharing of cache lines between the different instances. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
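A minimal userspace illustration of cache-line alignment, assuming a 64-byte line (the real size is CPU-specific, and the kernel code uses its own allocation helpers rather than these hypothetical functions). Rounding the size up as well as aligning the start keeps two separate buffers from ever sharing a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64u /* assumed cache line size */

/* True if p starts on a cache line boundary. */
int is_cacheline_aligned(const void *p)
{
	return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Allocate at least `size` bytes, starting on a cache line boundary and
 * padded to a whole number of lines. Release with free(). */
void *alloc_cacheline_aligned(size_t size)
{
	size_t rounded;
	void *p;

	if (size == 0)
		return NULL;

	rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
	p = aligned_alloc(CACHE_LINE, rounded);
	assert(p == NULL || is_cacheline_aligned(p));
	return p;
}
```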
Since crypto_devrandom_read_iter() is invoked directly by user tasks and is accessible by every task in the system, there are glaring priority inversions on crypto_reseed_rng_lock and crypto_default_rng_lock. Tasks of arbitrary scheduling priority access crypto_devrandom_read_iter(). When a low-priority task owns one of the mutex locks, higher-priority tasks waiting on that mutex lock are stalled until the low-priority task is done. Fix the priority inversions by converting the mutex locks into rt_mutex locks which have PI support. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Like pin_user_pages_fast(), but with the internal-only FOLL_FAST_ONLY flag. This complements the get_user_pages*() API, which already has get_user_pages_fast_only(). Note that pin_user_pages_fast_only() used to exist but was removed in upstream commit edad1bb ("mm/gup: remove pin_user_pages_fast_only()") due to it not having any users. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
When the kernel is booted with fips=1, the RNG exposed to userspace is hijacked away from the CRNG and redirects to crypto_devrandom_read_iter(), which utilizes the DRBG. Notably, crypto_devrandom_read_iter() maintains just two global DRBG instances _for the entire system_, and the two instances serve separate request types: one instance for GRND_RANDOM requests (crypto_reseed_rng), and one instance for non-GRND_RANDOM requests (crypto_default_rng). So in essence, for requests of a single type, there is just one global RNG for all CPUs in the entire system, which scales _very_ poorly. To make matters worse, the temporary buffer used to ferry data between the DRBG and userspace is woefully small at only 256 bytes, which doesn't do a good job of maximizing throughput from the DRBG. This results in lost performance when userspace requests >256 bytes; it is observed that DRBG throughput improves by 70% on an i9-13900H when the buffer size is increased to 4096 bytes (one page). Going beyond the size of one page up to the DRBG maximum request limit of 65536 bytes produces diminishing returns of only 3% improved throughput in comparison. And going below the size of one page produces progressively less throughput at each power of 2: there's a 5% loss going from 4096 bytes to 2048 bytes and a 9% loss going from 2048 bytes to 1024 bytes. Thus, this implements per-CPU DRBG instances utilizing a page-sized buffer for each CPU to utilize the DRBG itself more effectively. On top of that, for non-GRND_RANDOM requests, the DRBG's operations now occur under a local lock that disables preemption on non-PREEMPT_RT kernels, which not only keeps each CPU's DRBG instance isolated from another, but also improves temporal cache locality while the DRBG actively generates a new string of random bytes. 
Prefaulting one user destination page at a time is also employed to prevent a DRBG instance from getting blocked on page faults, thereby maximizing the use of the DRBG so that the only bottleneck is the DRBG itself. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
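To see why the bounce-buffer size matters, the following sketch serves a request in chunks and counts how many generate calls are needed. `toy_generate()` is a hypothetical stand-in that fills a constant byte instead of calling a DRBG, and the function names are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BOUNCE 4096 /* one page, the buffer size chosen by the patch */

/* Hypothetical stand-in for a DRBG generate call. It writes a constant
 * pattern, not random data. */
void toy_generate(unsigned char *buf, size_t len)
{
	memset(buf, 0x5a, len);
}

/* Fill dst with len bytes, going through a bounce buffer of buf_size
 * bytes. Returns the number of generate calls that were needed. */
size_t fill_request(unsigned char *dst, size_t len, size_t buf_size)
{
	unsigned char bounce[MAX_BOUNCE];
	size_t calls = 0;

	assert(buf_size > 0 && buf_size <= sizeof(bounce));
	while (len) {
		size_t n = len < buf_size ? len : buf_size;

		toy_generate(bounce, n);
		memcpy(dst, bounce, n);
		dst += n;
		len -= n;
		calls++;
	}
	return calls;
}
```

For a 10000-byte request this takes 40 calls with a 256-byte buffer and 3 calls with a 4096-byte buffer, which is the per-call overhead the larger buffer is meant to amortize.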
We want to hard-set the x86_64 FIPS-required configs rather than rely on default settings in the kernel. Should those defaults ever change without our knowledge, it is not something we would have actively checked. The configs are a limited set that is expanded out when building using `make olddefconfig`, a common practice in kernel building. Note: the following had to be added manually, since it is normally set by the RPM build process. CONFIG_CRYPTO_FIPS_NAME="Rocky Linux 9 Kernel Cryptographic API" Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Simplifies the workflow to use the reusable workflow defined in the main branch. This reduces duplication and makes the workflow easier to maintain across multiple branches. The workflow was renamed because it now includes validation over and above just checking for upstream fixes. Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3207 feature tools_hv commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit a9c0b33 Allow the KVP daemon to log the KVP updates triggered in the VM with a new debug flag(-d). When the daemon is started with this flag, it logs updates and debug information in syslog with loglevel LOG_DEBUG. This information comes in handy for debugging issues where the key-value pairs for certain pools show mismatch/incorrect values. The distro-vendors can further consume these changes and modify the respective service files to redirect the logs to specific files as needed. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com> (cherry picked from commit a9c0b33) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
In FIPS mode, the DRBG must take precedence over all stdrng algorithms. The only problem standing in the way of this is that a different stdrng algorithm could get registered and utilized before the DRBG is registered, and since crypto_alloc_rng() only allocates an stdrng algorithm when there's no existing allocation, this means that it's possible for the wrong stdrng algorithm to remain in use indefinitely. This issue is also often impossible to observe from userspace; an RNG other than the DRBG could be used somewhere in the kernel and userspace would be none the wiser. To ensure this can never happen, only allow stdrng instances from the DRBG to be registered when running in FIPS mode. This works since the previous commit forces the DRBG to be built into the kernel when CONFIG_CRYPTO_FIPS is enabled, so the DRBG's presence is guaranteed when fips_enabled is true. Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
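The registration rule can be pictured as a simple predicate. This sketch is an assumption about the shape of such a check, not the code from the commit: the `toy_rng_alg` structure is hypothetical, and identifying the DRBG by a "drbg_" driver-name prefix is chosen here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified algorithm descriptor. */
struct toy_rng_alg {
	const char *name;   /* generic algorithm name, e.g. "stdrng" */
	const char *driver; /* implementation name */
};

/* Decide whether an RNG may be registered. Outside FIPS mode anything
 * goes. In FIPS mode, an algorithm that claims the "stdrng" name is
 * accepted only if its driver name starts with "drbg_" (assumption). */
bool toy_rng_may_register(const struct toy_rng_alg *alg, bool fips_enabled)
{
	assert(alg != NULL && alg->name != NULL && alg->driver != NULL);

	if (!fips_enabled)
		return true;
	if (strcmp(alg->name, "stdrng") != 0)
		return true;
	return strncmp(alg->driver, "drbg_", 5) == 0;
}
```

Rejecting the registration outright, rather than ranking algorithms by priority, is what removes the window in which a non-DRBG stdrng could be allocated first and then stay in use.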
jira LE-4466 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 5da8a8b For supporting dynamic MSI-X vector allocation by PCI controllers, enabling the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() to prepare the MSI descriptor is also needed. Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic MSI-X vector allocation. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> (cherry picked from commit 5da8a8b) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4466 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit ad518f2 Allow dynamic MSI-X vector allocation for pci_hyperv PCI controller by adding support for the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and using pci_msix_prepare_desc() to prepare the MSI-X descriptors. Feature support added for both x86 and ARM64 Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> (cherry picked from commit ad518f2) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4466 commit-author Yury Norov <yury.norov@gmail.com> commit 4607617 Commit 91bfe21 ("net: mana: add a function to spread IRQs per CPUs") added the irq_setup() function that distributes IRQs on CPUs according to a tricky heuristic. The corresponding commit message explains the heuristic. Duplicate it in the source code to make it available to readers without digging in git history. Also, add a more detailed explanation of how the heuristic is implemented. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> (cherry picked from commit 4607617) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4466 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 845c62c In order to prepare the MANA driver to allocate the MSI-X IRQs dynamically, we need to enhance irq_setup() to allow skipping affinitizing IRQs to the first CPU sibling group. This would be for cases when the number of IRQs is less than or equal to the number of online CPUs. In such cases for dynamically added IRQs the first CPU sibling group would already be affinitized with HWC IRQ. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> (cherry picked from commit 845c62c) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4466 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 7553911 upstream-diff There were conflicts seen when applying this patch due to following commits present in our tree before this patch. 590bcf1 ("net: mana: Add handler for hardware servicing events") 00c2b0f ("net: mana: Fix warnings for missing export.h header inclusion") Currently, the MANA driver allocates MSI-X vectors statically based on MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends up allocating more vectors than it needs. This is because, by this time we do not have a HW channel and do not know how many IRQs should be allocated. To avoid this, we allocate 1 MSI-X vector during the creation of HWC and after getting the value supported by hardware, dynamically add the remaining MSI-X vectors. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> (cherry picked from commit 7553911) Signed-off-by: Shreeya Patel <spatel@ciq.com> Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4472 commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com> commit 75cabb4 upstream-diff There was a conflict seen when applying this patch due to the following commit not present in our tree. 92272ec ("eth: add missing xdp.h includes in drivers") Introduce support for net_shaper_ops in the MANA driver, enabling configuration of rate limiting on the MANA NIC. To apply rate limiting, the driver issues a HWC command via mana_set_bw_clamp() and updates the corresponding shaper object in the net_shaper cache. If an error occurs during this process, the driver restores the previous speed by querying the current link configuration using mana_query_link_cfg(). The minimum supported bandwidth is 100 Mbps, and only values that are exact multiples of 100 Mbps are allowed. Any other values are rejected. To remove a shaper, the driver resets the bandwidth to the maximum supported by the SKU using mana_set_bw_clamp() and clears the associated cache entry. If an error occurs during this process, the shaper details are retained. On the hardware that does not support these APIs, the net-shaper calls to set speed would fail. 
Set the speed:
./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/net_shaper.yaml \
    --do set --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }, "bw-max": 200000000 }'

Get the shaper details:
./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/net_shaper.yaml \
    --do get --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }}'
> {'bw-max': 200000000,
> 'handle': {'scope': 'netdev'},
> 'ifindex': $IFINDEX,
> 'metric': 'bps'}

Delete the shaper object:
./tools/net/ynl/pyynl/cli.py \
    --spec Documentation/netlink/specs/net_shaper.yaml \
    --do delete --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev","id":'$ID' }}'

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 75cabb4)
Signed-off-by: Shreeya Patel <spatel@ciq.com>
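The bandwidth rule stated in this commit message (at least 100 Mbps, and only exact multiples of 100 Mbps) reduces to a short check. The function names below are hypothetical, and the value is taken in bits per second to match the 'bps' metric in the example output; this is a sketch of the rule, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BPS_PER_100_MBPS 100000000ULL /* 100 Mbps expressed in bits per second */

/* True if bw_bps is at least 100 Mbps and an exact multiple of 100 Mbps. */
bool toy_bw_is_valid(uint64_t bw_bps)
{
	return bw_bps >= BPS_PER_100_MBPS && bw_bps % BPS_PER_100_MBPS == 0;
}

/* Convert an already validated bandwidth to Mbps. */
uint64_t toy_bw_to_mbps(uint64_t bw_bps)
{
	assert(toy_bw_is_valid(bw_bps));
	return bw_bps / 1000000ULL;
}
```

The "bw-max": 200000000 value used in the commands above passes this check and corresponds to 200 Mbps.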
jira LE-4472 commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com> commit a6d5edf
Allow the mana ethtool get_link_ksettings operation to report the maximum speed supported by the SKU in Mbps. The driver retrieves this information by issuing a HWC command to the hardware via mana_query_link_cfg(), which returns the SKU's maximum supported speed. When these APIs are invoked on older hardware that does not support them, the speed is reported as UNKNOWN.

Before:
$ethtool enP30832s1
> Settings for enP30832s1:
    Supported ports: [ ]
    Supported link modes: Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes: Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: Unknown!
    Duplex: Full
    Auto-negotiation: off
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Link detected: yes

After:
$ethtool enP30832s1
> Settings for enP30832s1:
    Supported ports: [ ]
    Supported link modes: Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes: Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 16000Mb/s
    Duplex: Full
    Auto-negotiation: off
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Link detected: yes

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-4-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit a6d5edf)
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4472 commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com> commit ca8ac48 upstream-diff There were conflicts seen when applying this patch due to the following patch being in our tree before this one. 7a3c235 ("net: mana: Handle Reset Request from MANA NIC") If any of the HWC commands are not recognized by the underlying hardware, the hardware returns the response header status of -1. Log the information using netdev_info_once to avoid multiple error logs in dmesg. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/1750144656-2021-5-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit ca8ac48) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4472 commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com> commit 11cd020
Fix build errors when CONFIG_NET_SHAPER is disabled, including:

drivers/net/ethernet/microsoft/mana/mana_en.c:804:10: error: 'const struct net_device_ops' has no member named 'net_shaper_ops'
  804 | .net_shaper_ops = &mana_shaper_ops,
drivers/net/ethernet/microsoft/mana/mana_en.c:804:35: error: initialization of 'int (*)(struct net_device *, struct neigh_parms *)' from incompatible pointer type 'const struct net_shaper_ops *' [-Werror=incompatible-pointer-types]
  804 | .net_shaper_ops = &mana_shaper_ops,

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Fixes: 75cabb4 ("net: mana: Add support for net_shaper_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506230625.bfUlqb8o-lkp@intel.com/
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750851355-8067-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 11cd020)
Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4526 commit-author Zhiyue Qiu <zhiyueqiu@microsoft.com> commit 084f35b Add packet and request port counters to mana_ib. Signed-off-by: Zhiyue Qiu <zhiyueqiu@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1752143395-5324-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 084f35b) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4523 commit-author Konstantin Taranov <kotaranov@microsoft.com> commit 44d69d3 Drain send WRs of the GSI QP on device removal. In rare servicing scenarios, the hardware may delete the state of the GSI QP, preventing it from generating CQEs for pending send WRs. Since WRs submitted to the GSI QP hold CM resources, the device cannot be removed until those WRs are completed. This patch marks all pending send WRs as failed, allowing the GSI QP to release the CM resources and enabling safe device removal. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1753779618-23629-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 44d69d3) Signed-off-by: Shreeya Patel <spatel@ciq.com>
…nnel open. jira LE-4493 commit-author Dipayaan Roy <dipayanroy@linux.microsoft.com> commit 9448ccd The hv_netvsc driver currently enables NAPI after opening the primary and subchannels. This ordering creates a race: if the Hyper-V host places data in the host -> guest ring buffer and signals the channel before napi_enable() has been called, the channel callback will run but napi_schedule_prep() will return false. As a result, the NAPI poller never gets scheduled, the data in the ring buffer is not consumed, and the receive queue may remain permanently stuck until another interrupt happens to arrive. Fix this by enabling NAPI and registering it with the RX/TX queues before vmbus channel is opened. This guarantees that any early host signal after open will correctly trigger NAPI scheduling and the ring buffer will be drained. Fixes: 76bb5db ("netvsc: fix use after free on module removal") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250825115627.GA32189@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 9448ccd) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4496 commit-author Haiyang Zhang <haiyangz@microsoft.com> commit c4deabb If HW Channel (HWC) is not responding, reduce the waiting time, so further steps will fail quickly. This will prevent getting stuck for a long time (30 minutes or more), for example, during unloading while HWC is not responding. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1757537841-5063-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit c4deabb) Signed-off-by: Shreeya Patel <spatel@ciq.com>
jira LE-4520 commit-author Shiraz Saleem <shirazsaleem@microsoft.com> commit 2bd7dd3 Extend modify QP to support further attributes: local_ack_timeout, UD qkey, rate_limit, qp_access_flags, flow_label, max_rd_atomic. Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1757923172-4475-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 2bd7dd3) Signed-off-by: Shreeya Patel <spatel@ciq.com>
…/O issuing CPU jira LE-4536 commit-author Long Li <longli@microsoft.com> commit b69ffea When selecting an outgoing channel for I/O, storvsc tries to select a channel with a returning CPU that is not the same as the issuing CPU. This worked well in the past, but it does not work well when Hyper-V exposes a large number of channels (up to the number of all CPUs). Using a different CPU for the returning channel is not efficient on Hyper-V. Change this behavior by preferring the channel with the same CPU as the current I/O issuing CPU whenever possible. Tests have shown improvements in newer Hyper-V/Azure environments, and no regression with older Hyper-V/Azure environments. Tested-by: Raheel Abdul Faizy <rabdulfaizy@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Message-Id: <1759381530-7414-1-git-send-email-longli@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit b69ffea) Signed-off-by: Shreeya Patel <spatel@ciq.com>
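The selection preference can be sketched as a lookup that first tries to find a channel bound to the issuing CPU. Everything here is hypothetical, including the `toy_channel` structure and the modulo fallback used when no channel matches; the storvsc driver's actual data structures and fallback policy are not shown in this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel descriptor: the CPU that handles its completions. */
struct toy_channel {
	int target_cpu;
};

/* Return the index of the channel to use for I/O issued on issuing_cpu.
 * Prefer a channel whose completion CPU equals the issuing CPU. If none
 * exists, fall back to spreading by CPU number (illustrative choice). */
size_t toy_pick_channel(const struct toy_channel *ch, size_t n, int issuing_cpu)
{
	size_t i;

	assert(ch != NULL && n > 0 && issuing_cpu >= 0);
	for (i = 0; i < n; i++) {
		if (ch[i].target_cpu == issuing_cpu)
			return i;
	}
	return (size_t)issuing_cpu % n;
}
```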
…es to improve memory efficiency.
jira LE-4489 commit-author Dipayaan Roy <dipayanroy@linux.microsoft.com> commit 730ff06
upstream-diff This patch was causing build failures due to the missing commit 0f92140 ("memory-provider: dmabuf devmem memory provider"). To fix it, we removed the pprm.queue_idx parameter, which does not appear to be used even after being set because of the missing commit.
This patch enhances RX buffer handling in the mana driver by allocating pages from a page pool and slicing them into MTU-sized fragments, rather than dedicating a full page per packet. This approach is especially beneficial on systems with large base page sizes like 64KB.
Key improvements:
- Proper integration of page pool for RX buffer allocations.
- MTU-sized buffer slicing to improve memory utilization.
- Reduced overall per-RX-queue memory footprint.
- Automatic fallback to full-page buffers when:
  * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
  * The XDP path is active, to avoid complexities with fragment reuse.
Testing on VMs with 64KB pages shows around 200% throughput improvement. Memory efficiency is significantly improved due to reduced wastage in page allocations.
Example: We are now able to fit 35 RX buffers in a single 64KB page for an MTU size of 1500, instead of 1 RX buffer per page previously.
Tested:
- iperf3, iperf2, and nttcp benchmarks.
- Jumbo frames with MTU 9000.
- Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for testing the XDP path in driver.
- Memory leak detection (kmemleak).
- Driver load/unload, reboot, and stress scenarios.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit 730ff06) Signed-off-by: Shreeya Patel <spatel@ciq.com>
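The "35 buffers per 64KB page" figure can be reproduced with simple arithmetic under two assumptions that are not stated in the commit message: a per-buffer overhead of 334 bytes (headers plus skb bookkeeping) and fragments aligned to 64 bytes. The function name and both constants are hypothetical and were chosen only so the sketch matches the quoted number; the fallback conditions are the ones listed in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_ALIGN 64u /* assumed fragment alignment */

/* Number of RX buffers carved out of one page. Falls back to a single
 * full-page buffer when XDP is active or the MTU exceeds half a page. */
size_t toy_rx_bufs_per_page(size_t page_size, size_t mtu, size_t overhead,
			    int xdp_active)
{
	size_t buf_size;

	assert(page_size > 0);
	if (xdp_active || mtu > page_size / 2)
		return 1;

	/* Round the per-packet footprint up to the fragment alignment. */
	buf_size = (mtu + overhead + FRAG_ALIGN - 1) / FRAG_ALIGN * FRAG_ALIGN;
	return page_size / buf_size;
}
```

With a 65536-byte page, MTU 1500 and the assumed 334-byte overhead, each fragment is 1856 bytes and 35 of them fit in the page.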
PlaidCat
left a comment
🤖 Validation Checks In Progress
Workflow run: https://github.com/ctrliq/kernel-src-tree/actions/runs/21603015050
🔍 Interdiff Analysis
diff -u b/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
--- b/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -335,8 +360,6 @@ INTERDIFF: rejected hunk from patch1, cannot diff context
static int kvp_key_add_or_modify(int pool, const __u8 *key, int key_size,
const __u8 *value, int value_size)
{
- int i;
- int num_records;
struct kvp_record *record;
int num_blocks;
@@ -347,8 +372,6 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
static int kvp_key_add_or_modify(int pool, const __u8 *key, int key_size,
const __u8 *value, int value_size)
{
- int i;
- int num_records;
struct kvp_record *record;
int num_blocks;
diff -u b/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
--- b/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -6,6 +6,8 @@ INTERDIFF: rejected hunk from patch1, cannot diff context
#include <linux/pci.h>
#include <linux/utsname.h>
#include <linux/version.h>
+#include <linux/msi.h>
+#include <linux/irqdomain.h>
#include <linux/export.h>
#include <net/mana/mana.h>
@@ -6,6 +6,8 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
#include <linux/pci.h>
#include <linux/utsname.h>
#include <linux/version.h>
+#include <linux/msi.h>
+#include <linux/irqdomain.h>
#include <net/mana/mana.h>
@@ -6,6 +6,6 @@
#include <linux/pci.h>
#include <linux/utsname.h>
#include <linux/version.h>
+#include <linux/export.h>
#include <net/mana/mana.h>
-
@@ -1642,8 +1780,6 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
gc->max_num_msix = 0;
gc->num_msix_usable = 0;
- kfree(gc->irq_contexts);
- gc->irq_contexts = NULL;
}
static int mana_gd_setup(struct pci_dev *pdev)
@@ -1801,8 +1939,6 @@ INTERDIFF: rejected hunk from patch1, cannot diff context
gc->max_num_msix = 0;
gc->num_msix_usable = 0;
- kfree(gc->irq_contexts);
- gc->irq_contexts = NULL;
}
static int mana_gd_setup(struct pci_dev *pdev)
diff -u b/include/net/mana/gdma.h b/include/net/mana/gdma.h
--- b/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -577,10 +577,10 @@
-#define GDMA_DRV_CAP_FLAGS1 \
- (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
- GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
+/* Driver can self reset on EQE notification */
+#define GDMA_DRV_CAP_FLAG_1_SELF_RESET_ON_EQE BIT(14)
+
GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT | \
- GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP)
-
-#define GDMA_DRV_CAP_FLAGS2 0
+ GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP | \
+ GDMA_DRV_CAP_FLAG_1_SELF_RESET_ON_EQE | \
+ GDMA_DRV_CAP_FLAG_1_HANDLE_RECONFIG_EQE)
@@ -578,8 +578,11 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
/* Driver can handle holes (zeros) in the device list */
#define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
+/* Driver supports dynamic MSI-X vector allocation */
+#define GDMA_DRV_CAP_FLAG_1_DYNAMIC_IRQ_ALLOC_SUPPORT BIT(13)
+
#define GDMA_DRV_CAP_FLAGS1 \
(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT | \
@@ -583,7 +586,8 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT | \
- GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP)
+ GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP | \
+ GDMA_DRV_CAP_FLAG_1_DYNAMIC_IRQ_ALLOC_SUPPORT)
#define GDMA_DRV_CAP_FLAGS2 0
@@ -582,6 +582,9 @@ INTERDIFF: rejected hunk from patch1, cannot diff context
/* Driver can handle holes (zeros) in the device list */
#define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
+/* Driver supports dynamic MSI-X vector allocation */
+#define GDMA_DRV_CAP_FLAG_1_DYNAMIC_IRQ_ALLOC_SUPPORT BIT(13)
+
/* Driver can self reset on EQE notification */
#define GDMA_DRV_CAP_FLAG_1_SELF_RESET_ON_EQE BIT(14)
@@ -594,6 +597,7 @@ INTERDIFF: rejected hunk from patch1, cannot diff context
GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT | \
GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP | \
+ GDMA_DRV_CAP_FLAG_1_DYNAMIC_IRQ_ALLOC_SUPPORT | \
GDMA_DRV_CAP_FLAG_1_SELF_RESET_ON_EQE | \
GDMA_DRV_CAP_FLAG_1_HANDLE_RECONFIG_EQE)
diff -u b/include/net/mana/mana.h b/include/net/mana/mana.h
--- b/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -1,8 +1,6 @@
+#ifndef _MANA_H
#define _MANA_H
-#include <net/xdp.h>
-
-#include <net/net_shaper.h>
-
#include "gdma.h"
#include "hw_channel.h"
+
@@ -5,6 +5,7 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
#define _MANA_H
#include <net/xdp.h>
+#include <net/net_shaper.h>
#include "gdma.h"
#include "hw_channel.h"
diff -u b/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
--- b/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -847,6 +847,9 @@ INTERDIFF: rejected hunk from patch2, cannot diff context
err = mana_gd_send_request(gc, in_len, in_buf, out_len,
out_buf);
if (err || resp->status) {
+ if (err == -EOPNOTSUPP)
+ return err;
+
if (req->req.msg_type != MANA_QUERY_PHY_STAT)
dev_err(dev, "Failed to send mana message: %d, 0x%x\n",
err, resp->status);
only in patch2:
unchanged:
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -891,6 +891,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
}
if (ctx->status_code && ctx->status_code != GDMA_STATUS_MORE_ENTRIES) {
+ if (ctx->status_code == GDMA_STATUS_CMD_UNSUPPORTED) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
if (req_msg->req.msg_type != MANA_QUERY_PHY_STAT)
dev_err(hwc->dev, "HWC: Failed hw_channel req: 0x%x\n",
ctx->status_code);
only in patch2:
unchanged:
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -10,6 +10,7 @@
#include "shm_channel.h"
#define GDMA_STATUS_MORE_ENTRIES 0x00000105
+#define GDMA_STATUS_CMD_UNSUPPORTED 0xffffffff
/* Structures labeled with "HW DATA" are exchanged with the hardware. All of
* them are naturally aligned and hence don't need __packed.
diff -u b/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
--- b/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2471,6 +2471,7 @@
pprm.napi = &rxq->rx_cq.napi;
pprm.netdev = rxq->ndev;
pprm.order = get_order(rxq->alloc_size);
+ pprm.queue_idx = rxq->rxq_idx;
pprm.dev = gc->dev;
/* Let the page pool do the dma map when page sharing with multiple
This is an automated interdiff check for backported commits.
✅ Validation checks completed successfully
View full results: https://github.com/ctrliq/kernel-src-tree/actions/runs/21603015050
https://ciqinc.atlassian.net/browse/KERNEL-542
Update process (This kernel CentOS base for 5.14.0-611)
src.rpms hosted by RESF rlc-9/5.14.0-611.X.1.el9_7 branch el release.
Rebuild Log
Build
Kselftests