Skip to content

viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode#1125

Open
FelixMcFelix wants to merge 17 commits into
masterfrom
felixmcfelix/viona-guest-promisc
Open

viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode#1125
FelixMcFelix wants to merge 17 commits into
masterfrom
felixmcfelix/viona-guest-promisc

Conversation

@FelixMcFelix

@FelixMcFelix FelixMcFelix commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

This PR adds support for the VIRTIO_NET_F_CTRL_RX capability, which allows guests to request specific MAC address filters in addition to their primary MAC, and to explicitly inform the host whether the device should be shifted into promiscuous mode.

The hope is that this allows us to take a guest out of all-multicast promiscuous mode in viona itself, which generates extra per-packet work (checking "is this packet multicast?") and contention in the Tx and Rx paths as both reach for and refcount the same list of promisc callbacks on the client.

Testing in falcon makes it look as though Debian will just immediately insert multicast entries, although I don't know if that's purely an artefact of my environment. Using one of the helios-dev images appears to correctly move out of promiscuous mode and keep the filter tables empty.

Closes #1122. Closes #1127.

@FelixMcFelix FelixMcFelix changed the title viona: support (enough) VIRTIO_NET_F_CTRL_RX to remove promisc viona: support (enough) VIRTIO_NET_F_CTRL_RX to leave all-multicast mode Apr 24, 2026
@FelixMcFelix

Copy link
Copy Markdown
Contributor Author

I've tested this so far on Alpine Linux 3.23.3, debian 13.2, and the helios-2.8 image we use for a4x2 testing. The first two are multiqueue-capable, whereas illumos is not -- control requests appear to be correctly served on queue 22 for the former and queue 2 for the latter.

The typical initialisation sequence looks to be that guests will send explicit promisc off signals for alongside every MAC filter update. snoop -rd vioif0 will also have illumos explicitly move us in and out of promiscuous mode.

Running on top of OPTE and/or in omicron standalone doesn't really stop Linux from asking for multicast MACs (i.e., they're not a result of anything coming in link-local). So for most guests we will need to get explicit MAC filter support into viona proper, or they'll just put themselves back into all-multicast mode. illumos, at least, keeps the tables empty. On Debian, I'm seeing:

Got MAC list [
    MacAddr([33, 33, 0, 0, 0, 1]), MacAddr([1, 0, 5e, 0, 0, 1]), MacAddr([1, 80, c2, 0, 0, 0]),
    MacAddr([1, 80, c2, 0, 0, 3]), MacAddr([1, 80, c2, 0, 0, e]), MacAddr([33, 33, ff, e7, 9c, e4]),
    MacAddr([33, 33, 0, 1, 0, 3]), MacAddr([1, 0, 5e, 0, 0, fc]), MacAddr([33, 33, ff, 0, 6, da])
]

With visible group subscriptions on:

IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
enp0s6          1      224.0.0.252
enp0s6          1      224.0.0.1
lo              1      ff02::1
lo              1      ff01::1
enp0s6          1      ff02::1:ff00:6da
enp0s6          1      ff02::1:3
enp0s6          2      ff02::1:ffe7:9ce4
enp0s6          2      ff02::1
enp0s6          1      ff01::1

So the MAC table can be decoded as:

IPv6(ff01::1 [All Nodes, Node Local], ff02::1 [All Nodes, Link Local]),
IPv4(224.0.0.1 [All Systems on Subnet]),
STP/LLDP(additional),
LLDP(additional),
LLDP(primary)/PTP,
IPv6(ff02::1:ffe7:9ce4 [solicited node addr]),
IPv6(ff02::1:3 [LLMNR]),
IPv4(224.0.0.252 [LLMNR]),
IPv6(ff02::1:ff00:6da [solicited node addr])

ip link set multicast off dev enp0s6 still leaves a bunch of active MACs including registered multicast addresses for all-/solicited-node, but those are sort of intrinsic to the stack. I think I'm happy at least that we're at parity with how cbhyve uses this machinery in pci_viona_eval_promisc.

@FelixMcFelix FelixMcFelix marked this pull request as ready for review April 29, 2026 11:28
Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment thread lib/propolis/src/hw/virtio/viona.rs

@iximeow iximeow left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the slow turnaround on this review! a couple of actual issues, a couple of me-trying-to-update-my-mental-model :)

Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment on lines +53 to +58
/// The index of the control queue when multiqueue ([`VIRTIO_NET_F_MQ`]) has
/// not been negotiated.
///
/// In this case, the driver will behave as though we have allocated only one
/// Rx/Tx queue pair, followed by the control queue.
pub const VIRTIO_NO_MQ_CTRL_Q_INDEX: usize = 2;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw part of the problem with picking 2 here on our own is that VirtQueues::iter() will still use get_control() to tack on a control queue at the end, so reset_queues_locked() will reset queues 0, 1, and propolis-VirtQueue-22 and I think leave queue 2 as it was before the reset. that's what has me thinking about keeping the notion of control-ness totally internal to VirtQueues for now, rather than splitting it between there and virtio/viona.rs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to go for a mechanism where we provide a list of control queue indices separately from the main number of queues, and adapt iter/iter_all to account for the cases where our control queues might come after a gap. My thinking here is that the other device families look sorta... arbitrary in where they put their control queue(s), but viona in particular may not have a control queue, and it may be directly after the in-use queues or at the very end of the maximum. I think it's right that PciVirtioViona is the one to make that call, given it depends wholly on the negotiated features.

Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
.queues
.set_len(npairs * 2 + 1)
.expect("num queue pairs");
self.virtio_state.queues.set_len(npairs * 2).expect("num queue pairs");

@FelixMcFelix FelixMcFelix May 27, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little torn on something here and in the concept of len as used between VirtQueues and PciVirtioViona, and would really like some more input. Conceptually it is right that we have 2 * n_pairs + 1 active queue pairs if we've negotiated a control queue. But VirtQueues will treat the first len entries as being valid -- if we get a set_use_pairs(2) against a maximum max_pairs = 11 we'll mistakenly treat queue 5 as enabled, and iter/iter_all will return queues [0, 5] ∪ { 22 }. So I've picked n_pairs * 2 to at least return the right queues in the iterators.

But then both before and after this change, queues.len() and queues.iter().len() are different quantities. Is that something we care about? Do we need to document this difference, or store two counts in the VQ list?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that something we care about?

it's something I care about, because I agree it's weird, but it's not a correctness issue anywhere that I know of.

I think this change is reasonable, but in the course of talking with you about this change I'm wondering if we can't actually have a dyn VirtQueue where the device is responsible for setting up queues with the appropriate operations for queue_notify, or queue_change(.. VqChange::IntrCfg), etc. this would give a cleaner separation for the virtio/viona I/O queues vs other kinds of I/O, and lets us handle multiple different kinds of control queue. I'm not totally sure how this works in the face of migration (it's not like we're exporting the vtable) but I think it's promising, and so I'm less worried about the numbers being a bit funky for now.

FelixMcFelix and others added 3 commits May 27, 2026 18:49
having `is_control` survive across reset is benign but might be
confusing for debugging: after a device reset the previous control
queue(s[^1]) remains "is_control: true" all the way until a VirtIO
driver negotiates features and has prompted us to set new control
queues. In the case of viona, if we tried to manage (say, pause) queues
before feature negotiation was complete, we've left a more complicated
situation to think through!

instead, we can clear `is_control` on reset, but doing so exposes a bug.
reset_queues_locked() would drive the queue state change *after*
clearing queue state, but queue_change() itself uses queue state to
operate correctly. if queue_change() wanted to tear down Propolis-side
bookkeeping (or debugging, logging, ...) we'll have removed necessary
data before giving it a chance to do its job.

so, `VirtQueue::reset` should come *after* a successful queue_change().
this has the mild benefit(?) of if we've failed to reset the queue we'll
leave it as it was. might be helpful for debugging the erroneous case?
Comment thread lib/propolis/src/hw/virtio/queue.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs Outdated
Comment thread lib/propolis/src/hw/virtio/viona.rs
Comment on lines +1045 to +1053
// Don't inflict promiscuous mode on drivers which request only their
// own MAC address.
let filter_is_self = !state.unicast_mac_filters.is_empty()
&& state
.unicast_mac_filters
.iter()
.all(|mac| mac == &self.mac_addr);
let need_promisc =
state.filter.contains(FilterState::PROMISCUOUS) || !filter_is_self;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please double-check me on this one, but I noticed that filter_is_self was checking if the first MAC was self.mac_addr and that unicast_mac_filters.is_empty() == false, which would have meant that if a guest requested its MAC as well as another, we'd think it's filter_is_self.

in 7afb19d I've moved things around a little but more importantly it's .iter().all() rather than .get(0).map(..).

I haven't tested this (I'm not really sure how I'd do the checks you mentioned in this comment) so I'm working mostly on boolean reasoning: in the case of an empty list we'd have (filter_is_self || state.unicast_mac_filters.is_empty()) be true because the former would get bool::default() || is_empty() for false || true, and that case now becomes all(..) && !is_empty() for true && false. non-empty lists are more obviously (to me) correct in this rewriting, but I was surprised to realize I couldn't think about the rewrite as "move expressions and apply De Morgan's" :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd originally skipped a && state.unicast_mac_filters.len() == 1, so you're right that there's a bug here. The existence of this check is maybe a bit misleading though without an explanation -- I have never seen a guest pass down its own MAC here, so this is purely defensive programming (i.e., an empty list does imply filter_is_self because it implicitly includes the MAC in config space, self.mac_addr). We ideally always have an empty unicast table in that light.

(The checks in the linked comment were mostly from shoving eprintln!s in various spots and inspecting stderr.)

I'll fix this up and amend the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Viona control queue index is incorrect Viona links never leave VIONA_PROMISC_MULTI

2 participants