Audio: MFCC: More updates and topologies to run MFCC for Mel spectrogram audio features in SDW PCs #10750
singalsu wants to merge 12 commits into
Force-pushed from 40bb97f to 1768663.
Pull request overview
This PR updates the MFCC feature extraction path to improve Mel log precision (moving key Mel outputs to 32-bit Q9.23) and adds SoundWire (SDW) topology support for branched “audio features capture” pipelines (MFCC/Mel output alongside normal capture).
Changes:
- Switch Mel filterbank 32-bit output to Q9.23 (int32) and propagate this through MFCC processing and tuning utilities.
- Add new topology2 pipeline/class and SDW platform includes to expose MFCC/Mel “audio features capture” PCMs for jack and DMIC.
- Refactor MFCC tune scripts (run script + MATLAB/Octave decoders) to handle multiple bit depths and Xtensa runs.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tools/topology/topology2/platform/intel/sdw-jack-generic.conf | Inserts a module-copier stage in the jack capture path to act as a branch point for audio-feature capture. |
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf | New SDW jack MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline. |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf | New SDW DMIC MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline. |
| tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf | New reusable pipeline class intended to perform SRC then MFCC then host capture. |
| tools/topology/topology2/include/common/common_definitions.conf | Adds feature flags to gate SDW jack/DMIC audio-feature capture includes. |
| tools/topology/topology2/development/tplg-targets.cmake | Adds new SDW topology build targets enabling MFCC audio-feature capture. |
| tools/topology/topology2/cavs-sdw.conf | Includes the new pipeline class and gates inclusion of new SDW audio-feature capture platform snippets. |
| test/cmocka/src/math/auditory/auditory.c | Updates unit test to accommodate 32-bit Mel log output and compares against legacy reference after downscaling. |
| src/math/auditory/mel_filterbank_32.c | Changes psy_apply_mel_filterbank_32() output from int16 Q9.7 to int32 Q9.23. |
| src/include/sof/math/fft.h | Adds icomplex16 include (header dependency fix). |
| src/include/sof/math/auditory.h | Updates psy_apply_mel_filterbank_32() signature to int32 output. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Forces MFCC to 32-bit FFT path and extends state for 32-bit Mel log storage/output pointers. |
| src/audio/mfcc/tune/run_mfcc.sh | Refactors MFCC tuning runner into reusable functions and adds optional Xtensa testbench execution. |
| src/audio/mfcc/tune/README.txt | Updates tuning documentation to match new output files and decode workflow. |
| src/audio/mfcc/tune/decode_mel.m | Extends Mel decoder to support s16/s24/s32 formats and raw/wav reading. |
| src/audio/mfcc/tune/decode_all.m | New helper to decode/plot all generated MFCC/Mel outputs in one go. |
| src/audio/mfcc/mfcc.c | Simplifies prepare logging; removes a sink buffer size check. |
| src/audio/mfcc/mfcc_setup.c | Adjusts setup behavior for sample rate mismatch; adds scratch allocation for 32-bit Mel log output and updates free paths. |
| src/audio/mfcc/mfcc_hifi4.c | Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling. |
| src/audio/mfcc/mfcc_hifi3.c | Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling. |
| src/audio/mfcc/mfcc_generic.c | Removes duplicate fft-fill implementation. |
| src/audio/mfcc/mfcc_common.c | Implements shared fft-fill routine; updates Mel processing to use/maintain Q9.23 and updates s24/s32 Mel-only output behavior. |
| src/audio/mfcc/Kconfig | Switches MFCC to select 32-bit Mel filterbank support. |
| scripts/rebuild-testbench.sh | Exports XTENSA_PATH in generated Xtensa environment setup script. |
Comments suppressed due to low confidence (1)
src/audio/mfcc/Kconfig:13
- MFCC is now hard-coded to use 32-bit FFT (MFCC_FFT_BITS=32), but this Kconfig only selects MATH_FFT (which defaults to 16-bit FFT support) and does not select MATH_32BIT_FFT. This can lead to link/build failures when fft_execute_32() isn’t compiled. Select MATH_32BIT_FFT here (or make MFCC_FFT_BITS configurable and select the matching FFT width).
select CORDIC_FIXED
select MATH_32BIT_MEL_FILTERBANK
select MATH_AUDITORY
select MATH_DCT
select MATH_DECIBELS
select MATH_FFT
select MATH_MATRIX
select MATH_WINDOW
{
	source src.$index.1
	sink mfcc.$index.1
}
It's a common convention in pipeline classes to leave the last widget (copier) unconnected. The upper-level topology can then add widgets to the pipeline if needed. Also, the copier index seems to be the PCM ID.
Change the Mel filterbank 32-bit variant psy_apply_mel_filterbank_32() output from int16_t Q9.7 (was wrongly commented as Q8.7) to int32_t Q9.23 format for improved signal resolution. The output parameter type is changed from int16_t* to int32_t* in both the implementation and the header declaration. The auditory unit test is updated to allocate int32_t output and convert Q9.23 to Q9.7 for comparison against existing reference vectors. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
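The Q9.23 to Q9.7 conversion that the unit test needs can be sketched as follows. This is only an illustration: the helper name is hypothetical, and whether the actual test rounds or truncates is not stated here, so the rounding is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the PR: convert a Mel log value
 * from Q9.23 (int32_t) to Q9.7 (int16_t). Both formats have 9 integer
 * bits, so only the 16 extra fractional bits are dropped.
 */
static int16_t q9_23_to_q9_7(int32_t x)
{
	/* Add half of the dropped LSB range for rounding, then shift.
	 * Assumes >> on a negative value is an arithmetic shift, which
	 * holds for common compilers such as gcc, clang and xt-clang.
	 */
	int64_t rounded = ((int64_t)x + (1 << 15)) >> 16;

	/* Rounding can carry one step past the int16_t maximum */
	if (rounded > INT16_MAX)
		return INT16_MAX;
	if (rounded < INT16_MIN)
		return INT16_MIN;
	return (int16_t)rounded;
}
```

With this sketch, 1.0 in Q9.23 (1 << 23) maps to 128, which is 1.0 in Q9.7.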
The input samples must first be shifted left logically so that the sample's sign bit lands in the sign bit of the word, and then shifted right arithmetically into place, for the 16-bit saturation instruction to work correctly. This fixes a possible overflow with large input. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
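A portable C analogue of this shift sequence is sketched below. It is not the HiFi3/HiFi4 code from the patch: the function name is hypothetical, and the rounding step is an assumption added to show where saturation becomes necessary.

```c
#include <stdint.h>

/* Hypothetical helper: convert an S24 sample stored in a 32-bit
 * container to S16 with rounding and saturation.
 */
static int16_t s24_to_s16_round_sat(int32_t in)
{
	/* Logical left shift by 8 moves the S24 sign bit (bit 23) to
	 * bit 31 and discards whatever was in the top byte.
	 */
	int32_t q31 = (int32_t)((uint32_t)in << 8);

	/* Arithmetic right shift by 16 with rounding. Assumes >> on a
	 * negative value is arithmetic, as on gcc and clang.
	 */
	int64_t rounded = ((int64_t)q31 + (1 << 15)) >> 16;

	/* The largest positive inputs round up to 32768 and must saturate */
	if (rounded > INT16_MAX)
		return INT16_MAX;
	if (rounded < INT16_MIN)
		return INT16_MIN;
	return (int16_t)rounded;
}
```

In this sketch the maximum S24 value 0x7FFFFF rounds to 32768 before the clamp, which is the kind of large-input case where a saturating conversion is needed.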
Remove the duplicate AE_MULFP32X16X2RS_H call in the 32-bit FFT path of mfcc_apply_window(). Its result was immediately overwritten by the AE_MULFP32X16X2RS_L call on the next line, making it dead code. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Force-pushed from 1768663 to d64404a.
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
src/audio/mfcc/Kconfig:13
- COMP_MFCC now hard-codes 32-bit MFCC processing (MFCC_FFT_BITS=32) and the code calls fft_execute_32(), but this Kconfig only selects MATH_FFT (which defaults to 16-bit support) and does not select MATH_32BIT_FFT. With default Kconfig values this can lead to missing 32-bit FFT objects at link time. Please select MATH_32BIT_FFT here (or make MFCC_FFT_BITS conditional on CONFIG_MATH_32BIT_FFT).
config COMP_MFCC
tristate "MFCC component"
depends on COMP_MODULE_ADAPTER
select CORDIC_FIXED
select MATH_32BIT_MEL_FILTERBANK
select MATH_AUDITORY
select MATH_DCT
select MATH_DECIBELS
select MATH_FFT
select MATH_MATRIX
select MATH_WINDOW
/* get sink data format and period bytes */
sink_format = audio_stream_get_frm_fmt(&sinkb->stream);
sink_period_bytes = audio_stream_period_bytes(&sinkb->stream, dev->frames);
comp_info(dev, "source_format = %d, sink_format = %d",
	  source_format, sink_format);
if (audio_stream_get_size(&sinkb->stream) < sink_period_bytes) {
	comp_err(dev, "sink buffer size %d is insufficient < %d",
		 audio_stream_get_size(&sinkb->stream), sink_period_bytes);
	ret = -ENOMEM;
	goto err;
}
comp_info(dev, "source_format = %d, sink_format = %d", source_format, sink_format);

cd->config = comp_get_data_blob(cd->model_handler, &data_size, NULL);
No. The module adapter sets up the buffers, and we want to keep this kind of check out of the module code.
	"true" "platform/intel/sdw-jack-audio-feature.conf"
}

IncludeByKey.SDW_DMIC_AUDIO_FEATURE_CAPTURE {
	"true" "platform/intel/sdw-dmic-audio-feature.conf"
I want to keep it as it is and get an error if an unsupported build is attempted. A topology build that succeeds while the enabled features are not actually present is a worse option.
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_NAME "Microphone Audio Features"
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_ID 48
SDW_DMIC_AUDIO_FEATURE_CAPTURE_STREAM_NAME "Microphone Audio Features Stream"
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PIPELINE_ID 131
I prefer not to do this. The PCM ID may need adjustment later. I picked one that appeared to be free in the current topologies and also has no planned use (RESERVED) in https://github.com/thesofproject/sof/files/5954259/PCMDeviceList.txt.
Object.Base.input_audio_format [
	{
		in_bit_depth 32
		in_valid_bit_depth 32
		in_rate 48000
	}
]
Object.Base.output_audio_format [
	{
		out_bit_depth 32
		out_valid_bit_depth 32
		out_rate 16000
	}
The pipeline class is independent of the DAI type (SDW). I'll try to add three input rates for SRC and test that the kernel driver can resolve that. It should be the simplest way.
Force-pushed from d64404a to 5d61528.
This PR version removes the MFCC_FFT_BITS 16 code. Some configuration option combinations were broken, and I did not want to fix them knowing that we need 32 bits to get a small enough delta to a floating-point audio features extractor (in OpenVINO).
This patch switches MFCC_FFT_BITS from 16 to 32 to use 32-bit FFT mode for better precision in the MFCC processing pipeline.
In cepstral mode (num_ceps > 0), the 32-bit Q9.23 Mel output from psy_apply_mel_filterbank_32() is converted to 16-bit Q9.7 before the existing 16-bit DCT calculation, preserving the current DCT and cepstral lifter behavior.
In Mel-only mode, output format depends on sink format:
- s16: Q9.7 (current format, backwards compatible)
- s24: Q9.15 (one int32_t per Mel value)
- s32: Q9.23 (full precision, one int32_t per Mel value)
The mel_log_32 scratch buffer is placed after power_spectra in the fft_buf scratch area. A bounds check is added in mfcc_setup() to fail if num_mel_bins exceeds the available scratch space.
The decode_mel.m Octave script is updated with s24 and s32 format support for the changed output encoding.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
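The per-sink-format scaling described in this commit message can be illustrated with a short sketch. The enum and function names are hypothetical, and plain truncating shifts are an assumption; the component itself may round or use different helpers.

```c
#include <stdint.h>

/* Hypothetical sink format selector, for illustration only */
enum mel_sink_fmt {
	MEL_SINK_S16,
	MEL_SINK_S24,
	MEL_SINK_S32,
};

/* Scale one Q9.23 Mel log value to the fixed-point format used for
 * the given sink format. Assumes >> on negative values is arithmetic.
 */
static int32_t mel_q9_23_to_sink(int32_t mel, enum mel_sink_fmt fmt)
{
	switch (fmt) {
	case MEL_SINK_S16:
		return mel >> 16;	/* Q9.23 -> Q9.7 */
	case MEL_SINK_S24:
		return mel >> 8;	/* Q9.23 -> Q9.15 */
	case MEL_SINK_S32:
	default:
		return mel;		/* Q9.23 kept as is */
	}
}
```

A decoder then divides by 2^7, 2^15 or 2^23 respectively to recover the same real value from each format.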
Remove the MFCC_FFT_BITS == 16 code path from the MFCC component. The 16-bit FFT version's accuracy differed too much compared to reference audio features. Only 32-bit FFT is kept. This removes the MFCC_NORMALIZE_FFT logic (only needed for 16-bit), the fft_execute_16() and psy_apply_mel_filterbank_16() branches, the icomplex16 FFT buffer types, and the HiFi3/HiFi4/generic 16-bit mfcc_apply_window() and mfcc_normalize_fft_buffer() variants. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
When MFCC_FFT_BITS is 32, the HiFi3/4 mfcc_fill_fft_buffer() used AE_S16_0_XP to write 16-bit samples into 32-bit icomplex32 containers. This left the upper 16 bits of .real with stale data and .imag unzeroed, causing corrupted FFT input after the first frame when scratch buffers are reused for power_spectra and mel_log_32.
Replace all platform-specific implementations with a single generic C version in mfcc_common.c. The function performs only data copying with no arithmetic, so HiFi intrinsics provide very little benefit. The new implementation uses int32_t pointer type with matching element stride, and relies on the caller's bzero of fft_buf to keep imaginary parts zero.
Fix mel_log_32 scratch space check to use fft_buffer_size instead of assuming sizeof(icomplex32) per element, which overestimated available space by 2x.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
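The core of the fix, a full 32-bit store into each real part, might look like the sketch below. The struct and function names are stand-ins, not the SOF definitions, and the sketch indexes the struct array directly where the commit describes an int32_t pointer with a matching stride.

```c
#include <stdint.h>

/* Stand-in for the 32-bit complex FFT element, for illustration only */
struct icomplex32_sketch {
	int32_t real;
	int32_t imag;
};

/* Copy n 16-bit samples into the real parts of the FFT buffer. The
 * assignment sign-extends each sample to 32 bits, so no stale upper
 * half-word can survive from an earlier use of the scratch memory.
 * The imaginary parts are not touched; the caller is assumed to have
 * zeroed the buffer beforehand.
 */
static void fill_fft_buffer_sketch(struct icomplex32_sketch *fft_buf,
				   const int16_t *samples, int n)
{
	int i;

	for (i = 0; i < n; i++)
		fft_buf[i].real = samples[i];
}
```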
In 32-bit FFT mode the input data is 16-bit stored in the lower half of a 32-bit icomplex32 container. The AE_MULFP32X16X2RS_L intrinsic performs a Q1.31 x Q1.15 fractional multiply, so the 16-bit sample must first be shifted left by 16 to Q1.31 format. Without this shift the multiply treats the value as having 16 zero fractional bits, producing near-zero windowed output and a corrupt FFT result. Add the missing AE_SLAI32S(sample, 16) before the multiply in both HiFi3 and HiFi4 mfcc_apply_window() 32-bit paths, matching the generic C implementation. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
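The effect of the missing shift can be shown with a generic C model of the fractional multiply. This is a sketch under assumptions: the function name is hypothetical, and the rounding only approximates the intrinsic's behavior.

```c
#include <stdint.h>

/* Hypothetical model of windowing one sample in the 32-bit FFT path:
 * a Q1.15 sample is promoted to Q1.31 and multiplied by a Q1.15
 * window coefficient, giving a Q1.31 result.
 */
static int32_t window_sample_q31(int16_t sample, int16_t win_q15)
{
	/* Q1.15 -> Q1.31. Multiplying by 65536 equals << 16 but stays
	 * well defined in C for negative samples.
	 */
	int32_t s_q31 = (int32_t)sample * 65536;

	/* Q1.31 x Q1.15 gives 46 fractional bits; round and drop 15 */
	int64_t prod = (int64_t)s_q31 * win_q15;
	int64_t rounded = (prod + (1 << 14)) >> 15;

	/* Only (-1.0) x (-1.0) can exceed the Q1.31 range */
	if (rounded > INT32_MAX)
		return INT32_MAX;
	if (rounded < INT32_MIN)
		return INT32_MIN;
	return (int32_t)rounded;
}
```

For a sample of 0.5 (16384) and a window value of 0.5 (16384) the sketch returns 536870912, which is 0.25 in Q1.31. Skipping the promotion would give 8192 for the same inputs, roughly 3.8e-6 when read as Q1.31, which matches the near-zero output described in the commit message.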
Add missing cleanup for fft_plan. After mod_fft_plan_new() succeeds, failures in window setup and mel filterbank initialization jumped to free_fft_out, leaking the fft_plan. Add free_fft_plan label and route these error paths through it.
Add missing cleanup for lifter.matrix. Late validation checks (mel_log_32 space, output capacity) jumped to free_dct_matrix, skipping the lifter matrix that may have been allocated. Add free_lifter label for these paths.
Replace rfree() with mod_free() in all error cleanup labels to match the mod_zalloc() allocations and the existing mfcc_free_buffers() implementation.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
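The cleanup ordering that this commit restores follows the usual goto ladder pattern, sketched here with plain malloc()/free() in place of the SOF allocators. All names and the fail_at hook are hypothetical; only the label order mirrors the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state, loosely modeled on the resources named above */
struct setup_sketch {
	void *fft_out;
	void *fft_plan;
	void *dct_matrix;
	void *lifter;
};

/* fail_at is a test hook: 1 simulates a window setup failure after
 * the FFT plan exists, 2 simulates a late validation failure, and
 * 0 means no simulated failure.
 */
static int setup_sketch_init(struct setup_sketch *s, int fail_at)
{
	memset(s, 0, sizeof(*s));

	s->fft_out = malloc(64);
	if (!s->fft_out)
		return -1;

	s->fft_plan = malloc(64);
	if (!s->fft_plan)
		goto free_fft_out;

	if (fail_at == 1)
		goto free_fft_plan;	/* must not skip the plan */

	s->dct_matrix = malloc(64);
	if (!s->dct_matrix)
		goto free_fft_plan;

	s->lifter = malloc(64);
	if (!s->lifter)
		goto free_dct_matrix;

	if (fail_at == 2)
		goto free_lifter;	/* must not skip the lifter */

	return 0;

	/* Labels release resources in reverse order of allocation */
free_lifter:
	free(s->lifter);
	s->lifter = NULL;
free_dct_matrix:
	free(s->dct_matrix);
	s->dct_matrix = NULL;
free_fft_plan:
	free(s->fft_plan);
	s->fft_plan = NULL;
free_fft_out:
	free(s->fft_out);
	s->fft_out = NULL;
	return -1;
}
```

Each failure jumps to the label matching the most recently acquired resource, so everything allocated before the failure is released on the way out.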
Refactor run_mfcc.sh into functions for input conversion and testbench execution to reduce code duplication. Add Xtensa testbench support when XTENSA_PATH environment variable is set, producing xt_ prefixed output files.
Add decode_all.m Octave script to decode and plot all MFCC cepstral and Mel spectrogram output files from run_mfcc.sh, including Xtensa variants.
Update README.txt to document the current run_mfcc.sh output files, Xtensa support, and decode_all.m usage.
Export XTENSA_PATH in rebuild-testbench.sh so that run_mfcc.sh can find the Xtensa toolchain path for the testbench build.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The checks previously done in prepare() are done in the module adapter. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The module-copier makes it possible to branch the capture pipeline for different processing. In this patch series the module-copier is added so that audio features extraction can run from the shared headset microphone endpoint. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add a new host-gateway-src-mfcc-capture pipeline class that chains SRC (48 kHz to 16 kHz) with the MFCC component for audio features extraction.
Two new platform configuration files are added:
- sdw-jack-audio-feature.conf: taps the SoundWire jack capture path (module-copier 11) into an SRC+MFCC pipeline (pipeline 130, PCM 47)
- sdw-dmic-audio-feature.conf: taps the SoundWire DMIC capture path (module-copier 41) into an SRC+MFCC pipeline (pipeline 131, PCM 48)
Both are gated by new IncludeByKey defines SDW_JACK_AUDIO_FEATURE_CAPTURE and SDW_DMIC_AUDIO_FEATURE_CAPTURE (default false) in cavs-sdw.conf.
Development topology targets are added for MTL rt713 and ARL cs42l43+cs35l56 configurations with MFCC features capture enabled.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Force-pushed from 5d61528 to 8c02462.
config COMP_MFCC
	tristate "MFCC component"
	depends on COMP_MODULE_ADAPTER
	select CORDIC_FIXED
	select MATH_16BIT_MEL_FILTERBANK
	select MATH_32BIT_MEL_FILTERBANK
	select MATH_AUDITORY
	select MATH_DCT
This PR contains more updates for MFCC.