Faculty of Engineering


Research Highlights

Separating Source Signals from Noisy Mixed Signals in Reverberant Environments

Separating a source signal from a mixture of multiple signals is a long-standing research topic in the field of signal processing, and one that is still under study today.

In the world of acoustic research, the term “cocktail party problem” is well known. It describes a scenario in which several talkers are speaking simultaneously and we want to separate each person’s speech onto its own audio channel. For that purpose, researchers have been trying to separate each source from multichannel signals recorded by several microphones. Such separation is indispensable in many applications, such as automatic speech recognition and surveillance systems with audio recognition capabilities, because current AI-based recognition systems perform poorly on multiple overlapping voices.

One solution to this problem is “beam forming.” By adding small, appropriately chosen delays to the multi-microphone signals and summing them, sound can be recorded with directional sensitivity. However, the separation capability of this method is not sufficient for speech in an ordinary room. One reason is that sensitivity control is limited, especially at low frequencies. Another reason is that wavefronts from a source arrive at a microphone from many directions because sound reflects off walls and other surfaces, producing reverberation. More elaborate beam forming was developed to take account of all such reflections by using pre-measured environmental data called relative transfer functions (RTFs).
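The delay-and-sum idea described above can be illustrated with a minimal sketch. It is not taken from the research itself and rests on several simplifying assumptions: delays are whole numbers of samples, the room has no reflections, and white noise stands in for both the target talker and the interfering talker. The function name `delay_and_sum`, the helper `delayed`, and all delay values are hypothetical and chosen only for illustration.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Align each microphone channel by its arrival delay and average them.

    signals: array of shape (num_mics, num_samples)
    delays:  arrival delay of the target at each microphone, in samples
             (assumed known here; non-negative integers below num_samples)
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    output = np.zeros(num_samples)
    for channel, delay in zip(signals, delays):
        if not 0 <= delay < num_samples:
            raise ValueError("delay must be in [0, num_samples)")
        # Advance the channel so the target component lines up across mics.
        aligned = np.zeros(num_samples)
        aligned[:num_samples - delay] = channel[delay:]
        output += aligned
    return output / num_mics


def delayed(x, d):
    """Return x delayed by d samples, zero-padded at the start."""
    y = np.zeros_like(x)
    y[d:] = x[:len(x) - d]
    return y


# Toy scene: two white-noise sources reach four microphones with
# different (hypothetical) delays because they come from different directions.
rng = np.random.default_rng(0)
n = 4000
target = rng.standard_normal(n)
interferer = rng.standard_normal(n)
target_delays = [0, 3, 6, 9]
interferer_delays = [9, 6, 3, 0]

mics = np.stack([
    delayed(target, dt) + delayed(interferer, di)
    for dt, di in zip(target_delays, interferer_delays)
])

# Steer toward the target: its copies add coherently, the interferer's do not.
enhanced = delay_and_sum(mics, target_delays)

# Mean squared deviation from the clean target, before and after.
error_before = np.mean((mics[0] - target) ** 2)
error_after = np.mean((enhanced - target) ** 2)
```

In this idealized setting the interfering source is attenuated because its copies are misaligned when summed. In a real room, each source also arrives by many reflected paths, so a single delay per microphone no longer describes it, which is the limitation the paragraph above points out.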

For the last two decades, blind source separation (BSS) has been studied as a way of estimating sound source signals from multi-microphone signals alone. BSS achieved great success in non-reverberant environments. Although this research has progressed thanks to many ideas, such as frequency-domain BSS, likelihood-based formulation, and permutation alignment, the separation performance of BSS has remained insufficient in reverberant environments.

However, it has recently been discovered that beam forming can be further improved by combining it with a sparse signal processing approach, which is used in Dr. Emura’s research. Furthermore, strictly applying the probabilistic model of source signals that underlies sparse signal processing turned out to significantly improve the separation performance of BSS under reverberant conditions.

As with most signal processing methods, the separation performance of BSS degrades significantly in noisy environments. Dr. Emura is therefore working to improve BSS methods so that they can handle noisy environments, using theories and techniques from constrained optimization, probabilistic modeling, and machine learning.