app_confbridge: Update dsp_silence_threshold and dsp_talking_threshold docs.

The dsp_talking_threshold does not represent time in milliseconds. It represents the average magnitude per sample in the audio packets. This is what the DSP uses to determine if a packet is silence or talking/noise.

Change-Id: If6f939c100eb92a5ac6c21236559018eeaf58443
2025-09-06 12:36:58 +00:00 · 2018-01-30 15:00:32 -06:00
parent 6c5e3226ec
commit b9024197ab
7 changed files with 151 additions and 110 deletions
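
For readers unfamiliar with the distinction the commit message draws, here is a minimal sketch of how the two thresholds could interact. It is not the Asterisk DSP code: the function names (frame_average_magnitude, frame_is_talking, talking_stopped) and the assumption of 16-bit signed linear samples are hypothetical and chosen only for illustration. The magnitude is the average absolute sample value of a frame, compared against dsp_talking_threshold; the silence time is accumulated in milliseconds and compared against dsp_silence_threshold.

```c
#include <stdlib.h>

/* Hypothetical helper, not taken from main/dsp.c: average absolute value
 * of all samples in one frame of 16-bit signed linear audio. */
int frame_average_magnitude(const short *samples, int len)
{
	long sum = 0;
	int i;

	if (len <= 0) {
		return 0;
	}
	for (i = 0; i < len; i++) {
		sum += abs(samples[i]);
	}
	return (int) (sum / len);
}

/* Non-zero when the frame counts as talking/noise, i.e. its average
 * magnitude is at or above the talking threshold (a magnitude, not ms). */
int frame_is_talking(const short *samples, int len, int talking_threshold)
{
	return frame_average_magnitude(samples, len) >= talking_threshold;
}

/* Hypothetical tracker: accumulate silence in ms since the last talking
 * frame and report when it reaches the silence threshold (a time in ms).
 * Whether the real bridge uses >= or > here is an assumption. */
int talking_stopped(int *silence_ms, int is_talking, int frame_ms,
	int silence_threshold)
{
	if (is_talking) {
		*silence_ms = 0;
	} else {
		*silence_ms += frame_ms;
	}
	return *silence_ms >= silence_threshold;
}
```

With the defaults named in this change, a caller would pass 160 as talking_threshold and 2500 as silence_threshold. Raising the first makes quiet speech register as silence, while lowering the second makes short pauses end a talking period sooner.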
--- a/apps/confbridge/conf_config_parser.c
+++ b/apps/confbridge/conf_config_parser.c
@@ -144,72 +144,66 @@
 					</para></description>
 				</configOption>
 				<configOption name="dsp_silence_threshold">
-					<synopsis>The number of milliseconds of detected silence necessary to trigger silence detection</synopsis>
+					<synopsis>The number of milliseconds of silence necessary to declare talking stopped.</synopsis>
-					<description><para>
+					<description>
-					The time in milliseconds of sound falling within the what
+						<para>The time in milliseconds of sound falling below the
-					the dsp has established as baseline silence before a user
+						<replaceable>dsp_talking_threshold</replaceable> option when
-					is considered be silent.  This value affects several
+						a user is considered to stop talking.  This value affects several
-					operations and should not be changed unless the impact
+						operations and should not be changed unless the impact on call
-					on call quality is fully understood.</para>
+						quality is fully understood.
-					<para>What this value affects internally:</para>
+						</para>
-					<para>
+						<para>What this value affects internally:
-						1. When talk detection AMI events are enabled, this value
+						</para>
 						<para>1. When talk detection AMI events are enabled, this value
 						determines when the user has stopped talking after a
 						period of talking.  If this value is set too low
 						AMI events indicating the user has stopped talking
 						may get falsely sent out when the user briefly pauses
 						during mid sentence.
-					</para>
+						</para>
-					<para>
+						<para>2. The <replaceable>drop_silence</replaceable> option
-						2. The <replaceable>drop_silence</replaceable> option depends on this value to
+						depends on this value to determine when the user's audio should
-						determine when the user's audio should begin to be
+						begin to be dropped from the conference bridge after the user
 						dropped from the conference bridge after the user
 						stops talking.  If this value is set too low the user's
-						audio stream may sound choppy to the other participants.
+						audio stream may sound choppy to the other participants.  This
-						This is caused by the user transitioning constantly from
+						is caused by the user transitioning constantly from silence to
-						silence to talking during mid sentence.
+						talking during mid sentence.
-					</para>
+						</para>
-					<para>
+						<para>The best way to approach this option is to set it slightly
-						The best way to approach this option is to set it slightly above
+						above the maximum amount of milliseconds of silence a user may
-						the maximum amount of ms of silence a user may generate during
+						generate during natural speech.
-						natural speech.
+						</para>
-					</para>
+						<para>Valid values are 1 through 2^31.</para>
 					<para>By default this value is 2500ms. Valid values are 1 through 2^31.</para>
 					</description>
 				</configOption>
 				<configOption name="dsp_talking_threshold">
-					<synopsis>The number of milliseconds of detected non-silence necessary to triger talk detection</synopsis>
+					<synopsis>Average magnitude threshold to determine talking.</synopsis>
-					<description><para>
+					<description>
-						The time in milliseconds of sound above what the dsp has
+						<para>The minimum average magnitude per sample in a frame
-						established as base line silence for a user before a user
+						for the DSP to consider talking/noise present.  A value below
-						is considered to be talking.  This value affects several
+						this level is considered silence.  This value affects several
-						operations and should not be changed unless the impact on
+						operations and should not be changed unless the impact on call
-						call quality is fully understood.</para>
+						quality is fully understood.
 						<para>
 						What this value affects internally:
 						</para>
-						<para>
+						<para>What this value affects internally:
 						1. Audio is only mixed out of a user's incoming audio stream
 						if talking is detected.  If this value is set too
 						loose the user will hear themselves briefly each
 						time they begin talking until the dsp has time to
 						establish that they are in fact talking.
 						</para>
-						<para>
+						<para>1. Audio is only mixed out of a user's incoming audio
-						2. When talk detection AMI events are enabled, this value
+						stream if talking is detected.  If this value is set too
 						high the user will hear himself talking.
 						</para>
 						<para>2. When talk detection AMI events are enabled, this value
 						determines when talking has begun which results in
-						an AMI event to fire.  If this value is set too tight
+						an AMI event to fire.  If this value is set too low
 						AMI events may be falsely triggered by variants in
 						room noise.
 						</para>
-						<para>
+						<para>3. The <replaceable>drop_silence</replaceable> option
-						3. The <replaceable>drop_silence</replaceable> option depends on this value to determine
+						depends on this value to determine when the user's audio should
-						when the user's audio should be mixed into the bridge
+						be mixed into the bridge after periods of silence.  If this value
-						after periods of silence.  If this value is too loose
+						is too high the user's speech will get discarded as they will
-						the beginning of a user's speech will get cut off as they
+						be considered silent.
 						transition from silence to talking.
 						</para>
-						<para>By default this value is 160 ms. Valid values are 1 through 2^31</para>
+						<para>Valid values are 1 through 2^15.</para>
 					</description>
 				</configOption>
 				<configOption name="jitterbuffer">
@@ -1479,7 +1473,7 @@ static char *handle_cli_confbridge_show_user_profile(struct ast_cli_entry *e, in
 		"enabled" : "disabled");
 	ast_cli(a->fd,"Silence Threshold:       %ums\n",
 		u_profile.silence_threshold);
-	ast_cli(a->fd,"Talking Threshold:       %ums\n",
+	ast_cli(a->fd,"Talking Threshold:       %u\n",
 		u_profile.talking_threshold);
 	ast_cli(a->fd,"Denoise:                 %s\n",
 		u_profile.flags & USER_OPT_DENOISE ?
--- a/apps/confbridge/include/confbridge.h
+++ b/apps/confbridge/include/confbridge.h
@@ -41,7 +41,10 @@
 #define DEFAULT_BRIDGE_PROFILE "default_bridge"
 #define DEFAULT_MENU_PROFILE "default_menu"
 /*! Default minimum average magnitude threshold to determine talking by the DSP. */
 #define DEFAULT_TALKING_THRESHOLD 160
 /*! Default time in ms of silence necessary to declare talking stopped by the bridge. */
 #define DEFAULT_SILENCE_THRESHOLD 2500
 enum user_profile_flags {
@@ -140,9 +143,9 @@ struct user_profile {
 	char announcement[PATH_MAX];
 	unsigned int flags;
 	unsigned int announce_user_count_all_after;
-	/*! The time in ms of talking before a user is considered to be talking by the dsp. */
+	/*! Minimum average magnitude threshold to determine talking by the DSP. */
 	unsigned int talking_threshold;
-	/*! The time in ms of silence before a user is considered to be silent by the dsp. */
+	/*! Time in ms of silence necessary to declare talking stopped by the bridge. */
 	unsigned int silence_threshold;
 	/*! The time in ms the user may stay in the confbridge */
 	unsigned int timeout;
--- a/bridges/bridge_softmix.c
+++ b/bridges/bridge_softmix.c
@@ -53,9 +53,16 @@
 /*! \brief Number of mixing iterations to perform between gathering statistics. */
 #define SOFTMIX_STAT_INTERVAL 100
-/* This is the threshold in ms at which a channel's own audio will stop getting
+/*!
- * mixed out its own write audio stream because it is not talking. */
+ * \brief Default time in ms of silence necessary to declare talking stopped by the bridge.
 *
 * \details
 * This is the time at which a channel's own audio will stop getting
 * mixed out of its own write audio stream because it is no longer talking.
 */
 #define DEFAULT_SOFTMIX_SILENCE_THRESHOLD 2500
 /*! Default minimum average magnitude threshold to determine talking by the DSP. */
 #define DEFAULT_SOFTMIX_TALKING_THRESHOLD 160
 #define SOFTBRIDGE_VIDEO_DEST_PREFIX "softbridge_dest"
--- a/configs/samples/confbridge.conf.sample
+++ b/configs/samples/confbridge.conf.sample
@@ -49,59 +49,67 @@ type=user
                       ; noise from the conference. Highly recommended for large conferences
                       ; due to its performance enhancements.
-;dsp_talking_threshold=128  ; The time in milliseconds of sound above what the dsp has
+;dsp_talking_threshold=128  ; Average magnitude threshold to determine talking.
-                            ; established as base line silence for a user before a user
+                            ;
-                            ; is considered to be talking.  This value affects several
+                            ; The minimum average magnitude per sample in a frame for the
                            ; DSP to consider talking/noise present.  A value below this
                            ; level is considered silence.  This value affects several
                            ; operations and should not be changed unless the impact on
                            ; call quality is fully understood.
                            ;
                            ; What this value affects internally:
                            ;
-                            ; 1. Audio is only mixed out of a user's incoming audio stream
+                            ; 1. Audio is only mixed out of a user's incoming audio
-                            ;    if talking is detected.  If this value is set too
+                            ;    stream if talking is detected.  If this value is set too
-                            ;    loose the user will hear themselves briefly each
+                            ;    high the user will hear himself talking.
                            ;    time they begin talking until the dsp has time to
                            ;    establish that they are in fact talking.
                            ; 2. When talk detection AMI events are enabled, this value
                            ;    determines when talking has begun which results in
                            ;    an AMI event to fire.  If this value is set too tight
                            ;    AMI events may be falsely triggered by variants in
                            ;    room noise.
                            ; 3. The drop_silence option depends on this value to determine
                            ;    when the user's audio should be mixed into the bridge
                            ;    after periods of silence.  If this value is too loose
                            ;    the beginning of a user's speech will get cut off as they
                            ;    transition from silence to talking.
                            ;
-                            ; By default this value is 160 ms. Valid values are 1 through 2^31
+                            ; 2. When talk detection AMI events are enabled, this value
                            ;    determines when talking has begun which results in an
                            ;    AMI event to fire.  If this value is set too low AMI
                            ;    events may be falsely triggered by variants in room
                            ;    noise.
                            ;
                            ; 3. The 'drop_silence' option depends on this value to
                            ;    determine when the user's audio should be mixed into the
                            ;    bridge after periods of silence.  If this value is too
                            ;    high the user's speech will get discarded as they will
                            ;    be considered silent.
                            ;
                            ; Valid values are 1 through 2^15.
                            ; By default this value is 160.
-;dsp_silence_threshold=2000 ; The time in milliseconds of sound falling within the what
+;dsp_silence_threshold=2000 ; The number of milliseconds of silence necessary to declare
-                            ; the dsp has established as baseline silence before a user
+                            ; talking stopped.
-                            ; is considered be silent.  This value affects several
+                            ;
-                            ; operations and should not be changed unless the impact
+                            ; The time in milliseconds of sound falling below the
-                            ; on call quality is fully understood.
+                            ; 'dsp_talking_threshold' option when a user is considered to
                            ; stop talking.  This value affects several operations and
                            ; should not be changed unless the impact on call quality is
                            ; fully understood.
                            ;
                            ; What this value affects internally:
                            ;
                            ; 1. When talk detection AMI events are enabled, this value
                            ;    determines when the user has stopped talking after a
-                            ;    period of talking.  If this value is set too low
+                            ;    period of talking.  If this value is set too low AMI
-                            ;    AMI events indicating the user has stopped talking
+                            ;    events indicating the user has stopped talking may get
-                            ;    may get falsely sent out when the user briefly pauses
+                            ;    falsely sent out when the user briefly pauses during mid
-                            ;    during mid sentence.
+                            ;    sentence.
-                            ; 2. The drop_silence option depends on this value to
+                            ;
                            ; 2. The 'drop_silence' option depends on this value to
                            ;    determine when the user's audio should begin to be
-                            ;    dropped from the conference bridge after the user
+                            ;    dropped from the conference bridge after the user stops
-                            ;    stops talking.  If this value is set too low the user's
+                            ;    talking.  If this value is set too low the user's audio
-                            ;    audio stream may sound choppy to the other participants.
+                            ;    stream may sound choppy to the other participants.  This
-                            ;    This is caused by the user transitioning constantly from
+                            ;    is caused by the user transitioning constantly from
                            ;    silence to talking during mid sentence.
                            ;
-                            ; The best way to approach this option is to set it slightly above
+                            ; The best way to approach this option is to set it slightly
-                            ; the maximum amount of ms of silence a user may generate during
+                            ; above the maximum amount of milliseconds of silence a user
-                            ; natural speech.
+                            ; may generate during natural speech.
                            ;
-                            ; By default this value is 2500ms. Valid values are 1 through 2^31
+                            ; Valid values are 1 through 2^31.
                            ; By default this value is 2500ms.
 ;talk_detection_events=yes ; This option sets whether or not notifications of when a user
                           ; begins and ends talking should be sent out as events over AMI.
--- a/include/asterisk/bridge_technology.h
+++ b/include/asterisk/bridge_technology.h
@@ -46,11 +46,9 @@ enum ast_bridge_preference {
 * performing talking optimizations.
 */
 struct ast_bridge_tech_optimizations {
-	/*! The amount of time in ms that talking must be detected before
+	/*! Minimum average magnitude threshold to determine talking by the DSP. */
 	 *  the dsp determines that talking has occurred */
 	unsigned int talking_threshold;
-	/*! The amount of time in ms that silence must be detected before
+	/*! Time in ms of silence necessary to declare talking stopped by the bridge. */
 	 *  the dsp determines that talking has stopped */
 	unsigned int silence_threshold;
 	/*! Whether or not the bridging technology should drop audio
 	 *  detected as silence from the mix. */
--- a/include/asterisk/dsp.h
+++ b/include/asterisk/dsp.h
@@ -87,7 +87,7 @@ void ast_dsp_free(struct ast_dsp *dsp);
 * created with */
 unsigned int ast_dsp_get_sample_rate(const struct ast_dsp *dsp);
-/*! \brief Set threshold value for silence */
+/*! \brief Set the minimum average magnitude threshold to determine talking by the DSP. */
 void ast_dsp_set_threshold(struct ast_dsp *dsp, int threshold);
 /*! \brief Set number of required cadences for busy */
@@ -106,19 +106,41 @@ int ast_dsp_set_call_progress_zone(struct ast_dsp *dsp, char *zone);
   busies, and call progress, all dependent upon which features are enabled */
 struct ast_frame *ast_dsp_process(struct ast_channel *chan, struct ast_dsp *dsp, struct ast_frame *inf);
-/*! \brief Return non-zero if this is silence.  Updates "totalsilence" with the total
+/*!
-   number of seconds of silence  */
+ * \brief Process the audio frame for silence.
 *
 * \param dsp DSP processing audio media.
 * \param f Audio frame to process.
 * \param totalsilence Variable to set to the total accumulated silence in ms
 * seen by the DSP since the last noise.
 *
 * \return Non-zero if the frame is silence.
 */
 int ast_dsp_silence(struct ast_dsp *dsp, struct ast_frame *f, int *totalsilence);
-/*! \brief Return non-zero if this is silence.  Updates "totalsilence" with the total
+/*!
-   number of seconds of silence. Returns the average energy of the samples in the frame
+ * \brief Process the audio frame for silence.
-   in frames_energy variable. */
+ *
 * \param dsp DSP processing audio media.
 * \param f Audio frame to process.
 * \param totalsilence Variable to set to the total accumulated silence in ms
 * seen by the DSP since the last noise.
 * \param frames_energy Variable to set to the average energy of the samples in the frame.
 *
 * \return Non-zero if the frame is silence.
 */
 int ast_dsp_silence_with_energy(struct ast_dsp *dsp, struct ast_frame *f, int *totalsilence, int *frames_energy);
 /*!
- * \brief Return non-zero if this is noise.  Updates "totalnoise" with the total
+ * \brief Process the audio frame for noise.
 * number of seconds of noise
 * \since 1.6.1
 *
 * \param dsp DSP processing audio media.
 * \param f Audio frame to process.
 * \param totalnoise Variable to set to the total accumulated noise in ms
 * seen by the DSP since the last silence.
 *
 * \return Non-zero if the frame is noise.
 */
 int ast_dsp_noise(struct ast_dsp *dsp, struct ast_frame *f, int *totalnoise);
--- a/main/dsp.c
+++ b/main/dsp.c
@@ -122,12 +122,19 @@ static struct progress {
 	{ GSAMP_SIZE_UK, { 350, 400, 440 } },				/*!< UK */
 };
-/*!\brief This value is the minimum threshold, calculated by averaging all
+/*!
- * of the samples within a frame, for which a frame is determined to either
+ * \brief Default minimum average magnitude threshold to determine talking/noise by the DSP.
- * be silence (below the threshold) or noise (above the threshold).  Please
+ *
- * note that while the default threshold is an even exponent of 2, there is
+ * \details
- * no requirement that it be so.  The threshold will accept any value between
+ * The magnitude calculated for this threshold is determined by
- * 0 and 32767.
+ * averaging the absolute value of all samples within a frame.
 *
 * This value is the threshold for which a frame's average magnitude
 * is determined to either be silence (below the threshold) or
 * noise/talking (at or above the threshold).  Please note that while
 * the default threshold is an even exponent of 2, there is no
 * requirement that it be so.  The threshold will work for any value
 * between 1 and 2^15.
 */
 #define DEFAULT_THRESHOLD	512
@@ -397,7 +404,9 @@ typedef struct {
 struct ast_dsp {
 	struct ast_frame f;
 	int threshold;
 	/*! Accumulated total silence in ms since last talking/noise. */
 	int totalsilence;
 	/*! Accumulated total talking/noise in ms since last silence. */
 	int totalnoise;
 	int features;
 	int ringtimeout;