- """
- The AudioToTextRecorder class facilitates
- fast speech-to-text transcription.
- The class employs the faster_whisper library to transcribe the recorded audio
- into text using machine learning models, which can be run either on a GPU or
- CPU. Voice activity detection (VAD) is built in, meaning the software can
- automatically start or stop recording based on the presence or absence of
- speech. It integrates wake word detection through the pvporcupine library,
- allowing the software to initiate recording when a specific word or phrase
- is spoken. The system provides real-time feedback and can be further
- customized.
- Features:
- - Voice Activity Detection: Automatically starts/stops recording when speech
- is detected or when speech ends.
- - Wake Word Detection: Starts recording when a specified wake word (or words)
- is detected.
- - Event Callbacks: Customizable callbacks for when recording starts
- or finishes.
- - Fast Transcription: Returns the transcribed text from the audio as fast
- as possible.
- Author: Kolja Beigel
- """
- import torch.multiprocessing as mp
- from typing import List, Union
- import faster_whisper
- import collections
- import numpy as np
- import pvporcupine
- import traceback
- import threading
- import webrtcvad
- import itertools
- import pyaudio
- import logging
- import struct
- import torch
- import halo
- import time
- import os
- import re
- import gc
- INIT_MODEL_TRANSCRIPTION = "tiny"
- INIT_MODEL_TRANSCRIPTION_REALTIME = "tiny"
- INIT_REALTIME_PROCESSING_PAUSE = 0.2
- INIT_SILERO_SENSITIVITY = 0.4
- INIT_WEBRTC_SENSITIVITY = 3
- INIT_POST_SPEECH_SILENCE_DURATION = 0.6
- INIT_MIN_LENGTH_OF_RECORDING = 0.5
- INIT_MIN_GAP_BETWEEN_RECORDINGS = 0
- INIT_WAKE_WORDS_SENSITIVITY = 0.6
- INIT_PRE_RECORDING_BUFFER_DURATION = 1.0
- INIT_WAKE_WORD_ACTIVATION_DELAY = 0.0
- INIT_WAKE_WORD_TIMEOUT = 5.0
- ALLOWED_LATENCY_LIMIT = 10  # max unprocessed chunks in the audio queue before old ones are discarded
- TIME_SLEEP = 0.02  # idle sleep interval for polling loops (seconds)
- SAMPLE_RATE = 16000  # 16 kHz; matches what the VAD engines and Porcupine expect
- BUFFER_SIZE = 512  # audio samples per chunk
- INT16_MAX_ABS_VALUE = 32768.0  # for normalizing int16 PCM to [-1, 1]
- class AudioToTextRecorder:
- """
- A class responsible for capturing audio from the microphone, detecting
- voice activity, and then transcribing the captured audio using the
- `faster_whisper` model.
- """
- def __init__(self,
- model: str = INIT_MODEL_TRANSCRIPTION,
- language: str = "",
- compute_type: str = "default",
- input_device_index: int = 0,
- gpu_device_index: Union[int, List[int]] = 0,
- on_recording_start=None,
- on_recording_stop=None,
- on_transcription_start=None,
- ensure_sentence_starting_uppercase=True,
- ensure_sentence_ends_with_period=True,
- use_microphone=True,
- spinner=True,
- level=logging.WARNING,
- # Realtime transcription parameters
- enable_realtime_transcription=False,
- realtime_model_type=INIT_MODEL_TRANSCRIPTION_REALTIME,
- realtime_processing_pause=INIT_REALTIME_PROCESSING_PAUSE,
- on_realtime_transcription_update=None,
- on_realtime_transcription_stabilized=None,
- # Voice activation parameters
- silero_sensitivity: float = INIT_SILERO_SENSITIVITY,
- silero_use_onnx: bool = False,
- webrtc_sensitivity: int = INIT_WEBRTC_SENSITIVITY,
- post_speech_silence_duration: float = (
- INIT_POST_SPEECH_SILENCE_DURATION
- ),
- min_length_of_recording: float = (
- INIT_MIN_LENGTH_OF_RECORDING
- ),
- min_gap_between_recordings: float = (
- INIT_MIN_GAP_BETWEEN_RECORDINGS
- ),
- pre_recording_buffer_duration: float = (
- INIT_PRE_RECORDING_BUFFER_DURATION
- ),
- on_vad_detect_start=None,
- on_vad_detect_stop=None,
- # Wake word parameters
- wake_words: str = "",
- wake_words_sensitivity: float = INIT_WAKE_WORDS_SENSITIVITY,
- wake_word_activation_delay: float = (
- INIT_WAKE_WORD_ACTIVATION_DELAY
- ),
- wake_word_timeout: float = INIT_WAKE_WORD_TIMEOUT,
- on_wakeword_detected=None,
- on_wakeword_timeout=None,
- on_wakeword_detection_start=None,
- on_wakeword_detection_end=None,
- ):
- """
- Initializes an audio recorder with transcription
- and wake word detection.
- Args:
- - model (str, default="tiny"): Specifies the size of the transcription
- model to use or the path to a converted model directory.
- Valid options are 'tiny', 'tiny.en', 'base', 'base.en',
- 'small', 'small.en', 'medium', 'medium.en', 'large-v1',
- 'large-v2'.
- If a specific size is provided, the model is downloaded
- from the Hugging Face Hub.
- - language (str, default=""): Language code for speech-to-text engine.
- If not specified, the model will attempt to detect the language
- automatically.
- - compute_type (str, default="default"): Specifies the type of
- computation to be used for transcription.
- See https://opennmt.net/CTranslate2/quantization.html.
- - input_device_index (int, default=0): The index of the audio input
- device to use.
- - gpu_device_index (int, default=0): Device ID to use.
- The model can also be loaded on multiple GPUs by passing a list of
- IDs (e.g. [0, 1, 2, 3]). In that case, multiple transcriptions can
- run in parallel when transcribe() is called from multiple Python
- threads.
- - on_recording_start (callable, default=None): Callback function to be
- called when recording of audio to be transcribed starts.
- - on_recording_stop (callable, default=None): Callback function to be
- called when recording of audio to be transcribed stops.
- - on_transcription_start (callable, default=None): Callback function
- to be called when transcription of audio to text starts.
- - ensure_sentence_starting_uppercase (bool, default=True): Ensures
- that every sentence detected by the algorithm starts with an
- uppercase letter.
- - ensure_sentence_ends_with_period (bool, default=True): Ensures that
- every sentence that doesn't end with punctuation such as "?" or "!"
- ends with a period.
- - use_microphone (bool, default=True): Specifies whether to use the
- microphone as the audio input source. If set to False, the
- audio input source will be the audio data sent through the
- feed_audio() method.
- - spinner (bool, default=True): Show spinner animation with current
- state.
- - level (int, default=logging.WARNING): Logging level.
- - enable_realtime_transcription (bool, default=False): Enables or
- disables real-time transcription of audio. When set to True, the
- audio will be transcribed continuously as it is being recorded.
- - realtime_model_type (str, default="tiny"): Specifies the machine
- learning model to be used for real-time transcription. Valid
- options include 'tiny', 'tiny.en', 'base', 'base.en', 'small',
- 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2'.
- - realtime_processing_pause (float, default=0.2): Specifies the time
- interval in seconds after a chunk of audio gets transcribed. Lower
- values will result in more "real-time" (frequent) transcription
- updates but may increase computational load.
- - on_realtime_transcription_update (callable, default=None): A callback
- function that is triggered whenever there's an update in the
- real-time transcription. The function is called with the newly
- transcribed text as its argument.
- - on_realtime_transcription_stabilized (callable, default=None): A
- callback function that is triggered when the transcribed text
- stabilizes in quality. The stabilized text is generally more
- accurate but may arrive with a slight delay compared to the
- regular real-time updates.
- - silero_sensitivity (float, default=0.4): Sensitivity
- for the Silero Voice Activity Detection model ranging from 0
- (least sensitive) to 1 (most sensitive).
- - silero_use_onnx (bool, default=False): Enables usage of the
- pre-trained model from Silero in the ONNX (Open Neural Network
- Exchange) format instead of the PyTorch format. This is
- recommended for faster performance.
- - webrtc_sensitivity (int, default=3): Sensitivity
- for the WebRTC Voice Activity Detection engine ranging from 0
- (least aggressive / most sensitive) to 3 (most aggressive,
- least sensitive).
- - post_speech_silence_duration (float, default=0.6): Duration in
- seconds of silence that must follow speech before the recording
- is considered to be completed. This ensures that any brief
- pauses during speech don't prematurely end the recording.
- - min_gap_between_recordings (float, default=0): Specifies the
- minimum time interval in seconds that should exist between the
- end of one recording session and the beginning of another to
- prevent rapid consecutive recordings.
- - min_length_of_recording (float, default=0.5): Specifies the minimum
- duration in seconds that a recording session should last to ensure
- meaningful audio capture, preventing excessively short or
- fragmented recordings.
- - pre_recording_buffer_duration (float, default=1.0): Duration in
- seconds for the audio buffer to maintain pre-roll audio
- (compensates for the latency of speech activity detection).
- - on_vad_detect_start (callable, default=None): Callback function to
- be called when the system starts listening for voice activity.
- - on_vad_detect_stop (callable, default=None): Callback function to be
- called when the system stops listening for voice activity.
- - wake_words (str, default=""): Comma-separated string of wake words to
- initiate recording. Supported wake words include:
- 'alexa', 'americano', 'blueberry', 'bumblebee', 'computer',
- 'grapefruit', 'grasshopper', 'hey google', 'hey siri', 'jarvis',
- 'ok google', 'picovoice', 'porcupine', 'terminator'.
- - wake_words_sensitivity (float, default=0.6): Sensitivity for wake
- word detection, ranging from 0 (least sensitive) to 1 (most
- sensitive).
- - wake_word_activation_delay (float, default=0): Duration in seconds
- after the start of monitoring before the system switches to wake
- word activation if no voice is initially detected. If set to
- zero, the system uses wake word activation immediately.
- - wake_word_timeout (float, default=5): Duration in seconds after a
- wake word is recognized. If no subsequent voice activity is
- detected within this window, the system transitions back to an
- inactive state, awaiting the next wake word or voice activation.
- - on_wakeword_detected (callable, default=None): Callback function to
- be called when a wake word is detected.
- - on_wakeword_timeout (callable, default=None): Callback function to
- be called when the system goes back to an inactive state because no
- speech was detected after wake word activation.
- - on_wakeword_detection_start (callable, default=None): Callback
- function to be called when the system starts to listen for wake
- words
- - on_wakeword_detection_end (callable, default=None): Callback
- function to be called when the system stops listening for
- wake words (e.g. because of timeout or a detected wake word).
- Raises:
- Exception: Errors related to initializing transcription
- model, wake word detection, or audio recording.
- """
- self.language = language
- self.compute_type = compute_type
- self.input_device_index = input_device_index
- self.gpu_device_index = gpu_device_index
- self.wake_words = wake_words
- self.wake_word_activation_delay = wake_word_activation_delay
- self.wake_word_timeout = wake_word_timeout
- self.ensure_sentence_starting_uppercase = (
- ensure_sentence_starting_uppercase
- )
- self.ensure_sentence_ends_with_period = (
- ensure_sentence_ends_with_period
- )
- self.use_microphone = use_microphone
- self.min_gap_between_recordings = min_gap_between_recordings
- self.min_length_of_recording = min_length_of_recording
- self.pre_recording_buffer_duration = pre_recording_buffer_duration
- self.post_speech_silence_duration = post_speech_silence_duration
- self.on_recording_start = on_recording_start
- self.on_recording_stop = on_recording_stop
- self.on_wakeword_detected = on_wakeword_detected
- self.on_wakeword_timeout = on_wakeword_timeout
- self.on_vad_detect_start = on_vad_detect_start
- self.on_vad_detect_stop = on_vad_detect_stop
- self.on_wakeword_detection_start = on_wakeword_detection_start
- self.on_wakeword_detection_end = on_wakeword_detection_end
- self.on_transcription_start = on_transcription_start
- self.enable_realtime_transcription = enable_realtime_transcription
- self.realtime_model_type = realtime_model_type
- self.realtime_processing_pause = realtime_processing_pause
- self.on_realtime_transcription_update = (
- on_realtime_transcription_update
- )
- self.on_realtime_transcription_stabilized = (
- on_realtime_transcription_stabilized
- )
- self.allowed_latency_limit = ALLOWED_LATENCY_LIMIT
- self.level = level
- self.audio_queue = mp.Queue()
- self.buffer_size = BUFFER_SIZE
- self.sample_rate = SAMPLE_RATE
- self.recording_start_time = 0
- self.recording_stop_time = 0
- self.wake_word_detect_time = 0
- self.silero_check_time = 0
- self.silero_working = False
- self.speech_end_silence_start = 0
- self.silero_sensitivity = silero_sensitivity
- self.listen_start = 0
- self.spinner = spinner
- self.halo = None
- self.state = "inactive"
- self.wakeword_detected = False
- self.text_storage = []
- self.realtime_stabilized_text = ""
- self.realtime_stabilized_safetext = ""
- self.is_webrtc_speech_active = False
- self.is_silero_speech_active = False
- self.recording_thread = None
- self.realtime_thread = None
- self.audio_interface = None
- self.audio = None
- self.stream = None
- self.start_recording_event = threading.Event()
- self.stop_recording_event = threading.Event()
- # Initialize the logging configuration with the specified level
- log_format = 'RealTimeSTT: %(name)s - %(levelname)s - %(message)s'
- # Create a logger
- logger = logging.getLogger()
- logger.setLevel(level) # Set the root logger's level
- # Create a file handler and set its level
- file_handler = logging.FileHandler('realtimesst.log')
- file_handler.setLevel(logging.DEBUG)
- file_handler.setFormatter(logging.Formatter(log_format))
- # Create a console handler and set its level
- console_handler = logging.StreamHandler()
- console_handler.setLevel(level)
- console_handler.setFormatter(logging.Formatter(log_format))
- # Add the handlers to the logger
- logger.addHandler(file_handler)
- logger.addHandler(console_handler)
- self.is_shut_down = False
- self.shutdown_event = mp.Event()
- logging.info("Starting RealTimeSTT")
- # Start transcription worker process
- try:
- # Only set the start method if it hasn't been set already
- if mp.get_start_method(allow_none=True) is None:
- mp.set_start_method("spawn")
- except RuntimeError as e:
- print("Start method has already been set. Details:", e)
- self.interrupt_stop_event = mp.Event()
- self.was_interrupted = mp.Event()
- self.main_transcription_ready_event = mp.Event()
- self.parent_transcription_pipe, child_transcription_pipe = mp.Pipe()
- self.transcript_process = mp.Process(
- target=AudioToTextRecorder._transcription_worker,
- args=(
- child_transcription_pipe,
- model,
- self.compute_type,
- self.gpu_device_index,
- self.main_transcription_ready_event,
- self.shutdown_event,
- self.interrupt_stop_event
- )
- )
- self.transcript_process.start()
- # Start audio data reading process
- if use_microphone:
- self.reader_process = mp.Process(
- target=AudioToTextRecorder._audio_data_worker,
- args=(
- self.audio_queue,
- self.sample_rate,
- self.buffer_size,
- self.input_device_index,
- self.shutdown_event,
- self.interrupt_stop_event
- )
- )
- self.reader_process.start()
- # Initialize the realtime transcription model
- if self.enable_realtime_transcription:
- try:
- logging.info("Initializing faster_whisper realtime "
- f"transcription model {self.realtime_model_type}"
- )
- self.realtime_model_type = faster_whisper.WhisperModel(
- model_size_or_path=self.realtime_model_type,
- device='cuda' if torch.cuda.is_available() else 'cpu',
- compute_type=self.compute_type,
- device_index=self.gpu_device_index
- )
- except Exception as e:
- logging.exception("Error initializing faster_whisper "
- f"realtime transcription model: {e}"
- )
- raise
- logging.debug("Faster_whisper realtime speech to text "
- "transcription model initialized successfully")
- # Setup wake word detection
- if wake_words:
- self.wake_words_list = [
- word.strip() for word in wake_words.lower().split(',')
- ]
- sensitivity_list = [
- float(wake_words_sensitivity)
- for _ in range(len(self.wake_words_list))
- ]
- try:
- self.porcupine = pvporcupine.create(
- keywords=self.wake_words_list,
- sensitivities=sensitivity_list
- )
- self.buffer_size = self.porcupine.frame_length
- self.sample_rate = self.porcupine.sample_rate
- except Exception as e:
- logging.exception("Error initializing porcupine "
- f"wake word detection engine: {e}"
- )
- raise
- logging.debug("Porcupine wake word detection "
- "engine initialized successfully"
- )
- # Setup voice activity detection model WebRTC
- try:
- logging.info("Initializing WebRTC voice with "
- f"Sensitivity {webrtc_sensitivity}"
- )
- self.webrtc_vad_model = webrtcvad.Vad()
- self.webrtc_vad_model.set_mode(webrtc_sensitivity)
- except Exception as e:
- logging.exception("Error initializing WebRTC voice "
- f"activity detection engine: {e}"
- )
- raise
- logging.debug("WebRTC VAD voice activity detection "
- "engine initialized successfully"
- )
- # Setup voice activity detection model Silero VAD
- try:
- self.silero_vad_model, _ = torch.hub.load(
- repo_or_dir="snakers4/silero-vad",
- model="silero_vad",
- verbose=False,
- onnx=silero_use_onnx
- )
- except Exception as e:
- logging.exception(f"Error initializing Silero VAD "
- f"voice activity detection engine: {e}"
- )
- raise
- logging.debug("Silero VAD voice activity detection "
- "engine initialized successfully"
- )
- self.audio_buffer = collections.deque(
- maxlen=int((self.sample_rate // self.buffer_size) *
- self.pre_recording_buffer_duration)
- )
- self.frames = []
- # Recording control flags
- self.is_recording = False
- self.is_running = True
- self.start_recording_on_voice_activity = False
- self.stop_recording_on_voice_deactivity = False
- # Start the recording worker thread
- self.recording_thread = threading.Thread(target=self._recording_worker)
- self.recording_thread.daemon = True
- self.recording_thread.start()
- # Start the realtime transcription worker thread
- self.realtime_thread = threading.Thread(target=self._realtime_worker)
- self.realtime_thread.daemon = True
- self.realtime_thread.start()
- # Wait for transcription models to start
- logging.debug('Waiting for main transcription model to start')
- self.main_transcription_ready_event.wait()
- logging.debug('Main transcription model ready')
- logging.debug('RealtimeSTT initialization completed successfully')
- @staticmethod
- def _transcription_worker(conn,
- model_path,
- compute_type,
- gpu_device_index,
- ready_event,
- shutdown_event,
- interrupt_stop_event):
- """
- Worker method that handles the continuous
- process of transcribing audio data.
- This method runs in a separate process and is responsible for:
- - Initializing the `faster_whisper` model used for transcription.
- - Receiving audio data sent through a pipe and using the model
- to transcribe it.
- - Sending transcription results back through the pipe.
- - Continuously checking for a shutdown event to gracefully
- terminate the transcription process.
- Args:
- conn (multiprocessing.Connection): The connection endpoint used
- for receiving audio data and sending transcription results.
- model_path (str): The path to the pre-trained faster_whisper model
- for transcription.
- compute_type (str): Specifies the type of computation to be used
- for transcription.
- gpu_device_index (int): Device ID to use.
- ready_event (multiprocessing.Event): An event that is set when the
- transcription model is successfully initialized and ready.
- shutdown_event (multiprocessing.Event): An event that, when set,
- signals this worker method to terminate.
- interrupt_stop_event (multiprocessing.Event): An event that is set
- when a KeyboardInterrupt occurs in the worker.
- Raises:
- Exception: If there is an error while initializing the
- transcription model.
- """
- logging.info("Initializing faster_whisper "
- f"main transcription model {model_path}"
- )
- try:
- model = faster_whisper.WhisperModel(
- model_size_or_path=model_path,
- device='cuda' if torch.cuda.is_available() else 'cpu',
- compute_type=compute_type,
- device_index=gpu_device_index
- )
- except Exception as e:
- logging.exception("Error initializing main "
- f"faster_whisper transcription model: {e}"
- )
- raise
- ready_event.set()
- logging.debug("Faster_whisper main speech to text "
- "transcription model initialized successfully"
- )
- while not shutdown_event.is_set():
- try:
- if conn.poll(0.5):
- audio, language = conn.recv()
- try:
- segments = model.transcribe(
- audio, language=language if language else None
- )
- segments = segments[0]
- transcription = " ".join(seg.text for seg in segments)
- transcription = transcription.strip()
- conn.send(('success', transcription))
- except faster_whisper.WhisperError as e:
- logging.error(f"Whisper transcription error: {e}")
- conn.send(('error', str(e)))
- except Exception as e:
- logging.error(f"General transcription error: {e}")
- conn.send(('error', str(e)))
- else:
- # If there's no data, sleep / prevent busy waiting
- time.sleep(0.02)
- except KeyboardInterrupt:
- interrupt_stop_event.set()
- logging.debug("Transcription worker process "
- "finished due to KeyboardInterrupt"
- )
- break
- @staticmethod
- def _audio_data_worker(audio_queue,
- sample_rate,
- buffer_size,
- input_device_index,
- shutdown_event,
- interrupt_stop_event):
- """
- Worker method that handles the audio recording process.
- This method runs in a separate process and is responsible for:
- - Setting up the audio input stream for recording.
- - Continuously reading audio data from the input stream
- and placing it in a queue.
- - Handling errors during the recording process, including
- input overflow.
- - Gracefully terminating the recording process when a shutdown
- event is set.
- Args:
- audio_queue (multiprocessing.Queue): A queue where recorded audio
- data is placed.
- sample_rate (int): The sample rate of the audio input stream.
- buffer_size (int): The size of the buffer used in the audio
- input stream.
- input_device_index (int): The index of the audio input device.
- shutdown_event (multiprocessing.Event): An event that, when set,
- signals this worker method to terminate.
- interrupt_stop_event (multiprocessing.Event): An event that is set
- when a KeyboardInterrupt occurs in the worker.
- Raises:
- Exception: If there is an error while initializing the audio
- recording.
- """
- logging.info("Initializing audio recording "
- "(creating pyAudio input stream)"
- )
- try:
- audio_interface = pyaudio.PyAudio()
- stream = audio_interface.open(rate=sample_rate,
- format=pyaudio.paInt16,
- channels=1,
- input=True,
- frames_per_buffer=buffer_size,
- input_device_index=input_device_index,
- )
- except Exception as e:
- logging.exception("Error initializing pyaudio "
- f"audio recording: {e}"
- )
- raise
- logging.debug("Audio recording (pyAudio input "
- "stream) initialized successfully"
- )
- try:
- while not shutdown_event.is_set():
- try:
- data = stream.read(buffer_size)
- except OSError as e:
- if e.errno == pyaudio.paInputOverflowed:
- logging.warning("Input overflowed. Frame dropped.")
- else:
- logging.error(f"Error during recording: {e}")
- tb_str = traceback.format_exc()
- print(f"Traceback: {tb_str}")
- print(f"Error: {e}")
- continue
- except Exception as e:
- logging.error(f"Error during recording: {e}")
- tb_str = traceback.format_exc()
- print(f"Traceback: {tb_str}")
- print(f"Error: {e}")
- continue
- audio_queue.put(data)
- except KeyboardInterrupt:
- interrupt_stop_event.set()
- logging.debug("Audio data worker process "
- "finished due to KeyboardInterrupt"
- )
- finally:
- stream.stop_stream()
- stream.close()
- audio_interface.terminate()
- def wakeup(self):
- """
- If in wake word mode, wake up as if a wake word had been spoken.
- """
- self.listen_start = time.time()
- def abort(self):
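- """
- Aborts waiting for recorded audio: resets the voice activity flags,
- sets the state to inactive and interrupts any pending
- wait_audio() / text() call.
- """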
- self.start_recording_on_voice_activity = False
- self.stop_recording_on_voice_deactivity = False
- self._set_state("inactive")
- self.interrupt_stop_event.set()
- self.was_interrupted.wait()
- self.was_interrupted.clear()
- def wait_audio(self):
- """
- Waits for the start and completion of the audio recording process.
- This method is responsible for:
- - Waiting for voice activity to begin recording if not yet started.
- - Waiting for voice inactivity to complete the recording.
- - Setting the audio buffer from the recorded frames.
- - Resetting recording-related attributes.
- Side effects:
- - Updates the state of the instance.
- - Modifies the audio attribute to contain the processed audio data.
- """
- self.listen_start = time.time()
- # If not yet started recording, wait for voice activity to initiate.
- if not self.is_recording and not self.frames:
- self._set_state("listening")
- self.start_recording_on_voice_activity = True
- # Wait until recording starts
- while not self.interrupt_stop_event.is_set():
- if self.start_recording_event.wait(timeout=0.02):
- break
- # If recording is ongoing, wait for voice inactivity
- # to finish recording.
- if self.is_recording:
- self.stop_recording_on_voice_deactivity = True
- # Wait until recording stops
- while not self.interrupt_stop_event.is_set():
- if (self.stop_recording_event.wait(timeout=0.02)):
- break
- # Convert recorded frames to the appropriate audio format.
- audio_array = np.frombuffer(b''.join(self.frames), dtype=np.int16)
- self.audio = audio_array.astype(np.float32) / INT16_MAX_ABS_VALUE
- self.frames.clear()
- # Reset recording-related timestamps
- self.recording_stop_time = 0
- self.listen_start = 0
- self._set_state("inactive")
- def transcribe(self):
- """
- Transcribes audio captured by this class instance using the
- `faster_whisper` model.
- Automatically starts recording upon voice activity if not manually
- started using `recorder.start()`.
- Automatically stops recording upon voice deactivity if not manually
- stopped with `recorder.stop()`.
- Processes the recorded audio to generate transcription.
- Returns:
- str: The transcription of the recorded audio.
- Raises:
- Exception: If there is an error during the transcription process.
- """
- self._set_state("transcribing")
- self.parent_transcription_pipe.send((self.audio, self.language))
- status, result = self.parent_transcription_pipe.recv()
- self._set_state("inactive")
- if status == 'success':
- return self._preprocess_output(result)
- else:
- logging.error(result)
- raise Exception(result)
- def text(self,
- on_transcription_finished=None,
- ):
- """
- Transcribes audio captured by this class instance
- using the `faster_whisper` model.
- - Automatically starts recording upon voice activity if not manually
- started using `recorder.start()`.
- - Automatically stops recording upon voice deactivity if not manually
- stopped with `recorder.stop()`.
- - Processes the recorded audio to generate transcription.
- Args:
- on_transcription_finished (callable, optional): Callback function
- to be executed when transcription is ready.
- If provided, transcription will be performed asynchronously, and
- the callback will receive the transcription as its argument.
- If omitted, the transcription will be performed synchronously,
- and the result will be returned.
- Returns (if no callback is set):
- str: The transcription of the recorded audio.
- """
- self.interrupt_stop_event.clear()
- self.was_interrupted.clear()
- self.wait_audio()
- if self.is_shut_down or self.interrupt_stop_event.is_set():
- if self.interrupt_stop_event.is_set():
- self.was_interrupted.set()
- return ""
- if on_transcription_finished:
- threading.Thread(target=on_transcription_finished,
- args=(self.transcribe(),)).start()
- else:
- return self.transcribe()
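- # Usage sketch (hypothetical callback, not part of this module):
- #
- #     def on_text(transcription):
- #         print(transcription)
- #
- #     recorder.text(on_text)           # asynchronous, returns immediately
- #     transcription = recorder.text()  # synchronous, blocks until done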
- def start(self):
- """
- Starts recording audio directly without waiting for voice activity.
- """
- # Ensure there's a minimum interval
- # between stopping and starting recording
- if (time.time() - self.recording_stop_time
- < self.min_gap_between_recordings):
- logging.info("Attempted to start recording "
- "too soon after stopping."
- )
- return self
- logging.info("recording started")
- self._set_state("recording")
- self.text_storage = []
- self.realtime_stabilized_text = ""
- self.realtime_stabilized_safetext = ""
- self.wakeword_detected = False
- self.wake_word_detect_time = 0
- self.frames = []
- self.is_recording = True
- self.recording_start_time = time.time()
- self.is_silero_speech_active = False
- self.is_webrtc_speech_active = False
- self.stop_recording_event.clear()
- self.start_recording_event.set()
- if self.on_recording_start:
- self.on_recording_start()
- return self
- def stop(self):
- """
- Stops recording audio.
- """
- # Ensure there's a minimum interval
- # between starting and stopping recording
- if (time.time() - self.recording_start_time
- < self.min_length_of_recording):
- logging.info("Attempted to stop recording "
- "too soon after starting."
- )
- return self
- logging.info("recording stopped")
- self.is_recording = False
- self.recording_stop_time = time.time()
- self.is_silero_speech_active = False
- self.is_webrtc_speech_active = False
- self.silero_check_time = 0
- self.start_recording_event.clear()
- self.stop_recording_event.set()
- if self.on_recording_stop:
- self.on_recording_stop()
- return self
- def feed_audio(self, chunk):
- """
- Feed an audio chunk into the processing pipeline. Chunks are
- accumulated until the buffer size is reached, and then the accumulated
- data is fed into the audio_queue.
- """
- # Check if the buffer attribute exists, if not, initialize it
- if not hasattr(self, 'buffer'):
- self.buffer = bytearray()
- # Append the chunk to the buffer
- self.buffer += chunk
- buf_size = 2 * self.buffer_size # silero complains if too short
- # Check if the buffer has reached or exceeded the buffer_size
- while len(self.buffer) >= buf_size:
- # Extract self.buffer_size amount of data from the buffer
- to_process = self.buffer[:buf_size]
- self.buffer = self.buffer[buf_size:]
- # Feed the extracted data to the audio_queue
- self.audio_queue.put(to_process)
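- # Usage sketch (assumes 16 kHz, 16-bit mono PCM byte chunks from some
- # external source; get_pcm_chunk() is hypothetical):
- #
- #     recorder = AudioToTextRecorder(use_microphone=False)
- #     while streaming:
- #         recorder.feed_audio(get_pcm_chunk())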
- def shutdown(self):
- """
- Safely shuts down the audio recording by stopping the
- recording worker and closing the audio stream.
- """
- # Force wait_audio() and text() to exit
- self.is_shut_down = True
- self.start_recording_event.set()
- self.stop_recording_event.set()
- self.shutdown_event.set()
- self.is_recording = False
- self.is_running = False
- logging.debug('Finishing recording thread')
- if self.recording_thread:
- self.recording_thread.join()
- logging.debug('Terminating reader process')
- # Give it some time to finish the loop and cleanup.
- if self.use_microphone:
- self.reader_process.join(timeout=10)
- if self.reader_process.is_alive():
- logging.warning("Reader process did not terminate "
- "in time. Terminating forcefully."
- )
- self.reader_process.terminate()
- logging.debug('Terminating transcription process')
- self.transcript_process.join(timeout=10)
- if self.transcript_process.is_alive():
- logging.warning("Transcript process did not terminate "
- "in time. Terminating forcefully."
- )
- self.transcript_process.terminate()
- self.parent_transcription_pipe.close()
- logging.debug('Finishing realtime thread')
- if self.realtime_thread:
- self.realtime_thread.join()
- if self.enable_realtime_transcription:
- if self.realtime_model_type:
- del self.realtime_model_type
- self.realtime_model_type = None
- gc.collect()
- def _recording_worker(self):
- """
- The main worker method which constantly monitors the audio
- input for voice activity and accordingly starts/stops the recording.
- """
- logging.debug('Starting recording worker')
- try:
- was_recording = False
- delay_was_passed = False
- # Continuously monitor audio for voice activity
- while self.is_running:
- try:
- data = self.audio_queue.get()
- # Handle queue overflow
- queue_overflow_logged = False
- while (self.audio_queue.qsize() >
- self.allowed_latency_limit):
- if not queue_overflow_logged:
- logging.warning("Audio queue size exceeds latency "
- "limit. Current size: "
- f"{self.audio_queue.qsize()}. "
- "Discarding old audio chunks."
- )
- queue_overflow_logged = True
- data = self.audio_queue.get()
- except BrokenPipeError:
- print("BrokenPipeError _recording_worker")
- self.is_running = False
- break
- if not self.is_recording:
- # Handle not recording state
- time_since_listen_start = (time.time() - self.listen_start
- if self.listen_start else 0)
- wake_word_activation_delay_passed = (
- time_since_listen_start >
- self.wake_word_activation_delay
- )
- # Handle wake-word timeout callback
- if wake_word_activation_delay_passed \
- and not delay_was_passed:
- if self.wake_words and self.wake_word_activation_delay:
- if self.on_wakeword_timeout:
- self.on_wakeword_timeout()
- delay_was_passed = wake_word_activation_delay_passed
- # Set state and spinner text
- if not self.recording_stop_time:
- if self.wake_words \
- and wake_word_activation_delay_passed \
- and not self.wakeword_detected:
- self._set_state("wakeword")
- else:
- if self.listen_start:
- self._set_state("listening")
- else:
- self._set_state("inactive")
- # Detect wake words if applicable
- if self.wake_words and wake_word_activation_delay_passed:
- try:
- pcm = struct.unpack_from(
- "h" * self.buffer_size,
- data
- )
- wakeword_index = self.porcupine.process(pcm)
- except struct.error:
- logging.error("Error unpacking audio data "
- "for wake word processing.")
- continue
- except Exception as e:
- logging.error(f"Wake word processing error: {e}")
- continue
- # If a wake word is detected
- if wakeword_index >= 0:
- # Removing the wake word from the recording
- samples_for_0_1_sec = int(self.sample_rate * 0.1)
- start_index = max(
- 0,
- len(self.audio_buffer) - samples_for_0_1_sec
- )
- temp_samples = collections.deque(
- itertools.islice(
- self.audio_buffer,
- start_index,
- None)
- )
- self.audio_buffer.clear()
- self.audio_buffer.extend(temp_samples)
- self.wake_word_detect_time = time.time()
- self.wakeword_detected = True
- if self.on_wakeword_detected:
- self.on_wakeword_detected()
- # Check for voice activity to
- # trigger the start of recording
- if ((not self.wake_words
- or not wake_word_activation_delay_passed)
- and self.start_recording_on_voice_activity) \
- or self.wakeword_detected:
- if self._is_voice_active():
- logging.info("voice activity detected")
- self.start()
- if self.is_recording:
- self.start_recording_on_voice_activity = False
- # Add the buffered audio
- # to the recording frames
- self.frames.extend(list(self.audio_buffer))
- self.audio_buffer.clear()
- self.silero_vad_model.reset_states()
- else:
- data_copy = data[:]
- self._check_voice_activity(data_copy)
- self.speech_end_silence_start = 0
- else:
- # If we are currently recording
- # Stop the recording if silence is detected after speech
- if self.stop_recording_on_voice_deactivity:
- if not self._is_webrtc_speech(data, True):
- # Voice deactivity was detected, so we start
- # measuring silence time before stopping recording
- if self.speech_end_silence_start == 0:
- self.speech_end_silence_start = time.time()
- else:
- self.speech_end_silence_start = 0
- # Wait for silence to stop recording after speech
- if self.speech_end_silence_start and time.time() - \
- self.speech_end_silence_start > \
- self.post_speech_silence_duration:
- logging.info("voice deactivity detected")
- self.stop()
- if not self.is_recording and was_recording:
- # Reset after stopping recording to ensure clean state
- self.stop_recording_on_voice_deactivity = False
- if time.time() - self.silero_check_time > 0.1:
- self.silero_check_time = 0
- # Handle wake word timeout (waited too long to start
- # speaking after wake word detection)
- if self.wake_word_detect_time and time.time() - \
- self.wake_word_detect_time > self.wake_word_timeout:
- self.wake_word_detect_time = 0
- if self.wakeword_detected and self.on_wakeword_timeout:
- self.on_wakeword_timeout()
- self.wakeword_detected = False
- was_recording = self.is_recording
- if self.is_recording:
- self.frames.append(data)
- if not self.is_recording or self.speech_end_silence_start:
- self.audio_buffer.append(data)
- except Exception as e:
- if not self.interrupt_stop_event.is_set():
- logging.error(f"Unhandled exeption in _recording_worker: {e}")
- raise
- def _realtime_worker(self):
- """
- Performs real-time transcription if the feature is enabled.
- The method is responsible for transcribing recorded audio frames
- in real-time based on the specified resolution interval.
- The transcribed text is stored in `self.realtime_transcription_text`
- and a callback
- function is invoked with this text if specified.
- """
- try:
- logging.debug('Starting realtime worker')
- # Return immediately if real-time transcription is not enabled
- if not self.enable_realtime_transcription:
- return
- # Continue running as long as the main process is active
- while self.is_running:
- # Check if the recording is active
- if self.is_recording:
- # Sleep for the duration of the transcription resolution
- time.sleep(self.realtime_processing_pause)
- # Convert the buffer frames to a NumPy array
- audio_array = np.frombuffer(
- b''.join(self.frames),
- dtype=np.int16
- )
- # Normalize the array to a [-1, 1] range
- audio_array = audio_array.astype(np.float32) / \
- INT16_MAX_ABS_VALUE
- # Perform transcription and assemble the text
- segments = self.realtime_model_type.transcribe(
- audio_array,
- language=self.language if self.language else None
- )
- # double check recording state
- # because it could have changed mid-transcription
- if self.is_recording and time.time() - \
- self.recording_start_time > 0.5:
- logging.debug('Starting realtime transcription')
- self.realtime_transcription_text = " ".join(
- seg.text for seg in segments[0]
- )
- self.realtime_transcription_text = \
- self.realtime_transcription_text.strip()
- self.text_storage.append(
- self.realtime_transcription_text
- )
- # Take the last two texts in storage, if they exist
- if len(self.text_storage) >= 2:
- last_two_texts = self.text_storage[-2:]
- # Find the longest common prefix
- # between the two texts
- prefix = os.path.commonprefix(
- [last_two_texts[0], last_two_texts[1]]
- )
- # This prefix is the text that was transcribed
- # the same way twice in a row
- # Store as "safely detected text"
- if len(prefix) >= \
- len(self.realtime_stabilized_safetext):
- # Only store when longer than the previous
- # as additional security
- self.realtime_stabilized_safetext = prefix
- # Find parts of the stabilized text
- # in the freshly transcribed text
- matching_pos = self._find_tail_match_in_text(
- self.realtime_stabilized_safetext,
- self.realtime_transcription_text
- )
- if matching_pos < 0:
- if self.realtime_stabilized_safetext:
- self._on_realtime_transcription_stabilized(
- self._preprocess_output(
- self.realtime_stabilized_safetext,
- True
- )
- )
- else:
- self._on_realtime_transcription_stabilized(
- self._preprocess_output(
- self.realtime_transcription_text,
- True
- )
- )
- else:
- # We found parts of the stabilized text
- # in the freshly transcribed text
- # We now take the stabilized text
- # and add only the freshly transcribed part to it
- output_text = self.realtime_stabilized_safetext + \
- self.realtime_transcription_text[matching_pos:]
- # This yields the "left" part of the text as stabilized
- # and at the same time delivers freshly detected
- # parts on the first run, without the need for
- # two transcriptions
- self._on_realtime_transcription_stabilized(
- self._preprocess_output(output_text, True)
- )
- # Invoke the callback with the transcribed text
- self._on_realtime_transcription_update(
- self._preprocess_output(
- self.realtime_transcription_text,
- True
- )
- )
- # If not recording, sleep briefly before checking again
- else:
- time.sleep(TIME_SLEEP)
- except Exception as e:
- logging.error(f"Unhandled exeption in _realtime_worker: {e}")
- raise
- def _is_silero_speech(self, data):
- """
- Returns true if speech is detected in the provided audio data
- Args:
- data (bytes): raw bytes of audio data (1024 raw bytes with
- 16000 sample rate and 16 bits per sample)
- """
- self.silero_working = True
- audio_chunk = np.frombuffer(data, dtype=np.int16)
- audio_chunk = audio_chunk.astype(np.float32) / INT16_MAX_ABS_VALUE
- vad_prob = self.silero_vad_model(
- torch.from_numpy(audio_chunk),
- SAMPLE_RATE).item()
- is_silero_speech_active = vad_prob > (1 - self.silero_sensitivity)
- if is_silero_speech_active:
- self.is_silero_speech_active = True
- self.silero_working = False
- return is_silero_speech_active
- def _is_webrtc_speech(self, data, all_frames_must_be_true=False):
- """
- Returns true if speech is detected in the provided audio data
- Args:
- data (bytes): raw bytes of audio data (1024 raw bytes with
- 16000 sample rate and 16 bits per sample)
- """
- # Number of audio samples per 10 ms frame
- frame_length = int(self.sample_rate * 0.01)
- num_frames = int(len(data) / (2 * frame_length))
- speech_frames = 0
- for i in range(num_frames):
- start_byte = i * frame_length * 2
- end_byte = start_byte + frame_length * 2
- frame = data[start_byte:end_byte]
- if self.webrtc_vad_model.is_speech(frame, self.sample_rate):
- speech_frames += 1
- if not all_frames_must_be_true:
- return True
- if all_frames_must_be_true:
- return speech_frames == num_frames
- else:
- return False
- def _check_voice_activity(self, data):
- """
- Initiate check if voice is active based on the provided data.
- Args:
- data: The audio data to be checked for voice activity.
- """
- # First a quick check for voice activity using WebRTC
- self.is_webrtc_speech_active = self._is_webrtc_speech(data)
- if self.is_webrtc_speech_active:
- if not self.silero_working:
- self.silero_working = True
- # Run the intensive check in a separate thread
- threading.Thread(
- target=self._is_silero_speech,
- args=(data,)).start()
- def _is_voice_active(self):
- """
- Determine if voice is active.
- Returns:
- bool: True if voice is active, False otherwise.
- """
- return self.is_webrtc_speech_active and self.is_silero_speech_active
- def _set_state(self, new_state):
- """
- Update the current state of the recorder and execute
- corresponding state-change callbacks.
- Args:
- new_state (str): The new state to set.
- """
- # Check if the state has actually changed
- if new_state == self.state:
- return
- # Store the current state for later comparison
- old_state = self.state
- # Update to the new state
- self.state = new_state
- # Execute callbacks based on transitioning FROM a particular state
- if old_state == "listening":
- if self.on_vad_detect_stop:
- self.on_vad_detect_stop()
- elif old_state == "wakeword":
- if self.on_wakeword_detection_end:
- self.on_wakeword_detection_end()
- # Execute callbacks based on transitioning TO a particular state
- if new_state == "listening":
- if self.on_vad_detect_start:
- self.on_vad_detect_start()
- self._set_spinner("speak now")
- if self.spinner and self.halo:
- self.halo._interval = 250
- elif new_state == "wakeword":
- if self.on_wakeword_detection_start:
- self.on_wakeword_detection_start()
- self._set_spinner(f"say {self.wake_words}")
- if self.spinner and self.halo:
- self.halo._interval = 500
- elif new_state == "transcribing":
- if self.on_transcription_start:
- self.on_transcription_start()
- self._set_spinner("transcribing")
- if self.spinner and self.halo:
- self.halo._interval = 50
- elif new_state == "recording":
- self._set_spinner("recording")
- if self.spinner and self.halo:
- self.halo._interval = 100
- elif new_state == "inactive":
- if self.spinner and self.halo:
- self.halo.stop()
- self.halo = None
- def _set_spinner(self, text):
- """
- Update the spinner's text or create a new
- spinner with the provided text.
- Args:
- text (str): The text to be displayed alongside the spinner.
- """
- if self.spinner:
- # If the Halo spinner doesn't exist, create and start it
- if self.halo is None:
- self.halo = halo.Halo(text=text)
- self.halo.start()
- # If the Halo spinner already exists, just update the text
- else:
- self.halo.text = text
- def _preprocess_output(self, text, preview=False):
- """
- Preprocesses the output text by removing any leading or trailing
- whitespace, converting all whitespace sequences to a single space
- character, and capitalizing the first character of the text.
- Args:
- text (str): The text to be preprocessed.
- Returns:
- str: The preprocessed text.
- """
- text = re.sub(r'\s+', ' ', text.strip())
- if self.ensure_sentence_starting_uppercase:
- if text:
- text = text[0].upper() + text[1:]
- # Ensure the text ends with a proper punctuation
- # if it ends with an alphanumeric character
- if not preview:
- if self.ensure_sentence_ends_with_period:
- if text and text[-1].isalnum():
- text += '.'
- return text
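- # Illustrative behavior with the default settings (not a doctest):
- #
- #     _preprocess_output("  hello   world ")               -> "Hello world."
- #     _preprocess_output("  hello   world ", preview=True) -> "Hello world"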
- def _find_tail_match_in_text(self, text1, text2, length_of_match=10):
- """
- Find the position where the last 'n' characters of text1
- match with a substring in text2.
- This method takes two texts, extracts the last 'n' characters from
- text1 (where 'n' is determined by the variable 'length_of_match'), and
- searches for an occurrence of this substring in text2, starting from
- the end of text2 and moving towards the beginning.
- Parameters:
- - text1 (str): The text containing the substring that we want to find
- in text2.
- - text2 (str): The text in which we want to find the matching
- substring.
- - length_of_match (int): The length of the matching substring we are
- looking for.
- Returns:
- int: The position (0-based index) in text2 immediately after the
- matching substring ends. If no match is found or either of the
- texts is too short, returns -1.
- """
- # Check if either of the texts is too short
- if len(text1) < length_of_match or len(text2) < length_of_match:
- return -1
- # The end portion of the first text that we want to compare
- target_substring = text1[-length_of_match:]
- # Loop through text2 from right to left
- for i in range(len(text2) - length_of_match + 1):
- # Extract the substring from text2
- # to compare with the target_substring
- current_substring = text2[len(text2) - i - length_of_match:
- len(text2) - i]
- # Compare the current_substring with the target_substring
- if current_substring == target_substring:
- # Position in text2 immediately after the match
- return len(text2) - i
- return -1
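- # Worked example (illustrative): with length_of_match=3,
- #     _find_tail_match_in_text("hello wor", "xx hello world")
- # compares the tail "wor" against text2 and returns 12, the index just
- # past the matched tail, so text2[12:] == "ld" is the fresh part.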
- def _on_realtime_transcription_stabilized(self, text):
- """
- Callback method invoked when the real-time transcription stabilizes.
- This method is called internally when the transcription text is
- considered "stable" meaning it's less likely to change significantly
- with additional audio input. It notifies any registered external
- listener about the stabilized text if recording is still ongoing.
- This is particularly useful for applications that need to display
- live transcription results to users and want to highlight parts of the
- transcription that are less likely to change.
- Args:
- text (str): The stabilized transcription text.
- """
- if self.on_realtime_transcription_stabilized:
- if self.is_recording:
- self.on_realtime_transcription_stabilized(text)
- def _on_realtime_transcription_update(self, text):
- """
- Callback method invoked when there's an update in the real-time
- transcription.
- This method is called internally whenever there's a change in the
- transcription text, notifying any registered external listener about
- the update if recording is still ongoing. This provides a mechanism
- for applications to receive and possibly display live transcription
- updates, which could be partial and still subject to change.
- Args:
- text (str): The updated transcription text.
- """
- if self.on_realtime_transcription_update:
- if self.is_recording:
- self.on_realtime_transcription_update(text)
- def __enter__(self):
- """
- Sets up the context manager protocol.
- This enables the instance to be used in a `with` statement, ensuring
- proper resource management. When the `with` block is entered, this
- method is automatically called.
- Returns:
- self: The current instance of the class.
- """
- return self
- def __exit__(self, exc_type, exc_value, traceback):
- """
- Defines behavior when the context manager protocol exits.
- This is called when exiting the `with` block and ensures that any
- necessary cleanup or resource release processes are executed, such as
- shutting down the system properly.
- Args:
- exc_type (Exception or None): The type of the exception that
- caused the context to be exited, if any.
- exc_value (Exception or None): The exception instance that caused
- the context to be exited, if any.
- traceback (Traceback or None): The traceback corresponding to the
- exception, if any.
- """
- self.shutdown()
|