Browse source code

Merge branch 'master' into master

Kolja Beigel 5 months ago
parent
commit
691b0720c8

+ 163 - 5
README.md

@@ -1,11 +1,17 @@
-
 # RealtimeSTT
+[![PyPI](https://img.shields.io/pypi/v/RealtimeSTT)](https://pypi.org/project/RealtimeSTT/)
+[![Downloads](https://static.pepy.tech/badge/RealtimeSTT)](https://pepy.tech/project/KoljaB/RealtimeSTT)
+[![GitHub release](https://img.shields.io/github/release/KoljaB/RealtimeSTT.svg)](https://GitHub.com/KoljaB/RealtimeSTT/releases/)
+[![GitHub commits](https://badgen.net/github/commits/KoljaB/RealtimeSTT)](https://GitHub.com/Naereen/KoljaB/RealtimeSTT/commit/)
+[![GitHub forks](https://img.shields.io/github/forks/KoljaB/RealtimeSTT.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/network/)
+[![GitHub stars](https://img.shields.io/github/stars/KoljaB/RealtimeSTT.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/stargazers/)
 
 *Easy-to-use, low-latency speech-to-text library for realtime applications*
 
 ## New
 
-Custom wake words with [OpenWakeWord](#openwakeword). Thanks to the [developers](https://github.com/dscripka/openWakeWord) of this!
+- AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two. (Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all, and the server cannot handle concurrent (parallel) requests yet.) A minimal usage sketch follows after this list.
+- Reworked CLI interface: `stt-server` starts the server, `stt` starts the client (see the `RealtimeSTT_server` folder for more info)
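
A minimal usage sketch for the new client class (an editorial illustration, not part of this commit): `AudioToTextRecorderClient` mirrors the `AudioToTextRecorder` interface, so the recording loop from the Quick Examples below works the same way. With the default `autostart_server=True`, the client launches `stt-server` automatically if none is reachable.

```python
from RealtimeSTT import AudioToTextRecorderClient

def process_text(text):
    print(text)

if __name__ == '__main__':
    # Connects to a running stt-server, or starts one if none is reachable
    recorder = AudioToTextRecorderClient()

    while True:
        recorder.text(process_text)
```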
 
 ## About the Project
 
@@ -18,16 +24,53 @@ It's ideal for:
 - **Voice Assistants**
 - Applications requiring **fast and precise** speech-to-text conversion
 
-https://github.com/KoljaB/RealtimeSTT/assets/7604638/207cb9a2-4482-48e7-9d2b-0722c3ee6d14
+https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5
 
 ### Updates
 
-Latest Version: v0.2.41
+Latest Version: v0.3.7
 
 See [release history](https://github.com/KoljaB/RealtimeSTT/releases).
 
 > **Hint:** *Since we use the `multiprocessing` module now, make sure to include the `if __name__ == '__main__':` protection in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, visit the [official Python documentation on `multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming).*
 
+## Quick Examples
+
+### Print everything being said:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+def process_text(text):
+    print(text)
+
+if __name__ == '__main__':
+    print("Wait until it says 'speak now'")
+    recorder = AudioToTextRecorder()
+
+    while True:
+        recorder.text(process_text)
+```
+
+### Type everything being said:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+import pyautogui
+
+def process_text(text):
+    pyautogui.typewrite(text + " ")
+
+if __name__ == '__main__':
+    print("Wait until it says 'speak now'")
+    recorder = AudioToTextRecorder()
+
+    while True:
+        recorder.text(process_text)
+```
+*Will type everything being said into your selected text box*
+
 ### Features
 
 - **Voice Activity Detection**: Automatically detects when you start and stop speaking.
@@ -158,6 +201,19 @@ recorder.stop()
 print(recorder.text())
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+if __name__ == '__main__':
+    recorder = AudioToTextRecorder()
+    recorder.start()
+    input("Press Enter to stop recording...")
+    recorder.stop()
+    print("Transcription: ", recorder.text())
+```
+
 ### Automatic Recording
 
 Recording based on voice activity detection.
@@ -167,8 +223,19 @@ with AudioToTextRecorder() as recorder:
     print(recorder.text())
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+if __name__ == '__main__':
+    with AudioToTextRecorder() as recorder:
+        print("Transcription: ", recorder.text())
+```
+
 When running `recorder.text` in a loop, it is recommended to use a callback so the transcription can run asynchronously:
 
+
 ```python
 def process_text(text):
     print (text)
@@ -177,6 +244,21 @@ while True:
     recorder.text(process_text)
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+def process_text(text):
+    print(text)
+
+if __name__ == '__main__':
+    recorder = AudioToTextRecorder()
+
+    while True:
+        recorder.text(process_text)
+```
+
 ### Wakewords
 
 Keyword activation before detecting voice. Write the comma-separated list of your desired activation keywords into the wake_words parameter. You can choose wake words from this list: alexa, americano, blueberry, bumblebee, computer, grapefruits, grasshopper, hey google, hey siri, jarvis, ok google, picovoice, porcupine, terminator.
@@ -188,6 +270,18 @@ print('Say "Jarvis" then speak.')
 print(recorder.text())
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+if __name__ == '__main__':
+    recorder = AudioToTextRecorder(wake_words="jarvis")
+
+    print('Say "Jarvis" to start recording.')
+    print(recorder.text())
+```
+
 ### Callbacks
 
 You can set callback functions to be executed on different events (see [Configuration](#configuration)):
@@ -203,6 +297,22 @@ recorder = AudioToTextRecorder(on_recording_start=my_start_callback,
                                on_recording_stop=my_stop_callback)
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+def start_callback():
+    print("Recording started!")
+
+def stop_callback():
+    print("Recording stopped!")
+
+if __name__ == '__main__':
+    recorder = AudioToTextRecorder(on_recording_start=start_callback,
+                                   on_recording_stop=stop_callback)
+```
+
 ### Feed chunks
 
 If you don't want to use the local microphone, set the use_microphone parameter to false and provide raw PCM audio chunks in 16-bit mono (sample rate 16000) with this method:
@@ -211,6 +321,20 @@ If you don't want to use the local microphone set use_microphone parameter to fa
 recorder.feed_audio(audio_chunk)
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+if __name__ == '__main__':
+    recorder = AudioToTextRecorder(use_microphone=False)
+    with open("audio_chunk.pcm", "rb") as f:
+        audio_chunk = f.read()
+
+    recorder.feed_audio(audio_chunk)
+    print("Transcription: ", recorder.text())
+```
+
 ### Shutdown
 
 You can shutdown the recorder safely by using the context manager protocol:
@@ -220,12 +344,25 @@ with AudioToTextRecorder() as recorder:
     [...]
 ```
 
+
 Or you can call the shutdown method manually (if using "with" is not feasible):
 
 ```python
 recorder.shutdown()
 ```
 
+#### Standalone Example:
+
+```python
+from RealtimeSTT import AudioToTextRecorder
+
+if __name__ == '__main__':
+    with AudioToTextRecorder() as recorder:
+        [...]
+    # or manually shutdown if "with" is not used
+    recorder.shutdown()
+```
+
 ## Testing the Library
 
 The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeSTT library.
@@ -298,6 +435,8 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
 
 - **level** (int, default=logging.WARNING): Logging level.
 
+- **init_logging** (bool, default=True): Whether to initialize the logging framework. Set to False to manage this yourself.
+
 - **handle_buffer_overflow** (bool, default=True): If set, the system will log a warning when an input overflow occurs during recording and remove the data from the buffer.
 
 - **beam_size** (int, default=5): The beam size to use for beam search decoding.
@@ -310,6 +449,14 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
 
 - **debug_mode** (bool, default=False): If set, the system prints additional debug information to the console.
 
+- **print_transcription_time** (bool, default=False): Logs the processing time of the main model transcription. This can be useful for performance monitoring and debugging.
+
+- **early_transcription_on_silence** (int, default=0): If set, the system will transcribe audio faster when silence is detected. Transcription will start after the specified milliseconds. Keep this value lower than `post_speech_silence_duration`, ideally around `post_speech_silence_duration` minus the estimated transcription time with the main model. If silence lasts longer than `post_speech_silence_duration`, the recording is stopped, and the transcription is submitted. If voice activity resumes within this period, the transcription is discarded. This results in faster final transcriptions at the cost of additional GPU load due to some unnecessary final transcriptions.
+
+- **allowed_latency_limit** (int, default=100): Specifies the maximum number of unprocessed chunks in the queue before discarding chunks. This helps prevent the system from being overwhelmed and losing responsiveness in real-time applications.
+
+- **no_log_file** (bool, default=False): If set, the system will skip writing the debug log file, reducing disk I/O. Useful if logging to a file is not needed and performance is a priority.
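
An illustrative configuration sketch combining several of the parameters above (the values are arbitrary examples, not recommendations):

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(
        init_logging=False,                  # manage the logging framework yourself
        print_transcription_time=True,       # log how long the main model takes
        early_transcription_on_silence=300,  # milliseconds; keep below post_speech_silence_duration
        allowed_latency_limit=100,           # max unprocessed chunks before discarding
        no_log_file=True,                    # skip writing the debug log file
    )
    print(recorder.text())
```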
+
 #### Real-time Transcription Parameters
 
 > **Note**: *When enabling realtime transcription a GPU installation is strongly advised. Using realtime transcription may create high GPU loads.*
@@ -404,6 +551,17 @@ Suggested starting parameters for OpenWakeWord usage:
         ) as recorder:
 ```
 
+## FAQ
+
+### Q: I encountered the following error: "Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so} Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor." How do I fix this?
+
+**A:** This issue arises from a mismatch between the version of `ctranslate2` and cuDNN. The `ctranslate2` library was updated to version 4.5.0, which uses cuDNN 9.2. There are two ways to resolve this issue:
+1. **Downgrade `ctranslate2` to version 4.4.0**:
+   ```bash
+   pip install ctranslate2==4.4.0
+   ```
+2. **Upgrade cuDNN** on your system to version 9.2 or above.
+
 ## Contribution
 
 Contributions are always welcome! 
@@ -412,7 +570,7 @@ Shoutout to [Steven Linn](https://github.com/stevenlafl) for providing docker su
 
 ## License
 
-MIT
+[MIT](https://github.com/KoljaB/RealtimeSTT?tab=MIT-1-ov-file)
 
 ## Author
 

+ 2 - 1
RealtimeSTT/__init__.py

@@ -1 +1,2 @@
-from .audio_recorder import AudioToTextRecorder
+from .audio_recorder import AudioToTextRecorder
+from .audio_recorder_client import AudioToTextRecorderClient

The diff view for this file was limited because it is too large
+ 563 - 216
RealtimeSTT/audio_recorder.py


+ 758 - 0
RealtimeSTT/audio_recorder_client.py

@@ -0,0 +1,758 @@
+log_outgoing_chunks = False
+debug_mode = False
+
+from typing import Iterable, List, Optional, Union
+from urllib.parse import urlparse
+from datetime import datetime
+import subprocess
+import websocket
+import threading
+import platform
+import logging
+import pyaudio
+import socket
+import struct
+import signal
+import json
+import time
+import sys
+import os
+
+DEFAULT_CONTROL_URL = "ws://127.0.0.1:8011"
+DEFAULT_DATA_URL = "ws://127.0.0.1:8012"
+
+INIT_MODEL_TRANSCRIPTION = "tiny"
+INIT_MODEL_TRANSCRIPTION_REALTIME = "tiny"
+INIT_REALTIME_PROCESSING_PAUSE = 0.2
+INIT_SILERO_SENSITIVITY = 0.4
+INIT_WEBRTC_SENSITIVITY = 3
+INIT_POST_SPEECH_SILENCE_DURATION = 0.6
+INIT_MIN_LENGTH_OF_RECORDING = 0.5
+INIT_MIN_GAP_BETWEEN_RECORDINGS = 0
+INIT_WAKE_WORDS_SENSITIVITY = 0.6
+INIT_PRE_RECORDING_BUFFER_DURATION = 1.0
+INIT_WAKE_WORD_ACTIVATION_DELAY = 0.0
+INIT_WAKE_WORD_TIMEOUT = 5.0
+INIT_WAKE_WORD_BUFFER_DURATION = 0.1
+ALLOWED_LATENCY_LIMIT = 100
+
+CHUNK = 1024
+FORMAT = pyaudio.paInt16
+CHANNELS = 1
+SAMPLE_RATE = 16000
+BUFFER_SIZE = 512
+
+INIT_HANDLE_BUFFER_OVERFLOW = False
+if platform.system() != 'Darwin':
+    INIT_HANDLE_BUFFER_OVERFLOW = True
+
+# Define ANSI color codes for terminal output
+class bcolors:
+    HEADER = '\033[95m'   # Magenta
+    OKBLUE = '\033[94m'   # Blue
+    OKCYAN = '\033[96m'   # Cyan
+    OKGREEN = '\033[92m'  # Green
+    WARNING = '\033[93m'  # Yellow
+    FAIL = '\033[91m'     # Red
+    ENDC = '\033[0m'      # Reset to default
+    BOLD = '\033[1m'
+    UNDERLINE = '\033[4m'
+
+class AudioToTextRecorderClient:
+    """
+    A class responsible for capturing audio from the microphone, detecting
+    voice activity, and then transcribing the captured audio using the
+    `faster_whisper` model.
+    """
+
+    def __init__(self,
+                 model: str = INIT_MODEL_TRANSCRIPTION,
+                 language: str = "",
+                 compute_type: str = "default",
+                 input_device_index: int = None,
+                 gpu_device_index: Union[int, List[int]] = 0,
+                 device: str = "cuda",
+                 on_recording_start=None,
+                 on_recording_stop=None,
+                 on_transcription_start=None,
+                 ensure_sentence_starting_uppercase=True,
+                 ensure_sentence_ends_with_period=True,
+                 use_microphone=True,
+                 spinner=True,
+                 level=logging.WARNING,
+
+                 # Realtime transcription parameters
+                 enable_realtime_transcription=False,
+                 use_main_model_for_realtime=False,
+                 realtime_model_type=INIT_MODEL_TRANSCRIPTION_REALTIME,
+                 realtime_processing_pause=INIT_REALTIME_PROCESSING_PAUSE,
+                 on_realtime_transcription_update=None,
+                 on_realtime_transcription_stabilized=None,
+
+                 # Voice activation parameters
+                 silero_sensitivity: float = INIT_SILERO_SENSITIVITY,
+                 silero_use_onnx: bool = False,
+                 silero_deactivity_detection: bool = False,
+                 webrtc_sensitivity: int = INIT_WEBRTC_SENSITIVITY,
+                 post_speech_silence_duration: float = (
+                     INIT_POST_SPEECH_SILENCE_DURATION
+                 ),
+                 min_length_of_recording: float = (
+                     INIT_MIN_LENGTH_OF_RECORDING
+                 ),
+                 min_gap_between_recordings: float = (
+                     INIT_MIN_GAP_BETWEEN_RECORDINGS
+                 ),
+                 pre_recording_buffer_duration: float = (
+                     INIT_PRE_RECORDING_BUFFER_DURATION
+                 ),
+                 on_vad_detect_start=None,
+                 on_vad_detect_stop=None,
+
+                 # Wake word parameters
+                 wakeword_backend: str = "pvporcupine",
+                 openwakeword_model_paths: str = None,
+                 openwakeword_inference_framework: str = "onnx",
+                 wake_words: str = "",
+                 wake_words_sensitivity: float = INIT_WAKE_WORDS_SENSITIVITY,
+                 wake_word_activation_delay: float = (
+                    INIT_WAKE_WORD_ACTIVATION_DELAY
+                 ),
+                 wake_word_timeout: float = INIT_WAKE_WORD_TIMEOUT,
+                 wake_word_buffer_duration: float = INIT_WAKE_WORD_BUFFER_DURATION,
+                 on_wakeword_detected=None,
+                 on_wakeword_timeout=None,
+                 on_wakeword_detection_start=None,
+                 on_wakeword_detection_end=None,
+                 on_recorded_chunk=None,
+                 debug_mode=False,
+                 handle_buffer_overflow: bool = INIT_HANDLE_BUFFER_OVERFLOW,
+                 beam_size: int = 5,
+                 beam_size_realtime: int = 3,
+                 buffer_size: int = BUFFER_SIZE,
+                 sample_rate: int = SAMPLE_RATE,
+                 initial_prompt: Optional[Union[str, Iterable[int]]] = None,
+                 suppress_tokens: Optional[List[int]] = [-1],
+                 print_transcription_time: bool = False,
+                 early_transcription_on_silence: int = 0,
+                 allowed_latency_limit: int = ALLOWED_LATENCY_LIMIT,
+                 no_log_file: bool = False,
+                 use_extended_logging: bool = False,
+
+                 # Server urls
+                 control_url: str = DEFAULT_CONTROL_URL,
+                 data_url: str = DEFAULT_DATA_URL,
+                 autostart_server: bool = True,
+                 ):
+
+        # Set instance variables from constructor parameters
+        self.model = model
+        self.language = language
+        self.compute_type = compute_type
+        self.input_device_index = input_device_index
+        self.gpu_device_index = gpu_device_index
+        self.device = device
+        self.on_recording_start = on_recording_start
+        self.on_recording_stop = on_recording_stop
+        self.on_transcription_start = on_transcription_start
+        self.ensure_sentence_starting_uppercase = ensure_sentence_starting_uppercase
+        self.ensure_sentence_ends_with_period = ensure_sentence_ends_with_period
+        self.use_microphone = use_microphone
+        self.spinner = spinner
+        self.level = level
+
+        # Real-time transcription parameters
+        self.enable_realtime_transcription = enable_realtime_transcription
+        self.use_main_model_for_realtime = use_main_model_for_realtime
+        self.realtime_model_type = realtime_model_type
+        self.realtime_processing_pause = realtime_processing_pause
+        self.on_realtime_transcription_update = on_realtime_transcription_update
+        self.on_realtime_transcription_stabilized = on_realtime_transcription_stabilized
+
+        # Voice activation parameters
+        self.silero_sensitivity = silero_sensitivity
+        self.silero_use_onnx = silero_use_onnx
+        self.silero_deactivity_detection = silero_deactivity_detection
+        self.webrtc_sensitivity = webrtc_sensitivity
+        self.post_speech_silence_duration = post_speech_silence_duration
+        self.min_length_of_recording = min_length_of_recording
+        self.min_gap_between_recordings = min_gap_between_recordings
+        self.pre_recording_buffer_duration = pre_recording_buffer_duration
+        self.on_vad_detect_start = on_vad_detect_start
+        self.on_vad_detect_stop = on_vad_detect_stop
+
+        # Wake word parameters
+        self.wakeword_backend = wakeword_backend
+        self.openwakeword_model_paths = openwakeword_model_paths
+        self.openwakeword_inference_framework = openwakeword_inference_framework
+        self.wake_words = wake_words
+        self.wake_words_sensitivity = wake_words_sensitivity
+        self.wake_word_activation_delay = wake_word_activation_delay
+        self.wake_word_timeout = wake_word_timeout
+        self.wake_word_buffer_duration = wake_word_buffer_duration
+        self.on_wakeword_detected = on_wakeword_detected
+        self.on_wakeword_timeout = on_wakeword_timeout
+        self.on_wakeword_detection_start = on_wakeword_detection_start
+        self.on_wakeword_detection_end = on_wakeword_detection_end
+        self.on_recorded_chunk = on_recorded_chunk
+        self.debug_mode = debug_mode
+        self.handle_buffer_overflow = handle_buffer_overflow
+        self.beam_size = beam_size
+        self.beam_size_realtime = beam_size_realtime
+        self.buffer_size = buffer_size
+        self.sample_rate = sample_rate
+        self.initial_prompt = initial_prompt
+        self.suppress_tokens = suppress_tokens
+        self.print_transcription_time = print_transcription_time
+        self.early_transcription_on_silence = early_transcription_on_silence
+        self.allowed_latency_limit = allowed_latency_limit
+        self.no_log_file = no_log_file
+        self.use_extended_logging = use_extended_logging
+
+        # Server URLs
+        self.control_url = control_url
+        self.data_url = data_url
+        self.autostart_server = autostart_server
+
+        # Instance variables
+        self.muted = False
+        self.recording_thread = None
+        self.is_running = True
+        self.connection_established = threading.Event()
+        self.recording_start = threading.Event()
+        self.final_text_ready = threading.Event()
+        self.realtime_text = ""
+        self.final_text = ""
+
+        self.request_counter = 0
+        self.pending_requests = {}  # Map from request_id to threading.Event and value
+
+        if self.debug_mode:
+            print("Checking STT server")
+        if not self.connect():
+            print("Failed to connect to the server.", file=sys.stderr)
+        else:
+            if self.debug_mode:
+                print("STT server is running and connected.")
+
+        if self.use_microphone:
+            self.start_recording()
+
+    def text(self, on_transcription_finished=None):
+        self.realtime_text = ""
+        self.submitted_realtime_text = ""
+        self.final_text = ""
+        self.final_text_ready.clear()
+
+        self.recording_start.set()
+
+        try:
+            total_wait_time = 0
+            wait_interval = 0.02  # Wait in small intervals (20 ms)
+            max_wait_time = 60  # Timeout after 60 seconds
+
+            while total_wait_time < max_wait_time:
+                if self.final_text_ready.wait(timeout=wait_interval):
+                    break  # Break if transcription is ready
+                
+                # if not self.realtime_text == self.submitted_realtime_text:
+                #     if self.on_realtime_transcription_update:
+                #         self.on_realtime_transcription_update(self.realtime_text)
+                #     self.submitted_realtime_text = self.realtime_text
+
+                total_wait_time += wait_interval
+                
+                # Check if a manual interrupt has occurred
+                if total_wait_time >= max_wait_time:
+                    if self.debug_mode:
+                        print("Timeout while waiting for text from the server.")
+                    self.recording_start.clear()
+                    if on_transcription_finished:
+                        threading.Thread(target=on_transcription_finished, args=("",)).start()
+                    return ""
+
+            self.recording_start.clear()
+
+            if on_transcription_finished:
+                threading.Thread(target=on_transcription_finished, args=(self.final_text,)).start()
+
+            return self.final_text
+
+        except KeyboardInterrupt:
+            if self.debug_mode:
+                print("KeyboardInterrupt in record_and_send_audio, exiting...")
+            raise KeyboardInterrupt
+
+        except Exception as e:
+            print(f"Error in AudioToTextRecorderClient.text(): {e}")
+            return ""
+
+    def feed_audio(self, chunk, original_sample_rate=16000):
+        metadata = {"sampleRate": original_sample_rate}
+        metadata_json = json.dumps(metadata)
+        metadata_length = len(metadata_json)
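+        # Frame layout: 4-byte little-endian metadata length, JSON metadata, then the raw PCM chunk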
+        message = struct.pack('<I', metadata_length) + metadata_json.encode('utf-8') + chunk
+
+        if self.is_running:
+            self.data_ws.send(message, opcode=websocket.ABNF.OPCODE_BINARY)
+
+    def set_microphone(self, microphone_on=True):
+        """
+        Set the microphone on or off.
+        """
+        self.muted = not microphone_on
+
+    def abort(self):
+        self.call_method("abort")
+
+    def wakeup(self):
+        self.call_method("wakeup")
+
+    def clear_audio_queue(self):
+        self.call_method("clear_audio_queue")
+
+    def stop(self):
+        self.call_method("stop")
+
+    def connect(self):
+        if not self.ensure_server_running():
+            print("Cannot start STT server. Exiting.")
+            return False
+        
+        try:
+            # Connect to control WebSocket
+            self.control_ws = websocket.WebSocketApp(self.control_url,
+                                                     on_message=self.on_control_message,
+                                                     on_error=self.on_error,
+                                                     on_close=self.on_close,
+                                                     on_open=self.on_control_open)
+
+            self.control_ws_thread = threading.Thread(target=self.control_ws.run_forever)
+            self.control_ws_thread.daemon = False
+            self.control_ws_thread.start()
+
+            # Connect to data WebSocket
+            self.data_ws = websocket.WebSocketApp(self.data_url,
+                                                  on_message=self.on_data_message,
+                                                  on_error=self.on_error,
+                                                  on_close=self.on_close,
+                                                  on_open=self.on_data_open)
+
+            self.data_ws_thread = threading.Thread(target=self.data_ws.run_forever)
+            self.data_ws_thread.daemon = False
+            self.data_ws_thread.start()
+
+            # Wait for the connections to be established
+            if not self.connection_established.wait(timeout=10):
+                print("Timeout while connecting to the server.")
+                return False
+
+            if self.debug_mode:
+                print("WebSocket connections established successfully.")
+            return True
+        except Exception as e:
+            print(f"Error while connecting to the server: {e}")
+            return False
+
+    def start_server(self):
+        args = ['stt-server']
+
+        # Map constructor parameters to server arguments
+        if self.model:
+            args += ['--model', self.model]
+        if self.realtime_model_type:
+            args += ['--realtime_model_type', self.realtime_model_type]
+        if self.language:
+            args += ['--language', self.language]
+        if self.silero_sensitivity is not None:
+            args += ['--silero_sensitivity', str(self.silero_sensitivity)]
+        if self.silero_use_onnx:
+            args.append('--silero_use_onnx')  # flag, no need for True/False
+        if self.webrtc_sensitivity is not None:
+            args += ['--webrtc_sensitivity', str(self.webrtc_sensitivity)]
+        if self.min_length_of_recording is not None:
+            args += ['--min_length_of_recording', str(self.min_length_of_recording)]
+        if self.min_gap_between_recordings is not None:
+            args += ['--min_gap_between_recordings', str(self.min_gap_between_recordings)]
+        if self.realtime_processing_pause is not None:
+            args += ['--realtime_processing_pause', str(self.realtime_processing_pause)]
+        if self.early_transcription_on_silence is not None:
+            args += ['--early_transcription_on_silence', str(self.early_transcription_on_silence)]
+        if self.silero_deactivity_detection:
+            args.append('--silero_deactivity_detection')  # flag, no need for True/False
+        if self.beam_size is not None:
+            args += ['--beam_size', str(self.beam_size)]
+        if self.beam_size_realtime is not None:
+            args += ['--beam_size_realtime', str(self.beam_size_realtime)]
+        if self.wake_words is not None:
+            args += ['--wake_words', str(self.wake_words)]
+        if self.wake_words_sensitivity is not None:
+            args += ['--wake_words_sensitivity', str(self.wake_words_sensitivity)]
+        if self.wake_word_timeout is not None:
+            args += ['--wake_word_timeout', str(self.wake_word_timeout)]
+        if self.wake_word_activation_delay is not None:
+            args += ['--wake_word_activation_delay', str(self.wake_word_activation_delay)]
+        if self.wakeword_backend is not None:
+            args += ['--wakeword_backend', str(self.wakeword_backend)]
+        if self.openwakeword_model_paths:
+            args += ['--openwakeword_model_paths', str(self.openwakeword_model_paths)]
+        if self.openwakeword_inference_framework is not None:
+            args += ['--openwakeword_inference_framework', str(self.openwakeword_inference_framework)]
+        if self.wake_word_buffer_duration is not None:
+            args += ['--wake_word_buffer_duration', str(self.wake_word_buffer_duration)]
+        if self.use_main_model_for_realtime:
+            args.append('--use_main_model_for_realtime')  # flag, no need for True/False
+        if self.use_extended_logging:
+            args.append('--use_extended_logging')  # flag, no need for True/False
+
+        if self.control_url:
+            parsed_control_url = urlparse(self.control_url)
+            if parsed_control_url.port:
+                args += ['--control_port', str(parsed_control_url.port)]
+        if self.data_url:
+            parsed_data_url = urlparse(self.data_url)
+            if parsed_data_url.port:
+                args += ['--data_port', str(parsed_data_url.port)]
+        if self.initial_prompt:
+            sanitized_prompt = self.initial_prompt.replace("\n", "\\n")
+            args += ['--initial_prompt', sanitized_prompt]
+
+        # Start the subprocess with the mapped arguments
+        if os.name == 'nt':  # Windows
+            cmd = 'start /min cmd /c ' + subprocess.list2cmdline(args)
+            if debug_mode:
+                print(f"Opening server with cli command: {cmd}")
+            subprocess.Popen(cmd, shell=True)
+        else:  # Unix-like systems
+            subprocess.Popen(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, start_new_session=True)
+        print("STT server start command issued. Please wait a moment for it to initialize.", file=sys.stderr)
+
+    def is_server_running(self):
+        parsed_url = urlparse(self.control_url)
+        host = parsed_url.hostname
+        port = parsed_url.port or 80
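+        # Treat a successful TCP connect to the control port as proof that the server is running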
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            return s.connect_ex((host, port)) == 0
+
+    def ensure_server_running(self):
+        if not self.is_server_running():
+            if self.debug_mode:
+                print("STT server is not running.", file=sys.stderr)
+            if self.autostart_server or self.ask_to_start_server():
+                self.start_server()
+                if self.debug_mode:
+                    print("Waiting for STT server to start...", file=sys.stderr)
+                for _ in range(20):  # Wait up to 20 seconds
+                    if self.is_server_running():
+                        if self.debug_mode:
+                            print("STT server started successfully.", file=sys.stderr)
+                        time.sleep(2)  # Give the server a moment to fully initialize
+                        return True
+                    time.sleep(1)
+                print("Failed to start STT server.", file=sys.stderr)
+                return False
+            else:
+                print("STT server is required. Please start it manually.", file=sys.stderr)
+                return False
+        return True
+
+    def start_recording(self):
+        self.recording_thread = threading.Thread(target=self.record_and_send_audio)
+        self.recording_thread.daemon = False
+        self.recording_thread.start()
+
+    def setup_audio(self):
+        try:
+            self.audio_interface = pyaudio.PyAudio()
+            self.input_device_index = None
+            try:
+                default_device = self.audio_interface.get_default_input_device_info()
+                self.input_device_index = default_device['index']
+            except OSError as e:
+                print(f"No default input device found: {e}")
+                return False
+
+            self.device_sample_rate = 16000  # Try 16000 Hz first
+
+            try:
+                self.stream = self.audio_interface.open(
+                    format=FORMAT,
+                    channels=CHANNELS,
+                    rate=self.device_sample_rate,
+                    input=True,
+                    frames_per_buffer=CHUNK,
+                    input_device_index=self.input_device_index,
+                )
+                if self.debug_mode:
+                    print(f"Audio recording initialized successfully at {self.device_sample_rate} Hz")
+                return True
+            except Exception as e:
+                print(f"Failed to initialize audio stream at {self.device_sample_rate} Hz: {e}")
+                return False
+
+        except Exception as e:
+            print(f"Error initializing audio recording: {e}")
+            if self.audio_interface:
+                self.audio_interface.terminate()
+            return False
+
+    def record_and_send_audio(self):
+        try:
+            if not self.setup_audio():
+                raise Exception("Failed to set up audio recording.")
+
+            if self.debug_mode:
+                print("Recording and sending audio...")
+
+            while self.is_running:
+                if self.muted:
+                    time.sleep(0.01)
+                    continue
+
+                try:
+                    audio_data = self.stream.read(CHUNK)
+
+                    if self.on_recorded_chunk:
+                        self.on_recorded_chunk(audio_data)
+
+                    if self.muted:
+                        continue
+
+                    if self.recording_start.is_set():
+                        metadata = {"sampleRate": self.device_sample_rate}
+                        metadata_json = json.dumps(metadata)
+                        metadata_length = len(metadata_json)
+                        message = struct.pack('<I', metadata_length) + metadata_json.encode('utf-8') + audio_data
+
+                        if self.is_running:
+                            if log_outgoing_chunks:
+                                print(".", flush=True, end='')
+                            self.data_ws.send(message, opcode=websocket.ABNF.OPCODE_BINARY)
+                except KeyboardInterrupt:  # handle manual interruption (Ctrl+C)
+                    if self.debug_mode:
+                        print("KeyboardInterrupt in record_and_send_audio, exiting...")
+                    break
+                except Exception as e:
+                    print(f"Error sending audio data: {e}")
+                    break  # Exit the recording loop
+
+        except Exception as e:
+            print(f"Error in record_and_send_audio: {e}")
+        finally:
+            self.cleanup_audio()
+
+    def cleanup_audio(self):
+        try:
+            if self.stream:
+                self.stream.stop_stream()
+                self.stream.close()
+                self.stream = None
+            if self.audio_interface:
+                self.audio_interface.terminate()
+                self.audio_interface = None
+        except Exception as e:
+            print(f"Error cleaning up audio resources: {e}")
+
+    def on_control_message(self, ws, message):
+        try:
+            data = json.loads(message)
+            # Handle server response with status
+            if 'status' in data:
+                if data['status'] == 'success':
+                    if 'parameter' in data and 'value' in data:
+                        request_id = data.get('request_id')
+                        if request_id is not None and request_id in self.pending_requests:
+                            if self.debug_mode:
+                                print(f"Parameter {data['parameter']} = {data['value']}")
+                            self.pending_requests[request_id]['value'] = data['value']
+                            self.pending_requests[request_id]['event'].set()
+                elif data['status'] == 'error':
+                    print(f"Server Error: {data.get('message', '')}")
+            else:
+                print(f"Unknown control message format: {data}")
+        except json.JSONDecodeError:
+            print(f"Received non-JSON control message: {message}")
+        except Exception as e:
+            print(f"Error processing control message: {e}")
+
+    # Handle real-time transcription and full sentence updates
+    def on_data_message(self, ws, message):
+        try:
+            data = json.loads(message)
+            # Handle real-time transcription updates
+            if data.get('type') == 'realtime':
+                if data['text'] != self.realtime_text:
+                    self.realtime_text = data['text']
+
+                    timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                    print(f"Realtime text [{timestamp}]: {bcolors.OKCYAN}{self.realtime_text}{bcolors.ENDC}")
+
+                    if self.on_realtime_transcription_update:
+                        # Call the callback in a new thread to avoid blocking
+                        threading.Thread(
+                            target=self.on_realtime_transcription_update,
+                            args=(self.realtime_text,)
+                        ).start()
+
+            # Handle full sentences
+            elif data.get('type') == 'fullSentence':
+                self.final_text = data['text']
+                self.final_text_ready.set()
+
+            elif data.get('type') == 'recording_start':
+                if self.on_recording_start:
+                    self.on_recording_start()
+            elif data.get('type') == 'recording_stop':
+                if self.on_recording_stop:
+                    self.on_recording_stop()
+            elif data.get('type') == 'transcription_start':
+                if self.on_transcription_start:
+                    self.on_transcription_start()
+            elif data.get('type') == 'vad_detect_start':
+                if self.on_vad_detect_start:
+                    self.on_vad_detect_start()
+            elif data.get('type') == 'vad_detect_stop':
+                if self.on_vad_detect_stop:
+                    self.on_vad_detect_stop()
+            elif data.get('type') == 'wakeword_detected':
+                if self.on_wakeword_detected:
+                    self.on_wakeword_detected()
+            elif data.get('type') == 'wakeword_detection_start':
+                if self.on_wakeword_detection_start:
+                    self.on_wakeword_detection_start()
+            elif data.get('type') == 'wakeword_detection_end':
+                if self.on_wakeword_detection_end:
+                    self.on_wakeword_detection_end()
+            elif data.get('type') == 'recorded_chunk':
+                pass
+
+            else:
+                print(f"Unknown data message format: {data}")
+
+        except json.JSONDecodeError:
+            print(f"Received non-JSON data message: {message}")
+        except Exception as e:
+            print(f"Error processing data message: {e}")
+
+    def on_error(self, ws, error):
+        print(f"WebSocket error: {error}")
+
+    def on_close(self, ws, close_status_code, close_msg):
+        if self.debug_mode:
+            if ws == self.data_ws:
+                print(f"Data WebSocket connection closed: {close_status_code} - {close_msg}")
+            elif ws == self.control_ws:
+                print(f"Control WebSocket connection closed: {close_status_code} - {close_msg}")
+        
+        self.is_running = False
+
+    def on_control_open(self, ws):
+        if self.debug_mode:
+            print("Control WebSocket connection opened.")
+        self.connection_established.set()
+
+    def on_data_open(self, ws):
+        if self.debug_mode:
+            print("Data WebSocket connection opened.")
+
+    def set_parameter(self, parameter, value):
+        command = {
+            "command": "set_parameter",
+            "parameter": parameter,
+            "value": value
+        }
+        self.control_ws.send(json.dumps(command))
+
+    def get_parameter(self, parameter):
+        # Generate a unique request_id
+        request_id = self.request_counter
+        self.request_counter += 1
+
+        # Prepare the command with the request_id
+        command = {
+            "command": "get_parameter",
+            "parameter": parameter,
+            "request_id": request_id
+        }
+
+        # Create an event to wait for the response
+        event = threading.Event()
+        self.pending_requests[request_id] = {'event': event, 'value': None}
+
+        # Send the command to the server
+        self.control_ws.send(json.dumps(command))
+
+        # Wait for the response or timeout after 5 seconds
+        if event.wait(timeout=5):
+            value = self.pending_requests[request_id]['value']
+            # Clean up the pending request
+            del self.pending_requests[request_id]
+            return value
+        else:
+            print(f"Timeout waiting for get_parameter {parameter}")
+            # Clean up the pending request
+            del self.pending_requests[request_id]
+            return None
+
+    def call_method(self, method, args=None, kwargs=None):
+        command = {
+            "command": "call_method",
+            "method": method,
+            "args": args or [],
+            "kwargs": kwargs or {}
+        }
+        self.control_ws.send(json.dumps(command))
+
+    def shutdown(self):
+        self.is_running = False
+        #self.stop_event.set()
+        if self.control_ws:
+            self.control_ws.close()
+        if self.data_ws:
+            self.data_ws.close()
+
+        # Join threads to ensure they finish before exiting
+        if self.control_ws_thread:
+            self.control_ws_thread.join()
+        if self.data_ws_thread:
+            self.data_ws_thread.join()
+        if self.recording_thread:
+            self.recording_thread.join()
+
+        # Clean up audio resources
+        if self.stream:
+            self.stream.stop_stream()
+            self.stream.close()
+        if self.audio_interface:
+            self.audio_interface.terminate()
+
+    def __enter__(self):
+        """
+        Method to set up the context manager protocol.
+
+        This enables the instance to be used in a `with` statement, ensuring
+        proper resource management. When the `with` block is entered, this
+        method is automatically called.
+
+        Returns:
+            self: The current instance of the class.
+        """
+        return self
+
+    def __exit__(self, exc_type, exc_value, traceback):
+        """
+        Method to define behavior when the context manager protocol exits.
+
+        This is called when exiting the `with` block and ensures that any
+        necessary cleanup or resource release processes are executed, such as
+        shutting down the system properly.
+
+        Args:
+            exc_type (Exception or None): The type of the exception that
+              caused the context to be exited, if any.
+            exc_value (Exception or None): The exception instance that caused
+              the context to be exited, if any.
+            traceback (Traceback or None): The traceback corresponding to the
+              exception, if any.
+        """
+        self.shutdown()

+ 434 - 0
RealtimeSTT_server/README.md

@@ -0,0 +1,434 @@
+# RealtimeSTT Server and Client
+
+This directory contains the server and client implementations for the RealtimeSTT library, providing real-time speech-to-text transcription with WebSocket interfaces. The server allows clients to connect via WebSocket to send audio data and receive real-time transcription updates. The client handles communication with the server, allowing audio recording, parameter management, and control commands.
+
+## Table of Contents
+
+- [Features](#features)
+- [Installation](#installation)
+- [Server Usage](#server-usage)
+  - [Starting the Server](#starting-the-server)
+  - [Server Parameters](#server-parameters)
+- [Client Usage](#client-usage)
+  - [Starting the Client](#starting-the-client)
+  - [Client Parameters](#client-parameters)
+- [WebSocket Interface](#websocket-interface)
+- [Examples](#examples)
+  - [Starting the Server and Client](#starting-the-server-and-client)
+  - [Setting Parameters](#setting-parameters)
+  - [Retrieving Parameters](#retrieving-parameters)
+  - [Calling Server Methods](#calling-server-methods)
+- [Contributing](#contributing)
+- [License](#license)
+
+## Features
+
+- **Real-Time Transcription**: Provides real-time speech-to-text transcription using pre-configured or user-defined STT models.
+- **WebSocket Communication**: Makes use of WebSocket connections for control commands and data handling.
+- **Flexible Recording Options**: Supports configurable pauses for sentence detection and various voice activity detection (VAD) methods.
+- **VAD Support**: Includes support for Silero and WebRTC VAD for robust voice activity detection.
+- **Wake Word Detection**: Capable of detecting wake words to initiate transcription.
+- **Configurable Parameters**: Allows fine-tuning of recording and transcription settings via command-line arguments or control commands.
+
+## Installation
+
+Ensure you have Python 3.8 or higher installed. Install the required packages using:
+
+```bash
+pip install git+https://github.com/KoljaB/RealtimeSTT.git@dev
+```
+
+## Server Usage
+
+### Starting the Server
+
+Start the server using the command-line interface:
+
+```bash
+stt-server [OPTIONS]
+```
+
+The server will initialize and begin listening for WebSocket connections on the specified control and data ports.
+
+### Server Parameters
+
+You can configure the server using the following command-line arguments:
+
+### Available Parameters:
+
+#### `-m`, `--model`
+
+- **Type**: `str`
+- **Default**: `'large-v2'`
+- **Description**: Path to the Speech-to-Text (STT) model or specify a model size. Options include: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large-v1`, `large-v2`, or any HuggingFace CTranslate2 STT model such as `deepdml/faster-whisper-large-v3-turbo-ct2`.
+
+#### `-r`, `--rt-model`, `--realtime_model_type`
+
+- **Type**: `str`
+- **Default**: `'tiny.en'`
+- **Description**: Model size for real-time transcription. Options are the same as for `--model`. This is used only if real-time transcription is enabled (`--enable_realtime_transcription`).
+
+#### `-l`, `--lang`, `--language`
+
+- **Type**: `str`
+- **Default**: `'en'`
+- **Description**: Language code for the STT model to transcribe in a specific language. Leave this empty for auto-detection based on input audio. Default is `'en'`. [List of supported language codes](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L11-L110).
+
+#### `-i`, `--input-device`, `--input_device_index`
+
+- **Type**: `int`
+- **Default**: `1`
+- **Description**: Index of the audio input device to use. Use this option to specify a particular microphone or audio input device based on your system.
+
+#### `-c`, `--control`, `--control_port`
+
+- **Type**: `int`
+- **Default**: `8011`
+- **Description**: The port number used for the control WebSocket connection. Control connections are used to send and receive commands to the server.
+
+#### `-d`, `--data`, `--data_port`
+
+- **Type**: `int`
+- **Default**: `8012`
+- **Description**: The port number used for the data WebSocket connection. Data connections are used to send audio data and receive transcription updates in real time.
+
+#### `-w`, `--wake_words`
+
+- **Type**: `str`
+- **Default**: `""` (empty string)
+- **Description**: Specify the wake word(s) that will trigger the server to start listening. For example, setting this to `"Jarvis"` will make the system start transcribing when it detects the wake word `"Jarvis"`.
+
+#### `-D`, `--debug`
+
+- **Action**: `store_true`
+- **Description**: Enable debug logging for detailed server operations.
+
+#### `-W`, `--write`
+
+- **Metavar**: `FILE`
+- **Description**: Save received audio to a WAV file.
+
+#### `--silero_sensitivity`
+
+- **Type**: `float`
+- **Default**: `0.05`
+- **Description**: Sensitivity level for Silero Voice Activity Detection (VAD), with a range from `0` to `1`. Lower values make the model less sensitive, useful for noisy environments.
+
+#### `--silero_use_onnx`
+
+- **Action**: `store_true`
+- **Default**: `False`
+- **Description**: Enable the ONNX version of the Silero model for faster performance with lower resource usage.
+
+#### `--webrtc_sensitivity`
+
+- **Type**: `int`
+- **Default**: `3`
+- **Description**: Sensitivity level for WebRTC Voice Activity Detection (VAD), with a range from `0` to `3`. Higher values make the model less sensitive, useful for cleaner environments.
+
+#### `--min_length_of_recording`
+
+- **Type**: `float`
+- **Default**: `1.1`
+- **Description**: Minimum duration of valid recordings in seconds. This prevents very short recordings from being processed, which could be caused by noise or accidental sounds.
+
+#### `--min_gap_between_recordings`
+
+- **Type**: `float`
+- **Default**: `0`
+- **Description**: Minimum time (in seconds) between consecutive recordings. Setting this helps avoid overlapping recordings when there's a brief silence between them.
+
+#### `--enable_realtime_transcription`
+
+- **Action**: `store_true`
+- **Default**: `True`
+- **Description**: Enable continuous real-time transcription of audio as it is received. When enabled, transcriptions are sent in near real-time.
+
+#### `--realtime_processing_pause`
+
+- **Type**: `float`
+- **Default**: `0.02`
+- **Description**: Time interval (in seconds) between processing audio chunks for real-time transcription. Lower values increase responsiveness but may put more load on the CPU.
+
+#### `--silero_deactivity_detection`
+
+- **Action**: `store_true`
+- **Default**: `True`
+- **Description**: Use the Silero model for end-of-speech detection. This option can provide more robust silence detection in noisy environments, though it consumes more GPU resources.
+
+#### `--early_transcription_on_silence`
+
+- **Type**: `float`
+- **Default**: `0.2`
+- **Description**: Start transcription after the specified seconds of silence. This is useful when you want to trigger transcription mid-speech when there is a brief pause. Should be lower than `post_speech_silence_duration`. Set to `0` to disable.
+
+#### `--beam_size`
+
+- **Type**: `int`
+- **Default**: `5`
+- **Description**: Beam size for the main transcription model. Larger values may improve transcription accuracy but increase the processing time.
+
+#### `--beam_size_realtime`
+
+- **Type**: `int`
+- **Default**: `3`
+- **Description**: Beam size for the real-time transcription model. A smaller beam size allows for faster real-time processing but may reduce accuracy.
+
+#### `--initial_prompt`
+
+- **Type**: `str`
+- **Default**:
+
+  ```
+  End incomplete sentences with ellipses. Examples: 
+  Complete: The sky is blue. 
+  Incomplete: When the sky... 
+  Complete: She walked home. 
+  Incomplete: Because he...
+  ```
+
+- **Description**: Initial prompt that guides the transcription model to produce transcriptions in a particular style or format. The default provides instructions for handling sentence completions and ellipsis usage.
+
+#### `--end_of_sentence_detection_pause`
+
+- **Type**: `float`
+- **Default**: `0.45`
+- **Description**: The duration of silence (in seconds) that the model should interpret as the end of a sentence. This helps the system detect when to finalize the transcription of a sentence.
+
+#### `--unknown_sentence_detection_pause`
+
+- **Type**: `float`
+- **Default**: `0.7`
+- **Description**: The duration of pause (in seconds) that the model should interpret as an incomplete or unknown sentence. This is useful for identifying when a sentence is trailing off or unfinished.
+
+#### `--mid_sentence_detection_pause`
+
+- **Type**: `float`
+- **Default**: `2.0`
+- **Description**: The duration of pause (in seconds) that the model should interpret as a mid-sentence break. Longer pauses can indicate a pause in speech but not necessarily the end of a sentence.
+
+#### `--wake_words_sensitivity`
+
+- **Type**: `float`
+- **Default**: `0.5`
+- **Description**: Sensitivity level for wake word detection, with a range from `0` (most sensitive) to `1` (least sensitive). Adjust this value based on your environment to ensure reliable wake word detection.
+
+#### `--wake_word_timeout`
+
+- **Type**: `float`
+- **Default**: `5.0`
+- **Description**: Maximum time in seconds that the system will wait for a wake word before timing out. After this timeout, the system stops listening for wake words until reactivated.
+
+#### `--wake_word_activation_delay`
+
+- **Type**: `float`
+- **Default**: `20`
+- **Description**: The delay in seconds before the wake word detection is activated after the system starts listening. This prevents false positives during the start of a session.
+
+#### `--wakeword_backend`
+
+- **Type**: `str`
+- **Default**: `'none'`
+- **Description**: The backend used for wake word detection. You can specify different backends such as `"default"` or any custom implementations depending on your setup.
+
+#### `--openwakeword_model_paths`
+
+- **Type**: `str` (accepts multiple values)
+- **Description**: A list of file paths to OpenWakeWord models. This is useful if you are using OpenWakeWord for wake word detection and need to specify custom models.
+
+#### `--openwakeword_inference_framework`
+
+- **Type**: `str`
+- **Default**: `'tensorflow'`
+- **Description**: The inference framework to use for OpenWakeWord models. Supported frameworks could include `"tensorflow"`, `"pytorch"`, etc.
+
+#### `--wake_word_buffer_duration`
+
+- **Type**: `float`
+- **Default**: `1.0`
+- **Description**: Duration of the buffer in seconds for wake word detection. This sets how long the system will store the audio before and after detecting the wake word.
+
+#### `--use_main_model_for_realtime`
+
+- **Action**: `store_true`
+- **Description**: Enable this option if you want to use the main model for real-time transcription, instead of the smaller, faster real-time model. Using the main model may provide better accuracy but at the cost of higher processing time.
+
+#### `--use_extended_logging`
+
+- **Action**: `store_true`
+- **Description**: Writes extensive log messages for the recording worker that processes the audio chunks.
+
+#### `--logchunks`
+
+- **Action**: `store_true`
+- **Description**: Enable logging of incoming audio chunks (periods).
+
+**Example:**
+
+```bash
+stt-server -m small.en -l en -c 9001 -d 9002
+```
+
+## Client Usage
+
+### Starting the Client
+
+Start the client using:
+
+```bash
+stt [OPTIONS]
+```
+
+The client connects to the STT server's control and data WebSocket URLs to facilitate real-time speech transcription and control.
+
+### Available Parameters for STT Client:
+
+#### `-c`, `--control`, `--control_url`
+
+- **Type**: `str`
+- **Default**: `DEFAULT_CONTROL_URL`
+- **Description**: Specifies the STT control WebSocket URL used for sending and receiving commands to/from the STT server.
+
+#### `-d`, `--data`, `--data_url`
+
+- **Type**: `str`
+- **Default**: `DEFAULT_DATA_URL`
+- **Description**: Specifies the STT data WebSocket URL used for transmitting audio data and receiving transcription updates.
+
+#### `-D`, `--debug`
+
+- **Action**: `store_true`
+- **Description**: Enables debug mode, providing detailed output for server-client interactions.
+
+#### `-n`, `--norealtime`
+
+- **Action**: `store_true`
+- **Description**: Disables real-time output, preventing transcription updates from being shown live as they are processed.
+
+#### `-W`, `--write`
+
+- **Metavar**: `FILE`
+- **Description**: Saves recorded audio to a specified WAV file for later playback or analysis.
+
+#### `-s`, `--set`
+
+- **Type**: `list`
+- **Metavar**: `('PARAM', 'VALUE')`
+- **Action**: `append`
+- **Description**: Sets a parameter for the recorder. Can be used multiple times to set different parameters. Each occurrence must be followed by the parameter name and value.
+
+#### `-m`, `--method`
+
+- **Type**: `list`
+- **Metavar**: `METHOD`
+- **Action**: `append`
+- **Description**: Calls a specified method on the recorder with optional arguments. Multiple methods can be invoked by repeating this parameter.
+
+#### `-g`, `--get`
+
+- **Type**: `list`
+- **Metavar**: `PARAM`
+- **Action**: `append`
+- **Description**: Retrieves the value of a specified recorder parameter. Can be used multiple times to get multiple parameter values.
+
+#### `-l`, `--loop`
+
+- **Action**: `store_true`
+- **Description**: Runs the client in a loop, allowing it to continuously transcribe speech without exiting after each session.
+
+**Example:**
+
+```bash
+stt -s silero_sensitivity 0.1 
+stt -g silero_sensitivity
+```
+
+## WebSocket Interface
+
+The server uses two WebSocket connections:
+
+1. **Control WebSocket**: Used to send and receive control commands, such as setting parameters or invoking recorder methods.
+
+2. **Data WebSocket**: Used to send audio data for transcription and receive real-time transcription updates.
+
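+For reference, the sketch below shows what these messages look like on the wire. It is a minimal sketch based on the formats used by `stt_cli_client.py` and `index.html`; the default URLs `ws://127.0.0.1:8011` (control) and `ws://127.0.0.1:8012` (data) are assumed.
+
+```python
+import asyncio, json, struct
+import websockets  # same library the server installs and uses
+
+async def demo():
+    # Control WebSocket: JSON commands (set_parameter, get_parameter, call_method)
+    async with websockets.connect("ws://127.0.0.1:8011") as control:
+        await control.send(json.dumps({
+            "command": "set_parameter",
+            "parameter": "silero_sensitivity",
+            "value": 0.1,
+        }))
+        print(await control.recv())  # JSON status response from the server
+
+    # Data WebSocket: binary frames = 4-byte little-endian metadata length,
+    # UTF-8 JSON metadata ({"sampleRate": ...}), then raw 16-bit PCM mono audio
+    async with websockets.connect("ws://127.0.0.1:8012") as data:
+        metadata = json.dumps({"sampleRate": 16000}).encode("utf-8")
+        chunk = b"\x00\x00" * 1024  # one dummy chunk of PCM16 silence
+        await data.send(struct.pack("<I", len(metadata)) + metadata + chunk)
+        # Transcription updates arrive as JSON, e.g. {"type": "realtime", "text": "..."}
+        print(await data.recv())
+
+asyncio.run(demo())
+```
+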
+## Examples
+
+### Starting the Server and Client
+
+1. **Start the Server with Default Settings:**
+
+   ```bash
+   stt-server
+   ```
+
+2. **Start the Client with Default Settings:**
+
+   ```bash
+   stt
+   ```
+
+### Setting Parameters
+
+Set the Silero sensitivity to `0.1`:
+
+```bash
+stt -s silero_sensitivity 0.1
+```
+
+### Retrieving Parameters
+
+Get the current Silero sensitivity value:
+
+```bash
+stt -g silero_sensitivity
+```
+
+### Calling Server Methods
+
+Call the `set_microphone` method on the recorder:
+
+```bash
+stt -m set_microphone False
+```
+
+### Running in Debug Mode
+
+Enable debug mode for detailed logging:
+
+```bash
+stt -D
+```
+
+## Contributing
+
+Contributions are welcome! Please open an issue or submit a pull request on GitHub.
+
+## License
+
+This project is licensed under the MIT License. See the [LICENSE](../LICENSE) file for details.
+
+# Additional Information
+
+The server and client scripts are designed to work seamlessly together, enabling efficient real-time speech transcription with minimal latency. The flexibility in configuration allows users to tailor the system to specific needs, such as adjusting sensitivity levels for different environments or selecting appropriate STT models based on resource availability.
+
+**Note:** The server must be running before the client can connect. The client checks whether a server is reachable and, if not, offers to start one for you.
+
+# Troubleshooting
+
+- **Server Not Starting:** If the server fails to start, check that all dependencies are installed and that the specified ports are not in use.
+
+- **Audio Issues:** Ensure that the correct audio input device index is specified if using a device other than the default.
+
+- **WebSocket Connection Errors:** Verify that the control and data URLs are correct and that the server is listening on those ports.
+
+# Contact
+
+For questions or support, please open an issue on the [GitHub repository](https://github.com/KoljaB/RealtimeSTT/issues).
+
+# Acknowledgments
+
+Special thanks to the contributors of the RealtimeSTT library and the open-source community for their continuous support.
+
+---
+
+**Disclaimer:** This software is provided "as is", without warranty of any kind, express or implied. Use it at your own risk.

+ 0 - 0
RealtimeSTT_server/__init__.py


+ 242 - 0
RealtimeSTT_server/index.html

@@ -0,0 +1,242 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Browser STT Client</title>
+  <style>
+    body {
+      background-color: #f4f4f9;
+      color: #333;
+      font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      height: 100vh;
+      margin: 0;
+    }
+    #container {
+      display: flex;
+      flex-direction: column;
+      align-items: center;
+      width: 100%;
+      max-width: 700px;
+      padding: 20px;
+      box-sizing: border-box;
+      gap: 20px; /* Add more vertical space between items */
+      height: 90%; /* Fixed height to prevent layout shift */
+    }
+    #status {
+      color: #0056b3;
+      font-size: 20px;
+      text-align: center;
+    }
+    #transcriptionContainer {
+      height: 90px; /* Fixed height for approximately 3 lines of text */
+      overflow-y: auto;
+      width: 100%;
+      padding: 10px;
+      box-sizing: border-box;
+      background-color: #f9f9f9;
+      border: 1px solid #ddd;
+      border-radius: 5px;
+    }
+    #transcription {
+      font-size: 18px;
+      line-height: 1.6;
+      color: #333;
+      word-wrap: break-word;
+    }
+    #fullTextContainer {
+      height: 150px; /* Fixed height to prevent layout shift */
+      overflow-y: auto;
+      width: 100%;
+      padding: 10px;
+      box-sizing: border-box;
+      background-color: #f9f9f9;
+      border: 1px solid #ddd;
+      border-radius: 5px;
+    }
+    #fullText {
+      color: #4CAF50;
+      font-size: 18px;
+      font-weight: 600;
+      word-wrap: break-word;
+    }
+    .last-word {
+      color: #007bff;
+      font-weight: 600;
+    }
+    button {
+      padding: 12px 24px;
+      font-size: 16px;
+      cursor: pointer;
+      border: none;
+      border-radius: 5px;
+      margin: 5px;
+      transition: background-color 0.3s ease;
+      color: #fff;
+      background-color: #0056b3;
+      box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
+    }
+    button:hover {
+      background-color: #007bff;
+    }
+    button:disabled {
+      background-color: #cccccc;
+      cursor: not-allowed;
+    }
+  </style>
+</head>
+<body>
+  <div id="container">
+    <div id="status">Press "Start Recording"...</div>
+    <button id="startButton" onclick="startRecording()">Start Recording</button>
+    <button id="stopButton" onclick="stopRecording()" disabled>Stop Recording</button>
+    <div id="transcriptionContainer">
+      <div id="transcription" class="realtime"></div>
+    </div>
+    <div id="fullTextContainer">
+      <div id="fullText"></div>
+    </div>
+  </div>
+
+  <script>
+    const statusDiv = document.getElementById("status");
+    const transcriptionDiv = document.getElementById("transcription");
+    const fullTextDiv = document.getElementById("fullText");
+    const startButton = document.getElementById("startButton");
+    const stopButton = document.getElementById("stopButton");
+
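+    // Note: this demo page only uses the data socket; controlURL is declared but not used here.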
+    const controlURL = "ws://127.0.0.1:8011";
+    const dataURL = "ws://127.0.0.1:8012";
+    let dataSocket;
+    let audioContext;
+    let mediaStream;
+    let mediaProcessor;
+
+    // Connect to the data WebSocket
+    function connectToDataSocket() {
+      dataSocket = new WebSocket(dataURL);
+
+      dataSocket.onopen = () => {
+        statusDiv.textContent = "Connected to STT server.";
+        console.log("Connected to data WebSocket.");
+      };
+
+      dataSocket.onmessage = (event) => {
+        try {
+          const message = JSON.parse(event.data);
+
+          if (message.type === "realtime") {
+            // Show real-time transcription with the last word highlighted (see the .last-word style)
+            let words = message.text.split(" ");
+            let lastWord = words.pop();
+            transcriptionDiv.innerHTML = `${words.join(" ")} <span class="last-word">${lastWord}</span>`;
+
+            // Auto-scroll to the bottom of the transcription container
+            const transcriptionContainer = document.getElementById("transcriptionContainer");
+            transcriptionContainer.scrollTop = transcriptionContainer.scrollHeight;
+          } else if (message.type === "fullSentence") {
+            // Accumulate the final transcription in green
+            fullTextDiv.innerHTML += message.text + " ";
+            transcriptionDiv.innerHTML = message.text;
+
+            // Scroll to the bottom of fullTextContainer when new text is added
+            const fullTextContainer = document.getElementById("fullTextContainer");
+            fullTextContainer.scrollTop = fullTextContainer.scrollHeight;
+          }
+        } catch (e) {
+          console.error("Error parsing message:", e);
+        }
+      };
+
+      dataSocket.onclose = () => {
+        statusDiv.textContent = "Disconnected from STT server.";
+      };
+
+      dataSocket.onerror = (error) => {
+        console.error("WebSocket error:", error);
+        statusDiv.textContent = "Error connecting to the STT server.";
+      };
+    }
+
+    // Start recording audio from the microphone
+    async function startRecording() {
+      try {
+        startButton.disabled = true;
+        stopButton.disabled = false;
+        statusDiv.textContent = "Recording...";
+        transcriptionDiv.textContent = "";
+        fullTextDiv.textContent = "";
+
+        audioContext = new AudioContext();
+        mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
+        const input = audioContext.createMediaStreamSource(mediaStream);
+
+        // Set up processor for audio chunks
+        mediaProcessor = audioContext.createScriptProcessor(1024, 1, 1);
+        mediaProcessor.onaudioprocess = (event) => {
+          const audioData = event.inputBuffer.getChannelData(0);
+          sendAudioChunk(audioData, audioContext.sampleRate);
+        };
+
+        input.connect(mediaProcessor);
+        mediaProcessor.connect(audioContext.destination);
+
+        connectToDataSocket();
+      } catch (error) {
+        console.error("Error accessing microphone:", error);
+        statusDiv.textContent = "Error accessing microphone.";
+        stopRecording();
+      }
+    }
+
+    // Stop recording audio and close resources
+    function stopRecording() {
+      if (mediaProcessor && audioContext) {
+        mediaProcessor.disconnect();
+        audioContext.close();
+      }
+
+      if (mediaStream) {
+        mediaStream.getTracks().forEach(track => track.stop());
+      }
+
+      if (dataSocket) {
+        dataSocket.close();
+      }
+
+      startButton.disabled = false;
+      stopButton.disabled = true;
+      statusDiv.textContent = "Stopped recording.";
+    }
+
+    // Send an audio chunk to the server
+    function sendAudioChunk(audioData, sampleRate) {
+      if (dataSocket && dataSocket.readyState === WebSocket.OPEN) {
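+        // Frame layout (what the server expects): 4-byte metadata length (uint32),
+        // UTF-8 JSON metadata ({ sampleRate }), then the 16-bit PCM samples.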
+        const float32Array = new Float32Array(audioData);
+        const pcm16Data = new Int16Array(float32Array.length);
+
+        for (let i = 0; i < float32Array.length; i++) {
+          pcm16Data[i] = Math.max(-1, Math.min(1, float32Array[i])) * 0x7FFF;
+        }
+
+        const metadata = JSON.stringify({ sampleRate });
+        const metadataBuffer = new TextEncoder().encode(metadata);
+        // Use the encoded byte length: string length and byte length can differ for non-ASCII text
+        const metadataLength = new Uint32Array([metadataBuffer.byteLength]);
+
+        const message = new Uint8Array(
+          metadataLength.byteLength + metadataBuffer.byteLength + pcm16Data.byteLength
+        );
+        
+        message.set(new Uint8Array(metadataLength.buffer), 0);
+        message.set(metadataBuffer, metadataLength.byteLength);
+        message.set(new Uint8Array(pcm16Data.buffer), metadataLength.byteLength + metadataBuffer.byteLength);
+
+        dataSocket.send(message);
+      }
+    }
+  </script>
+</body>
+</html>

+ 105 - 0
RealtimeSTT_server/install_packages.py

@@ -0,0 +1,105 @@
+import subprocess
+import sys
+import importlib
+
+def check_and_install_packages(packages):
+    """
+    Checks if the specified packages are installed, and if not, prompts the user
+    to install them.
+
+    Parameters:
+    - packages: A list of dictionaries, each containing:
+        - 'module_name': The module or package name to import.
+        - 'attribute': (Optional) The attribute or class to check within the module.
+        - 'install_name': The name used in the pip install command.
+        - 'version': (Optional) Version constraint for the package.
+    """
+    for package in packages:
+        module_name = package['module_name']
+        attribute = package.get('attribute')
+        install_name = package.get('install_name', module_name)
+        version = package.get('version', '')
+
+        try:
+            # Attempt to import the module
+            module = importlib.import_module(module_name)
+            # If an attribute is specified, check if it exists
+            if attribute:
+                getattr(module, attribute)
+        except (ImportError, AttributeError):
+            user_input = input(
+                f"This program requires '{module_name}'"
+                f"{'' if not attribute else ' with attribute ' + attribute}, which is not installed or missing.\n"
+                f"Do you want to install '{install_name}' now? (y/n): "
+            )
+            if user_input.strip().lower() == 'y':
+                try:
+                    # Build the pip install command
+                    install_command = [sys.executable, "-m", "pip", "install"]
+                    if version:
+                        install_command.append(f"{install_name}{version}")
+                    else:
+                        install_command.append(install_name)
+
+                    subprocess.check_call(install_command)
+                    # Try to import again after installation
+                    module = importlib.import_module(module_name)
+                    if attribute:
+                        getattr(module, attribute)
+                    print(f"Successfully installed '{install_name}'.")
+                except Exception as e:
+                    print(f"An error occurred while installing '{install_name}': {e}")
+                    sys.exit(1)
+            else:
+                print(f"The program requires '{install_name}' to run. Exiting...")
+                sys.exit(1)
+
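+# Example usage (a minimal sketch, mirroring how stt_server.py calls this helper):
+#
+# check_and_install_packages([
+#     {'module_name': 'RealtimeSTT', 'attribute': 'AudioToTextRecorder', 'install_name': 'RealtimeSTT'},
+#     {'module_name': 'scipy.signal', 'attribute': 'resample', 'install_name': 'scipy'},
+# ])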

+ 659 - 0
RealtimeSTT_server/stt_cli_client.py

@@ -0,0 +1,659 @@
+"""
+This is a command-line client for the Speech-to-Text (STT) server.
+It records audio from the default input device and sends it to the server for speech recognition.
+It can also process commands to set parameters, get parameter values, or call methods on the server.
+
+Usage:
+    stt [--control CONTROL_URL] [--data DATA_URL] [--debug] [--norealtime] [--write FILE] [--set PARAM VALUE] [--method METHOD [ARGS ...]] [--get PARAM] [--loop]
+
+Options:
+    - `-c, --control, --control_url`: STT Control WebSocket URL; default `DEFAULT_CONTROL_URL`.
+    - `-d, --data, --data_url`: STT Data WebSocket URL; default `DEFAULT_DATA_URL`.
+    - `-D, --debug`: Enable debug mode.
+    - `-n, --norealtime`: Disable real-time output.
+    - `-W, --write`: Save recorded audio to a WAV file.
+    - `-s, --set`: Set a recorder parameter with format `PARAM VALUE`; can be used multiple times.
+    - `-m, --method`: Call a recorder method with optional arguments; can be used multiple times.
+    - `-g, --get`: Get a recorder parameter's value; can be used multiple times.
+    - `-l, --loop`: Continuously transcribe speech without exiting.
+"""
+
+from urllib.parse import urlparse
+from queue import Queue
+import subprocess
+import threading
+import websocket
+import argparse
+import pyaudio
+import struct
+import socket
+import shutil
+import queue 
+import json
+import time
+import wave
+import sys
+import os
+
+os.environ['ALSA_LOG_LEVEL'] = 'none'
+
+# Constants
+CHUNK = 1024
+FORMAT = pyaudio.paInt16
+CHANNELS = 1
+RATE = 44100
+DEFAULT_CONTROL_URL = "ws://127.0.0.1:8011"
+DEFAULT_DATA_URL = "ws://127.0.0.1:8012"
+
+# Initialize colorama
+from colorama import init, Fore, Style
+init()
+
+# Stop websocket from spamming the log
+websocket.enableTrace(False)
+
+class STTWebSocketClient:
+    def __init__(self, control_url, data_url, debug=False, file_output=None, norealtime=False, writechunks=None, continuous=False):
+        self.control_url = control_url
+        self.data_url = data_url
+        self.control_ws = None
+        self.data_ws_app = None
+        self.data_ws_connected = None
+        self.is_running = True
+        self.debug = debug
+        self.file_output = file_output
+        self.last_text = ""
+        self.console_width = shutil.get_terminal_size().columns
+        self.recording_indicator = "🔴"
+        self.norealtime = norealtime
+        self.connection_established = threading.Event()
+        self.message_queue = Queue()
+        self.commands = Queue()
+        self.stop_event = threading.Event()
+        self.chunks_sent = 0
+        self.last_chunk_time = time.time()
+        self.writechunks = writechunks
+        self.wav_file = None  # WAV writer; opened in record_and_send_audio when writechunks is set
+        self.continuous = continuous
+
+        self.debug_print("Initializing STT WebSocket Client")
+        self.debug_print(f"Control URL: {control_url}")
+        self.debug_print(f"Data URL: {data_url}")
+        self.debug_print(f"File Output: {file_output}")
+        self.debug_print(f"No Realtime: {norealtime}")
+        self.debug_print(f"Write Chunks: {writechunks}")
+        self.debug_print(f"Continuous Mode: {continuous}")
+
+        # Audio attributes
+        self.audio_interface = None
+        self.stream = None
+        self.device_sample_rate = None
+        self.input_device_index = None
+
+        # Threads
+        self.control_ws_thread = None
+        self.data_ws_thread = None
+        self.recording_thread = None
+
+
+    def debug_print(self, message):
+        if self.debug:
+            timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
+            thread_name = threading.current_thread().name
+            print(f"{Fore.CYAN}[DEBUG][{timestamp}][{thread_name}] {message}{Style.RESET_ALL}", file=sys.stderr)
+
+    def connect(self):
+        if not self.ensure_server_running():
+            self.debug_print("Cannot start STT server. Exiting.")
+            return False
+
+        try:
+            self.debug_print("Attempting to establish WebSocket connections...")
+
+            # Connect to control WebSocket
+            self.debug_print(f"Connecting to control WebSocket at {self.control_url}")
+            self.control_ws = websocket.WebSocketApp(self.control_url,
+                                                     on_message=self.on_control_message,
+                                                     on_error=self.on_error,
+                                                     on_close=self.on_close,
+                                                     on_open=self.on_control_open)
+
+            self.control_ws_thread = threading.Thread(target=self.control_ws.run_forever)
+            self.control_ws_thread.daemon = False
+            self.debug_print("Starting control WebSocket thread")
+            self.control_ws_thread.start()
+
+            # Connect to data WebSocket
+            self.debug_print(f"Connecting to data WebSocket at {self.data_url}")
+            self.data_ws_app = websocket.WebSocketApp(self.data_url,
+                                                      on_message=self.on_data_message,
+                                                      on_error=self.on_error,
+                                                      on_close=self.on_close,
+                                                      on_open=self.on_data_open)
+
+            self.data_ws_thread = threading.Thread(target=self.data_ws_app.run_forever)
+            self.data_ws_thread.daemon = False
+            self.debug_print("Starting data WebSocket thread")
+            self.data_ws_thread.start()
+
+            self.debug_print("Waiting for connections to be established...")
+            if not self.connection_established.wait(timeout=10):
+                self.debug_print("Timeout while connecting to the server.")
+                return False
+
+            self.debug_print("WebSocket connections established successfully.")
+            return True
+        except Exception as e:
+            self.debug_print(f"Error while connecting to the server: {str(e)}")
+            return False
+
+    def on_control_open(self, ws):
+        self.debug_print("Control WebSocket connection opened successfully")
+        self.connection_established.set()
+        self.start_command_processor()
+
+    def on_data_open(self, ws):
+        self.debug_print("Data WebSocket connection opened successfully")
+        self.data_ws_connected = ws
+        self.start_recording()
+
+    def on_error(self, ws, error):
+        self.debug_print(f"WebSocket error occurred: {str(error)}")
+        self.debug_print(f"Error type: {type(error)}")
+
+    def on_close(self, ws, close_status_code, close_msg):
+        if ws == self.data_ws_connected:
+            self.debug_print(f"Data connection closed (code {close_status_code}, msg: {close_msg})")
+        elif ws == self.control_ws:
+            self.debug_print(f"Control connection closed (code {close_status_code}, msg: {close_msg})")
+        else:
+            self.debug_print(f"Unknown connection closed (code {close_status_code}, msg: {close_msg})")
+
+        self.is_running = False
+        self.stop_event.set()
+
+    def is_server_running(self):
+        parsed_url = urlparse(self.control_url)
+        host = parsed_url.hostname
+        port = parsed_url.port or 80
+        self.debug_print(f"Checking if server is running at {host}:{port}")
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            result = s.connect_ex((host, port)) == 0
+            self.debug_print(f"Server status check result: {'running' if result else 'not running'}")
+            return result
+
+    def ask_to_start_server(self):
+        response = input("Would you like to start the STT server now? (y/n): ").strip().lower()
+        return response == 'y' or response == 'yes'
+
+    def start_server(self):
+        if os.name == 'nt':  # Windows
+            subprocess.Popen('start /min cmd /c stt-server', shell=True)
+        else:  # Unix-like systems
+            terminal_emulators = [
+                'gnome-terminal',
+                'x-terminal-emulator',
+                'konsole',
+                'xfce4-terminal',
+                'lxterminal',
+                'xterm',
+                'mate-terminal',
+                'terminator',
+                'tilix',
+                'alacritty',
+                'urxvt',
+                'eterm',
+                'rxvt',
+                'kitty',
+                'hyper'
+            ]
+
+            terminal = None
+            for term in terminal_emulators:
+                if shutil.which(term):
+                    terminal = term
+                    break
+
+            if terminal:
+                terminal_exec_options = {
+                    'x-terminal-emulator': ['--'],
+                    'gnome-terminal': ['--'],
+                    'mate-terminal': ['--'],
+                    'terminator': ['--'],
+                    'tilix': ['--'],
+                    'konsole': ['-e'],
+                    'xfce4-terminal': ['-e'],
+                    'lxterminal': ['-e'],
+                    'alacritty': ['-e'],
+                    'xterm': ['-e'],
+                    'rxvt': ['-e'],
+                    'urxvt': ['-e'],
+                    'eterm': ['-e'],
+                    'kitty': [],
+                    'hyper': ['--command']
+                }
+
+                exec_option = terminal_exec_options.get(terminal, None)
+                if exec_option is not None:
+                    subprocess.Popen([terminal] + exec_option + ['stt-server'], start_new_session=True)
+                    print(f"STT server started in a new terminal window using {terminal}.", file=sys.stderr)
+                else:
+                    print(f"Unsupported terminal emulator '{terminal}'. Please start the STT server manually.", file=sys.stderr)
+            else:
+                print("No supported terminal emulator found. Please start the STT server manually.", file=sys.stderr)
+
+    def ensure_server_running(self):
+        if not self.is_server_running():
+            print("STT server is not running.", file=sys.stderr)
+            if self.ask_to_start_server():
+                self.start_server()
+                print("Waiting for STT server to start...", file=sys.stderr)
+                for _ in range(20):  # Wait up to 20 seconds
+                    if self.is_server_running():
+                        print("STT server started successfully.", file=sys.stderr)
+                        time.sleep(2)  # Give the server a moment to fully initialize
+                        return True
+                    time.sleep(1)
+                print("Failed to start STT server.", file=sys.stderr)
+                return False
+            else:
+                print("STT server is required. Please start it manually.", file=sys.stderr)
+                return False
+        return True
+
+    def on_control_message(self, ws, message):
+        try:
+            self.debug_print(f"Received control message: {message}")
+            data = json.loads(message)
+            if 'status' in data:
+                self.debug_print(f"Message status: {data['status']}")
+                if data['status'] == 'success':
+                    if 'parameter' in data and 'value' in data:
+                        self.debug_print(f"Parameter update: {data['parameter']} = {data['value']}")
+                        print(f"Parameter {data['parameter']} = {data['value']}")
+                elif data['status'] == 'error':
+                    self.debug_print(f"Server error received: {data.get('message', '')}")
+                    print(f"Server Error: {data.get('message', '')}")
+            else:
+                self.debug_print(f"Unknown control message format: {data}")
+        except json.JSONDecodeError:
+            self.debug_print(f"Failed to decode JSON control message: {message}")
+        except Exception as e:
+            self.debug_print(f"Error processing control message: {str(e)}")
+
+    def on_data_message(self, ws, message):
+        try:
+            self.debug_print(f"Received data message: {message}")
+            data = json.loads(message)
+            message_type = data.get('type')
+            self.debug_print(f"Message type: {message_type}")
+
+            if message_type == 'realtime':
+                if data['text'] != self.last_text:
+                    self.debug_print(f"New realtime text received: {data['text']}")
+                    self.last_text = data['text']
+                    if not self.norealtime:
+                        self.update_progress_bar(self.last_text)
+
+            elif message_type == 'fullSentence':
+                self.debug_print(f"Full sentence received: {data['text']}")
+                if self.file_output:
+                    self.debug_print("Writing to file output")
+                    sys.stderr.write('\r\033[K')
+                    sys.stderr.write(data['text'])
+                    sys.stderr.write('\n')
+                    sys.stderr.flush()
+                    print(data['text'], file=self.file_output)
+                    self.file_output.flush()
+                else:
+                    self.finish_progress_bar()
+                    print(f"{data['text']}")
+
+                if not self.continuous:                    
+                    self.is_running = False
+                    self.stop_event.set()
+
+            elif message_type in {
+                'vad_detect_start',
+                'vad_detect_stop',
+                'recording_start',
+                'recording_stop',
+                'wakeword_detected',
+                'wakeword_detection_start',
+                'wakeword_detection_end',
+                'transcription_start'}:
+                pass  # Known message types, no action needed
+            else:
+                self.debug_print(f"Other message type received: {message_type}")
+
+        except json.JSONDecodeError:
+            self.debug_print(f"Failed to decode JSON data message: {message}")
+        except Exception as e:
+            self.debug_print(f"Error processing data message: {str(e)}")
+
+    def show_initial_indicator(self):
+        if self.norealtime:
+            return
+        initial_text = f"{self.recording_indicator}\b\b"
+        sys.stderr.write(initial_text)
+        sys.stderr.flush()
+
+    def update_progress_bar(self, text):
+        try:
+            available_width = self.console_width - 5  # Adjust for progress bar decorations
+            sys.stderr.write('\r\033[K')  # Clear the current line
+            words = text.split()
+            last_chars = ""
+            for word in reversed(words):
+                if len(last_chars) + len(word) + 1 > available_width:
+                    break
+                last_chars = word + " " + last_chars
+            last_chars = last_chars.strip()
+            colored_text = f"{Fore.YELLOW}{last_chars}{Style.RESET_ALL}{self.recording_indicator}\b\b"
+            sys.stderr.write(colored_text)
+            sys.stderr.flush()
+        except Exception as e:
+            self.debug_print(f"Error updating progress bar: {e}")
+
+    def finish_progress_bar(self):
+        try:
+            sys.stderr.write('\r\033[K')
+            sys.stderr.flush()
+        except Exception as e:
+            self.debug_print(f"Error finishing progress bar: {e}")
+
+    def stop(self):
+        self.finish_progress_bar()
+        self.is_running = False
+        self.stop_event.set()
+        self.debug_print("Stopping client and cleaning up resources.")
+        if self.control_ws:
+            self.control_ws.close()
+        if self.data_ws_connected:
+            self.data_ws_connected.close()
+
+        # Join threads to ensure they finish before exiting
+        current_thread = threading.current_thread()
+        if self.control_ws_thread and self.control_ws_thread != current_thread:
+            self.control_ws_thread.join()
+        if self.data_ws_thread and self.data_ws_thread != current_thread:
+            self.data_ws_thread.join()
+        if self.recording_thread and self.recording_thread != current_thread:
+            self.recording_thread.join()
+
+        # Clean up audio resources
+        if self.stream:
+            self.stream.stop_stream()
+            self.stream.close()
+        if self.audio_interface:
+            self.audio_interface.terminate()
+
+    def start_recording(self):
+        self.recording_thread = threading.Thread(target=self.record_and_send_audio)
+        self.recording_thread.daemon = False  # Set to False to ensure proper shutdown
+        self.recording_thread.start()
+
+    def record_and_send_audio(self):
+        try:
+            if not self.setup_audio():
+                self.debug_print("Failed to set up audio recording")
+                raise Exception("Failed to set up audio recording.")
+
+            # Initialize WAV file writer if writechunks is provided
+            if self.writechunks:
+                self.wav_file = wave.open(self.writechunks, 'wb')
+                self.wav_file.setnchannels(CHANNELS)
+                self.wav_file.setsampwidth(pyaudio.get_sample_size(FORMAT))
+                self.wav_file.setframerate(self.device_sample_rate)  # Use self.device_sample_rate
+
+            self.debug_print("Starting audio recording and transmission")
+            self.show_initial_indicator()
+
+            while self.is_running and not self.stop_event.is_set():
+                try:
+                    audio_data = self.stream.read(CHUNK)
+                    self.chunks_sent += 1
+                    current_time = time.time()
+                    elapsed = current_time - self.last_chunk_time
+
+                    # Write to WAV file if enabled
+                    if self.writechunks:
+                        self.wav_file.writeframes(audio_data)
+
+                    if self.chunks_sent % 100 == 0:  # Log every 100 chunks
+                        self.debug_print(f"Sent {self.chunks_sent} chunks. Last chunk took {elapsed:.3f}s")
+
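+                    # Frame layout: 4-byte little-endian metadata length, UTF-8 JSON
+                    # metadata ({"sampleRate": ...}), then the raw PCM16 audio chunk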
+                    metadata = {"sampleRate": self.device_sample_rate}
+                    metadata_json = json.dumps(metadata)
+                    metadata_length = len(metadata_json)
+                    message = struct.pack('<I', metadata_length) + metadata_json.encode('utf-8') + audio_data
+
+                    if self.is_running and not self.stop_event.is_set():
+                        self.debug_print(f"Sending audio chunk {self.chunks_sent}: {len(audio_data)} bytes, metadata: {metadata_json}")
+                        self.data_ws_connected.send(message, opcode=websocket.ABNF.OPCODE_BINARY)
+
+                    self.last_chunk_time = current_time
+
+                except Exception as e:
+                    self.debug_print(f"Error sending audio data: {str(e)}")
+                    break
+
+        except Exception as e:
+            self.debug_print(f"Error in record_and_send_audio: {str(e)}")
+        finally:
+            self.cleanup_audio()
+
+    def setup_audio(self):
+        try:
+            self.debug_print("Initializing PyAudio interface")
+            self.audio_interface = pyaudio.PyAudio()
+            self.input_device_index = None
+
+            try:
+                default_device = self.audio_interface.get_default_input_device_info()
+                self.input_device_index = default_device['index']
+                self.debug_print(f"Default input device found: {default_device}")
+            except OSError as e:
+                self.debug_print(f"No default input device found: {str(e)}")
+                return False
+
+            self.device_sample_rate = 16000
+            self.debug_print(f"Attempting to open audio stream with sample rate {self.device_sample_rate} Hz")
+
+            try:
+                self.stream = self.audio_interface.open(
+                    format=FORMAT,
+                    channels=CHANNELS,
+                    rate=self.device_sample_rate,
+                    input=True,
+                    frames_per_buffer=CHUNK,
+                    input_device_index=self.input_device_index,
+                )
+                self.debug_print(f"Audio stream initialized successfully")
+                self.debug_print(f"Audio parameters: rate={self.device_sample_rate}, channels={CHANNELS}, format={FORMAT}, chunk={CHUNK}")
+                return True
+            except Exception as e:
+                self.debug_print(f"Failed to initialize audio stream: {str(e)}")
+                return False
+
+        except Exception as e:
+            self.debug_print(f"Error in setup_audio: {str(e)}")
+            if self.audio_interface:
+                self.audio_interface.terminate()
+            return False
+
+    def cleanup_audio(self):
+        self.debug_print("Cleaning up audio resources")
+        try:
+            if self.stream:
+                self.debug_print("Stopping and closing audio stream")
+                self.stream.stop_stream()
+                self.stream.close()
+                self.stream = None
+            if self.audio_interface:
+                self.debug_print("Terminating PyAudio interface")
+                self.audio_interface.terminate()
+                self.audio_interface = None
+            if self.writechunks and self.wav_file:
+                self.debug_print("Closing WAV file")
+                self.wav_file.close()
+        except Exception as e:
+            self.debug_print(f"Error during audio cleanup: {str(e)}")
+
+    def set_parameter(self, parameter, value):
+        command = {
+            "command": "set_parameter",
+            "parameter": parameter,
+            "value": value
+        }
+        self.control_ws.send(json.dumps(command))
+
+    def get_parameter(self, parameter):
+        command = {
+            "command": "get_parameter",
+            "parameter": parameter
+        }
+        self.control_ws.send(json.dumps(command))
+
+    def call_method(self, method, args=None, kwargs=None):
+        command = {
+            "command": "call_method",
+            "method": method,
+            "args": args or [],
+            "kwargs": kwargs or {}
+        }
+        self.control_ws.send(json.dumps(command))
+
+    def start_command_processor(self):
+        self.command_thread = threading.Thread(target=self.command_processor)
+        self.command_thread.daemon = False  # Ensure it is not a daemon thread
+        self.command_thread.start()
+
+
+    def command_processor(self):
+        self.debug_print("Starting command processor thread")
+        while not self.stop_event.is_set():
+            try:
+                command = self.commands.get(timeout=0.1)
+                self.debug_print(f"Processing command: {command}")
+                if command['type'] == 'set_parameter':
+                    self.debug_print(f"Setting parameter: {command['parameter']} = {command['value']}")
+                    self.set_parameter(command['parameter'], command['value'])
+                elif command['type'] == 'get_parameter':
+                    self.debug_print(f"Getting parameter: {command['parameter']}")
+                    self.get_parameter(command['parameter'])
+                elif command['type'] == 'call_method':
+                    self.debug_print(f"Calling method: {command['method']} with args: {command.get('args')} and kwargs: {command.get('kwargs')}")
+                    self.call_method(command['method'], command.get('args'), command.get('kwargs'))
+            except queue.Empty:
+                continue
+            except Exception as e:
+                self.debug_print(f"Error in command processor: {str(e)}")
+
+        self.debug_print("Command processor thread stopping")
+
+    def add_command(self, command):
+        self.commands.put(command)
+
+def main():
+    parser = argparse.ArgumentParser(description="STT Client")
+
+    parser.add_argument("-c", "--control", "--control_url", default=DEFAULT_CONTROL_URL,
+                        help="STT Control WebSocket URL")
+    parser.add_argument("-d", "--data", "--data_url", default=DEFAULT_DATA_URL,
+                        help="STT Data WebSocket URL")
+    parser.add_argument("-D", "--debug", action="store_true",
+                        help="Enable debug mode")
+    parser.add_argument("-n", "--norealtime", action="store_true",
+                        help="Disable real-time output")
+    parser.add_argument("-W", "--write", metavar="FILE",
+                        help="Save recorded audio to a WAV file")
+    parser.add_argument("-s", "--set", nargs=2, metavar=('PARAM', 'VALUE'), action='append',
+                        help="Set a recorder parameter (can be used multiple times)")
+    parser.add_argument("-m", "--method", nargs='+', metavar='METHOD', action='append',
+                        help="Call a recorder method with optional arguments")
+    parser.add_argument("-g", "--get", nargs=1, metavar='PARAM', action='append',
+                        help="Get a recorder parameter's value (can be used multiple times)")
+    parser.add_argument("-l", "--loop", action="store_true",
+                        help="Continuously transcribe speech without exiting")
+    
+    args = parser.parse_args()
+
+    # Check if output is being redirected
+    if not os.isatty(sys.stdout.fileno()):
+        file_output = sys.stdout
+    else:
+        file_output = None
+
+
+    client = STTWebSocketClient(
+        args.control,
+        args.data,
+        args.debug,
+        file_output,
+        args.norealtime,  # Suppress real-time transcription output if requested
+        args.write,
+        continuous=args.loop
+    )
+
+    def signal_handler(sig, frame):
+        client.stop()
+        sys.exit(0)
+
+    import signal
+    signal.signal(signal.SIGINT, signal_handler)
+
+    try:
+        if client.connect():
+            # Process command-line parameters
+            if args.set:
+                for param, value in args.set:
+                    try:
+                        if '.' in value:
+                            value = float(value)
+                        else:
+                            value = int(value)
+                    except ValueError:
+                        pass  # Keep as string if not a number
+
+                    client.add_command({
+                        'type': 'set_parameter',
+                        'parameter': param,
+                        'value': value
+                    })
+
+            if args.get:
+                for param_list in args.get:
+                    param = param_list[0]
+                    client.add_command({
+                        'type': 'get_parameter',
+                        'parameter': param
+                    })
+
+            if args.method:
+                for method_call in args.method:
+                    method = method_call[0]
+                    args_list = method_call[1:] if len(method_call) > 1 else []
+                    client.add_command({
+                        'type': 'call_method',
+                        'method': method,
+                        'args': args_list
+                    })
+
+            # If command-line parameters were used (like --get), wait for them to be processed
+            if args.set or args.get or args.method:
+                while not client.commands.empty():
+                    time.sleep(0.1)
+
+            # Keep the client alive while it records audio and receives transcriptions
+            while client.is_running:
+                time.sleep(0.1)
+
+        else:
+            print("Failed to connect to the server.", file=sys.stderr)
+    except Exception as e:
+        print(f"An error occurred: {e}")
+    finally:
+        client.stop()
+
+if __name__ == "__main__":
+    main()

+ 805 - 0
RealtimeSTT_server/stt_server.py

@@ -0,0 +1,805 @@
+"""
+Speech-to-Text (STT) Server with Real-Time Transcription and WebSocket Interface
+
+This server provides real-time speech-to-text (STT) transcription using the RealtimeSTT library. It allows clients to connect via WebSocket to send audio data and receive real-time transcription updates. The server supports configurable audio recording parameters, voice activity detection (VAD), and wake word detection. It is designed to handle continuous transcription as well as post-recording processing, enabling real-time feedback with the option to improve final transcription quality after the complete sentence is recognized.
+
+### Features:
+- Real-time transcription using pre-configured or user-defined STT models.
+- WebSocket-based communication for control and data handling.
+- Flexible recording and transcription options, including configurable pauses for sentence detection.
+- Supports Silero and WebRTC VAD for robust voice activity detection.
+
+### Starting the Server:
+You can start the server using the command-line interface (CLI) command `stt-server`, passing the desired configuration options.
+
+```bash
+stt-server [OPTIONS]
+```
+
+### Available Parameters:
+    - `-m, --model`: Model path or size; default 'large-v2'.
+    - `-r, --rt-model, --realtime_model_type`: Real-time model size; default 'tiny.en'.
+    - `-l, --lang, --language`: Language code for transcription; default 'en'.
+    - `-i, --input-device, --input_device_index`: Audio input device index; default 1.
+    - `-c, --control, --control_port`: WebSocket control port; default 8011.
+    - `-d, --data, --data_port`: WebSocket data port; default 8012.
+    - `-w, --wake_words`: Wake word(s) to trigger listening; default "".
+    - `-D, --debug`: Enable debug logging.
+    - `-W, --write`: Save audio to WAV file.
+    - `-s, --silence_timing`: Enable dynamic silence duration for sentence detection; default True. 
+    - `--silero_sensitivity`: Silero VAD sensitivity (0-1); default 0.05.
+    - `--silero_use_onnx`: Use Silero ONNX model; default False.
+    - `--webrtc_sensitivity`: WebRTC VAD sensitivity (0-3); default 3.
+    - `--min_length_of_recording`: Minimum recording duration in seconds; default 1.1.
+    - `--min_gap_between_recordings`: Min time between recordings in seconds; default 0.
+    - `--enable_realtime_transcription`: Enable real-time transcription; default True.
+    - `--realtime_processing_pause`: Pause between audio chunk processing; default 0.02.
+    - `--silero_deactivity_detection`: Use Silero for end-of-speech detection; default True.
+    - `--early_transcription_on_silence`: Start transcription after silence in seconds; default 0.2.
+    - `--beam_size`: Beam size for main model; default 5.
+    - `--beam_size_realtime`: Beam size for real-time model; default 3.
+    - `--initial_prompt`: Initial transcription guidance prompt.
+    - `--end_of_sentence_detection_pause`: Silence duration for sentence end detection; default 0.45.
+    - `--unknown_sentence_detection_pause`: Pause duration for incomplete sentence detection; default 0.7.
+    - `--mid_sentence_detection_pause`: Pause for mid-sentence break; default 2.0.
+    - `--wake_words_sensitivity`: Wake word detection sensitivity (0-1); default 0.5.
+    - `--wake_word_timeout`: Wake word timeout in seconds; default 5.0.
+    - `--wake_word_activation_delay`: Delay before wake word activation; default 20.
+    - `--wakeword_backend`: Backend for wake word detection; default 'none'.
+    - `--openwakeword_model_paths`: Paths to OpenWakeWord models.
+    - `--openwakeword_inference_framework`: OpenWakeWord inference framework; default 'tensorflow'.
+    - `--wake_word_buffer_duration`: Wake word buffer duration in seconds; default 1.0.
+    - `--use_main_model_for_realtime`: Use main model for real-time transcription.
+    - `--use_extended_logging`: Enable extensive log messages.
+    - `--logchunks`: Log incoming audio chunks.
+
+### WebSocket Interface:
+The server supports two WebSocket connections:
+1. **Control WebSocket**: Used to send and receive commands, such as setting parameters or calling recorder methods.
+2. **Data WebSocket**: Used to send audio data for transcription and receive real-time transcription updates.
+
+The server will broadcast real-time transcription updates to all connected clients on the data WebSocket.
+"""
+
+from .install_packages import check_and_install_packages
+from difflib import SequenceMatcher
+from collections import deque
+from datetime import datetime
+import logging
+import asyncio
+import pyaudio
+import sys
+
+
+debug_logging = False
+extended_logging = False
+send_recorded_chunk = False
+log_incoming_chunks = False
+silence_timing = False
+writechunks = False
+wav_file = None
+
+hard_break_even_on_background_noise = 3.0
+hard_break_even_on_background_noise_min_texts = 3
+hard_break_even_on_background_noise_min_similarity = 0.99
+hard_break_even_on_background_noise_min_chars = 15
+
+
+text_time_deque = deque()
+loglevel = logging.WARNING
+
+FORMAT = pyaudio.paInt16
+CHANNELS = 1
+
+
+if sys.platform == 'win32':
+    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
+
+
+check_and_install_packages([
+    {
+        'module_name': 'RealtimeSTT',                 # Import module
+        'attribute': 'AudioToTextRecorder',           # Specific class to check
+        'install_name': 'RealtimeSTT',                # Package name for pip install
+    },
+    {
+        'module_name': 'websockets',                  # Import module
+        'install_name': 'websockets',                 # Package name for pip install
+    },
+    {
+        'module_name': 'numpy',                       # Import module
+        'install_name': 'numpy',                      # Package name for pip install
+    },
+    {
+        'module_name': 'scipy.signal',                # Submodule of scipy
+        'attribute': 'resample',                      # Specific function to check
+        'install_name': 'scipy',                      # Package name for pip install
+    }
+])
+
+# Define ANSI color codes for terminal output
+class bcolors:
+    HEADER = '\033[95m'   # Magenta
+    OKBLUE = '\033[94m'   # Blue
+    OKCYAN = '\033[96m'   # Cyan
+    OKGREEN = '\033[92m'  # Green
+    WARNING = '\033[93m'  # Yellow
+    FAIL = '\033[91m'     # Red
+    ENDC = '\033[0m'      # Reset to default
+    BOLD = '\033[1m'
+    UNDERLINE = '\033[4m'
+
+print(f"{bcolors.BOLD}{bcolors.OKCYAN}Starting server, please wait...{bcolors.ENDC}")
+
+# Initialize colorama
+from colorama import init, Fore, Style
+init()
+
+from RealtimeSTT import AudioToTextRecorder
+from scipy.signal import resample
+import numpy as np
+import websockets
+import threading
+import logging
+import wave
+import json
+import time
+
+global_args = None
+recorder = None
+recorder_config = {}
+recorder_ready = threading.Event()
+recorder_thread = None
+stop_recorder = False
+prev_text = ""
+
+# Define allowed methods and parameters for security
+allowed_methods = [
+    'set_microphone',
+    'abort',
+    'stop',
+    'clear_audio_queue',
+    'wakeup',
+    'shutdown',
+    'text',
+]
+allowed_parameters = [
+    'silero_sensitivity',
+    'wake_word_activation_delay',
+    'post_speech_silence_duration',
+    'listen_start',
+    'recording_stop_time',
+    'last_transcription_bytes',
+    'last_transcription_bytes_b64',
+]
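+# Clients invoke these over the control WebSocket, e.g.:
+#   {"command": "call_method", "method": "set_microphone", "args": [False]}
+#   {"command": "get_parameter", "parameter": "silero_sensitivity"}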
+
+# Queues and connections for control and data
+control_connections = set()
+data_connections = set()
+control_queue = asyncio.Queue()
+audio_queue = asyncio.Queue()
+
+def preprocess_text(text):
+    # Remove leading whitespaces
+    text = text.lstrip()
+
+    # Remove starting ellipses if present
+    if text.startswith("..."):
+        text = text[3:]
+
+    if text.endswith("...'."):
+        text = text[:-1]
+
+    if text.endswith("...'"):
+        text = text[:-1]
+
+    # Remove any leading whitespaces again after ellipses removal
+    text = text.lstrip()
+
+    # Uppercase the first letter
+    if text:
+        text = text[0].upper() + text[1:]
+    
+    return text
+
+def debug_print(message):
+    if debug_logging:
+        timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
+        thread_name = threading.current_thread().name
+        print(f"{Fore.CYAN}[DEBUG][{timestamp}][{thread_name}] {message}{Style.RESET_ALL}", file=sys.stderr)
+
+def text_detected(text, loop):
+    global prev_text
+
+    text = preprocess_text(text)
+
+    if silence_timing:
+        def ends_with_ellipsis(text: str):
+            if text.endswith("..."):
+                return True
+            if len(text) > 1 and text[:-1].endswith("..."):
+                return True
+            return False
+
+        def sentence_end(text: str):
+            sentence_end_marks = ['.', '!', '?', '。']
+            if text and text[-1] in sentence_end_marks:
+                return True
+            return False
+
+
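+        # Dynamically pick the post-speech silence window: a trailing ellipsis means the
+        # sentence is likely unfinished (longest pause), two consecutive sentence-ending
+        # updates mean it is likely complete (shortest pause), otherwise fall back to the
+        # in-between "unknown" pause.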
+        if ends_with_ellipsis(text):
+            recorder.post_speech_silence_duration = global_args.mid_sentence_detection_pause
+        elif sentence_end(text) and sentence_end(prev_text) and not ends_with_ellipsis(prev_text):
+            recorder.post_speech_silence_duration = global_args.end_of_sentence_detection_pause
+        else:
+            recorder.post_speech_silence_duration = global_args.unknown_sentence_detection_pause
+
+
+        # Append the new text with its timestamp
+        current_time = time.time()
+        text_time_deque.append((current_time, text))
+
+        # Remove texts older than hard_break_even_on_background_noise seconds
+        while text_time_deque and text_time_deque[0][0] < current_time - hard_break_even_on_background_noise:
+            text_time_deque.popleft()
+
+        # Check if at least hard_break_even_on_background_noise_min_texts texts have arrived within the last hard_break_even_on_background_noise seconds
+        if len(text_time_deque) >= hard_break_even_on_background_noise_min_texts:
+            texts = [t[1] for t in text_time_deque]
+            first_text = texts[0]
+            last_text = texts[-1]
+
+            # Compute the similarity ratio between the first and last texts
+            similarity = SequenceMatcher(None, first_text, last_text).ratio()
+
+            if similarity > hard_break_even_on_background_noise_min_similarity and len(first_text) > hard_break_even_on_background_noise_min_chars:
+                recorder.stop()
+                recorder.clear_audio_queue()
+                prev_text = ""
+
+    prev_text = text
+
+    # Put the message in the audio queue to be sent to clients
+    message = json.dumps({
+        'type': 'realtime',
+        'text': text
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+    # Get current timestamp in HH:MM:SS.nnn format
+    timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+
+    if extended_logging:
+        print(f"  [{timestamp}] Realtime text: {bcolors.OKCYAN}{text}{bcolors.ENDC}\n", flush=True, end="")
+    else:
+        print(f"\r[{timestamp}] {bcolors.OKCYAN}{text}{bcolors.ENDC}", flush=True, end='')
+
+def on_recording_start(loop):
+    # Send a message to the client indicating recording has started
+    message = json.dumps({
+        'type': 'recording_start'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_recording_stop(loop):
+    # Send a message to the client indicating recording has stopped
+    message = json.dumps({
+        'type': 'recording_stop'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_vad_detect_start(loop):
+    message = json.dumps({
+        'type': 'vad_detect_start'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_vad_detect_stop(loop):
+    message = json.dumps({
+        'type': 'vad_detect_stop'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_wakeword_detected(loop):
+    # Send a message to the client when wake word detection starts
+    message = json.dumps({
+        'type': 'wakeword_detected'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_wakeword_detection_start(loop):
+    # Send a message to the client when wake word detection starts
+    message = json.dumps({
+        'type': 'wakeword_detection_start'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_wakeword_detection_end(loop):
+    # Send a message to the client when wake word detection ends
+    message = json.dumps({
+        'type': 'wakeword_detection_end'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+def on_transcription_start(loop):
+    # Send a message to the client when transcription starts
+    message = json.dumps({
+        'type': 'transcription_start'
+    })
+    asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+# def on_realtime_transcription_update(text, loop):
+#     # Send real-time transcription updates to the client
+#     text = preprocess_text(text)
+#     message = json.dumps({
+#         'type': 'realtime_update',
+#         'text': text
+#     })
+#     asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+# def on_recorded_chunk(chunk, loop):
+#     if send_recorded_chunk:
+#         bytes_b64 = base64.b64encode(chunk.tobytes()).decode('utf-8')
+#         message = json.dumps({
+#             'type': 'recorded_chunk',
+#             'bytes': bytes_b64
+#         })
+#         asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+# Define the server's arguments
+def parse_arguments():
+    global debug_logging, extended_logging, loglevel, writechunks, log_incoming_chunks, dynamic_silence_timing
+
+    import argparse
+    parser = argparse.ArgumentParser(description='Start the Speech-to-Text (STT) server with various configuration options.')
+
+    parser.add_argument('-m', '--model', type=str, default='large-v2',
+                        help='Path to the STT model or model size. Options include: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, or any huggingface CTranslate2 STT model such as deepdml/faster-whisper-large-v3-turbo-ct2. Default is large-v2.')
+
+    parser.add_argument('-r', '--rt-model', '--realtime_model_type', type=str, default='tiny.en',
+                        help='Model size for real-time transcription. Options same as --model. This is used only if real-time transcription is enabled (enable_realtime_transcription). Default is tiny.en.')
+    
+    parser.add_argument('-l', '--lang', '--language', type=str, default='en',
+                help='Language code for the STT model to transcribe in a specific language. Leave this empty for auto-detection based on input audio. Default is en. List of supported language codes: https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L11-L110')
+
+    parser.add_argument('-i', '--input-device', '--input_device_index', type=int, default=1,
+                    help='Index of the audio input device to use. Use this option to specify a particular microphone or audio input device based on your system. Default is 1.')
+
+    parser.add_argument('-c', '--control', '--control_port', type=int, default=8011,
+                        help='The port number used for the control WebSocket connection. Control connections are used to send and receive commands to the server. Default is port 8011.')
+
+    parser.add_argument('-d', '--data', '--data_port', type=int, default=8012,
+                        help='The port number used for the data WebSocket connection. Data connections are used to send audio data and receive transcription updates in real time. Default is port 8012.')
+
+    parser.add_argument('-w', '--wake_words', type=str, default="",
+                        help='Specify the wake word(s) that will trigger the server to start listening. For example, setting this to "Jarvis" will make the system start transcribing when it detects the wake word "Jarvis". Default is an empty string (wake word activation disabled).')
+
+    parser.add_argument('-D', '--debug', action='store_true', help='Enable debug logging for detailed server operations')
+
+    parser.add_argument("-W", "--write", metavar="FILE",
+                        help="Save received audio to a WAV file")
+
+    parser.add_argument('-s', '--silence_timing', action='store_true', default=True,
+                    help='Enable dynamic adjustment of silence duration for sentence detection. Adjusts post-speech silence duration based on detected sentence structure and punctuation. Default is True.')
+
+    parser.add_argument('--silero_sensitivity', type=float, default=0.05,
+                        help='Sensitivity level for Silero Voice Activity Detection (VAD), with a range from 0 to 1. Lower values make the model less sensitive, useful for noisy environments. Default is 0.05.')
+
+    parser.add_argument('--silero_use_onnx', action='store_true', default=False,
+                        help='Enable ONNX version of Silero model for faster performance with lower resource usage. Default is False.')
+
+    parser.add_argument('--webrtc_sensitivity', type=int, default=3,
+                        help='Sensitivity level for WebRTC Voice Activity Detection (VAD), with a range from 0 to 3. Higher values make the model less sensitive, useful for cleaner environments. Default is 3.')
+
+    parser.add_argument('--min_length_of_recording', type=float, default=1.1,
+                        help='Minimum duration of valid recordings in seconds. This prevents very short recordings from being processed, which could be caused by noise or accidental sounds. Default is 1.1 seconds.')
+
+    parser.add_argument('--min_gap_between_recordings', type=float, default=0,
+                        help='Minimum time (in seconds) between consecutive recordings. Setting this helps avoid overlapping recordings when there’s a brief silence between them. Default is 0 seconds.')
+
+    parser.add_argument('--enable_realtime_transcription', action='store_true', default=True,
+                        help='Enable continuous real-time transcription of audio as it is received. When enabled, transcriptions are sent in near real-time. Default is True.')
+
+    parser.add_argument('--realtime_processing_pause', type=float, default=0.02,
+                        help='Time interval (in seconds) between processing audio chunks for real-time transcription. Lower values increase responsiveness but may put more load on the CPU. Default is 0.02 seconds.')
+
+    parser.add_argument('--silero_deactivity_detection', action='store_true', default=True,
+                        help='Use the Silero model for end-of-speech detection. This option can provide more robust silence detection in noisy environments, though it consumes more GPU resources. Default is True.')
+
+    parser.add_argument('--early_transcription_on_silence', type=float, default=0.2,
+                        help='Start transcription after the specified seconds of silence. This is useful when you want to trigger transcription mid-speech when there is a brief pause. Should be lower than post_speech_silence_duration. Set to 0 to disable. Default is 0.2 seconds.')
+
+    parser.add_argument('--beam_size', type=int, default=5,
+                        help='Beam size for the main transcription model. Larger values may improve transcription accuracy but increase the processing time. Default is 5.')
+
+    parser.add_argument('--beam_size_realtime', type=int, default=3,
+                        help='Beam size for the real-time transcription model. A smaller beam size allows for faster real-time processing but may reduce accuracy. Default is 3.')
+
+    # parser.add_argument('--initial_prompt', type=str,
+    #                     default='End incomplete sentences with ellipses.\nExamples:\nComplete: The sky is blue.\nIncomplete: When the sky...\nComplete: She walked home.\nIncomplete: Because he...',
+    #                     help='Initial prompt that guides the transcription model to produce transcriptions in a particular style or format. The default provides instructions for handling sentence completions and ellipsis usage.')
+    
+    parser.add_argument('--initial_prompt', type=str,
+                        default="Incomplete thoughts should end with '...'. Examples of complete thoughts: 'The sky is blue.' 'She walked home.' Examples of incomplete thoughts: 'When the sky...' 'Because he...'",
+                        help='Initial prompt that guides the transcription model to produce transcriptions in a particular style or format. The default provides instructions for handling sentence completions and ellipsis usage.')
+    
+
+    parser.add_argument('--end_of_sentence_detection_pause', type=float, default=0.45,
+                        help='The duration of silence (in seconds) that the model should interpret as the end of a sentence. This helps the system detect when to finalize the transcription of a sentence. Default is 0.45 seconds.')
+
+    parser.add_argument('--unknown_sentence_detection_pause', type=float, default=0.7,
+                        help='The duration of pause (in seconds) that the model should interpret as an incomplete or unknown sentence. This is useful for identifying when a sentence is trailing off or unfinished. Default is 0.7 seconds.')
+
+    parser.add_argument('--mid_sentence_detection_pause', type=float, default=2.0,
+                        help='The duration of pause (in seconds) that the model should interpret as a mid-sentence break. Longer pauses can indicate a pause in speech but not necessarily the end of a sentence. Default is 2.0 seconds.')
+
+    parser.add_argument('--wake_words_sensitivity', type=float, default=0.5,
+                        help='Sensitivity level for wake word detection, with a range from 0 (most sensitive) to 1 (least sensitive). Adjust this value based on your environment to ensure reliable wake word detection. Default is 0.5.')
+
+    parser.add_argument('--wake_word_timeout', type=float, default=5.0,
+                        help='Maximum time in seconds that the system will wait for a wake word before timing out. After this timeout, the system stops listening for wake words until reactivated. Default is 5.0 seconds.')
+
+    parser.add_argument('--wake_word_activation_delay', type=float, default=20,
+                        help='The delay in seconds before wake word detection is activated after the system starts listening. This prevents false positives during the start of a session. Default is 20 seconds.')
+
+    parser.add_argument('--wakeword_backend', type=str, default='none',
+                        help='The backend used for wake word detection. Supported backends include "pvporcupine" and "openwakeword". Default is "none", which disables wake word detection.')
+
+    parser.add_argument('--openwakeword_model_paths', type=str, nargs='*',
+                        help='A list of file paths to OpenWakeWord models. This is useful if you are using OpenWakeWord for wake word detection and need to specify custom models.')
+
+    parser.add_argument('--openwakeword_inference_framework', type=str, default='tensorflow',
+                        help='The inference framework to use for OpenWakeWord models. Supported frameworks could include "tensorflow", "pytorch", etc. Default is "tensorflow".')
+
+    parser.add_argument('--wake_word_buffer_duration', type=float, default=1.0,
+                        help='Duration of the buffer in seconds for wake word detection. This sets how long the system will store the audio before and after detecting the wake word. Default is 1.0 seconds.')
+
+    parser.add_argument('--use_main_model_for_realtime', action='store_true',
+                        help='Enable this option if you want to use the main model for real-time transcription, instead of the smaller, faster real-time model. Using the main model may provide better accuracy but at the cost of higher processing time.')
+
+    parser.add_argument('--use_extended_logging', action='store_true',
+                        help='Writes extensive log messages for the recording worker, that processes the audio chunks.')
+
+    parser.add_argument('--logchunks', action='store_true', help='Enable logging of incoming audio chunks (periods)')
+
+    # Parse arguments
+    args = parser.parse_args()
+
+    debug_logging = args.debug
+    extended_logging = args.use_extended_logging
+    writechunks = args.write
+    log_incoming_chunks = args.logchunks
+    dynamic_silence_timing = args.silence_timing
+
+    if debug_logging:
+        loglevel = logging.DEBUG
+        logging.basicConfig(level=loglevel, format='[%(asctime)s] %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
+    else:
+        loglevel = logging.WARNING
+
+
+    # Replace escaped newlines with actual newlines in initial_prompt
+    if args.initial_prompt:
+        args.initial_prompt = args.initial_prompt.replace("\\n", "\n")
+
+    return args
+
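+# Example invocation (illustrative; flags as defined above, using the 'stt-server'
+# console script registered in setup.py):
+#   stt-server -m large-v2 -r tiny.en -l en -c 8011 -d 8012 -D --use_extended_logging
+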
+def _recorder_thread(loop):
+    global recorder, stop_recorder
+    print(f"{bcolors.OKGREEN}Initializing RealtimeSTT server with parameters:{bcolors.ENDC}")
+    for key, value in recorder_config.items():
+        print(f"    {bcolors.OKBLUE}{key}{bcolors.ENDC}: {value}")
+    recorder = AudioToTextRecorder(**recorder_config)
+    print(f"{bcolors.OKGREEN}{bcolors.BOLD}RealtimeSTT initialized{bcolors.ENDC}")
+    recorder_ready.set()
+    
+    def process_text(full_sentence):
+        global prev_text
+        prev_text = ""
+        full_sentence = preprocess_text(full_sentence)
+        message = json.dumps({
+            'type': 'fullSentence',
+            'text': full_sentence
+        })
+        # Use the passed event loop here
+        asyncio.run_coroutine_threadsafe(audio_queue.put(message), loop)
+
+        timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+
+        if extended_logging:
+            print(f"  [{timestamp}] Full text: {bcolors.BOLD}Sentence:{bcolors.ENDC} {bcolors.OKGREEN}{full_sentence}{bcolors.ENDC}\n", flush=True, end="")
+        else:
+            print(f"\r[{timestamp}] {bcolors.BOLD}Sentence:{bcolors.ENDC} {bcolors.OKGREEN}{full_sentence}{bcolors.ENDC}\n")
+    try:
+        while not stop_recorder:
+            recorder.text(process_text)
+    except KeyboardInterrupt:
+        print(f"{bcolors.WARNING}Exiting application due to keyboard interrupt{bcolors.ENDC}")
+
+def decode_and_resample(
+        audio_data,
+        original_sample_rate,
+        target_sample_rate):
+
+    # Decode 16-bit PCM data to numpy array
+    if original_sample_rate == target_sample_rate:
+        return audio_data
+
+    audio_np = np.frombuffer(audio_data, dtype=np.int16)
+
+    # Calculate the number of samples after resampling
+    num_original_samples = len(audio_np)
+    num_target_samples = int(num_original_samples * target_sample_rate /
+                                original_sample_rate)
+
+    # Resample the audio
+    resampled_audio = resample(audio_np, num_target_samples)
+
+    return resampled_audio.astype(np.int16).tobytes()
+
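+# Control protocol (illustrative sketch derived from the handler below): clients send
+# JSON commands over the control WebSocket and receive JSON status replies, e.g.
+#   {"command": "set_parameter", "parameter": "post_speech_silence_duration", "value": 0.5}
+#   {"command": "get_parameter", "parameter": "language", "request_id": 1}
+#   {"command": "call_method", "method": "stop"}
+# The parameter and method names above are examples; the server only accepts entries
+# listed in allowed_parameters and allowed_methods.
+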
+async def control_handler(websocket, path):
+    debug_print(f"New control connection from {websocket.remote_address}")
+    print(f"{bcolors.OKGREEN}Control client connected{bcolors.ENDC}")
+    global recorder
+    control_connections.add(websocket)
+    try:
+        async for message in websocket:
+            debug_print(f"Received control message: {message[:200]}...")
+            if not recorder_ready.is_set():
+                print(f"{bcolors.WARNING}Recorder not ready{bcolors.ENDC}")
+                continue
+            if isinstance(message, str):
+                # Handle text message (command)
+                try:
+                    command_data = json.loads(message)
+                    command = command_data.get("command")
+                    if command == "set_parameter":
+                        parameter = command_data.get("parameter")
+                        value = command_data.get("value")
+                        if parameter in allowed_parameters and hasattr(recorder, parameter):
+                            setattr(recorder, parameter, value)
+                            # Format the value for output
+                            if isinstance(value, float):
+                                value_formatted = f"{value:.2f}"
+                            else:
+                                value_formatted = value
+                            timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                            if extended_logging:
+                                print(f"  [{timestamp}] {bcolors.OKGREEN}Set recorder.{parameter} to: {bcolors.OKBLUE}{value_formatted}{bcolors.ENDC}")
+                            # Optionally send a response back to the client
+                            await websocket.send(json.dumps({"status": "success", "message": f"Parameter {parameter} set to {value}"}))
+                        else:
+                            if parameter not in allowed_parameters:
+                                print(f"{bcolors.WARNING}Parameter {parameter} is not allowed (set_parameter){bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "error", "message": f"Parameter {parameter} is not allowed (set_parameter)"}))
+                            else:
+                                print(f"{bcolors.WARNING}Parameter {parameter} does not exist (set_parameter){bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "error", "message": f"Parameter {parameter} does not exist (set_parameter)"}))
+
+                    elif command == "get_parameter":
+                        parameter = command_data.get("parameter")
+                        request_id = command_data.get("request_id")  # Get the request_id from the command data
+                        if parameter in allowed_parameters and hasattr(recorder, parameter):
+                            value = getattr(recorder, parameter)
+                            if isinstance(value, float):
+                                value_formatted = f"{value:.2f}"
+                            else:
+                                value_formatted = f"{value}"
+
+                            value_truncated = value_formatted[:39] + "…" if len(value_formatted) > 40 else value_formatted
+
+                            timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                            if extended_logging:
+                                print(f"  [{timestamp}] {bcolors.OKGREEN}Get recorder.{parameter}: {bcolors.OKBLUE}{value_truncated}{bcolors.ENDC}")
+                            response = {"status": "success", "parameter": parameter, "value": value}
+                            if request_id is not None:
+                                response["request_id"] = request_id
+                            await websocket.send(json.dumps(response))
+                        else:
+                            if parameter not in allowed_parameters:
+                                print(f"{bcolors.WARNING}Parameter {parameter} is not allowed (get_parameter){bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "error", "message": f"Parameter {parameter} is not allowed (get_parameter)"}))
+                            else:
+                                print(f"{bcolors.WARNING}Parameter {parameter} does not exist (get_parameter){bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "error", "message": f"Parameter {parameter} does not exist (get_parameter)"}))
+                    elif command == "call_method":
+                        method_name = command_data.get("method")
+                        if method_name in allowed_methods:
+                            method = getattr(recorder, method_name, None)
+                            if method and callable(method):
+                                args = command_data.get("args", [])
+                                kwargs = command_data.get("kwargs", {})
+                                method(*args, **kwargs)
+                                timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+                                print(f"  [{timestamp}] {bcolors.OKGREEN}Called method recorder.{bcolors.OKBLUE}{method_name}{bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "success", "message": f"Method {method_name} called"}))
+                            else:
+                                print(f"{bcolors.WARNING}Recorder does not have method {method_name}{bcolors.ENDC}")
+                                await websocket.send(json.dumps({"status": "error", "message": f"Recorder does not have method {method_name}"}))
+                        else:
+                            print(f"{bcolors.WARNING}Method {method_name} is not allowed{bcolors.ENDC}")
+                            await websocket.send(json.dumps({"status": "error", "message": f"Method {method_name} is not allowed"}))
+                    else:
+                        print(f"{bcolors.WARNING}Unknown command: {command}{bcolors.ENDC}")
+                        await websocket.send(json.dumps({"status": "error", "message": f"Unknown command {command}"}))
+                except json.JSONDecodeError:
+                    print(f"{bcolors.WARNING}Received invalid JSON command{bcolors.ENDC}")
+                    await websocket.send(json.dumps({"status": "error", "message": "Invalid JSON command"}))
+            else:
+                print(f"{bcolors.WARNING}Received unknown message type on control connection{bcolors.ENDC}")
+    except websockets.exceptions.ConnectionClosed as e:
+        print(f"{bcolors.WARNING}Control client disconnected: {e}{bcolors.ENDC}")
+    finally:
+        control_connections.remove(websocket)
+
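+# Data protocol (illustrative sketch derived from the handler below): every binary
+# message is [4-byte little-endian metadata length][UTF-8 JSON metadata][16-bit PCM audio].
+# A client could assemble such a packet roughly like this (hypothetical snippet):
+#   metadata = json.dumps({"sampleRate": 48000}).encode("utf-8")
+#   packet = len(metadata).to_bytes(4, "little") + metadata + pcm_bytes
+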
+async def data_handler(websocket, path):
+    global writechunks, wav_file
+    print(f"{bcolors.OKGREEN}Data client connected{bcolors.ENDC}")
+    data_connections.add(websocket)
+    try:
+        while True:
+            message = await websocket.recv()
+            if isinstance(message, bytes):
+                if debug_logging:
+                    debug_print(f"Received audio chunk (size: {len(message)} bytes)")
+                elif log_incoming_chunks:
+                    print(".", end='', flush=True)
+                # Handle binary message (audio data)
+                metadata_length = int.from_bytes(message[:4], byteorder='little')
+                metadata_json = message[4:4+metadata_length].decode('utf-8')
+                metadata = json.loads(metadata_json)
+                sample_rate = metadata['sampleRate']
+
+                debug_print(f"Processing audio chunk with sample rate {sample_rate}")
+                chunk = message[4+metadata_length:]
+
+                if writechunks:
+                    if not wav_file:
+                        wav_file = wave.open(writechunks, 'wb')
+                        wav_file.setnchannels(CHANNELS)
+                        wav_file.setsampwidth(pyaudio.get_sample_size(FORMAT))
+                        wav_file.setframerate(sample_rate)
+
+                    wav_file.writeframes(chunk)
+
+                resampled_chunk = decode_and_resample(chunk, sample_rate, 16000)
+
+                debug_print(f"Resampled chunk size: {len(resampled_chunk)} bytes")
+                recorder.feed_audio(resampled_chunk)
+            else:
+                print(f"{bcolors.WARNING}Received non-binary message on data connection{bcolors.ENDC}")
+    except websockets.exceptions.ConnectionClosed as e:
+        print(f"{bcolors.WARNING}Data client disconnected: {e}{bcolors.ENDC}")
+    finally:
+        data_connections.remove(websocket)
+        recorder.clear_audio_queue()  # Ensure audio queue is cleared if client disconnects
+
+async def broadcast_audio_messages():
+    while True:
+        message = await audio_queue.get()
+        for conn in list(data_connections):
+            try:
+                timestamp = datetime.now().strftime('%H:%M:%S.%f')[:-3]
+
+                if extended_logging:
+                    print(f"  [{timestamp}] Sending message: {bcolors.OKBLUE}{message}{bcolors.ENDC}\n", flush=True, end="")
+                await conn.send(message)
+            except websockets.exceptions.ConnectionClosed:
+                data_connections.remove(conn)
+
+# Helper function to create event loop bound closures for callbacks
+def make_callback(loop, callback):
+    def inner_callback(*args, **kwargs):
+        callback(*args, **kwargs, loop=loop)
+    return inner_callback
+
+async def main_async():            
+    global stop_recorder, recorder_config, global_args, recorder_thread
+    args = parse_arguments()
+    global_args = args
+
+    # Get the event loop here and pass it to the recorder thread
+    loop = asyncio.get_event_loop()
+
+    recorder_config = {
+        'model': args.model,
+        'realtime_model_type': args.rt_model,
+        'language': args.lang,
+        'input_device_index': args.input_device,
+        'silero_sensitivity': args.silero_sensitivity,
+        'silero_use_onnx': args.silero_use_onnx,
+        'webrtc_sensitivity': args.webrtc_sensitivity,
+        'post_speech_silence_duration': args.unknown_sentence_detection_pause,
+        'min_length_of_recording': args.min_length_of_recording,
+        'min_gap_between_recordings': args.min_gap_between_recordings,
+        'enable_realtime_transcription': args.enable_realtime_transcription,
+        'realtime_processing_pause': args.realtime_processing_pause,
+        'silero_deactivity_detection': args.silero_deactivity_detection,
+        'early_transcription_on_silence': args.early_transcription_on_silence,
+        'beam_size': args.beam_size,
+        'beam_size_realtime': args.beam_size_realtime,
+        'initial_prompt': args.initial_prompt,
+        'wake_words': args.wake_words,
+        'wake_words_sensitivity': args.wake_words_sensitivity,
+        'wake_word_timeout': args.wake_word_timeout,
+        'wake_word_activation_delay': args.wake_word_activation_delay,
+        'wakeword_backend': args.wakeword_backend,
+        'openwakeword_model_paths': args.openwakeword_model_paths,
+        'openwakeword_inference_framework': args.openwakeword_inference_framework,
+        'wake_word_buffer_duration': args.wake_word_buffer_duration,
+        'use_main_model_for_realtime': args.use_main_model_for_realtime,
+        'spinner': False,
+        'use_microphone': False,
+        'on_realtime_transcription_update': make_callback(loop, text_detected),
+        'on_recording_start': make_callback(loop, on_recording_start),
+        'on_recording_stop': make_callback(loop, on_recording_stop),
+        'on_vad_detect_start': make_callback(loop, on_vad_detect_start),
+        'on_vad_detect_stop': make_callback(loop, on_vad_detect_stop),
+        'on_wakeword_detected': make_callback(loop, on_wakeword_detected),
+        'on_wakeword_detection_start': make_callback(loop, on_wakeword_detection_start),
+        'on_wakeword_detection_end': make_callback(loop, on_wakeword_detection_end),
+        'on_transcription_start': make_callback(loop, on_transcription_start),
+        # 'on_recorded_chunk': make_callback(loop, on_recorded_chunk),
+        'no_log_file': True,  # Disable logging to file
+        'use_extended_logging': args.use_extended_logging,
+        'level': loglevel,
+    }
+
+    try:
+        # Attempt to start control and data servers
+        control_server = await websockets.serve(control_handler, "localhost", args.control)
+        data_server = await websockets.serve(data_handler, "localhost", args.data)
+        print(f"{bcolors.OKGREEN}Control server started on {bcolors.OKBLUE}ws://localhost:{args.control}{bcolors.ENDC}")
+        print(f"{bcolors.OKGREEN}Data server started on {bcolors.OKBLUE}ws://localhost:{args.data}{bcolors.ENDC}")
+
+        # Start the broadcast and recorder threads
+        broadcast_task = asyncio.create_task(broadcast_audio_messages())
+
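+        # recorder.text() blocks until a full sentence is available, so the recorder
+        # runs in its own thread; results are pushed back onto this asyncio loop via
+        # run_coroutine_threadsafe (see _recorder_thread and make_callback above).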
+        recorder_thread = threading.Thread(target=_recorder_thread, args=(loop,))
+        recorder_thread.start()
+        recorder_ready.wait()
+
+        print(f"{bcolors.OKGREEN}Server started. Press Ctrl+C to stop the server.{bcolors.ENDC}")
+
+        # Run server tasks
+        await asyncio.gather(control_server.wait_closed(), data_server.wait_closed(), broadcast_task)
+    except OSError as e:
+        print(f"{bcolors.FAIL}Error: Could not start server on specified ports. It’s possible another instance of the server is already running, or the ports are being used by another application.{bcolors.ENDC}")
+    except KeyboardInterrupt:
+        print(f"{bcolors.WARNING}Server interrupted by user, shutting down...{bcolors.ENDC}")
+    finally:
+        # Shutdown procedures for recorder and server threads
+        await shutdown_procedure()
+        print(f"{bcolors.OKGREEN}Server shutdown complete.{bcolors.ENDC}")
+
+async def shutdown_procedure():
+    global stop_recorder, recorder_thread
+    if recorder:
+        stop_recorder = True
+        recorder.abort()
+        recorder.stop()
+        recorder.shutdown()
+        print(f"{bcolors.OKGREEN}Recorder shut down{bcolors.ENDC}")
+
+        if recorder_thread:
+            recorder_thread.join()
+            print(f"{bcolors.OKGREEN}Recorder thread finished{bcolors.ENDC}")
+
+    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
+    for task in tasks:
+        task.cancel()
+    await asyncio.gather(*tasks, return_exceptions=True)
+
+    print(f"{bcolors.OKGREEN}All tasks cancelled, closing event loop now.{bcolors.ENDC}")
+
+def main():
+    try:
+        asyncio.run(main_async())
+    except KeyboardInterrupt:
+        # Capture any final KeyboardInterrupt to prevent it from showing up in logs
+        print(f"{bcolors.WARNING}Server interrupted by user.{bcolors.ENDC}")
+        exit(0)
+
+if __name__ == '__main__':
+    main()

+ 0 - 0
__init__.py


+ 6 - 3
requirements-gpu.txt

@@ -1,7 +1,10 @@
 PyAudio==0.2.14
-faster-whisper==1.0.2
+faster-whisper==1.0.3
 pvporcupine==1.9.5
-webrtcvad==2.0.10
+webrtcvad-wheels==2.0.14
 halo==0.0.31
-scipy==1.13.0
+scipy==1.14.1
 websockets==v12.0
+websocket-client==1.8.0
+openwakeword>=0.4.0
+numpy<2.0.0

+ 7 - 5
requirements.txt

@@ -1,10 +1,12 @@
 PyAudio==0.2.14
 faster-whisper==1.0.3
 pvporcupine==1.9.5
-webrtcvad==2.0.10
+webrtcvad-wheels==2.0.14
 halo==0.0.31
-torch==2.3.1
-torchaudio==2.3.1
-scipy==1.12.0
+torch
+torchaudio
+scipy==1.14.1
 websockets==v12.0
-openwakeword==0.6.0
+websocket-client==1.8.0
+openwakeword>=0.4.0
+numpy<2.0.0

+ 10 - 4
setup.py

@@ -9,14 +9,14 @@ with open('requirements.txt') as f:
 
 setuptools.setup(
     name="RealtimeSTT",
-    version="0.2.4",
+    version="0.3.7",
     author="Kolja Beigel",
     author_email="kolja.beigel@web.de",
     description="A fast Voice Activity Detection and Transcription System",
     long_description=long_description,
     long_description_content_type="text/markdown",
     url="https://github.com/KoljaB/RealTimeSTT",
-    packages=setuptools.find_packages(),
+    packages=setuptools.find_packages(include=["RealtimeSTT", "RealtimeSTT_server"]),
     classifiers=[
         "Programming Language :: Python :: 3",
         "License :: OSI Approved :: MIT License",
@@ -24,5 +24,11 @@ setuptools.setup(
     ],
     python_requires='>=3.6',
     install_requires=requirements,
-    keywords="real-time, audio, transcription, speech-to-text, voice-activity-detection, VAD, real-time-transcription, ambient-noise-detection, microphone-input, faster_whisper, speech-recognition, voice-assistants, audio-processing, buffered-transcription, pyaudio, ambient-noise-level, voice-deactivity"
-)
+    keywords="real-time, audio, transcription, speech-to-text, voice-activity-detection, VAD, real-time-transcription, ambient-noise-detection, microphone-input, faster_whisper, speech-recognition, voice-assistants, audio-processing, buffered-transcription, pyaudio, ambient-noise-level, voice-deactivity",
+    entry_points={
+        'console_scripts': [
+            'stt-server=RealtimeSTT_server.stt_server:main',
+            'stt=RealtimeSTT_server.stt_cli_client:main',
+        ],
+    },
+)

+ 85 - 0
tests/feed_audio.py

@@ -0,0 +1,85 @@
+if __name__ == "__main__":
+    import threading
+    import pyaudio
+    from RealtimeSTT import AudioToTextRecorder
+
+    # Audio stream configuration constants
+    CHUNK = 1024                  # Number of audio samples per buffer
+    FORMAT = pyaudio.paInt16      # Sample format (16-bit integer)
+    CHANNELS = 1                  # Mono audio
+    RATE = 16000                  # Sampling rate in Hz (expected by the recorder)
+
+    # Initialize the audio-to-text recorder without using the microphone directly
+    # Since we are feeding audio data manually, set use_microphone to False
+    recorder = AudioToTextRecorder(
+        use_microphone=False,     # Disable built-in microphone usage
+        spinner=False             # Disable spinner animation in the console
+    )
+
+    # Event to signal when to stop the threads
+    stop_event = threading.Event()
+
+    def feed_audio_thread():
+        """Thread function to read audio data and feed it to the recorder."""
+        p = pyaudio.PyAudio()
+
+        # Open an input audio stream with the specified configuration
+        stream = p.open(
+            format=FORMAT,
+            channels=CHANNELS,
+            rate=RATE,
+            input=True,
+            frames_per_buffer=CHUNK
+        )
+
+        try:
+            print("Speak now")
+            while not stop_event.is_set():
+                # Read audio data from the stream (in the expected format)
+                data = stream.read(CHUNK)
+                # Feed the audio data to the recorder
+                recorder.feed_audio(data)
+        except Exception as e:
+            print(f"feed_audio_thread encountered an error: {e}")
+        finally:
+            # Clean up the audio stream
+            stream.stop_stream()
+            stream.close()
+            p.terminate()
+            print("Audio stream closed.")
+
+    def recorder_transcription_thread():
+        """Thread function to handle transcription and process the text."""
+        def process_text(full_sentence):
+            """Callback function to process the transcribed text."""
+            print("Transcribed text:", full_sentence)
+            # Check for the stop command in the transcribed text
+            if "stop recording" in full_sentence.lower():
+                print("Stop command detected. Stopping threads...")
+                stop_event.set()
+                recorder.abort()
+        try:
+            while not stop_event.is_set():
+                # Get transcribed text and process it using the callback
+                recorder.text(process_text)
+        except Exception as e:
+            print(f"transcription_thread encountered an error: {e}")
+        finally:
+            print("Transcription thread exiting.")
+
+    # Create and start the audio feeding thread
+    audio_thread = threading.Thread(target=feed_audio_thread)
+    audio_thread.daemon = False    # Ensure the thread doesn't exit prematurely
+    audio_thread.start()
+
+    # Create and start the transcription thread
+    transcription_thread = threading.Thread(target=recorder_transcription_thread)
+    transcription_thread.daemon = False    # Ensure the thread doesn't exit prematurely
+    transcription_thread.start()
+
+    # Wait for both threads to finish
+    audio_thread.join()
+    transcription_thread.join()
+
+    print("Recording and transcription have stopped.")
+    recorder.shutdown()

+ 45 - 0
tests/install_packages.py

@@ -0,0 +1,45 @@
+import subprocess
+import sys
+
+def check_and_install_packages(packages):
+    """
+    Checks if the specified packages are installed, and if not, prompts the user
+    to install them.
+
+    Parameters:
+    - packages: A list of dictionaries, each containing:
+        - 'import_name': The name used in the import statement.
+        - 'install_name': (Optional) The name used in the pip install command.
+                          Defaults to 'import_name' if not provided.
+        - 'version': (Optional) Version constraint for the package.
+    """
+    for package in packages:
+        import_name = package['import_name']
+        install_name = package.get('install_name', import_name)
+        version = package.get('version', '')
+
+        try:
+            __import__(import_name)
+        except ImportError:
+            user_input = input(
+                f"This program requires the '{import_name}' library, which is not installed.\n"
+                f"Do you want to install it now? (y/n): "
+            )
+            if user_input.strip().lower() == 'y':
+                try:
+                    # Build the pip install command
+                    install_command = [sys.executable, "-m", "pip", "install"]
+                    if version:
+                        install_command.append(f"{install_name}{version}")
+                    else:
+                        install_command.append(install_name)
+
+                    subprocess.check_call(install_command)
+                    __import__(import_name)
+                    print(f"Successfully installed '{install_name}'.")
+                except Exception as e:
+                    print(f"An error occurred while installing '{install_name}': {e}")
+                    sys.exit(1)
+            else:
+                print(f"The program requires the '{import_name}' library to run. Exiting...")
+                sys.exit(1)
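+
+# Example usage (illustrative):
+# check_and_install_packages([
+#     {'import_name': 'rich'},
+#     {'import_name': 'websocket', 'install_name': 'websocket-client'},
+# ])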

+ 319 - 0
tests/realtimestt_speechendpoint.py

@@ -0,0 +1,319 @@
+IS_DEBUG = False
+
+import os
+import sys
+import threading
+import queue
+import time
+from collections import deque
+from difflib import SequenceMatcher
+from install_packages import check_and_install_packages
+
+# Check and install required packages
+check_and_install_packages([
+    {'import_name': 'rich'},
+    {'import_name': 'openai'},
+    {'import_name': 'colorama'},
+    {'import_name': 'RealtimeSTT'},
+    # Add any other required packages here
+])
+
+EXTENDED_LOGGING = False
+
+if __name__ == '__main__':
+
+    if EXTENDED_LOGGING:
+        import logging
+        logging.basicConfig(level=logging.DEBUG)
+
+    from rich.console import Console
+    from rich.live import Live
+    from rich.text import Text
+    from rich.panel import Panel
+    from rich.spinner import Spinner
+    from rich.progress import Progress, SpinnerColumn, TextColumn
+    console = Console()
+    console.print("System initializing, please wait")
+
+    from RealtimeSTT import AudioToTextRecorder
+    from colorama import Fore, Style
+    import colorama
+    from openai import OpenAI
+    # import ollama
+
+    # Initialize OpenAI client for Ollama    
+    client = OpenAI(
+        # base_url='http://127.0.0.1:11434/v1/', # ollama
+        base_url='http://127.0.0.1:1234/v1/', # lm_studio
+        api_key='ollama',  # required but ignored
+    )
+
+    if os.name == "nt" and (3, 8) <= sys.version_info < (3, 99):
+        from torchaudio._extension.utils import _init_dll_path
+        _init_dll_path()    
+
+    colorama.init()
+
+    # Initialize Rich Console and Live
+    live = Live(console=console, refresh_per_second=10, screen=False)
+    live.start()
+
+    # Initialize a thread-safe queue
+    text_queue = queue.Queue()
+
+    # Variables for managing displayed text
+    full_sentences = []
+    rich_text_stored = ""
+    recorder = None
+    displayed_text = ""
+    text_time_deque = deque()
+
+    rapid_sentence_end_detection = 0.4
+    end_of_sentence_detection_pause = 1.2
+    unknown_sentence_detection_pause = 1.8
+    mid_sentence_detection_pause = 2.4
+    hard_break_even_on_background_noise = 3.0
+    hard_break_even_on_background_noise_min_texts = 3
+    hard_break_even_on_background_noise_min_chars = 15
+    hard_break_even_on_background_noise_min_similarity = 0.99
+    relisten_on_abrupt_stop = True
+
+    abrupt_stop = False
+
+    def clear_console():
+        os.system('clear' if os.name == 'posix' else 'cls')
+
+    prev_text = ""
+
+    speech_finished_cache = {}
+
+    def is_speech_finished(text):
+        # Check if the result is already in the cache
+        if text in speech_finished_cache:
+            if IS_DEBUG:
+                print(f"Cache hit for: '{text}'")
+            return speech_finished_cache[text]
+        
+        user_prompt = (
+            "Please reply with only 'c' if the following text is a complete thought (a sentence that stands on its own), "
+            "or 'i' if it is not finished. Do not include any additional text in your reply. "
+            "Consider a full sentence to have a clear subject, verb, and predicate or express a complete idea. "
+            "Examples:\n"
+            "- 'The sky is blue.' is complete (reply 'c').\n"
+            "- 'When the sky' is incomplete (reply 'i').\n"
+            "- 'She walked home.' is complete (reply 'c').\n"
+            "- 'Because he' is incomplete (reply 'i').\n"
+            f"\nText: {text}"
+        )
+
+        response = client.chat.completions.create(
+            model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
+            messages=[{"role": "user", "content": user_prompt}],
+            max_tokens=1,
+            temperature=0.0,  # Set temperature to 0 for deterministic output
+        )
+
+        if IS_DEBUG:
+            print(f"t:'{response.choices[0].message.content.strip().lower()}'", end="", flush=True)
+
+        reply = response.choices[0].message.content.strip().lower()
+        result = reply == 'c'
+
+        # Cache the result
+        speech_finished_cache[text] = result
+
+        return result
+
+    def preprocess_text(text):
+        # Remove leading whitespaces
+        text = text.lstrip()
+
+        #  Remove starting ellipses if present
+        if text.startswith("..."):
+            text = text[3:]
+
+        # Remove any leading whitespaces again after ellipses removal
+        text = text.lstrip()
+
+        # Uppercase the first letter
+        if text:
+            text = text[0].upper() + text[1:]
+        
+        return text
+
+    def text_detected(text):
+        """
+        Enqueue the detected text for processing.
+        """
+        text_queue.put(text)
+
+
+    def process_queue():
+        global recorder, full_sentences, prev_text, displayed_text, rich_text_stored, text_time_deque, abrupt_stop
+
+        # Initialize a deque to store texts with their timestamps
+        while True:
+            try:
+                text = text_queue.get(timeout=1)  # Wait for text or timeout after 1 second
+            except queue.Empty:
+                continue  # No text to process, continue looping
+
+            if text is None:
+                # Sentinel value to indicate thread should exit
+                break
+
+            text = preprocess_text(text)
+            current_time = time.time()
+
+            sentence_end_marks = ['.', '!', '?', '。'] 
+            if text.endswith("..."):
+                if not recorder.post_speech_silence_duration == mid_sentence_detection_pause:
+                    recorder.post_speech_silence_duration = mid_sentence_detection_pause
+                    if IS_DEBUG: print(f"RT: post_speech_silence_duration: {recorder.post_speech_silence_duration}")
+            elif text and text[-1] in sentence_end_marks and prev_text and prev_text[-1] in sentence_end_marks:
+                if not recorder.post_speech_silence_duration == end_of_sentence_detection_pause:
+                    recorder.post_speech_silence_duration = end_of_sentence_detection_pause
+                    if IS_DEBUG: print(f"RT: post_speech_silence_duration: {recorder.post_speech_silence_duration}")
+            else:
+                if not recorder.post_speech_silence_duration == unknown_sentence_detection_pause:
+                    recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+                    if IS_DEBUG: print(f"RT: post_speech_silence_duration: {recorder.post_speech_silence_duration}")
+
+            prev_text = text
+            
+            import string
+            transtext = text.translate(str.maketrans('', '', string.punctuation))
+            
+            if is_speech_finished(transtext):
+                if not recorder.post_speech_silence_duration == rapid_sentence_end_detection:
+                    recorder.post_speech_silence_duration = rapid_sentence_end_detection
+                    if IS_DEBUG: print(f"RT: {transtext} post_speech_silence_duration: {recorder.post_speech_silence_duration}")
+
+            # Append the new text with its timestamp
+            text_time_deque.append((current_time, text))
+
+            # Remove texts older than the hard-break window (hard_break_even_on_background_noise seconds)
+            while text_time_deque and text_time_deque[0][0] < current_time - hard_break_even_on_background_noise:
+                text_time_deque.popleft()
+
+            # Check whether enough texts have arrived within the hard-break window
+            if len(text_time_deque) >= hard_break_even_on_background_noise_min_texts:
+                texts = [t[1] for t in text_time_deque]
+                first_text = texts[0]
+                last_text = texts[-1]
+
+                # Compute the similarity ratio between the first and last texts
+                similarity = SequenceMatcher(None, first_text, last_text).ratio()
+                #print(f"Similarity: {similarity:.2f}")
+
+                if similarity > hard_break_even_on_background_noise_min_similarity and len(first_text) > hard_break_even_on_background_noise_min_chars:
+                    abrupt_stop = True
+                    recorder.stop()
+
+            rich_text = Text()
+            for i, sentence in enumerate(full_sentences):
+                if i % 2 == 0:
+                    rich_text += Text(sentence, style="yellow") + Text(" ")
+                else:
+                    rich_text += Text(sentence, style="cyan") + Text(" ")
+            
+            if text:
+                rich_text += Text(text, style="bold yellow")
+
+            new_displayed_text = rich_text.plain
+
+            if new_displayed_text != displayed_text:
+                displayed_text = new_displayed_text
+                panel = Panel(rich_text, title="[bold green]Live Transcription[/bold green]", border_style="bold green")
+                live.update(panel)
+                rich_text_stored = rich_text
+
+            # Mark the task as done
+            text_queue.task_done()
+
+    def process_text(text):
+        global recorder, full_sentences, prev_text, abrupt_stop
+        if IS_DEBUG: print(f"SENTENCE: post_speech_silence_duration: {recorder.post_speech_silence_duration}")
+        recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+        text = preprocess_text(text)
+        text = text.rstrip()
+        text_time_deque.clear()
+        if text.endswith("..."):
+            text = text[:-2]
+                
+        full_sentences.append(text)
+        prev_text = ""
+        text_detected("")
+
+        if abrupt_stop:
+            abrupt_stop = False
+            if relisten_on_abrupt_stop:
+                recorder.listen()
+                recorder.start()
+                if hasattr(recorder, "last_words_buffer"):
+                    recorder.frames.extend(list(recorder.last_words_buffer))
+
+    # Recorder configuration
+    recorder_config = {
+        'spinner': False,
+        'model': 'medium.en',
+        #'input_device_index': 1, # mic
+        #'input_device_index': 2, # stereomix
+        'realtime_model_type': 'tiny.en',
+        'language': 'en',
+        #'silero_sensitivity': 0.05,
+        'silero_sensitivity': 0.4,
+        'webrtc_sensitivity': 3,
+        'post_speech_silence_duration': unknown_sentence_detection_pause,
+        'min_length_of_recording': 1.1,        
+        'min_gap_between_recordings': 0,                
+        'enable_realtime_transcription': True,
+        'realtime_processing_pause': 0.05,
+        'on_realtime_transcription_update': text_detected,
+        'silero_deactivity_detection': False,
+        'early_transcription_on_silence': 0,
+        'beam_size': 5,
+        'beam_size_realtime': 1,
+        'no_log_file': True,
+        'initial_prompt': (
+            "End incomplete sentences with ellipses.\n"
+            "Examples:\n"
+            "Complete: The sky is blue.\n"
+            "Incomplete: When the sky...\n"
+            "Complete: She walked home.\n"
+            "Incomplete: Because he...\n"
+        )
+        #'initial_prompt': "Use ellipses for incomplete sentences like: I went to the..."        
+    }
+
+    if EXTENDED_LOGGING:
+        recorder_config['level'] = logging.DEBUG
+
+    recorder = AudioToTextRecorder(**recorder_config)
+    
+    initial_text = Panel(Text("Say something...", style="cyan bold"), title="[bold yellow]Waiting for Input[/bold yellow]", border_style="bold yellow")
+    live.update(initial_text)
+
+    # Start the worker thread
+    worker_thread = threading.Thread(target=process_queue, daemon=True)
+    worker_thread.start()
+
+    try:
+        while True:
+            recorder.text(process_text)
+    except KeyboardInterrupt:
+        # Send sentinel value to worker thread to exit
+        text_queue.put(None)
+        worker_thread.join()
+        live.stop()
+        console.print("[bold red]Transcription stopped by user. Exiting...[/bold red]")
+        exit(0)
+
+

+ 156 - 33
tests/realtimestt_test.py

@@ -1,58 +1,181 @@
-from RealtimeSTT import AudioToTextRecorder
-from colorama import Fore, Style
-import colorama
-import os
+EXTENDED_LOGGING = False
+
+# Set to 0 to disable typing the transcription with the keyboard.
+# Try low values like 0.002 (fast) first; use higher values like 0.05 if typing is unreliable.
+WRITE_TO_KEYBOARD_INTERVAL = 0.002
 
 if __name__ == '__main__':
 
-    print("Initializing RealtimeSTT test...")
+    from install_packages import check_and_install_packages
+    check_and_install_packages([
+        {
+            'import_name': 'rich',
+        },
+        {
+            'import_name': 'pyautogui',
+        }        
+    ])
+
+    if EXTENDED_LOGGING:
+        import logging
+        logging.basicConfig(level=logging.DEBUG)
+
+    from rich.console import Console
+    from rich.live import Live
+    from rich.text import Text
+    from rich.panel import Panel
+    from rich.spinner import Spinner
+    from rich.progress import Progress, SpinnerColumn, TextColumn
+    console = Console()
+    console.print("System initializing, please wait")
+
+    import os
+    import sys
+    from RealtimeSTT import AudioToTextRecorder
+    from colorama import Fore, Style
+    import colorama
+    import pyautogui
+
+    if os.name == "nt" and (3, 8) <= sys.version_info < (3, 99):
+        from torchaudio._extension.utils import _init_dll_path
+        _init_dll_path()    
 
     colorama.init()
 
+    # Initialize Rich Console and Live
+    live = Live(console=console, refresh_per_second=10, screen=False)
+    live.start()
+
     full_sentences = []
-    displayed_text = ""
+    rich_text_stored = ""
+    recorder = None
+    displayed_text = ""  # Used for tracking text that was already displayed
+
+    end_of_sentence_detection_pause = 0.45
+    unknown_sentence_detection_pause = 0.7
+    mid_sentence_detection_pause = 2.0
 
     def clear_console():
         os.system('clear' if os.name == 'posix' else 'cls')
 
+    prev_text = ""
+
+    def preprocess_text(text):
+        # Remove leading whitespaces
+        text = text.lstrip()
+
+        #  Remove starting ellipses if present
+        if text.startswith("..."):
+            text = text[3:]
+
+        # Remove any leading whitespaces again after ellipses removal
+        text = text.lstrip()
+
+        # Uppercase the first letter
+        if text:
+            text = text[0].upper() + text[1:]
+        
+        return text
+
+
     def text_detected(text):
-        global displayed_text
-        sentences_with_style = [
-            f"{Fore.YELLOW + sentence + Style.RESET_ALL if i % 2 == 0 else Fore.CYAN + sentence + Style.RESET_ALL} "
-            for i, sentence in enumerate(full_sentences)
-        ]
-        new_text = "".join(sentences_with_style).strip() + " " + text if len(sentences_with_style) > 0 else text
-
-        if new_text != displayed_text:
-            displayed_text = new_text
-            clear_console()
-            print(f"Language: {recorder.detected_language} (realtime: {recorder.detected_realtime_language})")
-            print(displayed_text, end="", flush=True)
+        global prev_text, displayed_text, rich_text_stored
+
+        text = preprocess_text(text)
+
+        sentence_end_marks = ['.', '!', '?', '。'] 
+        if text.endswith("..."):
+            recorder.post_speech_silence_duration = mid_sentence_detection_pause
+        elif text and text[-1] in sentence_end_marks and prev_text and prev_text[-1] in sentence_end_marks:
+            recorder.post_speech_silence_duration = end_of_sentence_detection_pause
+        else:
+            recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+
+        prev_text = text
+
+        # Build Rich Text with alternating colors
+        rich_text = Text()
+        for i, sentence in enumerate(full_sentences):
+            if i % 2 == 0:
+                #rich_text += Text(sentence, style="bold yellow") + Text(" ")
+                rich_text += Text(sentence, style="yellow") + Text(" ")
+            else:
+                rich_text += Text(sentence, style="cyan") + Text(" ")
+        
+        # If the current text is not a sentence-ending, display it in real-time
+        if text:
+            rich_text += Text(text, style="bold yellow")
+
+        new_displayed_text = rich_text.plain
+
+        if new_displayed_text != displayed_text:
+            displayed_text = new_displayed_text
+            panel = Panel(rich_text, title="[bold green]Live Transcription[/bold green]", border_style="bold green")
+            live.update(panel)
+            rich_text_stored = rich_text
 
     def process_text(text):
+        global recorder, full_sentences, prev_text
+        recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+
+        text = preprocess_text(text)
+        text = text.rstrip()
+        if text.endswith("..."):
+            text = text[:-2]
+                
+        if not text:
+            return
+
         full_sentences.append(text)
+        prev_text = ""
         text_detected("")
 
+        if WRITE_TO_KEYBOARD_INTERVAL:
+            pyautogui.write(f"{text} ", interval=WRITE_TO_KEYBOARD_INTERVAL)  # Adjust interval as needed
+
+    # Recorder configuration
     recorder_config = {
         'spinner': False,
-        'model': 'large-v2',
-        'silero_sensitivity': 0.4,
-        'webrtc_sensitivity': 2,
-        'post_speech_silence_duration': 0.4,
-        'min_length_of_recording': 0,
-        'min_gap_between_recordings': 0,
+        'model': 'large-v2', # or deepdml/faster-whisper-large-v3-turbo-ct2 or ...
+        # 'input_device_index': 1,
+        'realtime_model_type': 'tiny.en', # or small.en or distil-small.en or ...
+        'language': 'en',
+        'silero_sensitivity': 0.05,
+        'webrtc_sensitivity': 3,
+        'post_speech_silence_duration': unknown_sentence_detection_pause,
+        'min_length_of_recording': 1.1,        
+        'min_gap_between_recordings': 0,                
         'enable_realtime_transcription': True,
-        'realtime_processing_pause': 0.2,
-        'realtime_model_type': 'tiny',
-        'on_realtime_transcription_update': text_detected, 
+        'realtime_processing_pause': 0.02,
+        'on_realtime_transcription_update': text_detected,
+        #'on_realtime_transcription_stabilized': text_detected,
         'silero_deactivity_detection': True,
+        'early_transcription_on_silence': 0,
+        'beam_size': 5,
+        'beam_size_realtime': 3,
+        'no_log_file': True,
+        'initial_prompt': (
+            "End incomplete sentences with ellipses.\n"
+            "Examples:\n"
+            "Complete: The sky is blue.\n"
+            "Incomplete: When the sky...\n"
+            "Complete: She walked home.\n"
+            "Incomplete: Because he...\n"
+        )
     }
 
-    recorder = AudioToTextRecorder(**recorder_config)
-
-    clear_console()
-    print("Say something...", end="", flush=True)
+    if EXTENDED_LOGGING:
+        recorder_config['level'] = logging.DEBUG
 
-    while True:
-        recorder.text(process_text)
+    recorder = AudioToTextRecorder(**recorder_config)
+    
+    initial_text = Panel(Text("Say something...", style="cyan bold"), title="[bold yellow]Waiting for Input[/bold yellow]", border_style="bold yellow")
+    live.update(initial_text)
 
+    try:
+        while True:
+            recorder.text(process_text)
+    except KeyboardInterrupt:
+        live.stop()
+        console.print("[bold red]Transcription stopped by user. Exiting...[/bold red]")
+        exit(0)

+ 451 - 0
tests/realtimestt_test_hotkeys_v2.py

@@ -0,0 +1,451 @@
+EXTENDED_LOGGING = False
+
+if __name__ == '__main__':
+
+    import subprocess
+    import sys
+    import threading
+    import time
+
+    def install_rich():
+        subprocess.check_call([sys.executable, "-m", "pip", "install", "rich"])
+
+    try:
+        import rich
+    except ImportError:
+        user_input = input("This demo needs the 'rich' library, which is not installed.\nDo you want to install it now? (y/n): ")
+        if user_input.lower() == 'y':
+            try:
+                install_rich()
+                import rich
+                print("Successfully installed 'rich'.")
+            except Exception as e:
+                print(f"An error occurred while installing 'rich': {e}")
+                sys.exit(1)
+        else:
+            print("The program requires the 'rich' library to run. Exiting...")
+            sys.exit(1)
+
+    import keyboard
+    import pyperclip
+
+    if EXTENDED_LOGGING:
+        import logging
+        logging.basicConfig(level=logging.DEBUG)
+
+    from rich.console import Console
+    from rich.live import Live
+    from rich.text import Text
+    from rich.panel import Panel
+    console = Console()
+    console.print("System initializing, please wait")
+
+    import os
+    from RealtimeSTT import AudioToTextRecorder  # Ensure this module has stop() or close() methods
+
+    import colorama
+    colorama.init()
+
+    # Import pyautogui
+    import pyautogui
+
+    import pyaudio
+    import numpy as np
+
+    # Initialize Rich Console and Live
+    live = Live(console=console, refresh_per_second=10, screen=False)
+    live.start()
+
+    # Global variables
+    full_sentences = []
+    rich_text_stored = ""
+    recorder = None
+    displayed_text = ""  # Used for tracking text that was already displayed
+
+    end_of_sentence_detection_pause = 0.45
+    unknown_sentence_detection_pause = 0.7
+    mid_sentence_detection_pause = 2.0
+
+    prev_text = ""
+
+    # Events to signal threads to exit or reset
+    exit_event = threading.Event()
+    reset_event = threading.Event()
+
+    def preprocess_text(text):
+        # Remove leading whitespaces
+        text = text.lstrip()
+
+        # Remove starting ellipses if present
+        if text.startswith("..."):
+            text = text[3:]
+
+        # Remove any leading whitespaces again after ellipses removal
+        text = text.lstrip()
+
+        # Uppercase the first letter
+        if text:
+            text = text[0].upper() + text[1:]
+
+        return text
+
+    def text_detected(text):
+        global prev_text, displayed_text, rich_text_stored
+
+        text = preprocess_text(text)
+
+        sentence_end_marks = ['.', '!', '?', '。']
+        if text.endswith("..."):
+            recorder.post_speech_silence_duration = mid_sentence_detection_pause
+        elif text and text[-1] in sentence_end_marks and prev_text and prev_text[-1] in sentence_end_marks:
+            recorder.post_speech_silence_duration = end_of_sentence_detection_pause
+        else:
+            recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+
+        prev_text = text
+
+        # Build Rich Text with alternating colors
+        rich_text = Text()
+        for i, sentence in enumerate(full_sentences):
+            if i % 2 == 0:
+                rich_text += Text(sentence, style="yellow") + Text(" ")
+            else:
+                rich_text += Text(sentence, style="cyan") + Text(" ")
+
+        # Show the current, not-yet-finalized text in real time
+        if text:
+            rich_text += Text(text, style="bold yellow")
+
+        new_displayed_text = rich_text.plain
+
+        if new_displayed_text != displayed_text:
+            displayed_text = new_displayed_text
+            panel = Panel(rich_text, title="[bold green]Live Transcription[/bold green]", border_style="bold green")
+            live.update(panel)
+            rich_text_stored = rich_text
+
+    def process_text(text):
+        global recorder, full_sentences, prev_text, displayed_text
+        recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+        text = preprocess_text(text)
+        text = text.rstrip()
+        if text.endswith("..."):
+            text = text[:-2]
+
+        full_sentences.append(text)
+        prev_text = ""
+        text_detected("")
+
+        # Check if reset_event is set
+        if reset_event.is_set():
+            # Clear buffers
+            full_sentences.clear()
+            displayed_text = ""
+            reset_event.clear()
+            console.print("[bold magenta]Transcription buffer reset.[/bold magenta]")
+            return
+
+        # Type the finalized sentence to the active window quickly if typing is enabled
+        try:
+            # Release modifier keys to prevent stuck keys
+            for key in ['ctrl', 'shift', 'alt', 'win']:
+                keyboard.release(key)
+                pyautogui.keyUp(key)
+
+            # Use clipboard to paste text
+            pyperclip.copy(text + ' ')
+            pyautogui.hotkey('ctrl', 'v')
+
+        except Exception as e:
+            console.print(f"[bold red]Failed to type the text: {e}[/bold red]")
+
+    # Recorder configuration
+    recorder_config = {
+        'spinner': False,
+        'model': 'Systran/faster-distil-whisper-large-v3',  # distil-medium.en or large-v2 or deepdml/faster-whisper-large-v3-turbo-ct2 or ...
+        'input_device_index': 1,
+        'realtime_model_type': 'Systran/faster-distil-whisper-large-v3',  # Using the same model for realtime
+        'language': 'en',
+        'silero_sensitivity': 0.05,
+        'webrtc_sensitivity': 3,
+        'post_speech_silence_duration': unknown_sentence_detection_pause,
+        'min_length_of_recording': 1.1,
+        'min_gap_between_recordings': 0,
+        'enable_realtime_transcription': True,
+        'realtime_processing_pause': 0.02,
+        'on_realtime_transcription_update': text_detected,
+        # 'on_realtime_transcription_stabilized': text_detected,
+        'silero_deactivity_detection': True,
+        'early_transcription_on_silence': 0,
+        'beam_size': 5,
+        'beam_size_realtime': 5,  # Matching beam_size for consistency
+        'no_log_file': True,
+        'initial_prompt': "Use ellipses for incomplete sentences like: I went to the...",
+        'device': 'cuda',          # Added device configuration
+        'compute_type': 'float16'  # Added compute_type configuration
+    }
+
+    if EXTENDED_LOGGING:
+        recorder_config['level'] = logging.DEBUG
+
+    recorder = AudioToTextRecorder(**recorder_config)
+
+    initial_text = Panel(Text("Say something...", style="cyan bold"), title="[bold yellow]Waiting for Input[/bold yellow]", border_style="bold yellow")
+    live.update(initial_text)
+
+    # Print available hotkeys
+    console.print("[bold green]Available Hotkeys:[/bold green]")
+    console.print("[bold cyan]F1[/bold cyan]: Mute Microphone")
+    console.print("[bold cyan]F2[/bold cyan]: Unmute Microphone")
+    console.print("[bold cyan]F3[/bold cyan]: Start Static Recording")
+    console.print("[bold cyan]F4[/bold cyan]: Stop Static Recording")
+    console.print("[bold cyan]F5[/bold cyan]: Reset Transcription")
+
+    # Global variables for static recording
+    static_recording_active = False
+    static_recording_thread = None
+    static_audio_frames = []
+    live_recording_enabled = True  # Track whether live recording was enabled before static recording
+
+    # Audio settings for static recording
+    audio_settings = {
+        'FORMAT': pyaudio.paInt16,  # PyAudio format
+        'CHANNELS': 1,               # Mono audio
+        'RATE': 16000,               # Sample rate
+        'CHUNK': 1024                # Buffer size
+    }
+
+    # Note: The maximum recommended length of static recording is about 5 minutes.
+
+    def static_recording_worker():
+        """
+        Worker function to record audio statically.
+        """
+        global static_audio_frames, static_recording_active
+        # Set up pyaudio
+        p = pyaudio.PyAudio()
+        # Use the same audio format as defined in audio_settings
+        FORMAT = audio_settings['FORMAT']
+        CHANNELS = audio_settings['CHANNELS']
+        RATE = audio_settings['RATE']  # Sample rate
+        CHUNK = audio_settings['CHUNK']  # Buffer size
+
+        # Open the audio stream
+        try:
+            stream = p.open(format=FORMAT,
+                            channels=CHANNELS,
+                            rate=RATE,
+                            input=True,
+                            frames_per_buffer=CHUNK)
+        except Exception as e:
+            console.print(f"[bold red]Failed to open audio stream for static recording: {e}[/bold red]")
+            static_recording_active = False
+            p.terminate()
+            return
+
+        while static_recording_active and not exit_event.is_set():
+            try:
+                data = stream.read(CHUNK)
+                static_audio_frames.append(data)
+            except Exception as e:
+                console.print(f"[bold red]Error during static recording: {e}[/bold red]")
+                break
+
+        # Stop and close the stream
+        stream.stop_stream()
+        stream.close()
+        p.terminate()
+
+    def start_static_recording():
+        """
+        Starts the static audio recording.
+        """
+        global static_recording_active, static_recording_thread, static_audio_frames, live_recording_enabled
+        if static_recording_active:
+            console.print("[bold yellow]Static recording is already in progress.[/bold yellow]")
+            return
+
+        # Mute the live recording microphone
+        live_recording_enabled = recorder.use_microphone.value
+        if live_recording_enabled:
+            recorder.set_microphone(False)
+            console.print("[bold yellow]Live microphone muted during static recording.[/bold yellow]")
+
+        console.print("[bold green]Starting static recording... Press F4 or F5 to stop/reset.[/bold green]")
+        static_audio_frames = []
+        static_recording_active = True
+        static_recording_thread = threading.Thread(target=static_recording_worker, daemon=True)
+        static_recording_thread.start()
+
+    def stop_static_recording():
+        """
+        Stops the static audio recording and processes the transcription.
+        """
+        global static_recording_active, static_recording_thread
+        if not static_recording_active:
+            console.print("[bold yellow]No static recording is in progress.[/bold yellow]")
+            return
+
+        console.print("[bold green]Stopping static recording...[/bold green]")
+        static_recording_active = False
+        if static_recording_thread is not None:
+            static_recording_thread.join()
+            static_recording_thread = None
+
+        # Start a new thread to process the transcription
+        processing_thread = threading.Thread(target=process_static_transcription, daemon=True)
+        processing_thread.start()
+
+    def process_static_transcription():
+        global static_audio_frames, live_recording_enabled
+        if exit_event.is_set():
+            return
+        # Process the recorded audio
+        console.print("[bold green]Processing static recording...[/bold green]")
+
+        # Convert audio data to numpy array
+        audio_data = b''.join(static_audio_frames)
+        audio_array = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32) / 32768.0
+
+        # Transcribe the audio data
+        try:
+            from faster_whisper import WhisperModel
+        except ImportError:
+            console.print("[bold red]faster_whisper is not installed. Please install it to use static transcription.[/bold red]")
+            return
+
+        # Load the model using recorder_config
+        model_size = recorder_config['model']
+        device = recorder_config['device']
+        compute_type = recorder_config['compute_type']
+
+        console.print("Loading transcription model... This may take a moment.")
+        try:
+            model = WhisperModel(model_size, device=device, compute_type=compute_type)
+        except Exception as e:
+            console.print(f"[bold red]Failed to load transcription model: {e}[/bold red]")
+            return
+
+        # Transcribe the audio
+        try:
+            segments, info = model.transcribe(audio_array, beam_size=recorder_config['beam_size'])
+            transcription = ' '.join([segment.text for segment in segments]).strip()
+        except Exception as e:
+            console.print(f"[bold red]Error during transcription: {e}[/bold red]")
+            return
+
+        # Display the transcription
+        console.print("Static Recording Transcription:")
+        console.print(f"[bold cyan]{transcription}[/bold cyan]")
+
+        # Type the transcription into the active window
+        try:
+            # Release modifier keys to prevent stuck keys
+            for key in ['ctrl', 'shift', 'alt', 'win']:
+                keyboard.release(key)
+                pyautogui.keyUp(key)
+
+            # Use clipboard to paste text
+            pyperclip.copy(transcription + ' ')
+            pyautogui.hotkey('ctrl', 'v')
+
+        except Exception as e:
+            console.print(f"[bold red]Failed to type the static transcription: {e}[/bold red]")
+
+        # Unmute the live recording microphone if it was enabled before
+        if live_recording_enabled and not exit_event.is_set():
+            recorder.set_microphone(True)
+            console.print("[bold yellow]Live microphone unmuted.[/bold yellow]")
+
+    def reset_transcription():
+        """
+        Resets the transcription by flushing ongoing recordings or buffers.
+        """
+        global static_recording_active, static_recording_thread, static_audio_frames
+        console.print("[bold magenta]Resetting transcription...[/bold magenta]")
+        if static_recording_active:
+            console.print("[bold magenta]Flushing static recording...[/bold magenta]")
+            # Stop static recording
+            static_recording_active = False
+            if static_recording_thread is not None:
+                static_recording_thread.join()
+                static_recording_thread = None
+            # Clear static audio frames
+            static_audio_frames = []
+            # Unmute microphone if it was muted during static recording
+            if live_recording_enabled:
+                recorder.set_microphone(True)
+                console.print("[bold yellow]Live microphone unmuted after reset.[/bold yellow]")
+        elif recorder.use_microphone.value:
+            # Live transcription is active and microphone is not muted
+            console.print("[bold magenta]Resetting live transcription buffer...[/bold magenta]")
+            reset_event.set()
+        else:
+            # Microphone is muted; nothing to reset
+            console.print("[bold yellow]Microphone is muted. Nothing to reset.[/bold yellow]")
+
+    # Hotkey Callback Functions
+
+    def mute_microphone():
+        recorder.set_microphone(False)
+        console.print("[bold red]Microphone muted.[/bold red]")
+
+    def unmute_microphone():
+        recorder.set_microphone(True)
+        console.print("[bold green]Microphone unmuted.[/bold green]")
+
+    # Start the transcription loop in a separate thread
+    def transcription_loop():
+        try:
+            while not exit_event.is_set():
+                recorder.text(process_text)
+        except Exception as e:
+            console.print(f"[bold red]Error in transcription loop: {e}[/bold red]")
+        finally:
+            # Do not call sys.exit() here
+            pass
+
+    # Start the transcription loop thread
+    transcription_thread = threading.Thread(target=transcription_loop, daemon=True)
+    transcription_thread.start()
+
+    # Register the hotkeys and their corresponding callback functions
+    keyboard.add_hotkey('F1', mute_microphone, suppress=True)
+    keyboard.add_hotkey('F2', unmute_microphone, suppress=True)
+    keyboard.add_hotkey('F3', start_static_recording, suppress=True)
+    keyboard.add_hotkey('F4', stop_static_recording, suppress=True)
+    keyboard.add_hotkey('F5', reset_transcription, suppress=True)
+
+    # Keep the main thread running and handle graceful exit
+    try:
+        keyboard.wait()  # Blocks indefinitely; exit with Ctrl+C (KeyboardInterrupt)
+    except KeyboardInterrupt:
+        console.print("[bold yellow]KeyboardInterrupt received. Exiting...[/bold yellow]")
+    finally:
+        # Signal threads to exit
+        exit_event.set()
+
+        # Reset transcription if needed
+        reset_transcription()
+
+        # Stop the recorder
+        try:
+            if hasattr(recorder, 'stop'):
+                recorder.stop()
+            elif hasattr(recorder, 'close'):
+                recorder.close()
+        except Exception as e:
+            console.print(f"[bold red]Error stopping recorder: {e}[/bold red]")
+
+        # Allow some time for threads to finish
+        time.sleep(1)
+
+        # Wait for transcription_thread to finish
+        if transcription_thread.is_alive():
+            transcription_thread.join(timeout=5)
+
+        # Stop the Live console
+        live.stop()
+
+        console.print("[bold red]Exiting gracefully...[/bold red]")
+        sys.exit(0)
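
The static-recording path in this script captures raw int16 PCM with PyAudio and then transcribes it offline with faster-whisper. The conversion and transcription step, reduced to a small helper (model name, device and compute type are placeholders; the frames are assumed to be 16 kHz mono, as recorded above):

```python
import numpy as np
from faster_whisper import WhisperModel

def transcribe_frames(frames, model_size="tiny.en", device="cpu", compute_type="int8"):
    # frames: list of int16 PCM byte chunks, 16 kHz mono (as captured by the worker above)
    audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio, beam_size=5)
    return " ".join(segment.text for segment in segments).strip()
```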

+ 241 - 0
tests/realtimestt_test_stereomix.py

@@ -0,0 +1,241 @@
+EXTENDED_LOGGING = False
+
+def main():
+
+    from install_packages import check_and_install_packages
+    check_and_install_packages([
+        {
+            'import_name': 'rich',
+        }
+    ])
+
+    if EXTENDED_LOGGING:
+        import logging
+        logging.basicConfig(level=logging.DEBUG)
+
+    import os
+    import sys
+    import threading
+    import time
+    import pyaudio
+    from rich.console import Console
+    from rich.live import Live
+    from rich.text import Text
+    from rich.panel import Panel
+    from rich.spinner import Spinner
+    from rich.progress import Progress, SpinnerColumn, TextColumn
+    from colorama import Fore, Style, init as colorama_init
+
+    from RealtimeSTT import AudioToTextRecorder 
+
+    # Configuration Constants
+    LOOPBACK_DEVICE_NAME = "stereomix"
+    LOOPBACK_DEVICE_HOST_API = 0
+    BUFFER_SIZE = 512 
+    AUDIO_FORMAT = pyaudio.paInt16
+    CHANNELS = 1
+    RATE = 16000
+
+    console = Console()
+    console.print("System initializing, please wait")
+
+    colorama_init()
+
+    # Initialize Rich Console and Live
+    live = Live(console=console, refresh_per_second=10, screen=False)
+    live.start()
+
+    full_sentences = []
+    rich_text_stored = ""
+    recorder = None
+    displayed_text = ""  # Used for tracking text that was already displayed
+
+    end_of_sentence_detection_pause = 0.2
+    unknown_sentence_detection_pause = 0.5
+    mid_sentence_detection_pause = 1
+
+    prev_text = ""
+
+    def clear_console():
+        os.system('clear' if os.name == 'posix' else 'cls')
+
+    def preprocess_text(text):
+        # Remove leading whitespaces
+        text = text.lstrip()
+
+        # Remove starting ellipses if present
+        if text.startswith("..."):
+            text = text[3:]
+
+        # Remove any leading whitespaces again after ellipses removal
+        text = text.lstrip()
+
+        # Uppercase the first letter
+        if text:
+            text = text[0].upper() + text[1:]
+
+        return text
+
+    def text_detected(text):
+        nonlocal prev_text, displayed_text, rich_text_stored
+
+        text = preprocess_text(text)
+
+        sentence_end_marks = ['.', '!', '?', '。']
+        midsentence_marks = ['…', '-', '(']
+        if text.endswith("...") or (text and text[-1] in midsentence_marks):
+            recorder.post_speech_silence_duration = mid_sentence_detection_pause
+        elif text and text[-1] in sentence_end_marks and prev_text and prev_text[-1] in sentence_end_marks:
+            recorder.post_speech_silence_duration = end_of_sentence_detection_pause
+        else:
+            recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+
+        prev_text = text
+
+        # Build Rich Text with alternating colors
+        rich_text = Text()
+        for i, sentence in enumerate(full_sentences):
+            if i % 2 == 0:
+                rich_text += Text(sentence, style="yellow") + Text(" ")
+            else:
+                rich_text += Text(sentence, style="cyan") + Text(" ")
+
+        # Show the current, not-yet-finalized text in real time
+        if text:
+            rich_text += Text(text, style="bold yellow")
+
+        new_displayed_text = rich_text.plain
+
+        if new_displayed_text != displayed_text:
+            displayed_text = new_displayed_text
+            panel = Panel(rich_text, title="[bold green]Live Transcription[/bold green]", border_style="bold green")
+            live.update(panel)
+            rich_text_stored = rich_text
+
+    def process_text(text):
+        nonlocal recorder, full_sentences, prev_text
+        recorder.post_speech_silence_duration = unknown_sentence_detection_pause
+        text = preprocess_text(text)
+        text = text.rstrip()
+        if text.endswith("..."):
+            text = text[:-2]  # Trim the trailing ellipsis down to a single period
+
+        full_sentences.append(text)
+        prev_text = ""
+        text_detected("")
+
+    # Recorder configuration
+    recorder_config = {
+        'spinner': False,
+        'use_microphone': False,
+        'model': 'large-v2',
+        'input_device_index': None,  # To be set after finding the device
+        'realtime_model_type': 'tiny.en',
+        'language': 'en',
+        'silero_sensitivity': 0.05,
+        'webrtc_sensitivity': 3,
+        'post_speech_silence_duration': unknown_sentence_detection_pause,
+        'min_length_of_recording': 2.0,        
+        'min_gap_between_recordings': 0,
+        'enable_realtime_transcription': True,
+        'realtime_processing_pause': 0.01,
+        'on_realtime_transcription_update': text_detected,
+        'silero_deactivity_detection': False,
+        'early_transcription_on_silence': 0,
+        'beam_size': 5,
+        'beam_size_realtime': 1,
+        'no_log_file': True,
+        'initial_prompt': "Use ellipses for incomplete sentences like: I went to the..."
+    }
+
+    if EXTENDED_LOGGING:
+        recorder_config['level'] = logging.DEBUG
+
+    # Initialize PyAudio
+    audio = pyaudio.PyAudio()
+
+    def find_stereo_mix_index():
+        nonlocal audio
+        devices_info = ""
+        for i in range(audio.get_device_count()):
+            dev = audio.get_device_info_by_index(i)
+            devices_info += f"{dev['index']}: {dev['name']} (hostApi: {dev['hostApi']})\n"
+
+            if (LOOPBACK_DEVICE_NAME.lower() in dev['name'].lower()
+                    and dev['hostApi'] == LOOPBACK_DEVICE_HOST_API):
+                return dev['index'], devices_info
+
+        return None, devices_info
+
+    device_index, devices_info = find_stereo_mix_index()
+    if device_index is None:
+        live.stop()
+        console.print("[bold red]Stereo Mix device not found. Available audio devices are:\n[/bold red]")
+        console.print(devices_info, style="red")
+        audio.terminate()
+        sys.exit(1)
+    else:
+        recorder_config['input_device_index'] = device_index
+        console.print(f"Using audio device index {device_index} for Stereo Mix.", style="green")
+
+    # Initialize the recorder
+    recorder = AudioToTextRecorder(**recorder_config)
+
+    # Initialize Live Display with waiting message
+    initial_text = Panel(Text("Say something...", style="cyan bold"), title="[bold yellow]Waiting for Input[/bold yellow]", border_style="bold yellow")
+    live.update(initial_text)
+
+    # Define the recording thread
+    def recording_thread():
+        nonlocal recorder
+        stream = audio.open(format=AUDIO_FORMAT,
+                            channels=CHANNELS,
+                            rate=RATE,
+                            input=True,
+                            frames_per_buffer=BUFFER_SIZE,
+                            input_device_index=recorder_config['input_device_index'])
+
+        try:
+            while not stop_event.is_set():
+                data = stream.read(BUFFER_SIZE, exception_on_overflow=False)
+                recorder.feed_audio(data)
+        except Exception as e:
+            console.print(f"[bold red]Error in recording thread: {e}[/bold red]")
+        finally:
+            console.print(f"[bold red]Stopping stream[/bold red]")
+            stream.stop_stream()
+            stream.close()
+
+    # Define the stop event
+    stop_event = threading.Event()
+
+    # Start the recording thread
+    thread = threading.Thread(target=recording_thread, daemon=True)
+    thread.start()
+
+    try:
+        while True:
+            recorder.text(process_text)
+    except KeyboardInterrupt:
+        console.print("[bold red]\nTranscription stopped by user. Exiting...[/bold red]")
+    finally:
+        print("live stop")
+        live.stop()
+
+        print("setting stop event")
+        stop_event.set()
+
+        print("thread join")
+        thread.join()
+
+        print("recorder stop")
+        recorder.stop()
+
+        print("audio terminate")
+        audio.terminate()
+
+        print("sys exit ")
+        sys.exit(0)
+
+if __name__ == '__main__':
+    main()
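
This test keeps the recorder's own microphone disabled (`use_microphone=False`) and pushes loopback audio into it with `feed_audio()`. Stripped of the Rich UI and the Stereo Mix device discovery, the same pattern looks roughly like this (device index, chunk size and model are placeholders):

```python
import threading
import pyaudio
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(use_microphone=False, model="tiny.en", language="en")

    def feed_audio_worker():
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                         input=True, frames_per_buffer=512,
                         input_device_index=None)  # None = default input; set your loopback index here
        while True:
            recorder.feed_audio(stream.read(512, exception_on_overflow=False))

    threading.Thread(target=feed_audio_worker, daemon=True).start()

    while True:
        print(recorder.text())
```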

+ 27 - 0
tests/recorder_client.py

@@ -0,0 +1,27 @@
+from RealtimeSTT import AudioToTextRecorderClient
+
+# ANSI escape codes for terminal control
+CLEAR_LINE = "\033[K"      # Clear from cursor to end of line
+RESET_CURSOR = "\r"        # Move cursor to the beginning of the line
+GREEN_TEXT = "\033[92m"    # Set text color to green
+RESET_COLOR = "\033[0m"    # Reset text color to default
+
+def print_realtime_text(text):
+    print(f"{RESET_CURSOR}{CLEAR_LINE}{GREEN_TEXT}👄 {text}{RESET_COLOR}", end="", flush=True)
+
+# Initialize the audio recorder with the real-time transcription callback
+recorder = AudioToTextRecorderClient(on_realtime_transcription_update=print_realtime_text)
+
+# Print the speaking prompt
+print("👄 ", end="", flush=True)
+
+try:
+    while True:
+        # Fetch finalized transcription text, if available
+        if text := recorder.text():
+            # Display the finalized transcription
+            print(f"{RESET_CURSOR}{CLEAR_LINE}✍️ {text}\n👄 ", end="", flush=True)
+except KeyboardInterrupt:
+    # Handle graceful shutdown on Ctrl+C
+    print(f"{RESET_CURSOR}{CLEAR_LINE}", end="", flush=True)
+    recorder.shutdown()

+ 21 - 3
tests/simple_test.py

@@ -1,6 +1,24 @@
-from RealtimeSTT import AudioToTextRecorder
 if __name__ == '__main__':
-    recorder = AudioToTextRecorder(spinner=False, model="tiny.en", language="en")
+
+    import os
+    import sys
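+    # Windows-only workaround: pre-register torchaudio's DLL search paths (private helper) before importing RealtimeSTT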
+    if os.name == "nt" and (3, 8) <= sys.version_info < (3, 99):
+        from torchaudio._extension.utils import _init_dll_path
+        _init_dll_path()
+
+    from RealtimeSTT import AudioToTextRecorder
+
+    recorder = AudioToTextRecorder(
+        spinner=False,
+        silero_sensitivity=0.01,
+        model="tiny.en",
+        language="en",
+        )
 
     print("Say something...")
-    while (True): print(recorder.text(), end=" ", flush=True)
+    
+    try:
+        while (True):
+            print("Detected text: " + recorder.text())
+    except KeyboardInterrupt:
+        print("Exiting application due to keyboard interrupt")

Not all files are shown because too many files changed in this diff