Receiving input audio as raw binary data #1268

Open
fabiancuza opened this issue Feb 21, 2025 · 3 comments

@fabiancuza

Hello! I am trying to integrate pipecat into an existing application that uses plain websockets, which means I cannot use one of the preferred transports. Pipecat seems to connect fine to the websocket and receive the audio, but it cannot decode it: WebsocketClientTransport#0::WebsocketClientSession exception receiving data: DecodeError (Error parsing message with type 'pipecat.Frame').

Is there a specific serializer I should use for the WebsocketClientParams or something else?

Thanks a lot for the help!

The input data looks like this:

b'C\xc6\x81\x03\xfb\x80\xfb\x03?\x8bj \x8a\xef\x0c\xed\x12.\x86!\x82\xd9k\xe7\xd7\xc0\xb7C\xa9;{\xe2\xd23e\x1f1\xee\x934\x87\xc7\x1c\x9dVM\xa4.\x03\x8a\xb6\xb7\xe9$\xc3\xc8cd\x12\xa8k4\x8c\xa9\x03\xb0U|5\xe0\xbc\xbbT\xefD&\xa4\x83WE\x89\x96\xe5)\xcebnvI\xdf\x86\xb3qt\xd1)?8\r\x86;\x1c\x08\x12\x0f\r^A[\xf7\xd3\x15\xd5\xfd\x9d\xe7\xd2\xb8\xb9f`w\xec\xdaqcz:mHT\x7f\xbe\xf8\xbd7\xb1\xba\xa7\x97W6w\xeaH\x86)\xfa\xd9\xbc\x0b\xe2\x11\x9f\\|\x9dUD\x03\x82\xb5\xa8\x04\x0cD\x83EvX\xa63g\x10\x924\x89\xf5\x8f\xa9ay\xe25\x1e\x893^\xf7\x08^\xe9S\xd6w\x95h\x1e\xba\x92D*\xe3\x94V\x8a\x01+\xa6z/\x16&k3\xea\x03)\xca\xf99C\x08\xecV\x8d%\x88\x8d\xa5\xebF\xbb\x0c\xb6\xf6\x01<\x0f\x11\xee\x93\x8cRI\x82 \x93\xa5\x1aFz\x0b.\xfc\x98\x1c\xa61\xbb_\xd0\xeb\x90\xb2<Bs=\x98\x14\xf3\xd7w\xa2X\xe0B-\'Y\xb7\xca\xc0\x17\xa0\x9eJ\xb7/< \x91%\xd8\xe3\xb8\xb8\xf8>\x04\xcd\xdb\x18\xf4\x83 V\x95\xca\xc6\x06=\x98\xb1s\x83\x96D\xfd\xec)N\x7f\x9b\xc8\xe1\xb3mN\x00\x97\xe3\xd9\x0e\x03\x02\xba\x0b\xd1\xd8\x9f4\x8b@\x8a"Q\x9e\xe6\x83ix\xfb,"\xe11\x92\xe9\x84ZL\xf7q\x88\xd0{SB\xe2l\xb4\xceN\x05\xffP\x9d\x9a\xa4\x03K\xf0K\xac\xcd\x99\x7f\xf5\xbb\xd4Z\xa3\x96s\xb1\xc2\xe7K]zl\xed\xaa\xa5>\x86\x8ef1\xb6\xd4\xa3\xed\x8aC\xa0\xb573z\x9d\x83\xa5kK\xb3\xf6=)\xf8\xd3l\xaf"{\xcf\x10\xdf-\xac\x06`g\n\xb5\x80\xd9\xdc\x89?B\x15\xbb\x92\xb4\x89k\x7f0\xc3\x1c\x08\xf2\xf5\xce\xa1\xb6\xe2z\'\x89\xb9\xa3%NC\xef\xf3,\xf2\xb9\x19\xeb\x9f)ON\xeb\xc6\x98V\x8e\xe4\x19\xf6\xbf\x91\xc8~K\xc5\xae\x84hu\x1f6Uu\xee\xae\xe0_\xe8|Z>U\xc3\xc3\x07\x85\xbaj\xf0RyE\x90\x03~n2\xd7\xcdq\x0e\xeb\x8d\xaa\xdc\x1c\xc5\x04Kh\x8c5\x0b\xce\xf5I\x13\xb3q\xcfq\x0f\r\x9b\x15\x8a\xc9P\x82\xdc2w\x05\xe7\xa6\xcc\n,)A\xaeX!~\xa0\xf7\xeb\xa7\x16\x002\xe0\xffw\x00\t\xf6\xb8\xbc %7\x9d\xe8=a.\x8c\xa6\x9c\x17\x0bY\xd8\x1b\x8b\xed\xa02\xce\xe9\x9b\xc9gt\xca-\xedT\x08\xb7\xaf#\xaf\xc6\xd7\xb6cd\x91\x04\xd1\xb5!=\xa6\x04\rO\xba@\x11\xde\x1c\xca\xf9\xd7\xb6\x16\xb4 \x84\xb9ut^\xb1ce\x15\x8d\x10 
\xc3\xcfGO:sz=\xe1\x8b\xf2\xe2\x17\x1c76v8z\x82\xb2M2e\xa5\rE\xe2\x1c\xea\x14\xb6\x10H\x80\x81qfv\x02&SSV#?\np\xb8\xf2h\\\xc60g\xcaO\x07\xc2\x0c\xda{\xb6O\xa74\xbf\xb17\xb1Ga0\x1fP\xce5\x0f\xe0\xaf\xd7\xd8Ey\x06\xfe\x8d?\x8e\xc669+3\xe9\x03m\xc3\xe0\xc6\xbb\n:\n\xd2\x01\xfe$\xbf\x80i\x81\xec\xb4\x04?\xf5\xf6;\xa5\xedt`Gh>\xab\x18,F\x8c\xd6\x1c\xe6K?\xe9z\xd46T\xde"\xd6\xcb\xf1\x89\xd7r\x05\x08\xb61\xf1\xf0\x97|\xb9\xd8\xb8\xb2mx\xeb,\xc1b\x95\x9f\x8d\xba\x161\xbdm_\xb0\x92?:\xa57D\xe9~\x97i\x13\x84\xf1\x1a\xdf\t\xbf\xb2q\xf9\xde2\xe6\x066`|V\xc8\xa0\xa1=E\xcf\xb6O\x11\xf7\x11Gce\xb8z\x88~b4a\x98\xb6}.\xca\x81\x8d\x91\xee\x0bS\x08\xa0\xb0\x1a\xbfi\xf6,\x137\xcb\xc7g?\xd6\xa5\xff\xa6\xf2\xf9\xd7\xf8.\xedH3ZV\x05H\xc0K\xe5\x0f\xac@3\xf4/}~\x0b\x02\xff\x99\x16\xc1\xd1w\x9cM\xa1S\xb6m&\xee\x8a\tC'
@aconchillo
Contributor

Is this raw audio or something else?
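
A quick way to probe that question is to compare the first bytes of a message against well-known container signatures. The sketch below is only a heuristic, and `sniff_container` is a hypothetical helper name, not part of pipecat. A result of `None` does not prove the data is raw PCM: when a recorder streams in time slices, usually only the first chunk carries the container header.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") of some common audio containers.
_SIGNATURES = {
    b"\x1a\x45\xdf\xa3": "webm/matroska",  # EBML header
    b"OggS": "ogg",
    b"fLaC": "flac",
    b"ID3": "mp3",  # MP3 file starting with an ID3v2 tag
}


def sniff_container(data: bytes) -> Optional[str]:
    """Return a container name if `data` starts with a known signature."""
    # WAV is a RIFF file whose form type (bytes 8..11) is "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    for magic, name in _SIGNATURES.items():
        if data.startswith(magic):
            return name
    # Unknown: could be raw PCM, or a later chunk of an encoded stream.
    return None
```

Running this on the very first message received after recording starts is the most informative, since that is where a header would appear if there is one.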

@aconchillo
Contributor

If that's the case, then you should be able to use a serializer that does something like this:

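# Import paths are assumed from pipecat's layout; adjust them for your version.
from pipecat.frames.frames import AudioRawFrame, Frame, InputAudioRawFrame
from pipecat.serializers.base_serializer import FrameSerializer, FrameSerializerType


class SimpleRawFrameSerializer(FrameSerializer):
    @property
    def type(self) -> FrameSerializerType:
        # Messages on the wire are raw bytes, not text.
        return FrameSerializerType.BINARY

    async def serialize(self, frame: Frame) -> str | bytes | None:
        # Only audio frames are sent; every other frame type is dropped.
        if isinstance(frame, AudioRawFrame):
            return frame.audio
        return None

    async def deserialize(self, data: str | bytes) -> Frame | None:
        # Treat each incoming message as raw 16 kHz mono audio.
        audio_frame = InputAudioRawFrame(audio=data, num_channels=1, sample_rate=16000)
        return audio_frame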

NOTE: you can pass the sample rate at run time instead of hard-coding it. Look at the other serializers for examples.
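
One way to read that note is to take the sample rate as a constructor argument instead of fixing it at 16000. The sketch below is deliberately standalone so it runs without pipecat installed: `RawAudioChunk` and `RawAudioDeserializer` are hypothetical stand-ins, not pipecat classes, and 16-bit PCM is an assumption. In a real serializer the same idea would sit inside the `deserialize` method shown above.

```python
from dataclasses import dataclass
from typing import Optional, Union

BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM


@dataclass
class RawAudioChunk:
    """Hypothetical stand-in for an input audio frame (not a pipecat class)."""
    audio: bytes
    sample_rate: int
    num_channels: int

    @property
    def duration_ms(self) -> float:
        # Samples per channel, converted to milliseconds.
        samples = len(self.audio) // (BYTES_PER_SAMPLE * self.num_channels)
        return 1000.0 * samples / self.sample_rate


class RawAudioDeserializer:
    def __init__(self, sample_rate: int = 16000, num_channels: int = 1):
        # Supplied by the caller at run time rather than hard-coded.
        self.sample_rate = sample_rate
        self.num_channels = num_channels

    def deserialize(self, data: Union[str, bytes]) -> Optional[RawAudioChunk]:
        # Text messages and empty payloads carry no audio.
        if isinstance(data, str) or len(data) == 0:
            return None
        # Raw PCM must contain a whole number of sample frames.
        if len(data) % (BYTES_PER_SAMPLE * self.num_channels) != 0:
            return None
        return RawAudioChunk(bytes(data), self.sample_rate, self.num_channels)
```

The length check is a cheap sanity test: payloads that are not a whole number of sample frames are rejected early, which is one hint that the client may be sending something other than raw PCM.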

@fabiancuza
Author

@aconchillo I'm not entirely sure what type it is. I'm using a simple react component for the client side.

import React, { useState, useRef, useEffect } from 'react';
import { Mic, Square, Play, Pause } from 'lucide-react';

const WebSocketAudioTester = () => {
  const [isConnected, setIsConnected] = useState(false);
  const [isRecording, setIsRecording] = useState(false);
  const [isPlaying, setIsPlaying] = useState(false);
  const [status, setStatus] = useState('Disconnected');
  const [audioData, setAudioData] = useState([]);

  const wsRef = useRef(null);
  const mediaRecorderRef = useRef(null);
  const audioContextRef = useRef(null);

  // Connect to WebSocket server
  const connect = () => {
    try {
      // Replace with your WebSocket server URL
      wsRef.current = new WebSocket('ws://localhost:8000');

      wsRef.current.onopen = () => {
        setIsConnected(true);
        setStatus('Connected');
      };

      wsRef.current.onclose = () => {
        setIsConnected(false);
        setStatus('Disconnected');
      };

      wsRef.current.onmessage = async (event) => {
        // Handle incoming audio data
        if (event.data instanceof Blob) {
          const audioBlob = event.data;
          setAudioData(prev => [...prev, audioBlob]);
        }
      };

      wsRef.current.onerror = () => {
        // WebSocket error events carry no message, so report a generic status.
        setStatus('WebSocket error');
      };
    } catch (error) {
      setStatus(`Connection error: ${error.message}`);
    }
  };

  // Start recording audio
  const startRecording = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
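      // Temporary AudioContext, used only to log the device sample rate.
      // MediaRecorder output is encoded in the container/codec reported by
      // mimeType (logged below), so it is typically not raw PCM at this rate.
      const audioContext = new AudioContext();
      console.log(`Sample rate: ${audioContext.sampleRate}`);
      await audioContext.close();
      mediaRecorderRef.current = new MediaRecorder(stream);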

      mediaRecorderRef.current.ondataavailable = (event) => {
        if (event.data.size > 0 && wsRef.current?.readyState === WebSocket.OPEN) {
          wsRef.current.send(event.data);
        }
      };

      console.log(`Recording format: ${mediaRecorderRef.current.mimeType}`);

      mediaRecorderRef.current.start(100); // Collect data every 100ms
      setIsRecording(true);
      setStatus('Recording...');
    } catch (error) {
      setStatus(`Recording error: ${error.message}`);
    }
  };

  // Stop recording
  const stopRecording = () => {
    if (mediaRecorderRef.current && mediaRecorderRef.current.state !== 'inactive') {
      mediaRecorderRef.current.stop();
      mediaRecorderRef.current.stream.getTracks().forEach(track => track.stop());
      setIsRecording(false);
      setStatus('Recording stopped');
    }
  };

  // Play received audio
  const playAudio = async () => {
    if (!audioData.length) return;

    try {
      if (!audioContextRef.current) {
        audioContextRef.current = new (window.AudioContext || window.webkitAudioContext)();
      }

      setIsPlaying(true);
      setStatus('Playing received audio...');

      for (const blob of audioData) {
        const arrayBuffer = await blob.arrayBuffer();
        const audioBuffer = await audioContextRef.current.decodeAudioData(arrayBuffer);
        const source = audioContextRef.current.createBufferSource();
        source.buffer = audioBuffer;
        source.connect(audioContextRef.current.destination);
        source.start();

        // Wait for the audio to finish playing
        await new Promise(resolve => {
          source.onended = resolve;
        });
      }

      setIsPlaying(false);
      setStatus('Playback complete');
    } catch (error) {
      setIsPlaying(false);
      setStatus(`Playback error: ${error.message}`);
    }
  };

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      if (wsRef.current) {
        wsRef.current.close();
      }
      if (mediaRecorderRef.current) {
        stopRecording();
      }
    };
  }, []);

  return (
    <div className="w-full max-w-md mx-auto">
      <div>
        <div>WebSocket Audio Tester</div>
      </div>
      <div className="space-y-4">
        <div className="flex justify-between items-center">
          <button
            onClick={isConnected ? () => wsRef.current?.close() : connect}
            variant={isConnected ? "destructive" : "default"}
          >
            {isConnected ? 'Disconnect' : 'Connect'}
          </button>

          <button
            onClick={isRecording ? stopRecording : startRecording}
            disabled={!isConnected}
            variant={isRecording ? "destructive" : "default"}
          >
            {isRecording ? (
              <Square className="w-4 h-4 mr-2" />
            ) : (
              <Mic className="w-4 h-4 mr-2" />
            )}
            {isRecording ? 'Stop Recording' : 'Start Recording'}
          </button>

          <button
            onClick={playAudio}
            disabled={!audioData.length || isPlaying}
            variant="outline"
          >
            {isPlaying ? (
              <Pause className="w-4 h-4 mr-2" />
            ) : (
              <Play className="w-4 h-4 mr-2" />
            )}
            Play Received
          </button>
        </div>

        <div className="text-sm text-center p-2 bg-slate-100 rounded">
          Status: {status}
        </div>

        <div className="text-sm text-center">
          Received audio chunks: {audioData.length}
        </div>
      </div>
    </div>
  );
};

export default WebSocketAudioTester;

Is there a simpler, more recommended way?
