Receiving input audio as raw binary data #1268

fabiancuza · 2025-02-21T15:19:26Z

Hello! I am trying to integrate Pipecat into an existing application that uses plain WebSockets, which means I cannot use one of the preferred transports. Pipecat seems to connect fine to the WebSocket and receive the audio, but it cannot decode it: WebsocketClientTransport#0::WebsocketClientSession exception receiving data: DecodeError (Error parsing message with type 'pipecat.Frame').

Is there a specific serializer I should use for the WebsocketClientParams or something else?

Thanks a lot for the help!

The input data looks like this:

b'C\xc6\x81\x03\xfb\x80\xfb\x03?\x8bj \x8a\xef\x0c\xed\x12.\x86!\x82\xd9k\xe7\xd7\xc0\xb7C\xa9;{\xe2\xd23e\x1f1\xee\x934\x87\xc7\x1c\x9dVM\xa4.\x03\x8a\xb6\xb7\xe9$\xc3\xc8cd\x12\xa8k4\x8c\xa9\x03\xb0U|5\xe0\xbc\xbbT\xefD&\xa4\x83WE\x89\x96\xe5)\xcebnvI\xdf\x86\xb3qt\xd1)?8\r\x86;\x1c\x08\x12\x0f\r^A[\xf7\xd3\x15\xd5\xfd\x9d\xe7\xd2\xb8\xb9f`w\xec\xdaqcz:mHT\x7f\xbe\xf8\xbd7\xb1\xba\xa7\x97W6w\xeaH\x86)\xfa\xd9\xbc\x0b\xe2\x11\x9f\\|\x9dUD\x03\x82\xb5\xa8\x04\x0cD\x83EvX\xa63g\x10\x924\x89\xf5\x8f\xa9ay\xe25\x1e\x893^\xf7\x08^\xe9S\xd6w\x95h\x1e\xba\x92D*\xe3\x94V\x8a\x01+\xa6z/\x16&k3\xea\x03)\xca\xf99C\x08\xecV\x8d%\x88\x8d\xa5\xebF\xbb\x0c\xb6\xf6\x01<\x0f\x11\xee\x93\x8cRI\x82 \x93\xa5\x1aFz\x0b.\xfc\x98\x1c\xa61\xbb_\xd0\xeb\x90\xb2<Bs=\x98\x14\xf3\xd7w\xa2X\xe0B-\'Y\xb7\xca\xc0\x17\xa0\x9eJ\xb7/< \x91%\xd8\xe3\xb8\xb8\xf8>\x04\xcd\xdb\x18\xf4\x83 V\x95\xca\xc6\x06=\x98\xb1s\x83\x96D\xfd\xec)N\x7f\x9b\xc8\xe1\xb3mN\x00\x97\xe3\xd9\x0e\x03\x02\xba\x0b\xd1\xd8\x9f4\x8b@\x8a"Q\x9e\xe6\x83ix\xfb,"\xe11\x92\xe9\x84ZL\xf7q\x88\xd0{SB\xe2l\xb4\xceN\x05\xffP\x9d\x9a\xa4\x03K\xf0K\xac\xcd\x99\x7f\xf5\xbb\xd4Z\xa3\x96s\xb1\xc2\xe7K]zl\xed\xaa\xa5>\x86\x8ef1\xb6\xd4\xa3\xed\x8aC\xa0\xb573z\x9d\x83\xa5kK\xb3\xf6=)\xf8\xd3l\xaf"{\xcf\x10\xdf-\xac\x06`g\n\xb5\x80\xd9\xdc\x89?B\x15\xbb\x92\xb4\x89k\x7f0\xc3\x1c\x08\xf2\xf5\xce\xa1\xb6\xe2z\'\x89\xb9\xa3%NC\xef\xf3,\xf2\xb9\x19\xeb\x9f)ON\xeb\xc6\x98V\x8e\xe4\x19\xf6\xbf\x91\xc8~K\xc5\xae\x84hu\x1f6Uu\xee\xae\xe0_\xe8|Z>U\xc3\xc3\x07\x85\xbaj\xf0RyE\x90\x03~n2\xd7\xcdq\x0e\xeb\x8d\xaa\xdc\x1c\xc5\x04Kh\x8c5\x0b\xce\xf5I\x13\xb3q\xcfq\x0f\r\x9b\x15\x8a\xc9P\x82\xdc2w\x05\xe7\xa6\xcc\n,)A\xaeX!~\xa0\xf7\xeb\xa7\x16\x002\xe0\xffw\x00\t\xf6\xb8\xbc %7\x9d\xe8=a.\x8c\xa6\x9c\x17\x0bY\xd8\x1b\x8b\xed\xa02\xce\xe9\x9b\xc9gt\xca-\xedT\x08\xb7\xaf#\xaf\xc6\xd7\xb6cd\x91\x04\xd1\xb5!=\xa6\x04\rO\xba@\x11\xde\x1c\xca\xf9\xd7\xb6\x16\xb4 \x84\xb9ut^\xb1ce\x15\x8d\x10 
\xc3\xcfGO:sz=\xe1\x8b\xf2\xe2\x17\x1c76v8z\x82\xb2M2e\xa5\rE\xe2\x1c\xea\x14\xb6\x10H\x80\x81qfv\x02&SSV#?\np\xb8\xf2h\\\xc60g\xcaO\x07\xc2\x0c\xda{\xb6O\xa74\xbf\xb17\xb1Ga0\x1fP\xce5\x0f\xe0\xaf\xd7\xd8Ey\x06\xfe\x8d?\x8e\xc669+3\xe9\x03m\xc3\xe0\xc6\xbb\n:\n\xd2\x01\xfe$\xbf\x80i\x81\xec\xb4\x04?\xf5\xf6;\xa5\xedt`Gh>\xab\x18,F\x8c\xd6\x1c\xe6K?\xe9z\xd46T\xde"\xd6\xcb\xf1\x89\xd7r\x05\x08\xb61\xf1\xf0\x97|\xb9\xd8\xb8\xb2mx\xeb,\xc1b\x95\x9f\x8d\xba\x161\xbdm_\xb0\x92?:\xa57D\xe9~\x97i\x13\x84\xf1\x1a\xdf\t\xbf\xb2q\xf9\xde2\xe6\x066`|V\xc8\xa0\xa1=E\xcf\xb6O\x11\xf7\x11Gce\xb8z\x88~b4a\x98\xb6}.\xca\x81\x8d\x91\xee\x0bS\x08\xa0\xb0\x1a\xbfi\xf6,\x137\xcb\xc7g?\xd6\xa5\xff\xa6\xf2\xf9\xd7\xf8.\xedH3ZV\x05H\xc0K\xe5\x0f\xac@3\xf4/}~\x0b\x02\xff\x99\x16\xc1\xd1w\x9cM\xa1S\xb6m&\xee\x8a\tC'
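The error mentions `pipecat.Frame`, which is a protobuf message type, so the transport was evidently decoding each WebSocket message as a protobuf-serialized frame. A protobuf message begins with a varint tag equal to `(field_number << 3) | wire_type`. The stdlib-only sketch below (hypothetical helper names, not part of Pipecat) decodes the first tag of the payload above: the leading byte `C` (0x43) reads as field 8 with wire type 3, the deprecated "start group" marker. Arbitrary audio bytes interpreted this way are very unlikely to form a valid message, which is consistent with the `DecodeError`.

```python
def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a protobuf base-128 varint starting at `pos`.

    Returns (value, next_position).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        # Low 7 bits carry data; the high bit means "more bytes follow".
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def first_tag(data: bytes) -> tuple[int, int]:
    """Return (field_number, wire_type) of the first protobuf tag in `data`."""
    tag, _ = decode_varint(data)
    return tag >> 3, tag & 0x07


# Wire types defined by the protobuf encoding.
WIRE_TYPES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group (deprecated)",
    4: "end group (deprecated)",
    5: "32-bit",
}

# First few bytes of the payload posted above.
payload = b"C\xc6\x81\x03\xfb\x80"
field, wire_type = first_tag(payload)
print(field, WIRE_TYPES.get(wire_type, "invalid"))  # 8 start group (deprecated)
```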


aconchillo · 2025-02-21T17:28:29Z

Is this raw audio or something else?
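One quick way to check from the server side is to look at the first bytes of the first message. The helper below is a hypothetical, stdlib-only heuristic (not a Pipecat API) that compares the start of a chunk against a few well-known container signatures: the EBML header used by WebM, `OggS` for Ogg, and `RIFF`/`WAVE` for WAV. Raw PCM has no header at all, and a container stream that is split into chunks normally carries its header only in the first chunk, so a negative result on a later chunk is inconclusive.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container of an audio chunk from its leading magic bytes.

    Heuristic only: raw PCM has no header, so anything unrecognized is
    reported as unknown rather than positively identified as PCM.
    """
    if data.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm/matroska"  # EBML header
    if data.startswith(b"OggS"):
        return "ogg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown (raw PCM or a mid-stream chunk)"


# Example: the chunk posted above starts with b"C\xc6\x81\x03".
print(sniff_audio_format(b"C\xc6\x81\x03"))
```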

aconchillo · 2025-02-21T18:22:14Z

If it is raw audio (plain PCM samples), you should be able to use a serializer that does something like this:

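from pipecat.frames.frames import AudioRawFrame, Frame, InputAudioRawFrame
from pipecat.serializers.base_serializer import FrameSerializer, FrameSerializerType


class SimpleRawFrameSerializer(FrameSerializer):
    """Passes raw PCM bytes through without any framing or encoding."""

    @property
    def type(self) -> FrameSerializerType:
        # Messages are sent and received as binary WebSocket frames.
        return FrameSerializerType.BINARY

    async def serialize(self, frame: Frame) -> str | bytes | None:
        # Only audio is sent back to the client; other frames are dropped.
        if isinstance(frame, AudioRawFrame):
            return frame.audio
        return None

    async def deserialize(self, data: str | bytes) -> Frame | None:
        if not isinstance(data, bytes):
            return None  # ignore text messages
        # Assumes 16-bit mono PCM at 16 kHz; see the note below.
        return InputAudioRawFrame(audio=data, num_channels=1, sample_rate=16000)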

NOTE: the sample rate should be passed in at run time rather than hard-coded; look at the other serializers for how they do it.
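Because `deserialize` treats every message as 16-bit mono PCM, the size of each message is predictable once the sample rate is known, which makes a format or sample-rate mismatch easy to spot. The hypothetical helpers below (stdlib only, assuming 16-bit samples) compute the expected chunk size and check that a payload contains whole sample frames.

```python
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit PCM (assumption)


def expected_chunk_bytes(sample_rate: int, num_channels: int, duration_ms: int) -> int:
    """Bytes of 16-bit PCM needed for `duration_ms` of audio."""
    frames = sample_rate * duration_ms // 1000
    return frames * num_channels * SAMPLE_WIDTH


def pcm_duration_ms(data: bytes, sample_rate: int, num_channels: int) -> float:
    """Duration of a 16-bit PCM payload; raises if it is not whole frames."""
    frame_size = num_channels * SAMPLE_WIDTH
    if len(data) % frame_size:
        raise ValueError(
            f"{len(data)} bytes is not a whole number of {frame_size}-byte frames"
        )
    return len(data) * 1000 / (frame_size * sample_rate)


print(expected_chunk_bytes(16000, 1, 100))  # 3200
print(expected_chunk_bytes(48000, 1, 100))  # 9600
```

Browsers usually run their `AudioContext` at the hardware rate (commonly 44100 or 48000 Hz), so audio captured in the browser will generally need resampling before it matches a 16 kHz setting.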

fabiancuza · 2025-02-21T21:23:47Z

@aconchillo I'm not entirely sure what format it is. I'm using a simple React component on the client side:

import React, { useState, useRef, useEffect } from 'react';
import { Mic, Square, Play, Pause } from 'lucide-react';

const WebSocketAudioTester = () => {
  const [isConnected, setIsConnected] = useState(false);
  const [isRecording, setIsRecording] = useState(false);
  const [isPlaying, setIsPlaying] = useState(false);
  const [status, setStatus] = useState('Disconnected');
  const [audioData, setAudioData] = useState([]);

  const wsRef = useRef(null);
  const mediaRecorderRef = useRef(null);
  const audioContextRef = useRef(null);

  // Connect to WebSocket server
  const connect = () => {
    try {
      // Replace with your WebSocket server URL
      wsRef.current = new WebSocket('ws://localhost:8000');

      wsRef.current.onopen = () => {
        setIsConnected(true);
        setStatus('Connected');
      };

      wsRef.current.onclose = () => {
        setIsConnected(false);
        setStatus('Disconnected');
      };

      wsRef.current.onmessage = async (event) => {
        // Handle incoming audio data
        if (event.data instanceof Blob) {
          const audioBlob = event.data;
          setAudioData(prev => [...prev, audioBlob]);
        }
      };

      wsRef.current.onerror = () => {
        // WebSocket error events carry no message; report a generic error.
        setStatus('WebSocket error');
      };
    } catch (error) {
      setStatus(`Connection error: ${error.message}`);
    }
  };

  // Start recording audio
  const startRecording = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      // Temporary context, only used to log the device sample rate.
      const audioContext = new AudioContext();
      console.log(`Sample rate: ${audioContext.sampleRate}`);
      audioContext.close();
      mediaRecorderRef.current = new MediaRecorder(stream);

      mediaRecorderRef.current.ondataavailable = (event) => {
        if (event.data.size > 0 && wsRef.current?.readyState === WebSocket.OPEN) {
          wsRef.current.send(event.data);
        }
      };

      console.log(`Recording format: ${mediaRecorderRef.current.mimeType}`);

      mediaRecorderRef.current.start(100); // Collect data every 100ms
      setIsRecording(true);
      setStatus('Recording...');
    } catch (error) {
      setStatus(`Recording error: ${error.message}`);
    }
  };

  // Stop recording
  const stopRecording = () => {
    if (mediaRecorderRef.current && mediaRecorderRef.current.state !== 'inactive') {
      mediaRecorderRef.current.stop();
      mediaRecorderRef.current.stream.getTracks().forEach(track => track.stop());
      setIsRecording(false);
      setStatus('Recording stopped');
    }
  };

  // Play received audio
  const playAudio = async () => {
    if (!audioData.length) return;

    try {
      if (!audioContextRef.current) {
        audioContextRef.current = new (window.AudioContext || window.webkitAudioContext)();
      }

      setIsPlaying(true);
      setStatus('Playing received audio...');

      for (const blob of audioData) {
        const arrayBuffer = await blob.arrayBuffer();
        const audioBuffer = await audioContextRef.current.decodeAudioData(arrayBuffer);
        const source = audioContextRef.current.createBufferSource();
        source.buffer = audioBuffer;
        source.connect(audioContextRef.current.destination);
        source.start();

        // Wait for the audio to finish playing
        await new Promise(resolve => {
          source.onended = resolve;
        });
      }

      setIsPlaying(false);
      setStatus('Playback complete');
    } catch (error) {
      setIsPlaying(false);
      setStatus(`Playback error: ${error.message}`);
    }
  };

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      if (wsRef.current) {
        wsRef.current.close();
      }
      if (mediaRecorderRef.current) {
        stopRecording();
      }
    };
  }, []);

  return (
    <div className="w-full max-w-md mx-auto">
      <div>
        <div>WebSocket Audio Tester</div>
      </div>
      <div className="space-y-4">
        <div className="flex justify-between items-center">
          <button
            onClick={isConnected ? () => wsRef.current?.close() : connect}
            variant={isConnected ? "destructive" : "default"}
          >
            {isConnected ? 'Disconnect' : 'Connect'}
          </button>

          <button
            onClick={isRecording ? stopRecording : startRecording}
            disabled={!isConnected}
            variant={isRecording ? "destructive" : "default"}
          >
            {isRecording ? (
              <Square className="w-4 h-4 mr-2" />
            ) : (
              <Mic className="w-4 h-4 mr-2" />
            )}
            {isRecording ? 'Stop Recording' : 'Start Recording'}
          </button>

          <button
            onClick={playAudio}
            disabled={!audioData.length || isPlaying}
            variant="outline"
          >
            {isPlaying ? (
              <Pause className="w-4 h-4 mr-2" />
            ) : (
              <Play className="w-4 h-4 mr-2" />
            )}
            Play Received
          </button>
        </div>

        <div className="text-sm text-center p-2 bg-slate-100 rounded">
          Status: {status}
        </div>

        <div className="text-sm text-center">
          Received audio chunks: {audioData.length}
        </div>
      </div>
    </div>
  );
};

export default WebSocketAudioTester;
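Note that `MediaRecorder` does not produce raw PCM: it emits chunks of a compressed container (the logged `mimeType` is typically something like `audio/webm;codecs=opus` in Chromium-based browsers), which a raw PCM serializer cannot interpret. One common alternative, sketched here under assumptions, is to capture samples with the Web Audio API on the client (for example from an `AudioWorklet`) and send the `Float32Array` buffers as binary messages; the server then converts them to 16-bit PCM. The helper below is hypothetical: it assumes mono, little-endian float32 input at the browser's `AudioContext` sample rate, and uses naive linear-interpolation resampling with no anti-aliasing filter, which is fine for a quick test but not for production audio quality.

```python
import array
import sys


def float32_to_pcm16(data: bytes, in_rate: int, out_rate: int = 16000) -> bytes:
    """Convert mono little-endian float32 samples to 16-bit PCM at `out_rate`.

    Resampling is naive linear interpolation with no low-pass filter.
    """
    samples = array.array("f")
    samples.frombytes(data)  # raises ValueError if not a multiple of 4 bytes
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed to be little-endian

    n_in = len(samples)
    n_out = n_in * out_rate // in_rate
    step = in_rate / out_rate

    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        # Interpolate between the two nearest input samples.
        a = samples[min(j, n_in - 1)]
        b = samples[min(j + 1, n_in - 1)]
        value = a + (b - a) * frac
        # Clamp to [-1, 1] and scale to the int16 range.
        value = max(-1.0, min(1.0, value))
        out.append(int(round(value * 32767)))

    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian PCM
    return out.tobytes()
```

The result could then be wrapped in an `InputAudioRawFrame` with `sample_rate=16000` and `num_channels=1`, matching the serializer suggested earlier in the thread.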

Is there a simpler, more recommended way?
