Google Speech-to-Text API returns "RecognitionAudio not set" (Error Code 400)

jdavis · October 1, 2024, 3:27pm

I’m working on an ESP32 project that uses the Google Cloud Speech-to-Text API. The connection to the server works, but the API returns an error when I send audio data for transcription.

Serial Monitor Output:

---- Opened the serial port /dev/tty.usbserial-0001 ----

Initializing…
Connecting to WiFi…
WiFi Status: 3
Connected to WiFi
Attempting to connect to Google API server…
Connected to server successfully
Sent request header. Waiting for response…
…
Server response:
HTTP/1.1 400 Bad Request
Vary: X-Origin
Vary: Referer
Content-Type: application/json; charset=UTF-8
Date: Sun, 29 Sep 2024 23:46:32 GMT
Server: ESF
Cache-Control: private
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Accept-Ranges: none
Vary: Origin,Accept-Encoding
Transfer-Encoding: chunked

73
{
  "error": {
    "code": 400,
    "message": "RecognitionAudio not set.",
    "status": "INVALID_ARGUMENT"
  }
}

0

Closed connection. Full Encoded Audio Data: UklGRrRfAQBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YZBfAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA …

Transcribe Function:

    void CloudSpeechClient::Transcribe(customAudio* audio) {
      audio->EncodeToBase64();

      String HttpBody1 = "{\"config\":{\"encoding\":\"LINEAR16\",\"sampleRateHertz\":16000,\"languageCode\":\"en-US\"},\"audio\":{\"content\":\"";
      String HttpBody3 = "\"}}\r\n\r\n";

      int httpBody2Length = (audio->wavDataSize + sizeof(audio->paddedHeader)) * 4 / 3;
      String ContentLength = String(HttpBody1.length() + httpBody2Length + HttpBody3.length());

      String fullHttpHeader = String("POST /v1/speech:recognize?key=") + apiKey +
        " HTTP/1.1\r\nHost: speech.googleapis.com\r\nContent-Type: application/json\r\nContent-Length: " + ContentLength + "\r\n\r\n";

      client.print(fullHttpHeader);
      client.print(HttpBody1);
      PrintHttpBody2(audio); // This sends encoded audio data
      client.print(HttpBody3);

      String response = "";
      while (client.available()) {
        response += client.readString();
      }

      int position = response.indexOf('{');
      if (position >= 0) {
        ans = response.substring(position);
        Serial.print("JSON Data: ");
        Serial.println(ans);
      }
    }
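
One detail in the function above that may be worth checking is the `httpBody2Length` calculation. Base64 emits 4 characters for every started group of 3 input bytes (with `=` padding), so `bytes * 4 / 3` only matches the encoded length when the byte count is a multiple of 3. The following is a minimal sketch in plain C++ for comparing the two formulas. The helper names `base64EncodedLength` and `truncatedEstimate` are hypothetical and not part of the project code, and it assumes `PrintHttpBody2` sends standard padded Base64.

```cpp
#include <cstddef>

// Hypothetical helper: number of Base64 characters, including '=' padding,
// produced when encoding rawBytes input bytes.
constexpr std::size_t base64EncodedLength(std::size_t rawBytes) {
    return ((rawBytes + 2) / 3) * 4;  // 4 chars per started 3-byte group
}

// The formula used for httpBody2Length in Transcribe, kept for comparison.
constexpr std::size_t truncatedEstimate(std::size_t rawBytes) {
    return rawBytes * 4 / 3;
}

// Byte counts divisible by 3 agree; other counts come out short.
static_assert(base64EncodedLength(48) == truncatedEstimate(48),
              "multiples of 3 give the same result");
static_assert(base64EncodedLength(44) > truncatedEstimate(44),
              "a 44-byte input is underestimated by the shorter formula");
```

If the total of `wavDataSize` and the header size is not a multiple of 3, the declared Content-Length would be smaller than the number of characters actually sent. Whether that is the cause of the 400 response here is not confirmed.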

What I’ve tried:

Verified the API key and the server connection.

Checked that the audio data is base64 encoded and included in the request.

Double-checked the JSON structure used to send the audio content.
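
One way to sanity-check the encoding and JSON structure off-device is to build the whole request body in memory on a desktop machine and derive Content-Length from its real size. The sketch below is a minimal host-side example in standard C++, not the ESP32 code. The function names `base64Encode`, `buildRecognizeBody` and `contentLengthHeader` are made up for illustration, and the JSON fields are copied from the request shown earlier. It leaves out the trailing `\r\n\r\n` that the original body appends.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 encoder with '=' padding (RFC 4648 alphabet).
std::string base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Each full 3-byte group becomes 4 characters.
    for (; i + 2 < data.size(); i += 3) {
        std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                          (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                          static_cast<std::uint32_t>(data[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // A trailing 1- or 2-byte group is padded with '='.
    if (i < data.size()) {
        const bool hasSecond = (i + 1 < data.size());
        std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        if (hasSecond) n |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += hasSecond ? kAlphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Builds the JSON body with the same fields as the request in this post.
std::string buildRecognizeBody(const std::vector<std::uint8_t>& wavBytes) {
    std::string body =
        "{\"config\":{\"encoding\":\"LINEAR16\",\"sampleRateHertz\":16000,"
        "\"languageCode\":\"en-US\"},\"audio\":{\"content\":\"";
    body += base64Encode(wavBytes);
    body += "\"}}";
    return body;
}

// Content-Length taken from the bytes that will actually be sent.
std::string contentLengthHeader(const std::string& body) {
    return "Content-Length: " + std::to_string(body.size()) + "\r\n";
}
```

Comparing the output of `buildRecognizeBody` for a short test buffer with what the device prints makes it easier to spot a missing `content` value or a length mismatch. On the ESP32 itself, holding the whole body in one string may not fit in memory, which is presumably why the original code streams it in pieces.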

Is there something wrong with how I’m handling the audio encoding or building the HTTP request? Any guidance would be appreciated!

GitHub: