Connection to MQTT broker gets lost every 200 minutes

Setup:
MQTT broker: Mosquitto 2.0.4 on Raspberry Pi 4 (Raspberry Pi OS Lite)
platform: espressif32@3.0.0
board: esp32doit-devkit-v1
MQTT client: knolleary/PubSubClient@^2.8

Hi Community,

With this setup, the MQTT client reconnects at regular intervals of exactly 200 minutes. I noticed that several other users have reported the same behavior.

Does anyone have a tip on how to fix this bug?

Can you link to those statements?

I don’t know much about MQTT specifically, but that sounds like either

  • the MQTT broker Mosquitto is disconnecting your client, which would make sense because there’s a “Keep Alive” time setting (see here regarding max_keepalive), or
  • the client itself disconnects and reconnects for some reason

Did you call into the library’s

function?

To get a better answer, though, I think you should ask both on the library’s page (with the exact code that reconnects every 200 minutes) and on the Mosquitto side (maybe they have a community forum).

For example: https://www.arduinoforum.de/arduino-Thread-ESP32-MQTT-Ausfall-nach-201-Minuten (thread title: “ESP32 MQTT failure after 201 minutes”)

(sorry, it’s in German, but maxgerhardt sounds German too 🙂)

Keepalive and socket timeout are both set to 15 seconds by default:

// MQTT_KEEPALIVE : keepAlive interval in Seconds. Override with setKeepAlive()
#ifndef MQTT_KEEPALIVE
#define MQTT_KEEPALIVE 15
#endif

// MQTT_SOCKET_TIMEOUT: socket timeout interval in Seconds. Override with setSocketTimeout()
#ifndef MQTT_SOCKET_TIMEOUT
#define MQTT_SOCKET_TIMEOUT 15
#endif
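For a sense of scale, here is a small sketch of the keepalive arithmetic, assuming the broker follows the MQTT 3.1.1 rule that a client is disconnected if no control packet arrives within 1.5 times its Keep Alive value. `brokerTimeoutMs` is a hypothetical helper, not part of PubSubClient or Mosquitto. With the 15 s default the window is 22.5 s, so a keepalive expiry would only happen if the client went silent for that long; on its own it does not explain a fixed 200-minute rhythm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of PubSubClient or Mosquitto).
// MQTT 3.1.1, section 3.1.2.10: if the server receives no control packet from
// the client within 1.5 x the Keep Alive value, it must disconnect the client.
// Returns that window in milliseconds; a Keep Alive of 0 disables the check.
constexpr uint32_t brokerTimeoutMs(uint16_t keepAliveSeconds) {
    return static_cast<uint32_t>(keepAliveSeconds) * 1500u;
}

// PubSubClient's default MQTT_KEEPALIVE is 15 s (override with setKeepAlive()),
// which gives a 22.5 s window.
static_assert(brokerTimeoutMs(15) == 22500, "1.5 x 15 s = 22.5 s");
```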

I see now. The thread also talks about reverting to Arduino-ESP32 v1.0.0, which reportedly ran for over 15 hours. That indicates the issue might be in the Arduino core rather than in the library or in Mosquitto.

But you should be able to test this.

I’d suggest the following two tests:

First, turn on Mosquitto logging at the highest verbosity (see here: log_type all; make sure there is enough space on the Pi’s SD card). With that information, one can hopefully see whether Mosquitto is the one doing the disconnecting.

Then run one ESP32 with the MQTT firmware that you say reconnects every 200 minutes, and add

build_flags = -DCORE_DEBUG_LEVEL=5
monitor_filters = log2file, time, esp32_exception_decoder
build_type = debug

to the project’s platformio.ini (docs). With this, the “Monitor” task creates a log file with all serial output and a timestamp on every line, decodes exceptions with debug-symbol information if a crash occurs, and enables all debug output of the core (docs) at maximum verbosity.

Re-upload and monitor that firmware until you observe the disconnect again (or just let it run for more than, say, 205 minutes). The two log files should then show very clearly what is happening.
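Once the log file exists, measuring the spacing between disconnects by hand gets tedious. A minimal sketch, assuming each line starts with the `HH:MM:SS.mmm > ` prefix that the `time` filter adds (as in `02:04:39.311 > [D][WiFiClient.cpp:509] connected(): Disconnected: RES: -1, ERR: 113`); `parseTimestampMs` and `intervalsBetween` are hypothetical names. Because the prefix carries no date, it assumes consecutive events are less than 24 hours apart.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Parse the "HH:MM:SS.mmm" prefix added by the `time` monitor filter.
// Returns milliseconds since midnight, or -1 if the line has no such prefix.
long parseTimestampMs(const std::string& line) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(line.c_str(), "%2d:%2d:%2d.%3d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Gaps in seconds between successive lines that contain `marker`.
// A negative difference is treated as one wrap past midnight.
std::vector<double> intervalsBetween(const std::string& log, const std::string& marker) {
    constexpr long kDayMs = 24L * 60 * 60 * 1000;
    std::vector<double> gaps;
    std::istringstream in(log);
    std::string line;
    long prev = -1;
    while (std::getline(in, line)) {
        if (line.find(marker) == std::string::npos) continue;
        const long t = parseTimestampMs(line);
        if (t < 0) continue;  // marker found, but no timestamp prefix
        if (prev >= 0) {
            long d = t - prev;
            if (d < 0) d += kDayMs;  // crossed midnight between the two events
            gaps.push_back(d / 1000.0);
        }
        prev = t;
    }
    return gaps;
}
```

Fed with the log contents and the marker `Disconnected: RES:`, a steady 200-minute pattern would show up as gaps of about 12000 s.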

Another test would be to run the 1.0.0 core version and see whether the error disappears with it. Since that version is very old (2018), it might be a bit difficult to do in PlatformIO: you need both an old platform version and probably an old MQTT library version.

You can start from this platformio.ini:

[env:esp32]
platform = espressif32@1.2.0
board = esp32dev
framework = arduino
; first version that supports ESP32
lib_deps = 
   https://github.com/knolleary/pubsubclient/archive/refs/tags/v2.7.zip

and try fixing errors or increasing versions from there, see here and here.

Thank you very much for your detailed suggestions.

Since I’m not very familiar with PlatformIO (and IoT programming in general), it will take me some time to try everything.

If I can find relevant results, I will of course report here.

Intermediate result:

The following two sections each show a connection break. As always, the interval is 200 minutes. The error message comes from the WiFiClient module; the WiFiClient object is passed to PubSubClient when the latter is instantiated.

What does fd mean? And which host is meant: the fritz.box, the Raspberry Pi, or the Mosquitto broker?

P.S. The mqttClient.publish entries ‘test02: 101: xxxx: xxxx’ are generated by my program. They are status messages sent every 60 seconds. The messages ‘log: #mqttConnect: xxxx’ create a logbook entry in my web visualization whenever a reconnect has taken place.

The issue “[IDFv4.2] Wifi connection lost shortly after connect (ESP32-S2 and ESP32)” (issue #4536 in espressif/arduino-esp32 on GitHub) talks about “ERR: 113” and describes it as specific to WiFi disconnects on a Fritz!Box router, not appearing when a smartphone WiFi hotspot is used.

Maybe there’s more info in a GitHub search for “err 113”. It would be interesting if you were able to test the hotspot case somehow. For that, though, the MQTT broker probably has to be reachable from the internet rather than only on the local network, but there are surely public test MQTT servers out there.

The debug print is done here at

errno = 118 means “Host is unreachable” (EHOSTUNREACH in the newlib numbering used by ESP-IDF), which may have been triggered internally after the device was kicked off the WiFi.
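When reading these numbers off-device, it helps to remember that they follow newlib’s errno numbering (used by ESP-IDF), not Linux/glibc’s: on a Linux PC 113 is EHOSTUNREACH, while on the ESP32 113 is ECONNABORTED and 118 is EHOSTUNREACH, which matches the “Host is unreachable” text in the log. A small lookup sketch with only a handful of entries; `describeEsp32Errno` is a hypothetical helper and the table is written out by hand from newlib’s errno.h, so double-check it against the toolchain’s header. On the device itself, strerror() already produces the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for decoding errno values copied out of ESP32 logs
// (e.g. "ERR: 113" or "errno: 118"). The numbers follow newlib's errno.h as
// used by ESP-IDF; they differ from Linux, so strerror() on a PC won't match.
std::string describeEsp32Errno(int err) {
    switch (err) {
        case 104: return "ECONNRESET (Connection reset by peer)";
        case 111: return "ECONNREFUSED (Connection refused)";
        case 113: return "ECONNABORTED (Software caused connection abort)";
        case 116: return "ETIMEDOUT (Connection timed out)";
        case 118: return "EHOSTUNREACH (Host is unreachable)";
        default:  return "errno " + std::to_string(err) + " (not in this table)";
    }
}
```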

As a test, I connected the ESP32 to the home network via a WLAN hotspot on my desktop PC.

Here, too, the connection was lost periodically. The time between two interruptions was now exactly 249 minutes and 36 seconds (observation period: 24 hours).

The log file entries are initially identical to those from the previous report. However, there are now two additional entries showing that the IP address has been renewed.

02:04:39.311 > [D][WiFiClient.cpp:509] connected(): Disconnected: RES: -1, ERR: 113
02:04:39.317 > [E][WiFiClient.cpp:232] connect(): connect on fd 57, errno: 118, “Host is unreachable”
02:04:39.330 > [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 7 - STA_GOT_IP
02:04:39.330 > [D][WiFiGeneric.cpp:381] _eventCallback(): STA IP: 192.168.137.26, MASK: 255.255.255.0, GW: 192.168.137.1
02:04:40.368 > mqttClient.publish->published: log:#mqttConnect:1621123481

The Fritz!Box does not show any unusual events at these times, so the cause is probably in the ESP32 itself. It is annoying that program execution on the ESP32 stops completely during an MQTT connection break, since by default PubSubClient runs together with the actual main program on CPU 1.
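If the reconnect code follows the common `while (!client.connected()) { ... delay(5000); }` pattern, one way to keep the main program running during an outage is to attempt at most one reconnect per interval and return to the rest of loop() in between; PubSubClient’s mqtt_reconnect_nonblocking example is built on a similar idea. The sketch below is plain C++ so it can be tested off-device; `ReconnectScheduler` is a hypothetical class. On the ESP32 you would pass `millis()` and `mqttClient.connected()` to `poll()` and call `mqttClient.connect(...)` inside the callback. A single connect attempt can still block for a while (roughly up to the socket timeout), so this limits the stall rather than removing it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: rate-limits reconnect attempts so loop() never spins
// inside a blocking retry loop.
class ReconnectScheduler {
public:
    using ConnectFn = std::function<bool()>;

    ReconnectScheduler(uint32_t retryIntervalMs, ConnectFn tryConnect)
        : retryIntervalMs_(retryIntervalMs), tryConnect_(std::move(tryConnect)) {}

    // Call on every loop() pass with the current time in ms and the client's
    // connected state. Makes at most one attempt per interval and returns
    // whether the client is connected afterwards.
    bool poll(uint32_t nowMs, bool connected) {
        if (connected) return true;
        // Unsigned subtraction stays correct when a 32-bit millis() wraps.
        if (hasAttempted_ && nowMs - lastAttemptMs_ < retryIntervalMs_) {
            return false;  // too soon; let the rest of loop() run
        }
        hasAttempted_ = true;
        lastAttemptMs_ = nowMs;
        ++attempts_;
        return tryConnect_();
    }

    uint32_t attempts() const { return attempts_; }

private:
    uint32_t retryIntervalMs_;
    ConnectFn tryConnect_;
    uint32_t lastAttemptMs_ = 0;
    bool hasAttempted_ = false;
    uint32_t attempts_ = 0;
};
```

The elapsed-time check uses unsigned arithmetic so the scheduler keeps working after the 32-bit millisecond counter wraps (about every 49.7 days).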