Debugging WiFiClient with an ESP8266 on PlatformIO

This is sort of a combo question and this community is quite a bit stronger than others, so I was hoping I could get some help here.

I have an app on a WeMos D1 Mini using WiFiClient posting to an endpoint. I am unable to connect when using the initial client.connect(“foo.com”, 80)); statement - it fails immediately which tells me it’s not a network issue since I have a 10-second timeout. When I create an MWE it works fine so somewhere in my app I’m doing something wrong. This is the basic gist of it where it fails:

// Connect to the HTTP server
WiFiClient client;
client.setTimeout(10000);
Log.verbose(F("Connecting to: %s, %l" CR), url.getHost().c_str(), url.getPort());
if (!client.connect(url.getHost(), url.getPort())) {
    Log.error(F("Connection failed: %s, %l" CR), url.getHost().c_str(), url.getPort());
    return false;
}

… Like I said; simple, so I’m definitely missing something equally simple. :slight_smile:

So the two-part question is:

  1. (Core related) Is there a way to debug the WiFiClient’s inner workings (without dropping a buncha Serial.prints in the core)?

  2. (PlatformIO related) How does one enable/disable Serial debugging as is done via the Arduino ISE’s Tools>Debug>Serial menu selection?

This one is really annoying and I am quite sure I’ll kick myself when I find it.

As can be seen in the source code the class mostly forwards calls to its ClientContext* _client object (and on the side calling some other WiFi functions for e.g. DNS resolving)

It also does some debug prints here and there

DEBUGV maps to a printf() if a certain macro is enabled:

The WiFi core uses the DEBUG_WIFI macro which is enabled via the macro DEBUG_ESP_WIFI

See Espressif 8266 — PlatformIO latest documentation and macros above. For your specific case, I would start with

build_flags = -DDEBUG_ESP_CORE -DDEBUG_ESP_WIFI -DDEBUG_ESP_PORT=Serial

and then go from further from that output. Maybe DNS resolving fails, maybe the target TCP port is closed, maybe an out-of-memory, or something else…

Also, the mapping from “Arduino IDE debug level text” to actual macros can be found here: Arduino/boards.txt at c55f49bd6103dab81c6a389470f6d5bbbee399d0 · esp8266/Arduino · GitHub

2 Likes

Goodness! Thank you so much for such a thorough answer!

Now I guess I need to go actually FIX it. :slight_smile:

Thanks again @maxgerhardt! TschĂĽĂź!

It’s like a new language:

[hostByName] request IP for: b97945a0-dffc-427f-b81c-cf811e96faf1.mock.pstmn.io
[hostByName] Host: b97945a0-dffc-427f-b81c-cf811e96faf1.mock.pstmn.io IP: 107.21.29.187
:ref 1
:ctmo
:abort
:ur 1
:del
    125601 E: Connection failed: b97945a0-dffc-427f-b81c-cf811e96faf1.mock.pstmn.io, 80

How does one decode the debug?

DNS resolve OK but then connection timeout or connection refused maybe, then abort. Do you have a firewall in place that could block stuff to port 80, from either the node side or server side?

Sure don’t … that’s so weird. So you’re telling me Wireshark is in my future? :stuck_out_tongue:

1 Like

If you have tcpdump on the router of the network that’s great, a pcapng would help. On the node-side, you can still further debug it, since the error is now certain to be thrown in the LWIP network stack. It also has debug outputs:

Now the ESP8266 framework offers LWIP2 and LWIP as possible network stacks (default is “LWIP v2 Lower Memory”), but by default it doesn’t compile it from the source but just includes the precompiled binary blob. In order to be able to debug this on the node side, you would need to enable debugging here (e.g. for LWIP2)

(you must just define the macro LWIP_DEBUG via the build_flags)

and then recompile the LWIP source from instructions found here: How to (re)compile LWIP open source? · Issue #3143 · esp8266/Arduino · GitHub, which is only possible from Linux or Mac due to the make commands

And then exchange the precompiled binary for your own compiled one

Which binary exactly is selectable via Espressif 8266 — PlatformIO latest documentation.

However, another interesting test would be whether the behavior changes depending on the used stack. Could you tediously test whether one of the LWIP variants exhibits a different behavior?

Thanks so much for the additional direction Max. This has me flustered so I appreciate it very much. I’ll try the different LWIP variants first and report back.

As an aside: I seem to not be able to use my custom python script as a build flag as well as these others. If I put it first, it runs but the flags do not apply. If I put it last the flags work but the script does not run. Am I doing something wrong there or is that a shortcoming?

Behavior is the same on all stacks (except IPv6 which of course bombs out on compile.)

I’m still stepping through getting the LWIP variants compiled with debug. Not quite nailed that yet (although I just got one to compile without.

@maxgerhardt I failed miserably at recompiling the LWIP libs with debugging enabled. I can’t even explain where I went wrong but let’s leave that aside for a moment.

I created a simple DNS lookup sketch which I think may demonstrate the core of my issue:

#include <WiFiClient.h>
#include <ESP8266WiFi.h>
#include <Ticker.h>
#include <Arduino.h>

void setup();
void loop();
void printDNSServers();
void printIPAddressOfHost(const char*);

void setup() {
	Serial.begin(74880);
	Serial.setDebugOutput(true);
    Serial.flush();
    WiFi.begin(F("Xxxxxx"), F("xxxxxxxx"));
    Serial.println();
    Serial.print(F("Waiting for connection."));
    while (WiFi.status() != WL_CONNECTED) {
        Serial.print(F("."));
        delay(500);
    }
    Serial.println();
    Serial.println(F("Connected."));
    printDNSServers();
}

void loop() {
    printIPAddressOfHost("raspberrypi.local");

    Ticker lookup;
    lookup.attach(10, [lookup]() { printIPAddressOfHost("raspberrypi.local"); } );

    while (true) { yield(); }
}

void printDNSServers() {
    Serial.print(F("DNS #1: "));
	Serial.print(WiFi.dnsIP().toString().c_str());
	Serial.print(F(", DNS #2: "));
	Serial.println(WiFi.dnsIP(1).toString().c_str());
}

void printIPAddressOfHost(const char* host) {
    IPAddress resolvedIP;
    if (!WiFi.hostByName(host, resolvedIP)) {
        Serial.print(F("Host lookup failed for "));
		Serial.println(host);
    }
    Serial.print(F("Host: "));
	Serial.print(host);
	Serial.print(", IP: ");
	Serial.println(resolvedIP.toString().c_str());
}

The DNS (mDNS in this case) resolves the first time, but subsequent lookups fail:

connected with Xxxxxx, channel 8
dhcp client start...
wifi evt: 0
ip:192.168.xxx.155,mask:255.255.255.0,gw:192.168.xxx.1
wifi evt: 3

Connected.
DNS #1: 1.1.1.1, DNS #2: 1.0.0.1
[hostByName] request IP for: raspberrypi.local
[hostByName] Host: raspberrypi.local IP: 192.168.xxx.131
Host: raspberrypi.local, IP: 192.168.168.131
pm open,type:2 0
[hostByName] request IP for: raspberrypi.local
[hostByName] Host: raspberrypi.local lookup error: -5!
Host lookup failed for raspberrypi.local
Host: raspberrypi.local, IP: (IP unset)

Interestingly, going to an Internet address (using DNS of course) works correctly:

Connected.
DNS #1: 1.1.1.1, DNS #2: 1.0.0.1
[hostByName] request IP for: google.com
[hostByName] Host: google.com IP: 172.217.9.78
Host: google.com, IP: 172.217.9.78
[hostByName] request IP for: google.com
[hostByName] Host: google.com IP: 172.217.9.78
Host: google.com, IP: 172.217.9.78

Interesting, right? Not really sure what’s up here with this one.

I’m not even sure to which project I might post this to get some help.

That is indeed very interesting. But in the previous log you showed where no connection could be made, it did succeed in the DNS resolve but not in the TCP connection. So this might be a different error?

Since that error is quiete nicely reproducable I’d go to GitHub - esp8266/Arduino: ESP8266 core for Arduino for this one.

I can try to recompile it on Linux, since all you need is just the compiled .a file which has full LWIP debugging enabled. Should be done by tomorrow.

1 Like

No sir - this looks the same to me. In my notes I see the connection failing immediately but when I added a DNS lookup before it as a debug (which the connect does for itself) the DNS failed. So, I think it’s the same thing.

Thanks, did that with #6639.

Incidentally, it looks as if when called from the loop() it works fine, but fails when called via Ticker.

That would be awesome, thank you so much!

Seems like the answer in the issue is that in this case, the mDNS resolver should be used, which solves that problem? However in your original sketch and problem, it’s a normal DNS name it seems, so a different error after all.

I think I was too eager to create a minimum reproducible example, I need to dig back in once I get home. Traveling this week, and my project requires Internet access for the device of course. Hard to do that from a hotel.

I see. Anyway I compiled the LWIP2 library with debug enabled.

Procedure btw:

git clone --recursive https://github.com/esp8266/Arduino.git
cd Arduino

Then modify this file Arduino/tools/sdk/lwip2/builder/Makefile.arduino to point the TOOLS directory to PIO’s toolchain-xtensa folder as a relative path as viewed from Arduino/tools/sdk/lwip2/builder.

For me for example (since I cloned it in my home folder)

TOOLS=../../../../../.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-

To enable debug we can also just turn on a different flag which will activate subsequent flags

To change that value we just have to change it in that file

For the build I set UDUMP, ULWIPDEBUG and ULWIPASSERT to 1 and UNDEBUG to 0.

Then a make install in the Arduino/tools/sdk/lwip2 folder will compile two library versions (one IPv4 and the other IPv6 enabled). Filenames will follow the style BUILD=build-$$mss$$featname-v$$((4+2*v6)) where MSS is a TCP buffer size (Maximum Segment Size) and the end will be either v4 or v6. If it has “feature” in the name then it has the LWIP_FEATURE macro activated which activates a bunch of features (esp82xx-nonos-linklayer/lwipopts.h at 400428d380c9dff2a4d060e3ee78d5def4a279c2 · d-a-v/esp82xx-nonos-linklayer · GitHub)

Binaries at https://drive.google.com/open?id=1DkLeRE_FwV-EI9QmdCS1CF6grjYd7yBW

You would then have to pick one binary (for example liblwip6-1460-feat.a), then find out which LWIP2 library you’re currently linking against (pio run -v and look at the final linker command which has the list of libs which are linked in), then make a backup of that file and replace it with the downloaded one, and observe whether there is more debug in the original error case.

2 Likes

Thank you so much @maxgerhardt, such a complete answer! I’ll give that a go when I get back home tomorrow (I think).

Can’t get my device connected to the Internet at the hotel, so I spent my entire free time last night trying to get my RPi plugged into my laptop using OTG and then masquerading to an access point on the RPi so I could attach the esp8266 …

Yeah.

Lost lots of time. :slight_smile:

So I wanted to expand upon what I said previously. I spent a couple hours trying to come up with an example to post as an Issue on that repository because I knew it would be faced with critical review. All that and I failed. Therefore instead of getting towards where I could figure out what’s wrong with my code, I spent time studying for a test which I ended up failing anyway.

Curious though, even though I forgot to include mDNS in that example, the initial mDNS lookup in the loop() worked and the subsequent ones (called by Ticker()) failed. So I demonstrated what I was trying to demonstrate, but failed in basic programming. :slight_smile: When I get this debug library in place it will be interesting to figure out why it worked like that in my example.

Trying the Git Issue route again with #6650. Have a look at it now - I’ve created a good repro and it’s VERY strange.

I’m home now so I am going to use those debug libs you did for me.

2 Likes

@maxgerhardt no love yet on my issue, but in my code I’ve had a breakthrough:

ESP8266HTTPClient::connect() does not work when called in a function called via a Ticker() callback. That seems pretty specific but not sure why it would not work.

Is this called in an interrupt-context or in a thread-context?