Clarification for Embedding Binary Data

mvadu · February 24, 2020, 6:45pm

Currently I have a project with a bunch of HTML files that need to be compressed into a gzip byte array and used by the code (to serve as a web response).

As of now I have a pre-build script which runs another PowerShell script that does the actual gzip compression, generates the byte array in a C++-compliant format, and updates a header file.

Now I came across Embedding Binary Data in the documentation. I have a few questions:

The example in the documentation declares two array variables for each file, _start and _end. How do I use them? The actual example code linked there uses only _start. What is the use of the second array, and what will it contain?

extern const uint8_t aws_root_ca_pem_start[] asm("_binary_src_aws_root_ca_pem_start");
extern const uint8_t aws_root_ca_pem_end[] asm("_binary_src_aws_root_ca_pem_end");

Reading the build script (/builder/frameworks/_embed_files.py), I can’t make out how the actual embedding happens. Does it provide any optimization compared to my current way of embedding the binary data in a byte array variable in a header file?

mvadu · February 24, 2020, 7:27pm

After thinking some more, I think _start and _end are better understood as pointers (an array name decays to a pointer to its first element), and to find the length of the binary blob we need both _start and _end.

maxgerhardt · February 24, 2020, 7:32pm

Correct. The symbols mark the addresses of the start and end of the embedded data, as pointers. As such, the second _end array contains nothing per se; it’s a marker for the end of the data.
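To make the role of _end concrete, here is a minimal sketch of computing the blob length from the two markers. The names fake_blob, blob_start, blob_end and embedded_length are hypothetical; a local array stands in for the linker-provided symbols so the sketch compiles on a host machine. It assumes raw binary embedding, where _end points one past the last byte of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// On the target, the two markers would be declared as in the question:
//   extern const uint8_t aws_root_ca_pem_start[] asm("_binary_src_aws_root_ca_pem_start");
//   extern const uint8_t aws_root_ca_pem_end[]   asm("_binary_src_aws_root_ca_pem_end");
// Here a local array (an assumption for illustration) plays the embedded file.
static const uint8_t fake_blob[] = {0x1f, 0x8b, 0x08, 0x00, 0x00};
static const uint8_t* const blob_start = fake_blob;
static const uint8_t* const blob_end = fake_blob + sizeof(fake_blob);

// The length is the distance between the two markers.
// The end marker itself holds no data; it only bounds the range [start, end).
inline std::size_t embedded_length(const uint8_t* start, const uint8_t* end) {
    assert(end >= start);
    return static_cast<std::size_t>(end - start);
}
```

On the device, the same call would take aws_root_ca_pem_start and aws_root_ca_pem_end as its arguments, and the result could be passed as the content length of the web response.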

github.com/platformio/platform-espressif32/blob/9c5455280fafd715d2c1f149bad3a2d66e577dd7/builder/frameworks/_embed_files.py#L100-L112

          env.Append(
              BUILDERS=dict(
                  TxtToBin=Builder(
                      action=env.VerboseAction(" ".join([
                          "xtensa-esp32-elf-objcopy",
                          "--input-target", "binary",
                          "--output-target", "elf32-xtensa-le",
                          "--binary-architecture", "xtensa",
                          "--rename-section", ".data=.rodata.embedded",
                          "$SOURCE", "$TARGET"
                      ]), "Converting $TARGET"),
                      suffix=".txt.o"))
          )

xtensa-esp32-elf-objcopy is called on the to-be-embedded file; it produces the given symbols and a .o file, which is linked into the final executable.
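As for where the symbol names come from: with binary input, objcopy derives them from the input file path as given on the command line, replacing every non-alphanumeric character with an underscore and adding a _binary_ prefix. The helper below only illustrates that naming rule; binary_symbol is a hypothetical name and not part of any toolchain.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper that mimics the naming rule for binary input files:
// "_binary_" + path with non-alphanumeric characters replaced by '_' + "_" + suffix.
std::string binary_symbol(const std::string& path, const std::string& suffix) {
    assert(!path.empty());
    std::string name = "_binary_";
    for (unsigned char c : path) {
        name += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return name + "_" + suffix;
}
```

For the path src/aws_root_ca.pem this yields the two names used in the asm("...") labels in the question, which is why the declared symbol has to match the path of the embedded file.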

No, there is no optimization: it’s a binary copy of the input file, placed in a read-only section in flash. If you do the same in your header file with something like a const uint8_t[], the effect is exactly the same: a copy of the binary content and a pointer to the start of it.
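For comparison, the header-file approach might look like the sketch below. The names index_html_gz, index_html_gz_len and looks_like_gzip are hypothetical, and the four bytes are placeholders (just the start of a gzip header), not a real compressed page. With a plain array the length comes from sizeof, so no end marker is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical content of a generated header. A real pre-build script would
// emit every byte of the gzip file here; only placeholder bytes are shown.
static const uint8_t index_html_gz[] = {0x1f, 0x8b, 0x08, 0x00};

// The array type carries its size, so the length is known at compile time.
static const std::size_t index_html_gz_len = sizeof(index_html_gz);

// Sanity check: a gzip stream starts with the magic bytes 0x1f 0x8b.
inline bool looks_like_gzip(const uint8_t* data, std::size_t len) {
    assert(data != nullptr);
    return len >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}
```

Either way the bytes end up as constant data in the firmware image; the difference is mainly in how the length is obtained and how the build generates the data.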

mvadu · February 27, 2020, 3:46am

Thank you @maxgerhardt. Your answer makes the mechanism easier to understand. I think my current approach, generating the byte array in a pre-build step and adding it to a header file, makes the static content easier to manage.