You are overflowing RAM by a mere 1248 bytes. There are 2 paths:
Minimize memory usage in your part
Minimize memory usage in USB CDC part
For your part, you’re allocating a 16 kB buffer, which is your maximum size for a read request. You read the cartridge data byte-by-byte into this huge buffer,
then calculate the checksum over the whole data, and only then print the whole buffer.
This is highly wasteful. You can tweak the protocol so that you can do:
print header with its checksum first
have a 1-byte buffer (or a block-sized one for more efficient reads), read from the cartridge, and print it immediately on the UART. For every output chunk written, update the running checksum (which you have implemented as a simple 16-bit summation). Since the whole thing is additive, you can just add to the previous checksum: checksum += CalculateChecksum(new_data, new_data_len);
at the end of the packet, write your final data checksum
adapt the readout logic on the other side.
This removes the need for a 16 kB buffer, and your maximum read size restriction is gone as well: a read is limited only by the cartridge length. Thus you have freed 16 kB and can do USB-CDC again.
This “streaming” approach needs only a minimal amount of memory compared to your “whole block read” approach.
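To make the streaming idea concrete, here is a minimal sketch in plain C. It is not your actual code: StreamCartridge, BLOCK_SIZE, FakeCartridgeRead and CountingSink are hypothetical names, the header step is left out because I don't know your header layout, and sending the final checksum high byte first is an assumption. Only CalculateChecksum is meant to match your existing 16-bit summation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size; 1 also works, larger blocks just mean fewer writes. */
#define BLOCK_SIZE 64u

/* Hypothetical hooks: on the real board these would wrap the cartridge bus
 * read and the serial write. */
typedef uint8_t (*read_byte_fn)(uint32_t address);
typedef void (*write_fn)(const uint8_t *data, size_t len);

/* Simple 16-bit additive checksum, wrapping modulo 65536. */
static uint16_t CalculateChecksum(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Reads `length` bytes starting at `start`, forwarding each block as soon as
 * it is read. RAM use is BLOCK_SIZE bytes regardless of `length`. */
static uint16_t StreamCartridge(uint32_t start, uint32_t length,
                                read_byte_fn read_byte, write_fn write_out)
{
    assert(read_byte != NULL && write_out != NULL);

    uint8_t block[BLOCK_SIZE];
    uint16_t checksum = 0;
    uint32_t done = 0;

    while (done < length) {
        uint32_t remaining = length - done;
        size_t n = (remaining < BLOCK_SIZE) ? (size_t)remaining : (size_t)BLOCK_SIZE;

        for (size_t i = 0; i < n; i++) {
            block[i] = read_byte(start + done + (uint32_t)i);
        }
        write_out(block, n);

        /* The sum is additive, so per-block checksums can simply be added up. */
        checksum = (uint16_t)(checksum + CalculateChecksum(block, n));
        done += (uint32_t)n;
    }

    /* Trailer: final data checksum, high byte first (assumed byte order). */
    uint8_t trailer[2];
    trailer[0] = (uint8_t)(checksum >> 8);
    trailer[1] = (uint8_t)(checksum & 0xFFu);
    write_out(trailer, sizeof trailer);

    return checksum;
}

/* Stand-ins so the sketch can be tried on a PC without hardware. */
static uint8_t FakeCartridgeRead(uint32_t address)
{
    return (uint8_t)(address * 7u + 3u);
}

static uint32_t g_sink_count = 0;      /* total bytes "sent" */
static uint8_t g_sink_last[2] = {0, 0}; /* last two bytes "sent" */

static void CountingSink(const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        g_sink_last[0] = g_sink_last[1];
        g_sink_last[1] = data[i];
        g_sink_count++;
    }
}
```

On the receiving side, the same running sum is kept over the data bytes as they arrive and compared against the two trailer bytes at the end, so neither side ever has to hold the whole read in memory.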
The other path would be to optimize buffer sizes within the USB CDC HAL. Best I could find is these queue buffers
So the receive queue will have a length of 64 * 3 = 192 bytes and the TX queue 128 bytes. That is not really significant, and some more memory must be used elsewhere (HAL implementations, Arduino core layer, etc.). You can use Amap (by Sergey Sikorskiy) to find all memory usage from the map file.
But I’d really suggest a more efficient implementation of the readout logic.