Custom USB+TDM Audio on a Teensy 4.1

There’s this question about how to get the Arduino Audio Library to work in PIO:

That’s great, but I have some projects that can’t use the Arduino Audio Library:

The Arduino Library is hard-coded for 16-bit 44.1kHz throughout, with a 128-sample buffer (2.9ms latency), and its USB connection is further hard-coded for stereo. That’s fine for just playing around with audio processing, or making synthesizers, or any number of things that don’t need the dynamic range or the I/O count of a serious pro-audio rig, but my specs are closer to the pro world.

For one project, I need 8 channels of 32-bit audio, on both USB and TDM (total of 16-in, 16-out, including both connections), and the physical/acoustic geometry requires about 0.2ms of latency, analog-to-analog. The datasheets for several TDM CODEC chips say that that latency and channel count are possible only if I run them at 96kHz. (“Double Speed”, as Cirrus calls it; “Quad Speed” gives me 192kHz and even fewer samples of group delay, but not enough channels.)

At 96kHz, the resulting latency comes up about 3 samples shorter than that target, which is easy to fix with a 3-sample buffer. But because that latency needs to be exact and is not yet known precisely, I think the practical requirement is no buffer at all (or a buffer of 1, depending on how you think of it), except for an explicit, tweakable delay somewhere in the DSP chain.
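For scale, 0.2ms at 96kHz is 19.2 sample periods, so the tweakable delay only ever needs a handful of samples. Here is a minimal sketch of what I mean by an explicit delay, in plain C++ with no Teensy or Arduino dependencies. The class name `DelayLine` and its methods are made up for illustration, not taken from any library:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity delay line for one channel. The delay is adjustable at
// run time in whole samples, from 0 (pass-through) up to MaxDelay.
// All names here are illustrative.
template <std::size_t MaxDelay>
class DelayLine {
public:
    void set_delay(std::size_t samples) {
        assert(samples <= MaxDelay);  // caller must stay within capacity
        delay_ = samples;
    }

    // Call once per sample: stores x and returns the input from
    // delay_ samples ago (zero until the line has filled that far).
    int32_t process(int32_t x) {
        buf_[write_] = x;
        const std::size_t read = (write_ + kSize - delay_) % kSize;
        write_ = (write_ + 1) % kSize;
        return buf_[read];
    }

private:
    static constexpr std::size_t kSize = MaxDelay + 1;
    std::array<int32_t, kSize> buf_{};  // zero-initialized history
    std::size_t write_ = 0;
    std::size_t delay_ = 0;
};
```

With `DelayLine<8> d; d.set_delay(3);`, the first three calls to `d.process()` return 0 and the fourth returns the first input. One instance per channel would sit wherever in the DSP chain the trim is needed.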

Since that completely nukes the Arduino framework’s notion of audio, and imposes a (very!) strict timing requirement (MUST service EVERY sample IMMEDIATELY and completely, at 96kHz), how can I create a new project to do this?

I don’t mind breaking away from the Arduino framework altogether, but I would like a portable project folder that I can put on a fresh PC with a fresh installation of VSCode+PIO, and have it work.

My goal here is to have a template project that includes:

  • Globally-defined, compile-time-adjustable sample rate, likely 48kHz or a multiple of it.
  • Standards-compliant (works everywhere with no explicit drivers) USB Audio+Serial, with its own compile-time choice of audio channel count and bit-depth, each direction.
  • I2S/TDM audio, with its own compile-time choice of channel count and bit-depth.
  • Boilerplate DSP section, just to demonstrate how to read and write the USB and I2S/TDM I/O.
  • No audio buffer (or a buffer of 1, depending on how you think of it). This is to support absolute minimal latency, at the expense of some throughput. Practically, it ends up with a 3-frame pipeline for I2S/TDM: clock in, process, clock out. (USB is exempt from this requirement, for obvious reasons.)

Copy that template, change the audio settings as needed, and start writing the application code.
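To show the kind of compile-time settings I have in mind, here is a rough sketch of a single config header. Every name in it (`audio_cfg`, `kTdmChannels`, and so on) is hypothetical, and the values are just the ones from the 8-channel project above:

```cpp
#include <cstdint>

// Hypothetical project-wide audio settings. Change these, rebuild, done.
namespace audio_cfg {

constexpr uint32_t kCpuHz        = 600000000;  // Teensy 4.1 core clock
constexpr uint32_t kSampleRateHz = 96000;      // global sample rate

// I2S/TDM side
constexpr uint32_t kTdmChannels = 8;
constexpr uint32_t kTdmSlotBits = 32;

// USB side, chosen independently for each direction
constexpr uint32_t kUsbOutChannels = 8;   // host -> device
constexpr uint32_t kUsbInChannels  = 8;   // device -> host
constexpr uint32_t kUsbSampleBits  = 32;

// Derived values ("sample frame" = one sample for every channel)
constexpr uint32_t kTdmBitClockHz =
    kSampleRateHz * kTdmChannels * kTdmSlotBits;
constexpr uint32_t kCyclesPerSampleFrame = kCpuHz / kSampleRateHz;
constexpr uint32_t kUsbOutBytesPerSampleFrame =
    kUsbOutChannels * (kUsbSampleBits / 8);
constexpr uint32_t kUsbInBytesPerSampleFrame =
    kUsbInChannels * (kUsbSampleBits / 8);

static_assert(kSampleRateHz % 48000 == 0,
              "sample rate is expected to be a multiple of 48kHz");
static_assert(kUsbSampleBits % 8 == 0,
              "USB sample size must be a whole number of bytes");

}  // namespace audio_cfg
```

With these numbers the TDM bit clock works out to 24.576MHz, and the CPU gets 6250 cycles per sample frame for everything, DSP and USB included.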

I suspect I’ll end up with a top-priority interrupt for each I2S/TDM frame, with the DSP application code also running in that ISR, but I’m open to any architecture that essentially ends up with, in order of preemptive task priority:

  1. An “add your DSP code here” section that meets the timing requirements.
  2. A USB Audio+Serial driver that “just works” in the background.
  3. An “add your front-panel code here” section that does everything else too, including calculating coefficients, communicating with a PC app via USB Serial, managing the CODEC chip, etc.

It would be nice to still be able to use the other Arduino functions and libraries for #3, but that’s not a hard requirement. The sample timing, and easy flexibility on USB and I2S/TDM, are.
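One way the hand-off between #3 and #1 could look, as a sketch only: the low-priority code fills a spare coefficient set and then flips an index, and the per-frame code reads that index once at the start of each frame. This assumes a single core, a single writer, and a frame ISR that always runs to completion before the writer resumes. The names `Coeffs`, `CoeffMailbox`, and `process_frame` are invented for this example, and the gain stage is only a placeholder for real DSP:

```cpp
#include <atomic>
#include <cstdint>

// Placeholder coefficient set; a real one would hold filter taps etc.
struct Coeffs {
    float gain;
};

// Two slots: the frame ISR reads the active one, the main loop writes
// the other and then publishes it. Assumes one writer on one core.
class CoeffMailbox {
public:
    // Low-priority side: fill the inactive slot, then make it active.
    void publish(const Coeffs& c) {
        const uint32_t next = active_.load(std::memory_order_relaxed) ^ 1u;
        slots_[next] = c;
        active_.store(next, std::memory_order_release);
    }

    // High-priority side: call once at the start of each frame.
    const Coeffs& current() const {
        return slots_[active_.load(std::memory_order_acquire)];
    }

private:
    Coeffs slots_[2] = {{1.0f}, {1.0f}};  // unity gain until told otherwise
    std::atomic<uint32_t> active_{0};
};

// "Add your DSP code here": one sample per channel in, one out.
// Note that float only carries 24 bits of mantissa, so this placeholder
// is not bit-exact for full-scale 32-bit samples.
inline void process_frame(const CoeffMailbox& box, const int32_t* in,
                          int32_t* out, uint32_t channels) {
    const Coeffs& c = box.current();
    for (uint32_t ch = 0; ch < channels; ++ch) {
        out[ch] = static_cast<int32_t>(static_cast<float>(in[ch]) * c.gain);
    }
}
```

On real hardware `process_frame` would be called from whatever interrupt fires per I2S/TDM frame, and `publish` from the front-panel code after it finishes calculating a new set.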

To give some more context:

I’ve done a lot of projects with 8-bit PICs and AVRs, and I’m quite comfortable there. The (single!) datasheet for each is only 1k pages at most, well organized, and describes the entire chip. And there are tons of tutorials to get started and make something useful, both with and without the official toolchains at different points throughout their history.

It’s easy enough to flip between two sections of the datasheet and bit-bang the config registers to connect two peripherals in possibly an unconventional way (Is there such a thing in the 8-bit world? I’ve always seen it as a toolbox to fully understand and be creative with.), and make it do something useful with no CPU intervention. The CPU then, is just a data shuffler in lieu of a DMA, or maybe a crude PLL to keep things roughly in sync, and occasionally an actual data processor, like for a scene-based DMX lighting controller with scene masters and a grand master.

Most of my 8-bit projects are like that, with a strict requirement to keep the main loop “tight”, meaning that absolutely everything is a polled state machine. Very few interrupts, if any, because the interrupt controller isn’t recursive. Poll the flags in the main loop instead, and don’t ever stall for anything. Set a flag instead, or move to a “waiting” state of the state machine, and move on. Then the one interrupt really does get serviced immediately, and is not preempted by another one (“priority inversion”), and everything else happens “soon enough”.
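For anyone who hasn’t worked this way, here is a small sketch of the “never stall, move to a waiting state” pattern, written as host-runnable C++. The CODEC reset sequence, the 10ms hold time, and all of the names are hypothetical; the point is only that `poll()` returns immediately on every pass:

```cpp
#include <cstdint>

// Hypothetical hold time for a CODEC reset line.
constexpr uint32_t kResetHoldMs = 10;

enum class CodecState : uint8_t { Reset, WaitReset, Configure, Running };

struct CodecTask {
    CodecState state = CodecState::Reset;
    uint32_t   t_start_ms = 0;

    // Call once per main-loop pass with a free-running millisecond count.
    // Never blocks: each call does at most one small step and returns.
    void poll(uint32_t now_ms) {
        switch (state) {
        case CodecState::Reset:
            // (assert the reset pin here)
            t_start_ms = now_ms;
            state = CodecState::WaitReset;
            break;
        case CodecState::WaitReset:
            // Unsigned subtraction stays correct across counter wrap.
            if (now_ms - t_start_ms >= kResetHoldMs) {
                // (release the reset pin here)
                state = CodecState::Configure;
            }
            break;
        case CodecState::Configure:
            // (queue the register writes here)
            state = CodecState::Running;
            break;
        case CodecState::Running:
            break;  // nothing to do until something changes
        }
    }
};
```

The main loop would call `poll()` on every task in turn, passing something like the Arduino `millis()` value, and nothing ever waits in place.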

I’m also familiar with one periodic, uninterruptible thing taking maybe 80% of the CPU time - bit-banging a lot of data through a weird protocol with strict timing, for example, or 25 independent triac dimmers with 8- or 10-bit resolution; or for these audio projects, the DSP code - so that everything else has to work around that. Effectively, the longer-running things just have a slower clock, by being interrupted so often, and the faster things are limited to the scan interval of the “time hog”, simply because they don’t run during that time. Evaluate the timing requirements accordingly, and provide for some of the buffered things to “catch up”, or flush their incoming buffers by looping in-place instead of one item per scan.

For the one with the weird protocol, I also slowed the UART way down from what I normally do, just to be sure that I can’t overflow its hardware receive buffer while the CPU is tied up in that 80%. (and I had it “catch up” as well, when it got the opportunity) Comparing that to a USB stack that MUST stay responsive or the host drops it, I’m really not too concerned about it. As the next highest priority, it’s guaranteed to get focus at least at the same rate that the DSP code does: 48kHz or higher.

If the USB driver always has enough spare time between audio samples to get all the way through its state machine, then it’s not a problem. And considering that I’ve made a standards-compliant USB HID driver from scratch on a 12MIPS 8-bit chip (the official testing app passed it) and had plenty of CPU time left for the application code, I can’t imagine that an Audio+Serial driver would have a problem. Even if it did take several times more cycles than my HID-only one, the 12MIPS that it ran on is only 2% of my Teensy’s 600. I’d get nervous about the DSP load long before that.

If the same ease of use and familiar reasoning were to carry over from the 8-bit world, to a near-GHz 32-bit processor with hardware floating-point, I’d jump on it in a heartbeat!

The Raspberry Pi Pico is almost there in terms of the ease of use part, with the way that they’ve made their SDK and how easy it is to start a raw project with it - no framework at all except for that SDK - but I soon discovered that it’s not fast enough for a complex Audio DSP, in both the clock speed and the floating-point hardware. And the TinyUSB library that it uses has a bug in the Audio class that makes it unusable.

TinyUSB is modular enough that I could adjust the descriptors for what I need, and I’m pretty sure I got that part right - grab 4 bytes from the buffer, cast to int32_t, and get something sensible, likewise going the other way - but it “gets stuck” when the Linux host stops sending samples, as it does when the player app pauses, and it never “wakes up” until it re-enumerates. Trying to play again throws an error on the host screen.
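To be concrete about the “grab 4 bytes” step, this is roughly the conversion I mean, written here as a standalone sketch rather than copied from my Pico project. It assumes 32-bit PCM samples in little-endian byte order, which is how USB Audio carries them, and the function names are my own:

```cpp
#include <cstddef>
#include <cstdint>

// Assemble one signed 32-bit sample from 4 little-endian bytes.
// Building the value byte by byte avoids alignment and aliasing problems
// that a raw pointer cast could cause. The final cast assumes two's
// complement, which holds on every target I care about.
inline int32_t read_s32_le(const uint8_t* p) {
    const uint32_t u = static_cast<uint32_t>(p[0])
                     | (static_cast<uint32_t>(p[1]) << 8)
                     | (static_cast<uint32_t>(p[2]) << 16)
                     | (static_cast<uint32_t>(p[3]) << 24);
    return static_cast<int32_t>(u);
}

// The other direction: one sample out to 4 little-endian bytes.
inline void write_s32_le(uint8_t* p, int32_t s) {
    const uint32_t u = static_cast<uint32_t>(s);
    p[0] = static_cast<uint8_t>(u);
    p[1] = static_cast<uint8_t>(u >> 8);
    p[2] = static_cast<uint8_t>(u >> 16);
    p[3] = static_cast<uint8_t>(u >> 24);
}

// Unpack as many whole samples as fit; returns the number written.
// Any trailing bytes that do not make a full sample are ignored.
inline std::size_t unpack_s32_le(const uint8_t* bytes, std::size_t n_bytes,
                                 int32_t* out, std::size_t out_capacity) {
    std::size_t n = n_bytes / 4;
    if (n > out_capacity) {
        n = out_capacity;
    }
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = read_s32_le(bytes + 4 * i);
    }
    return n;
}
```

With interleaved channels, sample `i` of the unpacked array belongs to channel `i % channel_count`.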

There are several tickets for that on TinyUSB’s GitHub that have been there for years, and the only response I’ve seen from the author is, “USB Audio is hard.” So I think I’m looking for a different USB library too.

The Arduino library is not nearly as modular in the sense that TinyUSB is. Yes, I can choose from a list of major configurations, but that selection drives an equal number of #if’s to choose which giant monolithic block to drop in, which is entirely optimized for that one thing and nothing else. I thought USB itself was more flexible than that.

Are there 3 easy getting-started guides for the Teensy 4.1 (or the NXP i.MX RT1062 chip that’s on it) that end up with:

  1. A flexible USB sound card + Serial that “just works”? (*)
  2. I2S/TDM to an external CODEC chip? (**)
  3. Arbitrary interrupts?

I think I can put those together and go from there, but I’m running into brick walls trying to even get that far.

(*) Audio+Serial would be more scriptable on the host side, in addition to the full-control app, so I’d rather do that if possible, but Audio+HID would work too. I haven’t written the app yet that talks to it, so I can still go either way.

(**) I2S and TDM are of course different protocols, but they’re similar enough that I could make a single driver on the Pi Pico’s Programmable I/O module that does both. Invert the frame sync and adjust the bit depth and channel count, and it does either one just as easily. If the Teensy / RT1062 works the same way in the same example project, that’s great! But if they’re too different, I might need a separate example for each.