I’ve had good experiences with whisper.cpp (should be in the AUR). I used the large model on my GPU (a 3060), and it filled 11.5 of the 12 GB of VRAM, so you might have to settle for a lower-tier model. The speed was pretty much real time on my GPU, so it might be quite a bit slower on your CPU, unless the lower-tier models are also a lot faster (I never tested them because I didn’t need to).
The large model had pretty much perfect accuracy (only 5 or so mistakes in ~40 pages of transcriptions), and that was with Dutch audio recorded on a smartphone. If it can handle my pretty horrible conditions, your audio should (hopefully) be no problem to transcribe.
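For anyone who wants to try this, here’s a rough sketch of what a run could look like, written as a small Python helper that only builds the two command lines (it doesn’t execute anything). Treat the specifics as assumptions: the binary name (`./whisper-cli` here; older builds call it `main`, and the AUR package may install it under yet another name), the model path `models/ggml-large.bin`, and the file names are all placeholders. The 16 kHz mono WAV conversion is there because whisper.cpp has traditionally expected that input format; `-l nl` selects Dutch and `-otxt` writes a plain-text transcript.

```python
import shlex
from pathlib import Path


def ffmpeg_to_wav_cmd(src, dst):
    """Build an ffmpeg command that converts audio to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg", "-i", str(src),
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(dst),
    ]


def whisper_cmd(binary, model, wav, language="nl"):
    """Build a whisper.cpp command line (binary and model paths are assumptions)."""
    return [
        str(binary),
        "-m", str(model),   # path to the ggml model file
        "-f", str(wav),     # input WAV file
        "-l", language,     # spoken language, e.g. "nl" for Dutch
        "-otxt",            # also write the transcript to a .txt file
    ]


if __name__ == "__main__":
    wav = Path("recording.wav")  # placeholder file names
    print(shlex.join(ffmpeg_to_wav_cmd("recording.m4a", wav)))
    print(shlex.join(whisper_cmd("./whisper-cli", "models/ggml-large.bin", wav)))
```

If the large model doesn’t fit in memory, swapping the model path for a smaller one is the only change needed; check `--help` on your build for the exact options it supports.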
Not to be an unfunny nitpicker (I don’t know why I’m denying this, that’s kinda the whole point), but all iPhones do have lossless audio streaming via AirPlay. I’m assuming that you specifically meant Bluetooth streaming, but then you should’ve said so. Furthermore, normal aptX isn’t high resolution; only aptX HD and aptX Adaptive are. The phone does support aptX HD as well, but once again, you could’ve said so from the start (though 3 characters more or less might make a significant difference to most memes, this one certainly wouldn’t have had that problem).