The interval is a fixed time duration, for example 10 seconds. Each client keeps track of time relative to its own audio stream, so every 10 seconds it moves on to the next interval (1, 2, 3, …).
The client receives audio that other users are currently recording, but it only starts playing that buffered audio at the beginning of the next interval.
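The one-interval delay can be sketched in a few lines of Python. This is a toy model and not jammr's actual code: the function name heard_by_listener is made up for illustration, each character stands for one second of audio, and x stands for silence.

```python
def heard_by_listener(played: str, interval: int, silence: str = "x") -> str:
    """Return what a remote listener hears while `played` is being recorded.

    Audio recorded during one interval only starts playing at the beginning
    of the next one, so the listener hears the stream shifted by one interval.
    """
    return (silence * interval + played)[: len(played)]


# Two 10-second intervals of the chords A and G, one character per second.
me = "AAAAAGGGGG" + "AAAAAGGGGG"
print(heard_by_listener(me, interval=10))  # prints xxxxxxxxxxAAAAAGGGGG
```

The listener hears nothing during the first interval and then hears the first interval's audio while the second one is being played.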
How can we play in sync if we're not following jammr's metronome click? As long as we're playing a chord progression that is 10 seconds long, it doesn't matter whether the first chord starts at the very beginning of jammr's interval or at any other time.
Here is an attempt at illustrating this when we play the chords A and G for two intervals starting at the beginning of jammr's interval. Each character is one second, x is silence, | marks an interval boundary, and the "you" line shows what you hear from me:
me:  AAAAAGGGGG|AAAAAGGGGG
you: xxxxxxxxxx|AAAAAGGGGG
You don't hear anything from me yet in the first interval and then you play along. We're in sync because we play the same chord at the same time. Now let's start the progression 3 seconds into the interval instead:
me:  xxxAAAAAGG|GGGAAAAAGG|GGGAAAAAGG
you: xxxxxxxxxx|xxxAAAAAGG|GGGAAAAAGG
We are still in sync: we play the same chord at the same time even though the progression didn't start at the beginning of the interval. The same is true if we divide the time into eighth notes or sixteenth notes, or even start at an offset that isn't on a regular tempo division (like 1.12 seconds from the start).
It works because we're playing at the same tempo as jammr and the chord progression matches the interval length (10 seconds in this case).
If you're wondering how it looks when we don't follow these rules, here is a different example with the three-chord progression A F G and an interval of length 4 (longer than the chord progression!):
me:  AFGA|FGAF|GAFG
you: xxxx|AFGA|FGAF
We end up playing different chords at the same time because the interval doesn't match the length of the chord progression.
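Both outcomes can be checked with a small simulation. This is only a sketch under the same simplifications as the diagrams (one character per time unit, x for silence). The helper names play, delayed and in_sync are invented for illustration and are not part of jammr.

```python
from itertools import cycle, islice


def play(progression: str, start: int, total: int, silence: str = "x") -> str:
    """Silence until `start`, then the progression looped, `total` units long."""
    looped = "".join(islice(cycle(progression), max(total - start, 0)))
    return (silence * start + looped)[:total]


def delayed(stream: str, interval: int, silence: str = "x") -> str:
    """What the other player hears: the stream shifted by one interval."""
    return (silence * interval + stream)[: len(stream)]


def in_sync(progression: str, interval: int, start: int = 0, intervals: int = 3) -> bool:
    """True if what I play always matches what you hear from me."""
    me = play(progression, start, interval * intervals)
    you = delayed(me, interval)
    # Only compare the moments where you actually hear a chord from me.
    return all(a == b for a, b in zip(me, you) if b != "x")


print(in_sync("AAAAAGGGGG", interval=10, start=0))  # prints True
print(in_sync("AAAAAGGGGG", interval=10, start=3))  # prints True
print(in_sync("AFG", interval=4))                   # prints False
```

The start offset makes no difference in this model. What matters is that shifting the looped progression by one interval lines it up with itself again, which happens when the progression is as long as the interval but not when a 3-chord loop meets an interval of 4.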
These examples are simplified. It's actually a little more complicated because I also only hear what you played one interval later, so we would need another line showing what I hear from you. I left that out because it's not necessary to show the gist of what is happening.