I'm playing the sound out into the room, so shouldn't the mic be picking up the sound developments over time too?
Yes, but you are not seeing them in those highly simplified waveform graphs! They really aren't much use at all, except as a first approach to finding out if you have uneven response.
The problem isn't reverb (although that's an issue too, but isn't really valid for small rooms anyway, since small rooms cannot actually have a reverberant field). The problem is room modes. Modes are standing waves, a form of resonance, which build up rapidly and "store energy" (not strictly true, but it's a good way to think of it). After the tone STOPS, the speaker cone stops moving, but the energy carries on resonating for a while, dying out slowly over a period of time. You CANNOT see that in your simple waveform graphs, and even if you could, you would not be able to analyze it from a waveform. It needs to be analyzed in both the time domain and the frequency domain, since tones can excite modes that are not at their own frequency, but are nearby. You'd never see that on a waveform. Rather, you need graphs such as ETC (energy time curves), IR (impulse response), waterfall plots, and things like that, which you find in proper acoustic testing software, such as REW. For example, you'd never be able to see a reflection in a waveform plot, but they stick out like sore thumbs in IR and ETC graphs. You'd never be able to see the overall RT-60 across the entire spectrum (or rather, the modal equivalent in a small room) from a waveform, but it's glaringly obvious in waterfall plots, and somewhat obvious from an IR graph (or an ETC). And so on.
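To get a rough feel for where modes are likely to sit, here is a minimal sketch that lists the axial mode frequencies for one room dimension, using the textbook standing-wave relation f = n * c / (2 * L). The room dimensions, the 343 m/s speed of sound, and the function name are all assumptions for illustration, not measurements of your room, and it ignores tangential and oblique modes entirely.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 C


def axial_modes(length_m, max_hz=200.0):
    """Axial mode frequencies f_n = n * c / (2 * L), up to max_hz."""
    modes = []
    n = 1
    while True:
        freq = n * SPEED_OF_SOUND / (2.0 * length_m)
        if freq > max_hz:
            break
        modes.append(freq)
        n += 1
    return modes


# Hypothetical room: 4.0 m long, 3.0 m wide, 2.5 m high
for name, dim in (("length", 4.0), ("width", 3.0), ("height", 2.5)):
    freqs = ", ".join(f"{f:.1f}" for f in axial_modes(dim))
    print(f"{name} ({dim} m): {freqs} Hz")
```

A test tone that lands near one of those frequencies can ring on after the tone stops, which is the behavior a waterfall plot shows and a plain waveform hides.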
Sorry, but that's a very simplistic, basically meaningless test that you are trying to run: It tells you that you have a problem with the room, but it cannot tell you what that problem is: Is it the speakers? Flutter echo? A reflection? Comb filtering? A mode? Reverberation? Some other form of resonance? What is the amplitude of the problem? What is the Q of the problem? Is it related to other problems? Etc. As you just discovered, you cannot even determine if you are seeing a speaker problem or a room problem from such a test! It tells you very little that is useful at all, other than "You seem to have a problem".
I mean if there's reverbs, etc they should all be picked up.
Small rooms cannot possibly have a reverberant field, or at least not a statistically valid one. In order to have reverb, you need a free path of at least six or seven times the longest wavelength of interest. That's impossible in small rooms, for low frequencies: their dimensions are just nowhere near big enough to support a reverberant field: All you have in small rooms is direct field, reflections, and modal behavior (plus some other stuff that isn't really worth bothering about).
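To put numbers on that, here is a quick sketch of the arithmetic: wavelength is speed of sound divided by frequency, and the free path is six times that (the low end of the "six or seven times" rule of thumb). The 343 m/s figure and the function names are assumptions for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 C


def wavelength_m(freq_hz):
    """Wavelength in meters for a given frequency in Hz."""
    return SPEED_OF_SOUND / freq_hz


def min_free_path_m(freq_hz, multiple=6):
    """Free path needed under the 'six or seven wavelengths' rule of thumb."""
    return multiple * wavelength_m(freq_hz)


for freq in (20, 50, 100):
    print(f"{freq} Hz: wavelength {wavelength_m(freq):.2f} m, "
          f"free path of about {min_free_path_m(freq):.1f} m needed")
```

At 20 Hz the wavelength is a bit over 17 m, so the rule calls for a free path of roughly 100 m, which no home studio comes anywhere close to.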
The tricky part is presumably teasing out what each point on the graph actually consists of. As you said above, the interpretation is the ticklish part.
Exactly! But you have to be looking at the right type of graph!
My thinking was that my main focus is accurate mixing and I'm not so concerned about the room characteristics for recording instruments with a mic (I mainly play directly in).
Well, it's hard not to be blunt about this, so I guess I have to be!

Your thinking is not correct. Your goal in mixing is to hear exactly what was recorded, accurately, without any coloration from the room, the speakers, or the sound system. Therefore, the room must be neutral, acoustically. It must not do ANYTHING to the sound, except reproduce it with extreme accuracy. So the room must have flat frequency response, and flat RT-60 response, and evenly spread modal response. The RT-60 must be LESS than that of the associated live room, or you'd never hear reverb tails from the natural sound of the live room itself. It must be symmetrical and perfectly balanced, in order to have an accurate sound stage, clear stereo image, and perfect phantom center. There must be no first order reflections getting back to the mix position within 15 ms of the direct signal, and 15 dB down from it. Etc. There are a whole slew of characteristics that a good control room has to meet, and they are all related to the room not doing anything except tell you the truth. I've seen a few people go into a great room for the first time, listen to the session for a bit, and declare that the room sounds terrible because there's no warmth to it. That's the whole point! The truth sounds ugly, sometimes, but that's what an engineer needs to hear: the truth. He has to hear exactly what is in the music, and nothing of the room. If he can make the mix sound good in such a room, then it will automatically "translate" well, and sound great elsewhere. But if the room is coloring the sound, or not telling the truth, then the engineer will subconsciously compensate for that, and even though the mix sounds good in the room, it won't sound good elsewhere: it won't translate, because the room "lied" to the engineer.
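For a sense of scale on the 15 ms and 15 dB figures, here is a small sketch that converts a reflection delay into extra path length, and a level in dB into a linear amplitude ratio. The 343 m/s speed of sound and the function names are assumptions for illustration only.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 C


def extra_path_m(delay_ms):
    """Extra distance a reflection travels if it arrives delay_ms late."""
    return SPEED_OF_SOUND * delay_ms / 1000.0


def db_to_amplitude_ratio(level_db):
    """Linear amplitude (sound pressure) ratio for a level difference in dB."""
    return 10.0 ** (level_db / 20.0)


print(f"15 ms late = {extra_path_m(15):.2f} m of extra path")
print(f"15 dB down = {db_to_amplitude_ratio(-15):.3f} x the direct amplitude")
```

So any reflection whose path is less than about 5 m longer than the direct path arrives inside that 15 ms window, which in a small room covers pretty much every nearby surface.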
My hope at this stage is to get the room as accurate as possible with good layout and some appropriate treatment and then compensate for any remaining bad areas by knowing the points which I'll need to adjust for.
Well, some people can do that, but I never have been able to. If I try to mix in a room that doesn't tell the truth, then my mixes don't translate well, since I subconsciously "corrected" the room defects in the mix, so it sounds bad in places that don't have those defects, or that have different defects. I find it really hard to keep in mind all of the deficiencies of a room, and apply those to the mix while also trying to mix! Most people's auditive "memory" only lasts a few seconds anyway. That's why it is so hard to compare mixes without doing direct A/B switching on the fly: your brain just can't remember what the "old" version sounded like for more than a few seconds. So mixing in such a poor room is not easy (for me at least), and requires constant reference to a known track, in order to refresh my memory of what is wrong with the room, and apply it to the mix. Tiring, boring, non-creative. Not the way I want to mix. But if you can handle that, then fine.
I'm keen to get on with it, but I'm going to spend a few days learning more about testing first, especially as I can guarantee some obviously bad results in the existing position.
To be really honest, I suggest that you don't waste your time learning about testing, but rather learning about small room acoustics and studio design. The testing won't tell you how to design and build your room, and you won't really need to test much until you get the basic room into shape, with the basic treatment in place. THEN you test, to see what still needs to be done.
Sorry to be so blunt and to-the-point, but that's kind of the way we are around here: tell it like it is, no sugar coating. It tends to get your attention better!
I'm interested in trying out John's 'open air' version too. If I set it up in the garden there will be no walls for the sounds to bounce off, so I'll be interested to see what testing there looks like.
An "open air" studio in the middle of an empty field is great! Until it rains...

And isolating such a room is pretty hard to handle as well...
This is the same LFSineTone file recorded with only a single speaker turned on.
That's the way you should ALWAYS run acoustic tests! If you test with more than one speaker on, then you have no idea if you are testing the room, the speakers, or the interference patterns created between them. Test one speaker at a time, calibrate one speaker at a time, then do the final stereo test with your ears.
The mic was placed about 5" from the cone
Which cone?

5" from the woofer means it is way off-axis for the tweeter, and therefore it is not a valid test. Correct testing distance for speakers is one meter (roughly 39 1/2"). Closer than that, and the fields from the various cones and ports have not yet fully merged.
The idea was to try and remove the room as much as possible.
The only way to do that is, well, to remove the room! One "old" way of testing speakers was to bury them in sand, out in the open, facing upwards, with the ground being exactly level with the front face of the speaker: Hang a proper omnidirectional mic with perfectly flat response exactly one meter above the speaker, exactly on the acoustic axis, and measure. That is a real infinite baffle measurement. More modern measurements are usually done in anechoic chambers.
but of course I can't remove the effect of the speakers themselves or the mic.
That's why you need a proper measurement mic. Like Brian said, the ECM 8000 is inexpensive, and not too bad.
What mic and speaker are you using for these tests right now? To me, it looks like a lot of what you are seeing on your waveform is speaker-related, especially the bass roll-off.
- Stuart -