Trigger Crate Overrun Corruption in run 20113036

I've analyzed the trigger corruption from run 20113036. The technical cause of the corruption is well known: the trigger crates have a circular buffer of 65k bunch crossings, and if the readout time of the crate (measured from the trigger time) exceeds this depth, the data can be corrupted. The question is what causes the long readout times.
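The overrun condition can be sketched in a few lines. This is only an illustration, not the crate firmware or the DAQ code: it assumes "65k" means 2**16 = 65536 crossings, and the names `is_overrun` and `buffer_slot` are hypothetical.

```python
# Illustrative sketch only. BUFFER_DEPTH assumes "65k" means 2**16;
# the function names are hypothetical, not from any STAR software.
BUFFER_DEPTH = 65536  # circular buffer depth, in bunch crossings


def buffer_slot(bx, depth=BUFFER_DEPTH):
    """Slot in the circular buffer that a given bunch crossing is written to."""
    return bx % depth


def is_overrun(trigger_bx, readout_done_bx, depth=BUFFER_DEPTH):
    """True if the readout took longer (in crossings) than the buffer depth.

    In that case the slot holding the triggered crossing may already have
    been overwritten, so the event is potentially corrupt.
    """
    return (readout_done_bx - trigger_bx) > depth
```

A crossing and the one exactly 65536 crossings later land in the same slot, which is why a readout that lags the trigger by more than the buffer depth is at risk.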

Fast forwarding to the answer: during this run the FCS trigger fires in short bursts every 6.5 seconds, at rates of about 5 kHz. However, I see that the EQ1 and EQ2 event size (prior to the corruption) is also increased during these periods, so it appears that there is some beam-related event occurring that the FCS trigger is sensitive to, rather than a simple malfunction of the FCS trigger.

To start out, here is the spectrum of event sizes vs time, with the instances of corrupted events highlighted in red. There is no obvious correlation between this corruption and the overall event size seen in the TPC/iTPC.

The actual corrupted events mask the true situation, however.   Here is the readout time of EQ1 and EQ2 as a function of time, again with the corrupted events highlighted:

Here we see clearly that the readout times increase every 6.5 seconds, though only occasionally does the time get large enough to cause corruption. Note that this plot is zoomed in on one of the double peaks.

We can zoom in further in time to see the structure of the readout time. The bursts last for about 0.025 seconds.

If we look at the data-taking rate over a six second range, we see three characteristic rates: the baseline trigger rate at this time is about 520 Hz; every 6.5 seconds there are two bursts at about 5 kHz; and the region in between the two bursts runs at about 750 Hz.
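For reference, a rate-vs-time curve of this kind can be built by binning event timestamps. The sketch below is not the code used for the plots; it assumes timestamps in seconds, and `binned_rate`, `find_bursts` and the threshold value are hypothetical choices for illustration.

```python
from collections import Counter


def binned_rate(timestamps, bin_width):
    """Return a sorted list of (bin_start_seconds, rate_hz) for event timestamps."""
    counts = Counter(int(t // bin_width) for t in timestamps)
    return [(b * bin_width, n / bin_width) for b, n in sorted(counts.items())]


def find_bursts(rates, threshold_hz):
    """Keep only the bins whose rate exceeds the (assumed) threshold."""
    return [(t, r) for t, r in rates if r > threshold_hz]
```

The bin width matters: with bursts only about 0.025 seconds long, bins much wider than that average the roughly 5 kHz burst down toward the baseline, which is the same effect that blurs the signal in any long integration window.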


Below we show the same plot zoomed into a single set of bursts:

This overall behavior is actually visible in the scaler rates seen in the monitoring pages for the FCS trigger, although the 10 second integration time makes the signal rather ambiguous:

A better proof that the FCS trigger is the source of the higher rate is to plot the trigger id as a function of time. Remember that the trigger id is a bit mask. The FCS trigger is bit 38, so the band at 2.7e11 (2^38) is due to the FCS trigger. You can see that the density of these triggers increases in the range where the corruption is seen.
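As a quick check of the bit-mask arithmetic, here is a small sketch. Only the bit number 38 comes from the text; the helper name `has_trigger` is hypothetical.

```python
FCS_BIT = 38  # bit position of the FCS trigger, as stated in the text


def has_trigger(trigger_mask, bit):
    """True if the given bit is set in the trigger bit mask."""
    return (trigger_mask >> bit) & 1 == 1


# An event that fired only the FCS trigger has this trigger id:
fcs_only = 1 << FCS_BIT
print(fcs_only)  # 274877906944, i.e. about 2.7e11
```

Events that also satisfied other triggers have additional bits set, so they appear at slightly different values; testing the bit directly with `has_trigger` selects all FCS-triggered events regardless of what else fired.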

The evidence that the FCS trigger is not simply malfunctioning comes from the size of the EQ data (which is zero suppressed). Here we see the EQ crate sizes over this same time period, with the corruption indicated:

Importantly, notice that the corruption is occurring AFTER the event size gets large.   This implies that the large occupancy in these crates is not an artifact of the corruption itself.

*****

4/24/2019 update

I checked the bunch crossings of the events, and they clearly come at the same time as the injections:

I also looked at the following run (which was not an injection run, but did show some corrupt events). The injection was still in progress during the first 30 seconds of this run, which is why we saw the corruption!