wiki:WritingAnalyses

An Introduction to MadAnalysis5 and Recasting LHC Results Part 3: Writing an analysis

This is a step-by-step introduction for using MadAnalysis 5 on reconstructed level event files. It consists of three parts:

Here we continue with Part 3.

This part assumes that you have properly installed MadAnalysis5 at Reco level.

You may want to start with using an implemented analysis in MA5 first before writing your own.

We will now write our own CAT-UED-16-013 analysis. More detailed documentation about how to do this and accessible variables and functions can also be found in arXiv:1405.3982.

Many observables and functions are already implemented in the MA5 implementation of CMS-SUS-13-011, which can be used as a guide for writing a more complicated analysis than what is done here.

Note that an important change from MA5 1.3 to 1.4 is the TLorentzvector:

% grep TLorentz  madanalysis5/madanalysis/*/*
madanalysis5/madanalysis/install/install_padma5tune.py:                if 'TLorentzVector' in line:
madanalysis5/madanalysis/install/install_padma5tune.py:                    newfile.write(line.replace('TLorentzVector','MA5::MALorentzVector'))
madanalysis5/madanalysis/install/install_pad.py:                if 'TLorentzVector' in line:
madanalysis5/madanalysis/install/install_pad.py:                    newfile.write(line.replace('TLorentzVector','MALorentzVector'))

Creating the directory

This proceeds as for using an implemented analysis in MA5. Again, we will create a directory from within the interactive MadAnalysis mode:

./bin/ma5 -R -E

We will call the analysis directory MINE. The name of the analysis is asked. We will call it cat_ued_16_013. A directory my_analysis is created with these files:

MINE/Build/SampleAnalyzer/User/Analyzer/cat_ued_16_013.cpp
MINE/Build/SampleAnalyzer/User/Analyzer/cat_ued_16_013.h

The cat_ued_13_016.h file can stay as is. We will now edit the .cpp file to write our own analysis.

Defining signal regions and cuts

We will now edit cat_ued_13_016.cpp to write our own analysis CAT-UED-16-013. In this file, you can find an Initialize, Finalize, and Execute part.

In the Initialize part, we will define our signal regions and cuts.

In the Execute part, we will write our code.

Signal regions

Note: If you do not have signal regions, define at least one, like Baseline or so, in order to obtain a resulting cutflow and histograms.

Suppose that our CAT-UED-16-013 contains signal regions in MET and MHT with 200 < MET < 500, MET > 500 and 100 < HT < 500, HT > 500. This results in 4 signal regions. These are defined with the syntax

Manager()->AddRegionSelection("signal_region_name")

in the cat_ued_16_013.cpp file in the initialize part, that is, it will look like:

// -----------------------------------------------------------------------------
// Initialize
// function called one time at the beginning of the analysis
// -----------------------------------------------------------------------------
bool cat_ued_16_013::Initialize(const MA5::Configuration& cfg, const std::map<std::string,std::string>& parameters)
{
  cout << "BEGIN Initialization" << endl;
  // initialize variables, histos
  // Initialize signal regions
  Manager()->AddRegionSelection("200<MET<500,100<HT<500"); // SR0
  Manager()->AddRegionSelection("MET>500,100<HT<500"); // SR1
  Manager()->AddRegionSelection("200<MET<500,HT>500"); // SR2
  Manager()->AddRegionSelection("MET>500,HT>500"); // SR3

Cuts

In this same initialize block, so immediately after, we define the cuts. Note that the order in which the cuts are defined, is the order in which the cuts will appear in the cutflow. If this does not match the order in which the cuts are applied, the names of cuts in the cutflow results will be wrong!

Here we differentiate between baseline cuts and signal region specific cuts. The baseline cuts in this case are:

  • HT > 100 GeV;
  • MET > 200 GeV;
  • a lepton veto for leptons with pT > 10 GeV;

Jets in this analysis must have a pT > 20 GeV.

The baseline cuts are defined as follows:

  // Baseline cuts
  Manager()->AddCut("nolepton");
  Manager()->AddCut("MET>200");
  Manager()->AddCut("HT>100");

In addition to the above cuts, we have the region specific cuts, in this case just those that define the signal regions. The signal-region specific cuts are defined as:

  // Signal-region specific cuts
  // MET < 500 for signal regions 0 and 2:
  string SR_0_2[] = {"200<MET<500,100<HT<500", "200<MET<500,HT>500"};
  Manager()->AddCut("MET<500", SR_0_2);

  // HT < 500 applies to signal regions 0 and 1:
  string SR_0_1[] = {"200<MET<500,100<HT<500", "MET>500,100<HT<500"};
  Manager()->AddCut("HT<500", SR_0_1);

  // HT > 500 applies to signal regions 2 and 3:
  string SR_2_3[] = {"200<MET<500,HT>500", "MET>500,HT>500"};
  Manager()->AddCut("HT>500", SR_2_3);

  // MET > 500 for signal regions 1 and 3:
  string SR_1_3[] = {"MET>500,100<HT<500", "MET>500,HT>500"};
  Manager()->AddCut("MET>500", SR_1_3);

The code

Now that we have defined everything we need in the Initialize part of cat_ued_16_013.cpp, we will write the code in the Execute part.

Define objects

First, we will select objects, in this case leptons (electrons and muons) and jets. This is done in the Execute block. Since we now work with a ROOT file, not an STDHEP file (as is shown in the example in the file), after the start of the Execute block:

// -----------------------------------------------------------------------------
// Execute
// function called each time one event is read
// -----------------------------------------------------------------------------
bool cat_ued_16_013::Execute(SampleFormat& sample, const EventFormat& event)
{

we will remove all following code and write the following for events at the reconstructed level:

  // ***************************************************************************
  // Example of analysis with reconstructed particles
  // Concerned samples : ROOT
  // ***************************************************************************

  double myEventWeight;
  if(Configuration().IsNoEventWeight()) myEventWeight=1.;
  else if(event.mc()->weight()!=0.) myEventWeight=event.mc()->weight();
  else
    {
      INFO << "Found one event with a zero weight. Skipping...\n";
      return false;
    }
  Manager()->InitializeForNewEvent(myEventWeight);

  if (event.rec()!=0)
    {
    cout << "---------------NEW EVENT-------------------" << endl;

Most notably, here we changed the event.mc() to event.rec(). Immediately after the above lines, we declare the jet and lepton objects needed for our analysis. These are reconstructed objects that can be read from the Delphes ROOT file:

    // =====first declare the empty containers:=====¬
    vector<const RecLeptonFormat*> electron, muon;
    vector<const RecJetFormat*> jet;

Now collect the electrons with a pT greater than 10 GeV:

     // =====fill the electrons container:=====
    for(unsigned int e=0; e<event.rec()->electrons().size(); e++)
    {
      const RecLeptonFormat *CurrentElectron = &(event.rec()->electrons()[e]);
      double pt = CurrentElectron->momentum().Pt();
      if(!(pt>10)) continue;
      electron.push_back(CurrentElectron);
    }

In arXiv:1405.3982 one can find all accessible objects and their functions.

Collect also the muons:

    // =====fill the muons container:=====
    for(unsigned int m=0; m<event.rec()->muons().size(); m++)
    {
      const RecLeptonFormat *CurrentMuon = &(event.rec()->muons()[m]);
      double pt = CurrentMuon->momentum().Pt();
      if(!(pt>10)) continue;
      muon.push_back(CurrentMuon);
    }

and collect jets, calculating HT on the way:

    // =====define HT======
    double HT = 0;
    // =====fill the jet containers and order=====
    for(unsigned int j=0; j<event.rec()->jets().size(); j++)
    {
      const RecJetFormat *CurrentJet = &(event.rec()->jets()[j]);
      double pt = CurrentJet->momentum().Pt();
      if(!(pt > 20)) continue;
      // Add pt to HT:
      HT += pt;
      jet.push_back(CurrentJet);
    }

Define the missing energy MET:

   // =====Get the missing ET=====
    TLorentzVector pTmiss = event.rec()->MET().momentum();
    double MET = pTmiss.Pt();

In this same block, we now apply the cuts.

Apply cuts

Very important: In order to have the names in the cutflow results (that are given per region) match the corresponding cut, the cuts MUST be applied in the same order as they are defined!

First, we apply the lepton veto:

    // =====Apply no-lepton cut========
    if(!Manager()->ApplyCut((electron.size()+muon.size() == 0),"nolepton")) return true;

the same for the other baseline cuts:

    // =====Apply HT100 cut============
    if(!Manager()->ApplyCut((HT > 100),"HT>100")) return true;

    // =====Apply MET200 cut===========
    if(!Manager()->ApplyCut((MET > 200),"MET>200")) return true;

Finally, apply the signal-region specific cuts:

    // =====Apply signal region cuts=======
    Manager()->ApplyCut(MET < 500, "MET<500"); // SR0, SR2
    Manager()->ApplyCut(HT < 500, "HT<500"); // SR0, SR1
    Manager()->ApplyCut(HT >= 500, "HT>500"); // SR2, SR3
    Manager()->ApplyCut(MET >= 500, "MET>500"); // SR1, SR3

and close the block and end the file with:

    // =====finished=====
    return true;
  }
}

You can also find the CAT-UED-16-013 C++ analysis file here. For testing, you can use this rootfile.

The signal region information

The cat_ued_16_013.info file contains the information about observed and numbers of background events in our signal regions. This is needed when one wants to compute confidence levels or upper limits using exclusion_CLs.py from the RecastingTools package.

For our analysis we have a luminosity of 20fb-1, and four each of the four signal regions we have observed numbers of events (nobs), estimated backgrounds (nb), and background errors (deltanb) as follows:

  • Signal region 0, 200<MET<500,100<HT<500: nobs=6159, nb=6090.0, deltanb=670.0
  • Signal region 1, MET>500,100<HT<500: nobs=2305, nb=2280.0, deltanb=270.0
  • Signal region 2, 200<MET<500,HT>500: nobs=451, nb=418.0, deltanb=66.0
  • Signal region 3, MET>500,100<HT<500: nobs=62, nb=57.4, deltanb=11.4

From this we create the following cat_ued_16_013.info

<!--
this XML file serves for the statistical interpretation of the MA5 simulation. 
it lists the number of observed events <nobs>, number of expected backgrounds <nb> 
and number of background uncertainty <deltanb> in each of the regions  
-->

<!--
to be put in the same directory as the analysis code,
i.e. Build/SampleAnalyzer/Analyzer/
-->

<analysis id="cat_ued_16_013">
  <lumi>20.0</lumi> <!-- in fb^-1 -->

  <!-- first, the signal regions targeting stop -> t neutralino -->

  <!--
     region definition: the attribute "id" has to match the name of the region
     as defined in the analysis code;
     the attribute "type" can be "signal" or "control" and is optional (default=signal)
  -->

  <region type="signal" id="200<MET<500,100<HT<500">
    <nobs>6159</nobs>
    <nb>6090.0</nb>
    <deltanb>670.0</deltanb>
  </region>

  <region type="signal" id="MET>500,100<HT<500">
    <nobs>2305</nobs>
    <nb>2280.0</nb>
    <deltanb>270.0</deltanb>
  </region>

  <region type="signal" id="200<MET<500,HT>500">
    <nobs>454</nobs>
    <nb>418.0</nb>
    <deltanb>66.0</deltanb>
  </region>

  <region type="signal" id="MET>500,100<HT<500">
    <nobs>62</nobs>
    <nb>57.4</nb>
    <deltanb>11.2</deltanb>
  </region>
</analysis>

Back to the Public Analysis Database main page.

Last modified 4 years ago Last modified on 11/14/16 00:14:35