
HistFactory Input to calculate Combined SR exclusion limits

HistFactory JSON files can be attached to an analysis so that exclusion limits can be estimated from combined and global likelihoods. The JSON files must be placed in the same folder as the info file, and the required information must be added to the info file as shown below. Note that this is an additional subelement of the <analysis> main element, which is described here.

  <pyhf id="RegionA">
    <name>atlas_susy_2018_031_SRA.json</name>
    <regions>
      <channel name="SR_meff">
      SRA_L
      SRA_M
      SRA_H
      </channel>
      <channel name="VRtt_meff"></channel>
      <channel name="CRtt_meff"></channel>
    </regions>
  </pyhf>

where <pyhf id="RegionA"> sets the identifier of the likelihood profile; it is printed in the output file next to the exclusion estimates calculated with this specific profile. Any name can be used as long as it contains no spaces. <name>atlas_susy_2018_031_SRA.json</name> is the name of the HistFactory JSON file. <channel name="SR_meff"> gives the name of the channel as specified in the JSON file; note that if a channel is declared incorrectly, the profile will be ignored. In the example above, the channel SR_meff has three signal regions, declared as SRA_L, SRA_M and SRA_H. These names correspond to the signal region names declared in the analysis recast. The ordering MUST be the same as in the JSON file; otherwise the exclusion limit will be calculated incorrectly. To make sure, please refer to the analysis description. VRtt_meff and CRtt_meff do not contain any signal regions, since the validation and control regions are not included in the analysis recast due to lack of information. If further help is needed, one can use the HistFactory helper:

python write_histfactory_info.py -i FILE1.json FILE2.json FILE3.json

where -i selects the interactive mode, which guides you through writing the file step by step. The JSON files must be named as in the info file and placed in the same folder as the info file (~/madanalysis5/tools/<PADofChoice>/Build/SampleAnalyzer/User/Analysis/atlas_xyz_00_00.info).
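The naming rules above can also be checked by hand before running anything. The following is a minimal sketch, not part of MadAnalysis: the function names read_pyhf_block and check_against_workspace are hypothetical, and the WORKSPACE dictionary is a toy stand-in for the "observations" list of the HistFactory JSON file, filled with the observed values quoted in the helper session on this page.

```python
import xml.etree.ElementTree as ET

# <pyhf> block copied from the example on this page.
PYHF_BLOCK = """
<pyhf id="RegionA">
  <name>atlas_susy_2018_031_SRA.json</name>
  <regions>
    <channel name="SR_meff">
    SRA_L
    SRA_M
    SRA_H
    </channel>
    <channel name="VRtt_meff"></channel>
    <channel name="CRtt_meff"></channel>
  </regions>
</pyhf>
"""

# Toy stand-in (assumption) for the "observations" part of the JSON file.
# With a real file one would use json.load() instead.
WORKSPACE = {
    "observations": [
        {"name": "SR_meff", "data": [12.0, 3.0, 2.0]},
        {"name": "VRtt_meff", "data": [210.0, 62.0, 22.0]},
        {"name": "CRtt_meff", "data": [153.0, 52.0, 19.0]},
    ]
}


def read_pyhf_block(xml_text):
    """Return (profile id, JSON file name, {channel name: [SR names]})."""
    root = ET.fromstring(xml_text)
    channels = {}
    for channel in root.iter("channel"):
        # SR names are whitespace separated; empty channels give an empty list.
        channels[channel.get("name")] = (channel.text or "").split()
    return root.get("id"), root.findtext("name").strip(), channels


def check_against_workspace(channels, workspace):
    """List mismatches between the declared channels and the JSON content."""
    n_bins = {obs["name"]: len(obs["data"]) for obs in workspace["observations"]}
    problems = []
    for name, regions in channels.items():
        if name not in n_bins:
            problems.append(f"channel '{name}' not found in the JSON file")
        elif regions and len(regions) != n_bins[name]:
            problems.append(
                f"channel '{name}': {len(regions)} SR names for {n_bins[name]} bins"
            )
    return problems


profile_id, json_name, channels = read_pyhf_block(PYHF_BLOCK)
print(profile_id, json_name, channels)
print(check_against_workspace(channels, WORKSPACE))
```

This check only covers channel names and the number of signal regions per channel. The ordering of the signal regions still has to be verified against the analysis description.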

To use HistFactory files, one needs the pyhf package, which is installed automatically with the following command:

install pyhf

After these steps, all given signal region combinations are calculated automatically. Additionally, MadAnalysis constructs a global likelihood profile that combines all given HistFactory files sharing the same parameter of interest.

How to write the <pyhf> information shown above with the HistFactory helper:

$$ python write_histfactory_info.py -i atlas_susy_2018_031_SRA.json
Writing SR_meff...
	Please write the name of 3
	signal region from the analysis corresponding to the
	following observed values: 12.0, 3.0, 2.0
>>SRA_L SRA_M SRA_H
Writing VRtt_meff...
	Please write the name of 3
	signal region from the analysis corresponding to the
	following observed values: 210.0, 62.0, 22.0
>>
Please note that number of SR does not match...
Writing CRtt_meff...
	Please write the name of 3
	signal region from the analysis corresponding to the
	following observed values: 153.0, 52.0, 19.0
>>
Please note that number of SR does not match...
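To illustrate what the session above produces, the sketch below assembles a <pyhf> block from the observations of a JSON file and the signal region names typed by the user. It is not the code of write_histfactory_info.py: make_pyhf_block is a hypothetical function, and the OBSERVATIONS list is a toy stand-in that reuses the observed values printed in the session.

```python
from xml.sax.saxutils import escape, quoteattr

# Toy stand-in (assumption) for the "observations" list of the JSON file.
OBSERVATIONS = [
    {"name": "SR_meff", "data": [12.0, 3.0, 2.0]},
    {"name": "VRtt_meff", "data": [210.0, 62.0, 22.0]},
    {"name": "CRtt_meff", "data": [153.0, 52.0, 19.0]},
]


def make_pyhf_block(profile_id, json_name, observations, sr_names):
    """Build the <pyhf> element as text.

    sr_names maps a channel name to its SR names, given in the same order
    as the bins of that channel. Channels without names are left empty.
    """
    lines = [
        f"<pyhf id={quoteattr(profile_id)}>",
        f"  <name>{escape(json_name)}</name>",
        "  <regions>",
    ]
    for obs in observations:
        channel = obs["name"]
        names = sr_names.get(channel, [])
        if names and len(names) != len(obs["data"]):
            raise ValueError(
                f"{channel}: expected {len(obs['data'])} names, got {len(names)}"
            )
        if names:
            lines.append(f"    <channel name={quoteattr(channel)}>")
            lines += [f"    {escape(name)}" for name in names]
            lines.append("    </channel>")
        else:
            lines.append(f"    <channel name={quoteattr(channel)}></channel>")
    lines += ["  </regions>", "</pyhf>"]
    return "\n".join(lines)


block = make_pyhf_block(
    "RegionA",
    "atlas_susy_2018_031_SRA.json",
    OBSERVATIONS,
    {"SR_meff": ["SRA_L", "SRA_M", "SRA_H"]},
)
print(block)
```

Unlike the helper, which only prints a note when the number of names does not match, this sketch raises an error for a non-empty list of the wrong length. That is a design choice of the example, not documented behaviour.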

HistFactory/pyhf FAQ

JSON files are generally available on HEPData, under the resources of the analysis in question.

  • Where should I add the JSON files?

JSON files should be included with your cpp file; we encourage you to upload them to Inspire alongside the analysis code. Please rename them as indicated in the info file before uploading.

  • pyhf installation is failing, how can I fix this?

pyhf has additional dependencies beyond those of MadAnalysis. These requirements can be installed via

pip install click tqdm six jsonschema jsonpatch pyyaml

After installing those packages, please try to install pyhf again.
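To see which of these requirements are still missing from the current Python environment, a short check such as the following can be used. It is only a sketch: missing_packages is a hypothetical helper, and the mapping assumes the usual import names of the packages (pyyaml is imported as yaml).

```python
import importlib.util

# pip package name -> module name used at import time.
REQUIRED = {
    "click": "click",
    "tqdm": "tqdm",
    "six": "six",
    "jsonschema": "jsonschema",
    "jsonpatch": "jsonpatch",
    "pyyaml": "yaml",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All listed requirements are available.")
```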

Combining SR using covariance matrices with the simplified likelihood method

Covariance matrices provided for some CMS SUSY searches can be used to build an approximate simplified likelihood. The info files from the Public Analysis Database (PAD) can be extended with covariance information, from which MadAnalysis5 builds a simplified likelihood. This makes it possible to compute combined CLs values and combined cross-section upper limits. The standard syntax of the info file

<analysis id="analysis name">
    <region type="signal" id="region name">
        <nobs> ... </nobs>
        <nb> ... </nb>
        <deltanb> ... </deltanb>
    </region>
    ...
</analysis>

specifying, for each SR, the number of observed events <nobs>, the expected number of background events <nb> and its uncertainty <deltanb>, is therefore extended by adding to each <region> subelement the successive covariance values with respect to all other regions:

<analysis id="analysis name" cov_subset="combined SRs">
    <region type="signal" id="region name">
        <nobs> ... </nobs>
        <nb> ... </nb>
        <deltanb> ... </deltanb>
        <covariance region="first SR name">...</covariance>
        <covariance region="second SR name">...</covariance>
        ...
        <covariance region="last SR name">...</covariance>
    </region>
    ...
</analysis>

where, for each <covariance> element, the associated region is specified with the region attribute. Every missing covariance value is interpreted as a zero element in the covariance matrix. If a <region> subelement does not contain any covariance values, it is not included in the set of combined regions. This makes it possible to combine only a subset of the signal regions. For instance, CMS-SUS-16-039 only provides covariances for signal regions of type A. In addition, a cov_subset attribute must be added to the <analysis> main element to record which subset of SRs is combined. In the case of CMS-SUS-16-039, this is:

<analysis id="cms_sus_16_039" cov_subset="SRs_A">
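The rules above (missing entries count as zero, regions without any <covariance> element are left out) can be illustrated with a short sketch. This is not the MadAnalysis5 implementation: build_covariance is a hypothetical function, and the info content below is a toy example with made-up region names and numbers.

```python
import xml.etree.ElementTree as ET

# Toy info file (assumption): region names and all numbers are invented.
INFO = """
<analysis id="toy_analysis" cov_subset="SRs_A">
    <region type="signal" id="SR1">
        <nobs>10</nobs> <nb>8.5</nb> <deltanb>2.0</deltanb>
        <covariance region="SR1">4.0</covariance>
        <covariance region="SR2">1.2</covariance>
    </region>
    <region type="signal" id="SR2">
        <nobs>5</nobs> <nb>6.1</nb> <deltanb>1.5</deltanb>
        <covariance region="SR1">1.2</covariance>
        <covariance region="SR2">2.25</covariance>
    </region>
    <region type="signal" id="SR3">
        <nobs>3</nobs> <nb>2.0</nb> <deltanb>1.0</deltanb>
    </region>
    <region type="signal" id="SR4">
        <nobs>7</nobs> <nb>6.0</nb> <deltanb>1.0</deltanb>
        <covariance region="SR4">1.0</covariance>
    </region>
</analysis>
"""


def build_covariance(info_xml):
    """Return (cov_subset, combined region names, covariance matrix)."""
    root = ET.fromstring(info_xml)
    entries = {}
    for region in root.iter("region"):
        covariances = region.findall("covariance")
        if covariances:  # regions without covariance values are not combined
            entries[region.get("id")] = {
                cov.get("region"): float(cov.text) for cov in covariances
            }
    names = list(entries)
    # Missing covariance values are interpreted as zero.
    matrix = [[entries[row].get(col, 0.0) for col in names] for row in names]
    return root.get("cov_subset"), names, matrix


subset, names, matrix = build_covariance(INFO)
print(subset, names)
for row in matrix:
    print(row)
```

Here SR3 is dropped because it has no covariance values, and the SR4 entries with respect to SR1 and SR2 are filled with zeros.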

CMS-SUS-17-001 provided a set of 3 "super signal regions" (SSRs) together with their covariances, intended for approximate reinterpretation. These SSRs were implemented in the PAD and have recently been updated with their covariance information. The info file of this implementation is attached to this page as a concrete example with a reduced number of signal regions.

The subset description is printed to the output file together with the results of the simplified likelihood combination, after the usual exclusion information:

<set> <tag> <cov_subset> <exp> <obs> <CLs> ||

The successive elements consist of the dataset name, the analysis name, the description of the subset of combined SRs, the expected and observed cross section upper limits at 95% confidence level (CL), and finally the exclusion level, 1-CLs. A concrete example reads

defaultset  cms_sus_16_039  [SL]-SRs_A  10.4851515  11.1534040  0.9997  ||

where [SL] stands for simplified likelihood.
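For post-processing, such a line can be split into its fields. The sketch below assumes that the fields are separated by whitespace and that the subset description itself contains no spaces; parse_sl_line is a hypothetical name, and the 0.95 threshold is the usual 95% CL exclusion convention rather than something stated by the output file.

```python
def parse_sl_line(line):
    """Split one simplified-likelihood result line into named fields."""
    fields = line.replace("||", "").split()
    dataset, analysis, subset, exp, obs, one_minus_cls = fields
    return {
        "set": dataset,
        "tag": analysis,
        "cov_subset": subset,
        "exp": float(exp),  # expected cross-section upper limit at 95% CL
        "obs": float(obs),  # observed cross-section upper limit at 95% CL
        "one_minus_cls": float(one_minus_cls),  # exclusion level, 1-CLs
    }


result = parse_sl_line(
    "defaultset  cms_sus_16_039  [SL]-SRs_A  10.4851515  11.1534040  0.9997  ||"
)
print(result)
# Assumed convention: excluded at 95% CL when 1-CLs is at least 0.95.
print("excluded at 95% CL:", result["one_minus_cls"] >= 0.95)
```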

Last modified on 08/22/20 23:49:29

Attachments (2)
