ATLAS Analysis Example
Introduction
ROOT can be run in batch mode on the grid to analyze large data samples. In this example we first create simulated data in ROOT format, stored in trees, and then analyze it by submitting the processing as grid jobs. The example is based on a demo developed by OU programmer Chris Walker.
Prerequisite
- Open a new Terminal on your VM.
- Make a directory for this exercise
$ mkdir -p ~/analysis_example
$ cd ~/analysis_example
As before, the $ sign at the beginning of the commands is the command prompt, so it should not be entered as part of the command.
Simple Analysis Example
Step 1: Create simulated data using the grid
In your test directory, create three files: run-root.cmd, run-root.sh, and run-root.C, with the contents given below. You will need an editor such as nano.
First, we use a simple submit script, run-root.cmd, to submit the grid job:
universe=vanilla
executable=run-root.sh
transfer_input_files = run-root.C
transfer_executable=True
when_to_transfer_output = ON_EXIT
log=run-root.log
transfer_output_files = root.out,t00.root,t01.root
output=run-root.out.$(Cluster).$(Process)
error=run-root.err.$(Cluster).$(Process)
notification=Never
queue
The executable script, run-root.sh, is as follows:
#!/bin/bash
# setup
source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_105a x86_64-ubuntu2204-gcc11-opt
# execute
root -b < run-root.C > root.out
This script runs ROOT in batch mode, executes the input macro run-root.C, and routes the output to the file root.out.
It has to be made executable with the chmod command (file permissions can be checked with ls -l):
$ chmod +x run-root.sh
The macro run-root.C consists of the following code:
{
  // create two files, each containing a tree of simulated data
  TRandom g;
  char c[256];
  for ( int j = 0 ; j < 2 ; j++ ){
    sprintf(c, "t%2.2d.root", j);   // file names t00.root and t01.root
    TFile f(c, "RECREATE", "MyFile", 0 /*no compression*/);
    TTree *t = new TTree("t0", "t0");
    Int_t Run;
    TBranch *b_Run = t->Branch("Run", &Run);
    Int_t Event;
    TBranch *b_Event = t->Branch("Event", &Event);
    Float_t Energy;
    TBranch *b_Energy = t->Branch("Energy", &Energy);
    Run = j;
    for ( Event = 0 ; Event < 100 ; Event++ ){
      Energy = g.Gaus(500.0, 200.0);   // Gaussian with mean 500 and sigma 200
      t->Fill();
    }
    f.Write();
    f.Close();
  }
}
.q
The final .q line quits ROOT once the macro has finished, since run-root.sh feeds this file to ROOT on standard input. The grid job can be submitted using:
$ condor_submit run-root.cmd
It can be checked with:
$ condor_q YOUR_USER_ID -nobatch
After it runs, you will find in your test directory a log file describing the job (run-root.log), an output file (root.out), and the files containing the simulated data (t00.root and t01.root).
You need to download these ROOT files to your local desktop; see the instructions for downloading mandle.gif in the Mandelbrot session.
Now open a different terminal window on your local desktop to view the files.
If the condor_submit step was skipped, you can execute the script locally to run ROOT:
./run-root.sh
You can then inspect the contents of t00.root and t01.root by running ROOT in your current directory in the local terminal window:
$ root t00.root
Then enter the ROOT command TBrowser b.
With the TBrowser you can plot the simulated data in the Energy branch as well as the other branches: double-click on the name of the ROOT file, and then on the variable you would like to plot.
Each data file contains a TTree named t0. You can plot the contents of the TTrees from all data files (in this example, both) by chaining them with a TChain. In ROOT, execute the following commands:
TChain tc("t0");
tc.Add("t*.root");
tc.Draw("Energy");
When you are done, you can quit ROOT again with the command .q <Return>.
Step 2: Analyze Real Data
Now we want to have a look at a real ATLAS ROOT file. For this, go back to the remote terminal window on osgconnect. Note that the input data file muons.root must be present in the directory from which you submit, since HTCondor transfers it to the job; it can be downloaded there with the wget command shown later in this step. You will need a new condor submit script called run-z.cmd:
universe=vanilla
executable=run-z.sh
transfer_input_files = readEvents.C,muons.root
transfer_executable=True
when_to_transfer_output = ON_EXIT
log=run-z.log
transfer_output_files = root-z.out,histograms-z.root
output=run-z.out.$(Cluster).$(Process)
error=run-z.err.$(Cluster).$(Process)
notification=Never
queue
The new executable script you need for this job, run-z.sh, is as follows:
#!/bin/bash
# setup
source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_105a x86_64-ubuntu2204-gcc11-opt
# execute
root -b -q readEvents.C+ > root-z.out
This script runs ROOT in batch mode, executes the input macro readEvents.C (the trailing + tells ROOT to compile the macro with ACLiC), and routes the output to the file root-z.out.
It has to be made executable with the chmod command (file permissions can be checked with ls -l):
$ chmod +x run-z.sh
The macro readEvents.C consists of the following code:
#include "TFile.h"
#include "TTree.h"
#include "TCanvas.h"
#include "TH1F.h"
#include "iostream"
//#include "TLorentzVector.h"
using namespace std;
void readEvents(){
// load the ROOT ntuple file
TFile * f = new TFile("muons.root");
TTree *tree = (TTree *) f->Get("POOLCollectionTree");
int nEntries = tree->GetEntries();
cout << "There are " << nEntries << " entries in your ntuple" << endl;
// create local variables for the tree's branches
UInt_t NLooseMuons;
Float_t LooseMuonsEta1;
Float_t LooseMuonsPhi1;
Float_t LooseMuonsPt1;
Float_t LooseMuonsEta2;
Float_t LooseMuonsPhi2;
Float_t LooseMuonsPt2;
// set the tree's branches to the local variables
tree->SetBranchAddress("NLooseMuon", &NLooseMuons);
tree->SetBranchAddress("LooseMuonEta1", &LooseMuonsEta1);
tree->SetBranchAddress("LooseMuonPhi1", &LooseMuonsPhi1);
tree->SetBranchAddress("LooseMuonPt1", &LooseMuonsPt1);
tree->SetBranchAddress("LooseMuonEta2", &LooseMuonsEta2);
tree->SetBranchAddress("LooseMuonPhi2", &LooseMuonsPhi2);
tree->SetBranchAddress("LooseMuonPt2", &LooseMuonsPt2);
// declare some histograms
TH1F *muPt1 = new TH1F("muPt1", ";p_{T} [GeV/c];Events", 50, 0, 200);
TH1F *muPx1 = new TH1F("muPx1", ";p_{x} [GeV/c];Events", 50, 0, 200); //added px
TH1F *muPy1 = new TH1F("muPy1", ";p_{y} [GeV/c];Events", 50, 0, 200); //added py
TH1F *muPz1 = new TH1F("muPz1", ";p_{z} [GeV/c];Events", 50, 0, 200); //added pz
TH1F *muEta1 = new TH1F("muEta1", ";#eta;Events", 50, -3, 3);
TH1F *muPhi1 = new TH1F("muPhi1", ";#phi;Events", 50, -4, 4);
TH1F *muE1 = new TH1F("muE1", ";Energy;Events", 50, 0, 200);
TH1F *muPt2 = new TH1F("muPt2", ";p_{T} [GeV/c];Events", 50, 0, 200);
TH1F *muPx2 = new TH1F("muPx2", ";p_{x} [GeV/c];Events", 50, 0, 200); //added px
TH1F *muPy2 = new TH1F("muPy2", ";p_{y} [GeV/c];Events", 50, 0, 200); //added py
TH1F *muPz2 = new TH1F("muPz2", ";p_{z} [GeV/c];Events", 50, 0, 200); //added pz
TH1F *muEta2 = new TH1F("muEta2", ";#eta;Events", 50, -3, 3);
TH1F *muPhi2 = new TH1F("muPhi2", ";#phi;Events", 50, -4, 4);
TH1F *muE2 = new TH1F("muE2", ";Energy;Events", 50, 0, 200);
TH1F *zPt = new TH1F("zPt", ";p_{T} [GeV/c];Events", 50, 0, 200);
TH1F *zPx = new TH1F("zPx", ";p_{x} [GeV/c];Events", 50, 0, 200); //added px
TH1F *zPy = new TH1F("zPy", ";p_{y} [GeV/c];Events", 50, 0, 200); //added py
TH1F *zPz = new TH1F("zPz", ";p_{z} [GeV/c];Events", 50, 0, 200); //added pz
//TH1F *zEta = new TH1F("zEta", ";#eta;Events", 50, -3, 3);
//TH1F *zPhi = new TH1F("zPhi", ";#phi;Events", 50, -4, 4);
TH1F *zE = new TH1F("zE", ";Energy;Events", 50, 0, 200);
TH1F *zMass = new TH1F("zMass", ";Mass;Events", 50, 0, 200);
// loop over each entry (event) in the tree
for( int entry=0; entry < nEntries; entry++ ){
if( entry%10000 == 0 ) cout << "Entry: " << entry << endl;
// check that the event is read properly
int entryCheck = tree->GetEntry( entry );
if( entryCheck <= 0 ){ continue; }
// only look at events containing at least 2 leptons
if(NLooseMuons < 2) continue;
// require the leptons to have some transverse momentum
if(abs(LooseMuonsPt1) *0.001 < 20 || abs(LooseMuonsPt2) *0.001 < 20 ) continue;
// make a LorentzVector from the muon
//TLorentzVector Muons1;
// Muons1.SetPtEtaPhiM(fabs(LooseMuonsPt1), LooseMuonsEta1, LooseMuonsPhi1, 0);
// print out the details of an electron every so often
if( entry%10000 == 0 ){
cout << "Muons pt1: " << LooseMuonsPt1 << " eta: " << LooseMuonsEta1 << " phi " << LooseMuonsPhi1 << endl;
cout << "Muons pt2: " << LooseMuonsPt2 << " eta: " << LooseMuonsEta2 << " phi " << LooseMuonsPhi2 << endl;
}
//calculation of muon energy
Double_t muonMass = 0.0; // assume the mass of the muon is negligible
Double_t muonPx1 = abs(LooseMuonsPt1)*cos(LooseMuonsPhi1);
Double_t muonPy1 = abs(LooseMuonsPt1)*sin(LooseMuonsPhi1);
Double_t muonPz1 = abs(LooseMuonsPt1)*sinh(LooseMuonsEta1);
Double_t muonEnergy1 = sqrt (muonPx1*muonPx1 + muonPy1*muonPy1 + muonPz1*muonPz1 + muonMass*muonMass);
Double_t muonPx2 = abs(LooseMuonsPt2)*cos(LooseMuonsPhi2);
Double_t muonPy2 = abs(LooseMuonsPt2)*sin(LooseMuonsPhi2);
Double_t muonPz2 = abs(LooseMuonsPt2)*sinh(LooseMuonsEta2);
Double_t muonEnergy2 = sqrt (muonPx2*muonPx2 + muonPy2*muonPy2 + muonPz2*muonPz2 + muonMass*muonMass);
Double_t zCompX = muonPx1 + muonPx2;
Double_t zCompY = muonPy1 + muonPy2;
Double_t zLongi = muonPz1 + muonPz2;
Double_t zPerp = sqrt (zCompX*zCompX + zCompY*zCompY);
Double_t zEnergy = muonEnergy1 + muonEnergy2;
Double_t zM = sqrt (zEnergy*zEnergy -zCompX*zCompX -zCompY*zCompY -zLongi*zLongi);
// fill our histograms
muPt1->Fill((LooseMuonsPt1)*0.001); // in GeV
muEta1->Fill(LooseMuonsEta1);
muPhi1->Fill(LooseMuonsPhi1);
muPx1->Fill( muonPx1*0.001); // in GeV
muPy1->Fill( muonPy1*0.001); // in GeV
muPz1->Fill( muonPz1*0.001); // in GeV
muE1->Fill(muonEnergy1*0.001); // in GeV
muPt2->Fill((LooseMuonsPt2)*0.001); // in GeV
muEta2->Fill(LooseMuonsEta2);
muPhi2->Fill(LooseMuonsPhi2);
muPx2->Fill( muonPx2*0.001); // in GeV
muPy2->Fill( muonPy2*0.001); // in GeV
muPz2->Fill( muonPz2*0.001); // in GeV
muE2->Fill(muonEnergy2*0.001); // in GeV
zPt->Fill( zPerp*0.001); // in GeV
zPx->Fill( zCompX*0.001); // in GeV
zPy->Fill( zCompY*0.001); // in GeV
zPz->Fill( zLongi*0.001); // in GeV
zE->Fill( zEnergy*0.001); // in GeV
zMass->Fill(zM*0.001); // in GeV
}
// draw the dimuon invariant mass distribution
zMass->Draw();
// make a ROOT output file to store your histograms
TFile *outFile = new TFile("histograms-z.root", "recreate");
muPt1->Write();
muEta1->Write();
muPhi1->Write();
muE1->Write();
muPx1->Write();
muPy1->Write();
muPz1->Write();
muPt2->Write();
muEta2->Write();
muPhi2->Write();
muE2->Write();
muPx2->Write();
muPy2->Write();
muPz2->Write();
zPt->Write();
zE->Write();
zPx->Write();
zPy->Write();
zPz->Write();
zMass->Write();
outFile->Close();
}
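For reference, the kinematics computed in the loop above: each muon's momentum components and energy are reconstructed from its transverse momentum, pseudorapidity and azimuthal angle (neglecting the muon mass), and the dimuon invariant mass follows from the summed four-momentum:
\[
p_x = p_T\cos\phi,\qquad p_y = p_T\sin\phi,\qquad p_z = p_T\sinh\eta,\qquad
E = \sqrt{p_x^2 + p_y^2 + p_z^2 + m_\mu^2}
\]
\[
M_{\mu\mu} = \sqrt{(E_1+E_2)^2 - (p_{x1}+p_{x2})^2 - (p_{y1}+p_{y2})^2 - (p_{z1}+p_{z2})^2}
\]
Note that the branch values are stored in MeV; the factors of 0.001 in the Fill() calls convert them to GeV.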
The grid job can be submitted using:
$ condor_submit run-z.cmd
It can again be checked with:
$ condor_q YOUR_USER_ID -nobatch
After it runs, you will find in your test directory a log file describing the job (run-z.log), an output file (root-z.out), and the file containing the histograms (histograms-z.root).
You again need to download these files to your local desktop; see the instructions for downloading mandle.gif in a previous session.
Go back to the local terminal window on your desktop and locate the files you downloaded.
Use wget to download the input data file muons.root, then execute the script to run ROOT locally:
wget https://www.nhn.ou.edu/~hs/tmp/muons.root
./run-z.sh
You can inspect the contents of histograms-z.root by running ROOT in your current directory in your local terminal window:
$ root histograms-z.root
and then entering the ROOT command TBrowser b.
With the TBrowser you can plot the histograms in the file: double-click on histograms-z.root, and then on the histograms you would like to plot.
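If you prefer the ROOT command line to the TBrowser, a minimal sketch that draws the dimuon mass histogram (readEvents.C wrote it to the file under the name zMass):
TFile f("histograms-z.root");
TH1F *zmass = (TH1F*) f.Get("zMass");   // retrieve the histogram by its key name
zmass->Draw();                          // the Z boson peak is expected near 91 GeV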
Step 3: Make TSelector
Now let's go back to the files created in Step 1, in the local terminal window. Start ROOT in batch mode in your test directory:
$ root -b
And then execute the following commands:
TFile f("t00.root");
t0->MakeSelector("s0","=legacy");
f.Close();
.q
This creates the files s0.C and s0.h in your test directory; they contain a TSelector skeleton corresponding to the definition of the TTree t0. This code can be used to process files containing data in trees of this type.
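The generated header declares the tree's branches as data members of the selector class, which is why the analysis code added below can use them directly. A condensed sketch of what to expect in s0.h (the exact generated code depends on your ROOT version):
class s0 : public TSelector {
public :
   TTree          *fChain;   // pointer to the analyzed TTree or TChain
   // Declaration of leaf types (one member per branch of t0)
   Int_t           Run;
   Int_t           Event;
   Float_t         Energy;
   ...
};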
Now we will add a histogram to the TSelector code. A few lines have to be added to s0.C and s0.h.
To s0.h, make the following additions. After the existing include statements, add:
#include <TH1F.h>
Inside the class definition, i.e. after the lines
class s0 : public TSelector {
public :
add a histogram pointer:
TH1F *e;
To s0.C, make the following additions. In SlaveBegin, after
void s0::SlaveBegin(TTree * /*tree*/)
{
add the histogram booking:
e = new TH1F("e", "e", 1000, -199.0, 1200.0);
In Process, after
Bool_t s0::Process(Long64_t entry)
{
add
GetEntry(entry);
e->Fill(Energy);
In Terminate, after
void s0::Terminate()
{
add
TFile f("histograms.root","RECREATE");
f.WriteObject(e,"Energy");
f.Close();
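After these additions the Process method should look roughly like the sketch below; everything except the two added lines comes from the generated skeleton:
Bool_t s0::Process(Long64_t entry)
{
   GetEntry(entry);    // read this entry's branch values into the class members
   e->Fill(Energy);    // fill the histogram with the Energy branch
   return kTRUE;
}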
Now create the new script files for this step. First create run-root-2.cmd:
universe=vanilla
executable=run-root-2.sh
transfer_input_files = s0.C,s0.h,run-root-2.C,t00.root,t01.root
transfer_executable=True
when_to_transfer_output = ON_EXIT
log=run-root-2.log
transfer_output_files = root-2.out,histograms.root
output=run-root-2.out.$(Cluster).$(Process)
error=run-root-2.err.$(Cluster).$(Process)
notification=Never
queue
Create run-root-2.sh:
#!/bin/bash
# setup
source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_105a x86_64-ubuntu2204-gcc11-opt
# execute
root -b < run-root-2.C > root-2.out
It has to be made executable with the chmod command:
chmod +x run-root-2.sh
Create run-root-2.C (the first line, .L s0.C++, compiles and loads the TSelector code with ACLiC before the macro body runs):
.L s0.C++
{
//Load and run TSelector
s0 *s = new s0();
TChain tc("t0");
tc.Add("t*.root");
tc.Process(s);
}
We can test the job on the local machine by executing the script to run ROOT:
./run-root-2.sh
If this works, we can process the data files t00.root and t01.root on the grid with our new command script run-root-2.cmd.
Use the upload feature in ospool to upload the files needed to submit this job to condor: s0.C, s0.h, run-root-2.C, run-root-2.cmd, and run-root-2.sh.
Make sure t00.root and t01.root are also present in ospool.
The condor job can now be submitted from ospool with:
condor_submit run-root-2.cmd
Go back to the local terminal window on your desktop and download the output ROOT files; see the download instructions for the mandle.gif file in a previous session.
You can look at the output histogram file histograms.root with TBrowser b as before, in your local terminal window.
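Since the simulated Energy values in Step 1 were drawn from a Gaussian with mean 500 and width 200, a quick cross-check at the ROOT prompt is to fit the histogram; a minimal sketch (Energy is the key name used in s0::Terminate()):
TFile f("histograms.root");
TH1F *h = (TH1F*) f.Get("Energy");
h->Fit("gaus");   // the fitted mean and sigma should come out close to 500 and 200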