JRA1: Workload Management

LB Architecture overview

The Logging and Bookkeeping service (LB) tracks jobs in terms of events (important points in a job's life, e.g. submission, finding a matching CE,
starting execution, etc.) gathered from various WMS components as well as from CEs (all of which have to be instrumented with LB calls).
The events are passed to a physically close component of the LB infrastructure (locallogger) in order to avoid network problems.
This component stores them in a local disk file and takes over the responsibility to deliver them further.

The destination of an event is one of the bookkeeping servers (assigned statically to a job upon its submission).
The server processes the incoming events to give a higher level view on the job states (e.g. Submitted, Running, Done)
which also contain various recorded attributes (e.g. JDL, destination CE name, job exit code, etc.).
Retrieval of both job states and raw events is available via legacy (EDG) and WS querying interfaces.

Besides actively querying for the job state, the user may also register to receive notifications of particular job state changes (e.g. when a job terminates).
The notifications are delivered using an appropriate infrastructure.

Detailed LB description

Within the EDG WMS, each job is assigned upon creation a unique, virtually non-recyclable job identifier (JobId) in URL form.
The server part of the URL designates the bookkeeping server which gathers and provides information on the job for its whole life.
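
As a rough illustration of how the bookkeeping server can be read off a JobId, the following Python sketch splits a JobId URL into its server part and its unique part. The example JobId, the host name `lb.example.org`, the `DEFAULT_PORT` value and both function names are assumptions made for this sketch only; they are not taken from the LB code.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumed fallback port for this sketch; the text above does not specify one.
DEFAULT_PORT = 9000


def bookkeeping_server(job_id: str) -> Tuple[str, int]:
    """Return (host, port) of the bookkeeping server named in a JobId URL."""
    parts = urlparse(job_id)
    if not parts.hostname:
        raise ValueError(f"JobId has no server part: {job_id!r}")
    return parts.hostname, parts.port or DEFAULT_PORT


def unique_part(job_id: str) -> str:
    """Return the job-specific part of the JobId (the URL path)."""
    return urlparse(job_id).path.lstrip("/")


# Hypothetical JobId, used only to show the call.
example = "https://lb.example.org:9000/aBcD1234"
host, port = bookkeeping_server(example)
```

Because the server is encoded in the identifier itself, any component holding a JobId knows where to send events and where to query, without a separate lookup.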

LB tracks jobs in terms of events (e.g. Transfer from one WMS component to another, Run and Done when the job starts and stops execution).
Each event type carries its specific attributes.
The entire architecture is specialized for this purpose and is job-centric: every event is assigned to exactly one Grid job.

The events are gathered from various WMS components by the LB producer library, and passed on to the locallogger daemon,
running physically close to avoid any sort of network problems.
The locallogger's task is to store the accepted event in a local disk file. Once this is done, a confirmation is sent back and the logging library call returns, reporting success.
Consequently, logging calls have local, virtually non-blocking semantics.
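
The store-then-confirm behaviour described above can be sketched as follows. This is a minimal Python illustration, not the locallogger's actual implementation: the JSON line format, the one-file-per-job naming, the event field names and the function `log_event` are all assumptions.

```python
import json
import os


def log_event(event: dict, spool_dir: str) -> bool:
    """Append one event to a local spool file; confirm only once it is on disk."""
    os.makedirs(spool_dir, exist_ok=True)
    # Assumed naming scheme: one spool file per job, named after the
    # unique part of the JobId.
    name = event["job_id"].rsplit("/", 1)[-1] + ".events"
    path = os.path.join(spool_dir, name)
    with open(path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(event) + "\n")
        spool.flush()
        os.fsync(spool.fileno())  # force the data to disk before confirming
    return True  # stands in for the confirmation sent back to the caller
```

The caller waits only for a local disk write, never for the remote bookkeeping server, which is why the logging call is local and virtually non-blocking.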

Further on, event delivery is managed by the interlogger daemon.
It takes the events from the locallogger (or from the disk files on crash recovery) and repeatedly tries to deliver them to the destination
bookkeeping server (known from the JobId) until it finally succeeds.
Therefore the entire event delivery is highly reliable.
However, in the standard mode described so far, delivery is asynchronous: there is no direct way for the caller to see whether an event has already been delivered.
Our experience shows that these semantics are suitable in the majority of cases, while being the most efficient in the erratic Grid environment.
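
The retry-until-success delivery can be pictured with the short Python sketch below. The text above only states that the interlogger tries repeatedly; the exponential backoff, the delay values and the names `deliver_all` and `send` are assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterable


def deliver_all(
    events: Iterable,
    send: Callable,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable = time.sleep,
) -> int:
    """Deliver events in order, retrying each until send() succeeds.

    Returns the total number of send attempts made.
    """
    attempts = 0
    for event in events:
        delay = base_delay
        while True:
            attempts += 1
            try:
                send(event)  # hypothetical transport to the bookkeeping server
                break
            except OSError:
                # Network failure: wait, then retry the same event.
                # Backoff policy is an assumption of this sketch.
                sleep(delay)
                delay = min(delay * 2, max_delay)
    return attempts
```

An event is dropped from the queue only after `send` returns without error, so a server outage delays delivery but does not lose events, and the original logging call is never held up by it.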

The bookkeeping server processes the incoming events to give a higher-level view of the job states (e.g. Submitted, Running, Done),
each of which again has an appropriate set of attributes. LB provides a public interface (the consumer API) to retrieve them.
This interface is completely passive: it allows querying, but the LB server does not actively push any data.
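
To make the step from raw events to job states concrete, here is a deliberately simplified Python sketch. The real LB state computation is not described on this page; the mapping table, the `Submit` event name, the event field names and the function `job_state` are assumptions, and sorting by timestamp is only one plausible way to cope with events that arrive out of order.

```python
# Assumed mapping from event type to the state it leads to. "Run" and "Done"
# are event names mentioned in the text; "Submit" is hypothetical.
STATE_AFTER = {
    "Submit": "Submitted",
    "Run": "Running",
    "Done": "Done",
}


def job_state(events):
    """Fold a job's raw events into (state, attributes)."""
    state = "Unknown"
    attrs = {}
    # Delivery is asynchronous, so this sketch orders events by timestamp first.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        # Event types without an entry (e.g. Transfer) keep the current state.
        state = STATE_AFTER.get(ev["type"], state)
        attrs.update(ev.get("attrs", {}))
    return state, attrs
```

A query through the consumer API would then return the computed state and attributes rather than requiring the client to interpret the raw event list itself.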

Last Modified: Friday, 06-Apr-2007 10:07:49 CEST