
JRA1: Workload Management




CE Blahp



1.0   COMPONENT DESCRIPTION

	BLAHPD is a lightweight component that accepts commands according to
	the BLAH (Batch Local ASCII Helper) protocol to manage jobs on
	different Local Resource Management Systems (LRMS). Section 2 gives
	a detailed description of the protocol. BLAHPD has to be installed
	on a Computing Element (the head node of a farm) and exposes a BLAH
	interface for job controllers such as CREAM and the Condor
	gridmanager. BLAHPD is completely stateless, so it is not a job
	controller itself; it can be regarded as a modular interface between
	job state machines and LRMSs. Currently, modules for PBS and LSF
	have been implemented.


2.0   BLAHP COMMANDS

	The following list represents the set of commands required for
	interaction with the BLAHP server interfacing to a given Local
	Resource Management System. It is based on the minimum set of
	commands used in the original GAHP (v1.0.0) specification, removing
	the commands that are specific to the operation of the GRAM protocol
	(INITIALIZE_FROM_FILE, GASS_SERVER_INIT, GRAM_CALLBACK_ALLOW,
	GRAM_JOB_CALLBACK_REGISTER, GRAM_PING). The BLAH_JOB_SIGNAL command
	may initially be left unimplemented for some batch systems; in that
	case it returns an "E" (error) Return Line and is not listed in the
	output of COMMANDS.

		BLAH_JOB_CANCEL
		BLAH_JOB_HOLD
		BLAH_JOB_REFRESH_PROXY
		BLAH_JOB_RESUME
		BLAH_JOB_STATUS
		BLAH_JOB_SUBMIT
		COMMANDS
		QUIT
		RESULTS
		VERSION

       Optionally, the following two commands may also be implemented:

		ASYNC_MODE_ON
		ASYNC_MODE_OFF

	Currently unimplemented:
	       BLAH_JOB_SIGNAL

2.1   CONVENTIONS AND TERMS USED IN SECTION 2.5

	Below are definitions for the terms used in the sections to follow:

	<CRLF>

		The characters carriage return and line feed (in that
		order), _or_ solely the line feed character.

	<SP>

		The space character.

	line

		A sequence of ASCII characters ending with a <CRLF>.

	Request Line

		A request for action on the part of the BLAHP server.

	Return Line

		A line immediately returned by the BLAHP server upon
		receiving a Request Line.

	Result Line

		A line sent by the BLAHP server in response to a RESULTS
		request, which communicates the results of a previous
		asynchronous command Request.

	S: and R:

		In the Example sections for the commands below, the prefix
		"S: " is used to signify what the client sends to the BLAHP
		server.   The prefix "R: " is used to signify what the
		client receives from the BLAHP server.  Note that the "S: "
		or "R: " should not actually be sent or received.

2.2   BLAHP COMMAND STRUCTURE

         BLAHP commands consist of three parts:

	    * Request Line
	    * Return Line
	    * Result Line

	 Each of these "Lines" consists of a variable length character
         string ending with the character sequence <CRLF>.

	 A Request Line is a request from the client for action on the part of
	 the BLAHP server.  Each Request Line consists of a command code
	 followed by argument field(s).  Command codes are a string of
	 alphabetic characters.  Upper and lower case alphabetic characters
	 are to be treated identically with respect to command codes.  Thus,
	 any of the following may represent the blah_job_submit command:
            blah_job_submit
            Blah_Job_Submit
            blAh_joB_suBMit
            BLAH_JOB_SUBMIT
	In contrast, the argument fields of a Request Line are _case
	sensitive_.

	The Return Line is always generated by the server as an immediate
	response to a Request Line.  The first character of a Return Line is
	one of the following:
		S - for Success
		F - for Failure
		E - for a syntax or parse Error
	Any Request Line which contains an unrecognized or unsupported command,
	or a command with an insufficient number of arguments, will generate an
	"E" response.
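The rules above (case-insensitive command codes, case-sensitive arguments, "E" for unknown commands or too few arguments) can be sketched as follows. This is a minimal Python illustration, not BLAHPD's own code; the function name `parse_request` and the `MIN_ARGS` table are hypothetical, with argument counts read off the Request Line syntax in section 2.5. It deliberately ignores the escaped-space rule of section 2.3.

```python
from typing import Optional

# Hypothetical table: minimum number of arguments after the command code,
# taken from the Request Line syntax given for each command in section 2.5.
MIN_ARGS = {
    "BLAH_JOB_CANCEL": 2,         # reqid, job_local_id
    "BLAH_JOB_HOLD": 2,
    "BLAH_JOB_REFRESH_PROXY": 3,  # reqid, job_local_id, proxy_file
    "BLAH_JOB_RESUME": 2,
    "BLAH_JOB_STATUS": 2,
    "BLAH_JOB_SUBMIT": 2,         # reqid, submit classad
    "COMMANDS": 0,
    "QUIT": 0,
    "RESULTS": 0,
    "VERSION": 0,
}


def parse_request(line: str) -> Optional[tuple[str, list[str]]]:
    """Return (command, args), or None if the server should answer "E"."""
    # Simplification: split on every space; a real parser must keep
    # escaped spaces (backslash + space, section 2.3) inside their argument.
    fields = line.rstrip("\r\n").split(" ")
    command = fields[0].upper()  # command codes are case-insensitive
    args = fields[1:]            # argument fields keep their case
    if command not in MIN_ARGS or len(args) < MIN_ARGS[command]:
        return None
    return command, args
```

Because BLAH_JOB_SIGNAL is absent from the table, this sketch answers it with "E", matching the "currently unimplemented" note in section 2.0.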

	The Result Line is used to support commands that would otherwise
	block.  Any BLAHP command that may require the implementation to
	block on network communication requires a "request id" as part of
	the Request Line.  For such commands, the Return Line only
	communicates whether the request has been successfully parsed and
	queued for service by the BLAHP server.  At this point, the BLAHP
	server would typically dispatch a new thread to actually service the
	request.  Once the request has completed, the dispatched thread
	should create a Result Line and enqueue it until the client issues a
	RESULTS command.
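The dispatch pattern just described (answer "S" at once, service the request in a new thread, queue the Result Line until RESULTS) might look like this minimal Python sketch. `ResultQueue` and `handle_cancel` are hypothetical names, and the worker only simulates success instead of calling a real batch system; the thread is returned purely so a caller can wait on it.

```python
import threading
from collections import deque


class ResultQueue:
    """Hypothetical holder for Result Lines awaiting a RESULTS command."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lines: deque[str] = deque()

    def put(self, line: str) -> None:
        with self._lock:
            self._lines.append(line)

    def drain(self) -> list[str]:
        # Hand back queued lines in the order they were enqueued, then clear.
        with self._lock:
            lines = list(self._lines)
            self._lines.clear()
        return lines


def handle_cancel(reqid: str, job_local_id: str,
                  results: ResultQueue) -> tuple[str, threading.Thread]:
    """Queue a cancel request: return "S" now, do the work in a thread."""

    def work() -> None:
        # Placeholder for the real LRMS interaction for job_local_id;
        # this sketch simply reports success.
        results.put(f"{reqid} 0 No\\ error")

    worker = threading.Thread(target=work)
    worker.start()
    return "S", worker
```

The lock keeps `put` from worker threads and `drain` from the command loop from interleaving, and the deque preserves queueing order as RESULTS requires.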

2.3   TRANSPARENCY

	Arguments on a particular Line (be it Request, Return, or Result) are
	typically separated by a <SP>.  In the event that a string argument
	needs to contain a <SP> within the string itself, it may be escaped by
	placing a backslash ("\") in front of the <SP> character.  Thus, the
	character sequence "\ " (no quotes) must not be treated as a
	separator between arguments, but instead as a space character within a
	string argument.
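A minimal Python sketch of the escaping rule, assuming that only the two-character sequence backslash-space is special and every other backslash is kept literally. The helper names `escape_arg` and `split_line` are illustrative, not part of any BLAH library.

```python
def escape_arg(arg: str) -> str:
    """Escape each space in an argument as backslash + space."""
    return arg.replace(" ", "\\ ")


def split_line(line: str) -> list[str]:
    """Split a line on <SP>, treating backslash + space as a literal space."""
    line = line.rstrip("\r\n")  # drop the <CRLF> (or bare LF) terminator
    args: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == " ":
            current.append(" ")  # escaped space stays inside the argument
            i += 2
            continue
        if ch == " ":
            args.append("".join(current))  # unescaped space ends an argument
            current = []
        else:
            current.append(ch)
        i += 1
    args.append("".join(current))
    return args
```

For example, `split_line("2 0 No\\ error pbs/20051012/2957")` yields four arguments, the third being `"No error"`. Since `escape_arg` does not escape backslashes themselves, an argument ending in a backslash is ambiguous; the protocol text does not address that case.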

2.4   SEQUENCE OF EVENTS

	Upon startup, the BLAHP server should output to stdout a banner string
	which is identical to the output from the VERSION command without the
	beginning "S " sequence (see example below).  Next, the BLAHP server
	should wait for a complete Request Line from the client (e.g. stdin).
	The server is to take no action until a Request Line sequence is
	received.

	Example:

		R: $GahpVersion: x.y.z Feb 31 2004 INFN\ Blahpd $
		S: COMMANDS
		R: S BLAH_JOB_CANCEL BLAH_JOB_HOLD BLAH_JOB_REFRESH_PROXY BLAH_JOB_RESUME BLAH_JOB_STATUS BLAH_JOB_SUBMIT COMMANDS QUIT RESULTS VERSION
		S: VERSION
		R: S $GahpVersion: x.y.z Feb 31 2004 INFN\ Blahpd $
                (other commands)
		S: QUIT
		R: S
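The startup sequence (banner first, then wait for complete Request Lines) can be illustrated with a toy Python loop. It is a sketch under stated assumptions, not BLAHPD: it handles only COMMANDS, VERSION and QUIT, and the `BANNER` value (date and "Example\ Blahpd" description) is hypothetical.

```python
import io
from typing import TextIO

# Hypothetical banner; see VERSION in section 2.5 for the real format.
BANNER = "$GahpVersion: 1.0.0 Feb 28 2004 Example\\ Blahpd $"


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Toy BLAHP-style loop that handles only COMMANDS, VERSION and QUIT."""
    stdout.write(BANNER + "\n")  # 1. banner first, without the leading "S "
    for raw in stdin:             # 2. then act only on complete Request Lines
        command = raw.rstrip("\r\n").split(" ")[0].upper()
        if command == "COMMANDS":
            stdout.write("S COMMANDS QUIT VERSION\n")
        elif command == "VERSION":
            stdout.write("S " + BANNER + "\n")
        elif command == "QUIT":
            stdout.write("S\n")
            return                # 3. stop after acknowledging QUIT
        else:
            stdout.write("E\n")   # unrecognized or unsupported command


if __name__ == "__main__":
    session = io.StringIO("COMMANDS\nVERSION\nQUIT\n")
    out = io.StringIO()
    serve(session, out)
    print(out.getvalue(), end="")
```

A real server would pass `sys.stdin` and `sys.stdout` and flush after every line so the client is not left waiting on buffered output.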

2.5   COMMAND SYNTAX

	This section contains the syntax for the Request, Return, and Result
	line for each command.

	-----------------------------------------------

	COMMANDS

	List all the commands from this protocol specification which are
	implemented by this BLAHP server.

	+ Request Line:

		COMMANDS <CRLF>

	+ Return Line:

		S <SP> <command 1> <SP> <command 2> <SP> ... <command X> <CRLF>


	+ Result Line:

	  	None.
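A client can use the COMMANDS reply to check for optional or possibly unimplemented commands (such as ASYNC_MODE_ON or BLAH_JOB_SIGNAL) before sending them. A minimal Python sketch; the function name is hypothetical.

```python
def supported_commands(return_line: str) -> set[str]:
    """Parse a COMMANDS Return Line into a set of upper-case command codes."""
    fields = return_line.rstrip("\r\n").split(" ")
    if fields[0] != "S":
        raise ValueError(f"unexpected COMMANDS reply: {return_line!r}")
    # Command codes are case-insensitive, so normalize them.
    return {name.upper() for name in fields[1:] if name}
```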

	-----------------------------------------------

	VERSION

	Return the version string for this BLAHP.  The version string follows
	a specified format (see below).  Ideally, the entire version string,
	including the starting and ending dollar sign ($) delimiters, should
	be a literal string in the text of the BLAHP server executable.  This
	way, the Unix/RCS "ident" command can produce the version string.

	The version returned should correspond to the version of the
	protocol supported.

	+ Request Line:

		VERSION <CRLF>

	+ Return Line:

		S <SP> $GahpVersion: <SP> <major>.<minor>.<subminor> <SP>
		    <build-month> <SP> <build-day-of-month> <SP>
		    <build-year> <SP> <general-descrip> <SP>$ <CRLF>

		* major.minor.subminor = for this version of the
		    protocol, use version 1.0.0.

		* build-month = string with the month abbreviation when
		    this BLAHP server was built or released.  Permitted
		    values are: "Jan", "Feb", "Mar", "Apr", "May", "Jun",
		    "Jul", "Aug", "Sep", "Oct", "Nov", and "Dec".

		* build-day-of-month = day of the month when BLAHP server
		    was built or released; an integer between 1 and 31
		    inclusive.

		* build-year = four digit integer specifying the year in
		    which the BLAHP server was built or released.

		* general-descrip = a string identifying a particular
		    BLAHP server implementation.

	+ Result Line:

		None.

	+ Example:

		S: VERSION
		R: S $GahpVersion: x.y.z Feb 31 2004 INFN\ Blahpd $
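The version-string format above can be produced and checked with a short Python sketch. The helper names and the regular expression are illustrative assumptions; escaping spaces in general-descrip follows the "INFN\ Blahpd" example and the transparency rule of section 2.3.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_version(major: int, minor: int, subminor: int,
                   month: int, day: int, year: int, descrip: str) -> str:
    """Build the version string (the banner form, without the leading "S ")."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if not 1 <= day <= 31:
        raise ValueError("build-day-of-month must be between 1 and 31")
    descrip = descrip.replace(" ", "\\ ")  # spaces escaped as in section 2.3
    return (f"$GahpVersion: {major}.{minor}.{subminor} "
            f"{MONTHS[month - 1]} {day} {year} {descrip} $")


VERSION_RE = re.compile(
    r"^\$GahpVersion: (\d+)\.(\d+)\.(\d+) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"(\d{1,2}) (\d{4}) (.+) \$$"
)


def parse_version(banner: str) -> tuple[int, int, int]:
    """Extract (major, minor, subminor) from a banner or VERSION reply."""
    text = banner.rstrip("\r\n")
    if text.startswith("S "):
        text = text[2:]  # VERSION replies carry a leading "S "
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"malformed version string: {banner!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))
```

A client could compare the parsed tuple against `(1, 0, 0)`, the protocol version this section prescribes.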

	-----------------------------------------------

	QUIT

	Free any/all system resources (close all sockets, etc.) and terminate
	as quickly as possible.

	+ Request Line:

		QUIT <CRLF>

	+ Return Line:

		S <CRLF>

		Immediately afterwards, the command pipe should be closed
		and the BLAHP server should terminate.

	+ Result Line:

		None.

	-----------------------------------------------

	RESULTS

	Display all of the Result Lines which have been queued since the
	last RESULTS command was issued.  Upon success, the first return
	line specifies the number of subsequent Result Lines which will be
	displayed.  Then each result line appears (one per line) -- each
	starts with the request ID which corresponds to the request ID
	supplied when the corresponding command was submitted.  The exact
	format of the Result Line varies based upon which corresponding
	Request command was issued.

	IMPORTANT: Result Lines must be displayed in the _exact order_ in
	which they were queued!!!  In other words, the Result Lines
	displayed must be sorted in the order by which they were placed into
	the BLAHP's result line queue, from earliest to most recent.

	+ Request Line:

		RESULTS <CRLF>

	+ Return Line(s):

		S <SP> <num-of-subsequent-result-lines> <CRLF>
		<reqid> <SP> ... <CRLF>
		<reqid> <SP> ... <CRLF>
		...

		* reqid = integer Request ID, set to the value specified in
		    the corresponding Request Line.

	+ Result Line:

		None.

	+ Example:

		S: RESULTS
		R: S 1
		R: 100 0
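On the client side, a RESULTS reply can be read as a header line followed by the announced number of Result Lines. This minimal Python sketch (hypothetical name `read_results`) keeps everything after the request ID unparsed, because the Result Line format depends on which command produced it.

```python
from typing import Iterator


def read_results(lines: Iterator[str]) -> list[tuple[str, str]]:
    """Read 'S <n>' plus n Result Lines; return (reqid, rest) pairs in order."""
    header = next(lines).rstrip("\r\n").split(" ")
    if len(header) != 2 or header[0] != "S":
        raise ValueError(f"unexpected RESULTS header: {header!r}")
    results = []
    for _ in range(int(header[1])):
        reqid, _, rest = next(lines).rstrip("\r\n").partition(" ")
        results.append((reqid, rest))  # kept in the order they were queued
    return results
```

Matching each `reqid` back to the pending request is then a dictionary lookup in the client.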

	-----------------------------------------------

	ASYNC_MODE_ON

	Enable asynchronous notification when the BLAHP server has results
	pending for a client. This is most useful for clients that do not
	want to periodically poll the BLAHP server with a RESULTS command.
	When asynchronous notification mode is active, the BLAHP server will
	print an 'R' (without the quotes) in column one when the 'RESULTS'
	command would return one or more lines. The 'R' is printed only once
	between successive 'RESULTS' commands. The 'R' is also guaranteed to
	appear only between atomic return lines; the 'R' will not interrupt
	another command's output.

	If there are already pending results when asynchronous notification
	mode is activated, no indication of the presence of those results
	will be given. A BLAHP server is permitted to consider only additions
	to its result queue made after the ASYNC_MODE_ON command has
	successfully completed. Clients should therefore issue a 'RESULTS'
	command immediately after enabling asynchronous notification, to
	ensure that any results that may have been added to the queue during
	the processing of the ASYNC_MODE_ON command are accounted for.

	+ Request Line:

		ASYNC_MODE_ON <CRLF>

	+ Return Line:

		S <CRLF>

		Immediately afterwards, the client should be prepared to
		handle an R <CRLF> appearing in the output of the BLAHP
		server.

	+ Result Line:

		None.

	+ Example:

		S: ASYNC_MODE_ON
		R: S
		S: BLAH_JOB_CANCEL 00001 123.bbq.mi.infn.it
		R: S
		S: BLAH_JOB_CANCEL 00002 124.bbq.mi.infn.it
		R: S
		R: R
		S: RESULTS
		R: S 2
		R: 00001 0
		R: 00002 0

	Note that you are NOT guaranteed that the 'R' will not appear
	between the dispatching of a command and the return line(s) of that
	command; the BLAHP server only guarantees that the 'R' will not
	interrupt an in-progress return. The following is also a legal
	example:

		S: ASYNC_MODE_ON
		R: S
		S: BLAH_JOB_CANCEL 00001 123.bbq.mi.infn.it
		R: S
		S: BLAH_JOB_CANCEL 00002 124.bbq.mi.infn.it
		R: R
		R: S
		S: RESULTS
		R: S 2
		R: 00001 0
		R: 00002 0

		(Note the reversal of the R and the S after BLAH_JOB_CANCEL 00002)
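Because an 'R' may arrive before the Return Line of the command just sent (as in the second example), a client reading Return Lines should skip and remember notifications. A minimal Python sketch with a hypothetical `AsyncReader` class; it relies on the text's guarantees that 'R' never interrupts a line and that no Return or Result Line consists of a lone 'R'.

```python
from typing import Iterator


class AsyncReader:
    """Hypothetical client-side reader for a server in ASYNC_MODE_ON."""

    def __init__(self, lines: Iterator[str]) -> None:
        self._lines = lines
        self.results_pending = False  # set when an 'R' has been seen

    def read_return_line(self) -> str:
        # Skip 'R' notifications, remembering that RESULTS has work to do.
        while True:
            line = next(self._lines).rstrip("\r\n")
            if line == "R":
                self.results_pending = True
            else:
                return line
```

After seeing `results_pending` set, the client would issue RESULTS and clear the flag; a client that is idle must still watch the stream for a standalone 'R'.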

	-----------------------------------------------

	ASYNC_MODE_OFF

	Disable asynchronous results-available notification. In this mode,
	the only way to discover available results is to poll with the
	RESULTS command.  This mode is the default. Asynchronous mode can be
	enabled with the ASYNC_MODE_ON command.

	+ Request Line:

		ASYNC_MODE_OFF <CRLF>

	+ Return Line:

		S <CRLF>

	+ Result Line:

		None.

	+ Example:

		S: ASYNC_MODE_OFF
		R: S


	-----------------------------------------------

	BLAH_JOB_SUBMIT

	Submit a job request to a specified queue (specified in the submit
	classad). This causes the job to be submitted to the batch system.

	+ Request Line:

		BLAH_JOB_SUBMIT <SP> <reqid> <SP> <submit classad> <CRLF>

		* reqid = non-zero integer Request ID

		* submit classad = valid submit description for the job,
		    in string representation. See section 3.0 for a
		    description of the format.
		    The supported attributes are listed below, each with a
		    brief description:

			"Cmd":	Full path of the executable in the local
				filesystem.
			"Args":	List of individual arguments for the
				executable (the arguments are passed as
				separate arguments, not split according to
				'/bin/sh' conventions).
			"In":	Full path in the local filesystem where
				the standard input for the executable is found.
			"Out":	Full path in the local filesystem where
				the standard output of the executable
				will be stored (at job completion).
			"Err":	Full path in the local filesystem where
				the standard error of the executable
				will be stored (at job completion).
			"X509UserProxy":
				Full path where the proxy certificate
				is stored.
			"Env":	Semicolon-separated list of environment
				variables of the form:
				<parameter> = <value>
			"Stagecmd":
				Whether the executable of the job must
				be copied to the worker node: either
				TRUE or FALSE.
			"Queue":
				Queue in the local batch system where
				the job must be enqueued.
			"GridType":
				String indicating the underlying local batch
				system (currently "pbs" and "lsf" are
				supported).

	+ Return Line:

		<result> <CRLF>

		* result = the character "S" (no quotes) for successful
		    submission of the request (meaning that the request is
		    now pending), or an "E" for error on the
		    parse of the request or its arguments (e.g. an
		    unrecognized or unsupported command, or for missing or
		    malformed arguments).

	+ Result Lines:

		<reqid> <SP> <result-code> <SP> <error-string> <SP>
			<job_local_id> <CRLF>

		* reqid = integer Request ID, set to the value specified in
		    the corresponding Request Line.

		* result-code = integer equal to 0 on success, or an error code

		* error-string = description of error

		* job_local_id = on success, a string representing a unique
		    identifier for the job.  This identifier must not be bound
		    to this BLAHP server, but instead must be usable by
		    subsequent BLAHP server instantiations.  For instance, the
		    job_local_id must be implemented in such a fashion that the
		    following sequence of events by the caller is permissible:
			a) issue a BLAH_JOB_SUBMIT command
			b) read the job_local_id in the result line
			c) store the job_local_id persistently
			d) subsequently kill and restart the BLAHP server
			   process
			e) issue a BLAH_JOB_CANCEL command, passing it the
			   stored job_local_id value obtained in step (b).
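Steps (c) and (e) above happen in the caller. As an illustration only, a client might persist job_local_id values in a JSON file so they survive a restart; the file layout and the helper names `save_job_ids` / `load_job_ids` are assumptions, not part of BLAH.

```python
import json
import os
import tempfile


def save_job_ids(path: str, job_ids: dict[str, str]) -> None:
    """Persist job_local_id values: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(job_ids, handle)
    os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)


def load_job_ids(path: str) -> dict[str, str]:
    """Reload job_local_id values after the client or BLAHP server restarts."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

Writing to a temporary file and renaming avoids leaving a half-written file if the client itself is killed during step (c).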

	+ Example (the Request Line is wrapped here for readability; it is
	  sent as a single line):

		S: BLAH_JOB_SUBMIT 2 [\ Cmd\ =\ "/usr/bin/test.sh";\ Args\ =\ "'X=3:Y=2'";
		   \ Env\ =\ "VAR1=56568";\ In\ =\ "/dev/null";\ Out\ =\ "/home/StdOutput";
		   \ Err\ =\ "/home/error";\ x509userproxy\ =\ "/home/123.proxy";
		   \ Stagecmd\ =\ TRUE;\ Queue\ =\ "short";\ GridType\ =\ "pbs";\ ]
		R: S
		S: RESULTS
		R: S 1
		R: 2 0 No\ error pbs/20051012/2957
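The Request Line in the example can be generated from a plain attribute mapping. This Python sketch is hypothetical (`classad_literal` and `submit_request` are illustrative names): it renders strings in double quotes and booleans as TRUE/FALSE, refuses strings containing double quotes rather than guessing at ClassAd escaping, and escapes every space so the classad travels as one argument.

```python
def classad_literal(value: object) -> str:
    """Render a str or bool as a ClassAd literal (other types unsupported)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("this sketch does not escape embedded quotes")
        return f'"{value}"'
    raise TypeError(f"unsupported attribute value: {value!r}")


def submit_request(reqid: int, attrs: dict[str, object]) -> str:
    """Build a BLAH_JOB_SUBMIT Request Line, without the trailing <CRLF>."""
    if reqid == 0:
        raise ValueError("reqid must be a non-zero integer")
    body = "".join(f" {name} = {classad_literal(value)};"
                   for name, value in attrs.items())
    classad = f"[{body} ]"
    # Escape every space so the whole classad is a single argument.
    return f"BLAH_JOB_SUBMIT {reqid} " + classad.replace(" ", "\\ ")
```

With `{"Cmd": "/usr/bin/test.sh", "x509userproxy": "/home/123.proxy", "GridType": "pbs"}` and reqid 2, the classad part matches the minimal submit classad shown in section 3.0.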

	-----------------------------------------------

	BLAH_JOB_CANCEL

	This function removes an IDLE job request, or kills all processes
	associated with a RUNNING job, releasing any associated resources.

	+ Request Line:

		BLAH_JOB_CANCEL <SP> <reqid> <SP> <job_local_id> <CRLF>

		* reqid = non-zero integer Request ID

		* job_local_id = job_local_id (as returned from
		    BLAH_JOB_SUBMIT) of the job to be canceled.

	+ Return Line:

		<result> <CRLF>

		* result = the character "S" (no quotes) for successful
		    submission of the request (meaning that the request is
		    now pending), or an "E" for error on the
		    parse of the request or its arguments (e.g. an
		    unrecognized or unsupported command, or for missing or
		    malformed arguments).

	+ Result Line:

		<reqid> <SP> <result-code> <SP> <error-string> <CRLF>

		* reqid = integer Request ID, set to the value specified in
		    the corresponding Request Line.

		* result-code = integer equal to 0 on success, or an error code

		* error-string = description of error

	+ Example:
		S: BLAH_JOB_CANCEL 1 pbs/20051012/2957.grid001.mi.infn.it
		R: S
		R: R
		S: RESULTS
		R: S 1
		R: 1 0 No\ error
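BLAH_JOB_CANCEL, BLAH_JOB_HOLD, BLAH_JOB_RESUME and BLAH_JOB_REFRESH_PROXY share the `<reqid> <result-code> <error-string>` Result Line. A minimal Python parser sketch (hypothetical names) splits on unescaped spaces and then unescapes; it also tolerates a missing error-string, since the ASYNC_MODE_ON examples show Result Lines such as `00001 0`.

```python
import re
from typing import NamedTuple

# A space not preceded by a backslash separates arguments (section 2.3).
UNESCAPED_SPACE = re.compile(r"(?<!\\) ")


class SimpleResult(NamedTuple):
    reqid: str
    result_code: int
    error_string: str


def parse_simple_result(line: str) -> SimpleResult:
    """Parse '<reqid> <result-code> <error-string>' into its fields."""
    fields = [f.replace("\\ ", " ")
              for f in UNESCAPED_SPACE.split(line.rstrip("\r\n"))]
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected Result Line: {line!r}")
    # Tolerate a missing error-string, as in the ASYNC_MODE_ON examples.
    error = fields[2] if len(fields) == 3 else ""
    return SimpleResult(fields[0], int(fields[1]), error)
```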

	-----------------------------------------------

	BLAH_JOB_STATUS

	Query and report the current status of a submitted job.

	+ Request Line:

		BLAH_JOB_STATUS <SP> <reqid> <SP> <job_local_id> <CRLF>

		* reqid = non-zero integer Request ID

		* job_local_id = job_local_id (as returned from
		    BLAH_JOB_SUBMIT) of the job whose status is desired.

	+ Return Line:

		<result> <CRLF>

		* result = the character "S" (no quotes) for successful
		    submission of the request (meaning that the request is
		    now pending), or an "E" for error on the
		    parse of the request or its arguments (e.g. an
		    unrecognized or unsupported command, or for missing or
		    malformed arguments).

	+ Result Line:

		<reqid> <SP> <result-code> <SP> <error-string> <SP>
			<job_status> <SP> <result-classad> <CRLF>

		* reqid = integer Request ID, set to the value specified in
		    the corresponding Request Line.

		* result-code = integer equal to 0 on success, or an error code

		* error-string = description of error

		* job_status = if the result-code is 0 (success), then
		    job_status is set to an integer based upon the status of
		    the job as follows:
			1 IDLE (job is waiting in the batch system queue)
			2 RUNNING (job is executing on a worker node)
			3 REMOVED (job was successfully cancelled)
			4 COMPLETED (job completed its execution on the batch
			  system)
			5 HELD (job execution is suspended; job is still in
			  the batch system queue)

		* result-classad = Aggregate information about the job status.
		    At least the following attributes are defined:

		    *********************************************
			TO BE DEFINED
		    *********************************************

	+ Example:
		S: BLAH_JOB_STATUS 1 pbs/20051012/2958.grid001.mi.infn.it
		R: S
		R: R
		S: RESULTS
		R: S 1
		R: 1 0 No\ Error 2 [\ BatchjobId\ =\ "2958.grid001.mi.infn.it";
		   \ JobStatus\ =\ 2;\ WorkerNode\ =\ "\ grid001.mi.infn.it"\ ]
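A BLAH_JOB_STATUS Result Line can be decoded by splitting on unescaped spaces at most four times, so the escaped result-classad stays in one field. This Python sketch uses hypothetical names; the `JobStatus` codes are the ones listed above, and the classad is returned as a string because its attribute set is still to be defined.

```python
import re
from enum import IntEnum
from typing import Optional

# A space not preceded by a backslash separates arguments (section 2.3).
UNESCAPED_SPACE = re.compile(r"(?<!\\) ")


class JobStatus(IntEnum):
    """job_status codes defined for BLAH_JOB_STATUS."""
    IDLE = 1
    RUNNING = 2
    REMOVED = 3
    COMPLETED = 4
    HELD = 5


def parse_status_result(
    line: str,
) -> tuple[str, int, str, Optional[JobStatus], Optional[str]]:
    """Return (reqid, result_code, error_string, job_status, classad).

    job_status and classad are None when result_code is non-zero.
    """
    # maxsplit=4 keeps everything after job_status in a single field.
    fields = UNESCAPED_SPACE.split(line.rstrip("\r\n"), maxsplit=4)
    fields = [f.replace("\\ ", " ") for f in fields]
    if len(fields) < 3:
        raise ValueError(f"unexpected Result Line: {line!r}")
    reqid, code, error = fields[0], int(fields[1]), fields[2]
    if code != 0:
        return reqid, code, error, None, None
    if len(fields) != 5:
        raise ValueError(f"missing job_status or classad: {line!r}")
    return reqid, code, error, JobStatus(int(fields[3])), fields[4]
```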


	-----------------------------------------------


        BLAH_JOB_REFRESH_PROXY

        Renew the proxy of an already submitted job. The job has to be in IDLE,
	RUNNING or HELD status.

        + Request Line:

                BLAH_JOB_REFRESH_PROXY <SP> <reqid> <SP> <job_local_id> <SP>
                        <proxy_file> <CRLF>

                * reqid = non-zero integer Request ID

                * job_local_id = job_local_id (as returned from
                    BLAH_JOB_SUBMIT) of the job whose proxy has to be
                    renewed.

                * proxy_file = path to the fresh proxy file.

        + Return Line:

                <result> <CRLF>

                * result = the character "S" (no quotes) for successful
                    submission of the request (meaning that the request is
                    now pending), or an "E" for error on the
                    parse of the request or its arguments (e.g. an
                    unrecognized or unsupported command, or for missing or
                    malformed arguments).

        + Result Line:

                <reqid> <SP> <result-code> <SP> <error-string> <CRLF>

                * reqid = integer Request ID, set to the value specified in
                    the corresponding Request Line.

                * result-code = integer equal to 0 on success, or an error code

                * error-string = description of error

        + Example:
                S: BLAH_JOB_REFRESH_PROXY 1 pbs/20051012/2957.grid001.mi.infn.it 123.proxy
                R: S
                R: R
                S: RESULTS
                R: S 1
                R: 1 0 No\ Error


        -----------------------------------------------

        BLAH_JOB_HOLD

        This function always puts an IDLE job request into HELD status. If
        the job is already RUNNING, it can be held too, depending on whether
        the underlying batch system supports this feature.

        + Request Line:

                BLAH_JOB_HOLD <SP> <reqid> <SP> <job_local_id> <CRLF>

                * reqid = non-zero integer Request ID

                * job_local_id = job_local_id (as returned from
                    BLAH_JOB_SUBMIT) of the job to be held.

        + Return Line:

                <result> <CRLF>

                * result = the character "S" (no quotes) for successful
                    submission of the request (meaning that the request is
                    now pending), or an "E" for error on the
                    parse of the request or its arguments (e.g. an
                    unrecognized or unsupported command, or for missing or
                    malformed arguments).

        + Result Line:

                <reqid> <SP> <result-code> <SP> <error-string> <CRLF>

                * reqid = integer Request ID, set to the value specified in
                    the corresponding Request Line.

                * result-code = integer equal to 0 on success, or an error code

                * error-string = description of error

        + Example:
                S: BLAH_JOB_HOLD 1 pbs/20051012/2957.grid001.mi.infn.it
                R: S
                R: R
                S: RESULTS
                R: S 1
                R: 1 0 No\ error


        -----------------------------------------------

        BLAH_JOB_RESUME

        This function returns a HELD job request to the status it had
        before it was held.

        + Request Line:

                BLAH_JOB_RESUME <SP> <reqid> <SP> <job_local_id> <CRLF>

                * reqid = non-zero integer Request ID

                * job_local_id = job_local_id (as returned from
                    BLAH_JOB_SUBMIT) of the job to be resumed.

        + Return Line:

                <result> <CRLF>

                * result = the character "S" (no quotes) for successful
                    submission of the request (meaning that the request is
                    now pending), or an "E" for error on the
                    parse of the request or its arguments (e.g. an
                    unrecognized or unsupported command, or for missing or
                    malformed arguments).

        + Result Line:

                <reqid> <SP> <result-code> <SP> <error-string> <CRLF>

                * reqid = integer Request ID, set to the value specified in
                    the corresponding Request Line.

                * result-code = integer equal to 0 on success, or an error code

                * error-string = description of error

        + Example:
                S: BLAH_JOB_RESUME 1 pbs/20051012/2957.grid001.mi.infn.it
                R: S
                R: R
                S: RESULTS
                R: S 1
                R: 1 0 No\ error

        -----------------------------------------------

/* UNIMPLEMENTED * UNIMPLEMENTED * UNIMPLEMENTED */
	BLAH_JOB_SIGNAL

	Send a signal (if possible) to a specified job. The job has to be
	in RUNNING status.

	+ Request Line:

		BLAH_JOB_SIGNAL <SP> <reqid> <SP> <job_local_id> <SP>
                        <signal> <CRLF>

		* reqid = non-zero integer Request ID

		* job_local_id = job_local_id (as returned from
		    BLAH_JOB_SUBMIT) of the job to be signaled.

		* signal = an integer with the signal to send

	+ Return Line:

		<result> <CRLF>

		* result = the character "S" (no quotes) for successful
		    submission of the request (meaning that the request is
		    now pending), or an "E" for error on the
		    parse of the request or its arguments (e.g. an
		    unrecognized or unsupported command, or for missing or
		    malformed arguments).

	+ Result Line:

		<reqid> <SP> <result-code> <SP> <error-string> <SP>
			<job_status> <CRLF>

		* reqid = integer Request ID, set to the value specified in
		    the corresponding Request Line.

		* result-code = integer equal to 0 on success, or an error code

		* error-string = description of error

		* job_status = if the result-code is 0 (success), then
		    job_status is set to an integer based upon the status of
		    the job as follows (compare BLAH_JOB_STATUS above):

			1 IDLE
			2 RUNNING
			3 REMOVED
			4 COMPLETED
			5 HELD

/* UNIMPLEMENTED * UNIMPLEMENTED * UNIMPLEMENTED */

3.0	SUBMIT CLASSAD DESCRIPTION

	As described in BLAH_JOB_SUBMIT, the submit classad is a valid submit
	description for the job, supporting various optional and mandatory
	attributes. A valid submit classad has the following format:

	[\ Attr1\ =\ "value1";\ Attr2\ =\ "value2";\ ...;\ AttrN\ =\ "valueN";\ ]

	where Attr1 ... AttrN are a variable number of the attributes
	described under BLAH_JOB_SUBMIT in section 2.5, and value1 ... valueN
	are their values.
	Three attributes are mandatory:

	- "Cmd", "X509UserProxy", "GridType".

	The other attributes are optional:

	- "Args", "In", "Out", "Err", "Env", "Stagecmd", "Queue".

	Previous BLAHPD versions did not support the X509UserProxy attribute,
	and the path to the user proxy was set from an environment variable.
	This attribute is now mandatory, and setting an X509_USER_PROXY
	environment variable has no effect on BLAHPD submissions.

	+ Example of a minimal submit classad:

	[\ Cmd\ =\ "/usr/bin/test.sh";\ x509userproxy\ =\ "/home/123.proxy";\ GridType\ =\ "pbs";\ ]
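Before submitting, a client might check that the mandatory attributes are present and that GridType names a supported batch system. This is a hypothetical Python sketch; since the example above spells X509UserProxy as `x509userproxy`, it compares attribute names case-insensitively (an assumption consistent with that example).

```python
MANDATORY = ("Cmd", "X509UserProxy", "GridType")
SUPPORTED_GRIDTYPES = ("pbs", "lsf")  # "currently pbs and lsf supported"


def check_submit_attrs(attrs: dict[str, object]) -> list[str]:
    """Return a list of problems in a submit description (empty if none)."""
    names = {name.lower() for name in attrs}
    problems = [f"missing mandatory attribute {m}"
                for m in MANDATORY if m.lower() not in names]
    # Look up GridType regardless of how its name is capitalized.
    gridtype = next((v for n, v in attrs.items()
                     if n.lower() == "gridtype"), None)
    if gridtype is not None and gridtype not in SUPPORTED_GRIDTYPES:
        problems.append(f"unsupported GridType {gridtype!r}")
    return problems
```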

** THIRD DRAFT ** THIRD DRAFT ** THIRD DRAFT ** THIRD DRAFT ** THIRD DRAFT **


Last Modified: Monday, 28-Jan-2008 17:35:21 CET