1: \documentclass{cpeauth}
2:
3: \begin{document}
4: \def\cop{Copyright \copyright\ 2000 John Wiley \&\ Sons, Ltd.}
5:
6: \CPE{1}{7}{00}{00}{2000}
\runningheads{K. Keahey et al.} {Fine-Grained Authorization in the Grid}
8:
\title{Fine-Grained Authorization for Job Execution in the Grid: Design and Implementation}
10:
11: \author{K.~Keahey\footnotemark[2], V.~Welch\footnotemark[3], S.~Lang\footnotemark[2], B.~Liu\footnotemark[4] and
12: S.~Meder\footnotemark[3]}
13:
14:
15: \longaddress{Katarzyna Keahey, Argonne National Laboratory, Mathematics
16: and Computer Science Division, 9700 S. Cass Ave., Argonne, IL 60439}
17:
18:
19: \corraddr{Katarzyna Keahey, Argonne National Laboratory, Mathematics
20: and Computer Science Division, 9700 S. Cass Ave., Argonne, IL 60439}
21:
22: \footnotetext[2]{Argonne National Laboratory, Argonne, IL, USA}
23: \footnotetext[3]{University of Chicago, Chicago, IL, USA}
24: \footnotetext[4]{University of Houston, Houston, TX, USA}
25:
26: \cgsn{This work was supported by the Mathematical,
27: Information, and Computational Sciences Division subprogram of the
28: Office of Advanced Scientific Computing Research, Office of Science,
29: SciDAC Program, U.S. Department of Energy, under Contract}{W-31-109-ENG-38.}
30:
31: \received{1 August 2003}
32: \revised{1 August 2003}
33: \noaccepted{}
34:
35: \begin{abstract}
36: In this paper we describe our work on enabling fine-grained
37: authorization for resource usage and management. We address the need
of virtual organizations to enforce their own policies in addition to
39: those of the resource owners, in regard to both resource consumption
40: and job management. To implement this design, we propose changes and
41: extensions to the Globus Toolkit's version 2 resource management
42: mechanism. We describe the prototype and the policy language that we
43: designed to express fine-grained policies, and we present an analysis
44: of our solution.~\cop
45: \end{abstract}
46:
47: \keywords{Grids, Authorization, Policy Enforcement, Resource Management}
48:
49: \section{INTRODUCTION}
50:
51: As computational Grids [1] become more widespread, both the resource
52: pool and the pool of users wishing to use those resources become large
53: and tend to change dynamically. In such an environment, the
54: traditional mode of resource sharing, requiring Grid users to
55: establish direct relationships with resources they wish to use
(i.e., in the form of user accounts), becomes unmanageably complex. We
57: therefore observe a trend toward defining virtual organizations (VOs)
58: [1] allowing users to collaborate across different administrative
59: domains. Credentials issued by such organizations, used in conjunction
60: with resource provider policies, become the basis of sharing in
61: Grids. In this model, resource providers typically outsource some
62: subset of their policy administration to the VO. This strategy allows
63: the VO to coordinate policy across resources in different domains
64: forming a consistent policy environment in which its participants can
65: operate. Such an environment requires mechanisms for enabling the VO
66: to specify and enforce VO-specific policies on tasks and resources owned by VO
67: participants.
68:
Another developing trend is the need to express and enforce fine-grained
70: policies on the usage of resources and services. These can no longer
71: be expressed by simple access control; resource owners and VO
72: administrators may want to specify exactly what fractions or
73: configurations of resource may be used by a given entity. In addition,
74: while some VOs are focused on sharing of hardware resources (e.g.,
75: CPUs and storage), for others the primary motivation is to coordinate
76: sharing of application services [2] requiring access to both software
77: and hardware. In these cases the VO members should not be running
78: arbitrary code but only applications sanctioned by VO policy. Such
79: policies may be dynamic, adapting over time or even changing during
80: application execution, depending on factors such as past and current
resource utilization record, a member's role in the VO, and deadline-based
priorities.
83:
84: In this paper, we address the requirements posed by these two
85: trends. We present a design for service and resource management that
86: enables a VO and resource managers to specify fine-grained service and
87: resource usage policies using VO credentials and allows resources to
88: enforce those policies. We implement our design as extensions to the
89: Globus Toolkit version 2 (GT2) resource management mechanism [3]. We
90: then consider policy enforcement in the context of two types of policy
91: target: application services and traditional computing resources. A
92: prototype of this implementation, combined with the Akenti
93: authorization system [4], was demonstrated at the SC02 conference and
94: is currently being adopted by the National Fusion Collaboratory [2].
95:
96: This paper is organized as follows. In Section 2, we present a use
97: case scenario and concrete requirements guiding our design. In Section
98: 3 we define our problem. We follow this by a discussion of the
99: capabilities of the Globus Toolkit's resource management (GRAM) [3]
100: mechanism (Section 4) and describe extensions needed to GRAM to support our
101: architecture (Section 5). In the last three sections, we analyze our solution, present future
102: directions, and conclude the paper.
103:
104:
105: \section{USE CASE SCENARIO AND REQUIREMENTS}
106:
107:
108: In a typical VO scenario, a resource provider has reached an agreement
109: with a VO to allow the VO to use some resource allocation. The
110: resource provider thinks of the allocation in a coarse-grained manner:
111: the provider is concerned about how many resources the VO can use as a
whole, not about how the allocation is used inside the VO.
113:
114: The finer-grained specification of resource usage among the VO
115: participants is the responsibility of the VO. For example, the VO has
116: two primary classifications of its members:
117:
118: \begin{enumerate}
119:
120: \item [$\bullet$] One group is developing, installing, and debugging the
121: application services used by the VO to perform a scientific
122: computation. This group may need to run many types of processes
(e.g., compilers, debuggers, application services) in order to debug
124: and deploy the VO application services, but should be consuming small
125: amounts of traditional computing resources (e.g., CPU, disk and
126: bandwidth) in doing so.
127:
128: \item [$\bullet$] The second group performs analysis using the
129: application services. This group may need to consume large
130: amounts of resources in order to run simulations related to their research.
131: \end{enumerate}
132:
133: Thus, the VO may wish to specify finer-grained policies that allow certain
134: users to use more or fewer resources than other users. These policies may
135: be dynamic and change at any point (for example, during runtime of an
136: application).
137:
138: In addition to policy on resource utilization, the VO wishes to be
139: able to manage jobs running on VO resources. For example, users often
have long-running computational jobs using VO resources, while the VO
141: often has short-notice high-priority jobs that require immediate
142: access to resources. This mode of operation requires suspending
143: existing jobs to free up resources, something that normally only the
144: user that submitted the job has the right to do. Since going through
145: the user who submitted the original job may not always be an option,
146: the VO wants to give a group of its members the ability to manage any
147: jobs using VO resources so they can instantiate high-priority jobs on
148: short notice.
149:
150: Supporting this scenario places several requirements on the authorization policy system:
151:
152: \begin{enumerate}
153:
154: \item {\em Combining policies from different sources}. In outsourcing
155: a portion of the policy administration to the VO, the policy enforcement
156: mechanism on the resource needs to be able to combine policies from
157: two different sources: the resource owner and the VO.
158:
159: \item {\em Fine-grained control of how resources are used}. For the VO
160: to express the differences between how its user groups are allowed to
161: use resources, the VO needs to be able to express policies regarding a
162: variety of aspects of resource usage, not just grant access.
163:
164: \item {\em VO-wide management of jobs and resource allocations}. The
165: VO wants to be able to treat jobs as resources themselves that can be
166: managed. This requirement poses a particular challenge because jobs are dynamic, so
167: static methods of policy management are not effective. Users may also
168: start jobs that shouldn't be under the domain of the VO; for example, a user
169: may have allocations on a resource other than those obtained through
170: the VO, and jobs invoked under this alternate allocation should not be
171: subject to VO policy.
172:
173: \item {\em Fine-grained, dynamic enforcement mechanisms}. In order to
174: support any policies, there must be enforcement mechanisms capable of
175: implementing these policies. Most resources today are capable of
176: policy enforcement at the user level: that is, all jobs run by a given
177: user will have the same policy applied to them. These mechanisms are
178: typically statically configured through file permissions, quotas and
179: similar mechanisms. Our scenario brings out the requirement that
180: enforcement mechanisms need to handle dynamic, fine-grained policies.
181: \end{enumerate}
182:
183: \section{INTERACTION MODEL}
184:
185: To support the scenario described in the preceding section, we need to
186: provide resource management mechanisms that allow the specification
187: and consistent enforcement of authorization and usage policies that
188: come from both the VO and the resource owner. In addition to allowing
189: the VO to specify policies on standard computational resources, such as
190: processor time and storage, we need to allow the VO to specify
191: policies on application services that it deploys, as well as
192: long-running computational jobs instantiated by VO members.
193:
194: In our work we assume the following interaction model:
195: \begin{enumerate}
\item A user submits a request to initiate a job; the request consists of the job's description. The request is accompanied by the user's Grid credentials, which may include the user's personal credentials as well as VO-issued credentials.
197: \item This request is evaluated against both local and VO policies by different policy evaluation points (PEPs), capable of interpreting the VO and the resource management policy respectively, located in the resource management facilities.
198: \item If the request is authorized by both PEPs, it is mapped to a set of local resource credentials (e.g., a Unix user account). Policy enforcement is carried out by local enforcement mechanisms operating based on local credentials.
199: \item During the job execution, a VO user may make management requests to the job (e.g., request information, suspend or resume a job, cancel a job).
200: \end{enumerate}
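
The first three steps of this interaction model can be summarized in a short sketch. The Python fragment below is illustrative only: the names {\tt handle\_request}, {\tt local\_pep}, {\tt vo\_pep}, and {\tt ACCOUNTS}, as well as the two toy policies, are hypothetical and do not correspond to GT2 interfaces. It assumes that each PEP can be modelled as a function returning a yes/no decision.

```python
from typing import Callable, Dict, Optional

# A PEP is modelled as a callable: (user identity, action, job description) -> bool.
Pep = Callable[[str, str, Dict[str, str]], bool]


def handle_request(user_dn: str, action: str, job: Dict[str, str],
                   local_pep: Pep, vo_pep: Pep,
                   account_map: Dict[str, str]) -> Optional[str]:
    """Return the local account to use, or None if the request is denied
    or no local mapping exists for the user."""
    # Step 2: the request must be authorized by BOTH policy sources.
    if not (local_pep(user_dn, action, job) and vo_pep(user_dn, action, job)):
        return None
    # Step 3: map the Grid identity to a local credential (e.g., a Unix account).
    return account_map.get(user_dn)


def local_pep(user_dn: str, action: str, job: Dict[str, str]) -> bool:
    """Hypothetical resource-owner policy: at most 8 processors per job."""
    return int(job.get("count", "1")) <= 8


def vo_pep(user_dn: str, action: str, job: Dict[str, str]) -> bool:
    """Hypothetical VO policy: only sanctioned executables may be started."""
    return job.get("executable") in {"test1", "TRANSP"}


# Hypothetical mapping from Grid identities to local accounts.
ACCOUNTS = {"/O=Grid/CN=Alice": "grid001"}
```

A request is rejected as soon as either policy source refuses it, which mirrors the requirement that the resource owner's policy and the VO's policy be combined rather than one overriding the other.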
201:
202: \section{GRID RESOURCE MANAGEMENT IN GT2}
203:
204: The Globus Toolkit provides mechanisms for security, data management
205: and movement, resource monitoring and discovery (MDS) and resource
206: acquisition and management. In this paper we are focusing on the
207: functionality of resource acquisition and management, which is
208: implemented by the GRAM (Grid Resource Acquisition and Management)
209: system [3].
210:
211: The GRAM system has two major software components: the Gatekeeper and
212: the Job Manager. The Gatekeeper is responsible for translating Grid
213: credentials to local credentials (e.g., mapping the user to a local
214: account based on their Grid credentials) and creating a Job Manager
215: Instance to handle the specific job invocation request. The Job
216: Manager Instance (JMI) is a Grid service that instantiates and then
217: provides for the ability to manage a job. Figure 1 shows the
218: interaction of these elements; in this section we explain their roles
219: and limitations.
220:
221: \subsection{Gatekeeper}
222:
223: The Gatekeeper is responsible for authenticating the requesting Grid
224: user, authorizing their job invocation request and determining the
225: account in which their job should be run. Authentication, performed by
226: the Grid Security Infrastructure [5], verifies the validity of the
227: presented Grid credentials, the user's possession of those
228: credentials, and the user's Grid identity as indicated by those
229: credentials. Authorization is based on the user's Grid identity and a
230: policy contained in a configuration file, the gridmapfile, which
231: serves as an access control list. Mapping from the Grid identity to a
232: local account is also done with the policy in the gridmapfile,
233: effectively translating the user's Grid credential into a local user
credential. Finally, the Gatekeeper starts up a Job Manager Instance,
235: executing with the user's local credential. This mode of operation
236: requires the user to have an account on the resource and implements
237: enforcement by privileges of the account.
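
To make the role of the gridmapfile concrete, the following Python sketch parses entries of the commonly used form, a quoted distinguished name followed by a local account name, and performs the lookup that serves as both access control and account mapping. The function names, the sample entries, and the account names are hypothetical; the Gatekeeper itself is not implemented this way.

```python
import shlex
from typing import Dict, Optional


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse lines of the form: "<distinguished name>" <local account>."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def map_identity(mapping: Dict[str, str], grid_dn: str) -> Optional[str]:
    """Return the local account for a Grid identity, or None (access denied)."""
    return mapping.get(grid_dn)


# Hypothetical entries; the account names are made up for illustration.
EXAMPLE = '''
# distinguished name                                 local account
"/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Bo Liu"          bliu
"/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Kate Keahey"     keahey
'''
```

Because a missing entry both denies access and leaves no account to run under, the sketch makes visible why authorization in this scheme reduces to whether a user has an account on the resource.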
238:
239:
240:
241: \begin{figure}
242: \centering\includegraphics{GRAM1.ps}
243: \caption{Interaction of the main components of GRAM}
244: \end{figure}
245:
246:
247: \subsection{Job Manager Instance}
248:
249: The JMI parses the user's request, including the job description, and
250: interfaces with the resource's job control system (e.g., LSF, PBS) to
251: initiate the user's job. During the job's execution the JMI monitors
252: its progress and handles job management requests (e.g., suspend,
253: stop, query) from the user. Since the JMI is run under the user's
254: local credential, as defined by the user's account, the operating
255: system and local job control system are able to enforce local policy
256: on the JMI and user job by the policy tied to that account.
257:
258: The JMI has no authorization on job startup since the Gatekeeper has
already authorized it. Once the job has been started, however, the JMI
260: accepts, authenticates, and authorizes management requests on the
261: job. In GT2, the authorization policy on these management requests is
262: static and simple: the Grid identity of the user making the request
263: must match the Grid identity of the user who initiated the job.
264:
265: \subsection{GRAM Shortcomings}
266:
267: The current GRAM architecture has a number of shortcomings when matched against the requirements we laid out in Section 2:
268:
269: \begin{enumerate}
270: \item Authorization of user job startup is coarse-grained. It is based solely on whether a user has an account on a resource.
271: \item Authorization on job management is coarse-grained and static. Only the user who initiated a job is allowed to manage it.
\item Enforcement is implemented chiefly through privileges tied to a statically configured local account (the JMI runs under the user's local credential) and is therefore useless for enforcing fine-grained policy or dynamic policy coming from sources external to the resource (such as a VO).
273: \item Local enforcement depends on the rights attached to the user's account, not the rights presented by the user with a specific request; in other words, the enforcement vehicle is largely accidental.
274: \item A local account must exist for a user; as described in the introduction, this creates an undue burden on system administrators and users alike. This burden prevents wide adoption of the network services model in large and dynamically changing communities.
275: \end{enumerate}
276:
These problems can be, and have been, alleviated in some measure by
278: clever setup. For example, the impact of (4) can be alleviated by
279: mapping a grid identity to several different local accounts with
280: different capabilities. Often, (5) is handled by working with
281: ``shared accounts'' (which, however, introduce many security, audit,
282: accounting and other problems) or by providing a limited
283: implementation of dynamic accounts [6,13,14].
284:
285:
286: \section{AUTHORIZATION AND ENFORCEMENT EXTENSIONS TO GRAM}
287:
288: In this section we describe extensions to the GT2 Grid Resource
289: Acquisition and Management (GRAM) that address the shortcomings
290: described above.
291:
292: \begin{figure}
293: \centering\includegraphics{newGRAM1.ps}
294: \caption{Changes to GRAM: the changed component (the Job Manager) has been highlighted in gray}
295: \end{figure}
296:
297: We extended the GRAM design to allow authorization callouts,
298: evaluating the user's job invocation and management requests in the
299: context of policies defined by the resource owner and VO. Our changes
300: to GRAM, prototyped using GT2, are illustrated in Figure 2. In our
301: prototype we experimented with policies written in plain text files
302: on the resource. These files included both local resource and VO
303: policies (in a real system the VO policies would be carried in the VO
304: credentials). This work has recently been tested with the Akenti [4]
305: system, representing the same policies as described here, and is being
306: adopted by the National Fusion Collaboratory [2]. In order to show
307: the generality of our approach, we also experimented with the
308: Community Authorization Service (CAS) [7]. Both of these systems
allow for multiple policy sources but have significant
310: differences, in terms of both architecture and programming APIs.
311:
312: \subsection{Policy Language}
313:
314: GRAM allows users to start and manage jobs by submitting requests
composed of an action (e.g., initiate, cancel, provide status, change
316: priority) and, in the case of job initiation, a job
317: description. The job description is formulated in terms of attributes
318: using the Resource Specification Language (RSL) [3]. RSL consists of
attribute-value pairs specifying job parameters referring to
320: executable description (executable name, directory where it is
321: located, etc.) and resource requirements (number of CPUs to be used,
322: maximum/minimum allowable memory, maximum time a job is allowed to
323: run, etc.).
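
As an illustration of this attribute-value structure, the sketch below splits a flat RSL conjunction into (attribute, operator, value) triples. It is a simplification written for this discussion, not the GRAM parser: it handles only a single conjunction of simple relations, and the name {\tt parse\_rsl} is hypothetical.

```python
import re
from typing import List, Tuple

# One relation of the form (attribute operator value); nesting is not handled.
_RELATION = re.compile(r"\(\s*(\w+)\s*(!=|<=|>=|=|<|>)\s*([^()]*?)\s*\)")


def parse_rsl(rsl: str) -> List[Tuple[str, str, str]]:
    """Split a flat RSL conjunction into (attribute, operator, value) triples."""
    body = rsl.strip()
    if body.startswith("&"):
        body = body[1:]  # drop the conjunction marker
    relations = [(m.group(1), m.group(2), m.group(3))
                 for m in _RELATION.finditer(body)]
    # Reject input that contains anything besides well-formed relations.
    if _RELATION.sub("", body).strip():
        raise ValueError("unsupported RSL: " + rsl)
    return relations
```

For instance, {\tt parse\_rsl("\&(executable = test1)(count<4)")} yields one triple for the executable name and one for the processor count.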
324:
325: We have designed a simple policy language that allows for policy
326: specification in terms of RSL. The policy assumes that unless a
327: specific stipulation has been made, an action will not be
328: allowed. Otherwise, a user, or a group of users, is related to a set
329: of assertions. The rules have the form of user (or group) identity
separated by a colon from a set of action-based assertions that follow
331: the RSL syntax.
332:
333: To express the rules, we extended the RSL set of attributes with the addition of the following:
334:
335: \begin{enumerate}
336: \item [$\bullet$] {\em Action.} This attribute represents what the
337: user wants to do with the job. Currently, it can take values of
338: ``start'', ``cancel'', ``information'', or ``signal'', where
339: ``signal'' describes a variety of job management actions such as
340: changing priority.
341:
342: \item [$\bullet$] {\em Jobowner.} The jobowner attribute denotes the
343: job initiator and can take values of the distinguished name of a
344: job initiator's Grid credential. It is used mainly to express VO-wide
345: management policy.
346:
347: \item [$\bullet$] {\em Jobtag.} The jobtag attribute has been
348: introduced in order to enable the specification of VO-wide job
349: management policies. A jobtag indicates the job membership in a group
350: of jobs for which policy can be defined. For example, a set of users
351: with an administrative role in the VO can be granted the right to
352: manage all jobs in a particular group. A policy may require a VO user
353: to submit a job with a specific jobtag, hence placing it into a group
354: that is manageable by another user (or group of users). At present,
355: jobtags are statically defined by a policy administrator.
356: \end{enumerate}
357:
358: We also added the following values to RSL:
359: \begin{enumerate}
\item [$\bullet$] ``NULL'' to denote an empty (absent) value

\item [$\bullet$] ``SELF'' to allow expression of the job initiator's identity in a policy.
363: \end{enumerate}
364:
365: These extensions allow the following types of assertions to be expressed in policy:
366: \begin{enumerate}
367:
368: \item [$\bullet$] The job request is permitted to contain a particular
369: attribute, value, or set of values. This extension allows one, for
370: example, to limit the maximum number of processors used or to restrict
371: the name of the executable to a specified set. Multiple assertions can
372: be made about the same attribute.
373:
374: \item [$\bullet$] The job request is required to contain a particular attribute, possibly with a particular value or set of values. For example, the job request must specify a jobtag attribute to allow its management by a VO-defined group of administrators.
375:
376: \item [$\bullet$] The job request is required not to contain a
377: particular attribute. For example, the job request must not specify a
378: particular queue, which is reserved for high-priority users.
379: \end{enumerate}
380:
381: Our extensions allow a policy not only to limit the usage of
382: traditional computational resources but also to dictate the
executables that users are allowed to invoke, allowing a VO to limit
384: resource consumption. Further, by introducing the notion of a jobtag,
385: we are able to express policies allowing users to manage jobs. The
386: example below illustrates how policy may be expressed.
387:
388: \begin{verbatim}
/O=Grid/O=Globus/OU=mcs.anl.gov:
&(action = start)(jobtag != NULL)
391:
392: /O=Grid/O=Globus/OU=mcs.anl.gov/CN= Bo Liu:
393: &(action = start)(executable = test1)(directory = /sandbox/test)(jobtag = ADS)(count<4)
394: &(action = start)(executable = test2)(directory = /sandbox/test)(jobtag = NFC)(count<4)
395:
/O=Grid/O=Globus/OU=mcs.anl.gov/CN= Kate Keahey:
397: &(action = start)(executable = TRANSP)(directory = /sandbox/test)(jobtag = NFC)
398: &(action=cancel)(jobtag=NFC)
399:
400: \end{verbatim}
401:
402: The first statement in the policy specifies a requirement for a group
403: of users whose Grid identities start with the string {\tt ``
404: /O=Grid/O=Globus/OU=mcs.anl.gov''}. The requirement is that for job
405: invocations (where the action is ``start''), the job description must
406: contain a jobtag attribute with some value. This allows us to later
407: write management policies referring to that jobtag.
408:
409: The second statement in the policy refers to a specific user, Bo Liu,
410: and states that she can start jobs only using the ``test1'' and
411: ``test2'' executables. The rules also place constraints on the
412: directory from which the executable can be taken and the jobtag they
413: can be started with. In addition, a constraint is placed on the number
414: of processors Bo Liu can use ($count < 4$).
415:
416: The third statement in the policy gives
417: user Kate Keahey the right to start jobs using the ``TRANSP''
418: executable from a specific directory and with a specific jobtag. It
419: also gives her the right to cancel all the jobs with jobtag ``NFC'',
420: for example, jobs based on the executable ``test1'' started by Bo Liu.
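
The following Python sketch shows one way the example policy could be evaluated. It is not the prototype's evaluator, and it rests on assumptions that the text above leaves open: (i) a statement applies to a user when the statement's identity is a prefix of the user's distinguished name; (ii) within a statement only the rules written for the requested action are considered, and at least one of them must hold; (iii) every applicable statement that has rules for the action must be satisfied; (iv) a constraint on an attribute missing from the request fails, except for comparisons with NULL. All names ({\tt POLICY}, {\tt holds}, {\tt authorize}) are hypothetical, and distinguished names are written without the space after {\tt CN=}.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, str]   # (attribute, operator, value)
Rule = List[Constraint]             # conjunction of constraints
Policy = Dict[str, List[Rule]]      # identity prefix -> alternative rules

MCS = "/O=Grid/O=Globus/OU=mcs.anl.gov"
POLICY: Policy = {
    MCS: [
        [("action", "=", "start"), ("jobtag", "!=", "NULL")],
    ],
    MCS + "/CN=Bo Liu": [
        [("action", "=", "start"), ("executable", "=", "test1"),
         ("directory", "=", "/sandbox/test"), ("jobtag", "=", "ADS"),
         ("count", "<", "4")],
        [("action", "=", "start"), ("executable", "=", "test2"),
         ("directory", "=", "/sandbox/test"), ("jobtag", "=", "NFC"),
         ("count", "<", "4")],
    ],
    MCS + "/CN=Kate Keahey": [
        [("action", "=", "start"), ("executable", "=", "TRANSP"),
         ("directory", "=", "/sandbox/test"), ("jobtag", "=", "NFC")],
        [("action", "=", "cancel"), ("jobtag", "=", "NFC")],
    ],
}


def holds(constraint: Constraint, request: Dict[str, str]) -> bool:
    """Check a single constraint against the attributes of a request."""
    attr, op, value = constraint
    actual = request.get(attr)  # None when the attribute is absent
    if value == "NULL":
        return (actual is None) if op == "=" else (actual is not None)
    if actual is None:
        return False            # assumption (iv): missing attribute fails
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "<":
        return int(actual) < int(value)
    raise ValueError("unsupported operator: " + op)


def authorize(user_dn: str, request: Dict[str, str],
              policy: Policy = POLICY) -> bool:
    """Return True only if some statement grants the action and none is violated."""
    granted = False
    for identity, rules in policy.items():
        if not user_dn.startswith(identity):
            continue            # statement does not apply to this user
        relevant = [r for r in rules
                    if ("action", "=", request["action"]) in r]
        if not relevant:
            continue            # statement is silent about this action
        if not any(all(holds(c, request) for c in r) for r in relevant):
            return False        # an applicable statement is violated
        granted = True
    return granted              # default deny
```

Under these assumptions, Bo Liu may start {\tt test1} with jobtag {\tt ADS} on fewer than four processors, a start request without a jobtag is refused by the group statement, and Kate Keahey may cancel a job tagged {\tt NFC} regardless of who started it.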
421:
422: \subsection{Enforcing Policies in GRAM}
423:
424: We enforce our policies in GRAM by creating a policy evaluation point
425: (PEP) controlling all external access to a resource via GRAM; an
action is authorized depending on the decision yielded by the PEP. Policy
427: can be enforced in GRAM at multiple PEPs corresponding to different
428: decision domains; for example, a PEP placed in the Gatekeeper can allow
429: or disallow access based on the user's Grid identity. Since our work
430: focuses on job and resource management, we established a PEP in the Job
431: Manager (JM). The JM parses user job descriptions and can therefore
432: evaluate policy that depends on the nature of the job request in
433: addition to the user's identity.
434:
435: Specifically, our additions consist of the following:
436:
437: \begin{enumerate}
438:
439: \item [$\bullet$] {\em An authorization callout API to integrate the PEP
440: with the JM}. The callout passes to the PEP authorization module the
441: relevant information, such as the credential of the user requesting a
442: remote job, the credential of the user who originally started the job,
443: the action to be performed (such as start or cancel a job), a unique
444: job identifier, and the job description expressed in RSL. The PEP
445: responds through the callout API with either success or an appropriate
446: authorization error. This call is made whenever an action needs to be
447: authorized, that is, before creating a job manager request and before
448: calls to cancel, query, and signal a running job.
449:
450: \item [$\bullet$] {\em Policy-based authorization for job
451: management}. As discussed in Section 4, each job management request
452: other than job startup is currently authorized by GRAM so that only
453: the user that started a job is allowed to manage it. We modified the
454: authorization in GRAM to enable Grid users other than the job
455: initiator to manage the job based on policy with decisions rendered
456: through the authorization callout API. In addition to changes to the
457: authorization model, this modification also required extensions to the
458: GRAM client allowing the client to process other identities than that
459: of the client (specifically, allowing it to recognize the identity of
460: the job originator).
461:
462: \item [$\bullet$] {\em RSL parameters}. We extended RSL to add the ``jobtag'' parameter allowing the user to submit a job to a specific job management group.
463:
464: \item [$\bullet$] {\em Errors}. We further extended the GRAM protocol to return authorization errors describing reasons for authorization denial as well as authorization system failures.
465: \end{enumerate}
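
The shape of these additions can be sketched as follows. The Python fragment is a hedged illustration rather than the GRAM code, which is written in C: the class and function names, the two exception types, and the {\tt VO\_ADMINS} set are all hypothetical. It shows a callout receiving the requester's identity, the job initiator's identity, the action, a job identifier, and the RSL, and it contrasts the original GT2 rule with a policy that lets a VO administrator manage tagged jobs.

```python
class AuthorizationDenied(Exception):
    """The policy was evaluated and the request is not permitted."""


class AuthorizationSystemFailure(Exception):
    """The authorization system itself failed (e.g., policy unavailable)."""


def gt2_default_authz(requester_dn, owner_dn, action, job_id, rsl):
    """Original GT2 rule: only the job initiator may manage a job."""
    if action != "start" and requester_dn != owner_dn:
        raise AuthorizationDenied(f"{requester_dn} did not start job {job_id}")


# Hypothetical set of VO members holding an administrative role.
VO_ADMINS = {"/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Kate Keahey"}


def vo_admin_authz(requester_dn, owner_dn, action, job_id, rsl):
    """Hypothetical policy: VO administrators may also manage jobs tagged NFC."""
    if requester_dn == owner_dn:
        return
    tagged = "(jobtag=NFC)" in rsl.replace(" ", "")  # crude check, for brevity
    if requester_dn in VO_ADMINS and tagged:
        return
    raise AuthorizationDenied(f"{requester_dn} may not {action} job {job_id}")


class JobManager:
    """Calls the authorization callout before every action on a job."""

    def __init__(self, authz_callout):
        self.authz = authz_callout
        self.jobs = {}  # job_id -> {"owner": dn, "rsl": str, "state": str}

    def start(self, requester_dn, job_id, rsl):
        self.authz(requester_dn, requester_dn, "start", job_id, rsl)
        self.jobs[job_id] = {"owner": requester_dn, "rsl": rsl,
                             "state": "running"}

    def cancel(self, requester_dn, job_id):
        job = self.jobs[job_id]
        # The initiator's identity is passed along so policy can refer to it.
        self.authz(requester_dn, job["owner"], "cancel", job_id, job["rsl"])
        job["state"] = "cancelled"
```

Keeping denial and system failure as distinct error types lets a client tell a policy decision apart from a malfunction of the authorization service, which is the purpose of the protocol extension for errors.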
466:
467: For easy integration of third-party authorization
468: solutions, the callout API provides facilities for runtime
469: configurable callouts. Callouts can be configured either through a
470: configuration file or an API call. Configuration consists of
471: specifying an abstract callout name, the path to the dynamic library
472: that implements the callout, and the symbol for the callout in the
473: library. Callouts are invoked through runtime loading of dynamic
474: libraries using GNU Libtool's dlopen-like portability library.
475: Arguments to the callout are passed by using the C variable argument list
476: facility.
477:
478: The insertion of callout points into JM required defining a GRAM
authorization callout type (i.e., an abstract callout type), the exact
480: arguments passed to the callout, and a set of errors the callout may
481: return. These callout points are configured by parsing a global
482: configuration file.
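
The configuration scheme can be illustrated with a Python analogue. The actual implementation loads C dynamic libraries through Libtool; the sketch below substitutes Python's {\tt importlib} for library loading and {\tt getattr} for symbol lookup, and forwards a variable argument list with {\tt *args}. The configuration file layout (abstract name, module, symbol on one line) and all function names are assumptions made for this example, and {\tt operator.eq} merely stands in for a real authorization module.

```python
import importlib
from typing import Callable, Dict

_registry: Dict[str, Callable] = {}


def configure_callout(name: str, module_name: str, symbol: str) -> None:
    """Bind an abstract callout name to a function loaded at run time."""
    module = importlib.import_module(module_name)  # analogue of opening a library
    _registry[name] = getattr(module, symbol)      # analogue of symbol lookup


def configure_from_text(text: str) -> None:
    """Read lines of the form: <abstract name> <module> <symbol>."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()       # strip comments
        if line:
            name, module_name, symbol = line.split()
            configure_callout(name, module_name, symbol)


def call_callout(name: str, *args):
    """Invoke a configured callout, forwarding a variable argument list."""
    if name not in _registry:
        raise LookupError(f"callout {name!r} is not configured")
    return _registry[name](*args)


# Hypothetical configuration: the stand-in callout compares two identities.
CONFIG = """
# abstract name     module     symbol
identity_match      operator   eq
"""
configure_from_text(CONFIG)
```

Because the caller refers only to the abstract name, a different authorization system can be substituted by editing the configuration, without changing the code that invokes the callout.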
483:
484:
485: \section{ANALYSIS}
486:
487:
488: Our solution overcame some of the shortcomings outlined in Section
4.3. However, our approach has a number of outstanding issues that we
490: discuss in this section.
491:
492: \subsection{Gateway Enforcement Model}
493:
494: A weakness of the gateway approach is that once a gateway authorizes
495: an action (for example, a job execution) it is no longer involved in
496: the continuous enforcement of the policy. GRAM maps incoming
497: requests to static local accounts to perform this continuous policy
498: enforcement.
499:
500: This has two consequences: (1) the local policy enforcement depends
501: on the privileges tied to the account that the user maps to on the local
502: system, rather than to the credential with which the request was made,
503: and (2) GRAM's abilities for continuous policy enforcement are limited
504: by local capabilities for policy enforcement.
505:
The first limitation could, to some extent, be dealt with by using
507: dynamic accounts [6,13,14]. Dynamic accounts are accounts created and
508: configured on the fly by a resource management facility. This enables
509: the resource management system to run jobs on a system for users that
510: do not have an account on that system, and it also enables account
511: configuration relevant to policies for a particular resource
512: management request as opposed to a static user's configuration. To
some extent a dynamic account can also be used as a sandbox on the
user's rights (by modifying the user's group membership to control file
515: system access, for example). Although work has been
516: done to support fine-grained policy for file access [8], Unix
517: accounts allow the user to modify only very few configuration
518: parameters, and hence the enforcement implemented in an account is
519: coarse-grained.
520:
521: A sandbox is an environment that imposes restrictions on resource
522: usage [9,15,16]. Sandboxing represents a strong enforcement solution, having
523: the resource operating system act as the policy evaluation and
524: enforcement modules, and is complementary to the gateway
approach. However, while sandboxes provide a solution with a relatively high
526: degree of security, they are hard to implement portably and may
527: introduce a performance penalty.
528:
529: \subsection{Job Manager Trust Model}
530:
531: In the GRAM architecture, the job manager runs with the user's local
532: credentials; this approach makes the job manager less than ideal for
533: policy enforcement. The reasons are twofold. First, from the security
534: perspective it is vulnerable to user tampering that could result in
535: changes in policy enforcement. Second, it effectively limits
536: enforcement potential for VO-wide job management. For example, a user
537: managing a job may cancel a job started by somebody else (by virtue of
538: the fact that the job manager is running with the job initiator's
539: local credential), but the user may not apply higher resource rights
540: to, for example, raise the job's priority.
541:
542: One possible solution to this problem in the context of the GRAM
543: architecture would be to locate the policy enforcement point in the
544: gatekeeper. However, this would increase the vulnerability of the
545: system by placing more complex code into the trusted component of the
546: system, increasing chances for logic errors, buffer overflows, and so
547: forth.
548:
549: Another possibility would be for policy enforcement to be done by
550: trusted services such as the local operating system. As discussed
551: earlier, this is difficult today because most operating systems do not
552: have the support for fine-grained policies that we
553: require. Investigation into sandboxing techniques remains an open
554: research issue.
555:
556: \subsection{Policy Language}
557:
558: Our implementation currently expresses policy in terms of the same
559: resource specification language (RSL) that GRAM uses to describe
560: jobs. While this allows for easy comparison of a job description with
561: a policy, it is not a standard policy language. Policy administrators
562: are not familiar with RSL, and our initial experiences show that
563: expressing policies in these terms is not natural to this
564: community. This difficulty is compounded by the fact that the syntax
is not supported by standard policy tools. We are therefore
566: investigating existing policy languages as a replacement to our
567: RSL-based scheme. With the merging of Grid technologies and Web
568: service-based technologies in OGSA[10], languages based on XML, such
569: as XACML [11] and XrML [12], are being scrutinized by the Grid
570: security community in general and are viable candidates.
571:
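To illustrate why expressing policy in the same terms as the job
description makes the comparison straightforward, the following Python
sketch checks a job's attributes against a policy given as
per-attribute constraints. It is an illustration only: the attribute
names, the dictionary representation, and the helper
\texttt{job\_permitted} are assumptions made for this example and do not
reproduce RSL syntax or the code of our prototype.

```python
# Illustrative sketch only: attribute names and the dict-based representation
# are assumptions for this example, not RSL syntax or the prototype's code.
import operator

# Comparison operators that a policy constraint may use.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def job_permitted(job, policy):
    """Return True if every policy constraint is satisfied by the job.

    job    -- dict mapping attribute name to requested value
    policy -- list of (attribute, op, allowed) tuples; a job that omits a
              constrained attribute is rejected (default deny).
    """
    for attribute, op, allowed in policy:
        if attribute not in job:
            return False
        if not OPS[op](job[attribute], allowed):
            return False
    return True


# Hypothetical policy: a fixed executable and at most four processors.
policy = [("executable", "=", "/bin/test1"), ("count", "<=", 4)]

small_job = {"executable": "/bin/test1", "count": 2}
large_job = {"executable": "/bin/test1", "count": 16}

print(job_permitted(small_job, policy))  # True
print(job_permitted(large_job, policy))  # False
```

Because the job and the policy share one vocabulary, the check reduces
to a per-attribute comparison. The default-deny treatment of missing
attributes is a design choice of this sketch, not a statement about the
prototype.
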
572: \subsection{Relevance to Other Systems}
573:
Our work could also be applied to systems that resemble the Globus Toolkit
in their authorization model. For example, Legion authorization is
implemented by the use of a MayI [20] method on all Legion objects. In
the default implementation, this method offers functionality similar
to that of the Globus Toolkit, with access control lists and static mapping to
local accounts. Our work could be integrated with Legion in a manner
similar to the one described here, through the reimplementation of object
creation routines (Legion's equivalent of GRAM) and the MayI method.
Condor's [21] interface, on the other hand, is based more on compute
583: resources than instantiated jobs. Although it also uses access
584: control lists to manage its policy, it does not provide the per-job
585: interface of GRAM and Legion. This makes our work less relevant to that system.
586:
587:
588: \section{TOWARD GT3: FUTURE DIRECTIONS}
589:
590: To address the open issues summarized above, we are developing an
591: architecture building on abstractions and mechanisms defined as part
592: of the Open Grid Services Infrastructure (OGSI) [17]. The key to the
593: policy enforcement questions is the implementation of an abstraction
594: that would allow for dynamic creation and management of a local
protection environment (such as a Unix account, a sandbox [9, 15, 16], or a
virtual machine [18, 19]). Such an abstraction would not only provide
597: protection but also facilitate resource management (by enforcing
598: limits on resource usage for a particular user) and maintain state
599: associated with its owner. We will call such an abstraction a {\em
600: dynamic session}.
601:
The OGSI abstractions of Grid Service and Grid Service factory are
well suited to implementing such an abstraction. Representing
a session as a Grid service will provide uniform management
605: capabilities across different technologies that could be used to
606: implement sessions. Standardizing session creation alleviates the
607: administrative burden involved in adding users to a VO, and it also
608: allows session creation based on rights granted to a particular user
609: at a specific time. To manage dynamic sessions, we can
610: leverage the OGSI Service Data Element (SDE) mechanism in order to make the
611: properties of a session (such as its termination time) accessible to
612: the session owner and modifiable by him or her.
613:
614: In a typical interaction, a user requests a session with certain
properties (e.g., resource requirements) from a session factory. The
factory authorizes the request and, on success, creates a dynamic
session service and a local protection environment corresponding to
618: it. As part of the creation process, policy defining sharing rights
619: for the session is written. This policy can be modified by authorized
620: entities during the service's lifetime, as can other session
properties. The user can submit jobs against that session, pending
622: conformance with the rights just created. Further, to facilitate
623: management of sessions that do not have to be reused multiple times
624: (i.e., do not preserve state between the times when they get used), the
625: resource management service can obtain sessions based on credentials
626: presented by the user requesting job submission or credentials of the
627: resource manager itself.
628:
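The interaction described above can be sketched as follows. This is a
minimal Python illustration, not OGSI code: the class names
\texttt{SessionFactory} and \texttt{DynamicSession}, the
\texttt{authorize} callback, and the property names are all
hypothetical, and the creation of the local protection environment is
reduced to a placeholder string.

```python
# Minimal sketch of the dynamic-session interaction; all names are
# hypothetical and no OGSI or Globus Toolkit API is used.
import time


class DynamicSession:
    def __init__(self, owner, properties, lifetime_s):
        self.owner = owner
        self.properties = dict(properties)      # e.g., resource limits
        self.termination_time = time.time() + lifetime_s
        self.sharing_policy = {owner}           # identities allowed to submit
        self.environment = f"env-for-{owner}"   # placeholder for account/sandbox/VM
        self.jobs = []

    def extend_lifetime(self, requester, extra_s):
        # Only the owner may modify the termination time in this sketch.
        if requester != self.owner:
            raise PermissionError("only the session owner may modify it")
        self.termination_time += extra_s

    def grant(self, requester, identity):
        # The owner is the sole "authorized entity" here (an assumption).
        if requester != self.owner:
            raise PermissionError("not authorized to change sharing policy")
        self.sharing_policy.add(identity)

    def submit(self, requester, job):
        # A job is accepted only if the requester holds rights on the
        # session and the session has not expired.
        if requester not in self.sharing_policy:
            raise PermissionError("requester has no rights on this session")
        if time.time() >= self.termination_time:
            raise RuntimeError("session has expired")
        self.jobs.append((requester, job))


class SessionFactory:
    def __init__(self, authorize):
        self.authorize = authorize  # callable(user, properties) -> bool

    def create_session(self, user, properties, lifetime_s=3600):
        # Authorize the request before creating the session.
        if not self.authorize(user, properties):
            raise PermissionError("session request denied")
        return DynamicSession(user, properties, lifetime_s)


# Hypothetical local policy: at most 8 CPUs per session.
factory = SessionFactory(lambda user, props: props.get("cpus", 0) <= 8)
session = factory.create_session("alice", {"cpus": 4})
session.submit("alice", "job-1")
session.grant("alice", "bob")
session.submit("bob", "job-2")
print(len(session.jobs))  # 2
```

In this sketch the sharing policy is a plain set of identities written
at creation time and later extended by the owner; a real deployment
would evaluate a policy expressed in whichever policy language is
adopted.
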
629:
630: \section{CONCLUSIONS}
631:
632: We have described the design and implementation of an authorization
633: system allowing for enforcement of fine-grained policies and VO-wide
634: management of remote jobs. To implement this design, we have proposed
635: changes to the Globus Toolkit GRAM design and have designed a policy
636: language suitable for our needs. We are planning to use the same
637: mechanism to provide pluggable authorization in other components of
638: the Globus Toolkit.
639:
640:
641: \acks
642:
643: We are pleased to acknowledge contributions to this work by Mary
644: Thompson of LBNL.
645:
646: \section{\bf REFERENCES}
647:
648: \begin{enumerate}
649:
650: \item Foster, I., C. Kesselman, and S. Tuecke, The Anatomy of the
651: Grid: Enabling Scalable Virtual Organizations. International Journal
652: of High Performance Computing Applications, 2001. 15(3): p. 200-222.
653:
654: \item Keahey, K., T. Fredian, Q. Peng, D.P. Schissel, M. Thompson,
655: I. Foster, M. Greenwald, and D. McCune, Computational Grids in
656: Action: the National Fusion Collaboratory. Future Generation
Computing Systems (to appear), October 2002. 18(8): p. 1005-1015.
658:
659: \item Czajkowski, K., I. Foster, N. Karonis, C. Kesselman, S. Martin,
660: W. Smith, and S. Tuecke, A Resource Management Architecture for
Metacomputing Systems, in 4th Workshop on Job Scheduling Strategies
662: for Parallel Processing. 1998, Springer-Verlag. p. 62-82.
663:
664: \item Thompson, M., W. Johnston, S. Mudumbai, G. Hoo, K. Jackson, and
665: A. Essiari, Certificate-based Access Control for Widely Distributed
666: Resources, in Proc. 8th Usenix Security Symposium. 1999.
667:
668: \item Butler, R., D. Engert, I. Foster, C. Kesselman, S. Tuecke,
669: J. Volmer, and V. Welch, Design and Deployment of a National-Scale
670: Authentication Infrastructure. IEEE Computer, 2000. 33(12): p. 60-66.
671:
672: \item Dynamic Accounts. http://www.gridpp.ac.uk/gridmapdir/.
673:
674: \item Pearlman, L., V. Welch, I. Foster, C. Kesselman, and
675: S. Tuecke, A Community Authorization Service for Group
676: Collaboration. in IEEE Workshop on Policies for Distributed Systems
677: and Networks. 2002.
678:
\item Lorch, M. and D. Kafura, Supporting Secure Ad-hoc User
680: Collaboration in Grid Environments. in Proceedings of the 3rd
681: Int. Workshop on Grid Computing - Grid 2002, Baltimore, MD, USA. 2002.
682:
\item Chang, F., A. Itzkovitz, and V. Karamcheti, User-level
684: Resource-constrained Sandboxing. Proceedings of the USENIX Windows
685: Systems Symposium (previously USENIX-NT), 2000.
686:
687: \item Foster, I., C. Kesselman, J. Nick, and S. Tuecke, The Physiology
688: of the Grid: An Open Grid Services Architecture for Distributed
Systems Integration. Open Grid Service Infrastructure WG,
690: Global Grid Forum, 2002.
691:
692: \item OASIS eXtensible Access Control Markup Language (XACML)
693: Committee Specification 1.0 (Revision
694: 1). http://www.oasis-open.org/committees/xacml/docs/s-xacml-specification-1.0-1.doc,
695: 2002.
696:
\item XrML. http://www.xrml.org/get\_XrML.asp.
698: \item Hacker, T. and B. Athey, A Methodology for Account Management in Grid Computing Environments. Proceedings of the 2nd International Workshop on Grid Computing, 2001.
699: \item Kapadia, N. H., R. J. Figueiredo, and J. Fortes. Enhancing the Scalability and Usability of Computational Grids via Logical User Accounts and Virtual File Systems. in 10th Heterogeneous Computing Workshop. 2001. San Francisco, California.
\item Bosilca, G., F. Capello, A. Djilali, G. Fedak, T. Hernault and F. Magniette, Performance Evaluation of Sandboxing Techniques for Peer-to-Peer Computing.
701: \item Goldberg, I., D. Wagner, R. Thomas, and E. Brewer, A Secure Environment for Untrusted Helper Applications --- Confining the Wily Hacker, in Proc. 1996 USENIX Security Symposium. 1996.
702: \item Tuecke, S., K. Czajkowski, I. Foster, J. Frey, S. Graham, and C. Kesselman, Grid Service Specification. 2003: Open Grid Service Infrastructure WG, Global Grid Forum.
703: \item VMware: http://www.vmware.com/.
704: \item User Mode Linux (UML). http://user-mode-linux.sourceforge.net/.
705:
706: \item Humphrey, M., F. Knabe, A. Ferrari and A. Grimshaw,
707: Accountability and Control of Process Creation in Metasystems. 2000
708: Network and Distributed System Security Symposium, 2000.
709:
710: \item ``Condor Version 6.4.7 Manual: Security In Condor'', \\
711: http://www.cs.wisc.edu/condor/manual/v6.4/3\_7Security\_In.html, 2003.
712:
713:
714: \end{enumerate}
715:
716:
717:
718: \end{document}
719: