
1: \documentclass{cpeauth}

2:

3: \begin{document}

4: \def\cop{Copyright \copyright\ 2000 John Wiley \&\ Sons, Ltd.}

5:

6: \CPE{1}{7}{00}{00}{2000}

7: \runningheads{K. Keahey et al.} {Fine-Grained Authorization in the Grid}

8:

9: \title{ Fine-Grained Authorization for Job Execution in the Grid: Design and Implementation}

10:

11: \author{K.~Keahey\footnotemark[2],  V.~Welch\footnotemark[3],  S.~Lang\footnotemark[2],  B.~Liu\footnotemark[4] and

12: S.~Meder\footnotemark[3]}

13:

14:

15: \longaddress{Katarzyna Keahey, Argonne National Laboratory, Mathematics

16: and Computer Science Division, 9700 S. Cass Ave., Argonne, IL 60439}

17:

18:

19: \corraddr{Katarzyna Keahey, Argonne National Laboratory, Mathematics

20: and Computer Science Division, 9700 S. Cass Ave., Argonne, IL 60439}

21:

22: \footnotetext[2]{Argonne National Laboratory,  Argonne, IL, USA}

23: \footnotetext[3]{University of Chicago, Chicago, IL, USA}

24: \footnotetext[4]{University of Houston, Houston, TX, USA}

25:

26: \cgsn{This work was supported by the Mathematical,

27: Information, and Computational Sciences Division subprogram of the

28: Office of Advanced Scientific Computing Research, Office of Science,

29: SciDAC Program, U.S. Department of Energy, under Contract}{W-31-109-ENG-38.}

30:

31: \received{1 August 2003}

32: \revised{1 August 2003}

33: \noaccepted{}

34:

35: \begin{abstract}

36: In this paper we describe our work on enabling fine-grained

37: authorization for resource usage and management. We address the need

38: of virtual organizations to enforce their own policies in addition to

39: those of the resource owners, in regard to both resource consumption

40: and job management. To implement this design, we propose changes and

41: extensions to the Globus Toolkit's version 2 resource management

42: mechanism.  We describe the prototype and the policy language that we

43: designed to express fine-grained policies, and we present an analysis

44: of our solution.~\cop

45: \end{abstract}

46:

47: \keywords{Grids, Authorization, Policy Enforcement, Resource Management}

48:

49: \section{INTRODUCTION}

50:

51: As computational Grids [1] become more widespread, both the resource

52: pool and the pool of users wishing to use those resources become large

53: and tend to change dynamically. In such an environment, the

54: traditional mode of resource sharing, requiring Grid users to

55: establish direct relationships with resources they wish to use

56: (i.e., in the form of user accounts), becomes unmanageably complex. We

57: therefore observe a trend toward defining virtual organizations (VOs)

58: [1] allowing users to collaborate across different administrative

59: domains. Credentials issued by such organizations, used in conjunction

60: with resource provider policies, become the basis of sharing in

61: Grids. In this model, resource providers typically outsource some

62: subset of their policy administration to the VO. This strategy allows

63: the VO to coordinate policy across resources in different domains

64: forming a consistent policy environment in which its participants can

65: operate. Such an environment requires mechanisms for enabling the VO

66: to specify and enforce VO-specific policies on tasks and resources owned by VO

67: participants.

68:

69: Another developing trend is the need to express and enforce fine-grained

70: policies on the usage of resources and services. These can no longer

71: be expressed by simple access control; resource owners and VO

72: administrators may want to specify exactly what fractions or

73: configurations of resource may be used by a given entity. In addition,

74: while some VOs are focused on sharing of hardware resources (e.g.,

75: CPUs and storage), for others the primary motivation is to coordinate

76: sharing of application services [2] requiring access to both software

77: and hardware. In these cases the VO members should not be running

78: arbitrary code but only applications sanctioned by VO policy. Such

79: policies may be dynamic, adapting over time or even changing during

80: application execution, depending on factors such as past and current

81: resource utilization record, a member's role in the VO, and deadline-based

82: priorities.

83:

84: In this paper, we address the requirements posed by these two

85: trends. We present a design for service and resource management that

86: enables a VO and resource managers to specify fine-grained service and

87: resource usage policies using VO credentials and allows resources to

88: enforce those policies. We implement our design as extensions to the

89: Globus Toolkit version 2 (GT2) resource management mechanism [3]. We

90: then consider policy enforcement in the context of two types of policy

91: target: application services and traditional computing resources. A

92: prototype of this implementation, combined with the Akenti

93: authorization system [4], was demonstrated at the SC02 conference and

94: is currently being adopted by the National Fusion Collaboratory [2].

95:

96: This paper is organized as follows. In Section 2, we present a use

97: case scenario and concrete requirements guiding our design. In Section

98: 3 we define our problem. We follow this by a discussion of the

99: capabilities of the Globus Toolkit's resource management (GRAM) [3]

100: mechanism (Section 4) and describe extensions needed to GRAM to support our

101: architecture (Section 5).  In the last three sections, we analyze our solution, present future

102: directions, and conclude the paper.

103:

104:

105: \section{USE CASE SCENARIO AND REQUIREMENTS}

106:

107:

108: In a typical VO scenario, a resource provider has reached an agreement

109: with a VO to allow the VO to use some resource allocation. The

110: resource provider thinks of the allocation in a coarse-grained manner:

111: the provider is concerned about how many resources the VO can use as a

112: whole, not about how allocation is used inside the VO.

113:

114: The finer-grained specification of resource usage among the VO

115: participants is the responsibility of the VO. For example, the VO has

116: two primary classifications of its members:

117:

118: \begin{enumerate}

119:

120: \item [$\bullet$] One group is developing, installing, and debugging the

121: application services used by the VO to perform a scientific

122: computation. This group may need to run many types of processes

123: (e.g., compilers, debuggers, applications services) in order to debug

124: and deploy the VO application services, but should be consuming small

125: amounts of traditional computing resources (e.g., CPU, disk and

126: bandwidth) in doing so.

127:

128: \item [$\bullet$] The second group performs analysis using the

129: application services. This group may need to consume large

130: amounts of resources in order to run simulations related to their research.

131: \end{enumerate}

132:

133: Thus, the VO may wish to specify finer-grained policies that allow certain

134: users to use more or fewer resources than other users. These policies may

135: be dynamic and change at any point (for example, during runtime of an

136: application).

137:

138: In addition to policy on resource utilization, the VO wishes to be

139: able to manage jobs running on VO resources. For example, users often

140: have long-running computational jobs using VO resources, while the VO

141: often has short-notice high-priority jobs that require immediate

142: access to resources. This mode of operation requires suspending

143: existing jobs to free up resources, something that normally only the

144: user that submitted the job has the right to do. Since going through

145: the user who submitted the original job may not always be an option,

146: the VO wants to give a group of its members the ability to manage any

147: jobs using VO resources so they can instantiate high-priority jobs on

148: short notice.

149:

150: Supporting this scenario places several requirements on the authorization policy system:

151:

152: \begin{enumerate}

153:

154: \item {\em Combining policies from different sources}. In outsourcing

155: a portion of the policy administration to the VO, the policy enforcement

156: mechanism on the resource needs to be able to combine policies from

157: two different sources: the resource owner and the VO.

158:

159: \item {\em Fine-grained control of how resources are used}. For the VO

160: to express the differences between how its user groups are allowed to

161: use resources, the VO needs to be able to express policies regarding a

162: variety of aspects of resource usage, not just grant access.

163:

164: \item {\em VO-wide management of jobs and resource allocations}. The

165: VO wants to be able to treat jobs as resources themselves that can be

166: managed. This requirement poses a particular challenge because jobs are dynamic, so

167: static methods of policy management are not effective. Users may also

168: start jobs that shouldn't be under the domain of the VO; for example, a user

169: may have allocations on a resource other than those obtained through

170: the VO, and jobs invoked under this alternate allocation should not be

171: subject to VO policy.

172:

173: \item {\em Fine-grained, dynamic enforcement mechanisms}. In order to

174: support any policies, there must be enforcement mechanisms capable of

175: implementing these policies. Most resources today are capable of

176: policy enforcement at the user level: that is, all jobs run by a given

177: user will have the same policy applied to them. These mechanisms are

178: typically statically configured through file permissions, quotas and

179: similar mechanisms. Our scenario brings out the requirement that

180: enforcement mechanisms need to handle dynamic, fine-grained policies.

181: \end{enumerate}

182:

183: \section{INTERACTION MODEL}

184:

185: To support the scenario described in the preceding section, we need to

186: provide resource management mechanisms that allow the specification

187: and consistent enforcement of authorization and usage policies that

188: come from both the VO and the resource owner. In addition to allowing

189: the VO to specify policies on standard computational resources, such as

190: processor time and storage, we need to allow the VO to specify

191: policies on application services that it deploys, as well as

192: long-running computational jobs instantiated by VO members.

193:

194: In our work we assume the following interaction model:

195: \begin{enumerate}

196: \item A user submits a request, composed of the job's description, to initiate a job. The request is accompanied by the user's Grid credentials, which may include the user's personal credentials as well as VO-issued credentials.

197: \item This request is evaluated against both local and VO policies by different policy evaluation points (PEPs), capable of interpreting the VO and the resource management policy respectively, located in the resource management facilities.

198: \item If the request is authorized by both PEPs, it is mapped to a set of local resource credentials (e.g., a Unix user account). Policy enforcement is carried out by local enforcement mechanisms operating based on local credentials.

199: \item During the job execution, a VO user may make management requests to the job (e.g., request information, suspend or resume a job, cancel a job).

200: \end{enumerate}
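
The first three steps of this model can be sketched in a few lines of Python. This is an illustrative sketch only, not GRAM code: the {\tt Request} class, the function {\tt authorize\_and\_map}, the two toy PEP functions, the identities, and the site limit of 16 processors are hypothetical names and values introduced here. The sketch shows that a request is mapped to a local account only when both PEPs permit it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Request:
    user_dn: str                      # Grid identity taken from the credentials
    action: str                       # e.g. "start" or "cancel"
    job: Dict[str, object] = field(default_factory=dict)  # job description


# A PEP is modeled as a function that returns True (permit) or False (deny).
Pep = Callable[[Request], bool]


def authorize_and_map(request: Request, local_pep: Pep, vo_pep: Pep,
                      account_map: Dict[str, str]) -> Optional[str]:
    """Return the local account for the request, or None if it is denied."""
    if not (local_pep(request) and vo_pep(request)):
        return None                   # both policy sources must agree
    return account_map.get(request.user_dn)


def site_pep(request: Request) -> bool:
    # Hypothetical resource-owner policy: at most 16 processors per job.
    return int(request.job.get("count", 1)) <= 16


def vo_pep(request: Request) -> bool:
    # Hypothetical VO policy: only the sanctioned executable may be started.
    return request.action != "start" or request.job.get("executable") == "TRANSP"
```

Modeling each PEP as an independent predicate keeps the two policy sources separate, so either one can be replaced without touching the other.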

201:

202: \section{GRID RESOURCE MANAGEMENT IN GT2}

203:

204: The Globus Toolkit provides mechanisms for security, data management

205: and movement, resource monitoring and discovery (MDS) and resource

206: acquisition and management. In this paper we are focusing on the

207: functionality of resource acquisition and management, which is

208: implemented by the GRAM (Grid Resource Acquisition and Management)

209: system [3].

210:

211: The GRAM system has two major software components: the Gatekeeper and

212: the Job Manager. The Gatekeeper is responsible for translating Grid

213: credentials to local credentials (e.g., mapping the user to a local

214: account based on their Grid credentials) and creating a Job Manager

215: Instance to handle the specific job invocation request. The Job

216: Manager Instance (JMI) is a Grid service that instantiates and then

217: provides for the ability to manage a job.  Figure 1 shows the

218: interaction of these elements; in this section we explain their roles

219: and limitations.

220:

221: \subsection{Gatekeeper}

222:

223: The Gatekeeper is responsible for authenticating the requesting Grid

224: user, authorizing their job invocation request and determining the

225: account in which their job should be run. Authentication, performed by

226: the Grid Security Infrastructure [5], verifies the validity of the

227: presented Grid credentials, the user's possession of those

228: credentials, and the user's Grid identity as indicated by those

229: credentials. Authorization is based on the user's Grid identity and a

230: policy contained in a configuration file, the gridmapfile, which

231: serves as an access control list. Mapping from the Grid identity to a

232: local account is also done with the policy in the gridmapfile,

233: effectively translating the user's Grid credential into a local user

234: credential. Finally, the Gatekeeper starts up a Job Manager Instance,

235: executing with the user's local credential. This mode of operation

236: requires the user to have an account on the resource and implements

237: enforcement by privileges of the account.
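
The gridmapfile lookup described above can be illustrated as follows. This is a minimal sketch, assuming the common layout in which each line holds a quoted Grid distinguished name followed by one or more comma-separated local account names; the function names, the sample entry, and the account names are invented for illustration and are not taken from the Gatekeeper source.

```python
import shlex


def parse_gridmap(text):
    """Map each Grid distinguished name to its list of local accounts."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blank lines and comments
        parts = shlex.split(line)          # honors the quotes around the DN
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1].split(",")
    return mapping


def gatekeeper_map(mapping, user_dn):
    """Return the first local account for user_dn, or None (access denied)."""
    accounts = mapping.get(user_dn)
    return accounts[0] if accounts else None


# Hypothetical file content.
SAMPLE = '''
# Grid identity                                local account(s)
"/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Bo Liu"    bliu,guest
'''
```

The same table serves as both the access control list (an absent entry means denial) and the identity mapping, which is why authorization at this point is limited to a yes or no decision per user.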

238:

239:

240:

241: \begin{figure}

242: \centering\includegraphics{GRAM1.ps}

243: \caption{Interaction of the main components of GRAM}

244: \end{figure}

245:

246:

247: \subsection{Job Manager Instance}

248:

249: The JMI parses the user's request, including the job description, and

250: interfaces with the resource's job control system (e.g., LSF, PBS) to

251: initiate the user's job. During the job's execution the JMI monitors

252: its progress and handles job management requests (e.g., suspend,

253: stop, query) from the user. Since the JMI is run under the user's

254: local credential, as defined by the user's account, the operating

255: system and local job control system are able to enforce local policy

256: on the JMI and user job by the policy tied to that account.

257:

258: The JMI has no authorization on job startup since the Gatekeeper has

259: already authorized it. Once the job has been started however, the JMI

260: accepts, authenticates, and authorizes management requests on the

261: job. In GT2, the authorization policy on these management requests is

262: static and simple: the Grid identity of the user making the request

263: must match the Grid identity of the user who initiated the job.

264:

265: \subsection{GRAM Shortcomings}

266:

267: The current GRAM architecture has a number of shortcomings when matched against the requirements we laid out in Section 2:

268:

269: \begin{enumerate}

270: \item Authorization of user job startup is coarse-grained. It is based solely on whether a user has an account on a resource.

271: \item Authorization on job management is coarse-grained and static. Only the user who initiated a job is allowed to manage it.

272: \item Enforcement is implemented chiefly through the medium of privileges tied to a statically configured local account (JMI runs under local user credential) and is therefore useless for enforcing fine-grained policy or dynamic policy coming from sources external to the resource (such as a VO).

273: \item Local enforcement depends on the rights attached to the user's account, not the rights presented by the user with a specific request; in other words, the enforcement vehicle is largely accidental.

274: \item A local account must exist for a user; as described in the introduction, this creates an undue burden on system administrators and users alike. This burden prevents wide adoption of the network services model in large and dynamically changing communities.

275: \end{enumerate}

276:

277: These problems can be, and have been, alleviated in some measure by

278: clever setup. For example, the impact of (4) can be reduced by

279: mapping a Grid identity to several different local accounts with

280: different capabilities. Often, (5) is handled by working with

281: ``shared accounts'' (which, however, introduce many security, audit,

282: accounting and other problems) or by providing a limited

283: implementation of dynamic accounts [6,13,14].

284:

285:

286: \section{AUTHORIZATION AND ENFORCEMENT EXTENSIONS TO GRAM}

287:

288: In this section we describe extensions to the GT2 Grid Resource

289: Acquisition and Management (GRAM) that address the shortcomings

290: described above.

291:

292: \begin{figure}

293: \centering\includegraphics{newGRAM1.ps}

294: \caption{Changes to GRAM: the changed component (the Job Manager) has been highlighted in gray}

295: \end{figure}

296:

297: We extended the GRAM design to allow authorization callouts,

298:  evaluating the user's job invocation and management requests in the

299:  context of policies defined by the resource owner and VO. Our changes

300:  to GRAM, prototyped using GT2, are illustrated in Figure 2. In our

301:  prototype we experimented with policies written in plain text files

302:  on the resource. These files included both local resource and VO

303:  policies (in a real system the VO policies would be carried in the VO

304:  credentials).  This work has recently been tested with the Akenti [4]

305:  system, representing the same policies as described here, and is being

306:  adopted by the National Fusion Collaboratory [2]. In order to show

307: the generality of our approach, we also experimented with the

308:  Community Authorization Service (CAS) [7]. Both of these systems

309:  allow for multiple policy sources but have significant

310:  differences, in terms of both architecture and programming APIs.

311:

312: \subsection{Policy Language}

313:

314: GRAM allows users to start and manage jobs by submitting requests

315: composed of an action (e.g., initiate, cancel, provide status, change

316: priority) and, in the case of job initiation, a job

317: description. The job description is formulated in terms of attributes

318: using the Resource Specification Language (RSL) [3]. RSL consists of

319: attribute value pairs specifying job parameters referring to

320: executable description (executable name, directory where it is

321: located, etc.) and resource requirements (number of CPUs to be used,

322: maximum/minimum allowable memory, maximum time a job is allowed to

323: run, etc.).
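
To make the attribute-value structure concrete, the sketch below parses a simple conjunctive job description into a Python dictionary. It is a simplified illustration rather than a complete RSL parser: it handles only a flat list of {\tt (attribute = value)} relations, and the function name {\tt parse\_rsl} and the sample job are assumptions introduced here.

```python
import re

# One "(attribute = value)" relation; values may not contain parentheses.
_RELATION = re.compile(r"\(\s*(\w+)\s*=\s*([^()]*?)\s*\)")


def parse_rsl(rsl):
    """Parse a flat conjunction such as '&(executable = a)(count = 2)'."""
    text = rsl.strip()
    if text.startswith("&"):
        text = text[1:]                # '&' introduces a conjunction
    return {name: value for name, value in _RELATION.findall(text)}


# Hypothetical job description using the kinds of attributes named above.
JOB = "&(executable = test1)(directory = /sandbox/test)(count = 2)(maxTime = 10)"
```

All values are kept as strings; a consumer that needs numbers (for example, the processor count) converts them explicitly.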

324:

325: We have designed a simple policy language that allows for policy

326: specification in terms of RSL. The policy assumes that unless a

327: specific stipulation has been made, an action will not be

328: allowed. Otherwise, a user, or a group of users, is related to a set

329: of assertions. The rules have the form of user (or group) identity

330: separated by a colon from a set of action based assertions that follow

331: the RSL syntax.

332:

333: To express the rules, we extended the RSL set of attributes with the addition of the following:

334:

335: \begin{enumerate}

336: \item [$\bullet$] {\em Action.} This attribute represents what the

337: user wants to do with the job. Currently, it can take values of

338: ``start'', ``cancel'', ``information'', or ``signal'', where

339: ``signal'' describes a variety of job management actions such as

340: changing priority.

341:

342: \item [$\bullet$] {\em Jobowner.} The jobowner attribute denotes the

343: job initiator and can take values of the distinguished name of a

344: job initiator's Grid credential. It is used mainly to express VO-wide

345: management policy.

346:

347: \item [$\bullet$] {\em Jobtag.} The jobtag attribute has been

348: introduced in order to enable the specification of VO-wide job

349: management policies.  A jobtag indicates the job membership in a group

350: of jobs for which policy can be defined. For example, a set of users

351: with an administrative role in the VO can be granted the right to

352: manage all jobs in a particular group. A policy may require a VO user

353: to submit a job with a specific jobtag, hence placing it into a group

354: that is manageable by another user (or group of users). At present,

355: jobtags are statically defined by a policy administrator.

356: \end{enumerate}

357:

358: We also added the following values to RSL:

359: \begin{enumerate}

360: \item [$\bullet$] ``NULL'' to denote an empty (unset) value, so that requiring an attribute to differ from ``NULL'' requires it to be present

361:

362: \item [$\bullet$] ``SELF'' to allow expression of the job initiator's identity in a policy.

363: \end{enumerate}

364:

365: These extensions allow the following types of assertions to be expressed in policy:

366: \begin{enumerate}

367:

368: \item [$\bullet$] The job request is permitted to contain a particular

369: attribute, value, or set of values. This extension allows one, for

370: example, to limit the maximum number of processors used or to restrict

371: the name of the executable to a specified set. Multiple assertions can

372: be made about the same attribute.

373:

374: \item [$\bullet$] The job request is required to contain a particular attribute, possibly with a particular value or set of values. For example, the job request must specify a jobtag attribute to allow its management by a VO-defined group of administrators.

375:

376: \item [$\bullet$] The job request is required not to contain a

377: particular attribute. For example, the job request must not specify a

378: particular queue, which is reserved for high-priority users.

379: \end{enumerate}

380:

381: Our extensions allow a policy not only to limit the usage of

382: traditional computational resources but also to dictate the

383: executables that users are allowed to invoke, allowing a VO to limit

384: resource consumption. Further, by introducing the notion of a jobtag,

385: we are able to express policies allowing users to manage jobs. The

386: example below illustrates how policy may be expressed.

387:

388: \begin{verbatim}

389: &/O=Grid/O=Globus/OU=mcs.anl.gov:

390: (action = start)(jobtag != NULL)

391:

392: /O=Grid/O=Globus/OU=mcs.anl.gov/CN= Bo Liu:

393: &(action = start)(executable = test1)(directory = /sandbox/test)(jobtag = ADS)(count<4)

394: &(action = start)(executable = test2)(directory = /sandbox/test)(jobtag = NFC)(count<4)

395:

396: /O=Grid/O=Globus/OU=mcs.anl.gov/CN= Kate Keahey:

397: &(action = start)(executable = TRANSP)(directory = /sandbox/test)(jobtag = NFC)

398: &(action=cancel)(jobtag=NFC)

399:

400: \end{verbatim}

401:

402: The first statement in the policy specifies a requirement for a group

403: of users whose Grid identities start with the string {\tt ``

404: /O=Grid/O=Globus/OU=mcs.anl.gov''}. The requirement is that for job

405: invocations (where the action is ``start''), the job description must

406: contain a jobtag attribute with some value. This allows us to later

407: write management policies referring to that jobtag.

408:

409: The second statement in the policy refers to a specific user, Bo Liu,

410: and states that she can start jobs only using the ``test1'' and

411: ``test2'' executables. The rules also place constraints on the

412: directory from which the executable can be taken and the jobtag they

413: can be started with. In addition, a constraint is placed on the number

414: of processors Bo Liu can use ($count < 4$).

415:

416: The third statement in the policy gives

417: user Kate Keahey the right to start jobs using the ``TRANSP''

418: executable from a specific directory and with a specific jobtag. It

419: also gives her the right to cancel all the jobs with jobtag ``NFC'',

420: for example, jobs based on the executable ``test2'' started by Bo Liu.
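
The evaluation of such per-user rules can be sketched as follows. This is an illustrative sketch under stated assumptions, not the prototype's evaluator: the function names are hypothetical, the distinguished names are normalized (no space after {\tt CN=}), only the relations {\tt =}, {\tt !=}, and {\tt <} are handled, a request may carry attributes that a rule does not mention, and the group-wide requirement in the first statement is not modeled. The sketch applies the default-deny principle: a request is permitted only if some rule attached to a matching identity has all of its assertions satisfied.

```python
import re

RELATION = re.compile(r"\(\s*(\w+)\s*(!=|<|=)\s*([^()]*?)\s*\)")

POLICY = {
    "/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Bo Liu": [
        "&(action = start)(executable = test1)(directory = /sandbox/test)(jobtag = ADS)(count<4)",
        "&(action = start)(executable = test2)(directory = /sandbox/test)(jobtag = NFC)(count<4)",
    ],
    "/O=Grid/O=Globus/OU=mcs.anl.gov/CN=Kate Keahey": [
        "&(action = start)(executable = TRANSP)(directory = /sandbox/test)(jobtag = NFC)",
        "&(action=cancel)(jobtag=NFC)",
    ],
}


def holds(attr, op, value, request):
    """Check one assertion against the attributes of a request."""
    actual = request.get(attr)
    if value == "NULL":                    # NULL stands for "no value"
        if op == "=":
            return actual is None
        return op == "!=" and actual is not None
    if actual is None:
        return False                       # attribute required but missing
    if op == "=":
        return str(actual) == value
    if op == "!=":
        return str(actual) != value
    return int(actual) < int(value)        # op == "<"


def permitted(policy, user_dn, request):
    """Default deny: permit only if some matching rule is fully satisfied."""
    for subject, rules in policy.items():
        if not user_dn.startswith(subject):    # identity or group prefix
            continue
        for rule in rules:
            if all(holds(a, op, v, request) for a, op, v in RELATION.findall(rule)):
                return True
    return False
```

For a management action such as cancel, the request dictionary is assumed to carry the attributes of the target job (its jobtag and jobowner), which is what lets one user act on a job started by another.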

421:

422: \subsection{Enforcing Policies in GRAM}

423:

424: We enforce our policies in GRAM by creating a policy evaluation point

425: (PEP) controlling all external access to a resource via GRAM; an

426: action is authorized depending on the decision yielded by the PEP. Policy

427: can be enforced in GRAM at multiple PEPs corresponding to different

428: decision domains; for example, a PEP placed in the Gatekeeper can allow

429: or disallow access based on the user's Grid identity. Since our work

430: focuses on job and resource management, we established a PEP in the Job

431: Manager (JM). The JM parses user job descriptions and can therefore

432: evaluate policy that depends on the nature of the job request in

433: addition to the user's identity.

434:

435: Specifically, our additions consist of the following:

436:

437: \begin{enumerate}

438:

439: \item [$\bullet$] {\em An authorization callout API to integrate the PEP

440: with the JM}. The callout passes to the PEP authorization module the

441: relevant information, such as the credential of the user requesting a

442: remote job, the credential of the user who originally started the job,

443: the action to be performed (such as start or cancel a job), a unique

444: job identifier, and the job description expressed in RSL. The PEP

445: responds through the callout API with either success or an appropriate

446: authorization error. This call is made whenever an action needs to be

447: authorized, that is, before creating a job manager request and before

448: calls to cancel, query, and signal a running job.

449:

450: \item [$\bullet$] {\em Policy-based authorization for job

451: management}. As discussed in Section 4, each job management request

452: other than job startup is currently authorized by GRAM so that only

453: the user that started a job is allowed to manage it. We modified the

454: authorization in GRAM to enable Grid users other than the job

455: initiator to manage the job based on policy with decisions rendered

456: through the authorization callout API. In addition to changes to the

457: authorization model, this modification also required extensions to the

458: GRAM client allowing the client to process identities other than that

459: of the client (specifically, allowing it to recognize the identity of

460: the job originator).

461:

462: \item [$\bullet$] {\em RSL parameters}. We extended RSL to add the ``jobtag'' parameter allowing the user to submit a job to a specific job management group.

463:

464: \item [$\bullet$] {\em Errors}. We further extended the GRAM protocol to return authorization errors describing reasons for authorization denial as well as authorization system failures.

465: \end{enumerate}

466:

467: For easy integration of third-party authorization

468: solutions, the callout API provides facilities for runtime

469: configurable callouts.  Callouts can be configured either through a

470: configuration file or an API call. Configuration consists of

471: specifying an abstract callout name, the path to the dynamic library

472: that implements the callout, and the symbol for the callout in the

473: library. Callouts are invoked through runtime loading of dynamic

474: libraries using GNU Libtool's dlopen-like portability library.

475: Arguments to the callout are passed by using the C variable argument list

476: facility.

477:

478: The insertion of callout points into JM required defining a GRAM

479: authorization callout type (i.e., an abstract callout type), the exact

480: arguments passed to the callout, and a set of errors the callout may

481: return.  These callout points are configured by parsing a global

482: configuration file.
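
The idea of runtime-configurable callouts can be illustrated by analogy in Python. The actual mechanism loads C dynamic libraries; in the sketch below, importable Python modules stand in for dynamic libraries and attribute lookup stands in for symbol resolution. The configuration layout (abstract name, library, symbol on one line), the function names, and the callout name {\tt demo\_authz} are assumptions made for this example. The sample entry binds the callout to the standard {\tt operator.eq} function, which mimics the original GT2 rule that the requester's identity must equal the job initiator's identity.

```python
import importlib


def parse_callout_config(text):
    """Read lines of the form '<abstract name> <library> <symbol>'."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and not fields[0].startswith("#"):
            name, library, symbol = fields
            table[name] = (library, symbol)
    return table


def call_callout(table, name, *args):
    """Resolve the callout at call time and pass the arguments through."""
    library, symbol = table[name]
    module = importlib.import_module(library)   # stand-in for loading a library
    function = getattr(module, symbol)          # stand-in for symbol lookup
    return function(*args)                      # variable argument list


# Hypothetical configuration: abstract name, "library", symbol.
CONFIG = """
# name        library    symbol
demo_authz    operator   eq
"""
```

Because the binding is resolved by name at call time, a different authorization module can be substituted by editing the configuration, without changing the caller.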

483:

484:

485: \section{ANALYSIS}

486:

487:

488: Our solution overcame some of the shortcomings outlined in Section

489: 4.3. However, our approach has a number of outstanding issues that we

490: discuss in this section.

491:

492: \subsection{Gateway Enforcement Model}

493:

494: A weakness of the gateway approach is that once a gateway authorizes

495: an action (for example, a job execution) it is no longer involved in

496: the continuous enforcement of the policy. GRAM maps incoming

497: requests to static local accounts to perform this continuous policy

498: enforcement.

499:

500: This has two consequences: (1) the local policy enforcement depends

501: on the privileges tied to the account that the user maps to on the local

502: system, rather than to the credential with which the request was made,

503: and (2) GRAM's abilities for continuous policy enforcement are limited

504: by local capabilities for policy enforcement.

505:

506: The first limitation could, to some extent, be dealt with by using

507: dynamic accounts [6,13,14]. Dynamic accounts are accounts created and

508: configured on the fly by a resource management facility. This enables

509: the resource management system to run jobs on a system for users that

510: do not have an account on that system, and it also enables account

511: configuration relevant to policies for a particular resource

512: management request as opposed to a static user's configuration. To

513: some extent a dynamic account can also be used as a sandbox on the

514: user's rights (by modifying the user's group membership to control file

515: system access, for example). Although work has been

516: done to support fine-grained policy for file access [8], Unix

517: accounts allow the user to modify only very few configuration

518: parameters, and hence the enforcement implemented in an account is

519: coarse-grained.

520:

521: A sandbox is an environment that imposes restrictions on resource

522: usage [9,15,16]. Sandboxing represents a strong enforcement solution, having

523: the resource operating system act as the policy evaluation and

524: enforcement modules, and is complementary to the gateway

525: approach. However, while sandboxes provide a solution with a relatively high

526: degree of security, they are hard to implement portably and may

527: introduce a performance penalty.

528:

529: \subsection{Job Manager Trust Model}

530:

531: In the GRAM architecture, the job manager runs with the user's local

532: credentials; this approach makes the job manager less than ideal for

533: policy enforcement. The reasons are twofold. First, from the security

534: perspective it is vulnerable to user tampering that could result in

535: changes in policy enforcement. Second, it effectively limits

536: enforcement potential for VO-wide job management. For example, a user

537: managing a job may cancel a job started by somebody else (by virtue of

538: the fact that the job manager is running with the job initiator's

539: local credential), but the user may not apply higher resource rights

540: to, for example, raise the job's priority.

541:

542: One possible solution to this problem in the context of the GRAM

543: architecture would be to locate the policy enforcement point in the

544: gatekeeper. However, this would increase the vulnerability of the

545: system by placing more complex code into the trusted component of the

546: system, increasing chances for logic errors, buffer overflows, and so

547: forth.

548:

549: Another possibility would be for policy enforcement to be done by

550: trusted services such as the local operating system. As discussed

551: earlier, this is difficult today because most operating systems do not

552: have the support for fine-grained policies that we

553: require. Investigation into sandboxing techniques remains an open

554: research issue.

\subsection{Policy Language}

Our implementation currently expresses policy in terms of the same
resource specification language (RSL) that GRAM uses to describe
jobs. While this allows for easy comparison of a job description with
a policy, it is not a standard policy language. Policy administrators
are not familiar with RSL, and our initial experiences show that
expressing policies in these terms is not natural to this
community. This difficulty is compounded by the fact that the syntax
is not supported by standard policy tools. We are therefore
investigating existing policy languages as a replacement for our
RSL-based scheme. With the merging of Grid technologies and Web
service-based technologies in OGSA [10], languages based on XML, such
as XACML [11] and XrML [12], are being scrutinized by the Grid
security community in general and are viable candidates.
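To illustrate why a shared syntax makes this comparison easy, the following minimal Python sketch checks a job description against a policy when both are written as flat conjunctions of RSL-style relations. It is an illustration only and not the prototype's code: the parser handles a small subset of RSL, and the function names, the example paths, and the rule that an attribute missing from the job counts as a violation are all assumptions made for the example.

```python
import re

# Simplified, hypothetical parser: it recognizes only flat relations of the
# form (attribute op value), not the full RSL grammar.
_RELATION = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*([^()]+?)\s*\)")


def parse_relations(rsl):
    """Return a list of (attribute, operator, value) triples."""
    return [(attr.lower(), op, value.strip('"'))
            for attr, op, value in _RELATION.findall(rsl)]


def _compare(actual, op, expected):
    # Compare numerically when both sides are integers, otherwise as strings.
    try:
        actual, expected = int(actual), int(expected)
    except ValueError:
        pass
    return {
        "=": actual == expected,
        "!=": actual != expected,
        "<": actual < expected,
        "<=": actual <= expected,
        ">": actual > expected,
        ">=": actual >= expected,
    }[op]


def job_permitted(job_rsl, policy_rsl):
    """True if the job description satisfies every relation in the policy.

    Assumption of this sketch: an attribute that the policy constrains but
    the job does not specify is treated as a violation.
    """
    job = {attr: value for attr, _, value in parse_relations(job_rsl)}
    for attr, op, expected in parse_relations(policy_rsl):
        if attr not in job or not _compare(job[attr], op, expected):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical policy: only /bin/sim may run, on at most 4 processors.
    policy = "&(executable=/bin/sim)(count<=4)"
    print(job_permitted("&(executable=/bin/sim)(count=2)", policy))
    print(job_permitted("&(executable=/bin/sim)(count=8)", policy))
```

Because the job and the policy go through the same parser, the check reduces to a per-attribute comparison; a standard policy language would need a translation step between the two representations.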

\subsection{Relevance to Other Systems}

Our work is also relevant to systems similar to the Globus Toolkit.
For example, Legion authorization is
implemented by the use of a MayI [20] method on all Legion objects. In
the default implementation, this method offers functionality similar
to that of the Globus Toolkit, with access control lists and static mapping to
local accounts. Our work could be integrated with Legion in a manner
similar to that described here, through the reimplementation of object
creation routines (Legion's equivalent of GRAM) and the MayI method.
Condor's [21] interface, on the other hand, is based more on compute
resources than on instantiated jobs. Although it also uses access
control lists to manage its policy, it does not provide the per-job
interface of GRAM and Legion. This makes our work less relevant to that system.

\section{TOWARD GT3: FUTURE DIRECTIONS}

To address the open issues summarized above, we are developing an
architecture building on abstractions and mechanisms defined as part
of the Open Grid Services Infrastructure (OGSI) [17]. The key to the
policy enforcement questions is the implementation of an abstraction
that would allow for dynamic creation and management of a local
protection environment (such as a Unix account, a sandbox [9,15,16], or a
virtual machine [18,19]). Such an abstraction would not only provide
protection but also facilitate resource management (by enforcing
limits on resource usage for a particular user) and maintain state
associated with its owner. We will call such an abstraction a {\em
dynamic session}.

The OGSI abstractions of Grid Service and Grid Service factory are
well suited to implementing such an abstraction. Representing
a session as a Grid service will provide uniform management
capabilities across different technologies that could be used to
implement sessions. Standardizing session creation alleviates the
administrative burden involved in adding users to a VO, and it also
allows session creation based on rights granted to a particular user
at a specific time. To manage dynamic sessions, we can
leverage the OGSI Service Data Element (SDE) mechanism to make the
properties of a session (such as its termination time) accessible to
the session owner and modifiable by him or her.

In a typical interaction, a user requests a session with certain
properties (i.e., resource requirements) from a session factory. The
factory authorizes the request and, on success, creates a dynamic
session service and a local protection environment corresponding to
it. As part of the creation process, a policy defining sharing rights
for the session is written. This policy can be modified by authorized
entities during the service's lifetime, as can other session
properties. The user can submit jobs against that session, pending
conformance with the rights just created. Further, to facilitate
management of sessions that do not have to be reused multiple times
(i.e., do not preserve state between uses), the
resource management service can obtain sessions based on credentials
presented by the user requesting job submission or credentials of the
resource manager itself.
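The interaction described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an OGSI or Globus Toolkit interface: the class names \texttt{SessionFactory} and \texttt{DynamicSession}, the property \texttt{max\_cpus}, the rights \texttt{submit} and \texttt{manage}, and the rule that only the session owner may change the session are all hypothetical, and the protection environment is reduced to a placeholder string.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicSession:
    """Hypothetical stand-in for a dynamic session service."""
    owner: str
    properties: dict   # e.g. {"max_cpus": 4, "termination_time": 3600}
    environment: str   # placeholder for a Unix account, sandbox, or VM
    sharing_policy: dict = field(default_factory=dict)  # user -> set of rights

    def _require_owner(self, requester):
        # Simplification: only the owner counts as an authorized entity.
        if requester != self.owner:
            raise PermissionError(f"{requester} may not modify this session")

    def set_property(self, requester, name, value):
        self._require_owner(requester)
        self.properties[name] = value

    def grant(self, requester, user, right):
        self._require_owner(requester)
        self.sharing_policy.setdefault(user, set()).add(right)

    def submit(self, user, cpus):
        # A job is accepted only if it conforms to the session's rights.
        if "submit" not in self.sharing_policy.get(user, set()):
            raise PermissionError(f"{user} holds no submit right")
        if cpus > self.properties.get("max_cpus", 0):
            raise PermissionError("request exceeds session limits")
        return f"job accepted for {user} in {self.environment}"


class SessionFactory:
    """Hypothetical factory that authorizes requests and creates sessions."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self._created = 0

    def create_session(self, user, properties):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not create sessions")
        self._created += 1
        session = DynamicSession(owner=user, properties=dict(properties),
                                 environment=f"env-{self._created}")
        # The sharing policy is written as part of session creation.
        session.sharing_policy[user] = {"submit", "manage"}
        return session


if __name__ == "__main__":
    factory = SessionFactory(["alice"])
    session = factory.create_session("alice", {"max_cpus": 4})
    print(session.submit("alice", 2))
    session.grant("alice", "bob", "submit")  # owner extends the policy
    print(session.submit("bob", 1))
```

Keeping the sharing policy and the other properties on the session object mirrors the idea that both are exposed and modified through the same service interface during the session's lifetime.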

\section{CONCLUSIONS}

We have described the design and implementation of an authorization
system allowing for enforcement of fine-grained policies and VO-wide
management of remote jobs. To implement this design, we have proposed
changes to the Globus Toolkit GRAM design and have designed a policy
language suitable for our needs. We are planning to use the same
mechanism to provide pluggable authorization in other components of
the Globus Toolkit.

\acks

We are pleased to acknowledge contributions to this work by Mary
Thompson of LBNL.

\section{\bf REFERENCES}

\begin{enumerate}

\item Foster, I., C. Kesselman, and S. Tuecke, The Anatomy of the
Grid: Enabling Scalable Virtual Organizations. International Journal
of High Performance Computing Applications, 2001. 15(3): p. 200-222.

\item Keahey, K., T. Fredian, Q. Peng, D.P. Schissel, M. Thompson,
I. Foster, M. Greenwald, and D. McCune, Computational Grids in
Action: the National Fusion Collaboratory. Future Generation
Computing Systems (to appear), October 2002. 18(8): p. 1005-1015.

\item Czajkowski, K., I. Foster, N. Karonis, C. Kesselman, S. Martin,
W. Smith, and S. Tuecke, A Resource Management Architecture for
Metacomputing Systems, in 4th Workshop on Job Scheduling Strategies
for Parallel Processing. 1998, Springer-Verlag. p. 62-82.

\item Thompson, M., W. Johnston, S. Mudumbai, G. Hoo, K. Jackson, and
A. Essiari, Certificate-based Access Control for Widely Distributed
Resources, in Proc. 8th Usenix Security Symposium. 1999.

\item Butler, R., D. Engert, I. Foster, C. Kesselman, S. Tuecke,
J. Volmer, and V. Welch, Design and Deployment of a National-Scale
Authentication Infrastructure. IEEE Computer, 2000. 33(12): p. 60-66.

\item Dynamic Accounts. http://www.gridpp.ac.uk/gridmapdir/.

\item Pearlman, L., V. Welch, I. Foster, C. Kesselman, and
S. Tuecke, A Community Authorization Service for Group
Collaboration, in IEEE Workshop on Policies for Distributed Systems
and Networks. 2002.

\item Lorch, M. and D. Kafura, Supporting Secure Ad-hoc User
Collaboration in Grid Environments, in Proceedings of the 3rd
Int. Workshop on Grid Computing - Grid 2002, Baltimore, MD, USA. 2002.

\item Chang, F., A. Itzkovitz, and V. Karamcheti, User-level
Resource-constrained Sandboxing. Proceedings of the USENIX Windows
Systems Symposium (previously USENIX-NT), 2000.

\item Foster, I., C. Kesselman, J. Nick, and S. Tuecke, The Physiology
of the Grid: An Open Grid Services Architecture for Distributed
Systems Integration. Open Grid Service Infrastructure WG,
Global Grid Forum, 2002.

\item OASIS eXtensible Access Control Markup Language (XACML)
Committee Specification 1.0 (Revision
1). http://www.oasis-open.org/committees/xacml/docs/s-xacml-specification-1.0-1.doc,
2002.

\item XrML. http://www.xrml.org/get\_XrML.asp.

\item Hacker, T. and B. Athey, A Methodology for Account Management in
Grid Computing Environments. Proceedings of the 2nd International
Workshop on Grid Computing, 2001.

\item Kapadia, N. H., R. J. Figueiredo, and J. Fortes, Enhancing the
Scalability and Usability of Computational Grids via Logical User
Accounts and Virtual File Systems, in 10th Heterogeneous Computing
Workshop. 2001. San Francisco, California.

\item Bosilca, G., F. Capello, A. Djilali, G. Fedak, T. Hernault, and
F. Magniette, Performance Evaluation of Sandboxing Techniques for
Peer-to-Peer Computing.

\item Goldberg, I., D. Wagner, R. Thomas, and E. Brewer, A Secure
Environment for Untrusted Helper Applications --- Confining the Wily
Hacker, in Proc. 1996 USENIX Security Symposium. 1996.

\item Tuecke, S., K. Czajkowski, I. Foster, J. Frey, S. Graham, and
C. Kesselman, Grid Service Specification. 2003: Open Grid Service
Infrastructure WG, Global Grid Forum.

\item VMware: http://www.vmware.com/.

\item User Mode Linux (UML). http://user-mode-linux.sourceforge.net/.

\item Humphrey, M., F. Knabe, A. Ferrari, and A. Grimshaw,
Accountability and Control of Process Creation in Metasystems. 2000
Network and Distributed System Security Symposium, 2000.

\item ``Condor Version 6.4.7 Manual: Security In Condor'', \\
http://www.cs.wisc.edu/condor/manual/v6.4/3\_7Security\_In.html, 2003.

\end{enumerate}

\end{document}