Glossary

A-B-C- D-E-F-G-H-I- J-K-L-M-N-O- P-Q-R-S-T-U- V-W-X-Y-Z

A

application - a set of software that a researcher runs on a computer to answer a research question. Grid applications typically fall into two categories: (1) simulations that model a natural process and are used by researchers to prepare for or compare with data, and (2) analysis programs for experimental data. An application executes computational jobs and/or manages data.

authentication - the process of verifying that an entity (a person, computer program, data packet, etc.) is who or what it claims to be. The middleware performs this step when a job or process starts; it is essential for ensuring security across the grid.

authorization - the act of giving permission to a user or a process to access a resource. It is an important post-authentication step performed by the middleware for ensuring security across the grid.
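The order of the two steps (authentication first, then authorization) can be illustrated with a toy Python sketch. Everything here is hypothetical: the in-memory tables, the shared-secret check and the function names are assumptions made for illustration, and real grid middleware relies on digital certificates rather than shared secrets.

```python
import hmac

# Hypothetical in-memory stores, for illustration only.
SECRETS = {"alice": "s3cret"}                 # who each user can prove to be
PERMISSIONS = {"alice": {"submit_job"}}       # what each user may do

def authenticate(user: str, secret: str) -> bool:
    """Verify that the caller is who they claim to be."""
    expected = SECRETS.get(user)
    return expected is not None and hmac.compare_digest(expected, secret)

def authorize(user: str, action: str) -> bool:
    """Check whether an already-authenticated user may perform the action."""
    return action in PERMISSIONS.get(user, set())

def handle_request(user: str, secret: str, action: str) -> str:
    # Authentication always comes first; authorization is only
    # meaningful once the identity has been verified.
    if not authenticate(user, secret):
        return "rejected: identity not verified"
    if not authorize(user, action):
        return "rejected: not permitted"
    return "accepted"

print(handle_request("alice", "s3cret", "submit_job"))   # accepted
print(handle_request("alice", "wrong", "submit_job"))    # rejected: identity not verified
print(handle_request("alice", "s3cret", "delete_data"))  # rejected: not permitted
```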

B

bandwidth - the capacity of a network or other communication channel for transferring data, measured in bps.

bit (binary digit) - the basic unit of data quantity or digital storage. It takes a value of 0 or 1. Note that network people speak of bits and storage people speak of bytes.

bps - bits per second (It takes multipliers, for instance Mbps is "million bits per second.")

Bps - (with uppercase B) bytes per second

byte - a unit of data quantity or computer storage, typically taken as an ordered collection of eight bits. A thousand bytes is called a kilobyte, a million bytes is a megabyte, and so on (in some contexts, kilo = 1024 and mega = 1,048,576). Note that network people speak of bits and storage people speak of bytes.
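The relationship between bits, bytes and the two kinds of multiplier can be sketched in Python. The helper names below are illustrative assumptions, not part of any grid toolkit.

```python
# Illustrative helpers; the names are hypothetical.
BITS_PER_BYTE = 8

def bps_to_Bps(bits_per_second: float) -> float:
    """Convert a rate in bits per second (bps) to bytes per second (Bps)."""
    return bits_per_second / BITS_PER_BYTE

def kilobytes(n_bytes: int, binary: bool = False) -> float:
    """Express a byte count in kilobytes, using kilo = 1000 or kilo = 1024."""
    return n_bytes / (1024 if binary else 1000)

# A 100 Mbps link moves at most 12.5 million bytes per second.
print(bps_to_Bps(100e6))               # 12500000.0
print(kilobytes(2048))                 # 2.048
print(kilobytes(2048, binary=True))    # 2.0
```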

C

certificate (or digital certificate) - a digitally signed statement from a trusted third party (see certificate authority) that associates a public encryption key with a name. It is used to verify an identity on a computer or over a network before allowing access to a resource. The named entity (a person, computer or service) should hold the corresponding private key and use it to authenticate as the named identity.

certificate authority (CA) - a trusted third-party entity, often a government agency or commercial enterprise, that issues digital certificates for use by other parties.  

cloud - a collection of computers, usually owned by a single party, connected together such that users can lease access to a share of their combined power. Clouds are dynamically scalable (i.e., you get as much as you need when you need it), often distributed and sometimes virtualized.

cluster - a networked group of compute and/or storage nodes at a site.

compute element (CE) - a term used in grids to denote any kind of computing interface, e.g., a job entry or batch system. A compute element consists of one or more similar machines, managed by a single scheduler/job queue, which is set up to accept and run grid jobs. The machines do not need to be identical, but must have the same operating system and the same processor architecture.

core (also dual-core, quad-core, multi-core) - a core is a CPU; a multi-core processor combines two (dual-core), four (quad-core) or more independent cores onto a single integrated circuit. 

CPU (central processing unit) - a microprocessor (a processor on an integrated circuit) inside a computer that can execute computer programs.  

CPU-hour (also CPU-day) - analogous to a "man-hour" (or "man-day"), it is the work (or utilization) of one CPU for one hour (day). A common metric for computer processing.
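As a small worked example, the CPU-hour arithmetic can be written in Python. The function names are illustrative assumptions.

```python
def cpu_hours(n_cpus: int, wall_clock_hours: float) -> float:
    """Work done when n_cpus are each kept busy for wall_clock_hours."""
    return n_cpus * wall_clock_hours

def cpu_days(n_cpus: int, wall_clock_hours: float) -> float:
    """The same quantity expressed in CPU-days (24 CPU-hours each)."""
    return cpu_hours(n_cpus, wall_clock_hours) / 24

# 50 CPUs running for 12 hours consume 600 CPU-hours, or 25 CPU-days.
print(cpu_hours(50, 12))  # 600
print(cpu_days(50, 12))   # 25.0
```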

D

distributed computing - a model of computing in which hardware and software systems contain multiple processing and/or storage elements that are connected over a network and integrated in some fashion. The purpose is to connect users, applications and resources in a transparent, open and scalable way, and provide more computing and storage capacity to users.

E

embarrassingly parallel - straightforward to separate into independent, parallel tracks. An "embarrassingly parallel" application is one that can be divided into a number of tasks that can run concurrently and have neither dependencies nor communication between them during execution. (Sometimes called "pleasantly" or "pleasingly" parallel.)
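A minimal Python sketch of this kind of workload follows. The `simulate` function is a hypothetical stand-in for one independent task, and a thread pool is used only to keep the example self-contained; on a grid, each task would more likely be submitted as a separate job.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed: int) -> int:
    """Stand-in for one independent task (hypothetical workload)."""
    # Each task depends only on its own input: no shared state, no messages.
    return sum((seed * k) % 7 for k in range(1000))

seeds = list(range(8))

# Run the tasks concurrently; the order in which they execute does not matter.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(simulate, seeds))

# Because the tasks are independent, running them one by one gives the same answer.
serial_results = [simulate(s) for s in seeds]
print(parallel_results == serial_results)  # True
```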

F

FLOPS - floating-point operations per second (similar to instructions per second), a measure of a computer's performance. GigaFLOPS, teraFLOPS and petaFLOPS are some commonly used multiples (10^9, 10^12, and 10^15, respectively).

G

gatekeeper - a computer that coordinates authentication and authorization at a grid site. It often functions as a doorway for other services as they reach the site, such as job submission, monitoring or data transfer. A gatekeeper provides a standard interface to interact with a set of networked resources at a site. Also known as a head node.

gLite - a grid middleware toolkit that provides a framework for building grid applications, tapping into the power of distributed computing and storage resources across the Internet. It is used by EGEE.

Globus Alliance - a community of organizations and individuals developing fundamental technologies behind the grid.



Globus Toolkit - an open source software toolkit used for building grid systems and applications. It is being developed by the Globus Alliance and many others all over the world.

GPU - Graphics Processing Unit; a device that renders graphics for a computer. GPUs have a highly parallel structure that makes them more effective than general-purpose CPUs for some complex processing tasks.

grid computing - a model of computing in which users harness computing power from resources owned by many different institutions and organizations that have elected to make them available to others. These resources may be scattered over a wide geographic area, even globally, and comprise processing power, data storage capacity, sensors, visualization tools and more. Grid computing brings these resources, thousands of them in some cases, together into a common, shared infrastructure, linked over networks via a common set of middleware, to create a massively powerful computing resource that is accessible from the comfort of a personal computer and useful for compute-intensive or data-intensive applications in science, the humanities, business and beyond. (Adapted from the GridTalk definition)

Grid Security Infrastructure (GSI) - a Globus authorization and authentication system that includes PKI and in which users get a time-limited proxy certificate to run their jobs.

H

heterogeneous resources - disparate computing resources, both conventional and specialized, and/or using different types of processors. In the context of grid, these are deployed in a way to work cooperatively.

HPC (High Performance Computing) - sometimes used as a synonym for supercomputing, HPC provides high speed tera- and peta-scale computing via multiple processors, sometimes thousands, harnessed together via fast communications pipelines and cluster software that work together as if they were one big machine.

HTC (High Throughput Computing) - computing environments that deliver large amounts of processing capacity over long periods of time (floating point operations per month rather than per second).  In HTC, sustained capacity to process large amounts of data is paramount, in contrast to speed in HPC.
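To make the "per month rather than per second" framing concrete, here is a small Python sketch that converts a sustained rate into operations delivered over a month. The 30-day month, the `availability` parameter and the function name are assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def operations_per_month(sustained_flops: float,
                         days: int = 30,
                         availability: float = 1.0) -> float:
    """Floating-point operations delivered over a month at a sustained rate.

    availability is the (assumed) fraction of the month the resources
    were actually doing useful work.
    """
    return sustained_flops * SECONDS_PER_DAY * days * availability

# One teraFLOPS sustained for a full 30-day month.
total = operations_per_month(1e12)
print(f"{total:.3e}")  # 2.592e+18
```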

I

information provider (IP) - software that interfaces to any data collection service, collects virtually any type of data it's asked to, and communicates the information for publishing to the grid. This service helps with efficient job allocation and processing.

infrastructure as a service (IaaS) - the delivery of infrastructure, including data and processing power, as an outsourced service. The term first came into use in late 2006.

J

job - an executable set of code; an executable to be submitted to run on grid resources. It is an invocation of an application, or of part of an application.

job manager - a Globus term that refers to a program used to manage jobs at a grid site (e.g., LSF, PBS, Condor).

M

many core - multi-core with a high number of cores (in the thousands).

message passing - an approach to parallel computing where data and work are divided across processors (striving for optimal load balancing) and communication between them is managed by explicitly calling specific library functions (striving for minimal and non-blocking communication). The de facto standard for message passing is Message Passing Interface (MPI).
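The pattern can be sketched in Python using only the standard library. This is not MPI: threads and `queue.Queue` objects stand in for processors and for the library's send and receive calls, and all names are illustrative assumptions.

```python
import queue
import threading

def worker(rank: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive a chunk of data, compute a partial sum, and send it back."""
    chunk = inbox.get()               # explicit receive
    outbox.put((rank, sum(chunk)))    # explicit send

data = list(range(100))
n_workers = 4

# Divide the data evenly across the workers (simple load balancing).
chunks = [data[i::n_workers] for i in range(n_workers)]

results = queue.Queue()
inboxes = [queue.Queue() for _ in range(n_workers)]
threads = [
    threading.Thread(target=worker, args=(rank, inboxes[rank], results))
    for rank in range(n_workers)
]
for t in threads:
    t.start()

# The coordinator sends each worker its share of the data.
for rank in range(n_workers):
    inboxes[rank].put(chunks[rank])
for t in threads:
    t.join()

# Collect one partial result per worker and combine them.
total = sum(results.get()[1] for _ in range(n_workers))
print(total)  # 4950
```

The workers share no variables; all data reaches them, and all results leave them, as explicit messages.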

middleware - software components of general utility, provided by the grid or VO. Middleware is both the glue for the grid and the brains of the grid; it makes the independent and potentially heterogeneous resources of the grid interoperate.

multi-core - see core

N

node - a commonly used word for a single computer or single CPU.

O

on-the-fly - dynamic; a process that happens on-the-fly, or dynamically, is invoked at the point in the workflow when its function will be used, in contrast to being set up ahead of time.

Open Grid Forum (OGF) - an open community committed to driving the rapid evolution and adoption of applied distributed computing.  www.ogf.org/

P

parallel computing - a type of computing where a job is split into many smaller ones, and they can execute concurrently (i.e., in parallel).

performance - refers to how fast a system does work; used for networks, CPUs, clusters, and more.  Examples of metrics to measure performance include bps and FLOPS (and their multiples, e.g., petaFLOPS).

persistency - a persistent computing system (ideally) functions continuously even when it needs to be maintained, upgraded, or reconfigured, or it is attacked.  (adapted from the Department of Information and Computer Sciences, Saitama University)

petabyte (PB) - a unit of data quantity or computer storage equal to 10^15 bytes

pleasantly parallel - see embarrassingly parallel

public key infrastructure (PKI) - enables users of a public network such as the Internet to securely and privately exchange data through the use of a public and a private cryptographic key pair that is obtained and shared through a certificate authority. The public key infrastructure provides for a digital certificate that can identify an individual or an organization and directory services that can store and, when necessary, revoke the certificates. (adapted from searchsecurity.com)

R

resource - an entity that is available through the grid for use by researchers, typically a machine providing CPU cycles or storage capacity. 

S

scalability - the ability of an application or a resource to handle increasing amounts of data, calculations, or other work that increases its load, without losing efficiency.
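One common way to quantify "without losing efficiency" is through speedup and parallel efficiency. The Python sketch below uses the standard definitions (speedup is serial time divided by parallel time; efficiency is speedup divided by the number of workers); the timings are made-up numbers for illustration.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_workers

# Hypothetical timings: 120 s on one worker, 20 s on eight workers.
print(speedup(120.0, 20.0))        # 6.0
print(efficiency(120.0, 20.0, 8))  # 0.75
```

An efficiency that stays close to 1.0 as workers are added indicates good scalability; a value that drops quickly indicates growing overhead.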

service - an application running on a grid and providing an essential unit of work to a job.

software as a service (SaaS) - a model whereby software is provided as a service. There are two primary methods of SaaS:

  1. The software is hosted on the provider's web server, and accessed through a web portal, such as a science gateway.
  2. The user can download the application, but it is remotely disabled when the license expires.

storage element - the interface through which grid components communicate with a storage unit.

supercomputing - see HPC (High Performance Computing)

T

terabyte (TB) - a unit of data quantity or computer storage equal to 10^12 bytes

throughput - relates to data flow rates; it is used to describe the average rate of successful data or message delivery over a communication channel. Measured in bps (or its multiples).
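The difference between throughput (what was actually delivered) and bandwidth (the channel's capacity) can be sketched in Python. The function names and the example figures are assumptions made for illustration.

```python
def throughput_bps(bytes_delivered: int, seconds: float) -> float:
    """Average rate of successful delivery, in bits per second."""
    return bytes_delivered * 8 / seconds

def utilization(throughput: float, bandwidth: float) -> float:
    """Fraction of the channel's capacity that was actually used."""
    return throughput / bandwidth

# 750 million bytes delivered in 60 seconds over a 1 Gbps link.
rate = throughput_bps(750_000_000, 60)
print(rate)                    # 100000000.0 (100 Mbps)
print(utilization(rate, 1e9))  # 0.1
```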

tier-n (where n = 0, 1, 2, ...) - a hierarchical level for a site within a grid that implements such a scheme among its sites. Tiers are typically distinguished by the services they provide. Often the lower-numbered tiers (e.g., 0 and 1) provide generic resources for user analysis, simulation or stand-alone applications requiring the least services. Higher-numbered tiers (e.g., 2 and 3) would then provide higher-level services or large amounts of resources (database services, permanent storage, archiving capability, and/or a large fraction of the experimental processing power). The tier-n sites are interconnected via an agreement defined by the VOs they serve.

U

UNICORE (Uniform Interface to Computing Resources) - a middleware toolkit that offers a ready-to-run grid system including client and server software. UNICORE makes distributed computing and data resources available in a seamless and secure way in intranets and the internet.

V

Virtual Data Toolkit (VDT) - an ensemble of grid software that can be easily installed and configured. The VDT is a product of the Open Science Grid (OSG), which uses the VDT as its grid middleware distribution.

Virtual Organization (VO) - a dynamic collection of users, resources and services that enables sharing of resources. Also, a participating organization in a grid to which grid end users must be registered and authenticated in order to gain access to the grid's resources. A VO must establish resource-usage agreements with grid resource providers. Members of a VO may come from many different home institutions, may have in common only a general interest or goal (e.g., work on the same experiment), and may communicate and coordinate their work solely through information technology (hence the term virtual).

virtualization - a technique that uses software on a host computer to simulate the existence of another, independent computer environment (also called a virtual machine) on that same host.

W

worker nodes - the hardware (compute resources) on which jobs run. Typically, they are controlled by a gatekeeper.
