application - a set of software that a researcher runs on a computer to answer a research question. Grid applications typically fall into two categories: (1) simulations that model a natural process and are used by researchers to prepare for or compare to data, and (2) analysis programs for experimental data. An application executes computational jobs and/or manages data.
authentication - the process of verifying that an entity (a person, computer program, data packet, etc.) is who or what the entity claims to be. This is an important step, performed by the middleware when a job or process starts, for ensuring security across the grid.
bandwidth - the capacity of a network or other communication channel for transferring data, measured in bps.
bit (binary digit) - the basic unit of data quantity or digital storage. It takes a value of 0 or 1. Note that network people speak of bits and storage people speak of bytes.
bps - bits per second (It takes multipliers, for instance Mbps is "million bits per second.")
Bps - (with uppercase B) bytes per second
byte - a unit of data quantity or computer storage; typically taken as an ordered collection of eight bits. A thousand bytes is called a kilobyte, a million bytes is a megabyte, and so on (in some worlds, kilo=1024 and mega=1048576). Note that network people speak of bits and storage people speak of bytes.
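As a rough illustration of why the bit/byte distinction matters, the sketch below estimates how long a file (sized in bytes) takes to cross a link (rated in bits per second). The function and constant names are hypothetical, chosen only for this example, and the calculation ignores protocol overhead.

```python
# Illustrative helper names; not part of any grid toolkit.
BITS_PER_BYTE = 8

MEGABYTE_DECIMAL = 10**6   # "mega" as 1,000,000 bytes
MEGABYTE_BINARY = 2**20    # "mega" as 1,048,576 bytes

def bytes_to_bits(n_bytes):
    """Convert a quantity in bytes to bits."""
    return n_bytes * BITS_PER_BYTE

def transfer_time_seconds(size_bytes, rate_bps):
    """Idealized seconds needed to move size_bytes over a link rated in bits per second."""
    return bytes_to_bits(size_bytes) / rate_bps

# A 100 MB (decimal) file over a 100 Mbps link takes 8 seconds, not 1.
print(transfer_time_seconds(100 * MEGABYTE_DECIMAL, 100 * 10**6))  # 8.0
```

The same file measured with binary megabytes is slightly larger, so the estimate grows by about 5 percent.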
certificate (or digital certificate) - a digitally signed statement from a trusted third party (see certificate authority) that associates a public encryption key with a name. It is used to verify an identity on a computer or over a network before allowing access to a resource. The named entity (a person, computer or service) should hold the corresponding private key and use it to authenticate as the named identity.
certificate authority (CA) - a trusted third-party entity, often a government agency or commercial enterprise, that issues digital certificates for use by other parties.
cloud - a collection of computers, usually owned by a single party, connected together such that users can lease access to a share of their combined power. Clouds are dynamically scalable (i.e., you get as much as you need when you need it), often distributed and sometimes virtualized.
cluster - a networked group of compute and/or storage nodes at a site.
compute element (CE) - a term used in grids to denote any kind of computing interface, e.g., a job entry or batch system. A compute element consists of one or more similar machines, managed by a single scheduler/job queue, which is set up to accept and run grid jobs. The machines do not need to be identical, but must have the same operating system and the same processor architecture.
core (also dual-core, quad-core, multi-core) - a core is a CPU; a multi-core processor combines two (dual-core), four (quad-core) or more independent cores onto a single integrated circuit.
CPU (central processing unit) - a microprocessor (a processor on an integrated circuit) inside a computer that can execute computer programs.
CPU-hour (also CPU-day) - analogous to a "man-hour" (or "man-day"), it is the work (or utilization) of one CPU for one hour (day). A common metric for computer processing.
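A minimal sketch of the arithmetic behind this metric, assuming every CPU is busy for the whole wall-clock period. The function names are hypothetical.

```python
def cpu_hours(n_cpus, wall_hours):
    """Work done by n_cpus CPUs, each fully used for wall_hours hours."""
    return n_cpus * wall_hours

def cpu_days(n_cpus, wall_hours):
    """The same quantity expressed in CPU-days (24 CPU-hours each)."""
    return cpu_hours(n_cpus, wall_hours) / 24

# 200 CPUs running for 12 hours consume 2400 CPU-hours, i.e. 100 CPU-days.
print(cpu_hours(200, 12), cpu_days(200, 12))
```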
distributed computing - a model of computing in which hardware and software systems contain multiple processing and/or storage elements that are connected over a network and integrated in some fashion. The purpose is to connect users, applications and resources in a transparent, open and scalable way, and provide more computing and storage capacity to users.
embarrassingly parallel - straightforward to separate into independent, parallel tracks. An "embarrassingly parallel" application is one that can be divided into a number of tasks that can run concurrently and have neither dependencies nor communication between them during execution. (Sometimes called "pleasantly" or "pleasingly" parallel.)
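The pattern can be sketched as follows: each task depends only on its own input, so the tasks can be handed to a pool of workers in any order. This is a sketch, not a grid submission. `simulate` is a hypothetical stand-in for one independent job. It uses a Python thread pool for simplicity, whereas real CPU-bound work would more likely run as separate processes or grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Hypothetical stand-in for one independent task (e.g. one simulation run)."""
    # Uses only its own input: no shared state, no messages to other tasks.
    return sum((seed * k) % 7 for k in range(1000))

def run_all(seeds, workers=4):
    """Run every task concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))

results = run_all(range(8))
print(len(results))  # 8
```

Because no task waits on another, adding workers changes only how long the batch takes, not the results.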
FLOPS - floating point operations per second (similar to instructions per second), a measure of a computer's performance. GigaFLOPS, teraFLOPS and petaFLOPS are some commonly used multiples (10^9, 10^12, and 10^15, respectively).
gatekeeper - a computer that coordinates authentication and authorization at a grid site. It often functions as a doorway for other services as they reach the site, such as job submission, monitoring or data transfer. A gatekeeper provides a standard interface to interact with a set of networked resources at a site. Also known as a head node.
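To give a feel for these multiples, the sketch below computes the idealized time to finish a fixed number of floating point operations at a sustained rate. The names are hypothetical, and real machines rarely sustain their peak rate.

```python
GIGA, TERA, PETA = 10**9, 10**12, 10**15

def seconds_for(operations, flops):
    """Idealized run time for a given operation count at a sustained FLOPS rate."""
    return operations / flops

# 10^15 operations take 1000 s at one teraFLOPS and 1 s at one petaFLOPS.
print(seconds_for(10**15, TERA), seconds_for(10**15, PETA))
```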
gLite - a grid middleware toolkit that provides a framework for building grid applications, tapping into the power of distributed computing and storage resources across the Internet. It is used by EGEE.
Globus Alliance - a community of organizations and individuals developing fundamental technologies behind the grid.
Globus Toolkit - an open source software toolkit used for building grid systems and applications. It is being developed by the Globus Alliance and many others all over the world.
GPU - Graphics Processing Unit; a device that renders graphics for a computer. GPUs have a highly parallel structure that makes them more effective than general-purpose CPUs for some complex processing tasks.
grid computing - in grid computing, users harness computing power from resources owned by many different institutions and organizations that have elected to make them available to others. These resources may be scattered over a wide geographic area (even globally) and comprise processing power, data storage capacity, sensors, visualization tools and more. Grid computing brings these resources, thousands of them in some cases, together into a common, shared infrastructure, linked over networks via a common set of middleware, to create a massively powerful computing resource that is accessible from the comfort of a personal computer and useful for compute-intensive or data-intensive applications in science, the humanities, business and beyond. (Adapted from the GridTalk definition)
heterogeneous resources - disparate computing resources, both conventional and specialized, and/or using different types of processors. In the context of grid, these are deployed in a way to work cooperatively.
HPC (High Performance Computing) - sometimes used as a synonym for supercomputing, HPC provides high speed tera- and peta-scale computing via multiple processors, sometimes thousands, harnessed together via fast communications pipelines and cluster software that work together as if they were one big machine.
HTC (High Throughput Computing) - computing environments that deliver large amounts of processing capacity over long periods of time (floating point operations per month rather than per second). In HTC, sustained capacity to process large amounts of data is paramount, in contrast to speed in HPC.
information provider (IP) - software that interfaces to any data collection service, collects virtually any type of data it's asked to, and communicates the information for publishing to the grid. This service helps with efficient job allocation and processing.
infrastructure as a service (IaaS) - the delivery of infrastructure, including data and processing power, as an outsourced service. The term first came into use in late 2006.
many core - multi-core with a high number of cores (in the thousands).
message passing - an approach to parallel computing where data and work are divided across processors (striving for optimal load balancing) and communication between them is managed by explicitly calling specific library functions (striving for minimal and non-blocking communication). The de facto standard for message passing is Message Passing Interface (MPI).
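The idea can be sketched without MPI. The example below is not MPI code: it uses Python's standard `queue` and `threading` modules, with hypothetical function names, to show work being divided into chunks, sent to workers as explicit messages, and gathered back as partial results.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive chunks as messages, send back one partial sum per chunk."""
    while True:
        chunk = inbox.get()
        if chunk is None:        # sentinel message: no more work
            break
        outbox.put(sum(chunk))

def parallel_sum(data, n_workers=4):
    """Sum data by scattering chunks to workers and gathering partial sums."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Divide the data into roughly equal chunks (simple load balancing).
    chunks = [data[i::n_workers] for i in range(n_workers)]
    for chunk in chunks:
        inbox.put(chunk)
    for _ in threads:
        inbox.put(None)          # one stop message per worker
    total = sum(outbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return total

print(parallel_sum(list(range(1001))))  # 500500
```

The workers share no variables with the caller; all data and results move through the two queues, which is the defining feature of the approach.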
middleware - software components of general utility, provided by the grid or VO. Middleware is both the glue for the grid and the brains of the grid; it makes the independent and potentially heterogeneous resources of the grid interoperate.
multi-core - see core
node - a commonly used word for a single computer or single CPU.
on-the-fly - dynamic; a process that happens on-the-fly, or dynamically, is invoked at the point in the workflow when its function will be used, in contrast to being set up ahead of time.
Open Grid Forum (OGF) - an open community committed to driving the rapid evolution and adoption of applied distributed computing. www.ogf.org/
parallel computing - a type of computing where a job is split into many smaller ones, and they can execute concurrently (i.e., in parallel).
performance - refers to how fast a system does work; used for networks, CPUs, clusters, and more. Examples of metrics to measure performance include bps and FLOPS (and their multiples, e.g., petaFLOPS).
persistency - a persistent computing system (ideally) functions continuously even when it needs to be maintained, upgraded, or reconfigured, or it is attacked. (adapted from the Department of Information and Computer Sciences, Saitama University)
petabyte (PB) - a unit of data quantity or computer storage equal to 10^15 bytes
pleasantly parallel - see embarrassingly parallel
public key infrastructure (PKI) - enables users of a public network such as the Internet to securely and privately exchange data through the use of a public and a private cryptographic key pair that is obtained and shared through a certificate authority. The public key infrastructure provides for a digital certificate that can identify an individual or an organization and directory services that can store and, when necessary, revoke the certificates. (adapted from searchsecurity.com)
scalability - the ability of an application or a resource to handle increasing amounts of data, calculations, or other work that increases its load, without losing efficiency.
software as a service (SaaS) - a model whereby software is provided as a service. There are two primary methods of SaaS:
storage element - the interface through which grid components communicate with a storage unit.
supercomputing - see high performance computing
terabyte (TB) - a unit of data quantity or computer storage equal to 10^12 bytes
throughput - relates to data flow rates; it is used to describe the average rate of successful data or message delivery over a communication channel. Measured in bps (or multiples).
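A minimal sketch of how an average throughput figure might be computed and compared with a channel's bandwidth. The function names and the sample numbers are hypothetical.

```python
def throughput_bps(bytes_delivered, seconds):
    """Average rate of successful delivery, in bits per second."""
    return bytes_delivered * 8 / seconds

def utilization(throughput, bandwidth_bps):
    """Fraction of the channel's capacity actually achieved."""
    return throughput / bandwidth_bps

# 125,000,000 bytes delivered in 10 s is 100 Mbps on average.
rate = throughput_bps(125_000_000, 10)
print(rate, utilization(rate, 10**9))
```

Throughput is what a transfer actually achieves; bandwidth is the capacity it could at most reach, so utilization should not exceed 1.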
tier-n (where n=0, 1, 2, ...) - a hierarchical level for a site within a grid that implements such a scheme among its sites. Tiers are typically distinguished by the services they provide. Often the lower number tiers (e.g., 0 and 1) provide generic resources for user analysis, simulation or stand-alone applications requiring the least services. Higher number tiers (e.g., 2 and 3) would then provide higher level services or large amounts of resources (database services, permanent storage, archiving capability, and/or a large fraction of the experimental processing power). The Tier-n sites are interconnected via an agreement defined by the VOs they serve.
UNICORE (Uniform Interface to Computing Resources) - a middleware toolkit that offers a ready-to-run grid system including client and server software. UNICORE makes distributed computing and data resources available in a seamless and secure way in intranets and the internet.
Virtual Data Toolkit (VDT) - an ensemble of grid software that can be easily installed and configured. The VDT is a product of the Open Science Grid (OSG), which uses the VDT as its grid middleware distribution.
Virtual Organization (VO) - a dynamic collection of users, resources and services that enables sharing of resources. Also, a participating organization in a grid to which grid end users must be registered and authenticated in order to gain access to the grid's resources. A VO must establish resource-usage agreements with grid resource providers. Members of a VO may come from many different home institutions, may have in common only a general interest or goal (e.g., work on the same experiment), and may communicate and coordinate their work solely through information technology (hence the term virtual).
virtualization - a technique that uses software on a host computer to simulate the existence of another, independent computer environment (also called a virtual machine) on that same host.