TechEncyclopedia

Storage Virtualization

It's been around for decades in the mainframe world. But recent developments in the storage world have breathed new life into the subject of virtualization.

By Elizabeth Clark


06/04/2003, 3:00 PM ET

While marketspeak and vendor hype have muddied the waters when it comes to defining virtualization, the term refers to the process of representing physical storage in logical form. Specifically, virtualization involves uniting multiple storage devices into a single logical pool. A primary driver behind virtualization is the desire to shield users and administrators from the complexity of the underlying storage environment. Other goals include simplified management, more efficient capacity usage and allocation, and reduced management costs (see "Keeping Storage Costs Under Control," October 2002). While policy-based automation is also on the wish list, it'll likely be some time before true automation of higher-level management functions is widely available.

BLOCKS, FILES, AND BEYOND

Virtualization can be executed at the block and file levels. In storage networking, block-based storage is associated with Fibre Channel SANs, and file-based storage with Network-Attached Storage (NAS).

In a typical block-level virtualization implementation, data blocks are mapped to one or more disks or disk systems. The block addresses may be distributed throughout multiple storage arrays, but appear to the user as residing on a single drive or volume.
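
As a rough illustration of the mapping described above, the following Python sketch concatenates several backing devices into one virtual volume and translates a virtual block number into a device and a block on that device. The class name, device names, and sizes are hypothetical, and simple concatenation is an assumption made for brevity; real products use more elaborate layouts.

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualVolume:
    """Presents several backing devices as one contiguous block range."""

    def __init__(self, devices):
        # devices: list of (name, size_in_blocks); the names are hypothetical.
        self.devices = list(devices)
        # Running totals: ends[i] is the first virtual block past device i.
        self.ends = list(accumulate(size for _, size in self.devices))

    @property
    def size(self):
        return self.ends[-1] if self.ends else 0

    def resolve(self, virtual_block):
        """Translate a virtual block number to (device name, physical block)."""
        if not 0 <= virtual_block < self.size:
            raise IndexError("block outside the virtual volume")
        index = bisect_right(self.ends, virtual_block)
        start = self.ends[index - 1] if index else 0
        return self.devices[index][0], virtual_block - start


volume = VirtualVolume([("array_a", 1000), ("array_b", 500), ("array_c", 2000)])
print(volume.size)           # 3500
print(volume.resolve(0))     # ('array_a', 0)
print(volume.resolve(1200))  # ('array_b', 200)
```

The caller only ever sees block numbers 0 through 3499; which array holds a given block is a detail hidden inside `resolve`.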

In file virtualization, multiple files or objects can be made to appear as a single file. File virtualization provides a layer of abstraction between the files and their physical location. In this approach, a common namespace is created, which enables users to access different files within that namespace without having to alter the pathname. The namespace appears to users as a single large file system.
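
The common namespace idea can be sketched as a lookup table that keeps the user-visible pathname stable while the physical location changes behind it. This is a minimal Python sketch; the class, server names, and paths are all hypothetical and do not describe any particular product.

```python
class GlobalNamespace:
    """Maps stable logical paths to whichever server currently holds the file."""

    def __init__(self):
        self._locations = {}  # logical path -> (server, physical path)

    def publish(self, logical_path, server, physical_path):
        self._locations[logical_path] = (server, physical_path)

    def migrate(self, logical_path, new_server, new_physical_path):
        # Only the mapping changes; the logical path users see stays the same.
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = (new_server, new_physical_path)

    def resolve(self, logical_path):
        return self._locations[logical_path]


ns = GlobalNamespace()
ns.publish("/projects/report.doc", "nas1", "/vol3/report.doc")
before = ns.resolve("/projects/report.doc")
ns.migrate("/projects/report.doc", "nas2", "/vol7/report.doc")
after = ns.resolve("/projects/report.doc")
print(before)  # ('nas1', '/vol3/report.doc')
print(after)   # ('nas2', '/vol7/report.doc')
```

The user asks for `/projects/report.doc` both times; only the table entry behind that name has moved.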

File systems can also be virtualized. A file system applies a workable structure to storage blocks, essentially converting them to objects, or files. File systems can reside on file servers or NAS systems. In file system virtualization, metadata from individual file systems can be combined to form an extended virtual file system. In the case of NAS, users can access files, primarily through the Network File System (NFS) and Common Internet File System (CIFS) protocols, without having to familiarize themselves with the physical or logical aspects of the underlying infrastructure. Ideally, the distributed file system should be able to span multiple NAS devices for scalability. A drawback of file system virtualization is that software agents must be installed on all host systems.

HOSTS WITH THE MOST

Virtualization can be performed at three primary levels: the host level, the storage device level, and the network level.

Host-based virtualization has long been available in the form of logical volume managers. Logical volumes, also referred to as virtual disks, are essentially pointers to physical storage, such as drives or Logical Unit Numbers (LUNs). A LUN is a SCSI-based identifier for a logical unit on a device such as a disk array.

In host-based virtualization, software presents a view to the host server in which disks from multiple storage arrays appear as a single virtual pool. Logical volume managers can eliminate the need to display multiple devices to the user. When storage requirements expand, logical volume managers can perform mapping to free disk space (block aggregation) in a manner that's transparent to users. A primary benefit of this approach is that applications can remain online while file system and volume sizes are adjusted. Also, implementation of host-based virtualization doesn't require the purchase of additional hardware.
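
To make the pointer idea concrete, here is a small Python sketch of a volume manager that pools fixed-size extents from several disks and grows a logical volume by appending free extents. The `VolumeGroup` class, disk names, and extent counts are hypothetical; it is a toy model of the concept, not the behavior of any specific logical volume manager.

```python
class VolumeGroup:
    """Pools fixed-size extents from several disks and carves volumes from them."""

    def __init__(self, disks):
        # disks: mapping of hypothetical disk name -> number of extents it offers.
        self.free = [(disk, n) for disk, count in disks.items() for n in range(count)]
        self.volumes = {}  # volume name -> ordered list of (disk, extent) pointers

    def create(self, name, extents):
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        self.volumes[name] = self._allocate(extents)

    def extend(self, name, extents):
        # Growing only appends pointers; in this model existing extents
        # are never moved, so data already written stays where it is.
        self.volumes[name].extend(self._allocate(extents))

    def _allocate(self, extents):
        if extents > len(self.free):
            raise ValueError("not enough free extents in the pool")
        taken, self.free = self.free[:extents], self.free[extents:]
        return taken


group = VolumeGroup({"disk0": 4, "disk1": 4})
group.create("data", 3)
group.extend("data", 3)  # spills over from disk0 onto disk1
print(group.volumes["data"])
print(len(group.free))   # 2
```

After the `extend` call, the volume spans both disks, yet it is still addressed by the single name `data`.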

On the downside, host-based virtualization can create performance bottlenecks at the server, because the virtualization processing consumes CPU cycles. In addition, virtualization software must be installed on each server. This approach also has limited scalability.

DEFYING LOGIC

Virtualization can also be implemented within devices, such as storage arrays, using virtualization software residing inside the array. This software enables the construction of storage pools across multiple arrays.

With storage-based virtualization, the logical storage units are mapped to the physical devices via algorithms or using a table-based approach. Essentially, volumes become independent of the devices they reside on. Depending on the solution used, storage-based virtualization capabilities can include RAID, mirroring, disk-to-disk replication, and the creation of point-in-time snapshots.
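
The difference between algorithmic and table-based mapping can be shown in a few lines of Python. In this sketch, striping stands in for the algorithmic approach and a dictionary of extents for the table-based one; both choices, along with the function names and sample values, are assumptions made for illustration.

```python
def striped_location(logical_block, device_count, stripe_blocks):
    """Algorithmic mapping: compute the location, store nothing per block."""
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    device = stripe_index % device_count
    physical_block = (stripe_index // device_count) * stripe_blocks + offset
    return device, physical_block


def table_location(logical_block, extent_table, extent_blocks):
    """Table-based mapping: look up the extent, then add the offset within it."""
    extent_index, offset = divmod(logical_block, extent_blocks)
    device, extent_start = extent_table[extent_index]
    return device, extent_start + offset


# Three devices, stripes of four blocks each.
print(striped_location(10, device_count=3, stripe_blocks=4))  # (2, 2)
print(striped_location(13, device_count=3, stripe_blocks=4))  # (0, 5)

# Hypothetical table: logical extent number -> (device name, first physical block).
extent_table = {0: ("disk_b", 400), 1: ("disk_a", 0)}
print(table_location(20, extent_table, extent_blocks=100))    # ('disk_b', 420)
print(table_location(150, extent_table, extent_blocks=100))   # ('disk_a', 50)
```

The algorithmic form needs no stored state but fixes the layout in advance, while the table lets any extent be placed anywhere at the cost of maintaining the table.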

While storage-based virtualization yields favorable results for individual vendors' arrays and is relatively easy to manage, systems based on this approach are typically proprietary, and are thus limited when it comes to interoperability with other vendors' hardware and software.

Devices such as tape libraries can also be virtualized. In tape virtualization, disk storage is made to appear as tape drives. The disks typically front-end tape libraries, performing a cache function that allows data to be accessed faster than would be possible with tape. When the data is no longer accessed as frequently by users, it can then be transferred to tape. In addition to enhancing performance, this approach also optimizes tape utilization. However, the administrator must ensure that a sufficient amount of disk cache and a sufficient number of tape drives are in place to avoid performance bottlenecks. Some tape virtualization solutions are also restricted to proprietary libraries.
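
The caching behavior described above can be modeled in Python as a small disk cache that pushes its least recently used data sets to tape. The `VirtualTape` class, the slot-based cache size, and the least-recently-used policy are assumptions for this sketch; actual products apply their own migration policies.

```python
from collections import OrderedDict


class VirtualTape:
    """Disk cache in front of a tape library; cold data sets migrate to tape."""

    def __init__(self, cache_slots):
        if cache_slots < 1:
            raise ValueError("the disk cache needs at least one slot")
        self.cache_slots = cache_slots
        self.disk = OrderedDict()  # data set name -> data, least recently used first
        self.tape = {}
        self.tape_mounts = 0       # counts slow-path recalls from tape

    def write(self, name, data):
        self.disk[name] = data
        self.disk.move_to_end(name)
        self._migrate_cold_data()

    def read(self, name):
        if name not in self.disk:
            # Cache miss: recall the data set from tape onto disk.
            self.tape_mounts += 1
            self.disk[name] = self.tape[name]
        self.disk.move_to_end(name)
        data = self.disk[name]
        self._migrate_cold_data()
        return data

    def _migrate_cold_data(self):
        # Push the least recently used data sets to tape until the cache fits.
        while len(self.disk) > self.cache_slots:
            name, data = self.disk.popitem(last=False)
            self.tape[name] = data


vt = VirtualTape(cache_slots=2)
for name in ("a", "b", "c"):
    vt.write(name, f"contents of {name}")
vt.read("b")  # served from disk, no tape mount
vt.read("a")  # recalled from tape, which pushes "c" out to tape
print(list(vt.disk), sorted(vt.tape), vt.tape_mounts)
```

A cache that is too small for the working set shows up here as a rising `tape_mounts` count, which is the bottleneck the administrator has to size against.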

NETWORKING WITH A TWIST

Network-based virtualization is a relatively recent development in the storage industry. In network-based virtualization, the virtualization functions are executed within the network itself, as opposed to within the host servers or storage devices. Today, that network is typically a Fibre Channel SAN, although virtualization products are available for IP SANs as well.

In network-based virtualization, the primary virtualization functions can be executed in switches or routers, appliances, or servers. Network-based virtualization can be either in-band or out-of-band.

With in-band, or symmetric, virtualization, the virtualization function is executed in the data path between the servers and the storage. In a typical configuration, virtualization software in an appliance sits in that path, so both the control data (metadata) and the storage data traverse it. The appliance receives requests for data from the host, locates the requested data, which may be physically spread across multiple devices, and sends it back to the host. To the host, the in-band appliance appears as a storage system.
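
The in-band request flow can be sketched in Python as a single object that performs both the metadata lookup and the data transfer. The `InBandAppliance` class, the in-memory lists standing in for storage arrays, and the per-block map are hypothetical simplifications.

```python
class InBandAppliance:
    """Sits in the data path: looks up block locations and relays the data."""

    def __init__(self, backends, block_map):
        # backends: device name -> list of block payloads (stand-in for real arrays).
        # block_map: virtual block number -> (device name, physical block).
        self.backends = backends
        self.block_map = block_map

    def read(self, first_block, count):
        """Serve a host read; the data itself flows through the appliance."""
        payload = []
        for virtual_block in range(first_block, first_block + count):
            device, physical_block = self.block_map[virtual_block]  # metadata lookup
            payload.append(self.backends[device][physical_block])   # data fetch
        return b"".join(payload)


backends = {"array_a": [b"AA", b"BB"], "array_b": [b"CC", b"DD"]}
block_map = {
    0: ("array_a", 0),
    1: ("array_b", 1),
    2: ("array_a", 1),
    3: ("array_b", 0),
}
appliance = InBandAppliance(backends, block_map)
print(appliance.read(0, 3))  # b'AADDBB'
```

The host issues one read and gets one contiguous answer, even though the three blocks came from two different arrays. Because every byte passes through `read`, the appliance's own throughput bounds what the host can achieve in this model.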

