Computer Files and Data Storage
At a Glance
Instructor’s Notes
Chapter Overview
In this chapter you will learn about the different types of files and how a computer manages and stores data. You will learn how to use DOS or Windows operating systems to organize the files on disks so that they are easy to locate. You will learn how to use file manager utility software to organize files on disks. You will also learn what happens when you save, retrieve, revise, delete, and copy files.
Chapter Outline
| Lecture Topics | Page # | Material Covered |
| Data, Information, and Files | 160 | Distinction between data and information; types of computer files (executable, data, source); filenaming conventions; wildcards |
| File Manager Utility Software | 168 | Device letters; directories and folders; storage models; using file manager utility software |
| How Computers Store File Data | 174 | Storage terminology; bits and bytes; magnetic and optical technologies; tracks, sectors, and clusters; file allocation tables |
| Disks, Tapes, CDs, and DVDs | 180 | Floppy disk, hard disk, and tape storage; CD-ROM technologies; DVD technologies |
| User Focus: Using Files in Applications | 191 | Running applications; creating files; saving, retrieving, and running a data file |
| Issue: Is Data Getting Lost? | 193 | Storage technology development |
Lecture Notes
Key Terms
Application-specific filename extension (166): Extension associated with a particular application, that indicates which specific application software was used to create a file.
Archiving (189): The process of moving data off a primary storage device when that data is not frequently accessed.
Bit (174): Smallest unit for digitizing data; either a 1 or 0 that represents whether or not current flows through a circuit.
Byte (174): Term used to describe eight bits.
CD-R (189): CDs on which you can record data one time only.
CD-ROM (188): Disk that contains data that has been stamped on the disk surface when it was manufactured. CD-ROMs store up to 680 MB of data.
CD-RW (189): CDs on which you can record and change data by modifying the crystal structure of the disk.
Cluster (177): Group of sectors.
Cylinder (183): The basic storage bin for a hard disk drive.
Data (160): The words, numbers, and graphics that describe people, events, things, and ideas.
Data file (164): File containing words, numbers, and pictures that you can view, edit, save, send, and print.
Data transfer rate (180): The amount of data that a storage device can move from the storage medium to the computer per second.
Defragmentation utility (179): Feature that rearranges files on a disk so that they are stored in contiguous clusters.
Device letter (168): Letter that provides a shorthand way of referring to a particular storage device when saving or opening files.
Directory (169): List, maintained by an operating system, of all files on a disk.
Disk cache (185): Special area of computer memory into which the computer transfers the data you are likely to need from disk storage.
Disk density (181): The closeness and size of the magnetic particles on the disk surface.
Double-sided (DS) disk (181): Disk that stores twice the data as a single-sided disk.
DVD (190): A variation of CD technology that was designed to provide storage capacity for a full-length movie.
DVD+RW (190): DVD disks that use phase change technology similar to that used for CDs.
DVD-RAM (190): Writable technology that uses a blend of technologies to record data.
DVD-ROM (190): Disk that contains data that has been stamped on the disk surface when it was manufactured. DVD-ROMs store up to 4.7 GB of data.
Executable file (163): Program instructions that tell a computer how to perform a certain task.
File (160): A named collection of data that exists on a storage medium such as hard disk, floppy, or a CD-ROM.
File allocation table (FAT) (177): Operating system file that maintains a list of files and their physical location on the disk.
File manager utility software (168): Software that helps you locate, rename, move, copy, and delete files.
File naming conventions (161): Specific rules used in creating file names.
File specification (170): The drive letter, folder, filename, and extension that identifies a file. Also called a path.
Filename (161): A unique set of letters and numbers that identifies a file and usually describes the file contents.
Filename extension (161): Short series of letters that identifies a file type, usually three characters in length.
Floppy disk (181): Round piece of flexible mylar covered with a thin layer of magnetic oxide, sealed inside a protective casing. Also called floppies or diskettes.
Folders (169): Subdirectories into which most operating systems allow you to divide your directory.
Fragmented (179): Having files on a magnetic storage medium stored in many noncontiguous clusters.
Generic filename extension (165): Extension indicating the general type of data a file contains.
Gigabyte (GB) (180): Approximately one billion bytes.
Hard disk (183): One or more platters and their associated read-write heads.
Hard disk platter (183): Flat, rigid disk made of aluminum or glass and coated with a magnetic oxide.
Head crash (185): Event that occurs when a read-write head runs into a dust particle or some other contaminant on a disk, causing damage to some of the data on the disk.
High-density (HD) disk (181): Disk that can store more data than a double-density disk. Most are formatted with 18 sectors and 80 tracks per side.
Information (160): The words, numbers, and graphics used as the basis for human actions and decisions.
Kilobyte (KB) (180): Approximately one thousand bytes.
Magnetic storage (175): Hard disk, floppy disk, and tape storage technologies that store data by magnetizing microscopic particles on the disk or tape surface.
Megabyte (MB) (180): Approximately one million bytes.
Millisecond (ms) (180): A thousandth of a second.
Multisession support (189): The ability to record portions of a CD during one recording session, and other portions of the CD at other times.
Open reel tapes (187): Tapes that resemble spools of 16 mm film that are still used as a distribution medium for some mainframe and minicomputer systems.
Optical storage (176): CD-ROM and DVD storage technologies that store data as microscopic light and dark spots on the disk surface.
Phase change technology (189): Technology used to alter the crystal structure on a disk’s surface.
RAID (185): Storage device containing many disk platters that provides redundancy and achieves faster data access than conventional hard disks.
Random access (180): The ability of a device to jump directly to the track or sector that holds the requested data. Also called direct access.
Read-only (188): Term meaning that a computer can retrieve data from the storage medium, but cannot save any new data onto the storage medium.
Read-write head (175): Mechanism in a disk drive that reads and writes the magnetized particles that represent data.
Removable hard disks (185): Disks that contain platters and read-write heads that can be inserted and removed from the drive, much like a floppy disk.
Root directory (169): Main directory on a disk.
Sectors (177): Wedge shaped sections of the concentric circles found on magnetic disks.
Sequential access (180): The process of accessing requested data by reading through data from the beginning of a tape.
Storage capacity (180): The maximum amount of data that can be stored on a storage medium.
Storage device (174): The mechanical apparatus that records and retrieves data from a storage medium.
Storage medium (174): The disk, tape, CD, DVD, paper, or other substance that contains data.
Storage technology (174): A storage device and the media that it uses.
Tape backup (186): Copy of the data on a hard disk that is stored on magnetic tape, and used to restore lost data.
Tape cartridge (187): Removable magnetic tape module similar to a cassette tape.
Terabyte (180): Approximately one trillion bytes.
Tracks (177): Divisions of a magnetic storage medium that serve as the electronic equivalent of storage bins.
Undelete utility (179): Operating system utility that retrieves files that have been inadvertently deleted.
Wildcard character (162): Character used to represent a group of characters in the filename extension.
Zip disk (181): Special high capacity floppy disk manufactured by Iomega Corporation.
Data, Information, and Files
I have attempted to present the basic information that a computer user needs to know about files:
File Manager Utility Software
In this section of the chapter, you will learn about common ways in which computers keep track of files, using file manager utility software. Although file managers vary from system to system, they rely on the same basic principles. As you become familiar with the concepts of device letters and folders, you will be able to navigate through most computer systems.
Windows Explorer can help you visualize the structure of files on a disk and better understand the features of a file manager utility. To take an in-depth look at the way in which Windows Explorer organizes its directory, first study Figure 4-10 in order to understand directory components. Then open Windows Explorer on your computer, and identify the file types found on either the hard drive or a floppy disk. Identify the device letters that represent the different storage devices on your computer.
Page 172 of the text lists a number of common tasks for which file manager
utility software is used. There are some common errors made by computer
users, such as renaming documents with the wrong file extension.
How Computers Store File Data
The introduction to Section C mentions that the conceptual model of folders and files does not reflect the physical reality of the way in which files are stored on a disk. You may confuse this statement with the explanation of fragmentation discussed later in the section. Remember that random-access storage allows files to become physically divided up all over a disk’s surface. Although this does not mean that the file itself becomes unreadable, it does make it harder for a drive to efficiently read the file. The process of defragmentation helps to organize the disk by rearranging the files on the disk so that they are stored in contiguous clusters. Even when a disk is defragmented, however, the physical organization is still not equivalent to the conceptual organization. Although some people like to use a directory analogy in which each directory is like a slice of a pie or disk, all the files in a directory are not necessarily stored in the same physical area of the disk. The directory structure is a logical model that helps us think about the organization of files on a disk. It is not meant to accurately portray the physical organization of the disk itself.
The concepts of byte, kilobyte, megabyte, gigabyte, and terabyte are introduced in the context that a byte is used to store one character of data. This is useful to know before comparing storage devices—storage capacity is one of the characteristics that distinguishes one storage medium from another.
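To make the size units concrete, here is a small Python sketch that converts a byte count to the largest convenient unit. It assumes the binary convention (1 KB = 1,024 bytes), which is why the key terms describe each unit as "approximately" a thousand, a million, or a billion bytes. The function name is illustrative only.

```python
# Assumption: binary units, where each step up is a factor of 1,024.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]

def describe_size(num_bytes):
    """Return a human-readable size such as '680.0 MB'."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that keeps the number below 1,024,
        # or at the largest unit in the list.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024

print(describe_size(1))               # one character of text
print(describe_size(680 * 1024**2))   # CD-ROM capacity quoted in the key terms
```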
Although a disk is physically laid out in tracks and sectors, a cluster is the smallest unit accessible by most microcomputer operating systems. Under DOS and Windows, a cluster is composed of two sectors.
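Because a cluster is the smallest unit the operating system can allocate, even a tiny file occupies a whole cluster. The sketch below works through that arithmetic. It assumes 512-byte sectors (a common size that these notes do not state) together with the two-sector cluster mentioned above.

```python
import math

BYTES_PER_SECTOR = 512      # assumption: typical sector size, not stated in the notes
SECTORS_PER_CLUSTER = 2     # from the notes above
CLUSTER_BYTES = BYTES_PER_SECTOR * SECTORS_PER_CLUSTER

def clusters_needed(file_bytes):
    """Number of whole clusters required to hold a file."""
    return math.ceil(file_bytes / CLUSTER_BYTES)

def wasted_bytes(file_bytes):
    """Unused space left over in the file's last cluster."""
    return clusters_needed(file_bytes) * CLUSTER_BYTES - file_bytes

for size in (1, 1024, 1500):
    print(size, clusters_needed(size), wasted_bytes(size))
```

Under these assumptions a 1-byte file still takes up a full 1,024-byte cluster on the disk.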
Don't skip over Figure 4-19 without studying it thoroughly. The main point is that the Directory and the FAT work together to track the location of files on disks.
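A simplified model can make this relationship concrete. In the sketch below, the directory maps each filename to its first cluster, and the FAT maps each cluster to the next one in the file's chain. The filenames, cluster numbers, and END marker are invented for illustration; a real FAT is a binary structure on the disk, not a Python dictionary.

```python
END = None  # hypothetical end-of-file marker

# Directory: filename -> first cluster of the file.
directory = {"LETTER.DOC": 3, "BUDGET.XLS": 5}

# FAT: cluster -> next cluster in the same file (END for the last one).
fat = {3: 4, 4: 8, 8: END,   # LETTER.DOC is fragmented: clusters 3, 4, then 8
       5: 6, 6: END}         # BUDGET.XLS is contiguous: clusters 5 and 6

def clusters_for(filename):
    """Follow the FAT chain to list every cluster a file occupies."""
    chain = []
    cluster = directory[filename]
    while cluster is not END:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

def is_fragmented(filename):
    """A file is fragmented if its clusters are not consecutive."""
    chain = clusters_for(filename)
    return any(b != a + 1 for a, b in zip(chain, chain[1:]))

print(clusters_for("LETTER.DOC"))
print(is_fragmented("LETTER.DOC"), is_fragmented("BUDGET.XLS"))
```

In this model, a defragmentation utility would rewrite the chain for LETTER.DOC so that its clusters are consecutive.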
Disks, Tapes, CDs, and DVDs
Review the four criteria described on page 180.
Disk cache is becoming increasingly prevalent on microcomputers and the term appears in many computer ads.
Issue: Is Data Getting Lost?
Think of a document you would want to access twenty years from now.
Examples could include diaries, theses, or databases containing the addresses
of friends and relatives. As the text points out, there are currently no
standards to safeguard against losing data due to changes in technology.
As a result, the responsibility for maintaining data lies with the computer
user. Make backup copies of your work frequently, and update your
documents as new software becomes available (e.g., save your
Microsoft Word 97 document as a Word 2000 document as soon as you obtain
the newer program). Also, in the case of important documents, it is wise
for you to keep a hard copy on hand in case file damage or technological
changes make the documents impossible to access electronically. Although
there are no guarantees that electronically stored information will always
be accessible, keeping on top of changes in technology can help prevent
loss of important information.
Overview
In a microcomputer, flexible, nonvolatile storage is provided through
secondary storage, such as magnetic disk, magnetic tape, and optical disk.
Magnetic floppy disks are small and portable but cannot store as much information
and aren't as fast as magnetic hard disks. A number of technical issues
such as seek time and type of interface determine the performance characteristics
of hard disks, such as speed and capacity. Newer backup technologies
such as removable hard disks are increasingly popular, but magnetic tape
is still frequently used for backing up information from disks.
Optical disks can store vast amounts of information. CD-ROM is generally read-only in most computers, but the prices of the read/write units (CD-E) have continued to drop, and these units may become standard in the near future since they are cheap to mass-produce and provide an alternative for backing up data. DVDs (Digital Video Disks or Digital Versatile Disks), which hold about seven times as much data as CD-ROMs, are the newest technology and can provide full-motion video with theater-quality surround sound. DVD drives are available as players similar to VCRs and as components of desktop computers, replacing the CD-ROM drive.
To read and write data "permanently," disk drives encode on the surface of disks individual bits that represent characters (letters and symbols). Related characters are organized in ways that make up files. There are various types of files, such as program files (which contain software), data files, and graphics files (which contain digital encoding of images). File names are used to specify which file to use; different systems limit the length of file names, and some use extensions to indicate the file type.
Files are organized in directories. Directories can also contain other directories (subdirectories), which enable related files and directories to be stored together. The structure is roughly analogous to papers in file folders in file drawers.
Commercial data processing systems have an enormous dependence on storage. Files consist of records, which consist of fields, which contain characters, which are made up of bits. Business activities are recorded as transactions, which are then processed to update master files or generate reports for humans or output files for other programs. Files in data processing (and other applications) can have the following types of organization: sequential (access one record after the other); direct, or random, access (each record directly accessible); and indexed sequential (indexes used to locate records).
Lecture Notes
Storage (secondary memory) is needed in addition to primary memory
because primary memory is volatile and therefore not suitable for retaining
information permanently.
Storage Concepts: The Basics
Information kept in long-term storage must be moved into RAM memory
before it can be used because the access speed of disk storage is too slow
for the CPU to work with. In this sense, storage acts like an input
device.
But the faster memory is, the more expensive it is, so computers generally use a variety of different kinds of memory, limiting the fastest and most expensive types to the situations where they are absolutely required, and using slower and cheaper types elsewhere. A chart in the text illustrates the hierarchy of memory and storage.
Not all storage devices can both read data - retrieve it for transfer to memory - and write data - transfer it from memory back into long term storage. Some storage devices, such as CD-ROM, are read-only. Those that can do both are called read/write devices.
Storage devices may permit sequential storage or random access. Tape, for example, must be read from the beginning to find a given piece of data while disk drives have a read/write head that permits direct access to any part of the disk. Random access devices are faster but more expensive.
Disks and Disk Drives: Putting a Spin on It
Disks coated with a magnetized material are the most common storage
devices today. They are random-access and read/write.
Magnetic disks store data in concentric tracks like the grooves of a record, although not in a continuous spiral. Before use, they must be formatted.
Formatting adds sector markers (lines that divide the disk into wedges like pieces of pie) and, in the case of PCs, a file allocation table (FAT), which acts like an index of where files are stored on the disk. Because formatting re-writes sectors and the FAT, formatting a disk a second time makes it impossible to find the information that was originally stored on it. However, the information is not actually destroyed, and can sometimes be re-located with the use of special software.
Floppy disks are made of plastic with a metallic coating. Most floppy disks today are 3.5 inches. They are an improvement over older 5.25-inch disks because they store more data (usually 1.44 MB), have re-usable write protection tabs, are stored in hard plastic cases to prevent some damage, and are not exactly square, so they can be inserted into the drive only in the correct way. Floppies are considered slow and small by today's standards, and are being replaced with newer technologies.
Hard disks hold the computer's operating system, programs, and often user data. Magnetic hard drives and floppy disk drives operate basically the same way, except that hard drives have two or more vertically stacked disks made of metal instead of soft plastic. The read-write head floats on a tiny cushion of air just above the disk surface. If the head touches a disk, a "head crash" occurs that can destroy data and permanently ruin the area of the disk that was touched.
Hard drives are also formatted before use, but their formatting can also include the use of partitions, which are divisions of a disk that can be treated as if they were physically separate disks. Partitions allow a computer to use a different operating system on each partition, if the user chooses.
Today relatively inexpensive hard disks have large amounts of storage space. Many personal computers are equipped with hard disks that provide storage of 3 gigabytes. But today's larger, media-intensive applications require vast amounts of storage space.
The performance of a disk is determined by factors such as its rotational speed, which is the speed with which the disk spins, and its seek time, which is the speed with which its read/write head can be moved to the correct track.
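As a rough worked example, the average time to reach a piece of data can be estimated as the seek time plus half a rotation, since on average the wanted sector is half a turn away from the head. The drive figures below are hypothetical, chosen only to show the arithmetic.

```python
def average_access_ms(seek_ms, rpm):
    """Estimate average access time: seek time plus half a rotation."""
    ms_per_rotation = 60_000 / rpm           # there are 60,000 ms in a minute
    rotational_latency = ms_per_rotation / 2
    return seek_ms + rotational_latency

# Hypothetical drives for comparison.
print(average_access_ms(seek_ms=10, rpm=5400))
print(average_access_ms(seek_ms=9, rpm=7200))
```

A faster spin or a shorter seek both lower the estimate, which is why the two figures are usually quoted together.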
Disk performance can also be improved by using a disk cache, an area of super-fast memory that pre-fetches and stores data from the drive that is likely to be used next by the CPU.
A hard drive's performance also depends on its interface (the electronic connection between the drive and the motherboard), which determines its data transfer rate. Interfaces are controlled by a hard disk controller. The text mentions several of these, from the older and still very common IDE to the newer Ultra Wide SCSI and Ultra DMA, and provides a table showing their relative performance.
Hard disk maintenance procedures include using programs to check and if necessary re-organize the files stored on the disk, and making regular backups in case the disk should fail.
As the hard disks meant for desktop computers have improved in quality
and dropped in price, minicomputers and even mainframes have begun to use
them in a RAID (Redundant Array of Inexpensive Disks), which combines dozens
or hundreds of disks with a controller in a single box.
Some magnetic drives combine the best qualities of both floppies and
hard drives; these are removable hard drives. The disk is in a sealed cartridge
that can be removed; these drives have capacities ranging up to a few hundred
megabytes; they are useful as a lower-capacity alternative to tape back-up
but have faster access times.
The multimedia platforms utilized on the Internet continue to fuel the demand for even larger and faster storage devices. The growing need for more disk space and for transportability has given rise to the popularity of removable disk drives such as the Iomega Zip and Jaz drives and the Syquest EZ Flyer series, which offer inexpensive incremental storage.
Magnetic Tape: Still Useful
Magnetic tape is still used as a backup medium because it can be reused,
it is inexpensive, can store large amounts of data quickly, and is easily
transportable. The negative characteristics of magnetic tape include the
fact that it deteriorates with time. In addition, data can only be
accessed sequentially, so data retrieval is extremely slow. QIC (quarter-inch
cartridge) tapes store more than 10 MB of data, and mass storage systems
store hundreds or thousands of these cartridges for backup purposes.
Sidebar: Fragile Data Storage
Data stored in computers is not as permanent as you might think.
It can be lost in two principal ways:
· It can decay because of damage to or deterioration of the
physical medium on which it is stored.
· It can become unrecoverable if the hardware or software needed
to retrieve it becomes obsolete and is no longer available.
The problems of data retrieval will become even more serious as technology progresses. Specialization in the field of data retrieval may become a new career option.
Optical Storage Media: Seeing the Light
Optical disks store data using a laser to read or write. The main types
of optical disk are CD-ROM, CD-R (recordable), CD-RW (rewritable), DVD-ROM,
and MO (Magneto-optical). Most new computer systems contain a Compact Disc
Read-Only Memory (CD-ROM) drive. Today most software and user manuals
are distributed on CD-ROM rather than on twenty or thirty high-density
floppy disks. Responding to consumer demand, manufacturers have developed
CD-ROM drives with much higher speeds that can be close to the speed of
a slow hard drive.
Newer CD types can record as well as read information. CD-R can record only once for permanent storage, while CD-RW can record over and over. The disks they record can be played on any computer that has a CD-ROM drive. CD-R and CD-RW are useful for creating transportable presentations and other large data files that won't fit on a standard floppy disk, or for backups. CD "jukeboxes" hold multiple CDs for access by many computers on a network simultaneously.
DVD stands for Digital Video Disk or Digital Versatile Disk. DVD-ROMs are similar to CD-ROMs, but can store more data. They require a special drive, but because that drive can read both DVD-ROMs and CD-ROMs, DVD drives will probably replace CD-ROM drives as standard equipment on PCs. DVD-RAM is the read/write version of DVD-ROM.
Write Once Read Many Compact Disks (WORM CDs) are used for giant data sets that aren't going to change, such as file archives, complete image processing, replications of photos, graphics, text, signatures, surveying, marketing and population data. They are encoded and can't be altered after they are encoded.
Magneto-optical disks (MO) are erasable, removable, portable, and durable, with capacities measured in gigabytes. They are used for applications that require large storage capacity.
Storing Data in Files
Although data could be stored without distinct files, locating and
using data would be much more difficult. Data is generally organized into
separate files, each stored with a distinct name, and related files
are stored together in a directory on a disk. Directories are also
named.
Program files are binary files - files that are written for and usable on only a single type of computer. They include operating systems and applications.
Data files, on the other hand, may be useful on different computer systems, as long as a program that can recognize the proprietary file format, either directly or through a conversion utility, is available. Many programs have a built-in capability to read different file formats. Opening a file without the capability of reading the file format results in "garbage": characters from the unrecognized formatting information mingled with any text information.
Configuration files store settings that you typically use with the program, such as the typeface, the margin settings, the model of printer, and so on.
Most database, word processing, and spreadsheet programs allow export to plain text (ASCII) by using tab characters to separate fields of records or columns of spreadsheets and carriage returns between records or rows.
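The export described above can be sketched with Python's standard csv module. The sample address-book records are invented for illustration. The sketch writes a tab between fields and a carriage return plus line feed between records, then reads the text back to show that nothing is lost.

```python
import csv
import io

# Hypothetical address-book records: each row is a record, each item a field.
records = [
    ["LastName", "FirstName", "City"],
    ["Stevens", "Barb", "Boston"],
    ["Lee", "Sam", "Denver"],
]

def to_tab_delimited(rows):
    """Return rows as plain text: tabs between fields, CR+LF between records."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()

def from_tab_delimited(text):
    """Parse tab-delimited plain text back into a list of records."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))

text = to_tab_delimited(records)
print(repr(text))
print(from_tab_delimited(text) == records)
```

Because the result is plain ASCII text, a different program can import it without needing to understand the original proprietary format.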
JPEG and GIF graphics file formats have the advantage of using data compression to reduce the amount of storage space an image requires. Sound files contain digitized sounds.
Backup files are not really a distinct type of file, because they can be of any file type. The purpose of a backup file, though, is to be an additional copy of another file.
DOS limits file names to 8 characters with a 3-character extension. Windows 3.1 uses MS-DOS for file storage and organization so it requires the same restrictions. Windows NT and Windows 95 finally removed these restrictions, which had long irritated users and which had long been eliminated in Macintoshes.
Certain special characters cannot be included in file names. Macintosh file names cannot contain colons (:), for example, and MS-DOS file names are limited to alphanumeric characters and a few punctuation symbols.
Some operating systems and programs require the use of extensions to operate correctly. For example, a C compiler may require a .C extension, and MS-DOS requires a .BAT on batch command files. Knowing the meaning of common extensions can tell you quickly what type of program created (and can open) a given file, or what kind of file it is. A table of common extensions is given in the text.
Some MS-DOS users never use directories; when they type DIR to get a list of their files, it is too long to fit on the screen. Although these users can still review their directory listing a screen at a time, they can easily lose track of a file. (The first version of MS-DOS did not allow the use of subdirectories; this capability was quickly added.)
System files are almost always kept in a separate directory to help prevent users from accidentally changing or removing files critical for system operation. Application programs, which may have 10-100 separate files in the complete package, are usually kept in separate directories as well.
Most people organize files by content or by use. If subdirectories are regarded as branches of an upside-down tree, the root directory is the top-most directory. Directories that are located on a branch under a given directory in this structure are its children, and it is their parent. The term "path" refers to the route that must be followed from the root through parent directories to child directories to reach a given file.
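The idea of a path can be illustrated with Python's standard pathlib module, which splits a file specification into the pieces discussed here. The drive, folder names, and filename below are hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical file specification: drive, folders, filename, and extension.
spec = PureWindowsPath(r"C:\Letters\Family\Barb.wpd")

print(spec.drive)    # device letter plus colon
print(spec.parts)    # each step on the route from the root to the file
print(spec.parent)   # the parent directory of the file
print(spec.suffix)   # the filename extension
```

Reading the parts from left to right traces the route from the root directory, through each parent directory, down to the file itself.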
Most operating systems represent subdirectories as "special" files that contain lists of file names and information about the location of those files on disk.
The Macintosh represents subdirectories as folders within the hard disk (analogous to a filing cabinet). The user can rename and change the position of a folder (and all its enclosed folders and files) quite easily. Windows has now adopted the term "folders" also.
File Systems in Business: Minding the Store
Transaction processing allows hundreds or thousands of transactions
to be received and processed by a large central computer. Airline reservation
systems are an example of transaction processing.
The data storage hierarchy breaks up information into useful pieces at different levels. In the example, first name and last name are stored in separate fields so that users can search for them independently. Related fields are combined to form a record, and records to form a file. Batch processing may be used for applications where immediate feedback is not required. Real-time processing is practiced by travel agents, because they must be able to tell a customer whether a seat on a plane is available.
Report files may contain data selected according to some criteria as well as data presented in a different useful way (such as paychecks, mailing labels, or accounting summaries). Backups are critical in large data processing installations and frequently are duplicated and kept at secure alternate locations.
Three popular methods of file organization are discussed: sequential, direct (random), and indexed sequential.
Sequential files were the only kind used when data was stored on magnetic
tape.
Random-access file often is synonymous with relative file. Relative
files can be accessed sequentially by simply accessing record 1 first,
record 2 second, and so on. Therefore, relative files are more general
than sequential files but can still be accessed sequentially.
Indexed sequential files require more computing time (as well as more
storage) to maintain, because a change to the file may require that several
indexes be updated.
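The difference between sequential and direct access to a relative file can be sketched as follows. The sketch assumes fixed-length records of 20 bytes, an arbitrary choice for illustration, so that the position of any record can be calculated from its number. An in-memory file stands in for a disk file.

```python
import io

RECORD_SIZE = 20  # assumption: every record is padded with spaces to 20 bytes

def write_records(f, names):
    """Write each name as one fixed-length record."""
    for name in names:
        f.write(name.encode("ascii").ljust(RECORD_SIZE))

def read_record(f, number):
    """Direct access: jump straight to record `number` (counting from 1)."""
    f.seek((number - 1) * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

def read_all(f, count):
    """Sequential access: read record 1 first, record 2 second, and so on."""
    return [read_record(f, n) for n in range(1, count + 1)]

# An in-memory file stands in for a disk file so the sketch runs anywhere.
f = io.BytesIO()
write_records(f, ["Adams", "Baker", "Chen"])
print(read_record(f, 3))
print(read_all(f, 3))
```

An indexed sequential file would add a separate lookup table, for example from last name to record number, which is the extra structure that has to be updated whenever the file changes.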
Have you ever saved a document in a Windows 95/98 software application then tried to open the document in a DOS or Windows 3.11 application only to find that you couldn't locate the document because the document name you're looking for can't be found? There's a good explanation for this.
As your text states, Windows 95/98 allows for up to 255 characters when naming a file including spaces and several special characters. DOS and Windows 3.11 only allow 8 characters with a 3 character extension. When you want to open the file in the DOS or Windows 3.11 application, the file name has been changed so that the file name is recognizable by DOS and 3.11. Here's what happens. Say I created a file in Word Perfect 8.0 and saved it as Barb Stevens.wpd. I then started up Word Perfect 6.1 (which is a DOS-based program) so that I could edit my Barb Stevens.wpd document. The name of the file would no longer be Barb Stevens.wpd but has changed to:
barbst~1.wpd
This is what DOS does when it sees a file that exceeds its own 8.3 naming convention:
1. It removes any spaces and special characters.
2. It keeps the first 6 of the remaining characters and discards the rest.
3. It adds the tilde punctuation mark (~) plus a number.
My original file name, Barb Stevens.wpd, which exceeded the DOS/Windows 3.11 8.3 naming convention, has now been altered to a name that DOS/Windows 3.11 recognizes. The new name given by DOS/Windows 3.11 is the permanent name of the file and would now be recognized as such by Windows 95/98.
So, what if I had 3 documents created in Word Perfect 8.0 that all began with the words Barb Stevens, such as:
Barb Stevens recipe.wpd
Barb Stevens address book.wpd
Barb Stevens calendar.wpd
What would DOS/Windows 3.11 do with these if it followed the three steps above? All three file names would change in the same way as the one above; the only difference would be the number. So my recipe, address book, and calendar files would now be called:
barbst~1.wpd
barbst~2.wpd
barbst~3.wpd
This process would continue for the first 9 files. Then if I had a 10th file named Barb Stevens . . .wpd, DOS/Windows 3.11 would take only the first 5 characters of the file name, add the tilde, and use the number 10. This process would continue up to 99 files, then on the 100th, it takes the first 4 characters, etc.
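The renaming rules described in this section can be sketched in a few lines of Python. The function follows the rules as these notes state them, not necessarily every detail of what DOS or Windows actually does. It assumes that spaces and special characters are dropped before the name is shortened, which is what the barbst~1.wpd example implies. The function name is hypothetical.

```python
import re

def short_name(long_name, number):
    """Build an 8.3-style name following the steps described above.

    Assumption: spaces and special characters are dropped before the
    name is cut down, as the barbst~1.wpd example implies.
    """
    stem, dot, extension = long_name.rpartition(".")
    if not dot:                        # the name has no extension at all
        stem, extension = long_name, ""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", stem).lower()
    suffix = "~" + str(number)
    keep = 8 - len(suffix)             # 6 characters for ~1 to ~9, 5 for ~10 to ~99
    short = cleaned[:keep] + suffix
    if extension:
        short = short + "." + extension[:3].lower()
    return short

files = ["Barb Stevens recipe.wpd",
         "Barb Stevens address book.wpd",
         "Barb Stevens calendar.wpd"]
for n, name in enumerate(files, start=1):
    print(short_name(name, n))
```

Because the tilde and the number must fit within the 8 characters, a longer number leaves room for fewer characters of the original name, which is why the 10th file keeps only 5 and the 100th only 4.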
As we move farther and farther away from DOS/Windows 3.11 applications, you will see less and less of this in your file names. But if you do see a file in your hard drive that contains the tilde (~), you at least know that at one time the file was opened by either DOS or Windows 3.11 if not created by either.
Windows Directories, Folders, and Files
DOS Directory and File Management
Defragmentation and Disk Operations
Using Files