  The Basics of Document Sizing

Overview

As organizations move towards managing more and more digital documents, they must develop a document management system that best suits their needs. However, it is not enough to have a document imaging system; it must also be maintained.

One of the tasks in the implementation and ongoing management of a document imaging system is to provide estimates of the amount of storage the organization will need for the short to medium term.

For any system, especially a large one, this can seem like a daunting task with many factors to consider. Where to begin?

Estimating

In the case of a document management system, planners must determine how much storage will be required for the immediate future.

This can be accomplished by first determining how much storage space will be required for the individual documents in the system.

The best way to do this is to work with estimates of the actual figures. It is much easier for managers to make decisions based on round, easy-to-use figures that can be readily multiplied.

For example, 50 thousand bytes (50 Kilobytes) can be used as the industry standard for a bi-tonal (black and white) scanned document because it is close to the average size of scanned pages and also yields an estimate of exactly one million bytes (a Megabyte) for 20 pages, exactly one billion bytes (a Gigabyte) for 20 thousand pages, and exactly one trillion bytes (a Terabyte) for 20 million pages. In business, most documents will be bi-tonal.
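The round-figure arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of the original paper: the names `estimate_storage_bytes` and `human_readable` are hypothetical, and decimal units (1 Kilobyte = 1,000 bytes) are assumed, as in the text.

```python
BYTES_PER_BITONAL_PAGE = 50_000  # round figure from the text (50 Kilobytes)


def estimate_storage_bytes(pages: int, bytes_per_page: int = BYTES_PER_BITONAL_PAGE) -> int:
    """Return the estimated storage, in bytes, for a number of scanned pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * bytes_per_page


def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1,000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1000


for pages in (20, 20_000, 20_000_000):
    print(pages, "pages:", human_readable(estimate_storage_bytes(pages)))
```

Because the per-page figure is a parameter, the same function can be reused once a measured average page size is available.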

These estimates lean towards overestimation rather than underestimation of storage requirements. When dealing with these types of estimation, it is always safer to be conservative, especially when all assumptions have been factored in.

It is especially important to use conservative estimates since the addition of hardware in the future can be complicated. You may need to purchase new platters, new servers and/or reorganize existing images. All of this can be very costly.

As well, if everyone uses the same estimates, it is easier to discuss and compare document imaging systems. Because the estimates are industry standard, less time can be spent evaluating estimating methods, and more time can be spent understanding how the system will be used and whether the system design will accommodate the planned use.

Born Digital Documents (i.e. Microsoft Word documents etc.)

When documents are imported directly in the form they were created, they require much less storage space.

For example, a Microsoft Word document requires only about 25 Kilobytes of storage space, as opposed to 50 Kilobytes for the same document when scanned. The reason for this is that scanned images pick up a lot of extra information through what’s called “digital noise”.
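As a rough sketch of how the two figures combine, the snippet below estimates storage for a collection that mixes scanned pages with imported documents. The function name `mixed_storage_bytes` and the 10,000-document collection are hypothetical, and the 25 Kilobyte figure is assumed to describe a one-page document.

```python
SCANNED_PAGE_BYTES = 50_000      # scanned bi-tonal page (figure from the text)
BORN_DIGITAL_DOC_BYTES = 25_000  # imported one-page Word document (figure from the text)


def mixed_storage_bytes(scanned_pages: int, born_digital_docs: int) -> int:
    """Estimated storage for a mix of scanned pages and imported documents."""
    return (scanned_pages * SCANNED_PAGE_BYTES
            + born_digital_docs * BORN_DIGITAL_DOC_BYTES)


# Hypothetical collection: 10,000 one-page documents.
all_scanned = mixed_storage_bytes(scanned_pages=10_000, born_digital_docs=0)
all_imported = mixed_storage_bytes(scanned_pages=0, born_digital_docs=10_000)
print("All scanned: ", all_scanned, "bytes")
print("All imported:", all_imported, "bytes")
```

Under these assumptions, importing the documents in their original form halves the storage estimate compared with scanning them.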

Compression

All of the figures given for scanned images are for compressed file sizes, unless otherwise noted. All imaging systems compress their image files for storage and transmission. Compression removes the redundancy and digital noise from the files, making the files smaller. These compressed page files have an average size of approximately 50 thousand bytes per page for bi-tonal pages.

Different Resolutions

Image files created by scanning at 200, 300, and 400 dpi (dots per inch) all have the same information content as the original image.

The only difference is that the higher resolutions increase the redundancy in the image file.

Compression removes this redundancy. In general, higher resolution scans of an image are slightly larger than lower resolution scans of the same image because higher resolution scans pick up more digital noise.

This variation between the compressed image sizes of different resolutions is within the variation range of document image sizes in general. In almost all cases, measuring the actual sizes of the first one percent of scanned images will easily adjust for this variation without requiring significant system changes.
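To see how much redundancy compression has to remove, the sketch below computes the uncompressed size of a bi-tonal page at each resolution and compares it with the 50 thousand byte compressed figure. The US Letter page size (8.5 by 11 inches) is an assumption made for illustration, row padding and file headers are ignored, and the function name `raw_bitonal_bytes` is hypothetical.

```python
COMPRESSED_PAGE_BYTES = 50_000  # compressed bi-tonal page (figure from the text)


def raw_bitonal_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed size, in bytes, of a 1-bit-per-pixel page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8  # eight bi-tonal pixels fit in one byte


for dpi in (200, 300, 400):
    raw = raw_bitonal_bytes(dpi)
    ratio = raw / COMPRESSED_PAGE_BYTES
    print(f"{dpi} dpi: {raw:,} bytes raw, about {ratio:.0f} times the compressed figure")
```

The raw size grows with the square of the resolution, so doubling the dpi quadruples the uncompressed file, while the information content of the page stays the same.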

Industry Standard vs. Actual Size of Documents

Assuming that your documents are similar to industry-average documents usually produces only very small variances.

Because the cost of storage is very low as a percentage of overall system cost, and is constantly decreasing, an error of a few percent in an estimate has very little effect on the overall system cost. If round estimates speed up the understanding and discussion process, the benefit of rounding far outweighs the cost of the slight variances.
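A small worked example makes this argument concrete. The figures below (storage at 5% of total system cost, an estimate that is off by 4%) are hypothetical and chosen only for illustration, the function name `total_cost_change` is likewise hypothetical, and storage cost is assumed to scale linearly with capacity.

```python
def total_cost_change(storage_share: float, estimate_error: float) -> float:
    """Fractional change in total system cost caused by a storage estimate error.

    storage_share:  storage cost as a fraction of total system cost (0 to 1)
    estimate_error: relative error in the storage estimate (0.04 means 4%)
    """
    return storage_share * estimate_error


# Hypothetical figures, chosen only for illustration.
share = 0.05  # storage is 5% of total system cost
error = 0.04  # the storage estimate turns out to be 4% off
print(f"Total system cost changes by {total_cost_change(share, error):.2%}")
```

The effect on total cost is the product of two small fractions, which is why a modest estimating error rarely changes the overall budget in any meaningful way.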

After one percent of the documents have been scanned into a system, an actual average page image size can be calculated.

This actual average page image size will almost always provide the small correction necessary to adjust previous estimates. This is the system sizing method used in almost all system implementations.
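The correction step described above might look like the following sketch. The sample file sizes are made-up values standing in for the first one percent of scanned pages, and the function names `average_page_bytes`, `correction_factor` and `revised_estimate_bytes` are hypothetical.

```python
ASSUMED_BYTES_PER_PAGE = 50_000  # round planning figure from the text


def average_page_bytes(sample_sizes: list[int]) -> float:
    """Actual average page image size measured from a sample of scanned pages."""
    if not sample_sizes:
        raise ValueError("sample must contain at least one page size")
    return sum(sample_sizes) / len(sample_sizes)


def correction_factor(sample_sizes: list[int],
                      assumed: int = ASSUMED_BYTES_PER_PAGE) -> float:
    """Ratio of the measured average to the assumed planning figure."""
    return average_page_bytes(sample_sizes) / assumed


def revised_estimate_bytes(sample_sizes: list[int], total_pages: int) -> float:
    """Storage estimate recalculated with the measured average page size."""
    return average_page_bytes(sample_sizes) * total_pages


# Hypothetical sample of measured page sizes, in bytes.
sample = [34_000, 41_000, 52_000, 47_000, 36_000]
print("Measured average:", average_page_bytes(sample), "bytes per page")
print("Correction factor:", correction_factor(sample))
print("Revised estimate for 1,000,000 pages:",
      revised_estimate_bytes(sample, 1_000_000), "bytes")
```

A correction factor below 1 means the original round estimate was conservative, which is the outcome the earlier sections aim for.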

The following table illustrates the differences between Industry standards and the average sizes that were calculated from the representative sample of documents:
 

| Document type | Advancement Average | Industry Average | Advancement Average PDF | Advancement Average DOC | Advancement Average XLS | Advancement Average JPG | Industry Average PDF | Industry Average DOC | Industry Average XLS | Industry Average JPG |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 Page Standard Typed | 34 KB (at 200 dpi) | 50 KB (at 300 dpi) | 62 KB | 25 KB | n/a | n/a | 50 KB | 25 KB | 25 KB | n/a |
| 2 Page Standard Typed | 69 KB (at 200 dpi) | 100 KB (at 300 dpi) | n/a | 31 KB | n/a | n/a | n/a | 100 KB | 30 KB | n/a |
| Batch Standard (14 pages, bi-tonal) | 2.8 MB (Avg. of 200 KB/page at 200 dpi) | 700 KB (Avg. of 50 KB/page at 300 dpi) | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 1 page colour standard | 320 KB (at 200 dpi) | 500 KB (at 150 dpi) | 87 KB | n/a | n/a | n/a | 1 MB | 25 KB | 25 KB | 1 MB |
| 2 page colour standard | 640 KB (at 200 dpi) | 1 MB (at 150 dpi) | n/a | n/a | n/a | n/a | 2 MB | 30 KB | 30 KB | 2 MB |
| 1 page greyscale standard | 262 KB (at 200 dpi) | 500 KB (at 150 dpi) | n/a | 55 KB | n/a | n/a | 50 KB | 25 KB | 25 KB | 50 KB |
| 2 page greyscale standard | 324 KB (at 200 dpi) | 1 MB (at 150 dpi) | n/a | 94 KB | n/a | n/a | 100 KB | 30 KB | 30 KB | 100 KB |
| Receipts standard (PDF) | n/a | 100 KB (at 300 dpi) | 62 KB | n/a | n/a | n/a | 100 KB | n/a | n/a | n/a |


Using these estimates, and projecting a future growth rate by estimating the number and size of future documents, facilitates planning for future storage requirements.
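A growth projection of this kind can be sketched as follows. The starting volume, yearly intake and growth rate are hypothetical planning inputs, the function name `project_storage` is hypothetical, and the 50 thousand byte per page figure is the round estimate used earlier in the text.

```python
def project_storage(initial_pages: int,
                    pages_first_year: float,
                    annual_growth: float,
                    years: int,
                    bytes_per_page: int = 50_000) -> list[int]:
    """Return the cumulative storage, in bytes, at the end of each year.

    annual_growth is the yearly increase in scanning volume (0.10 means 10%).
    """
    totals = []
    total_pages = initial_pages
    pages_this_year = pages_first_year
    for _ in range(years):
        total_pages += pages_this_year
        totals.append(round(total_pages * bytes_per_page))
        # Next year's intake grows by the assumed annual rate.
        pages_this_year *= 1 + annual_growth
    return totals


# Hypothetical inputs: 200,000 pages already stored, 100,000 pages added
# in the first year, and intake growing by 10% per year.
for year, total in enumerate(project_storage(200_000, 100_000, 0.10, 3), start=1):
    print(f"End of year {year}: {total / 1_000_000_000:.2f} GB")
```

Replacing `bytes_per_page` with a measured average, once one is available, refines the projection without changing the method.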

The same estimation methods can also be applied to your analysis of future staffing requirements. The two sets of statistics dovetail nicely in the planning process.

Note: This paper was based on information from Steve Gilheany on ArchiveBuilders.com.

For the full version of the text, see Sizing a Document Management System.

 

Additional Document Imaging Resources

Archive Builders
  Specializing in Manual and Digital Corporate Archives and Records Management.
Document Imaging Report
  Numerous resources on document imaging.
Laserfiche
  Resources for records managers including document imaging.
Lexis Nexis
  Applied Discover. Resources, email newsletter and white papers.

Other Miscellaneous Resources
  http://ed-thelen.org/comp-hist/con-Starkweather-Pedistal-Paper3.htm
  http://developer.novell.com/research/appnotes/1996/august/03/apv.htm
  http://gdz.sub.uni-goettingen.de/dieper/D15F.pdf
  http://pcnetworks.com/DOCSTOR.HTM

On this site ...

  Document Imaging Sizing. Thomas Tran.
  Document Imaging Systems Analyst. Job Description for an Imaging Analyst from the job descriptions page.
  Electronic Imaging - The Series. Part 1, Part 2, Part 3, Part 4, Part 5. Ursula Shail.
  Document Management. Implementing document management systems.
 
Overview on Document Imaging.
 
What to Do After "Yes". Preparing for your new imaging system. Jennifer Warwick.

Scanned document reports from the favorite reports page.
  By Fiscal Year
  By Fiscal Month
  By Fiscal Year Across Month
  By Fiscal Year by Document Group

 
Contributed by ...
 
Thomas Tran, Document Systems Analyst, Division of University Advancement at the University of Toronto
Thomas Tran has worked at the University of Toronto for 4 years. During this time, he has assisted in the implementation and maintenance of the document imaging system in the Division of University Advancement.
