Larimer County Genealogical Society

(+) How To Store Data in the Cloud

The following is a Plus Edition article written and copyrighted by Dick Eastman.

I received a message from a newsletter reader, asking how to store a genealogy society’s huge collection of digital images on a safe and secure online service. The following is an excerpt from a longer message:

I noted your recent writing about cloud computing…  Our genealogical society is struggling to determine the best back-up/storage solution for our growing files of electronic data.  We are seriously into digitizing local county records. The archival images, as you know, are relatively large files.  We already have close to 3 terabytes of data with a projected growth to circa 10 terabytes in the next 3 to 5 years if our digitizing and other electronic projects take off.  

Having stable, secure storage is increasingly important to our society.  We simply cannot leave this digital data at risk. And shipping 1 terabyte or 1.5 terabyte hard drives around among society digitizers and in-house e-publishers doesn’t seem like a very good solution. We are concerned about possible data loss from lost or damaged shipments and similar hazards. We also know that having a single copy is not sufficient; we need multiple copies for backup purposes. We have invested a lot of time and money in creating these images of records and cannot afford the risk of having single copies.

What is the best way to store such a significant and growing amount of data where we can add to it, have it securely backed-up, etc.? Engage the services of a server farm? Do you have one to recommend? Use Carbonite? Or Amazon Web Services? Other?

Excellent questions! In fact, there are several possible answers. First, let me re-state the goals in my own words:

You wish to safely and securely store files that will grow to about ten terabytes within a few years. This collection of data must be stored in a modern data center that is managed by professionals, with multiple backups kept in different locations for safety. While you didn’t mention it, I will suggest that you also need excellent security to ensure that the society’s designated personnel and computers can access the files at any time, but nobody else is ever able to gain access.

I must say this sounds like a job for the cloud!

In fact, several cloud vendors can easily handle terabytes of information. 

NOTE: A terabyte is 1024 gigabytes. A gigabyte is 1024 megabytes. A megabyte is 1024 kilobytes. A kilobyte is 1024 bytes. If my math is correct, a terabyte is 1,099,511,627,776 bytes (a bit more than one trillion bytes), or roughly the storage space provided by three-quarters of a million 1.44-megabyte floppy disks.
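
For readers who want to check the arithmetic in the note, here is a minimal sketch in Python. It uses the same 1024-based units as the note. The floppy disk size is an assumption on my part (a standard 3.5-inch "1.44 MB" disk, which holds 1,440 kilobytes); other floppy formats would give a different count. The function name floppies_needed is simply a name chosen for this illustration.

```python
# Binary (1024-based) units, matching the note above.
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE
GIGABYTE = 1024 * MEGABYTE
TERABYTE = 1024 * GIGABYTE

# Assumption: a 3.5-inch "1.44 MB" floppy disk, which holds 1,440 kilobytes.
FLOPPY_BYTES = 1440 * KILOBYTE


def floppies_needed(num_bytes: int) -> int:
    """Return the number of whole floppy disks needed to hold num_bytes."""
    # Ceiling division: a partly filled disk still counts as one disk.
    return -(-num_bytes // FLOPPY_BYTES)


if __name__ == "__main__":
    print(f"Bytes in one terabyte: {TERABYTE:,}")
    print(f"Floppy disks per terabyte: {floppies_needed(TERABYTE):,}")
    # The reader's projected collection size of about 10 terabytes.
    print(f"Bytes in ten terabytes: {10 * TERABYTE:,}")
```

Under that floppy-size assumption, one terabyte works out to a little under 750,000 disks, and the society's projected ten terabytes would need roughly ten times that many.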

Dozens of companies now provide storage space “in the cloud,” and prices are generally reasonable. For consumers who wish to save a few megabytes of information, the more popular services include Backblaze, Carbonite, Dropbox, Google Drive, Apple’s iCloud, Microsoft OneDrive, pCloud, and many others. However, if you want to store gigabytes or even terabytes of information, you probably want to look at the companies that provide storage space and other services to businesses.

Indeed, the leading cloud services provide more than simple file storage. It is possible to run databases, virtual private networks (VPNs), search engines, financial applications (including payroll), email servers, and more.

Many organizations even have their web servers in the cloud. For instance, Netflix has thousands of web servers and terabytes of storage in the cloud. When you go to www.netflix.com, you are connecting to cloud-based servers. All the movies you watch are streamed from cloud servers on Amazon Web Services (AWS), not from a data center at Netflix headquarters. (See http://aws.amazon.com/solutions/case-studies/netflix/ for details.) SmugMug’s web servers also run in the cloud (see http://aws.amazon.com/solutions/case-studies/smugmug/ for details), as do those of the US government’s Food and Drug Administration (details may be found at http://aws.amazon.com/solutions/case-studies/us-food-and-drug-administration/).

Genealogy vendors also use cloud services. The last I knew, the New England Historic Genealogical Society’s web site at www.AmericanAncestors.org was running in the cloud. The folks at FamilySearch have used the cloud to provide some of the services on www.FamilySearch.org, although I don’t think the entire web site is cloud-based. The Michigan State Archives also uses the cloud for its Preservica data storage and retrieval system (see https://preservica.com/resources/blogs-and-news/forever-accessible-archives-michigan-moves-its-records-to-the-cloud for details).

Part of the www.EOGN.com web site that you are reading at this moment is also hosted in the cloud on Amazon Web Services, and many other files are stored in pCloud in Switzerland.

The remainder of this article is reserved for Plus Edition subscribers only. If you have a Plus Edition subscription, you may read the full article at: https://eogn.com/(*)-Plus-Edition-News-Articles/13269866.

If you are not yet a Plus Edition subscriber, you can learn more about such subscriptions and even upgrade to a Plus Edition subscription immediately at https://eogn.com/page-18077