University of California Academic Senate, Berkeley Division

Committee on Computing and Communications

 

Computer and Data Security on Campus:

A Tutorial for Users

Version 2.1, February 2, 2005

 

Table of Contents

 

Introduction
Why you should care
Security is a system
Security impacts usability
Major elements of security
Computer security
   Threats
   Responses
   Platform-specific issues
Data security
   Threats
   Responses
Where to direct comments or questions
Revision history
Copyright notice and disclaimer
Appendix A: Other resources
Appendix B: Applicable University policies
Appendix C: Free security software available to campus users
   Windows
   Macintosh
   Linux
Appendix D: De-identification of data sets

Introduction

Computer and data security has been a growing concern on campus. This Committee believes that education of all computer users is essential to strengthening computer and data security. To that end, this memorandum [1] should help users to understand their responsibilities and the threats they face, and offer them the means to meet those responsibilities and counter those threats.

Why you should care

As a user, why should you be interested in the security of the computer you use, or of the sensitive data you may be harboring?

Impact on you. The Internet is global, allowing anybody anywhere to attempt to penetrate your computer. Many adversaries out there are doing it for the challenge; they simply have nothing better to do. But a significant number of adversaries have criminal intent. They would like to exploit confidential information, commit identity theft, or vandalize your data. The most direct and serious impact on you may come from campus, state, and federal policies that can require expensive and time-consuming notification of third parties who may be affected by a security breach, even if that breach is never exploited.

Impact on others. Some adversaries want to seize control of your machine and wield it as a weapon against others. For example, they may co-opt your machine to originate spam, or to launch a denial of service attack (swamping some web site with fake and useless traffic). After they penetrate your machine, they become an insider bypassing the security perimeter of the campus, making it easier for them to attack your colleagues on campus.

Impact on staff organizations. Professional system administrators on campus spend increasing time dealing with security issues, and this drains resources from more productive activities. If users are vigilant and responsive, the needless burden on these organizations can be greatly reduced.

Security is a system

It is important to realize that security is a system of individual measures, none of which is fully effective in isolation but which work effectively in tandem. As a system, it is only as strong as its weakest link. To appreciate this concept, consider your local bank branch [2]. It has a vault, a teller cage, a lock on the front door, a surveillance camera, an alarm system to summon the police, and an armored vehicle to transport cash to and fro. Think about it: These measures are complementary, and each makes up for obvious shortcomings in the others. Further, the security system can never be 100% effective, even though it can prevent most thefts.

Security impacts usability

Security always adversely impacts the ease of legitimate uses. Returning to the bank branch example, if the bank were willing to deny customers access to their money, or even willing to make it harder for customers to access their money, security could be made more effective. Letting customers in the front door also lets in the bad guys. Thus, any security system, to avoid unnecessarily getting in the way of legitimate uses, should counter the most credible threats and take into account the seriousness of any consequences. This is why security is more noticeable and invasive in the bank than it is in the grocery store. Analogously, dealing with sensitive information (like students' grades or identity information for human research subjects) deserves more stringent (and hence more invasive) security measures than, say, the drafting of this memorandum.

Major elements of security

Security requires a combination of people and technology. In the bank example, the bank's alarm is technological but it requires a vigilant teller to push the button, and the policeman's gun is technological but it requires the policeman's judgment to aim and fire.

People. In computer security, the security people include the users, who are analogous to the bank teller who decides who is a criminal (not a legitimate customer), pushes the alarm button, and hands money only to customers with appropriate balances in their accounts. The security people also include professional staff administering the computers and networks on campus, analogous to the bank security staff who install, monitor, and maintain the locks, alarms, and surveillance cameras. The people also include law enforcement authorities, who are prepared to investigate and prosecute when a crime is committed.

Using technology appropriately. There exist effective security technologies, but they have to be used, and used properly, to be effective. An important source of security lapses is the failure to use available technologies, or using them improperly. Human error (by either users or system administrators) is also a frequent cause of lapses. The most effective means to minimize human error is to employ technologies that are automatic and transparent, and that are installed, configured, and maintained by professional system administrators. But even with this professional administration, users still have an important role, including vigilance and avoiding common errors.

With the foregoing in mind, the following are essential elements of a security system:

Education. Users need to be aware of their risks and their responsibilities, and understand how to use the technologies available to them and the consequences of innocent errors or omissions they may make. This memorandum is a step in that direction, but the campus should make available additional resources and training sessions.

Software. As the Internet provides global connectivity to any computer, security software preventing and detecting nefarious access is essential.

Services. It doesn't make sense for every user to gain the expertise necessary to choose, install, configure, and maintain all the security technologies. Thus, professional services should be made available to administer and manage any computer (especially those harboring sensitive data), to install, configure, and maintain specialized security tools, and to monitor for intrusions. Users, especially those harboring sensitive data, should take advantage of these services.

Policies. The University has a right to expect all members of the campus community to adhere to minimum security practices through the expression of mandatory policies. Otherwise, it may be subject to litigation, or (as we discuss below) lax practices on the part of one employee or student may harm other employees or students. However, policies should always be accompanied by (and explicitly state) the means to adhere to them (by making available education, technologies, and services), and should be designed to be enforceable. Lacking any explicit and stated means for adherence, policies serve only to prohibit computer use, which is increasingly a requisite to university employment and scholarship. Focusing policies around credible means to follow them will also encourage wider compliance, although enforcement is generally necessary to ensure universal compliance.

Laws. It will always be possible for an insider or outsider to penetrate computer security through malfeasance. Laws provide for punishment as deterrence to this activity, and may also isolate the perpetrator from society so that they are unable to repeat this act.

Computer security

The first step to becoming an informed user is to be aware of threats to your computer system and some of the ways in which you can protect it.

Threats

Malicious programs. These programs somehow get installed on your computer, usually without your knowledge or under false pretenses. They may consume or vandalize resources, or use your computer as a stepping-off point to infecting other computers, or reveal information about you. These programs have different names, depending on their context and purpose, but all are to be avoided. A virus is a software program that finds its way into your computer without your knowledge, often by attaching itself to a legitimate file or email message. A worm is a software program that uses the network to replicate itself on different computers. A Trojan horse is a program that masquerades as a benign application but is actually malicious. Spyware is a program (it may be a trojan horse or it may be invisible to the user) that collects information from your computer and transmits it remotely. Spyware may be relatively benign, tracking your web browsing habits for marketing purposes. Some spyware is malicious, revealing data stored in your computer or capturing all your keystrokes and transmitting them back to an adversary. The latter is especially insidious, as this can allow an adversary to capture passwords and encryption keys (see later).

Intrusion. An outsider may gain access to your computer over the network (for example by guessing passwords or by tricking you into installing a trojan horse or spyware program) so that this person can do anything you could do, or worse anything the system administrator could do. This is analogous to giving a bank robber the manager's keys and safe combinations.

Physical access. Any adversary who gains physical access to a computer has a great head start at breaking into it. For example, you may leave yourself logged in so the intruder can masquerade as you without knowledge of your password. Or the intruder may boot the computer with a removable disk and access your files. Or the intruder may steal the computer or its hard drive and work on it at their leisure. It follows that a laptop that is carried in public carries the greatest risk.

Responses

Up-to-date operating system. Most ways to break into a computer exploit known flaws in the operating system software (and auxiliary software like media players or email clients). Because adversaries tend to focus on the most widely used operating system, Windows is a greater risk than other systems. But even Windows is not a great risk as long as you use a recent version of the operating system that is supported by the vendor (Windows 98, 2000 or XP, and MacOS 9 or 10), and install the latest patches regularly. In the case of Windows XP, you can configure it to update itself regularly and automatically, and you should do so.

Password choice. Knowledge of a password is the only factor that distinguishes you, the legitimate owner, from somebody trying to break in. Adversaries sometimes write programs that attempt to log into your computer over the network by trying many passwords, or they may try to guess your password based on knowledge about your personal life. A very insecure password is thus something personally associated with you (like a date or social security number or address) or any word from the dictionary. The best password is long (at least eight characters) and has a mixture of letters (both lower and upper case), numbers, and punctuation. Also be wary of typing in a password through a network connection, as it may be eavesdropped (see below).
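The password guidance above can be turned into a simple checklist. The following Python sketch is only an illustration: the function name, the tiny word list, and the "personal items" argument are assumptions made for this example, and passing the check does not prove that a password is strong.

```python
import string

# Hypothetical stand-in for a real dictionary word list.
COMMON_WORDS = {"password", "welcome", "berkeley", "dragon"}


def password_problems(password, personal_items=()):
    """Return a list of reasons the password looks weak (empty list = none found)."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than eight characters")
    if not any(c.islower() for c in password):
        problems.append("no lower-case letter")
    if not any(c.isupper() for c in password):
        problems.append("no upper-case letter")
    if not any(c.isdigit() for c in password):
        problems.append("no number")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation")

    lowered = password.lower()
    if lowered in COMMON_WORDS:
        problems.append("is a dictionary word")

    # personal_items: dates, addresses, ID numbers, and so on, supplied by the caller.
    for item in personal_items:
        if item and item.lower() in lowered:
            problems.append("contains personal information")
            break
    return problems
```

For example, password_problems("password") reports several problems, including that it is a dictionary word, while a password that mixes all four character classes and avoids the listed items returns an empty list.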

Firewall. If your computer is permanently connected to the Internet, it is important to use a firewall. Your computer has ports, which are analogous to the different doors in a bank building (except that there are a lot more of them, 65536 to be exact). Each network service is conventionally assigned a port; for example, a Web server listens on port 80. Among other things, a firewall makes sure that ports not in use are inaccessible from the outside [3]. More sophisticated firewalls monitor and regulate traffic in both directions, so for example they will inform you if a Trojan horse program is trying to access the Internet (by giving you the name of the program and asking for your permission to access the network).
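To make the idea of ports concrete, the following Python sketch tests whether a single port on a machine accepts connections. It is a minimal illustration, not a firewall or a security scanner; the function name is an assumption for this example, and you should probe only computers you are responsible for.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise,
        # instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling port_accepts_connections("127.0.0.1", 80) on your own machine tells you whether something is listening on port 80 there. A port that reports False is either unused or blocked; this check alone cannot tell which.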

Virus protection software. Viruses arrive attached to email messages all the time. They won't affect you as long as you don't open email attachment files. However, it is easy to make a mistake, because virus senders don't use their real email addresses; they spoof the addresses of real users. So how do you tell the difference between an email attachment that someone is legitimately sending you and a virus? There are two answers: First, be very sparing about opening email attachments: if there is the least question, check with the sender first to verify that it is legitimate and benign. Second, run virus protection software. This software maintains a database of known viruses, and flags or deletes messages with known virus attachments. You have to make sure the database is frequently updated with the latest threats.

Spyware protection software. This type of software looks for spyware software running on your computer or stored on your disk. It works just like virus protection, using a database of known spyware threats that is updated frequently.
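The "database of known threats" idea can be sketched in a few lines of Python. This is a deliberate simplification: it assumes a threat can be recognized by the SHA-256 fingerprint of a whole file, whereas real products use more elaborate signatures. The database contents and names here are invented for the example.

```python
import hashlib


def fingerprint(data):
    """Return the SHA-256 fingerprint of a block of bytes as hex text."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical threat database: fingerprint -> threat name.
# A real product downloads frequent updates to a much larger database.
KNOWN_THREATS = {
    fingerprint(b"pretend this is a virus"): "Example.Virus.A",
}


def scan(data, database=KNOWN_THREATS):
    """Return the name of a known threat matching the data, or None."""
    return database.get(fingerprint(data))
```

The sketch also shows why updates matter: a threat whose fingerprint is not yet in the database passes the scan unnoticed.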

Platform-specific issues

Desktop computers. Desktop computers are the least secure computing platforms imaginable, for several reasons. They are optimized for usability, and as we indicate above, ease of use is generally associated with poor security. Their history is as a low-cost and unsophisticated platform, so security is mostly an afterthought. They harbor applications that are inherently 'leaky', like web browsers and email. Finally, and probably most important, they are often not professionally administered. Thus, you need to make a special effort to keep your desktop environment secure, and wherever possible you should not harbor sensitive data sets on your desktop.

Laptop computers. These computers have all the problems of desktop computers, and the additional factor that they are often lugged around public areas, making them more susceptible to physical access or theft. Thus every precaution for desktops should be magnified for laptops.

Servers. These are computers that you access over the Internet, for example to conduct your online banking or to purchase goods and services. These platforms are far more secure than the desktop, basically because all the factors that make the desktop insecure are missing. In fact, servers are rarely compromised, and when they are it is almost always due to human error. Thus, when you are dealing with sensitive data, wherever practical you should capture and store it within servers.

Data security

Many personal and professional activities involve sensitive data that you would not want to lose or to be divulged. Examples are financial data or identity information about yourself or human research subjects. Again, it is good to be aware of threats and protective measures.

Due to the aforementioned campus policies and state and federal laws, you must report to responsible campus authorities immediately any known security breaches where even the potential for divulging sensitive data is suspected [4]. Those authorities will help you to ascertain the seriousness of the security breach and ensure that all legal requirements are fulfilled.

Threats

Vandalism. An adversary may be bent on vandalizing your data, deleting it or corrupting it. This is easier than stealing data. For example, a trojan horse program that corrupts data without revealing it on the outside will not cause a firewall to alert you. Besides computer security, it is important to maintain frequent backups of your data sets. This is all the more important because they can be destroyed by benign failures (like a disk crash) even more easily than by vandalism.
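A backup is only useful if the copy is intact. The following Python sketch copies one file to a backup folder and confirms that the copy matches the original by comparing SHA-256 checksums. The function names and folder layout are assumptions for this example; it does not replace a regular backup service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy source into backup_dir and verify the copy; return the copy's path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copies contents and timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError("backup does not match original: %s" % target)
    return target
```

Remember that a backup of a sensitive data set is itself sensitive and needs the same protection as the original.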

Access. A more serious threat to sensitive data is revealing it to an adversary, who may use it for identity theft (yours or that of human subjects) or other nefarious purposes. This requires an adversary to gain access to your computer, either through the security holes mentioned earlier or through physical access to your computer. Keep in mind that your backups of data sets are as valuable to an adversary as the latest versions stored on your computers. Thus, you need to worry about the security of backups too.

Discarding data securely. Even getting rid of a sensitive data set can be difficult. Simply deleting a file is usually not enough. It may live on in the 'recycle bin' or as part of past backups. Even if you delete it from the recycle bin, the data itself will probably still reside on the disk at least until it is later overwritten, because the operating system simply stops keeping track of it without actually erasing it from the disk. Thus, to get rid of a data set, you must do three things: delete the file, empty the recycle bin, and take measures to 'wipe' the disk of deleted data (using utility programs available for this purpose). You must also take account of any copies that may exist in backups.

Eavesdropping. Whenever you send a data set over the network, an adversary may be eavesdropping. The Internet is very insecure, so you should always assume that whatever you send over it can be eavesdropped. This includes email, and email attachments.

Loss of removable media. These storage media (like flash memory, floppy disks, and CD-ROMs) are very convenient for transporting files or creating backups. However, they also lack physical security and can be lost or stolen, revealing sensitive data.

Phishing. This is a strategy used by adversaries with the goal of identity theft [5] by luring you to a phony (but legitimate looking) web site, supposedly to update "account information". Never provide personal information to someone who initiates the contact with you, either by email or phone. Make sure that you initiate the contact and have a clear and legitimate purpose in revealing personal information.

Responses

As mentioned above, the seriousness with which you take security and the strength of your responses should depend on the nature of the threat and consequences. The first set of measures focuses on preventing access to your computer.

Computer security. You can make access to the data in your computer less likely by paying attention to computer security as described earlier. This includes using strong passwords and changing them often, logging out when not using the computer, and preventing physical access to your computer. This also includes using a firewall, and preventing and detecting malevolent programs from infecting your computer.

The second set of measures focuses on minimizing or eliminating the damage when your computer is broken into.

Disk wiping. After you delete a file containing sensitive data and empty the recycle bin, you should wipe your disk. Otherwise, a determined adversary may be able to recover the deleted file if they are able to break into your computer or gain physical access to it.
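To illustrate what "wiping" means, the following Python sketch overwrites a single file's contents with random bytes before deleting it. It is a sketch under stated assumptions, not a replacement for a disk-wiping utility: it does not touch copies in the recycle bin or in backups, it does not clean space left by files deleted earlier, and some file systems and storage devices may not overwrite data in place. The function name is an assumption for this example.

```python
import os

CHUNK = 1024 * 1024  # overwrite one megabyte at a time


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's contents with random bytes, then delete it."""
    length = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = length
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the operating system to write to the disk
    os.remove(path)
```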

Encryption. This is a sophisticated algorithm that combines a secret encryption key (which is simply a set of data, perhaps created randomly or derived from a password or passphrase) with the data set itself to transform the data into a new, unrecognizable form. The original data cannot be recovered without knowledge of a corresponding secret decryption key. Thus, if you store and transmit a data set in encrypted form, as long as you don't reveal the decryption key, an adversary who steals or monitors that data set will be foiled. This is analogous to the bank transporting money in a lockbox that cannot be opened, even if it is stolen. Encryption is most effective in situations where data is not physically secured (stored on a removable medium or a laptop computer, or transmitted across the network). Indeed, in these circumstances encryption is the only effective means to prevent data from being stolen. However, you must protect the decryption key from being stolen, and must convey it securely to whoever needs to decrypt the data [6]. Another threat you should be aware of is that should you lose the decryption key, you will also lose the data set [7].

Encryption illustrates the 'system' nature of security. When encryption is used for storage on an active computer, an adversary who has broken into that computer may be able to steal encryption keys using spyware keystroke-monitoring tools. Thus, the use of encryption does not reduce the need for strong computer security.
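The remark that a key may be produced from a password or passphrase can be illustrated with PBKDF2, a standard key-derivation function available in Python's hashlib module. This sketch shows only the key-derivation step, not encryption itself, and it is not a description of how any product named in this memorandum works. The function name and the iteration count are assumptions for this example.

```python
import hashlib
import os


def derive_key(passphrase, salt, iterations=100_000):
    """Derive a 32-byte key from a passphrase using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def new_salt():
    """Return 16 random bytes; the salt is stored alongside the encrypted data."""
    return os.urandom(16)
```

The same passphrase and salt always give the same key, which is why losing the passphrase means losing the data, and why anyone who captures the passphrase (for example with a keystroke logger) can recreate the key.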

De-identification of data sets. Often you can remove the sensitive part of a data set (like data elements that identify human subjects); what is left can be publicly revealed without problem. If there are legitimate reasons to retain the sensitive part of the data, it can be separated into a different file. Then you have two files: a de-identified data set and an identity data set. The former can be treated more casually, and the latter under more stringent security, for example by keeping it out of your computer storage and on a removable disk under strong lock and key, or by making sure it is always encrypted. The two files can later be joined back together as needed. (See Appendix D for an illustration of how to do this.)
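The splitting and rejoining described above can be sketched in Python as follows. This sketch is our own illustration and is not taken from Appendix D. It assumes each record is a dictionary, that you supply the list of identifying fields, and that no record already uses the field name "link_id", which is a name invented here for the random value that ties the two files together.

```python
import secrets


def split_records(records, identifying_fields):
    """Split records into a de-identified list and an identity list."""
    deidentified, identities = [], []
    for record in records:
        link_id = secrets.token_hex(8)  # random, reveals nothing about the subject
        public = {"link_id": link_id}
        identity = {"link_id": link_id}
        for field, value in record.items():
            if field in identifying_fields:
                identity[field] = value
            else:
                public[field] = value
        deidentified.append(public)
        identities.append(identity)
    return deidentified, identities


def rejoin(deidentified, identities):
    """Join the two lists back into the original records."""
    by_id = {row["link_id"]: row for row in identities}
    joined = []
    for row in deidentified:
        merged = dict(by_id[row["link_id"]])
        merged.update(row)
        del merged["link_id"]
        joined.append(merged)
    return joined
```

Note that removing obvious identifiers does not always make a data set anonymous; combinations of the remaining fields can sometimes identify a subject, so choose the identifying fields with care.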

Another category of responses focuses on keeping an eavesdropper who observes your communications over the network from gaining any useful information.

Secure file transfer. Transferring a data set over the network is inviting theft. If a data set is sensitive, you should take one of two precautions. You can encrypt the data set file, and once you have done this you can safely transfer it over the network, send it as an email attachment, etc. However, note that you cannot send the encryption key (which is uniquely chosen for this file only) over the network; you need to get it securely to the recipient. For example, you might send it by registered mail, or call the recipient and give it to them verbally. Another approach is to use a secure file transfer program. Either approach is analogous to the bank's armored car for transferring money from bank branch to central vault.

Web browser security. Web browsers have a built-in security capability called "Secure Socket Layer (SSL)". You can tell if SSL is in use (you are viewing a secure Web server) by the 'lock' icon at the bottom of the window. SSL provides three capabilities: (a) It authenticates the secure Web site, so you can be reasonably certain it is not a phony. This is a good antidote for phishing [8] (their site should not be able to authenticate by SSL). (b) It encrypts the data passing in both directions, so eavesdropping should not be possible. (c) It ensures that data cannot be changed in transit without that being detected. You should not enter any personal information into a Web browser unless it is talking to a secure site using SSL. Conversely, if you gather personal information (as in human subject research), you should ensure that your server both supports and is using SSL.
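For readers who write their own programs to contact a secure server, the authentication and encryption steps above can be sketched with Python's ssl module (which implements SSL's successor, TLS). This is a minimal sketch: the function names are assumptions for this example, and the second function needs a working network connection.

```python
import socket
import ssl


def make_verifying_context():
    """Return a context that requires a valid certificate matching the host name."""
    # create_default_context() enables certificate and host-name checking.
    return ssl.create_default_context()


def fetch_certificate_subject(host, port=443, timeout=5.0):
    """Connect securely to host and return the subject of its certificate."""
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket raises an ssl.SSLError if the site cannot authenticate itself.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert().get("subject")
```

If the connection fails with a certificate error, treat the site as you would a browser window without the lock icon: do not send it personal information.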

Virtual private network (VPN). If you are accessing a remote server computer, an adversary can eavesdrop on all the back-and-forth commands and responses. A VPN [9] sets up a semi-permanent link between your computer and the server that (a) authenticates both parties and (b) encrypts all data before it is transferred. (This is basically what SSL does for Web browsers, except the VPN provides similar security for all applications.) This is analogous to the bank digging a secure tunnel between its branch and the main vault so that it does not need to use armored cars or lock boxes.

Finally, a longer-term measure is for the campus to provide a secure repository for sensitive data together with secure networked data transfer.

Secure data server. The campus should provide a secure server for storage of sensitive data. Such a server, together with a secure data link from server to your computer (like VPN or SSL) would solve the problem of finding a secure location for sensitive data sets. This is analogous to the bank's central vault.


Where to direct comments or questions

You may send comments (technical errors, needed clarifications or additions, or suggestions for improvement) to Professor David G. Messerschmitt, Co-Chair of the Committee, at messer@eecs (at domain berkeley.edu). Please mention the version number and line number. No detailed or specific technical questions please (this Committee is not a technical support organization).

Revision history

v1.0: January 7, 2005. Incomplete first draft posted for comment only. This was first generated at the suggestion of the Committee on the Protection of Human Subjects as a part of their efforts to improve the security of human subjects' identity data.

v2.0: First version we are comfortable distributing without a "draft only" disclaimer.

Copyright notice and disclaimer

Copyright © 2005, The Regents of the University of California. All rights reserved.

Permission to use or copy and distribute this document without modification for educational, research, personal, and non-profit purposes, without fee, with attribution, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following paragraph appears in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE RELIANCE ON INFORMATION IN THIS DOCUMENT.


Appendix A: Other resources

The campus has a security web site with useful information, including instructions on how to report suspected security lapses for computers connected to the campus network.

UC Berkeley System and Network Security

The following is a tutorial on security for personal computers. It is more technical than this memorandum, more detailed, and has links to additional resources:

Home Network Security, CERT Coordination Center (www.cert.org/tech_tips/home_networks.html)

For Windows users, see Microsoft's site:

Protect your PC (www.microsoft.com/athome/security/protect/default.aspx).

Appendix B: Applicable University policies

There is an expanding list of policies issued by campus and systemwide authorities. These will be listed together with a brief description of their main provisions.


Appendix C: Free security software available to campus users

You can be confident of any software licensed by the campus for the campus community, including some security programs below.

Be wary of any free (not campus licensed) software, whether it is in the security category or not. Free software can be (and sometimes is) actually a malicious trojan horse intended to fool you; its real function may be entirely different from what is represented. So if you are tempted to download and install any free software, always check it out first at sites like CNET (http://www.cnet.com/), PC World (http://www.pcworld.com/) or PC Magazine (http://www.pcmag.com/). A category of free software that is generally safe is open-source software that is maintained by a large community of programmers and which is widely used. (The Linux operating system is an example.)

Also be careful where you obtain software, because it is possible for adversaries to modify programs that are otherwise benign. Software should be downloaded only from reputable sources like CNET (for free or open-source software) or the original vendor (for commercial software). If the download site is secure (and uses SSL) then you should be even more comfortable.

The following are security programs that the Committee has actually used. Caveat: Other than relying on the reviews on the above sites, the Committee cannot directly testify to the effectiveness of the intended security function of any of these programs.

Windows

The following have a site license for the campus community and are available at the campus software download site (http://software.berkeley.edu/).

Anti-virus and firewall. Windows XP has a built-in firewall capability, and in the latest upgrade turns on this firewall by default. However, we recommend the Symantec product for several reasons. It is more full-featured, it is integrated with virus protection and intrusion detection, and it is a two-way firewall (XP is only incoming).

"Symantec Client Security" is a full-featured anti-virus, firewall, and intrusion detection program. It also includes limited spyware detection and removal, which are usually not adequate by themselves. By default it automatically updates its virus definitions. (You can adjust the update schedule from "Schedule updates" on the "File" menu. A laptop may be problematic if it is not connected to the network at the scheduled times, so you may have to manually update.) The firewall is two-way, so as you are using it a window will occasionally pop up asking whether a specific program should be allowed to connect to the Internet. This is a good way to recognize spyware and trojan horse activity on your computer, and take remedial action. Allow only recognizable programs that you are currently using to access the Internet.

Secure file transfer. To transfer a file securely to another computer without explicitly encrypting it yourself, that computer must be running the appropriate file transfer server. Many server computers support transfers using the following program.

"SSH Secure File Transfer". You must have an account on the computer to which you are transferring the file. The program uses your password for that account to authenticate you, and it encrypts the file automatically as it is transferred.

The following programs are available for free from Microsoft (http://microsoft.com).

Spyware. The spyware problem is relatively recent, so spyware detection and removal is not yet a feature of standard security suites like Symantec's. However, there are a number of free programs, such as this recent addition from Microsoft and the two listed further below.

"Microsoft AntiSpyware". This is the beta version of a new program from Microsoft, available not at download.com but here (http://www.microsoft.com/athome/security/spyware/default.mspx).

The following programs are available for free from CNET's download.com site (http://download.com).

Spyware.

"Ad-Aware SE Personal Edition" from Lavasoft.

"Spybot Search and Destroy" from PepiMK Software.

Encryption. Later versions of Windows include a built-in encryption capability that is easy to use because the encryption keys are generated automatically by Windows. However, if you suffer a computer crash it is possible to lose all your encrypted data unless your computer has been configured properly on the network (specifically to allow for recovery). Thus, we recommend this option only if it is configured by a professional system administrator.

"Cryptainer LE" from Cypherix Software. This utility can be installed by the user. It creates a "secure vault" stored on your disk as a file but appearing in Windows Explorer as a mounted disk (when you are logged in and Cryptainer is running). This is convenient, because any program can access the unencrypted data, but the data is always encrypted when it touches the disk. This convenience comes at a price, because it is also easy for the user to copy a file to another location, in which case an unencrypted copy is created (beware!). The program can also encrypt a single file to send as an email attachment, and the recipient can even decrypt the file without a copy of the program (you must provide the recipient with the passphrase, say over the telephone). Care must be exercised to preserve the passphrase, because there is no built-in recovery mechanism; otherwise you can lose all your data permanently! It is recommended that the passphrase be stored in at least two secure locations. The free version has a limit of 25MB of storage in each vault, although other versions offer larger storage limits (up to 5GB) at modest cost.

Disk wiping. To completely rid your computer or a removable disk of a file, you must empty the "recycle bin" and run a disk wiping program. This capability is not built into Windows, although some utility suites (such as Norton Utilities) include this as a feature.

"Eraser" written by Garrett Trant. This utility program can wipe all the free space on your hard disks (on demand or scheduled) and it can wipe any file or folder in the process of deleting it.

Macintosh

To be added.

Linux

To be added.


Appendix D: De-identification of data sets

We illustrate how to de-identify data sets for a simple Excel worksheet. Suppose we have the following table in an Excel file:

Name     Date of birth
Tom      5/21/1945
Dick     1/4/1946
Harry    5/30/1977

 

The dates of birth are considered sensitive personal information, but the names are not, so we want to divide this file into a de-identified file and an identity file such that they can later be re-joined. We do this by adding the same randomized "identity key" to each row in the table. This identity key is a unique (but random) integer that a) can be used to reunite the rows of the table after columns are moved to separate files and b) offers no clue as to the underlying identity information within the row.

To generate this identity key, we start with a table [10] containing the integers from 1 to n (where n=3 in this case, but in general n is the number of rows in the table):

1
2
3

 

This is column A, and now we add a column B where each cell contains "=rand()". This generates a set of random numbers between 0 and 1 in column B:

1    0.173540171
2    0.093795325
3    0.375212526

 

Now we select both columns of the table and sort [11] it in ascending order on column B (by selecting the two columns and using the Data->Sort menu):

2    0.691631591
1    0.226984256
3    0.725182329

 

In column A we now have a set of guaranteed-unique integers in random order, which can be used as an identity key. We copy and paste those integers into the original table twice to obtain:

 

Name     Identity key    Date of birth    Identity key
Tom      2               5/21/1945        2
Dick     1               1/4/1946         1
Harry    3               5/30/1977        3
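For readers who prefer a script to a spreadsheet, the key-generation steps above can be sketched in Python. This is only an illustration under stated assumptions, not part of the Committee's procedure: the table is assumed to be a list of (name, date of birth) pairs, and the function name assign_identity_keys and its parameters are hypothetical. Shuffling the integers plays the same role as sorting them on a column of random numbers, and the start parameter corresponds to the case in footnote [10] where new rows are keyed from n+1.

```python
import random


def assign_identity_keys(rows, start=1, rng=None):
    """Return a copy of rows with a unique, randomly assigned integer key
    placed in front of each row (hypothetical helper for illustration)."""
    if rng is None:
        # Assumption: use the operating system's randomness source.
        rng = random.SystemRandom()
    # The integers start .. start+n-1 are unique by construction.
    keys = list(range(start, start + len(rows)))
    # Shuffling has the same effect as sorting on a column of random numbers.
    rng.shuffle(keys)
    return [(key, *row) for key, row in zip(keys, rows)]


table = [
    ("Tom", "5/21/1945"),
    ("Dick", "1/4/1946"),
    ("Harry", "5/30/1977"),
]

for keyed_row in assign_identity_keys(table):
    print(keyed_row)
```

Each run assigns the keys 1 to 3 in a different random order, so the printed rows will vary from run to run.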

 

Now the table can be sorted according to one of the key columns, so that the order of the rows has been randomized:

Name     Identity key    Date of birth    Identity key
Dick     1               1/4/1946         1
Tom      2               5/21/1945        2
Harry    3               5/30/1977        3

 

Finally, the table can be split into two sub-tables and stored in two files. In the de-identified file, not only have sensitive columns been removed, but for additional obfuscation the order of rows has been randomized. As long as the identity key in each sub-table is always sorted along with the other columns, we can manipulate the two files independently. To re-join the two tables, they can be concatenated after sorting each according to its respective identity key column (so that they are restored to the same, albeit random, row order).
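The split and re-join steps can be sketched in the same spirit. The following Python fragment is a minimal illustration that assumes each row is a (key, name, date of birth) tuple using the keys from the example above; the function names split_table and rejoin are hypothetical. It also checks that the two key columns match before re-joining, a safeguard added here that the spreadsheet procedure leaves to the user.

```python
def split_table(keyed_rows):
    """Split (key, name, date of birth) rows into two sub-tables that
    share only the identity key, each sorted on that key."""
    names = sorted(((key, name) for key, name, _ in keyed_rows),
                   key=lambda row: row[0])
    dates = sorted(((key, dob) for key, _, dob in keyed_rows),
                   key=lambda row: row[0])
    return names, dates


def rejoin(names, dates):
    """Sort each sub-table on its identity key, then place them side by side."""
    names = sorted(names, key=lambda row: row[0])
    dates = sorted(dates, key=lambda row: row[0])
    # Added safeguard: refuse to join if the key columns differ.
    if [key for key, _ in names] != [key for key, _ in dates]:
        raise ValueError("identity keys in the two sub-tables do not match")
    return [(key, name, dob) for (key, name), (_, dob) in zip(names, dates)]


keyed = [
    (2, "Tom", "5/21/1945"),
    (1, "Dick", "1/4/1946"),
    (3, "Harry", "5/30/1977"),
]

names, dates = split_table(keyed)
for row in rejoin(names, dates):
    print(row)
```

Because both sub-tables are re-sorted on the key before joining, either one can be reordered independently in the meantime, as long as each key stays attached to its own row.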



[1] This memorandum is available here (http://www.eecs.berkeley.edu/~messer/Campus/COMP/Docs/Security-WP.pdf). Reading it online you will find a number of live hyperlinks. Do not photocopy this memorandum, unless you have verified on the web that it is the latest version (this is a topic that is forever changing).

[2] We employ this bank analogy occasionally in the following to enhance your understanding of computer security.

[3] If you have a DSL or cable modem it may have a firewall built in (you'd better check, don't assume anything). This is the best solution if you have it. Otherwise, you can configure the firewall built into the Windows XP or MacOS 10 operating system, or install and run a separate firewall program.

[4] See System and Network Security (http://security.berkeley.edu) for instructions.

[5] See the Federal Trade Commission site on identity theft here (http://www.consumer.gov/idtheft/) for more information.

[6] Examples of ways to do this include sending by registered mail or reading over the phone.

[7] Thus, measures should always be taken not only to protect the decryption key, but also to ensure that it can be recovered should it be lost (for example, by writing it down and storing it in a physically secure location). Alternatively, an unencrypted copy of the data can be kept in a physically secure location. In the future, the campus should provide a "secure vault" on a server for storing decryption keys and providing authorized access to them.

[8] The lock symbol indicates that the owner of the site has convinced a responsible authority of their legitimacy and obtained an appropriate credential from that authority.

[9] Support for a VPN is not currently provided on the campus network, but it is likely to be provided in the future.

[10] In case you want to add m rows to a table that already has n rows, you can start instead with the integers from n+1 to n+m.

[11] After the sort, the random numbers have actually changed (because a new set of random numbers is generated every time Excel recalculates the table, which it does after it sorts). This is not a problem, since the column of random numbers will be discarded anyway.