It's time to decide.
Which of these accounts should be associated with names and addresses and other data? Which accounts merit additional investigation and more intrusive scrutiny? (This might include cross-referencing the numbers against other databases, referring them to the FBI or CIA for additional investigation, or asking a supervisor to initiate the process of wiretapping the phones and listening to what the subscriber is talking about.)
You can recommend one of the following lists of numbers for additional investigation or data collection:
1. The 79 phone numbers that called the original number;
2. The 24 most important members of the 79-number set;
3. The 47,923 phone numbers that are two degrees of separation from the original;
4. The 1,250 phone numbers (out of 47,923) with the top scores for importance (but representing only 7 percent of the sum total of all importance scores in the network);
5. The 4,500 top-scoring phone numbers, representing 21.5 percent of the total;
6. The 22,500 top-scoring phone numbers, representing 92.5 percent of the total;
7. One of the color groups identified by your analysis, which range in size from hundreds to thousands of accounts (but which color to choose?);
8. One of the cliques, which limits the set to just a few hundred accounts showing high levels of activity, but incurs the risk of missing an al Qaeda operative or cell that deliberately keeps communication to a minimum.
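Options 4 through 6 all come down to the same trade-off: how many accounts you flag versus how much of the network's total importance those accounts capture. A minimal sketch of that arithmetic follows. The account labels and scores are made up for illustration, and the function names are hypothetical; nothing here reflects the scoring method an actual analyst would use.

```python
from typing import Dict, List, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (account, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def share_of_top_k(scores: Dict[str, float], k: int) -> float:
    """Fraction of the total importance held by the k top-scoring accounts."""
    total = sum(scores.values())
    if total == 0:
        return 0.0
    top = rank_by_score(scores)[:k]
    return sum(score for _, score in top) / total


def smallest_k_for_share(scores: Dict[str, float], target: float) -> int:
    """Smallest number of top-scoring accounts whose combined score
    reaches the target fraction of the total."""
    total = sum(scores.values())
    running = 0.0
    for k, (_, score) in enumerate(rank_by_score(scores), start=1):
        running += score
        if running >= target * total:
            return k
    return len(scores)


# Hypothetical importance scores for six accounts (illustration only).
toy_scores = {"A": 40.0, "B": 25.0, "C": 15.0, "D": 10.0, "E": 6.0, "F": 4.0}

top_two_share = share_of_top_k(toy_scores, 2)
needed_for_75 = smallest_k_for_share(toy_scores, 0.75)
print(f"Top 2 accounts hold {top_two_share:.0%} of total importance")
print(f"Accounts needed to reach 75%: {needed_for_75}")
```

In this toy data a few accounts hold most of the score. In the scenario above the opposite is true: the 1,250 top-scoring numbers hold only 7 percent of the total, which is why the list has to grow so large before it covers most of the network's importance.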
What will you do?
You've already "touched" tens of thousands of customer records, including many belonging to American citizens. Most of those phone records have been used only to perform the necessary math to identify smaller groups that will receive more intrusive scrutiny. So far, you haven't even looked at an actual phone number.
But phone numbers are extremely structured data, if you do choose to look. The list of 47,923 includes thousands of phone numbers for accounts based in the United States. Thanks to the area codes, exchanges, and cell phone location metadata, you can easily click a button and get a list showing which towns are associated with the people in the whole set or any of the smaller sets.
The NSA says it has "minimization" procedures to prevent unnecessary intrusions on the privacy of American citizens, presumably by blocking the analyst's access to information on U.S. phone numbers to a greater or lesser degree. But if the list reveals a dozen well-connected phone numbers based in, say, Minneapolis, isn't that exactly the kind of thing you're supposed to detect? When does the relevance of a U.S. account outweigh privacy concerns? When it's one of the 79 original accounts, or one of the 22,500 most mathematically relevant?
This leads to another critical question: How much should you trust this math? You have access to multiple types of analysis; each one has strengths and weaknesses. Which ones are a good fit for this data? Are any of them?
Network analysis has proven reliable at discovering nodes that are important... to the structure of networks. But that's not necessarily the same thing as being an important or dangerous terrorist. The wider you cast the net, the greater the chance you will find yourself analyzing a social network instead of a terrorist network.
One of the most important parts of your analysis uses the duration and time of day of a call in an effort to determine which calls are more likely linked to terrorist operations. Do these criteria reflect historical trends or are they constantly updated? More importantly, have they been tested for accuracy?
Is there any way to conduct a credible test other than by saying "privacy be damned" and collecting the call content of all the people in a large sample network so you can compare the actual content of the network to your predictions? Can you trust this kind of analysis if it isn't periodically tested?
There are no clear objective answers to most of these questions. But there are factors that influence how the government chooses to answer them.
For one thing, U.S. policies are still informed by the idea that all terrorist attacks should be interdicted. A frequently expressed corollary to that premise states that, while tradeoffs against civil liberties might be bad in the abstract, those issues are meaningless when faced with a ticking time bomb.
But we don't know how many "imminent" terrorist attacks have been prevented by these techniques. Does anyone act on your analysis in real time?
Privacy aside, it's also important to keep your data focused and avoid bloat. When you start off with one seed account (the fundraiser), it's possible that investigating less data -- in this case the 79 accounts that contacted the fundraiser -- will produce a better result in terms of both civil liberties and counterterrorism.
But if you have multiple seeds -- say the fundraiser and his banker and four couriers they work with -- that opens the door to much stronger mathematical analysis, at the expense of sharply increasing the number of accounts you need to analyze.
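To see how adding seeds widens the net, here is a rough sketch that counts every account within two degrees of separation of a set of seed accounts. It uses a breadth-first search over a tiny invented call graph. The account names, the graph, and the function name are all hypothetical, and a real call network would be vastly larger.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def accounts_within_hops(
    graph: Dict[str, List[str]], seeds: Iterable[str], max_hops: int
) -> Set[str]:
    """Return every account reachable from any seed in at most max_hops calls.

    Multi-source breadth-first search: every seed starts at distance 0.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        account = queue.popleft()
        if distance[account] == max_hops:
            continue  # do not expand past the hop limit
        for contact in graph.get(account, ()):
            if contact not in distance:
                distance[contact] = distance[account] + 1
                queue.append(contact)
    return set(distance)


# Invented call graph: each account maps to the accounts it exchanged calls with.
toy_graph = {
    "fundraiser": ["a", "b"],
    "a": ["fundraiser", "c"],
    "b": ["fundraiser", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c", "banker"],
    "banker": ["e", "f"],
    "f": ["banker", "g"],
    "g": ["f"],
}

one_seed = accounts_within_hops(toy_graph, ["fundraiser"], max_hops=2)
two_seeds = accounts_within_hops(toy_graph, ["fundraiser", "banker"], max_hops=2)
print(len(one_seed), "accounts within two hops of the fundraiser")
print(len(two_seeds), "accounts within two hops of the fundraiser or the banker")
```

In this toy graph the second seed nearly doubles the set under analysis, from five accounts to nine. How fast the set grows in practice depends on how much the seeds' neighborhoods overlap.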