
Thursday, April 25, 2013

What is Information? (Part I: The Eye of the Beholder)


Information is a central concept in our daily life. We rely on information in order to make sense of the world: to make "informed" decisions. We use information technology in our daily interactions with people and machines. Even though most people are perfectly comfortable with their day-to-day understanding of information, the precise definition of information, along with its properties and consequences, is not always well understood. I want to argue in this series of blog posts that a precise understanding of the concept of information is crucial to a number of scientific disciplines. Conversely, a vague understanding of the concept can lead to profound misunderstandings, within daily life and within the technical scientific literature. My purpose is to introduce the concept of information, mathematically defined, to a broader audience, with the express intent of eliminating a number of common misconceptions that have plagued the progress of information science in different fields.

What is information? Simply put, information is that which allows you (who are in possession of that information) to make predictions with accuracy better than chance. Even though that sentence may appear glib, it captures the concept of information fairly succinctly. But the concepts introduced in this sentence need to be clarified. What do I mean by prediction? What is "accuracy better than chance"? Predictions of what?

We all understand that information is useful. When was the last time you found information to be counterproductive? Perhaps it was the last time you watched the news. I will argue that, when you thought that the information you were given was not useful, then what you were exposed to was most likely not information. That stuff, instead, was mostly entropy (with a little bit of information thrown in here or there). Entropy, in case you have not yet come across the term, is just a word we use to quantify how much you don't know. Actually, how much anybody doesn't know. (I'm not just picking on you).

But, isn't entropy the same as information?

One of the objectives of these posts is to make the distinction between the two as clear as I can. Information and entropy are two very different objects. They may have been used synonymously (even by Claude Shannon—the father of information theory—thus being responsible in part for a persistent myth) but they are fundamentally different. If the only thing you will take away from this article is your appreciation of the difference between entropy and information, then I will have succeeded.

But let us go back to our colloquial description of what information is, in terms of predictions. "Predictions of what?" you should ask. Well, in general, when we make predictions, it is about a system that we don't already know. In other words, another system. This other system can be anything: the stock market, a book, the behavior of another person. But I've told you that we will make the concept of information mathematically precise. In that case, I have to specify this "other system" as precisely as I can. I have to specify, in particular, which states the system can take on. This is, in most cases, not particularly difficult. If I'm interested in quantifying how much I don't know about a phone book, say, I just need to tell you the number of phone numbers in it. Or, let's take a more familiar example (as phone books may appeal, conceptually, only to the older crowd among us): the six-sided fair die. What I don't know about this system is which side is going to be up when I throw it next. What you do know is that it has six sides. How much don't you know about this die? The answer is not six. This is because information (or the lack thereof) is not defined in terms of the number of unknown states. Rather, it is given by the logarithm of the number of unknown states.

"Why on Earth introduce that complication?", you ask.

Well, think of it this way. Let's quantify your uncertainty (that is, how much you don't know) about a system (System One) by the number of states it can be in. Say this is \(N_1\). Imagine that there is another system (System Two), and that it can be in \(N_2\) different states. How many states can the joint system (System One And Two Combined) be in? Well, for each state of System One, System Two can be in any of its \(N_2\) states. So the total number of states of the joint system must be \(N_1\times N_2\). But our uncertainty about the joint system is not \(N_1\times N_2\). Our uncertainty adds, it does not multiply. And fortunately, the logarithm is that one function where the log of a product of elements is the sum of the logs of the elements. So the uncertainty about the joint system with \(N_1\times N_2\) states is the logarithm of the number of states
$$H(N_1N_2)=\log(N_1N_2)=\log(N_1) + \log(N_2).$$
I had to assume here that you knew about the properties of the log function. If this is a problem for you, please consult Wikipedia and continue after you have digested that content.
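
If you would rather see this additivity at work numerically than take the log rules on faith, here is a minimal Python sketch. The pairing of a coin with a die is just my illustration (it is not part of the argument above), and I use base-2 logarithms; we will get to the choice of base, and the units it implies, a little later.

```python
import math

# Uncertainty as the logarithm of the number of states (illustrative sketch).
# System One: a coin with 2 states; System Two: a six-sided die with 6 states.
n1, n2 = 2, 6

h1 = math.log2(n1)            # uncertainty about System One
h2 = math.log2(n2)            # uncertainty about System Two
h_joint = math.log2(n1 * n2)  # uncertainty about the joint system (12 states)

print(h1, h2, h_joint)                 # 1.0 2.584962500721156 3.584962500721156
print(math.isclose(h_joint, h1 + h2))  # True: the uncertainties add
```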

Phew, I'm glad we got this out of the way. But we were talking about a six-sided die. You know, the type you've known all your life. What you don't know about the state of this die (your uncertainty) before throwing it is \(\log 6\). When you peek at the number that came up, you have reduced your uncertainty (about the outcome of this throw) to zero. This is because you made a perfect measurement. (In an imperfect measurement, you might only catch a glimpse of the surface, one that rules out a "1" and a "2", say.)

What if the die wasn't fair? Well that complicates things. Let us for the sake of the argument assume that the die is so unfair that one of the six sides (say, the "six") can never be up. You might argue that the a priori uncertainty of the die (the uncertainty before measurement) should now be \(\log 5\), because only five of the states can be the outcome of the measurement. But how are you supposed to know this? You were not told that the die is unfair in this manner, so as far as you are concerned, your uncertainty is still \(\log 6\). 

Absurd, you say? You say that the entropy of the die is whatever it is, and does not depend on the state of the observer? Well I'm here to say that if you think that, then you are mistaken. Physical objects do not have an intrinsic uncertainty. I can easily convince you of that. You say the fair die has an entropy of \(\log 6\)? Let's look at an even simpler object: the fair coin. Its entropy is \(\log 2\), right? What if I told you that I'm playing a somewhat different game, one where I'm not just counting whether the coin comes up heads or tails, but am also counting the angle that the face has made with a line that points towards True North. And in my game, I allow four different quadrants, like so:


Suddenly, the coin has \(2\times4\) possible states, just because I told you that in my game the angle that the face makes with respect to a circle divided into 4 quadrants is interesting to me. It's the same coin, but I decided to measure something that is actually measurable (because the coin's faces can be in different orientations, as opposed to, say, a coin with a plain face but two differently colored sides). And you immediately realize that I could have divided the circle into as many sectors as I can possibly resolve by eye.

Alright fine, you say, so the entropy is \(\log(2\times N)\), where \(N\) is the number of resolvable angles. But you know, what is resolvable really depends on the measurement device you are going to use. If you use a microscope instead of your eyes, you could probably resolve many more states. Actually, let's follow this train of thought. Let's imagine I have a very sensitive thermometer that can sense the temperature of the coin. When the coin is thrown high, the energy it absorbs when hitting the surface will raise its temperature slightly, compared to one that was tossed gently. If I so choose, I could include this temperature as another characteristic, and now the entropy is \(\log(2\times N\times M)\), where \(M\) is the number of different temperatures that can be reliably measured by the device. And you can see that I could drive this to the absurd, by deciding to consider the excitation states of the molecules that compose the coin, or of the atoms composing the molecules, or the nuclei, the nucleons, the quarks and gluons.
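
To make this concrete, here is a small Python sketch of the entropy you would assign to the very same coin under different "games". The function name and the particular resolutions (4 quadrants, 360 angles, 100 temperatures) are illustrative choices of mine, not anything prescribed by the argument:

```python
import math

def coin_entropy_bits(n_angles=1, n_temperatures=1):
    """log2 of the number of distinguishable states: 2 faces x angles x temperatures."""
    return math.log2(2 * n_angles * n_temperatures)

print(coin_entropy_bits())                                  # 1.0   (heads or tails only)
print(coin_entropy_bits(n_angles=4))                        # 3.0   (the 4-quadrant game)
print(coin_entropy_bits(n_angles=360))                      # ~9.49 (angles to the nearest degree)
print(coin_entropy_bits(n_angles=360, n_temperatures=100))  # ~16.14 (add a thermometer)
```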

The entropy of a physical object, it dawns on you, is not defined unless you tell me which degrees of freedom are important to you. In other words, it is defined by the number of states that can be resolved by the measurement that you are going to be using to determine the state of the physical object. If it is heads or tails that counts for you, then \(\log 2\) is your uncertainty. If you play the "4-quadrant" game, the entropy of the coin is \(\log 8\), and so on. Which brings us back to the six-sided die that has been mysteriously manipulated to never land on "six". You (who do not know about this mischievous machination) expect six possible states, so this dictates your uncertainty. Incidentally, how do you even know the die has six sides it can land on? You know this from experience with dice, and from having looked at the die you are about to throw. This knowledge allowed you to quantify your a priori uncertainty in the first place.

Now, you start throwing this weighted die, and after twenty throws or so without a "six" turning up, you start to become suspicious. You write down the results of a longer set of trials, and note this curious pattern of "six" never showing up, while the other five outcomes appear with roughly equal frequency. What happens now is that you adjust your expectation. You now hypothesize that it is a weighted die with five equally likely outcomes, and one that never occurs. Now your expected uncertainty is \(\log 5\). (Of course, you can't be 100% sure.)

But you did learn something through all these measurements. You gained information. How much? Easy! It's your uncertainty before you became suspicious, minus your uncertainty after it dawned on you. The information you gained is just \(\log 6-\log 5\). How much is that? Well, you can calculate it yourself. You didn't give me the base of the logarithm, you say?

Well, that's true. Without specifying the logarithm's base, the information gained is not specified. But it does not matter which base you choose: each base just gives units to your information gain. It's kind of like asking how much you weigh. Well, my weight is one thing. The number I give you depends on whether you want to know it in kilograms, or pounds. Or stones, for that matter.

If you choose the base of the logarithm to be 2, well then your units will be called "bits" (which is what we all use in information theory land). But you may choose Euler's number \(e\) as your base. That makes your logarithms "natural", but your units of information (or entropy, for that matter) will be called "nats". You can define other units (and we may get to that), but we'll keep it at that for the moment.

So, if you choose base 2 (bits), your information gain is \(\log_2(6/5)\approx 0.263\) bits. That may not sound like much, but in a Vegas-type setting this gain of information might be worth, well, a lot. Information that you have (and those you play with do not) can be moderately valuable (for example, in a stock market setting), or it could mean the difference between life and death (in a predator/prey setting). In any case, we should value information.  
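
If you want to check this arithmetic yourself (or see the same gain expressed in nats rather than bits), a couple of lines of Python will do; this is just the calculation above spelled out:

```python
import math

# Information gained when the expected number of equally likely outcomes
# drops from 6 to 5, in bits (base 2) and in nats (base e).
gain_bits = math.log2(6) - math.log2(5)   # = log2(6/5)
gain_nats = math.log(6) - math.log(5)     # = ln(6/5)

print(round(gain_bits, 3))   # 0.263
print(round(gain_nats, 3))   # 0.182
```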

As an aside, this little example, where we used a series of experiments to "inform" us that one of the six sides of the die will not, in all likelihood, ever show up, should have convinced you that we can never know the actual uncertainty that we have about any physical object, unless the statistics of the possible measurement outcomes of that physical object are for some reason known with infinite precision (which you cannot attain in a finite lifetime). It is for that reason that I suggest the reader give up thinking about the uncertainty of any physical object, and be concerned only with differences between uncertainties (before and after a measurement, for example).

The uncertainties themselves we call entropy. Differences between entropies (for example before and after a measurement) are called information. Information, you see, is real. Entropy on the other hand: in the eye of the beholder.

In this series on the nature of information, I expect the next posts to feature the more conventional definitions of entropy and information (meaning, those that Claude Shannon introduced), with some examples from physics and biology, before moving on to communication and the concept of channel capacity.

Part 2: The Things We Know


Sunday, April 7, 2013

The evolution of the circle of empathy

What is the circle of empathy? Empathy, as we all know, is the capacity to feel (or at least recognize) emotions in other entities that have emotions. Many people believe that this capacity is in fact shared by many types of animals. The "circle of empathy" is a boundary within which each individual places the things he or she empathizes with. Usually, this only includes people and possibly certain animals, but is unlikely to include inanimate objects, and very rarely plants or microbes. This circle is intensely personal, however. (Psychopaths, for example, seem to have no circle of empathy whatsoever.) Incidentally, I thought I had invented the term, but it turns out that Jaron Lanier has used it before me in a similar fashion, as has the bioethicist Peter Singer. What I would like to discuss here is the evolution of our circle of empathy over time, what this trend says about us, and where this might lead us in the long run.

When we go way, way back in time, life was different. There wasn't what we now call "society", or even "civilization". There were people, animals, and plants. And there was the sun rising predictably in the morning, and setting in the evening just as expected. But everything else was less predictable. Life was "fraught with perils" (as a lazy writer would write). Life was uncertain. What is the best mode of survival in this world?

"Trust no-one", the X-files may exhort you, but in truth, you've got to trust somebody. The life (and survival) of the Lone Ranger is not predicated on loneliness; he too must rely on the kindness of strangers and companions.  Life is more predictable when you can trust. But who do you trust, then? Of course, you trust family first: this is the primal empathic circle: you feel for your family, and expect they feel for you. Emotions are almost sure-fire guarantors of behavior. From this point of view, emotions protect, and make life a little more predictable. 

As we evolve, we learn that expanding the circle of empathy is beneficial. When it comes to protecting the family, as well as the things we have gained, it is beneficial to gang up with those that have an equal amount to lose. "Let us forge a band of brothers that is not strictly limited to brothers and sisters; we who defend the same stake, let us stand as one against those who strive to tear us down."

Thus, through ongoing conflicts, new bonds are forged. We may not be related in the familial manner, but we are alike, and our costs and benefits are aligned. The circle of empathy has widened.

Time, relentlessly, goes on. And the circle of empathy inevitably widens (on average). Yes, don't get me wrong, I fully understand that human history is nothing but a wildly careening battle between the forces that compel us to love our fellow man, and the urge to destroy those who are perceived to interfere with our plans of advancement. Throughout history, the circle of empathy may widen for a while, then contract. People perceived to be different (often, in fact, perceived as inferior) may be admitted to the circle by some (sometimes even most), but just as often dismissed. Yet, over time, the circle appears to inexorably widen.

There is no doubt about this trend, really. From the family, the circle expanded to encompass the clans that were probably closely related. From those, the circle expanded to cities, city-states, and finally countries. At this point it was just a matter of time until humans expressed their empathy with respect to all humankind. "We are all one", the idealist would invariably exclaim (mindful that not everyone on Earth has evolved to be quite as magnificent, or magnanimous). Our many differences aside, the widening of the circle of empathy is palpable. The tragedy of September 11, 2001, for example, was genuinely felt to be a tragedy by the majority of people on the globe.

It is also clear that the evolution of the circle's radius proceeds by a widening in a few individuals first, who then spend a good portion of their lives convincing their fellow humans that they ought to widen their circles just as much. Civil rights struggles and equal rights campaigns can be subsumed this way. Anti-abortion crusaders would like everyone to include the unborn fetus into their circle of empathy. Many vegetarians have chosen not to eat meat for the simple reason that they have included all animals within their circle of empathy.

Given that the widening of the circle is, on average, driven by a few pioneers who widened theirs ahead of everyone else, how far should we expect to widen our own circles? For example, I am not a vegetarian. I do empathize with animals, but like most people I know, my empathy has its limits. I generally do not kill animals, but when insects find their way into my house I consider that a territorial transgression. Given the nervous system of most insects, it is unlikely that they perceive pain in any manner comparable to how we perceive it.

And this is probably where the line of empathy will be drawn by the majority of people at some point in the future: if animals can perceive pain just as we do, then we are likely to include them in our circle. The more cognitively complex they are, the more likely we are to include them.

The trouble is, the cognitive complexity of animals isn't easily accessible to us. We empathize with the great apes (the group of primates that, besides the gorilla, chimpanzee, and orangutan, also includes us) in part because they are so similar to us. But cetaceans (the group of animals that includes whales and dolphins) have at least as complex a cognitive system as the great apes, yet appear on far fewer people's radar.


Bottlenose dolphin. (Source: Wikimedia)

The neuroscientist Lori Marino, for example (who together with Diana Reiss first published evidence that bottlenose dolphins can recognize themselves in a mirror), has been pushing for the ethical treatment of cetaceans (and therefore for a widening of our circle of empathy to include cetaceans) using scientific arguments based on both behavioral and neuroanatomical evidence. She (as well as people like the lawyer Steven Wise) has been pushing for "non-human legal rights" for certain groups of animals, thus enshrining the widened circle into law. From this point of view, the recent analysis of the methods used by Japanese dolphin hunters to round up and kill dolphins is another stark reminder of how different the radius of the circle can be among fellow humans (and how culture and ethnic heritage affect it).

All this leads me back to a thought I have touched upon in a previous post: if higher cognitive capacities are associated with things we call "consciousness" and "self-awareness", perhaps we need to be able to better capture them mathematically, and therefore ultimately make them measurable. If we were to achieve this, then we may end up with a scale that gives us clear guidelines on what the radius of our circle of empathy should be, rather than waiting for more enlightened people to show us the path.

It is unlikely that this circle will encompass plants, microbes, or even insects. But there are surely animals out there who, on account of their not being able to talk to us, have been suffering tremendously. Looking at this from the vantage point of our future, more enlightened selves, we should really figure out how to draw the line, somehow, sooner rather than later. I don't know where that line is, but I'm pretty sure that my line will evolve in time, and yours will too.