
Information (& Entropy)

Entropy and Information – Can’t “Talk” About Complexity Without Them!

Entropy is a key concept in the foundational subdiscipline of thermodynamics. Information is its Janus-like counterpart.  Ironically, information is more difficult for many people to understand because it is a term with several meanings, ranging from the colloquial, as in, “I got a lot of information from the news last night,” to the highly technical, as in, “information is preserved on the event horizon of a black hole.”  If we parse the different meanings of “information” carefully and thoughtfully, we should come to better grips with what the different types of information refer to.  First, let’s look at the opposite side of the coin: “entropy.”

Entropy – “The Devil?”

A common refrain in science education is that an understanding of “entropy” is, for science, the equivalent of familiarity with the works of Shakespeare for a Western liberal arts education.  The reason for entropy’s central importance is that no process (nuclear fusion, photosynthesis, metabolism, oxidation, even thinking itself) can proceed without an accompanying increase in entropy, as stated by the second law of thermodynamics: “The entropy of the universe increases in the course of any spontaneous change.” (1)  The second law has myriad other articulations because its consequences make themselves known in a variety of ways depending upon your focus.  For example, a chemist will be interested in knowing whether a chemical reaction will occur spontaneously (i.e., occur without a net input of energy). If the reaction results in an increase in the entropy of the universe, then the answer is “yes.”  If it does not, then work energy must be provided in some manner for the reaction to occur – and the compensating increase in entropy occurs wherever that work energy originates.  A mechanical engineer, on the other hand, might be more interested in the second law’s assertion that some of the energy involved in any process will not be available to do work but will be irretrievably lost to increased entropy – typically in the form of heat. She will then try to design a machine that maximizes the amount of energy available for work while minimizing what is lost to heat, so that the machine is more efficient.
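To make the chemist’s criterion concrete, here is a minimal sketch in Python. The numbers are illustrative approximations for water freezing and are my addition, not values from the text; the only point is the logic that a change proceeds spontaneously when the entropy of the universe (system plus surroundings) increases.

```python
# A minimal sketch of the second law as a spontaneity test, assuming a process
# at constant temperature and pressure. The water-freezing figures below are
# approximate textbook values, used purely for illustration.

def total_entropy_change(dS_system, dH_system, T):
    """Return dS_universe (J/K) for a process at constant temperature T (kelvin).

    dS_system : entropy change of the system, J/K
    dH_system : enthalpy change of the system, J (heat released flows to the surroundings)
    """
    dS_surroundings = -dH_system / T
    return dS_system + dS_surroundings

# Freezing one mole of water: dS_system ~ -22 J/K, dH_system ~ -6010 J (exothermic).
for T in (263.15, 283.15):  # -10 C and +10 C
    dS_univ = total_entropy_change(-22.0, -6010.0, T)
    verdict = "spontaneous" if dS_univ > 0 else "not spontaneous"
    print(f"T = {T:.2f} K: dS_universe = {dS_univ:+.2f} J/K -> {verdict}")
```

Run as written, the sketch reports that freezing is spontaneous at -10 °C but not at +10 °C: only below the freezing point does the heat released to the surroundings raise their entropy by more than the water’s own entropy falls.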

So far, I have stated that for any process to occur there must be an increase in entropy, but I have not told you what entropy is. Commonly, entropy is described as the inexorable tendency for things to become more disordered. One of the classic examples is that a broken egg has a higher degree of entropy than an intact egg, i.e., it is more disordered. To be more rigorous for our purposes, however, a more accurate definition is this: “entropy” is proportional to the logarithm (log) of the number of possible microstates that constitute a system’s particular macrostate, as described by the equation S = k log W, where “S” is entropy, “k” is Boltzmann’s constant in units of joules per kelvin, and “W” is the number of microstates for a system’s macrostate. (2)  This definition and equation, first set down in the late 1800s by the Austrian physicist Ludwig Boltzmann (1844-1906) and elaborated further by the American physicist Willard Gibbs (1839-1903), might sound abstruse. Whoa! I thought that you were going to make things easy to understand.

 

Fair enough.  Let's use a hypothetical teen’s bedroom to explain what entropy means:

As a metaphor, we will say that a teen’s bedroom represents a system.  Its overall condition, in turn, represents its “macrostate.”  The furniture, apparel, garbage, and other articles metaphorically represent its “microscopic” constituents, and one particular arrangement of these articles represents one very particular condition or “microstate.”  When the bedroom is in a macrostate of “tidiness,” its articles are all placed where they “should be,” including the dressers, bed, desk, lamp, apparel, and any garbage placed in the trash can.  Importantly, there are many possible microstates in which the room’s macrostate is “tidy,” because the various articles can be moved around or rearranged to a certain degree and still not be a mess. For example, the room is still tidy if the socks are placed neatly together but in a different drawer, the bed has been moved a couple of centimeters, the garbage is in the trash but in a different arrangement, and so on.  In the end, there are numerous possible microstates in which the room has a macrostate of “tidiness” (our “W” variable in the equation).  However, the number of possible microstates in which the articles are disordered, i.e., the room is “messy,” is much, much larger. (“W” increases, and so does entropy, represented as “S” in the equation.)  In fact, it is more likely that the universe will expire trillions of years in the future before a messy room becomes tidy again just by chance!   Because a tidy room has comparatively few microstates, it can also be described as being in a low-entropy state, whereas a messy room is in a high-entropy state.  A room unfortunately destroyed by a tornado would be in a state of maximal entropy.  (Note: the “k” in the equation is just a constant that supplies the units of measure and serves some technical purposes that are not important for our discussion.)
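A short numerical sketch can make the tidy-versus-messy comparison concrete. The microstate counts below are invented purely for illustration (no one has counted bedroom arrangements); the takeaway is how S = k log W compresses astronomically different values of W into modest differences in entropy.

```python
import math

k = 1.380649e-23  # Boltzmann's constant, joules per kelvin

def boltzmann_entropy(W):
    """Entropy S = k log W (natural logarithm) for a macrostate with W microstates."""
    return k * math.log(W)

# Hypothetical, made-up microstate counts for the bedroom metaphor.
rooms = {
    "tidy":    1e20,   # relatively few ways for the room to be tidy
    "messy":   1e60,   # vastly more ways for it to be messy
    "tornado": 1e100,  # near-maximal disorder
}

for label, W in rooms.items():
    print(f"{label:>8}: W = {W:.0e}  ->  S = {boltzmann_entropy(W):.2e} J/K")
```

Even though the messy room in this made-up example has 10^40 times as many microstates as the tidy room, its entropy is only three times larger, because the logarithm tames enormous counts.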

An important visual-conceptual tool that will be relevant for our ongoing discussion is the idea of “phase space.”  A phase space is an abstract area whose size is proportional to the number of possible microstates that constitute a system’s particular macrostate.  Figure 1 represents the hypothetical phase space of a tidy room versus a messy room, where each point in a phase space represents one very particular arrangement of the room’s articles.  Note: if drawn to scale, the phase space of a messy room would be far too large, relative to that of a tidy room, to fit on a sheet of paper.  The phase space of a room destroyed by a tornado would be many orders of magnitude larger still.

Figure 1. The abstract phase space of a “tidy room” (low entropy) is much smaller than the phase space of a messy room (higher entropy). One particular arrangement of the room’s articles is represented by one point within its respective phase space. Note: the phase spaces are not drawn to scale. The phase space of a messy room would be much, much larger in comparison to the phase space of a tidy room.

Of course, the early physicists who pondered the nature of entropy did not think in terms of tidy versus messy bedrooms. In a more typical physics example, a “system” might be a one-liter container of air whose macrostate is described by a pressure of one atmosphere and a temperature of 20 degrees Celsius. That macrostate is in turn physically determined by a particular range of locations, densities, and velocities of the container’s microscopic air molecules.

Norbert Wiener, the 20th-century mathematician of “cybernetics” fame, was perhaps the first to see that entropy is a correlate of the “bad” and that “information” is a correlate of the “good,” when he remarked in his 1954 book, “The Human Use of Human Beings,” that:

The scientist is always working to discover the order and organization of the universe, and is thus playing a game against the arch enemy, disorganization. Is this devil Manichaean or Augustinian? (3)

 

We will look at how Wiener arrived at the conclusion that a basic thermodynamic property like entropy can be linked to ethical values in the section on “Complex-Information” Ethics theory. Of course, Wiener also states that “information,” or order, is the obverse of entropy, which implies that it is a force for the good. How is it that information and order are equivalent? Before we delve into that conclusion, let us look briefly at the nature of information itself.

 

Information about Information

To make the claim that information is the equivalent of order and forms the basis of the “good,” we need first to look carefully at the fundamental nature of information and its major variants.  (I offered a deeper analysis of the nature of information and how it has morphed or “unfolded” over the course of the history of the universe in the Journal of Big History at:  https://jbh.journals.villanova.edu/index.php/JBH/article/view/2254/2099 ). Terrence Deacon, a neuroanthropologist at the University of California, Berkeley, importantly pointed out that much of the confusion regarding the nature of information arises because: “This term is used to talk about a number of different kinds of relationship, and often interchangeably without discerning between them.” (4)  I concur with his assessment, as well as with his way of parsing the main types of information used in everyday discourse: 1. syntactical, 2. semantic, and 3. pragmatic (a.k.a. “surprise” or “novel”).

Syntactical Information

The terms “syntax” and “syntactical” are used most often in the context of grammar, where they refer to how words are ordered in a language. The dictionary definition of “syntax,” however, is not restricted to language; it refers to how things in general are ordered. Similarly, syntactical information is concerned with how constituents are ordered or, more generally stated, how they stand in relationship to each other.  Syntactical information also forms the basis for the other types of information described later.  Not everyone who contemplates the underlying nature of information concurs with Norbert Wiener that information is fundamentally a measure of order or, more broadly, of relationships. He and I are in good company, however. That company includes Benjamin Schumacher, a physicist, quantum-information authority, and protégé of the late famous physicist John Wheeler (1911-2008); Luciano Floridi, a leading professor of “Information Philosophy” at Oxford University; and Terrence Deacon, the neuroanthropologist noted above. (5,6,7)

A simple, hypothetical example can illustrate why relationships between things are synonymous with information: Imagine that you want to inform someone about the location of particle “A” in otherwise empty spacetime. Unless that location is given relative to something else, you cannot give them that information. You must give its physical location XA, YA, ZA in relation to something else, such as the center or some boundary of that space, its position relative to particle “B,” or something mundane like: 3 blocks west, 2 blocks south, and 5 stories above the street level of the Chrysler Building in New York City. In fact, without particle “A” being in relation to something else, you cannot even determine whether it is moving or not – it moves only in relation to something else. Similarly, to inform someone about a moment in time, you need to state it in relation to another event in time, such as when Rome was legendarily founded, when Jesus Christ was believed to have been born, or perhaps when the sun reaches its zenith in the sky on the longest day of the year.

 

The attributes of syntactical information became more deeply understood after the 1948 publication of a seminal paper by the mathematician-engineer Claude Shannon (1916-2001), “A Mathematical Theory of Communication.” (8)  Working on a task assigned to him by his employer, Bell Labs, Shannon quickly understood that an engineer only needed to worry about the syntactical information (i.e., the ordered signals comprising a message) that was transmitted, and not the meaning of the information. His paper is widely considered by scientists, engineers, and science historians to be one of the most important of the 20th century because it characterized various aspects of syntactical information that were instrumental in ushering in the “information age.”*  His information theory also introduced the use of “binary digits” (1’s and 0’s) for transmitting and processing information, the concept of “signal-to-noise ratios,” the utility of mathematical Boolean logic for computer operations, and many other insights that resonate deeply in the information technologies of today.

 

*Side note: the “transistor” was also first developed and demonstrated at Bell Labs, in December 1947, by physicists John Bardeen (1908-1991), William Shockley (1910-1989), and Walter Brattain (1902-1987). Hence, the “digital information age” arguably began at the very end of 1947 and in 1948 at Bell Labs in Murray Hill, New Jersey. (9) Few, if any, significant eras’ origins can be pinpointed in location and time so precisely!

 

Most importantly for our purposes, Shannon also determined that syntactical information can be mathematically measured in units that he called bits – a contraction of “binary digit.” The amount of syntactical information in a message is determined by the formula H = -k log2 p, where “H” is the amount of information in bits, “k” is a constant, and “p” is the probability of the message received; when there are M equally likely possible messages, p = 1/M, and the formula reduces to H = k log2 M. (Note: in this formula, each message has an equal probability of occurring.*  Measuring information where messages have varying probabilities requires a slightly more complicated formula, but that does not change the following discussion in any substantial way.)

This mathematical equation also unveiled syntactical information’s relationship to thermodynamic entropy, whose formula, S = k log W, was noted earlier. As you can see, the two equations have the same form except for the negative sign. This difference importantly reveals that information is mathematically, as well as conceptually, the antithesis of entropy.  The ramifications of the Janus-like nature of entropy and information have been borne out in other ways and fields as well – too many, in fact, for me to recount on this website.

*Note: This formula measures the information in bits when there is an equal probability of any number of messages occurring (like the throw of a die). In the paper I wrote for the Journal of Big History, referred to earlier, I erred in stating that the formula worked only for two equally likely messages.
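As a quick check of the formula just given, here is a minimal sketch in Python under the equal-probability assumption; the function name and the example message counts are mine, chosen only for illustration.

```python
import math

def bits_of_information(num_messages):
    """Information, in bits, gained when one of `num_messages` equally likely
    messages is received: H = -log2(p) with p = 1/M, i.e., H = log2(M)."""
    p = 1.0 / num_messages      # probability of each possible message
    return -math.log2(p)

for M in (2, 6, 26, 1_000_000):
    print(f"{M:>9} equally likely messages -> {bits_of_information(M):6.2f} bits")
```

A coin flip (M = 2) carries 1 bit, the throw of a die about 2.58 bits, a random letter of the alphabet about 4.70 bits, and so on; each doubling of the number of possible messages adds exactly one more bit.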

Semantic Information

Semantic information is usually defined as information that has meaning or purpose to an agent.  Semantic information, therefore, is syntactical information that depends on apprehension, processing, and interpretation by an agent for its realization.  It is typically difficult to quantify semantic information mathematically, except for the subtype called informational “surprise” that will be discussed in the next section. Another, semi-quantitative, exception is when we attach qualifiers like “incredible,” “deadly,” “life-sustaining,” or other adjectives that convey the relative importance of a message.  Nevertheless, there is not currently, and perhaps never will be, a mathematical equation to determine the “degree” of semantic information present. For example, the novel Moby Dick might quantitatively and qualitatively contain a much greater amount of semantic information than the back of a cereal box, but we are not able to attach a derived number of bits of semantic informational content to either.

Although semantic information has meaning or purpose for an agent, it need not rise to the level of awareness for an organism. A simple life form like a bacterium might chemically sense nutrients in one direction and a noxious substance in another. Through a series of complicated but hypothetically traceable chemical reactions that end in the movement of its flagella or cilia, it would then move toward the nutrient (meaning = sustenance) and away from the noxious substance (meaning = danger). Even for higher organisms with advanced brains capable of awareness, a great amount of information has semantic content of which the agent is not aware, or of which it could be aware but ignores.  In the former situation, the usually subconscious act of breathing alone is driven by information with semantic content, whether that be the blood’s pH, carbon dioxide level, or oxygen level, to name a few important determinants of our respiration.  In the case of respiration, a higher organism can even decide whether to pay attention to these “drivers” of respiration for any variety of reasons.
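As a toy illustration of how semantic content can be acted upon without any awareness, the hypothetical sketch below reduces the bacterium to a single rule that maps sensed concentrations to a movement decision. The sensor values, the scoring rule, and the one-dimensional setup are all invented for illustration; real chemotaxis is biochemically far richer.

```python
def choose_direction(nutrient_left, nutrient_right, toxin_left, toxin_right):
    """A made-up decision rule for a one-dimensional 'bacterium': move toward the
    side whose sensed (nutrient minus toxin) signal is larger. The chemical signals
    carry semantic content (nutrient = sustenance, toxin = danger) even though
    nothing in this rule is 'aware' of anything."""
    left_score = nutrient_left - toxin_left
    right_score = nutrient_right - toxin_right
    if right_score > left_score:
        return "move right"
    if left_score > right_score:
        return "move left"
    return "tumble"  # no useful gradient: reorient at random

# Nutrient rising to the right, toxin rising to the left (illustrative numbers).
print(choose_direction(nutrient_left=0.2, nutrient_right=0.8,
                       toxin_left=0.5, toxin_right=0.1))   # -> move right
```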

For organisms with nervous systems, and especially those with advanced central nervous systems (i.e., brains) like dolphins or humans, how syntactical information is processed so that we become aware of its meaning, experience its various subjective qualities, can use it for planning, and so on, remains a daunting and, for the foreseeable future, impregnable challenge to understand at the level of physics, chemistry, and the biological sciences. The philosopher David Chalmers labels this “the hard problem,” because science does not have the tools, methods, or even a hypothetical basis on which it can explain how matter/energy can manifest these and other higher mental phenomena like consciousness, abstract thinking, and, I think he would include, moral decision making. (10)  For problems like the apprehension of semantic information and consciousness, known physics can only provide us with some boundaries or necessary conditions; it is insufficient to fully explain how these phenomena become manifest. In other words, although no living process has thus far been demonstrated to conflict with physics, we are still far short of explaining higher living processes at the fundamental levels of the relevant sciences.

Surprise or Novel Information

Claude Shannon, the founder of information theory, believed that it was unlikely that there would ever be a consensus on the real meaning of “information.”  In that regard, he was technically correct. There is still no universal agreement on what information ultimately is, although I and others assert that it is fundamentally the relationships between things, or relata, as explained above.  Shannon’s own stated view of information’s character was that it is “that which reduces uncertainty,” and he called this reduction the “surprise” of information.*  Deacon prefers to call this kind of information “pragmatic,” whereas I prefer the adjective “novel,” because at times this kind of information is neither a surprise nor pragmatic but is simply new.  For example, if you had estimated that a container held 360,000 grains of sand but after careful counting determined that there are 360,781 grains, the information is neither pragmatic nor a surprise, just new or novel.   Novel information is an important subset of semantic information because its meaning or purpose is to provide an agent with new relata about the world.  Importantly, novel information can often be quantified, as noted below, which is not typical for the broader category of semantic information of which it is a small part.

 

To illustrate how a message can have a novelty (or surprise) that is both demonstrable and measurable, we can use the American historical example of how Paul Revere and other riders learned how the British troops were going to travel to Lexington: one lantern was to be lit in the North Church tower if they were traveling by land, two if by sea. When the riders saw two lanterns lit, their uncertainty about how the troops were going to travel was cut in half – the same as learning which side of a coin lands up.  Of course, many examples are more complicated than this simple one and require a bit more math to determine the message’s surprise. The simplest version of the surprise of a message is s(x) = log2 [1/p(x)], where s(x) is the surprise of the message as measured in bits, and p(x) is the probability of the message.  In the North Church tower case, s(x) = log2 (1/½) = log2 2 = 1 bit.  Hence, Paul Revere and his fellow riders gained 1 bit of information when they saw the two lanterns in the tower or, expressed in another manner, had their uncertainty reduced by 1 bit.  The lower the probability of a particular message, the greater its surprise when it finally arrives.  For example, burglar alarms are quiet the vast majority of the time. If an alarm sounded for one minute only once every 10 years, then the surprise of it going off would be s(x) = log2 [1/(1/5,256,000)] = log2 5,256,000 ≈ 22.3 bits.  The number of bits might seem small given the intuitively large amount of surprise that would occur if the alarm sounded, but logarithms have a way of making even the largest of numbers more “manageable.”
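The two worked examples above can be verified with a few lines of Python. The minute count assumes a 365-day year, matching the arithmetic in the text; the helper function’s name is mine.

```python
import math

def surprise_bits(probability):
    """Surprise (self-information) of a message: s(x) = log2(1 / p(x))."""
    return math.log2(1.0 / probability)

# Two lanterns in the North Church tower: one of two equally likely messages.
print(f"Lanterns: {surprise_bits(1/2):.1f} bit")                      # 1.0 bit

# An alarm that sounds for one minute once every ten years
# (10 years x 365 days x 24 hours x 60 minutes = 5,256,000 minutes).
minutes_in_ten_years = 10 * 365 * 24 * 60
print(f"Alarm: {surprise_bits(1 / minutes_in_ten_years):.1f} bits")   # ~22.3 bits
```

Halving the probability of a message always adds exactly one bit of surprise, which is why even a once-in-ten-years event amounts to only about 22 bits.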

*Known relationships can be viewed as an arrangement that reduces uncertainty about how things stand relative to other things; this is simply another result of knowing more about a system – the more of its relationships you know, the less the uncertainty. Obversely, increased entropy results in a diminution of relationships and an increase in uncertainty about them.

  1. Atkins P (2010), The Laws of Thermodynamics – A Short Introduction, Oxford University Press, p49

  2. Atkins P (2010), p54

  3. Wiener, N (1954), “The Human Use of Human Beings – Cybernetics and Society,” Da Capo Press, p34

  4. Deacon, T. (2011), “What Is Missing from Theories of Information?” in Information and the Nature of Reality, Cambridge University Press, Cambridge, England, p152.

  5. Schumacher, B. (2015) “The Science of Information,” The Great Courses Lecture Transcript, p186

  6. Floridi, L., “Semantic Conceptions of Information,” The Stanford Encyclopedia of Philosophy website, https://plato.stanford.edu/entries/information-semantic/ , accessed 1/8/21.

  7. Deacon, T. (2011), “What Is Missing from Theories of Information?” in Information and the Nature of Reality, Cambridge University Press, Cambridge, England, p159.

  8. Shannon C (1948), “A Mathematical Theory of Communication,” The Bell System Technical Journal, American Telephone and Telegraph, http://www.essrl.wustl.edu/~jao/itrg/shannon.pdf , accessed 1/20/21.

  9. https://en.wikipedia.org/wiki/Bell_Labs, accessed 1/20/21.

  10. Chalmers, David (1995), “Facing Up to the Problem of Consciousness,” Journal of Consciousness Studies, 2 (3): 200–219.
