Science

A look at the role of science in society, and our beliefs.

Meet Wes McKinney, the man behind the most important tool in data science - Quartz

Perhaps more than any other person, McKinney has helped fix that problem. McKinney is the developer of “Pandas”, one of the main tools used by data analysts working in the popular programming language Python.

Millions of people around the world use Pandas. In October 2017 alone, Stack Overflow, a website for programmers, recorded 5 million visits to questions about Pandas from more than 1 million unique visitors. Data scientists at Google, Facebook, JP Morgan, and virtually every other major company that analyzes data use Pandas. Most people haven’t heard of it, but for many people who do heavy data analysis, a rapidly growing group these days, life wouldn’t be the same without it. (Pandas is open source, so it’s free to use.)

So what does Pandas do that is so valuable? I asked McKinney how he explains it to non-programmer friends. “I tell them that it enables people to analyze and work with data who are not expert computer scientists,” he says. “You still have to write code, but it’s making the code intuitive and accessible. It helps people move beyond just using Excel for data analysis.”

Basically, Pandas makes it so that data analysis tasks that would have taken 50 complex lines of code in the past now take only 5 simple lines, because McKinney already did the heavy lifting.

Often probability predictions are surprising

In the case of the coin-tossing experiment described in the puzzle, Dr. Theodore P. Hill of the Georgia Institute of Technology wrote in American Scientist that a “quite involved calculation” revealed a surprising probability. It showed, he said, that the overwhelming odds are that in a series of 200 tosses, either heads or tails will come up six or more times in a row.

Most fakers don’t know this and avoid guessing long runs of heads or tails, which they mistakenly assume to be improbable. At just a glance, Dr. Hill can see whether or not a student’s 200 coin-toss results contain a run of six heads or tails. If they don’t, the student is branded a fake.

Read more on http://niquette.com/puzzles/randoms.htm
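
The "quite involved calculation" is not reproduced in the excerpt above, but the claim can be checked with a short sketch. The code below is one possible approach, not Dr. Hill's own method: it assumes a fair coin and independent tosses, and tracks the length of the current run of identical faces one toss at a time. The function name `prob_run_at_least` is made up for this illustration.

```python
def prob_run_at_least(n_tosses: int, run_length: int) -> float:
    """Chance that n fair, independent tosses contain a run of at least
    `run_length` identical faces (heads or tails)."""
    if run_length <= 1:
        return 1.0 if n_tosses >= 1 else 0.0
    if n_tosses < run_length:
        return 0.0

    # alive[r] = probability that no long run has appeared yet and the
    # current run of identical faces has length r (index 0 is unused).
    alive = [0.0] * run_length
    alive[1] = 1.0  # after the first toss the run length is always 1

    for _ in range(n_tosses - 1):
        nxt = [0.0] * run_length
        # The face changes with probability 1/2, restarting the run at 1.
        nxt[1] = 0.5 * sum(alive[1:])
        # The face repeats with probability 1/2, extending the run by one.
        # A run that would reach run_length drops out of the "alive" states.
        for r in range(1, run_length - 1):
            nxt[r + 1] = 0.5 * alive[r]
        alive = nxt

    return 1.0 - sum(alive)


if __name__ == "__main__":
    print(prob_run_at_least(200, 6))
```

Under these assumptions the result for 200 tosses and a run of six comes out well above 90%, which fits the "overwhelming odds" described above.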

Science doesn’t mandate ⚗️

Few things annoy me more than when people use scientific evidence to say that a political policy is required. Science doesn’t give us values. Instead, it gives us probabilistic outcomes and evidence of what is likely to happen should any one course of action be pursued. Science isn’t a moralistic cause. It’s not a church but a way to gather facts to guide policy options.

I often think per-capita statistics are misleading

  1. The bias of averaging. Relatively few people in a community are likely to have an “average” experience of a problem. Often a handful of people contribute to the extremes, and even those closer to the middle are likely to feel a smaller or larger impact than the statistic might suggest. Crime might be really bad in one neighborhood while other neighborhoods are quite safe.
  2. The bias of noise in rural communities. Statistics for any community with a relatively small population are likely to be skewed by random chance and may not reflect typical events on the ground. One random event in a community of 1,000 people is going to look a lot worse on paper than ten random events in a community of 100,000.
  3. The bias of smoothing out the mean in urban communities. Large urban communities often appear on paper to have lower per-capita emissions or acts of violence. That’s because, while there may be some randomness in the data, there are so many more people to smooth it out.
  4. Not recognizing that one community is connected to another. Often a factory, a farm or another rural business produces products for an urban community, so emissions that appear on a chart for one community might actually be attributable to another community through the consumption of those products. For example, you see this with China being the industrial exporter of the world. Likewise, a crash can count toward a rural area’s per-capita statistic even though the motorist was only passing through between cities and had little real connection to the local area.
  5. Total emissions or crimes might be within the local community’s ability to manage them. Pollution is, after all, emissions at levels that cause negative impacts on an environment because existing ecological services can’t absorb them. Likewise, big cities are often better equipped to deal with crime, and while overall crime rates might be higher, the impact on most individuals is lower than we might think due to effective policing.
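
The arithmetic behind points 2 and 3 can be sketched in a few lines. This is only an illustration under a simplifying assumption that is mine, not part of the argument above: each resident has the same small, independent chance of being involved in an event (a binomial model). The function names are made up for the example.

```python
import math


def rate_per_100k(events: int, population: int) -> float:
    """Observed events expressed per 100,000 residents."""
    return events * 100_000 / population


def rate_std_per_100k(true_rate_per_100k: float, population: int) -> float:
    """Standard deviation of the observed per-100,000 rate, assuming each
    resident independently has the same chance of an event (binomial model)."""
    p = true_rate_per_100k / 100_000
    return math.sqrt(p * (1 - p) / population) * 100_000


if __name__ == "__main__":
    # The example from point 2: one event in a town of 1,000 versus
    # ten events in a city of 100,000.
    print(rate_per_100k(1, 1_000))     # 100.0 per 100,000
    print(rate_per_100k(10, 100_000))  # 10.0 per 100,000

    # Same underlying rate, very different amounts of random wobble.
    print(rate_std_per_100k(10, 1_000))
    print(rate_std_per_100k(10, 100_000))
```

With the same underlying rate, the random spread in the small town's figure is ten times that of the city's, since the spread shrinks with the square root of the population. A single chance event is enough to make the small town look ten times worse on a per-capita map.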

I think it’s a mistake to use per-capita statistics to stigmatize or lay blame on a community. Often problems that rise to the level of community concern involve the entire community, not a single area that is easy to point fingers at, especially if it’s not our own. Dividing a total by the number of heads in a population can be misleading and must be done with care. While we all like a colorful map, think about what you are seeing in the map, and the people it represents, before coming to conclusions based on a mathematical model.