US states and words with no letters in common

“Ohio is the only state not to share a letter with the word mackerel,” my buddy tells me.

Of course, I need to find out what other words don’t share any letters with states. So I start hunting for a large English word list. It turns out the Brown corpus, included in NLTK, has ~1 million words. So I whip up a Python script to figure out which state-word combos pass. The code lives here.

Spoiler alert: apparently there are 38,969 of them. At least. That’s just the words from the particular corpus I used. Ohio is the most popular (least popular?) state, clocking in at 1,085. Next closest is Mississippi, at 678. Third is Alabama, with 599. Ohio doesn’t have any A’s or E’s, so that helps.
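The post doesn't show the script itself, so here is a minimal sketch of the check, not the original code. It treats each name and word as a set of lowercase letters and keeps a state when the two sets don't intersect. It assumes words are case-folded, restricted to alphabetic tokens, and counted once each; the original may have handled case and duplicates differently, which would change the totals. The word list is passed in, so you could feed it NLTK's Brown corpus with `nltk.corpus.brown.words()` (after `nltk.download('brown')`).

```python
from collections import Counter

# All 50 US state names.
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
    "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
    "Washington", "West Virginia", "Wisconsin", "Wyoming",
]


def letters(text):
    """Set of lowercase letters in text (spaces and punctuation ignored)."""
    return {c for c in text.lower() if c.isalpha()}


def disjoint_states(word, states=STATES):
    """States whose names share no letters with word."""
    w = letters(word)
    return [s for s in states if not (letters(s) & w)]


def count_by_state(words, states=STATES):
    """For each state, count the distinct words that share no letters with it."""
    counts = Counter()
    # Assumption: case-fold, keep purely alphabetic tokens, count each word once.
    for word in {w.lower() for w in words if w.isalpha()}:
        counts.update(disjoint_states(word, states))
    return counts


if __name__ == "__main__":
    print(disjoint_states("mackerel"))  # ['Ohio']
```

Summing the values of `count_by_state(...)` gives the number of state-word pairs, and `most_common()` on the result ranks the states.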

NBA winning percentages

During a Milwaukee Bucks broadcast on February 11, Jon McGlocklin mentioned something to the effect of “the first team to get to 100 points usually wins.” Intuitively, this makes some sense. If you’re the first team to a given score, you have the lead at that point. 100 points is usually a late-game score, and having the lead near the end of the game makes it more likely you are going to win.

I was curious whether the data backed this up. I collected play-by-play scoring data for the 2013-2014 season and ran some analysis on it. First, the number of times the first team to X points won or lost:
[Figure: point totals, wins and losses for the first team to reach X points]

That’s a little hard to read in the higher-scoring (interesting) portion of the graph. Here are the winning percentages plotted as a function of score:
[Figure: point ratios, winning percentage of the first team to reach X points]

Some notes:

  • There were 1319 games played, including playoffs.
  • Teams that score 100 first win at a .932 clip. Not a bad rule of thumb!
  • On the other hand, being first to any point total never leaves you with a sub-.500 winning percentage. So you could also say “the first team to 1 point usually wins”.
  • The worst is being first to 1 or 2 points, which yields a .547 winning percentage. Even getting to 3 first improves your chances a bit, by a percentage point and a half, all the way up to .562.
  • The highest point total was 145, when the Rockets beat the Lakers on the road on April 8. The game didn’t even go to overtime!
  • There is a curious bump in the ratio at 121. This is an artifact of so few teams scoring that many points and still losing (only 3 to 5 losses in that range), which makes for a noisy signal.
  • No team scoring 128 or more lost.
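Here is a hedged sketch of the tally behind those numbers; it is not the original analysis code, and the play-by-play parsing is left out. It assumes each game has already been reduced to the running (home, away) score after every scoring play, plus a flag for whether the home team won; that input format is hypothetical. It also assumes a basket that jumps past a total (say 98 to 101) counts as reaching 99, 100, and 101 first, which may differ from how the original handled it.

```python
from collections import defaultdict


def first_to_x_records(games):
    """Tally wins and losses for the first team to reach each point total.

    games: iterable of (scores, home_won), where scores is the running
    (home, away) score after each scoring play, in order.
    Returns {x: (wins, losses)} for the team that reached x points first.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for scores, home_won in games:
        first = {}  # point total -> "home" or "away"
        top = 0     # highest total either team has reached so far
        for home, away in scores:
            for team, score in (("home", home), ("away", away)):
                # Only one team scores per play, so any total above `top`
                # that this score passes was reached first by this team.
                for x in range(top + 1, score + 1):
                    first[x] = team
                top = max(top, score)
        for x, team in first.items():
            if (team == "home") == home_won:
                wins[x] += 1
            else:
                losses[x] += 1
    return {x: (wins[x], losses[x]) for x in sorted(set(wins) | set(losses))}


def win_pct(record):
    """Winning percentage from a (wins, losses) pair."""
    wins, losses = record
    return wins / (wins + losses)
```

With a full season loaded, `win_pct(records[100])` would be the first-to-100 figure.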

The Bucks were first to 100 that night, and beat the Kings 111-103 :)

Thanks to basketball-reference.com for the play-by-play data.

Installing OpenCV in a virtualenv

There are a handful of answers strewn across the internet about installing OpenCV’s Python bindings, but none of them seem to apply to installing them in a virtualenv in Linux. In the interest of collecting all that information in one place, here’s what I did to get it running. I’m using:

  • Ubuntu 14.04 64 bit
  • Python 2.7.6
  • OpenCV 2.4.9.0

This will install OpenCV, the Python bindings, and Numpy system-wide, but afterward you will be able to use them inside a virtualenv. This assumes you have pip, virtualenv, and virtualenvwrapper installed and properly configured. If you aren’t familiar with these, Googling yields many resources, for example this tutorial.

First, install OpenCV’s dependencies, per the installation instructions (some of these were already installed on my system):

$ sudo apt-get install build-essential cmake libgtk2.0-dev pkg-config \
python-dev libavcodec-dev libavformat-dev libswscale-dev

The next part had me tripped up for a little bit. OpenCV doesn’t play particularly well with virtualenvs, so numpy needs to be installed on the system Python:

$ sudo pip install numpy

After that, continue building OpenCV per the instructions. Download the source (I’m using version 2.4.9.0, from here) and unzip it in the directory of your choice.

$ unzip opencv-2.4.9.zip
$ cd opencv-2.4.9
$ mkdir build
$ cd build

Configure the makefiles using cmake. There is a flag required for the Python bindings that I couldn’t find in the official documentation, only in StackOverflow questions: BUILD_NEW_PYTHON_SUPPORT. Also note the two trailing periods, which point cmake at the source directory one level up.

$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local \
-D BUILD_NEW_PYTHON_SUPPORT=ON ..

This will output a lot of text, but if you scroll up you should find a section referring to Python. It will refer to the system Python binary. This is fine. We will set up our virtualenv later.

After the build is configured, time to make the project. This will take a few minutes to run.

$ make
$ sudo make install

After the build completes, set up your virtualenv if you haven’t already. Numpy also needs to be installed in the virtualenv.

$ mkvirtualenv opencv
[...]
(opencv) $ pip install numpy

Now that our virtualenv is ready to go, we just need to copy the OpenCV binary into the virtualenv’s site-packages directory. It should be somewhere under the prefix you installed to; mine was at /usr/local/lib/python2.7/dist-packages/cv2.so.

(opencv) $ cd lib
(opencv) $ cp /usr/local/lib/python2.7/dist-packages/cv2.so ~/.virtualenvs/opencv/lib/python2.7/site-packages

You can now delete the opencv-2.4.9 directory if you want. We don’t use the cv2.so that got built there because of the libraries it’s linked against. If you run

$ ldd cv2.so

you’ll get something like

libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007fe78eca2000)
libopencv_core.so.2.4 (0x00007fe78e7f9000)
libopencv_flann.so.2.4 (0x00007fe78e585000)
libopencv_imgproc.so.2.4 (0x00007fe78e096000)
(...)

See the first one, with an absolute path (/usr/lib/…)? The others are linked to copies in the build directory, so once you delete that directory, cv2.so won’t have its full feature set, even though it may still import correctly. Compare the installed copy:

$ ldd /usr/local/lib/python2.7/dist-packages/cv2.so
libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f9f27e63000)
libopencv_core.so.2.4 => /usr/local/lib/libopencv_core.so.2.4 (0x00007f9f279bb000)
libopencv_flann.so.2.4 => /usr/local/lib/libopencv_flann.so.2.4 (0x00007f9f27746000)
libopencv_imgproc.so.2.4 => /usr/local/lib/libopencv_imgproc.so.2.4 (0x00007f9f27257000)
libopencv_highgui.so.2.4 => /usr/local/lib/libopencv_highgui.so.2.4 (0x00007f9f26e1e000)
(...)

Ahh, much better.
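If you'd rather not eyeball ldd output, a small helper can do the comparison. This is a sketch of my own, not part of the OpenCV instructions: it parses the text `ldd` prints and flags libraries that have no absolute path or that resolve inside a directory you plan to delete. It assumes the usual `name => path (0x...)` line format and runs on Python 3.7+ (for `capture_output`), which is separate from the Python 2.7 in the virtualenv.

```python
import re
import subprocess

# Trailing load address that ldd prints, e.g. " (0x00007fe78eca2000)".
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)$")


def parse_ldd(text):
    """Map each library in ldd output to its resolved path (None if unresolved)."""
    deps = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, "(...)" elisions, and the kernel's virtual DSO.
        if not line or line.startswith("(") or line.startswith(("linux-vdso", "linux-gate")):
            continue
        line = _ADDR.sub("", line)
        name, arrow, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if arrow:
            deps[name] = None if target in ("", "not found") else target
        else:
            # No "=>": either an absolute path (e.g. the dynamic loader)
            # or a bare name with no resolved file shown.
            deps[name] = name if name.startswith("/") else None
    return deps


def suspicious(deps, doomed_dir=None):
    """Libraries with no absolute path, or resolved inside doomed_dir."""
    return sorted(
        name for name, path in deps.items()
        if path is None or (doomed_dir is not None and path.startswith(doomed_dir))
    )


def check(so_path, doomed_dir=None):
    """Run ldd on so_path and return the dependencies that look fragile."""
    result = subprocess.run(["ldd", so_path], capture_output=True, text=True, check=True)
    return suspicious(parse_ldd(result.stdout), doomed_dir)
```

For example, `check("/usr/local/lib/python2.7/dist-packages/cv2.so")` should come back empty on a setup like the one above; passing your build directory as the hypothetical `doomed_dir` argument flags anything still pointing into it.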

Everything should be good to go:

(opencv) $ python
>>> import cv2
>>>

If you get

ImportError: No module named cv2

double-check that you set the BUILD_NEW_PYTHON_SUPPORT flag and that numpy is installed for the system Python. If you get the error message

ImportError: numpy.core.multiarray failed to import

you need to install numpy in your virtualenv. From inside the virtualenv,

(opencv) $ pip install numpy

should fix that.

Good luck with your computer vision!

New host

Fox Row was down for a while due to some issues with the old web host. It’s on a new and improved hosting service now, so there shouldn’t be any more problems.

Sweeping the 1-seeds

The bracket is out, and once Virginia was announced as the final 1-seed, I realized Wisconsin may be in a unique position. The Badgers beat both Florida and Virginia during the regular season, and they could still beat Arizona and Wichita State in the tournament. If they do, they will have beaten all four 1-seeds this season. I had to find out whether anyone ever has. There have been 116 1-seeds since the field expanded to 64 teams in 1985. Naturally, a 1-seed can’t pull this off, since it would have to beat itself.

It turns out only 2 teams have had 4 wins against 1-seeds in the same year, and they’re both Arizona squads. In 1997, the Wildcats beat North Carolina twice, Kansas, and Kentucky en route to the national championship. They never played Minnesota, the other 1-seed. In an even crazier 2001 season, they came within a game of doing it: they split 2 regular-season games against Illinois before beating them in the tournament, split their regular-season games against Stanford, and beat Michigan State in the Final Four. The last 1-seed was Duke, who beat them in the National Championship. They played 7 games against the 1-seeds that year!

There have been 7 teams with 3 wins against 1-seeds (what was it with the Wildcats those 4 years?):

  • 1985 Illinois
  • 1985 Georgetown
  • 1986 Duke
  • 1991 Duke
  • 1992 Southern California
  • 1992 Indiana
  • 2000 Arizona

The distribution falls off rapidly after that: there have been 64 teams with 2 wins against 1-seeds and 352 with 1 win.
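Here is a sketch of how counts like these could be tallied; it is not the original script. It assumes each season's games are available as (year, winner, loser) tuples and the 1-seeds as a per-year set; both formats are hypothetical. A game between two 1-seeds counts as a win against a 1-seed for one team and a loss to a 1-seed for the other.

```python
from collections import Counter


def record_vs_one_seeds(games, one_seeds):
    """Count wins and losses against 1-seeds, keyed by (team, year).

    games: iterable of (year, winner, loser)
    one_seeds: dict mapping year -> set of that year's 1-seeds
    """
    wins, losses = Counter(), Counter()
    for year, winner, loser in games:
        seeds = one_seeds.get(year, set())
        if loser in seeds:
            wins[(winner, year)] += 1
        if winner in seeds:
            losses[(loser, year)] += 1
    return wins, losses


def distribution(counts):
    """How many team-seasons have exactly k wins (or losses) against 1-seeds."""
    return Counter(counts.values())


def all_time(counts):
    """Collapse (team, year) counts into career totals per team."""
    totals = Counter()
    for (team, _year), n in counts.items():
        totals[team] += n
    return totals


if __name__ == "__main__":
    # Arizona's 1997 wins over 1-seeds, as described above.
    games_1997 = [
        (1997, "Arizona", "North Carolina"),
        (1997, "Arizona", "North Carolina"),
        (1997, "Arizona", "Kansas"),
        (1997, "Arizona", "Kentucky"),
    ]
    one_seeds = {1997: {"Kansas", "Kentucky", "Minnesota", "North Carolina"}}
    wins, losses = record_vs_one_seeds(games_1997, one_seeds)
    print(wins[("Arizona", 1997)])  # 4
```

`distribution(wins)` gives the 4-win, 3-win, 2-win, and 1-win counts, and `all_time(wins).most_common()` the career leaders.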

The all-time leaders contain no surprises. If we assume that most of the 1-seeds come from historically “power” conferences, being in the same conference provides more opportunity over the course of a season to play against them. Going deep in the tournament doesn’t hurt either.

Team              Wins vs. 1-seeds
Duke              26
Arizona           18
North Carolina    17
Indiana           16
Kansas            16
Maryland          16

On the flip side, Wake Forest has the dubious honor of the most losses to 1-seeds in a single year. In 2002 they also played 7 games against the eventual 1-seeds and went 0-7, losing to Cincinnati and Kansas once each, Maryland twice, and Duke 3 times. There have been 16 teams with 5 losses and 73 with 4. In terms of all-time futility, NC State takes the cake. Since 1985, they have a paltry .114 winning percentage against 1-seeds:

Team              Losses vs. 1-seeds
NC State          62
Virginia          56
Georgia Tech      54
Clemson           53
Michigan          52

So if the Badgers pull it off, they’ll be the first in the 64+ team era to do so.

Stats courtesy Sports-Reference.

Pypeline update

I’ve pushed some changes to the pypeline repository, adding basic stacking functionality. There isn’t any registration; it just takes the median of each channel (R, G, B) for each pixel. It’s currently way slow, but I suspect there is substantial room for improvement there. Wrangling NEF files has proven more difficult than I anticipated, so for now the state of the art in pypeline is JPGs.

My camera is rated down to 32°F, and nightly lows have been around 0°F, so I’m scared to take it out into the elements. On the plus side, the stacking works with regular images too! Any particular pixel just needs to have the “right” value in more than half the shots.

Inputs:
[Images: dsc_0027 through dsc_0031, by Wisco crew, on Flickr]

And the stacked result:
[Image: stacked result (dsc_0027), by Wisco crew, on Flickr]

There is a little ghosting, but it’s quite good considering, I’d say. I’m not sure how to get rid of it entirely. More pictures should quash the error, but with 5 I would have thought the median would wipe out any trace of the marker. Also, a better algorithm should be able to relax the > 50% requirement to a mere plurality. Maybe with some sort of clustering of values? I’m also taking the median of each channel independently; maybe a better way would be to use luminance. In any case, baby steps!
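The per-channel median stacking described above can be sketched in pure Python as below. This isn't pypeline's code: it assumes the images are already loaded and aligned (no registration) as equal-sized lists of rows of (R, G, B) tuples. It also shows why a value has to win a majority: an outlier present in fewer than half the shots never reaches the middle of the sorted values.

```python
from statistics import median_low


def median_stack(images):
    """Per-pixel, per-channel median of equally sized RGB images.

    images: list of images, each a list of rows of (r, g, b) tuples.
    median_low keeps integer pixel values; with an even number of shots
    it picks the lower of the two middle values.
    """
    height, width = len(images[0]), len(images[0][0])
    stacked = []
    for y in range(height):
        row = []
        for x in range(width):
            pixels = [img[y][x] for img in images]
            # Each channel is treated independently, as in the post.
            row.append(tuple(median_low(p[c] for p in pixels) for c in range(3)))
        stacked.append(row)
    return stacked
```

With NumPy, roughly the same thing is `np.median(stack, axis=0)` on an array of shape (n_images, height, width, 3), which is far faster than looping in Python.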

Visual cryptography

Inspired by this post at DataGenetics, I implemented a quick-and-dirty script in Python to test it out. It takes an input image and iterates over it pixel by pixel, splitting it into two output images. Ideally, the outputs are randomly assigned, so it is impossible to recover the original without both of them; combining the two outputs recovers the original. Here’s an example of it in action:

Original image:

Intermediate images (hopefully look like static):

Final output:

Not perfect, but it is definitely recognizable. The idea can apparently be extended to 9x9 (and 16x16, and 25x25… I presume) images, to share the secret more widely. In any case, this scheme should make it possible for any number of people to share a secret that none of them can recover individually. I uploaded the code here on GitHub.
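For reference, here is a sketch of the classic 2-out-of-2 scheme from Naor and Shamir, which is one common way to do this split; the script above may differ in its details. Each black-or-white pixel becomes a 2x2 block in each share. Both blocks are half black, so either share alone looks like static. For a white pixel the two blocks match, and for a black pixel they are complements, so stacking them (black wherever either share is black) gives a fully black block for black pixels and a half-black gray block for white ones. That gray background is why the result is recognizable but not perfect. The pattern list and function names here are my own; randomness comes from `secrets.SystemRandom` unless you pass another generator.

```python
import secrets

# 2x2 subpixel patterns (1 = black); each is half black, and each
# pattern's complement is also in the list.
PATTERNS = [
    ((1, 0), (0, 1)), ((0, 1), (1, 0)),
    ((1, 1), (0, 0)), ((0, 0), (1, 1)),
    ((1, 0), (1, 0)), ((0, 1), (0, 1)),
]


def complement(pattern):
    """Flip every subpixel of a 2x2 pattern."""
    return tuple(tuple(1 - v for v in row) for row in pattern)


def split(image, rng=None):
    """Split a 0/1 image (1 = black) into two shares, each twice as large."""
    rng = rng or secrets.SystemRandom()
    h, w = len(image), len(image[0])
    a = [[0] * (2 * w) for _ in range(2 * h)]
    b = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            p = rng.choice(PATTERNS)
            # Same pattern for white, complementary pattern for black.
            q = complement(p) if image[y][x] else p
            for dy in range(2):
                for dx in range(2):
                    a[2 * y + dy][2 * x + dx] = p[dy][dx]
                    b[2 * y + dy][2 * x + dx] = q[dy][dx]
    return a, b


def overlay(a, b):
    """Stack two shares: a subpixel is black if it is black on either share."""
    return [[u | v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
```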

New PC

I’ve been a little busy lately, but should be able to pick things up again with the new PC I built. Made a time-lapse of assembling it too:

SSL on Fox Row

I’ve just enabled SSL on the site. Technically it’s TLS, but the name SSL seems to be sticking. Now you can access it at https://www.foxrow.com (note the https). The unencrypted version should still be available at http://www.foxrow.com. Since the ergs and weather apps are hosted on Heroku, I’m not sure yet whether I can set it up for them; still looking into that.

Pillow and NEF files

Apparently PIL, and therefore Pillow, do not support Nikon RAW (.NEF) files. From what I can tell, my camera shoots RAWs in a 14-bit grayscale format. With the help of nefarious, I’ve found a way to get the image data out of my RAW files. Images coming soon!
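One small, hedged piece of the puzzle once the data is out: 14-bit values run from 0 to 16383, while JPGs and most viewers expect 8 bits (0 to 255), so a quick preview can drop the 6 lowest bits (divide by 2^6 = 64). This sketch takes values already extracted from the file; it doesn't read NEF data or use nefarious, and it ignores black level, white balance, demosaicing, and gamma, so the result will look dark and flat. Raw files typically hold one value per photosite behind a color filter array, which may be part of why the data looks grayscale.

```python
OUT_BITS = 8


def to_8bit(values, raw_bits=14):
    """Scale raw sensor values (0 .. 2**raw_bits - 1) into 0..255.

    Assumption: raw_bits=14, per the post. Dropping the low bits is a
    linear rescale, equivalent to integer division by 2**(raw_bits - 8).
    """
    shift = raw_bits - OUT_BITS
    return [v >> shift for v in values]
```

For example, `to_8bit([0, 8192, 16383])` returns `[0, 128, 255]`.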
