Efficient Matlab (I): pairwise distances

Changed some code from impossibly shore to merely slow!

Machine Learning Rumination

I want to share some tricks for making Matlab function more efficient and robust. I decide to write a series of blog posts. This is the first one of this series, in which I want to show a simple function for computing pairwise Euclidean distances between points in high dimensional vector space.

The most important thing for efficient Matlab code is VECTORIZATION! Here I will take distance function as an example to demonstrate how accelerate your code by getting ride of “for-loops”.
Say, we have two sets of points X and Y in d-dimensional space. We want to compute the distances between any point in X and any other point in Y. This function is very useful. For example, if you want to find the k-nearest-neighbor of a point x in a database Y, an naive method is to first compute the distances between the point x and all points in Y…

View original post 335 more words

Advertisements
Posted in Uncategorized | Leave a comment

Nim game

The other I was cutting the hedge and the smell of cut box reminded me of my late grandad. He was a pretty cool guy and inspired me a lot to study maths and physics, as he was always busy solving some puzzle or other.

One of the games he taught us was “take the stick”. This is a two player game where a certain number of ticks are drawn on each row. Each player has to remove at least one or more ticks from a particular row. The last player to remove a stick loses. We used to play with five rows containing five, four, three, two and one stick respectively. I always has a stinking suspicion that the first player always wins if they play optimally, but could never prove it. I figured, now that I am all grown up I should give it a go. I should add the old man was a bit of a luddite, so I probably should not do it with a computer!

The technique that I will use to prove this, is similar to a minimax algorithm. In particular, I observe that there are a finite number of states in the game and that each of these can be converted into some of the other ones. This means that the game can be represented as a directed graph. For example, if we were playing “take the stick” with two rows, the first with two ticks, the second with one, the directed graph representing every possible game would be:

130929-203556

 

 

 

 

 

 

 

 

 

 

 

In this picture each box contains the disposition of sticks in a particular state of the game. Since the player to remove the last stick loses, the box with one tick is a losing state. In addition, any configuration of the game which can lead to a losing state, is itself a winning state. Recapping there are two types of states:

  1. Winning states can lead in the next moves to a losing state
  2. Anything that is not a winning state is a losing state. The state with a single tick is by default a losing state

For any game except for the 2-1 game, drawing out the entire graph by hand is difficult, too many states and too many arrows. I therefore invent a bit of notation. I am going to therefore invent a bit of notation: winning states are going to be drawn in a rectangular square and losing games are going to be drawing in a circular one. Each losing game will be labelled A,B,C etc. and each winning game will be labelled with the letter of one of the losing games that it is connected to. Furthermore I am going to cunningly order the states by ensuring that the possible successors of a state are always after it (*).

For the 3-2-1 game the graph now looks like this:

130929-204755This shows that in the 3-2-1 game, the starting position is a losing position. Whatever the starting player does, the next player will be in a winning position. Clearly if the 3-2-1 game is a losing position, the 4-3-2-1 is a winning one. But what about 5-4-3-2-1 that we used to play so often? This is at the limit of being doable by hand as it has about 130  individual states. The following pictures show my graphic for this case:

photoB

photoA

So, clearly, the 5-4-3-2-1 game is a winning game. (A) winning strategy is to remove one of the ticks from the row of five, leaving the second player in the 4-4-3-2-1 position which is a losing one!

(*) This is ensured by sorting the rows of strings such that the longest one comes first and then ordering the states such that they are in lexical order.

Posted in Uncategorized | Tagged | Leave a comment

Unit testing destructors

I asked the excellent Kevlin Henney about techniques to test the destruction of pointers in C++. He suggested that the basic idea is to overload the “new” and “delete” operators and count how many times each is called. A testing framework can then check that the number of calls to new and delete balances out.

Testing destructors should definitely be tested in unit tests, but it strikes me that simply ensuring that new is called as many times as delete might not be sufficient. For example this function:

void leaky_function () {
   new double;
   delete NULL;
}

This code shows an overloaded new and delete that keeps counts allocation/deallocation. Note that the register that keeps track of which pointers have been allocated/deallocated cannot be an stl container, as these call “new” and delete themselves causing a stack overflow.

Posted in Uncategorized | 3 Comments

Cook's Tales

Augusta suggested a new recipe she’d seen on TV. So, ever ready to please…

I whizzed a thick slice of white bread with an anchovy, a handful of capers, some black olives, a few small cherry tomatoes, herbs from the terrace and a little salt and pepper. I put some seasoned cod fillets into a lightly oiled dish, covered with the “crumble” topping and baked for 10-15 minutes at 180 C.

Next time I try this dish, and there will be a next time because it certainly has promise, I need to find a  a way of giving the crumble more crunch. maybe by increasing the oven temperature?

View original post

Posted in Uncategorized | Leave a comment

Would Icarus’s wings _actually_ melt on way to the sun?

The other day, we were arguing in the office about Icarus’s flight to the sun. As you all know, the legend goes that Icarus made himself some wings out of wax and feathers. Unfortunately his hubris led him to fly higher and higher until his wings melted due to the heat from the sun.

Our argument was about whether, having abandoned the safety of the atmosphere, would the wax melt or freeze when in sunlight? In order to answer this question, we need to make use of the principle of detailed balance. According to detailed balance, energy exchanges between bodies at equilibrium must cancel out. Therefore if Icarus is receiving a certain flux of energy from the sun, he must also be emitting the same amount of energy. The corresponding equation is:

E_{incoming} = E_{outgoing},

icarus

The only way for Icarus to emit energy is by radiating black body radiation. Since the intensity of black body radiation is related to the temperature of the body by Stefans law, we know that:

E_{outgoing}=S\sigma T^4.

where T is Icarus’s temperature, \sigma is Stefan’s constant and S is Icarus’s total surface area. Similarly, the incoming energy depends on the solar flux and on the surface of Icarus that is exposed to sunlight:

E_{incoming} = S_{solar} \Phi.

I will assume that the surface of Icarus exposed to the sun is half of the total surface area. Therefore Icarus’s temperature will be:

T = ( \frac{\Phi}{2 \sigma} )^{1/4} .

Now, the solar flux \Phi is approximately 1 kW m^{-2} at the distance from the Earth to the sun, therefore Icarus’s temperature is approximately 306 degrees Kelvin. Or about 40 degrees Celsius. The melting point of wax is 60 degrees, so Icarus would be fine. In your face legends!

Posted in Uncategorized | 3 Comments

Solving Sudoku by depth first search

I am trying to remind myself of a little C++ and have decided the ideal project to do so would be to write a little solver for Sudoku. My aim is, given a board,  to find all valid solutions. A solution is valid if each number is present exactly once on each line and row and in each quadrant. There are two ideas that I use: updating the constraints and performing a depth first search.

Updating the constraints

Given a particular assigned position on the board, no other position on the same row or in the same quadrant is allowed to have that value. I therefore store in an array of 9bits the allowed values for each position. For example, if a particular position is allowed to be in any position its state would be “111111111”. If, however it is allowed to only have a value of ‘1’ or a value of ‘3’ its’ state would be: “000000101”.

There are two particular important states. If a position is not allowed to take any value, then its state will be 0. This means that the board configuration I am considering is not valid. If on the other hand the state has a single true bit, then the position is in a definite state and I can assign that position.  So the pseudo code for updating the constraints is:

  1. UpdateConstraints(AssignedPositions):
  2. for aPos in AssignedPositions:
  3.       if aPos has not yet been assigned:
  4.            assign aPos
  5.            for neigh in (list of neighbour of aPos):
  6.                 unset aPos as possible state of neigh
  7.                 if neigh has no possible states left: return -1
  8.            for Pos in AllPositions:
  9.                  if Pos has a single state left: GOTO 1
  10. return 0

Depth first search

The approach above is not sufficient, as given the initially assigned cells, it is possible for certain positions to still not have a determined state even after updating the constraints. In order to try out all possible states, I perform a depth first search, by creating all board states  that differ from the original one by having a single cell occupied and performing the algorithm above.

Testing

I first tested on a simple news paper problem.

Starting problem:

_____________
|304|610|005|
|708|000|306|
|000|903|400|
_____________
|807|000|510|
|020|705|040|
|600|091|002|
_____________
|480|352|007|
|000|000|900|
|106|009|280|
_____________

After a single guess I can find the solution to  be:

_____________
|394|617|825|
|718|524|396|
|562|983|471|
_____________
|837|246|519|
|921|735|648|
|645|891|732|
_____________
|489|352|167|
|273|168|954|
|156|479|283|
_____________

I then tried a 17 given problem from  Wolfram Alpha.

_____________
|030|000|900|
|006|000|000|
|000|241|030|
_____________
|000|900|700|
|000|002|004|
|080|070|020|
_____________
|850|000|000|
|090|704|000|
|000|006|001|
_____________

this problem is clearly much much harder. The problem still is solved in a fraction of a second, having tried just 88 guesses with solution:

_____________
|532|768|941|
|416|395|872|
|978|241|635|
_____________
|325|984|716|
|769|512|384|
|184|673|529|
_____________
|853|129|467|
|691|437|258|
|247|856|193|
_____________

unfortunately I seem to have also spat out another valid solution:

_____________
|532|768|941|
|416|593|872|
|978|241|635|
_____________
|325|984|716|
|769|312|584|
|184|675|329|
_____________
|853|129|467|
|691|437|258|
|247|856|193|
_____________

It turns out that a “mathematician’s” Sudoku is different from a regular Jo one. The maths ones are allowed to differ by certain symmetry operations that I do not understand. Still main objective – to brush up on my C++ is achieved! Here is the source code – if you like  you can inspect my code here

Posted in Uncategorized | Leave a comment

Are you more likely to give birth on a full moon? Revised

In my previous post Ratbag and Roland pointed out that there seems to be a slight increase in probability in the birth rate for the week after a full moon and they suggested I carry out a significance test. I returned to the original data and found that the assumption that birth is equally likely on all days is in  significant disagreement with the data: could it be that the moon actually has an effect on the probability of birth? Luckily, if I correct my assumptions by noting that births are less likely on weekends, the influence of the full moon disappears again. So you are not more likely to give birth on a full moon, but you are more likely to give birth during the week.

In order to make a significance test it is necessary to have  a model of the underlying probability. In this case, we need to make some assumption about the probability of being born at a certain number of days distance from the nearest full moon. The most naive assumption is that babies are born with equal probability on each day of the year. In 1969 there were 13 full moons. Of the 12 intervals between each full moon, 6 were 29 days long and 6 were 30 days  long. Thus, assuming each day of the year is equally probable for birth, we can compute that the  probability of a baby being born between -14 and +14 days from a full moon is 0.03391 (to 4 s.f.) and  that  the probability of a baby being born 15 days from a full moon is 0.01667 (to 4 s.f.). The variance of a binomial distribution is v=p(1-p)n, where p is the probability of an even and n is the number of events considered. Since about 1.8 million births is the sample size used, this leads to a standard deviation \sigma=0.000135. Since any deviations from the probabilities of more than 3~\sigma would be significant, the increase in probability in the ten days after full moons seems significant.

This result surprised me: are women’s wombs really in tune to the music of the spheres? To make my mind up, I decided to look at the distribution of births as a function of day of the year, which is shown in the graph below.

Every 7 days there seems to be an event which leads to a significant reduction in births. This event happens on Saturdays and Sundays.  I am not sure why, but I reckon that doctors and nurses being on holidays might have something to do with it :D.

A better model for the probability of being born a certain number of days away from the full moon, must take this variation into account. If I do this, I obtain the curve shown in green in the graph below.

now it's significant!

The error bars on the green and red line show the 99.7% confidence interval (3 \sigma) . Notice that they match the data very closely indeed.

So the bottom line is that you are certainly more likely to give birth during the week, but the moon has nothing to do with it.

Posted in Uncategorized | Leave a comment