In [1]:

```
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, date, timedelta
from sklearn.cluster import KMeans
%matplotlib inline
```

We're going to use Pandas to read the data from a CSV file.

In [2]:

```
# Loading Dataset
df = pd.read_csv("Birthdays.csv")
df.head()
```

Out[2]:

My first thought was to use the day of the year as the only dimension.

In [3]:

```
# Day of the year
toDayOfYear = lambda t: pd.to_datetime(t).timetuple().tm_yday
daysOfYear = np.array([toDayOfYear(xi) for xi in df.Birthday])
print(daysOfYear)
```
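One thing to watch out for: the day of the year depends on whether the birth year was a leap year, so the same calendar date can land on two different numbers. Here's a minimal sketch of the effect using only the standard library. The helper `day_of_reference_year` and its `reference_year` parameter are my own illustration, not part of the notebook, and mapping 29 February to 28 February is a simplifying assumption.

```python
from datetime import date

# The same calendar date gets a different day number in a leap year
print(date(2015, 3, 1).timetuple().tm_yday)  # 60 (2015 is not a leap year)
print(date(2016, 3, 1).timetuple().tm_yday)  # 61 (2016 is a leap year)


def day_of_reference_year(d, reference_year=2015):
    """Day of year after moving the date into a fixed non-leap year.

    Hypothetical helper: 29 February is treated as 28 February.
    """
    day = 28 if (d.month == 2 and d.day == 29) else d.day
    return date(reference_year, d.month, day).timetuple().tm_yday


print(day_of_reference_year(date(2016, 3, 1)))  # 60
```

With only a handful of kids, a one-day shift probably won't change the clusters, so treat this as an optional refinement.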

This sounded good until my very clever wife pointed out that the day of the year is not linear: we're going to be seeing the same kids year after year, so the calendar wraps around and late December sits right next to early January. My approach wouldn't work.
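To make the problem concrete, here's a small sketch comparing the gap between 31 December and 1 January on a straight line and on a year that wraps around. The function names are mine, and it assumes a 365-day year like the rest of this notebook.

```python
DAYS_IN_YEAR = 365  # non-leap-year assumption


def linear_gap(a, b):
    """Gap between two day-of-year numbers on a straight line."""
    return abs(a - b)


def circular_gap(a, b):
    """Gap when the year wraps around, so day 365 is next to day 1."""
    d = abs(a - b) % DAYS_IN_YEAR
    return min(d, DAYS_IN_YEAR - d)


dec_31, jan_1 = 365, 1
print(linear_gap(dec_31, jan_1))    # 364
print(circular_gap(dec_31, jan_1))  # 1
```

A clustering algorithm fed the raw day numbers only sees the first kind of gap, which is why it would never put those two birthdays in the same party.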

After crying for a while, I decided to use polar coordinates and place the days on a circle.

First we transform the days into degrees, then calculate x and y on a circle of radius $r = 1$ with two simple equations: $$x=r\cos(\alpha)$$ $$y=r\sin(\alpha)$$

Let's see how it looks...

In [4]:

```
# Normalize to Degrees
normalize = lambda t: (360 * t) / 365
# Coordinates
coords = lambda t: [math.cos(t), math.sin(t)]
# Transformation
X = np.array([coords(math.radians(normalize(xi))) for xi in daysOfYear])
# Plot Chart
plt.figure(figsize=(12, 12))
plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1])
```

Out[4]:

**Great!**
We now have a circular distribution we can work with, so let's move on.
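Why does the circle help? K-Means measures straight-line distance between points, and on the circle that distance follows the calendar gap in both directions. Here's a rough standard-library sketch; `to_xy` and `chord` are hypothetical helpers that mirror the transformation above, they aren't part of the notebook.

```python
import math


def to_xy(day, days_in_year=365):
    """Place a day of the year on the unit circle (r = 1)."""
    angle = 2 * math.pi * day / days_in_year
    return (math.cos(angle), math.sin(angle))


def chord(day_a, day_b):
    """Straight-line (Euclidean) distance between two days on the circle."""
    xa, ya = to_xy(day_a)
    xb, yb = to_xy(day_b)
    return math.hypot(xa - xb, ya - yb)


print(chord(365, 1))  # tiny: the two days are neighbours across New Year
print(chord(1, 183))  # close to 2: opposite sides of the year
```

Days that are half a year apart end up almost a full diameter away from each other, while 31 December and 1 January nearly touch.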

We'll use the K-Means algorithm and start by saying we're going to have 5 birthday parties every year. That's the number of clusters we want the algorithm to find.

In [5]:

```
# Amount of Birthday Parties we want to have
birthdayParties = 5
# K-Means Algorithm
kmeans = KMeans(n_clusters=birthdayParties)
y_pred = kmeans.fit_predict(X)
# Plot Chart
plt.figure(figsize=(12, 12))
plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
# Print Names
for i, txt in enumerate(df.Name):
    plt.annotate(txt, (X[i][0]+0.05, X[i][1]-0.02))
```

**And ... Voila!**

We got it! The last item on the list is to print the best days of the year to have the birthday parties, which in our case are the centroids K-Means found.

Before printing them, however, we need to undo all the transformations we've done so that instead of a pair of x and y coordinates we have a date. For this we'll use the equation: $$\alpha=\operatorname{atan2}(y,x)$$
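Two details are worth checking on the way back. `math.atan2` returns an angle between -π and π, so days in the second half of the year come out negative unless the angle is wrapped. And a centroid is an average of points, so it sits inside the circle rather than on it, but `atan2` only cares about direction. Here's a minimal sketch of both; the helper names and the example cluster are made up for illustration.

```python
import math

DAYS_IN_YEAR = 365  # non-leap-year assumption


def day_to_xy(day):
    """Day of year to a point on the unit circle."""
    angle = 2 * math.pi * day / DAYS_IN_YEAR
    return (math.cos(angle), math.sin(angle))


def xy_to_day(x, y):
    """Point (anywhere but the origin) back to a day of year."""
    # atan2 returns an angle in (-pi, pi]; the modulo moves it into [0, 2*pi)
    angle = math.atan2(y, x) % (2 * math.pi)
    return DAYS_IN_YEAR * angle / (2 * math.pi)


# Round trip for a late-year day
print(round(xy_to_day(*day_to_xy(300))))  # 300

# Centroid of a hypothetical cluster that straddles New Year
cluster = [360, 364, 3, 7]
points = [day_to_xy(d) for d in cluster]
cx = sum(p[0] for p in points) / len(points)
cy = sum(p[1] for p in points) / len(points)
print(xy_to_day(cx, cy))  # close to 1.0, i.e. 1 January
```

Averaging the raw day numbers of that cluster would give 183.5, the middle of summer, which is exactly the mistake the circle avoids.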

In [6]:

```
# Coordinates to degrees, wrapped into [0, 360) because atan2 is negative when y < 0
anglify = lambda t: math.degrees(math.atan2(t[1], t[0])) % 360
# Denormalize
denormalize = lambda t: 365*t/360
# Day of Year to Actual Date
toDay = lambda t: datetime(datetime.today().year + 1, 1, 1) + timedelta(round(t) - 1)
# Inverse Transformation
bestDays = np.array([toDay(denormalize(anglify(xi))) for xi in kmeans.cluster_centers_])
# Print the result
print("Best Days:")
for i, day in enumerate(bestDays):
    print(" - " + day.strftime('%d %b') + ":")
    # List every kid assigned to this party's cluster
    for j, kid in enumerate(y_pred):
        if kid == i:
            print(" * " + df.Name[j])
```

There you have it!