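The basic building block of a neural network is: vector times matrix equals vector. The first vector is your input pattern, the matrix holds the connection weights, and the resulting vector is your output pattern. Stack several of these in series, with a nonlinear activation function applied between them, and you have a deep neural network. The absolute simplest implementation I could find for this is in C++ using Boost's uBLAS library.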

Here it is:

```cpp
#include <algorithm>  // std::min
#include <iostream>   // std::cout, std::endl
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

int main () {
    using namespace boost::numeric::ublas;
    matrix<double> m (3, 3);
    vector<double> v (3);
    // Fill m row by row with the values 0..8 and v with 0, 1, 2.
    for (unsigned i = 0; i < std::min (m.size1 (), v.size ()); ++ i) {
        for (unsigned j = 0; j < m.size2 (); ++ j)
            m (i, j) = 3 * i + j;
        v (i) = i;
    }
    vector<double> out1 = prod (m, v);  // m * v: matrix times column vector
    vector<double> out2 = prod (v, m);  // v * m: row vector times matrix
    std::cout << out1 << std::endl;
    std::cout << out2 << std::endl;
}
```
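Working the arithmetic by hand shows what the two `prod` calls compute. The loop sets m(i, j) = 3i + j and v(i) = i. Because m is not symmetric, treating v as a column vector (`prod(m, v)`) and as a row vector (`prod(v, m)`) gives different results:

```latex
M = \begin{pmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{pmatrix}, \qquad v = (0, 1, 2)

M v = \begin{pmatrix} 0 \cdot 0 + 1 \cdot 1 + 2 \cdot 2 \\ 3 \cdot 0 + 4 \cdot 1 + 5 \cdot 2 \\ 6 \cdot 0 + 7 \cdot 1 + 8 \cdot 2 \end{pmatrix}
    = \begin{pmatrix} 5 \\ 14 \\ 23 \end{pmatrix}

v M = (0 \cdot 0 + 1 \cdot 3 + 2 \cdot 6,\; 0 \cdot 1 + 1 \cdot 4 + 2 \cdot 7,\; 0 \cdot 2 + 1 \cdot 5 + 2 \cdot 8)
    = (15, 18, 21)
```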

The program above is almost verbatim the **prod** sample code from the Boost uBLAS documentation. So now the question is: how does one go from that to a neural network? More to come.

Another question that comes to mind: how would you optimize this if your machine has a vector-based co-processor?