The problem of embedding an $N$-processor architecture $G$ into an $M$-processor architecture $H$ for $N>M$ arises when algorithms designed for architectures of an ideal size are simulated on existing architectures which are of a fixed size. In this paper we present solutions to this embedding problem for the case when both architectures are hypercubes and the embeddings are to achieve a balanced load. An embedding achieves a balanced load if every processor of $H$ simulates at most $\lceil{N\over M}\rceil$ processors of $G$. We show that in this case hypercube $G$ can be embedded into hypercube $H$ with a dilation of $1$ and an optimal congestion of ${N\over M}$. The main contribution of the paper is the lower bound on the congestion.