Image alignment is an important problem in computer vision. Tensor-based methods solve it with good robustness to noise and satisfactory performance, but they face two common challenges: (1) high computational cost when dealing with large-scale tensor data, and (2) neglect of the local structures within and across images. To overcome these challenges, we propose an efficient data-driven tensor dictionary learning (DTDL) model for image alignment. In our DTDL model, we factorize the underlying third-order tensor into a coefficient tensor and three dictionary matrices of smaller sizes, which reduces the dimensionality and complexity of the problem. We further exploit a generalized hyper-Laplacian regularization to preserve the local structures that are embedded in the underlying tensor and represented within the dictionary framework. Moreover, we prove that our proximal linearized alternating direction method of multipliers algorithm generates a sequence converging to a Karush-Kuhn-Tucker point under very mild conditions. Experiments on image alignment and face recognition tasks show that our method outperforms state-of-the-art methods in both performance and efficiency.
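The dimensionality reduction claimed above can be illustrated with a minimal NumPy sketch, assuming a Tucker-style factorization of a third-order tensor into a small coefficient (core) tensor multiplied along each mode by a dictionary matrix. This is an illustrative assumption, not the paper's actual algorithm; the names `mode_product`, `G`, and `D1`–`D3` are hypothetical.

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode (illustrative helper)."""
    T = np.moveaxis(T, mode, 0)
    shape = T.shape
    out = M @ T.reshape(shape[0], -1)
    out = out.reshape((M.shape[0],) + shape[1:])
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
I, J, K = 30, 40, 50      # data tensor dimensions (assumed for the sketch)
r1, r2, r3 = 5, 6, 7      # much smaller dictionary ranks

G = rng.standard_normal((r1, r2, r3))   # coefficient (core) tensor
D1 = rng.standard_normal((I, r1))       # mode-1 dictionary matrix
D2 = rng.standard_normal((J, r2))       # mode-2 dictionary matrix
D3 = rng.standard_normal((K, r3))       # mode-3 dictionary matrix

# Reconstruct the full tensor from the compact factors.
X = mode_product(mode_product(mode_product(G, D1, 0), D2, 1), D3, 2)
print(X.shape)  # (30, 40, 50)

# Storage comparison: full tensor vs. factorized representation.
full = I * J * K
compact = r1 * r2 * r3 + I * r1 + J * r2 + K * r3
print(full, compact)
```

Here the factorized representation stores 950 numbers instead of 60,000, which conveys why operating on the factors rather than the full tensor reduces computational cost on large-scale data.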