The recovery of an unknown density matrix of large size requires huge computational resources. State-of-the-art performance has recently been achieved with the factored gradient descent (FGD) algorithm and its variants since they are able to mitigate the dimensionality barrier by utilizing some of the underlying structures of the density matrix. Despite the theoretical guarantee of a linear convergence rate, convergence in practical scenarios is still slow because the contracting factor of the FGD algorithms depends on the condition number $\kappa$ of the ground truth state. Consequently, the total number of iterations needed to achieve the estimation error $\epsilon$ can be as large as $O(\sqrt{\kappa}\ln(1/\epsilon))$. In this Letter, we derive a quantum state tomography scheme that improves the dependence on $\kappa$ to the logarithmic scale. Thus, our algorithm can achieve the approximation error $\epsilon$ in $O(\ln(1/\kappa\epsilon))$ steps. The improvement comes from the application of nonconvex Riemannian gradient descent (RGD). The contracting factor in our approach is thus a universal constant that is independent of the given state. Our theoretical results of extremely fast convergence and nearly optimal error bounds are corroborated by the numerical results.
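
To make the role of the contracting factor concrete, the following minimal sketch counts the iterations a linearly convergent method needs to reach a target error. It is an illustration under stated assumptions, not the FGD or RGD algorithm itself: the function names, the unit initial error, the form $1 - 1/\sqrt{\kappa}$ for a condition-number-dependent factor, and the fixed factor 0.5 are all hypothetical choices made only to show how the iteration count scales.

```python
import math


def iterations_to_tolerance(rho, eps, initial_error=1.0):
    """Smallest k such that initial_error * rho**k <= eps.

    Models generic linear convergence with contracting factor rho in (0, 1).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if eps >= initial_error:
        return 0
    # Closed-form estimate, then corrected against floating-point rounding.
    k = max(0, math.ceil(math.log(initial_error / eps) / math.log(1.0 / rho)))
    while initial_error * rho**k > eps:
        k += 1
    while k > 0 and initial_error * rho ** (k - 1) <= eps:
        k -= 1
    return k


def kappa_dependent_factor(kappa):
    """Hypothetical factor 1 - 1/sqrt(kappa); an assumed form, not from the source."""
    if kappa <= 1.0:
        raise ValueError("this illustrative form needs kappa > 1")
    return 1.0 - 1.0 / math.sqrt(kappa)


if __name__ == "__main__":
    eps = 1e-6
    for kappa in (1e2, 1e4, 1e6):
        k_dep = iterations_to_tolerance(kappa_dependent_factor(kappa), eps)
        # 0.5 is an arbitrary stand-in for a state-independent constant factor.
        k_const = iterations_to_tolerance(0.5, eps)
        print(f"kappa={kappa:.0e}: kappa-dependent {k_dep}, constant {k_const}")
```

With a factor of the assumed form, the count grows roughly like $\sqrt{\kappa}\ln(1/\epsilon)$, whereas a state-independent factor gives a count that does not change with $\kappa$ in this simplified model. The sketch ignores any dependence of the initial error on $\kappa$.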