Electrocardiogram (ECG) is a non-invasive approach to capture the overall electrical activity produced by the contraction and relaxation of the cardiac muscles. It has been established in the literature that the difference between ECG-derived age and chronological age represents a general measure of cardiovascular health. Elevated ECG-derived age strongly correlates with cardiovascular conditions (e.g., atherosclerotic cardiovascular disease). However, the neural networks for ECG age estimation are yet to be thoroughly evaluated from the perspective of ECG acquisition parameters. Additionally, deep learning systems for ECG analysis encounter challenges in generalizing across diverse ECG morphologies in various ethnic groups and are susceptible to errors with signals that exhibit random or systematic distortions To address these challenges, we perform a comprehensive empirical study to determine the threshold for the sampling rate and duration of ECG signals while considering their impact on the computational cost of the neural networks. To tackle the concern of ECG waveform variability in different populations, we evaluate the feasibility of utilizing pre-trained and fine-tuned networks to estimate ECG age in different ethnic groups. Additionally, we empirically demonstrate that finetuning is an environmentally sustainable way to train neural networks, and it significantly decreases the ECG instances required (by more than 100× ) for attaining performance similar to the networks trained from random weight initialization on a complete dataset. Finally, we systematically evaluate augmentation schemes for ECG signals in the context of age estimation and introduce a random cropping scheme that provides best-in-class performance while using shorter-duration ECG signals. The results also show that random cropping enables the networks to perform well with systematic and random ECG signal corruptions.