Automatic fault diagnosis in many systems relies on the analysis of vibration data in the time or frequency domain, and many approaches for extracting features from such signals have been proposed in the literature. In this paper, we apply a variety of transforms, including the Fourier and wavelet transforms among others, to the vibration data and extract two kinds of features from the results: statistical features and features learned automatically by Convolutional Neural Networks (CNNs). For each feature extraction approach, a learning algorithm is trained to diagnose the faults, and the outputs of these base learners are aggregated in an ensemble. The weights of the base learners are optimized by an evolutionary algorithm to obtain the best weighted voting scheme. Because the architecture of a CNN has a significant effect on its performance, we also propose an evolutionary algorithm to search for the best CNN architecture for fault diagnosis. CNNs are trained with gradient descent algorithms, which can become stuck in local optima. To mitigate this, we propose a hybrid training algorithm that combines the speed of gradient descent with the global search of evolutionary algorithms. The proposed algorithm is tested on a number of benchmark problems, and the experimental results are presented.
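To make the weighted voting idea concrete, the following is a minimal sketch of an ensemble whose voting weights are tuned by a simple evolutionary loop. It is an illustration under stated assumptions, not the algorithm used in the paper: the function names (`weighted_vote`, `evolve_weights`), the truncation selection with Gaussian mutation, the accuracy fitness, and all parameter defaults are hypothetical choices made for this example.

```python
import random


def weighted_vote(probs, weights):
    """Return the class index with the highest weighted score for one sample.

    probs[k][c] is base learner k's probability for class c.
    """
    n_classes = len(probs[0])
    scores = [sum(w * p[c] for w, p in zip(weights, probs))
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


def accuracy(weights, all_probs, labels):
    """Fraction of samples whose weighted vote matches the true label."""
    hits = sum(weighted_vote(p, weights) == y
               for p, y in zip(all_probs, labels))
    return hits / len(labels)


def normalise(weights):
    """Scale the weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def evolve_weights(all_probs, labels, pop_size=20, generations=50,
                   sigma=0.1, seed=0):
    """Search for voting weights with a basic evolutionary loop.

    Assumed scheme (not taken from the paper): keep the fitter half of the
    population, create one Gaussian-mutated child per survivor, repeat.
    """
    rng = random.Random(seed)
    n_learners = len(all_probs[0])

    def fitness(w):
        return accuracy(w, all_probs, labels)

    # Seed the population with uniform weights plus random candidates.
    population = [normalise([1.0] * n_learners)]
    population += [normalise([rng.random() + 1e-9 for _ in range(n_learners)])
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        # Truncation selection: survivors are kept, so the best fitness
        # in the population never decreases.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            normalise([max(1e-9, w + rng.gauss(0.0, sigma)) for w in parent])
            for parent in parents
        ]
        population = parents + children

    return max(population, key=fitness)
```

In practice the fitness would be computed on held-out validation data rather than on the training set, and `all_probs` would hold the class probabilities produced by the learners trained on each feature set.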