Most gamma-ray bursts (GRBs) observed by the Swift satellite show an early steep decay phase (SDP) in their X-ray light curve, which is usually a smooth continuation of the prompt gamma-ray emission, strongly suggesting that it is the tail of the prompt emission. However, the mechanism behind it is still not clear. The most popular model for the SDP is high-latitude emission (HLE), in which, after the prompt emission from a (quasi-)spherical shell stops, photons from increasingly large angles relative to the line of sight still reach the observer, with a smaller Doppler factor. This results in a simple relation between the temporal and spectral indices, $\alpha = 2 + \beta$, where $F_\nu \propto t^{-\alpha}\nu^{-\beta}$. While HLE is expected in many models for the prompt GRB emission, such as the popular internal shocks model, there are models in which it is not expected, such as sporadic magnetic reconnection events. Therefore, testing whether the SDP is consistent with HLE can help distinguish between different prompt emission models. To address this question quantitatively, we develop a realistic self-consistent model for the prompt emission and its HLE tail, which can be used for combined temporal and spectral fits to GRB data that would provide strict tests of the HLE model. We model the prompt emission as the sum of its individual pulses with their HLE tails, where each pulse arises from an ultrarelativistic uniform thin spherical shell that emits isotropically in its own rest frame over a finite range of radii. Analytic expressions for the observed flux density are obtained for the internal shock case with a Band function emission spectrum. We find that the observed instantaneous spectrum is also a Band function. Our model naturally produces, at least qualitatively, the observed spectral softening and the steepening of the flux decay as the peak photon energy sweeps across the observed energy range.
The observed flux during the SDP is initially dominated by the tail of the last pulse, but the tails of one or more earlier pulses can become dominant later on. A simple criterion is given for the dominant pulse at late times. The relation $\alpha = 2 + \beta$ also holds as $\beta$ and $\alpha$ change in time. Modelling several overlapping pulses as a single wider pulse would overpredict the emission tail.
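As an illustration of the HLE relation, the following minimal Python sketch checks numerically that the local temporal and spectral indices satisfy $\alpha = 2 + \beta$ at all times, even while both evolve. It is a simplified toy case and not the full model described above: it assumes a single instantaneous flash from an infinitely thin shell at one radius, so that the Doppler factor scales as $1/t$ and $F_\nu \propto D^2 S(\nu / D\nu_{p,0})$, with $t$ measured from a reference time such that the line-of-sight photon arrives at $t_0$. The function names, the Band-like parametrization in terms of $F_\nu$ slopes `b1` and `b2`, and all numerical values are assumptions chosen for illustration.

```python
import math

def band_spectrum(x, b1=0.0, b2=-1.3):
    """Band-like shape S(x) with x = nu / nu_peak.

    b1 and b2 are the assumed low- and high-energy logarithmic slopes of F_nu;
    nu * F_nu peaks at x = 1 and the two branches join smoothly at xb.
    """
    xb = (b1 - b2) / (1.0 + b1)
    if x <= xb:
        return x**b1 * math.exp(-(1.0 + b1) * x)
    return x**b2 * xb**(b1 - b2) * math.exp(-(b1 - b2))

def hle_flux(t, nu, t0=1.0, nu_peak0=100.0):
    """Toy HLE flux for a single flash (arbitrary units), valid for t >= t0.

    Assumes the Doppler factor decreases as t0 / t along the equal-arrival-time
    sequence, so the observed peak frequency decreases as nu_peak0 * t0 / t.
    """
    doppler = t0 / t
    return doppler**2 * band_spectrum(nu / (doppler * nu_peak0))

def log_slope(f, x, eps=1e-4):
    """Central-difference estimate of d ln f / d ln x."""
    num = math.log(f(x * (1.0 + eps))) - math.log(f(x * (1.0 - eps)))
    den = math.log(1.0 + eps) - math.log(1.0 - eps)
    return num / den

def indices(t, nu):
    """Local indices (alpha, beta) defined by F_nu ~ t^-alpha * nu^-beta."""
    alpha = -log_slope(lambda tt: hle_flux(tt, nu), t)
    beta = -log_slope(lambda nn: hle_flux(t, nn), nu)
    return alpha, beta

if __name__ == "__main__":
    nu_obs = 10.0  # observed frequency, same arbitrary units as nu_peak0
    for t in (1.0, 3.0, 10.0, 30.0, 100.0):
        alpha, beta = indices(t, nu_obs)
        print(f"t={t:6.1f}  beta={beta:.4f}  alpha={alpha:.4f}  alpha-beta={alpha - beta:.4f}")
```

In this toy case the spectrum at the observed frequency softens as the peak frequency sweeps through the band, and the decay steepens accordingly, while the difference between the two indices stays fixed at 2. Emission over a finite range of radii and the superposition of several pulses, which the model above treats, are not included here.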