The ESR of the capacitor is very important in controlling the heat generated, since the dissipation is the RMS ripple current squared times the ESR. More capacitors in parallel can help you reduce the ESR while at the same time increasing the surface area. If this is a high-speed switching type system, consider reducing the ripple current (often a higher inductance can help). Higher inductance does bring its own problems, so it's a trade-off.
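To put rough numbers on this, here is a minimal sketch in Python. It assumes identical capacitors that share the ripple current equally, and, purely for illustration, a buck converter in continuous conduction with a triangular inductor ripple. The original post does not say what topology is involved, so the function names and the buck formula are my assumptions, not part of the question.

```python
import math

def esr_heating(i_rms, esr, n_parallel=1):
    """Return (total_watts, watts_per_cap) for n identical caps.

    Assumes the RMS ripple current i_rms divides equally, so the bank
    behaves like a single capacitor with ESR / n.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    total = i_rms ** 2 * esr / n_parallel
    return total, total / n_parallel

def buck_ripple_rms(v_in, v_out, inductance, f_sw):
    """RMS of the triangular inductor ripple in an ideal buck converter.

    Assumption: continuous conduction, ideal switch and diode.
    Peak-to-peak ripple is (v_in - v_out) * D / (L * f_sw), D = v_out / v_in,
    and a triangle wave has an RMS of peak-to-peak / (2 * sqrt(3)).
    """
    duty = v_out / v_in
    delta_i = (v_in - v_out) * duty / (inductance * f_sw)
    return delta_i / (2 * math.sqrt(3))

if __name__ == "__main__":
    # Hypothetical values: 2 A RMS ripple, 100 milliohm ESR per capacitor.
    for n in (1, 2, 4):
        total, per_cap = esr_heating(2.0, 0.1, n)
        print(f"{n} cap(s): {total:.3f} W total, {per_cap:.3f} W each")
    # Doubling the inductance halves the ripple current in this model.
    for l_henry in (10e-6, 20e-6):
        i = buck_ripple_rms(12.0, 6.0, l_henry, 500e3)
        print(f"L = {l_henry * 1e6:.0f} uH: {i:.3f} A RMS ripple")
```

In this idealized model the heat per capacitor falls with the square of the number of capacitors, and the ripple current falls in proportion to the inductance. Real parts will not share that neatly.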
In my 46 years in this field, I cannot recall ever seeing capacitors on a heat sink. That is not to say I've seen everything under the sun, but a capacitor is not supposed to conduct electrons from one plate to the other, save for a minute leakage current. Paralleling capacitors to distribute I^2R losses falls in the same category: covering up a design fault.
What type of capacitors are you using? Polarized or non-polarized electrolytics? Is the applied voltage AC or DC? Do you have a schematic of the circuit having the issue? Could you be using motor start caps for the wrong application? There are many more questions, but it would help greatly if you supplied some specific information about the design and the components in question.
It helps to review the ripple current rating of the capacitor in question and make sure it is not exceeded. Connecting caps in parallel is a common means to this end, but the current should not be expected to divide evenly between the caps, especially as they age.
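As a rough illustration of the uneven sharing, here is a small Python sketch. It assumes that at the ripple frequency each capacitor's impedance is dominated by its ESR, so the current divides in proportion to 1/ESR. That is a simplification, and the function names and values are hypothetical.

```python
def share_ripple_current(i_total_rms, esr_values):
    """Split a total RMS ripple current between parallel capacitors.

    Assumption: ESR dominates each capacitor's impedance at the ripple
    frequency, so each one takes current in proportion to 1 / ESR.
    """
    conductances = [1.0 / r for r in esr_values]
    g_total = sum(conductances)
    return [i_total_rms * g / g_total for g in conductances]

def over_rating(currents, ratings):
    """Flag every capacitor whose share exceeds its ripple current rating."""
    return [i > rated for i, rated in zip(currents, ratings)]

if __name__ == "__main__":
    # Three caps rated 1.0 A RMS each; the third has doubled its ESR.
    currents = share_ripple_current(3.0, [0.1, 0.1, 0.2])
    flags = over_rating(currents, [1.0, 1.0, 1.0])
    for i, flagged in zip(currents, flags):
        print(f"{i:.2f} A RMS, over rating: {flagged}")
```

With those numbers the two lower-ESR capacitors each carry 1.2 A and exceed their rating, even though three 1.0 A parts look adequate for 3.0 A on paper. The ESR of electrolytics generally rises with age, so the split can drift over time.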