Manoj, I'm glad you finally had success after all the frustration.
Looking at the datasheets for the types of triacs you are using, a 1500W load will dissipate as much as 8 watts in the triac at full brightness. With a junction to ambient thermal resistance of 60 deg C per watt, that's a 480 deg C rise above ambient at the junction (8W times 60 deg C per watt) without a heat sink (even for the BT139). Smoke city! That requires some serious attention to heatsinking, snubber or no snubber. Based on the temp data in the datasheet, I figure your heatsink should have a mounting surface to ambient thermal resistance of no more than about 13 deg C per watt (limiting value). This is figured as follows:
Max junction temp (150) - junction to tab resistance × dissipation (2 × 8 = 16) - ambient temp (25) = max allowable tab temp rise above ambient (109)
max tab temp rise (109) ÷ dissipation (8) = max heat sink thermal resistance (about 13 deg C per watt)
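Preferably the total junction to ambient thermal resistance should be about 6 deg C per watt or less, for a junction temp of 75 deg C (half the max junction temp) or less in free air. After subtracting the 2 deg C per watt junction to tab resistance, that leaves about 4 deg C per watt for the heatsink itself. I would go for a heatsink of about 4 deg C per watt, or less, for great reliability.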
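If you want to rerun the numbers for a different load or ambient temp, here is a rough Python sketch of the same arithmetic. The function names are just ones I made up for illustration, and the figures (8 W, 2 deg C per watt junction to tab, 150 deg C max junction, 25 deg C ambient) are the ones used above. It ignores the tab to heatsink interface resistance, so treat the results as optimistic.

```python
def max_heatsink_resistance(tj_max, t_ambient, power, r_junction_to_tab):
    """Largest tab-to-ambient thermal resistance (deg C/W) that keeps
    the junction at or below tj_max for the given dissipation."""
    # Temperature rise still available between ambient and the tab
    allowed_rise = tj_max - t_ambient - power * r_junction_to_tab
    return allowed_rise / power


def junction_temp(t_ambient, power, r_junction_to_tab, r_heatsink):
    """Steady-state junction temp (deg C) for a given heatsink."""
    return t_ambient + power * (r_junction_to_tab + r_heatsink)


# Assumed figures, taken from the discussion above
POWER = 8.0        # watts dissipated in the triac at full brightness
R_JT = 2.0         # junction to tab, deg C per watt
T_AMBIENT = 25.0   # deg C
TJ_MAX = 150.0     # deg C

# Limiting heatsink value: 109 / 8
print(max_heatsink_resistance(TJ_MAX, T_AMBIENT, POWER, R_JT))      # 13.625

# Heatsink needed to hold the junction at half of TJ_MAX: 34 / 8
print(max_heatsink_resistance(TJ_MAX / 2, T_AMBIENT, POWER, R_JT))  # 4.25

# Junction temp with a 4 deg C per watt heatsink
print(junction_temp(T_AMBIENT, POWER, R_JT, 4.0))                   # 73.0
```

Any result that comes out zero or negative from the first function means no heatsink can do the job at that dissipation and ambient temp.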
Your transistor idea, while admirable, won't work because the transistor is a unidirectional device: it conducts in only one direction. That said, I don't understand why the triac is full on with that configuration, unless S1 is closed.
You can use a much smaller triac (with a more sensitive gate) to provide the high gate current for the BTA16. MT1 and MT2 of the smaller triac are connected across MT2 and the gate of the larger triac. The gate of the smaller triac is then connected the same way the larger triac's gate was before the smaller triac was added.