My friend's master's thesis in computer vision was only slightly more complicated than this project. For a 'fun' project in your spare time, this is far from what you want. If you aren't already familiar with MATLAB, which you would have to use initially, I estimate this project would take you at least 6 months, even if you are extraordinarily motivated.
If you are serious about it, here are some design considerations:
- Will you use fiducials (pink dots or ping-pong balls on joints and limbs)?
- Does this need to run in real time, or will it process prerecorded video?
- Can you set up the background scenery or does it need to run on real-world scenes?
- How many people will be in the frame at a time?
- Is the camera's position in relation to the ground, background scenery and person known?
Then get a camera and record yourself and all your friends doing everything you want to recognize about a hundred times. Preferably everyone wears black suits and face masks, with pink ping-pong balls stuck on key body parts, in an all-white room; otherwise it's going to be really damn hard. Load the footage into MATLAB, find yourself some good journal articles, and get to coding.
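
To give a feel for why the pink-ball setup makes life easier, here is a minimal sketch of the very first step, locating the markers in a single frame. It is written in plain Python rather than MATLAB so it runs without any toolbox, and everything in it is an assumption for illustration: the frame is a list of rows of (r, g, b) tuples, the names `is_pink` and `find_marker_centroids` are made up, and the colour thresholds are guesses that you would have to tune for your own camera and lighting.

```python
from collections import deque


def is_pink(pixel, min_red=200, max_green=150, min_blue=120):
    """Crude colour test. The thresholds are illustrative guesses, not calibrated values."""
    r, g, b = pixel
    return r >= min_red and g <= max_green and b >= min_blue


def find_marker_centroids(frame, min_pixels=4):
    """Return the (x, y) centroid of every pink blob with at least min_pixels pixels.

    frame is assumed to be a list of rows, each row a list of (r, g, b) tuples.
    Blobs are grown with a 4-connected flood fill; smaller blobs are treated as noise.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centroids = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_pink(frame[y][x]):
                continue

            # Breadth-first flood fill over the connected pink region.
            queue = deque([(x, y)])
            seen[y][x] = True
            sum_x = sum_y = count = 0
            while queue:
                cx, cy = queue.popleft()
                sum_x += cx
                sum_y += cy
                count += 1
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and is_pink(frame[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))

            if count >= min_pixels:
                centroids.append((sum_x / count, sum_y / count))

    return centroids
```

With black suits and a white room, neither the clothing nor the background passes the colour test, so a threshold this crude has a chance of working. On a real-world scene it would not, and this is still only marker detection: tracking the markers across frames and recognizing what the person is doing is where the months go.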