The purpose of this investigation is to find out whether a wide range of timbres can be learnt at the signal level and then recreated generatively. Composers and sound designers often prefer recorded sounds to synthetic ones because of their inimitable complexity. The most accurate way of recreating such complexity computationally is to construct a physical model of the desired sound object(s). However, a physical model of one sound object is unlikely to be usable for an entirely different sound object. Moreover, the more complex the desired sound is, the more expensive and limiting the physical model becomes. This problem can be avoided if timbre can indeed be learnt and recreated.
For the learning process to be successful, it is not enough for the system simply to discern timbre (MFCCs can already be used as a representation of timbre). Exact information about the "micro-level composition" (Döbereiner, 2011, p. 29) of the sound should be learnt for the purpose of its subsequent reproduction...