Working with sound when creating a rhythm game (TERRORHYTHM)
The first question every rhythm-game developer asks is: “Should my gameplay actually be powered by the music?” If the answer is yes, and simple visual feedback to the playing track is not enough, a second question appears: “Should I let the player upload their own tracks?” You can face the same question in the first case too, but the answer is only crucial when your gameplay is genuinely driven by the music. Letting players upload their own music means spending resources on a custom music analyzer and level generator, and also on wide genre testing and tuning. Delegating level creation to the Steam Workshop community doesn’t seem like a good idea until you actually have a community.
We understood that a programmatic sound analyzer is a fairly common task, yet still not a trivial one, and that taking it on with our small studio was a serious commitment. Still, we could not resist adding this killer feature: letting the player crush souls to the rhythm of their favorite song.
Rhythm is the center of our gameplay: the so-called BPM (beats per minute), the frequency at which you nod your head to a good track. But if you want to build your game around the rhythm, you first have to extract it. To the game engine, music is only a set of samples it can play, plus some information about them.
BPM extraction is not a new problem: even if you ignore rhythm games, DJ equipment manufacturers and digital audio workstation developers have done a great job in this area. And yet, after all these achievements, they often still delegate rhythm detection to a human:
- tap the CUE button to the beat and the CDJ will catch it
We reviewed a lot of material before we came up with a solution that worked for us.
Music is a complex sound wave: the vibration of the air in your room, in the club, or between your headphone membrane and your eardrum. Transforming these vibrations into something melodic and rhythmic (which our brain does automatically) requires some computation. The biggest part of it is the Fast Fourier Transform, which converts a complex sound signal into a sum of sinusoids with certain frequencies and amplitudes. For those who can’t see how that curve from school relates to sound: a sound wave of sinusoidal form is the simplest timbre, familiar to everyone; a phone dial tone, for example. The frequency of this sine is the pitch of the sound, and its amplitude is the volume. Here are FFTs of a few different waveforms for illustration:
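To make the idea concrete, here is a minimal Python sketch (using NumPy purely for illustration; the game itself obviously does not run Python): the FFT of a pure 440 Hz sine yields a magnitude spectrum with a single dominant peak at exactly 440 Hz.

```python
import numpy as np

# Sample a 440 Hz sine wave for one second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

# The FFT decomposes the signal into sinusoids; for a pure tone,
# the magnitude spectrum has one dominant peak at its frequency.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

peak_freq = freqs[np.argmax(spectrum)]
print(peak_freq)  # 440.0
```

For a real song the spectrum is a whole forest of such peaks, which is exactly what the next paragraph describes.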
Of course, for real music the resulting frequency spectrum is much more complex: there is energy all over the range. If you have ever seen a music spectrum analyzer (often mislabeled an “equalizer”), you know what we are talking about: it is exactly the picture of what sound looks like after a Fast Fourier Transform.
- the loudness of frequencies from lowest to highest (left to right)
Thanks to the game engine, which performs this transform for us, we can treat the music as an array of samples (how many samples per second depends on quality and compression) that tells us the amplitude of every sounding frequency at any moment of the recording.
So we have several tasks to solve with the help of this data.
The first and main task is BPM extraction. Almost every in-game parameter and value is linked to the rhythm, from enemy spawn and movement speed to animation playback speed and combo behavior.
To extract the rhythm we should first understand what it is. By “beat” we usually mean the starting point of a short, repeated musical piece; the looping of this piece determines the main tempo of the song. A classic rhythm section consists of drums and bass (whether it is rock or jazz, metal or dance music). This knowledge gives us the approximate borders of the frequency range we will scout for volume peaks. Our main target is the range below 120 Hz, but besides it we also scan a narrow additional range around 1 kHz, where supporting rhythm-section instruments often sit.
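As a rough sketch of scanning those two ranges, here is how the energy in the sub-120 Hz band and in a band around 1 kHz could be summed from one FFT frame. The helper function and the toy flat spectrum are assumptions for illustration, not the engine's API:

```python
import numpy as np

def band_energy(spectrum, freqs, lo, hi):
    """Sum the magnitudes of the bins falling inside [lo, hi] Hz."""
    mask = (freqs >= lo) & (freqs <= hi)
    return float(np.sum(spectrum[mask]))

# Toy spectrum: flat magnitude 1.0 across 0..4000 Hz in 1 Hz bins.
freqs = np.arange(0, 4001, dtype=float)
spectrum = np.ones_like(freqs)

bass = band_energy(spectrum, freqs, 0, 120)     # kick/bass range
mids = band_energy(spectrum, freqs, 900, 1100)  # snare/guitar support range
print(bass, mids)  # 121.0 201.0
```

Running this per frame over the whole song produces the low-frequency volume curve the next paragraphs work with.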
First, we determine the loudness range of these frequencies across the whole song; these values are needed to make relative decisions. Once we have the minimum and maximum volume, the analysis can start. In short, it is an iteration over the samples from the beginning of the song to its end, comparing the amplitude of the target frequencies with the min and max values and with neighboring samples. By building a kind of “low-frequency volume jump map”, we try to find periodic sequences: peaks repeating with the same time interval (luckily for music lovers and unluckily for music analyzers, bass and drum parts can be complex and varied).
The most frequent value of this interval is taken as the beat duration, from which the BPM is calculated.
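Assuming we already have a list of detected peak positions (onsets) as frame indices, the “most frequent interval” step can be sketched like this; the function, the frame rate, and the toy onset list are illustrative, not the actual game code:

```python
import numpy as np
from collections import Counter

def estimate_bpm(onsets, frame_rate):
    """Estimate BPM from onset frame indices: take the most common
    inter-onset interval as the beat duration in frames."""
    intervals = np.diff(onsets)
    beat_frames, _ = Counter(intervals.tolist()).most_common(1)[0]
    beat_seconds = beat_frames / frame_rate
    return 60.0 / beat_seconds

# Toy example: onsets every 20 frames at 40 frames/sec -> 0.5 s beat.
# The extra onset at frame 50 simulates a syncopated hit; taking the
# mode of the intervals lets the estimate ignore it.
bpm = estimate_bpm([0, 20, 40, 50, 60, 80, 100], frame_rate=40)
print(bpm)  # 120.0
```

Taking the mode rather than the mean is what makes the estimate robust to the “complex and varied” bass and drum parts mentioned above.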
The analyzer reacts differently to various genres. House music, where a clean beat is marked by the kick drum, is of course the simplest to analyze. Dealing with aggressive metal genres, where the drummer can do incredibly fast and furious things, is much harder; in that case we can call the additional frequency range for help, since searching for snare hits and electric guitar drops can aid beat detection. We should also remember the genres with a floating bar size, where musical phrases can differ in length and duration: jazz, math rock and others.
Sound compression is one more thing that can hinder the analysis. This kind of post-processing is used by many producers nowadays. Compression narrows the difference between the loudest and the quietest parts of a track (the so-called “dynamic range”) to make it sound louder without quality loss or extreme peaks. Unsurprisingly, heavy compression (which can often be heard in dance music) can prevent us from drawing a proper “low-frequency volume jump map” and skew the analysis results.
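One way to see the effect is to compare the dynamic range of two volume envelopes: a heavily compressed track leaves far less contrast for the peak map to work with. This is a toy illustration of the concept, not the analyzer's actual metric:

```python
import numpy as np

def dynamic_range_db(envelope, eps=1e-12):
    """Ratio of the loudest to the quietest envelope value, in dB."""
    env = np.asarray(envelope, dtype=float)
    return 20.0 * np.log10(env.max() / max(env.min(), eps))

# A wide-dynamics envelope vs. a heavily compressed ("squashed") one:
punchy = [0.05, 0.9, 0.1, 1.0, 0.08]      # clear peaks and valleys
squashed = [0.85, 0.9, 0.88, 1.0, 0.87]   # everything near the ceiling
print(dynamic_range_db(punchy) > dynamic_range_db(squashed))  # True
```

When the dynamic range collapses like in the second envelope, beat peaks barely stand out above their neighbors, which is exactly what breaks the jump map.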
Poor overall recording quality can be a problem too: a lot of noise across a wide range, or bad mixing, can cause frequency conflicts (the emergence of garbage side frequencies).
To support the beat extraction algorithm, we added synchronization during playback. The pre-analyzed track gives us an average BPM value; from it we set the beat duration and start the main game timer. Then, during playback, we periodically synchronize this timer against our “map of peaks”. If the deviation is large, the next synchronization is scheduled after a short interval, and the gameplay in that period gets softer to protect the player from system errors.
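A minimal sketch of such a re-sync step might look like the following, with the correction clamped so the timer never jumps noticeably. The function, its parameter values, and the clamp threshold are hypothetical, not the game's actual code:

```python
def resync(timer_beat_time, nearest_peak_time, max_correction=0.05):
    """Nudge the game timer toward the nearest detected peak,
    clamping the correction so gameplay never visibly jumps."""
    drift = nearest_peak_time - timer_beat_time
    correction = max(-max_correction, min(max_correction, drift))
    return timer_beat_time + correction

# Timer expected a beat at 10.00 s, but the peak map says 10.02 s:
small = resync(10.00, 10.02)   # small drift, fully corrected
# A large deviation gets clamped; the caller would also schedule an
# earlier re-sync and soften the gameplay, as described above.
large = resync(10.00, 10.30)   # clamped to a 0.05 s nudge
```

Clamping plus more frequent re-syncs spreads a big correction over several beats instead of yanking the whole game forward at once.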
Add a few small features on top of this analysis and we get a working version of the beat-sync mechanism. Now we can concentrate on interactive levels: spawning tougher enemy sequences when the music is intense, visual scene feedback, and so on.
Still, to spare a bad experience for the player who wants to play the game to grandpa's old vinyl or to the arrhythmic metal of his friends' band, after an upload attempt we honestly say: sorry, we failed to detect the tempo of this track, please find something else.