The Shtooka Recorder is the easiest and fastest way to record sentences for the Tatoeba Project. You can use other free software such as Audacity, but it's much, much slower and requires many extra steps to accomplish the same thing.
Brief Outline - How to record for the Tatoeba Project
- Download the Shtooka Recorder
- Try recording a few sentences to make sure you know how it works.
- Write to email@example.com telling us that you're interested in recording for us. Tell us what your native language is.
- CK will create a list of sentences formatted for the Shtooka Recorder for you.
- Record a few of these and send them to firstname.lastname@example.org just to make sure everything is OK before you spend a lot of time recording many sentences.
- After that, you can easily record many sentences for us.
Download the Shtooka Recorder
- Download Link: http://web.archive.org/web/20110621160436/http://shtooka.net/soft/kit_shtooka/kit_shtooka_Install_0.9.8.exe
The "Shtooka Recorder" is part of this package.
The original website that hosted this is currently offline. (Jan. 2012)
- This is a "Windows" program. (I use it with Windows XP.)
- This wouldn't work on my Mac running Windows 7 via Boot Camp. The application would run, but the audio interface wouldn't work.
- If you don't have a computer that runs Windows, maybe you can borrow a friend's computer.
Download the Swac Recorder
(I think this is just a slight rewrite of the Shtooka program.) So far, I think I still prefer the Shtooka Recorder. However, maybe one of these will work better for you.
- For Windows (XP, Vista, Win7, Win8): download the 32-bit or 64-bit binary.
- For Ubuntu (13.10, 12.04): download the 32-bit or 64-bit binary.
- For Fedora (20, 19): download the source package.
You can't easily record long sentences with the Swac Recorder.
1-minute Demo of the Shtooka Recorder by CK
This quick demo shows you how quickly and easily you can record sentences.
Notice that on the 4th sentence the audio was "saturated" and would have been distorted. The recorder handles this kind of error by flashing pink in the level meter and going back to the beginning of that sentence, so you can record it again.
1.5-minute Demo of a Shtooka Recording Session
See some of the keyboard commands and features.
If you want to contribute, do this first.
- Record a few sentences and send these to email@example.com, with the title "Audio for Tatoeba."
- If the quality isn't good enough, we can perhaps suggest ways to improve your recordings.
Once you know that your recording quality is good enough, then ...
- For your first batch of files, someone on the team will send you a properly formatted list of sentences.
- Paste these into "Words to Record."
- Then do the recording. See the documentation below if you are not sure how to do this.
- As you record, you may want to skip some sentences that don't sound 100% natural to you. There is also an option to "remove" a sentence after you've recorded it.
- After recording, listen to all of the recordings and throw away any that don't sound good.
- See the bottom of this page if you want CK to help you.
The following is from shtooka.net (July 22, 2011) and is used under the Creative Commons "By" license. (http://creativecommons.org/licenses/by/2.0/fr/)
It has been slightly edited, mainly to eliminate non-working links.
List of words that will be recorded: This is where you paste in the sentence data.
Information About the Speaker: This can be ignored for the Tatoeba Project. We don't use this data, but it doesn't hurt to enter it. You'll only need to do this once.
Audio Recording: You can resize these windows to fit your screen as you like.
The user pronounces the first word, then the Shtooka Recorder automatically switches to the next word while saving the file.
How to configure Recording Settings
You can use the default settings; you don't need to change any of them.
For sentences, I find that setting "Final Silence" to about 0.80s works well when some items contain two sentences. If all items are single sentences with no pauses, I use 0.40s, since then I don't have to wait as long between sentences.
How does it work?
This window demonstrates the settings that are relevant to the recording process. Let's review the way the program works.
- The program continuously waits for a word to begin. It decides that a word begins when the input level exceeds a given threshold (the "start level", shown at point #1).
- When a word has started, the program waits for a silence. The program considers that there is a silence when the input level is low enough to be attributed to residual noise (i.e. when the input level goes below the "Max Noise Level" threshold, for example at point #2).
- During the recording of a sentence, there can be silences between words. To record a full sentence, the system has to distinguish between silences in the middle of the sentence and the final silence at the end of the sentence. The criterion is simple: when a silence exceeds a given length (say, 0.5s or 1s), it is considered to be a final silence, and the word/sentence is saved. When a silence is detected, a plain vertical line is drawn to show the point at which the program will consider the silence final (#6).
- When the program decides to save a word/sentence, it saves not only the grey "word" itself (#4), but also a short stretch of time before the word starts and after it stops (the two hatched zones, #3).
- If the input level goes above threshold #7 (the horizontal line), the program considers the input to be saturated, and you will have to record the word again. Speak a bit more softly, or move your microphone farther from your mouth.
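The steps above can be sketched as a small state machine. This is a minimal Python sketch, not the Shtooka Recorder's actual code: it assumes the input has already been reduced to one level per block (0.0 to 1.0), and the names `Settings` and `segment` are hypothetical. It waits for a block above the start level, treats blocks below the max noise level as silence, ends a word once that silence lasts long enough, and discards any word that crossed the saturation threshold.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    start_level: float         # threshold #1: a word starts above this level
    max_noise_level: float     # threshold #2: below this level counts as silence
    saturation_level: float    # threshold #7: above this level the input is saturated
    final_silence_blocks: int  # silent blocks in a row that end a word/sentence


def segment(levels, s):
    """Split per-block input levels into words.

    Returns (start, end) block indices with `end` exclusive. Words that go
    above the saturation threshold are dropped, since they must be re-recorded.
    A word still in progress when the input ends is ignored.
    """
    words = []
    start = None      # block index where the current word began, or None
    silent_run = 0    # consecutive silent blocks inside the current word
    saturated = False
    for i, level in enumerate(levels):
        if start is None:
            # Waiting for a word: only the start threshold matters here.
            if level > s.start_level:
                start, silent_run = i, 0
                saturated = level > s.saturation_level
            continue
        if level > s.saturation_level:
            saturated = True
        if level < s.max_noise_level:
            silent_run += 1
            if silent_run >= s.final_silence_blocks:
                # The word ended at the first block of this final silence.
                if not saturated:
                    words.append((start, i - silent_run + 1))
                start = None
        else:
            # Sound again: the earlier pause was mid-sentence, not final.
            silent_run = 0
    return words


# One pause (block 4) inside the sentence, then a 3-block final silence.
levels = [0.0, 0.0, 0.5, 0.6, 0.01, 0.4, 0.01, 0.01, 0.01, 0.0]
print(segment(levels, Settings(0.3, 0.05, 0.95, 3)))  # [(2, 6)]
```

Because the silence counter resets whenever sound returns, the short pause at block 4 stays inside the sentence instead of ending it. That is why a longer final silence lets you record items containing pauses.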
The "Block Length" Parameter
This sets the time represented by a single block in the live "sound graph" diagram, and the duration over which the program decides between "sound" and "silence". If you want finer granularity, make it smaller; otherwise, 0.05s is a good choice.
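To illustrate what a "block" is, here is a minimal Python sketch that turns raw samples into one level per block. It assumes samples are floats between -1.0 and 1.0 and that a block's level is its peak amplitude; the real program may measure loudness differently (for example, with RMS), and `block_levels` is a hypothetical name.

```python
def block_levels(samples, sample_rate, block_length=0.05):
    """Return the peak absolute level of each block_length-second block.

    `samples` are assumed to be floats in [-1.0, 1.0]; the last block may be
    shorter than the others.
    """
    # round() keeps e.g. 44100 * 0.05 from being truncated by float error.
    block_size = max(1, round(sample_rate * block_length))
    return [
        max(abs(x) for x in samples[i:i + block_size])
        for i in range(0, len(samples), block_size)
    ]


# 100 samples/s with 0.05s blocks gives 5 samples per block.
print(block_levels([0.1, -0.2, 0, 0, 0, 0.5, 0, 0, 0, -0.7, 0.3], 100))
# [0.2, 0.7, 0.3]
```

A smaller block length produces more, shorter blocks, which is the "finer granularity" mentioned above.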
The "Margin Before" Parameter
This sets the time to be included in the recording before the first "sound" block. It should not be less than "Block Length", and it should usually give a listener enough time to start paying attention after clicking "playback". (This is the duration of the left hatched zone, #3.)
The "Margin After" Parameter
This sets the time to be included in the recording after the last "sound" block. It can serve as a "buffer" of silence before the next recording is played. (This is the duration of the right hatched zone, #3.)
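A minimal sketch of how the two margins widen the saved audio, assuming the margins are given in seconds and the word boundaries in block indices. `saved_range` is a hypothetical helper, not part of the Shtooka Recorder; it also rejects a "Margin Before" shorter than "Block Length", as recommended above.

```python
def saved_range(start_block, end_block, block_length, margin_before,
                margin_after, sample_rate, total_samples):
    """Return (first, last) sample indices to save, `last` exclusive.

    The detected word spans blocks [start_block, end_block); the margins
    (in seconds) extend it on both sides, clamped to the recording.
    """
    if margin_before < block_length:
        raise ValueError('"Margin Before" should not be less than "Block Length"')
    start_time = start_block * block_length - margin_before
    end_time = end_block * block_length + margin_after
    first = max(0, round(start_time * sample_rate))
    last = min(total_samples, round(end_time * sample_rate))
    return first, last


# Word in blocks 10..29 (0.5s to 1.5s), with 0.2s before and 0.3s after.
print(saved_range(10, 30, 0.05, 0.2, 0.3, 1000, 10_000))  # (300, 1800)
```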
The "Final Silence" Parameter
This sets how long the program waits after the end of the word (#6) before saving it. For simple words, you can set it to 0.5s; for whole sentences, set it to 1s or 1.5s.
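Since sound and silence are decided one block at a time, a "Final Silence" value effectively becomes a number of consecutive silent blocks. This small sketch assumes the default 0.05s block and converts the values suggested in this guide; `silence_blocks` is a hypothetical name, and the exact rounding the program uses is an assumption.

```python
def silence_blocks(final_silence, block_length=0.05):
    """Number of consecutive silent blocks that make up the final silence."""
    # round() rather than int() so a float quotient like 29.999... is not cut to 29.
    return max(1, round(final_silence / block_length))


for seconds in (0.4, 0.5, 0.8, 1.0, 1.5):
    print(seconds, silence_blocks(seconds))
# 0.4 8
# 0.5 10
# 0.8 16
# 1.0 20
# 1.5 30
```

Each extra block is another 0.05s you wait after every item, which is why a shorter final silence speeds up a session of single sentences.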
The "Minimum Length" Parameter
At the end of the word, if the total time is less than "Minimum Length", the program will not save the buffer. This parameter helps you avoid recording stray noises.
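A minimal sketch of the "Minimum Length" check, assuming the length is measured on the detected sound only (without the margins) and counted in whole blocks. `drop_short` and the 0.2s minimum are illustrative assumptions, not Shtooka defaults.

```python
def drop_short(words, block_length, minimum_length):
    """Keep only (start, end) block ranges lasting at least minimum_length seconds."""
    min_blocks = round(minimum_length / block_length)
    return [(start, end) for start, end in words if end - start >= min_blocks]


# With 0.05s blocks and a 0.2s minimum, a 2-block (0.1s) click is dropped.
print(drop_short([(3, 5), (10, 40)], 0.05, 0.2))  # [(10, 40)]
```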
The "Starting Threshold" Parameter
This sets level #1, the minimum loudness that triggers the beginning of a word or sentence.
The "Max Noise Level" Parameter
Sets the #2 threshold. Set it as low as you can. If this level is too high, the program will stop before the end of words!
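"As low as you can" still has a floor: a block counts as silence only when its level is below "Max Noise Level", so if the threshold sits below your room noise, the final silence would never be detected. One way to pick a value, sketched here under stated assumptions, is to record a few seconds of silence and set the threshold slightly above the loudest noise block. The `headroom` factor of 1.2 and the function name are hypothetical, not Shtooka defaults.

```python
def suggest_max_noise_level(silence_levels, headroom=1.2):
    """Suggest a "Max Noise Level" just above the loudest block of room noise.

    `silence_levels` are per-block levels recorded while nobody is speaking.
    `headroom` is a hypothetical safety factor, not a Shtooka setting.
    """
    if not silence_levels:
        raise ValueError("record a few seconds of silence first")
    return max(silence_levels) * headroom


print(round(suggest_max_noise_level([0.010, 0.020, 0.015]), 3))  # 0.024
```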
The "Saturation Threshold" Parameter
This sets the #7 threshold. Try speaking very loudly into your microphone to determine the saturation level of your audio system, and set this parameter a little lower.
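Following the advice above, a minimal sketch: take the loudest block level you reach while speaking very loudly and set the threshold a little lower. The 0.9 factor and both function names are hypothetical; `is_saturated` mirrors the rule that a recording must be redone when the input goes above threshold #7.

```python
def suggest_saturation_threshold(loud_levels, margin=0.9):
    """Set the threshold a little below the loudest level seen while shouting.

    `loud_levels` are per-block levels recorded while speaking very loudly;
    `margin` is a hypothetical factor standing in for "a little lower".
    """
    return max(loud_levels) * margin


def is_saturated(levels, threshold):
    """True if any block of a recording goes above the saturation threshold."""
    return any(level > threshold for level in levels)


threshold = suggest_saturation_threshold([0.50, 0.98, 0.70])
print(round(threshold, 3))                       # 0.882
print(is_saturated([0.2, 0.6, 0.9], threshold))  # True
```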
The original documentation is also available at web.archive.org:
- English: http://web.archive.org/web/20110722082618/http://shtooka.net/soft/shtooka_recorder/en/
- French: http://web.archive.org/web/20110621160010/http://shtooka.net/soft/shtooka_recorder/fr/
- Use the best external microphone that you have or can borrow from someone.
- Built-in microphones often pick up noise from the hard disk or the fan.
- The higher the quality of the microphone, the higher the quality of your recordings.
- Before sending your audio files to the Tatoeba Project, listen to all of them (maybe twice), and throw out the ones that don't sound natural or that have unwanted noises. I suggest using VLC (free at www.videolan.org) rather than the Shtooka Recorder, since I think it's easier and faster. VLC can play FLAC files.
- Read the Tatoeba Project's blog post about this.
YouTube Video
Skip to 0:32 if you've already downloaded the Shtooka Recorder.
Created by tatoeba.org/user/profile/AmberShadow, I think.
Linux Source
Swac-Record: swac-record is a program written in C++ with Qt that allows systematic recording of words or expressions.
Find Some "Packs" of Words and Sentences
- Nico is a Tatoeba Project member who is involved with the Shtooka Project.
If you are asking CK to help you, then ...
- Paste the sentences as formatted by me (CK) into the Shtooka Recorder.
- Record the sentences.
- DO NOT record any sentences that don't quite sound natural to you.
- (Even if they are perfectly good "written" sentences, it might be good to limit audio to things we actually say.)
- Send the FLAC files to firstname.lastname@example.org.