About Transcriber


About Transcriber Project

Speech Transcribers

Despite significant progress in speech recognition algorithms over recent decades, most real-world speech recognition work is still done by people. The basic tool for this work is the transcriber.



A transcriber is a device or program for documenting speech recordings (phonograms).


Typists and journalists use transcribers to convert speech phonograms into text documents. Such phonograms may be recordings of meetings, interviews, lectures, court sittings, conferences and so on. Foreign language teachers and their students use transcribers for listening practice. Philips has produced dedicated hardware transcribers for many years. Hardware transcribers work with physical recorders such as the one in the following picture.

Professional digital recorder "Nagra IV-S" produced by "Kudelski Group". Picture from http://www.nagra.com


Software transcribers use the multimedia capabilities of personal computers. Most software transcribers currently on the market are built on simple text editors such as WordPad; only the most advanced programs of this kind use Microsoft Word. In essence, a simple transcriber is a text editor supplemented with a sound player. The most advanced transcribers place sound labels (a special type of hyperlink) in the text document so that playback can start from an arbitrary time mark. Usually the "arbitrary time mark" corresponds to a specific event such as "meeting start", "appearance of a meeting participant" and so on.

Foot Control

Professional typists type 180 or more characters per minute and as a rule do not use a mouse. Instead they control the transcriber with a foot control such as the one in the following picture.


Three-button "Foot Control LFH 0110/90" from "Philips".

Picture from www.philips.com




Foot Control

Any electronic device that generates the standard modem signals "CTS", "DSR" or "DCD" and is connected to a COM port can be used as a foot control.
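As a rough illustration of this idea (not the Program's actual implementation, which is supplied as a separate library), the Python sketch below maps the three modem status lines to playback actions. The assignment of lines to actions and the names PedalAction and decode_pedals are assumptions made for the example. With the third-party pyserial package, the three booleans could be read from the cts, dsr and cd properties of an open serial port.

```python
from enum import Enum


class PedalAction(Enum):
    """Hypothetical playback actions for a three-button foot control."""
    REWIND = "rewind"
    PLAY = "play"
    FAST_FORWARD = "fast_forward"


# Assumed wiring: which modem status line each pedal asserts.
# A real device may be wired differently.
LINE_TO_ACTION = {
    "cts": PedalAction.REWIND,
    "dsr": PedalAction.PLAY,
    "dcd": PedalAction.FAST_FORWARD,
}


def decode_pedals(cts: bool, dsr: bool, dcd: bool) -> list[PedalAction]:
    """Return the action for every status line that is currently asserted."""
    states = {"cts": cts, "dsr": dsr, "dcd": dcd}
    return [LINE_TO_ACTION[name] for name, active in states.items() if active]
```

A transcriber would poll the port (or wait for a status change) and call decode_pedals with the current line states.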

Typical Users

The typical categories of transcriber users are

  • Typists

  • Journalists

  • Foreign language teachers and their students

  • Private detectives

Less typical users could, for example, store playlists directly in Microsoft Word documents.


This project integrates two great technologies: Microsoft Word and Windows Media Player (referred to below simply as the "player"). The template "AhWMPlayer2.dot" (the "Program") turns Microsoft Word into a fully functional digital transcriber: an audiotext editor in which professional typists can listen to phonograms and control playback while typing. Audiolabels give direct access to any part of any phonogram.


The Program has been tested with
Microsoft Windows XP SP2,
Microsoft Word 2003 (11.8026.8028) SP2 and
Windows Media Player v.

Files List

The package contains the following files:

| File | Location | Description |
|---|---|---|
| AhWMPlayer2.dot | MS Office Startup folder (usually "C:\Program Files\Microsoft Office\OFFICE11\Startup") | Template for working with sound documents in Microsoft Word. Should be located in the Microsoft Office Startup folder. |
| AhWMPlayer2.ini | The same | Program settings file. |
| AhPlayer2Rus.chm | The same | Help file in Russian. |
| AhPlayer2Eng.chm | The same | Help file in English. |
| Detochkin.mp3 | Any location | Sound sample file (Court Sitting). |
| Detochkin.doc | Any location | Sample sound document. |
| Dialog131.mp3 | Any location | Sound sample file (for English Lesson). |
| Dialog131.doc | Any location | Sample sound document. |

The following files may optionally be included in the demo version.

| File | Location | Description |
|---|---|---|
| "Transcriber. Installation Guide.doc" | Any location | Installation Guide. |
| AhPlayer_FC.dll | System folder (usually "c:\Windows\System32") | Foot Control support library. |

Demo Version Limitations

The demo version of the Program is freeware. An electronic dongle is not required. The main limitation of the demo version is the lack of some important features of professional transcribers: foot control and the special tempocorrection DSP plugin are not supported.


To set up the Program, perform the following steps:

1. Unzip the archive.

2. Copy the files "AhWMPlayer2.dot", "AhWMPlayer2.ini" and "AhPlayer2Eng.chm" to the Microsoft Office Startup folder (usually "C:\Program Files\Microsoft Office\OFFICE11\Startup").

3. If you need foot control support, copy the file "AhPlayer_FC.dll" to the system folder (usually "c:\windows\system32").

4. Launch or relaunch Microsoft Word.
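The copy steps can also be scripted. The following Python sketch is only an illustration under stated assumptions: the function name install and its parameters are hypothetical, and the folders are passed in by the caller rather than detected, because the Startup and system folders differ between installations.

```python
import shutil
from pathlib import Path

# File names taken from the setup steps.
STARTUP_FILES = ["AhWMPlayer2.dot", "AhWMPlayer2.ini", "AhPlayer2Eng.chm"]
FOOT_CONTROL_DLL = "AhPlayer_FC.dll"


def install(unzipped_dir, startup_dir, system_dir=None):
    """Copy the Program files from the unzipped archive.

    system_dir is optional: pass it only if foot control support is needed.
    Returns the list of destination paths that were written.
    """
    src = Path(unzipped_dir)
    copied = []
    for name in STARTUP_FILES:
        copied.append(Path(shutil.copy2(src / name, Path(startup_dir) / name)))
    if system_dir is not None:
        target = Path(system_dir) / FOOT_CONTROL_DLL
        copied.append(Path(shutil.copy2(src / FOOT_CONTROL_DLL, target)))
    return copied
```

Writing to the Office Startup folder or the system folder normally requires administrator rights, and Microsoft Word still has to be launched or relaunched afterwards.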



Audiolabels in the sample documents refer to media files located in the "C:\_AhPlayer" folder.

If you unzip the archive to drive C: (the folder "C:\_AhPlayer" is created automatically), the audiolabels in the sample documents remain valid and you will not need find-and-replace to update them.


The following message appears when Microsoft Word starts, because the Program uses the "Windows Media Player" ActiveX control.



Just press the "OK" button to continue.

Technical Support

The Program is supplied "as is"; no technical support is provided. The author will be glad to receive your feedback by e-mail.




Tempocorrection means slowing down or speeding up a phonogram without changing its pitch.


Tempocorrection algorithms have been used in professional software transcribers for about the last twenty years. More recently musicians have started using them as well; see http://www.ronimusic.com/.


Windows Media Player supports tempocorrection directly using menu item "View\Enhancements\Play Speed Settings...".


Moreover, the capabilities of Windows Media Player can be extended with special DSP plugins. The current version of the Program uses the player's built-in tempocorrection support.


Media File Formats

Windows Media Player v. 9 supports tempocorrection for the .wma, .wmv, .wm, .mp3, and .asf media file formats. However, tempocorrection may not be available when playing streaming or progressively downloaded media.

Unfortunately, tempocorrection is not supported for the WAV format.
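The format list above can be turned into a simple check. The Python sketch below is illustrative only; the function names are hypothetical, the check looks only at the file extension, and it does not cover streaming or progressively downloaded media. The second function shows the arithmetic behind tempocorrection: playing at half speed doubles the listening time.

```python
from pathlib import PurePath

# Extensions for which the text above says tempocorrection is supported.
TEMPO_FORMATS = {".wma", ".wmv", ".wm", ".mp3", ".asf"}


def supports_tempocorrection(media_path: str) -> bool:
    """Check the file extension against the supported format list."""
    return PurePath(media_path).suffix.lower() in TEMPO_FORMATS


def playback_seconds(duration_seconds: float, speed: float) -> float:
    """Listening time for a phonogram played at the given speed factor."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return duration_seconds / speed
```

For example, a 10-minute phonogram (600 seconds) played at speed 0.5 takes 1200 seconds to listen to.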

Audiolabels and Sound Documents



For simplicity we will call any WMP-compatible media file a phonogram, bearing in mind that even the sound track of a movie can serve as a sound source for transcribing, just like a WAV or MP3 file.

URL - Uniform Resource Locator


A URL is a compact representation of the location and access method for a resource located on the Internet. Each URL consists of a scheme (HTTP, HTTPS, FTP, or Gopher) and a scheme-specific string. This string can also include a combination of a directory path, search string, or name of the resource.

In other words, a URL is the string we usually type into the browser's "Address" box. In text documents URLs are usually displayed as hyperlinks, for example http://transcriber.narod.ru/1.mp3.
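To make the scheme and scheme-specific parts concrete, the short Python sketch below splits the example URL with the standard urllib.parse module. The helper name describe_url is hypothetical.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into its scheme, host and resource path."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


info = describe_url("http://transcriber.narod.ru/1.mp3")
# info["scheme"] is "http", info["host"] is "transcriber.narod.ru",
# and info["path"] is "/1.mp3"
```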

Streaming Technologies

The Program has not been tested with streaming sound, including internet radio stations.



An audiolabel is essentially a bookmark in a media file. It is used to start playback from a chosen time, which takes advantage of direct access to the media.


The basic audiolabel information items are listed in the following table.



| Item | Description |
|---|---|
| Media File Path | Media file path or URL. |
| Time Mark | The time mark can be in the absolute format (DD/MM/YYYY hh:mm:ss), in the relative format (hh:mm:ss), or in both. It is used to start playback from the chosen time. |
| Event Description | Arbitrary text: an event name, a person's name, etc. |
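The two time mark formats can be parsed as in the following Python sketch. It is only an illustration: the function names are hypothetical, and the idea of converting an absolute mark into an offset by subtracting the recording start time is an assumption, since the text does not say how the Program relates the two formats.

```python
from datetime import datetime, timedelta


def parse_relative(mark: str) -> timedelta:
    """Parse a relative time mark 'hh:mm:ss' into an offset from the start."""
    hours, minutes, seconds = (int(part) for part in mark.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_absolute(mark: str) -> datetime:
    """Parse an absolute time mark 'DD/MM/YYYY hh:mm:ss'."""
    return datetime.strptime(mark.strip(), "%d/%m/%Y %H:%M:%S")


def offset_from_absolute(mark: str, recording_start: str) -> timedelta:
    """Assumed conversion: absolute mark minus the absolute recording start."""
    return parse_absolute(mark) - parse_absolute(recording_start)
```

Under this assumption, an absolute mark of 15/03/2007 10:12:30 in a recording that started at 15/03/2007 10:00:00 corresponds to the relative mark 00:12:30.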


Because audiolabels must be distinguishable from ordinary document text and must be easy to change, use and delete, the best choice is to manage them with special Microsoft Word styles.

Using Styles for Managing Audiolabels

As stated above, special Microsoft Word styles are the best choice for managing audiolabels. Let's introduce the style conventions that will be used for this purpose.



The paragraph styles AhSoundLink and AhSoundText are used for inserting and deleting audiolabels and event descriptions respectively. Any user may change the definitions of both styles (all style parameters except name and type) in any way.


If the AhSoundLink or AhSoundText style is missing at the moment of insertion (for example, because the user has deleted it), the Program automatically creates the default style definition, as shown in the following pictures.


AhSoundLink style definition

AhSoundText style definition



An audiolabel is a set of two (or more) Microsoft Word paragraphs, the first of which is styled with the paragraph style AhSoundLink and the second with the paragraph style AhSoundText. An audiolabel (or part of it) is inserted into or deleted from the text document automatically when special commands are executed.


The following lines illustrate the audiolabel structure:

The paragraph containing the time mark and the media file path is styled with the AhSoundLink style.



AudioLabel Text here...

The paragraph containing the event description is styled with the AhSoundText style.
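The structure described above can be modelled as in the following Python sketch, which groups a sequence of styled paragraphs into audiolabels. It is a simplified model, not the Program's code: paragraphs are represented as (style name, text) pairs, the names Audiolabel and collect_audiolabels are hypothetical, and the layout of the text inside the AhSoundLink paragraph in the usage example is an assumption.

```python
from dataclasses import dataclass, field

LINK_STYLE = "AhSoundLink"
TEXT_STYLE = "AhSoundText"


@dataclass
class Audiolabel:
    link: str  # text of the AhSoundLink paragraph (time mark and media path)
    descriptions: list[str] = field(default_factory=list)


def collect_audiolabels(paragraphs):
    """Group (style, text) pairs into audiolabels.

    An AhSoundLink paragraph starts a new audiolabel; the AhSoundText
    paragraphs that directly follow it are its event description.
    """
    labels = []
    current = None
    for style, text in paragraphs:
        if style == LINK_STYLE:
            current = Audiolabel(link=text)
            labels.append(current)
        elif style == TEXT_STYLE and current is not None:
            current.descriptions.append(text)
        else:
            current = None  # ordinary text ends the current audiolabel
    return labels


# Usage example with an assumed link layout: "<time mark> <media path>".
document = [
    ("Normal", "Transcript of the sitting"),
    (LINK_STYLE, r"00:00:05 C:\_AhPlayer\Detochkin.mp3"),
    (TEXT_STYLE, "Meeting start"),
    ("Normal", "Ordinary transcript text."),
    (LINK_STYLE, r"00:01:10 C:\_AhPlayer\Detochkin.mp3"),
    (TEXT_STYLE, "Witness enters"),
]
labels = collect_audiolabels(document)
```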


Audiotext or Sound Documents

Audiotext Document

We will call a text document with audiolabels an Audiotext Document (or just a Sound Document).


Some examples of audiotext (or sound) documents are shown below.


Sound document "Detochkin.doc"


Sound document "Dialog131.doc"





Unless otherwise noted, all materials on this site are
© 2007-2010 Evgeny Akhundzhanov, All Rights Reserved Worldwide.
www.transcriber.ru | E-mail the Author