Speech Science, Technology and Innovative Applications

— an intensive course for students at IIIS of Tsinghua University

Spring 2015

Instructors: P.C. Ching, Tan Lee, Ken Ma, Helen Meng, and William S.-Y. Wang

Students, please contact Prof. Tan Lee, tanlee@ee.cuhk.edu.hk, for more information and inquiries.

Course project

Notes:

Students will work on the course project in groups of two.
Students are expected to form groups and start working on the project as early as possible. A list of suggested projects is given below. Students are also encouraged to propose their own projects.
Students are strongly encouraged to communicate closely with the respective professor about the project details.
The assessment of projects will be based on a project proposal, a written report and an oral presentation.
For general questions, please contact Prof. Tan Lee (tanlee@ee.cuhk.edu.hk).

Suggested projects:

(1) The integers are named differently in Chinese and English. Select another major language (for example, French, German, Japanese, Korean, or …) accessible to you. Observe how speakers of these three languages remember long telephone numbers and do multiplications, with and without writing. Does the linguistic difference in names influence their numerical ability in any way? Suggest some detailed experiments to verify your hypotheses. (Prof. William S.-Y. Wang, wshiyuanw@gmail.com)

(2) The bulk of our knowledge on language, behavior and the brain is based on Western languages, couched in Western cultures. [See Henrich, J. et al. 2010. Most people are not WEIRD. Nature 466.29.] Chinese is distinct from the West in many ways, in both language structure and cultural development. It is generally believed that language shapes thought. [See Boroditsky, L. 2011. How Language Shapes Thought. Scientific American 63-5, February issue.] Review the various scientific issues here from a Chinese viewpoint. (Prof. William S.-Y. Wang, wshiyuanw@gmail.com)

(3) Chinese is distinct in the spoken language in having lexical tones and in the written language in using sinograms. Using any brain imaging method you have access to, perform an experiment that yields new knowledge on the Chinese language. For general background, see the excellent review, Friederici, A.D. 2011. The Brain Basis of Language Processing: From Structure to Function. Physiol Rev 91.1357-92. (Prof. William S.-Y. Wang, wshiyuanw@gmail.com)

(4) Design an adaptive filter to keep track of the time-varying properties of a speech signal. (Prof. P.C. Ching, pcching@ee.cuhk.edu.hk)

(5) Design and implement a low-bit-rate speech codec at 2.4 kbit/sec. (Prof. P.C. Ching, pcching@ee.cuhk.edu.hk)

(6) Suggest methods to time-encode a speech signal. What are the challenges of this problem? (Prof. P.C. Ching, pcching@ee.cuhk.edu.hk)

(7) Spoken language recognition/classification. The basic idea is to tokenize input speech into a sequence of language-independent sound units. The arrangement of these sound units follows different rules in different languages. By capturing and modeling these rules, language recognition can be achieved. (Prof. Tan Lee, tanlee@ee.cuhk.edu.hk)

(8) Music transcription. Like speech, a music signal can be seen as a sequence of sound units, e.g., musical notes. Each note is described by its time position and pitch. Automatic music transcription is the process of locating and identifying the notes in a music signal. (Prof. Tan Lee, tanlee@ee.cuhk.edu.hk)

(9) Seeing the sound. Different sounds, including speech, music and noise, have different properties. Can you design a system to represent sounds by images? In this way, one can “listen by seeing”. (Prof. Tan Lee, tanlee@ee.cuhk.edu.hk)

(10) Dialogue modeling. Design and implement an interactive, mixed-initiative dialogue model (for text-based input and output) related to general inquiries about the Yao class. (Prof. Helen Meng, hmmeng@se.cuhk.edu.hk)

(11) Beamforming of speech (Prof. Ken Ma, wkma@ee.cuhk.edu.hk)

(12) Signal restoration based on compressive sensing and sparsity-based techniques (Prof. Ken Ma, wkma@ee.cuhk.edu.hk)

(13) Blind separation of speech sources (Prof. Ken Ma, wkma@ee.cuhk.edu.hk)
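
Illustration for project (7): the idea of modeling how sound units are arranged in each language can be sketched with a simple bigram model over token sequences. This is only a toy example under stated assumptions, not a prescribed method for the project. It assumes the speech has already been tokenized into sound-unit labels by some front end (not shown), and the vocabulary, the training sequences, and the names `train_bigram`, `score` and `classify` are all made up for illustration.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Return a function giving add-one-smoothed bigram log-probabilities."""
    pair_counts = Counter()     # counts of (previous unit, current unit)
    context_counts = Counter()  # counts of the previous unit alone
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of an utterance
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        # Add-one smoothing keeps unseen unit pairs from getting zero probability.
        return math.log((pair_counts[(prev, cur)] + 1) / (context_counts[prev] + v))

    return logprob

def score(logprob, seq):
    """Log-likelihood of a token sequence under one language's bigram model."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))

def classify(models, seq):
    """Pick the language whose model gives the sequence the highest score."""
    return max(models, key=lambda lang: score(models[lang], seq))

# Hypothetical toy data: three sound units and two made-up "languages".
VOCAB = ["a", "b", "c"]
TRAIN = {
    "lang_x": [list("ababab"), list("abab"), list("babab")],
    "lang_y": [list("accacc"), list("ccacca"), list("acca")],
}
MODELS = {lang: train_bigram(seqs, VOCAB) for lang, seqs in TRAIN.items()}

if __name__ == "__main__":
    for test in ("abab", "cacc"):
        print(test, "->", classify(MODELS, list(test)))
```

In a real system the tokens would come from a phone or sound-unit recognizer, and the sequence models would typically be higher-order or discriminatively trained; the bigram model here only shows the scoring principle.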