COMMUNICATIONS
ON THREE CLASSES OF SOUND SPACES FOR SONIFICATION SYSTEMS DESIGN
DOI 10.24411/2072-8735-2018-10012
Gleb G. Rogozinsky,
Bonch-Bruevich St. Petersburg State University of Telecommunications, St. Petersburg, Russia;
Solomenko Institute of Transport Problems of the Russian Academy of Sciences,
[email protected]

Keywords: sonification, sound space, phonographic space, auditory displays, multi-domain model
Sonification, i.e. the family of methods for representing data through non-speech sounds, traces its modern history to the works of Kramer [1], Barrass [2], Edworthy [3] and others in the late 1990s. The first documented examples of using sound to understand data, however, can be traced to Mesopotamian civilizations (around 3500 BCE), where auditors compared the soundness of independently scribed accounts of commodities moving in, out of and remaining in warehouses, to ensure that the ruler was not being cheated [4].
Systems of relatively low complexity and small informational thesauri can obviously be sonified through generic approaches. The situation changes when we deal with a variety of complex systems, i.e. systems of systems, whose thesauri may contain many elements with different hierarchies and dynamics. In the era of Industry 4.0, technological and informational systems are more complex and faster-growing than ever before. Despite the high level of intelligence of state-of-the-art systems, we still need human operators in situation centers and in control of critical operations. At the same time, the informational overload on the operator increases together with the data flow inside the systems and the informational capacity of the systems. This demands novel approaches to workspace design, including multimodal human-machine interfaces. Another important aspect leading us to the further study of sonification is the problem of improving accessibility for visually impaired people. These aspects account for the development of polymodal and polysensory human-machine interfaces. Today's infocommunication and technological landscape comprises a vast variety of such systems. Thus, we need to search for unified approaches and methods of sonification, i.e. we need a common terminology for developing sonification methods and for describing the theoretical and practical limits of what we can actually hear. Such a study can be initiated from different starting points, but one of the most important is the definition of what we can really hear in an exact situation, i.e. for a given speaker setting and overall acoustics, and the formalization of that definition. The multiple methods of representing data as non-speech sounds, known under the common notion of sonification, attract growing interest from the developers of monitoring systems and displays. The auditory representation of data is widely used by media artists and, more recently, by designers of monitoring systems. The paper presents three different classes of sound spaces, which can be used for the universal description of any sound object in a sonification system thesaurus. The sound spaces are defined in the terms of the Modified Multi-Domain Model.
Information about author:
Gleb G. Rogozinsky, PhD, Deputy Head of Medialabs, The Bonch-Bruevich St. Petersburg State University of Telecommunications, St. Petersburg, Russia;
Principal Researcher, Solomenko Institute of Transport Problems of the Russian Academy of Sciences
For citation:
Rogozinsky G.G. (2018). On three classes of sound spaces for sonification systems design. T-Comm, vol. 12, no.1, pр. 59-64.
1 The Object-to-Listener Chain Links
The thesaurus design for any sonification system depends on many factors. That is to say, it is not only an issue of pure sound design research, but also a subject of multiple limitations, which have to be estimated and accounted for to prevent the loss of meaningful information, or its undesired misrepresentation.
The central unit of any sonification system is the human operator, or listener. Thus, a properly designed sonification system should conform to the operator's hearing characteristics. The latter are well known from the research of Zwicker [5], Beranek and other acousticians and physiologists. Following them, we typically outline the ability to sense the intensity of a signal, which correlates with the volume of the sound; the ability to sense frequency, which correlates with pitch; and the ability to localize the signal's source in space thanks to our binaural hearing. We are also able to analyze the complex structure of the signal, which correlates with its timbre.
For an auditory display, we conditionally outline five source-to-target links between the object (physical or virtual) and the listener: object - generator, generator - display, display - medium, medium - listener's sensory system (LSS), LSS - listener's cognition. Each link transforms the thesaurus of the information carried by the sound.
If we assume the existence of some object A as a thing-in-itself with its own thesaurus, which includes all possible states of the given object, we can define it as $(A)^{\Theta_A}$, i.e. object A represented in its own thesaurus $\Theta_A$. Every transfer of the object's representation from one medium to another, or from one domain to another, e.g. from the physical domain to the informational one, implies the object's representation in another thesaurus, e.g. $\left(A^{ID}\right)^{\Theta_B}$ for the representation of A in the informational domain.
The Multi-Domain Model of communications (MDM), first defined by Sotnikov [6] and then further developed by the author [7], provides a universal solution for describing the various processes happening in the different domains, i.e. the Physical (PD), Informational (ID) and Cognitive (CD) domains. The model provides a flexible and abstract approach to describing the transformation of information inside and on the edges of the domains. Using such an approach, we can describe any transformation of a signal or object at any step.
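The MDM bookkeeping can be made concrete in code. The following is a minimal Python sketch of domain-and-thesaurus representations and transfers; the names Domain, Entity and transfer are illustrative assumptions, not notation prescribed by the MDM itself.

```python
# A minimal sketch of MDM-style bookkeeping; Domain, Entity and transfer()
# are hypothetical names chosen for this illustration.
from dataclasses import dataclass
from enum import Enum

class Domain(Enum):
    PD = "physical"       # Physical Domain
    ID = "informational"  # Informational Domain
    CD = "cognitive"      # Cognitive Domain

@dataclass(frozen=True)
class Entity:
    name: str        # the object A or signal S being represented
    domain: Domain   # the domain the representation lives in
    thesaurus: str   # label of the thesaurus used for the representation

def transfer(e: Entity, domain: Domain, thesaurus: str) -> Entity:
    """Represent the same object in another domain and/or thesaurus."""
    return Entity(e.name, domain, thesaurus)

# Object-to-generator path for a physical object, cf. eq. (2) below:
a_pd = Entity("A", Domain.PD, "theta_A")            # thing-in-itself
a_id = transfer(a_pd, Domain.ID, "theta_1")         # informational representation
a_sonif = transfer(a_id, Domain.ID, "theta_2")      # adapted for sonification
```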
First, the object-to-generator path is divided into two cases. If the sonified object is a part of the PD, the process of mapping its thesaurus onto the sound generator is described by (2):

$\left(A^{PD}\right)^{\Theta_A} \rightarrow \left(A^{ID}\right)^{\Theta_1} \rightarrow \left(A^{ID}\right)^{\Theta_2}$. (2)
So, firstly, an object A of the PD should be transformed into the corresponding informational representation, or the entity of the ID $\left(A^{ID}\right)^{\Theta_1}$, given in the informational thesaurus $\Theta_1$. Next comes the change of thesaurus within the ID, i.e. the initial informational representation of the physical object is adapted to the purposes of sonification.
In the second case, where the object is a virtual-only entity and exists in a cyber-world, (2) becomes (3):

$\left(A^{ID}\right)^{\Theta_1} \rightarrow \left(A^{ID}\right)^{\Theta_2}$. (3)
The second link, or the generator - display link, describes the signal degradation on the path to the speaker or auditory display. Typically, modern digital equipment will not induce any significant distortion, though in the case of long lines or induced noise the initial message can be subject to misrepresentation. This operation takes place completely in the ID.
The third link represents the technical acoustical limitations of a speaker system. For instance, if the developed auditory display is constructed as a halo-like body connected to the headrest of an operator's chair, with a group of small speakers inside, we should not rely much on the low frequencies due to the limited frequency response of such speakers, so the thesaurus objects should be carefully designed and filtered. The same applies to wearable units with small-sized speakers. From the sound design position, such a procedure can be understood as a special-purpose sound mastering task. From the MDM point of view, it is yet another intradomain operation.
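As a hedged illustration of that mastering step, the sketch below high-passes a cue so that no information rides below an assumed lowest reproducible frequency of the display; the 200 Hz cutoff and the function name fit_to_display are assumptions for illustration only.

```python
# Sketch: remove energy the small-speaker display cannot reproduce anyway,
# so that no meaningful cue is encoded below the assumed cutoff.
import numpy as np
from scipy.signal import butter, sosfilt

def fit_to_display(x: np.ndarray, fs: float, f_low: float = 200.0) -> np.ndarray:
    """High-pass the cue at the display's lowest reproducible frequency."""
    sos = butter(4, f_low, btype="highpass", fs=fs, output="sos")
    return sosfilt(sos, x)

fs = 44100
t = np.arange(fs) / fs
cue = np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
safe_cue = fit_to_display(cue, fs)  # the 80 Hz component is filtered out
```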
The fourth link, i.e. medium - LSS, represents the aspects of architectural acoustics, which include reverberation, the absorption of sound, the speaker and listener positions, etc. At this stage, the sounds played through the speakers can lose their legibility and sharpness: the high frequency bands can be substantially attenuated by absorbing surfaces and people, and the lows can be misheard due to standing waves or an unfavourable position of the listener. Significant localization problems can also arise from incorrect speaker or listener placement. At this stage, we should also consider the ergonomic limitations caused by the acoustic peculiarities of the operator's working space, i.e. the ambient noise level, etc. If the working place is polluted by noise, it can mask subtle sounds.
The psychoacoustic limitations include several limitations caused by the human auditory system, including different types of masking. Besides masking, the partial hearing loss of the adult ear should also be considered [5]. Such limitations demand careful design of the sound set, especially in the case of simultaneous playback of several sounds.
All the latter links, i.e. generator - display, display - medium and medium - LSS, correspond to processes or transformations which take place in the ID. We could group those links under some higher-level abstraction, but we prefer to name all principal links on the path from object to listener's cognition, since some of the limitations described later are localized in particular links.
So, we can formally define the described operations through the change of the initial thesaurus of the ID entity:

$\left(A^{ID}\right)^{\Theta_2} \rightarrow \left(A^{ID}\right)^{\Theta_3} \rightarrow \left(A^{ID}\right)^{\Theta_4}$. (4)
The final LSS-to-listener's-brain link brings the signal to the CD, where the user finally recognizes its meaning, i.e. the initial signal or object, after being routed through the links of different domains and transforms, is represented in the user thesaurus $\Theta_u$ of the CD:

$\left(A^{ID}\right)^{\Theta_4} \rightarrow \left(A^{CD}\right)^{\Theta_u}$. (5)
2 The Search for the Sound Spaces
Having the complete path for the sonification formulated in the terms of the MDM, the next question is whether we can define the space, or the set of spaces, which will theoretically include all possible sound objects for sonification, or whether we can have some metrics to define the set of objects, depending on the exact task.
From the naïve point of view, the most straightforward way of obtaining such sound spaces is to rely on the well-defined Amplitude-Frequency-Time (AFT) space class. Such a class includes various windowed transforms, like the Short-Time Fourier Transform, and compact-support transforms, like the classic Continuous and Discrete Wavelet Transforms and their further modifications.
There are no subjective descriptors included in such a sound space class, and modern computer analysis systems can provide a detailed technical description of the analyzed signal. Although these spaces can represent the most common features of sound objects, they have several issues related to their crisp values. In addition, they typically do not include a spatial dimension, which is of high importance in such systems as auditory displays, as described previously.
Pros: AFT class is able to precisely describe the signal in technical terms.
Cons: the AFT class does not correlate with the subjective characteristics of the sound, nor does it account for the spatial position of a sound source.
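A minimal sketch of an AFT-class description, assuming NumPy and SciPy are available: the STFT yields a crisp amplitude for every (frequency, time) pair, and nothing subjective beyond that.

```python
# AFT class in practice: a windowed transform turns the time series into
# an amplitude surface A(f, t) with purely technical, crisp values.
import numpy as np
from scipy.signal import stft

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)   # decaying 440 Hz tone

f, tau, Z = stft(x, fs=fs, nperseg=1024)            # windowed transform
amplitude = np.abs(Z)                               # the AFT surface A(f, t)
print(f[np.argmax(amplitude.sum(axis=1))])          # strongest bin, ~440 Hz
                                                    # (bin spacing is ~43 Hz)
```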
Moreover, due to the non-linearity of our hearing system, the actual amplitude and frequency values do not correlate well with the sound object we hear. This implies the need to introduce some subjectivity. The closest answer leads to the early experiments of electronic and computer music researchers, such as Pierre Schaeffer [8], Iannis Xenakis [9], Karlheinz Stockhausen [10], and other theorists of electronic and computer sound.
In particular, Schaeffer, being a composer, engineer, musicologist, acoustician and the ideologist of musique concrète, systematically developed his own concept of including any sound into the music thesaurus, defined as a three-plane space, i.e. the Melodic plane, the Dynamic plane and the Harmonic plane.
The Melodic Plane described the pitch development over the course of a note; Schaeffer outlined three simple properties of pitch and four complex ones.
The Dynamic Plane represented the attack, the sustain and the decay of the sound event. The Harmonic Plane represented the frequency spectrum engaged by the given note. Also known as timbre, it can be poor (utilizing only one or very few frequencies) or rich (using many frequencies at different amplitudes); alternately, thick (a broad band of contiguous frequencies) or thin (a small band of contiguous frequencies). It also has the property of "color": brilliant (a large number of harmonics, with amplitude tapering slowly across them), bright (a small number of harmonics, slow amplitude taper), or dark (few harmonics, rapid amplitude drop after the first harmonic).
Schaeffer's class of sound spaces can be viewed as an attempt to subjectivize the AFT class, so it becomes the subjectivized Amplitude-Frequency-Time class (sAFT). Here we have literally the same AFT triad, but with human-level formalization, e.g. the sound envelope trades its precise AFT description for the symbolic attack-sustain-decay model.
Pros: the sAFT space class uses formal descriptions while introducing a level of subjectivity, which relaxes the rigidity of the strict AFT approach.
Cons: the set of descriptors extracted from the sAFT space is not sufficient to represent the huge variety of sounds. It also lacks any spatial description of the sound objects.
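A toy subjectivization step in the sAFT spirit, collapsing a crisp envelope into a symbolic attack label; the thresholds and the function name describe_envelope are assumptions for illustration, not Schaeffer's definitions.

```python
# Sketch: replace the precise AFT envelope with a symbolic sAFT-style label.
# The 20 ms / 200 ms thresholds are illustrative assumptions.
import numpy as np

def describe_envelope(x: np.ndarray, fs: float) -> str:
    env = np.abs(x)
    attack = int(np.argmax(env)) / fs        # time to reach the maximum
    if attack < 0.02:
        return "sharp attack"
    return "soft attack" if attack < 0.2 else "swelled"

fs = 44100
t = np.arange(fs) / fs
pluck = np.sin(2 * np.pi * 440 * t) * np.exp(-6 * t)
print(describe_envelope(pluck, fs))          # -> "sharp attack"
```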
At least two independent sources of theoretical study of sound mixing use quite a different interpretation of sound objects, based on a so-called visual analogy. Thus, David Gibson [11] proposes a Space-Frequency-Loudness space, while Viktor Dinov [12] uses phonographic and phonocoloristic spaces, developed for the same approach to mixing as positioning the sources of sound in a virtual visual space.
Neither of the spaces operates with the loudness parameter directly, but rather with loudness planes. Thus, Dinov defines 5 planes of loudness, or closeness of the sound source to the listener, while Gibson defines 6 planes. A plane correlates with the loudness, the frequency characteristics and the reverberation time. Thus, the actual resolution of the loudness-plane axis is just 5-6 points.
Another coordinate is the spatial position: Gibson uses a perspective box, while Dinov uses an azimuthal model. Both approaches mainly relate to stereo settings, though Dinov briefly describes the 5.1 Dolby Surround system as a top-view scheme with speaker directions and object placements [12]. In contrast to standardized speaker settings, sonification systems can have various configurations, depending on the ergonomics of the working place.
According to Zwicker, we can confidently locate five main directions, while for moving objects the localization precision increases up to about 3 degrees. Therefore, the actual resolution depends on the speaker positions and the head position. The best localization abilities are expected in the frontal plane. Moreover, it is natural for a subject to turn his or her head towards the object of interest, thereby increasing the localization resolution in that area. This feature can be used in sonification design.
As the third coordinate, Gibson defines the spectral width. The highs are subjectively related to a higher position, compared to the basses, which sit near the bottom. In addition, Dinov proposes the spatial width of the sound object. Such a feature also exists in the Gibson model and can be considered a fourth dimension of the model.
Pros: the phonographic space (PhG) is a subjective-friendly space based on the human sense of a sound. It also accounts for the spatial position of the sound object.
Cons: the space is rather empirical and designed mostly for practical purposes. It lacks precise numeric characteristics. The space is non-orthogonal: each of its coordinates depends on several parameters.
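A sketch of a PhG-style sound object merging Dinov's loudness planes with Gibson's coordinates; the field names and value ranges are assumptions for illustration, not definitions from either source.

```python
# Sketch: one possible data structure for a phonographic-space object.
from dataclasses import dataclass

@dataclass
class PhGObject:
    plane: int           # loudness/closeness plane, 1 (closest) .. 5 or 6
    azimuth_deg: float   # spatial position; 0 = front, negative = left
    height: float        # Gibson's vertical axis: 0 = basses .. 1 = highs
    width: float         # spatial width: 0 = point source .. 1 = diffuse

alarm = PhGObject(plane=1, azimuth_deg=0.0, height=0.8, width=0.1)
ambience = PhGObject(plane=5, azimuth_deg=-60.0, height=0.3, width=0.9)
```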
Table 1 gives a comparison of the three sound space classes.

Table 1
Features of three sound space classes

Class | Orthogonality | Subjectivity | Dimensions | Timbre Description | Spatialisation
AFT   | Yes           | No           | 3          | No                 | No
sAFT  | Yes           | Yes          | 3          | Yes                | No
PhG   | No            | Yes          | 4          | No                 | Yes
Using the MDM approach, we can formally describe the transforms which take place inside each of the given models. In the case of AFT, we have

$\left(S^{ID}\right)^{\Theta_t} \rightarrow \left(S^{ID}\right)^{\Theta_{tf}}$. (6)

The transformation path exists only within the ID: the initial thesaurus $\Theta_t$ of the signal S, which is typically a time series, is transformed into the AFT-class set of parameters $\Theta_{tf}$ using one of the time-frequency transforms.
In the case of sAFT, the previously obtained $\left(S^{ID}\right)^{\Theta_{tf}}$, i.e. the signal S in the time-frequency thesaurus $\Theta_{tf}$, undergoes subjectivization by the user in the CD. The user then provides an sAFT description of the object, changing its thesaurus and domain again:

$\left(S^{ID}\right)^{\Theta_{tf}} \rightarrow \left(S^{CD}\right)^{\Theta_u} \rightarrow \left(S^{ID}\right)^{\Theta_{sAFT}}$, (7)

where $\left(S^{ID}\right)^{\Theta_{tf}} \rightarrow \left(S^{CD}\right)^{\Theta_u}$ is a transfer to the CD, i.e. the comprehension, or subjectivization, of the AFT-described signal S in the user thesaurus $\Theta_u$, and $\left(S^{CD}\right)^{\Theta_u} \rightarrow \left(S^{ID}\right)^{\Theta_{sAFT}}$ defines a transfer back to the ID, after the user's (human) understanding of the signal in the CD.
The Dinov-Gibson class, or Phonographic class, is the result of subjectivization without additional transforms in the ID, i.e.

$\left(S^{ID}\right)^{\Theta_t} \rightarrow \left(S^{CD}\right)^{\Theta_u} \rightarrow \left(S^{ID}\right)^{\Theta_{PhG}}$. (8)
Figure 1 gives a visual representation of the given models.
Figure 1. Three sound space classes in the terms of the MDM
Therefore, any sound object can be completely defined, from the positions of amplitude/volume, frequency/pitch, panorama/stereo width and spectral width/timbre, by the combination of the three sound class descriptions, i.e.

$\forall S: \left(S\right)^{\Theta} = \left(S\right)^{\Theta_{AFT}} \cup \left(S\right)^{\Theta_{sAFT}} \cup \left(S\right)^{\Theta_{PhG}}$. (9)
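One way to hold an eq. (9)-style combined description in software is to carry all three class descriptions side by side; the sketch below uses illustrative field names and plain dictionaries, purely as an assumption about how such a record could look.

```python
# Sketch: a sound object carried with all three sound-space descriptions.
from dataclasses import dataclass

@dataclass
class SoundObject:
    aft: dict   # crisp technical parameters (STFT bins, centroid, etc.)
    saft: dict  # symbolic sAFT labels (attack type, color, thickness)
    phg: dict   # phonographic coordinates (plane, azimuth, height, width)

beep = SoundObject(
    aft={"centroid_hz": 440.0, "duration_s": 0.2},
    saft={"attack": "sharp", "color": "bright"},
    phg={"plane": 1, "azimuth_deg": 0.0, "width": 0.1},
)
```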
3 On Applying Limitations
This part of the paper gives several further considerations on the practical implementation of the described models. Here we propose some limitations to be considered during the design of a sonification system and for marking the real-world boundaries of the sound spaces.
On the frequency side, the human listener, due to the peculiarities of our hearing sensory system, cannot exactly localize low-end components. Meanwhile, sonification objects are typically polyharmonic, so the higher components provide better localization of a given sound object. An important issue here is frequency masking. From the position of psychoacoustics, our hearing is based on analysis carried out in 24 critical bands. Within the range of one critical band, the ear integrates the energy of the sound, so we hear only the strongest components within each band. It means that, to avoid frequency conflicts, we should create signals whose spectral centroids (at least) are placed in different critical bands. For the same reason, we should avoid the simultaneous use of two or more sound objects with close spectral centroids; otherwise, due to frequency masking, we will be able to hear only the louder one. Meanwhile, due to informational masking in our hearing system, it is impossible for our brain to analyze even 24 different sounds presented simultaneously [13]. Moreover, sounds with similar timbres should be avoided, at least in simultaneous playback. The described limitations belong to the listener's sensory system link of the full sonification path described above.
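The critical-band rule can be checked numerically. The sketch below uses a common Bark-scale approximation attributed to Zwicker [5]; treating the integer part of the Bark value as a band index is a deliberate simplification for illustration.

```python
# Sketch: flag two cues whose spectral centroids share a critical band.
import numpy as np

def bark(f_hz: float) -> float:
    """Frequency in Hz -> critical-band rate in Bark (Zwicker approximation)."""
    return 13 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500) ** 2)

def spectral_centroid(x: np.ndarray, fs: float) -> float:
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return float(np.sum(freqs * spec) / np.sum(spec))

def may_conflict(x1: np.ndarray, x2: np.ndarray, fs: float) -> bool:
    """True if both centroids fall into the same (crudely indexed) band."""
    b1 = int(bark(spectral_centroid(x1, fs)))
    b2 = int(bark(spectral_centroid(x2, fs)))
    return b1 == b2
```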
The number of perceivable bands is also subject to equipment or hardware limitations, i.e. to the display - medium link. If the sonification interface is based on a set of low-grade speakers, which cannot adequately reproduce even the lower-middle part of the range, the number of possible elements will be reduced even further.
Obviously, psychophysical limitations are rather constant. If we exclude hearing-impaired persons (who cannot use sonification interfaces, just as visually impaired people cannot use screens), the working age lies within the range of 16-60 years. Considering the worst case, for the sake of unification of such interfaces, we can assume worst-case hearing thresholds. This provides the main limitation for the high end of the frequency range. In addition, the sonification system can provide the user with a simple equalization unit, so that the user is able to make subtle adjustments according to his or her taste; using a visual analogy, this can be seen partly as the color balance of a screen display. The lower end of the frequency range will be limited by the equipment parameters.
The human operator's sense of comfortable volume will limit the affordable loudness level. It can be adjusted during the work time, but it will obviously stay below the upper limit. The low end of the volume range will be limited by the ambient noise level and the worst-case hearing threshold; the worse of the two cases provides the main limitation threshold.
The equipment setting will also cause spatialisation limitations. When, due to communication peculiarities, the sound generator is not co-located with the sonification displays, we should also account for the issues of media compression and transfer, i.e. the generator - display link. In that situation, a large number of channels can demand lossy coding, which can introduce additional errors.
From the position of time, in digital systems the theoretical minimum of time resolution is strictly defined by the sampling frequency, i.e. the sampling interval for 44100 Hz is about 2.27·10⁻⁵ s. Compared to that, the resolution of our hearing system is much coarser, e.g. the zone of pitch perception starts from 200 µs and goes up to 50 ms, while the recognition of an instrument's timbre requires approximately 100 ms [14].
Conclusion

We have proposed a systematized approach for defining the set of three sound space classes through the terms of the Multi-Domain Model of communications. The proposed set of sound space classes can be used at the design stage of sonification systems and complexes. It provides a set of abstractions to describe the possibilities of the system under development, and the sound objects within it.
We should also mention that the given approach can be formalized for automated measuring systems, which can be used to accelerate the development of sonification measurements and adjustments. The well-known worst-case human hearing limitations can be programmed as a set of weighting factors, so that, in combination with a Head-Related Transfer Function (HRTF) set, all possible measures can be taken automatically for a given speaker setting and room.
References

1. Kramer, G. (1994). An introduction to auditory display. In G. Kramer (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces. Reading, MA: Addison Wesley, pp. 1-78.
2. Barrass, S. (1994/2005). A perceptual framework for the auditory display of scientific data. ACM Transactions on Applied Perception, 2(4), 389-402.
3. Edworthy, J. (1998). Does sound help us to work better with machines? A commentary on Rautenberg's paper 'About the importance of auditory alarms during the operation of a plant simulator'. Interacting with Computers, 10, pp. 401-409.
4. Worrall, D. (2009). An Introduction to Data Sonification. The Oxford Handbook of Computer Music. Ed. Roger T. Dean. 624 p.
5. Zwicker, E. and Fastl, H. (1990). Psychoacoustics: Facts and Models. Springer-Verlag, Berlin-Heidelberg, Germany.
6. Sotnikov, A.D. (2003). Classification and Models of Applied Infocommunication Systems. Proceedings of Higher Educational Establishments in Communications, no. 169, pp. 149-162.
7. Sotnikov, A.D., Rogozinsky, G.G. (2017). The Multi-Domain Infocommunication Model as the Basis of an Auditory Interfaces Development for Multimedia Informational Systems. T-Comm, 5(11), pp. 77-82.
8. Schaeffer, P. (2012). In Search of a Concrete Music. University of California Press, 165 p.
9. Xenakis, I. (1992). Formalized Music: Thought and Mathematics in Composition. N.Y.
10. Stockhausen, K. (1989). Stockhausen on Music: Lectures and Interviews, edited by Robin Maconie. London and New York: Marion Boyars.
11. Gibson, D. (2005). The Art of Mixing: A Visual Guide to Recording, Engineering and Production. Artistpro, Boston, MA. 344 p.
12. Dinov, V.G. (2007). Zvukovaya kartina. Zapiski o zvukorezhissure [Sound Picture: Notes on Sound Engineering]. SPb, Helikon Plus. 488 p. (in Russian)
13. Leek, M.R., Brown, M.E., Dorman, M.F. (1991). Informational Masking and Auditory Attention. Perception & Psychophysics, Sep. 50(3), 205-14.
14. Roads, C. (2004). Microsound. MIT Press, Cambridge, MA. 424 p.