3D Sound Coding Color for the Visually Impaired
Round 1
Reviewer 1 Report
On line 30, you say 1 billion are sight impaired. This is a BIG overstatement. How many of those 1 billion actually cannot see a painting? I have sight impairment (wear glasses) but can definitely see a painting. You need to move this to a more correct number: how many people cannot adequately see a painting shown to them.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Dear Authors.
I have read your article, trying to understand your idea, methodology and results. I am also (just like you) interested in image understanding, I am working with binaural audio, I do research in beamforming, I was interested in synesthesia (my wife sings in a choir and she always converts her music scores into colouring books, painting notes with colours “because they become easier”). So I started reading your paper as an optimist.
Unfortunately your paper has some problems/challenges that need to be solved before publishing. I will try to do my best to show the issues that I have found, so that you would have the best possible position to improve your paper and have it published.
- Merit-based issues
- Language-implied issues
- Merit-based issues
- Why do you publish in “Electronics” journal? Your research is audio/video-oriented, and software-implemented, and museum/impaired-targeted …so why Electronics? You do not even mention any electronics/embedded system in your paper.
- The paper title is bad. //is very bad. I know it should be in the “English-related” part of review, but I struggled for over 2 minutes to decipher the title and I got it wrong. The title is the most important “marketing part” of a paper. I have to understand it immediately and to be attracted immediately, to stop “scrolling” and to click it, to read it. You can not mess up the title.
(The very first sentence of the chapter 5 sounds MUCH better than your title, and it is much easier to understand)
- I personally think, that 3D/binaural hearing is primarily intended for (and related to) direction and dimensions. I can understand the idea of rewiring our brains for other information meaning, but I think that hearing something “from left to right” induces some primal instincts, some awareness of a predator sneaking “from left to right” and makes me look left, to see the source of sound/speech. Your paper SHOULD convince me that I am wrong, that 3D can be a useful encoder, but your paper did not convince me. I have found a lot of numbers, five people giving good answers, and 32 pages, but I am still not convinced that your idea is good.
- Table 7 is very big. It has numbers, which are unclear. The numbers would be clear if I would read the previous paragraph carefully, but I didn’t. The numbers should be explained NEAR the table.
- Figure 4 has a header “distance correlations” but the values are greater than 1 ! How is that possible???
- Figure 5 is too wide. If it was in 3 rows (instead of 2) it would fit between margins.
- Figure 6a+6b is on another page than 6c+6d – you should not do that
- Figure 6a is not clear to me. I do not know its meaning and role.
- Figure 6b+6c is not clear to me. I do not know its meaning and role. Why do we see those pictures? Because they were in prototype? What does that tell me? Why is it important for me (/for a reader)?
- Figure 6d is not clear to me. I do not know its meaning and role.
- Chapter4, experiments1-3, were ONLY audio-based, right? The people did not see the images, and did not presume/guess the images, right? (This was not clarified in text)
- Experiments : that is very sad that you have only 5 people in each group. 50 people or 500 people would be better. “5 people” do not sound scientific/professional.
- The text-based part of the UX analysis (page 25 and 27) seem lengthy and boring. Opinion of every and each participant should be important to you, BUT WE (the readers) should have the luxury of reading the preprocessed observations and the summarized conclusions (not the raw input). Imagine THIS chapter when you have 500 participants…
- Language-implied issues
- [001] seriously? You should know this before writing ;)
- [002] Title is bad. 1. “3D Sound Coding Color” or “3D Sound Color Coding”? Is it “color of coding” or “coding of color”? 2. “for Visual Arts Appreciation” really? For appreciation? Why appreciation? For other not? If I see a painting but I do not like it, thus I do not appreciate it, then I can not use your method? 3. “of Person” oh, wait, it is “Appreciation of Person”, is it? 4. “Visually” is an adverb. Adverbs do not glue with nouns. Thus, “Visually Impairment” does not have a meaning.
- [002] Really. The sentence in [546] is much better. “3D Sound Color Coding using HRTF”. Or remixed with current title: “3D Sound Color Coding for Visually Impaired”. Or “3D Sound Color Coding for Visual Arts Descripting for Visually Impaired”.
- [006] This email seems invalid (are these two addresses? The link is joined)
- [007] Why “Sungkyunkwan” and not “SungkyunKwan” like the others?
- [013] “direction of appreciation” ??
- [019] “on THE mobile platform” – a particular one?
- [019] “3D audio description on the mobile platform” – why are you limiting your achievement here? Isn’t it a universal idea? Or only for mobile platforms? (Sorry: only for ONE particular (THE) mobile platform?)
- [019] “We” – 1st person is not unavoidable here.
- [019] “device” – I would argue, that your “product” is NOT a device. It might be idea, it might be software, but no hardware :(
- [021]=[019]
- [022] “3D sound effects” has a broader meaning than you need. It means also binaural/5.1 audio jingles for movies, and you do NOT need this meaning. So maybe it would be better to find a better name here
- [022] “leads to represent” – shouldn’t the “leads” word+form result in a gerund “-ing” form of the verb?
- [024] “colors including (…) colors and (…) colors” – there is a better way of talking about features of something
- [024] “a correlation (…) was found” – really? A correlation? The paper doesn’t present “correlation” – why?
- [024] “distance” – I do not remember reading about distance
- [033] “have been increased the accessibility” – please give the text to a native speaker before submitting
- [040] “Although many studies have been conducted on the cross sensation between the sight and other senses, there are not many studies on the cross sensation between the non-visual senses” – “visual+other=many ; nonvisual=rarely” – don’t you want to do the first one? So what’s the point of this statement?
- [052] “allowing users to interact (…) together” – it is unclear if they interact together or the senses are “used together” (but what would that mean? Smelling a speaker?)
- [050]..[054] – you have three sentences. One about “their map”, one about “a sliding gesture”, and one about “San Diego Museum”. I have completely no idea what you are talking about. Where is the main thread/topic of the statement/paragraph/chapter? Is the sliding gesture “inside” the map? The “San Diego” should be a new thought/paragraph? Or does it have something to do with previous sentences?
- [056] “you” – an unofficial reference to the reader of the paper. This should be avoided. The reader should be treated with respect (maybe he is not willing to touch, so then the “if you touch” would result in returning “false”: “no, I won’t touch”) – the sentence can (and should) be redesigned not to refer to the reader (by a “you”)
- [053] “The San Diego Museum of Art Talking Tactile Exhibit Panel” – what a long name. If those are two names (1=museum, 2=exhibit) then the distinction is not clear (you have to read the sentence twice to understand this, and this is never good if a reader needs to read sth twice)
- [060] “scent-operated device” – this means that: “a human operates → this device” by use of a scent / by presenting a scent to a device (why would someone do that?)
- [061] “Scent actuators that trigger mobile notifications by touch screen input or incoming text message” – I do not understand in this sentence: which one is the action and which is the reaction. 1. Scent actuators :trigger: notifications (so you mean sensors? Not actuators? Actuators are output not input) –or/and– 2. Text message is the result? Or the input? //// did you mean “Scent actuators triggered by mobile notifications”?
- [073] “binaural (…), which expresses the direction of the sound” – to express is to come with a meaning. So you say that binaural extends the reality by introducing the sound direction (which is true), WHILE the whole idea of your conception is to TAKE AWAY the information of the direction and completely replace it with encoding of colour. So finally the user will have ZERO knowledge of the direction of sound, because the “new direction” will represent colour not direction… So… the sentence in [073] is NO LONGER TRUE (or is it?)
- [076]= [021]= [019]
- [078] “synergizing” – really? You DO think that color and direction are “in nature” correlated? Instead you should admit that you will try to replace/re-encode the direction for an artificial meaning (of colour)
- [083] “ecologically critical function” – I do not see that being a part of the ecology domain
- [087] inconsistent timeline: “In (19) the sound is (…) and they felt (…)” – I would propose to stick to the past simple or use sth like “proved to be useful” if it is so necessary to underline the ongoing influence of (19)
- [090] “the three elements of color, that is, color hue, value, and chroma” – it sounds like 5 elements: “1.color”, “2.thatis”, “3.colorhue”, … You have too many commas and the beginning of the iterated list is unnatural. Instead indicate the beginning of the list with “:” : “the three elements of color, namely: color hue, value, and chroma”
- [092] “while getting the phonetic description of the things” – I really thought that by “things” you refer to the features (namely: color) that you want to replace :-) You should indicate more directly that you say about …“things”… (like: artworks)?
- [093] why “Related” is uppercase and/while “works” is lowercase?
- [095] “Review of tactile and sound approaches to coding color” ? / “Review of tactile and sound color coding approaches” ?
- [118] “and level 5 represents the pure hue” – hue can not be pure (or un-pure/dirty). Hue is just a digitized wavelength, and represents colour (always at maximum saturation). What/which colour? –it depends on the value of the hue. But hue does not represent “more red” or “less red” – those alterations are made by saturation and luminance (not hue)
- [140] I don’t recall PVI acronym to be explained beforehand
- [143]= [076]= [021]= [019]
- [144] HRTF not explained before
- [147] I do not understand what the “and ours” means
- [147] “ours” is unfortunately also in 1st person //=[019]
- [148] “for each hues” > “for each hue”/”for each of the hues”
- [148] “3 levels of achromatic)” < ?? //achromatic is an adjective, I would support a noun/subject here
- [148] “The first eight colours are divided into 20° angles” < colours? angles? how do you “divide colour into angles”? //did you mean some kind of a representation?
- [148] “colours” – in the other parts of the paper you use the “colors” form, please be consistent
- [148] “3 levels of achromatic)” //=[148]
- [148] “Our:” //=[147]
- [151] “the sound effects of the head” – this reminds me all the bad movies with “movie sound effects” related to various head movements or “uses”, e.g. cartoon-style “bang” sound of someone’s head hitting something –– I honestly do not know what this sentence is about, although I know HRTF more or less
- [182]=[180]=[148]
- [246]= [143]= [076]= [021]= [019]
- [250]= [246]= [143]= [076]= [021]= [019]
- [254] “and not accurate as” > “and not as accurate as”
- [266] “expressed by changing the sound of the pitch” – a pitch doesn’t have a sound. It is a sound that has a pitch. (( > “expressed by changing the pitch(…)”))
- [270] “The strategy complied with the definition” – why the past tense? It compliED? And now it does not comply?
- [270] “with the definition of light and dark colors” – what definitions? I do not see any definitions nor references to definitions
- [267] “rising 3 chromatic scales” – really !?!?!? a chromatic scale (in sound) is 12 semitones (keys) up, so you change pitch of the sound by 36 keys !? Why is it possible that I do not hear such a HUGE difference in the files provided?
- [274] “decreasing 3 chromatic scales” = [267]
- [272] “do not influence” – I totally disagree, human speech alternating by 6 octaves makes SO HUGE disruption that I (personally/subjectively) would not be able to focus on the text
- …
- [306] please check the “ ~ “ parts of the texts in the table, 5 of 11 are written carelessly
- [306] please try to fit this HUGE table on one page, it doesn’t deserve two pages :/
- [318]= [250]= [246]= [143]= [076]= [021]= [019]
- [312] “audios” – do you need the plural here? Audio is a communication form. It is not a chunk of data/information, but rather a communication channel. Why do you want to use it in plural here? Instead, write WHAT (if not “audios”) did the participants use. Audio files? Audio messages? Sound-encoded colours?
- [333]=[306]
- [336] “correlations” :/ where?
- [336] “Longth” (?)
- [336] “Sound valuables” – valuables? Like gold and diamonds?
- [340]= [318]= [250]= [246]= [143]= [076]= [021]= [019]
- [350] “We used the Android mobile application to represent.” – o.k., but what? Is this even a sentence?
- [351] “process of creating a 3D Sound mobile application for artwork.” – really? The “artwork” is the purpose? (I thought that the colour encoding)
- [353] “Production process” – this is not a “production process”. neither production, nor software production.
- [359]=[087] //”is”@355,”was”@359,”is”@360
- [365] “(…) step is to create a mobile application using Android Studio software. The basic production method is (…)” – you REALLY should not say so much about “production” (or about “process”) this word has a strong meaning, especially in software development and information systems design. It refers to software development workflow, you can read a little bit here: https://en.wikipedia.org/wiki/Kanban_(development) // I am sure you wanted to say something else, but I do not know what :(
- [366] “add (…) software” – what? How do you “add software” do you mean “install application”?
- [366] “add processed audio software” – I have no idea what “processed audio software” is. Did you mean “audio processing software”? but why would you want to install and what?
- [367] “used to create” == “their habit was to create” – did you mean “The artworks used during the prototype development” (?)
- [384]=[359]=[087] //”was”@383,”is(learn)”@384,”will”@386,”were”@387
- [389] “experimental participants” – LOL, so they are not real participants, not even beta-participants but they are “experimental” ? // did you mean “experiment participants”?
- [398]=[389]
- [403]=[398]=[389]
- [411] what is S3 ? //it is introduced AFTER this line
- [429]= [340]= [318]= [250]= [246]= [143]= [076]= [021]= [019]
- [460] ––– what is the difference in relation to ––– [456] ?
- …
- // the conclusion section is well-written :)
Personally, when I get a review, I like the reviews which help me to improve my writing style.
I hope that my review will help you improve your papers.
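The pitch arithmetic behind the [267], [274] and [272] comments above can be checked with a minimal sketch. It assumes 12-tone equal temperament and reads "3 chromatic scales" as 3 × 12 semitones, as the reviewer does; the manuscript's own definition may differ, and the function names are purely illustrative.

```python
# Sketch under stated assumptions: 12-tone equal temperament, and
# "3 chromatic scales" interpreted as 3 x 12 = 36 semitones.
SEMITONES_PER_OCTAVE = 12


def frequency_ratio(semitones: float) -> float:
    """Frequency ratio produced by shifting pitch by `semitones`."""
    return 2.0 ** (semitones / SEMITONES_PER_OCTAVE)


def octaves(semitones: float) -> float:
    """Size of a shift of `semitones`, expressed in octaves."""
    return semitones / SEMITONES_PER_OCTAVE


# One-directional shift discussed in [267] and [274].
shift = 3 * SEMITONES_PER_OCTAVE
# Total span between "3 scales down" and "3 scales up", as in [272].
span = 2 * shift

if __name__ == "__main__":
    print(f"shift: {shift} semitones, ratio {frequency_ratio(shift)}")
    print(f"span: {octaves(span)} octaves, ratio {frequency_ratio(span)}")
```

Under these assumptions a single shift multiplies the frequency by 8, and the full span from lowest to highest covers 6 octaves, which is why a shift of this size would be expected to be clearly audible.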
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
A concern about the work presented in this manuscript is the fact that
although the application is meant as an assistive device for people with
blindness, the usability testing was conducted with people with no
visual impairments. Consequently, the experiment results are not
representative of the target end-user group. Since the experiment cannot
change at this point, it is suggested that the authors provide a
justification for their decision to not involve this group of users and
to address this as a limitation of their study.
Regarding suggestions for improving this research work in the future, it
should be noted that the visual perception of Art is not just bound to
distance and colour but to a collection of different tools that the
artists are using to generate visual stimuli. These are, for example,
colour hue, colour value, texture, placement, size, contrast changes, cool
vs warm colours etc.
A better understanding of how these tools affect the visual perception of
art may in the future allow the authors to implement experiments that
employ new visual features that may help to achieve enhanced "visual
understanding" through sound.
Furthermore, it would be interesting to check whether the perception of
colour by people with synesthesia may affect the transition of colour
information to people with visual impairments.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf