Article

When Pitch Falls Short: Reinforcing Prosodic Boundaries to Signal Focus in Japanese

by Marta Ortega-Llebaria 1,2,* and Jun Nagao 3
1 Linguistics Department, University of Pittsburgh, Pittsburgh, PA 15260, USA
2 Learning Research & Development Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
3 Faculty of Humanities, Gifu Shotoku Gakuen University, Gifu 501-6194, Japan
* Author to whom correspondence should be addressed.
Languages 2025, 10(9), 242; https://doi.org/10.3390/languages10090242
Submission received: 15 June 2025 / Revised: 11 September 2025 / Accepted: 15 September 2025 / Published: 20 September 2025
(This article belongs to the Special Issue Research on Articulation and Prosodic Structure)

Abstract

This production study examines how Japanese speakers mark information structure through an Edge-Reinforcing Strategy, a prosodic system that signals focus via boundary-based cues, independently of lexical pitch accent or phrasing constraints. While many Japanese dialects mark focus with F0 expansion and post-focal compression, such strategies are limited in utterances containing unaccented words and in systems without lexical accent or multiword Accentual Phrases. We hypothesize that when pitch cues are constrained, speakers rely on temporal and spectral cues aligned with prosodic edges, such as silence insertion, jaw opening, and duration asymmetry. Nine speakers of educated Standard Japanese produced 48 genitive noun phrases (e.g., umáno hizume ‘horse’s hoof’) under Broad and Narrow Focus. Acoustic measures included word duration, F1-based estimates of jaw opening, and silence insertion. Results showed that silence and duration were the strongest predictors of Narrow Focus, functioning additively and independently of pitch accent. F1-based measurements of jaw opening played a secondary, compensatory role, particularly in unaccented contexts. Cue-profile analysis revealed a functional hierarchy: silence and duration together were most effective, while jaw opening alone was less informative. These findings broaden current models of focus realization, showing that prosodic restructuring can emerge from gradient, edge-based cue integration.

1. Introduction

1.1. The Problem: Limited Functionality of the F0 Expansion-Compression Strategy in Unaccented Contexts and Across Japanese Dialects

Japanese dialects share an agglutinative morphological structure and a basic Subject–Object–Verb (SOV) word order. Grammatical relations are marked by postpositional case particles, such as ga (nominative), o (accusative), and wa (topic), which enable flexible constituent order and frequent omission of contextually inferable elements. Prosodically, all Japanese dialects segment speech into Accentual Phrases (APs) and Intonational Phrases (IPs), defined by dialect-specific boundary tones (e.g., %L-H at the left edge of AP in Tokyo Japanese, see Figure 1) and supported by cues like pauses, final lengthening, and pitch reset (e.g., Pierrehumbert & Beckman, 1986, 1988; Maeda & Venditti, 1998; Venditti, 2000; Igarashi, 2012, 2014).
However, Japanese dialects differ in their prosodic phrasing and use of pitch accent. Igarashi (2012, 2014) classifies them along two dimensions: whether Accentual Phrases (APs) can contain more than one word ([±multiword AP]) and whether the lexicon encodes pitch-based contrasts ([±lexical tone]). [+multiword AP] dialects sometimes show a preference for binary grouping, avoiding single-word APs (Kubozono, 1993; Shinya et al., 2004; Ishihara, 2023b). In [+lexical tone] dialects, pitch accent marks lexical distinctions (e.g., háshi ‘chopsticks’ vs. hashi ‘edge’), usually with a falling HL contour (Figure 2). Within this typology, Tokyo and Fukuoka are [+lexical tone, +multiword AP]; Osaka, Kyoto, and Kagoshima are [+lexical tone, −multiword AP]; Koriyama is [−lexical tone, +multiword AP]; and Kobayashi is [−lexical tone, −multiword AP].
Igarashi’s typology of Japanese dialects, which aligns with Jun’s (2014, 2025) distinction between Head- and Edge-prominence languages, offers a framework for understanding how different dialects—and, by extension, language types—encode meanings such as Narrow and Broad Focus through prosody. In Head-prominence languages like English, and in Japanese dialects that are [+lexical accent] (e.g., Tokyo and Kyoto), focus is often marked by expanding the pitch range of the prominent element (stress or pitch accent) and compressing or deaccenting post-focal material—a strategy known as post-focal compression (PFC). This strategy is fully productive in English because every content word has a stressed syllable that serves as a locus for sentence-level pitch accents. In Japanese [+lexical accent] dialects, however, its productivity is limited to cases where the focused word is lexically accented (A-word).
Empirical studies (Lee et al., 2022; Mizuguchi & Tateishi, 2020, 2023) confirm that this contrastive F0 enhancement and PFC strategy is unreliable for unaccented words (U-words) and yields weak perceptual effects. For instance, Lee et al. (2022) report that in [+lexical accent] dialects, accented words under focus showed an average F0 range increase of ~50 Hz. In contrast, unaccented words exhibited less than 10 Hz expansion. In perception tasks, listeners identified focused A-words with 63.1% accuracy—above chance—but performed at chance level (51.7%) for U-words. Mizuguchi and Tateishi (2023) further argue that Broad Focus may not be systematically encoded prosodically via F0 in Japanese. These findings suggest that the pitch-based contrastive strategy is lexically constrained to accented words. Consequently, it should not be treated as a general focus-marking mechanism across utterances in [+lexical accent] dialects as it is not functional in unaccented words.
Moreover, [−lexical accent] Japanese dialects as well as Edge-prominence languages—such as French and Seoul Korean—cannot rely on local F0 modulation because they lack lexical stress or pitch accent. Instead, focus is marked by restructuring prosodic domains like the Accentual Phrase (AP) and Intonational Phrase (IP), typically through the insertion of a boundary before the focused element and deletion of boundaries afterward—a process known as dephrasing. Dephrasing is used in the Koriyama dialect ([−lexical accent], [+multiword AP]), where post-focal APs are frequently dephrased as in Figure 3d. When word 2 receives Narrow Focus in Figure 3d, its AP merges with the following word, forming a single prosodic unit, and producing an effect functionally like post-focal deaccenting in English. In Broad Focus (Figure 3b), each word has its own AP.
However, in dialects like Kobayashi ([−lexical accent], [−multiword AP]), neither pitch-based modulation nor AP restructuring is available. Kobayashi assigns a fixed pitch pattern to each AP, restricts APs to single-word domains, and lacks lexical pitch accents (Figure 3a). These constraints prevent both pitch expansion and the deletion of post-focal boundaries. Consequently, the marking of Narrow Focus in Kobayashi is limited to modulating the pitch range of AP boundary tones and to declination (see Figure 3b). These cases illustrate how prosodic typology constrains the available strategies for marking focus.
This raises the broader question of whether focus-marking mechanisms can operate independently of both lexical pitch accent and AP structure. Scholars such as Igarashi (2012), Ishihara (2023a, 2023b), and Mizuguchi and Tateishi (2023, 2025) have proposed that edge-aligned cues, including Prominence Lending Boundary Tones, silent pauses, and phrase-initial metrical strengthening, may constitute cross-dialectal resources for marking focus. Such cues could function in dialects that lack lexical accent or restrict AP-level phrasing, suggesting that even when neither deaccenting nor dephrasing is possible, speakers can still signal focus through edge-based modulation. This points to a broader, typologically flexible system of prosodic focus marking grounded in phrase-boundary cues.

1.2. The Edge-Reinforcing Strategy as a Cross-Dialectal Mechanism

In this paper, we extend the proposal that focus can be marked independently of lexical pitch accent and Accentual Phrase (AP) structure, through a broader system of edge-based cues that may apply across languages. We refer to this as the Edge-Reinforcing Strategy, which proposes that prosodic edges can be strengthened by articulatory and temporal adjustments—such as pauses, jaw opening, or lengthening—that disrupt the fluent flow of speech and create perceptual “chunks.” When gestural reinforcement is weak, listeners perceive a continuous sequence of APs; when it is strong, the speech stream is segmented, and APs may be reanalyzed as separate Intonational Phrases (IPs), as illustrated in Figure 1.
Here, we test the production side of this hypothesis in Standard Japanese, a [+lexical accent, +multiword AP] variety. This choice may seem conservative, since edge-based strategies are expected to be more prominent in dialects such as Koriyama (−lexical accent, +multiword AP) or Kobayashi (−lexical accent, −multiword AP), which lack lexical accent and, in the case of Kobayashi, multiword phrasing as well. However, Standard Japanese also includes unaccented words, where traditional pitch-based focus cues are weak (Lee et al., 2022; Mizuguchi & Tateishi, 2020, 2023). If edge-reinforcing cues are recruited under these conditions, this strengthens the claim that the strategy operates broadly, and not only in dialects with prosodically “edge-heavy” structures.
The idea of the Edge-Reinforcing Strategy is grounded in two sources. First, it emerges from native speaker intuitions: in the debriefing after data collection in a previous study (Nagao & Ortega-Llebaria, 2024), participants consistently described producing genitive constructions like umáno hizume as a single unit under Broad Focus, but as two distinct chunks under Narrow Focus. Yet acoustic analysis did not consistently reveal F0-based differences in phrasing between the two conditions: the expected downstep patterns illustrated in Figure 1b were often ambiguous between Broad Focus and Narrow Focus on Word 2, raising questions about what additional cues might underlie the native speakers’ intuitions. Second, the proposal draws on prior literature identifying edge-aligned cues in Japanese dialects (particularly silence insertion, jaw opening, and word duration) as potential mechanisms for signaling prosodic restructuring at domain boundaries.
Among the cues proposed for the Edge-Reinforcing Strategy, silence insertion emerges as a particularly salient and structurally independent marker of prosodic boundaries. Evidence from the [+lexical accent] dialects of Tokyo, Osaka, and the [−lexical accent] Northern Kanto Japanese (Mizuguchi & Tateishi, 2025) indicates that silence functions as a perceptually robust boundary cue that can override syntactic constituency in both [+lexical accent] and [−lexical accent] dialects, reflecting a bottom-up segmentation strategy driven by acoustic salience rather than grammatical structure. In their Experiment 2, both Japanese and English listeners evaluated utterances from the Buckeye Corpus of American English. While both groups consistently perceived boundaries at major syntactic junctures, only Japanese listeners identified boundaries at minor syntactic breaks when these were accompanied by silent intervals. This asymmetry suggests that Japanese listeners are particularly attuned to silence as a prosodic segmentation cue, even in the absence of strong syntactic bracketing, underscoring the role of silence in acoustically driven prosodic processing.
Further evidence of silence’s functional flexibility comes from Ishihara (2023a), who reports that pauses in Tokyo Japanese, a [+lexical accent], [+multiword AP] dialect, frequently align with discourse-level structures, such as contrastive focus and topic-comment boundaries, rather than with syntactic junctures. Similarly, Ishihara (2023b) observes that pause placement is often sensitive to the informational prominence of constituents, suggesting that silence can reflect communicative intentions even when F0 cues are ambiguous or absent. Likewise, Nagao and Ortega-Llebaria (2024), in a production study of genitive constructions in Standard Japanese, found that post-nominal silent intervals occurred more frequently under Narrow Focus than Broad Focus, despite identical underlying syntactic configurations. Collectively, these findings suggest that silence insertion operates across dialects and structural levels as a perceptually salient, discourse-sensitive cue. The present study builds on this evidence by testing whether silence at genitive NP boundaries systematically encodes information-structural distinctions, contributing to a broader prosodic mechanism of edge reinforcement in Japanese.
Jaw opening is a second candidate cue for the Edge-Reinforcing Strategy in Japanese prosody. Cross-linguistic research by Donna Erickson and colleagues (e.g., Erickson et al., 2014, 2024; Erickson & Kawahara, 2016; Kawahara et al., 2017; Erickson & Niebuhr, 2023) shows that jaw displacement is not merely a biomechanical consequence of vowel articulation, but a prosodic gesture whose function varies typologically. Specifically, jaw opening serves two primary roles: it reinforces lexical heads in stress-based “Head languages” like English and Brazilian Portuguese, and prosodic edges in “Edge languages” like Japanese and French. In their 2014 and 2016 studies, Erickson and colleagues compared Japanese and English speakers producing sentences with similar vowel content and found a striking functional divergence: English speakers consistently opened the jaw at stressed syllables, with little boundary-related movement, while Japanese speakers exhibited jaw opening at the ends of Intonational Phrases (IPs), regardless of pitch accent alignment. This supports the interpretation that jaw opening in Japanese is not a correlate of lexical prominence, but a boundary marker aligned with larger prosodic domains.
Erickson and Niebuhr (2023) extended these findings to other Edge languages, including French and Mandarin, where jaw opening similarly aligns with the right edge of prosodic constituents such as APs and IPs. Conversely, in Brazilian Portuguese—like English—a rapid jaw opening on the stressed syllable followed by quick closing in the post-tonic syllable reflects its stress-based prosodic function (Erickson et al., 2024). These typological patterns confirm that jaw opening acts as a gradient articulatory marker shaped by the prominence system of the language: accentual head vs. prosodic edge.
In Japanese, this consistent alignment of jaw opening with the right edge of large prosodic constituents, rather than with pitch-accented syllables, makes it an excellent candidate for the Edge-Reinforcing Strategy. While Erickson’s work focuses on IP boundaries from either sentences or words produced in isolation, our study extends this line of inquiry by testing whether jaw opening also marks smaller syntactic units such as genitive NPs and contributes to encoding information structure in addition to syntax.
Moreover, building on the adjacency of jaw opening and silence at the right edge of an NP, we explore whether these two cues may function as part of a shared articulatory mechanism to reinforce domain edges. Given that jaw opening typically occurs in the final vowel just before a post-boundary silence, we ask whether this co-occurrence reflects a coordinated gesture for boundary marking—and whether jaw opening is enhanced by the presence of silence. If so, jaw opening—like silence—may play a key role in prosodic restructuring, especially when F0-based cues are weak or unavailable.
Duration constitutes a third candidate cue for the Edge-Reinforcing Strategy in Japanese prosody, although unlike silence and jaw opening, which occur at boundaries, duration operates as a distributed cue across constituents. Japanese, as a mora-timed language, uses vowel length contrastively at the lexical level (Han, 1962; Sano & Guillemot, 2025), and duration also plays a key role in demarcating prosodic domains through final lengthening at the Intonational Phrase (IP) level (Venditti, 2000, 2005; Seo et al., 2019). While such durational patterns have been traditionally attributed to rhythmic constraints, recent work by Sano and Guillemot (2025) shows that duration is dynamically modulated by communicative pressures: speakers lengthen less predictable words in shorter phrases, but simultaneously preserve phonemic contrasts, aligning duration with information structure. These findings support a message-oriented view of prosody, where duration enhances both lexical access and boundary perception depending on informational load.
Building on this functional view of duration, Nagao and Ortega-Llebaria (2024) examined genitive NP constructions (e.g., umáno hizume) and found that the ratio of word durations systematically cued focus structure: similar word durations predicted Broad Focus, whereas asymmetrical duration ratios marked Narrow Focus, with word 1 longer than word 2 when word 1 was in focus. While such duration ratios span the entire genitive NP and are not localized at the boundary per se, they may interact with other edge cues to guide listeners’ perception of restructuring. Therefore, we propose that duration contributes indirectly to edge marking by enhancing the salience of edge cues such as silence and jaw opening, especially under conditions where F0 cues are weak or unavailable.
Together, the evidence on silence, jaw opening, and duration suggests that Japanese speakers rely on a constellation of acoustic and articulatory cues to reinforce prosodic edges and restructure constituent domains. This leads us to formulate the Edge-Reinforcing Strategy as a cross-dialectal, cue-based mechanism for encoding discourse meanings like focus. We define it as the articulatory and temporal modulation of prosodic boundaries, which disrupts the fluent gestural flow of speech and prompts listeners to perceive chunked, prosodically distinct units. We explore this strategy in the following section through four empirical research questions.

1.3. Research Questions

To investigate this strategy, we focus on production data, examining how focus is realized in genitive NP constructions (e.g., umáno hizume) in educated Standard Japanese, a [+lexical accent, +multiword AP] variety (see Section 2.1). Perception will be addressed in future work. Under Broad Focus, we expect speakers to group the phrase as a single prosodic unit, consistent with the principle of Rhythmic Binarity. Narrow Focus, by contrast, should disrupt this phrasing, with the right edge of the first noun reinforced to signal a boundary between the two words. This restructuring is predicted to rely on silence, jaw opening, and duration, working in combination, in all [−lexical accent] dialects, and in unaccented contexts even within [+lexical accent] systems. Standard Japanese therefore serves as a conservative test case: if edge-reinforcing cues are found here, they should be even stronger in dialects like Koriyama or Kobayashi, which lack lexical accent and, in the case of Kobayashi, multiword phrasing as well. To test these predictions, we ask four research questions that examine the presence, interaction, and relative strength of these cues within the Edge-Reinforcing framework.
RQ1. Cues: Do jaw opening, silence insertion, and duration asymmetry serve as boundary cues that signal Narrow Focus—especially when the NP-initial word is unaccented—more reliably than they do Broad Focus?
This question targets the core claim of the Edge-Reinforcing Strategy: that Narrow Focus is cued by boundary signals other than pitch. We test whether silence, jaw opening, and duration asymmetry reliably distinguish Narrow from Broad Focus, particularly in unaccented words, where traditional F0 cues are known to be weak (Mizuguchi & Tateishi, 2023; Lee et al., 2022). In this way, these cues are expected to act as alternative focus-marking devices, strengthening boundary perception in prosodically constrained contexts.
RQ2. Mechanism: Do jaw opening and silence co-activate as part of a shared articulatory mechanism for boundary marking, particularly under unaccented Narrow Focus?
Here we ask whether jaw opening and silence function as a coordinated gesture. Because jaw opening peaks on the final vowel of the focused word and silence follows immediately after, their co-occurrence may reflect a single articulatory mechanism for reinforcing prosodic edges. If so, this would suggest that edge reinforcement arises from gesture coordination, especially in unaccented contexts where pitch alone cannot mark focus.
RQ3. Functional Integration: Do combinations of cues (jaw opening, silence, and duration) more effectively predict focus status than individual cues? Are their effects additive, compensatory, or accent-dependent?
This question considers how the three cues work together. While silence and jaw opening are localized at boundaries, duration spans the whole NP. In Narrow Focus, durational asymmetry emerges as the focused word lengthens relative to its partner, while Broad Focus shows more balanced timing (Nagao & Ortega-Llebaria, 2024). RQ3 therefore tests whether the cues act synergistically, whether some substitute when others are weak, and how their contribution depends on lexical accent.
RQ4. Cue Hierarchy: Is the Edge-Reinforcing Strategy organized hierarchically, with some cues compensating for others in the absence of pitch-based accent? Or do cue combinations adapt flexibly depending on contextual demands?
Finally, we ask about the overall organization of the system. One possibility is that silence, jaw opening, and duration form a ranked hierarchy, with certain cues consistently stronger than others. Another is that cue combinations shift flexibly depending on accent and context. This addresses whether the Edge-Reinforcing Strategy functions as a stable system with predictable rankings, or as an adaptive one that adjusts when pitch cues are unavailable.

2. Methodology

2.1. Participants

Ten native speakers of Japanese (3 females, 7 males; mean age: 46 years) participated in the study. Recordings from one participant were excluded because technical problems degraded the recording quality, leaving nine speakers for analysis. All participants held university degrees and reported using the educated Japanese standard in their daily communication. Here, educated Japanese standard refers to the [+lexical accent], [+multiword AP], Tokyo-based variety that has been taught in formal education since the Meiji period, is prevalent in national media, and is widely recognized as the spoken norm across Japan (Igarashi, 2018). Informed consent was obtained from all participants.
Participants were originally from diverse regions of Japan: Miyagi (P1), Hyogo (P2), Nagano (P3), Tochigi (P4), Saitama (P5), Aichi (P6), Gifu (P7), Gifu (P8), and Saitama (P9). While regional origin could potentially influence prosodic features, the task was conducted in a formal, interview-like setting that encouraged participants to use the educated standard. Prior research confirms that this context typically suppresses dialectal variation (e.g., Tanaka et al., 2016; Murakami, 2008). Additionally, as judged by the native speaker author, each participant consistently produced expected accent patterns, including the distinction between accented and unaccented words and canonical pitch accent locations. These patterns confirmed their use of educated Japanese standard, a [+lexical accent] [+multiword AP] dialect and ensured comparability across speakers.

2.2. Materials

The stimuli consisted of 48 two-word genitive noun phrases (e.g., umáno hizume ‘horse’s hoof’, hanáno kubiwa ‘flower’s necklace’), each produced under both Broad Focus and Narrow Focus conditions. In the Narrow Focus condition, emphasis was placed either on the first or second noun (see Table 1).
To ensure balanced prosodic coverage, the stimuli systematically varied in lexical pitch accent combinations across the two nouns. Lexical pitch-accent was consistently placed in the second mora. The following accent patterns were included:
  • Accented-unaccented words (e.g., tsunóno kubiwa, ‘horn’s collar’),
  • Unaccented-accented words (e.g., torano kawá, ‘tiger’s skin’),
  • Accented-accented words (e.g., umáno mimí, ‘horse’s ear’),
  • Unaccented-unaccented words (e.g., sameno kubiwa, ‘shark’s collar’).
Since the 48 noun phrases were produced under both focus conditions (Broad and Narrow) by nine participants, we collected a total of 864 tokens (48 × 2 × 9).

2.3. Test Administration

All recordings were conducted in a quiet, dedicated room to ensure high-quality audio capture. The second author, a native speaker of Japanese from Sendai and fluent in the educated Japanese standard, served as the interviewer and experimenter. His familiarity with the standard variety helped maintain consistency in pronunciation and prosodic expectations across participants.
Participants were informed that the recordings might be used for pronunciation training purposes, which encouraged them to speak with careful diction and clarity. This aligns with prior findings that such framing prompts speakers to adopt a clear speech register (see Smiljanić & Bradlow, 2005).
The task consisted of two parts:
  • Broad Focus Condition: Participants were first asked to read aloud a printed list of the 48 noun phrases without special emphasis, producing them with broad, neutral prosody.
  • Narrow Focus Condition: In a second round, the experimenter elicited contrastive Narrow Focus through a correction task. For each phrase, the experimenter presented an incorrect version (e.g., “umáno hizume desu ka?”—Is it a horse’s hoof?) while pointing to a mismatched phrase (e.g., a cow’s hoof). The participant then responded with a correction (e.g., “USHIno hizume desu”—It’s a COW’s hoof), thereby emphasizing either the first or second noun depending on the correction target. This method ensured natural production of prosodic focus without metalinguistic instruction.

2.4. Labeling and Measurements

Labeling was carried out in several steps. First, a forced aligner (WebMAUS; Poerner & Schiel, 2018) produced the initial TextGrids, which included word, syllable, and phoneme labels as well as silences between word 1 and word 2 detected automatically at the default settings. Then, a Praat script reduced those TextGrids to the labels relevant to our study (Figure 4a,b). Finally, the automated labels were manually adjusted when necessary. For example, a pause automatically labeled before a word 2 beginning with a voiceless stop, which was always [k] in our materials, may reflect the silent closure of the stop rather than a pause. In this context, pauses were kept only if they were longer than 50 ms (Figure 4d); otherwise, the pause was relabeled as part of the stop gap (Figure 4c). For statistical analysis, pauses were coded as either present (as in Figure 4b,d) or absent (as in Figure 4a,c), disregarding pause length.
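The pause-coding rule described above can be sketched as follows. This is a minimal illustration and not the authors' Praat script: the `Interval` record, its field names, and the function name are hypothetical, and the sketch assumes that silences not preceding a voiceless stop are kept as labeled.

```python
from dataclasses import dataclass

# Threshold stated in the text: before a word-2-initial voiceless stop,
# a silence counts as a pause only if it is longer than 50 ms.
STOP_GAP_THRESHOLD_MS = 50.0


@dataclass
class Interval:
    """Hypothetical record for the silence labeled between word 1 and word 2."""
    duration_ms: float           # length of the labeled silence (0 if none)
    before_voiceless_stop: bool  # True when word 2 begins with a voiceless stop


def pause_present(interval: Interval) -> bool:
    """Return the binary pause code (present/absent) used for the statistics."""
    if interval.duration_ms <= 0:
        return False
    if interval.before_voiceless_stop:
        # Short silences here are treated as stop closure, not as a pause.
        return interval.duration_ms > STOP_GAP_THRESHOLD_MS
    # Assumption: elsewhere, any labeled silence is kept as a pause.
    return True
```

Because only presence or absence enters the models, the function discards pause length once the threshold decision is made.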
A second Praat script extracted the vowel’s F1 in the genitive particle -no as a proxy for jaw opening (Kawahara et al., 2017) together with duration for Word 1 (e.g., umáno), and Word 2 (e.g., hizume). Duration ratios were computed between Word 1 and Word 2 to normalize for participants’ differences in speech rate.
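To illustrate why a ratio normalizes for speech rate, the following minimal sketch computes the duration ratio for one token. The direction of the ratio (Word 1 divided by Word 2) and the function name are assumptions made for illustration; the text states only that ratios were computed between the two words.

```python
def duration_ratio(word1_ms: float, word2_ms: float) -> float:
    """Return Word 1 duration relative to Word 2 duration (assumed W1 / W2)."""
    if word1_ms <= 0 or word2_ms <= 0:
        raise ValueError("word durations must be positive")
    return word1_ms / word2_ms


# A speaker who is uniformly 25% slower stretches both words by the same
# factor, so the ratio is unchanged while the raw durations differ.
fast = duration_ratio(400.0, 500.0)
slow = duration_ratio(400.0 * 1.25, 500.0 * 1.25)
```

Under this assumed direction, a ratio near 1 corresponds to similar word durations, and a ratio above 1 means Word 1 is longer than Word 2.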
A methodological note concerns our use of the F1 of [o] in the genitive particle -no as a proxy for jaw opening. Kawahara et al. (2017) showed that F1 correlates strongly with jaw opening (measured via EMA) for the five Japanese vowels, but the link between prominence and jaw displacement (Erickson & Kawahara, 2016) is less well documented for rounded vowels. In English, Williams et al. (2013) observed greater jaw lowering for stressed /o/, but they did not report acoustic results, and recent reviews (e.g., Steffman & Zhang, 2023) likewise do not address rounded vowels. Since lip rounding generally lowers formant frequencies (Catford, 2001), F1 in [o] may not consistently increase with prominence. Our measure should therefore be treated as an indirect estimate of jaw movement, and future research using direct articulatory methods such as EMA or ultrasound will be needed to confirm this relation in Japanese.

2.5. Statistics

We used linear and logistic mixed-effects models to answer the four research questions outlined in Section 1.3. All models included random intercepts for participant and sentence to account for individual and item-level variability. Fixed effects included Focus (Broad, Narrow-W1, Narrow-W2), AccentW1 (whether word 1 was accented or unaccented), AccentW2 (whether word 2 was accented or unaccented), and the three prosodic cues (jaw opening, silence, and duration), either individually or in combination, depending on the research question. Focus was Helmert-coded to allow comparisons between Broad and Narrow conditions, and between the two Narrow Focus positions.
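One way to obtain the two comparisons named above is the contrast matrix sketched below. The weights, their signs, and the level order are assumptions: the text states only that Focus was Helmert-coded, and R's built-in `contr.helmert` uses integer weights and compares each level with the preceding ones, so the matrix actually used may differ.

```python
from fractions import Fraction

# Assumed contrast weights for the three Focus levels.
# Column 0: Broad vs. the mean of the two Narrow conditions.
# Column 1: Narrow-W1 vs. Narrow-W2.
FOCUS_CONTRASTS = {
    "Broad":     (Fraction(2, 3),  Fraction(0)),
    "Narrow-W1": (Fraction(-1, 3), Fraction(1, 2)),
    "Narrow-W2": (Fraction(-1, 3), Fraction(-1, 2)),
}


def encode_focus(level: str):
    """Return the two contrast values that replace the Focus factor."""
    return FOCUS_CONTRASTS[level]


def columns():
    """Return the contrast matrix column by column, for checking properties."""
    return list(zip(*FOCUS_CONTRASTS.values()))
```

With this scaling, each column sums to zero and the two columns are orthogonal, so the first coefficient can be read as the Broad minus Narrow difference and the second as the Narrow-W1 minus Narrow-W2 difference.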
All statistical analyses were conducted in R (Version 4.X; R Core Team, 2024) using the lme4 package (Bates et al., 2015) for mixed-effects modeling. Model comparisons and significance testing were performed using likelihood ratio tests and Type III ANOVAs from the lmerTest and car packages.
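For readers unfamiliar with likelihood ratio tests, the sketch below shows the arithmetic for two nested models that differ by a single parameter. It is an illustration in Python and not the authors' R code; the function name is hypothetical, and the closed form applies only to one degree of freedom.

```python
import math
from typing import Tuple


def likelihood_ratio_test_df1(loglik_reduced: float,
                              loglik_full: float) -> Tuple[float, float]:
    """Return (LR statistic, p-value) for nested models differing by one parameter.

    The statistic is 2 * (logLik_full - logLik_reduced). With one degree of
    freedom, the chi-square upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot have a lower log-likelihood")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

In lme4, comparisons of this kind are usually obtained by passing two fitted models to `anova()`, which also handles differences of more than one parameter.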
In RQ1, we tested whether jaw opening, post-boundary silence, and duration asymmetry signaled Narrow Focus, and whether their effects varied by accent or focus position. Three parallel linear mixed-effects models were used, each targeting a different cue as the dependent variable:
  • Models:
  • RQ1-M1: JawOpening~Focus * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
  • RQ1-M2: Silence~Focus * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
  • RQ1-M3: Duration~Focus * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
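For readers less familiar with lme4 formula notation, RQ1-M1 can be written out as below. The symbols are assumptions introduced for illustration: i indexes tokens, p[i] and s[i] are the participant and sentence of token i, F_1 and F_2 are the two contrast columns for the three-level Focus factor, and A^{W1} and A^{W2} are indicators of accent on word 1 and word 2.

```latex
\begin{aligned}
\mathrm{JawOpening}_{i} ={}& \beta_0 + \beta_1 F_{1i} + \beta_2 F_{2i} + \beta_3 A^{W1}_{i}
  + \beta_4 F_{1i} A^{W1}_{i} + \beta_5 F_{2i} A^{W1}_{i} + \beta_6 A^{W2}_{i} \\
 &+ u_{p[i]} + v_{s[i]} + \varepsilon_{i}, \\
 & u_{p} \sim \mathcal{N}(0, \sigma_u^2), \quad
   v_{s} \sim \mathcal{N}(0, \sigma_v^2), \quad
   \varepsilon_{i} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

RQ1-M2 and RQ1-M3 have the same right-hand side, with Silence or Duration as the dependent variable.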
In RQ2, we tested whether jaw opening and post-boundary silence, which occur adjacently at the NP boundary, interact dynamically as a boundary-marking mechanism. More specifically, Models RQ2-M1 and RQ2-M2 test the bidirectional relationship between jaw movement and silence, while RQ2-M3 examines whether their interaction predicts focus, paying particular attention to their potential compensatory roles.
  • Models:
  • RQ2-M1: JawOpening~Silence * Focus * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
  • RQ2-M2: Silence~JawOpening * Focus * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
  • RQ2-M3: FocusNB~JawOpening * Silence * AccentW1 + AccentW2 + (1|Participant) + (1|Sentence).
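Written out, RQ2-M3 takes the form below. This assumes that FocusNB is a binary Narrow versus Broad outcome modeled with a logit link, which the formula suggests but the text does not state explicitly. J and S stand for jaw opening and silence, and the remaining symbols (token index i, participant p[i], sentence s[i], accent indicators A^{W1} and A^{W2}) are likewise notation introduced for illustration.

```latex
\begin{aligned}
\operatorname{logit} \Pr(\mathrm{FocusNB}_{i} = \mathrm{Narrow}) ={}&
  \beta_0 + \beta_1 J_i + \beta_2 S_i + \beta_3 A^{W1}_i
  + \beta_4 J_i S_i + \beta_5 J_i A^{W1}_i + \beta_6 S_i A^{W1}_i \\
 &+ \beta_7 J_i S_i A^{W1}_i + \beta_8 A^{W2}_i + u_{p[i]} + v_{s[i]}
\end{aligned}
```

The three-way term \beta_7 is the one that would indicate whether the joint effect of jaw opening and silence changes with the accent status of word 1.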
In RQ3, we expanded the analysis to include duration asymmetry, a distributed cue less directly aligned with the boundary, but potentially contributing to the perceptual weight of prosodic restructuring. We examined whether combinations of cues (jaw opening, silence, and duration) better predict focus than individual cues. Models were divided into accent-sensitive (testing compensatory roles under accentual conditions) and accent-independent (testing additive contributions). Model RQ3-M2 assessed a full interaction among all three cues and AccentW1 in predicting focus. Models RQ3-M3 and RQ3-M4 tested whether cue combinations, especially silence and duration, operate additively, and whether jaw opening adds incremental value.
  • Models:
  • Accent-sensitive (Compensatory Models):
    RQ3-M1: JawOpening~Duration * Focus * AccentW1 + (1|Participant) + (1|Sentence).
    RQ3-M2: FocusNB~Silence * Duration * JawOpening * AccentW1 + (1|Participant) + (1|Sentence).
  • Accent-independent (Additive Models):
    RQ3-M3: FocusNB~Silence * Duration + (1|Participant) + (1|Sentence).
    RQ3-M4: FocusNB~Silence * Duration + JawOpening + (1|Participant) + (1|Sentence).
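For concreteness, RQ3-M3 can be written out as a mixed-effects logistic regression. This is a sketch that assumes a binomial family with a logit link and R-style formula semantics, neither of which the model list states explicitly for RQ3-M3; the symbols $\beta$, $u_p$, and $v_s$ are notation introduced here. For token $i$ from participant $p$ and sentence $s$:

```latex
\operatorname{logit} P(\mathrm{FocusNB}_{ips} = \mathrm{Narrow})
  = \beta_0
  + \beta_1\,\mathrm{Silence}_{ips}
  + \beta_2\,\mathrm{Duration}_{ips}
  + \beta_3\,(\mathrm{Silence}_{ips} \times \mathrm{Duration}_{ips})
  + u_p + v_s,
\qquad
u_p \sim \mathcal{N}(0, \sigma_P^2),\quad
v_s \sim \mathcal{N}(0, \sigma_S^2)
```

Under these assumptions, the `*` in the formula brings in the Silence × Duration interaction term $\beta_3$ alongside the two main effects; RQ3-M4 would add one further term for JawOpening.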
RQ4 evaluated whether the prosodic system exhibits a fallback structure, in which one cue strengthens when another is absent, or a stable hierarchy of cue effectiveness. To assess this, we employed two complementary models. RQ4-M1 is a continuous interaction model that evaluates whether the predictive value of each cue depends on the presence or absence of lexical accent (AccentW1). Significant interactions in this model would support the idea that the cue system is sensitive to pitch availability, with some cues becoming more or less active depending on accentual context.
In contrast, RQ4-M2 adopts a categorical approach by defining a variable called CueProfile, which classifies each token according to the combination of cues present (e.g., DurationOnly, Jaw+Silence, AllCues, NoCues). This model allows us to directly compare which cue constellations are most effective at predicting Narrow Focus, and whether certain minimal combinations (e.g., Silence+Duration) are sufficient when other cues are absent. Together, these models provide insight into the adaptive and compensatory nature of prosodic focus marking in Japanese, supporting the broader claim that cue interaction in the Edge-Reinforcing Strategy is both dynamic and hierarchical.
  • Models:
  • RQ4-M1: FocusNB~AccentW1 * (JawOpening + Silence + Duration) + (1|Participant) + (1|Sentence).
  • RQ4-M2 (categorical CueProfile): FocusNB~CueProfile + (1|Participant) + (1|Sentence).
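The categorical CueProfile variable in RQ4-M2 can be illustrated with a minimal sketch. It assumes that each cue has already been reduced to a present/absent flag per token; how that binarization was done is not specified in this section, so the boolean inputs, the function name `cue_profile`, and the label `Jaw+Duration` are assumptions introduced here. The remaining labels follow those named in the text.

```python
def cue_profile(jaw: bool, silence: bool, duration: bool) -> str:
    """Map three present/absent cue flags to a CueProfile label."""
    present = [
        name
        for name, flag in (("Jaw", jaw), ("Silence", silence), ("Duration", duration))
        if flag
    ]
    if len(present) == 3:
        return "AllCues"
    if not present:
        return "NoCues"
    if len(present) == 1:
        return present[0] + "Only"
    # Exactly two cues present, e.g. "Silence+Duration"
    return "+".join(present)


# Hypothetical tokens given as (jaw, silence, duration) flags
tokens = [
    (True, True, True),
    (False, True, True),
    (False, False, True),
    (False, False, False),
]
profiles = [cue_profile(*t) for t in tokens]
print(profiles)
```

Each token receives exactly one label, so the resulting variable could enter a model as a single categorical predictor, as in the RQ4-M2 formula.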

3. Results

3.1. RQ1: Cues to Broad and Narrow Focus

We first asked whether prosodic cues such as jaw opening, post-boundary silence, and duration asymmetry reliably distinguish Narrow from Broad Focus, and whether their activation depends on lexical accent or focus position. Table 2 summarizes the models and results (see Supplementary Materials for detailed results on these models).
In model RQ1-M1, a linear mixed-effects model predicting jaw opening, we found a significant interaction between Focus (Broad vs. Narrow) and AccentW1, t(812.87) = −2.61, p = 0.009, marginal R2 = 0.025. Jaw opening increased under Narrow Focus but only when the focused word lacked lexical accent, suggesting a compensatory use of jaw movement when pitch cues are unavailable (Figure 5a). Model RQ1-M2, a generalized linear mixed model predicting silence insertion, showed a robust main effect of Focus, z = 9.76, p < 0.001, marginal R2 = 0.21, with no significant interaction with AccentW1. This indicates that silence consistently signals prosodic boundaries in Narrow Focus, regardless of pitch accent (Figure 5b). In model RQ1-M3, which predicted duration asymmetry, both contrasts of Focus were significant: Broad vs. Narrow Focus (t = 9.44, p < 0.001) and Narrow Focus W1 vs. W2 (t = −6.22, p < 0.001), with marginal R2 = 0.17. An interaction with AccentW1 (p = 0.048) suggested that duration is sensitive to both focus structure and word position (Figure 5c).
In summary, all three cues are activated under Narrow Focus, signaling a contrast between Broad and Narrow Focus. Jaw opening functions compensatorily, silence acts as a general boundary cue, and duration reflects both focus structure (Broad versus Narrow) and the location of Narrow Focus.
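The duration-asymmetry measure can be illustrated with a short sketch. Following the caption of Figure 5, it treats the measure as a ratio of word durations in which 1 means equal durations. The direction of the ratio (word 1 over word 2), the function name `duration_ratio`, and the millisecond values are assumptions made here for illustration, not values from the study.

```python
def duration_ratio(word1_ms: float, word2_ms: float) -> float:
    """Ratio of word 1 to word 2 duration; 1.0 means equal durations."""
    if word2_ms <= 0:
        raise ValueError("word2_ms must be positive")
    return word1_ms / word2_ms


# Hypothetical durations in milliseconds (not data from the study)
equal = duration_ratio(400.0, 400.0)
longer_w1 = duration_ratio(500.0, 400.0)
print(equal, longer_w1)
```

With this orientation, values above 1 would indicate a relatively longer first word and values below 1 a relatively longer second word.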

3.2. RQ2: Mechanistic Alignment of Jaw Opening and Silence Insertion

Next, we explored the possibility that jaw opening and silence function as a coordinated boundary-marking mechanism, particularly when lexical pitch accent is absent. As summarized in Table 3, models tested whether these cues mutually predict one another (RQ2-M1, RQ2-M2) and whether their interaction contributes to predicting Narrow Focus (RQ2-M3) (see Supplementary Materials for detailed results on these models).
In model RQ2-M1, a linear mixed-effects model predicting jaw opening, the interaction between Silence, Focus, and AccentW1 was not significant (p = 0.78), and the marginal R2 was low (R2m = 0.03; R2c = 0.11). This suggests that silence does not predict jaw opening at the NP boundary. In model RQ2-M2, which reversed the direction by predicting silence insertion from jaw opening, no predictive effect was observed (p = 0.75), though FocusNB showed a strong main effect (p < 0.001), indicating that silence remains a reliable cue for Narrow Focus, independent of jaw opening (R2m = 0.21; R2c = 0.38). In model RQ2-M3, a generalized linear mixed-effects model predicting FocusNB from jaw opening, silence, and their interaction with AccentW1, a strong main effect of silence was found, z = 5.55, p < 0.001, alongside a weaker interaction with AccentW1. The model had good fit (R2m = 0.22; R2c = 0.39). This suggests that while jaw opening does not trigger silence, it may enhance its boundary-marking function in unaccented conditions where pitch cues are weak or absent.
In summary, silence functions independently and dominantly in marking focus boundaries. There is no evidence of mutual activation between jaw and silence, but results provide partial support for a compensatory role of jaw opening in unaccented contexts.
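The marginal and conditional R2 values reported for these models (R2m, R2c) can be related to variance components with a small sketch. It assumes the widely used variance-partitioning definitions for mixed models, which this section does not confirm as the method used; the function name and the variance values below are made up for illustration.

```python
def r2_marginal_conditional(var_fixed, var_random, var_resid):
    """Return (R2m, R2c) from variance components.

    var_fixed  : variance explained by the fixed effects
    var_random : summed variance of the random intercepts
    var_resid  : residual variance
    """
    total = var_fixed + var_random + var_resid
    if total <= 0:
        raise ValueError("total variance must be positive")
    r2m = var_fixed / total                   # fixed effects only
    r2c = (var_fixed + var_random) / total    # fixed plus random effects
    return r2m, r2c


# Hypothetical variance components (not estimates from the study)
r2m, r2c = r2_marginal_conditional(var_fixed=2.0, var_random=2.0, var_resid=6.0)
print(round(r2m, 2), round(r2c, 2))
```

Because R2c adds the random-effect variance to the numerator, it cannot be smaller than R2m, which matches the pattern in Tables 2 to 4.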

3.3. RQ3: Functional Coordination of Duration, Silence Insertion and Jaw Opening

While jaw and silence are boundary-aligned, duration is a more distributed cue that may contribute to the functional coordination of boundary signaling. Specifically, models in Table 4 examined whether silence, duration asymmetry, and jaw opening jointly enhance focus marking, and whether their effects are additive, compensatory, or dependent on accentual context.
In model RQ3-M1, a linear mixed-effects model predicting jaw opening from duration, Focus, and AccentW1, no significant main effects or interactions were found (p > 0.05), and explained variance was low (R2m = 0.025; R2c = 0.11). This indicates that duration does not modulate jaw movement, further supporting the view that jaw activation is not driven by duration. In model RQ3-M2, we tested a four-way interaction model predicting FocusNB from silence, duration, jaw opening, and AccentW1. All predictors contributed significantly (p < 0.001), and the model showed strong fit (R2m = 0.71; R2c = 0.89), but convergence warnings were present, limiting interpretability. While the results suggest potential cue synergy, instability in the estimation cautions against strong conclusions.
More stable and interpretable results came from additive models. In model RQ3-M3, which included only silence and duration, both cues significantly predicted FocusNB (p < 0.001), yielding a high marginal R2 of 0.79. In model RQ3-M4, we added jaw opening, which yielded only a small gain in explained variance (R2m = 0.81; R2c = 0.91), and jaw’s individual effect was weak and not consistently significant.
In summary, silence and duration are the primary predictors of Narrow Focus in cue combinations. Their effects appear additive rather than compensatory, and the contribution of jaw opening is minimal when stronger cues are present. These findings support the view that the Edge-Reinforcing Strategy operates as a coordinated but stable cue ensemble, with silence and duration at its core.

3.4. RQ4: Is the Edge-Reinforcing Strategy Hierarchical or Compensatory?

Finally, we considered whether the prosodic system follows a stable cue hierarchy or adjusts dynamically when primary cues are weakened. To address this, we examined whether the predictive strength of jaw opening, silence, and duration is modulated by the presence of lexical accent (model RQ4-M1). We then used a categorical CueProfile (model RQ4-M2) to compare combinations of cues and assess which ones most effectively signal Narrow Focus. Together, these models test whether cue integration is adaptive and reactive or stable and graded.
In model RQ4-M1, we tested whether the predictive strength of each cue (jaw opening, silence, duration) varies with the presence of AccentW1. All three cues significantly predicted FocusNB (silence: z = 6.58, duration: z = 11.44; both p < 0.001), but no significant interactions with AccentW1 were observed. Marginal R2 was 0.79. To further evaluate cue hierarchy, model RQ4-M2 tested a categorical CueProfile variable based on combinations of cue presence. A comparison of CueProfiles showed that AllCues and Silence+Duration were the most effective configurations for predicting Narrow Focus, followed by DurationOnly. Profiles like JawOnly, SilenceOnly, and NoCues performed significantly worse (e.g., JawOnly vs. AllCues, z = −5.49, p < 0.001).
Altogether, these results show that the system is not compensatory—there is no evidence that one cue strengthens when another fails (e.g., due to accent absence). Instead, cue combinations operate with graded strength, revealing a functional hierarchy:
  • AllCues > Silence+Duration > DurationOnly > SilenceOnly > JawOnly > NoCues

3.5. General Summary

The findings offer strong empirical support for the Edge-Reinforcing Strategy: a prosodic mechanism through which Japanese speakers mark Narrow Focus not only via pitch, but by enhancing prosodic boundaries using non-pitch cues—namely, silence insertion, duration asymmetry, and jaw opening.
Results from RQ1 confirmed that all three cues contribute to focus marking, each in distinct ways. Silence served as a robust, accent-independent marker of boundary insertion. Duration asymmetry reliably reflected focus structure and position. Jaw opening emerged primarily in unaccented contexts, suggesting a compensatory role when pitch-based cues were unavailable.
In RQ2, we found no evidence that jaw opening and silence form a coordinated mechanism. The two cues did not predict each other, but silence remained a dominant predictor of Narrow Focus, and jaw opening may reinforce its effect when pitch is weak.
RQ3 showed that silence and duration function additively, significantly improving focus prediction when combined. Jaw opening contributed minimally, supporting a model of cue integration that is stable and weighted rather than compensatory.
RQ4 confirmed a functional hierarchy: cue profiles involving silence and duration, with or without jaw opening, best predicted Narrow Focus. Profiles relying only on jaw or lacking all cues were least effective. These effects held regardless of accentual context, indicating a robust and non-reactive cue system.
Altogether, the results validate the Edge-Reinforcing Strategy as a flexible but stable method of prosodic restructuring, allowing speakers to signal information structure through coordinated, non-pitch boundary cues—even when pitch-based strategies are constrained.

4. Discussion

This production study set out to test whether Japanese speakers use non-pitch-based, boundary-aligned cues to mark information structure, particularly in contexts where pitch cues are unavailable or unreliable. The results strongly support the existence of what we term the Edge-Reinforcing Strategy: a system of prosodic modulation in which silence insertion, word duration asymmetry, and jaw opening reinforce prosodic edges, potentially guiding the listener’s perception of focus. Crucially, these cues are not merely redundant or peripheral; they serve as primary mechanisms for structuring utterances when pitch fails to convey discourse-level meaning, particularly in unaccented contexts. Our findings confirm that this articulatorily grounded strategy operates not only in typologically prototypical edge-marking languages but also in Standard Japanese, which displays a mixture of head-based and edge-based prosodic features.

4.1. Edge-Based Cues Operate in Standard Japanese, a Mixed Typological System

As explained in Section 2.1, Standard Japanese spoken by educated speakers is not a canonical edge-marking dialect. It is characterized by both lexical pitch accent and multiword Accentual Phrases (APs), aligning it typologically with [+lexical accent, +multiword AP] systems (Igarashi, 2012, 2014; Jun, 2014, 2025). Despite these head-prominence features, our results show that our participants reliably use duration and silence to mark Narrow Focus. This suggests that the Edge-Reinforcing Strategy is not categorically tied to edge-typology dialects like Kobayashi or Koriyama. Rather, it operates along a continuum, with speakers accessing edge-based cues depending on the availability and reliability of pitch cues.
This challenges the notion of typological exclusivity in prosodic systems (see Jun, 2025 for a discussion of this topic). It suggests that edge reinforcement is not an all-or-nothing strategy but a flexible, gradient mechanism available even in systems with robust pitch-accent architectures. Our findings parallel observations by Mizuguchi and Tateishi (2023) that pitch expansion fails to robustly mark focus in spontaneous Japanese, particularly in unaccented words. Thus, boundary-aligned cues such as silence and duration offer a reliable alternative, or complement, to pitch-based focus marking.

4.2. Jaw Opening as a Compensatory Cue: Implications for Other Dialects

In our data, jaw opening (measured through F1 height) played a secondary, compensatory role. It was not uniformly elevated in all Narrow Focus contexts but selectively emerged when the focused word was unaccented. This supports the idea of adaptive cue re-weighting: speakers deploy articulatory strategies like jaw opening when dominant cues such as pitch are absent or ineffective. Erickson and Kawahara (2016) and Erickson and Niebuhr (2023) provide cross-linguistic support for this interpretation, showing that jaw opening marks prosodic boundaries in edge-marking languages such as Japanese and French but marks lexical stress in head-marking systems like English and Brazilian Portuguese.
Given this, we hypothesize that in more prototypically edge-based Japanese dialects—particularly those that are [−lexical accent] and [−multiword AP]—jaw opening may play a more central role. In such systems, where neither pitch-based prominence nor multiword restructuring is available, articulatory gestures like jaw opening may form a cue complex with silence to enhance prosodic edge salience. Future work should test this hypothesis in dialects such as Kobayashi, which severely constrain F0-based focus cues.

4.3. Silence as a Discourse-Level Cue to Genitive NP Boundaries

One of the most novel findings in this study is that silence insertions do not merely mark major syntactic junctures but appear at NP-level boundaries (e.g., between genitive noun phrases) and align with discourse distinctions such as Broad vs. Narrow Focus. This extends the role of silence in Japanese prosody beyond clause demarcation, suggesting that speakers actively manipulate pause placement to convey information structure.
This finding resonates with Mizuguchi and Tateishi’s (2025) Rapid Prosody Transcription study, which shows that Japanese listeners rely more heavily on post-boundary pauses than on syntactic cues when parsing prosodic structure. Given the tendency for Japanese spontaneous speech to omit case-marking suffixes (e.g., ‘ga’, ‘o’) and allow flexible word order through scrambling, the language places increased weight on prosodic phrasing for syntactic and discourse interpretation. In this light, silence functions as a discourse-sensitive, bottom-up segmentation tool.
The strategic insertion of silence at NP boundaries indicates a recalibration of prosodic boundary strength according to pragmatic function. For example, in our data, utterances with NP-level pauses under Narrow Focus were described by our participants as two chunks, consistent with a perceptual re-segmentation of the two Accentual Phrases (APs) into two separate Intonational Phrases (IPs). This finding supports a view of prosodic boundaries as gradient and functional, where edge reinforcement scales according to the discourse prominence of the constituent.

4.4. Gestural Segmentation and the Articulatory Grounding of Prosody

We propose that the Edge-Reinforcing Strategy we observed is not merely acoustic but gesturally grounded. Broad Focus productions were characterized by smooth gestural integration—shorter durations, compact jaw postures, and absence of pauses—suggesting tight gestural coupling. In contrast, Narrow Focus prompted gestural re-segmentation: longer duration of the AP under focus, increased jaw aperture at the AP edge, and post-focal inserted pauses. These results are consistent with Articulatory Phonology (Browman & Goldstein, 1992), which models prosodic structure as emerging from the temporal organization of articulatory gestures.
In our interpretation, prosodic phrasing arises from the perceptual chunking of speech gestures. When edge-aligned cues (duration, silence, jaw opening) are present and strong, they weaken gestural coupling across constituents, increasing the likelihood of perceptual boundary insertion. This accounts for the observed rephrasing of two-word genitive constructions under Narrow Focus where participants weakened gestural coupling across the two nouns. The presence and coordination of edge cues thus determine whether prosodic domains are fused or separated in the perceptual grammar.
This view aligns with recent research on rhythmic binarity and hierarchical phrasing in Japanese (Ishihara, 2023a, 2023b; Shinya et al., 2004), reinforcing the idea that prosodic boundaries are shaped by both informational and gestural pressures. Our study contributes to this literature by demonstrating that non-pitch cues—traditionally understudied in Japanese prosody—serve as key mediators between articulation and discourse function.

4.5. Limitations and Future Research

Several limitations of the present study also suggest avenues for future research. First, our study focused exclusively on speech production. Although listener intuitions inspired aspects of the design, we did not directly test the perceptual weighting of the observed cues. Future perception experiments should evaluate the relative salience of duration, silence, and jaw opening in real-time processing and determine how listeners integrate these cues to resolve focus structure.
Second, while we relied on F1 of [o] as a proxy for jaw opening—an approach supported in earlier work (Kawahara et al., 2017)—this measure should be interpreted cautiously, since prominence effects on rounded vowels are not well established. Direct articulatory methods such as EMA or ultrasound would provide more precise evidence about the magnitude and timing of jaw displacement and help clarify whether F1 reliably reflects jaw movement in these contexts.
Finally, our sample included only nine speakers of educated Standard Japanese, which limits the generalizability of the findings. Additional work is needed to assess whether the Edge-Reinforcing Strategy operates similarly in dialects with different prosodic typologies, particularly those with [−lexical pitch accent] and [−multiword AP] structures such as Kagoshima or Kobayashi.

5. Conclusions

This study demonstrates that focus marking in Japanese is not limited to pitch expansion or lexical accent. Our production data show that speakers also employ edge-reinforcing cues—silence, duration, and jaw opening—to restructure prosodic domains. Although the perceptual impact of this strategy remains to be tested, the production evidence indicates that these cues work together to mark focus when pitch alone is insufficient. The findings further suggest that prosodic systems cannot be neatly divided into “pitch-based” versus “edge-based” types (Jun, 2025). Rather, speakers draw flexibly on multiple cues, with silence and duration serving as core markers and jaw opening providing support in contexts where pitch is weak. By highlighting the role of these non-pitch cues, the study broadens current models of prosody and shows how discourse meaning emerges from the coordination of acoustic, articulatory, and structural signals. Future research should examine how listeners weight these cues in perception, test their use in other Japanese dialects, and incorporate direct articulatory methods to clarify the relation between jaw movement and F1. Taken together, these results establish edge reinforcement as a central mechanism for focus marking in Japanese prosody.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/languages10090242/s1. In the file Supplementary_Models, the statistical models for RQ1 and RQ2 are included.

Author Contributions

Conceptualization, M.O.-L. and J.N.; Methodology, M.O.-L.; Software, M.O.-L.; Validation, M.O.-L. and J.N.; Formal analysis, M.O.-L.; Investigation, J.N.; Resources, J.N.; Data curation, M.O.-L. and J.N.; Writing—original draft preparation, M.O.-L.; Writing—review and editing, M.O.-L. and J.N.; Visualization, M.O.-L.; Supervision, M.O.-L.; Project administration, M.O.-L.; Funding acquisition, M.O.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Hewlett International Grant, University of Pittsburgh, STUDY20110234.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of University of Pittsburgh (Approval Code: STUDY20110234, Approval Date: 1 February 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to the participants for their collaboration in this project. The authors used ChatGPT4 to assist with English grammar editing, to generate R code for the statistical models, and to create the tables in Section 3. All output was reviewed and edited by the authors, who take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
  2. Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4), 155–180.
  3. Catford, J. C. (2001). A practical introduction to phonetics (2nd ed., p. 154). Oxford University Press. ISBN 0-19-924635-1.
  4. Erickson, D., & Kawahara, S. (2016). Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistics Vanguard, 2(1), 207–218.
  5. Erickson, D., Kawahara, S., Shibuya, Y., Suemitsu, A., & Tiede, M. (2014). Comparison of jaw displacement patterns of Japanese and American speakers of English: A preliminary report. Journal of the Phonetic Society of Japan, 18(2), 88–94.
  6. Erickson, D., & Niebuhr, O. (2023). Articulation of prosody and rhythm: Some possible applications to language teaching. In Studies in laboratory phonology (pp. 1–45). Language Science Press (langsci-press.org).
  7. Erickson, D., Rilliard, A., Svensson Lundmark, M., Rebollo Couto, L., Silva, A., de Moraes, J., & Niebuhr, O. (2024, September 1–5). Collecting mandible movement in Brazilian Portuguese. Interspeech 2024, Kos Island, Greece.
  8. Han, M. S. (1962). The feature of duration in Japanese. Onsei no Kenkyuu [Studies in Phonetics], 10, 65–80.
  9. Igarashi, Y. (2012). Prosodic typology in Japanese dialects from a cross-linguistic perspective. Lingua, 122(13), 1441–1453.
  10. Igarashi, Y. (2014). Typology of intonational phrasing in Japanese dialects. In S.-A. Jun (Ed.), Prosodic typology II (pp. 123–146). Oxford University Press.
  11. Igarashi, Y. (2018). Standardization and Japanese people’s perception toward languages. Ritsumeikan Kokusai Kenkyu, 31(2), 61–76.
  12. Ishihara, S. (2023a). Focus and prosody in Japanese: Discourse-level influences on pitch accent realization. The Journal of Japanese Linguistics, 39(1), 25–52.
  13. Ishihara, S. (2023b, August 7–11). Prosodic realization of syntactic phrase and clause boundaries in Tokyo Japanese. Proceedings of ICPhS, Prague, Czech Republic.
  14. Jun, S. A. (2014). Prosodic typology: By prominence type, word prosody, and macro-rhythm. In S.-A. Jun (Ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 520–539). Oxford University Press.
  15. Jun, S. A. (2025). Prosodic typology: Intonational tone types and functions. In D. Bradley, K. Dziubalska-Kołaczyk, C. Hamans, I.-H. Lee, & F. Steurs (Eds.), Contemporary linguistics: Integrating languages, communities, and technologies (Vol. 7, pp. 93–111). Brill’s Handbook in Linguistics.
  16. Kawahara, S., Erickson, D., & Suemitsu, A. (2017). The phonetics of jaw displacement in Japanese vowels. Acoustical Science and Technology, 38(2), 99–107.
  17. Kubozono, H. (1993). The organization of Japanese prosody. Kurosio Publishers.
  18. Lee, S., Xiu, Y., & Xu, Y. (2022). Prosodic focus marking in Japanese: Analyzing F0 and duration patterns in narrow focus. Journal of Phonetics, 91, 101123.
  19. Maeda, K., & Venditti, J. J. (1998). Phonological phrasing meets rhythmic constraints: Evidence from Japanese. Proceedings of ICSLP, 2, 693–696.
  20. Mizuguchi, M., & Tateishi, K. (2020). Prosodic focus in Japanese: Reassessing the role of pitch accent. Linguistic Research, 37(2), 211–234.
  21. Mizuguchi, M., & Tateishi, K. (2023). Prominence in a pitch language: The production and perception of Japanese. Rowman & Littlefield.
  22. Mizuguchi, M., & Tateishi, K. (2025). Cues to narrow focus in Japanese: The limits of pitch and role of boundary cues. Journal of East Asian Linguistics. in press.
  23. Murakami, A. (2008). Phonological phrasing in Tokyo Japanese: An experimental approach. Studies in Language Sciences, 7, 97–110.
  24. Nagao, J., & Ortega-Llebaria, M. (2024). Beyond pitch: Exploring duration, intensity, and silence in Japanese focus marking. Phonica, 20, 1–17.
  25. Pierrehumbert, J., & Beckman, M. (1986). Japanese tone structure. MIT Press.
  26. Pierrehumbert, J., & Beckman, M. (1988). The Japanese tone system and prosodic phrasing. In M. Beckman, & J. Kingston (Eds.), Papers in laboratory phonology I (pp. 123–138). Cambridge University Press.
  27. Poerner, N., & Schiel, F. (2018, May 7–12). A web service for pre-segmenting very long transcribed speech recordings. Proceedings of LREC, Miyazaki, Japan.
  28. R Core Team. (2024). R: A language and environment for statistical computing (Version 4.X) [Computer software]. R Foundation for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 14 June 2025).
  29. Sano, S.-i., & Guillemot, C. (2025). Contrast enhancement and the distribution of vowel duration in Japanese. Journal of Phonetics, 108, 101386.
  30. Seo, J., Kim, S., Kubozono, H., & Cho, T. (2019). Preboundary lengthening in Japanese: To what extent do lexical pitch accent and moraic structure matter? The Journal of the Acoustical Society of America, 146(3), 1817–1823.
  31. Shinya, T., Kishida, M., & Kubozono, H. (2004). Focus and phrasing in Tokyo Japanese. Journal of the Phonetic Society of Japan, 8(1), 3–18.
  32. Smiljanić, R., & Bradlow, A. R. (2005). Production and perception of clear speech in Croatian and English. The Journal of the Acoustical Society of America, 118(3), 1677–1688.
  33. Steffman, J., & Zhang, W. (2023). Vowel perception under prominence: Examining the roles of F0, duration, and distributional information. Journal of the Acoustical Society of America, 154, 2594–2608.
  34. Tanaka, Y., Hayashi, N., Maeda, T., & Aizawa, M. (2016). Ichiman-nin kara mita saishin no hōgen, kyōtsūgo ishiki [Latest trends in nationwide language consciousness and standard language of 10,000 people]. NINJAL.
  35. Venditti, J. J. (2000). Discourse structure and attentional salience effects on Japanese intonation [Doctoral dissertation, The Ohio State University].
  36. Venditti, J. J. (2005). The J_ToBI model of Japanese intonation. In S.-A. Jun (Ed.), Prosodic typology: The Phonology of intonation and phrasing (pp. 172–200). Oxford University Press.
  37. Williams, J. C., Erickson, D., Ozaki, Y., Suemitsu, A., Minematsu, N., & Fujimura, O. (2013). Neutralizing differences in jaw displacement for English vowels. Proceedings of International Congress of Acoustics. POMA, 19, 060268.
Figure 1. Idealized pitch track of prosodic domains IP and AP in Tokyo Japanese. The pitch reset of the second IP in (a) or its continuous declination in (b) indicates whether the 2 APs belong to different IPs (as in (a)) or to the same IP (as in (b)). The dotted line emphasizes the pitch resetting in (a) versus the declination of F0 maximum in the APs in (b).
Figure 2. Waveform and F0 tracks of word minimal pair ‘hashi’. In (a) hashi ‘chopsticks’ bears an HL pitch accent while in (b) hashi ‘edge’ is unaccented.
Figure 3. Idealized F0 tracks of Broad Focus (left, (a,c)) and Narrow Focus (right, (b,d)) productions of a 3-word sentence in Kobayashi (a,b) and Koriyama (c,d) dialects.
Figure 4. Soundwave, spectrogram, and TextGrid of “umano hizume” ‘horse’s hoof’ (a,b), and “umano kaseki” ‘horse’s fossil’ (c,d).
Figure 5. The effects of focus on (a) F1 height of -no when word 1 is accented (dotted blue line) or unaccented (solid green line), and (b) on silence insertion, and (c) on word duration ratios. A ratio of 1 indicates similar durations in word 1 and word 2.
Table 1. Set of 48 items. (A = Accented word, U = Unaccented word).
Accent PatternPosition of Narrow FocusNumber of SentencesExamples
AUWord 1
(the first noun + -no)
6tsunóno kubiwa
UA6torano kawá
AA6umáno mimí
UU6sameno kubiwa
 
aUWord 2
(the second noun)
6umáno hizume
uA6ushino tsunó
aA6umáno honé
uU6ushino kazari
Total Number of Sentences48
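The item count in Table 1 follows from crossing accent pattern with focus position. The sketch below only reproduces that arithmetic; the variable names are hypothetical, and the accent-pattern labels are written in upper case for both focus positions as a simplification.

```python
from itertools import product

# Accent patterns of word 1 + word 2 (A = accented, U = unaccented).
accent_patterns = ["AU", "UA", "AA", "UU"]
# Position of Narrow Focus, as listed in Table 1.
focus_positions = ["Word 1", "Word 2"]
# Each accent pattern x focus position cell contains 6 sentences.
SENTENCES_PER_CELL = 6

# Every combination of focus position and accent pattern is one design cell.
cells = list(product(focus_positions, accent_patterns))
total_items = len(cells) * SENTENCES_PER_CELL

print(f"{len(cells)} cells x {SENTENCES_PER_CELL} sentences = {total_items} items")
```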
Table 2. Summary of models and results for RQ1.

| Model | Dependent Variable | Significant Effects | R2m | R2c | Interpretation |
|---|---|---|---|---|---|
| RQ1-M1 | Jaw Opening | Focus × AccentW1 (p = 0.009) | 0.025 | 0.106 | Compensatory cue (unaccented contexts) |
| RQ1-M2 | Silence Insertion | Focus (p < 0.001) | 0.21 | 0.37 | General cue, accent-independent |
| RQ1-M3 | Duration Asymmetry | Focus (p < 0.001), Focus position, AccentW1 (p = 0.048) | 0.17 | 0.26 | Sensitive to focus and position |
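The R2m and R2c columns are read here as the marginal and conditional R² commonly reported for mixed-effects models; this reading is an assumption, since the tables do not define the abbreviations or state how the values were computed. Under the usual variance-partition definition, with $\sigma^2_f$ the variance explained by the fixed effects, $\sigma^2_\alpha$ the summed random-effect variance, and $\sigma^2_\varepsilon$ the residual (or distribution-specific) variance, the two quantities can be sketched as:

```latex
\[
R^2_m = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_c = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
\]
```

On this reading, the difference $R^2_c - R^2_m$ is the share of variance attributed to the random effects. For RQ1-M2, for example, that would be $0.37 - 0.21 = 0.16$.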
Table 3. Summary of models and results for RQ2.

| Model | Dependent Variable | Significant Effects | R2m | R2c | Interpretation |
|---|---|---|---|---|---|
| RQ2-M1 | Jaw Opening | None (Silence × Focus × AccentW1, p = 0.78) | 0.03 | 0.11 | No effect of silence on jaw |
| RQ2-M2 | Silence Insertion | FocusNarrowBroad (p < 0.001), not Jaw Opening | 0.21 | 0.38 | Silence is robust; jaw not predictive |
| RQ2-M3 | Focus (Broad–Narrow) | Silence (z = 5.55, p < 0.001); minor AccentW1 interaction | 0.22 | 0.39 | Silence dominant; jaw may reinforce under low pitch |
Table 4. Summary of models and results for RQ3.

| Model | Dependent Variable | Significant Effects | R2m | R2c | Interpretation |
|---|---|---|---|---|---|
| RQ3-M1 | Jaw Opening | None (Duration × Focus × AccentW1, p > 0.05) | 0.025 | 0.11 | Duration does not modulate jaw opening |
| RQ3-M2 | FocusNB | All cues contribute (p < 0.001); convergence issues | 0.71 | 0.89 | Strong model fit, but unstable estimation |
| RQ3-M3 | FocusNB | Silence (p < 0.001), Duration (p < 0.001) | 0.79 | 0.90 | Additive cue effects without jaw |
| RQ3-M4 | FocusNB | Silence, Duration; Jaw marginal | 0.81 | 0.91 | Jaw adds minimal value when other cues are strong |