Clarifications on annotation #211

Hi there, 
I am working on building a new dataset in Spanish (a polysyllabic language). I have gone through [MakeDiffSinger](https://github.com/openvpi/MakeDiffSinger/tree/main) but I still have some gaps. I would be grateful if you could sanity-check my understanding and share any thoughts you might have.

**Questions for clarification:**
1. _ph_seq_: **Are these sequences of phonemes or of syllables?**
Currently I am using phonemes and their timestamps as provided by MFA, with the pre-trained Spanish model that MFA makes available. Would you recommend training a new one on my own data?

2. _note_dur_: **Should the MIDI notes be estimated over phonemes, syllables, or words?**
For now I have estimated one note per phoneme and assumed ph_dur == note_dur.

3. _ph_num_: **Is this the number of phonemes in each word or in each syllable?**
For now I have assumed it is the number of phonemes in each word.

4. _note_seq_: Do you think [SOME](https://github.com/openvpi/SOME) would suffice for a first pass at this? My guess is yes.

5. _is_slur_: How would you define a slur in this context? I have not found many resources on this topic.
For now I have assumed there are no slurs at all.

6. _SPs and APs_: Would you recommend annotating these manually, or would the [enhance script](https://github.com/openvpi/MakeDiffSinger/blob/main/acoustic_forced_alignment/enhance_tg.py) be OK for a first pass?
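
To make the current assumptions concrete, here is a minimal sketch of how one row is being put together: one note per phoneme, note_dur equal to ph_dur, ph_num counted per word, and no slurs. The `Phone` class, the `build_row` helper, the input layout, and the example labels and note names are all hypothetical and only for illustration; they are not part of MakeDiffSinger, and the field names are simply the ones listed in the questions above.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    label: str    # phoneme label as returned by the aligner
    start: float  # start time in seconds
    end: float    # end time in seconds
    note: str     # note estimated for this phoneme (assumption: one note per phoneme)


def build_row(words):
    """Build one annotation row from a list of words, each a list of Phone.

    Encodes the assumptions stated in the questions, which may be wrong:
    note_dur == ph_dur, ph_num counts phonemes per word, and no slurs.
    """
    phones = [p for word in words for p in word]
    durs = [str(round(p.end - p.start, 6)) for p in phones]
    return {
        "ph_seq": " ".join(p.label for p in phones),
        "ph_dur": " ".join(durs),
        "ph_num": " ".join(str(len(word)) for word in words),  # per word, not per syllable
        "note_seq": " ".join(p.note for p in phones),
        "note_dur": " ".join(durs),                            # assumed equal to ph_dur
        "is_slur": " ".join("0" for _ in phones),              # assumed: no slurs at all
    }


# Made-up example with two words; labels, times, and notes are illustrative only.
words = [
    [Phone("o", 0.0, 0.25, "C4"), Phone("l", 0.25, 0.5, "C4"), Phone("a", 0.5, 1.0, "D4")],
    [Phone("s", 1.0, 1.25, "E4"), Phone("i", 1.25, 2.0, "E4")],
]
row = build_row(words)
for key, value in row.items():
    print(f"{key}: {value}")
```

If any of these assumptions is wrong (for example, if notes should span syllables or words rather than phonemes), the corresponding field in this sketch is what would need to change.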


Thanks!
