hanyupinyin converts Chinese characters into Hanyu
Pinyin in R. It is a modern, vectorized, and self-contained package
inspired by the now-orphaned CRAN package pinyin.
kMandarin), covering ~44k unique
characters.sapply
loops; works natively on character vectors.qiu1), toneless (qiu), or diacritic marks
(qiū) via a single function.You can install the development version of hanyupinyin from GitHub:
# install.packages("devtools")
devtools::install_github("CuiHR17/hanyupinyin")library(hanyupinyin)
# Basic conversion (numeric tones)
to_pinyin("春眠不觉晓")
#> [1] "chun1_mian2_bu4_jue2_xiao3"
# Tone marks via the unified interface
to_pinyin("春眠不觉晓", tone = "marks")
#> [1] "chūn_mián_bù_jué_xiǎo"
# Convenience wrappers
to_pinyin_toneless("中华人民共和国")
#> [1] "zhong_hua_ren_min_gong_he_guo"
to_pinyin_marks("春眠不觉晓")
#> [1] "chūn_mián_bù_jué_xiǎo"
# Initials only
to_pinyin_initials("中华人民共和国")
#> [1] "zhrmghg"
# Polyphone handling
to_pinyin("银行行长", polyphone = TRUE)
#> [1] "yin2_hang2_hang2_zhang3"
to_pinyin("银行行长", polyphone = TRUE, tone = "marks")
#> [1] "yín_háng_háng_zhǎng"
# Generate valid R variable names from Chinese labels
to_varname(c("姓名", "年龄", "性别"))
#> [1] "xing_ming" "nian_ling" "xing_bie"
# URL-friendly slug
to_slug("2026年报告")
#> [1] "2026-nian-bao-gao"Users can extend the built-in phrase table. The reading
argument accepts numeric tones, tone marks, or even toneless syllables.
Syllables should be separated by spaces (underscores and hyphens are
also accepted and normalised automatically).
# Numeric input -- both tone and mark outputs work automatically
add_phrase("测试短语", "ce4 shi4 duan3 yu3")
to_pinyin("测试短语", polyphone = TRUE)
#> [1] "ce4_shi4_duan3_yu3"
to_pinyin("测试短语", polyphone = TRUE, tone = "marks")
#> [1] "cè_shì_duǎn_yǔ"
# Tone-mark input -- numeric tones are derived automatically
add_phrase("和平", "hé píng")
to_pinyin("和平", polyphone = TRUE, tone = "marks")
#> [1] "hé_píng"
# Underscore separators are also accepted
add_phrase("行长", "hang2_zhang3")
to_pinyin("行长", polyphone = TRUE)
#> [1] "hang2_zhang3"
# Inspect stored phrases
list_phrases()
#> phrase tone marks
#> 1 测试短语 ce4 shi4 duan3 yu3 cè shì duǎn yǔ
#> 2 和平 hé píng hé píng
#> 3 行长 hang2 zhang3 háng zhǎngThis package was inspired by the CRAN package pinyin
(Peng Zhao et al.), which was archived in April 2026 after the
maintainer became unreachable. hanyupinyin is a ground-up
rewrite using standard Unicode data and modern R practices.
Dictionary data are derived from the Unicode Unihan Database (https://www.unicode.org/reports/tr38/) and used under the Unicode License.