readr::parse_number fails with Japanese data on Windows #1111

Closed
hidekoji opened this issue Jun 17, 2020 · 3 comments

Comments

@hidekoji
Contributor

readr::parse_number fails with Japanese data on Windows. (Tried with readr 1.3.1 on R 4.0.1.)

# Set Japanese Locale on Windows
Sys.setlocale("LC_CTYPE", "Japanese_Japan.932")
#> [1] "Japanese_Japan.932"

# Load required packages.
library(readr)
 
# Steps to produce the output
tmp <- file("https://www.dropbox.com/s/06jmg6k4lnb61p2/flight300.rds?dl=1")
flight <- readRDS(tmp)
test <- readr::parse_number(as.character(flight[[34]]))
#> Error in nchar(x): invalid multibyte string, element 4

Created on 2020-06-17 by the reprex package (v0.3.0)

sessionInfo()
#> R version 4.0.1 (2020-06-06)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
#> [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=English_India.1252    
#> system code page: 932
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.1  magrittr_1.5    tools_4.0.1     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.2  
#>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.14      
#> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14

@jimhester
Collaborator

You need to give parse_number() the locale if the data is not UTF-8; it does not assume the native locale.

@hidekoji
Contributor Author

Thank you for the update. It still fails even when I pass the locale argument.

# Set Japanese Locale on Windows
Sys.setlocale("LC_CTYPE", "Japanese_Japan.932")
#> [1] "Japanese_Japan.932"

# Load required packages.
library(readr)
 
# Steps to produce the output
tmp <- file("https://www.dropbox.com/s/06jmg6k4lnb61p2/flight300.rds?dl=1")
flight <- readRDS(tmp)
test <- readr::parse_number(as.character(flight[[34]]), locale=readr::locale(encoding ="CP932"))
#> Error in nchar(x): invalid multibyte string, element 4

Created on 2020-06-17 by the reprex package (v0.3.0)

@jimhester
Collaborator

Should be fixed by #1152
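
For anyone on a readr version without that fix, one possible workaround is to convert the strings to UTF-8 before calling parse_number(), so that no encoding has to be passed through the locale. This is only a sketch under assumptions: the sample string below is hypothetical (it is not taken from flight300.rds), it assumes the problem column really holds CP932 bytes, and it has not been confirmed in this thread to avoid the nchar() error on Windows.

```r
library(readr)

# Hypothetical sample value: a number followed by the yen character (U+5186).
x_utf8 <- "1,234\u5186"

# Simulate data stored as CP932 bytes (assumption about the real column).
x_cp932 <- iconv(x_utf8, from = "UTF-8", to = "CP932")

# Convert to UTF-8 first; iconv() returns NA for input it cannot convert.
x_fixed <- iconv(x_cp932, from = "CP932", to = "UTF-8")
stopifnot(!anyNA(x_fixed), validUTF8(x_fixed))

# Parse with the default (UTF-8) locale.
result <- parse_number(x_fixed)
```

If iconv() returns NA for some elements, the column is probably not uniformly CP932, and checking Encoding() on the original vector may help narrow down which elements are affected.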
