Add support for transcoding String CSV
GitHub: fix #254

Syntax is "from-encoding:to-encoding".

Reported by Richard Stueven. Thanks!!!
kou committed Jul 15, 2022
1 parent 6ff170e commit 3eeaeef
Showing 2 changed files with 43 additions and 1 deletion.
13 changes: 12 additions & 1 deletion lib/csv.rb
@@ -1889,8 +1889,19 @@ def initialize(data,
       raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?

       if data.is_a?(String)
+        if encoding
+          if encoding.is_a?(String)
+            data_external_encoding, data_internal_encoding = encoding.split(":", 2)
+            if data_internal_encoding
+              data = data.encode(data_internal_encoding, data_external_encoding)
+            else
+              data = data.dup.force_encoding(data_external_encoding)
+            end
+          else
+            data = data.dup.force_encoding(encoding)
+          end
+        end
         @io = StringIO.new(data)
-        @io.set_encoding(encoding || data.encoding)
       else
         @io = data
       end
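
The added branch treats the `encoding:` option in three ways when the input is a String: a "from-encoding:to-encoding" string transcodes the data, while a single encoding name or an `Encoding` object only relabels the bytes. A minimal standalone sketch of that decision follows. The helper name `apply_encoding` is hypothetical and is not part of the csv library; it only mirrors the logic in the hunk above.

```ruby
# Hypothetical helper mirroring the branch added to CSV#initialize.
# The name apply_encoding is an assumption made for illustration.
def apply_encoding(data, encoding)
  return data unless encoding

  if encoding.is_a?(String)
    external, internal = encoding.split(":", 2)
    if internal
      # "from:to" form: convert the bytes from external to internal.
      data.encode(internal, external)
    else
      # Single name: keep the bytes and only change the encoding label.
      data.dup.force_encoding(external)
    end
  else
    # Encoding object: keep the bytes and only change the encoding label.
    data.dup.force_encoding(encoding)
  end
end

utf8 = "\u3042\u3044\u3046"

# Transcoding: the result holds EUC-JP bytes and is labeled EUC-JP.
transcoded = apply_encoding(utf8, "UTF-8:EUC-JP")

# Relabeling: EUC-JP bytes that were mislabeled as UTF-8 get the right label back.
relabeled = apply_encoding(utf8.encode("EUC-JP").force_encoding("UTF-8"), "EUC-JP")
```

In both relabeling branches the input is duplicated first, so the caller's String is never mutated.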
31 changes: 31 additions & 0 deletions test/csv/test_encodings.rb
@@ -288,6 +288,37 @@ def test_invalid_encoding_row_error
                  error.message)
   end

+  def test_string_input_transcode
+    # U+3042 HIRAGANA LETTER A
+    # U+3044 HIRAGANA LETTER I
+    # U+3046 HIRAGANA LETTER U
+    value = "\u3042\u3044\u3046"
+    csv = CSV.new(value, encoding: "UTF-8:EUC-JP")
+    assert_equal([[value.encode("EUC-JP")]],
+                 csv.read)
+  end
+
+  def test_string_input_set_encoding_string
+    # U+3042 HIRAGANA LETTER A
+    # U+3044 HIRAGANA LETTER I
+    # U+3046 HIRAGANA LETTER U
+    value = "\u3042\u3044\u3046".encode("EUC-JP")
+    csv = CSV.new(value.dup.force_encoding("UTF-8"), encoding: "EUC-JP")
+    assert_equal([[value.encode("EUC-JP")]],
+                 csv.read)
+  end
+
+  def test_string_input_set_encoding_encoding
+    # U+3042 HIRAGANA LETTER A
+    # U+3044 HIRAGANA LETTER I
+    # U+3046 HIRAGANA LETTER U
+    value = "\u3042\u3044\u3046".encode("EUC-JP")
+    csv = CSV.new(value.dup.force_encoding("UTF-8"),
+                  encoding: Encoding.find("EUC-JP"))
+    assert_equal([[value.encode("EUC-JP")]],
+                 csv.read)
+  end
+
   private

   def assert_parses(fields, encoding, **options)
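
The two `set_encoding` tests pass a String whose bytes are EUC-JP but whose label is UTF-8. The short sketch below, which uses only core String methods and no csv code, shows what that setup looks like at the byte level. The variable names are illustrative.

```ruby
utf8 = "\u3042\u3044\u3046"        # three hiragana letters, UTF-8 encoded
euc_jp = utf8.encode("EUC-JP")     # same characters, EUC-JP bytes

# Same bytes as euc_jp, deliberately mislabeled as UTF-8,
# like the input passed to CSV.new in the two set_encoding tests.
mislabeled = euc_jp.dup.force_encoding("UTF-8")

# Relabeling with a name or an Encoding object restores a valid string
# without touching the bytes.
by_name = mislabeled.dup.force_encoding("EUC-JP")
by_object = mislabeled.dup.force_encoding(Encoding.find("EUC-JP"))

puts mislabeled.valid_encoding?
puts by_name == euc_jp
puts by_object == euc_jp
```

The mislabeled String is not valid UTF-8, which is why those tests need relabeling rather than transcoding: `String#encode` from UTF-8 would raise on these bytes.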
