Why is string(int) a valid cast operation?

Today I found out that this is valid in Go (unexpectedly):

string(65)

whereas this isn't (as expected):

int(string(65))

Here's proof https://play.golang.org/p/kHHKHb7ISV and here's the "feature" documented https://golang.org/ref/spec#Conversions_to_and_from_a_string_type.
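A minimal sketch of what the linked spec section describes: converting an integer to a string yields the UTF-8 encoding of that integer taken as a Unicode code point, not its decimal digits. The `rune(...)` wrapper is an addition here because `go vet` (Go 1.15 and later) warns about `string(x)` when `x` is an integer type other than `rune` or `byte`. The helper name `fromCodePoint` is hypothetical.

```go
package main

import "fmt"

// fromCodePoint is a hypothetical helper that spells out what string(x)
// does for an integer x: x is treated as a Unicode code point and UTF-8
// encoded.
func fromCodePoint(x int32) string {
	return string(rune(x))
}

func main() {
	fmt.Println(fromCodePoint(65))              // A (the character, not "65")
	fmt.Println(len(fromCodePoint(0x4e16)))     // 3: U+4E16 takes three UTF-8 bytes
	fmt.Println(fromCodePoint(-1) == "\uFFFD")  // true: invalid code points become U+FFFD
}
```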

While I'm trying to get over my disbelief that this is possible, I'd like to understand why in a relatively logical and uneventful type system something like this happened... There must be some historical context for this, but it seems like a big, bad oversight from the language creators.

~~At the very least, I expected all cast operations to be reversible, as it is with all other types I know.~~

EDIT: Removed the part about reversible. Bad wording on my side.

UPDATE: Got an answer from Rob Pike https://www.reddit.com/r/golang/comments/78hxd7/why_is_stringint_a_valid_cast_operation/dou5s1g/

Thanks to everybody who replied for your input and for certain corrections and insights.

Comments:

robpike:

The conversion string(int) was added at the dawn of time because it provided, before fmt, the simplest way to convert an integer to a string. In other words, it was just a hack, nothing principled.

I think it should disappear in Go 2. https://github.com/golang/go/issues/3939

goomba_gibbon:

It definitely seems like a hack given how it behaves in some cases. I take it you're quite familiar with UTF-8, so I'll take your word for it :)

Thanks for creating the issue and linking it here!

Personally I'd prefer to see Itoa everywhere as it's more consistent with Atoi anyway.

dlsniper:

Int to string is a well-understood conversion; there's nothing uncommon about it. It's boring.

String to int brings the following issues: overflows, invalid strings as numbers, strings starting with numbers then characters, empty strings. All of these are cases which can happen, and, sooner or later, will happen. How should the runtime handle this? Panic? Silent error? There is no error returned from casting, so silent would be the only option. But then what if you could handle this better than the runtime chooses to?

That's why it's not always possible to have a reversible cast operation.

Also see https://godoc.org/strconv#Itoa and https://godoc.org/strconv#Atoi and their implementation.

deafmalice:

While I agree that int-to-string is kinda boring, and maybe well understood, I don't see any good reason for it to be accepted by the compiler, especially in a language that is so strictly typed. I can understand an int slice, since it's the same as a byte slice, but a single byte does not make a string...

As for reversible, I did not mean that I expect int(string) to work. That was bad wording on my side. I would much rather int-to-string type casts not be allowed than have something like this.

[deleted]

Sythe2o0:

Just fyi-- string(123) will -not- result in "123", but a character, specifically "{". This is equivalent to python's chr, for example.

titpetric:

I stand corrected. It's also equivalent to chr/ord in PHP, and I guess in many other languages. Somebody already mentioned that you can convert back from a rune, so you'd have to do something like func ord(s string) int { return int(s[0]) }. string(0) is also valid for producing a null char, but I'm not exactly sure you should use it for that purpose, as there's also a decimal/octal notation: https://play.golang.org/p/Ehzlsh6234
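A sketch of the chr/ord pair mentioned here, assuming hypothetical helper names `chr`, `ordByte` and `ordRune`. It shows a pitfall of `int(s[0])`: indexing a string yields its first byte, which equals the code point only for ASCII input, whereas `utf8.DecodeRuneInString` decodes the whole first character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// chr is a hypothetical helper mirroring Python's and PHP's chr.
func chr(cp rune) string { return string(cp) }

// ordByte is the ord from the comment: it returns the first byte of s
// (and panics on an empty string).
func ordByte(s string) int { return int(s[0]) }

// ordRune decodes the first UTF-8 encoded character instead
// (it returns utf8.RuneError, 65533, for an empty string).
func ordRune(s string) int {
	r, _ := utf8.DecodeRuneInString(s)
	return int(r)
}

func main() {
	fmt.Println(chr(123))                            // {
	fmt.Println(ordByte("{"), ordRune("{"))           // 123 123
	fmt.Println(ordByte("\u00e9"), ordRune("\u00e9")) // 195 233
}
```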

ROFLLOLSTER:

Go does not have a strong type system, Haskell does.

Shammyhealz:

That doesn't work, but this does:

int(rune(string(65)[0]))

Strings aren't sized, whereas runes are. What would the type conversion for "aaa" be? Do you add the values of the UTF-8 characters together? Do you just concatenate them like strings? There's not really a sane answer.

If you convert it to a single rune, you know it's only one character, and you can convert that to an int without all of the strange questions.
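To illustrate the point about "aaa": ranging over a string decodes it rune by rune, so each character maps to one integer, but the string as a whole has no single integer value. `codePoints` is a hypothetical helper.

```go
package main

import "fmt"

// codePoints is a hypothetical helper returning the code point of every
// character in s; range over a string decodes UTF-8 into runes.
func codePoints(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, int(r))
	}
	return out
}

func main() {
	fmt.Println(codePoints(string(rune(65)))) // [65]
	fmt.Println(codePoints("aaa"))            // [97 97 97]
}
```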

deafmalice:

Sorry for the confusion on the reversible part. That was bad wording.
I would prefer that string(0) not be allowed. I agree that int("ABC") is dumb.

Shammyhealz:

string(0) has a logical conversion though. You're thinking of it as passing in an int, which is what you're doing in a literal sense, but in the contextual sense, you're passing in the value of a UTF-8 character. It's basically just letting the compiler know that it should start interpreting a given block of memory as a UTF-8 character rather than an int.

They could add a UTF8Value type, but then you end up doing this:

string(UTF8Value(0))

Which doesn't really accomplish anything, is harder to read (I think) and adds a few key presses without getting any benefit for it.

deafmalice:

A string is a slice of bytes. A UTF8 character is just that, a character. Not exactly a slice.

It would've been more logical to do:

string([]byte{1})

Go already makes the programmer write pretty verbose and sometimes hard to read code. Adding readability and saving a few keypresses for a relatively strange edge case is kind of a moot point.

Shammyhealz:

A string is a slice of bytes. A UTF8 character is just that, a character. Not exactly a slice.

If you go further down, a byte is just a type alias for uint8. So a string is a slice of unsigned integers.

string([]byte{1})

That doesn't work for non-ASCII characters. byte is an alias for uint8, so its maximum value is 255. UTF-8 characters are up to 4 bytes, so their values can be way higher than that. If I pass a UTF-8 int in there, it overflows and wraps around.
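A sketch contrasting the two routes from an integer to a string, using hypothetical helpers `viaByte` and `viaRune`. One detail worth noting: the wrap-around only happens for non-constant values; a constant such as `[]byte{300}` is rejected by the compiler as an overflow.

```go
package main

import "fmt"

// viaByte is a hypothetical helper: byte(r) keeps only the low 8 bits of
// a non-constant value before the bytes are turned into a string.
func viaByte(r rune) string {
	return string([]byte{byte(r)})
}

// viaRune is a hypothetical helper: the code point is UTF-8 encoded,
// using as many bytes as it needs.
func viaRune(r rune) string {
	return string(r)
}

func main() {
	fmt.Printf("%q\n", viaByte(300))                  // ",": 300 mod 256 = 44
	fmt.Println(len(viaByte(300)), len(viaRune(300))) // 1 2
}
```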

Not to mention that this code snippet is what Go does under the hood: take the int, convert it to a uint8, and then use that memory as a string. Go is just saving you the step of explicitly converting to a uint8.

To be honest, I think this is the sane way of handling it. Yours is much harder, and involves segmenting an int on memory boundaries to handle non-ASCII characters. In addition, if I work with ASCII and UTF-8, your solution requires me to have different code paths for each due to the differing lengths of characters in those encodings.

Adding readability and saving a few keypresses for a relatively strange edge case is kind of a moot point.

I don't think it's really all that strange of an edge case. Keyboard input is commonly addressed as the ASCII code for the key that was pressed, instead of the character as a string. It's also a feature of C, so I think they preserve it for feature parity.

deafmalice:

Just an update: got an answer from Rob Pike - https://www.reddit.com/r/golang/comments/78hxd7/why_is_stringint_a_valid_cast_operation/dou5s1g/

I was wrong about the byte part. Still, you can always do this:

string([]rune{256, 267, 300})

So my point stands. As for C feature parity, I don't think Go ever strived for that.
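A quick check of the `[]rune` literal above: a rune slice can hold code points above 255, and converting it to a string UTF-8 encodes each element (two bytes apiece for these values). `fromRunes` is a hypothetical helper name.

```go
package main

import "fmt"

// fromRunes is a hypothetical wrapper around the []rune-to-string conversion.
func fromRunes(rs ...rune) string {
	return string(rs)
}

func main() {
	s := fromRunes(256, 267, 300)
	fmt.Println(len([]rune(s)), len(s))    // 3 6
	fmt.Println(s == "\u0100\u010b\u012c") // true
}
```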

titpetric:

https://play.golang.org/p/Ehzlsh6234 - if you want a null terminated string, string(0) is as good a way to produce a null character as any other options. No?

deafmalice:

Appending 0 to a []byte would make more sense, though, both from an intent perspective and from a performance one, as string concatenation results in the string being copied to another memory location.
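A sketch of the two approaches being compared, assuming hypothetical helpers `nullWrapBytes` and `nullWrapConcat`. Both produce the same string; which one is cheaper is what the benchmarks in this thread argue about.

```go
package main

import "fmt"

// nullWrapBytes is a hypothetical helper building "\x00<s>\x00" by
// appending to a byte slice.
func nullWrapBytes(s string) string {
	buf := make([]byte, 0, len(s)+2)
	buf = append(buf, 0)
	buf = append(buf, s...) // appending a string to a []byte is allowed
	buf = append(buf, 0)
	return string(buf) // the conversion copies buf into a new string
}

// nullWrapConcat is a hypothetical helper building the same string by
// concatenation.
func nullWrapConcat(s string) string {
	return string(rune(0)) + s + string(rune(0))
}

func main() {
	fmt.Println(nullWrapBytes("null") == nullWrapConcat("null")) // true
}
```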

titpetric:

Append allocates, and I'm reasonably sure that string(0) might be completely optimized away by the compiler. Go isn't a scripting language ;)
Edit: as I suspected:
5000000               333 ns/op               0 B/op          0 allocs/op

Code:
package bench

import "testing"

func BenchmarkString(b *testing.B) {
        var q string
        for i := 0; i < b.N; i++ {
                for j := 0; j < 1000; j++ {
                        q = string(0) + "null" + string(0)
                }
                q = ""
        }
        println(q)
        b.ReportAllocs()
}

deafmalice:

Seems like you're mistaking compiler optimisations for actual no-allocation behaviour. Because you're using an empty string and always assigning the same thing, the compiler notices that and optimises it away.
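One way to see what the compiler can do with the benchmarked expression: every operand is a constant, so the whole concatenation is itself a constant expression, evaluated at compile time. The sketch below (using `string(rune(0))` to stay `go vet`-clean, and a hypothetical constant name `nullWrapped`) compiles only because of that.

```go
package main

import "fmt"

// nullWrapped is a hypothetical name. The declaration compiles because
// string(rune(0)) and "null" are both constants, so the concatenation is
// folded by the compiler rather than computed at run time.
const nullWrapped = string(rune(0)) + "null" + string(rune(0))

func main() {
	fmt.Println(len(nullWrapped))   // 6
	fmt.Printf("%q\n", nullWrapped) // "\x00null\x00"
}
```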

You'll have to run this on a local machine, since the playground takes too long https://play.golang.org/p/Aiv4lvO-b5.

With compiler optimisation and inlining:

$ go build bmark.go
$ ./bmark -concat
AppendFunc      | Total allocs: 63, Bytes Allocated: 6168958912, AllocPerBytes: 0, String: 1000000000            2.57 ns/op
Append          | Total allocs: 66, Bytes Allocated: 12049299392, AllocPerBytes: 0, String: 2000000000           3.05 ns/op
ConcatFunc      | Total allocs: 0, Bytes Allocated: 0, AllocPerBytes: 0, String: 50000000               28.7 ns/op
String          | Total allocs: 30000000, Bytes Allocated: 60000240, AllocPerBytes: 1, String: 30000000         38.1 ns/op
Empty String    | Total allocs: 0, Bytes Allocated: 0, AllocPerBytes: 0, String: 100000000              10.2 ns/op
Slice           | Total allocs: 1, Bytes Allocated: 16, AllocPerBytes: 0, String: 2000000000             0.68 ns/op
BigConcat       | Total allocs: 1000475, Bytes Allocated: 503995716080, AllocPerBytes: 1, String:  1000000           49594 ns/op

Without optimisation and inlining. Notice how, for some reason, the empty string variant stays at 0 allocations:

$ go build -gcflags '-N -l' bmark.go
$ ./bmark -concat
AppendFunc      | Total allocs: 55, Bytes Allocated: 1034565568, AllocPerBytes: 0, String: 200000000             8.24 ns/op
Append          | Total allocs: 60, Bytes Allocated: 2526492688, AllocPerBytes: 0, String: 500000000             3.45 ns/op
ConcatFunc      | Total allocs: 30000001, Bytes Allocated: 120000480, AllocPerBytes: 1, String: 30000000                45.8 ns/op
String          | Total allocs: 30000000, Bytes Allocated: 60000176, AllocPerBytes: 1, String: 30000000         41.5 ns/op
Empty String    | Total allocs: 0, Bytes Allocated: 0, AllocPerBytes: 0, String: 100000000              11.0 ns/op
Slice           | Total allocs: 1, Bytes Allocated: 16, AllocPerBytes: 0, String: 500000000              3.10 ns/op
BigConcat       | Total allocs: 1000517, Bytes Allocated: 503995719440, AllocPerBytes: 1, String:  1000000           49635 ns/op

TheMerovius:


At the very least, I expected all cast operations to be reversible, as it is with all other types I know.

What does "reversible" mean? There's []byte(string(b)), which will return a value that's different from b, so that's not really reversible. Or there's float64(int(math.Pi)), which isn't equal to math.Pi - or float32(float64(math.Pi)) != float64(float32(math.Pi)).
On a statically type-checked level there is io.Reader(io.ReadCloser(r)), which works, while io.ReadCloser(io.Reader(r)) doesn't. Or, similarly, io.Reader((*io.PipeReader)(r)). We have (chan<- int)((chan int)(x)), which can be okay, while (chan int)((chan<- int)(x)) can't. 2 of these 6 combinations work, the rest don't:
u := []byte(string([]rune("")))
v := []byte([]rune(string("")))
w := string([]rune([]byte("")))
x := string([]byte([]rune("")))
y := []rune(string([]byte("")))
z := []rune([]byte(string("")))
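A sketch confirming which two of the six chains compile (u and y): every step converts to or from `string`, since Go has no direct conversion between `[]byte` and `[]rune`. The helper names are hypothetical, and a non-empty input is used so the lengths differ.

```go
package main

import "fmt"

// bytesViaRunes is chain u: []byte(string([]rune(s))).
func bytesViaRunes(s string) []byte { return []byte(string([]rune(s))) }

// runesViaBytes is chain y: []rune(string([]byte(s))).
func runesViaBytes(s string) []rune { return []rune(string([]byte(s))) }

func main() {
	const word = "h\u00e9llo" // 5 characters, 6 UTF-8 bytes
	fmt.Println(len(bytesViaRunes(word)), len(runesViaBytes(word))) // 6 5

	// The other four need a direct []byte <-> []rune conversion, so e.g.
	// []byte([]rune(word)) is rejected by the compiler.
}
```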

A bunch of these are nit-picky… But I really don't think the generalization you made is valid that way :)

deafmalice:

Good point here. What I meant was to be able to do type1(type2(type1)) without the compiler complaining. I probably went a bit overboard with the reversible part.

Losing some fidelity when going from a bigger type to a smaller one is understandable, but float-to-int stays more or less in the same ballpark, unlike int-to-string.

As for your examples, most go from a specific type to a more generic type, whereas int-to-string kind of jumps the gun, string not coming close to being a "general" kind of int. But they do show that not all conversions are accepted by the type checker.

mcandre:

What kind of string is produced by that? I guess it makes sense as a convenience for ASCII codes and such to be castable to strings, but I agree that this operation is somewhat awkward compared to casting int to rune.

TheMerovius:

It seems like a natural consequence of a) having rune be an int-type, meaning untyped integer constants can be used for them (and 'x' being effectively an untyped integer constant) and b) enabling converting runes to string, i.e. enabling foo(string('x')).
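The two ingredients named above, sketched: `rune` is an alias for `int32`, a rune literal such as `'x'` is an untyped constant whose default type is `rune`, and a rune converts to a string. `runeToString` is a hypothetical helper name.

```go
package main

import "fmt"

// runeToString is a hypothetical wrapper around the rune-to-string conversion.
func runeToString(r rune) string { return string(r) }

func main() {
	var r rune = 'x' // 'x' is an untyped rune constant with value 120
	var i int32 = r  // rune is an alias for int32, so no conversion is needed
	fmt.Println(i)               // 120
	fmt.Println(runeToString(r)) // x
	fmt.Printf("%T\n", 'x')      // int32: the default type of a rune constant
}
```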

Honestly, it seems like an edge-case, but one that won't make a lot of problems in practice, so shrug?

deafmalice:

I do have problems with it... I recently changed a variable from int to string and was hoping for a build failure everywhere it was used, including tests. Because of this conversion, there's no such check, and I'll have to comb through a lot of code for this.

skidooer:


I'll have to comb a lot of code for this.

You should be able to build a pretty simple static analyzer to detect occurrences across your codebase. As an added benefit, you can run it on future code to make sure the same problem doesn't creep back in.

TheMerovius:

That seems admittedly unfortunate (though the problem only seems to matter in cases where you'd convert that variable to another string type again; I agree that it can't be ruled out, though). It might warrant filing a bug to consider changing this for Go 2.

DeedleFake:

It's so that you can do string('C'), I would assume.

binaryblade:

65 is a constant, not a number, and constants have some very special properties. string(65) says to cast the constant to a string, while int(string(65)) says to cast a constant to a string and then cast that string to an int, which fails.

deafmalice:

Try this in the playground:
var d int = 65
c := string(d)
print(c)
