Hi everyone,
I'm trying to go through a file line by line with a Scanner or a Reader, and check whether each line matches a certain pattern.
When a line matches, I'd like to record its position so I can edit it, but file.Seek(0,1) doesn't help: Scanner and Reader are buffered, so the file offset only reflects how far their buffer has read ahead (it moves in steps of 4096 bytes, if I'm not wrong), not the line I'm actually on.
Any idea about how I could tackle that? Thank you!
Edit: why the downvotes, did I miss something?
Comments:
TheMerovius:
Vaglame:You can simply count the bytes you get from the Scanner. This still has a caveat, though, if you have non-Unix line endings. A way around that is to supply your own SplitFunc and have that keep track, but I found the SplitFunc API to be somewhat hard to use, and it includes a bunch of subtleties. Be aware that the code I linked isn't actually well-tested, so it might be totally wrong.
Also, as others have pointed out, keep in mind that editing files is non-trivial unless your replacements are 1:1 in number of bytes. Like, if you want to delete (or shorten) a line, you'd have to move all bytes after that line forward and truncate the file. If you want to insert text, you have to actually take care of buffering the overwritten contents. Like, in a sense, files behave like []byte: you can append, you can copy, but you can't delete or insert (and the common tricks for slices use the fact that the language will hide a bunch of the book-keeping and buffering for you, which you'll have to do manually). Of course, if you change the length of a file, you also invalidate all the other line offsets after it.
So, unless you always want to do 1:1 replacement, it's far easier, more sensible and probably just as performant to write a new file line by line and rename it over the old one.
iCurlmyster:Oh thank you very much that's exactly what I was looking for!
Vaglame:If I understand your question correctly you could open it as an *os.File object and use WriteAt. I mean you would have to record the byte position though
iCurlmyster:Thanks for the answer. It's not exactly the writing part that bothers me, it is the reading part. I have found no way to keep track of the actual position when reading a file with a Scanner or a Reader.
For example if I have two lines
this is line one
this is line two
and I do scanner.Scan(), even if I only get the text of the first line, the file.Seek method returns the offset of the end of the file.
Vaglame:Oh okay, yeah, I read it wrong then.
The only thing I can think of right now is you could keep track of how many bytes you are reading with Reader.Read. But other than that I can’t think of anything off the top of my head with scanner, but I also don’t claim to be a golang guru.
Sorry I can’t be more help than that.
albatr0s:I'll give it a try, thanks :)
shovelpost:It's much easier to open a second file and write the results there, once you are done you just rename it.
Vaglame:You could use Scanner which by default reads the input as a set of lines.
shovelpost:Yes but it reads it using a buffer, which makes it very difficult to record its actual position in the file, so far I haven't found a seek() function that could say what the Scanner has read so far. In short I'm looking for something like this: https://golang.org/pkg/text/scanner/#Scanner.Pos, but for bufio.
Vaglame:I'm trying to go through a file through a Scanner or a Reader line by line, then check if each line matches a certain pattern.
Once this is done, I'd like to record the position of the line to edit it
It might be helpful to tell us why you want to do this. Manipulating files has never been the easiest of tasks. Depending on what you are trying to achieve there might be a better way.
shovelpost:I'm trying to keep a record of the IP addresses of a few devices. I write them in a file with this format:
device1 address1
device2 address2
etc.
So, every time this address changes, I (try to) update this information in the file
Vaglame:I'm trying to keep a record of the IP addresses of a few devices. I write them in a file with this format:
device1 address1
device2 address2
etc.
So, every time this address changes, I (try to) update this information in the file
Based on what you said, I can't see any good reason to save the information in a file. You're only making your life harder.
Save yourself from all the trouble and use an embedded database like Bolt.
tgaz:I'll give it a try, thank you :)
Vaglame:If your file is line-based text, you usually don't edit in-place since line lengths generally change (as /u/albatr0s points out)...
So for a generic "framework/scaffolding" for this problem, I'd use the scanner and output the result to a new temporary file, and when you hit the thing you want to change, you output that instead while discarding the input. Then rename the temp file to the old file. Remove the temp file on error (use a defer and just ignore the error).
Oh, and I have no idea why you were downvoted. Maybe because the question is about a trivial problem and better suited for StackOverflow or such. But the downvoting feels like it breaks the first subreddit rule.
tgaz:Thanks a lot. Would you consider this solution even if the file must be accessed relatively frequently?
Vaglame:It's the only sane option in a POSIX environment, so yes. Depends on the file size, I guess. But if you care about performance, you should probably not be using a text format anyway. More importantly, it's the only atomic way of modifying a file, which solves concurrency correctness.
Just use a buffered writer, or you might pay a penalty for writing the small Scanner fragments.
tgaz:What format would you advise instead of the text one?
0xjnml:That's impossible to say without knowing your use-case. If this is data you have to interface with some other system, you may of course not have a choice.
I've never had the need to modify a text file from Go. For quick things, I'd normally use a shell script with awk or sed, or a Python script. Manipulating text in Go isn't that bad, but for simple text file manipulation, Python code is definitely nicer.
