What is the best practice to update 2 million rows of data in MySQL with Golang?

Since I update by primary id, I have to update the rows one by one. It is very slow even with 2000 goroutines. What is the correct method to update by primary id faster?

---

**Comments:**

**cafxx1985**:

```sql
BEGIN;
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
UPDATE table SET col1=value3 WHERE id=3;
...
UPDATE table SET col1=valueN WHERE id=N;
COMMIT;
```

This way you can update thousands of rows in a single request. Run all of this in one or multiple batches/transactions and you're good to go. As others have pointed out, this is not a Golang problem.

**mcouturier**: Wanted to say the same thing. Use Go to generate an SQL file.

**Zlepper**: Depending on your db library you can batch queries, so you don't have to wait for one query to finish but can send them all to the MySQL server at once.

**xter0**: Use a single SQL query that updates all the rows instead of multiple queries inside multiple goroutines.

**kingname**: No way. Every primary id is unique, so I can't update more than one row with a single query statement.

**Zlepper**: What is the condition for updating the rows? You can use the WHERE clause for things other than id.

Assume you have a table like this:

| Id | value |
|----|-------|
| 1  | 10    |
| 2  | 15    |
| 3  | 6     |

and so on. You can then do `UPDATE t SET value = value * 2 WHERE value > 8;`, which doubles the value in every row where the value is already greater than 8.

**kingname**: My condition is the id, e.g. `where id = 12345` or `where id = 78901`. I use tx to solve this problem:

```go
tx, _ := db.Begin()
tx.Exec("sql", param)
tx.Exec("sql", param)
tx.Commit()
```

P.S.: How can I write code on Reddit?

**skidooer**:

```go
db.Exec("update table set foo = ? where id in (12345, 78901, ...)", param)
```

> P.S.: How can I write code on Reddit?

From the formatting help located right below the comment box: "Lines starting with four spaces are treated like code."
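For reference, here is a minimal Go sketch of the batching approach suggested above: many per-id UPDATEs committed in one transaction via `database/sql` and the go-sql-driver/mysql driver. The table `t`, column `col1`, and the connection string are placeholder assumptions, not details from the original post.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// updateBatch applies one batch of per-id updates inside a single
// transaction, so the commit cost is paid once per batch instead of
// once per row. Table t and column col1 are placeholders.
func updateBatch(db *sql.DB, batch map[int64]string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	stmt, err := tx.Prepare("UPDATE t SET col1 = ? WHERE id = ?")
	if err != nil {
		return err
	}
	defer stmt.Close()

	for id, v := range batch {
		if _, err := stmt.Exec(v, id); err != nil {
			return err
		}
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Example batch; in practice, iterate over the 2M ids in chunks
	// of a few thousand and call updateBatch once per chunk.
	if err := updateBatch(db, map[int64]string{12345: "a", 78901: "b"}); err != nil {
		log.Fatal(err)
	}
}
```

In practice you would split the 2 million ids into chunks of a few thousand rows and call `updateBatch` once per chunk; the speedup comes from amortizing the transaction commit over the whole batch.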
**epiris**: As others have said, this smells like a design issue (an XY problem). As long as you feel you must update the rows one by one the problem will exist, because each record, no matter the concurrency, needs to lock the table while the work is done. Without the reasoning behind your "I have to", no one can really help.

Questions I would ask someone who came to me with this:

- Can you get the updates in smaller batches, e.g. by polling more aggressively, without implications for that system?
- Does this update touch every record in the table? If so you are not editing a table, you are replacing it, so do that instead, as suggested elsewhere in the thread. If this is the case it sounds more like a cache and should be designed on a storage system more suitable for that.
- If the update intervals are fixed and the data is replaced each time, can you change the way you partition your data and shift some responsibility to readers to aggregate across new tables? I.e. insert records into a new table with a naming convention suitable for your retention/roll-off, like `table_update00N`, keep a table with a counter clients can check for the current update range, and delete tables as they roll out of range.

**klauspost**: I suspect this is more of a MySQL issue than a Go issue. In my experience this is one of the tradeoffs between SQL and NoSQL. Your DB is likely to take a table lock, so when you get above a certain number of concurrent operations per second your DB operates serially.

Investigate whether it is possible to shard your table. I haven't experimented with this personally, since I usually go for another type of DB when there are heavy write operations.

Of course you could also look into doing what you are doing in another way, but since you didn't describe the problem you are solving, there is no way for us to help.

**kingname**: I have used Go's tx to solve this problem:

```go
tx, _ := db.Begin()
tx.Exec("sql", param)
tx.Exec("sql", param)
tx.Commit()
```

**mm_5687**: Do it in two queries. This isn't a problem to solve with Go. First, update the IDs so that there will be no conflict with the desired range (e.g. `id = id + 2000001`), then use a second query to set the IDs to the desired values (e.g. `id = id - 2000000`). Alternatively, add a new column, set it to `id + 1`, delete the old column, and rename the new column.

**twek**: In my experience the fastest way to update a large number of rows to unique values that can't be calculated (assuming that's why you think you need multiple queries?) is to use bulk copy: https://github.com/denisenkom/go-mssqldb/blob/master/examples/bulk/bulk.go. I'm not sure there is a MySQL equivalent, but if there is: bulk copy the rows with the primary key and the new value into a temp table, do an UPDATE with an inner join to the temp table, then drop the temp table. MSSQL bcp can do hundreds of thousands of inserts in less than a second.

**Redundancy_**: Your problem is partly that you're changing the primary key very incrementally and doing it with transactions, so you're incurring cost in practically every place you could: you've got the transaction log cost as well as potential table locks. It's a DB issue, not a Go issue.

You have to find a better way of doing it, preferably something that doesn't involve hideous locking or transaction logs. I'd look at creating a new table with the same schema, filling it in a single operation, then deleting your original table and renaming the new one.

**adamcolton**: I did something along the same lines a while back (my memory might be a bit fuzzy and the case might be a little different). What I found to be fastest was `LOAD DATA INFILE` with indexing disabled. In my case I was using Go to pull data from a bunch of sources; I'd write the data to files (broken up into reasonable chunks, 1 GB I think), turn off indexing on the tables, load the data from the files into the database, and re-enable indexing. I think I was working with ~50 million lines of data on an m3.large and could do everything in ~45 minutes. Not sure if that helps.

**ChristophBerger**: The main bottleneck by far when mass-updating a table is the table's indexes, especially if the table has many of them. Every insert or update also has to update all of the indexes, and these individual index updates add up.

A faster way is to drop all indexes before doing the mass insert and recreate them afterwards. This should give a very noticeable performance boost (depending, of course, on the number of indexes).
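Below is a minimal sketch of the "stage the new values, then UPDATE with a join" idea from twek's comment, adapted to MySQL. Table and column names (`t`, `id`, `col1`, `tmp_new`) are illustrative assumptions. Note that MySQL TEMPORARY tables are visible only on the connection that created them, and `database/sql` uses a connection pool, so the sketch pins a single connection for all steps.

```go
package dbupdate

import (
	"context"
	"database/sql"
	"strings"
)

// StageAndJoinUpdate stages the new values in a temporary table and
// applies them to the real table with a single UPDATE ... JOIN.
func StageAndJoinUpdate(ctx context.Context, db *sql.DB, newValues map[int64]string) error {
	if len(newValues) == 0 {
		return nil
	}

	// Pin one connection: TEMPORARY tables exist per connection.
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	if _, err := conn.ExecContext(ctx, `CREATE TEMPORARY TABLE tmp_new (
		id   BIGINT PRIMARY KEY,
		col1 VARCHAR(255) NOT NULL
	)`); err != nil {
		return err
	}
	defer conn.ExecContext(ctx, "DROP TEMPORARY TABLE IF EXISTS tmp_new")

	// Multi-row INSERT into the staging table. For millions of rows,
	// insert in chunks (or use LOAD DATA INFILE) instead of one huge statement.
	placeholders := make([]string, 0, len(newValues))
	args := make([]interface{}, 0, len(newValues)*2)
	for id, v := range newValues {
		placeholders = append(placeholders, "(?,?)")
		args = append(args, id, v)
	}
	if _, err := conn.ExecContext(ctx,
		"INSERT INTO tmp_new (id, col1) VALUES "+strings.Join(placeholders, ","),
		args...); err != nil {
		return err
	}

	// One statement then updates every row that has a staged value.
	_, err = conn.ExecContext(ctx,
		"UPDATE t JOIN tmp_new n ON t.id = n.id SET t.col1 = n.col1")
	return err
}
```

For very large volumes, loading the staging table with `LOAD DATA [LOCAL] INFILE` (as adamcolton describes) is typically much faster than multi-row INSERTs, and dropping secondary indexes before the mass write (as ChristophBerger suggests) helps further.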
**BurpsWangy**: Another misconception about goroutines/threads: they help with responsiveness, they don't increase overall performance. As a matter of fact, the reverse is true: the more goroutines/threads being used, the slower the overall application performance. As suggested here, optimize your SQL calls and do away with all the goroutines doing the handling for you.
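Tying the thread's advice together, here is a hedged sketch of what the original approach could be replaced with: a small, fixed number of workers, each committing batched updates, instead of 2000 goroutines issuing single-row UPDATEs. The worker count, batch contents, and schema are assumptions for illustration only.

```go
package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

type rowUpdate struct {
	ID   int64
	Col1 string
}

// worker drains batches from the channel and commits each batch in its
// own transaction, so one DB connection does a few large commits
// instead of thousands of tiny single-row ones.
func worker(db *sql.DB, batches <-chan []rowUpdate, wg *sync.WaitGroup) {
	defer wg.Done()
	for batch := range batches {
		tx, err := db.Begin()
		if err != nil {
			log.Println("begin:", err)
			continue
		}
		stmt, err := tx.Prepare("UPDATE t SET col1 = ? WHERE id = ?")
		if err != nil {
			tx.Rollback()
			log.Println("prepare:", err)
			continue
		}
		for _, u := range batch {
			if _, err := stmt.Exec(u.Col1, u.ID); err != nil {
				log.Println("exec:", err)
			}
		}
		stmt.Close()
		if err := tx.Commit(); err != nil {
			log.Println("commit:", err)
		}
	}
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const workers = 4 // a handful of connections, not thousands of goroutines
	batches := make(chan []rowUpdate)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go worker(db, batches, &wg)
	}

	// Placeholder input: in reality, stream the 2M updates here in
	// chunks of a few thousand rows per batch.
	batches <- []rowUpdate{{ID: 12345, Col1: "a"}, {ID: 78901, Col1: "b"}}
	close(batches)
	wg.Wait()
}
```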
