Best way to write a file handler to serve gzipped files

    <p>Hello, I&#39;m trying to write an HTTP handler that serves gzipped files but falls back to identity if the client does not accept the gzip content-encoding.</p> <p>I see three ways to do this.</p> <p>First is dynamic gzipping. If a client supports it and a file is gzippable, I gzip the file on the fly and then serve it. Interestingly, I noticed that Google does the opposite. They have the gzipped versions of the files stored, but when a client does not accept the gzip content-encoding, the server decompresses the gzipped file on the fly and serves that. This first method is attractive because it is simple to implement, but it can be CPU intensive.</p> <p>Second is static gzipping on the file system, e.g. storing .gz files next to regular files. Another variation on this might be to have a second root folder, one which contains the gzipped files. This is attractive because you can gzip files at the highest compression level to save the most on bandwidth, but it requires some setup.</p> <p>Third is a combination of the first two: gzipping files as they are requested, storing them in an in-memory cache, and serving subsequent requests from clients that accept the gzip content-encoding from there. This combines the simplicity of the first option with the performance benefits of the second option, but the code is significantly more complex to write.</p> <p>Which do you think is the best option?</p> <hr/>**Comments:**<br/><br/>sh41: <pre><p>I&#39;ve worked a lot in this problem area, and I&#39;d like to share my thoughts.</p> <p>To answer what&#39;s the best option, I think you should look at the probability of each condition and multiply by the cost of dealing with it.</p> <p>Ignoring malicious clients, I think the vast majority of HTTP clients today (i.e., browsers) support and accept gzip compression. I don&#39;t know of any that don&#39;t. 
So if you want to optimize for run-time performance (instead of compilation speed), it makes sense to do the work of gzipping in advance, rather than when serving each HTTP request.</p> <p>One factor you haven&#39;t considered in your question is that not all files are worth gzip compressing. I.e., some files, when gzip compressed, will actually become larger than uncompressed.</p> <p>This usually happens when the file is binary and already compressed, such as jpeg, png, zip. It can also happen for text files such as .css or .js, if they&#39;re tiny (because the overhead of gzip headers becomes higher than the savings). It might be a good idea to do the work of detecting which of your static files are worth gzip compressing in advance too (and making note of which ones to avoid trying to compress).</p> <p>The strategy I&#39;ve come up with that seems optimal to me, given my goal of optimizing for run-time performance, is as follows:</p> <ol> <li>If the client doesn&#39;t accept gzip compression, serve file without compression.</li> <li>If the file was determined (in advance) to not be worth compressing, serve file without compression.</li> <li>If we got this far, we&#39;ll serve compressed file. But first, determine Content-Type before compressing (since it&#39;s harder/not possible after compressing).</li> <li>If we have access to a compressed version of the file (compressed in advance), serve those compressed bytes directly.</li> <li>Otherwise, apply gzip compression dynamically and serve that. Optionally, detect if not worth gzip compressing, and revert to serving uncompressed version; but doing so adds extra latency since you can&#39;t start writing until you&#39;ve finished compressing.</li> </ol> <p>That seems to be roughly optimal to me, but it can be tweaked and adjusted depending on specific needs or preferences. 
If anyone has improvement suggestions, I welcome them.</p> <p>This is roughly the algorithm that&#39;s implemented in the <a href=""><code>httpgzip</code> package</a>, see:</p> <p><a href=""></a></p> <p>Another factor that&#39;s important to me is that I want to have static Go binaries with all assets embedded inside, for easier distribution. But I also want to be able to read directly from disk without regenerating/rebuilding during development.</p> <p>As a result, I use <code>httpgzip</code> in combination with <a href=""><code>vfsgen</code></a>. <code>vfsgen</code> runs at <code>go generate</code> time for production use, and generates an <code>http.FileSystem</code> that compresses all files that are worth compressing, and makes note of the ones that are not. Then <code>httpgzip</code> uses all that information to serve static resources in an optimal way. There&#39;s very little work done at run-time to serve files that are gzip compressed; all the heavy lifting happens at <code>go generate</code> time.</p> <p>For development, I use <code>-tags=dev</code> mode, which I&#39;ve set up to read from disk directly for each request. It allows me to modify files on disk and refresh the web page to see changes more quickly.</p> <p>Hope that helps, I&#39;m happy to answer more questions or accept improvement suggestions. I&#39;ve iterated on this strategy for some time, but it&#39;s possible there&#39;s still room for improvement.</p></pre>karma_vacuum123: <pre><p>great comment, and one more thing to add... the best way to optimize delivery of many of the types of relevant files mentioned here is to exploit locality with a CDN</p></pre>raff99: <pre><p><a href="" rel="nofollow"></a></p></pre>nhooyr: <pre><p>that is option 1, is it better than the other options though? I don&#39;t need a package, I can implement any of them myself. 
I&#39;m interested in their advantages/disadvantages.</p></pre>jsabey: <pre><p>I don&#39;t know if this is entirely correct (I don&#39;t know if Accept-Encoding would expect the Range before or after compression), but if you store pre-gzipped files you can easily make use of <a href="" rel="nofollow"></a></p> <p>You would also probably not need to store the ungzipped files, and could implement an io.ReadSeeker to decompress the gzipped files before sending them to the clients that don&#39;t support gzip and still support ServeContent</p> <p>I don&#39;t know if you could do it in reverse easily if gzip wasn&#39;t deterministic</p></pre>RenThraysk: <pre><p>Second also has the advantage of using the best available compressor, like Zopfli.</p> <p><a href="" rel="nofollow"></a></p> <p>It takes considerably longer to compress than gzip, so it is less suited to dynamic approaches, but it can result in better compression ratios. Plus there is a zopflipng for PNG images.</p></pre>karma_vacuum123: <pre><p>use a CDN</p></pre>gohacker: <pre><p>Always serve gzip and ignore the infinitesimal quantity of (perhaps malicious) clients that do not support it.</p></pre>


