Golang, Machine Learning and Spark - Need some help from gophers?!

polaris · 558 views
<p>Hi Everyone,</p> <p><strong>Context</strong></p> <p>I have built a machine learning algorithm in Golang for my company's ad network to perform ad click prediction. The ad delivery machines are written in Golang (and do their job like a boss!). Currently I train a model written in Golang (basically a logistic regression model) in real time and export it to every ad delivery machine once a minute. The model needs to be embedded in each machine because the service predicts on ~1 billion ad candidates every day, and introducing any network round trip would probably add unacceptable latency.</p> <p><strong>The issue</strong> </p> <p>I want to reimplement the training of the model using Spark 2.0, as it seems a very advanced platform for ML. My question: does anybody know how I can export a trained model from Spark into a Golang service? It seems Spark uses PMML for exporting models, but I can't find anything in Golang that could load one and then make predictions.</p> <hr/>**Comments:**<br/><br/>chewxy: <pre><blockquote> <p>Currently I train a model written in Golang (basically a logistic regression model) in real time and export it to every ad delivery machine once a minute. The model needs to be embedded in each machine because the service predicts on ~1 billion ad candidates every day, and introducing any network round trip would probably add unacceptable latency.</p> </blockquote> <p>Sorry to be blunt, but are you sure you know what you're doing? I once wrote an RTB system that used fairly advanced ML stuff. We never needed to do all that, and we had to meet DBM/ADX's requirement of returning results in under 50ms.</p> <p>I'd advise rethinking the architecture of your ML program. Use plenty of caches. Depending on your input signals, you can precalculate a lot of things and don't even need Spark. Heck, you can handle plenty of data with bash tools alone. 
But the trick is to flatten your regressions into a simple lookup table and update that lookup table every hour or so. Also, depending on your input, you don't need to run a regression on 1B serves. A small subset is more than enough to make good, robust predictions.</p> <p>Email me if you need more help - my username at gmail.com</p></pre>
