Encrypting files if you know nothing about encryption?

polaris · 2016-12-30 14:02:29 · 621 views

This resource was shared on 2016-12-30 14:02:29; the information in it may have since developed or changed.

So, I'm sure I'm committing some cardinal sin, but I'm looking to add encryption to a project I'm working on, and I know little to nothing about encryption.

Specifically, I want the user to be able to provide a file (some type of key file on disk) that is used to decrypt bytes as they are read from disk. Likewise, data they write should be encrypted the same way before it hits the disk.

Now, that's the UX I want. Is there a foolproof way to implement this in Go? Since I know so little about encryption, I just want to know which algorithm I should be using, which libraries are good for that, and so on.

Thoughts? Are there some good libraries and crypto algorithms that will let this fool encrypt without fear of introducing some core weakness into the encrypted files?

Thanks to any replies!

Comments:

bear1728:

You might also want to check out this "copy and paste friendly" repo: https://github.com/gtank/cryptopasta
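For a sense of what such copy-and-paste helpers look like, here is a minimal standard-library sketch: AES-256-GCM with a fresh random nonce per message, stored in front of the ciphertext. It is similar in spirit to the encryption helpers in that repo, but the names `NewKey`, `Seal` and `Open` are my own and this is not code copied from it.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// NewKey returns a random 256-bit key suitable for AES-256.
func NewKey() ([]byte, error) {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		return nil, err
	}
	return key, nil
}

// Seal encrypts and authenticates plaintext with AES-256-GCM.
// Output layout: nonce || ciphertext || tag.
func Seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	// A fresh random nonce for every message; never reuse one with the same key.
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Append the ciphertext to the nonce so the output is self-describing.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Open verifies and decrypts data produced by Seal.
func Open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key, err := NewKey()
	if err != nil {
		panic(err)
	}
	sealed, err := Seal(key, []byte("cat picture bytes"))
	if err != nil {
		panic(err)
	}
	plain, err := Open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain)) // cat picture bytes
}
```

Because GCM authenticates as well as encrypts, `Open` returns an error rather than garbage when the key is wrong or the stored bytes were modified.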

throwlikepollock:

Awesome! Appreciate this

qftvfu:

Libsodium is perfect for you: https://jedisct1.gitbooks.io/libsodium/content/

The short version is that developers should almost never be using functions like AES/RSA/SHA256 directly. You should use a higher-level function that smarter people have written. 

Whilst libsodium is for C, the documentation is excellent, and equivalent functions are available in Go. For example, for public-key authenticated encryption, libsodium uses crypto_box_keypair. These functions are available with slightly different names in https://godoc.org/golang.org/x/crypto/nacl/box, or you can use a libsodium wrapper from https://github.com/GoKillers/libsodium-go

tscs37:

NaCl or Libsodium.

ChaCha20 and Ed25519 are almost foolproof crypto if you don't do anything obviously idiotic.
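Note that Ed25519 is a signature scheme, not a cipher: it proves who produced some bytes and that they haven't changed, but it does not hide them. ChaCha20 (usually used as ChaCha20-Poly1305) lives in golang.org/x/crypto/chacha20poly1305 rather than the standard library, so this sketch sticks to the standard library's crypto/ed25519 and only shows signing and verification. The `signAndVerify` helper is hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signAndVerify signs msg with a fresh key pair, then reports whether the
// signature verifies for msg and for a different message.
func signAndVerify(msg, other []byte) (okMsg, okOther bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	sig := ed25519.Sign(priv, msg)
	return ed25519.Verify(pub, msg, sig), ed25519.Verify(pub, other, sig), nil
}

func main() {
	okMsg, okOther, err := signAndVerify([]byte("file contents"), []byte("file c0ntents"))
	if err != nil {
		panic(err)
	}
	// The signature is valid only for the exact bytes that were signed.
	fmt.Println(okMsg, okOther) // true false
}
```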

I strongly recommend learning at least a bit about crypto and its common pitfalls.

comrade-jim:

Watch this:

https://www.youtube.com/watch?v=2r_KMzXB74w&index=20&list=PL2ntRZ1ySWBdliXelGAItjzTMxy2WQh0P

throwlikepollock:

Watching now, thanks!

dchapes:


[I]'m looking to add encryption to a project [I]'m working on and i know little to nothing about encryption.

Don't expect it to be secure and more importantly don't make any claims that it is secure. Encryption solutions fail not because of the underlying math (e.g. AES) but because of implementation flaws. If you are not a cryptographer the chances of using a low-level package like crypto/cipher correctly without adding any implementation flaws is almost non-existent.

[E.g. a semi-recent essay by Bruce Schneier: Cryptography Is Harder Than It Looks]

throwlikepollock:

In the above example, where someone plugs in the key and a program encrypts/decrypts data from disk, what might be some flaws in the implementation?

Is it primarily a matter of using crypto incorrectly (for which I'd likely use the cryptopasta from above)? Or is there more subtle stuff, like not leaving the key in memory for long and only loading it on demand?

I know it's nearly impossible for you to point out pitfalls in my exact program; I'm just trying to get a feel for hypotheticals in a simple crypto design. E.g., if you hand me a secure function and all I have to do is pipe bytes in and bytes out with a key, what am I likely to still screw up?
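One thing that can still go wrong with a sound function like AES-GCM is reusing a nonce under the same key. This sketch (helper names are illustrative) deliberately encrypts two messages with the same key and nonce. Because GCM encrypts in counter mode, the keystream cancels out: XORing the two ciphertexts yields the XOR of the two plaintexts, so knowing one message reveals the other. Nonce reuse also undermines GCM's authentication, which this sketch does not show.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// xor returns a XOR b over the length of the shorter slice.
func xor(a, b []byte) []byte {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]byte, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] ^ b[i]
	}
	return out
}

// reusedNonceLeak encrypts p1 and p2 under one key AND one nonce (the
// mistake) and returns c1 XOR c2 over the encrypted bodies.
func reusedNonceLeak(p1, p2 []byte) ([]byte, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // fixed all-zero nonce, reused: WRONG
	c1 := gcm.Seal(nil, nonce, p1, nil)
	c2 := gcm.Seal(nil, nonce, p2, nil)
	// Drop the trailing 16-byte tags and compare only the encrypted bodies.
	return xor(c1[:len(p1)], c2[:len(p2)]), nil
}

func main() {
	p1 := []byte("attack at dawn!!")
	p2 := []byte("secret password1")
	leak, err := reusedNonceLeak(p1, p2)
	if err != nil {
		panic(err)
	}
	// The key never appears here, yet knowing p1 recovers p2.
	fmt.Println(string(xor(leak, p1))) // secret password1
}
```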

These "Crypto is harder than it looks" style talks are exactly what makes people like me nervous to begin with haha.

epiris:

It's not so much about calling a function; you're a developer, so that part is easy for you. It's about selecting the right functions, understanding the side effects and caveats of their usage, and knowing whether you should call them at all.

Key management is important, but every detail is. How do you know the data you decrypted was in fact data the user encrypted? If an operation fails, why? Did the user enter the wrong key, or was there data tampering? Corruption? How do you handle key rotation in the event of a key compromise or a security flaw in the app that requires responsible disclosure? How much ciphertext can fn(K) produce until it leaks entropy? Does it even leak entropy? Asymmetric or symmetric? Is the size always fixed or arbitrary? How likely is the ciphertext to be observed by bad actors? How many rounds do you need to perform in your key derivation function to protect against offline dictionary attacks? You shouldn't just use the same key for each file, so how do you integrate entropy from file details such as name, date, or a special GUID from a separate table without losing determinism or breaking other invariants you have already settled on? Do you want the user to have plausible deniability in the event of an invasive government investigation? What is legal in the countries your users operate in, and does the lowest common denominator from a legal perspective diminish your security posture as a whole? If so, what do you do?

On and on I ask myself these questions; no one here can tell you the right approach without knowing every minor detail. Just food for thought. Have fun, and think things through carefully!

throwlikepollock:

You shouldn't just use the same key for each file, so how do you integrate entropy from file details such as name, date or a special guid from a separate table without losing determinism or breaking other invariants you have already settled on?

While much of that went over my head, this is an interesting question. In my implementation, the only data I have is the raw bytes to store. I also have the hash, but that will of course be lost once the data is encrypted. No metadata exists.

Is this a fundamental problem? There will be millions of data entries all without any reproducible metadata to be encrypted with.

epiris:

I can't responsibly answer that without a lot of unknowns being defined, but I can give you something to think about. I feel like:

In the implementation I have, the only data I have is the raw bytes to store.

This statement suggests a paradigm similar to block-level encryption (e.g. Linux block devices and LUKS), since you have no metadata, indices, etc. However, the next statement seems to be in tension with it:

There will be millions of data entries all without any reproducible metadata to be encrypted with.

If you have millions of data entries, you have millions of distinct records. If you have distinct records, then you do not store raw bytes; you store encrypted but distinct records. They have meaning to you or the user in some way because they derive from a distinct transaction. They must be retrieved later: by filename, by location, or by iterating over the entire set. I'm sure you don't intend to walk the entire file system each time the user requests a file, so you must have an index or some other way to look these files up. Even if you decrypt the entire file system and load it into in-memory data structures, at some point the records have to be represented as the distinct values originally given.

The point is that they must at least have a canonical location relative to some root, with an association to the user. How do you know, after a user is authenticated, that a group of files belongs to them?

If they exist on the user's local machine and you have a tool that runs there, I really think you will be doing your users more harm than good by pursuing this. Let's say you take the time to get a correct implementation; don't forget you have to support it on an ongoing basis. You have to become a responsible member of the security community: watch for CVEs in your dependencies and make sure you have a mechanism to quickly notify users of vulnerabilities. To act in good faith you take on a great deal of responsibility, including keeping a proper audit trail in case a user ever attempts to enter litigation with you because a flaw in your implementation caused them damages. Maybe your product is free, is on GitHub, just encrypts cat pictures, or has a license that "frees" you from liability; none of that matters when you receive a subpoena, which must be dealt with through legal channels regardless of merit. If a vulnerability is found, or a bug in a future migration from old product versions loses all your customers' data, that is your company and reputation on the line, and the results could be disastrous for your future. I would really think about it carefully!

If that is enough to make you reconsider, you can still provide encryption for your users by engaging with them and being an evangelist of vetted best practices for their platforms. You could write tools or tutorials to help with the setup or configuration of those systems, which is a slippery slope but much less risky than starting from scratch.

I wrote this a good while back for a buddy, but it has some documentation on LUKS encryption. If you are targeting Linux, you can actually allocate a file and mount it as a block device so it can be used as a regular file system. That data can then be synced to Dropbox or other external storage systems securely. LUKS also supports multiple keys, so the user can have backup keys or share their files with approved individuals. It would be very easy to implement scripts for this; around line 150 I touch on this and have an example script in bash.

If you have to support OS X/Windows, I would suggest good write-ups on BitLocker and FileVault. Or you might be able to support all three with VeraCrypt, the spiritual successor of TrueCrypt, though I do not know the status of the project. I only use Linux, which means LUKS and OpenPGP (the GnuPG implementation).

TL;DR: if it's just bytes of data, I would highly recommend block-level encryption from a vetted implementation, to relinquish the responsibilities imposed by implementing this yourself. If you choose to move forward with your own implementation, there are too many unknowns here to even nudge you in the right direction. Don't let this discourage you, and don't take offense; it's written with good intent. I am sure that if you put enough time and resources into this, you can implement it properly; just be careful and patient! Happy coding.

throwlikepollock:

The TLDR; if it's just bytes of data, I would highly recommend block level encryption from a vetted implementation to relinquish the responsibilities imposed by implementing this yourself. If you choose to move forward in your implementation, there are too many unknowns here to even nudge you in the right direction. Don't let this be discouraging or take offense, it's written with good intent. I am sure if you put enough time and resources into this, you can properly implement it, just be careful and patient! Happy coding.

This is really an excellent post, thank you! I don't take offense to it, though it is mildly discouraging haha, but you raise many excellent points.

My implementation is primarily just a locally running web server (think a home-grown S3 implementation), and I wanted to be able to encrypt/decrypt the data on disk so that when the key is not present, the data is "safe". It is also, unfortunately, cross-platform (and free, cat pictures, etc.).

With that said, security is not something I want to take too lightly, and this post was primarily about "foolproof" implementations. I wanted an easy (for the user, my wife, etc.) way to plug and play this encrypted data, without having to spin up TrueCrypt/BitLocker/etc. on our home server. Unfortunately, your post makes me feel as if that is the only sane way to do it.

I must say, it feels weird comparing your comments to the presentation linked above. The two show a stark difference in how I look at adding AES to my data on disk.

I really appreciate the time you took to write this, thank you! I have much thinking to do now.

dchapes:

tl;dr: Make sure not to fall into the all too common fallacy of "I heard AES256 is the best; I just found that lib/function and call it. Hey, now I'm super duper secure and no one can ever attack this!"

epiris mentions many things to consider but there are even more basic things like which "mode" to use with a block cipher (e.g. ECB, CBC, CFB, OFB, etc). For example ECB should never be used unless you have very specific requirements and fully understand the implications; there is a history of products that naively used ECB opening up many attack vectors (e.g. you mention your data has records; if either the record size or block size is an integer multiple of the other an attacker can trivially and undetectably manipulate your data by reordering the data blocks).

Did you make sure to use some kind of authentication with your encryption? If not, do you know what that implies? If so, did you use an HMAC? Correctly? Or did you use a cipher with built-in authentication?

You mentioned "some type of key file on disk": is this a raw key (e.g. from /dev/random with the correct size), a password, a passphrase, or a random file such as an image? If it's anything other than a raw key, how are you converting it into a key (e.g. PBKDF)? Are you aware of the issues of storing such sensitive information "on disk"? Are you aware of how your operating system stores such files, and that it's unlikely there is any way to securely wipe/delete them?
