How to safely let users define the DB query they want?

With JSON API, it's becoming more standard to let users define the fields they want to receive back from an API call:

GET /articles?fields[articles]=title,body,author&filter[title]=Go

One can imagine this as a SQL query: SELECT title, body, author FROM articles WHERE title = 'Go'

In Go, it might look like: db.Query("SELECT title, body, author FROM articles WHERE title = $1", "Go")

However, is there a way to safely let the user define the fields without being vulnerable to SQL injection?

The following doesn't work, but is akin to what I'm looking to accomplish:

db.Query("SELECT $1, $2, $3 FROM articles WHERE title = $4", "title", "body", "author", "Go")

or better yet:

db.Query("SELECT $1 FROM articles WHERE title = $2", []string{"title", "body", "author"}, "Go")

I know one possible way is to SELECT * and scan the result into a struct, then remove the unnecessary fields and marshal the modified struct into JSON, but that seems tedious.
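For reference, the SELECT * fallback could be sketched roughly as below. The `Article` struct, its JSON tags, and the `pick` helper are assumptions made up for illustration, not code from a real project; the row is marshaled once and only the requested keys are kept.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article mirrors the three columns mentioned in the question (assumed shape).
type Article struct {
	Title  string `json:"title"`
	Body   string `json:"body"`
	Author string `json:"author"`
}

// pick (hypothetical helper) marshals the article, then keeps only the
// requested keys. Unknown keys are ignored.
func pick(a Article, fields []string) ([]byte, error) {
	raw, err := json.Marshal(a)
	if err != nil {
		return nil, err
	}
	var all map[string]json.RawMessage
	if err := json.Unmarshal(raw, &all); err != nil {
		return nil, err
	}
	out := make(map[string]json.RawMessage, len(fields))
	for _, f := range fields {
		if v, ok := all[f]; ok {
			out[f] = v
		}
	}
	return json.Marshal(out)
}

func main() {
	a := Article{Title: "Go", Body: "some text", Author: "everdev"}
	b, err := pick(a, []string{"title", "author"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b))
}
```

This never puts user input into the SQL text, but the database still reads and sends every column.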

Comments:

dinkumator:

If you really need to, have a set of "allowed" columns, and parse the user-provided text into an interpolated list.

Here's some terrible unchecked code, but you get the idea:

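// args is assumed to be the request's url.Values, e.g. r.URL.Query().
allowed := map[string]struct{}{
  "title": {}, "body": {}, "author": {},
}
cleaned := []string{"id"} // always select the id column
// The fields arrive as one comma-separated value, so split before checking.
for _, field := range strings.Split(args.Get("fields[articles]"), ",") {
  if _, ok := allowed[field]; ok {
    cleaned = append(cleaned, field)
  }
}
query := "SELECT " + strings.Join(cleaned, ", ") + " FROM articles"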
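
A slightly fuller sketch of the same allow-list idea, combined with the placeholder from the original question for the filter value. The helper name `buildArticleQuery`, the always-included `id` column, and the PostgreSQL-style `$1` placeholder are assumptions for illustration. Column names are only ever copied from the hard-coded list, and the title filter stays a bound parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedColumns is the hard-coded allow-list; only these names can ever
// reach the SQL text.
var allowedColumns = map[string]bool{
	"title":  true,
	"body":   true,
	"author": true,
}

// buildArticleQuery (hypothetical helper) turns the raw value of
// fields[articles], e.g. "title,body,author", into a SELECT statement.
// Unknown names are dropped silently and duplicates are skipped.
func buildArticleQuery(rawFields string) string {
	cols := []string{"id"} // assumption: the primary key is always returned
	seen := map[string]bool{"id": true}
	for _, f := range strings.Split(rawFields, ",") {
		f = strings.TrimSpace(f)
		if allowedColumns[f] && !seen[f] {
			seen[f] = true
			cols = append(cols, f)
		}
	}
	// The filter value is not interpolated; it is bound later via $1.
	return "SELECT " + strings.Join(cols, ", ") + " FROM articles WHERE title = $1"
}

func main() {
	q := buildArticleQuery("title,author,DROP TABLE articles")
	fmt.Println(q)
	// The query would then be run as: db.Query(q, "Go")
}
```

Here the bogus "DROP TABLE articles" entry never reaches the query, because nothing the user typed is concatenated into it.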

wittywitwitty:

This is similar to my response. You could also change out the allowed columns for "protected" or "hidden" columns that should be ignored.

everdev:

Yes, this approach could work, thanks!

pharrisee:

Sounds like you could use GraphQL?

https://github.com/graphql-go/graphql

https://outcrawl.com/graphql-server-go-google-app-engine/

everdev:

I've looked at it, but the packages for Go look pretty new and don't have much documentation: https://github.com/neelance/graphql-go

wittywitwitty:

You could use transformations. If you were to create an array of "fillable" and "hidden" or "protected" fields it wouldn't matter what was queried since you are essentially sanitizing the query. You could query for all fields or be more specific without worrying about injection attacks.

everdev:

I'm not sure I follow. In essence, I want users to be able to request title, body or author (or any combination of them) from the articles table. I know I can SELECT * and then only return what they asked for, but I'm wondering if there's a safe way to programmatically construct the exact query they want.

wittywitwitty:

See /u/dinkumator's answer for an example similar to what I was talking about. The idea is that you define what fields are available and let that be what you use to programmatically build your query. The user could ask for title, body, author, and DROP TABLE articles but only get back the title, body, and author assuming they were in your list of allowed columns/fields. You could also do the inverse and say they can query for anything except for values in a list of protected or hidden columns/fields.
I'm not at my computer at the moment or I'd give you an example. If that still doesn't make sense I'll try to come back with an example sometime tomorrow.

dinkumator:

I wouldn't recommend the blacklist/"inverse" approach since you'd still have to sanitize the inputs pretty extensively.

wittywitwitty:

There's actually not much additional sanitization needed if you set up the lists so that they map each field/column to the same field/column or even to a custom name. If you set up your map so that title has the value title, you know that the only field going into the query is title. Essentially what I'm talking about are associative arrays where you use the value, never the user's input, as the column name in the query. This approach also lets you expose ugly columns more cleanly in your API if they have really long or prefixed column names.
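
A rough sketch of that mapping, assuming made-up column names (`art_title` and so on) and a hypothetical `selectList` helper. The user's input is only used as a lookup key; the SQL text is built from the map's own keys and values.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldToColumn maps the names exposed by the API to real column names.
// The column names here are invented for illustration.
var fieldToColumn = map[string]string{
	"title":  "art_title",
	"body":   "art_body_text",
	"author": "art_author_name",
}

// selectList (hypothetical helper) builds a SELECT list from map values,
// aliasing each column back to its API name. Unknown fields are rejected.
func selectList(requested []string) (string, error) {
	var parts []string
	for _, name := range requested {
		col, ok := fieldToColumn[name]
		if !ok {
			return "", fmt.Errorf("unknown field %q", name)
		}
		// name is safe here: it matched a hard-coded map key exactly.
		parts = append(parts, col+" AS "+name)
	}
	if len(parts) == 0 {
		return "", fmt.Errorf("no fields requested")
	}
	return strings.Join(parts, ", "), nil
}

func main() {
	list, err := selectList([]string{"title", "author"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("SELECT " + list + " FROM articles")
}
```

This version rejects unknown fields with an error; silently skipping them, as in the earlier example, would work just as well.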

EDIT: I misread your comment and thought you said you would recommend the inverse approach. I agree and would recommend the fillable/allowed approach first.

everdev:

Makes sense, thanks!

titpetric:

The packages database/sql and jmoiron/sqlx support query parameters (and, in sqlx, named parameters), which safely handle any value you pass to them. For field names and table names, I would suggest validating the inputs. Never trust user input.

You can either create a slice with all the valid field names or, if you really don't want to do that by hand, retrieve the column names for a specific table from INFORMATION_SCHEMA (MySQL, PostgreSQL), or issue a DESC [table] in MySQL to get pretty much the same thing. You can then match any query inputs against the actual database schema and reject any that reference non-existent tables or fields.
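
A minimal sketch of the schema lookup, assuming PostgreSQL-style `$1` placeholders (MySQL drivers use `?`) and two hypothetical helpers, `loadColumns` and `checkFields`. The table name can be bound as a parameter here because it is only compared as a string value inside information_schema.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// loadColumns (hypothetical helper) reads the column names of one table
// from information_schema.columns.
func loadColumns(ctx context.Context, db *sql.DB, table string) (map[string]bool, error) {
	rows, err := db.QueryContext(ctx,
		"SELECT column_name FROM information_schema.columns WHERE table_name = $1", table)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	cols := make(map[string]bool)
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		cols[name] = true
	}
	return cols, rows.Err()
}

// checkFields (hypothetical helper) rejects any requested field that is
// not a known column.
func checkFields(requested []string, cols map[string]bool) error {
	for _, f := range requested {
		if !cols[f] {
			return fmt.Errorf("unknown field %q", f)
		}
	}
	return nil
}

func main() {
	// Stand-in for the result of loadColumns, so the sketch runs without a database.
	cols := map[string]bool{"id": true, "title": true, "body": true, "author": true}
	fmt.Println(checkFields([]string{"title", "body"}, cols))
	fmt.Println(checkFields([]string{"title", "password"}, cols))
}
```

Two caveats: filtering on table_name alone can match same-named tables in other schemas, and a list built from the schema exposes every column, so any hidden columns would still need to be removed from it.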

Edit: I know this example is PHP, but the first answer explains in detail why and how this is the case (there are no facilities to escape table or column names).

In the strictest sense, at the database level, prepared statements only allow parameters to be bound for "values" bits of the SQL statement.

One way of thinking of this is "things that can be substituted at runtime execution of the statement without altering its meaning". The table name(s) is not one of those runtime values, as it determines the validity of the SQL statement itself (ie, what column names are valid) and changing it at execution time would potentially alter whether the SQL statement was valid.

lostuserofinterwebs:

You just don't do it. Seriously. Don't think in terms of SQL query, think about implementing a feature that allows accessing said data.

tmornini:

Just ignore it and return all fields.
Bandwidth is incredibly cheap and plentiful.

everdev:

OK, but in some cases the data could be quite large for certain apps. Regardless, is it technically possible to query for user-defined fields safely?

tmornini:

In those cases, make the larger data available elsewhere.

There's NOTHING that says the fields in your response must correspond to the fields in the DB tables...

Is it technically possible? Sure!

wittywitwitty:

Bandwidth may be cheap but depending on the volume and number of fields returned this suggestion could cause performance and overhead issues. It's fine if you only need five fields but it was never stated how many fields could potentially be returned.

tmornini:

If the client side can control the output, and by doing so cause performance and overhead issues, that’s an entirely separate problem.
