Hadoop
OLAP
Kafka
Druid
date | user | url | browser | os | device |
---|---|---|---|---|---|
12.03.17 10:01 | 1 | www.ya.ru | Opera | Windows | DT |
12.03.17 13:44 | 4 | www.st.com | UC | Android | Samsung |
12.03.17 15:23 | 2 | www.ya.ru | Safari | iOS | iPhone |
Lambda architecture!
Nathan Marz
Is that really possible???
Data
Raw Data
Everything else
V1 = F(Raw Data)
V2 = F(V1)
...
User A visited page xxx at 22:30 on 12.03.17 |
User B visited page yyy at 14:15 on 13.03.17 |
User A visited page yyy at 15:27 on 13.03.17 |
... |
User | Page | Visits |
---|---|---|
A | xxx | 1 |
B | yyy | 1 |
A | yyy | 1 |
Page | Visits |
---|---|
xxx | 1 |
yyy | 2 |
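A minimal sketch of how the two tables above could be derived, assuming the raw log is held as a list of `(user, page, time)` tuples. The names `build_v1` and `build_v2` are hypothetical and only mirror the `V1 = F(Raw Data)`, `V2 = F(V1)` notation.

```python
from collections import Counter

# Raw data: immutable visit events (user, page, time), taken from the log above.
raw_data = [
    ("A", "xxx", "12.03.17 22:30"),
    ("B", "yyy", "13.03.17 14:15"),
    ("A", "yyy", "13.03.17 15:27"),
]

def build_v1(events):
    """V1 = F(Raw Data): number of visits per (user, page)."""
    return Counter((user, page) for user, page, _ in events)

def build_v2(v1):
    """V2 = F(V1): number of visits per page, derived from V1 only."""
    v2 = Counter()
    for (_, page), visits in v1.items():
        v2[page] += visits
    return v2

v1 = build_v1(raw_data)
v2 = build_v2(v1)
print(dict(v1))
print(dict(v2))
```

Note that `build_v2` never touches the raw events: each view is a pure function of the layer below it, so any view can be thrown away and rebuilt.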
Raw Data
V1
V2
Recomputational: to account for new data, the computation must be repeated from scratch
Incremental: new data can be added without recomputation
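The difference between the two strategies can be sketched on a page-visit counter. This is an illustration under simple assumptions (events are `(user, page)` pairs, the view is a count per page); the function names `recompute` and `increment` are hypothetical.

```python
from collections import Counter

def recompute(all_events):
    """Recomputational: rebuild the view from the full raw data set."""
    return Counter(page for _, page in all_events)

def increment(view, new_events):
    """Incremental: fold only the new events into a copy of the existing view."""
    updated = view.copy()
    updated.update(page for _, page in new_events)
    return updated

old_events = [("A", "xxx"), ("B", "yyy")]
new_events = [("A", "yyy")]

view = recompute(old_events)
recomputed = recompute(old_events + new_events)  # cost grows with all data
incremental = increment(view, new_events)        # cost grows with new data only

print(recomputed == incremental)
```

Both paths produce the same view here because counting is additive. The recomputational path pays for the whole data set on every update, while the incremental path depends on the existing view being correct.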
"Big Data isn't called "Big Data" for nothing" (c)
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": {
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}
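To make the semantics of the query above easier to follow, here is a rough in-memory emulation of its filter, grouping, aggregation, and having steps. It is only a sketch: it does not talk to Druid and does not reflect how Druid executes queries, the sample rows are invented, and the interval is assumed to be start-inclusive and end-exclusive.

```python
from collections import defaultdict

# Hypothetical sample rows; field names follow the query, values are made up.
rows = [
    {"timestamp": "2012-01-01T10:00:00Z", "country": "US", "device": "phone",
     "carrier": "AT&T", "make": "Apple", "user_count": 80, "data_transfer": 1.5},
    {"timestamp": "2012-01-01T18:00:00Z", "country": "US", "device": "phone",
     "carrier": "AT&T", "make": "Samsung", "user_count": 40, "data_transfer": 2.0},
    {"timestamp": "2012-01-02T09:00:00Z", "country": "US", "device": "phone",
     "carrier": "AT&T", "make": "Apple", "user_count": 50, "data_transfer": 0.5},
    {"timestamp": "2012-01-01T11:00:00Z", "country": "US", "device": "phone",
     "carrier": "Verizon", "make": "Apple", "user_count": 500, "data_transfer": 9.0},
]

def group_by(rows):
    totals = defaultdict(lambda: {"total_usage": 0, "data_transfer": 0.0})
    for r in rows:
        # "filter": carrier == "AT&T" AND (make == "Apple" OR make == "Samsung")
        if r["carrier"] != "AT&T" or r["make"] not in ("Apple", "Samsung"):
            continue
        # "intervals": same-format ISO strings compare correctly as text
        if not ("2012-01-01T00:00:00" <= r["timestamp"] < "2012-01-03T00:00:00"):
            continue
        # "granularity": "day" plus "dimensions": key is (day, country, device)
        key = (r["timestamp"][:10], r["country"], r["device"])
        # "aggregations": longSum over user_count, doubleSum over data_transfer
        totals[key]["total_usage"] += r["user_count"]
        totals[key]["data_transfer"] += r["data_transfer"]
    # "having": keep only groups whose total_usage is greater than 100
    return {k: v for k, v in totals.items() if v["total_usage"] > 100}

result = group_by(rows)
print(result)
```

With these sample rows only the 2012-01-01 group survives: the Verizon row is removed by the filter, and the 2012-01-02 group is dropped by the having clause because its total is 50.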
[
  {
    "version" : "v1",
    "timestamp" : "2012-01-01T00:00:00.000Z",
    "event" : {
      "country" : <some_dim_value_one>,
      "device" : <some_dim_value_two>,
      "total_usage" : <some_value_one>,
      "data_transfer" : <some_value_two>,
      "avg_usage" : <some_avg_usage_value>
    }
  },
  {
    "version" : "v1",
    "timestamp" : "2012-01-01T00:00:12.000Z",
    "event" : {
      "dim1" : <some_other_dim_value_one>,
      "dim2" : <some_other_dim_value_two>,
      "sample_name1" : <some_other_value_one>,
      "sample_name2" : <some_other_value_two>,
      "avg_usage" : <some_other_avg_usage_value>
    }
  },
  ...
]
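A result in the shape shown above can be turned into flat rows for a report or a table. The sketch below assumes the response has already been parsed from JSON into Python objects; the values are invented and the helper `flatten` is hypothetical.

```python
# Hypothetical response in the shape shown above; all values are made up.
response = [
    {"version": "v1", "timestamp": "2012-01-01T00:00:00.000Z",
     "event": {"country": "US", "device": "phone",
               "total_usage": 120, "data_transfer": 3.5}},
]

def flatten(results):
    """Merge each result's timestamp with its "event" fields into one flat row."""
    return [{"timestamp": item["timestamp"], **item["event"]} for item in results]

table = flatten(response)
for row in table:
    print(row)
```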