Commit 1ff4d05

[docs](function) add REGR aggregate function docs for current and 4.x (#3544)
## Versions

Add REGR aggregate function docs for apache/doris#61352

- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [x] Checked by AI
- [ ] Test Cases Built
1 parent 97abf4a commit 1ff4d05

38 files changed

Lines changed: 2086 additions & 300 deletions

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
---
2+
{
3+
"title": "REGR_AVGX",
4+
"language": "en",
5+
"description": "Returns the average of the independent variable (x) for non-null pairs in a group."
6+
}
7+
---
8+
9+
## Description
10+
11+
Returns the average of the independent variable `x` over non-null `(y, x)` pairs in a group, where `x` is the independent variable and `y` is the dependent variable.
12+
13+
## Syntax
14+
15+
```sql
16+
REGR_AVGX(<y>, <x>)
17+
```
18+
19+
## Parameters
20+
21+
| Parameter | Description |
22+
| -- | -- |
23+
| `<y>` | The dependent variable. Supported type: Double. |
24+
| `<x>` | The independent variable. Supported type: Double. |
25+
26+
## Return Value
27+
28+
Returns a Double value representing the average of `x` for non-null `(y, x)` pairs.
29+
If there are no rows in the group, or every row has a NULL in `y` or `x` (so no complete pair remains), the function returns `NULL`.
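The pair-filtering rule can be sketched in plain Python. This is only an illustration of the semantics described above, not Doris's implementation; `regr_avgx` and `Pair` are hypothetical names introduced for the sketch.

```python
from typing import Optional, Sequence, Tuple

# A pair is written (y, x) to match the argument order of REGR_AVGX(y, x).
Pair = Tuple[Optional[float], Optional[float]]


def regr_avgx(pairs: Sequence[Pair]) -> Optional[float]:
    """Average x over the pairs where both y and x are non-null (illustrative only)."""
    xs = [x for y, x in pairs if y is not None and x is not None]
    if not xs:
        return None  # empty group, or no complete pair
    return sum(xs) / len(xs)


# Rows for id = 2 from the example table, written as (y, x).
group_2 = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)]
print(regr_avgx(group_2))        # 2.5: x = 5 is skipped because its y is NULL
print(regr_avgx([(None, 0.0)]))  # None: the only row has a NULL y
```

A plain average of `x` over the same group would include `x = 5` and give `3.0`, which is why the result can differ from `AVG(x)`.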
30+
31+
## Example
32+
33+
```sql
34+
CREATE TABLE test_regr (
35+
`id` int,
36+
`x` double,
37+
`y` double
38+
) DUPLICATE KEY (`id`)
39+
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
40+
PROPERTIES (
41+
"replication_allocation" = "tag.location.default: 1"
42+
);
43+
44+
INSERT INTO test_regr VALUES
45+
(1, 0, NULL),
46+
(2, 1, 3),
47+
(2, 2, 5),
48+
(2, 3, 7),
49+
(2, 4, 9),
50+
(2, 5, NULL);
51+
```
52+
53+
```sql
54+
SELECT id, REGR_AVGX(y, x) FROM test_regr GROUP BY id ORDER BY id;
55+
```
56+
57+
```text
58+
+------+--------------------+
59+
| id | REGR_AVGX(y, x) |
60+
+------+--------------------+
61+
| 1 | NULL |
62+
| 2 | 2.5 |
63+
+------+--------------------+
64+
```
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
---
2+
{
3+
"title": "REGR_AVGY",
4+
"language": "en",
5+
"description": "Returns the average of the dependent variable (y) for non-null pairs in a group."
6+
}
7+
---
8+
9+
## Description
10+
11+
Returns the average of the dependent variable `y` over non-null `(y, x)` pairs in a group, where `x` is the independent variable and `y` is the dependent variable.
12+
13+
## Syntax
14+
15+
```sql
16+
REGR_AVGY(<y>, <x>)
17+
```
18+
19+
## Parameters
20+
21+
| Parameter | Description |
22+
| -- | -- |
23+
| `<y>` | The dependent variable. Supported type: Double. |
24+
| `<x>` | The independent variable. Supported type: Double. |
25+
26+
## Return Value
27+
28+
Returns a Double value representing the average of `y` for non-null `(y, x)` pairs.
29+
If there are no rows in the group, or every row has a NULL in `y` or `x` (so no complete pair remains), the function returns `NULL`.
30+
31+
## Example
32+
33+
```sql
34+
CREATE TABLE test_regr (
35+
`id` int,
36+
`x` double,
37+
`y` double
38+
) DUPLICATE KEY (`id`)
39+
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
40+
PROPERTIES (
41+
"replication_allocation" = "tag.location.default: 1"
42+
);
43+
44+
INSERT INTO test_regr VALUES
45+
(1, 0, NULL),
46+
(2, 1, 3),
47+
(2, 2, 5),
48+
(2, 3, 7),
49+
(2, 4, 9),
50+
(2, 5, NULL);
51+
```
52+
53+
```sql
54+
SELECT id, REGR_AVGY(y, x) FROM test_regr GROUP BY id ORDER BY id;
55+
```
56+
57+
```text
58+
+------+------------------+
59+
| id | REGR_AVGY(y, x) |
60+
+------+------------------+
61+
| 1 | NULL |
62+
| 2 | 6.0 |
63+
+------+------------------+
64+
```
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
---
2+
{
3+
"title": "REGR_COUNT",
4+
"language": "en",
5+
"description": "Returns the number of non-null (y, x) pairs in a group."
6+
}
7+
---
8+
9+
## Description
10+
11+
Returns the number of non-null `(y, x)` pairs in a group, where `x` is the independent variable and `y` is the dependent variable. If there are no valid non-null pairs, the function returns `0`.
12+
13+
## Syntax
14+
15+
```sql
16+
REGR_COUNT(<y>, <x>)
17+
```
18+
19+
## Parameters
20+
21+
| Parameter | Description |
22+
| -- | -- |
23+
| `<y>` | The dependent variable. Supported type: Double. |
24+
| `<x>` | The independent variable. Supported type: Double. |
25+
26+
## Return Value
27+
28+
Returns a BIGINT value representing the number of non-null `(y, x)` pairs.
29+
If there are no rows in the group, or there are no valid non-null `(y, x)` pairs, the function returns `0`.
30+
31+
## Example
32+
33+
```sql
34+
CREATE TABLE test_regr (
35+
`id` int,
36+
`x` double,
37+
`y` double
38+
) DUPLICATE KEY (`id`)
39+
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
40+
PROPERTIES (
41+
"replication_allocation" = "tag.location.default: 1"
42+
);
43+
44+
INSERT INTO test_regr VALUES
45+
(1, 0, NULL),
46+
(2, 1, 3),
47+
(2, 2, 5),
48+
(2, 3, 7),
49+
(2, 4, 9),
50+
(2, 5, NULL);
51+
```
52+
53+
```sql
54+
SELECT id, REGR_COUNT(y, x) FROM test_regr GROUP BY id ORDER BY id;
55+
```
56+
57+
```text
58+
+------+-------------------+
59+
| id | REGR_COUNT(y, x) |
60+
+------+-------------------+
61+
| 1 | 0 |
62+
| 2 | 4 |
63+
+------+-------------------+
64+
```
65+
66+
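REGR_COUNT counts only pairs in which both `y` and `x` are non-null. Group 1 returns `0` because its only row has a NULL `y`, and group 2 returns `4` because the row with `x = 5` and a NULL `y` is skipped.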
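The counting rule can be sketched in plain Python. This is a hedged illustration of the documented behavior rather than Doris's implementation; `regr_count` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple

# A pair is written (y, x) to match the argument order of REGR_COUNT(y, x).
Pair = Tuple[Optional[float], Optional[float]]


def regr_count(pairs: Sequence[Pair]) -> int:
    """Count the pairs where both y and x are non-null (illustrative only)."""
    return sum(1 for y, x in pairs if y is not None and x is not None)


# Rows from the example table, written as (y, x).
group_1 = [(None, 0.0)]
group_2 = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)]
print(regr_count(group_1))  # 0
print(regr_count(group_2))  # 4
```

Unlike the other REGR functions in this commit, the empty case yields `0` rather than `NULL`, which the sketch mirrors by returning an integer in every case.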

docs/sql-manual/sql-functions/aggregate-functions/regr-intercept.md

Lines changed: 22 additions & 38 deletions
@@ -2,17 +2,13 @@
22
{
33
"title": "REGR_INTERCEPT",
44
"language": "en",
5-
"description": "Returns the intercept of the univariate linear regression line for non-null pairs in a group."
5+
"description": "Returns the intercept of the linear regression line for non-null pairs in a group."
66
}
77
---
88

99
## Description
1010

11-
Returns the intercept of the univariate linear regression line for non-null pairs in a group. It is computed for non-null pairs using the following formula:
12-
13-
`AVG(y) - REGR_SLOPE(y, x) * AVG(x)`
14-
15-
Where `x` is the independent variable and y is the dependent variable.
11+
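Returns the intercept of the linear regression line computed over non-null `(y, x)` pairs in a group, where `x` is the independent variable and `y` is the dependent variable. It is equivalent to `AVG(y) - REGR_SLOPE(y, x) * AVG(x)`, with both averages taken only over the rows where `y` and `x` are both non-null, that is, `REGR_AVGY(y, x) - REGR_SLOPE(y, x) * REGR_AVGX(y, x)`.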
1612

1713
## Syntax
1814

@@ -29,52 +25,40 @@ REGR_INTERCEPT(<y>, <x>)
2925

3026
## Return Value
3127

32-
Returns a Double value representing the intercept of the univariate linear regression line for non-null pairs in a group. If there are no rows, or only rows that contain nulls, the function returns NULL.
28+
Returns a Double value representing the intercept of the linear regression line.
29+
If there are no rows in the group, or every row has a NULL in `y` or `x` (so no complete pair remains), the function returns `NULL`.
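The formula in the Description can be worked through with a small Python sketch. It is an illustration under stated assumptions, not Doris's implementation: `regr_intercept` is a hypothetical helper name, the slope is computed as the least-squares slope over the complete pairs, and returning `None` when every `x` is identical is an assumption of the sketch (the slope is undefined there), not something this page states.

```python
from typing import Optional, Sequence, Tuple

# A pair is written (y, x) to match the argument order of REGR_INTERCEPT(y, x).
Pair = Tuple[Optional[float], Optional[float]]


def regr_intercept(pairs: Sequence[Pair]) -> Optional[float]:
    """Intercept of the least-squares line over complete pairs (illustrative only)."""
    complete = [(y, x) for y, x in pairs if y is not None and x is not None]
    if not complete:
        return None  # empty group, or no complete pair
    n = len(complete)
    mean_y = sum(y for y, _ in complete) / n
    mean_x = sum(x for _, x in complete) / n
    sxx = sum((x - mean_x) ** 2 for _, x in complete)
    if sxx == 0:
        return None  # assumption of this sketch: slope undefined when x is constant
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in complete)
    slope = sxy / sxx
    # Both means come from the complete pairs only, not from all rows.
    return mean_y - slope * mean_x


# Rows for id = 2 from the example table, written as (y, x); they lie on y = 2x + 1.
group_2 = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)]
print(regr_intercept(group_2))  # 1.0
```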
3330

34-
## Examples
31+
## Example
3532

3633
```sql
37-
-- Create sample table
38-
CREATE TABLE test_regr_intercept (
34+
CREATE TABLE test_regr (
3935
`id` int,
40-
`x` int,
41-
`y` int
36+
`x` double,
37+
`y` double
4238
) DUPLICATE KEY (`id`)
4339
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
4440
PROPERTIES (
4541
"replication_allocation" = "tag.location.default: 1"
4642
);
4743

48-
-- Insert sample data
49-
INSERT INTO test_regr_intercept VALUES
50-
(1, 18, 13),
51-
(2, 14, 27),
52-
(3, 12, 2),
53-
(4, 5, 6),
54-
(5, 10, 20);
55-
56-
-- Calculate the linear regression intercept of x and y
57-
SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept;
58-
```
59-
60-
```text
61-
+----------------------+
62-
| REGR_INTERCEPT(y, x) |
63-
+----------------------+
64-
| 5.512931034482759 |
65-
+----------------------+
44+
INSERT INTO test_regr VALUES
45+
(1, 0, NULL),
46+
(2, 1, 3),
47+
(2, 2, 5),
48+
(2, 3, 7),
49+
(2, 4, 9),
50+
(2, 5, NULL);
6651
```
6752

6853
```sql
69-
SELECT REGR_INTERCEPT(y, x) FROM test_regr_intercept where x>100;
54+
SELECT id, REGR_INTERCEPT(y, x) FROM test_regr GROUP BY id ORDER BY id;
7055
```
7156

72-
When there are no rows in the group, the function returns `NULL`.
73-
7457
```text
75-
+----------------------+
76-
| REGR_INTERCEPT(y, x) |
77-
+----------------------+
78-
| NULL |
79-
+----------------------+
58+
+------+------------------------+
59+
| id | REGR_INTERCEPT(y, x) |
60+
+------+------------------------+
61+
| 1 | NULL |
62+
| 2 | 1.0 |
63+
+------+------------------------+
8064
```
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
---
2+
{
3+
"title": "REGR_R2",
4+
"language": "en",
5+
"description": "Returns the coefficient of determination of the linear regression for non-null pairs in a group."
6+
}
7+
---
8+
9+
## Description
10+
11+
Returns the coefficient of determination of the linear regression computed over non-null `(y, x)` pairs in a group, where `x` is the independent variable and `y` is the dependent variable.
12+
13+
## Syntax
14+
15+
```sql
16+
REGR_R2(<y>, <x>)
17+
```
18+
19+
## Parameters
20+
21+
| Parameter | Description |
22+
| -- | -- |
23+
| `<y>` | The dependent variable. Supported type: Double. |
24+
| `<x>` | The independent variable. Supported type: Double. |
25+
26+
## Return Value
27+
28+
Returns a Double value representing the coefficient of determination (R-squared).
29+
- If `REGR_COUNT(y, x) < 1`, the function returns `NULL`.
30+
- If `VAR_POP(x) = 0`, the function returns `NULL`.
31+
- If `VAR_POP(y) = 0`, the function returns `1`.
32+
- Otherwise, the function returns `POWER(CORR(y, x), 2)`.
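The four rules above can be sketched as a decision chain in plain Python. This is a hedged illustration of the listed cases, not Doris's implementation; `regr_r2` is a hypothetical helper name, and the last branch uses `covar_pop^2 / (var_pop(x) * var_pop(y))`, which is algebraically the same as `POWER(CORR(y, x), 2)`.

```python
from typing import Optional, Sequence, Tuple

# A pair is written (y, x) to match the argument order of REGR_R2(y, x).
Pair = Tuple[Optional[float], Optional[float]]


def regr_r2(pairs: Sequence[Pair]) -> Optional[float]:
    """Coefficient of determination over complete pairs (illustrative only)."""
    complete = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(complete)
    if n < 1:
        return None  # REGR_COUNT(y, x) < 1
    mean_y = sum(y for y, _ in complete) / n
    mean_x = sum(x for _, x in complete) / n
    var_x = sum((x - mean_x) ** 2 for _, x in complete) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in complete) / n
    if var_x == 0:
        return None  # VAR_POP(x) = 0
    if var_y == 0:
        return 1.0   # VAR_POP(y) = 0
    covar = sum((x - mean_x) * (y - mean_y) for y, x in complete) / n
    return covar ** 2 / (var_x * var_y)  # equals CORR(y, x) squared


# Groups from the example table, written as (y, x).
groups = {
    1: [(None, 0.0)],
    2: [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)],
    3: [(5.0, 1.0), (7.0, 1.0)],
    4: [(5.0, 1.0), (5.0, 2.0)],
}
for group_id, rows in groups.items():
    print(group_id, regr_r2(rows))
# 1 None
# 2 1.0
# 3 None
# 4 1.0
```

The order of the checks matters: a group with a single complete pair has zero variance in both `x` and `y`, and the `VAR_POP(x) = 0` rule is tested first, so the sketch returns `None` for it.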
33+
34+
## Example
35+
36+
```sql
37+
CREATE TABLE test_regr (
38+
`id` int,
39+
`x` double,
40+
`y` double
41+
) DUPLICATE KEY (`id`)
42+
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
43+
PROPERTIES (
44+
"replication_allocation" = "tag.location.default: 1"
45+
);
46+
47+
INSERT INTO test_regr VALUES
48+
(1, 0, NULL),
49+
(2, 1, 3),
50+
(2, 2, 5),
51+
(2, 3, 7),
52+
(2, 4, 9),
53+
(2, 5, NULL),
54+
(3, 1, 5),
55+
(3, 1, 7),
56+
(4, 1, 5),
57+
(4, 2, 5);
58+
```
59+
60+
```sql
61+
SELECT id, REGR_R2(y, x) FROM test_regr GROUP BY id ORDER BY id;
62+
```
63+
64+
```text
65+
+------+---------------------+
66+
| id | REGR_R2(y, x) |
67+
+------+---------------------+
68+
| 1 | NULL |
69+
| 2 | 1.0 |
70+
| 3 | NULL |
71+
| 4 | 1.0 |
72+
+------+---------------------+
73+
```
74+
75+
Group 3 shows the `VAR_POP(x) = 0` case, so the result is `NULL`, and group 4 shows the `VAR_POP(y) = 0` case, so the result is `1.0`.
