Commit 08cc0b7

Adding the sentiment analysis R appdeploy sample.
1 parent 2c5441f commit 08cc0b7

10 files changed

Lines changed: 198 additions & 4 deletions

samples/features/sql-big-data-cluster/app-deploy/RollDice/README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -12,7 +12,7 @@
1212

1313
## About this sample
1414

15-
This is a sample [R](https://www.r-project.org/) app, which shows how to run an R script in a SQL Server big data cluster. This sample creates an app that simulates the rolling of dice. The code for this sample is in [roll-dice.R](roll-dice.R) The inputs and outputs are shown below.
15+
This is a sample [R](https://www.r-project.org/) app, which shows how to run an R script in a SQL Server big data cluster. This sample creates an app that simulates the rolling of dice. The code for this sample is in [roll-dice.R](roll-dice.R). The inputs and outputs are shown below.
1616

1717
### Inputs
1818
|Parameter|Description|

samples/features/sql-big-data-cluster/app-deploy/addpy/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
1212

1313
## About this sample
1414

15-
This is a sample [Python](https://www.python.org/) app, which shows how to run a Python script in SQL Server big data cluster. This sample creates an app that adds two whole numbers and returns the result. The code for this sample is in [add.py](add.py) The inputs and outputs are shown below.
15+
This is a sample [Python](https://www.python.org/) app, which shows how to run a Python script in SQL Server big data cluster. This sample creates an app that adds two whole numbers and returns the result. The code for this sample is in [add.py](add.py). The inputs and outputs are shown below.
1616

1717
### Inputs
1818
|Parameter|Description|

samples/features/sql-big-data-cluster/app-deploy/magic8ball/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
1212

1313
## About this sample
1414

15-
This is a sample [Python](https://www.python.org/) app, which runs a [Magic 8-Ball](https://en.wikipedia.org/wiki/Magic_8-Ball). This sample creates an app that requires a question as input and returns an answer to the question. The code for this sample is in [magic8ball.py](magic8ball.py) The inputs and outputs are shown below.
15+
This is a sample [Python](https://www.python.org/) app, which runs a [Magic 8-Ball](https://en.wikipedia.org/wiki/Magic_8-Ball). This sample creates an app that requires a question as input and returns an answer to the question. The code for this sample is in [magic8ball.py](magic8ball.py). The inputs and outputs are shown below.
1616

1717
### Inputs
1818
|Parameter|Description|
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
1+
# Sentiment analysis R app using `MicrosoftML` in SQL Server big data cluster
2+
3+
### Contents
4+
5+
[About this sample](#about-this-sample)<br/>
6+
[Before you begin](#before-you-begin)<br/>
7+
[Run this sample](#run-this-sample)<br/>
8+
[Sample details](#sample-details)<br/>
9+
[Related links](#related-links)<br/>
10+
11+
<a name=about-this-sample></a>
12+
13+
## About this sample
14+
15+
This is a sample [R](https://www.r-project.org/) app that performs sentiment analysis on review text using the `MicrosoftML` package. The sample creates an app in a SQL Server big data cluster that accepts a `reviewText` text input and returns the estimated sentiment for it. Scoring uses a pre-trained model stored in `sentiment.rds`. The code for this sample is in [sentiment.R](sentiment.R). The model file `sentiment.rds` was generated using the [model-training.R](model-training.R) script; you don't need to run the model training again unless you want to retrain with other data. This sample also shows how to pass commands to execute when setting up the container by using the `pre-package-install.sh` file, which runs `apt install` to install the `MicrosoftML` package.
16+
The inputs and outputs for this sample are shown below.
17+
18+
### Inputs
19+
|Parameter|Description|
20+
|-|-|
21+
|`reviewText`|The text to score for sentiment|
22+
23+
### Outputs
24+
|Parameter|Description|
25+
|-|-|
26+
|`out`|A data frame detailing the sentiment score for the `reviewText`|
27+
28+
29+
<a name=before-you-begin></a>
30+
31+
## Before you begin
32+
33+
To run this sample, you need the following prerequisites.
34+
35+
**Software prerequisites:**
36+
37+
1. SQL Server big data cluster CTP 2.3 or later.
38+
2. `mssqlctl`. Refer to the [installing mssqlctl](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-mssqlctl?view=sqlallproducts-allversions) document for instructions on setting up `mssqlctl` and connecting to a SQL Server 2019 big data cluster.
39+
40+
<a name=run-this-sample></a>
41+
42+
## Run this sample
43+
44+
1. Clone or download this sample on your computer.
45+
2. Log in to the SQL Server big data cluster with the command below, using the IP address of the `endpoint-service-proxy` in your cluster. If you are not familiar with `mssqlctl`, you can refer to the [documentation](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions) and then return to this sample.
46+
47+
```bash
48+
mssqlctl login -e https://<ip-address-of-endpoint-service-proxy>:30777 -u <user-name> -p <password>
49+
```
50+
3. Deploy the application by running the following command, specifying the folder where your `spec.yaml`, `sentiment.rds` and `sentiment.R` files are located:
51+
```bash
52+
mssqlctl app create --spec ./sentiment-analysis
53+
```
54+
4. Check the deployment by running the following command:
55+
```bash
56+
mssqlctl app list -n sentiment-r -v [version]
57+
```
58+
Once the app is listed as `Ready` you can continue to the next step.
59+
5. Test the app by running the following command:
60+
```bash
61+
mssqlctl app run -n sentiment-r -v [version] --input reviewText="Absolutely the best movie experience I have ever had!"
62+
```
63+
You should get output like the example below. The result of the sentiment analysis scoring is returned as a data frame in `out`. A `PredictedLabel` equal to `1` indicates the sentiment is deemed positive, whereas a `PredictedLabel` of `0` indicates a negative sentiment. The `Probability.1` indicates the level of certainty for the `PredictedLabel` to be the true sentiment.
64+
```json
65+
{
66+
"changedFiles": [],
67+
"consoleOutput": "Beginning processing data.\nRows Read: 1, Read Time: 8.51154e-05, Transform Time: 1.90735e-06\nBeginning processing data.\nElapsed time: 00:00:00.0364881\nFinished writing 1 rows.\nWriting completed.\n",
68+
"errorMessage": "",
69+
"outputFiles": {},
70+
"outputParameters": {
71+
"out": {
72+
"PredictedLabel": [
73+
"1"
74+
],
75+
"Probability.1": [
76+
0.6523407697677612
77+
],
78+
"Score.1": [
79+
0.6293442845344543
80+
]
81+
}
82+
},
83+
"success": true
84+
}
85+
```
86+
6. You can clean up the sample by running the following command:
87+
```bash
88+
# delete app
89+
mssqlctl app delete --name sentiment-r --version [version]
90+
```
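
The JSON returned in step 5 can also be consumed programmatically. The following is a minimal Python sketch, assuming the output of `mssqlctl app run` has been captured as a string. The helper name `summarize_sentiment` is hypothetical and is not part of `mssqlctl`. The sketch also checks an observation about this particular sample output: `Probability.1` appears to equal the logistic function of `Score.1`. That relationship is inferred from the numbers shown above, not taken from the `MicrosoftML` documentation.

```python
import json
import math

# Output of step 5, trimmed to the fields used here.
SAMPLE_OUTPUT = """
{
  "errorMessage": "",
  "outputParameters": {
    "out": {
      "PredictedLabel": ["1"],
      "Probability.1": [0.6523407697677612],
      "Score.1": [0.6293442845344543]
    }
  },
  "success": true
}
"""


def summarize_sentiment(raw_json):
    """Return (label, probability, score) for the first scored row.

    Hypothetical helper: field names are taken from the sample output above.
    """
    result = json.loads(raw_json)
    if not result.get("success"):
        raise RuntimeError(result.get("errorMessage") or "app run failed")
    out = result["outputParameters"]["out"]
    # PredictedLabel is returned as a string: "1" is positive, "0" is negative.
    label = "positive" if out["PredictedLabel"][0] == "1" else "negative"
    return label, out["Probability.1"][0], out["Score.1"][0]


label, probability, score = summarize_sentiment(SAMPLE_OUTPUT)
# Compare the reported probability with the logistic function of the score.
logistic_of_score = 1.0 / (1.0 + math.exp(-score))
print(label, round(probability, 4), round(logistic_of_score, 4))
```

Raising on `"success": false` keeps a failed run from being misread as a negative sentiment.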
91+
92+
<a name=sample-details></a>
93+
94+
## Sample details
95+
96+
Please refer to [sentiment.R](sentiment.R) for the code that loads the pre-trained model and scores the `reviewText`. If you would like to explore the code that trains the model and saves it, see [model-training.R](model-training.R).
97+
98+
### Spec file
99+
Here is the spec file for this application. As you can see, the sample uses the `R` runtime and calls the `handler` method in the `sentiment.R` file, accepting a text input named `reviewText` and returning a data frame named `out`.
100+
101+
```yaml
102+
name: sentiment-r
103+
version: v1
104+
runtime: R
105+
src: ./sentiment.R
106+
entrypoint: handler
107+
replicas: 1
108+
poolsize: 1
109+
inputs:
110+
  reviewText: character
110+
output:
111+
  out: data.frame
113+
```
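
To make the mapping between the spec and the handler concrete, here is a rough Python sketch of how a runtime could use the `entrypoint`, `inputs`, and `output` entries to dispatch a request. This is an illustration only: `invoke`, the `SPEC` dictionary, and the stub `handler` are hypothetical, and the actual app runtime in the big data cluster may work differently.

```python
# Spec fields copied from spec.yaml, written as a plain dictionary.
SPEC = {
    "name": "sentiment-r",
    "version": "v1",
    "entrypoint": "handler",
    "inputs": {"reviewText": "character"},
    "output": {"out": "data.frame"},
}


def handler(reviewText):
    # Stub standing in for the R handler; it does not score anything.
    return {"Text": [reviewText]}


def invoke(spec, functions, **kwargs):
    """Call the entrypoint with the declared inputs and name its result."""
    missing = set(spec["inputs"]) - set(kwargs)
    unexpected = set(kwargs) - set(spec["inputs"])
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    result = functions[spec["entrypoint"]](**kwargs)
    (output_name,) = spec["output"]  # this spec declares exactly one output
    return {output_name: result}


response = invoke(SPEC, {"handler": handler}, reviewText="Great movie")
print(response)
```

The point of the sketch is that the input name on the command line (`reviewText`) must match the handler's argument name, and the handler's return value is reported under the output name (`out`).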
114+
115+
<a name=related-links></a>
116+
117+
## Related links
118+
For more information, see these articles:
119+
120+
[How to deploy an app on SQL Server 2019 big data cluster (preview)](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions)
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
# Load the MicrosoftML library
2+
library(MicrosoftML)
3+
4+
###
5+
# Function to read in text file and perform data preparation
6+
###
7+
getData <- function(tempDir, dataFile) {
8+
9+
# Unzip and read in the file
10+
data <- read.csv(unz(tempDir, dataFile),
11+
sep = "\t", header = FALSE)  # the data file has no header row
12+
13+
# Add column names. 1st column is the text, 2nd column is the rating
14+
colnames(data) <- c("Text", "Rating")
15+
16+
# Convert to a string based dataframe
17+
data <- data.frame(lapply(data, as.character), stringsAsFactors=FALSE)
18+
19+
# Convert the Rating column to numeric
20+
data$Rating <- as.integer(data$Rating)
21+
22+
return(data)
23+
}
24+
25+
# The data we'll use is the Sentiment Labelled Sentences Data Set
26+
# http://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
27+
28+
# We'll pull the data from the UCI database directly
29+
# Since it's a zip file, we'll need to store and extract from some local location
30+
31+
# So get a local temp location
32+
temp <- tempfile()
33+
34+
# Download the zip file to the temp location
35+
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip", temp, mode = "wb")
36+
37+
# We'll use the imdb_labelled.txt file for training
38+
dataTrain <- getData(temp, "sentiment labelled sentences/imdb_labelled.txt")
39+
40+
# Now let's setup the text featurizer transform
41+
textTransform = list(featurizeText(vars = c(Features = "Text")))
42+
43+
# Train a linear model on featurized text
44+
model <- rxFastLinear(
45+
Rating ~ Features,
46+
data = dataTrain,
47+
mlTransforms = textTransform
48+
)
49+
50+
serialized <- rxSerializeModel(model, metadata = NULL, realtimeScoringOnly = FALSE)
51+
52+
saveRDS(serialized, file = "sentiment.rds")
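
For readers who want to inspect the training data without R, the parsing done by `getData` can be approximated in Python. This is a sketch under assumptions: it assumes each line holds a sentence, a tab, and a `0`/`1` rating with no header row, and `parse_labelled_lines` is a hypothetical helper. Quoting is disabled so that quotation marks inside a sentence are kept as literal text.

```python
import csv
import io


def parse_labelled_lines(text):
    """Parse 'sentence<TAB>rating' lines into (text, rating) tuples."""
    reader = csv.reader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for fields in reader:
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sentence, rating = fields
        rows.append((sentence.strip(), int(rating)))
    return rows


# Two made-up lines in the assumed format (not taken from the dataset).
sample = 'A truly "great" film.\t1\nSlow and dull.\t0\n'
rows = parse_labelled_lines(sample)
print(rows)
```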
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
#!/bin/bash -e
2+
apt install -y microsoft-mlserver-mml-r-9.3.0
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
library(MicrosoftML)
2+
rds <- readRDS("sentiment.rds")
3+
model <- rxUnserializeModel(rds)
4+
5+
handler <- function(reviewText) {
6+
dataFrame = data.frame(Text = reviewText, Rating = as.integer(c(0)), stringsAsFactors = FALSE)
7+
result <- rxPredict(model, data = dataFrame)
8+
result
9+
}
Binary file not shown.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
name: sentiment-r
2+
version: v1
3+
runtime: R
4+
src: ./sentiment.R
5+
entrypoint: handler
6+
replicas: 1
7+
poolsize: 1
8+
inputs:
9+
  reviewText: character
9+
output:
10+
  out: data.frame

samples/features/sql-big-data-cluster/app-deploy/sumofsq/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
1212

1313
## About this sample
1414

15-
This is a sample [R](https://www.r-project.org/) app, which takes two whole numbers as input and returns the sum of squares (a^2 + b^2). The code for this sample is in [sum_of_squares.R](sum_of_squares.R) The inputs and outputs are shown below.
15+
This is a sample [R](https://www.r-project.org/) app, which takes two whole numbers as input and returns the sum of squares (a^2 + b^2). The code for this sample is in [sum_of_squares.R](sum_of_squares.R). The inputs and outputs are shown below.
1616

1717
### Inputs
1818
|Parameter|Description|
