Adding the sentiment analysis R appdeploy sample.

jeroenterheerdt · jeroenterheerdt · commit 08cc0b7a206d · 2019-03-01T15:20:09.000-08:00
diff --git a/samples/features/sql-big-data-cluster/app-deploy/RollDice/README.md b/samples/features/sql-big-data-cluster/app-deploy/RollDice/README.md
@@ -12,7 +12,7 @@
 
 ## About this sample
 
-This is a sample [R](https://www.r-project.org/) app, which shows how to run a R script in SQL Server big data cluster. This sample creates an app that simulates the rolling of dice. The code for this sample is in [roll-dice.R](roll-dice.R) The inputs and outputs are shown below.
+This is a sample [R](https://www.r-project.org/) app, which shows how to run a R script in SQL Server big data cluster. This sample creates an app that simulates the rolling of dice. The code for this sample is in [roll-dice.R](roll-dice.R). The inputs and outputs are shown below.
 
 ### Inputs
 |Parameter|Description|
diff --git a/samples/features/sql-big-data-cluster/app-deploy/addpy/README.md b/samples/features/sql-big-data-cluster/app-deploy/addpy/README.md
@@ -12,7 +12,7 @@
 
 ## About this sample
 
-This is a sample [Python](https://www.python.org/) app, which shows how to run a Python script in SQL Server big data cluster. This sample creates an app that adds two whole numbers and returns the result. The code for this sample is in [add.py](add.py) The inputs and outputs are shown below.
+This is a sample [Python](https://www.python.org/) app, which shows how to run a Python script in SQL Server big data cluster. This sample creates an app that adds two whole numbers and returns the result. The code for this sample is in [add.py](add.py). The inputs and outputs are shown below.
 
 ### Inputs
 |Parameter|Description|
diff --git a/samples/features/sql-big-data-cluster/app-deploy/magic8ball/README.md b/samples/features/sql-big-data-cluster/app-deploy/magic8ball/README.md
@@ -12,7 +12,7 @@
 
 ## About this sample
 
-This is a sample [Python](https://www.python.org/) app, which runs a [Magic 8-Ball](https://en.wikipedia.org/wiki/Magic_8-Ball). This sample creates an app that adds requires a question as input and returns an answer to the question. The code for this sample is in [magic8ball.py](magic8ball.py) The inputs and outputs are shown below.
+This is a sample [Python](https://www.python.org/) app, which runs a [Magic 8-Ball](https://en.wikipedia.org/wiki/Magic_8-Ball). This sample creates an app that adds requires a question as input and returns an answer to the question. The code for this sample is in [magic8ball.py](magic8ball.py). The inputs and outputs are shown below.
 
 ### Inputs
 |Parameter|Description|
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/README.md b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/README.md
@@ -0,0 +1,120 @@
+# Sentiment analysis R app using `MicrosoftML` in SQL Server big data cluster
+
+### Contents
+
+[About this sample](#about-this-sample)<br/>
+[Before you begin](#before-you-begin)<br/>
+[Run this sample](#run-this-sample)<br/>
+[Sample details](#sample-details)<br/>
+[Related links](#related-links)<br/>
+
+<a name=about-this-sample></a>
+
+## About this sample
+
+This is a sample [R](https://www.r-project.org/) app, which does sentiment analysis on review text using the `MicrosoftML` package. This sample creates an app in SQL Server big data cluster that accepts a `reviewText` text input and returns the estimate sentiment for it. The scoring uses a pre-trained model, stored in `sentiment.rds`. The code for this sample is in [sentiment.R](sentiment.R). The model file `sentiment.rds` was generated using the [model-training.R](model-training.R) script. You don't need to run the model training again, unless you want to retrain with other data. Also, this sample shows how to pass commands to execute when setting up the container using the `pre-package-install.sh` file which runs `apt install` to install the `MicrosoftML` package.
+The inputs and outputs for this sample are shown below.
+
+### Inputs
+|Parameter|Description|
+|-|-|
+|`reviewText`|The text to score for sentiment|
+
+### Outputs
+|Parameter|Description|
+|-|-|
+|`out`|A data frame detailing the sentiment score for the `reviewText`|
+
+
+<a name=before-you-begin></a>
+
+## Before you begin
+
+To run this sample, you need the following prerequisites.
+
+**Software prerequisites:**
+
+1. SQL Server big data cluster CTP 2.3 or later.
+2. `mssqlctl`. Refer to [installing mssqlctl](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-mssqlctl?view=sqlallproducts-allversions) document on setting up the `mssqlctl` and connecting to a SQL Server 2019 big data cluster.
+
+<a name=run-this-sample></a>
+
+## Run this sample
+
+1. Clone or download this sample on your computer.
+2. Log in to the SQL Server big data cluster using the command below using the IP address of the `endpoint-service-proxy` in your cluster. If you are not familiar with `mssqltctl` you can refer to the [documentation](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions) and then return to this sample.
+
+    ```bash
+    mssqlctl login -e https://<ip-address-of-endpoint-service-proxy>:30777 -u <user-name> -p <password>
+    ```
+3. Deploy the application by running the following command, specifying the folder where your `spec.yaml`, `sentiment.rds` and `sentiment.R` files are located:
+    ```bash
+    mssqlctl app create --spec ./sentiment-analysis
+    ```
+4. Check the deployment by running the following command:
+    ```bash
+    mssqlctl app list -n sentiment-r -v [version]
+    ```
+    Once the app is listed as `Ready` you can continue to the next step.
+5. Test the app by running the following command:
+    ```bash
+    mssqlctl app run -n sentiment-r  -v [version] --input reviewText="Absolutely the best movie experience I have ever had!"
+    ```
+    You should get output like the example below. The result of the sentiment analysis scoring is returned as a data frame in `out`. A `PredictedLabel` equal to `1` indicates the sentiment is deemed positive, whereas a `PredictedLabel` of `0` indicates a negative sentiment. The `Probability.1` indicates the level of certainty for the `PredictedLabel` to be the true sentiment.
+    ```json
+    {
+      "changedFiles": [],
+      "consoleOutput": "Beginning processing data.\nRows Read: 1, Read Time: 8.51154e-05, Transform Time: 1.90735e-06\nBeginning processing data.\nElapsed time: 00:00:00.0364881\nFinished writing 1 rows.\nWriting completed.\n",
+      "errorMessage": "",
+      "outputFiles": {},
+      "outputParameters": {
+        "out": {
+          "PredictedLabel": [
+            "1"
+          ],
+          "Probability.1": [
+            0.6523407697677612
+          ],
+          "Score.1": [
+            0.6293442845344543
+          ]
+        }
+      },
+      "success": true
+    }
+    ```
+6. You can clean up the sample by running the following commands:
+    ```bash
+    # delete app
+    mssqlctl app delete --name sentiment-r --version [version]
+    ```
+
+<a name=sample-details></a>
+
+## Sample details
+
+Please refer to [sentiment.R](sentiment.R) for the code that does loads the pre-trained model and scores the `reviewText`. If you would like to explore the code that trains the model and saves it, see [model-training.R](model-training.R).
+
+### Spec file
+Here is the spec file for this application. As you can see the sample uses the `R` runtime and calls the `handler` method in the `sentiment.R` file, accepting a text input named `reviewText` and returning a data frame named `out`.
+
+```yaml
+name: sentiment-r
+version: v1
+runtime: R
+src: ./sentiment.R
+entrypoint: handler
+replicas: 1
+poolsize: 1
+inputs:
+  reviewText: character
+output:
+  out: data.frame
+```
+
+<a name=related-links></a>
+
+## Related Links
+For more information, see these articles:
+
+[How to deploy and app on SQL Server 2019 big data cluster (preview)](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions)
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/model-training.R b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/model-training.R
@@ -0,0 +1,52 @@
+# Load the MicrosoftML library
+library(MicrosoftML)
+ 
+###
+# Function to read in text file and perform data preparation
+###
+getData <- function(tempDir, dataFile) {
+  
+  # Unzip and read in the file
+  data <- read.csv(unz(temp, dataFile),
+                   sep = "\t")
+  
+  # Add column names. 1st column is the text, 2nd column is the rating
+  colnames(data) <- c("Text", "Rating")
+  
+  # Convert to a string based dataframe
+  data <- data.frame(lapply(data, as.character), stringsAsFactors=FALSE)
+  
+  # Convert the Rating column to numeric
+  data$Rating <- as.integer(data$Rating)
+  
+  return(data)
+}
+ 
+# The data we'll use is the Sentiment Labelled Sentences Data Set
+# http://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
+ 
+# We'll pull the data from the UCI database directly
+# Since it's a zip file, we'll need to store and extract from some local location
+ 
+# So get a local temp location
+temp <- tempfile()
+ 
+# Download the zip file to the temp location
+zipfile <- download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip",temp)
+ 
+# We'll use the imdb_labelled.txt file for training
+dataTrain <- getData(temp, "sentiment labelled sentences/imdb_labelled.txt")
+ 
+# Now let's setup the text featurizer transform
+textTransform = list(featurizeText(vars = c(Features = "Text")))
+ 
+# Train a linear model on featurized text
+model <- rxFastLinear(
+  Rating ~ Features, 
+  data = dataTrain,
+  mlTransforms = textTransform
+)
+ 
+serialized <- rxSerializeModel(model, metadata = NULL, relatimeScoringOnly = FALSE)
+ 
+saveRDS(serialized, file = "sentiment.rds")
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/pre-package-install.sh b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/pre-package-install.sh
@@ -0,0 +1,2 @@
+#!/bin/bash -e
+apt install microsoft-mlserver-mml-r-9.3.0
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/sentiment.R b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/sentiment.R
@@ -0,0 +1,9 @@
+library(MicrosoftML)
+rds <- readRDS("sentiment.rds")
+model <- rxUnserializeModel(rds)
+
+handler <- function(reviewText) {
+  dataFrame = data.frame(Text = reviewText, Rating = as.integer(c(0)), stringsAsFactors = FALSE)
+  result <- rxPredict(model, data = dataFrame)
+  result
+}
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/sentiment.rds b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/sentiment.rds
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/spec.yaml b/samples/features/sql-big-data-cluster/app-deploy/sentiment-analysis/spec.yaml
@@ -0,0 +1,11 @@
+name: sentiment-r
+version: v1
+runtime: R
+src: ./sentiment.R
+entrypoint: handler
+replicas: 1
+poolsize: 1
+inputs:
+  reviewText: character
+output:
+  out: data.frame
diff --git a/samples/features/sql-big-data-cluster/app-deploy/sumofsq/README.md b/samples/features/sql-big-data-cluster/app-deploy/sumofsq/README.md
@@ -12,7 +12,7 @@
 
 ## About this sample
 
-This is a sample [R](https://www.r-project.org/) app, which takes two whole numbers as input and returns the sum of squares (a^2 + b^2). The code for this sample is in [sum_of_squares.R](sum_of_squares.R) The inputs and outputs are shown below.
+This is a sample [R](https://www.r-project.org/) app, which takes two whole numbers as input and returns the sum of squares (a^2 + b^2). The code for this sample is in [sum_of_squares.R](sum_of_squares.R). The inputs and outputs are shown below.
 
 ### Inputs
 |Parameter|Description|

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,2 @@`
	`1`	`+#!/bin/bash -e`
	`2`	`+apt install microsoft-mlserver-mml-r-9.3.0`