| 1 | +# Sentiment analysis R app using `MicrosoftML` in SQL Server big data cluster |
| 2 | + |
| 3 | +### Contents |
| 4 | + |
| 5 | +[About this sample](#about-this-sample)<br/> |
| 6 | +[Before you begin](#before-you-begin)<br/> |
| 7 | +[Run this sample](#run-this-sample)<br/> |
| 8 | +[Sample details](#sample-details)<br/> |
| 9 | +[Related links](#related-links)<br/> |
| 10 | + |
| 11 | +<a name=about-this-sample></a> |
| 12 | + |
| 13 | +## About this sample |
| 14 | + |
| 15 | +This sample is an [R](https://www.r-project.org/) app that performs sentiment analysis on review text using the `MicrosoftML` package. It creates an app in a SQL Server big data cluster that accepts a `reviewText` text input and returns the estimated sentiment for it. Scoring uses a pre-trained model stored in `sentiment.rds`. The code for this sample is in [sentiment.R](sentiment.R). The model file `sentiment.rds` was generated with the [model-training.R](model-training.R) script; you don't need to run the training again unless you want to retrain with other data. The sample also shows how to pass commands to execute when setting up the container: the `pre-package-install.sh` file runs `apt install` to install the `MicrosoftML` package. |
| 16 | +The inputs and outputs for this sample are shown below. |
| 17 | + |
| 18 | +### Inputs |
| 19 | +|Parameter|Description| |
| 20 | +|-|-| |
| 21 | +|`reviewText`|The text to score for sentiment| |
| 22 | + |
| 23 | +### Outputs |
| 24 | +|Parameter|Description| |
| 25 | +|-|-| |
| 26 | +|`out`|A data frame detailing the sentiment score for the `reviewText`| |
| 27 | + |
| 28 | + |
| 29 | +<a name=before-you-begin></a> |
| 30 | + |
| 31 | +## Before you begin |
| 32 | + |
| 33 | +To run this sample, you need the following prerequisites. |
| 34 | + |
| 35 | +**Software prerequisites:** |
| 36 | + |
| 37 | +1. SQL Server big data cluster CTP 2.3 or later. |
| 38 | +2. `mssqlctl`. Refer to the [installing mssqlctl](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-mssqlctl?view=sqlallproducts-allversions) document for instructions on setting up `mssqlctl` and connecting to a SQL Server 2019 big data cluster. |
| 39 | + |
| 40 | +<a name=run-this-sample></a> |
| 41 | + |
| 42 | +## Run this sample |
| 43 | + |
| 44 | +1. Clone or download this sample on your computer. |
| 45 | +2. Log in to the SQL Server big data cluster with the command below, using the IP address of the `endpoint-service-proxy` in your cluster. If you are not familiar with `mssqlctl`, you can refer to the [documentation](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions) and then return to this sample. |
| 46 | + |
| 47 | + ```bash |
| 48 | + mssqlctl login -e https://<ip-address-of-endpoint-service-proxy>:30777 -u <user-name> -p <password> |
| 49 | + ``` |
| 50 | +3. Deploy the application by running the following command, specifying the folder where your `spec.yaml`, `sentiment.rds` and `sentiment.R` files are located: |
| 51 | + ```bash |
| 52 | + mssqlctl app create --spec ./sentiment-analysis |
| 53 | + ``` |
| 54 | +4. Check the deployment by running the following command, replacing `[version]` with the version from the spec file (`v1` in this sample): |
| 55 | + ```bash |
| 56 | + mssqlctl app list -n sentiment-r -v [version] |
| 57 | + ``` |
| 58 | + Once the app is listed as `Ready` you can continue to the next step. |
| 59 | +5. Test the app by running the following command: |
| 60 | + ```bash |
| 61 | + mssqlctl app run -n sentiment-r -v [version] --input reviewText="Absolutely the best movie experience I have ever had!" |
| 62 | + ``` |
| 63 | + You should get output like the example below. The result of the sentiment analysis scoring is returned as a data frame in `out`. A `PredictedLabel` equal to `1` indicates the sentiment is deemed positive, whereas a `PredictedLabel` of `0` indicates a negative sentiment. The `Probability.1` indicates the level of certainty for the `PredictedLabel` to be the true sentiment. |
| 64 | + ```json |
| 65 | + { |
| 66 | + "changedFiles": [], |
| 67 | + "consoleOutput": "Beginning processing data.\nRows Read: 1, Read Time: 8.51154e-05, Transform Time: 1.90735e-06\nBeginning processing data.\nElapsed time: 00:00:00.0364881\nFinished writing 1 rows.\nWriting completed.\n", |
| 68 | + "errorMessage": "", |
| 69 | + "outputFiles": {}, |
| 70 | + "outputParameters": { |
| 71 | + "out": { |
| 72 | + "PredictedLabel": [ |
| 73 | + "1" |
| 74 | + ], |
| 75 | + "Probability.1": [ |
| 76 | + 0.6523407697677612 |
| 77 | + ], |
| 78 | + "Score.1": [ |
| 79 | + 0.6293442845344543 |
| 80 | + ] |
| 81 | + } |
| 82 | + }, |
| 83 | + "success": true |
| 84 | + } |
| 85 | + ``` |
| 86 | +6. You can clean up the sample by running the following command: |
| 87 | + ```bash |
| 88 | + # delete app |
| 89 | + mssqlctl app delete --name sentiment-r --version [version] |
| 90 | + ``` |
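The readiness check in step 4 can be scripted instead of re-run by hand. The following is a minimal sketch, not part of the sample itself: the function name `wait_until_ready`, the retry count, and the delay are hypothetical, and it assumes the output of `mssqlctl app list` contains the literal word `Ready` once the app is deployed.

```bash
#!/usr/bin/env bash
# Poll `mssqlctl app list` until its output mentions the word "Ready".
# Assumption: the app state is printed as the literal text Ready.
# Arguments: app name, app version, max attempts (default 30),
#            seconds to wait between attempts (default 10).
wait_until_ready() {
  local name="$1" version="$2" attempts="${3:-30}" delay="${4:-10}"
  local i
  for ((i = 0; i < attempts; i++)); do
    # grep -w matches "Ready" only as a whole word; -q suppresses output.
    if mssqlctl app list -n "$name" -v "$version" 2>/dev/null | grep -qw "Ready"; then
      return 0
    fi
    sleep "$delay"
  done
  # The app never reported Ready within the allowed attempts.
  return 1
}

# Example (requires a prior `mssqlctl login`):
#   wait_until_ready sentiment-r v1 && echo "App is ready"
```

The function returns a non-zero status if the app does not report `Ready` in time, so it can gate the `mssqlctl app run` call in a larger script.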
| 91 | + |
| 92 | +<a name=sample-details></a> |
| 93 | + |
| 94 | +## Sample details |
| 95 | + |
| 96 | +Refer to [sentiment.R](sentiment.R) for the code that loads the pre-trained model and scores the `reviewText`. If you would like to explore the code that trains the model and saves it, see [model-training.R](model-training.R). |
| 97 | + |
| 98 | +### Spec file |
| 99 | +Here is the spec file for this application. As you can see, the sample uses the `R` runtime and calls the `handler` method in the `sentiment.R` file, accepting a text input named `reviewText` and returning a data frame named `out`. |
| 100 | + |
| 101 | +```yaml |
| 102 | +name: sentiment-r |
| 103 | +version: v1 |
| 104 | +runtime: R |
| 105 | +src: ./sentiment.R |
| 106 | +entrypoint: handler |
| 107 | +replicas: 1 |
| 108 | +poolsize: 1 |
| 109 | +inputs: |
| 110 | + reviewText: character |
| 111 | +output: |
| 112 | + out: data.frame |
| 113 | +``` |
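The `name` and `version` values in the spec file are the same ones passed to `mssqlctl app list`, `app run`, and `app delete` through `-n` and `-v`. As an illustration, they could be read from the spec rather than typed by hand. The helper below is a hypothetical sketch: `spec_value` is not part of `mssqlctl`, and it only handles flat `key: value` lines such as `name` and `version`, not nested YAML.

```bash
#!/usr/bin/env bash
# Print the value of a top-level "key: value" line from a spec file.
# This is a simple text match, not a YAML parser: it works for flat
# scalar keys (name, version, runtime) but not for nested sections.
spec_value() {
  local key="$1" file="$2"
  # Strip the "key:" prefix and following whitespace; keep the first match.
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1
}

# Example (assumes the spec is at ./sentiment-analysis/spec.yaml):
#   name="$(spec_value name ./sentiment-analysis/spec.yaml)"
#   version="$(spec_value version ./sentiment-analysis/spec.yaml)"
#   mssqlctl app list -n "$name" -v "$version"
```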
| 114 | + |
| 115 | +<a name=related-links></a> |
| 116 | + |
| 117 | +## Related links |
| 118 | +For more information, see these articles: |
| 119 | + |
| 120 | +[How to deploy an app on SQL Server 2019 big data cluster (preview)](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-create-apps?view=sqlallproducts-allversions) |