5. Model runtime
As described in Section 2.3, Using the IPU Inference Toolkit, working with the IPU Inference Toolkit is divided into two phases: model compilation and model runtime. This chapter describes how to deploy and run models with PopRT Runtime, Triton Inference Server or TensorFlow Serving after the model has been converted and compiled into a PopEF model as described in Section 4, Model compilation.
Note
Since the commands and code in the examples in this chapter are relatively long, the HTML version of the document is recommended for copying.
When copying from a PDF document, ensure that commands are not truncated by line breaks or page breaks.
5.1. Run with PopRT Runtime
PopRT Runtime is a runtime environment included in PopRT for loading and running PopEF models. PopRT Runtime provides Python and C++ APIs, which can be used for rapid verification of PopEF files or integration with machine learning frameworks and model service frameworks. This section will take the executable.popef
generated in Section 4.1.4, Model conversion and compilation as an example to describe how to use the PopRT Runtime API to load and run PopEF models.
5.1.1. Environment preparation
Switch to the directory containing the executable.popef generated in Section 4.1.4, Model conversion and compilation, and check that the directory is correct with ls:
$ ls `pwd -P` | grep bertsquad-12_fp16_bs_16.onnx
The following output shows that the current directory is correct.
bertsquad-12_fp16_bs_16.onnx
Start the Docker container with the following command:
$ gc-docker -- --rm -it \
-v `pwd -P`:/model_runtime_test \
-w /model_runtime_test \
--entrypoint /bin/bash \
graphcorecn/poprt-staging:latest
5.1.2. Run with PopRT Runtime Python API
The sample code is shown in Listing 5.1. Save it as model_runner_quick_start.py.
# Copyright (c) 2022 Graphcore Ltd. All rights reserved.
import numpy as np

from poprt import runtime

# Load the PopEF file
runner = runtime.ModelRunner('executable.popef')

# Get the input and output information of the model
inputs = runner.get_model_inputs()
outputs = runner.get_model_outputs()

# Create the random inputs and zero outputs
inputs_dict = {
    x.name: np.random.randint(2, size=x.shape).astype(x.numpy_data_type())
    for x in inputs
}
outputs_dict = {
    x.name: np.zeros(x.shape).astype(x.numpy_data_type()) for x in outputs
}

# Execute the inference
runner.execute(inputs_dict, outputs_dict)

# Check the output values
for name, value in outputs_dict.items():
    print(f'{name} : {value}')
Run the saved sample code:
$ python3 model_runner_quick_start.py
A successful run will produce an output similar to the following:
unstack:1 : [[-0.9604 -1.379 -2.01 ... -1.814 -1.78 -1.626 ]
[-1.051 -1.977 -1.913 ... -1.435 -1.681 -1.251 ]
[-3.67 -2.71 -2.78 ... -3.951 -4.027 -3.959 ]
...
[-0.0919 -0.6445 -0.3125 ... -0.384 -0.54 -0.3152]
[-0.69 -1.071 -1.421 ... -1.533 -1.456 -1.389 ]
[-3.56 -2.99 -3.23 ... -4.05 -3.977 -3.955 ]]
unstack:0 : [[-1.437 -1.645 -2.17 ... -2.139 -2.379 -2.281 ]
[-1.259 -1.8545 -1.915 ... -1.804 -1.8955 -1.671 ]
[-2.832 -2.057 -2.104 ... -3.29 -3.34 -3.36 ]
...
[-0.4673 -0.8716 -0.8545 ... -1.253 -1.287 -1.289 ]
[-1.288 -1.481 -1.928 ... -2.158 -2.146 -2.129 ]
[-2.762 -2.43 -2.6 ... -3.418 -3.23 -3.324 ]]
unique_ids:0 : [1 0 1 1 1 0 0 1 1 0 1 1 1 1 0 1]
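As a quick follow-up, you can get a rough latency estimate by calling execute repeatedly on the same runner. The following is a minimal sketch that reuses the runner, inputs_dict and outputs_dict objects from model_runner_quick_start.py above; the iteration count is arbitrary and only intended for illustration.
# Minimal latency sketch: append to model_runner_quick_start.py.
# Reuses the runner, inputs_dict and outputs_dict created above;
# the iteration count is arbitrary.
import time

iterations = 10
start = time.time()
for _ in range(iterations):
    runner.execute(inputs_dict, outputs_dict)
elapsed = time.time() - start
print(f'average latency: {elapsed / iterations * 1000:.2f} ms')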
5.1.3. Run with PopRT Runtime C++ API
PopRT Runtime also provides sample C++ API code as shown in Listing 5.2. Save it as model_runner_quick_start.cpp.
// Copyright (c) 2022 Graphcore Ltd. All rights reserved.
#include <iostream>
#include <memory>
#include <vector>

#include "poprt/runtime/model_runner.hpp"

int main(int argc, char* argv[]) {
  // Load the PopEF file
  auto runner = poprt::runtime::ModelRunner("executable.popef");

  // Get the input and output information of the model
  auto inputs = runner.getModelInputs();
  auto outputs = runner.getModelOutputs();

  // Create the inputs and outputs
  poprt::runtime::InputMemoryView in;
  poprt::runtime::OutputMemoryView out;
  std::vector<std::shared_ptr<unsigned char[]>> memories;
  int i = 0;
  for (const auto& input : inputs) {
    memories.push_back(
        std::shared_ptr<unsigned char[]>(new unsigned char[input.sizeInBytes]));
    in.emplace(input.name,
               poprt::runtime::ConstTensorMemoryView(memories[i++].get(),
                                                     input.sizeInBytes));
  }
  for (const auto& output : outputs) {
    memories.push_back(
        std::shared_ptr<unsigned char[]>(new unsigned char[output.sizeInBytes]));
    out.emplace(output.name,
                poprt::runtime::TensorMemoryView(memories[i++].get(),
                                                 output.sizeInBytes));
  }

  // Execute the inference
  runner.execute(in, out);

  // Print the result information
  std::cout << "Successfully executed. The outputs are: " << std::endl;
  for (const auto& output : outputs)
    std::cout << "name: " << output.name
              << ", dataType: " << output.dataType
              << ", sizeInBytes: " << output.sizeInBytes
              << std::endl;
}
Compile the sample code.
$ apt-get update && \
apt-get install g++ -y && \
g++ model_runner_quick_start.cpp -o model_runner_quick_start \
--std=c++14 -I/usr/local/lib/python3.8/dist-packages/poprt/include \
-L/usr/local/lib/python3.8/dist-packages/poprt/lib \
-lpoprt_runtime -lpoprt_compiler -lpopef
Run the compiled example program.
$ LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/poprt/lib:$LD_LIBRARY_PATH ./model_runner_quick_start
A successful run will result in the following output:
Successfully executed. The outputs are:
name: unstack:1, dataType: F16, sizeInBytes: 8192
name: unstack:0, dataType: F16, sizeInBytes: 8192
name: unique_ids:0, dataType: S32, sizeInBytes: 64
Note
After completing the above example, exit the current container and return to the host environment.
5.2. Deploy to Triton Inference Server
The Poplar SDK includes a plugin (libtriton_poplar.so) for the Triton Inference Server, implemented with the Poplar Model Runtime. This plugin is responsible for loading and running PopEF files. For more information about the Poplar Triton Backend, refer to the Poplar Triton Backend: User Guide.
This section will use the executable.popef file generated in Section 4.1.4, Model conversion and compilation as an example to explain how to deploy the compiled PopEF file to a Triton Inference Server.
5.2.1. Environment preparation
First, switch to the directory where the executable.popef
generated in Section 4.1.4, Model conversion and compilation is located, and check whether the directory is correct with the following command.
$ ls `pwd -P` | grep bertsquad-12_fp16_bs_16.onnx
The following output shows that the current directory is correct:
bertsquad-12_fp16_bs_16.onnx
Create a model_repository directory:
$ mkdir -p model_repository/bertsquad-12/1/ && \
cp executable.popef model_repository/bertsquad-12/1/ && \
touch model_repository/bertsquad-12/config.pbtxt && \
cd model_repository
The directory structure is as follows.
$ tree .
.
└── bertsquad-12
├── 1
│ └── executable.popef
└── config.pbtxt
- bertsquad-12 is the name of the model.
- 1 indicates the version of the model.
- executable.popef is the PopEF file generated by model compilation.
- config.pbtxt is the Triton configuration file described in Section 5.2.2, Configuration of generated model.
Note
Sometimes the names of a model’s inputs and outputs contain special characters which are not accepted by the Triton Inference Server. In this case, the Poplar Triton Backend can remap these names for you. Refer to the section on Input/Output Name Mapping in the Poplar Triton Backend: User Guide.
5.2.2. Configuration of generated model
To deploy the model to a Triton Inference Server, you need to create a configuration file, config.pbtxt, for the model. This file contains, for example, the name of the model, the backend to be used, batching information, and the input and output definitions. For more information about model configuration, refer to the Triton Model Configuration documentation.
The configuration of config.pbtxt
used in this example is shown in Listing 5.3. Copy this configuration to the empty config.pbtxt
generated above.
name: "bertsquad-12"
backend: "poplar"
max_batch_size: 16
dynamic_batching {
preferred_batch_size: [16]
max_queue_delay_microseconds: 5000
}
input [
{
name: "input_ids:0"
data_type: TYPE_INT32
dims: [ 256 ]
},
{
name: "input_mask:0"
data_type: TYPE_INT32
dims: [ 256 ]
},
{
name: "segment_ids:0"
data_type: TYPE_INT32
dims: [ 256 ]
},
{
name: "unique_ids_raw_output___9:0"
data_type: TYPE_INT32
dims: [ 1 ]
reshape: { shape: [ ] }
}
]
output [
{
name: "unique_ids:0"
data_type: TYPE_INT32
dims: [ 1 ]
reshape: { shape: [ ] }
},
{
name: "unstack:0"
data_type: TYPE_FP16
dims: [ 256 ]
},
{
name: "unstack:1"
data_type: TYPE_FP16
dims: [ 256 ]
}
]
parameters [
{
key: "synchronous_execution"
value:{string_value: "1"}
},
{
key: "timeout_ns"
value:{string_value: "500000"}
}
]
Model name
The model name, bertsquad-12, is usually the same as the name of the directory where the model is located.
Backend
poplar indicates that we are using the Poplar Triton Backend.
Batching
Since the Poplar Triton Backend supports dynamic batching, we recommend setting the values of max_batch_size and preferred_batch_size to integer multiples of the batch size of the model. The batch size of the model in this example is 16. For simplicity, you can set both parameters to the batch size. (An illustrative variant using larger multiples is shown at the end of this section.)
Input and output
The input and output names, types and dimension information can be viewed with the popef_dump tool. popef_dump is included in the Poplar SDK and allows you to analyze a PopEF file without using a C++ or Python API. The output shows the file structure and gives basic information; popef_dump does not allow you to view any binary content. For more information about popef_dump, refer to PopEF file analysis in the PopEF: User Guide.
$ gc-docker -- --rm \
-v `pwd -P`:/models \
--entrypoint popef_dump \
graphcorecn/toolkit-triton-staging:latest \
/models/bertsquad-12/1/executable.popef
The following is an excerpt of the output of the popef_dump
command:
PopEF file: executable.popef
Metadata:
...
Anchors:
Inputs (User-provided):
Name: "input_ids:0":
TensorInfo: { dtype: S32, sizeInBytes: 16384, shape [16, 256] }
Programs: [5]
Handle: h2d_input_ids:0
IsPerReplica: False
...
Outputs (User-provided):
Name: "unique_ids:0":
TensorInfo: { dtype: S32, sizeInBytes: 64, shape [16] }
Programs: [5]
Handle: anchor_d2h_unique_ids:0
IsPerReplica: False
...
From this excerpt of the popef_dump output, you can see how the model inputs and outputs in the PopEF file correspond to the inputs and outputs in the model configuration file. For the mapping of the dtype values to Triton data types, refer to PopEF Tensor and Feed Data and Types supported by Poplar Triton Backend.
When max_batch_size is not set to 0, the dims entries in the model configuration file do not contain the batch-size dimension; for example, the shape of input_ids:0 is [16, 256], which is configured as dims: [ 256 ] in the model configuration file. For inputs and outputs that only contain the batch-size dimension, such as unique_ids:0, you need to set dims: [ 1 ] and use reshape: { shape: [ ] } so that only the batch dimension is passed to the model. For more information about dimension settings, refer to the Triton Model Configuration documentation.
For more description of the fields in config.pbtxt, refer to Triton model configuration in the Poplar Triton Backend documentation.
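As an illustration of the batching recommendation above (integer multiples of the model batch size of 16), the batching-related part of config.pbtxt could, for example, be written as follows. The values 32 and [16, 32] are only illustrative and are not used elsewhere in this guide:
max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [16, 32]
  max_queue_delay_microseconds: 5000
}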
5.2.3. Start model service
Start the container with gc-docker. If the Poplar SDK is not installed on the host, refer to Section 3.7.2, Run a Docker container.
$ gc-docker -- --rm \
--network=host \
-v `pwd -P`:/models \
graphcorecn/toolkit-triton-staging:latest
Note
In the case of testing on an IPU-M2000 or Bow-2000, omit the --network=host parameter when running gc-docker.
The following information is printed to indicate that the service has been started and can accept gRPC
and HTTP
requests.
Started GRPCInferenceService at 0.0.0.0:8001
Started HTTPService at 0.0.0.0:8000
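Optionally, before sending inference requests you can confirm that the server and the model are ready using Triton's standard HTTP health endpoints, for example:
$ curl -v localhost:8000/v2/health/ready
$ curl -v localhost:8000/v2/models/bertsquad-12/ready
A 200 response indicates that the server (or the model) is ready to accept requests.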
Verify the service with gRPC
The following is an example of testing the deployment of a model using the Triton Client gRPC API. For more detailed API information, refer to its documentation and code examples.
import sys

import numpy as np
import tritonclient.grpc as gc

# Create the Triton client
triton_client = gc.InferenceServerClient(url="localhost:8001")

model_name = 'bertsquad-12'
inputs = []
outputs = []
inputs.append(gc.InferInput('input_ids:0', [16, 256], "INT32"))
inputs.append(gc.InferInput('input_mask:0', [16, 256], "INT32"))
inputs.append(gc.InferInput('segment_ids:0', [16, 256], "INT32"))
inputs.append(gc.InferInput('unique_ids_raw_output___9:0', [16, 1], "INT32"))

# Create data
input0_data = np.random.randint(0, 1000, size=(16, 256)).astype(np.int32)
input1_data = np.random.randint(0, 1, size=(16, 1)).astype(np.int32)
for i in range(3):
    inputs[i].set_data_from_numpy(input0_data)
inputs[3].set_data_from_numpy(input1_data)

outputs_names = ['unique_ids:0', 'unstack:0', 'unstack:1']
for name in outputs_names:
    outputs.append(gc.InferRequestedOutput(name))

results = triton_client.infer(
    model_name=model_name,
    inputs=inputs,
    outputs=outputs)

statistics = triton_client.get_inference_statistics(model_name=model_name)
if len(statistics.model_stats) != 1:
    print("FAILED: Inference Statistics")
    sys.exit(1)
print(statistics)

for name in outputs_names:
    print(f'{name} = {results.as_numpy(name)}')
Open a new terminal and connect to the host. Save the above code as grpc_test.py, then create a Python virtual environment and run the test:
$ virtualenv -p python3 venv && \
source venv/bin/activate && \
pip install tritonclient[all] && \
python grpc_test.py && \
deactivate
If executed correctly, model statistics and inference results will be returned.
model_stats {
name: "bertsquad-12"
version: "1"
last_inference: 1667439772895
inference_count: 64
execution_count: 4
inference_stats {
success {
count: 4
ns: 170377440
}
...
unique_ids:0 = [[0]
...
unstack:0 = [[-0.991 -1.472 -1.571 ... -1.738 -1.77 -1.803]
...
unstack:1 = [[-0.9023 -1.285 -1.325 ... -1.419 -1.441 -1.452 ]
...
Verify the service with HTTP
The following is an example of testing the deployment of a model using the Triton Client HTTP API. For more detailed API information, refer to its documentation and code examples.
import sys

import numpy as np
import tritonclient.http as hc

# Create the Triton client
triton_client = hc.InferenceServerClient(url="localhost:8000")

model_name = 'bertsquad-12'
inputs = []
outputs = []
inputs.append(hc.InferInput('input_ids:0', [16, 256], "INT32"))
inputs.append(hc.InferInput('input_mask:0', [16, 256], "INT32"))
inputs.append(hc.InferInput('segment_ids:0', [16, 256], "INT32"))
inputs.append(hc.InferInput('unique_ids_raw_output___9:0', [16, 1], "INT32"))

# Create data
input0_data = np.random.randint(0, 1000, size=(16, 256)).astype(np.int32)
input1_data = np.random.randint(0, 1, size=(16, 1)).astype(np.int32)
for i in range(3):
    inputs[i].set_data_from_numpy(input0_data)
inputs[3].set_data_from_numpy(input1_data)

outputs_names = ['unique_ids:0', 'unstack:0', 'unstack:1']
for name in outputs_names:
    outputs.append(hc.InferRequestedOutput(name, binary_data=True))

results = triton_client.infer(
    model_name=model_name,
    inputs=inputs,
    outputs=outputs)

statistics = triton_client.get_inference_statistics(model_name=model_name, headers=None)
if len(statistics['model_stats']) != 1:
    print("FAILED: Inference Statistics")
    sys.exit(1)
print(statistics)

for name in outputs_names:
    print(f'{name} = {results.as_numpy(name)}')
Open a new terminal and connect to the host. Save the above code as http_test.py, then create a Python virtual environment and run the test:
# Execute the following virtualenv command if the python virtual environment has not been created
# virtualenv -p python3 venv
$ source venv/bin/activate && \
pip install tritonclient[all] && \
python http_test.py && \
deactivate
If executed correctly, model statistics and inference results will be returned.
{'model_stats': [{'name': 'bertsquad-12', 'version': '1', 'last_inference': 1667440001420, 'inference_count': 80, ... {'count': 5, 'ns': 462978}}]}]}
unique_ids:0 = [[0]
...
unstack:0 = [[-0.753 -1.183 -1.296 ... -1.595 -1.599 -1.65 ]
...
unstack:1 = [[-0.6206 -0.9683 -1.031 ... -1.222 -1.221 -1.241 ]
...
Note
This completes the example. Press Ctrl+C to exit the Triton container and return to the host environment.
5.3. Deploy to TensorFlow Serving
This section will use the executable.popef
file generated in Section 4.2.2, Model conversion and compilation as an example to explain how to deploy the compiled PopEF file to TensorFlow Serving.
5.3.1. Environment preparation
In the following example, the container needs to be started with gc-docker. If the Poplar SDK is not installed on the host, refer to Section 3.7.2, Run a Docker container. Before starting Docker, define the MODEL_PATH environment variable as the path to the directory containing resnet_v2_50_optimized.onnx:
$ export MODEL_PATH=/path/to/your/models
$ ls $MODEL_PATH
If the MODEL_PATH
environment variable is specified correctly, the following information will be displayed:
resnet_v2_50_optimized.onnx executable.popef ...
5.3.2. Generate SavedModel model
The input model format of TensorFlow Serving is SavedModel, so we encapsulate the PopEF file in a TensorFlow custom op named model_runtime. This custom op has been written using the Poplar Model Runtime API. Run the popef2tf conversion tool in the Docker container to generate the SavedModel:
$ gc-docker -- --rm \
-v $MODEL_PATH:/model_path \
graphcorecn/toolkit-tfserving-staging:latest \
/bin/bash -c "python3 -m popef2tf.convert \
--model /model_path/executable.popef \
--name resnet_v2_50_serving \
--version 001 \
--output /model_path"
Note
The TensorFlow included in this Docker image is compiled from the official source code for TensorFlow version 2.6.5. This is not the same as the Graphcore port of TensorFlow that is provided in the Poplar SDK.
The model
input parameter to popef2tf
specifies the path of the input PopEF file, and output
specifies the path of the output SavedModel file.
Use tree
to list the contents of the generated SavedModel file directory:
$ tree $MODEL_PATH/resnet_v2_50_serving
resnet_v2_50_serving
└── 001
├── assets
│ └── executable.popef
├── saved_model.pb
└── variables
├── variables.data-00000-of-00001
└── variables.index
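Optionally, if the saved_model_cli tool that ships with standard TensorFlow distributions is available in your environment, you can inspect the signatures of the generated SavedModel to confirm its input and output tensors, for example:
$ saved_model_cli show --dir $MODEL_PATH/resnet_v2_50_serving/001 --all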
5.3.3. Start model service
TensorFlow Serving can batch requests to improve performance, but the IPU uses static graphs, so batch_size and input_shape must be fixed in advance. For this model, we use a fixed input tensor shape of [4, 3, 224, 224], so the corresponding batch_size is 4.
We also need to set allowed_batch_sizes and max_batch_size; both are set to 4 in the config file used during deployment. This means that the client can send input data to the host server with batch_size having any integer value between 1 and 4. More information about the effect of allowed_batch_sizes is given in Running with and without batching below.
Create the config file in the directory specified by MODEL_PATH and open it for editing:
$ touch $MODEL_PATH/resnet_v2_50_serving/001/resnet_bs4_3_224_224.conf
$ vim $MODEL_PATH/resnet_v2_50_serving/001/resnet_bs4_3_224_224.conf
Add the following to the config file:
allowed_batch_sizes: 4
max_batch_size {value: 4}
Start the TensorFlow Serving service:
$ gc-docker -- --rm \
-v $MODEL_PATH:/model_path \
--network=host \
graphcorecn/toolkit-tfserving-staging:latest \
/bin/bash -c "tensorflow_model_server \
--rest_api_port=8501 \
--model_name=resnet_v2_50 \
--enable_batching \
--batching_parameters_file=/model_path/resnet_v2_50_serving/001/resnet_bs4_3_224_224.conf \
--model_base_path=/model_path/resnet_v2_50_serving"
where --enable_batching
is used to enable batching, and --batching_parameters_file
specifies the path of the config file.
Note
In the case of testing on an IPU-M2000 or Bow-2000, remove the --network=host parameter when running gc-docker.
Note
The TensorFlow Serving included in this Docker image supports the TensorFlow custom operator for running PopEF files. This TensorFlow Serving is compiled from the official source code of TensorFlow Serving version 2.6.5 (TensorFlow-2.6.5).
If you do not need to enable batching, remove --enable_batching and --batching_parameters_file when starting the container. In this case, the client can only send input data with batch_size=4; any other value for batch_size will raise an error.
After the container is started, the output log is as follows:
Building single TensorFlow model file config: model_name: resnet_v2_50 model_base_path: /model_path/resnet_v2_50_serving
Adding/updating models.
(Re-)adding model: resnet_v2_50
Successfully reserved resources to load servable {name: resnet_v2_50 version: 1}
Approving load for servable version {name: resnet_v2_50 version: 1}
Loading servable version {name: resnet_v2_50 version: 1}
Reading SavedModel from: /model_path/resnet_v2_50_serving/001
Reading meta graph with tags { serve }
Reading SavedModel debug info (if present) from: /model_path/resnet_v2_50_serving/001
...
Successfully loaded servable version {name: resnet_v2_50 version: 1}
...
Exporting HTTP/REST API at:localhost:8501 ...
The resnet_v2_50 model service is now available on port 8501 and can be called by a client using the RESTful API.
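Optionally, you can confirm that the model is being served before running the client below by querying the TensorFlow Serving model status endpoint, for example:
$ curl http://localhost:8501/v1/models/resnet_v2_50
If the service is ready, the response lists the model version and its state.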
Running with and without batching
It is possible to run the model on TensorFlow Serving with or without batching enabled. This section describes in more detail how the value of allowed_batch_sizes affects behavior with and without batching.
- Batching enabled and allowed_batch_sizes set to 4
  Clients can send input data with batch_size of any integer in the range [1, 4] to the host server. If the batch size of the input data is less than 4, the host server pads the input data up to allowed_batch_sizes or max_batch_size, which is 4 in our case. For example, if a client sends input data with a shape of [1, 3, 224, 224], the host server pads the input data to [4, 3, 224, 224] with dummy data.
- Batching enabled and allowed_batch_sizes not set
  The host server will accept any batch_size in the range [1, 4], but PopEF will not. For example, if a client sends input data with a shape of [2, 3, 224, 224] to the server, the server does not pad this input data and only checks whether it exceeds max_batch_size. Since the batch_size of this data is less than max_batch_size, the server sends the [2, 3, 224, 224] tensor to the TensorFlow backend, which raises an error about the allocated memory size. Similarly, sending input data with a shape of [5, 3, 224, 224], where batch_size exceeds max_batch_size, raises an error that the submitted batch size is larger than the maximum input batch size of 4.
- Batching disabled
  Clients are only allowed to send input data with batch_size=4 to the server. For example, if a client sends input data with a shape of [1, 3, 224, 224], the server does not pad the data, and this raises an error about the allocated memory size. In this case, the check that batch_size exceeds max_batch_size is not performed, so sending input data with a shape of [5, 3, 224, 224] also raises an error about the allocated memory. (A minimal client-side padding sketch is shown after this list.)
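If you run without allowed_batch_sizes or with batching disabled, one way to satisfy the fixed batch size of 4 is to pad the request on the client side and discard the outputs for the padded rows. The following is a minimal NumPy sketch, assuming the [N, 3, 224, 224] input layout used in this example; pad_to_batch is a hypothetical helper, not part of any Graphcore or TensorFlow API.
import numpy as np

def pad_to_batch(batch, target_batch_size=4):
    # Pad an [N, 3, 224, 224] array with zeros up to the fixed batch size
    # expected by the PopEF model, and return the original row count so the
    # caller can slice the padded rows off the server's response.
    n = batch.shape[0]
    if n > target_batch_size:
        raise ValueError(f'batch of {n} exceeds the fixed batch size {target_batch_size}')
    pad = np.zeros((target_batch_size - n,) + batch.shape[1:], dtype=batch.dtype)
    return np.concatenate([batch, pad], axis=0), n

# Example: two real images padded to a batch of four before the POST request;
# after receiving the response, keep only the first `real_count` outputs.
real = np.random.rand(2, 3, 224, 224).astype(np.float32)
padded, real_count = pad_to_batch(real)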
5.3.4. Verify the service with HTTP
In this example, a group of images is downloaded to demonstrate how to call the model service using the RESTful API:
import numpy as np
import json, urllib3
import tempfile, wget
from PIL import Image

def read_labels_file(fname):
    outs = []
    with open(fname, 'r') as fin:
        for line in fin.readlines():
            outs.append(line.strip())
    return outs

def read_images(files, img_h, img_w):
    images = []
    for file in files:
        image = Image.open(file)
        image = image.resize((img_h, img_w))
        image = (np.array(image) / 255).astype(np.float32)
        image = image.transpose((2, 0, 1))
        images.append(image[np.newaxis, :])
    return images

def main():
    # image urls
    urls = [
        'http://images.cocodataset.org/test-stuff2017/000000024309.jpg',
        'http://images.cocodataset.org/test-stuff2017/000000028117.jpg',
        'http://images.cocodataset.org/test-stuff2017/000000006149.jpg',
        'http://images.cocodataset.org/test-stuff2017/000000004954.jpg',
    ]
    # download images and labels into a temporary directory
    image_files = []
    with tempfile.TemporaryDirectory() as tmpdir:
        for url in urls:
            image_files.append(wget.download(url, tmpdir))
        images = read_images(image_files, 224, 224)
        label_file = wget.download(
            "https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt", tmpdir)
        labels = read_labels_file(label_file)

    # get input json
    input_data = np.concatenate(images, axis=0)
    input_data_list = input_data.tolist()
    postData = {'inputs': input_data_list}
    jPostData = json.dumps(postData)

    # send the request to the TensorFlow Serving REST API
    http = urllib3.PoolManager()
    r = http.request('POST', 'http://localhost:8501/v1/models/resnet_v2_50:predict', body=jPostData)
    return_data = json.loads(r.data)
    output = np.array(list(return_data.values()))

    # get real images top5
    k = 5
    idx = output.argsort()[:, :, -1:-k-1:-1]
    for b, idx_bs in enumerate(idx[0]):
        print(f'\nimage{b}:')
        for i in idx_bs:
            print(labels[i], output[0, b, i])

if __name__ == "__main__":
    main()
Open a new terminal and connect to the host. Save the above code as restful_http_test.py, then create a Python virtual environment and run the test:
virtualenv -p python3 venv
source venv/bin/activate
pip install pillow numpy urllib3 wget
python restful_http_test.py
deactivate
If executed correctly, results similar to the following are returned:
image0:
laptop 18.2867374
notebook 15.9842319
desk 13.6191549
web site 11.1951714
mouse 11.1430635
image1:
mashed potato 14.4165163
guacamole 11.8748741
meat loaf 10.7978802
cheeseburger 9.09055805
plate 8.93529892
image2:
knee pad 8.52720547
volleyball 8.52627659
racket 8.31885242
ski 7.84297752
horizontal bar 7.11924124
image3:
hare 17.0646706
fountain 15.7413292
tennis ball 13.2553062
wallaby 12.6554518
wood rabbit 11.5797682