Estás aquí: start » inv » composit » validation

ComposIT: A Semantic Web Service Composition Engine

Esta é unha revisión antiga do documento!


ComposIT: experiments and comparison with AI planners

In previous works1) 2), ComposIT was analyzed and compared with the winners of the Web Service Challenge 2008 (WSC'08). These experiments demonstrated that ComposIT can obtain optimal solutions, outperforming the results obtained by the winners with respect the number of services and the composition length. However, the aim of the experiments presented here is to compare the performance of ComposIT with two open source state-of-the-art composition engines, OWLS-Xplan and PORSCE-II, which use classical AI planning algorithms.

The experimentation is focused on the generation of semantically valid composite services taking into account the semantic information regarding the inputs and outputs of services in absence of preconditions and effects. To carry out this comparison, we developed a benchmarking tool that automates the evaluation process of the different composition algorithms using an adapted version of the WSC'08 datasets. This page extends the details of the evaluation done.

Datasets

To validate the three composition engines, we used three collections of datasets: Exact-Matching, Semantic-Matching and WSC'08 datasets. The first two collections are based on the original WSC'08 datasets but containing only the services from the solution itself for validation purposes. Characteristics of each dataset are shown below.

Semantic Matching datasets

Download the XML Semantic-Matching datasets semantic-matching.tar.gz

Dataset #Serv #Serv. Sol. #Length Avg. In./Out. #Init. con. #Goal con.
Semantic-Matching 01 2 2 2 1.0 / 1.0 1 1
Semantic-Matching 02 3 3 2 3.0 / 1.0 3 1
Semantic-Matching 03 10 10 3 4.03 / 3.08 3 2
Semantic-Matching 04 5 5 3 4.25 / 4.68 4 1
Semantic-Matching 05 40 40 23 5.07 / 2.44 3 1
Semantic-Matching 06 10 10 5 5.47 / 3.47 6 4
Semantic-Matching 07 20 20 8 3.03 / 5.11 2 3
Semantic-Matching 08 35 35 14 3.10 / 3.37 9 4
Semantic-Matching 09 20 20 12 3.07 / 4.40 8 1
Semantic-Matching 10 30 30 20 3.25 / 5.42 5 4

Exact Matching datasets

Download the XML Exact-Matching datasets exact-matching.tar.gz

Dataset #Serv #Serv. Sol. #Length Avg. In./Out. #Init. con. #Goal con.
Exact-Matching 01 2 2 2 1.0 / 1.0 1 2
Exact-Matching 02 3 3 2 3.0 / 20.5 3 40
Exact-Matching 03 10 10 3 4.03 / 16.53 3 100
Exact-Matching 04 5 5 3 4.25 / 21.05 4 77
Exact-Matching 05 40 40 23 5.07 / 14.20 3 407
Exact-Matching 06 10 10 5 5.47 / 24.09 6 157
Exact-Matching 07 20 20 8 3.03 / 26.39 2 261
Exact-Matching 08 35 35 14 3.10 / 23.21 9 507
Exact-Matching 09 20 20 12 3.07 / 24.71 8 219
Exact-Matching 10 30 30 20 3.25 / 33.81 5 473

Web Service Challenge 2008 datasets

Download the XML WSC'08 datasets wsc08.tar.gz

Dataset #Serv #Serv. Sol. #Length Avg. In./Out. #Init. con. #Goal con.
WSC'08 01 158 10 3 3.53 / 5.25 3 1
WSC'08 02 558 5 3 3.79 / 3.92 4 1
WSC'08 03 604 40 23 4.07 / 6.46 3 2
WSC'08 04 1041 10 5 4.23 / 5.47 6 1
WSC'08 05 1090 20 8 3.36 / 4.26 2 1
WSC'08 06 2198 35 14 6.00 / 4.31 9 4
WSC'08 07 4113 20 12 7.37 / 7.21 8 3
WSC'08 08 8119 30 20 5.44 / 6.54 5 4

These tables show: the number of services of each dataset (column #Serv); the number of services of the optimal solution (column #Serv. Sol.); the length of the shortest solution (column #Length); the average number of inputs and outputs (column Avg. In./Out.); the number of inputs provided by the user (column #Init. con.) and the number of wanted output concepts (column #Goal con.). The goal is to find, for each dataset, the optimal composite service (minimum number of services, minimum length) that satisfies the goal concepts, using only the initial inputs provided.

Exact-Matching datasets were calculated by extending the outputs of each web service, including all superclasses of each output as an output of the service itself (semantic expansion). Thus, the average number of outputs is bigger than in the other datasets. The semantic expansion transforms a semantic matching problem into a exact matching problem, when exact and plug-in match is used to perform the semantic matchmaking. This allows us to test composition algorithms (that do not use semantic reasoners) with the WSC'08 datasets. For example, suppose that a service S1 provides the instance “my_car” which is an instance of the concept “sedan”, whereas another service S2 requires an input “car”. Given that “my_car” is a sedan “car” (plug-in match), S2 can use it as an input. This behavior can be simulated by adding superclasses of my_car to the service. Thus, service S1 will return outputs “my_car”, “car”, “sedan” and “vehicle” after the semantic expansion.

The figure below represents an example of the semantic expansion. In the first case, a semantic reasoner is required to match output “my_car” to “car”. In the second case, the service returns the output “my_car” as well as all superclasses (sedan, car, vehicle). In this case, the output “car” from S1 matches exactly the input “car” from S2.

Evaluation

The performance of each composition algorithm has been measured using a test platform developed for this purpose. We carried out three different experiments, one for each collection of datasets shown in the previous tables, to determine the quality of the solutions and the composition time. The experiments consisted of five independent executions of each algorithm over each dataset. The number of services and the length of the composition, as well as the minimum, average and standard deviation of the time taken to solve the composition problem are shown in the tables below. All the experiments were performed using a Windows XP 32-bit with VirtualBox 4.1.22 (2 GB of RAM, 1 core), on a PC-station equipped with an Intel Core 2 Quad Q9550 at 2.83GHz running Ubuntu 10.04 64-bit.

1. Exact-Matching evaluation results

Dataset Algorithm Solution Execution (ms)
#Serv #Length Ex. 1 Ex. 2 Ex. 3 Ex. 4 Ex.5 Min Avg SD
Exact-Matching 01 ComposIT 2 2 238.20 158.04 153.63 153.61 145.07 145.07 169.71 38.57
Xplan 2 2 306.65 348.32 328.53 359.50 305.37 305.37 329.67 24.29
PORSCE-II 2 2 3478.53 3717.59 3657.99 3524.25 3553.97 3478.53 3586.47 98.60
Exact-Matching 02 ComposIT 3 2 274.59 209.04 211.42 196.24 206.73 196.24 219.60 31.28
Xplan 3 3 416.11 490,68 391.30 476.10 422.66 391.30 439.37 42.17
PORSCE-II 9 9 5206.54 5283.59 5240.02 5482.74 5649.92 5206.54 5372.56 188.49
Exact-Matching 03 ComposIT 10 3 422.66 379.86 369.33 330.27 365.57 330.27 373.54 33.19
Xplan 10 10 815.73 642.90 616.80 595.58 622.55 595.58 658.71 89.38
PORSCE-II 22 22 16081.13 15721.06 15105.77 16452.69 14779.84 14779.84 15628.10 686.69
Exact-Matching 04 ComposIT 5 3 321.52 316.63 316.42 354.47 381.93 316.42 338.19 29.13
Xplan - - - - - - - - - -
PORSCE-II 14 14 10889.68 10777.27 10595.46 10898.34 10767.12 10595.46 10785.57 122.58
Exact-Matching 05 ComposIT 40 23 7111.35 6676.08 6669.89 6775.57 6710.47 6669.89 6788.67 185.20
Xplan - - - - - - - - - -
PORSCE-II - - - - - - - - - -
Exact-Matching 06 ComposIT 10 5 463.92 468.82 603.30 484.27 505.79 463.92 505.22 57.20
Xplan - - - - - - - - - -
PORSCE-II 27 27 24106.72 24340.28 23880.96 24732.57 24025.89 23880.96 24217.28 332.65
Exact-Matching 07 ComposIT 20 8 952.07 1103.62 951.09 1015.01 895.89 895.89 983.54 79.27
Xplan - - - - - - - - - -
PORSCE-II 52 52 38036.12 37755.37 35577.92 36377.60 36306.59 35577.92 36810.72 1043.49
Exact-Matching 08 ComposIT 35 14 6071.00 6299.35 5859.83 6324.65 6174.44 5859.83 6145.85 189.58
Xplan - - - - - - - - - -
PORSCE-II - - - - - - - - - -
Exact-Matching 09 ComposIT 20 12 1079.93 1065.36 1096.51 1090.66 1100.17 1065.36 1086.53 14.09
Xplan - - - - - - - - - -
PORSCE-II - - - - - - - - - -
Exact-Matching 10 ComposIT 30 20 5236.27 5596.29 5352.37 5663.87 5263.09 5236.27 5422.38 195.88
Xplan - - - - - - - - - -
PORSCE-II - - - - - - - - - -

2. Semantic-Matching evaluation results

In this experiment, we evaluate the performance of ComposIT and PORSCE-II with the semantic-matching datasets. Note that OWLS-Xplan does not support semantic matchmaking. Therefore, this algorithm has been excluded from this evaluation and also from the WSC'08 evaluation, which requires semantic matchmaking.

PORSCE-II uses different threshold values to configure the maximum semantic concept distance allowed for a valid semantic matching. The values of these thresholds are directly related to the performance of the algorithm so that the higher the thresholds, the slower the algorithm runs, but the better the quality of the solutions. Given the difficulty of selecting an appropriate value for the threshold without compromising too much the performance, we selected the optimal value for each dataset based on a study using the Semantic-matching datasets in which we analyzed the performance of PORSCE-II by using incremental values of the threshold from 1 to 19, which corresponds with the maximum depth for all ontologies.

Plug-in threshold Time (for Semantic-Matching 01) % Datasets solved Semantic-Matching datasets solved Performance
1 6721.69 10% 01 100.00%
2 9735.35 30% 01, 02, 03 69.04%
3 13257.96 40% 01, 02, 03, 04 50.70%
4 15139.06 50% 01, 02, 03, 04, 07 44.40%
5 20059.98 50% 01, 02, 03, 04, 07 33.51%
6 20616.51 60% 01, 02, 03, 04, 06, 07 32.60%
7 24607.99 60% 01, 02, 03, 04, 06, 07 27.32%
8 27791.03 60% 01, 02, 03, 04, 06, 07 24.19%
9 31261.37 60% 01, 02, 03, 04, 06, 07 21.50%
10 33648.79 60% 01, 02, 03, 04, 06, 07 19.98%
11 34074.82 60% 01, 02, 03, 04, 06, 07 19.73%
12 40787.78 60% 01, 02, 03, 04, 06, 07 16.48%
13 40704.43 60% 01, 02, 03, 04, 06, 07 16.51%
14 44085.17 60% 01, 02, 03, 04, 06, 07 15.25%
15 47256.00 60% 01, 02, 03, 04, 06, 07 14.22%
16 50164.25 60% 01, 02, 03, 04, 06, 07 13.40%
17 51815.99 60% 01, 02, 03, 04, 06, 07 12.97%
18 56561.49 60% 01, 02, 03, 04, 06, 07 11.88%
19 58981.02 60% 01, 02, 03, 04, 06, 07 11.40%

Figure below shows the percentage of semantic problems solved by PORSCE-II with different values of the threshold and the performance measured by comparing the time taken by the algorithm to solve the Semantic-Matching 01 with the different thresholds. As can be seen, the computational cost increases (performance decreases) when the threshold is incremented. Thus, using a threshold of 10, the algorithm obtains a performance close to 20\%, which means that the algorithm is about 1/0.20=5 times slower than using a threshold of 1. Based on these results, we selected the optimal thresholds N for each datasets: N=1 for Semantic-Matching 01, N=2 for Semantic-Matching 02 and 03, N=3 for Semantic-Matching 04, N=4 for Semantic-Matching 07 and N=6 for Semantic-Matching 06. Note that ComposIT does not require any special configuration for the semantic datasets as it calculates matches at unlimited depth.

Dataset Algorithm Solution Execution (ms)
#Serv #Length Ex. 1 Ex. 2 Ex. 3 Ex. 4 Ex.5 Min Avg SD
Semantic-Matching 01 ComposIT 2 2 92.46 214.92 108.11 297.97 387.13 92.46 220.12 125.32
PORSCE-II (1) 2 2 7487.74 6655.72 6537.85 6190.85 8326.61 6190.85 7039.75 862.66
Semantic-Matching 02 ComposIT 3 2 272.64 387.46 119.45 87.49 240.80 87.49 221.57 121.35
PORSCE-II (2) 3 3 14355.59 14784.66 14467.15 13685.42 14779.62 13685.42 14414.49 449.48
Semantic-Matching 03 ComposIT 10 3 104.14 261.09 229.25 307.21 317.30 104.14 243.80 85.79
PORSCE-II (2) 11 11 47524.98 49847.54 49750.68 45989.71 45359.04 45359.04 47694.39 2076.83
Semantic-Matching 04 ComposIT 5 3 287.12 213.46 97.21 112.70 101.86 97.21 162.47 84.48
PORSCE-II (3) 7 7 29187.01 29593.40 29466.08 28759.73 31130.29 28759.73 29627.30 898.98
Semantic-Matching 05 ComposIT 40 23 684.00 694.36 682.87 272.57 282.76 272.57 523.31 224.32
PORSCE-II (16) - - - - - - - - - -
Semantic-Matching 06 ComposIT 10 5 223.73 409.05 101.80 96.48 125.48 96.48 191.31 132.10
PORSCE-II (6) 14 14 134258.53 114729.94 112889.12 114402.36 118113.16 112889.12 118878.62 8806.96
Semantic-Matching 07 ComposIT 20 8 332.64 402.18 178.90 132.44 342.98 132.44 277.83 115.80
PORSCE-II (4) 35 35 154651.56 155128.96 153448.76 156402.12 153196.42 153196.42 154565.56 1305.71
Semantic-Matching 08 ComposIT 35 14 674.44 742.30 857.37 338.21 790.32 338.21 680.53 202.71
PORSCE-II (17) - - - - - - - - - -
Semantic-Matching 09 ComposIT 20 12 445.83 295.22 151.83 180.21 684.96 151.83 351.61 219.36
PORSCE-II (15) - - - - - - - - - -
Semantic-Matching 10 ComposIT 30 20 219.66 214.97 445.83 535.24 210.08 210.08 325.16 154.28
PORSCE-II (17) - - - - - - - - - -

3. WSC'08 evaluation results

Dataset Algorithm Solution Execution (ms)
#Serv #Length Ex. 1 Ex. 2 Ex. 3 Ex. 4 Ex.5 Min Avg SD
WSC'08 01 ComposIT 10 3 308.45 223.35 224.23 217.05 231.03 217.05 240.82 38.13
PORSCE-II (2) 11 11 700552.14 703166.16 706210.96 709269.96 688332.37 688332.37 701506.32 8056.46
WSC'08 02 ComposIT 5 3 286.18 247.79 236.50 231.01 308.12 231.01 261.92 33.63
PORSCE-II (3) 8 8 3633027.68 3635780.31 3494138.55 3518414.74 3501751.84 3494138.55 3556622.62 71551.69
WSC'08 03 ComposIT 40 23 1448.94 1417.31 1429.48 1413.16 1457.06 1413.16 1433.19 19.27
PORSCE-II (16) - - - - - - - - - -
WSC'08 04 ComposIT 10 5 427.54 434.77 434.53 451.68 440.62 427.54 437.83 9.02
PORSCE-II (6) 16 16 13594552.10 13746460.11 13260928.67 13153932.78 13152705.66 13152705.66 13381715.86 272606.85
WSC'08 05 ComposIT 20 8 1052.68 1018.33 995.83 1019.80 1011.26 995.83 1019.58 20.80
PORSCE-II (4) 45 45 9569454.60 9642162.17 9461697.18 9900376.49 9735598.82 9461697.18 9661857.85 166822.85
WSC'08 06 ComposIT 42 7 3611.03 3587.10 3525.84 3535.71 3436.85 3436.85 3539.31 67.31
PORSCE-II (17) - - - - - - - - - -
WSC'08 07 ComposIT 20 12 4164.99 4043.20 4083.13 4062.16 4095.46 4043.20 4089.79 46.54
PORSCE-II (15) - - - - - - - - - -
WSC'08 08 ComposIT 30 20 5849.57 5751.64 5962.46 5740.37 5710.24 5710.24 5802.86 103.39
PORSCE-II (17) - - - - - - - - - -

You can download the validation logs for each algorithm here:

  • ComposIT Exact-Matching/Semantic-Matching/WSC'08 logs (download)
  • PORSCE-II Exact-Matching/Semantic-Matching/WSC'08 logs (download)
  • OWLS-Xplan Exact-Matching logs (download)

Usage

We provide three different java binaries for each composition algorithm. In order to launch a test, you must have installed the Java JDK 6+. Latest java JDK is available here. Once installed, you can run the test platform in GUI mode or in console mode. To launch the GUI interface, simply type:

java -jar algorithm.jar

Where algorithm.jar is one of the available algorithms:

You can launch also a background test from the command line, with the following syntax:

java -jar algorithm.jar algorithm_name dataset_path http_server_port

Examples to launch tests on each repository with the dataset Exact-Matching 01, using the port 80 to create the http server to provide the services and redirecting the output to a file 01.txt:

ComposIT:

java -Xmx1024M -Xms512M -jar CompositAlgorithm.jar "ComposIT" "...\Datasets\Exact-matching\01" 80 > 01.txt

PORSCE-II:

java -Xmx1024M -Xms512M -jar PorsceAlgorithm.jar "PORSCE-II" "...\Datasets\Exact-matching\01" 80 > 01.txt

OWLS-Xplan:

java -Xmx1024M -Xms512M -jar OWLSXplanAlgorithm.jar "OWL-S Xplan 2.0" "...\Datasets\Exact-matching\01" 80 > 01.txt

Disclaimer

All the contents of this web-site are owned by the Centro de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela (USC), except the original versions of the WSC'08 datasets, OWLS-Xplan 2.0 and PORSCE-II, which are owned by their respective authors. The software available here is for private, non-commercial use and it is offered solely as a proof of concept for validation purposes.

1)
P. Rodriguez-Mier, M. Mucientes, and M. Lama, “Automatic Web Service Composition with a Heuristic-Based Search Algorithm,” in International Conference on Web Services (ICWS). IEEE Computer Society, 2011, pp. 81–88.
2)
P. Rodriguez-Mier, M. Mucientes, J. Vidal, and M. Lama, “An optimal and complete algorithm for automatic web service composition,” International Journal of Web Services Research (IJWSR), vol. 9, no. 2, pp. 1–20, 2012.