Filter outliers in Tableau calculating the Distance to IQR | Pablo Sáenz de Tejada

If you are familiar with the box and whisker plot, you already know it is a very good chart for checking the distribution of data and highlighting outliers. But sometimes it is not enough to just show the outliers; sometimes we also want to filter them out, because they can be caused by data issues or by particular cases we don't want to include in our analysis. How can we filter the outliers in Tableau based on the logic of a box and whisker plot?

In case you are not sure what a box and whisker plot looks like, this is a simple one. Each circle of the chart represents the total profit for each State of the USA, using our friend the Sample Superstore Sales Excel file. The box shows the median of the profit distribution by State and also the range between percentile 25 (lower quartile) and percentile 75 (upper quartile). Then we have both whiskers, representing the lowest datum still within 1.5 IQR of the lower quartile and the highest datum still within 1.5 IQR of the upper quartile, as we can see when we edit the reference lines in Tableau. The IQR is the interquartile range: the difference between the upper quartile and the lower quartile. So each whisker shows the data points within that range. If we want to filter or highlight the outliers, we need to calculate the IQR and find all the data within +/- 1.5 times the IQR of the box. How do we do this?

Step 1: Calculating the Percentile 25 and Percentile 75

First we are going to find all the data between percentile 25 (Q1) and percentile 75 (Q3). That is, all the data inside the box of the chart. To do so, we will create a calculated field using the rank percentile of our measure (Profit) and use a boolean calculation to return a TRUE value for all data points within that range. This calculation will return a true value for any data point with a sum of profit between Q1 and Q3. In our example, we have to make sure that the calculation is computed using State. We can check that the calculation is working the way we want by adding it to the color shelf of our previous view.

Step 2: Calculating the limits of the box - Lower & Upper Hinge

We have highlighted all the data points between Q1 and Q3 in step 1. Now we need to calculate the lower limit of Q1 and the upper limit of Q3 so we can then calculate the IQR, that is, the difference between percentile 75 and percentile 25. Normally we could use an LOD to calculate those numbers, but because we are using the Rank Percentile, which is a Table Calculation, and we can't use Table Calcs inside an LOD, we need to look for another solution. To do so, we will use an IF statement inside a WINDOW_MAX so we only get the window max of the data between percentile 25 and percentile 75: the upper hinge.
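The box-and-whisker logic this post describes (flag the points between percentile 25 and percentile 75, take the lowest and highest of those as the hinges, compute the IQR, then keep only what lies within 1.5 times the IQR of the box) can also be checked outside Tableau. The following is a minimal Python sketch, not the Tableau calculation itself. The function names, the `(rank - 1) / (n - 1)` form of the percentile rank, and the profit figures are all assumptions made up for illustration; they are not taken from Sample Superstore.

```python
def rank_percentile(values):
    """Ascending percentile rank in [0, 1], computed as (rank - 1) / (n - 1).

    Assumption: this mirrors a rank-percentile table calculation only roughly;
    ties are broken by input order rather than sharing a rank.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    pct = [0.0] * n
    for rank, i in enumerate(order):  # rank is zero-based here
        pct[i] = rank / (n - 1)
    return pct


def iqr_filter(profit_by_state, k=1.5):
    """Split states into kept values and outliers using the k * IQR rule."""
    states = list(profit_by_state)
    values = [profit_by_state[s] for s in states]
    pct = rank_percentile(values)

    # Step 1: values whose percentile rank falls inside the box (Q1..Q3).
    in_box = [v for v, p in zip(values, pct) if 0.25 <= p <= 0.75]
    if not in_box:
        raise ValueError("not enough data points to define the box")

    # Step 2: the hinges are the lowest and highest values inside the box.
    lower_hinge, upper_hinge = min(in_box), max(in_box)
    iqr = upper_hinge - lower_hinge

    # Limits of the whiskers: k * IQR beyond each hinge.
    low, high = lower_hinge - k * iqr, upper_hinge + k * iqr

    kept = {s: v for s, v in zip(states, values) if low <= v <= high}
    outliers = {s: v for s, v in zip(states, values) if not (low <= v <= high)}
    return kept, outliers, (lower_hinge, upper_hinge, iqr)


# Hypothetical profit totals per state, invented for this example.
sample_profit = {
    "State A": 10, "State B": 20, "State C": 30,
    "State D": 40, "State E": 50, "State F": 60,
    "State G": 70, "State H": 80, "State I": 500,
}

if __name__ == "__main__":
    kept, outliers, (lower, upper, iqr) = iqr_filter(sample_profit)
    print("hinges:", lower, upper, "IQR:", iqr)
    print("outliers:", outliers)
```

With the invented figures above, the box runs from 30 to 70, so the IQR is 40 and the limits are -30 and 130; only the 500 value falls outside them and is treated as an outlier.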