Decision Process - Logic

Pod Server

Logic Module

request packet

Download info from GCloud (no images, just detections)

Create Trackers: Static, Weights, Video

Pod Tracker:

Static = list[ ShelfTracker ]

Weights

Video

Create trackers:

Shelf Tracker:

Check available cameras
Process the start and end detections using the calibration scheme, which creates the inventory.

Weights

Video

For weights and video, we just format the data properly.

Resolving Transactions:

PodTracker.resolve()

Static: (per shelf)

Inventory: Before, After

Events: item, likelihood, in_out, valid, ...

```
full_transaction      (all events)
detailed_transaction  (valid and moves)
transaction           (valid, no moves)
```

Static Heuristics: (for misclassification)

If the same lane contains both a low-confidence Add and a low-confidence Remove, decrease the confidence of both.
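A minimal sketch of this heuristic, under stated assumptions: the `lane` field, the `low_conf` threshold and the `factor` used to scale the likelihood are hypothetical and do not come from the slides, which only state that both confidences are decreased.

```python
from dataclasses import dataclass


@dataclass
class Event:
    item: str
    lane: int          # hypothetical field: lane index on the shelf
    in_out: int        # +1 = add, -1 = remove
    likelihood: float


def penalize_add_remove_pairs(events, low_conf=0.5, factor=0.5):
    """Scale down low-confidence events in lanes that hold both an add and a remove.

    low_conf and factor are illustrative values, not taken from the source.
    """
    by_lane = {}
    for ev in events:
        by_lane.setdefault(ev.lane, []).append(ev)

    for lane_events in by_lane.values():
        low = [e for e in lane_events if e.likelihood < low_conf]
        has_add = any(e.in_out > 0 for e in low)
        has_remove = any(e.in_out < 0 for e in low)
        if has_add and has_remove:
            # Likely a misclassification: decrease the confidence of both sides.
            for e in low:
                if e.in_out != 0:
                    e.likelihood *= factor
    return events
```

High-confidence events in the same lane are left untouched, since the heuristic only targets the low-confidence Add + Remove pattern.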

Video:

Left and Right Events are combined into Merged Events (merge by class_id and start_time).
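One possible way to merge the Left and Right event lists by class_id and start_time is sketched below. The dictionary layout, the `tol` tolerance on start_time, and the choice of keeping the larger likelihood and the earlier start are assumptions made for illustration; the slides only name the two merge keys.

```python
def merge_events(left, right, tol=0.5):
    """Merge two event lists on class_id and (approximately equal) start_time.

    Each event is a dict with "class_id", "start_time" and "likelihood".
    tol (seconds) and the combination rule are hypothetical choices.
    """
    merged = []
    unmatched_right = list(right)
    for ev in left:
        match = next(
            (r for r in unmatched_right
             if r["class_id"] == ev["class_id"]
             and abs(r["start_time"] - ev["start_time"]) <= tol),
            None,
        )
        if match is None:
            merged.append(dict(ev))
            continue
        unmatched_right.remove(match)
        merged.append({
            "class_id": ev["class_id"],
            "start_time": min(ev["start_time"], match["start_time"]),
            "likelihood": max(ev["likelihood"], match["likelihood"]),
        })
    # Events seen on one side only are kept as they are.
    merged.extend(dict(r) for r in unmatched_right)
    return merged
```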

Weight Integration:

General problem:

Goal: Determine which events correspond to each weight sensor reading.

Weights:

\text{W}_1, \text{W}_2, \dots, \text{W}_{m}

Detections:

\text{E}_1, \text{E}_2, \dots, \text{E}_{n}

\text{Consider }\mathcal{E} = \{\text{E}_1, \text{E}_2, \dots, \text{E}_{n}\}

\textbf{Goal:} \text{ Find a partition of the events }

S_1, S_2, \dots, S_m, S_{\text{null}} \subseteq \mathcal{E}, \hspace{3mm} \bigcup_i S_i \cup S_{\text{null}} = \mathcal{E}

\text{where }S_i\text{ corresponds to the reading } W_i, \hspace{2mm}\forall i\in[m]

\text{and }S_{\text{null}}\text{ corresponds to false events}

\text{E}_i = (\ell_i, \omega_i, \dots)

\ell_i: likelihood/score

\omega_i: GT weights

\dots: time, direction, frame count, ...

Combinatorial formulation:

\displaystyle \max \hspace{3mm} \sum_{i=1}^m \sum_{j=1}^n \ell_j x_{i,j}

\text{subject to: } \displaystyle \sum_{j=1}^n \omega_j x_{i,j} \leq W_i, \hspace{3mm} i\in [m] \hspace{3mm} \text{(knapsack version)}

\displaystyle \sum_{i=1}^m x_{i,j} \leq 1, \hspace{5mm} j\in [n]

x_{i,j} = \boldsymbol{1}_{[E_j \text{ is assigned to }W_i]}

\ell_j = \text{likelihood of }E_j

\omega_j = \text{weight of }E_j

Alternative weight constraint:

\text{subject to: } \displaystyle \left|\sum_{j=1}^n \omega_j x_{i,j} - W_i\right| \leq \text{thr}, \hspace{3mm} i\in [m]

Probabilistic formulation:

\displaystyle \max_{ S_1,\dots, S_m, S_{\text{null}} \text{ part.} } \mathbb{P}\left[ S_i \leftrightarrow W_i, \forall i\in[m] \text{ and } \lnot S_{\text{null}} \right]

\displaystyle \max_{ S_1,\dots, S_m, S_{\text{null}} \text{ part.} } \prod_{i=1}^m \mathbb{P}\left[ S_i \leftrightarrow W_i \right] \prod_{\tilde E \in S_{\text{null}} } \left( 1 - \mathbb{P} [ \tilde E ] \right)

\mathbb{P}\left[ S_i \leftrightarrow W_i \right] = \prod\limits_{E \in S_i} \mathbb{P}[E] \times \mathbb{P}\left( \left| \sum\limits_{E\in S_i} E.\texttt{w} - W_i \right| < \text{thr} \right)
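To make the probabilistic formulation concrete, here is a brute-force sketch that enumerates every assignment of events to readings (or to the null set) and keeps the highest-scoring one. It assumes the weight term is a hard indicator (1 if the summed weight is within `thr` of the reading, 0 otherwise); the function name, the tuple layout and the default `thr` are hypothetical. The search is exponential in the number of events, so it is only meant for small examples.

```python
from itertools import product


def best_partition(events, readings, thr=5.0):
    """events: list of (likelihood, weight); readings: list of W_i.

    Returns (score, assignment) where assignment[j] is the index of the
    reading that event j is matched to, or None for S_null.
    """
    m = len(readings)
    best_score, best_assign = 0.0, None
    # Index m stands for S_null (event treated as false).
    for assign in product(range(m + 1), repeat=len(events)):
        score = 1.0
        totals = [0.0] * m
        for (lik, w), a in zip(events, assign):
            if a == m:
                score *= 1.0 - lik      # event is a false detection
            else:
                score *= lik            # event is real and explains reading a
                totals[a] += w
        # Assumed indicator form of P(|sum of weights - W_i| < thr).
        if any(abs(t - W) >= thr for t, W in zip(totals, readings)):
            continue
        if score > best_score:
            best_score, best_assign = score, assign
    if best_assign is None:
        return 0.0, None
    return best_score, [None if a == m else a for a in best_assign]
```

For example, with events `[(0.9, 100.0), (0.8, 250.0), (0.3, 100.0)]` and readings `[100.0, 250.0]`, the first two events are matched to the two readings and the third is sent to the null set.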

Static + Weights: (per shelf)

Inputs: Shelf Tracker, Weights[shelf_n], video

Goal: Determine the most likely set of events given the feedback from the weight sensor.

FT = full_transaction

TW = Total Weight

\mathbb{P} (S \leftrightarrow \text{Weights} ) = \mathbb{P}(S \text{ are True}) \, \mathbb{P}(\text{FT}\setminus S \text{ are False}) \times \mathbb{P}( |S[\text{weights}] - \text{TW} | < \text{thr})

Maximum likelihood estimation:

\text{MLE} = \argmax\limits_{S\subseteq \text{FT}} \hspace{3mm} \prod\limits_{E\in S} \mathbb{P} (E) \prod\limits_{\tilde E\in \text{FT}\setminus S} (1 - \mathbb{P} (\tilde E)) \times f_{\mathcal{N}(\text{TW}, \sigma)} \left( \sum_{E\in S} E.\text{weight} \right)

f_{\mathcal{N}(\text{TW}, \sigma)}: gaussian density
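The maximum likelihood estimate over subsets of the full transaction can be sketched with an exhaustive search. This is an illustrative version only: it treats \sigma as a standard deviation, represents each event as a `(likelihood, signed_weight)` pair, and uses hypothetical names (`mle_subset`, `gaussian_pdf`). Enumerating all subsets is exponential, so a real implementation would presumably prune or approximate.

```python
import math
from itertools import combinations


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma) at x, with sigma taken as the standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mle_subset(full_transaction, total_weight, sigma=10.0):
    """Return (best_subset_indices, score) over all subsets S of the events.

    score(S) = prod_{E in S} P(E) * prod_{E not in S} (1 - P(E))
               * gaussian_pdf(sum of weights in S; total_weight, sigma)
    """
    n = len(full_transaction)
    best, best_score = (), -1.0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            chosen = set(subset)
            score, weight = 1.0, 0.0
            for k, (p, w) in enumerate(full_transaction):
                if k in chosen:
                    score *= p
                    weight += w
                else:
                    score *= 1.0 - p
            score *= gaussian_pdf(weight, total_weight, sigma)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```

With three removal events of weights -100, -250 and -100 and likelihoods 0.9, 0.6 and 0.4, a total weight change of -350 selects the first two events, since the alternative pair (second and third) has a lower product of likelihoods.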

Video cue:

Decrease likelihood of events that are not found in the current video transaction.

Video + Weights:

Goal: Determine the most likely set of events given the feedback from the weight sensor.

Match by reading:

For each weight reading (WR):

Select the video events whose start and end times match the weight reading.
Do MLE on the selected events with the weight reading.
Flip the events' direction if necessary.
Do a final MLE with the total weight and the union of valid events.

Weights (weight reading, start/end times):

\text{WE}_1, \text{WE}_2, \dots, \text{WE}_{N}

Video (tracking, start/end times):

\text{VE}_1, \text{VE}_2, \dots, \text{VE}_{M}

\Delta s = \text{VE}_j . \texttt{start\_time} - \text{WE}_i . \texttt{start\_time}

\Delta e = \text{VE}_j . \texttt{end\_time} - \text{WE}_i . \texttt{end\_time}

\texttt{if} \hspace{3mm} |\Delta s| < \texttt{thr} : \hspace{3mm} (\text{VE}_j, -1) \rightarrow \text{C}_i \hspace{3mm} \text{(removing)}

\texttt{if} \hspace{3mm} |\Delta e| < \texttt{thr} : \hspace{3mm} (\text{VE}_j, 1) \rightarrow \text{C}_i \hspace{3mm} \text{(adding)}

maximum likelihood estimation:

\text{C}_i = \{ (\text{VE}_{j_1}, \text{io}_{j_1}), \dots, (\text{VE}_{j_\ell}, \text{io}_{j_\ell}) \}, \hspace{2mm} \text{WE}_i \hspace{3mm} \rightarrow \hspace{3mm} \tilde{\text{C}}_i

\argmax\limits_{S\subseteq \text{C}} \hspace{2mm} \prod\limits_{(j, d)\in S} \left( \mathbb{P} (\text{VE}_j) \times \text{VE}_j.\texttt{io\_likelihood}[d] \times f_{\mathcal{N}(\text{W}_{\text{time}}, \sigma_T)} \left( \text{VE}_j.\texttt{time}[d] \right) \right) \times f_{\mathcal{N}(\text{TW}, \sigma)} \left( \sum_{(j,d)\in S} d \times \text{VE}_j.\texttt{weight} \right)

final match (optional):

\tilde{\text{C}}_1, \dots, \tilde{\text{C}}_N, \hspace{2mm} \displaystyle\sum_{i=1}^N \text{WE}_i \hspace{3mm} \rightarrow \hspace{3mm} \text{MLE} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{transaction}

For i in {1, ..., N}:
    C = []
    For j in {1, ..., M}:
        \Delta s = \text{VE}_j . \texttt{start\_time} - \text{WE}_i . \texttt{start\_time}
        \Delta e = \text{VE}_j . \texttt{end\_time} - \text{WE}_i . \texttt{end\_time}
        \texttt{if} \hspace{3mm} \Delta s > -\texttt{K} \texttt{ and } |\Delta s| < \texttt{thr} \texttt{ and } \texttt{AR}[j] :
            \texttt{C}.\texttt{append}\left( (j, -1) \right)
        \texttt{if} \hspace{3mm} \Delta e < \texttt{K} \texttt{ and } |\Delta e| < \texttt{thr} \texttt{ and } \texttt{AA}[j] :
            \texttt{C}.\texttt{append}\left( (j, 1) \right)

full algorithm:

\texttt{AA} = [\texttt{True}, \dots, \texttt{True}] \hspace{3mm} \text{(available to add)}
\texttt{AR} = [\texttt{True}, \dots, \texttt{True}] \hspace{3mm} \text{(available to remove)}
For i in {1, ..., N}:
    ...
    \texttt{mle} = \texttt{MLE}( \text{WE}_i, C, \texttt{static}[\text{WE}_i .\texttt{shelf\_n}] )
    For j,io in mle:
        \texttt{if} \hspace{3mm} \texttt{io} > 0 :
            \texttt{AA}[j] = \texttt{False}
        \texttt{else}:
            \texttt{AR}[j] = \texttt{False}
...
For j in {1, ..., M}:
    \texttt{if} \hspace{3mm} \texttt{AA}[j] \texttt{ and } \texttt{AR}[j]:
        \text{VE}_j . \texttt{valid} = \texttt{False}
    \texttt{elif } \texttt{AA}[j]:
        \text{VE}_j . \texttt{in\_out} = -1
    \texttt{elif } \texttt{AR}[j]:
        \text{VE}_j . \texttt{in\_out} = 1
    \texttt{else}:
        \text{VE}_j . \texttt{in\_out} = 0
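The match-by-reading loop can be written out in Python roughly as follows. This is a sketch under assumptions: events are plain dicts, the MLE step is passed in as a callable `mle(we, C)` (the slides also pass the static tracker for the shelf, which is omitted here), and `accept_all` is a hypothetical stand-in that keeps every candidate. The default values of `thr` and `K` are illustrative.

```python
def accept_all(we, candidates):
    """Hypothetical stand-in for the MLE step: keep every candidate."""
    return list(candidates)


def match_by_reading(weight_events, video_events, mle=accept_all, thr=1.0, K=0.5):
    """Assign video events to weight readings and set their direction.

    AA[j] / AR[j]: video event j is still available to be used as an
    add / as a remove.
    """
    M = len(video_events)
    AA = [True] * M
    AR = [True] * M

    for we in weight_events:
        C = []
        for j, ve in enumerate(video_events):
            ds = ve["start_time"] - we["start_time"]
            de = ve["end_time"] - we["end_time"]
            if ds > -K and abs(ds) < thr and AR[j]:
                C.append((j, -1))   # candidate removal
            if de < K and abs(de) < thr and AA[j]:
                C.append((j, 1))    # candidate addition
        for j, io in mle(we, C):
            if io > 0:
                AA[j] = False
            else:
                AR[j] = False

    for j, ve in enumerate(video_events):
        if AA[j] and AR[j]:
            ve["valid"] = False     # never matched to any reading
        elif AA[j]:
            ve["in_out"] = -1       # used only as a removal
        elif AR[j]:
            ve["in_out"] = 1        # used only as an addition
        else:
            ve["in_out"] = 0        # used in both directions
    return video_events
```

Passing the MLE as a parameter keeps the matching logic separate from the scoring model, so the stand-in can be swapped for a likelihood-based selector.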

Static + Video + Weights:

\texttt{shelf\_1}, \dots, \texttt{shelf\_k}

\texttt{static} \times \texttt{video} \hspace{3mm} \rightarrow \hspace{3mm} \text{match} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{matched\_events}, \hspace{2mm} \texttt{remaining\_static}, \hspace{2mm} \texttt{remaining\_video}

\texttt{if matched\_events} \approx \texttt{total\_weight}:

\texttt{done}

\texttt{else}:

\texttt{events} \times \texttt{total\_weight} \hspace{3mm} \rightarrow \hspace{3mm} \text{MLE} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{transaction}

Shelf coordinates:

Extrinsic Transform:

Decision Process

By Daniel Yukimura
