Decision Process - Logic

Flow:

  • Pod Server → Logic Module (request packet)
  • Download info from GCloud (no images, just detections)
  • Create Trackers (Static, Weights, Video) → Pod Tracker

Static = list[ ShelfTracker ]

Create trackers:

Shelf Tracker:

  • Check available cameras.
  • Process start and end detections using the calibration scheme, which creates the inventory.

Weights, Video:

  • For weights and video we just format the data properly.

Resolving Transactions:

PodTracker.resolve()

Static: (per shelf)

  • Inventory: Before, After

Events, each with:

  • item
  • likelihood
  • in_out
  • valid
  • ...

Static: (per shelf)

  • full_transaction (all events)
  • detailed_transaction (valid and moves)
  • transaction (valid, no moves)

Static Heuristics: (for misclassification)

  • If the same lane has a low-confidence Add and a low-confidence Remove, decrease the confidence of both.
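
The heuristic above can be sketched as follows. This is only an illustration: the `lane` field, the `low_conf` threshold, and the `penalty` factor are assumptions, since the source does not say how "low confidence" is defined or by how much the confidence is decreased.

```python
from dataclasses import dataclass


@dataclass
class Event:
    item: str
    likelihood: float
    in_out: int          # +1 = add, -1 = remove
    lane: int            # hypothetical field: lane index on the shelf
    valid: bool = True


def penalize_add_remove_pairs(events, low_conf=0.5, penalty=0.5):
    """Scale down low-confidence adds and removes that share a lane."""
    by_lane = {}
    for ev in events:
        by_lane.setdefault(ev.lane, []).append(ev)

    for lane_events in by_lane.values():
        adds = [e for e in lane_events if e.in_out > 0 and e.likelihood < low_conf]
        removes = [e for e in lane_events if e.in_out < 0 and e.likelihood < low_conf]
        # Only act when the lane has both a weak add and a weak remove.
        if adds and removes:
            for e in adds + removes:
                e.likelihood *= penalty
    return events


events = [
    Event("cola", 0.4, +1, lane=0),
    Event("soda", 0.3, -1, lane=0),
    Event("chips", 0.9, -1, lane=1),
]
penalize_add_remove_pairs(events)
```

Events in lane 0 are both weak and of opposite direction, so both are scaled down; the confident event in lane 1 is left alone.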

Video:

  • Left: Events
  • Right: Events
  • Merged Events (merge by class_id and start_time)
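
A minimal sketch of merging the left and right event lists by class_id and start_time. The time tolerance `time_tol`, the greedy one-to-one matching, and the rule of keeping the higher likelihood are assumptions made for illustration; the source only states the merge keys.

```python
from dataclasses import dataclass


@dataclass
class VideoEvent:
    class_id: int
    start_time: float
    likelihood: float


def merge_events(left, right, time_tol=0.5):
    """Merge two event lists; events with the same class_id and close start_time are fused."""
    merged = [VideoEvent(e.class_id, e.start_time, e.likelihood) for e in left]
    n_left = len(merged)
    used = [False] * n_left  # each left event is fused with at most one right event

    for ev in right:
        match = None
        for k in range(n_left):
            m = merged[k]
            if (not used[k] and m.class_id == ev.class_id
                    and abs(m.start_time - ev.start_time) <= time_tol):
                match = k
                break
        if match is None:
            merged.append(VideoEvent(ev.class_id, ev.start_time, ev.likelihood))
        else:
            used[match] = True
            m = merged[match]
            m.start_time = min(m.start_time, ev.start_time)
            m.likelihood = max(m.likelihood, ev.likelihood)  # assumed combination rule

    return sorted(merged, key=lambda e: e.start_time)


left = [VideoEvent(3, 10.0, 0.6), VideoEvent(5, 20.0, 0.8)]
right = [VideoEvent(3, 10.2, 0.9), VideoEvent(7, 30.0, 0.5)]
merged = merge_events(left, right)
```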

Weight Integration:

General problem:

Goal: Determine which events correspond to each weight sensor reading.

Weights: W_1, W_2, \dots, W_m
Detections: E_1, E_2, \dots, E_n

\text{Consider } \mathcal{E} = \{E_1, E_2, \dots, E_n\}
\textbf{Goal:} \text{ Find a partition of the events } S_1, S_2, \dots, S_m, S_{\text{null}} \subseteq \mathcal{E}, \hspace{3mm} \bigcup_{i=1}^m S_i \cup S_{\text{null}} = \mathcal{E},
\text{where } S_i \text{ corresponds to the reading } W_i, \hspace{2mm} \forall i\in[m], \text{ and } S_{\text{null}} \text{ corresponds to false events.}

E_i = (\ell_i, \omega_i, \dots)

  • \ell_i: likelihood/score
  • \omega_i: GT weights
  • \dots: time, direction, frame count, ...

Combinatorial formulation:

\displaystyle \max \hspace{3mm} \sum_{i=1}^m \sum_{j=1}^n \ell_j x_{i,j}
\text{subject to: } \displaystyle \sum_{j=1}^n \omega_j x_{i,j} \leq W_i, \hspace{3mm} i\in [m] \hspace{5mm} \text{(knapsack version)}
\displaystyle \sum_{i=1}^m x_{i,j} \leq 1, \hspace{5mm} j\in [n]

x_{i,j} = \boldsymbol{1}_{[E_j \text{ is assigned to } W_i]}
\ell_j = \text{likelihood of } E_j
\omega_j = \text{weight of } E_j

Alternative weight constraint (within a threshold of the reading):

\text{subject to: } \displaystyle \left|\sum_{j=1}^n \omega_j x_{i,j} - W_i\right| \leq \text{thr}, \hspace{3mm} i\in [m]
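
For small instances the combinatorial formulation can be checked by exhaustive search. The sketch below uses the threshold constraint and enumerates every assignment of events to readings (or to "unassigned"); the function name and the toy numbers are hypothetical, and the search is exponential in n, so it is meant only to make the objective and constraints concrete.

```python
from itertools import product


def best_assignment(likelihoods, weights, readings, thr):
    """Maximise the summed likelihood of assigned events subject to
    |sum of assigned weights - W_i| <= thr for every reading i."""
    n, m = len(likelihoods), len(readings)
    best_score, best = float("-inf"), None

    # assign[j] == i means x_{i,j} = 1; assign[j] == m means E_j is unassigned.
    for assign in product(range(m + 1), repeat=n):
        totals = [0.0] * m
        score = 0.0
        for j, i in enumerate(assign):
            if i < m:
                totals[i] += weights[j]
                score += likelihoods[j]
        feasible = all(abs(totals[i] - readings[i]) <= thr for i in range(m))
        if feasible and score > best_score:
            best_score, best = score, assign
    return best, best_score  # best is None when no assignment is feasible


best, score = best_assignment(
    likelihoods=[0.9, 0.8, 0.3],
    weights=[100.0, 250.0, 100.0],
    readings=[100.0, 250.0],
    thr=10.0,
)
```

Each event takes exactly one value in the enumeration, so the constraint that an event is assigned to at most one reading holds by construction.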
Probabilistic formulation:

\displaystyle \max_{ S_1,\dots, S_m, S_{\text{null}} \text{ part.} } \mathbb{P}\left[ S_i \leftrightarrow W_i, \forall i\in[m] \text{ and } \lnot S_{\text{null}} \right]

\displaystyle \max_{ S_1,\dots, S_m, S_{\text{null}} \text{ part.} } \prod_{i=1}^m \mathbb{P}\left[ S_i \leftrightarrow W_i \right] \prod_{\tilde E \in S_{\text{null}} } \left( 1 - \mathbb{P} [ \tilde E ] \right)

\displaystyle \mathbb{P}\left[ S_i \leftrightarrow W_i \right] = \prod_{E \in S_i} \mathbb{P}[E] \times \mathbb{P}\left[ \left| \sum_{E\in S_i} E.\texttt{w} - W_i \right| < \text{thr} \right]
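
A brute-force sketch of the probabilistic objective for a handful of events. As an assumption, the weight-agreement term is modelled with a Gaussian density centred on the reading (standard deviation `sigma`) instead of the threshold probability; all names and numbers are illustrative.

```python
import math
from itertools import product


def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def best_partition(probs, weights, readings, sigma):
    """Score every partition S_1..S_m, S_null and return the most likely one."""
    n, m = len(probs), len(readings)
    best_score, best = -1.0, None

    # assign[j] == i puts E_j in S_i; assign[j] == m puts E_j in S_null.
    for assign in product(range(m + 1), repeat=n):
        score = 1.0
        totals = [0.0] * m
        for j, i in enumerate(assign):
            if i < m:
                score *= probs[j]          # event is true
                totals[i] += weights[j]
            else:
                score *= 1.0 - probs[j]    # event is false
        for i in range(m):
            score *= gaussian_pdf(totals[i], readings[i], sigma)
        if score > best_score:
            best_score, best = score, assign
    return best, best_score


best, score = best_partition(
    probs=[0.9, 0.8, 0.3],
    weights=[100.0, 250.0, 100.0],
    readings=[100.0, 250.0],
    sigma=10.0,
)
```

In this toy case the low-probability duplicate of the 100 g event ends up in S_null, because assigning it anywhere would break the weight agreement.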

Weight Integration:

Static + Weights: (per shelf)

(Diagram: Shelf Tracker, Weights[shelf_n], video)

Goal: Determine the most likely set of events given the feedback from the weight sensor.

FT = full_transaction
TW = Total Weight

\mathbb{P} (S \leftrightarrow \text{Weights} ) = \mathbb{P}(S \text{ are True}) \times \mathbb{P}(\text{FT}\setminus S \text{ are False}) \times \mathbb{P}( |S[\text{weights}] - \text{TW} | < \text{thr})

Maximum likelihood estimation:

\text{MLE} = \argmax\limits_{S\subseteq \text{FT}} \hspace{3mm} \prod\limits_{E\in S} \mathbb{P} (E) \prod\limits_{\tilde E\in \text{FT}\setminus S} (1 - \mathbb{P} (\tilde E)) \times f_{\mathcal{N}(\text{TW}, \sigma)} \left( \sum_{E\in S} E.\text{weight} \right)

where f_{\mathcal{N}(\text{TW}, \sigma)} is the Gaussian density with mean TW and standard deviation \sigma.
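
The maximum likelihood estimate over subsets of the full transaction can be sketched by enumerating all subsets. This assumes each event is a `(probability, signed_weight)` pair, with removals carrying negative weight; that representation and the example values are hypothetical, and the enumeration is only practical for a small number of events.

```python
import math
from itertools import combinations


def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mle_subset(events, total_weight, sigma):
    """events: list of (probability, signed_weight). Returns the best index subset."""
    best_score, best = -1.0, ()
    indices = range(len(events))
    for r in range(len(events) + 1):
        for subset in combinations(indices, r):
            chosen = set(subset)
            score = 1.0
            weight_sum = 0.0
            for k, (p, w) in enumerate(events):
                if k in chosen:
                    score *= p          # P(E) for events kept in S
                    weight_sum += w
                else:
                    score *= 1.0 - p    # 1 - P(E) for events in FT \ S
            score *= gaussian_pdf(weight_sum, total_weight, sigma)
            if score > best_score:
                best_score, best = score, subset
    return best, best_score


# Two removals match a total weight change of -375; the third event is spurious.
full_transaction = [(0.9, -330.0), (0.6, -45.0), (0.4, 330.0)]
best, score = mle_subset(full_transaction, total_weight=-375.0, sigma=15.0)
```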

Video cue:

  • Decrease likelihood of events that are not found in the current video transaction.
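
As a rough illustration of the video cue, the sketch below scales down static events whose item does not appear among the video events. The dictionary layout and the `penalty` factor are assumptions; the source does not specify how much the likelihood is decreased.

```python
def apply_video_cue(static_events, video_items, penalty=0.5):
    """Reduce the likelihood of static events not seen in the video transaction."""
    for ev in static_events:
        if ev["item"] not in video_items:
            ev["likelihood"] *= penalty
    return static_events


static_events = [
    {"item": "cola", "likelihood": 0.8},
    {"item": "chips", "likelihood": 0.6},
]
apply_video_cue(static_events, video_items={"cola"})
```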

Video + Weights:

Goal: Determine the most likely set of events given the feedback from the weight sensor.

Match by reading:

For each weight reading (WR):

  • Select the video events that match the weight reading based on start and end times.
  • Do MLE on the selected events with the weight reading.
  • Flip the events' direction if necessary.

Then:

  • Do a final MLE with the total weight and the union of valid events.

Weights: \text{WE}_1, \text{WE}_2, \dots, \text{WE}_N (weight reading, start/end times)
Video: \text{VE}_1, \text{VE}_2, \dots, \text{VE}_M (tracking, start/end times)

\Delta s = \text{VE}_j . \texttt{start\_time} - \text{WE}_i . \texttt{start\_time}
\Delta e = \text{VE}_j . \texttt{end\_time} - \text{WE}_i . \texttt{end\_time}
\texttt{if} \hspace{3mm} |\Delta s| < \texttt{thr} : \hspace{3mm} (\text{VE}_j, -1) \rightarrow \text{C}_i \hspace{5mm} \text{(removing)}
\texttt{if} \hspace{3mm} |\Delta e| < \texttt{thr} : \hspace{3mm} (\text{VE}_j, 1) \rightarrow \text{C}_i \hspace{5mm} \text{(adding)}

Maximum likelihood estimation:

\text{C}_i = \{ (\text{VE}_{j_1}, \text{io}_{j_1}), \dots, (\text{VE}_{j_\ell}, \text{io}_{j_\ell}) \}, \hspace{2mm} \text{WE}_i \hspace{3mm} \rightarrow \hspace{3mm} \tilde{\text{C}}_i

\tilde{\text{C}}_i = \argmax\limits_{S\subseteq \text{C}_i} \hspace{2mm} \prod\limits_{(j, d)\in S} \left( \mathbb{P} (\text{VE}_j) \times \text{VE}_j.\texttt{io\_likelihood}[d] \times f_{\mathcal{N}(\text{W}_{\text{time}}, \sigma_T)} \left( \text{VE}_j.\texttt{time}[d] \right) \right) \times f_{\mathcal{N}(\text{TW}, \sigma)} \left( \sum_{(j,d)\in S} d \times \text{VE}_j.\texttt{weight} \right)

Final match (optional):

\tilde{\text{C}}_1, \dots, \tilde{\text{C}}_N, \hspace{2mm} \displaystyle\sum_{i=1}^N \text{WE}_i \hspace{3mm} \rightarrow \hspace{3mm} \text{MLE} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{transaction}

Candidate selection:

For i in {1, ..., N}:
    C = []
    For j in {1, ..., M}:
        \Delta s = \text{VE}_j . \texttt{start\_time} - \text{WE}_i . \texttt{start\_time}
        \Delta e = \text{VE}_j . \texttt{end\_time} - \text{WE}_i . \texttt{end\_time}
        \texttt{if} \hspace{3mm} \Delta s > -\texttt{K} \texttt{ and } |\Delta s| < \texttt{thr} \texttt{ and } \texttt{AR}[j] :
            \texttt{C}.\texttt{append}\left( (j, -1) \right)
        \texttt{if} \hspace{3mm} \Delta e < \texttt{K} \texttt{ and } |\Delta e| < \texttt{thr} \texttt{ and } \texttt{AA}[j] :
            \texttt{C}.\texttt{append}\left( (j, 1) \right)
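
The candidate selection for a single weight reading can be written as a small function. The `Timed` class and the default values for `thr` and `k` are placeholders chosen for the example; only the comparison logic follows the pseudocode above.

```python
from dataclasses import dataclass


@dataclass
class Timed:
    start_time: float
    end_time: float


def select_candidates(we, video_events, available_remove, available_add,
                      thr=1.0, k=0.5):
    """Return (j, direction) pairs; -1 = removing, +1 = adding."""
    candidates = []
    for j, ve in enumerate(video_events):
        ds = ve.start_time - we.start_time
        de = ve.end_time - we.end_time
        # Video start aligned with the reading start: candidate removal.
        if ds > -k and abs(ds) < thr and available_remove[j]:
            candidates.append((j, -1))
        # Video end aligned with the reading end: candidate addition.
        if de < k and abs(de) < thr and available_add[j]:
            candidates.append((j, 1))
    return candidates


we = Timed(10.0, 12.0)
video = [Timed(10.2, 15.0), Timed(5.0, 12.1), Timed(9.0, 11.8), Timed(30.0, 40.0)]
all_free = [True] * len(video)
candidates = select_candidates(we, video, all_free, all_free)
masked = select_candidates(we, video, all_free, [True, False, True, True])
```

The third video event starts a full second before the reading, so it fails the start test and is only kept as a possible addition.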

Full algorithm:

\texttt{AA} = [\texttt{True}, \dots, \texttt{True}] \hspace{5mm} \text{(available to add)}
\texttt{AR} = [\texttt{True}, \dots, \texttt{True}] \hspace{5mm} \text{(available to remove)}
For i in {1, ..., N}:
    ...
    \texttt{mle} = \texttt{MLE}( \text{WE}_i, C, \texttt{static}[\text{WE}_i .\texttt{shelf\_n}] )
    For j,io in mle:
        \texttt{if} \hspace{3mm} \texttt{io} > 0 :
            \texttt{AA}[j] = \texttt{False}
        \texttt{else}:
            \texttt{AR}[j] = \texttt{False}
...
For j in {1, ..., M}:
    \texttt{if} \hspace{3mm} \texttt{AA}[j] \texttt{ and } \texttt{AR}[j]:
        \text{VE}_j . \texttt{valid} = \texttt{False}
    \texttt{elif } \texttt{AA}[j]:
        \text{VE}_j . \texttt{in\_out} = -1
    \texttt{elif } \texttt{AR}[j]:
        \text{VE}_j . \texttt{in\_out} = 1
    \texttt{else}:
        \text{VE}_j . \texttt{in\_out} = 0
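
Putting the steps together, the loop below is one possible reading of the full algorithm. The `mle` argument is a caller-supplied stand-in for the MLE step (here a trivial function that accepts every candidate), and the classes, `thr`, and `k` are assumptions for the sake of a runnable example.

```python
from dataclasses import dataclass


@dataclass
class WeightEvent:
    start_time: float
    end_time: float
    shelf_n: int


@dataclass
class VideoEvent:
    start_time: float
    end_time: float
    in_out: int = 0
    valid: bool = True


def match_by_reading(weight_events, video_events, mle, static, thr=1.0, k=0.5):
    m = len(video_events)
    aa = [True] * m  # available to add
    ar = [True] * m  # available to remove

    for we in weight_events:
        c = []
        for j, ve in enumerate(video_events):
            ds = ve.start_time - we.start_time
            de = ve.end_time - we.end_time
            if ds > -k and abs(ds) < thr and ar[j]:
                c.append((j, -1))
            if de < k and abs(de) < thr and aa[j]:
                c.append((j, 1))
        # mle returns the accepted (j, io) pairs for this reading.
        for j, io in mle(we, c, static[we.shelf_n]):
            if io > 0:
                aa[j] = False
            else:
                ar[j] = False

    for j, ve in enumerate(video_events):
        if aa[j] and ar[j]:
            ve.valid = False   # never matched to any reading
        elif aa[j]:
            ve.in_out = -1     # used only as a removal
        elif ar[j]:
            ve.in_out = 1      # used only as an addition
        else:
            ve.in_out = 0      # used in both directions
    return video_events


def accept_all(we, candidates, shelf_static):
    # Placeholder for the MLE step: keeps every candidate.
    return candidates


video = [VideoEvent(10.2, 15.0), VideoEvent(5.0, 12.1), VideoEvent(30.0, 40.0)]
result = match_by_reading([WeightEvent(10.0, 12.0, 0)], video, accept_all, {0: None})
```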

Static + Video + Weights:

For each shelf (\texttt{shelf\_1}, \dots, \texttt{shelf\_k}):

\texttt{static} \times \texttt{video} \hspace{3mm} \rightarrow \hspace{3mm} \text{match} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{remaining\_static}, \texttt{remaining\_video}, \texttt{matched\_events}

\texttt{if matched\_events} \approx \texttt{total\_weight}:
    \texttt{done}
\texttt{else}:
    \texttt{events} \times \texttt{total\_weight} \hspace{3mm} \rightarrow \hspace{3mm} \text{MLE} \hspace{3mm} \rightarrow \hspace{3mm} \texttt{transaction}

Shelf coordinates:

Extrinsic Transform: