Visual-Content-Based Video Retrieval with Natural Language Queries
Used an encoder-decoder model with LSTM units to encode the temporal sequence of frames and generate a fixed-length caption for every video. A pre-trained image recognition model was used to extract features from the frames. The generated captions are embedded as skip-thought vectors so that they can be compared with a natural language query at query time.
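
The query-time step described above can be illustrated with a minimal sketch. It is not the project's code: the function names (build_vocab, embed, build_index, search), the sample captions, and the video ids are hypothetical, and the bag-of-words embed function is only a stand-in for a skip-thought encoder so that the example runs without a trained model. Ranking by cosine similarity is also an assumption, since the description does not name the similarity measure.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def build_vocab(captions):
    """Map every token seen in the captions to a vector index."""
    tokens = sorted({t for cap in captions.values() for t in tokenize(cap)})
    return {tok: i for i, tok in enumerate(tokens)}


def embed(text, vocab):
    """Stand-in sentence encoder: L2-normalised bag-of-words vector.

    ASSUMPTION: the real system would call a skip-thought encoder here.
    Tokens outside the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_index(captions, vocab):
    """Embed each generated caption once, ahead of query time."""
    return {vid: embed(cap, vocab) for vid, cap in captions.items()}


def search(query, index, vocab, top_k=3):
    """Rank videos by cosine similarity between query and caption vectors."""
    q = embed(query, vocab)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, v)), vid) for vid, v in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(vid, score) for score, vid in scored[:top_k]]


# Hypothetical captions, as the caption generator might produce them.
captions = {
    "vid_001": "a man is playing a guitar",
    "vid_002": "a dog is running on the beach",
    "vid_003": "a woman is slicing an onion",
}
vocab = build_vocab(captions)
index = build_index(captions, vocab)
results = search("dog running on beach", index, vocab, top_k=2)
print(results[0][0])  # best match for the query
```

Embedding the captions once and storing the vectors keeps query-time work to a single query embedding plus one similarity computation per video.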