Introduction to Deep Learning
Deadline: November 18th, 8pm
Submission options changed!
As we don’t implement much on our own, you don’t need to send your code.
Please send a PDF document with your chosen images with object detection & comments (details below).
While MNIST is a convenient test bed for deep learning models due to its simplicity, real-world problems generally involve far more complex data and also have more complex requirements. One very popular application is object detection: Given an image, the model should determine both where in the image objects of interest are located and what these objects are.
Solving this task yourself from the ground up would be “a little bit” too much at this point. Luckily, there is a TensorFlow Object Detection API that already implements this functionality. In this assignment, we ask you to familiarize yourself with this API.
There are articles on different topics linked on the GitHub site; however, those
might be a bit overwhelming. Check out
this Medium article
with a link to a Colab notebook. Open this notebook in “playground mode” (top
left) to be able to run and modify it. Run the notebook (the first cell installs
packages in the virtual environment that you might need to reinstall if you come
back at a later point) and observe the output. Try to understand each cell on
its own. Here
you can find another blog post on “frozen” graphs in case the third cell
confuses you. Note that it seems as if the notebook might sometimes spontaneously
combust, in which case you will have to run everything again. Sorry! You might
get around this by inserting tf.reset_default_graph()
at the beginning of step 3.
This paragraph describes what goes in the submission!
Next, we want to modify the example somewhat. At the bottom of the notebook you
can find some pointers on how to upload and detect objects in your own images.
Pick some images that you like (e.g. holiday pictures, or just download some
from somewhere) and perform detection on those images instead. Furthermore,
there are many pre-trained models available to do object detection with. You can
find these
here.
Use at least 3 other models to detect objects in the above pictures. Comment on any
differences in the outputs, such as quality and time taken.
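To compare the time taken in a fair way, it may help to time every model on the same images and to skip the first call, which often includes one-off setup cost. The following is only a minimal sketch: `time_detector` and `compare_models` are hypothetical helper names, and `fake_fast` and `fake_slow` are stand-ins for whatever function runs one model on one image in your notebook.

```python
import time
from statistics import mean


def time_detector(detect_fn, images, warmup=1):
    """Return the mean seconds per image for detect_fn over images."""
    # Warm-up calls are not timed, since the first call is often slower.
    for image in images[:warmup]:
        detect_fn(image)
    durations = []
    for image in images:
        start = time.perf_counter()
        detect_fn(image)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def compare_models(detectors, images):
    """Map each model name to its mean time per image, fastest first."""
    timings = {name: time_detector(fn, images) for name, fn in detectors.items()}
    return dict(sorted(timings.items(), key=lambda item: item[1]))


# Stand-in detectors (assumption): replace these with real inference calls.
def fake_fast(image):
    time.sleep(0.001)
    return []


def fake_slow(image):
    time.sleep(0.01)
    return []


results = compare_models(
    {"slow_model": fake_slow, "fast_model": fake_fast},
    ["img1", "img2", "img3"],
)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.1f} ms per image")
```

Timings on a shared Colab machine can vary between runs, so treat the numbers as rough indications rather than exact measurements.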
Aside from this, there are numerous other things you could try (which may be excluded from the submission).
For example, while most models are trained on the COCO dataset, there are also other datasets
available. Try out some of the other models, potentially with inappropriate data
(e.g. the KITTI dataset is mainly intended for self-driving cars…). Make sure
that you are updating the label path accordingly!
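One way to avoid forgetting the label path is to derive it from the model name. The sketch below assumes that model names contain the dataset name as an underscore-separated token and that the label map files are called `mscoco_label_map.pbtxt` and `kitti_label_map.pbtxt`; both are assumptions that you should check against the files actually available in your notebook. `label_path_for` and the `data_dir` default are hypothetical.

```python
import os

# Assumed file names; verify them against the label maps in your environment.
LABEL_MAPS = {
    "coco": "mscoco_label_map.pbtxt",
    "kitti": "kitti_label_map.pbtxt",
}


def label_path_for(model_name, data_dir="data"):
    """Pick the label map that matches the dataset named in model_name."""
    tokens = model_name.lower().split("_")
    for dataset, filename in LABEL_MAPS.items():
        if dataset in tokens:
            return os.path.join(data_dir, filename)
    raise ValueError(f"No known label map for model: {model_name}")


# Example model names (assumed to follow the naming pattern described above).
print(label_path_for("ssd_mobilenet_v1_coco_2017_11_17"))
print(label_path_for("faster_rcnn_resnet101_kitti_2018_01_28"))
```

Raising an error for unknown names is deliberate: silently falling back to the COCO labels would produce plausible-looking but wrong class names.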
Also, while the example model draws bounding boxes, there are also models
available that draw masks on the data. Use one such model and try drawing the
masks. This blog post
should be helpful here (in particular the linked GitHub code).
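To illustrate what drawing a mask means, here is a purely conceptual sketch that is independent of the API: every pixel whose mask score exceeds a threshold is blended with a highlight colour. The function `overlay_mask`, the data layout (rows of RGB tuples and rows of scores in [0, 1]), and the default values are assumptions made for this example; in the notebook you would more likely rely on the visualization code linked in the blog post.

```python
def overlay_mask(image, mask, color=(200, 0, 0), alpha=0.5, threshold=0.5):
    """Blend color into the pixels whose mask score exceeds threshold.

    image: list of rows, each row a list of (r, g, b) tuples.
    mask:  list of rows of floats in [0, 1], same shape as image.
    """
    blended = []
    for image_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, score in zip(image_row, mask_row):
            if score > threshold:
                # Weighted average of the original pixel and the highlight colour.
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c)
                    for p, c in zip(pixel, color)
                )
            new_row.append(pixel)
        blended.append(new_row)
    return blended


# A tiny 1x2 "image": the first pixel belongs to the object, the second does not.
tiny_image = [[(0, 0, 0), (100, 100, 100)]]
tiny_mask = [[0.9, 0.1]]
print(overlay_mask(tiny_image, tiny_mask))
```

Unlike a bounding box, the mask marks individual pixels, so the highlighted region follows the outline of the object.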
There are many other things to try linked on the API website
(see above), e.g. you could try transfer learning (retraining an existing model
for a different task) or even training your own model from scratch (although
this will likely take too long – maybe try MNIST ;)).