My thoughts on the course so far

I was surprised at how image recognizers learn to see different patterns in different layers of the CNN (pages 34-36 in the book). I found it amazing to get an idea of what’s going on inside the model when it is classifying images.

I was also pleased that there is a focus in the course on the ethics of how Machine Learning systems are used, for example the discussion of how ‘predictive policing’ algorithms can create a feedback loop: in the US, arrest rates are already biased by race, so a predictive policing algorithm trained on that arrest data could end up reinforcing those existing biases.

My questionnaire answers

  1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.
    • When in production, the model may see bears in positions which are rarely seen in the training data, because the training data comes from photos people post online. For example in production the model may see bears from behind or partially covered by bushes.
  2. Where do text models currently have a major deficiency?
    • Although Deep Learning is good at generating context-appropriate text, such as a convincing reply to a social media post, it is not currently good at generating factually correct content. For example, we don’t have a reliable way to combine a knowledge base of medical information with a deep learning model to generate medically correct text, such as answers to questions posted by patients in an app.
  3. What are possible negative societal implications of text generation models?
    • Politicians could use them to generate compelling but factually incorrect social media material to influence voters.
  4. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?
    • Start with an entirely manual process, with the deep learning model not being used directly to drive any actions. The humans involved in the process should look at the output of the model and decide whether to take action. The model is simply an extra source of information for them.
  5. What kind of tabular data is deep learning particularly good at?
    • Where there are columns containing natural language (e.g. book reviews) and high-cardinality categorical columns (i.e. something containing a large number of discrete choices, such as product ID).
  6. What’s a key downside of directly using a deep learning model for recommendation systems?
    • They tell you which products a user may like, without considering whether these recommendations would actually be helpful for the user. For example, if the user likes books by Terry Pratchett the system may recommend more books by Terry Pratchett, which may not be that helpful as the user is probably already aware of those books. It would be more helpful if the system could recommend something similar to Terry Pratchett but by a different author.
  7. What are the steps of the Drivetrain Approach?
    1. Define objective: what outcome are you trying to achieve?
    2. Levers: what actions can you take to better achieve that objective?
    3. Data: what inputs can you collect in order to use the levers?
    4. Models: a predictive model takes both the levers and any uncontrolled variables as inputs; the outputs from the models can be used to predict the final state of the objective.
  8. How do the steps of the Drivetrain Approach map to a recommendation system?
    1. Objective: drive additional sales (or engage users for a system like Spotify) by recommending them items they otherwise would not have been aware of.
    2. Levers: rank the possible recommendations.
    3. Data: conduct randomized experiments to collect data about a wide range of recommendations for a wide range of customers.
    4. Models: build 2 models of purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer. It is low in cases where the recommended item is already familiar to the customer and they don’t like it (both components are low because the user is unlikely to buy it, regardless of whether they see the recommendation), or if they would have bought it even without the recommendation (both components are large and cancel each other out).
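    • Written as a formula (my paraphrase, not notation from the book): utility(customer, item) = P(purchase | recommendation shown) - P(purchase | recommendation not shown).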
  9. Create an image recognition model using data you curate, and deploy it on the web.
  10. What is DataLoaders?
    • A fastai class that stores multiple DataLoader objects you pass to it. Normally you would pass in a training and a validation DataLoader, but it’s possible to pass in as many as you like. The first two DataLoader objects passed in are made available as the train and valid properties.
  11. What four things do we need to tell fastai to create DataLoaders?
    • What kinds of data we’re working with (e.g. images)
    • How to get the list of items (e.g. get all images within a path)
    • How to label the items (e.g. the label is the folder name that the image came from)
    • How to create the validation set (e.g. take 20% of the data at random)
  12. What does the splitter parameter to DataBlock do?
    • It defines how to split the training and validation sets.
  13. How do we ensure a random split always gives the same validation set?
    • Fix the random seed (for example by passing seed=42 to RandomSplitter), so that the same ‘random’ split is produced every time the code is run; see the sketch below.
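    • A minimal sketch of how questions 11-13 fit together in fastai's DataBlock API, assuming the bear images live under a folder path with one sub-folder per class (as in the book's example):

```python
from fastai.vision.all import *

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),    # what kind of data: images in, categories out
    get_items=get_image_files,             # how to get the items: every image file under path
    get_y=parent_label,                    # how to label them: use the name of the parent folder
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # how to split: random 20% validation set,
                                                      # with a fixed seed so the split is reproducible
    item_tfms=Resize(128))                 # resize each item so images can be batched together

dls = bears.dataloaders(path)              # a DataLoaders object exposing dls.train and dls.valid
```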
  14. What letters are often used to signify the independent and dependent variables?
    • Independent - x
    • Dependent - y
  15. What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?
    • Crop - crops the image to a square, using the full width or height of the image and discarding the rest.
    • Pad - adds black strips to the top/bottom or sides of the image to make it fit a square.
    • Squish - squishes or stretches the image horizontally or vertically to make it fit a square.
    • There are scenarios where each might be the right choice, but what we usually do in practice is randomly crop a different part of the image on each epoch of training, as in the sketch below.
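    • A minimal sketch of the three approaches plus the random-crop alternative, using fastai's Resize and RandomResizedCrop transforms (the 128 pixel size is just an illustrative value):

```python
from fastai.vision.all import *

Resize(128, ResizeMethod.Crop)                   # default: crop a central square out of the image
Resize(128, ResizeMethod.Pad, pad_mode='zeros')  # pad with black borders to make a square
Resize(128, ResizeMethod.Squish)                 # squish/stretch the image into a square
RandomResizedCrop(128, min_scale=0.3)            # crop a different random area of the image each epoch
```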
  16. What is data augmentation? Why is it needed?
    • Creating random variations of our input data, so that they look different but the meaning of the data is unchanged. It is needed because an untrained neural network knows nothing about how images behave; for example, it will not recognize that an image rotated by one degree still shows the same thing. Training the network with examples where the objects appear in slightly different places and at slightly different sizes helps it understand what an object is and how it can appear in an image (see the sketch under the next question).
  17. What is the difference between item_tfms and batch_tfms?
    • item_tfms are item transforms, pieces of code that run on each individual item.
    • batch_tfms are batch transforms, where we apply a piece of code to a batch of items at the same time. Using a GPU, this can save a lot of time.
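    • A minimal sketch showing both kinds of transform together, assuming the bears DataBlock from the sketch above: the item transform runs on each image individually, while the batch transforms run on whole mini-batches and can therefore use the GPU:

```python
bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # item transform: applied to each image on its own
    batch_tfms=aug_transforms())                      # batch transforms: random flips, rotations, zooms,
                                                      # warps and lighting changes, applied per batch
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)   # show the same image with different augmentations
```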
  18. What is a confusion matrix?
    • A matrix whose rows show the actual labels and whose columns show the predicted labels. The diagonal cells of the matrix show images that were classified correctly, and the off-diagonal cells show images that were classified incorrectly.
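    • A minimal sketch, assuming learn is a trained fastai classifier such as the bear model:

```python
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()      # actual labels on one axis, predicted labels on the other
interp.plot_top_losses(5, nrows=1)  # optionally, inspect the images the model got most wrong
```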
  19. What does export save?
    • A .pkl (pickle) file containing both the trained model (its architecture and parameters) and the definition of how its DataLoaders were created. Pickle is a Python module that enables objects to be serialized to files on disk and deserialized back into the program at runtime; the file contains a byte stream representing the objects.
  20. What is it called when we use a model for getting predictions, instead of training?
    • Inference
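    • A minimal sketch of exporting and then running inference, assuming learn is the trained Learner and grizzly.jpg is a hypothetical test image:

```python
learn.export('export.pkl')               # serialize the model and DataLoaders definition with pickle

learn_inf = load_learner('export.pkl')   # later, e.g. inside the deployed app
pred_class, pred_idx, probs = learn_inf.predict('grizzly.jpg')
print(pred_class, probs[pred_idx])       # predicted label and its probability
```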
  21. What are IPython widgets?
    • GUI components that bring together JavaScript and Python functionality in a web browser, which can be created and used in a Jupyter notebook.
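    • A minimal sketch of the kind of widgets used in the book's notebook app (the variable names are just illustrative):

```python
from ipywidgets import FileUpload, Button, Output, Label, VBox

btn_upload = FileUpload()                 # file-picker widget rendered in the notebook
btn_run = Button(description='Classify')  # button the user clicks to run the model
out_pl = Output()                         # area for displaying the uploaded image
lbl_pred = Label()                        # label for showing the prediction

# Arrange the widgets vertically; as the last expression in a cell this displays the GUI
VBox([Label('Select your bear!'), btn_upload, btn_run, out_pl, lbl_pred])
```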
  22. When might you want to use CPU for deployment? When might GPU be better?
    • A CPU is fine for serving most models in production. A GPU might be better if you have a high-volume site.
  23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
    • The app requires a network connection, and there will be some latency each time the model is called. If your app uses sensitive data users may be concerned about sending this to a remote server.
  24. What are three examples of problems that could occur when rolling out a bear warning system in practice?
    • It would need to deal with night time images, which may not appear in many online pictures of bears.
    • It may need to use low resolution camera images, which it may not have been trained on, and which provide less data for it to work with.
    • Results need to be returned fast enough so that action can be taken if required.
  25. What is “out-of-domain data”?
    • Data that the model sees in production which is very different from the data it saw during training.
  26. What is “domain shift”?
    • When the type of data that the model sees changes over time. For example an insurance company may use a deep learning model in its pricing algorithm, but over time the types of customers that the company attracts may change so that the original training data is no longer relevant.
  27. What are the three steps in the deployment process?
    • Manual process - humans check all predictions.
    • Limited scope deployment - careful human supervision of actions taken.
    • Gradual expansion - expand time and geography of where the model is used. Ensure good reporting systems are in place.