Goal: make a nice network visualization of connections between developmental disorder-associated proteins and explore the network

1. We start by loading the network for this part of the practical into Cytoscape. Open Cytoscape and go to File -> Import -> Network from File. Select the file STRING_interactions_Kaplanis_285.tsv and click Open. (This file has the same format as the one you downloaded earlier from STRING and contains network data that I exported for you for 285 neurodevelopmental disorder-associated genes.) Cytoscape should now show a table-like preview with all the columns from the input file. Every column header starts with a small icon showing how Cytoscape has interpreted that column. The green and red circles mark the two columns that Cytoscape will use to draw edges between nodes; the other columns, marked with a sheet icon, are treated as attributes of the interactions. Make sure that the columns node1 and node2 carry the green and red circle icons; if they do not, click on the icon to change it. When you are done, click OK.
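
Optional: if you want to see what this import does conceptually, here is a minimal Python sketch (not how Cytoscape is implemented). The gene names and values are hypothetical placeholders, and the '#' prefix on the first header is an assumption about STRING's export format; the code strips it if present. node1 and node2 define each edge; every other column becomes an edge attribute.

```python
import csv
import io

# Hypothetical excerpt in STRING's tab-separated export format; the real
# STRING_interactions_Kaplanis_285.tsv has more columns and many more rows.
TSV_TEXT = (
    "#node1\tnode2\tdatabase_annotated\tcombined_score\n"
    "GENE_A\tGENE_B\t0.9\t0.95\n"
    "GENE_A\tGENE_C\t0\t0.42\n"
    "GENE_C\tGENE_D\t0\t0.85\n"
)


def read_edges(text):
    """Return one dict per interaction, keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Assumed: the first header may start with '#'; strip it so the
        # column is simply called 'node1'.
        rows.append({key.lstrip("#"): value for key, value in row.items()})
    return rows


edges = read_edges(TSV_TEXT)
# The two "green/red circle" columns become the edge endpoints (the nodes).
nodes = sorted({e["node1"] for e in edges} | {e["node2"] for e in edges})
print(len(edges), "edges between", len(nodes), "nodes")
```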

2. Familiarize yourself with the network view. Zoom in, and try selecting individual nodes and edges as well as groups of nodes and edges. Try moving nodes around, etc. Explore how your node and edge selection changes what is displayed in the Node Table and Edge Table below the network window, compared to when nothing is selected. You can also select nodes or edges by clicking on rows in the respective Node or Edge Table. Make sure you understand the contents of the Node and Edge Tables.

3. The network has too many edges to really see anything. One way to make it more explorable is to remove less confident edges or edges without solid experimental evidence. Inspect the column "database_annotated" in the Edge Table and work out how you could select the rows (= interactions) that lack this evidence. (Interactions with evidence from database_annotated in STRING are those that were imported from high-throughput interactome studies or from other protein interaction databases that manually curate interactions from publications. These interactions likely correspond to actual biophysical interactions.) Then use the Filter pane (at the very left of the Cytoscape window) and figure out how to select all edges that DO NOT have evidence from "database_annotated". Once you think you have correctly selected all the interactions you want to remove, click on Edit (at the top of the Cytoscape window) and then "Remove selected nodes and edges" to delete the selected edges from the network.
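
The selection logic behind this filter can be sketched in a few lines of Python. This is a minimal illustration with hypothetical edges, and it assumes that an interaction without database evidence has a database_annotated value of 0. The "remove" list corresponds to the edges your Cytoscape filter should select.

```python
# Hypothetical edges: (node1, node2, database_annotated score).
# Assumption: an interaction WITHOUT database evidence has a score of 0.
edges = [
    ("GENE_A", "GENE_B", 0.9),
    ("GENE_A", "GENE_C", 0.0),
    ("GENE_C", "GENE_D", 0.0),
    ("GENE_B", "GENE_D", 0.5),
]


def split_by_database_evidence(edge_list):
    """Return (edges to keep, edges to remove) based on database_annotated."""
    keep = [e for e in edge_list if e[2] > 0]
    remove = [e for e in edge_list if e[2] == 0]
    return keep, remove


keep, remove = split_by_database_evidence(edges)
print(f"keep {len(keep)}, remove {len(remove)}")
```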

4. Because removing edges does not change the layout of the network, it is hard to tell whether the network is now sparse enough to start exploring it. Let's therefore redraw the layout. To do this, play with the options in the "Layout" menu (at the top of the Cytoscape window). My preferred layout is "prefuse force-directed", but you should try many of them to see what they do; maybe you will prefer another one. If, after redrawing, the network still looks too dense, you can go back to the Filter pane and remove more edges using the information in the "combined_score" column; e.g., you can remove all edges with a combined score < 0.8.
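
If you want a number rather than a visual impression of "too dense", network density (the fraction of all possible node pairs that are connected) is one simple measure. The Python sketch below uses hypothetical edges with scores on STRING's 0 to 1 scale. It keeps edges with combined_score >= 0.8 and compares the density before and after. The node count stays the same because only edges are removed.

```python
def density(num_nodes, num_edges):
    """Fraction of all possible undirected node pairs that are connected."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def filter_by_score(edge_list, threshold=0.8):
    """Keep only edges whose combined_score is at least `threshold`."""
    return [e for e in edge_list if e[2] >= threshold]


# Hypothetical edges: (node1, node2, combined_score).
edges = [
    ("GENE_A", "GENE_B", 0.95),
    ("GENE_A", "GENE_C", 0.42),
    ("GENE_C", "GENE_D", 0.85),
    ("GENE_B", "GENE_D", 0.61),
]
nodes = {node for edge in edges for node in edge[:2]}
strong = filter_by_score(edges)
print(f"density before: {density(len(nodes), len(edges)):.2f}")
print(f"density after:  {density(len(nodes), len(strong)):.2f}")
```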

5. I would like to explore how the 28 new developmental disorder (DD)-associated proteins are potentially connected to previously identified disease proteins. The 28 new disease genes are among the 285 genes in the network, but we don't yet know which ones they are. We can use Cytoscape to add columns to the Node and Edge Tables with annotations that help us distinguish new from previously known disease genes. Let's try this. You can import attributes for nodes or edges via the File menu -> Import -> Table from File. Select the file Kaplanis_28_genes_annotated.txt and click Open. You will again see a table showing the content of the selected file and how Cytoscape interprets each of its columns. This file contains information about nodes, i.e., the 28 new disease genes. To make sure that these attributes are added to the nodes and not the edges, select Node Table Columns under "Import Data as:". To assign this new information to the correct node in your network, Cytoscape needs to know which column contains the node identifiers. Cytoscape makes a first guess and puts a key icon next to the header of the column that it thinks contains them. Our nodes are identified by gene names, so make sure the key icon is on the column containing the gene names and not on the one with the UniProt ACs. Then click OK. To check whether Cytoscape has added the columns of this file correctly, look at the Node Table and scroll to the very right. You will see a couple of new, largely empty columns that you have just added. Clicking the column header of "Kaplanis_gene_new" sorts the table by that column; click until the rows with "y" appear first.
There should be 28 rows with a "y" in this column, indicating which of the 285 genes are the new disease genes.
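
Conceptually, importing a node table is a join on the key column. The Python sketch below illustrates this. The column names gene_name and uniprot_ac and all values are hypothetical placeholders; the real Kaplanis_28_genes_annotated.txt may have different columns. It also shows why the key matters: keyed on UniProt ACs, nothing matches the gene-name node IDs.

```python
import csv
import io

# Hypothetical network nodes (gene names) and annotation file content.
network_nodes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
ANNOTATION_TEXT = (
    "gene_name\tuniprot_ac\tKaplanis_gene_new\n"
    "GENE_B\tUNIPROT_1\ty\n"
    "GENE_D\tUNIPROT_2\ty\n"
)


def import_node_columns(nodes, text, key_column):
    """Attach annotation columns to nodes by matching on `key_column`.

    Nodes without a matching row get no values, like the empty cells
    in Cytoscape's Node Table."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    by_key = {row[key_column]: row for row in rows}
    table = {}
    for node in nodes:
        row = by_key.get(node, {})
        table[node] = {col: val for col, val in row.items() if col != key_column}
    return table


node_table = import_node_columns(network_nodes, ANNOTATION_TEXT, "gene_name")
new_genes = [n for n, attrs in node_table.items() if attrs.get("Kaplanis_gene_new") == "y"]
print(new_genes)

# Choosing the wrong key column (UniProt ACs) matches no node at all.
wrong = import_node_columns(network_nodes, ANNOTATION_TEXT, "uniprot_ac")
assert all(not attrs for attrs in wrong.values())
```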

6. We will now learn how to visualize different node and edge attributes on the network using the Style pane on the left side of the Cytoscape window. Note that the Style pane has multiple tabs; the two that matter for us today are called Node and Edge. The Node tab controls how nodes are shown in the network, and the Edge tab does the same for edges. Let's figure out how to color the nodes that correspond to the 28 new disease genes in one color and all other nodes in a different color. Find the style option that controls the node color (make sure you are in the Node tab), select the correct column (Kaplanis_gene_new), choose "Discrete Mapping" as the Mapping Type, and choose a color for "y" by clicking into the empty field next to the "y". Figure out how to change the default color of the nodes as well. Next, try changing the shape of all nodes or, if you like, choose different shapes for the new and previously known DD genes. If the node labels (the gene names) are too small, increase their font size. You can also play with the font color to improve readability.
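
Conceptually, a Discrete Mapping is a lookup table with a fallback: every column value that has an entry gets that style value, and everything else (including empty cells) gets the default. Here is a minimal Python sketch of that idea; the hex colors and gene names are hypothetical.

```python
# Hypothetical colors; any colors work the same way.
DEFAULT_NODE_COLOR = "#C0C0C0"        # used for every value without a mapping
NEW_GENE_MAPPING = {"y": "#E41A1C"}   # Kaplanis_gene_new == "y" -> red


def discrete_mapping(value, mapping, default):
    """Look up a style value for a column value; fall back to the default."""
    return mapping.get(value, default)


# Hypothetical Kaplanis_gene_new values; None stands for an empty cell.
node_table = {"GENE_A": None, "GENE_B": "y", "GENE_C": None, "GENE_D": "y"}
colors = {
    node: discrete_mapping(value, NEW_GENE_MAPPING, DEFAULT_NODE_COLOR)
    for node, value in node_table.items()
}
print(colors)
```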

7. Now let's also visualize some edge attributes. First switch to the Edge tab in the Style pane and try to figure out how to vary the thickness of the edges according to the "combined_score" column, so that more confident edges (higher score) are drawn thicker. Try to find settings that produce a "pretty" network.
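
Mapping a numeric column like combined_score onto edge thickness is essentially a linear interpolation between a data range and a width range. The Python sketch below illustrates the idea. The ranges (scores 0.8 to 1.0 after filtering, widths 1 to 8) are hypothetical choices, and values outside the data range are clamped.

```python
def continuous_mapping(value, data_min, data_max, out_min, out_max):
    """Linearly map `value` from [data_min, data_max] onto [out_min, out_max]."""
    if data_max == data_min:
        return out_min
    fraction = (value - data_min) / (data_max - data_min)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp values outside the range
    return out_min + fraction * (out_max - out_min)


# Hypothetical: remaining scores lie in [0.8, 1.0]; widths from 1 to 8.
widths = {score: continuous_mapping(score, 0.8, 1.0, 1.0, 8.0) for score in (0.8, 0.9, 1.0)}
for score, width in widths.items():
    print(f"combined_score {score:.1f} -> width {width:.1f}")
```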

8. To further prioritize interesting proteins and interactions, I would like to know, and visualize on the network, how strongly each gene is expressed in brain tissue. For this, we can use the file GTEx_tissue_expression_Kaplanis_genes_285.txt, which contains, for every gene, how strongly it is expressed in different human tissues. Feel free to open the file in another program first to see and understand its content. To add the gene expression data to the nodes in the network, repeat steps 5 and 6 with this new file: load the gene expression data as additional node attributes (make sure the key is set on the column with the gene names!) and visualize the strength of expression from the "brain other" column, for example via node size (more highly expressed -> larger node). You are encouraged to keep playing with the node and edge attributes and style options.
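
The expression-to-size mapping can be sketched in Python as well. This is a hypothetical illustration. The expression values and their unit (assumed here to be something like TPM) are made up. The optional log transform is our own choice, not part of the practical; it keeps a few very highly expressed genes from dwarfing all others. Genes missing from the expression file get a smaller default size so they stand out.

```python
import math

# Hypothetical "brain other" expression values; GENE_C is deliberately
# missing to show what happens to nodes without a match.
brain_expression = {"GENE_A": 2.0, "GENE_B": 150.0, "GENE_D": 35.0}
network_nodes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

DEFAULT_SIZE = 10.0             # nodes without an expression value
MIN_SIZE, MAX_SIZE = 20.0, 80.0


def node_sizes(nodes, expression, log_scale=True):
    """Larger nodes for more highly expressed genes."""
    transform = (lambda x: math.log10(x + 1)) if log_scale else (lambda x: x)
    values = {n: transform(expression[n]) for n in nodes if n in expression}
    if not values:
        return {n: DEFAULT_SIZE for n in nodes}
    lo, hi = min(values.values()), max(values.values())
    sizes = {}
    for n in nodes:
        if n not in values:
            sizes[n] = DEFAULT_SIZE
        elif hi == lo:
            sizes[n] = MIN_SIZE
        else:
            sizes[n] = MIN_SIZE + (values[n] - lo) / (hi - lo) * (MAX_SIZE - MIN_SIZE)
    return sizes


sizes = node_sizes(network_nodes, brain_expression)
print(sizes)
```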

9. Let's save our network visualization. Under the File menu you can save the current Cytoscape session (try it), and you will also find different export options (explore the Network to Image and Network to File options).

10. If you still feel like exploring Cytoscape, you can check out the Tools menu, play more with the styles, or think of additional attributes you would like to add to and visualize on the network. Do you have an idea where to get this information from? Alternatively, you can now start exploring the network. Do you find any interesting connections between new and previously known disease genes? Do you find clusters of genes?
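
To approach these two questions programmatically, here is a small Python sketch on a hypothetical filtered network. First, it lists, for each new gene, its directly connected previously known genes. Second, it finds connected components, which are only a crude first look at clusters: one large component can still contain several denser modules. Nodes without any remaining edges do not appear in the components.

```python
from collections import defaultdict

# Hypothetical filtered network and the genes labelled "y" in Kaplanis_gene_new.
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
new_genes = {"GENE_B", "GENE_E"}


def known_partners_of_new_genes(edge_list, new):
    """For each new gene, list neighbours that are previously known genes."""
    partners = defaultdict(set)
    for a, b in edge_list:
        if a in new and b not in new:
            partners[a].add(b)
        elif b in new and a not in new:
            partners[b].add(a)
    return {gene: sorted(p) for gene, p in partners.items()}


def connected_components(edge_list):
    """Groups of genes linked by any path (isolated nodes are not included)."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


print(known_partners_of_new_genes(edges, new_genes))
print(connected_components(edges))
```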