`TextGraphs.jl`

offers Graphs representations of Text, along with natural language proccessing (NLP) functionalities. Check the white paper including vignettes with examples.

This package is inspired by SpeechGraphs. `TextGraphs.jl`

new features include pre-processing (e.g.lemmas), properties (e.g. centrality) and latent space embeddings (adding latent semantic information to graphs).

Julia uses multiple dispatching, focusing on modular functions and high-performance computing.

Check the documentation and the white paper for further information.

See the poster presentation at JuliaCon22:

Install with Pkg.

`pkg>add TextGraphs`

You should also have R and package udpipe available.

```
$sudo apt install r-base
$sudo Rscript -e 'install.packages("udpipe")'
```

You can build the following graphs from text (`AbstractString`

):

Raw

- Naive (
`naive_graph`

) uses the original sequence of words. - Phrases Graph(
`phrases_graph`

): Uses the original sequence of phrases.

POS, Stems and Lemmas

- Stem (
`stem_graph`

) uses stemmed words. - Lemma (
`lemma_graph`

): Uses lemmatized words. - Part of Speech Graph (POS,
`pos_graph`

) uses syntactical functions.

Latent space embeddings

- Latent space embedding (LSE,
`latent_space_graph`

) graphs. - Latent space embeddings to target (
`latent_space_graph`

)

You can obtain several properties of the graphs:

Direct measures

`graph_props`

returns values of density, # of self loops, # of SCCs, size of largest SCC, and mean centrality (betweeness, closeness and eigenvector methods).

Erdős–Rényi ratios

`rand_erdos_props`

returns values as compared to random Erdõs-Rényi graph with identical number of vertices and edges through z-score or ratio to average.

```
julia>using TextGraphs
julia>naive_graph("Sample for graph")
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia>stem_graph("Sample for graph";snowball_language="english") # Optional keyword argument
{3, 2} directed Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)
julia> graph_props(naive_graph("Sample for graph"))
Dict{String, Real} with 7 entries:
"mean_close_centr" => 0.388889
"size_largest_scc" => 1
"num_strong_connect_comp" => 3
"density" => 0.333333
"num_self_loops" => 0
"mean_between_centr" => 0.166667
"mean_eig_centr" => 0.333335
```

```
using GraphMakie , GLMakie
g = naive_graph("Colorless green ideas sleep furiously")
stem_g = stem_graph("No meio do caminho tinha uma pedra tinha uma pedra no meio do caminho")
g_labels = map(x -> get_prop(naive_g,x,:token), collect(1:nv(naive_g)))
stem_g_labels = map(x -> get_prop(stem_g,x,:token), collect(1:nv(stem_g)))
graphplot(naive_g,nlabels=g_labels)
graphplot(stem_g,nlabels=stem_g_labels)
spec3_layout = Spectral(dim=3)
graphplot(naive_g,node_size=30,nlabels=g_labels,layout=spec3_layout)
```

Besides SpeechGraphs, there's a previous object-oriented Python implementation by github/facuzeta.