Santiago Álvarez Rodríguez
Software engineer, front-end developer and language learner.
OK, so your Product Manager comes to you and asks you for a (hopefully very) easy task:
Help me find out how many SLOC* we've got in total... but wait, I'd like to see only Ruby code... Throw anything else away. How about I come back in 10 minutes?
(*) SLOC: Source Lines Of Code, that is, no comments, no docstrings and no blank lines.
Not cool, right?
Well, no need to tear your hair out. By the end of these slides, you'll be like...
NOTE:
We'll be using the Ruby on Rails source code for the examples. You can clone it from GitHub:
git clone https://github.com/rails/rails
find-ing your way through the file system
Some basics first
# Remember: we care only about Ruby code, so let's find Ruby files first
# General syntax: find <root dir> <conditions> <other options>
~/Documents/rails $ find . -name '*.rb'
# you've probably seen a lot of output here
# I mean it, a lot!!!...
./railties/test/json_params_parsing_test.rb
./railties/test/path_generation_test.rb
./railties/test/paths_test.rb
./railties/test/rack_logger_test.rb
./railties/test/rails_info_controller_test.rb
./railties/test/rails_info_test.rb
./railties/test/railties/engine_test.rb
./railties/test/railties/generators_test.rb
./railties/test/railties/mounted_engine_test.rb
./railties/test/railties/railtie_test.rb
./railties/test/test_unit/reporter_test.rb
./railties/test/version_test.rb
./tasks/release.rb
./tools/test.rb
./version.rb
That was nice. Now, we probably wanna skip test files
# You really need the '*'s: '-path' is matched against the whole path,
# so '/test/' on its own would only match a path that is exactly '/test/'
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*'
# more output above...
./railties/lib/rails/railtie/configurable.rb
./railties/lib/rails/railtie/configuration.rb
./railties/lib/rails/ruby_version_check.rb
./railties/lib/rails/source_annotation_extractor.rb
./railties/lib/rails/tasks.rb
./railties/lib/rails/test_help.rb
./railties/lib/rails/test_unit/line_filtering.rb
./railties/lib/rails/test_unit/minitest_plugin.rb
./railties/lib/rails/test_unit/railtie.rb
./railties/lib/rails/test_unit/reporter.rb
./railties/lib/rails/test_unit/test_requirer.rb
./railties/lib/rails/version.rb
./railties/lib/rails/welcome_controller.rb
./tasks/release.rb
./tools/test.rb
./version.rb
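If you'd rather not wait for the whole Rails tree to see what '-not -path' filters out, here's a minimal sketch on a throwaway directory. The directory layout and file names are made up for illustration; it assumes a POSIX shell plus 'mktemp -d', which both GNU and BSD systems provide.

```shell
# Build a tiny, hypothetical project tree in a temporary directory
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p lib/test_unit test
touch lib/app.rb lib/test_unit/reporter.rb test/app_test.rb README.md

# All Ruby files (sorted, so the output is stable)
find . -name '*.rb' | sort
# ./lib/app.rb
# ./lib/test_unit/reporter.rb
# ./test/app_test.rb

# Skip anything inside a directory named exactly 'test'.
# 'test_unit' survives because '*/test/*' needs a slash on both sides of 'test'.
find . -name '*.rb' -not -path '*/test/*' | sort
# ./lib/app.rb
# ./lib/test_unit/reporter.rb
```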
Let's get fancy
# '-regex' matches the whole path, including the file name
# ('-regextype' is a GNU find option; BSD/macOS find uses 'find -E . -regex ...' instead)
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$'
./activerecord/lib/active_record/associations/belongs_to_association.rb
./activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
./activerecord/lib/active_record/associations/has_many_association.rb
./activerecord/lib/active_record/associations/has_many_through_association.rb
./activerecord/lib/active_record/associations/has_one_association.rb
./activerecord/lib/active_record/associations/has_one_through_association.rb
./activerecord/test/cases/associations/belongs_to_associations_test.rb
./activerecord/test/cases/associations/has_and_belongs_to_many_associations_test.rb
./activerecord/test/cases/associations/has_many_associations_test.rb
./activerecord/test/cases/associations/has_many_through_associations_test.rb
./activerecord/test/cases/associations/has_one_associations_test.rb
./activerecord/test/cases/associations/has_one_through_associations_test.rb
Damn, those test files again!
# The '\' at the end of the line tells Bash that
# the command continues on the next line
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' \
> -not -path '*/test/*'
./activerecord/lib/active_record/associations/belongs_to_association.rb
./activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
./activerecord/lib/active_record/associations/has_many_association.rb
./activerecord/lib/active_record/associations/has_many_through_association.rb
./activerecord/lib/active_record/associations/has_one_association.rb
./activerecord/lib/active_record/associations/has_one_through_association.rb
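Because '-regex' has to match the entire path, a pattern that only describes the file name finds nothing at all. A small sketch of that pitfall, assuming GNU find and a made-up directory tree:

```shell
# Hypothetical tree that mimics the 'associations' layout
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p lib/associations test/associations
touch lib/associations/has_one.rb lib/associations/belongs_to.rb \
      lib/associations/preloader.rb test/associations/has_one_test.rb

# Describes only the file name: prints nothing, since every path starts with './'
find . -regextype posix-extended -regex '(has|belong).*\.rb'

# A leading '.*/' lets the pattern cover the directories as well
find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb' \
  -not -path '*/test/*' | sort
# ./lib/associations/belongs_to.rb
# ./lib/associations/has_one.rb
```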
Further filtering
# '-type d' to search for directories. '-type f' to search for files
~/Documents/rails $ find . -type d -name '*active_record*'
./activerecord/lib/active_record
./activerecord/lib/rails/generators/active_record
./activerecord/test/active_record
./railties/test/fixtures/lib/generators/active_record
# '-maxdepth' limits how many directory levels deep 'find' descends
~/Documents/rails $ find . -maxdepth 3 -type f -name '*active_record*'
./actionview/test/active_record_unit.rb
./activerecord/lib/active_record.rb
./guides/bug_report_templates/active_record_gem.rb
./guides/bug_report_templates/active_record_master.rb
./guides/bug_report_templates/active_record_migrations_gem.rb
./guides/bug_report_templates/active_record_migrations_master.rb
./guides/source/active_record_basics.md
./guides/source/active_record_callbacks.md
./guides/source/active_record_migrations.md
./guides/source/active_record_postgresql.md
./guides/source/active_record_querying.md
./guides/source/active_record_validations.md
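To make "depth" concrete: the starting point is depth 0, its direct children are depth 1, and so on. A minimal sketch on a hypothetical nested tree:

```shell
# Hypothetical nested tree: one Ruby file per level
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p a/b/c
touch top.rb a/one.rb a/b/two.rb a/b/c/three.rb

# Depth 1: ./top.rb, depth 2: ./a/one.rb; deeper files are skipped
find . -maxdepth 2 -type f -name '*.rb' | sort
# ./a/one.rb
# ./top.rb

# '-type d' alone lists every directory, including the starting point '.'
find . -type d | sort
# .
# ./a
# ./a/b
# ./a/b/c
```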
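# '-size -N' matches files smaller than N units; '-size +N', files larger than N units.
# Without a suffix the unit is 512-byte blocks; add 'c' for bytes, or 'k', 'M', 'G'
# (e.g. -4k, +3M; the number must be an integer).
# GNU find rounds sizes up to the unit, so '-size -1k' only matches empty files.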
~/Documents/rails $ find . -type f -name '*.rb' -size -1k
./actionpack/test/controller/controller_fixtures/app/controllers/admin/user_controller.rb
./actionpack/test/controller/controller_fixtures/app/controllers/user_controller.rb
./actionpack/test/controller/controller_fixtures/vendor/plugins/bad_plugin/lib/plugin_controller.rb
./activejob/test/support/delayed_job/delayed/serialization/test.rb
~/Documents/rails $ find . -size +1M
./.git/objects/pack/pack-0d3c6964a83a501da27c9eb5ee55e077fa112064.pack
./.git/objects/pack/pack-0d3c6964a83a501da27c9eb5ee55e077fa112064.idx
./activesupport/lib/active_support/values/unicode_tables.dat
./guides/assets/images/getting_started/rails_welcome.png
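The rounding rule is easy to trip over, so here's a quick check on three hypothetical files of known size. This assumes GNU find's behaviour of rounding sizes up to the chosen unit; BSD find may differ.

```shell
# Three hypothetical files: 0 bytes, 7 bytes and 2048 bytes
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
: > empty.rb
printf 'puts 1\n' > tiny.rb
dd if=/dev/zero of=two_k.dat bs=1024 count=2 2>/dev/null

# tiny.rb rounds up to 1k, which is not "less than 1k": only the empty file matches
find . -type f -size -1k
# ./empty.rb

# In bytes ('c' suffix), both small files are under 1024 bytes
find . -type f -size -1024c | sort
# ./empty.rb
# ./tiny.rb

# Larger than 1k: only the 2 KiB file
find . -type f -size +1k
# ./two_k.dat
```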
Do something with those files you found
# '{}' is a placeholder that gets replaced by each file 'find' finds
# Notice how we end the command: '\;' (escaped so the shell passes the ';' on to 'find')
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' \
> -exec rm -f {} \;
~/Documents/rails $ git status
# On branch master
# Changes not staged for commit:
# (use "git add/rm <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# deleted: activerecord/lib/active_record/associations/belongs_to_association.rb
# deleted: activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
# deleted: activerecord/lib/active_record/associations/has_many_association.rb
# deleted: activerecord/lib/active_record/associations/has_many_through_association.rb
# deleted: activerecord/lib/active_record/associations/has_one_association.rb
# deleted: activerecord/lib/active_record/associations/has_one_through_association.rb
# deleted: activerecord/test/cases/associations/belongs_to_associations_test.rb
# deleted: activerecord/test/cases/associations/has_and_belongs_to_many_associations_test.rb
# deleted: activerecord/test/cases/associations/has_many_associations_test.rb
# deleted: activerecord/test/cases/associations/has_many_through_associations_test.rb
# deleted: activerecord/test/cases/associations/has_one_associations_test.rb
# deleted: activerecord/test/cases/associations/has_one_through_associations_test.rb
#
no changes added to commit (use "git add" and/or "git commit -a")
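'-exec ... \;' runs the command once per file. POSIX find also accepts '-exec ... {} +', which packs as many file names as fit into a single invocation. A harmless sketch that uses 'echo' instead of 'rm' to show the difference (the file names are made up):

```shell
# Three hypothetical files
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch a.rb b.rb c.rb

# One 'echo' per file: wc -l reports 3 lines
find . -name '*.rb' -exec echo {} \; | wc -l

# One 'echo' for all files: wc -l reports a single line holding all three names
find . -name '*.rb' -exec echo {} + | wc -l
```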
grep to get exactly what you want
The simplest use case
# From 'active_support.rb', print the lines matching 'require'
~/Documents/rails $ grep require activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"
# '-E' to use extended regex syntax
~/Documents/rails $ grep -E 'require "[^/]+"$' activesupport/lib/active_support.rb
require "securerandom"
Now, show me the lines NOT matching the pattern ('-v' inverts the match)
~/Documents/rails $ grep -vE 'require "[^/]+"$' activesupport/lib/active_support.rb
# more output above...
def self.halt_callback_chains_on_return_false
Callbacks.halt_and_display_warning_on_return_false
end
def self.halt_callback_chains_on_return_false=(value)
Callbacks.halt_and_display_warning_on_return_false = value
end
def self.to_time_preserves_timezone
DateAndTime::Compatibility.preserve_timezone
end
def self.to_time_preserves_timezone=(value)
DateAndTime::Compatibility.preserve_timezone = value
end
end
autoload :I18n, "active_support/i18n"
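Here's the same '-E' / '-v' pair on a tiny, hypothetical stand-in for 'active_support.rb', so you can see exactly which lines land on each side:

```shell
# A trimmed-down, made-up sample file
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
cat > sample.rb <<'EOF'
require "securerandom"
require "active_support/version"
module ActiveSupport
end
EOF

# '[^/]+' means "one or more characters that are not a slash",
# so only a require without a path inside the quotes matches
grep -E 'require "[^/]+"$' sample.rb
# require "securerandom"

# '-v' prints every other line
grep -vE 'require "[^/]+"$' sample.rb
# require "active_support/version"
# module ActiveSupport
# end
```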
Let's match multiple expressions (each '-e' adds one more pattern)
~/Documents/rails $ grep -E -e '^require' -e 'activesupport' activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"
# Something looks odd here: nothing matched 'activesupport', because the code spells it 'ActiveSupport'... let's fix that
# '-i' for case-insensitive matching
~/Documents/rails $ grep -iE -e '^require' -e 'activesupport' activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"
module ActiveSupport
extend ActiveSupport::Autoload
# Now that's what I'm talking about! :)
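A minimal sketch of how '-e' and '-i' combine, on a made-up three-line file: a line is printed if it matches ANY of the patterns, and '-i' applies to all of them.

```shell
# Hypothetical three-line sample
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'require "securerandom"' 'module ActiveSupport' 'end' > sample.rb

# Case-sensitive: 'activesupport' never matches 'ActiveSupport'
grep -E -e '^require' -e 'activesupport' sample.rb
# require "securerandom"

# Case-insensitive: now the module line matches too
grep -iE -e '^require' -e 'activesupport' sample.rb
# require "securerandom"
# module ActiveSupport
```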
Can I query multiple files?... Oh, you bet!
~/Documents/rails $ cd activerecord/lib/active_record/associations
lib/active_record/associations $ grep -E '^\s*class ' *.rb
alias_tracker.rb: class AliasTracker # :nodoc:
association.rb: class Association #:nodoc:
association_scope.rb: class AssociationScope #:nodoc:
association_scope.rb: class ReflectionProxy < SimpleDelegator # :nodoc:
belongs_to_association.rb: class BelongsToAssociation < SingularAssociation #:nodoc:
belongs_to_polymorphic_association.rb: class BelongsToPolymorphicAssociation < BelongsToAssociation #:nodoc:
collection_association.rb: class CollectionAssociation < Association #:nodoc:
collection_proxy.rb: class CollectionProxy < Relation
has_many_association.rb: class HasManyAssociation < CollectionAssociation #:nodoc:
has_many_through_association.rb: class HasManyThroughAssociation < HasManyAssociation #:nodoc:
has_one_association.rb: class HasOneAssociation < SingularAssociation #:nodoc:
has_one_through_association.rb: class HasOneThroughAssociation < HasOneAssociation #:nodoc:
join_dependency.rb: class JoinDependency # :nodoc:
join_dependency.rb: class Aliases # :nodoc:
join_dependency.rb: class Table < Struct.new(:node, :columns) # :nodoc:
preloader.rb: class Preloader #:nodoc:
preloader.rb: class AlreadyLoaded # :nodoc:
preloader.rb: class NullPreloader # :nodoc:
singular_association.rb: class SingularAssociation < Association #:nodoc:
Show me some line numbers!
~/Documents/rails $ cd activerecord/lib/active_record/associations
lib/active_record/associations $ grep -nE '^\s*class ' *.rb
alias_tracker.rb:6: class AliasTracker # :nodoc:
association.rb:18: class Association #:nodoc:
association_scope.rb:3: class AssociationScope #:nodoc:
association_scope.rb:96: class ReflectionProxy < SimpleDelegator # :nodoc:
belongs_to_association.rb:4: class BelongsToAssociation < SingularAssociation #:nodoc:
belongs_to_polymorphic_association.rb:4: class BelongsToPolymorphicAssociation < BelongsToAssociation #:nodoc:
collection_association.rb:26: class CollectionAssociation < Association #:nodoc:
collection_proxy.rb:30: class CollectionProxy < Relation
has_many_association.rb:8: class HasManyAssociation < CollectionAssociation #:nodoc:
has_many_through_association.rb:4: class HasManyThroughAssociation < HasManyAssociation #:nodoc:
has_one_association.rb:4: class HasOneAssociation < SingularAssociation #:nodoc:
has_one_through_association.rb:4: class HasOneThroughAssociation < HasOneAssociation #:nodoc:
join_dependency.rb:3: class JoinDependency # :nodoc:
join_dependency.rb:7: class Aliases # :nodoc:
join_dependency.rb:35: class Table < Struct.new(:node, :columns) # :nodoc:
preloader.rb:41: class Preloader #:nodoc:
preloader.rb:168: class AlreadyLoaded # :nodoc:
preloader.rb:183: class NullPreloader # :nodoc:
singular_association.rb:3: class SingularAssociation < Association #:nodoc:
Well, you know what? I only need to know which files have classes defined... ¯\_(ツ)_/¯
~/Documents/rails $ cd activerecord/lib/active_record/associations
lib/active_record/associations $ grep -lE '^\s*class ' *.rb
alias_tracker.rb
association.rb
association_scope.rb
belongs_to_association.rb
belongs_to_polymorphic_association.rb
collection_association.rb
collection_proxy.rb
has_many_association.rb
has_many_through_association.rb
has_one_association.rb
has_one_through_association.rb
join_dependency.rb
preloader.rb
singular_association.rb
What about counting how many lines match in each file?
~/Documents/rails $ cd activerecord/lib/active_record/associations
lib/active_record/associations $ grep -cE '^\s*class ' *.rb
alias_tracker.rb:1
association.rb:1
association_scope.rb:2
belongs_to_association.rb:1
belongs_to_polymorphic_association.rb:1
collection_association.rb:1
collection_proxy.rb:1
foreign_association.rb:0
has_many_association.rb:1
has_many_through_association.rb:1
has_one_association.rb:1
has_one_through_association.rb:1
join_dependency.rb:3
preloader.rb:3
singular_association.rb:1
through_association.rb:0
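One detail worth knowing: '-c' counts matching LINES, not individual matches. If a line holds two matches, it still counts once. A sketch on hypothetical files; the '-o' flag used for the per-match count is supported by GNU and BSD grep but is not in POSIX.

```shell
# Hypothetical files: one.rb has two 'class ' matches on its first line
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'class A; end; class B; end' 'class C' > one.rb
printf '%s\n' '# no definitions here' > two.rb

# Matching lines per file
grep -c 'class ' one.rb two.rb
# one.rb:2
# two.rb:0

# Individual matches: '-o' prints each match on its own line; wc -l then reports 3
grep -o 'class ' one.rb | wc -l
```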
I probably won't remember (or even know about) '\bclass\b'... How can I match 'class' only as a whole word?
~/Documents/rails $ cd activerecord/lib/active_record/associations
lib/active_record/associations $ grep -wE class *.rb
alias_tracker.rb: class AliasTracker # :nodoc:
association.rb: # This is the root class of all associations ('+ Foo' signifies an included module Foo):
association.rb: class Association #:nodoc:
association.rb: # Returns the name of the table of the associated class:
association.rb: # Returns the class of the target. belongs_to polymorphic overrides this to look at the
association.rb: @reflection = @owner.class._reflect_on_association(reflection_name)
association.rb: attributes[reflection.type] = owner.class.base_class.name
association.rb: # the kind of the class of the associated objects. Meant to be used as
association.rb: "got #{record.inspect} which is an instance of #{record.class}(##{record.class.object_id})"
association.rb: # the association in the specific class of the record.
association_scope.rb: class AssociationScope #:nodoc:
# more output below...
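To see what "whole word" means to '-w': the match must be bounded by the line edges or by characters that aren't letters, digits or underscores. A sketch with made-up lines:

```shell
# Hypothetical lines: 'class' as a word, next to dots, and inside longer words
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'class Foo' 'owner.class.name' 'subclasses' 'classify' > sample.rb

# Plain substring match: all 4 lines contain 'class'
grep -c class sample.rb

# '-w': 'subclasses' and 'classify' are dropped
grep -w class sample.rb
# class Foo
# owner.class.name
```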
Let's recall the "simple" task we were given earlier:
Help me find out how many SLOC we've got in total... but wait, I'd like to see only Ruby code... Throw anything else away. How about I come back in 10 minutes?
You're now able to do it, right?
Let's figure it out!
Only Ruby code; no tests: find . -name '*.rb' -not -path '*/test/*'
Drop comments: grep -vE -e '^\s*#' <input stream(1)>
Drop blanks: grep -vE '^\s*$' <input stream>
Count lines (2): wc -l <input stream>
Putting everything together...
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*' | grep -vE -e '^\s*#' -e '^\s*$' | wc -l
1030
1030 SLOC? Really?
That doesn't seem right!
Let's see what we're actually doing
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*'
# more output above...
./railties/lib/rails/tasks.rb
./railties/lib/rails/test_help.rb
./railties/lib/rails/test_unit/line_filtering.rb
./railties/lib/rails/test_unit/minitest_plugin.rb
./railties/lib/rails/test_unit/railtie.rb
./railties/lib/rails/test_unit/reporter.rb
./railties/lib/rails/test_unit/test_requirer.rb
./railties/lib/rails/version.rb
./railties/lib/rails/welcome_controller.rb
./railties/lib/rails.rb
./tasks/release.rb
./tools/test.rb
./version.rb
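That's what we're feeding grep with... Not quite what we want. Can you tell why?
grep reads that list of file names as plain text from its standard input and never opens the files, so we ended up counting files, not lines of code.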
How to fix it?
xargs
Let's try...
# '-print0' and '-0' make 'find' and 'xargs' separate file names with NUL bytes,
# so names containing spaces or newlines survive the trip through the pipe
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*' -print0 | \
> xargs -0 grep -vE -e '^\s*#' -e '^\s*$' | wc -l
65616
I don't know about you, but that seems a lot more reasonable to me.
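Here are both pipelines side by side on a tiny, hypothetical tree, with a directory name containing a space to show why '-print0' / '-0' matter. It uses '[[:space:]]', the POSIX spelling of GNU grep's '\s', so it should also behave on BSD grep.

```shell
# Hypothetical tree: one Ruby file with 1 comment, 1 blank line and 3 lines of code
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p "lib/my dir" test
printf '%s\n' '# a comment' '' 'class Foo' '  def bar; end' 'end' > "lib/my dir/foo.rb"
printf '%s\n' 'x = 1' > test/foo_test.rb

# Wrong: grep filters the list of file names, so wc -l reports 1 (one file)
find . -name '*.rb' -not -path '*/test/*' |
  grep -vE -e '^[[:space:]]*#' -e '^[[:space:]]*$' | wc -l

# Right: xargs passes the names as arguments, so grep reads the files; wc -l reports 3
find . -name '*.rb' -not -path '*/test/*' -print0 |
  xargs -0 grep -vE -e '^[[:space:]]*#' -e '^[[:space:]]*$' | wc -l
```

Keep in mind this is an approximation of SLOC: lines with trailing comments still count, and Ruby's '=begin' / '=end' block comments aren't filtered out.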
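The way xargs works is pretty simple (at least for the use case we've just seen):
It reads items from its standard input and appends them as arguments to the command you give it, packing as many of them into each invocation as the system's command-line length limit allows. In the previous example, it takes the file names provided by find and hands them to grep, which then reads each of those files.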
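A quick way to watch the batching: put 'echo' in front of the command so xargs prints what it would run instead of running it. The file names here are made up and don't need to exist.

```shell
# Default: all three items go into ONE command line
printf '%s\n' a.rb b.rb c.rb | xargs echo grep
# grep a.rb b.rb c.rb

# '-n 1': one invocation per item
printf '%s\n' a.rb b.rb c.rb | xargs -n 1 echo grep
# grep a.rb
# grep b.rb
# grep c.rb
```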
Pretty much the same thing -exec does when used as an option to find.
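For comparison, a sketch of the SLOC count written both ways on hypothetical files. With '{} +', find builds the argument list itself, so no '-print0' / '-0' pair is needed even for names with spaces.

```shell
# Hypothetical files: 2 code lines in a.rb, 1 in "b c.rb"
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' '# comment' 'puts 1' '' 'puts 2' > a.rb
printf '%s\n' 'puts 3' > "b c.rb"

# xargs version: wc -l reports 3
find . -name '*.rb' -print0 |
  xargs -0 grep -vE -e '^[[:space:]]*#' -e '^[[:space:]]*$' | wc -l

# find -exec version: wc -l also reports 3
find . -name '*.rb' -exec grep -vE -e '^[[:space:]]*#' -e '^[[:space:]]*$' {} + | wc -l
```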
Nothing better than an example
~/Documents/rails $ find . -name '*.md' -print0 | xargs -0 rm
~/Documents/rails $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
deleted: .github/issue_template.md
deleted: .github/pull_request_template.md
deleted: CODE_OF_CONDUCT.md
deleted: CONTRIBUTING.md
deleted: README.md
deleted: RELEASING_RAILS.md
deleted: actioncable/CHANGELOG.md
deleted: actioncable/README.md
# more output below...
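Since 'rm' doesn't ask twice, it can help to preview the command first by putting 'echo' in front of it. A sketch on a throwaway directory with made-up file names:

```shell
# Hypothetical files, one of them with spaces in its name
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch README.md "CODE OF CONDUCT.md" keep.rb

# Dry run: prints something like 'rm ./README.md ./CODE OF CONDUCT.md'
# (order depends on the file system; echo doesn't quote the names)
find . -name '*.md' -print0 | xargs -0 echo rm

# The real thing
find . -name '*.md' -print0 | xargs -0 rm
find . -type f
# ./keep.rb
```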
find, grep and the tools you (almost) can't live without if your day-to-day is the shell.